Porting a unikernel to Xen: Performance comparison
Hello,
after finishing the implementation part last time, this week it is time for some performance comparisons. Although neither the HVM nor the PVH port is 100% feature complete, both are in a state where basic benchmarking is very much possible. As you will see, the results look very promising when compared with those of the original implementation.
HermitCore includes a number of different benchmarks to determine its performance. This post compares the results of some of these benchmarks for HermitCore running in different environments. The following environments were tested:

- **KVM**: HermitCore is started in a virtual machine on Linux using QEMU with KVM acceleration
- **QEMU**: A virtual machine on Linux using only QEMU, without acceleration
- **HVM**: Running as a fully virtualized guest on Xen
- **PVH**: A hybrid PVH guest running on Xen
For each test, the virtual machines were given the same amount of resources. They were started with one virtual CPU core and 512 megabytes of RAM. All tests were performed on the same machine, a Lenovo ThinkPad T470 with the following specifications:

- Intel Core i5-7300U CPU with 2.6 GHz and 4 cores
- 16 gigabytes of RAM
- 256 gigabytes of SSD storage
- Running Arch Linux
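Purely for illustration, a Xen guest with these resources could be described with an `xl` configuration roughly like the sketch below. The guest name and kernel path are placeholders, and the exact way the HermitCore image is handed to Xen is an assumption here; the HVM guest would use `type = "hvm"` and additionally gets the emulated devices mentioned further down.

```
# Hypothetical xl configuration for the PVH test guest (name and path are placeholders)
name   = "hermitcore-pvh"
type   = "pvh"                        # use type = "hvm" for the fully virtualized guest
kernel = "/path/to/hermitcore-image"
vcpus  = 1                            # one virtual CPU core
memory = 512                          # megabytes of RAM
```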
Basic Benchmark
First, the overhead of a system call and of a reschedule was measured. The basic benchmark invokes the system calls getpid and sched_yield up to 10,000 times after the caches have been warmed up and measures how many CPU cycles the respective calls need on average. getpid is the system call with the shortest runtime, so it can be used to determine the general overhead of a system call. sched_yield checks whether another task is ready to be executed and switches to that task. The benchmark also measures how long it takes to allocate a megabyte of memory and how long the first write access to a newly allocated page takes.
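To illustrate the idea behind these numbers, here is a minimal C sketch of how an average cycle count per call can be obtained by reading the time stamp counter around a tight loop. This is not HermitCore's actual benchmark code; the warm-up and iteration counts are simplified and the serialization of rdtsc is only hinted at.

```c
/* Minimal sketch of the measurement idea, not HermitCore's actual benchmark code:
 * read the time stamp counter around a tight loop and report the average
 * number of cycles per call. */
#include <inttypes.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    /* lfence keeps the counter read from drifting across earlier instructions */
    __asm__ volatile("lfence; rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    const int runs = 10000;

    /* warm up caches and the system call path */
    for (int i = 0; i < 1000; i++)
        getpid();

    uint64_t start = rdtsc();
    for (int i = 0; i < runs; i++)
        getpid();
    uint64_t cycles_getpid = (rdtsc() - start) / runs;

    start = rdtsc();
    for (int i = 0; i < runs; i++)
        sched_yield();
    uint64_t cycles_yield = (rdtsc() - start) / runs;

    printf("getpid:      %" PRIu64 " cycles\n", cycles_getpid);
    printf("sched_yield: %" PRIu64 " cycles\n", cycles_yield);
    return 0;
}
```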
| System activity | KVM | QEMU | HVM | PVH |
|---|---|---|---|---|
| getpid | 9 | 122 | 12 | 12 |
| sched_yield | 79 | 360 | 90 | 83 |
| malloc | 5858 | 51812 | 51311 | 86658 |
| write access | 3368 | 34626 | 42607 | 83368 |
It is not surprising that HermitCore running on KVM shows the best overall performance. It is, however, interesting to see that a PVH or HVM guest can almost keep up with it in terms of system call performance. It is also surprising that the memory accesses of a PVH guest are much slower than those of an HVM guest, considering that in both cases page table management is virtualized in hardware.
Stream Benchmark
The STREAM benchmark ("Sustainable Memory Bandwidth in Current High Performance Computers") is a synthetic benchmark, originally written in Fortran, that measures the performance of four distinct long-vector operations. These operations represent the elementary operations on which vector codes are based and are specifically intended to eliminate data re-use. The results show the sustainable memory bandwidth in megabytes per second and the corresponding computation time for each of the four vector operations.
| Name | Function | Bytes per iteration |
|---|---|---|
| Copy | a(i) = b(i) | 16 |
| Scale | a(i) = q * b(i) | 16 |
| Add | a(i) = b(i) + c(i) | 24 |
| Triad | a(i) = b(i) + q * c(i) | 24 |
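As an illustration of what such a kernel looks like, the following C sketch implements only the Triad operation and derives a bandwidth figure from the 24 bytes moved per iteration. It is a simplified stand-in for the real STREAM code; the array length, the timer, and the single repetition are assumptions made for brevity.

```c
/* Simplified stand-in for the STREAM Triad kernel: array length, timer and
 * single repetition are assumptions; the real benchmark is more careful. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000L   /* assumed array length, large enough to defeat the caches */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c)
        return 1;

    const double q = 3.0;
    for (long i = 0; i < N; i++) {
        b[i] = 1.0;
        c[i] = 2.0;
    }

    double t0 = now();
    for (long i = 0; i < N; i++)       /* Triad: a(i) = b(i) + q * c(i) */
        a[i] = b[i] + q * c[i];
    double t = now() - t0;

    /* two 8-byte loads plus one 8-byte store = 24 bytes per iteration */
    printf("Triad: %.1f MB/s in %.6f s\n", 24.0 * N / 1e6 / t, t);

    free(a); free(b); free(c);
    return 0;
}
```

The full benchmark repeats each kernel several times and reports the sustained bandwidth together with the average, minimum, and maximum times, which gives the numbers below.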
| Environment | Bandwidth (MB/s) | Avg time (s) | Min time (s) | Max time (s) |
|---|---|---|---|---|
| **Copy** | | | | |
| KVM | 23342.8 | 0.009865 | 0.009596 | 0.014814 |
| QEMU | 5153.7 | 0.045812 | 0.043464 | 0.047941 |
| HVM | 24294.7 | 0.009369 | 0.009220 | 0.010860 |
| PVH | 24141.9 | 0.009469 | 0.009278 | 0.012305 |
| **Scale** | | | | |
| KVM | 16556.5 | 0.013793 | 0.013529 | 0.017478 |
| QEMU | 1094.8 | 0.218594 | 0.204610 | 0.229119 |
| HVM | 17263.1 | 0.013157 | 0.012976 | 0.015088 |
| PVH | 17189.3 | 0.013252 | 0.013031 | 0.019612 |
| **Add** | | | | |
| KVM | 19264.9 | 0.017679 | 0.017441 | 0.020491 |
| QEMU | 1562.1 | 0.225724 | 0.215092 | 0.237715 |
| HVM | 20038.9 | 0.016974 | 0.016767 | 0.018722 |
| PVH | 19955.0 | 0.017068 | 0.016838 | 0.022021 |
| **Triad** | | | | |
| KVM | 19088.8 | 0.017928 | 0.017602 | 0.021669 |
| QEMU | 897.4 | 0.394932 | 0.374413 | 0.415066 |
| HVM | 19856.3 | 0.017111 | 0.016922 | 0.018756 |
| PVH | 19772.2 | 0.017232 | 0.016994 | 0.021154 |
It is very surprising to see that an HVM guest running on Xen outperforms all other environments in terms of achievable memory bandwidth and the corresponding computation time, especially considering that in this setup an HVM guest is effectively a virtual machine running inside another virtual machine (the Xen hypervisor running on top of Linux). The KVM and PVH guests achieve almost the same results, with only one to four percent deviation. It is also apparent that virtualization based purely on QEMU is rather inefficient and slow.
Boot time
Finally, the time the VMs need to boot was compared. The included hello world test was run in all environments and the reported boot time was noted. This is the time HermitCore needs until it is able to start the hello world application.
| Environment | Time in ms |
|---|---|
| KVM | 80 |
| QEMU | 60 |
| HVM | 6140 |
| PVH | 80 |
HermitCore takes about the same time in all environments, with the exception of the HVM guest. There, detecting and starting the emulated devices provided by Xen takes comparatively long, which results in a boot time that is roughly 80 times longer.
Conclusion
The results of these benchmarks show that, when running as either an HVM or a PVH guest on Xen, HermitCore is definitely able to perform as well as the original implementation running as a KVM-accelerated virtual machine in QEMU. There are some exceptions, notably the very slow boot time of the HVM guest and the longer memory access times (in CPU cycles) of the PVH and HVM guests, but the overall results are very similar.
I hope you will join me again next time for the last post of this series, where I will provide a short summary and an outlook on possible future work.
So long,
Jan