Porting a unikernel to Xen: Performance comparison
Hello,
after finishing the implementation part last time, this week it is time for some performance comparisons. Although neither the HVM nor the PVH port is 100% feature complete, both are in a state where basic benchmarking is very much possible. As you will see, the results look very promising when compared with those of the original implementation.
HermitCore includes a number of different benchmarks to determine its performance. This post compares the results of some of these benchmarks for HermitCore running in different environments. The following environments were tested:

- **KVM**: HermitCore is started in a virtual machine on Linux using QEMU with KVM acceleration
- **QEMU**: A virtual machine on Linux using only QEMU, without acceleration
- **HVM**: Running as a fully virtualized guest on Xen
- **PVH**: A hybrid PVH guest running on Xen
For each test, the virtual machines were given the same amount of resources. They were started with one virtual CPU core and 512 megabytes of RAM. All tests were performed on the same machine, a Lenovo ThinkPad T470 with the following specifications:

- Intel Core i5-7300U CPU with 2.6 GHz and 4 cores
- 16 gigabytes of RAM
- 256 gigabytes of SSD storage
- Running Arch Linux
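Purely for illustration, a Xen guest with these resources could be described with an `xl` configuration roughly like the sketch below. The guest name and kernel path are placeholders, and the exact way the HermitCore image is handed to Xen is an assumption here; the HVM guest would use `type = "hvm"` and additionally gets the emulated devices mentioned further down.

```
# Hypothetical xl configuration for the PVH test guest (name and path are placeholders)
name   = "hermitcore-pvh"
type   = "pvh"                        # use type = "hvm" for the fully virtualized guest
kernel = "/path/to/hermitcore-image"
vcpus  = 1                            # one virtual CPU core
memory = 512                          # megabytes of RAM
```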
Basic Benchmark
First, the overhead of a system call and of a reschedule was measured. The basic benchmark invokes the system calls getpid and sched_yield up to 10,000 times after the caches have been warmed up and measures how many CPU cycles the respective calls need on average. getpid is the system call with the shortest runtime, so it can be used to determine the general overhead of a system call. sched_yield checks whether another task is ready to be executed and switches to that task. The benchmark also measures how long it takes to allocate a megabyte of memory and how long the first write access to a newly allocated page takes.
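To illustrate the idea behind these numbers, here is a minimal C sketch of how an average cycle count per call can be obtained by reading the time stamp counter around a tight loop. This is not HermitCore's actual benchmark code; the warm-up and iteration counts are simplified and the serialization of rdtsc is only hinted at.

```c
/* Minimal sketch of the measurement idea, not HermitCore's actual benchmark code:
 * read the time stamp counter around a tight loop and report the average
 * number of cycles per call. */
#include <inttypes.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    /* lfence keeps the counter read from drifting across earlier instructions */
    __asm__ volatile("lfence; rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    const int runs = 10000;

    /* warm up caches and the system call path */
    for (int i = 0; i < 1000; i++)
        getpid();

    uint64_t start = rdtsc();
    for (int i = 0; i < runs; i++)
        getpid();
    uint64_t cycles_getpid = (rdtsc() - start) / runs;

    start = rdtsc();
    for (int i = 0; i < runs; i++)
        sched_yield();
    uint64_t cycles_yield = (rdtsc() - start) / runs;

    printf("getpid:      %" PRIu64 " cycles\n", cycles_getpid);
    printf("sched_yield: %" PRIu64 " cycles\n", cycles_yield);
    return 0;
}
```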
| System activity | KVM | QEMU | HVM | PVH |
|---|---|---|---|---|
| getpid | 9 | 122 | 12 | 12 |
| sched_yield | 79 | 360 | 90 | 83 |
| malloc | 5858 | 51812 | 51311 | 86658 |
| write access | 3368 | 34626 | 42607 | 83368 |
It is not surprising that HermitCore running on KVM shows the best overall performance. It is, however, interesting to see that a PVH or HVM guest can almost keep up with it in terms of system call performance. It is also surprising that the memory accesses of a PVH guest are much slower than those of an HVM guest, considering that in both cases page table management is virtualized in hardware.
Stream Benchmark
The STREAM benchmark ("Sustainable Memory Bandwidth in Current High Performance Computers") is a synthetic benchmark, originally written in Fortran, that measures the performance of four distinct long-vector operations. These operations represent the elementary operations on which vector codes are based and are specifically intended to eliminate data re-use. The results show the sustainable memory bandwidth in megabytes per second and the corresponding computation time for each of the four vector operations.
| Name | Function | Bytes per iteration |
|---|---|---|
| Copy | a(i) = b(i) | 16 |
| Scale | a(i) = q * b(i) | 16 |
| Add | a(i) = b(i) + c(i) | 24 |
| Triad | a(i) = b(i) + q * c(i) | 24 |
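As an illustration of what such a kernel looks like, the following C sketch implements only the Triad operation and derives a bandwidth figure from the 24 bytes moved per iteration. It is a simplified stand-in for the real STREAM code; the array length, the timer, and the single repetition are assumptions made for brevity.

```c
/* Simplified stand-in for the STREAM Triad kernel: array length, timer and
 * single repetition are assumptions; the real benchmark is more careful. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000L   /* assumed array length, large enough to defeat the caches */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c)
        return 1;

    const double q = 3.0;
    for (long i = 0; i < N; i++) {
        b[i] = 1.0;
        c[i] = 2.0;
    }

    double t0 = now();
    for (long i = 0; i < N; i++)       /* Triad: a(i) = b(i) + q * c(i) */
        a[i] = b[i] + q * c[i];
    double t = now() - t0;

    /* two 8-byte loads plus one 8-byte store = 24 bytes per iteration */
    printf("Triad: %.1f MB/s in %.6f s\n", 24.0 * N / 1e6 / t, t);

    free(a); free(b); free(c);
    return 0;
}
```

The full benchmark repeats each kernel several times and reports the sustained bandwidth together with the average, minimum, and maximum times, which gives the numbers below.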
| Environment | Bandwidth (MB/s) | Avg time (s) | Min time (s) | Max time (s) |
|---|---|---|---|---|
| **Copy** | | | | |
| KVM | 23342.8 | 0.009865 | 0.009596 | 0.014814 |
| QEMU | 5153.7 | 0.045812 | 0.043464 | 0.047941 |
| HVM | 24294.7 | 0.009369 | 0.009220 | 0.010860 |
| PVH | 24141.9 | 0.009469 | 0.009278 | 0.012305 |
| **Scale** | | | | |
| KVM | 16556.5 | 0.013793 | 0.013529 | 0.017478 |
| QEMU | 1094.8 | 0.218594 | 0.204610 | 0.229119 |
| HVM | 17263.1 | 0.013157 | 0.012976 | 0.015088 |
| PVH | 17189.3 | 0.013252 | 0.013031 | 0.019612 |
| **Add** | | | | |
| KVM | 19264.9 | 0.017679 | 0.017441 | 0.020491 |
| QEMU | 1562.1 | 0.225724 | 0.215092 | 0.237715 |
| HVM | 20038.9 | 0.016974 | 0.016767 | 0.018722 |
| PVH | 19955.0 | 0.017068 | 0.016838 | 0.022021 |
| **Triad** | | | | |
| KVM | 19088.8 | 0.017928 | 0.017602 | 0.021669 |
| QEMU | 897.4 | 0.394932 | 0.374413 | 0.415066 |
| HVM | 19856.3 | 0.017111 | 0.016922 | 0.018756 |
| PVH | 19772.2 | 0.017232 | 0.016994 | 0.021154 |
It is very surprising to see that an HVM guest running on Xen outperforms all other environments in terms of achievable memory bandwidth and the corresponding computation time, especially considering that in this setup an HVM guest is effectively a virtual machine running inside another virtual machine (the Xen hypervisor running on top of Linux). The KVM and PVH guests achieve almost the same results, with only one to four percent deviation. It is also apparent that virtualization based purely on QEMU is rather inefficient and slow.
Boot time
Finally, the time the VMs need to boot was compared. The included hello world test was run in all environments and the reported boot time was noted. This is the time HermitCore needs until it is able to start the hello world application.
| Environment | Time in ms |
|---|---|
| KVM | 80 |
| QEMU | 60 |
| HVM | 6140 |
| PVH | 80 |
HermitCore takes about the same time in all environments, with the exception of the HVM guest. There, detecting and starting the emulated devices provided by Xen takes comparatively long, which results in a boot time that is roughly 80 times longer.
Conclusion
The results of these benchmarks show that, when running as either an HVM or a PVH guest on Xen, HermitCore is definitely able to perform as well as the original implementation running as a KVM-accelerated virtual machine in QEMU. There are some exceptions, notably the very slow boot time of the HVM guest and the longer memory access times (in CPU cycles) of the PVH and HVM guests, but the overall results are very similar.
I hope you will join me again next time for the last post of this series, where I will provide a short summary and an outlook on possible future work.
So long,
Jan