The History and Architecture of the BSD Hypervisor
bhyve, the BSD Hypervisor, was unveiled at BSDCan 2011 by FreeBSD developers neel@ and grehan@.
bhyve is a type 2 Hypervisor for FreeBSD and PC-BSD that is similar to Linux KVM and consists of the vmm.ko kernel module, a few support utilities and a library. Because these are all loadable external components, they can be easily packaged and installed on an unmodified host. A bhyve guest must currently be built with a few FreeBSD-specific shims that expedited development, but the code is fundamentally portable. With a little help, bhyve could support unmodified guests and be ported to other operating systems thanks to its simple design and permissive license.
bhyve depends on Intel's "Nehalem" or later Virtualization Technology (VT-x) and specifically Extended Page Tables (EPT). bhyve optionally supports Direct Device Attach (VT-d) for PCI pass-through of storage and network devices. VT-x and EPT can be found on Intel Core i3, i5 and i7 processors, the Pentium G6950 and select Xeon processors. Of these, only the i3 lacks VT-d support. Intel is good about listing VT-x and VT-d support for a given processor on its web site but unfortunately is not as clear about EPT. The most certain way to verify VT-x and EPT support on a given system is to watch for the VMX and POPCNT (Pop Count) features in your dmesg output. Some systems may disable VT-x in the BIOS, and while POPCNT does not directly confirm EPT support, the two features are usually, if not always, available together. In the words of an Intel rep at the recent Supercomputing conference, "We added POPCNT for the NSA," and he confirmed that one could theoretically probe for EPT support.
The NYC*BUG dmesg Database is quite useful for referencing candidate systems and the system I used for this article reports:
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x17bae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,AVX>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
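A quick way to run this check is to search the saved boot messages for both flags. The following is a minimal sketch, not part of bhyve-menu.sh: the function names has_feature and check_bhyve_cpu are hypothetical, and it assumes the boot messages are available in /var/run/dmesg.boot, as on a stock FreeBSD host. Keep in mind that POPCNT only suggests EPT support; it does not prove it.

```shell
#!/bin/sh
# Sketch only: has_feature and check_bhyve_cpu are hypothetical helper names.

# has_feature NAME FILE
# Succeeds if NAME appears as a whole entry inside a <...> feature list.
has_feature() {
    grep -Eq "[<,]$1[,>]" "$2"
}

# check_bhyve_cpu [FILE]
# Reports whether the VMX and POPCNT flags appear in the boot messages.
# FILE defaults to /var/run/dmesg.boot (an assumption about the host).
check_bhyve_cpu() {
    file=${1:-/var/run/dmesg.boot}
    for feature in VMX POPCNT; do
        if has_feature "$feature" "$file"; then
            echo "$feature: present"
        else
            echo "$feature: missing"
        fi
    done
}
```

Running check_bhyve_cpu with no arguments inspects the current host; passing a file name lets you check a dmesg saved from another machine, such as an entry from the NYC*BUG dmesg Database.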
bhyve support for AMD processors with AMD-V and Rapid Virtualization Indexing (RVI, formerly known as Nested Page Tables) is under development.
bhyve is comprised of a few key components:
bhyve host components:

  "/usr/sbin/bhyve", the user-space sequencer and I/O emulation
    /usr/src/usr.sbin/bhyve/*

  "/usr/sbin/bhyveload", the user-space FreeBSD loader that can load the kernel and metadata inside a bhyve-based virtual machine
    /usr/src/usr.sbin/bhyveload/*

  "/usr/sbin/vmmctl", a utility to dump hypervisor register state
    /usr/src/usr.sbin/vmmctl/*

  "/usr/lib/libvmmapi.a, /usr/lib/libvmmapi.so, /usr/lib/libvmmapi.so.5, /usr/lib/libvmmapi_p.a" The front-end to the vmm.ko chardev interface
    /usr/src/lib/libvmmapi/*

  "/boot/kernel/vmm.ko" Kernel module for VT-x, VT-d and hypervisor control
    /usr/src/sys/modules/vmm/*
    /usr/src/sys/amd64/vmm/*
    /usr/src/sys/amd64/include/vmm*

bhyve guest kernel components:

  The BIOS MPTable in-memory structures used to get APIC ID's etc.
    /usr/src/sys/x86/x86/mptable.c
    /usr/src/sys/x86/x86/mptable_pci.c

  BVM, the "BSD Virtual Machine" Console
    /usr/src/sys/dev/bvm/*

  The modified local_apic.c to enable x2apic support for performance
    /usr/src/sys/x86/x86/local_apic.c

  The modified mp_machdep.c to allow CPUs without 'unrestricted guest' support to bypass the real-mode bootstrap when running under bhyve
    /usr/src/sys/amd64/amd64/mp_machdep.c

  The BHYVE kernel configuration file
    /usr/src/sys/amd64/conf/BHYVE
      device bvmconsole
      device mptable

  mptable will be obsoleted by ACPI support.

bhyve guests rely on the following external components:

  The traditional FreeBSD tunnel software network interface
    /usr/src/sys/modules/if_tap/*

  The VirtIO modules (imported to FreeBSD 10.0-CURRENT)
    /usr/src/sys/dev/virtio/*
    /usr/src/sys/modules/virtio/*
~/src/sys/amd64/vmm/intel/vmx.c does most of the heavy lifting for the Hypervisor and is worth studying.
I have consolidated all of my knowledge of bhyve configuration into a single menu-driven script that offers the following steps:
1. Add the subversion and binutils Packages
2. Configure the Host's /boot/loader.conf (reboot required)
3. Retrieve bhyve Sources and Set-up the Working Directory
4. Build bhyve Host Components and Package
5. Add the bhyve Package (pkg_add /usr/src-bhyve/bhyve_package225757.tar)
6. Build BhyVe Guest
7. Clean Up /mnt/ /dev/md0 /usr/obj/ and /usr/src-bhyve
8. Delete the bhyve Package (pkg_delete bhyve-0.0.1r225757)
9. Exit
Each step is a shell function and I have made it as linear as possible for easy comprehension and modification: simply add an exit anywhere that you are having trouble. This script will not modify /usr/src/ but rather union mount it on a working directory that will be populated with the bhyve sources via remote svn checkout or export. The /usr/src/ environment is needed to build the bhyve components but when the union mount is unmounted, the working directory remains with only the bhyve-specific sources and built binaries.
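The union mount can be pictured with the sketch below. It is an illustration under assumptions rather than an excerpt from bhyve-menu.sh: the function names, the DRY_RUN variable and the choice of the unionfs "below" option are mine, while /usr/src-bhyve is the working directory named in the menu. With "below", /usr/src becomes the lower layer, so new and changed files are written to the working directory.

```shell
#!/bin/sh
# Sketch only: function names and DRY_RUN are hypothetical.
# Set DRY_RUN=echo to print the commands instead of running them.

SRC=/usr/src            # pristine FreeBSD sources, left unmodified
WORK=/usr/src-bhyve     # working directory that receives the bhyve sources

# Layer the working directory over /usr/src. With "below", SRC is the
# lower layer, so anything written under WORK stays in WORK.
mount_worktree() {
    ${DRY_RUN:-} mount -t unionfs -o below "$SRC" "$WORK"
}

# Remove the union; WORK keeps only the files that were written into it.
unmount_worktree() {
    ${DRY_RUN:-} umount "$WORK"
}
```

Calling the functions with DRY_RUN=echo set prints the commands without touching the system, which is a safe way to review them before running as root.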
This script will build a package of the bhyve host components, which you can optionally use with Neel's downloadable bhyve guest.
Download: bhyve-menu.sh
PC-BSD 9.0 users should be able to use this script without modification using sudo, but I have yet to test it myself.
If everything goes according to plan, you can exit the script and follow the boot instructions. The resulting system boot should look like:
Wait until 20 seconds after boot for networking to work
errno = 22
Consoles: userboot

FreeBSD/amd64 User boot, Revision 1.1
(root@bhyve, Fri Mar 30 01:41:20 PDT 2012)
Loading /boot/defaults/loader.conf
/boot//kernel/kernel text=0x41f64f data=0x57810+0x273590 syms=[0x8+0x73788+0x8+0x6af0b]
/boot//kernel/virtio.ko size 0x4bc0 at 0xbca000
/boot//kernel/if_vtnet.ko size 0xae10 at 0xbcf000
/boot//kernel/virtio_pci.ko size 0x57d8 at 0xbda000
/boot//kernel/virtio_blk.ko size 0x4f68 at 0xbe0000

[FreeBSD banner and Beastie ASCII art]

Welcome to FreeBSD

1. Boot [ENTER]
2. [Esc]ape to loader prompt
3. Reboot

Options:
4. Boot Safe [M]ode: NO
5. Boot [S]ingle User: NO
6. Boot [V]erbose: NO

GDB: debug ports: bvm
GDB: current port: bvm
KDB: debugger backends: ddb gdb
KDB: current backend: ddb
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-RELEASE #0: Fri Mar 30 01:41:05 PDT 2012
    root@bhyve:/usr/obj/usr/src-bhyve/sys/BHYVE amd64
WARNING: WITNESS option enabled, expect reduced performance.
CPU: Intel(R) Core(TM) i5-2400S CPU @ 2.50GHz (2499.87-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x206a7 Family = 6 Model = 2a Stepping = 7
  Features=0x8fabab7f<FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,APIC,SEP,PGE,CMOV,PAT,PSE36,CLFLUSH,DTS,MMX,FXSR,SSE,SSE2,SS,PBE>
  Features2=0x97bae25f<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,SMX,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,AVX,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory = 6442450944 (6144 MB)
avail memory = 2729897984 (2603 MB)
MPTable: <NETAPP vFiler >
Event timer "LAPIC" quality 400
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
pcib0 pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
virtio_pci0: <VirtIO PCI Network adapter> port 0x2000-0x201f at device 1.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
virtio_pci0: host features: 0x18020 <Status,MrgRxBuf,MacAddress>
virtio_pci0: negotiated features: 0x18020 <Status,MrgRxBuf,MacAddress>
vtnet0: Ethernet address: 00:a0:98:f6:70:6c
virtio_pci1: <VirtIO PCI Block adapter> port 0x2040-0x207f at device 2.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci1
virtio_pci1: host features: 0x10000004 <RingIndirect,MaxNumSegs>
virtio_pci1: negotiated features: 0x10000004 <RingIndirect,MaxNumSegs>
vtblk0: 400MB (819200 512 byte sectors)
cpu0 on motherboard
cpu1 on motherboard
isa0: <ISA bus> on motherboard
Timecounters tick every 10.000 msec
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 9765121 Hz quality 1000
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from ufs:vtbd0 []...
warning: no time-of-day clock registered, system time will not be set accurately
Setting hostuuid: 837fa9d4-7a44-11e1-bef2-00a098f6706c.
Setting hostid: 0xa9e35c5b.
Entropy harvesting: interrupts ethernet point_to_point kickstart.
Starting file system checks:
/dev/vtbd0: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/vtbd0: clean, 29775 free (23 frags, 3719 blocks, 0.0% fragmentation)
Mounting local file systems:.
Setting hostname: bhyve-tap0.
vtnet0: link state changed to UP
Starting Network: lo0 vtnet0.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=3<RXCSUM,TXCSUM>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
    inet 127.0.0.1 netmask 0xff000000
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vtnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
    ether 00:a0:98:f6:70:6c
    inet 192.168.1.151 netmask 0xffffff00 broadcast 192.168.1.255
    inet6 fe80::2a0:98ff:fef6:706c%vtnet0 prefixlen 64 tentative scopeid 0x1
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet 1000baseT <full-duplex>
    status: active
Starting devd.
add net default: gateway 192.168.1.1
add net ::ffff:0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
add net fe80::: gateway ::1
add net ff02::: gateway ::1
Generating host.conf.
Creating and/or trimming log files.
Starting syslogd.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat
32-bit compatibility ldconfig path: /usr/lib32
Clearing /tmp (X related).
Updating motd:.
Generating public/private rsa1 key pair.
Your identification has been saved in /etc/ssh/ssh_host_key.
Your public key has been saved in /etc/ssh/ssh_host_key.pub.
The key fingerprint is:
ab:9c:f9:9a:5b:b8:f4:91:97:9e:86:8a:b0:67:ba:41 root@bhyve-tap0
The key's randomart image is:
+--[RSA1 1024]----+
| |
| |
| |
| |
| E S |
| . . o . |
| o o =.o |
| +o+ O.+.. |
| +=. @=o.o |
+-----------------+
Generating public/private dsa key pair.
...
Starting sshd.
Starting cron.
Starting background file system checks in 60 seconds.

Fri Mar 30 08:44:11 UTC 2012

FreeBSD/amd64 (bhyve-tap0) (console)

login:
The result is a genuine FreeBSD system running the pared-down BHYVE kernel.
The 400M disk image leaves about 84M of space for experimentation.
Filesystem    Size    Used   Avail  Capacity  Mounted on
/dev/vtbd0    393M    277M     84M       77%  /
devfs         1.0k    1.0k      0B      100%  /dev
Try your favorite software and benchmarks on your bhyve guest and explore its limits. Every step of my testing has generated as many questions as answers and bhyve clearly offers a lot to explore.
Some versions of VMware reportedly allow VT-x features to be passed through to the emulator, but unfortunately my notebook does not support EPT.
If the updated binutils package is not installed, you will see:
{standard input}: Assembler messages:
{standard input}:160: Error: no such instruction: 'invept -16(%rbp),%rax'
If hw.physmem="0x100000000" is not set, you will see:
vm_setup_memory(lowmem): Cannot allocate memory
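For reference, the value in this setting is a byte count written in hexadecimal. The snippet below is only a sketch of the arithmetic; the function name physmem_gib is hypothetical, and it assumes the setting lives in /boot/loader.conf, which step 2 of the menu configures.

```shell
#!/bin/sh
# Sketch only: physmem_gib is a hypothetical helper name.
# Convert a hw.physmem byte count (decimal or 0x-prefixed hex) to whole GiB.
physmem_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# The corresponding line in /boot/loader.conf would read:
#   hw.physmem="0x100000000"
# physmem_gib 0x100000000    # prints 4
```

In other words, 0x100000000 bytes is 2^32 bytes, or 4 GiB.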
If the vmm.ko module is not loaded when you try to boot the guest, you will see:
vm_create: No such file or directory
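A guard like the following avoids this error by loading the module before a guest is started. It is a minimal sketch rather than part of the script: ensure_vmm_loaded is a hypothetical name, and it assumes the kldstat and kldload utilities from the FreeBSD base system and root privileges.

```shell
#!/bin/sh
# Sketch only: ensure_vmm_loaded is a hypothetical helper name.
# Load the vmm module unless the kernel already has it registered.
ensure_vmm_loaded() {
    if kldstat -q -m vmm; then
        echo "vmm already loaded"
    else
        kldload vmm && echo "vmm loaded"
    fi
}
```

If kldload itself fails, the function prints nothing of its own and returns the failure status, so the error from kldload remains the last thing on screen.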
If vmm.ko is mismatched with the host kernel, you will see:
kldload: can't load vmm: Exec format error
An incompatible system such as a Celeron U2300 with the following features in dmesg will give the following errors when attempting to boot a guest:
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x400e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,XSAVE>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>

kldload vmm.ko
vmx_init: processor does not support desired primary processor-based controls
module_register_init: MOD_LOAD (vmm, 0xffffffff816127a0, 0) error 22
Note that ifconfig tap0 up needs to be run after the guest has booted in order for networking to work. A 20 second delay is included in vmrun.sh to accommodate the time it takes to boot the guest kernel.
Note that "myguest" is an arbitrary guest name and you must be careful to keep track of guest names and their memory allocations. A mismatch can result in:
vm_setup_memory(highmem): Cannot allocate memory
If you receive this message, it can be cleared up by running kldunload vmm and kldload vmm to reload the vmm kernel module.
Building bhyve guest images is much like building FreeBSD Jails or Xen guests and you can use many of your favorite Jail building techniques to configure and customize them. The script populates a disk image with a userland, configures it to your needs and surrounds it with a loader, kernel, and the appropriate kernel modules. I have followed the layout of Neel's downloadable image in /usr/bhyve-guest/:
/usr/guest/
  boot/kernel/   Containing loader, kernel and modules
  diskdev        Disk image for guest's / partition
  userboot.so    Required by the bhyve* utilities

/usr/guest/boot/
  beastie.4th          kernel               menu.rc
  brand.4th            loader.4th           screen.4th
  check-password.4th   loader.conf          shortcuts.4th
  color.4th            loader.help          support.4th
  defaults             loader.rc            userboot.so
  delay.4th            menu-commands.4th    version.4th
  frames.4th           menu.4th

/usr/guest/boot/
  if_vtnet.ko          virtio.ko            virtio_pci.ko
  kernel               virtio_balloon.ko
  mdroot               virtio_blk.ko
The /usr/guest/boot/ listing shows an mdroot device as per Neel's downloadable image layout. This is the memory-backed disk that he used; my approach does not use it.
As this is a test environment, I run echo "PermitRootLogin yes" >> /mnt/etc/ssh/sshd_config to allow root to ssh into the system.
I would like to thank NYC*BUG, the New York City *BSD User Group, for providing bandwidth to share images generated with bhyve-menu.sh at bhyve.org.
I would also like to thank Paul Schenkeveld for helping me debug this at AsiaBSDCon 2012 and let's all thank Neel and Peter for their hard work.
You can find more information and the original BSDCan presentation in the bhyve section of the FreeBSD Wiki.
I welcome your corrections and contributions.
Copyright © 2011 – 2014 Michael Dexter unless specified otherwise. Feedback and corrections welcome.
Happy hacking!