bhyve-Bootable OpenZFS Boot Environments

http://cft.lv/25

#ZFS #bhyve

June 29th, 2016

Version 1.4

© Michael Dexter

Elegantly-simple multibooting and OS upgrades with ZFS Boot Environments and bhyve

If you are reading this, you have probably used a personal computer with a BSD or GNU/Linux operating system and at some point attempted to multiboot between multiple operating systems on the same computer. This goal is typically attempted with complex disk partitioning and a BSD or GNU/Linux boot loader like LILO or GRUB, plus several hours of frustrating experimentation and perhaps data loss. While exotic OS experimentation has driven my virtualization work since the late 1990s, there are very pragmatic reasons for multibooting the same OS on the same hardware, notably for updates and fallback to "known good" versions. To its credit, FreeBSD has long had various strategies including the NanoBSD embedded system framework with primary and secondary root partitions, plus the nextboot(8) utility for selecting the "next" kernel with various boot parameters. Get everything set correctly and you can multiboot "with impunity".

That's a good start, and over time we have seen ZFS "boot environments" used by PC-BSD and FreeNAS to provide system updates that let one fall back to a previous version should something go wrong. Hats off to these efforts, but they exist in essentially purpose-built appliance environments. I have long sensed that there is more fun to be had here, and a wonderful thing happened with FreeBSD 10.3 and 11.0: Allan Jude added a boot environment menu to the FreeBSD loader:

  ______               ____   _____ _____
 |  ____|             |  _ \ / ____|  __ \ 
 | |___ _ __ ___  ___ | |_) | (___ | |  | |
 |  ___| '__/ _ \/ _ \|  _ < \___ \| |  | |
 | |   | | |  __/  __/| |_) |____) | |__| |
 | |   | | |    |    ||     |      |      |
 |_|   |_|  \___|\___||____/|_____/|_____/    ```                        `
                                             s` `.....---.......--.```   -/
 +============Welcome to FreeBSD===========+ +o   .--`         /y:`      +.
 |                                         |  yo`:.            :o      `+-
 |  1. Boot Multi User [Enter]             |   y/               -/`   -o/
 |  2. Boot [S]ingle User                  |  .-                  ::/sy+:.
 |  3. [Esc]ape to loader prompt           |  /                     `--  /
 |  4. Reboot                              | `:                          :`
 |                                         | `:                          :`
 |  Options:                               |  /                          /
 |  5. [K]ernel: default/kernel (1 of 2)   |  .-                        -.
 |  6. Configure Boot [O]ptions...         |   --                      -.
 |  7. Select Boot [E]nvironment...        |    `:`                  `:`
 |                                         |      .--             `--.
 |                                         |         .---.....----. 
 +=========================================+

That last entry, 7, is what I am talking about. Press it and you will see:

  ______               ____   _____ _____
 |  ____|             |  _ \ / ____|  __ \ 
 | |___ _ __ ___  ___ | |_) | (___ | |  | |
 |  ___| '__/ _ \/ _ \|  _ < \___ \| |  | |
 | |   | | |  __/  __/| |_) |____) | |__| |
 | |   | | |    |    ||     |      |      |
 |_|   |_|  \___|\___||____/|_____/|_____/    ```                        `
                                             s` `.....---.......--.```   -/
 +============Welcome to FreeBSD===========+ +o   .--`         /y:`      +.
 |                                         |  yo`:.            :o      `+-
 |  1. Active:                             |   y/               -/`   -o/
 |  2. bootfs: zfs:x220/ROOT/default       |  .-                  ::/sy+:.
 |  3. [P]age 1 of 1                       |  /                     `--  /
 |                                         | `:                          :`
 |  Boot Environments:                     | `:                          :`
 |  4. default                             |  /                          /
 |                                         |  .-                        -.
 |                                         |   --                      -.
 |                                         |    `:`                  `:`
 |                                         |      .--             `--.
 |                                         |         .---.....----. 
 +=========================================+

To understand what's going on here, we will look at the ZFS configuration on a freshly-installed FreeBSD 10.3 system:

root@x220:~ # zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
x220               1.58G   283G    96K  /x220
x220/ROOT           451M   283G    96K  none
x220/ROOT/default   450M   283G   450M  /
x220/tmp             96K   283G    96K  /tmp
x220/usr           1.14G   283G    96K  /usr
x220/usr/home        96K   283G    96K  /usr/home
x220/usr/ports      619M   283G   619M  /usr/ports
x220/usr/src        547M   283G   547M  /usr/src
x220/var            612K   283G    96K  /var
x220/var/audit       96K   283G    96K  /var/audit
x220/var/crash       96K   283G    96K  /var/crash
x220/var/log        132K   283G   132K  /var/log
x220/var/mail        96K   283G    96K  /var/mail
x220/var/tmp         96K   283G    96K  /var/tmp

The zpool name is "x220" in sync with the host name on a ThinkPad x220 and we see the various default FreeBSD datasets. Note that the third entry, x220/ROOT/default is set to / and is the same as the above bootfs: menu field. This is the current boot environment of the pool and the working "/" root, akin to a traditional root partition with separate /usr, /var and the like. Unlike traditional partitions however, these datasets share space with the "/" dataset and you should consider setting a quota on them if you want to mitigate the classic "fill up /tmp" denial of service attack. Lacking quotas, I am not convinced that so many datasets are required but that's okay. We can put them in their place and have some fun.
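
As a minimal sketch of the quota idea, assuming /bin/sh rather than the default root csh, a 2G limit picked purely for illustration, and a hypothetical helper named set_quota. The RUN variable defaults to echo so that the commands are only printed until you decide to run them:

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Invoke with RUN="" in the environment to really apply them.
RUN="${RUN-echo}"

# set_quota DATASET SIZE: cap the space a dataset and its children may use.
# (hypothetical helper name; the 2G value below is an arbitrary example)
set_quota() {
    $RUN zfs set "quota=$2" "$1"
}

set_quota x220/tmp 2G
set_quota x220/var/tmp 2G
```

The quota property limits a dataset together with its descendants, so a single setting on a parent such as x220/var would also bound everything beneath it.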

First however, a little context. As you may be aware, I have a non-trivial interest in virtualization and assisted the incorporation of the bhyve hypervisor into FreeBSD 10.0, rather than 11.0 or 12.0, followed by Docker and Xen. In the course of this journey, I built a non-trivial number of hand-crafted FreeBSD releases for use as bhyve hosts and virtual machines. When bhyve arrived in FreeBSD 10.0, I continued to test it in "HEAD" development snapshots of FreeBSD courtesy of FreeBSD release engineer Glen Barber. I have done this to the point that I have forgotten that FreeBSD has releases, and I am very grateful that there are now packages built for FreeBSD HEAD that allow me to, say, give a presentation. For those who had to see my presentations in vi(1), I apologize.

That said, I have an equally non-trivial interest in fresh OS installations and the average life expectancy of an installation on my systems is a matter of weeks on hardware and minutes on virtual machines. This has resulted in a non-trivial motivation to simplify this process and I promise that is the last time I use that phrase. So let's simplify the process.

Consolidating the default Boot Environment

Disclaimer: Do not try this on a production system

The first thing to do is to consolidate the "default" boot environment. That is, contain it all in a single dataset and prepare it for cohabitation with additional sibling boot environments. I can safely say that you do NOT want to mount multiple operating systems on the same mount point, i.e. one "/" on another "/", or the same for child datasets such as "/usr". Similarly, you do not want to forget to mount a desired dataset in a file system tree.

The standard canmount properties on a FreeBSD installation are:

root@x220:~ # zfs list -H -o name | xargs zfs get canmount
NAME               PROPERTY  VALUE     SOURCE
x220               canmount  on        default
x220/ROOT          canmount  on        default
x220/ROOT/default  canmount  on        default
x220/tmp           canmount  on        default
x220/usr           canmount  off       local
x220/usr/home      canmount  on        default
x220/usr/ports     canmount  on        default
x220/usr/src       canmount  on        default
x220/var           canmount  off       local
x220/var/audit     canmount  on        default
x220/var/crash     canmount  on        default
x220/var/log       canmount  on        default
x220/var/mail      canmount  on        default
x220/var/tmp       canmount  on        default

With the mount points:

root@x220:~ # zfs list -H -o name | xargs zfs get mountpoint
NAME               PROPERTY    VALUE       SOURCE
x220               mountpoint  /x220       local
x220/ROOT          mountpoint  none        local
x220/ROOT/default  mountpoint  /           local
x220/tmp           mountpoint  /tmp        local
x220/usr           mountpoint  /usr        local
x220/usr/home      mountpoint  /usr/home   inherited from x220/usr
x220/usr/ports     mountpoint  /usr/ports  inherited from x220/usr
x220/usr/src       mountpoint  /usr/src    inherited from x220/usr
x220/var           mountpoint  /var        local
x220/var/audit     mountpoint  /var/audit  inherited from x220/var
x220/var/crash     mountpoint  /var/crash  inherited from x220/var
x220/var/log       mountpoint  /var/log    inherited from x220/var
x220/var/mail      mountpoint  /var/mail   inherited from x220/var
x220/var/tmp       mountpoint  /var/tmp    inherited from x220/var

At this point we must decide between preserving the default FreeBSD partitioning for our boot environment and going with a single dataset, not unlike what you would have with a FreeBSD jail. The challenge is that the canmount property is a blunt instrument: You want it "on" when you want a given file system hierarchy to mount at boot and you want it "off" or "noauto" for an inactive one. If you set the default layout to "noauto", the datasets below the /usr and /var datasets will not be mounted, causing, for example, new logs and the like to collect in directories in /var rather than in the original datasets. If you seek simplicity, skip to the "Creating a new simple Boot Environment" section below and perform a zfs destroy -r on the default boot environment. Else, let's push some limits. Change the datasets with "canmount on" to "canmount noauto", keeping in mind that doing only this will cause the descendants of /usr and /var to not be mounted without additional steps:

root@x220:~ # zfs list -Hp -o name | xargs zfs get -H canmount | grep on | cut -f1 | xargs zfs set canmount=noauto

root@x220:~ # zfs list -Hp -o name | xargs zfs get -H canmount
x220	canmount	noauto	local
x220/ROOT	canmount	noauto	local
x220/ROOT/default	canmount	noauto	local
x220/tmp	canmount	noauto	local
x220/usr	canmount	off	local
x220/usr/home	canmount	noauto	local
x220/usr/ports	canmount	noauto	local
x220/usr/src	canmount	noauto	local
x220/var	canmount	off	local
x220/var/audit	canmount	noauto	local
x220/var/crash	canmount	noauto	local
x220/var/log	canmount	noauto	local
x220/var/mail	canmount	noauto	local
x220/var/tmp	canmount	noauto	local
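
The grep on filter in the one-liner above matches "on" anywhere on a line, so a dataset whose name happens to contain "on" would be selected as well. A stricter selection can be sketched as follows, assuming /bin/sh and a hypothetical helper named select_canmount_on that compares the value column exactly. The sample input is canned and "x220/demon" is a made-up dataset name:

```shell
#!/bin/sh
# Print the names of datasets whose canmount value is exactly "on".
# Expects tab-separated "name<TAB>value" lines on stdin, as produced by:
#   zfs list -H -o name,canmount
select_canmount_on() {
    awk -F '\t' '$2 == "on" { print $1 }'
}

# Canned sample input for illustration only.
printf 'x220/usr\toff\nx220/tmp\ton\nx220/demon\tnoauto\n' | select_canmount_on
```

On a live system the same helper could sit in the middle of the pipeline, for example zfs list -H -o name,canmount | select_canmount_on | xargs zfs set canmount=noauto.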

Enter `beadm`

As for those additional steps: the sysutils/beadm port exists to simplify boot environment management via the activate and mount commands, among others. The activate command toggles canmount properties between noauto and on, and sets the zpool's bootfs property accordingly.

Unfortunately, at the time of writing, beadm has a bug in which it does not preserve the canmount=off properties of the /usr and /var datasets, causing the system to fail to boot. Remember to set these manually; I will investigate this bug.

That said, let's perform the actual consolidation of the child datasets to x220/ROOT/default:

root@x220:~ # zfs rename x220/tmp x220/ROOT/default/tmp
root@x220:~ # zfs rename x220/usr x220/ROOT/default/usr
root@x220:~ # zfs rename x220/var x220/ROOT/default/var
cannot unmount '/var/log': Device busy
root@x220:~ # zfs rename -f x220/var x220/ROOT/default/var

root@x220:~ # zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
x220                         1.58G   283G    96K  /x220
x220/ROOT                    1.58G   283G    96K  none
x220/ROOT/default            1.58G   283G   450M  /
x220/ROOT/default/tmp          96K   283G    96K  /tmp
x220/ROOT/default/usr        1.14G   283G    96K  /usr
x220/ROOT/default/usr/home     96K   283G    96K  /usr/home
x220/ROOT/default/usr/ports   619M   283G   619M  /usr/ports
x220/ROOT/default/usr/src     547M   283G   547M  /usr/src
x220/ROOT/default/var         616K   283G    96K  /var
x220/ROOT/default/var/audit    96K   283G    96K  /var/audit
x220/ROOT/default/var/crash    96K   283G    96K  /var/crash
x220/ROOT/default/var/log     136K   283G   136K  /var/log
x220/ROOT/default/var/mail     96K   283G    96K  /var/mail
x220/ROOT/default/var/tmp      96K   283G    96K  /var/tmp

Notice that the /var dataset needed the -f "force" flag for the rename. The "default" dataset is now fully consolidated and two things stand out. First, the pool is still mounted on /x220 like any storage pool without an operating system. I find this default dataset very useful as a data directory that transcends different operating systems. Second, you can choose to keep a few datasets separate, such as /usr/home, which is supposed to be system agnostic, or in all practicality, a custom /var/cache/pkg dataset if you find yourself constantly trying FreeBSD HEAD snapshots. I suggest a reboot to verify your work.

The pool should now be ready for completely-independent boot environments.

Duplicating the default boot environment

Considering that we are taking the limits-pushing route, let's look at making a copy of the ROOT/default boot environment:

zfs snap -r x220/ROOT/default@export
zfs create x220/ROOT/duplicate
zfs send -R x220/ROOT/default@export | zfs receive -F x220/ROOT/duplicate
zfs list
internal error: failed to initialize ZFS library
reboot

What we've done is send a snapshot of the ROOT/default boot environment to ROOT/duplicate. As for the internal error, my guess is that despite the fact that all canmount properties are set to noauto, the snapshot arrived in a mounted state, causing the default and duplicate boot environments to be mounted atop one another. Fortunately, a reboot will resolve this but keep in mind that the descendants of /usr and /var will not be mounted thanks to the canmount=noauto property.
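
If the mounted-on-arrival guess is right, one thing that might help is the -u flag of zfs receive, which leaves the received file systems unmounted. This is untested against the setup above, so treat it as a sketch under that assumption. The helper name duplicate_be_cmds is hypothetical and it only prints the commands for review:

```shell
#!/bin/sh
# duplicate_be_cmds SRC_BE DST_BE [SNAPNAME]
# Print (not run) the commands that copy one boot environment to another,
# asking zfs receive not to mount what it receives (-u).
duplicate_be_cmds() {
    src=$1 dst=$2 snap=${3:-export}
    printf 'zfs snapshot -r %s@%s\n' "$src" "$snap"
    printf 'zfs create %s\n' "$dst"
    printf 'zfs send -R %s@%s | zfs receive -u -F %s\n' "$src" "$snap" "$dst"
}

duplicate_be_cmds x220/ROOT/default x220/ROOT/duplicate
```

Once the printed commands look right, the output can be piped to sh to run them.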

With the system up, you can use beadm to properly set the canmount and bootfs properties but remember the bug I mentioned above:

root@x220:~ # beadm list
BE        Active Mountpoint  Space Created		 Nickname
default   -      -            2.9G 2016-07-29 17:34	 default
duplicate NR     /            2.9G 2016-08-02 07:23	 duplicate
root@x220:~ # beadm activate duplicate
Activated successfully
root@x220:~ # zfs set canmount=off x220/ROOT/duplicate/usr
root@x220:~ # zfs set canmount=off x220/ROOT/duplicate/var
reboot

There you have it, true FreeBSD on FreeBSD dual-booting based on the default boot environment. However, having beadm set canmount=on on all child datasets would cause a collision if you choose a different boot environment at boot. This may be resolvable with a few boot loader adjustments.

For now, the solution is to create a simpler setup with a similar or different version of FreeBSD.

Creating a new simple Boot Environment

At this point, nothing will behave differently from the standard installation but it does lay the groundwork for the fun part. Consider:

root@x220:~ # zfs create -o canmount=noauto -o mountpoint=/ x220/ROOT/11.0-CURRENT
root@x220:~ # mount -t zfs x220/ROOT/11.0-CURRENT /mnt

root@x220:~ # pkg install -y security/ca_root_nss
...

root@x220:~ # fetch -o - https://download.freebsd.org/ftp/snapshots/amd64/11.0-CURRENT/base.txz | tar xf - -C /mnt/

root@x220:~ # fetch -o - https://download.freebsd.org/ftp/snapshots/amd64/11.0-CURRENT/kernel.txz | tar xf - -C /mnt/

root@x220:~ # cp /boot/loader.conf /mnt/boot/

root@x220:~ # ls /mnt
.cshrc		boot		libexec		rescue		tmp
.profile	dev		media		root		usr
COPYRIGHT	etc		mnt		sbin		var
bin		lib		proc		sys

root@x220:~ # umount /mnt

We now have the latest snapshot of FreeBSD 11.0-CURRENT installed to a ZFS boot environment that was mounted on /mnt. Note the /boot/loader.conf step, as it is important to our success: without it, the ZFS pool will not be found because the kernel module is not loaded.
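
On a ZFS-on-root installation the relevant loader.conf entry is normally zfs_load="YES". A quick check before unmounting could look like the sketch below, which assumes /bin/sh and a hypothetical helper named has_zfs_load. The default path is the one used in the walkthrough:

```shell
#!/bin/sh
# has_zfs_load FILE: succeed if FILE asks the loader to load zfs.ko.
# Commented-out entries (lines starting with #) do not count.
has_zfs_load() {
    grep -Eq '^[[:space:]]*zfs_load="?[Yy][Ee][Ss]"?' "$1" 2>/dev/null
}

conf="${1:-/mnt/boot/loader.conf}"
if has_zfs_load "$conf"; then
    echo "$conf: zfs_load is enabled"
else
    echo "$conf: zfs_load not found, the new boot environment may not find its pool"
fi
```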

Reboot and choose the Select Boot Environment... menu in the loader to see the result:

  ______               ____   _____ _____
 |  ____|             |  _ \ / ____|  __ \
 | |___ _ __ ___  ___ | |_) | (___ | |  | |
 |  ___| '__/ _ \/ _ \|  _ < \___ \| |  | |
 | |   | | |  __/  __/| |_) |____) | |__| |
 | |   | | |    |    ||     |      |      |
 |_|   |_|  \___|\___||____/|_____/|_____/    ```                        `
                                             s` `.....---.......--.```   -/
 +============Welcome to FreeBSD===========+ +o   .--`         /y:`      +.
 |                                         |  yo`:.            :o      `+-
 |  1. Active:                             |   y/               -/`   -o/
 |  2. bootfs: zfs:x220/ROOT/default       |  .-                  ::/sy+:.
 |  3. [P]age 1 of 1                       |  /                     `--  /
 |                                         | `:                          :`
 |  Boot Environments:                     | `:                          :`
 |  4. 11.0-CURRENT                        |  /                          /
 |  5. default                             |  .-                        -.
 |                                         |   --                      -.
 |                                         |    `:`                  `:`
 |                                         |      .--             `--.
 |                                         |         .---.....----.
 +=========================================+

Press 4 to select 11.0-CURRENT and ENTER to boot to it.

This should boot to 11.0-CURRENT with no password set because it is a completely fresh installation. A reboot will return you to 10.3-STABLE but if you look closely at the boot loaders above, you will notice the bootfs property mentioned. This is the ZFS pool-level property which determines the default boot environment. To set our system to boot to 11.0-CURRENT rather than 10.3-STABLE, we run:

root@x220:~ # zpool set bootfs=x220/ROOT/11.0-CURRENT x220
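
To confirm which boot environment the pool will boot by default, the bootfs value can be read back with zpool get bootfs x220. A small sketch for pulling the value out of that output, assuming /bin/sh and a hypothetical helper named bootfs_value, shown here with canned sample output in the usual NAME/PROPERTY/VALUE/SOURCE layout:

```shell
#!/bin/sh
# bootfs_value: read "zpool get bootfs POOL" output on stdin and print
# only the VALUE column of the bootfs row (the header line is skipped).
bootfs_value() {
    awk '$2 == "bootfs" { print $3 }'
}

# Canned sample output for illustration only.
printf 'NAME  PROPERTY  VALUE                   SOURCE\nx220  bootfs    x220/ROOT/11.0-CURRENT  local\n' | bootfs_value
```

On the live system this would be used as zpool get bootfs x220 | bootfs_value.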

With the new default boot environment set, you can DESTROY THE DEFAULT BOOT ENVIRONMENT IF YOU LIKE. The system is only dependent on the ZFS pool and the 10.3/11.0 boot loader, and hopefully the implications of this are starting to sink in. We have achieved a "dual boot" FreeBSD 10.3 and 11.0 system where each version is contained in a single dataset under x220/ROOT/. Taking this a step further, I have performed zfs send operations on FreeNAS 9 and 10 installations to new boot environments, thus quad-booting my system. Where this really gets interesting is in the fact that there are experimental OpenIndiana Illumos builds that use the FreeBSD UEFI boot loader. It should not be too difficult to truly dual boot FreeBSD and Illumos using the same ZFS pool. (!) I do not know if GNU/Linux has ZFS boot environment support or a plan for the fully-legal incorporation of ZFS.

Booting the new Boot Environment with bhyve

While the above steps have completely changed the way I track FreeBSD CURRENT, we should entertain the potential of the bhyve hypervisor in all this. You will want Git for this and the source tree for the currently-running system. The system x220 has a fixed IP address of 192.168.1.202 on interface em0. Consider this strategy based on the excellent work of oshogbo and Stefan Bethke:

root@x220:~ # pkg install -y git
...

root@x220:~ # git clone https://github.com/stblassitude/boot_root_nfs
root@x220:~ # cd boot_root_nfs
root@x220:~/boot_root_nfs # make

root@x220:~/boot_root_nfs # mount -t zfs x220/ROOT/11.0-CURRENT /mnt
root@x220:~/boot_root_nfs # echo "/mnt -mapall=0 -alldirs" >> /etc/exports
root@x220:~/boot_root_nfs # service nfsd onestart

root@x220:~/boot_root_nfs # ./boot_root_nfs 192.168.1.202:/mnt /
-e boot.nfsroot.server=192.168.1.202
-e boot.nfsroot.nfshandle=X631083b5dea37b840a00040000000000e10000000000000000000000X
-e boot.nfsroot.nfshandlelen=28
-e boot.nfsroot.path=/mnt

root@x220:~/boot_root_nfs # kldload vmm
root@x220:~/boot_root_nfs # ifconfig bridge0 create up
root@x220:~/boot_root_nfs # ifconfig bridge0 addm em0
root@x220:~/boot_root_nfs # ifconfig tap0 create up
root@x220:~/boot_root_nfs # ifconfig bridge0 addm tap0

root@x220:~/boot_root_nfs # bhyveload -h /mnt \
        -e boot.netif.name=vtnet0 \
        -e boot.netif.hwaddr=02:01:02:03:04:05 \
        -e boot.netif.ip=192.168.1.202 \
        -e boot.netif.netmask=255.255.255.0 \
        -e boot.nfsroot.server=192.168.1.202 \
        -e boot.nfsroot.nfshandle=X631083b5dea37b840a00040000000000e10000000000000000000000X \
        -e boot.nfsroot.nfshandlelen=28 \
        -e boot.nfsroot.path=/mnt \
        -e vfs.root.mountfrom=nfs:192.168.1.202:/mnt \
        -e vfs.root.mountfrom.options=rw \
        -m 1024 vm0
  ______               ____   _____ _____  
 |  ____|             |  _ \ / ____|  __ \ 
 | |___ _ __ ___  ___ | |_) | (___ | |  | |
 |  ___| '__/ _ \/ _ \|  _ < \___ \| |  | |
 | |   | | |  __/  __/| |_) |____) | |__| |
 | |   | | |    |    ||     |      |      |
 |_|   |_|  \___|\___||____/|_____/|_____/    ```                        `
                                             s` `.....---.......--.```   -/
 +============Welcome to FreeBSD===========+ +o   .--`         /y:`      +.
 |                                         |  yo`:.            :o      `+-
 |  1. Boot Multi User [Enter]             |   y/               -/`   -o/
 |  2. Boot [S]ingle User                  |  .-                  ::/sy+:.
 |  3. [Esc]ape to loader prompt           |  /                     `--  /
 |  4. Reboot                              | `:                          :`
 |                                         | `:                          :`
 |  Options:                               |  /                          /
 |  5. [K]ernel: kernel (1 of 2)           |  .-                        -.
 |  6. Configure Boot [O]ptions...         |   --                      -.
 |                                         |    `:`                  `:`
 |                                         |      .--             `--.
 |                                         |         .---.....----. 
 +=========================================+

/boot/kernel/kernel text=0x1408ed0 data=0x135ab8+0x4e6c78 syms=[0x8+0x160d70+0x8+0x179379]
/boot/kernel/zfs.ko size 0x38f990 at 0x2101000
loading required module 'opensolaris'
/boot/kernel/opensolaris.ko size 0xcb00 at 0x2491000
Booting...

root@x220:~/boot_root_nfs # bhyve -c 2 -m 1024 -H -A  \
-s 0,hostbridge \
-s 8:0,virtio-net,tap0,mac=02:01:02:03:04:05 \
-s 31,lpc -l com1,stdio \
vm0

...

FreeBSD/amd64 (Amnesiac) (ttyu0)

login: root

...

root@:~ # uname -a
FreeBSD  11.0-CURRENT FreeBSD 11.0-CURRENT #0 r300097: Wed May 18 01:54:55 UTC 2016
root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
root@:~ #

If everything goes according to plan, this should boot the x220/ROOT/11.0-CURRENT boot environment, mounted on /mnt and shared via NFS as 192.168.1.202:/mnt, under the bhyve hypervisor.
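
The transcript stops at the running guest. After the guest shuts down, the VM instance has to be destroyed before bhyveload can be run against the same name again, and the network and NFS pieces can be removed. A sketch of that cleanup, assuming /bin/sh, the names used above (vm0, tap0, bridge0, /mnt) and a hypothetical helper named teardown_vm. It prints the commands unless RUN is set to an empty string:

```shell
#!/bin/sh
# Dry run by default: print each command instead of running it.
# Invoke with RUN="" in the environment to execute for real.
RUN="${RUN-echo}"

# teardown_vm VMNAME TAP BRIDGE: release what the walkthrough created.
teardown_vm() {
    $RUN bhyvectl --destroy --vm="$1"   # free the VM instance held by vmm(4)
    $RUN ifconfig "$2" destroy          # remove the tap interface
    $RUN ifconfig "$3" destroy          # remove the bridge
    $RUN service nfsd onestop           # stop the one-off NFS server
    $RUN umount /mnt                    # unmount the boot environment
}

teardown_vm vm0 tap0 bridge0
```

The line appended to /etc/exports is left in place by this sketch and would need to be removed by hand.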

There are countless ways to adjust this but it provides the building blocks for you to not only install additional operating system versions effortlessly but also boot them under bhyve prior to booting them on bare metal. Upgraded and forgot some tweak you made on your previous system? Simply boot the previous boot environment and retrieve it. Preparing a major update? Snapshot your current boot environment and clone it to a new one.
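
The snapshot-and-clone step for a single-dataset boot environment might be sketched like this, assuming /bin/sh. The helper name clone_be_cmds, the snapshot name pre-upgrade and the target name 11.0-upgrade are all illustrative, and the helper only prints the commands:

```shell
#!/bin/sh
# clone_be_cmds SRC_BE NEW_BE [SNAPNAME]
# Print (not run) the commands that snapshot one boot environment and
# clone it into a sibling that will not mount automatically.
clone_be_cmds() {
    src=$1 new=$2 snap=${3:-pre-upgrade}
    printf 'zfs snapshot %s@%s\n' "$src" "$snap"
    printf 'zfs clone -o canmount=noauto -o mountpoint=/ %s@%s %s\n' "$src" "$snap" "$new"
}

clone_be_cmds x220/ROOT/11.0-CURRENT x220/ROOT/11.0-upgrade
```

A clone covers only the one dataset it is made from, which suits the single-dataset layout created earlier. A boot environment with child datasets would need each child cloned as well.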

Future Developments

Using NFS to achieve this is admittedly clumsy. Jakub Klama has been experimenting with Plan 9's 9P protocol and has diskless 9P bhyve booting working with GNU/Linux in FreeNAS 10. FreeBSD support is forthcoming. Kris Moore has also added Boot Environment import/export support to the beadm(1) Boot Environment Administration utility and I think we're on to something: FreeBSD releases, updates and derivatives like FreeNAS and pfSense can be delivered directly to a new boot environment via a ZFS send. I welcome your ideas on how this can work.

BSDNow Feedback

Thank you Allan Jude and Kris Moore for the recent mention of this article on BSDNow. While using the built-in NFS sharing tied to ZFS, the target boot environment must be in a directory in which bhyveload -h can find the boot directory and kernel. In the long run, 9P will be the way to go for diskless bhyve boots.