Elegantly-simple multibooting and OS upgrades with ZFS Boot Environments and bhyve
If you are reading this, you have probably used a personal computer with a BSD or GNU/Linux operating system and at some point attempted to multiboot between multiple operating systems on the same computer. This is typically attempted with complex disk partitioning and a boot loader like LILO or GRUB, plus several hours of frustrating experimentation and perhaps data loss. While exotic OS experimentation has driven my virtualization work since the late 1990s, there are very pragmatic reasons for multibooting the same OS on the same hardware, notably for updates and for failing back to "known good" versions. To its credit, FreeBSD has long offered various strategies, including the NanoBSD embedded system framework with its primary and secondary root partitions, plus the nextboot(8) utility for selecting the "next" kernel with various boot parameters. Get everything set correctly and you can multiboot "with impunity".
That's a good start, and over time we have seen PC-BSD and FreeNAS use ZFS "boot environments" to allow system updates that can fall back to previous versions should something go wrong. Hats off to these efforts, but they exist in essentially purpose-built appliance environments. I have long sensed that there is more fun to be had here, and a wonderful thing happened with FreeBSD 10.3 and 11.0: Allan Jude added a boot environment menu to the FreeBSD loader:
+============Welcome to FreeBSD===========+
 1. Boot Multi User [Enter]
 2. Boot [S]ingle User
 3. [Esc]ape to loader prompt
 4. Reboot
 Options:
 5. [K]ernel: default/kernel (1 of 2)
 6. Configure Boot [O]ptions...
 7. Select Boot [E]nvironment...
+=========================================+
That last entry, 7, is what I am talking about. Press it and you will see:
+============Welcome to FreeBSD===========+
 1. Active:
 2. bootfs: zfs:x220/ROOT/default
 3. [P]age 1 of 1
 Boot Environments:
 4. default
+=========================================+
To understand what's going on here, let's look at the ZFS configuration on a freshly installed FreeBSD 10.3 system:
root@x220:~ # zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
x220               1.58G   283G    96K  /x220
x220/ROOT           451M   283G    96K  none
x220/ROOT/default   450M   283G   450M  /
x220/tmp             96K   283G    96K  /tmp
x220/usr           1.14G   283G    96K  /usr
x220/usr/home        96K   283G    96K  /usr/home
x220/usr/ports      619M   283G   619M  /usr/ports
x220/usr/src        547M   283G   547M  /usr/src
x220/var            612K   283G    96K  /var
x220/var/audit       96K   283G    96K  /var/audit
x220/var/crash       96K   283G    96K  /var/crash
x220/var/log        132K   283G   132K  /var/log
x220/var/mail        96K   283G    96K  /var/mail
x220/var/tmp         96K   283G    96K  /var/tmp
The zpool is named "x220" to match the host name of this ThinkPad X220, and we see the various default FreeBSD datasets. Note that the third entry, x220/ROOT/default, has its mountpoint set to / and matches the bootfs: field in the loader menu above. This is the pool's current boot environment and the working "/" root, akin to a traditional root partition with separate /usr, /var and the like. Unlike traditional partitions, however, these datasets share the pool's free space with the "/" dataset, and you should consider setting a quota on them if you want to mitigate the classic "fill up /tmp" denial of service attack. Without quotas, I am not convinced that so many separate datasets are needed, but that's okay. We can put them in their place and have some fun.
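The quota idea can be sketched as a POSIX sh function, assuming the default dataset names shown above. It prints the zfs set commands for the two temporary-file datasets rather than running them. The function name tmp_quota_plan and the 2G figure in the usage line are illustrative choices, not recommendations.

```shell
#!/bin/sh
# Print "zfs set quota=..." commands for the temporary-file datasets of
# a default FreeBSD ZFS layout. Nothing is executed; review the output
# and pipe it to sh if it looks right.
tmp_quota_plan() {
    pool=$1    # pool name, e.g. x220
    size=$2    # quota size understood by zfs, e.g. 2G
    for ds in tmp var/tmp; do
        printf 'zfs set quota=%s %s/%s\n' "$size" "$pool" "$ds"
    done
}
```

For example, `tmp_quota_plan x220 2G | sh` would apply the quotas. Afterwards, `zfs get quota x220/tmp` should report the new limit.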
First, however, a little context. As you may be aware, I have a non-trivial interest in virtualization and assisted the incorporation of the bhyve hypervisor into FreeBSD 10.0, followed by Docker and Xen. In the course of this journey, I built a non-trivial number of hand-crafted FreeBSD releases for use as bhyve hosts and virtual machines. When bhyve arrived in FreeBSD 10.0, I continued testing it in "HEAD" development snapshots of FreeBSD, courtesy of FreeBSD release engineer Glen Barber. I have done this to the point that I have forgotten that FreeBSD has releases, and I am very grateful that there are now packages built for FreeBSD HEAD that allow me to, say, give a presentation. For those who had to see my presentations in vi(1), I apologize.
That said, I have an equally non-trivial interest in fresh OS installations: the average life expectancy of an installation on my systems is a matter of weeks on hardware and minutes on virtual machines. This has resulted in a non-trivial motivation to simplify the process, and I promise that is the last time I use that phrase. So let's simplify the process.
Disclaimer: Do not try this on a production system
The first thing to do is to consolidate the "default" boot environment: that is, contain it all in a single dataset and prepare it for cohabitation with additional sibling boot environments. I can safely say that you do NOT want to mount multiple operating systems on the same mount point, such as one "/" on top of another "/", or one set of child datasets such as /usr on top of another. Similarly, you do not want to forget to mount a desired dataset in a file system tree.
The standard canmount properties on a FreeBSD installation are:
root@x220:~ # zfs list -H -o name | xargs zfs get canmount
NAME               PROPERTY  VALUE  SOURCE
x220               canmount  on     default
x220/ROOT          canmount  on     default
x220/ROOT/default  canmount  on     default
x220/tmp           canmount  on     default
x220/usr           canmount  off    local
x220/usr/home      canmount  on     default
x220/usr/ports     canmount  on     default
x220/usr/src       canmount  on     default
x220/var           canmount  off    local
x220/var/audit     canmount  on     default
x220/var/crash     canmount  on     default
x220/var/log       canmount  on     default
x220/var/mail      canmount  on     default
x220/var/tmp       canmount  on     default
With the mount points:
root@x220:~ # zfs list -H -o name | xargs zfs get mountpoint
NAME               PROPERTY    VALUE       SOURCE
x220               mountpoint  /x220       local
x220/ROOT          mountpoint  none        local
x220/ROOT/default  mountpoint  /           local
x220/tmp           mountpoint  /tmp        local
x220/usr           mountpoint  /usr        local
x220/usr/home      mountpoint  /usr/home   inherited from x220/usr
x220/usr/ports     mountpoint  /usr/ports  inherited from x220/usr
x220/usr/src       mountpoint  /usr/src    inherited from x220/usr
x220/var           mountpoint  /var        local
x220/var/audit     mountpoint  /var/audit  inherited from x220/var
x220/var/crash     mountpoint  /var/crash  inherited from x220/var
x220/var/log       mountpoint  /var/log    inherited from x220/var
x220/var/mail      mountpoint  /var/mail   inherited from x220/var
x220/var/tmp       mountpoint  /var/tmp    inherited from x220/var
At this point we must decide between preserving the default FreeBSD dataset layout for our boot environment or going with a single dataset, not unlike what you would have with a FreeBSD jail. The challenge is that the canmount property is a blunt instrument. You want it "on" when you want a given file system hierarchy to mount at boot, and "off" or "noauto" for an inactive one. If you set the default layout to "noauto", the datasets below the /usr and /var datasets will not be mounted. For example, new logs and the like will then collect in directories under /var on the root dataset rather than in the original datasets.
If you seek simplicity, skip to the "Creating a new simple Boot Environment" section below and perform a zfs destroy -r on the default boot environment.
Otherwise, let's push some limits.
Change the datasets with canmount=on to canmount=noauto. Keep in mind that, without additional steps, this alone will leave the descendants of /usr and /var unmounted:
root@x220:~ # zfs list -Hp -o name | xargs zfs get -H canmount | grep on | cut -f1 | xargs zfs set canmount=noauto
root@x220:~ # zfs list -Hp -o name | xargs zfs get -H canmount
x220               canmount  noauto  local
x220/ROOT          canmount  noauto  local
x220/ROOT/default  canmount  noauto  local
x220/tmp           canmount  noauto  local
x220/usr           canmount  off     local
x220/usr/home      canmount  noauto  local
x220/usr/ports     canmount  noauto  local
x220/usr/src       canmount  noauto  local
x220/var           canmount  off     local
x220/var/audit     canmount  noauto  local
x220/var/crash     canmount  noauto  local
x220/var/log       canmount  noauto  local
x220/var/mail      canmount  noauto  local
x220/var/tmp       canmount  noauto  local
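The grep on in the pipeline above matches "on" anywhere in the line. A dataset whose name merely contains those letters would therefore be switched to noauto as well; a hypothetical x220/var/mongodb with canmount=off is one example. A slightly stricter sketch compares only the value column of zfs get -H -o name,value output. The function name canmount_on is hypothetical.

```shell
#!/bin/sh
# Filter "name<TAB>value" lines as printed by
#   zfs get -H -o name,value -r canmount x220
# and print only the datasets whose canmount value is exactly "on".
canmount_on() {
    awk 'BEGIN { FS = "\t" } $2 == "on" { print $1 }'
}
```

Usage: `zfs get -H -o name,value -r canmount x220 | canmount_on | xargs -n 1 zfs set canmount=noauto`.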
beadm
As for those additional steps: the sysutils/beadm port exists to simplify boot environment management via its activate and mount commands, among others. The activate command toggles canmount properties between noauto and on, and sets the zpool's bootfs property accordingly.
Unfortunately, at the time of writing, beadm has a bug: it does not preserve the canmount=off properties of the /usr and /var datasets, which causes the system to fail to boot. Remember to set these manually; I will investigate this bug.
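To make the activate step concrete, here is a simplified sketch of the property changes described above, with the canmount=off workaround folded in. It is not beadm's code. It only touches the top-level dataset of each boot environment, plus the usr and var containers of the consolidated layout used in this article, and it prints the commands instead of running them. The function name activate_plan is hypothetical.

```shell
#!/bin/sh
# Print the zfs/zpool commands that switch the active boot environment
# from $old to $new, following the description of "beadm activate"
# above. Assumes the consolidated layout with usr and var containers.
activate_plan() {
    pool=$1   # e.g. x220
    old=$2    # currently active boot environment, e.g. default
    new=$3    # boot environment to activate, e.g. duplicate
    printf 'zfs set canmount=noauto %s/ROOT/%s\n' "$pool" "$old"
    printf 'zfs set canmount=on %s/ROOT/%s\n' "$pool" "$new"
    printf 'zpool set bootfs=%s/ROOT/%s %s\n' "$pool" "$new" "$pool"
    # Workaround for the bug described above: the usr and var containers
    # must stay canmount=off or the system fails to boot.
    for ds in usr var; do
        printf 'zfs set canmount=off %s/ROOT/%s/%s\n' "$pool" "$new" "$ds"
    done
}
```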
That said, let's perform the actual consolidation of the child datasets into x220/ROOT/default:
root@x220:~ # zfs rename x220/tmp x220/ROOT/default/tmp
root@x220:~ # zfs rename x220/usr x220/ROOT/default/usr
root@x220:~ # zfs rename x220/var x220/ROOT/default/var
cannot unmount '/var/log': Device busy
root@x220:~ # zfs rename -f x220/var x220/ROOT/default/var
root@x220:~ # zfs list
NAME                          USED  AVAIL  REFER  MOUNTPOINT
x220                         1.58G   283G    96K  /x220
x220/ROOT                    1.58G   283G    96K  none
x220/ROOT/default            1.58G   283G   450M  /
x220/ROOT/default/tmp          96K   283G    96K  /tmp
x220/ROOT/default/usr        1.14G   283G    96K  /usr
x220/ROOT/default/usr/home     96K   283G    96K  /usr/home
x220/ROOT/default/usr/ports   619M   283G   619M  /usr/ports
x220/ROOT/default/usr/src     547M   283G   547M  /usr/src
x220/ROOT/default/var         616K   283G    96K  /var
x220/ROOT/default/var/audit    96K   283G    96K  /var/audit
x220/ROOT/default/var/crash    96K   283G    96K  /var/crash
x220/ROOT/default/var/log     136K   283G   136K  /var/log
x220/ROOT/default/var/mail     96K   283G    96K  /var/mail
x220/ROOT/default/var/tmp      96K   283G    96K  /var/tmp
Notice that the /var dataset needed the -f "force" flag for the rename, because /var/log was busy and could not be unmounted. The "default" boot environment is now fully consolidated, and two things stand out.

First, the pool's top-level dataset is still mounted on /x220 like any storage pool without an operating system. I find this top-level dataset very useful as a data directory that transcends different operating systems.

Second, you can choose to keep a few datasets separate. One candidate is /usr/home, which is supposed to be system-agnostic. In all practicality, another is a custom /var/cache/pkg dataset if you find yourself constantly trying FreeBSD HEAD snapshots. I suggest a reboot to verify your work.
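One way to check the consolidation before rebooting is to list every dataset in the pool and flag anything that still lives outside x220/ROOT. This sketch assumes its input comes from zfs list -H -o name -r x220. Anything you deliberately kept separate, such as usr/home, will show up, and that is expected. The function name stray_datasets is hypothetical.

```shell
#!/bin/sh
# Read dataset names, one per line, and print those that are neither the
# pool's top-level dataset, nor <pool>/ROOT, nor inside <pool>/ROOT/.
stray_datasets() {
    pool=$1   # e.g. x220
    awk -v pool="$pool" '
        $0 == pool || $0 == pool "/ROOT" { next }
        index($0, pool "/ROOT/") == 1 { next }
        { print }
    '
}
```

Usage: `zfs list -H -o name -r x220 | stray_datasets x220`. After the renames above, this should print nothing.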
The pool should now be ready for completely-independent boot environments.
Considering that we are taking the limits-pushing route, let's look at making a copy of the ROOT/default boot environment:
zfs snap -r x220/ROOT/default@export
zfs create x220/ROOT/duplicate
zfs send -R x220/ROOT/default@export | zfs receive -F x220/ROOT/duplicate
zfs list
internal error: failed to initialize ZFS library
reboot
What we've done is send a snapshot of the ROOT/default boot environment to ROOT/duplicate. As for the internal error, my guess is this: although all canmount properties are set to noauto, the received datasets arrived in a mounted state, so the default and duplicate boot environments ended up mounted atop one another. Fortunately, a reboot will resolve this. Keep in mind, however, that the descendants of /usr and /var will not be mounted, thanks to the canmount=noauto property.
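zfs receive has a -u flag that leaves the received file systems unmounted. If the internal error really does come from the received datasets being mounted over the running ones, a variant of the copy that uses this flag might avoid the collision. This has not been verified here. A full replication stream creates the target by itself, so the separate zfs create and the -F flag are dropped. The sketch prints the commands, and the function name duplicate_plan is hypothetical.

```shell
#!/bin/sh
# Print commands that copy boot environment $src to a new boot
# environment $dst, keeping the received datasets unmounted (-u).
duplicate_plan() {
    pool=$1   # e.g. x220
    src=$2    # e.g. default
    dst=$3    # e.g. duplicate (must not exist yet)
    snap=$4   # snapshot name, e.g. export
    printf 'zfs snapshot -r %s/ROOT/%s@%s\n' "$pool" "$src" "$snap"
    printf 'zfs send -R %s/ROOT/%s@%s | zfs receive -u %s/ROOT/%s\n' \
        "$pool" "$src" "$snap" "$pool" "$dst"
}
```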
With the system up, you can use beadm to properly set the canmount and bootfs properties, but remember the bug I mentioned above:
root@x220:~ # beadm list
BE        Active Mountpoint  Space Created          Nickname
default   -      -            2.9G 2016-07-29 17:34 default
duplicate NR     /            2.9G 2016-08-02 07:23 duplicate
root@x220:~ # beadm activate duplicate
Activated successfully
root@x220:~ # zfs set canmount=off x220/ROOT/duplicate/usr
root@x220:~ # zfs set canmount=off x220/ROOT/duplicate/var
reboot
There you have it: true FreeBSD-on-FreeBSD dual booting based on the default boot environment. However, beadm sets canmount=on on all child datasets, so choosing a different boot environment at boot would cause a collision. This may be resolvable with a few boot loader adjustments.
For now, the solution is to create a simpler setup with a similar or different version of FreeBSD.
At this point, nothing will behave differently from the standard installation but it does lay the groundwork for the fun part. Consider:
root@x220:~ # zfs create -o canmount=noauto -o mountpoint=/ x220/ROOT/11.0-CURRENT
root@x220:~ # mount -t zfs x220/ROOT/11.0-CURRENT /mnt
root@x220:~ # pkg install -y security/ca_root_nss
...
root@x220:~ # fetch -o - https://download.freebsd.org/ftp/snapshots/amd64/amd64/11.0-CURRENT/base.txz | tar xf - -C /mnt/
root@x220:~ # fetch -o - https://download.freebsd.org/ftp/snapshots/amd64/amd64/11.0-CURRENT/kernel.txz | tar xf - -C /mnt/
root@x220:~ # cp /boot/loader.conf /mnt/boot/
root@x220:~ # ls /mnt
.cshrc     boot       libexec    rescue     tmp
.profile   dev        media      root       usr
COPYRIGHT  etc        mnt        sbin       var
bin        lib        proc       sys
root@x220:~ # umount /mnt
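The same installation can be parameterized so that the dataset name and the snapshot directory always agree. This sketch prints the commands used above. The snapshot directory is an argument and must contain base.txz and kernel.txz. Nothing here verifies checksums, and the function name install_be_plan is hypothetical.

```shell
#!/bin/sh
# Print the commands that create a single-dataset boot environment and
# populate it from release-engineering tarballs.
install_be_plan() {
    pool=$1        # e.g. x220
    be=$2          # boot environment name, e.g. 11.0-CURRENT
    base_url=$3    # directory URL containing base.txz and kernel.txz
    mnt=${4:-/mnt} # temporary mount point
    printf 'zfs create -o canmount=noauto -o mountpoint=/ %s/ROOT/%s\n' "$pool" "$be"
    printf 'mount -t zfs %s/ROOT/%s %s\n' "$pool" "$be" "$mnt"
    for dist in base kernel; do
        printf 'fetch -o - %s/%s.txz | tar xf - -C %s/\n' "$base_url" "$dist" "$mnt"
    done
    printf 'cp /boot/loader.conf %s/boot/\n' "$mnt"
    printf 'umount %s\n' "$mnt"
}
```

Printing rather than executing leaves room to review the URL before anything is written to the pool.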
We now have the latest snapshot of FreeBSD 11.0-CURRENT installed to a ZFS boot environment, which was mounted on /mnt during the installation. Note the /boot/loader.conf step: it is essential to our success. Without it, the ZFS kernel module is not loaded and the ZFS pool will not be found.
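On FreeBSD, the loader.conf line that loads the ZFS module at boot is zfs_load="YES". A ZFS-on-root installation normally has it already, which is why copying the host's file works. If you would rather check than copy, here is a small sketch that appends the line only when it is missing. The path argument would be /mnt/boot/loader.conf while the new boot environment is mounted. The function name ensure_zfs_load is hypothetical.

```shell
#!/bin/sh
# Append zfs_load="YES" to the given loader.conf unless an identical
# line is already present. Creates the file if it does not exist.
ensure_zfs_load() {
    conf=$1   # e.g. /mnt/boot/loader.conf
    if ! grep -qx 'zfs_load="YES"' "$conf" 2>/dev/null; then
        printf 'zfs_load="YES"\n' >> "$conf"
    fi
}
```

Only an exact line match is recognized. A commented-out or differently quoted variant would lead to a second entry being appended.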
Reboot and choose the Select Boot Environment... menu in the loader to see the result:
+============Welcome to FreeBSD===========+
 1. Active:
 2. bootfs: zfs:x220/ROOT/default
 3. [P]age 1 of 1
 Boot Environments:
 4. 11.0-CURRENT
 5. default
+=========================================+
Press 4 to select 11.0-CURRENT and ENTER to boot to it.
This should boot to 11.0-CURRENT with no root password set, because it is a completely fresh installation. A reboot will return you to 10.3-STABLE. If you look closely at the boot loader menus above, however, you will notice the bootfs property mentioned. This ZFS pool-level property determines the default boot environment. To set our system to boot to 11.0-CURRENT rather than 10.3-STABLE, we run:
root@x220:~ # zpool set bootfs=x220/ROOT/11.0-CURRENT x220
With the new default boot environment set, you can DESTROY THE DEFAULT BOOT ENVIRONMENT IF YOU LIKE. The system depends only on the ZFS pool and the 10.3/11.0 boot loader, and hopefully the implications of this are starting to sink in. We have achieved a "dual boot" FreeBSD 10.3 and 11.0 system where each version is contained in a single dataset under x220/ROOT.

Taking this a step further, I have used zfs send to copy FreeNAS 9 and 10 installations to new boot environments, thus quad-booting my system. Where this gets really interesting is that there are experimental OpenIndiana (illumos) builds that use the FreeBSD UEFI boot loader. It should not be too difficult to truly dual boot FreeBSD and illumos from the same ZFS pool. (!) I do not know whether GNU/Linux has ZFS boot environment support or a plan for the fully legal incorporation of ZFS.
While the above steps have completely changed the way I track FreeBSD CURRENT, we should also consider the potential of the bhyve hypervisor in all this. You will want Git for this, plus the source tree for the currently running system. The system x220 has a fixed IP address of 192.168.1.202 on interface em0. Consider this strategy, based on the excellent work of oshogbo and Stefan Bethke:
root@x220:~ # pkg install -y git
...
root@x220:~ # git clone https://github.com/stblassitude/boot_root_nfs
root@x220:~ # cd boot_root_nfs
root@x220:~/boot_root_nfs # make
root@x220:~/boot_root_nfs # mount -t zfs x220/ROOT/11.0-CURRENT /mnt
root@x220:~/boot_root_nfs # echo "/mnt -mapall=0 -alldirs" >> /etc/exports
root@x220:~/boot_root_nfs # service nfsd onestart
root@x220:~/boot_root_nfs # ./boot_root_nfs 192.168.1.202:/mnt /
-e boot.nfsroot.server=192.168.1.202 -e boot.nfsroot.nfshandle=X631083b5dea37b840a00040000000000e10000000000000000000000X -e boot.nfsroot.nfshandlelen=28 -e boot.nfsroot.path=/mnt
root@x220:~/boot_root_nfs # kldload vmm
root@x220:~/boot_root_nfs # ifconfig bridge0 create up
root@x220:~/boot_root_nfs # ifconfig bridge0 addm em0
root@x220:~/boot_root_nfs # ifconfig tap0 create up
root@x220:~/boot_root_nfs # ifconfig bridge0 addm tap0
root@x220:~/boot_root_nfs # bhyveload -h /mnt \
    -e boot.netif.name=vtnet0 \
    -e boot.netif.hwaddr=02:01:02:03:04:05 \
    -e boot.netif.ip=192.168.1.202 \
    -e boot.netif.netmask=255.255.255.0 \
    -e boot.nfsroot.server=192.168.1.202 \
    -e boot.nfsroot.nfshandle=X631083b5dea37b840a00040000000000e10000000000000000000000X \
    -e boot.nfsroot.nfshandlelen=28 \
    -e boot.nfsroot.path=/mnt \
    -e vfs.root.mountfrom=nfs:192.168.1.202:/mnt \
    -e vfs.root.mountfrom.options=rw \
    -m 1024 vm0
+============Welcome to FreeBSD===========+
 1. Boot Multi User [Enter]
 2. Boot [S]ingle User
 3. [Esc]ape to loader prompt
 4. Reboot
 Options:
 5. [K]ernel: kernel (1 of 2)
 6. Configure Boot [O]ptions...
+=========================================+
/boot/kernel/kernel text=0x1408ed0 data=0x135ab8+0x4e6c78 syms=[0x8+0x160d70+0x8+0x179379]
/boot/kernel/zfs.ko size 0x38f990 at 0x2101000
loading required module 'opensolaris'
/boot/kernel/opensolaris.ko size 0xcb00 at 0x2491000
Booting...
root@x220:~/boot_root_nfs # bhyve -c 2 -m 1024 -H -A \
    -s 0,hostbridge \
    -s 8:0,virtio-net,tap0,mac=02:01:02:03:04:05 \
    -s 31,lpc -l com1,stdio \
    vm0
...
FreeBSD/amd64 (Amnesiac) (ttyu0)
login: root
...
root@:~ # uname -a
FreeBSD 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r300097: Wed May 18 01:54:55 UTC 2016 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
root@:~ #
If everything goes according to plan, this should boot the x220/ROOT/11.0-CURRENT boot environment, mounted on /mnt and shared via NFS as 192.168.1.202:/mnt, under the bhyve hypervisor.
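The bhyveload and bhyve invocations above can be generated from a handful of variables.

- **Guest address.** In the session above, the guest's boot.netif.ip is the host's own 192.168.1.202, which would likely clash on a shared network. This sketch therefore takes the guest address as a separate argument.
- **NFS handle.** The handle and its length are whatever boot_root_nfs prints.
- **Copied values.** tap0, the netmask, and the memory and CPU sizes are taken from the commands above.
- **Cleanup.** bhyvectl --destroy frees the VM's resources after the guest shuts down so the name can be reused.

The function name bhyve_nfs_plan is hypothetical.

```shell
#!/bin/sh
# Print bhyveload, bhyve and bhyvectl commands for booting an exported
# boot environment over NFS, as in the session above.
bhyve_nfs_plan() {
    vm=$1               # VM name, e.g. vm0
    server=$2           # NFS server (the host), e.g. 192.168.1.202
    guest_ip=$3         # address for the guest's vtnet0
    mac=$4              # e.g. 02:01:02:03:04:05
    handle=$5           # boot.nfsroot.nfshandle printed by boot_root_nfs
    handlelen=$6        # boot.nfsroot.nfshandlelen printed by boot_root_nfs
    nfs_path=${7:-/mnt} # exported directory holding the boot environment
    printf '%s' "bhyveload -h $nfs_path"
    for kv in boot.netif.name=vtnet0 "boot.netif.hwaddr=$mac" \
        "boot.netif.ip=$guest_ip" boot.netif.netmask=255.255.255.0 \
        "boot.nfsroot.server=$server" "boot.nfsroot.nfshandle=$handle" \
        "boot.nfsroot.nfshandlelen=$handlelen" "boot.nfsroot.path=$nfs_path" \
        "vfs.root.mountfrom=nfs:$server:$nfs_path" vfs.root.mountfrom.options=rw; do
        printf ' -e %s' "$kv"
    done
    printf ' -m 1024 %s\n' "$vm"
    printf 'bhyve -c 2 -m 1024 -H -A -s 0,hostbridge -s 8:0,virtio-net,tap0,mac=%s -s 31,lpc -l com1,stdio %s\n' "$mac" "$vm"
    # Free the VM's kernel resources once the guest has shut down.
    printf 'bhyvectl --destroy --vm=%s\n' "$vm"
}
```

bhyve uses stdio as the guest console, so write the output to a file and run that file rather than piping it into sh. Piping would hand the pipe to the console instead of your terminal.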
There are countless ways to adjust this, but it provides the building blocks not only to install additional operating system versions effortlessly but also to boot them under bhyve before booting them on bare metal. Upgraded and forgot a tweak you made on your previous system? Simply boot the previous boot environment and retrieve it. Preparing a major update? Snapshot your current boot environment and clone it to a new one.
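For the single-dataset boot environments created above, "snapshot and clone" comes down to two commands. This sketch prints them; beadm create does the equivalent for you. The function name clone_be_plan is hypothetical. Reusing the new name as the snapshot name is simply a choice made here.

```shell
#!/bin/sh
# Print commands that snapshot boot environment $src and clone it into a
# new, inactive boot environment $new (single-dataset layout assumed).
clone_be_plan() {
    pool=$1   # e.g. x220
    src=$2    # e.g. 11.0-CURRENT
    new=$3    # e.g. pre-update
    snap=$new # snapshot named after the new boot environment
    printf 'zfs snapshot %s/ROOT/%s@%s\n' "$pool" "$src" "$snap"
    printf 'zfs clone -o canmount=noauto -o mountpoint=/ %s/ROOT/%s@%s %s/ROOT/%s\n' \
        "$pool" "$src" "$snap" "$pool" "$new"
}
```

A clone depends on its origin snapshot. The source boot environment therefore cannot be destroyed until the clone has been promoted with zfs promote.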
Using NFS to achieve this is admittedly clumsy. Jakub Klama has been experimenting with Plan 9's 9P protocol and has diskless 9P bhyve booting working with GNU/Linux in FreeNAS 10; FreeBSD support is forthcoming. Kris Moore has also added boot environment import/export support to the beadm(1) Boot Environment Administration utility, and I think we're on to something. FreeBSD releases and updates, and derivatives like FreeNAS and pfSense, could be delivered directly to a new boot environment via zfs send. I welcome your ideas on how this can work.
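As a rough illustration of delivering a boot environment as a file, the sketch below uses a plain zfs send stream compressed with xz. The file naming and the compression are assumptions for illustration, not the format of beadm's import/export feature. A plain send stream does not carry the dataset's properties, so the import side sets canmount and mountpoint explicitly, canmount first to be safe. The function names be_export_plan and be_import_plan are hypothetical.

```shell
#!/bin/sh
# Print commands to export boot environment $be to an xz-compressed
# send stream, and to import such a file as a new boot environment.
be_export_plan() {
    pool=$1 be=$2 snap=$3 file=$4
    printf 'zfs snapshot %s/ROOT/%s@%s\n' "$pool" "$be" "$snap"
    printf 'zfs send %s/ROOT/%s@%s | xz > %s\n' "$pool" "$be" "$snap" "$file"
}

be_import_plan() {
    pool=$1 be=$2 file=$3
    # -u: do not mount the received file system.
    printf 'xz -dc %s | zfs receive -u %s/ROOT/%s\n' "$file" "$pool" "$be"
    printf 'zfs set canmount=noauto %s/ROOT/%s\n' "$pool" "$be"
    printf 'zfs set mountpoint=/ %s/ROOT/%s\n' "$pool" "$be"
}
```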
Thank you, Allan Jude and Kris Moore, for the recent mention of this article on BSD Now. When using the built-in NFS sharing tied to ZFS, the target boot environment must be in a directory in which bhyveload -h can find the boot directory and kernel. In the long run, 9P will be the way to go for diskless bhyve boots.
Copyright © 2011 – 2016 Michael Dexter unless specified otherwise. Feedback and corrections welcome.