
How to Install newer versions and bleeding edge ZFS


tl;dr - A line-by-line explanation of my Ansible-powered ZFS install script for use on Hetzner’s dedicated hardware (Ubuntu 20.04 - “Focal”) – it’s not perfect/minimal, but it works for me.

A while back I started using ZFS on all my bare metal dedicated hardware hosted at Hetzner to wrangle the attached HDDs and SSDs. There are lots of choices in the space (standard LVM, mdraid, btrfs, etc.), but I chose ZFS for its feature set and ergonomics.

I won’t bore anyone with the why, but one of the things I needed to navigate was how to install newer versions of ZFS on Ubuntu 20.04 (one of the supported operating systems at Hetzner). I encountered some issues (particularly post-install) while setting up ZFS on my systems so I wanted to give a walkthrough of how I did it.

This post is a stanza-by-stanza explanation of my Ansible scripts that install ZFS.

Step 0: RTFM

Before you know if ZFS (or any filesystem) is worth switching to, you’re probably going to want to RTFM. ZFS has a lot of ins and outs and I certainly am not an expert in it, but knowing the how and why, as well as the terminology and even the history of the project, is very important.

A few links to get you started:

Along with ZFS (which is likely the most unknown quantity here) you probably want to be familiar with Ubuntu and general systems administration. That’s quite a big area to cover, but here are a few links:

It almost goes without saying, but if you’re not very familiar with Linux system administration at this point, you probably shouldn’t be attempting this.

Step 1: (optional) Partition your drives

I personally chose to have my disks in a somewhat custom configuration – I have RAID1 (mirroring) set up via mdraid, along with partitions on each disk that I can give to ZFS to manage. Ideally I’d have entire disks to give to ZFS, but partitions work too.

OS setup (installimage) which includes disk setup is run from Hetzner’s rescue mode and can be guided with a file like the following:

# Drive declarations used by installimage
DRIVE0 /dev/nvme0n1
DRIVE1 /dev/nvme1n1

# Enable Software RAID (for the OS disk)
SWRAID 1
SWRAIDLEVEL 1

# Bootloader (generally grub)
BOOTLOADER grub

# Machine hostname
HOSTNAME machine01

# Partition configuration
PART swap  swap 32G
PART /boot ext4 1G
PART /     ext4 128G

# This last partition will be wiped & recreated as ZFS later
PART /root-disk-remaining ext4 all

# You can specify images that Hetzner uses by accessing their network share
IMAGE /root/.oldroot/nfs/images/Ubuntu-2004-focal-64-minimal.tar.gz

This is obviously quite Hetzner-specific, but do whatever you need to do on your own systems to have partitions/drives available for ZFS to utilize.
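For reference, on Hetzner that file gets consumed by installimage from rescue mode; the invocation is roughly the following (the path and flags here are a sketch – check Hetzner’s installimage docs for your rescue image):

# From Hetzner's rescue mode: run installimage non-interactively with the config above
installimage -a -c /root/setup.conf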

Step 2: Hold back all the Ubuntu repository ZFS versions

Step 2.0: Hold zfsutils-linux and zfs-zed

In Ansible YAML the step looks like this:

- name: Hold all zfs-related upstream packages
  ansible.builtin.shell: |
    apt-mark hold zfsutils-linux
    apt-mark hold zfs-zed    

As of the writing of this post the version of zfs-linux in Ubuntu Focal is 0.8.3. Since our goal here is to install and keep using a newer version, we want to make sure that any apt updates do not replace our installed version with an older version.

Step 2.1: Purge ZFS upstream packages

While we’re here, let’s purge the packages as well.

- name: Purge ZFS upstream (ubuntu) packages if installed
  ignore_errors: yes
  ansible.builtin.command: |
    apt purge --allow-change-held-packages zfsutils-linux zfs-zed    

You can read the docs on apt to see what purge does – it’s similar to remove but also removes configuration files if present.
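If you want to sanity-check the result, the holds and removals can be verified with a couple of standard commands (not part of my playbook):

# List packages currently held back by apt (should include zfsutils-linux and zfs-zed)
apt-mark showhold

# Confirm the Ubuntu-packaged ZFS tools are no longer installed
dpkg -l | grep zfs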

Step 3: Set up the build environment

Step 3.0: Install system dependencies for OpenZFS

You can install the dependencies for OpenZFS like so:

- name: Install requirements for building ZFS
  ansible.builtin.apt:
    name: "{{ packages }}"
    update_cache: yes
    state: present
  vars:
    packages:
      - build-essential
      - autoconf
      - automake
      - libtool
      - gawk
      - alien
      - fakeroot
      - dkms
      - libblkid-dev
      - uuid-dev
      - libudev-dev
      - libssl-dev
      - zlib1g-dev
      - libaio-dev
      - libattr1-dev
      - libelf-dev
      # The line below calls for the output of `uname -r`, for example "5.16.5-arch1-1" for my arch system
      # so the line would resolve to something like "linux-headers-5.16.5-arch1-1" (not valid on Ubuntu of course, but as an example)
      - linux-headers-{{ uname_r.stdout }}
      - python3
      - python3-dev
      - python3-setuptools
      - python3-cffi
      - libffi-dev
      - python3-packaging
      - git
      - libcurl4-openssl-dev

There may be a few dependencies that aren’t strictly necessary but just about all of them should be required.
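Note the linux-headers-{{ uname_r.stdout }} entry: uname_r is a variable registered earlier from the output of uname -r. If you don’t already have it, a task along these lines (a sketch – the variable name is whatever you choose) will provide it:

- name: Capture the running kernel release
  ansible.builtin.command: uname -r
  register: uname_r
  changed_when: false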

Step 3.1: Create /opt/zfs

Here’s where the ZFS source will live:

- name: Create /opt/zfs
  ansible.builtin.file:
    path: /opt/zfs
    state: directory

Step 3.2: Download or copy in ZFS source code

If you want to download ZFS:

- name: Download zfs
  ansible.builtin.get_url:
    url: "https://github.com/openzfs/zfs/releases/download/zfs-{{ _zfs_version }}/zfs-{{ _zfs_version }}.tar.gz"
    checksum: "{{ _zfs_tarball_sha256_checksum }}"
    mode: 0755
    dest: "/opt/zfs/zfs-{{ _zfs_version }}.tar.gz"

Here the substitution {{ _zfs_version }} ({{ }} is the Ansible templating syntax) resolves to 2.1.1. You’ll also want to do the download yourself and collect/generate a checksum to use. Never download stuff from the internet that isn’t supposed to change without a checksum!
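Those variables need to be defined somewhere Ansible can see them (role defaults, group_vars, etc.); a minimal sketch with placeholder values:

# e.g. defaults/main.yml (hypothetical location; generate your own checksum)
_zfs_version: "2.1.1"
_zfs_tarball_sha256_checksum: "sha256:<checksum-you-generated>"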

And if you want to copy it in from the computer where Ansible is running (this is what I do):

- name: Copy zfs package (avoid rate limit)
  ansible.builtin.copy:
    src: "../../files/zfs/zfs-{{ _zfs_version }}.tar.gz"
    mode: 0755
    dest: "/opt/zfs/zfs-{{ _zfs_version }}.tar.gz"

Regardless of how you choose to get the source, you’ll need to extract it:

- name: Unzip zfs code
  ansible.builtin.unarchive:
    src: "/opt/zfs/zfs-{{ _zfs_version }}.tar.gz"
    dest: "/opt/zfs"
    remote_src: yes

Step 4: Build the code

Step 4.0: Start the build process

To start the build process:

- name: Setup and Build ZFS
  ansible.builtin.shell: |
    ./autogen.sh
    ./configure
    make clean
    make -j    
  args:
    chdir: "/opt/zfs/zfs-{{ _zfs_version }}"

If you’re familiar with building things from source, you’ll recognize this common toolset (some variant of make plus autogen and configure scripts). As you might expect, this can take a while, so give it some time.

Step 4.1: Install ZFS

After building ZFS, install it:

- name: Install ZFS
  ansible.builtin.shell: |
    make install    
  args:
    chdir: "/opt/zfs/zfs-{{ _zfs_version }}"

Normally you’d think we’d be done right here, but this post exists because doing this is generally not enough! Let’s press onwards.

Step 4.2: Forceful unload & reload of ZFS modules, with helper install

First force the unload of the ZFS module(s) if they are running:

- name: Force unload of ZFS module(s)
  ansible.builtin.shell: |
    ./scripts/zfs.sh -u    
  args:
    chdir: "/opt/zfs/zfs-{{ _zfs_version }}"

It would be ideal to not be running any workloads when this is happening, as you might expect.

While the modules are unloaded, we can install some helpers:

- name: Post-Install ZFS Helpers
  ansible.builtin.shell: |
    ./scripts/zfs-helpers.sh -i    
  args:
    chdir: "/opt/zfs/zfs-{{ _zfs_version }}"

After the helpers are installed, let’s force reload the ZFS module:

- name: Force reload of ZFS module
  ansible.builtin.shell: |
    ./scripts/zfs.sh    
  args:
    chdir: "/opt/zfs/zfs-{{ _zfs_version }}"

Step 4.3: Build and install the ZFS deb packages

I’ve found that building and installing the deb packages (the format used by Debian for package installation) also helped with getting the installation to stick and not be replaced by Ubuntu package defaults.

First build the ZFS deb package from the source code:

- name: Build ZFS deb package
  ansible.builtin.shell: |
    make deb    
  args:
    chdir: "/opt/zfs/zfs-{{ _zfs_version }}"

And then install them:

- name: Install ZFS deb packages
  ansible.builtin.shell: |
    yes | dpkg -i --force-overwrite ./*.deb
    apt install -f -y ./*.deb    
  args:
    chdir: "/opt/zfs/zfs-{{ _zfs_version }}"

Theoretically this step shouldn’t be necessary (or should be enough on its own), as we’ve already run an install process, but I’ve found that doing only one or the other left me with situations where a restart would bring back an older version of ZFS (despite the apt purge) – particularly at the kernel module level.

Step 5: Kernel install

Step 5.0: Enable in current kernel via modprobe

If you want to enable the ZFS module that was installed immediately, use modprobe:

- name: Modprobe zfs module
  block:
    - name: Install zfs kernel module
      community.general.modprobe:
        name: zfs
        state: present

Step 5.1: Ensure locally installed kernel modules are read before others

Installing ZFS means installing a kernel module, but since we haven’t quite baked it in via Dynamic Kernel Module Support, we need to enable locally installed kernel modules to be used:

- name: Ensure extra is in front
  ansible.builtin.lineinfile:
    path: /etc/modules-load.d/modules.conf
    regexp: '^search'
    line: "search extra updates ubuntu built-in"
    state: present

What this does is ensure that the file managing the kernel module search path (/etc/modules-load.d/modules.conf) contains a line specifying search extra updates .... Normally there is at least one line that starts with search; what we want is for the extra module location (where locally built modules end up) to be searched early in the process.

Step 5.2: Perform the DKMS install

We’ll want to install via the DKMS subsystem:

- name: dkms install zfs
  ansible.builtin.shell: |
    dkms install zfs/{{ _zfs_version }}    
  args:
    chdir: "/opt/zfs/zfs-{{ _zfs_version }}"

At this point, you should be able to check the current active ZFS version, and it should show you something like this:

root@machine01 ~ # zfs version
zfs-2.1.1-1
zfs-kmod-2.1.1-1

Step 6: Post installation settings changes

Step 6.0: Enable ZFS SystemD units

There are a few SystemD units (as usual, Digital Ocean has some great docs) that are created by the install but need to be enabled and started for ZFS to work properly:

- name: Ensure zfs-related systemd units are enabled
  block:
    - ansible.builtin.systemd:
        name: "{{ item }}"
        state: started
        enabled: yes
      loop:
        - zfs-import-cache.service
        - zfs-import.target
        - zfs-mount.service
        - zfs-share.service
        - zfs-zed.service
        - zfs-volume-wait.service
        - zfs.target

Looping over the units like this is akin to running systemctl start <unit> and systemctl enable <unit> for each one.

Step 6.1: (optional) Prevent kernel upgrades

The security-minded reader at home is no doubt cringing into the atmosphere by now, but making sure my kernel upgrades are manual is the only way I’ve found to ensure that the installed version of ZFS was not disabled/unexpectedly altered/broken by kernel upgrades:

- name: Hold all kernel upgrades to prevent custom built ZFS from doing fallback
  ansible.builtin.shell: |
    apt-mark hold linux-headers-generic
    apt-mark hold linux-image-generic
    apt-mark hold {{ uname_r.stdout }} # you'll need that `uname -r` output again here    

I personally prefer to hold back any kernel upgrades and instead perform them at machine setup rather than during operation. DKMS should ensure that the ZFS code is rebuilt whenever a new kernel is built, but it’s been flaky for me in the past, so I find it hard to trust.
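When you do intend to upgrade the kernel (during planned maintenance, say), the holds can be released first; roughly like this, run manually rather than from the playbook:

# Release the kernel holds before a deliberate upgrade
apt-mark unhold linux-headers-generic linux-image-generic
# ...plus whatever kernel-specific packages you held above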

Step 6.2: Add kernel module load configuration

Along with all the kernel changes we’ve made so far (good and questionable), one thing we’ll want to do is add the configuration necessary to ensure the kernel module is loaded:

- name: Ensure zfs kernel module comes up with next restart
  tags: [ "zfs:post-install:config" ]
  block:
    - name: Add zfs module load file
      ansible.builtin.template:
        src: ../../templates/kernel-modules/zfs.conf.j2
        dest: /etc/modules-load.d/zfs.conf
        owner: root
        group: root
        mode: 0644
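Files under /etc/modules-load.d/ simply list module names to load at boot, so the template behind that task can be as minimal as a single line (a sketch of zfs.conf.j2, assuming no extra options are needed):

# /etc/modules-load.d/zfs.conf -- load the zfs kernel module at boot
zfs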

Step 7: Setup your ZPools and Datasets

Now that we have ZFS installed (restart once to check!), I’m going to leave the rest of the cluster setup to you. There are a lot of ways to use ZFS and to set up zpools (RAID0/1/5/Z, etc.), as well as the constructs that utilize that storage (datasets, ZVOLs).

Actually using ZFS properly is out of scope (I can only hope I’ve covered installing it properly, at least), but please refer to the usual manuals to set up your ZFS pool and storage requirements. Trying to cover even the basics of how to set up ZFS for your drives once it’s installed would be a lot of reading, so we’ll leave it there for today.

This is the part where you “draw the rest of the owl” (sorry).
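That said, as the most minimal of starting points: assuming the leftover partitions from Step 1 ended up as /dev/nvme0n1p4 and /dev/nvme1n1p4 (hypothetical names – check yours with lsblk), a mirrored pool with a single dataset looks roughly like this:

# Create a mirrored zpool named "tank" from the two spare partitions
zpool create -o ashift=12 tank mirror /dev/nvme0n1p4 /dev/nvme1n1p4

# Create a dataset to actually put data on, then check the result
zfs create tank/data
zpool status tank
zfs list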

Extra: Support modifying txg_timeout

While experimenting with some write modes for Postgres on ZFS, I looked into an optimization that involved reducing the txg_timeout from its default of 5 seconds to 1 second. While I won’t get into the tradeoffs implied by that move (please read my post on Postgres + ZFS), here is the setup I’ve used for modifying txg_timeout per-machine (some machines might use the optimization, some might not):

- name: Add systemd unit to support txg timeout customization
  tags: [ "zfs:post-install:config" ]
  block:
    - name: Set current ZFS txg_timeout (to 1)
      ansible.builtin.shell: |
        echo 1 > /sys/module/zfs/parameters/zfs_txg_timeout        
    - name: install zfs-config-txg-timeout service
      ansible.builtin.template:
        src: ../../templates/zfs-config-txg-timeout.service.j2
        dest: /etc/systemd/system/zfs-config-txg-timeout.service
        owner: root
        group: root
        mode: 0644
    - name: start & enable zfs-config-txg-timeout service
      ansible.builtin.systemd:
        name: zfs-config-txg-timeout.service
        state: started
        enabled: yes
        daemon_reload: yes

The template that goes with that looks like this:

[Unit]
Description=Set ZFS txg_timeout
After=zfs.target
ConditionPathIsDirectory=/sys/module/zfs

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/bash -c 'echo {{ _zfs_txg_timeout_seconds | default(zfs_txg_timeout_seconds) }} > /sys/module/zfs/parameters/zfs_txg_timeout'

# Not needed since we want it to always be whatever the setting is
# ExecStop=/usr/bin/bash -c 'echo {{ _zfs_txg_timeout_seconds | default(zfs_txg_timeout_seconds) }} > /sys/module/zfs/parameters/zfs_txg_timeout'

[Install]
WantedBy=multi-user.target

Obviously you’ll want to define _zfs_txg_timeout_seconds (the per-host override) and/or zfs_txg_timeout_seconds (the fallback default). In general, this will make sure that txg_timeout is set to what you want it to be upon startup, after ZFS has started.
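A minimal sketch of those variables (the values are just examples – 5 seconds is the ZFS default, 1 second is the reduced value discussed above):

# e.g. group_vars/all.yml plus host_vars for specific machines (hypothetical layout)
zfs_txg_timeout_seconds: 5      # fallback default
_zfs_txg_timeout_seconds: 1     # per-host override for machines using the optimization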

Wrapup

Wasn’t that easy? If you’re answering no, it wasn’t easy for me to figure out either! I spent a lot of time wondering why DKMS wasn’t working properly (which is why the hack of preventing kernel upgrades is still included) and dealing with other issues. The OpenZFS documentation is pretty great but seems to be missing a guide somewhat like this, so hopefully this post will save people some time going forward when trying to experiment with new ZFS setups on Hetzner.

Another thing I’m hoping for with this post is to be corrected – if you see something glaringly wrong in my setup, please reach out.

The more people can use low cost infrastructure providers like Hetzner, the more cool software we get and the more interesting/innovative products can be built. To that end I’m working on NimbusWS (I’m behind on this but should launch by end of Q1 2022) – if you’re interested please sign up for the service and kick the tires (free tier usage will be available).