K8s storage provider benchmarks round 2, part 1

tl;dr - I did another round of drive testing (originally I only tested OpenEBS and hostPath), this time with some rented Hetzner machines and Ansible-powered automation. The GitLab repository isn’t ready for mass consumption yet but I’ll update here (and this tl;dr) when it is, along with the results.

UPDATE (04/09/2020)

The GitLab repository is up! You can skip this entire article and just go there.

NOTE: This is a multi-part blog post!

  1. Part 1 - Intro & Cloud server wrangling (you are here)
  2. Part 2 - Installing storage plugins
  3. Part 3 - Installing more storage plugins
  4. Part 4 - Configuring the tests
  5. Part 5 - The results

A bit of context to get us started

Any deployment platform (Nomad, Kubernetes, AWS) needs to give you three things generally – compute, network, and storage. Yes, they give you “tasks”, “instances”, “VPCs”, “ELBs” and all these other terms but at the core, they must provide you with a way to run the program you’ve written, with access to the network so you have the option of talking to the outside world and storage so that you have the option of writing out the results of your hard work. Today we’re going to focus on the storage bit of building cloud platforms with Kubernetes (sorry no Nomad or AWS talk today), and how my personal favorite solutions (and one newcomer) perform against one another.

Which solutions am I going to compare?

The solutions that I’m going to compare are:

  • OpenEBS
  • Rook (Ceph)
  • LINSTOR

I like these solutions in the k8s landscape because they “scale” from hobbyist to enterprise. They’re not the fastest solutions out there – that would probably be hardware RAID, administered by appropriately bearded sysadmins in your local data center. What they do offer is the functionality you’d expect from a resilient cloud platform, and the ability to run that technology locally. Ceph is well known, well documented open source software used by large organizations ([if it’s good enough for CERN][ceph-cern] it’s definitely good enough for your homelab or 10/100/1000-customer SaaS company), and OpenEBS is a newcomer but introduced a cloud-first approach that was the easiest to deploy and use on Kubernetes by far. OpenEBS has been refining their solution and building exciting new solutions before operators were called operators, and are trusted by some very interesting companies – see the “social proof” on their landing page. I point out that it’s “social proof” not to malign OpenEBS – they’re used by some impressive organizations and I support them wholeheartedly, it’s amazing software that is being given away for free – but just as a tidbit for anyone who’s not aware why companies post sections like that on the landing page.

What do I currently run?

I run OpenEBS myself, but run such an old version (v0.8.0) that the documentation isn’t even listed anymore. I started writing a post on how I was going to back up and transfer my drives from one version to another with downtime by stopping containers, hooking up other containers doing some rsyncing, etc – but I figured it was worth taking another look at the performance characteristics of the options in the space. There are ~5 ways to actually deploy OpenEBS (essentially different backends that your PVCs will run on) and Rook deserved to be tested alongside. The thing about Rook that’s made it hard to test in the past is that it needs complete control over a local disk. Since I run on “bare metal” (a confusing term which means you have full control of a physical machine, not a VM on someone’s physical machine) this isn’t hard to come by, but I almost exclusively use Hetzner and they give you two disks but pre-installed with mdadm-administered software RAID. In the past I found out rather painfully that any installs of grub would clobber the disassembled RAID and render the Hetzner dedicated server unbootable, so I opted to go with OpenEBS running on the undisturbed software RAID – this is what birthed the post doing a comparison with hostPath.

Why LINSTOR?

LINSTOR is the newcomer this time. Unlike the others, I haven’t ever set it up or used it before now, but I think it’s worth checking out because it is built on some well-grokked F/OSS graybeard tech that is usually just quietly good, purring along and powering an unreasonable number of systems. Usually, if something is widely used but derided or has some weird quirks, you see people complain about it a lot, even if only half in jest. It’s really hard to navigate the LINSTOR site and find your way to the code, given how enterprise-ready ™ it is and how confusingly it is laid out, but don’t let that discourage you.

OK well let’s get to it – this time I’m going to aim a bit higher for my analysis on the reproducibility (i.e. proper science & engineering) side of things – the general plan is to write some code that will rent a machine from Hetzner Cloud, install our storage plugin of choice, do all the testing we want and report results in a fully automated manner.

RTFM

Before we get started, there is a lot to read up on and (re-)introduce yourself to:

And for the particular technologies we’ll be using:

I’ve been familiar with these concepts for a long time, so I can’t imagine what it’d be like to be thrown into this suddenly today and have to read all of it to get a firm grasp on what’s about to happen, but remember: there’s no shame in skimming if your eyes start to glaze over. For most of these things, a basic understanding of how they work and what pieces are involved is enough if you’re going to get hands-on experience with them in the near future anyway.

The plan

Before I start projects I like to organize my thoughts into a general plan with the simple steps I need to accomplish – the idea of how we can do one set of drive testing is pretty simple, but achieving 100% automated runs is not so simple. Let’s lay out what we need for one run before things start getting really complicated:

  1. Provision a machine on Hetzner Cloud (let’s call this $SERVER)
  2. Get $SERVER’s IP and use ansible to set up the server for use with our storage plugin of choice, let’s call this $STORAGE_PLUGIN (ex. installing iscsi for some OpenEBS variants)
  3. Set up Kubernetes on $SERVER (we’ll probably be using k0s here)
  4. Set up $STORAGE_PLUGIN on $SERVER (required DaemonSets, Deployments, StorageClasses, CRDs etc)
  5. Run one or more $TESTs on $STORAGE_PLUGIN
  6. Copy the results of the Job to the repository with some good naming scheme (ex. <single-node|cluster>_<plugin>_<plugin subtype/detail>_<pvc size>.json)
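The naming scheme in step 6 is easy to get subtly wrong once plugin names contain their own separators, so here is a minimal Python sketch of one way to build those filenames. The function name, the `10Gi` size and the decision to swap stray underscores and slashes for hyphens are my assumptions for illustration, not something the repository necessarily does.

```python
def result_filename(topology: str, plugin: str, detail: str, pvc_size: str) -> str:
    """Build <single-node|cluster>_<plugin>_<detail>_<pvc size>.json."""
    if topology not in ("single-node", "cluster"):
        raise ValueError(f"unknown topology: {topology!r}")
    fields = [topology, plugin, detail, pvc_size]
    # "_" separates fields, so keep it (and path separators) out of the fields.
    cleaned = [f.strip().replace("_", "-").replace("/", "-") for f in fields]
    if not all(cleaned):
        raise ValueError("every field must be non-empty")
    return "_".join(cleaned) + ".json"


# Hypothetical example values
print(result_filename("single-node", "openebs", "mayastor", "10Gi"))
```

Keeping the separator out of the individual fields means the filename can later be split back into its four parts without guessing.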

Let’s spend some time going into various not-well-understood areas of this plan.

Why k0s (as opposed to k3s)?

I really like the idea of a single binary for running Kubernetes, I think that’s actually one of the things that could be simplified about it (I’ve written about some other ideas for a competing platform), and I personally like just about all of k0s’s choices:

Another big thing that drew me to the project is the involvement of Natanael Copa – the creator of Alpine Linux. He is absolutely prolific and I remember just running into his name all over Alpine Linux package lists (which makes sense of course) and almost implicitly trusting software he’s worked on. I actually never tried to even find details about him until this post (I watched a nice interview with him), and I’m only more impressed. Alpine Linux is obviously the result of the work of lots of contributors and tireless (and tired) open source developers and I think he’s run the project well, and has been insanely generous with his time and skills to the open source community (and the large number of corporations that rely on his distribution). I’d venture to guess that most people get their first introduction to true static binaries with Alpine (because half of the time the easiest way to get there is to just run an alpine container and do your build there). Anyway, I try not to take part in cults of personality so I’ll stop gushing there.

There’s been some criticism of k0s from competing projects and to be honest I thought it was kind of in bad taste. Binary size is such a silly measurement that when I see it come up I dismiss the person (in my head) almost offhand – I will happily give 200MB of disk space for a single binary that will turn a random server into a flexible, directable member of a resource pool. Windows installs are ~20GB (right? I haven’t checked in a while actually), an Ubuntu install needs a minimum of 1.1GB of disk space, disks are getting bigger all the time. If I really needed the space I could move to a container linux distro (ex. Flatcar linux, Fedora CoreOS) and save myself lots of space. Anyway, I digress.

In general, the landing page and documentation of k0s also speaks to me much more than the landing page and documentation for k3s. I do not need Edge, IoT, CI (???), or Embedded use cases – I just need a Kubernetes distro that is low maintenance and runs on servers in the cloud. In more specific terms the following differences do it for me versus k3s:

  • Calico > Flannel + kube-router (kube-router is a fantastic project which I run right now, but I’ve been getting wandering eyes in calico/cilium’s direction lately)
  • Klipper-LB looks cool but I don’t generally need it (I usually just set my ingress controller the inlet and am off to the races)
  • I don’t want helm-controller pre-installed (though I realize that most people want this, I’m still against Helm usage and need to find some time to evaluate Helm 3 and maybe change my mind)
  • I don’t want local-path-provisioner pre-installed

Yes, you can swap out or change all of these, but why bother when k0s makes just about all the choices the way I want from the get-go, doesn’t make choices in the places I may disagree, and the two projects are very similar? I may end up missing k3s’s helpful inclusion of many host utilities (iptables/nftables, ethtool, etc) but we’ll see about that.

I need raw disk access – should I use Hetzner Cloud for this?

One of the big problems with the current plan is that I’ll need to have access to one or more (as many as the machine has) raw disks. This is really only possible on Hetzner dedicated servers (AFAIK) and can’t really be done manually because every time the server is wiped it needs to be put into recovery mode and restarted again. In addition to going into recovery mode and restarting, I’m going to have to undo the software RAID for the given machine as well, to get access to the “raw performance” of the underlying drives without the added benefits of duplication.

Linode?

Maybe I shouldn’t use Hetzner for this and should instead use Linode? Well actually I can’t do that either, because it looks like Linode doesn’t offer nested virtualization support – I should have realized this when the pricing page said dedicated CPU. Seems like Linode Bare Metal is still “coming soon” as well. Linode does have a way to use custom images though which is cool, but not much help if virtualization is going to be degraded.

It looks like Linode also has had some trouble with the consistency of the disk I/O on their machines:

I don’t want these issues to possibly affect the quality of the results so I’ll skip Linode for this.

DigitalOcean?

DigitalOcean isn’t in Japan yet as far as I can tell:

Though they did just go public, so maybe they will soon. They support using custom virtual disk images for droplets which is pretty cool as well, but that’s not going to be able to help me on this particular quest.

OK, Hetzner – but Robot, not Cloud

So it looks like in general [I|P]aaS offerings should be out – I don’t want to deal with the possibility of some provider-optimized storage solution’s inconsistency threatening the benchmark. Avoiding even Hetzner Cloud in favor of Hetzner Robot should give me the best chance at unfettered access to the disks on the machines I use.

Unfortunately this means one of the first things I’m going to have to do is write Ansible roles that can automate the Rescue System and disassembly of a Hetzner machine’s software RAID, probably by performing some crafted web requests or running some puppeteer scripts. See? Things are already getting complicated!

So in the end looks like I’ll have to stick with Hetzner dedicated servers, which means I won’t be able to spin up new machines quite as easily as I thought – the machine(s) will have to be more pets than cattle.

A note on Loopback/virtual disks

One idea I had early on writing these notes (and trying to figure out how to run this experiment) was whether or not to use loopback/virtual disks as the basis of the experimentation. It’s massively simpler to create a bunch of /dev/disks/virtual-[1-10].img files and mount them as disks but the issue and seeming performance variability is something I just don’t want poisoning my analysis. Technically, it would poison all the analysis in the same manner but I just don’t want to have to be checking what kind of writes and sync patterns the workloads are using – I want the tests to reflect what a naive installation (i.e. most installations, at least initially) would do.

Some approaches (namely OpenEBS) also use virtual disks and the creation of a virtual disk inside a virtual disk is expected to create some artificial slowdowns that I want to avoid as well.

What kind of things should I $TEST?

There are a lot of tests we could possibly run, here are the ones I want to make possible:

In the past I used dd, iozone, sysbench, but this time I’m going to only use fio (which is what longhorn/dbench and its predecessor, leeliu/dbench, use). Along with simply testing the drives I want to make sure I run some tests that are somewhat close to the interesting things I tend to do with the PVCs and one of those is definitely “run databases” – in my case, Postgres whenever it’s an option or I have a choice – pgbench and oltpbench will help me accomplish that strain of testing.

I’m choosing right now to have all this orchestrated with make. There are a lot of options (scripting languages, things I’ve built in the past, etc) on how I could run these tests, but I think I can get away with the least amount of time spent building automation by writing some Makefile scripts with the right amount of for loops and clever-but-not-too-clever $VARIABLEs. I’m going to need to manage disparate tools (ansible, kubernetes, etc) which are written in different languages and Makefiles are excellent glue when they’re not too complicated. I very much want to be able to end up with a flow like this:

$ export CLOUD=hetzner
$ export STORAGE_PLUGIN=openebs-mayastor
$ make provision k8s-install test-dbench cleanup

Setting up the $SERVER

Writing an Ansible playbook to perform Hetzner resets and disable Software RAID

Here’s our first side quest, made necessary since we’re going to be using Hetzner dedicated machines. I need a reusable way to:

  • trigger rescue mode
  • disable software RAID
  • (optional) install a custom server image (with properly configured networking)
  • disable grub updates for applicable OSes so the server doesn’t brick
  • reset the machine

All of this is possible via the online “Robot” interface but unfortunately not possible through the Cloud API (which is to be fair, for Hetzner Cloud). Looks like I’ll have to string together some curl and/or some puppeteer/playwright scripting. Luckily for me it turns out there’s a Robot Webservice API and all I needed was to use ansible.builtin.uri – no crazy automation here.
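To make the Robot Webservice call a bit more concrete, here is a rough Python sketch of the same request that the playbook in this post makes with ansible.builtin.uri. It only builds the request object and never sends it. The endpoint and the form fields are copied from the playbook; the function names, the example IP and the credentials are placeholders I made up. The fingerprint helper assumes that what `ssh-keygen -E md5 -lf` prints is the MD5 digest of the base64-decoded key blob.

```python
import base64
import hashlib
import urllib.parse
import urllib.request

ROBOT_BASE = "https://robot-ws.your-server.de"  # base URL used in the playbook


def md5_fingerprint(public_key_line: str) -> str:
    """Colon-separated MD5 fingerprint of an OpenSSH public key line."""
    # A public key line looks like "<type> <base64 blob> [comment]".
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def build_rescue_request(server_ip, username, password, fingerprint):
    """Build (but do not send) the POST that arms rescue mode for a server."""
    # Same form fields as the playbook: os, arch, authorized_key.
    body = urllib.parse.urlencode(
        {"os": "linux", "arch": 64, "authorized_key": [fingerprint]},
        doseq=True,
    ).encode()
    request = urllib.request.Request(
        f"{ROBOT_BASE}/boot/{server_ip}/rescue", data=body, method="POST"
    )
    # HTTP basic auth with the webservice credentials
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


if __name__ == "__main__":
    # Placeholder key material and credentials, for illustration only
    fake_key = "ssh-ed25519 " + base64.b64encode(b"not-a-real-key").decode()
    req = build_rescue_request(
        "203.0.113.10", "user", "secret", md5_fingerprint(fake_key)
    )
    print(req.get_method(), req.full_url)
```

Actually sending it would be a matter of passing the request to `urllib.request.urlopen`, and, going by the playbook, treating both 200 and 409 as success.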

Here’s what the server-reset Makefile target looks like:

server-reset: require-target
ifeq ("true",$(SKIP_SERVER_RESET))
    @echo "[info] SKIP_SERVER_RESET=true, skipping server reset step..."
else
    @echo "[info] resetting server @ [$(TARGET)]..."
    @$(ANSIBLE_PLAYBOOK) server-reset.yml \
        -i $(INVENTORY_PATH)  \
        --limit=$(TARGET) \
        $(ANSIBLE_ARGS)
endif

Here’s the playbook, server-reset.yaml:

#
# Playbook for resetting a (hetzner) server
#
# Before this playbook is run, the server is expected to be in a state where ansible is
# functional and usable (after fresh machine allocation or after some other run has finished), and normally accessible.
#
# The installimage bits are heavily inspired by
# https://github.com/andrelohmann/ansible-role-hetzner_installimage
#
---
- name: k8s-setup all-in-one server setup
  hosts: "{{ ansible_limit | default(omit) }}"
  remote_user: root
  # pre-server reset ansible is expected to be usable
  # (*should* also work if this playbook is run while machine is in rescue mode)
  gather_facts: yes
  vars:
    public_key: ~/.ssh/id_rsa.pub
    # host key for the machine is going to change during this playbook
    # (during rescue mode entry)
    # so we need to disable strict host key checking
    ansible_ssh_common_args: |
      -o StrictHostKeyChecking=no -o userknownhostsfile=/dev/null

    known_hosts_path: ~/.ssh/known_hosts

    # Hetzner install image options
    hetzner_installimage_drives:
      - DRIVE1 /dev/sda
      - DRIVE2 /dev/sdb
    hetzner_installimage_raid:
      - SWRAID 0
      - SWRAIDLEVEL 0 # doesn't matter if SWRAID is 0
    hetzner_installimage_hostname: "{{ inventory_hostname.split('.') | first }}"
    hetzner_installimage_partitions:
      # We'll be running with swap even though kubernetes suggests against it.
      - PART swap swap 32G
      - PART /boot ext4 1G
      - PART / ext4 30G
      - PART /root-disk-remaining ext4 all
    hetzner_installimage_image: /root/.oldroot/nfs/images/Ubuntu-2004-focal-64-minimal.tar.gz
    hetzner_installimage_bootloader: grub

  tasks:
    #################
    # Machine reset #
    #################

    - name: Ensure required variables are provided
      ansible.builtin.fail:
        msg: "hetzner_webservice_(username|password) variables are empty/not defined"
      no_log: true
      when: 'item == ""'
      with_items:
        - "{{ hetzner_webservice_username }}"
        - "{{ hetzner_webservice_password }}"

    - name: Generate local SSH key fingerprint
      delegate_to: localhost
      shell: "ssh-keygen -E md5 -lf ~/.ssh/id_rsa.pub | cut -d' ' -f2 | cut -d':' -f2-"
      register: ssh_key_fingerprint

    - name: Trigger rescue mode on server
      ansible.builtin.uri:
        method: POST
        url: "https://robot-ws.your-server.de/boot/{{ hostvars[inventory_hostname]['ansible_env'].SSH_CONNECTION.split(' ')[2] }}/rescue"
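        # lookup('env', ...) yields an empty string when the variable is unset,
        # so default() needs its second argument set to true to fall back
        user: "{{ lookup('env', 'HETZNER_WEBSERVICE_USERNAME') | default(hetzner_webservice_username, true) }}"
        password: "{{ lookup('env', 'HETZNER_WEBSERVICE_PASSWORD') | default(hetzner_webservice_password, true) }}"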
        status_code: [200, 409] # 409 returned when it is already set
        body_format: form-urlencoded
        body:
          os: linux
          arch: 64
          authorized_key:
            - "{{ ssh_key_fingerprint.stdout }}"

    ###############
    # Rescue mode #
    ###############

    - name: Restart server to enter rescue mode
      delegate_to: localhost
      ansible.builtin.uri:
        method: POST
        url: "https://robot-ws.your-server.de/reset/{{ hostvars[inventory_hostname]['ansible_env'].SSH_CONNECTION.split(' ')[2] }}"
        user: "{{ hetzner_webservice_username }}"
        password: "{{ hetzner_webservice_password }}"
        status_code: [200, 503] # 503 is returned when rescue mode is already enabled
        body_format: form-urlencoded
        body:
          type: hw # "power button"

    - name: Wait for SSH to come back up (enter rescue mode) -- this will take a while
      ansible.builtin.wait_for_connection:
        sleep: 10 # wait 10 seconds between checks
        delay: 30 # wait 30 seconds before the first check
        timeout: 600 # 10 minutes max wait

    - name: Get MOTD from rescue system shell
      shell: cat /etc/motd
      register: motd_output

    - name: Ensure MOTD is expected value for Hetzner
      fail:
        msg: MOTD not similar to Hetzner rescue mode MOTD
      when: motd_output.stdout.find('Welcome to the Hetzner Rescue System.') == -1

    ###################
    # Drive discovery #
    ###################

    - name: Automatically determine first disk device
      shell: |
        lsblk | grep disk | awk '{split($0, a, " "); print a[1]}' | sort | head -1 | tail -1
      register: first_disk_device

    - name: Print first disk
      debug:
        var: first_disk_device.stdout

    - name: Automatically determine second disk device
      shell: |
        lsblk | grep disk | awk '{split($0, a, " "); print a[1]}' | sort | head -2 | tail -1
      register: second_disk_device

    - name: Print second disk
      debug:
        var: second_disk_device.stdout
      when: second_disk_device.stdout != ""

    - name: Set hetzner_installimage_drives for single disk ({{ first_disk_device.stdout }})
      set_fact:
        hetzner_installimage_drives:
          - DRIVE1 /dev/{{ first_disk_device.stdout }}

    - name: Set hetzner_installimage_drives for multiple disks
      set_fact:
        hetzner_installimage_drives:
          - DRIVE1 /dev/{{ first_disk_device.stdout }}
          - DRIVE2 /dev/{{ second_disk_device.stdout }}
      when: second_disk_device.stdout != ""

    ########################
    # Hetzner installimage #
    ########################

    - name: Copy current authorized keys into file for installimage
      shell: |
        /usr/bin/tail -1 /root/.ssh/authorized_keys > /root/ssh-authorized-keys

    - name: Create installimage utility configuration file
      template:
        src: installimage.j2
        dest: /autosetup
        owner: root
        group: root
        mode: 0644

    - name: Run Hetzner installimage
      command: |
        /root/.oldroot/nfs/install/installimage -g -K /root/ssh-authorized-keys
      register: installimage_result

    - name: Remove server entries from known_hosts file ({{ hostname }})
      tags: [ "known_hosts:delete" ]
      delegate_to: localhost
      ansible.builtin.known_hosts:
        path: "{{ known_hosts_path }}"
        name: "{{ hostname }}"
        state: absent
      vars:
        hostname: "{{ inventory_hostname }}"

    - name: Remove server entries from known_hosts file ({{ ip }})
      tags: [ "known_hosts:delete" ]
      delegate_to: localhost
      ansible.builtin.known_hosts:
        path: "{{ known_hosts_path }}"
        name: "{{ ip }}"
        state: absent
      vars:
        ip: "{{ hostvars[inventory_hostname]['ansible_env'].SSH_CONNECTION.split(' ')[2] }}"

    - name: Reboot the machine to get out of rescue mode
      ansible.builtin.reboot:

    - name: Wait for SSH to the new server to be up
      connection: local
      wait_for:
        host: '{{ inventory_hostname }}'
        search_regex: OpenSSH
        delay: 10
        port: 22

    - name: Add server entries to known_hosts file ({{ hostname }})
      tags: [ "known_hosts:add" ]
      delegate_to: localhost
      ansible.builtin.known_hosts:
        path: "{{ known_hosts_path }}"
        name: "{{ hostname }}"
        state: present
        # concatenate with ~ instead of nesting {{ }} inside an expression
        key: "{{ lookup('pipe', 'ssh-keyscan ' ~ hostname) }}"
      vars:
        hostname: "{{  inventory_hostname }}"

    - name: Add server entries to known_hosts file ({{ ip }})
      tags: [ "known_hosts:add" ]
      delegate_to: localhost
      ansible.builtin.known_hosts:
        path: "{{ known_hosts_path }}"
        name: "{{ ip }}"
        state: present
        # concatenate with ~ instead of nesting {{ }} inside an expression
        key: "{{ lookup('pipe', 'ssh-keyscan ' ~ ip) }}"
      vars:
        ip: "{{ hostvars[inventory_hostname]['ansible_env'].SSH_CONNECTION.split(' ')[2] }}"

In a futile effort to keep this short I’m going to be listing only the names of tasks and including the source code where appropriate. I have left in anything that was interesting though, like the SSH hostkey setting stuff at the top and the install image parameters (huge thanks to andrelohmann/ansible-role-hetzner_installimage). The playbook takes a bit of time to run, but it’s repeatable and automated which is great.
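The drive discovery tasks in that playbook lean on a `lsblk | grep disk | awk | sort | head | tail` pipeline, which is a bit opaque at first read. Here’s a small Python sketch of the same idea, with one deliberate difference: it matches on the TYPE column instead of grepping for the word “disk” anywhere in the line. The sample output and function names are made up for illustration, and it assumes lsblk’s default column layout.

```python
from typing import List


def disk_devices(lsblk_output: str) -> List[str]:
    """Return the sorted names of whole-disk devices in default lsblk output."""
    names = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        # Default columns: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINT]
        if len(fields) >= 6 and fields[5] == "disk":
            names.append(fields[0])
    return sorted(names)


def installimage_drives(names: List[str], limit: int = 2) -> List[str]:
    """Turn device names into DRIVE<n> lines like hetzner_installimage_drives."""
    return [f"DRIVE{i} /dev/{name}" for i, name in enumerate(names[:limit], start=1)]


# Hypothetical lsblk output from a two-disk machine
SAMPLE = """NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdb      8:16   0 447.1G  0 disk
sda      8:0    0 447.1G  0 disk
├─sda1   8:1    0    32G  0 part
└─sda2   8:2    0     1G  0 part /boot
"""

print(installimage_drives(disk_devices(SAMPLE)))
```

On a single-disk machine the same code simply returns one DRIVE1 line, which is what the two set_fact tasks achieve between them.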

Since I’ve decided to release this post in parts, and since the repo isn’t ready yet, I’ll just post the contents of the files in their entirety and take out the GitLab links.

Miscellaneous setup

In addition to just getting a fresh install of Ubuntu 20.04 on the machine, I wanted/needed to do some basic setup and a tiny bit of server hardening. Here’s a look at what those playbooks look like:

pre-ansible-setup.yaml

#
# Play for executing tasks to set up ansible on a server that doesn't
# necessarily have Python/ansible requirements installed
#
---
- name: pre-ansible setup
  hosts: "{{ ansible_limit | default(omit) }}"
  remote_user: root
  gather_facts: no # would fail since python may not necessarily be installed
  vars:
    username: ubuntu
    public_key: ~/.ssh/id_rsa.pub
  tasks:
    - name: "Set hostname ({{ generated_hostname }})"
      ansible.builtin.hostname:
        name: "{{ generated_hostname }}"
      vars:
        generated_hostname: "{{ inventory_hostname.split('.') | first }}"

    #####################
    # Passwordless Sudo #
    #####################

    - name: check for passwordless sudo
      raw: "timeout 1s sudo echo 'check'"
      register: passwordless_sudo_check
      ignore_errors: yes
      no_log: true

    - name: create admin group
      when: passwordless_sudo_check["rc"] != 0
      raw: |
        echo {{ ssh_initial_password }} | sudo -Ss &&
        sudo groupadd admins --system || true

    - name: add user to admin group
      when: passwordless_sudo_check["rc"] != 0
      raw: |
        echo {{ ssh_initial_password }} | sudo -Ss &&
        sudo usermod -a -G admins {{ ssh_user }}

    - name: copy sudoers file, make temporary editable
      when: passwordless_sudo_check["rc"] != 0
      raw: |
        echo {{ ssh_initial_password }} | sudo -Ss &&
        sudo cp /etc/sudoers /etc/sudoers.bak &&
        sudo cp /etc/sudoers /etc/sudoers.tmp &&
        sudo chmod 777 /etc/sudoers.tmp

    - name: add admins no passwd rule for sudoers file
      when: passwordless_sudo_check["rc"] != 0
      raw: |
        echo {{ ssh_initial_password }} | sudo -Ss &&
        sudo echo -e "\n%admins ALL=(ALL:ALL) NOPASSWD:ALL" >> /etc/sudoers.tmp &&
        sudo chmod 440 /etc/sudoers.tmp

    - name: check and install new sudoers
      when: passwordless_sudo_check["rc"] != 0
      raw: |
        echo {{ ssh_initial_password }} | sudo -Ss &&
        sudo visudo -q -c -f /etc/sudoers.tmp &&
        sudo cp -f /etc/sudoers.tmp /etc/sudoers

    ###################
    # Ansible install #
    ###################

    - name: check for installed ansible (apt)
      register: ansible_check
      ignore_errors: yes
      no_log: true
      shell: |
        dpkg -s ansible

    # see: https://stackoverflow.com/questions/33563425/ansible-1-9-4-failed-to-lock-apt-for-exclusive-operation
    - name: (apt) Ensure apt list dir exists
      when: ansible_check["rc"] != 0
      file:
        path: /var/lib/apt/lists/
        state: directory
        mode: 0755

    - name: (apt) Install software-properties-common
      when: ansible_check["rc"] != 0
      ansible.builtin.apt:
        update_cache: yes
        name:
          - software-properties-common

    - name: (apt) Enable universe repository
      become: yes
      when: ansible_check["rc"] != 0
      ansible.builtin.command: add-apt-repository universe

    - name: apt-get install software-properties-common
      when: ansible_check["rc"] != 0
      ansible.builtin.apt:
        name:
          - software-properties-common

    - name: add apt repo for ansible
      when: ansible_check["rc"] != 0
      shell: |
        apt-add-repository -y ppa:ansible/ansible

    - name: apt-get update and install ansible
      when: ansible_check["rc"] != 0
      ansible.builtin.apt:
        update_cache: yes
        name:
          - ansible

post-ansible-setup.yaml

#
# Play for executing tasks once Python/ansible is available on the server:
# base packages, firewall rules, tooling and unattended upgrades
#
---
- name: post-ansible setup
  hosts: "{{ ansible_limit | default(omit) }}"
  remote_user: root
  gather_facts: yes
  tasks:
    - name: Install generally useful packages
      become: yes
      ansible.builtin.apt:
        name: "{{ packages }}"
        update_cache: yes
        state: present
      vars:
        packages:
        - make
        - libseccomp2
        - apt-transport-https
        - ufw

    - name: Enable UFW, default reject
      tags: [ "ufw" ]
      become: yes
      community.general.ufw:
        state: enabled
        policy: reject

    - name: (ufw) Allow SSH access
      tags: [ "ufw" ]
      become: yes
      community.general.ufw:
        rule: allow
        name: OpenSSH

    - name: (ufw) Limit SSH
      tags: [ "ufw" ]
      become: yes
      community.general.ufw:
        rule: limit
        port: ssh
        proto: tcp

    ###################
    # Default Tooling #
    ###################

    - name: Install performance measurement tooling
      ansible.builtin.apt:
        name: "{{ packages }}"
        update_cache: yes
        state: present
      vars:
        packages:
        - iotop
        - htop
        - sysstat # contains iostat and others
        - unattended-upgrades

    - name: Install some creature comforts
      ansible.builtin.apt:
        name: "{{ packages }}"
        update_cache: yes
        state: present
      vars:
        packages:
        - tree

    ################################
    # Unattended security upgrades #
    ################################

    - name: Install unattended-upgrades
      ansible.builtin.apt:
        name: "{{ packages }}"
        update_cache: yes
        state: present
      vars:
        packages:
        - unattended-upgrades

    - name: Ensure unattended upgrades service is running
      ansible.builtin.systemd:
        name: unattended-upgrades
        state: started

    - name: Install unattended upgrade config w/ blacklist and email notifications
      template:
        src: 50-unattended-upgrades.conf.j2
        dest: /etc/apt/apt.conf.d/50-unattended-upgrades.conf
        owner: root
        group: root
        mode: 0644

    - name: Install automatic upgrades
      template:
        src: 20-auto-upgrades.conf.j2
        dest: /etc/apt/apt.conf.d/20-auto-upgrades.conf
        owner: root
        group: root
        mode: 0644

    - name: Hold all grub-related packages
      shell: |
        apt-mark hold grub*
        apt-mark hold grub
        apt-mark hold grub-common
        apt-mark hold grub2
        apt-mark hold grub2-common
        apt-mark hold grub-pc
        apt-mark hold grub-pc-bin

The end stuff about holding back grub-related packages is pretty important as well – if we don’t do this we will brick the machine every time we restart, I’ve posted about this before (3 years ago, wow time flies).

Installing $STORAGE_PLUGIN-specific system requirements

Since different packages have different requirements, there are some variations on things we need to install to run each. Since we’ve opted to use Ubuntu 20.04 the stuff in this section will be somewhat specific, and I’ll include package names here. The way I’ve gone about this is by defining a list of “supported” storage plugins:

---
- name: Storage plugin setup
  hosts: "{{ ansible_limit | default(omit) }}"
  remote_user: root
  vars:
    supported_storage_plugins:
      - all
      - rook-ceph-lvm
      - linstor-rbd9
      - openebs-mayastor
      - openebs-cstor
      - openebs-jiva
      - openebs-localpv-hostpath
      - openebs-localpv-device
      - openebs-localpv-zfs
  tasks:
    # ... elided ... #

Then relevant tasks will scope themselves down to the ones they care about:

    - name: Do a thing that is useful for only OpenEBS cStor and Jiva
      when: storage_plugin == "all" or storage_plugin in target_plugins
      thing:
        args: that is being done
      vars:
        target_plugins:
          - openebs-cstor
          - openebs-jiva
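The supported list is also handy for failing fast on a typo'd plugin name. A minimal sketch, assuming storage_plugin is passed in as a variable (for example via --extra-vars) and that the check runs before the plugin-specific tasks; the task itself is hypothetical and is not one of the elided tasks from the playbook:

```yaml
    # Sketch only: hypothetical guard task, not from the original playbook
    - name: Validate the requested storage plugin
      ansible.builtin.assert:
        that:
          - storage_plugin is defined
          - storage_plugin in supported_storage_plugins
        fail_msg: "storage_plugin must be one of: {{ supported_storage_plugins | join(', ') }}"
```

Without a guard like this, a misspelled plugin name would simply match none of the when conditions and every scoped task would be skipped silently.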

Note that most of the dependencies will be ansible.builtin.apt tasks, which is the point of using a distribution with nice package management and a wide ecosystem.

Here are some of those template files:

50-unattended-upgrades.conf.j2:

Unattended-Upgrade::Allowed-Origins {
  "${distro_id}:${distro_codename}";
  "${distro_id}:${distro_codename}-security";
  "${distro_id}ESM:${distro_codename}";
};

Unattended-Upgrade::Package-Blacklist {
  "grub";
};

{% if unattended_upgrade_email is defined %}
Unattended-Upgrade::Mail "{{ unattended_upgrade_email }}";
Unattended-Upgrade::MailOnlyOnError "true";
{% endif %}

20-auto-upgrades.conf.j2:

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
APT::Periodic::AutocleanInterval "7";

installimage.j2 (again, mostly stolen from andrelohmann/ansible-role-hetzner_installimage):

{% for drive in hetzner_installimage_drives %}
{{ drive }}
{% endfor %}

{% for raid in hetzner_installimage_raid %}
{{ raid }}
{% endfor %}

BOOTLOADER {{ hetzner_installimage_bootloader }}
HOSTNAME {{ hetzner_installimage_hostname }}

{% for partition in hetzner_installimage_partitions %}
{{ partition }}
{% endfor %}

IMAGE {{ hetzner_installimage_image }}

Dependencies for STORAGE_PLUGIN=rook-ceph-lvm

The dependency list for LVM-powered Ceph via Rook is pretty short:

Here’s that expressed as an ansible task:

    - name: Install LVM
      when: storage_plugin == "all" or storage_plugin in target_plugins
      block:
        - name: Install lvm2
          ansible.builtin.apt:
            name: lvm2
            update_cache: yes
            state: present
        - name: Ensure rbd kernel module is installed
          community.general.modprobe:
            name: rbd
            state: present
      vars:
        target_plugins:
          - rook-ceph-lvm
          - linstor-rbd9

Dependencies for STORAGE_PLUGIN=rook-ceph-zfs

The dependency list for ZFS-powered Ceph via Rook:

  • zfsutils-linux (Ubuntu 20.04 seems to only need this, no zfs or zfs-linux package)

Expressed as an ansible task:

    - name: Install ZFS
      when: storage_plugin == "all" or storage_plugin in target_plugins
      ansible.builtin.apt:
        name:
          - zfsutils-linux
        update_cache: yes
        state: present
      vars:
        target_plugins:
          - rook-ceph-zfs
          - openebs-localpv-zfs
          - linstor-rbd9

Dependencies for STORAGE_PLUGIN=openebs-mayastor

The dependency list for OpenEBS MayaStor:

Expressed as ansible tasks:

    - name: Enable huge page support
      when: storage_plugin == "all" or storage_plugin in target_plugins
      block:
        - name: set nr_hugepages
          ansible.builtin.shell: |
            echo 512 | tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
        - name: set nr_hugepages via sysctl (idempotent, no duplicate lines on re-run)
          ansible.builtin.lineinfile:
            path: /etc/sysctl.conf
            regexp: '^vm\.nr_hugepages\s*='
            line: vm.nr_hugepages = 512
      vars:
        target_plugins:
          - openebs-mayastor

    - name: Ensure NVMe-oF over TCP support
      when: storage_plugin == "all" or storage_plugin in target_plugins
      community.general.modprobe:
        name: nvme-tcp
        state: present
      vars:
        target_plugins:
          - openebs-mayastor

    - name: Install iscsi
      when: storage_plugin == "all" or storage_plugin in target_plugins
      block:
        - name: Install apt package
          ansible.builtin.apt:
            name: open-iscsi
            update_cache: yes
            state: present
        - name: Enable iscsid
          ansible.builtin.systemd:
            name: iscsid
            state: started
            enabled: yes
      vars:
        target_plugins:
          - openebs-mayastor
          - openebs-cstor
          - openebs-jiva
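One thing worth keeping in mind: with its default settings, community.general.modprobe only loads the module into the running kernel, so on its own the nvme-tcp load won't survive a reboot. A minimal sketch of persisting it, assuming a systemd-based host (which Ubuntu 20.04 is) that reads /etc/modules-load.d at boot; the task and the file name nvme-tcp.conf are hypothetical, not from the original playbook:

```yaml
    # Sketch only: not part of the original playbook; the file name is hypothetical
    - name: Load nvme-tcp at boot
      when: storage_plugin == "all" or storage_plugin in target_plugins
      ansible.builtin.copy:
        dest: /etc/modules-load.d/nvme-tcp.conf
        content: |
          nvme-tcp
        owner: root
        group: root
        mode: "0644"
      vars:
        target_plugins:
          - openebs-mayastor
```

The same pattern would apply to any other module loaded with modprobe in these tasks.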

Dependencies for STORAGE_PLUGIN=openebs-cstor

The dependency list for OpenEBS cStor:

iscsi has already been covered so I’ll leave it out here.

Dependencies for STORAGE_PLUGIN=openebs-jiva

The dependency list for OpenEBS Jiva (rancher/longhorn under the covers):

iscsi has already been covered so I’ll leave it out here as well.

Dependencies for STORAGE_PLUGIN=openebs-localpv-hostpath

Nothing required for hostpath that isn’t already included in Ubuntu 20.04 I think – which is what one might expect given how simple it is (in theory).

Dependencies for STORAGE_PLUGIN=openebs-localpv-device

Same story as openebs-localpv-hostpath – requirements should be all present.

Dependencies for STORAGE_PLUGIN=openebs-localpv-zfs

The dependency list for OpenEBS LocalPVs via ZFS:

  • zfsutils-linux (Ubuntu 20.04 seems to only need this, no zfs or zfs-linux package)

No huge surprise here, you’re going to need ZFS if you want to run hostpaths built on it.

Dependencies for STORAGE_PLUGIN=linstor-rbd9

The dependency list for running LINSTOR:

  • zfsutils-linux (LINSTOR can run on ZFS)
  • lvm2
  • ppa:linbit/linbit-drbd9-stack apt repository
  • drbd-dkms (in ppa:linbit/linbit-drbd9-stack)
  • drbd-utils (in ppa:linbit/linbit-drbd9-stack)
  • linstor-controller (in ppa:linbit/linbit-drbd9-stack)
  • linstor-satellite (in ppa:linbit/linbit-drbd9-stack)
  • linstor-client (in ppa:linbit/linbit-drbd9-stack)

The ansible you haven’t seen yet:

    - name: Add drbd9 apt repositories
      when: storage_plugin == "all" or storage_plugin in target_plugins
      ansible.builtin.apt_repository:
        repo: ppa:linbit/linbit-drbd9-stack
        state: present
      vars:
        target_plugins:
          - linstor-rbd9

    - name: Install LINSTOR components
      when: storage_plugin == "all" or storage_plugin in target_plugins
      block:
        - name: Install drbd packages
          ansible.builtin.apt:
            name:
              - drbd-dkms
              - drbd-utils
            update_cache: yes
            state: present
        - name: Install linstor components
          ansible.builtin.apt:
            name:
              - linstor-controller
              - linstor-satellite
              - linstor-client
            update_cache: yes
            state: present
        - name: Ensure rbd kernel module is installed
          community.general.modprobe:
            name: rbd
            state: present
      vars:
        target_plugins:
          - linstor-rbd9

I keep it simple and just install all the pieces of linstor on every machine for easy flexibility now and in the future.

Installing Kubernetes with k0s

Obviously before we get started using kubectl we’re going to need to at least install Kubernetes! Now that the machine-level dependencies for the storage plugins are done let’s actually install k8s, with k0s.

The Makefile target k8s-install is pretty similar to the rest of the targets:

k8s-install: require-target
    @echo "[info] performing k8s install..."
    $(ANSIBLE_PLAYBOOK) k8s-install.yml \
        -i $(INVENTORY_PATH) \
        --limit=$(TARGET) \
        $(ANSIBLE_ARGS)

And the ansible code:

#
# Playbook for setting up kubernetes
#
---
- name: k8s-setup all-in-one server setup
  hosts: "{{ ansible_limit | default(omit) }}"
  remote_user: root
  vars:
    k0s_version: v0.12.0
    k0s_checksum: sha256:0a3ead8f8e5f950390eeb76bd39611d1754b282536e8d9dbbaa0676550c2edbf
  tasks:
    - name: Populate service facts
      ansible.builtin.service_facts:

    - name: Download k0s
      ansible.builtin.get_url:
        # Quoted single-line string: a literal block (|) would append a trailing newline to the URL
        url: "https://github.com/k0sproject/k0s/releases/download/{{ k0s_version }}/k0s-{{ k0s_version }}-amd64"
        checksum: "{{ k0s_checksum }}"
        mode: 0755
        dest: /usr/bin/k0s
      when: ansible_facts.services["k0scontroller.service"] is not defined

    - name: Create /var/lib/k0s folder
      ansible.builtin.file:
        path: /var/lib/k0s
        state: directory
      when: ansible_facts.services["k0scontroller.service"] is not defined

    - name: Add k0s config file
      ansible.builtin.template:
        src: k0s-config.yaml.j2
        dest: /var/lib/k0s/config.yaml
        owner: root
        group: root
        mode: 0644
      when: ansible_facts.services["k0scontroller.service"] is not defined

    - name: Install k0s
      ansible.builtin.command: |
        k0s install controller -c /var/lib/k0s/config.yaml --single
      when: ansible_facts.services["k0scontroller.service"] is not defined

    - name: Start the k0s service
      ansible.builtin.systemd:
        name: k0scontroller
        state: started
        enabled: yes
      when: ansible_facts.services["k0scontroller.service"] is not defined

    - name: Create worker join token (saved @ /tmp/worker-token)
      shell: |
        k0s token create --role=worker --expiry=168h > /tmp/worker-token
      when: ansible_facts.services["k0scontroller.service"] is not defined

    - name: Copy out worker token
      ansible.builtin.fetch:
        src: /tmp/worker-token
        dest: output

    - name: Copy out cluster configuration
      ansible.builtin.fetch:
        src: /var/lib/k0s/pki/admin.conf
        dest: output

    - name: Replace localhost in cluster configuration
      delegate_to: localhost
      ansible.builtin.replace:
        path: "output/{{ inventory_hostname }}/var/lib/k0s/pki/admin.conf"
        regexp: 'https://localhost:6443'
        replace: "https://{{ cluster_external_address | default(inventory_hostname) }}:6443"

    - name: (ufw) Allow TCP access on port 6443
      tags: [ "ufw" ]
      become: yes
      community.general.ufw:
        rule: allow
        port: '6443'
        proto: tcp

    - name: (ufw) Allow UDP access on port 6443
      tags: [ "ufw" ]
      become: yes
      community.general.ufw:
        rule: allow
        port: '6443'
        proto: udp

Easy peasy – getting a Kubernetes cluster functioning has never been so easy (well technically k3s also made it similarly easy first)! I might actually stop messing with kubeadm and orchestrating/running it myself if it’s going to be this easy.
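If you want the playbook to block until the cluster is actually usable, a follow-up task along these lines could be appended. This is a sketch under assumptions rather than part of the playbook above: it assumes the k0s kubectl subcommand is available in the installed k0s version and that the --single install registers the machine as a node; the task and the k0s_nodes variable name are made up for illustration.

```yaml
    # Sketch only: hypothetical follow-up task, not from the original playbook
    - name: Wait for the node to report Ready
      ansible.builtin.command: k0s kubectl get nodes --no-headers
      register: k0s_nodes
      # ' Ready' (with a leading space) avoids matching 'NotReady'
      until: k0s_nodes.rc == 0 and ' Ready' in k0s_nodes.stdout
      retries: 30
      delay: 10
      changed_when: false
```

With 30 retries and a 10 second delay this gives the node about five minutes to come up before the play fails.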

Configuring k0s

One thing I did have to do was spend some time configuring k0s, so here’s what the template that was used above looks like:

---
apiVersion: k0s.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s
spec:
  api:
    externalAddress: {{ cluster_external_address | default(inventory_hostname) }}
    address: {{ hostvars[inventory_hostname]['ansible_env'].SSH_CONNECTION.split(' ')[2] }}
    sans:
      - {{ hostvars[inventory_hostname]['ansible_env'].SSH_CONNECTION.split(' ')[2] }}
      - {{ cluster_external_address | default(inventory_hostname) }}
      - {{ inventory_hostname }}
  storage:
    type: etcd
    etcd:
      peerAddress: {{ hostvars[inventory_hostname]['ansible_env'].SSH_CONNECTION.split(' ')[2] }}
  network:
    podCIDR: 10.244.0.0/16
    serviceCIDR: 10.96.0.0/12
    provider: calico
    calico:
      mode: vxlan
      vxlanPort: 4789
      vxlanVNI: 4096
      mtu: 1450
      wireguard: false
      flexVolumeDriverPath: /usr/libexec/k0s/kubelet-plugins/volume/exec/nodeagent~uds
      withWindowsNodes: false
      overlay: Always
  podSecurityPolicy:
    defaultPolicy: 00-k0s-privileged
  telemetry:
    interval: 10m0s
    enabled: true
  installConfig:
    users:
      etcdUser: etcd
      kineUser: kube-apiserver
      konnectivityUser: konnectivity-server
      kubeAPIserverUser: kube-apiserver
      kubeSchedulerUser: kube-scheduler
  images:
    default_pull_policy: IfNotPresent
    konnectivity:
      image: us.gcr.io/k8s-artifacts-prod/kas-network-proxy/proxy-agent
      version: v0.0.13
    metricsserver:
      image: gcr.io/k8s-staging-metrics-server/metrics-server
      version: v0.3.7
    kubeproxy:
      image: k8s.gcr.io/kube-proxy
      version: v1.20.5
    coredns:
      image: docker.io/coredns/coredns
      version: 1.7.0
    calico:
      cni:
        image: calico/cni
        version: v3.16.2
      flexvolume:
        image: calico/pod2daemon-flexvol
        version: v3.16.2
      node:
        image: calico/node
        version: v3.16.2
      kubecontrollers:
        image: calico/kube-controllers
        version: v3.16.2

Generally, if you’re not using the CRD/file-based setup for tools like kubeadm or k0s you’re missing out – it’s a nice way to get more declarative configuration, and YAML at this size/complexity (just a single configuration file for a single tool) is quite pleasant to work with.

Wrapup

OK, now we’ve got a nice repeatable process for getting a pre-provisioned (purchased) Hetzner dedicated server to the point of running a single node kubernetes cluster with k0s. Fully automating as we go takes much more time, but will pay off in spades once we’re running tests (and in general for me in the future).

I can’t believe I thought this would all fit in one post originally – I’ve since split the posts up into 5 parts, and this is the end of Part 1. Hopefully parts 2-5 won’t take too much longer (I do have some work I wanted to do), stay tuned to this space (or however you found this article) for the rest!