Rootless Containers in 2020 on Arch Linux

tl;dr - Quick guide to getting rootless containers up and running on Arch Linux (also see the excellent Arch Wiki entry)

Steps

As usual, the Arch Wiki is a fantastic resource and has basically everything you need, though it's a little spread out. The relevant pages you’ll want to look at:

1. Enable the kernel.unprivileged_userns_clone=1 setting

This is something I think I did a long time ago when I first tried to get podman working on my system (with a previous version). You can set it at runtime with sysctl (this only lasts until the next reboot):

$ sudo sysctl kernel.unprivileged_userns_clone=1

You can also make this permanent by adding it to /etc/sysctl.d/userns.conf (in a root shell):

# echo 'kernel.unprivileged_userns_clone=1' > /etc/sysctl.d/userns.conf

For me this value was already set to 1, so I assume I must have done this earlier (though I didn’t have a /etc/sysctl.d/userns.conf file).
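
If you want to check the current value before (or after) changing it, a small shell function can do that. This is a sketch, and userns_clone_status is a made-up name. It assumes the knob lives at /proc/sys/kernel/unprivileged_userns_clone, which is where the Arch kernels of this era exposed it. On a kernel without that file, the function reports "absent", because the setting simply doesn't exist there.

```shell
# Hypothetical helper: report whether unprivileged user namespaces are allowed.
# Assumption: the sysctl is exposed at /proc/sys/kernel/unprivileged_userns_clone;
# the path can be overridden with the first argument.
userns_clone_status() {
    knob=${1:-/proc/sys/kernel/unprivileged_userns_clone}
    if [ ! -r "$knob" ]; then
        # This kernel doesn't carry the knob at all, so there is nothing to enable
        # (other limits such as user.max_user_namespaces may still apply).
        echo "absent"
        return 0
    fi
    case "$(cat "$knob")" in
        1) echo "enabled" ;;
        *) echo "disabled" ;;
    esac
}
```

Calling userns_clone_status with no arguments checks the live value. After creating /etc/sysctl.d/userns.conf, sudo sysctl --system reloads every sysctl.d file without a reboot.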

2. Enable cgroups v2

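You’ll also want to switch to the unified cgroups v2 hierarchy, which wasn't the default on Arch at the time of writing. It's enabled with a kernel command-line parameter, not a kernel rebuild. The Arch Wiki cgroups entry covers it very well.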

I have GRUB on my system, so I followed the GRUB kernel parameter instructions. After backing up /boot/grub/grub.cfg, I added systemd.unified_cgroup_hierarchy=1 to the kernel parameters in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then regenerated the config with grub-mkconfig. The new parameter only takes effect after a reboot.
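
You can script the edit, and you can also check which hierarchy is actually mounted after the reboot. This is a sketch with made-up function names (add_unified_cgroup_param, cgroup_mode). It assumes GNU sed and stat, and it assumes /etc/default/grub has a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, as Arch ships it. The sed call edits the file in place, so keep your backup.

```shell
# Hypothetical helper: report which cgroup hierarchy is mounted at a path
# (default /sys/fs/cgroup). Assumes GNU stat's -f / -c %T support.
cgroup_mode() {
    fstype=$(stat -f -c %T "${1:-/sys/fs/cgroup}" 2>/dev/null) || {
        echo "unknown"
        return 1
    }
    case "$fstype" in
        cgroup2fs) echo "v2" ;;          # unified cgroups v2 hierarchy
        tmpfs) echo "v1-or-hybrid" ;;    # legacy or hybrid layout
        *) echo "unknown" ;;
    esac
}

# Hypothetical helper: append systemd.unified_cgroup_hierarchy=1 to the
# double-quoted GRUB_CMDLINE_LINUX_DEFAULT line of a grub defaults file,
# unless the parameter is already present. Edits the file in place (GNU sed -i).
add_unified_cgroup_param() {
    file=$1
    param='systemd.unified_cgroup_hierarchy=1'
    if grep -qF "$param" "$file"; then
        return 0
    fi
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"\$/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"/" "$file"
}
```

Used roughly like this: as root, run add_unified_cgroup_param /etc/default/grub, then grub-mkconfig -o /boot/grub/grub.cfg, then reboot. Afterwards, cgroup_mode should print v2, because a pure cgroups v2 setup mounts /sys/fs/cgroup as a cgroup2 filesystem.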

3. Install crun

Before we get into what crun is, we have to talk about runc, the de facto OCI-spec container runtime engine. As far as terminology goes, if we consider containerd to be a runtime, then I’d consider runc to be an engine “powering” that runtime.

crun is relevant and necessary because runc does not support cgroups v2 (as of the writing of this post), so crun needs to be present on the machine for recent versions of podman to work.

crun is a community package and is installable the usual way:

$ sudo pacman -S crun
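
To confirm which runtimes podman could pick up, you can check what's on your PATH. This is a hypothetical helper: available_oci_runtimes is not a real command. It only looks for the crun and runc binaries and doesn't ask podman anything.

```shell
# Hypothetical helper: list which of the OCI runtimes crun and runc are on PATH.
available_oci_runtimes() {
    for rt in crun runc; do
        if command -v "$rt" >/dev/null 2>&1; then
            echo "$rt"
        fi
    done
}
```

On a working setup, crun should appear in the list. The output of podman info should also report the OCI runtime it is actually using.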

4. Create the /etc/subuid and /etc/subgid files if they don’t exist

The key feature of running rootless containers is support for Linux user namespaces. All the “rootless” container solutions and products out there work the same way:

In our daily reminder that the kernel is just software: this feature relies on the files /etc/subuid and /etc/subgid, plus the newuidmap and newgidmap helper utilities that read them. I somehow didn’t have these files on my system despite having those binaries, so I just touch’d them into existence:

$ sudo touch /etc/subuid
$ sudo touch /etc/subgid
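
Running touch on a file that already exists is harmless, because it only bumps the timestamp. If you'd rather the step only acts when something is missing and tells you what it did, here is a sketch. ensure_subid_files is a made-up name. The directory is a parameter (defaulting to /etc), so you can try it on a scratch directory first. Pointed at /etc, it has to run as root.

```shell
# Hypothetical helper: create subuid/subgid in a directory (default /etc)
# only if they are missing, printing each file it creates.
ensure_subid_files() {
    dir=${1:-/etc}
    for f in subuid subgid; do
        if [ ! -e "$dir/$f" ]; then
            touch "$dir/$f" && echo "created $dir/$f"
        fi
    done
}
```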

5. Create a UID/GID mapping for yourself

You can allocate a range of subordinate UIDs and GIDs to yourself with usermod, using the syntax usermod --add-subuids <start>-<end> --add-subgids <start>-<end> <username>. What I ran:

$ sudo touch /etc/subuid # create the subuid file (only necessary if it doesn't exist)
$ sudo touch /etc/subgid # create the subgid file (only necessary if it doesn't exist)
$ sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 <username>
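
For reference, each line in /etc/subuid and /etc/subgid has the form <username>:<first ID>:<count>, and the range passed to usermod is inclusive. The hypothetical helper below does that arithmetic, so you can check what usermod should have written. It is not part of shadow or podman.

```shell
# Hypothetical helper: print the subuid/subgid line that an inclusive
# <start>-<end> range corresponds to (count = end - start + 1).
subid_entry() {
    user=$1
    start=${2%-*}   # text before the dash
    end=${2#*-}     # text after the dash
    echo "$user:$start:$((end - start + 1))"
}
```

For example, subid_entry <username> 100000-165535 prints <username>:100000:65536. That should match what grep '^<username>:' /etc/subuid shows after the usermod call above.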

NOTE FROM THE FUTURE - You should consider giving this command a much wider range (for both sub-UIDs and sub-GIDs); I later ran into limitations on a big build.

NOTE FROM THE FUTURE, 2 - I’ve updated the usermod command in the post and added some clarifying commands; it should now be correct. After running these commands, the /etc/subuid and /etc/subgid files should be owned by root but contain entries for <username>.
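
IDs can end up spread across several lines, for example if you run usermod again later to widen the range. In that case it helps to total them per user. Here is a sketch with a made-up function name. It assumes entries use the username, not a numeric UID, in the first field. For the ownership part, stat -c %U /etc/subuid should print root.

```shell
# Hypothetical helper: sum every subordinate ID range allocated to one user
# in a subuid/subgid file (default /etc/subuid). Prints 0 if there are none.
subid_total() {
    user=$1
    file=${2:-/etc/subuid}
    awk -F: -v u="$user" '$1 == u { total += $3 } END { print total + 0 }' "$file"
}
```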

6. Install podman and alias docker

Well, this one is pretty obvious: just sudo pacman -S podman. After installing podman, you might want to alias docker to it in your .bashrc or .bash_profile. Aliases aren’t expanded in Makefiles, though, so if you’re using docker in Makefiles you may need to do something like:

.PHONY: all some-target

all: some-target

DOCKER ?= podman

some-target:
	$(DOCKER) subcommand ...
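
For the alias itself, one line in ~/.bashrc is enough for interactive shells. The sketch below appends it only if it isn't already there. add_docker_alias is a made-up name, and the rc file path is a parameter. The Makefile above can't rely on the alias, which is why it uses a DOCKER variable. Because that variable is assigned with ?=, you can still override it, e.g. make DOCKER=docker some-target.

```shell
# Hypothetical helper: add "alias docker=podman" to an rc file
# (default ~/.bashrc) unless that exact line is already present.
add_docker_alias() {
    rc=${1:-$HOME/.bashrc}
    line='alias docker=podman'
    grep -qxF "$line" "$rc" 2>/dev/null || echo "$line" >> "$rc"
}
```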

7. Test it out with docker run alpine

Here’s what it looks like when everything works:

$ docker run --rm alpine
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob df20fa9351a1 done
Copying config a24bb40132 done
Writing manifest to image destination
Storing signatures

If you want to run alpine but actually get into a shell, run docker run --rm -it alpine /bin/ash.
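
If you want a repeatable check, something like the following wraps the same test. It's a sketch, and rootless_smoke_test is a made-up name. It refuses to run as root, since the point is to exercise the rootless path. It also uses the fully qualified docker.io/library/alpine name, so it doesn't depend on your registry search configuration.

```shell
# Hypothetical helper: end-to-end check that rootless podman can run a container.
rootless_smoke_test() {
    if ! command -v podman >/dev/null 2>&1; then
        echo "podman not found on PATH" >&2
        return 1
    fi
    if [ "$(id -u)" -eq 0 ]; then
        echo "run this as your normal user, not root" >&2
        return 1
    fi
    # Pull (if needed) and run alpine; prints rootless-ok on success.
    podman run --rm docker.io/library/alpine echo rootless-ok
}
```

If it prints rootless-ok, the image pull, the user namespace setup and the runtime all worked.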

ISSUE: There might not be enough IDs available, /etc/shadow invalid argument

I found the solution to this issue in the podman repo on GitHub, but I’ll go over it here. The output looked like this:

$ docker run alpine
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob df20fa9351a1 done
Copying config a24bb40132 done
Writing manifest to image destination
Storing signatures
  Error processing tar file(exit status 1): there might not be enough IDs available in the namespace (requested 0:42 for /etc/shadow): lchown /etc/shadow: invalid argument

In the end the solution was to do as instructed in the issue:

  • Remove all local container content from your home directory
    • rm -r ~/.config/containers (add the -f yourself once you’re sure you won’t accidentally rm -rf something you want to keep)
    • rm -r ~/.local/share/containers (again, add -f yourself)
  • Run podman system migrate
  • Run podman unshare cat /proc/self/uid_map to check the mapping (with subordinate IDs in place, there should be a second line covering your range)
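
The error above asks for UID 0 and GID 42 inside the container's user namespace, for the owner of /etc/shadow. With no subordinate IDs configured, the map only contains your own ID mapped to 0, so 42 is not covered. Below is a sketch of a check you can pipe podman unshare output into. id_is_mapped is a hypothetical name. It assumes the standard three-column format of /proc/self/uid_map and gid_map: ID inside the namespace, ID outside, length.

```shell
# Hypothetical helper: read a uid_map/gid_map on stdin and succeed if the
# given in-namespace ID falls inside one of its ranges.
id_is_mapped() {
    awk -v id="$1" '
        BEGIN { n = id + 0 }
        n >= $1 + 0 && n < $1 + $3 { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: podman unshare cat /proc/self/gid_map | id_is_mapped 42 && echo mapped.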

DETAIL: useradd populates /etc/subuid and /etc/subgid, but not for system users

Thanks to u/MachaHack on Reddit for noting this:

One slightly annoying detail is that useradd will populate /etc/subuid and /etc/subgid with a reasonable automatic range (~1000 entries by default) if they exist… Unless you make the user a system user. In which case they get none and you need to calculate/add a new range manually. This isn’t documented, and the only note on it is in a bug tracker where another distribution’s maintainer mentioned they didn’t need it for system users as they had their own tooling for that.

I tend to run my rootless containers under their own users still to avoid issues if there’s a breakout vulnerability - a single purpose user for e.g. graylog is less of an impact than someone having access to my personal account.

So note that system users are not given an automatic subordinate ID range when they’re created. The point about breakout vulnerabilities is also a great one. In a callback to the days when user-level isolation was the way to go, it might make sense to run a program like graylog in a constrained container, under a dedicated graylog user, for added security.
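
For a system user, you can pick the next unused range yourself and hand it to usermod. Here is a sketch under a few assumptions. It never goes below 100000, which is the usual SUB_UID_MIN default in /etc/login.defs. It also assumes 65536 IDs per user is enough. next_free_subid and the graylog user are illustrative only.

```shell
# Hypothetical helper: print the first ID after every range already listed
# in a subuid/subgid file (default /etc/subuid), never below 100000 (assumed floor).
next_free_subid() {
    awk -F: '
        NF >= 3 { end = $2 + $3; if (end > max) max = end }
        END { print (max > 100000 ? max : 100000) }
    ' "${1:-/etc/subuid}"
}

# Example (as root) for a hypothetical system user named graylog:
#   uid_start=$(next_free_subid /etc/subuid)
#   gid_start=$(next_free_subid /etc/subgid)
#   usermod --add-subuids "$uid_start-$((uid_start + 65535))" \
#           --add-subgids "$gid_start-$((gid_start + 65535))" graylog
```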

Wrapup

With this I finally have daemonless podman available, and I can even systemctl disable the docker systemd unit (well, actually I always start it manually after restarts, but I digress).

Hopefully this succinct guide helps someone out there get their setup figured out!