tl;dr - Quick guide to getting rootless containers up and running on Arch Linux (also see the excellent Arch Wiki entry)
As usual, the Arch Wiki is a fantastic resource and has basically everything you need, if a little bit spread out. The relevant pages you’ll want to look at:
kernel.unprivileged_userns_clone=1 setting
This is something I think I did a long time ago when first trying to get podman working on my system (with a previous version). You can add it with sysctl:
$ sudo sysctl kernel.unprivileged_userns_clone=1
You can also make this permanent by adding it to /etc/sysctl.d/userns.conf (in a root shell):
# echo 'kernel.unprivileged_userns_clone=1' > /etc/sysctl.d/userns.conf
For me this value was already set to 1, so I assume I must have done this earlier (though I didn’t have a /etc/sysctl.d/userns.conf file).
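If you want to check the current value before changing anything, something like the following would do. This is a minimal sketch: userns_status is a name I made up for illustration, and it assumes the sysctl is exposed under /proc/sys, which is only the case on kernels that carry this knob.

```shell
#!/bin/sh
# Report whether unprivileged user namespaces are enabled.
# userns_status is a hypothetical helper; the optional path argument exists
# only so the check can be pointed at another file.
userns_status() {
    knob="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
    if [ ! -r "$knob" ]; then
        echo "absent"      # this kernel does not expose the sysctl
    elif [ "$(cat "$knob")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

userns_status
```

If this prints "enabled" you can skip the sysctl step entirely.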
You’ll want to enable cgroups v2 support (which is normally disabled) in the kernel as well. The Arch Wiki cgroups entry covers it very well.
I have grub on my system so I followed the grub kernel parameter instructions: I backed up /boot/grub/grub.cfg before doing anything, added systemd.unified_cgroup_hierarchy=1 to the kernel params in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, and then ran grub-mkconfig.
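The edit to GRUB_CMDLINE_LINUX_DEFAULT can also be scripted. The sketch below is not what I ran (I edited the file by hand). add_kernel_param is a hypothetical helper, and it assumes the value sits on a single double-quoted line. It runs against a throwaway sample file rather than the real /etc/default/grub.

```shell
#!/bin/sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in the given file,
# unless it is already there. Hypothetical helper; assumes the value is
# double-quoted and on one line.
add_kernel_param() {
    file="$1"; param="$2"
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=\".*${param}" "$file"; then
        return 0    # already present, nothing to do
    fi
    tmp="$(mktemp)"
    sed "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"/" "$file" > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstrate on a sample file with illustrative contents.
sample="$(mktemp)"
printf '%s\n' 'GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet"' > "$sample"
add_kernel_param "$sample" "systemd.unified_cgroup_hierarchy=1"
cat "$sample"
rm -f "$sample"
```

On a real system you would still regenerate the config afterwards (grub-mkconfig -o /boot/grub/grub.cfg) and reboot. Once rebooted, stat -fc %T /sys/fs/cgroup should report cgroup2fs if the unified hierarchy is active.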
crun
Before we get into what crun is, we have to talk about runc, the de facto OCI-spec container runtime engine. As far as terminology goes, if we consider containerd to be a runtime, then I’d consider runc to be an engine “powering” that runtime.
crun is relevant and necessary because runc does not actually support cgroups v2 (as of the writing of this post), so crun needs to be present on the machine for recent versions of podman to work.
crun is a community package and is installable the usual way:
$ sudo pacman -S crun
/etc/subuid and /etc/subgid files if they don’t exist
The key feature of running rootless containers is support for Linux user namespacing. All the “rootless” container solutions and products out there work the same way:
lxd (which runs lxc underneath)’s “System Containers” have had user namespacing the longest
containerd’s rootless containers
docker’s container isolation
podman’s rootless containers
In our daily reminder that the kernel is just software, this feature requires the files /etc/subuid and /etc/subgid, along with some other utilities, to exist. I somehow didn’t have these files on my system, despite having the newgidmap/newuidmap binaries, so I just touch’d them into existence:
$ sudo touch /etc/subuid
$ sudo touch /etc/subgid
You can add a mapping to free up UIDs and GIDs for yourself to use by using usermod with the following syntax: usermod --add-subuids <start>-<end> --add-subgids <start>-<end> <username>. What I ran:
$ sudo touch /etc/subuid # create the subuid file (only necessary if it doesn't exist)
$ sudo touch /etc/subgid # create the subgid file (only necessary if it doesn't exist)
$ sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 <username>
NOTE FROM THE FUTURE - You should consider using a much wider range as input to this command (for both sub-uids and sub-gids); I ran into limitations on a big build later on.
NOTE FROM THE FUTURE, 2 - I’ve updated the usermod command in the post and added some clarifying commands, so it should be correct. With these commands, the /etc/subuid and /etc/subgid files should be owned by root but contain rules that are relevant to <username>.
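To see how many IDs a range like the one in the usermod command covers, and what actually ended up in the file, here is a small sketch. subid_count is a hypothetical helper name. It relies on the name:start:count line format that /etc/subuid and /etc/subgid use.

```shell
#!/bin/sh
# Print the total number of subordinate IDs granted to a user in a
# subuid/subgid-style file (lines of the form name:start:count).
# Hypothetical helper for illustration.
subid_count() {
    user="$1"; file="$2"
    awk -F: -v u="$user" '$1 == u { total += $3 } END { print total + 0 }' "$file"
}

# The range passed to usermod is inclusive on both ends.
first=100000; last=165535
echo "IDs in range: $((last - first + 1))"

# Only look at the real file if it exists and is readable.
if [ -r /etc/subuid ]; then
    echo "sub-uids for $(id -un): $(subid_count "$(id -un)" /etc/subuid)"
fi
```

The first echo prints 65536 for the range above. If you go with a wider range as the note suggests, only first and last change.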
podman and alias docker
Well this is pretty obvious, just sudo pacman -S podman. After installing podman you might want to alias docker to it in your .bashrc or .bash_profile. Note that if you’re using docker in Makefiles you may need to do something like:
.PHONY: all some-target
all: some-target
# Defaults to podman; override with `make DOCKER=docker`
DOCKER ?= podman
some-target:
	$(DOCKER) subcommand ...
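For the shell alias mentioned above, a minimal sketch for .bashrc (or .bash_profile) could look like this. The guard around it is my own addition, so the alias only appears on machines where podman is actually installed.

```shell
# ~/.bashrc (or ~/.bash_profile)
# Point "docker" at podman, but only when podman is on the PATH.
if command -v podman >/dev/null 2>&1; then
    alias docker=podman
fi
```

Aliases only apply to interactive shells, and make runs recipes in a non-interactive shell, which is why Makefiles need the DOCKER variable approach instead.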
docker run alpine
Here’s what it looks like when everything works:
$ docker run --rm alpine
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob df20fa9351a1 done
Copying config a24bb40132 done
Writing manifest to image destination
Storing signatures
If you want to run alpine but actually get into a shell, run docker run --rm -it alpine /bin/ash.
/etc/shadow invalid argument
I found the solution to this issue on GitHub in the podman repo, but I’ll go into it here. The output looked like this:
$ docker run alpine
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob df20fa9351a1 done
Copying config a24bb40132 done
Writing manifest to image destination
Storing signatures
Error processing tar file(exit status 1): there might not be enough IDs available in the namespace (requested 0:42 for /etc/shadow): lchown /etc/shadow: invalid argument
In the end the solution was to do as instructed in the issue:
rm -r ~/.config/containers (you should add -f to the -r yourself; make sure you don’t accidentally rm -rf something you don’t want to)
rm -r ~/.local/share/containers (you should add -f yourself)
podman system migrate
podman unshare cat /proc/self/uid_map
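The last command prints the UID mapping podman ends up with. Each line has three columns: the first ID inside the namespace, the first ID outside it, and the length of the range. As a rough check you can add up the lengths. The sketch below uses a hypothetical mapped_id_total helper, and its sample input is made up for illustration, not captured from my machine.

```shell
#!/bin/sh
# Sum the third column (range length) of a uid_map-style listing on stdin.
# Hypothetical helper for illustration.
mapped_id_total() {
    awk '{ total += $3 } END { print total + 0 }'
}

# Illustrative input: one line for your own UID plus one sub-uid range.
printf '%s\n' '0 1000 1' '1 100000 65536' | mapped_id_total
```

On a real system you would pipe the actual output in: podman unshare cat /proc/self/uid_map | mapped_id_total. If the total comes out as 1, the sub-uid range is most likely not being picked up, which would match the "not enough IDs available" error above.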
useradd populates /etc/subuid and /etc/subgid, but not for system users
Thanks to u/MachaHack on reddit for noting this:
One slightly annoying detail is that useradd will populate /etc/subuid and /etc/subgid with a reasonable automatic range (~1000 entries by default) if they exist… Unless you make the user a system user. In which case they get none and you need to calculate/add a new range manually. This isn’t documented, and the only note on it is in a bug tracker where another distributions maintainer mentioned they didn’t need it for system users as they had their own tooling for that.
I tend to run my rootless containers under their own users still to avoid issues if there’s a breakout vulnerability - a single purpose user for e.g. graylog is less of an impact than someone having access to my personal account.
So note that system users are not given a reasonable automatic range after they’re added. The point on breakout vulnerabilities is also great: in a callback to the times when user-level isolation was the way, it might make sense to run some program like graylog in an isolated/constrained container as a constrained graylog user for added security.
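Since system users need their range calculated by hand, here is a sketch of one way to find a start that doesn’t collide with existing entries. next_free_subid is a hypothetical helper, the 100000 fallback simply mirrors the start I used earlier, and the sample file contents are made up.

```shell
#!/bin/sh
# Print the first ID after every range already listed in a subuid/subgid-style
# file (name:start:count), falling back to 100000 when the file is empty.
# Hypothetical helper for illustration.
next_free_subid() {
    awk -F: 'BEGIN { next_id = 100000 }
             $2 + $3 > next_id { next_id = $2 + $3 }
             END { print next_id }' "$1"
}

# Demonstrate on a sample file with one existing entry.
sample="$(mktemp)"
printf '%s\n' 'me:100000:65536' > "$sample"
start="$(next_free_subid "$sample")"
echo "graylog could use ${start}-$((start + 65535))"
rm -f "$sample"
```

The printed range can then go into the same usermod --add-subuids/--add-subgids syntax shown earlier, with the system user as the username. Run the helper against /etc/subuid and /etc/subgid separately, since the two files can differ.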
With this I finally have daemon-less podman available and I can even systemctl disable the docker systemd unit (well actually I always manually start it after restarts but I digress).
Hopefully this succinct guide helps someone out there get their setup figured out!