Novice Arch Pitfall: watch out for kernel mismatches (after system updates)

tl;dr - After an update, make sure the kernel version reported by uname -a matches the one reported by pacman -Q linux (or just be more cognizant of when a full system upgrade updates the linux package).

Recently, after restarting and then performing a system update (pacman -Syu), I tried to start doing some dev work and was greeted by a broken docker systemd service (rootful docker, so not the user-level docker service you’d use with rootless docker). The error was somewhat cryptic:

Jul 24 17:14:41 mroryxman dockerd[187376]: time="2021-07-24T17:14:41.813714960+09:00" level=warning msg="could not create bridge network for id a90c13da340d93b2b9280fac74aa5441d31f21f4f69b541945a4981cd454363a bridge name br-a90c13da340d w>
Jul 24 17:14:41 mroryxman dockerd[187376]: time="2021-07-24T17:14:41.846625784+09:00" level=warning msg="could not create bridge network for id a9d70feda31cb1f075abac57df9229a32d0f387b61104245802f046460edb292 bridge name docker0 while boo>
Jul 24 17:14:41 mroryxman dockerd[187376]: time="2021-07-24T17:14:41.880982682+09:00" level=warning msg="could not create bridge network for id f016bd9ec6833d5d2e2ab5ca63fb1d0b6809c2ab6e1483141eff4e30da405ff5 bridge name br-f016bd9ec683 w>
Jul 24 17:14:41 mroryxman dockerd[187376]: time="2021-07-24T17:14:41.886169036+09:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jul 24 17:14:41 mroryxman dockerd[187376]: time="2021-07-24T17:14:41.936215595+09:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
Jul 24 17:14:41 mroryxman dockerd[187376]: failed to start daemon: Error initializing network controller: Error creating default "bridge" network: Failed to program NAT chain: Failed to inject DOCKER in PREROUTING chain: iptables failed: >
Jul 24 17:14:41 mroryxman dockerd[187376]: Try `iptables -h' or 'iptables --help' for more information.
Jul 24 17:14:41 mroryxman dockerd[187376]:  (exit status 2)
Jul 24 17:14:41 mroryxman systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ An ExecStart= process belonging to unit docker.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Jul 24 17:14:41 mroryxman systemd[1]: docker.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit docker.service has entered the 'failed' state with result 'exit-code'.

iptables is high up on the list of things that you almost never expect to fail. OK, so I don’t know what all of that means, but I think I can try and figure out some of it, in particular the important part, shown here in full:

Jul 24 17:14:41 mroryxman dockerd[187376]: failed to start daemon: Error initializing network controller: Error creating default "bridge" network: Failed to program NAT chain: Failed to inject DOCKER in PREROUTING chain: iptables failed: iptables --wait -t nat -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER: iptables v1.8.7 (legacy): Couldn't load match `addrtype':No such file or directory
Jul 24 17:14:41 mroryxman dockerd[187376]: Try `iptables -h' or 'iptables --help' for more information.
Jul 24 17:14:41 mroryxman dockerd[187376]:  (exit status 2)

So it looks like I’m missing addrtype, which is some piece of iptables? I obviously haven’t changed iptables recently, but when I check, the iptables service isn’t running (it’s installed but disabled)… Maybe that could be it?
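
As an aside, the `>` at the end of the long journal lines is most likely the pager chopping them off. A minimal sketch for pulling out only the fatal line in full, assuming the logs come from journalctl; daemon_failures is a hypothetical helper name, not something docker or systemd provides:

```shell
# Hypothetical helper: keep only dockerd's fatal startup line(s) from
# journal text read on stdin. -F treats the pattern as a fixed string.
daemon_failures() {
  grep -F 'failed to start daemon'
}

# Usage on a systemd machine (not run here):
#   journalctl -u docker.service -b --no-pager | daemon_failures
```

Piping the output (or passing --no-pager) should keep journalctl from truncating the message, so the iptables part at the end of the line stays readable.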

False solution: Rolling back docker

A quick Google search led me to my first dead end – maybe something had changed in docker, like it has in the past? My first instinct was to revert the docker package using the usual means. For those who are not familiar with the “usual means”: if you’re running Arch and have recently installed a bad package or two, you can trust the local pacman package cache to possibly lend a helping hand:

  • Look for your package by name-prefix in /var/cache/pacman/pkg (ex. docker-1:20.10.5-1-x86_64.pkg.tar.zst)
  • Install older version(s) with pacman -U <path to cached package> (ex. sudo pacman -U /var/cache/pacman/pkg/docker-1\:20.10.5-1-x86_64.pkg.tar.zst)

After doing this, docker still failed to start, so clearly I hadn’t found the issue just yet.
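
The cache lookup from the steps above can be sketched as a small shell function. This is only an illustration: cached_versions is a hypothetical helper, and it assumes the default cache location and .pkg.tar.zst file names like the example above.

```shell
# Hypothetical helper: print the cached package files for one package.
# $1 = package name, $2 = cache directory (defaults to pacman's usual cache).
cached_versions() {
  pkg=$1
  cache=${2:-/var/cache/pacman/pkg}
  # "$pkg-[0-9]*" matches docker-1:20.10.5-1-... but skips packages that only
  # share the prefix, such as docker-compose-...; .sig files do not match.
  for f in "$cache/$pkg"-[0-9]*.pkg.tar.zst; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage (not run here):
#   cached_versions docker
```

The paths come back in file name order, which is not a strict version ordering, so pick the one you want by eye and hand it to pacman -U as in the second step.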

False solution: starting iptables

Well, let’s spend some more time thinking about the error message – iptables seems to not support that kind of match for some reason. Weirdly enough, there’s an iptables service but it wasn’t running – maybe that was the issue? Starting the iptables service was error-free but ultimately did not change anything:

mrman 17:25:21 [development] $ systemctl status iptables
● iptables.service - IPv4 Packet Filtering Framework
     Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
     Active: active (exited) since Sat 2021-07-24 17:04:12 JST; 21min ago
   Main PID: 185526 (code=exited, status=0/SUCCESS)
        CPU: 2ms

Jul 24 17:04:12 mroryxman systemd[1]: Starting IPv4 Packet Filtering Framework...
Jul 24 17:04:12 mroryxman systemd[1]: Finished IPv4 Packet Filtering Framework.

So I shut that down and went back… what about iptables could be misconfigured? iptables requires kernel-level coordination and support, so maybe I’m having an issue with my kernel?
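
One way to test that hunch is to look for the module file under the running kernel’s module directory. A rough sketch, assuming the addrtype match is backed by a kernel module named xt_addrtype (that name does not appear in the error above); module_on_disk is a hypothetical helper:

```shell
# Hypothetical helper: succeed if a module file named $1 exists under the
# modules directory for kernel release $2 (default: the running kernel).
# $3 overrides the modules root, assumed to be /usr/lib/modules.
module_on_disk() {
  name=$1
  release=${2:-$(uname -r)}
  root=${3:-/usr/lib/modules}
  # Modules are usually compressed, so match name.ko, name.ko.xz, name.ko.zst...
  [ -n "$(find "$root/$release" -type f -name "$name.ko*" 2>/dev/null)" ]
}

# Usage (not run here):
#   module_on_disk xt_addrtype || echo "no xt_addrtype on disk for $(uname -r)"
```

If the file is missing for the release that uname -r reports, the running kernel has nothing on disk to load, whatever iptables itself is doing.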

Investigation: Is it a kernel problem?

I started with the possibility that I somehow lost an extension to the kernel that was supposed to be there. One thing that seemed to support this was that my kernel literally doesn’t have too many custom modules:

mrman 17:39:13 [development] $ tree /lib/modules/5.12.12-arch1-1/
/lib/modules/5.12.12-arch1-1/
├── kernel
│   └── drivers
│       └── video
│           └── nvidia-uvm.ko.xz
└── updates
    └── dkms
        └── system76_acpi.ko.xz

5 directories, 2 files

If you look around the internet, it seems like these folders are supposed to be full of modules, but I sure don’t see very much! Thanks to the cache behaviors of Arch (pacman), I was able to check previous kernel versions and see that I never actually had more than this lying around:

mrman 17:39:16 [development] $ tree /lib/modules/5.10.8-arch1-1/
/lib/modules/5.10.8-arch1-1/
├── kernel
│   └── drivers
│       └── video
│           └── nvidia-uvm.ko.xz
└── updates
    └── dkms
        └── system76_acpi.ko.xz

5 directories, 2 files

I’ll save you the trouble though, br_netfilter or any other such kernel module or configuration wasn’t the issue – my installed kernel and the kernel I was running on were mismatched (due to the recent post-restart update).

Solution: My kernel versions were mismatched

I came across an SO post which was the solution to my problem – as it so succinctly put:

You are running on kernel 4.16.12, but you have updated kernel to 4.17.2. After each kernel upgrade you need to restart your machine.

iptable fails, because it tries to load module iptable_filter, however the file /usr/lib/modules/4.16.12-1-ARCH/kernel/net/ipv4/netfilter/iptable_filter.ko.xz no longer exists, because you have updated your kernel with pacman -S linux. To change the running kernel you need to restart your machine. After restart, if the running kernel as reported by uname -a matches the version that is installed pacman -Q linux, then the iptables command should successfully load iptable_filter module.

Obviously the kernel versions don’t apply as this comment was made in 2018, but I did realize that I was in that exact situation. First I checked the usual trusty uname -a:

mrman 17:28:03 [development] $ uname -a
Linux mroryxman 5.12.12-arch1-1 #1 SMP PREEMPT Fri, 18 Jun 2021 21:59:22 +0000 x86_64 GNU/Linux

Then pacman -Q linux:

mrman 17:28:01 [development] $ pacman -Q linux
linux 5.13.4.arch1-1

And that was easy to see, plain as day! A prompt restart and I was off to the races and able to actually get to work. Another sunny day, another surprising novice pitfall I fell into in linux land. I’m pretty sure the long term solution to this isn’t to hold back kernel updates – probably better to just restart more often.
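
The comparison is easy to script. A minimal sketch, assuming the only difference between the two formats is ".arch" versus "-arch" as in the output above (this may not hold for every release, for example the first release in a kernel series); both function names are hypothetical:

```shell
# Hypothetical helper: convert a pacman version of the linux package
# (5.13.4.arch1-1) to the form uname -r prints (5.13.4-arch1-1).
pkgver_to_release() {
  printf '%s\n' "$1" | sed 's/\.arch/-arch/'
}

# Succeeds when the running release ($1) matches the installed
# package version ($2) after conversion.
kernel_matches() {
  [ "$1" = "$(pkgver_to_release "$2")" ]
}

# Usage on Arch (not run here):
#   kernel_matches "$(uname -r)" "$(pacman -Q linux | cut -d' ' -f2)" \
#     || echo "running kernel differs from the installed linux package: reboot"
```

With the values from this post, kernel_matches 5.12.12-arch1-1 5.13.4.arch1-1 fails, which is exactly the situation that broke docker.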

Just goes to show – you (re)learn something new (but it’s really actually old and well known, just not to you) every day!