tl;dr - I tried to get LXD working on Container Linux but stopped short. Maybe if anyone picks it up (assuming the
lxd team doesn’t tackle it eventually), they can learn from my failed effort.
I’ve recently gotten pretty excited about the concept of running higher isolation paradigms (VMs, LXD) in my cluster for larger untrusted workloads. A lot of interest in those concepts has been generated by the idea in the back of my head of building (or at least figuring out how I would build) a system that could spin up mini Kubernetes clusters – like an EKS/AKS/GKE, but easily self-hostable. Kubernetes makes running all these workloads across a large cluster (or a small cluster and scaling up) seem like a possible feat so I wanted to see if it could be done.
While I learned about LXD a while ago, I didn’t think it would be very useful to me because I didn’t have a use for it – docker and other container runtimes, along with Kubernetes, were already meeting my isolation needs (I also didn’t think I needed any more protection), so I noted it as cool tech but ultimately ignored it. Around the time I was thinking about this stuff I came across a fantastic talk from Kubecon 2017 called “Cluster-in-a-Box: Deploying Kubernetes on lxd” given by Rye Terrell of Canonical and Marco Ceppi of The Silph Road. This talk was a great motivator, as they had basically done what I wanted to do and had been greatly successful with it.
I had one problem – I don’t run Ubuntu, Debian, Fedora, or CentOS, I run CoreOS’s Container Linux. CoreOS has a very specific set of things installed, and installing new software can be very difficult, mostly because that’s just not how you’re supposed to use it – it’s mainly made for running containers, and that’s exactly what I like about it. There’s a bit of a problem here though – LXD is not meant to be run from inside a container – I thought this left me with a few ways to make it happen:
These are my notes as I tried both the options above and failed spectacularly. Astute readers will note that the easiest answer is probably just to change to a Debian-based distro like Ubuntu, but that would have been too easy (plus I had just switched off of Arch for my server, and back to Container Linux, so I didn’t want to switch again).
I did of course ask on the LXD forum, and found out that there were no plans (at the time) to port LXD over to Container Linux. Off we go!
Well I just skipped this – this is too much boiling of the ocean even for my taste. CoreOS really isn’t for building software stuff, it’s for running containers, and if I’m going to go the building route, I might as well do it from Alpine (which actually has a package for lxd).
Before trying to get this done inside Kubernetes, I figured I’d just try to get it working locally using docker. Then I remembered that I don’t use docker normally, and it’s disabled since I use containerd as my container runtime – so my testing was actually from inside Kubernetes, starting with a command something like this:
$ kubectl run -it test-lxd --image=alpine --restart=Never -- /bin/ash
Basically the idea is to just start a container and get in there and see what will happen.
As I noted before, Alpine actually provides packages for both lxc and lxd. Some small changes to /etc/apk/repositories and I was good to go. I also took note of various lib/dev libs that came with, like
After the required repository config changes, a simple apk --update add lxc lxd was all it took to get a theoretically working install. Running lxd naively yielded a few warnings but actually didn’t fail right away:
/ # lxd
WARN[05-24|03:44:47] Error reading default uid/gid map err="User \"root\" has no subuids."
WARN[05-24|03:44:47] Only privileged containers will be able to run
WARN[05-24|03:44:47] AppArmor support has been disabled because of lack of kernel support
Very suspicious at this point, I tried next to run a container:
/ # lxc launch ubuntu:16.04 first
Creating first
Retrieving image: rootfs: 100% (77.69MB/s)
EROR[05-24|03:45:43] Failed creating container ephemeral=false name=first
Error: Failed container creation: LXD doesn't have a uid/gid allocation. In this mode, only privileged containers are supported.
The important bit is that last line there, and it actually makes sense – it just seems like the container doesn’t have enough permissions to do the things it needs to do.
Since the error was pretty clear, I figured I’d spend some time attempting to fix the namespacing issues. I’m way out of my depth here, as I normally don’t work so close to the Linux subsystems, but I figured I’d give it a shot anyway.
One thing I remembered almost instantly was a talk from Kubecon 2018 in which they briefly covered namespacing strategies and mechanics. The part I remembered was ~19 minutes into the talk: a discussion of using newuidmap to manage which mappings were allowed to happen, a somewhat hacky solution to seemingly this class of issues, embraced by
I also found some resources online:
The last post seemed particularly useful:
To be able to start containers without “privileged” set to “true”, you need to add the “root” into subuid/subgid:
echo "root:100000:65536" | sudo tee /etc/subuid /etc/subgid
Otherwise you get this error message: error: LXD doesn’t have a uid/gid allocation. In this mode, only privileged containers are supported.
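A minimal sketch of that fix, written so it can be run more than once without clobbering existing entries (the quoted tee command overwrites both files). The helper name ensure_subid_range is made up for illustration, the range values are the ones from the quoted post, and I’m assuming lxd would need a restart to pick up the new maps:

```shell
# Sketch: add a subordinate ID range for a user to a subuid/subgid-style file,
# but only if that user has no entry yet. Helper name is hypothetical.
ensure_subid_range() {
  file="$1"              # e.g. /etc/subuid or /etc/subgid
  user="${2:-root}"
  start="${3:-100000}"   # range taken from the quoted post
  count="${4:-65536}"
  touch "$file"
  # Append "user:start:count" only when no line for this user exists.
  if ! grep -q "^${user}:" "$file"; then
    printf '%s:%s:%s\n' "$user" "$start" "$count" >> "$file"
  fi
}

# Usage (as root):
#   ensure_subid_range /etc/subuid
#   ensure_subid_range /etc/subgid
```

Passing the file path in makes it easy to try against a scratch file before touching anything under /etc.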
I figured at this point that I was in too far over my head – before I tried to dive in and really understand the uid/gid/subuid/subgid namespacing issues and figure out how to fix them, I’d try and get a “privileged” container working (since the error did note that only that type of container would work).
To run a privileged container in lxd you run lxc launch ubuntu:16.04 first -c security.privileged=true -c security.nesting=true. It’s not immediately obvious, but I came across a fantastic blog post that helped a lot.
This got me past the namespacing issue, but that turned out to just be the tip of the iceberg.
/ # lxc launch ubuntu:16.04 second -c security.privileged=true -c security.nesting=true
Creating second
Starting second
/ # EROR[05-24|04:13:03] balance: Unable to set cpuset err="Failed to set cgroup cpuset.cpus=\"0,1,10,11,2,3,4,5,6,7,8,9\": setting cgroup item for the container failed" name=first value=0,1,10,11,2,3,4,5,6,7,8,9
EROR[05-24|04:13:03] balance: Unable to set cpuset err="Failed to set cgroup cpuset.cpus=\"0,1,10,11,2,3,4,5,6,7,8,9\": setting cgroup item for the container failed" name=second value=0,1,10,11,2,3,4,5,6,7,8,9
Clearly, there’s way more going wrong here (as you can tell, I’m also on my second try, since the container is named second) – CGroups are not being negotiated properly. At this point I wasn’t too comfortable going much further – allowing LXC to set the cpuset wasn’t something I was comfortable wading into.
LXC inside LXC seems to be possible, but what I was trying to do was run LXC from inside a different container runtime, and that seemed less possible.
The next thing I tried was to statically build lxc and lxd. I was encouraged by the fact that they’re both built in Golang, which is notoriously simple to do static builds with. It occurred to me that I had never really dealt too much with forcing go to do completely static builds so I had to find an SO post with the incantations necessary.
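For reference, the commonly cited incantations look roughly like the following. This is a sketch from general Go knowledge, not necessarily what that SO post said; the helper name static_go_build_cmd and the package path ./cmd/lxd are placeholders. The helper only prints the command so you can look at it before running anything:

```shell
# Sketch: print (not run) a typical "fully static" go build command line.
# Second argument: 1 if the package uses cgo, 0 (default) for pure Go.
static_go_build_cmd() {
  pkg="$1"
  use_cgo="${2:-0}"
  if [ "$use_cgo" = "1" ]; then
    # With cgo, link externally and ask the C linker for a static link.
    # Every C dependency then has to be available as a static archive (.a).
    printf '%s\n' "CGO_ENABLED=1 go build -a -ldflags '-linkmode external -extldflags \"-static\"' $pkg"
  else
    # Pure Go: disabling cgo is normally enough to avoid libc entirely.
    printf '%s\n' "CGO_ENABLED=0 go build -a -ldflags '-extldflags \"-static\"' $pkg"
  fi
}

# Usage: static_go_build_cmd ./cmd/lxd 1
```

The cgo branch is the one that matters for anything wrapping C libraries, and it only works if static versions of those libraries exist on the build machine.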
Unfortunately, large parts of lxd rely on dynamically-linked C libraries, and this meant that I needed to try and build all the C dependencies of the Go code as well. At this point I was already feeling pretty fatigued, but decided to read up on Alpine’s build processes to give it a shot – APKBuild and the
I figured since lxc was an official Alpine package, the best place to start would be to try to abuild it from the aports repo. At that point I experienced submission at the hands of a bunch of cuts. Generally, the steps I followed:
abuild doesn’t want to be run as root, so need to create a user, stole instructions from a random SO post (https://stackoverflow.com/questions/50258121/building-llvm-6-under-linux-alpine), turns out you can just
abuild command but it’s kind of tedious.
py3-setuptools as root (outside of su), but it’s in edge/main so you need to edit repositories
apk add autoconf automake libtool linux-headers py3-setuptools
~/packages as per the alpine instructions. Building the ports works, and produces the lxc alpine package and a bunch of other packages… Which is great, but not what I want, since what I want to do is a from-source build
abuild unpack and it will give you a src folder itself (no target folder), so I was a bit confused
./etc/init.d/lxc: a /sbin/openrc-run script, ASCII text executable
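The steps above, roughly, as a dry-run sketch. This is pieced together from general Alpine packaging practice rather than from my shell history, so treat it as an assumption-laden outline: the user name builder, the aports checkout path, and the helper names are all hypothetical, and py3-setuptools will only install once the repositories file has been edited as noted above.

```shell
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 on a
# throwaway Alpine box to actually run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# As root: install build tooling, then create an unprivileged user in the
# abuild group (abuild refuses to run as root unless forced).
prepare_builder() {
  user="${1:-builder}"               # hypothetical user name
  run apk add alpine-sdk autoconf automake libtool linux-headers py3-setuptools
  run adduser -D "$user"
  run addgroup "$user" abuild
}

# As that user: generate a signing key, then build the port, pulling in
# missing build dependencies along the way.
build_port() {
  port_dir="${1:-aports/main/lxc}"   # assumed location of the aports checkout
  run abuild-keygen -a -n
  run sh -c "cd '$port_dir' && abuild -r"
}
```

With the default dry run, prepare_builder and build_port just echo each command prefixed with "+", which is a cheap way to review the sequence first.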
At this point I inspected lxc-usernsexec (a randomly chosen lxc-related command) with file and ldd to see what it looked like when enable-static was used to build it:
qemu-test:~/packages/main/x86_64$ file ./usr/bin/lxc-usernsexec
./usr/bin/lxc-usernsexec: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped
qemu-test:~/packages/main/x86_64$ ldd ./usr/bin/lxc-usernsexec
/lib/ld-musl-x86_64.so.1 (0x7f33716c2000)
liblxc.so.1 => /usr/lib/liblxc.so.1 (0x7f3371206000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f33716c2000)
libseccomp.so.2 => /usr/lib/libseccomp.so.2 (0x7f3370fc6000)
libcap.so.2 => /usr/lib/libcap.so.2 (0x7f3370dc1000)
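To repeat that check across every binary a build produces, a tiny helper along these lines could classify the one-line description that file prints. This is only a sketch: linkage_of is a made-up name, and it keys purely off the “statically linked” / “dynamically linked” wording seen in output like the above.

```shell
# Sketch: classify a binary from the description string printed by `file`.
linkage_of() {
  case "$1" in
    *"statically linked"*)  echo static ;;
    *"dynamically linked"*) echo dynamic ;;
    *)                      echo unknown ;;   # scripts, text files, etc.
  esac
}

# Usage: linkage_of "$(file ./usr/bin/lxc-usernsexec)"
```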
So at this point it looks like liblxc is a requirement here, and once I looked into trying to statically build that my head was spinning. I reached out to the lxd discussion forum and got some help from some generous maintainers.
And this is where our adventure ends… I think it’s unlikely that I’m going to try and pick this up again, there’s just too much friction while other alternatives exist for me to use to run untrusted workloads in my kubernetes cluster.
Again, the easy solution would have been to just switch the underlying OS to Ubuntu or something debian based or anything that supported LXC, but where’s the fun in that? :)
So both methods 2 and 3 failed… In every case it’s certainly true that if I had known more of the underlying subsystems I could have gone much further, but since my end goal was just running untrusted workloads (or VMs), there were (and still are) many non-LXD options available (containerd’s untrusted workload API, kata-runtime, etc), and it didn’t make sense for me to spend too much time dying on this particular hill. Even the work involved in trying to build libseccomp and the rest statically is technically surmountable, but at this point I don’t see a point in doing so when so many other options exist.
Maybe someday I’ll revisit this and give it another shot, but this day wasn’t the one. Hopefully this information helps someone out there who decides to go down this path.