tl;dr - I tried to get LXD working on Container Linux but stopped short. Maybe if anyone picks it up (assuming the lxd team doesn’t tackle it eventually), they can learn from my failed effort.
I’ve recently gotten pretty excited about the concept of running higher-isolation paradigms (VMs, LXD) in my cluster for larger untrusted workloads. Much of that interest was generated by an idea in the back of my head: building (or at least figuring out how I would build) a system that could spin up mini Kubernetes clusters – like EKS/AKS/GKE, but easily self-hostable. Kubernetes makes running all these workloads across a large cluster (or a small cluster that scales up) seem feasible, so I wanted to see if it could be done.
While I learned about LXD a while ago, I didn’t think it too useful at the time because I didn’t have a use for it – docker and other container runtimes, along with Kubernetes, were already providing my isolation needs (I also didn’t think I needed any more protection), so I noted it as cool tech but ultimately ignored it. Around the time I was thinking about all this I came across a fantastic talk from Kubecon 2017 called “Cluster-in-a-Box: Deploying Kubernetes on lxd”, given by Rye Terrell of Canonical and Marco Ceppi of The Silph Road. This talk was a great motivator, as they had basically done what I wanted to do and been greatly successful with it.
I had one problem – I don’t run Ubuntu, Debian, Fedora, or CentOS, I run CoreOS’s Container Linux. CoreOS has a very specific set of things installed, and installing new software can be very difficult, mostly because that’s just not how you’re supposed to use it – it’s mainly made for running containers, and that’s exactly what I like about it. There’s a bit of a problem here though – LXD is not meant to be run from inside a container – I thought this left me with a few ways to make it happen:
- Method 1: Download a build toolchain on CoreOS, build go-lxc + lxd + go-sqlite3 statically
- Method 2: Run lxd from a privileged container
- Method 3: Statically compile lxd on a different OS (alpine) and build a binary CoreOS can run
These are my notes as I tried the options above and failed spectacularly. Astute readers will note that the easiest answer is probably just to change to a debian-based distro like Ubuntu, but that would have been too easy (plus I had just switched my server off of Arch and back to Container Linux, so I didn’t want to switch again).
I did of course ask on the LXD forum, and found out that there were no plans (at the time) to port LXD over to container linux. Off we go!
Method 1: Download and build from CoreOS
Well, I just skipped this – this is too much boiling of the ocean even for my taste. CoreOS really isn’t for building software, it’s for running containers, and if I’m going to go the building route, I might as well do it from Alpine (which actually has a package for lxd).
Method 2: Running LXD from a (privileged) container
Starting the container
Before trying to get this done inside Kubernetes, I figured I’d just try to get it working locally using docker. Then I remembered that I don’t use docker normally, and that it’s disabled since I use containerd as my container runtime – so my testing was actually from inside kubernetes, starting with a command something like this:

```
$ kubectl run -it test-lxd --image=alpine --restart=Never -- /bin/ash
```
Basically the idea is to just start a container and get in there and see what will happen.
Installing LXC and LXD in an Alpine container
As I noted before, Alpine actually provides packages for both lxc and lxd. Some small changes to `/etc/apk/repositories` and I was good to go. I also took note of the various lib/dev packages that came along for the ride.

After the required repository config changes, a simple `apk --update add lxc lxd` was all it took to get a theoretically working install.
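The repository changes involved here are small; here is a hedged sketch (the mirror URL and Alpine release are assumptions, so adjust for your setup – this writes to a scratch file rather than the real `/etc/apk/repositories`):

```shell
# Enable the community repository, where the lxc/lxd packages live.
# On a real Alpine box this content belongs in /etc/apk/repositories.
repos=./repositories.example
echo "http://dl-cdn.alpinelinux.org/alpine/v3.8/main" > "$repos"
echo "http://dl-cdn.alpinelinux.org/alpine/v3.8/community" >> "$repos"
cat "$repos"
# afterwards: apk update && apk --update add lxc lxd
```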
Attempting to start lxd naively yielded a few warnings but actually didn’t fail right away:
```
/ # lxd
WARN[05-24|03:44:47] Error reading default uid/gid map  err="User \"root\" has no subuids."
WARN[05-24|03:44:47] Only privileged containers will be able to run
WARN[05-24|03:44:47] AppArmor support has been disabled because of lack of kernel support
```
Very suspicious at this point, I tried next to run a container:
```
/ # lxc launch ubuntu:16.04 first
Creating first
Retrieving image: rootfs: 100% (77.69MB/s)
EROR[05-24|03:45:43] Failed creating container  ephemeral=false name=first
Error: Failed container creation: LXD doesn't have a uid/gid allocation. In this mode, only privileged containers are supported.
```
The important bit is that last line there, and it actually makes sense – it just seems like the container doesn’t have enough permissions to do the things it needs to do.
DEBUG: user/group namespace issues
Since the error was pretty clear, I figured I’d spend some time attempting to fix the namespacing issues. I’m way out of my depth here, as I normally don’t work this close to the linux subsystems, but I figured I’d give it a shot anyway.
One thing I remembered almost instantly was a talk from Kubecon 2018 in which they briefly covered namespacing strategies and mechanics. The information I remembered was ~19 minutes into the talk: discussion around using `newuidmap` to manage which mappings were allowed to happen, a somewhat hacky solution to seemingly this class of issues.
I also found some resources online, and one post in particular seemed useful:
To be able to start containers without “privileged” set to “true”, you need to add the “root” into subuid/subgid:

```
echo "root:100000:65536" | sudo tee /etc/subuid /etc/subgid
```

Otherwise you get this error message: error: LXD doesn’t have a uid/gid allocation. In this mode, only privileged containers are supported.
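For what it’s worth, each entry in those files has the form `user:start:count`. A tiny plain-shell sketch (using the canned value from the quote above) of what that mapping actually means:

```shell
# /etc/subuid and /etc/subgid entries have the form user:start:count.
# "root:100000:65536" lets root map container uids 0..65535 onto
# host uids 100000..165535 -- which is exactly what lxd was missing.
entry="root:100000:65536"
user=${entry%%:*}
rest=${entry#*:}
start=${rest%%:*}
count=${rest##*:}
echo "$user: container uids 0..$((count - 1)) -> host uids $start..$((start + count - 1))"
```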
I figured at this point that I was in over my head – before trying to dive in and really understand the uid/gid/subuid/subgid namespacing issues and how to fix them, I’d try and get a “privileged” container working (since the error did note that only that type of container would work).
Running a “privileged” lxc container
To run a privileged container in lxd you run `lxc launch ubuntu:16.04 first -c security.privileged=true -c security.nesting=true`. It’s not immediately obvious, but I came across a fantastic blog post that helped a lot.
This got me past the namespacing issue, but that turned out to just be the tip of the iceberg.
```
/ # lxc launch ubuntu:16.04 second -c security.privileged=true -c security.nesting=true
Creating second
Starting second
/ # EROR[05-24|04:13:03] balance: Unable to set cpuset  err="Failed to set cgroup cpuset.cpus=\"0,1,10,11,2,3,4,5,6,7,8,9\": setting cgroup item for the container failed" name=first value=0,1,10,11,2,3,4,5,6,7,8,9
EROR[05-24|04:13:03] balance: Unable to set cpuset  err="Failed to set cgroup cpuset.cpus=\"0,1,10,11,2,3,4,5,6,7,8,9\": setting cgroup item for the container failed" name=second value=0,1,10,11,2,3,4,5,6,7,8,9
```
Clearly, there’s way more going wrong here (you can tell I’m also on my second try, as the container is named `second`) – cgroups are not being negotiated properly. At this point I wasn’t too comfortable going much further – letting LXC manage the `cpuset` cgroup wasn’t something I wanted to wade into.
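If I (or anyone) were to dig further, the first thing worth inspecting is probably which cgroup controllers the container can actually see – a sketch of the diagnosis, not something I ran to completion:

```shell
# Which cgroup hierarchy are we in, and which controllers are visible?
# Inside another container runtime, cpuset may be absent or read-only,
# which would line up with the "setting cgroup item failed" errors.
cat /proc/self/cgroup
ls /sys/fs/cgroup 2>/dev/null || true
```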
LXC inside LXC does seem possible, but what I was trying to do was run LXC from inside a different container runtime, and that seemed much less so.
Method 3: Statically building lxc/lxd
The next thing I tried was to statically build lxc and lxd. I was encouraged by the fact that they’re both built in Golang, which is notoriously simple to do static builds with. It occurred to me that I had never really dealt much with forcing go to do completely static builds, so I had to find an SO post with the necessary incantations.
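For reference, the incantation in question is roughly the standard one (nothing here is lxd-specific, and it only works cleanly for pure-Go programs):

```shell
# Disable cgo so the binary links nothing dynamically; this is exactly
# the assumption lxd breaks, since it genuinely needs C libraries.
CGO_ENABLED=0 go build -a -ldflags '-extldflags "-static"' -o lxd .
```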
Unfortunately, large parts of lxd rely on dynamically-linked C libraries, which meant that I needed to try and build all of the C dependencies of the Go code as well. At this point I was already feeling pretty fatigued, but decided to read up on Alpine’s build processes to give it a shot – APKBUILD files and the aports tree.
Attempting (and giving up on) building lxc
I figured that since lxc was an official alpine package, the best place to start would be to try to `abuild` it from the aports repo. At that point I experienced submission at the hands of a thousand tiny cuts. Generally, the steps I followed:
- Get some build tools, starting with `abuild`. `abuild` doesn’t want to be run as root, so I needed to create a build user – I stole the instructions from [a random SO post](https://stackoverflow.com/questions/50258121/building-llvm-6-under-linux-alpine). It turns out you can just `su` to that user for each `abuild` command, but it’s kind of tedious
- I needed to install `py3-setuptools` as root (outside of the `su`), but it’s in `edge/main`, so you need to edit `/etc/apk/repositories`
- The build tool installation command ended up looking something like `apk add autoconf automake libtool linux-headers py3-setuptools`
- The build completes! The outputs are installed to `~/packages`, as per the alpine instructions. Building the ports works and produces the `lxc` alpine package and a bunch of other packages… Which is great, but not what I want, since what I want is a from-source build
- To try and do the build statically from source, you need to run `abuild unpack`, which gives you a `src` folder to work from
- I tried making the project normally (no static shenanigans yet)
- It completes, so at least I’m starting from a working dynamically-linked build. I get the usual output you’d expect from something that is building – lots and lots of compilation output. Weirdly enough, the outputs get generated in the `src` folder itself (no `target` folder), so I was a bit confused
- I got even more confused when I realized that lxc is really a bunch of scripts
- Here’s what I got when I ran `file` on the init script:

```
./etc/init.d/lxc: a /sbin/openrc-run script, ASCII text executable
```
- At this point I also found that someone else had asked for this earlier in 2018
- At this point I ran `file` and `ldd` on one of the built binaries (an `lxc`-related command) to see what came out when `enable-static` was used to build it:

```
qemu-test:~/packages/main/x86_64$ file ./usr/bin/lxc-usernsexec
./usr/bin/lxc-usernsexec: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped
qemu-test:~/packages/main/x86_64$ ldd ./usr/bin/lxc-usernsexec
        /lib/ld-musl-x86_64.so.1 (0x7f33716c2000)
        liblxc.so.1 => /usr/lib/liblxc.so.1 (0x7f3371206000)
        libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f33716c2000)
        libseccomp.so.2 => /usr/lib/libseccomp.so.2 (0x7f3370fc6000)
        libcap.so.2 => /usr/lib/libcap.so.2 (0x7f3370dc1000)
```
So at this point it looks like liblxc is a requirement here, and once I looked into trying to statically build that, my head was spinning. I reached out to the lxd discussion forum and got some help from some generous maintainers.
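One note on the check I kept repeating: `ldd` output is the ground truth for whether a build is actually static. A sketch using canned output from the session above (the parsing is real; the input is pasted rather than produced by a live `ldd`):

```shell
# Canned ldd output for lxc-usernsexec from above; a truly static
# binary would list no .so dependencies at all.
ldd_out='liblxc.so.1 => /usr/lib/liblxc.so.1
libseccomp.so.2 => /usr/lib/libseccomp.so.2
libcap.so.2 => /usr/lib/libcap.so.2'
if printf '%s\n' "$ldd_out" | grep -q '\.so'; then
  echo "dynamically linked"
else
  echo "static"
fi
```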
And this is where our adventure ends… I think it’s unlikely that I’ll pick this up again – there’s just too much friction, and there are other alternatives I can use to run untrusted workloads in my kubernetes cluster.
Again, the easy solution would have been to just switch the underlying OS to Ubuntu or something debian based or anything that supported LXC, but where’s the fun in that? :)
So both methods 2 and 3 failed… It’s certainly true that if I had known more about the underlying subsystems I could have gone much further, but since my end goal was just running untrusted workloads (or VMs), there were (and still are) many non-LXD options available (containerd’s untrusted workload API, kata-runtime, etc.), and it didn’t make sense for me to spend too much time dying on this particular hill. Even the work involved in trying to build libseccomp and the rest statically is technically surmountable, but at this point I don’t see the point in doing so when so many other options exist.
Maybe someday I’ll revisit this and give it another shot, but this day wasn’t the one. Hopefully this information helps someone out there who decides to go down this path.