tl;dr - I came across
rkt’s ability to use alternate stage 1s and got it working, but then abandoned it due to problems getting rook running and a lack of CRI compatibility (at the time), before even trying to compare it with the QEMU-in-a-pod approach. These notes are very old (I don’t use container linux for my cluster anymore), and I can’t believe I quit so quickly without a more thorough investigation, but evidently I did. There’s not much to see in this post, but maybe it will serve as a starting point for others.
This blog post is part of a multi-part series:
kata-runtime a fair shake
One of the things that fell out during my experimentation in the first post was that
rkt could actually use alternative stage1 images to start containers (github docs) (if you’re new to rkt, and are not sure what “stage 1” means, check out the ACI implementors docs).
rkt was the first runtime I used for kubernetes and I was pretty excited that it could possibly solve my untrusted container goals. I was originally drawn to the approach because, since
rkt was well supported by CoreOS, I wouldn’t need to install
runv (and consequently
hyperstart, which I failed at). My goal was to eventually compare the
stress-ng results between the QEMU-in-a-pod and
rkt approach, but (spoiler alert?) I quit relatively early on due to incompatibilities of
rkt with CRI and rook, a tool I was (and still am) really fond of.
Thinking back to when these notes were made,
rkt seemed to not be getting much love from the community, which factored into why I gave up so quickly. I seem to have a thing for supporting underrated and/or dying tech. When I look at the
rkt project on Github (in particular their releases), I think I’m actually wrong about
rkt’s usage levels. While rktnetes support was deprecated in Kubernetes 1.10.0,
rktlet was probably part of the reason: they’d found a new, better way forward for integrating
rkt and kubernetes. To this day though,
rktlet doesn’t seem to have a stable release (at the time of this post only
v0.1.0 is released, with basic support).
Either way, let’s dive into what happened while I tried (and succeeded) reconfiguring my container linux box to use
rkt and an alternate stage1.
So starting from where I was last time, there were two big steps to take:
rkt to use an alternative stage 1 (which would use a VM image)
For the first bit, I actually got it working relatively quickly, then plucked the process out of these notes and wrote about it, which is great news because it makes this article much shorter! To make things even easier, CoreOS has some great documentation on using rkt with kubernetes (I’m unsure how long the page will be around, if that link dies check github) – that documentation was basically all I needed to feel like I knew enough to get started.
After doing everything in the CoreOS alternative stage1 documentation, it turned out that actually getting
rkt up and running as the primary runtime and switching stage1s wasn’t the biggest problem. I discovered that
rook, a kubernetes operator for managing Ceph and using it to create PersistentVolumeClaims, actually doesn’t work on
rkt due to issues inside
rkt itself (there are also two linked
rkt issues, seccomp filters and runtime spec support). I actually posted that github issue, did some digging, was helped out by others in the thread, and decided basically right then and there that I couldn’t use
rkt going forward. I found
containerd so easy to use and small/focused where it needed to be compared to
rkt’s relative complexity. I really need to find some time to go back and explore
rktlet and see if I was wrong about
rkt as a container runtime.
I’m roughly 80% sure that I gave up on
rkt too soon, and probably for the wrong reasons – it seemed
rkt’s default kernel capabilities conflicted with what
rook expected to be available, which is what made it not work. At first I thought the solution was easy – just add the necessary capabilities to the
securityContext of the
Deployment for the operator, but it’s not that simple. The problem is that in addition to the operator itself, a bunch of added capabilities needed to be injected/patched onto anything (
StatefulSets, etc.) that the operator created. These days I can imagine myself properly solving the problem with
PodSecurityPolicy objects, but when I was trying all these out I believe PSPs were still relatively new and I either tried it shallowly or chose to not mess with it.
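To make the capability problem a bit more concrete, here is a minimal Python sketch of the kind of patch involved: it adds Linux capabilities to the `securityContext` of every container in a workload manifest. The function name `add_capabilities`, the example manifest, and the `SYS_ADMIN` capability are all illustrative assumptions, not the set rook actually needed.

```python
import copy


def add_capabilities(manifest, caps):
    """Return a copy of a Deployment/StatefulSet-style manifest with `caps`
    appended to securityContext.capabilities.add for every container."""
    patched = copy.deepcopy(manifest)
    pod_spec = patched["spec"]["template"]["spec"]
    for container in pod_spec.get("containers", []):
        security_context = container.setdefault("securityContext", {})
        capabilities = security_context.setdefault("capabilities", {})
        existing = capabilities.setdefault("add", [])
        for cap in caps:
            if cap not in existing:  # avoid duplicate entries
                existing.append(cap)
    return patched


# Hypothetical operator Deployment, trimmed to the fields the patch touches.
operator = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-operator"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "operator", "image": "example/operator:latest"},
    ]}}},
}

# SYS_ADMIN is only a placeholder capability for illustration.
patched = add_capabilities(operator, ["SYS_ADMIN"])
```

Patching the operator's own manifest like this is the easy part. The same change would also have to reach every object the operator generates at runtime, which is exactly where this approach fell over for me.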
One of the great things about using container linux was that theoretically
stage1-kvm-qemu.aci was available from the start as a stage 1, without me having to do any image building myself, according to the docs. Also, annotation-based configuration support landed which made it super easy to use.
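As a rough illustration of the annotation-based configuration, the Python sketch below stamps a stage1 override onto a pod manifest. The annotation key is the one I remember from the rktnetes docs, and the stage1 name `coreos.com/rkt/stage1-kvm` and the helper `with_stage1` are assumptions for the example, so check both against the docs for your rkt version.

```python
import copy

# Annotation key rktnetes used for per-pod stage1 overrides, as recalled
# from the rktnetes docs (treat it as an assumption and verify it).
STAGE1_ANNOTATION = "rkt.alpha.kubernetes.io/stage1-name-override"


def with_stage1(pod, stage1_name):
    """Return a copy of a Pod manifest annotated to run under `stage1_name`."""
    patched = copy.deepcopy(pod)
    metadata = patched.setdefault("metadata", {})
    annotations = metadata.setdefault("annotations", {})
    annotations[STAGE1_ANNOTATION] = stage1_name
    return patched


# Hypothetical pod that should run inside a VM-backed stage1.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "untrusted-workload"},
    "spec": {"containers": [{"name": "app", "image": "example/app:latest"}]},
}

# The stage1 image name here is an assumed example value.
untrusted = with_stage1(pod, "coreos.com/rkt/stage1-kvm")
```

Since the override lives in the pod's metadata, the rest of the spec is untouched and the same workload can move between the default and the VM-backed stage1 by changing one annotation.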
NOTE If you’re trying to reproduce this after the publish time of this post, you should almost certainly be using
rktlet and probably won’t have this problem.
While following the rktnetes guide I found that access to the
rkt API endpoint was required to be present for the
kubelet, but it wasn’t started. It took me a little bit to find the documentation for the API service, but after working through that I was able to get it running by adding a
systemd Service to ensure it was running.
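A quick way to confirm whether the API service is actually up before blaming the kubelet is a plain TCP check. The Python sketch below does that. The default address `localhost:15441` is what I recall the rkt API service listening on, so treat it as an assumption and adjust it to match your unit file.

```python
import socket


def endpoint_is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Assumed default rkt API service address; change it if yours differs.
print(endpoint_is_listening("localhost", 15441))
```

This only shows that something is listening on the port, not that it speaks the rkt API, but it is enough to tell "service never started" apart from "kubelet misconfigured".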
Unfortunately, right after getting everything working (switching to
rkt set up with alternate stage1 support), I ran into issues with rook and the other problems with rktnetes (which has been deprecated in favor of
rktlet these days), and decided to switch back to
containerd almost immediately. Looks like in the end I’ll be going back to trying to make
containerd (and its untrusted workload) work.
One of these days I’ll find some time to spin up a completely new cluster and give
rktlet a fair shot and see how it does, but that’s probably far in the future.