tl;dr - I came across rkt's ability to use alternate stage 1s and got it working, but then abandoned it due to problems getting rook running and a lack of CRI compatibility (at the time), before even trying to compare it with the QEMU-in-a-pod approach. These notes are very old (I don't use Container Linux for my cluster anymore), and I can't believe I quit so quickly without more thorough investigation, but evidently I did. There's not much to see in this post, but maybe it will serve as a starting point for others.
This blog post is part of a multi-part series.

One of the things that fell out during my experimentation in the first post was that rkt could actually use alternative stage1 images to start containers (github docs) (if you're new to rkt and are not sure what "stage 1" means, check out the ACI implementors docs). rkt was the first runtime I used for kubernetes, and I was pretty excited that it could possibly solve my untrusted container goals. I was originally drawn to the approach because, since rkt was well supported by CoreOS, I wouldn't need to install runv (and consequently hyperstart, which I had failed at). My goal was to eventually compare the stress-ng results between the QEMU-in-a-pod and rkt approaches, but (spoiler alert?) I quit relatively early on due to incompatibilities of rkt with CRI and rook, a tool I was (and still am) really fond of.
Thinking back to when these notes were made, rkt seemed to not be getting much love from the community, which factored into why I gave up so quickly. I seem to have a thing for supporting underrated and/or dying tech. When I look at the rkt project on Github (in particular their releases), I think I'm actually wrong about rkt's usage levels. While rktnetes support was deprecated in Kubernetes 1.10.0, rktlet was probably part of the reason: they'd found a new, better way forward for integrating rkt and kubernetes. To this day though, rktlet doesn't seem to have a stable release (at the time of this post only v0.1.0 is released, with basic support).
Either way, let's dive into what happened while I tried (and succeeded at) reconfiguring my Container Linux box to use rkt and an alternate stage1.

So starting from where I was last time, there were two big steps to take:

- Using rkt instead of containerd as the kubelet's container runtime
- Configuring rkt to use an alternative stage 1 (which would use a VM image)

For the first bit, I actually got it working relatively quickly, then plucked the process out of these notes and wrote about it, which is great news because it makes this article much shorter! To make things even easier, CoreOS has some great documentation on using rkt with kubernetes (I'm unsure how long the page will be around; if that link dies, check github) – that documentation was basically all I needed to feel like I knew enough to get started.
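To give a concrete idea of what "using rkt instead of containerd" meant in practice, here's a minimal sketch of the rktnetes-era kubelet configuration. The flag names are how I remember them from the old rktnetes/CoreOS docs, and the drop-in path plus the `KUBELET_EXTRA_ARGS` convention are assumptions about how the kubelet unit is wired up, so verify both against the docs for your kubelet version rather than treating this as authoritative.

```ini
# Hypothetical kubelet systemd drop-in for the rktnetes integration.
# Flag names are recalled from the rktnetes-era docs, and this assumes
# the kubelet unit expands $KUBELET_EXTRA_ARGS -- verify before using.
# /etc/systemd/system/kubelet.service.d/10-rkt-runtime.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=rkt --rkt-path=/usr/bin/rkt --rkt-api-endpoint=localhost:15441"
```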
After doing everything in the CoreOS alternative stage1 documentation, it turned out that actually getting rkt up and running as the primary runtime and switching stage1s wasn't the biggest problem. I discovered that rook, a kubernetes operator for managing Ceph and using it to create PersistentVolumeClaims, actually doesn't work on rkt due to issues inside rkt itself (there are also two linked rkt issues: seccomp filters and runtime spec support). I actually posted that github issue, did some digging, was helped out by others in the thread, and decided basically right then and there that I couldn't use rkt going forward. I had found containerd so easy to use and small/focused where it needed to be, compared to rkt's relative complexity. I really need to find some time to go back and explore rktlet and see if I was wrong about rkt as a container runtime.
I'm roughly 80% sure that I gave up on rkt for the wrong reasons – it seemed that rkt's default-provided kernel capabilities differed from what rook expected to be available, which is what made it not work. At first I thought the solution was easy – just add the necessary capabilities to the securityContext of the Deployment for the operator – but it's not that simple. The problem is that, in addition to the operator itself, a bunch of added capabilities needed to be injected/patched onto anything (Pods, DaemonSets, StatefulSets, etc.) that the operator created. These days I can imagine myself properly solving the problem with PodSecurityPolicy objects, but when I was trying all this out I believe PSPs were still relatively new, and I either tried them shallowly or chose not to mess with them.
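For reference, this is roughly what the "easy" fix looked like – adding capabilities to the operator container's securityContext. The capability names and image tag below are illustrative placeholders (my notes don't record the exact set rook needed), and as noted above this alone wasn't enough, since the workloads the operator created needed the same treatment.

```yaml
# Hypothetical sketch: granting extra kernel capabilities to the rook
# operator container. The capability list and image tag are illustrative
# only -- not the exact set rook required.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-operator
  namespace: rook-ceph-system
spec:
  selector:
    matchLabels:
      app: rook-ceph-operator
  template:
    metadata:
      labels:
        app: rook-ceph-operator
    spec:
      containers:
        - name: rook-ceph-operator
          image: rook/ceph:v0.9.0   # example tag
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN   # illustrative
                - MKNOD       # illustrative
```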
One of the great things about using Container Linux was that, according to the docs, stage1-kvm-qemu.aci was theoretically available from the start as a stage 1, without me having to do any image building myself. Also, annotation-based configuration support had landed, which made it super easy to use.
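As a rough illustration of what that annotation-based approach looked like, rktnetes allowed overriding the stage1 on a per-pod basis via a pod annotation. I'm writing the annotation key and stage1 image name from memory here, so treat both as assumptions and double-check the CoreOS/rktnetes documentation.

```yaml
# Sketch only: per-pod stage1 override under rktnetes. The annotation key
# and stage1 image name are written from memory and may not be exact --
# verify against the rktnetes / CoreOS docs.
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
  annotations:
    rkt.alpha.coreos.com/stage1-name-override: "coreos.com/rkt/stage1-kvm"
spec:
  containers:
    - name: app
      image: nginx:alpine
```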
NOTE If you're trying to reproduce this after the publish time of this post, you should almost certainly be using rktlet and probably won't have this problem.
While following the rktnetes guide I found that the kubelet required access to the rkt API endpoint, but the API service wasn't started on my machine. It took me a little bit to find the documentation for the API service, but after working through it I was able to get things going by adding a systemd service to ensure the rkt API service was running.
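A minimal sketch of what such a unit can look like is below. The unit name, location, and restart settings are my own choices rather than anything from the official docs; `rkt api-service` itself listens on localhost:15441 by default.

```ini
# /etc/systemd/system/rkt-api.service -- minimal sketch; the unit name
# and restart settings are arbitrary choices, not an official example.
[Unit]
Description=rkt api service
After=network.target

[Service]
ExecStart=/usr/bin/rkt api-service
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

With a unit like that in place, `systemctl enable --now rkt-api.service` keeps the endpoint available for the kubelet across reboots.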
Unfortunately, right after getting everything working (switching from containerd, getting rkt set up with alternate stage1 support), I ran into the issues with rook and the other problems with rktnetes (which has been deprecated in favor of rktlet these days), and decided to switch back to containerd almost immediately. Looks like in the end I'll be going back to trying to make containerd (and its untrusted workload support) work.
One of these days I'll find some time to spin up a completely new cluster, give rktlet a fair shot, and see how it does, but that's probably far in the future.