Running Untrusted Workloads K8s Container Linux Part 2

Second part of my adventures and mistakes in trying to get untrusted workloads on k8s (with container linux underneath)

vados

CoreOS logo + rkt logo + Kubernetes logo + QEMU logo

tl;dr - I came across rkt’s ability to use alternate stage 1s and got it working, but then abandoned it due to problems getting rook running and a lack of CRI compatibility (at the time), before even trying to compare it with the QEMU-in-a-pod approach. These notes are very old (I don’t use container linux for my cluster anymore), and I can’t believe I quit so quickly without a more thorough investigation, but evidently I did. There’s not much to see in this post, but maybe it will serve as a starting point for others.

This blog post is part of a multi-part series:

  1. Part 1 - Hacking my way to container+VM inception
  2. Part 2 - rkt alternate stage1 experiments (this post)
  3. Part 3 - giving kata-runtime a fair shake

One of the things that fell out of my experimentation in the first post was that rkt could actually use alternative stage1 images to start containers (github docs) (if you’re new to rkt, and are not sure what “stage 1” means, check out the ACI implementors docs). rkt was the first runtime I used for kubernetes and I was pretty excited that it could possibly solve my untrusted container goals. I was originally drawn to the approach because rkt was well supported by CoreOS, so I wouldn’t need to install runv (and consequently hyperstart, which I failed at). My goal was to eventually compare the stress-ng results between the QEMU-in-a-pod and rkt approaches, but (spoiler alert?) I quit relatively early on due to incompatibilities of rkt with CRI and rook, a tool I was (and still am) really fond of.

Thinking back to when these notes were made, rkt seemed to not be getting much love from the community, which factored into why I gave up so quickly. I seem to have a thing for supporting underrated and/or dying tech. When I look at the rkt project on Github (in particular their releases), I think I was actually wrong about rkt’s usage levels. While rktnetes support was deprecated in Kubernetes 1.10.0, rktlet was probably part of the reason: they’d found a new, better way forward for integrating rkt and kubernetes. To this day though, rktlet doesn’t seem to have a stable release (at the time of this post only v0.1.0 is released, with basic support).

Either way, let’s dive into what happened while I tried (and succeeded) reconfiguring my container linux box to use rkt and an alternate stage1.

Switching the container runtime from containerd to rkt

So starting from where I was last time, there were two big steps to take:

  1. Reconfigure the machine to use rkt instead of containerd
  2. Configure rkt to use an alternative stage 1 (which would use a VM image)

For the first bit, I actually got it working relatively quickly, then plucked the process out of these notes and wrote about it, which is great news because it makes this article much shorter! To make things even easier, CoreOS has some great documentation on using rkt with kubernetes (I’m unsure how long the page will be around; if that link dies, check github). That documentation was basically all I needed to feel like I knew enough to get started.

Running into issues with Rook and CRI compatibility

After doing everything in the CoreOS alternative stage1 documentation, it turned out that actually getting rkt up and running as the primary runtime and switching stage1s wasn’t the biggest problem. I discovered that rook, a kubernetes operator for managing Ceph and using it to create PersistentVolumeClaims, actually doesn’t work on rkt due to issues inside rkt itself (there are also two linked rkt issues, seccomp filters and runtime spec support). I actually posted that github issue, did some digging, was helped out by others in the thread, and decided basically right then and there that I couldn’t use rkt going forward. I found containerd so easy to use, and small/focused where it needed to be, compared to rkt’s relative complexity. I really need to find some time to go back and explore rktlet and see if I was wrong about rkt as a container runtime.

I’m roughly 80% sure that I gave up on rkt for the wrong reasons. It seemed that the kernel capabilities rkt granted by default differed from what rook expected to be available, which is what made it not work. At first I thought the solution was easy: just add the necessary capabilities to the securityContext of the Deployment for the operator. It’s not that simple. The problem is that in addition to the operator itself, a bunch of added capabilities needed to be injected/patched on to anything (Pods, DaemonSets, StatefulSets, etc) that the operator created. These days I can imagine myself properly solving the problem with PodSecurityPolicy objects, but when I was trying all these out I believe PSPs were still relatively new and I either tried it shallowly or chose to not mess with it.
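To make the "patch everything the operator creates" problem concrete, here is a minimal Python sketch of the kind of patching that would have been needed. It only manipulates plain dicts shaped like Kubernetes manifests. The capability names in `EXTRA_CAPS` are placeholders I picked for illustration (these notes don't record which capabilities rook actually needed), and `add_capabilities` is a hypothetical helper, not anything rook or rkt ships.

```python
import copy

# Placeholder capability list: the real set rook needed is NOT recorded in these notes.
EXTRA_CAPS = ["SYS_ADMIN", "MKNOD"]


def add_capabilities(manifest, caps):
    """Return a copy of a Pod (or pod-templated workload) with caps added to every container."""
    patched = copy.deepcopy(manifest)
    spec = patched["spec"]
    # Pods keep containers under .spec; Deployments, DaemonSets and StatefulSets
    # keep them under .spec.template.spec.
    pod_spec = spec["template"]["spec"] if "template" in spec else spec
    for container in pod_spec.get("containers", []):
        security_context = container.setdefault("securityContext", {})
        add = security_context.setdefault("capabilities", {}).setdefault("add", [])
        for cap in caps:
            if cap not in add:
                add.append(cap)
    return patched


# A made-up Deployment standing in for something the operator might create.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-operator-child"},
    "spec": {
        "template": {
            "spec": {"containers": [{"name": "main", "image": "example/image"}]}
        }
    },
}

patched = add_capabilities(deployment, EXTRA_CAPS)
```

Doing this once for the operator's own Deployment is easy. The hard part is that something (a mutating admission step, or a cluster-wide policy) would have to apply the same change to every object the operator creates later.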

One of the great things about using container linux was that, according to the docs, stage1-kvm-qemu.aci was theoretically available from the start as a stage 1, without me having to do any image building myself. Also, annotation-based configuration support landed, which made it super easy to use.
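For reference, this is roughly what the annotation-based approach looks like, sketched in Python as a function that builds a Pod manifest and prints it as JSON (which `kubectl apply -f -` accepts). The annotation key and the stage1 image name are written from my memory of the rktnetes docs, so treat both as assumptions and double check them against the documentation for your version. `untrusted_pod` and the pod/image names are made up for the example.

```python
import json

# ASSUMPTION: annotation key and stage1 image name as recalled from the rktnetes docs.
STAGE1_ANNOTATION = "rkt.alpha.kubernetes.io/stage1-name-override"
KVM_STAGE1 = "coreos.com/rkt/stage1-kvm"


def untrusted_pod(name, image):
    """Build a single-container Pod manifest that asks rkt for the KVM stage1."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {STAGE1_ANNOTATION: KVM_STAGE1},
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


# Hypothetical pod name and image, purely for illustration.
pod = untrusted_pod("untrusted-demo", "nginx")
print(json.dumps(pod, indent=2))
```

The nice part of this design is that the choice of stage1 is made per pod, so trusted workloads can keep using the default stage1 on the same node.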

DEBUG: Ensuring the RKT API Endpoint is started

NOTE If you’re trying to reproduce this after the publish time of this post, you should almost certainly be using rktlet and probably won’t have this problem.

While following the rktnetes guide I found that the kubelet requires access to the rkt API endpoint, but the API service wasn’t started on my machine. It took me a little bit to find the documentation for the API service, but after working through that I was able to get it running by adding a systemd Service to ensure it was started.
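If you want a quick way to check whether the API service is actually listening before blaming the kubelet, a small TCP probe is enough. This is a minimal sketch in Python. The `localhost:15441` address is what I remember as the default for `rkt api-service`, so treat it as an assumption and adjust it if your unit passes a different listen address. `endpoint_is_up` is just a helper name I made up.

```python
import socket


def endpoint_is_up(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    # ASSUMPTION: default rkt api-service listen address.
    print(endpoint_is_up("localhost", 15441))
```

This only tells you that a listener exists on the port, not that it is speaking the rkt API, but it was exactly the distinction I needed while debugging.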

To be continued

Unfortunately, right after getting everything working (switching from containerd, getting rkt set up with alternate stage1 support), I ran into issues with rook and the other problems with rktnetes (which has been deprecated in favor of rktlet these days), and decided to switch back to containerd almost immediately. Looks like in the end I’ll be going back to trying to make containerd (and its untrusted workload support) work.

One of these days I’ll find some time to spin up a completely new cluster and give rktlet a fair shot and see how it does, but that’s probably far in the future.
