tl;dr - After struggling through setting up containerd
’s untrusted workload runtime, building a static kata-runtime
and a neutered-but-static qemu-system-x86_64
to use, I succeeded in hooking up containerd
to use kata-runtime
only to fail @ the last step since the pods that were created ran qemu
properly but couldn't be communicated with, and would immediately make the k8s node they were running on move to the NotReady state
due to PLEG errors. I did a lot of work to partially succeed (if you want to run QEMU on container linux, this is the post for you), but hopefully these notes will help someone else out. Also, you'll hate me for this, but I didn't make a Dockerfile and a gitlab project like I did for static-kata-runtime, so basically the operational knowledge is scattered throughout this post. Basically, if you're here from a google search trying to make things work, welcome.
If you're actually trying to install kata-containers, use kata-deploy -- they've done what I did here (except successfully), and it's *super* easy to install, by utilizing non-privileged DaemonSets and Node tagging on your cluster. It's super slick. It seems reasonably likely to work on container linux, given that they've statically compiled qemu
(and kata-runtime
is pretty easy to statically build because it's golang) -- YMMV!
This blog post is part of a multi-part series – this post is the one where I give containerd and
kata-runtime
a fair shake. After running into the issues mentioned in part two with using rkt
as my primary runtime, I decided to go ahead and give containerd
(with its support for untrusted workloads) along with kata-runtime
a try. This exploration actually happened during Part 2, but I figured it was so far off in left field that I needed to make it a separate post. If you look back at part one, proper runtime support was “Method 2”, and is still the most desirable as it is the most properly integrated method. I originally tried to use “Method 2” with just runv
, but couldn’t install hyperstart
, but this time, we’re going to use kata-containers
, which is the merging of the Intel clear-containers
and runv
projects.
A refreshing surprise was that kata-containers
was very easily buildable, thanks probably in large part to the fact that it’s a golang application! While there wasn’t a dedicated build command for fully static builds, the kata-containers
project’s build process was pretty easy to follow, so I hacked it and filed an issue to make it officially easier to do static builds. To elaborate, the usual tricks of making sure to do a Golang static build and build it in Alpine Linux (using musl libc) were all I needed to get a binary that I could SCP right over to the container linux box. I didn’t get to test it right away due to not having any OCI bundles on hand – I actually had a hard time finding any to use for testing, which was kind of inconvenient – so I opted instead to test by spinning up the whole cluster and configuring containerd
to actually use untrusted runtimes, which is kind of insane (due to how large a gamble it is and the amount of complexity from other systems I'm pulling in before I can validate the first step), but whatever.
At this point I’d actually been using rkt
as my runtime after the experimentation in part two. I consider rkt
to be the safer runtime when compared to both docker
and containerd
as it has much safer defaults around privileges for example (which ironically is exactly why rook
didn’t work). However, for this experiment, I’d need to switch back to my containerd
setup, which meant rebuilding the machine with CoreOS’s ignition. I’ve also previously written about the ignition config, so feel free to check that out.
kata-runtime
After doing a bunch of experimentation (which I’ll elide here) getting kata-runtime
to build properly, I wanted to standardize the process, and building a simple container was the perfect, easy way to do that. By writing a little Dockerfile
, I could do the static build in an easily portable container which when finished would contain the binaries needed by container linux.
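To give you an idea of what that boils down to: the heart of the hack is just a fully static golang build inside an Alpine (musl) container – a sketch (mirroring the flags I used later for kata-proxy, not the exact contents of the Dockerfile):
# sketch only -- run inside an Alpine container with go installed and the
# kata-containers/runtime source checked out; the real version/commit ldflags
# come from the project's own Makefile
go build -o kata-runtime -ldflags "-linkmode external -extldflags '-static'"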
I say it often on the blog, but I absolutely love Gitlab. One of the many reasons I love Gitlab (in this instance) is that they support releases, which makes my life way easier, for versioning and making the static binaries available – unfortunately it’s not available in Gitlab Community Edition. I do pay for a higher tier of Gitlab, but for now I can easily replicate the functionality by just making release branches that only contain the build binary. It’s a bit of a hack, but it will make it easier to download the releases properly.
The result of that work was the static-kata-runtime
gitlab repository! One of the more interesting pieces of code to me is the Makefile
, because it encapsulates how to do this releases hack:
.PHONY: check-tool-docker image retrieve-artifacts release
all: image retrieve-artifacts
VERSION := 1.0.0
DOCKER := $(shell command -v docker 2> /dev/null)
IMAGE_NAME := static-kata-container
check-tool-docker:
ifndef DOCKER
	$(error "`docker` is not available, please install docker (https://docs.docker.com/install/)")
endif

image: check-tool-docker
	$(DOCKER) build -t $(IMAGE_NAME) .

retrieve-artifacts: check-tool-docker
	$(DOCKER) run --rm --entrypoint cat $(IMAGE_NAME) /kata-runtime > kata-runtime
	$(DOCKER) run --rm --entrypoint cat $(IMAGE_NAME) /kata-runtime.sha512 > kata-runtime.sha512

# The below target is only to be run on version (`vX.X.X`) branches -
# command runs with the expectation that the resulting binary will be downloaded over HTTPS from some source-control mechanism
release: check-tool-docker image retrieve-artifacts
	rm -rf Dockerfile Makefile README.md
Of course, the (likely more interesting) bits are in the Dockerfile
as that actually contains everything you need to statically build the binary.
Now that we have a static kata-runtime
binary, it’s time to try and use it with containerd
!
Configuring containerd to use the alternate runtime
Support for running untrusted workloads on a CRI-compliant runtime of your choice was added to containerd
in version v1.1.0. It’s supported by annotations and looks pretty easy and simple to use, requiring only an update to containerd
’s configuration. Up until now I actually haven’t had to configure containerd
at all, and didn’t even have a config file, which meant I needed to create one that matched what containerd
was expecting. The default configuration file can be found @ /etc/containerd/config.toml
(according to the documentation), so that is where I started. Unfortunately I didn’t keep a spare copy of what the configuration file looked like when all was said and done (I also remember having some slight trouble figuring out what it should look like), but hopefully it won’t be too hard to piece together.
Eventually the file would have made a great addition to the ignition configuration I used for the machine, but in the moment I just did it live and added the file to the box itself. Either way, it’s almost too easy to get containerd
to start using kata-runtime
… Let’s see if we can test it all out.
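For reference (since I can't reproduce my exact file), the piece that matters is the CRI plugin's untrusted-workload runtime section of /etc/containerd/config.toml – a sketch based on the containerd v1.1-era docs, with the runtime path set to where I ended up putting the static binary:
# sketch of /etc/containerd/config.toml (containerd v1.1 CRI plugin) -- not my exact file
[plugins.cri.containerd.untrusted_workload_runtime]
  runtime_type = "io.containerd.runtime.v1.linux"
  runtime_engine = "/opt/bin/kata-runtime"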
Theoretically, with annotation-based support, I should be able to start two otherwise identical pods, and if I ensure one has the appropriate annotation, one will be in a regular container (sandboxed/isolated process) and the other will be in a full-blown qemu
-powered VM. If all goes well (SPOILER: it didn’t really), I should be able to SSH into either of them, using the usual kubectl exec -it <pod> /bin/sh
and see different outputs for a command like uname -a
.
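In other words, success would have looked roughly like this – regular-shell is a hypothetical twin pod without the annotation, vm-shell is the annotated one from the YAML below:
$ kubectl exec -it regular-shell -- uname -a   # should report the container linux host kernel
$ kubectl exec -it vm-shell -- uname -a        # should report the kata guest VM's kernel instead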
Here’s the YAML config for the pod (the one with the annotation):
---
apiVersion: v1
kind: Pod
metadata:
  name: vm-shell
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
spec:
  containers:
  - name: shell
    image: alpine:3.7
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 2
        memory: 2Gi
      requests:
        cpu: 2
        memory: 2Gi
    command: [ "/bin/ash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
After running the usual kubectl run
, things seemed to start fine, but I was greeted with an error when I checked the kubectl describe
output for the pod:
Normal Scheduled 25s default-scheduler Successfully assigned vm-shell-pod to localhost
Normal SuccessfulMountVolume 25s kubelet, localhost MountVolume.SetUp succeeded for volume "default-token-j9x2h"
Warning FailedCreatePodSandBox 11s (x3 over 25s) kubelet, localhost Failed create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for untrusted workload is configured
At this point I realized that my configuration for containerd
must have been wrong – the annotation was being read, but containerd
(underneath kubernetes) couldn’t find the sandbox runtime that I specified. This meant that despite putting what I thought was the right TOML (again, sorry I can’t reproduce it here) @ /etc/containerd/config.toml
, the configuration still wasn’t being picked up. I quickly realized that I also hadn’t updated the systemd service file I was using for containerd
itself, which was @ /etc/systemd/system/containerd.service.d
! After some fiddling with the service parameters I made some progress, leading to my next batch of error output, this time from the containerd
service itself:
May 27 14:25:10 localhost containerd[25158]: time="2018-05-27T14:25:10Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:vm-shell-pod,Uid:40ee2d9a-61b8-11e8-8de7-8c89a517d15e,Namespace:default,Attempt:0,} failed, error" error="failed to start sandbox container:
failed to create containerd task: OCI runtime create failed: Cannot find usable config file (config file "/etc/kata-containers/configuration.toml" unresolvable: file /etc/kata-containers/configuration.toml does not exist, config file "/usr/share/defaults/kata-containers/configuration.toml" unresolvable: file /usr/share/defaults/kata-containers/configuration.toml does not exist): unknown"
OK, so it looks like now Kubernetes is calling containerd
correctly, containerd
is finding the right sandbox runtime (kata-runtime
), but I haven’t configured how to call kata-runtime
properly – in particular, files that configure kata-runtime
which would have existed if I’d installed kata-runtime
on any other distribution are missing. I didn’t have a good idea what these configurations were supposed to look like, but luckily I was able to find a very long example in the kata-containers/runtime
repo, a file called configuration.toml.in. This file is input to some transformation process but it was clear enough for me to use to figure out what needed to be configured. I crafted as minimal a configuration as I could (sorry, I didn’t save this either :( ), and tried again.
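Since I can't reproduce the minimal config here, the sketch below is roughly what it amounted to – section and key names come from configuration.toml.in, and the values are only illustrative:
# /etc/kata-containers/configuration.toml -- a rough reconstruction, not my saved file
[hypervisor.qemu]
path = "/opt/bin/qemu"   # the hypervisor binary to launch (spoiler: the wrong one for a while)
# kernel = ...           # I was hoping to leave the guest kernel/image unset,
# image = ...            # which is exactly what the errors a bit further down complain about

[runtime]
# (this is also where the debug switch I flip later lives)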
I forgot to get the statically built qemu
binaries from multiarch/qemu-user-static
. In particular, the file I was looking for was qemu-x86_64-static.
NOTE FROM THE FUTURE I’m still dealing with the wrong qemu
binaries here, I’m dealing with “user” qemu
binaries (i.e. qemu-x86_64-static
) when I should be using qemu-system-x86_64
(-static
).
With the minimal kata-runtime
config in place and the missing qemu
binary installed I tried again:
May 27 14:38:05 localhost containerd[25158]: time="2018-05-27T14:38:05Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:vm-shell-pod,Uid:40ee2d9a-61b8-11e8-8de7-8c89a517d15e,Namespace:default,Attempt:0,} failed, error" error="failed to start sandbox container: failed to create containerd task: OCI runtime create failed: /etc/kata-containers/configuration.toml: file /usr/share/kata-containers/vmlinuz.container does not exist: unknown"
Unfortunately I needed to specify a few configuration options that I was hoping to be able to leave blank – in particular I needed to generate a rootfs image and a kernel image. I found the clearcontainers/osbuilder
project and set out to start using it, but was pretty put off by the amount of software I seemingly needed to add to my system to build everything. Luckily, they have a docker-based approach at the bottom of the README so that’s what I used. Here’s what my work directory looked like at the start (I think I ran make
or something):
$ ls workdir/
clear-dnf.conf container.img image_info img rootfs
Here’s what the commands I ran looked like:
$ mkdir /tmp/image-builder && cd /tmp/image-builder
$ export USE_DOCKER=true
$ scripts/kernel_builder.sh prepare
$ sudo mv /tmp/image-builder/linux /tmp/image-builder/workdir/linux # the script is written just slightly incorrectly I think, need to copy the linux folder into workdir to make sure container can see it
$ sudo -E make rootfs # produces 'rootfs' (folder), along with 'workdir'
$ mv linux workdir # make sure that the linux source that was retrieved is in the workdir folder for the kernel generation step to use
# (edit scripts/Dockerfile to include `elfutils-libelf-devel` in the `dnf install` @ the start)
$ sudo -E make kernel # produces 'vmlinuz.container'
$ sudo -E make image # produces container.img
OK, so now I have a kernel image (vmlinuz.container
), and the image (container.img
, which I assumed contained a root fs). I copied them over to the server. I was a little unclear on the difference between the two files, and file
came to my rescue:
$ file workdir/vmlinuz.container
workdir/vmlinuz.container: Linux kernel x86 boot executable bzImage, version 4.14.22 (root@) #1 SMP Sun May 27 15:05:48 UTC 2018, RO-rootFS, swap_dev 0x5, Normal VGA
$ file workdir/vmlinux.container
workdir/vmlinux.container: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=2b4cc92cdc5fa2cc64f9421253265c628319c3e8, not stripped
$ file workdir/container.img
workdir/container.img: DOS/MBR boot sector; partition 1 : ID=0xee, start-CHS (0x0,0,2), end-CHS (0x3ff,254,63), startsector 1, 262143 sectors, extended partition table (last)
As you can see, vmlinuz.container
is definitely the kernel. At this point I wasn’t absolutely sure how I was supposed to use vmlinux.container
, but a quick check of the documentation revealed that one is just a compressed version of the other (vmlinuz
is a compressed form of vmlinux.container
).
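With a kernel and image in hand, the last bit is just pointing kata-runtime at them – a sketch of the relevant configuration.toml lines (the paths match what shows up in the qemu launch log further down, so this is roughly, not exactly, what I had):
# under [hypervisor.qemu] in /etc/kata-containers/configuration.toml
kernel = "/home/core/clear-container-image/vmlinuz.container"
image = "/home/core/clear-container-image/container.img"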
With these files copied over and put in the right place in the configuration for kata-runtime
I was able to make some more progress (logs are from containerd
I believe):
May 27 15:15:55 localhost kata-runtime[16254]: time="2018-05-27T15:15:55.734367385Z" level=error msg="Invalid config type" command=create name=kata-runtime pid=16254 source=runtime
May 27 15:16:11 localhost kata-runtime[16465]: time="2018-05-27T15:16:11.733472292Z" level=info msg="loaded configuration" command=create file=/etc/kata-containers/configuration.toml format=TOML name=kata-runtime pid=16465 source=runtime
May 27 15:16:11 localhost kata-runtime[16465]: time="2018-05-27T15:16:11.733608005Z" level=info arguments="\"create --bundle /run/containerd/io.containerd.runtime.v1.linux/k8s.io/ac17e9aeb57c0bb68470e6ac32e672083ec4543be11e91bd6b55796de6c0410d --pid-file /run/containerd/io.containerd.runtime.v1.linux/k8s.io/ac17e9aeb57c0bb68470e6ac32e672083ec4543be11e91bd6b55796de6c0410d/init.pid ac17e9aeb57c0bb68470e6ac32e672083ec4543be11e91bd6b55796de6c0410d\"" command=create commit=9fb0b337ef997079b304fe895dacd5d96d6f2fb6-dirty name=kata-runtime pid=16465 source=runtime version=1.0.0
May 27 15:16:11 localhost kata-runtime[16465]: time="2018-05-27T15:16:11.746939364Z" level=warning msg="shortening QMP socket name" arch=amd64 name=kata-runtime new-name=mon-35406192-727e-418e-ba72-c6 original-name=mon-35406192-727e-418e-ba72-c67c24daf587 pid=16465 source=virtcontainers subsystem=qemu
May 27 15:16:11 localhost kata-runtime[16465]: time="2018-05-27T15:16:11.746987164Z" level=warning msg="shortening QMP socket name" arch=amd64 name=kata-runtime new-name=ctl-35406192-727e-418e-ba72-c6 original-name=ctl-35406192-727e-418e-ba72-c67c24daf587 pid=16465 source=virtcontainers subsystem=qemu
May 27 15:16:11 localhost kata-runtime[16465]: time="2018-05-27T15:16:11.747126345Z" level=error msg="Create new sandbox failed" arch=amd64 error="Invalid config type" name=kata-runtime pid=16465 sandbox-id=ac17e9aeb57c0bb68470e6ac32e672083ec4543be11e91bd6b55796de6c0410d sandboxid=ac17e9aeb57c0bb68470e6ac32e672083ec4543be11e91bd6b55796de6c0410d source=virtcontainers subsystem=sandbox
So again, this was a problem with the configuration (this time, of kata-runtime
), but everything seemed right after double-checking, so I enabled the runtime.debug
option and got a LOT more information out of containerd
:
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.728528994Z" level=info msg="loaded configuration" command=create file=/etc/kata-containers/configuration.toml format=TOML name=kata-runtime pid=23186 source=runtime
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.728653637Z" level=info arguments="\"create --bundle /run/containerd/io.containerd.runtime.v1.linux/k8s.io/e4218ad3719bf72259986a23459ef8f164f048607dbd5db900c4fb7a4a46abc8 --pid-file /run/containerd/io.containerd.runtime.v1.linux/k8s.io/e4218ad3719bf72259986a23459ef8f164f048607dbd5db900c4fb7a4a46abc8/init.pid e4218ad3719bf72259986a23459ef8f164f048607dbd5db900c4fb7a4a46abc8\"" command=create commit=9fb0b337ef997079b304fe895dacd5d96d6f2fb6-dirty name=kata-runtime pid=23186 source=runtime version=1.0.0
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.728751825Z" level=debug msg="converting /run/containerd/io.containerd.runtime.v1.linux/k8s.io/e4218ad3719bf72259986a23459ef8f164f048607dbd5db900c4fb7a4a46abc8/config.json" name=kata-runtime pid=23186 source=virtcontainers/oci
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.729637389Z" level=debug msg="container rootfs: /run/containerd/io.containerd.runtime.v1.linux/k8s.io/e4218ad3719bf72259986a23459ef8f164f048607dbd5db900c4fb7a4a46abc8/rootfs" name=kata-runtime pid=23186 source=virtcontainers/oci
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.729847179Z" level=debug msg="Creating bridges" arch=amd64 name=kata-runtime pid=23186 source=virtcontainers subsystem=qemu
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.72989443Z" level=debug msg="Creating UUID" arch=amd64 name=kata-runtime pid=23186 source=virtcontainers subsystem=qemu
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.741793676Z" level=debug msg="Disable nesting environment checks" arch=amd64 inside-vm=false name=kata-runtime pid=23186 source=virtcontainers subsystem=qemu
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.741898982Z" level=warning msg="shortening QMP socket name" arch=amd64 name=kata-runtime new-name=mon-f7c46de3-533d-4ee9-84a3-bb original-name=mon-f7c46de3-533d-4ee9-84a3-bb7489376242 pid=23186 source=virtcontainers subsystem=qemu
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.741940933Z" level=warning msg="shortening QMP socket name" arch=amd64 name=kata-runtime new-name=ctl-f7c46de3-533d-4ee9-84a3-bb original-name=ctl-f7c46de3-533d-4ee9-84a3-bb7489376242 pid=23186 source=virtcontainers subsystem=qemu
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.742078196Z" level=error msg="Create new sandbox failed" arch=amd64 error="Invalid config type" name=kata-runtime pid=23186 sandbox-id=e4218ad3719bf72259986a23459ef8f164f048607dbd5db900c4fb7a4a46abc8 sandboxid=e4218ad3719bf72259986a23459ef8f164f048607dbd5db900c4fb7a4a46abc8 source=virtcontainers subsystem=sandbox
May 27 15:22:20 localhost kata-runtime[23186]: time="2018-05-27T15:22:20.742123471Z" level=error msg="Invalid config type" command=create name=kata-runtime pid=23186 source=runtime
After reading through the output for a while, it looked like most options were actually just fine, which made me even more confused. While thinking about it I did figure out that it’s actually either the initrd or the image (one or the other, not both), thanks to a random commit from the kata-containers/runtime
code.
While trying to figure out what other configuration I was missing, I searched on Github to try and find out if the agent configuration was required, and found two places of interest: kata_agent.go
which made it very clear that agent configuration was indeed required – which makes sense, given that kata-agent
is the thing that helps your VM talk to the outside world (I didn’t know that early on).
The config for kata-agent
actually gets generated for you, according to the kata-containers
developer guide – to get it I just had to head back to my previous work for static-kata-runtime
and docker run -it <container> /bin/bash
(and complete the make install
step) to generate the file for the static build. At this point I realized that not understanding how kata-agent
interacted with the rest of the things was probably a big red flag that I didn’t do enough research so I wondered if I’d have the same problems with the proxy
(kata-proxy
) and shim
(kata-shim
) configs but figured I’d try without them first, to at least see some more progress.
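Roughly, that looked like the following – the image name is the one from the static-kata-runtime Makefile above, and the path is the default location the earlier error message mentioned:
# sketch: hop into the static build image and let the runtime's own `make install`
# generate the default config (which includes the agent section)
$ docker run -it static-kata-container /bin/bash
# then, inside the container, from the kata-containers/runtime source tree:
$ make install
$ cat /usr/share/defaults/kata-containers/configuration.toml   # the generated file to crib from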
After some more config fiddling I made some more progress, kata-runtime
was actually trying to run qemu
now! Here’s the output from containerd
, filtered with journalctl -xef -u containerd | grep "kata-container"
:
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.747857731Z" level=info msg="loaded configuration" command=create file=/etc/kata-containers/configuration.toml format=TOML name=kata-runtime pid=10639 source=runtime
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.747986078Z" level=info arguments="\"create --bundle /run/containerd/io.containerd.runtime.v1.linux/k8s.io/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07 --pid-file /run/containerd/io.containerd.runtime.v1.linux/k8s.io/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07/init.pid 52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07\"" command=create commit=9fb0b337ef997079b304fe895dacd5d96d6f2fb6-dirty name=kata-runtime pid=10639 source=runtime version=1.0.0
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.748079309Z" level=debug msg="converting /run/containerd/io.containerd.runtime.v1.linux/k8s.io/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07/config.json" name=kata-runtime pid=10639 source=virtcontainers/oci
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.748941795Z" level=debug msg="container rootfs: /run/containerd/io.containerd.runtime.v1.linux/k8s.io/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07/rootfs" name=kata-runtime pid=10639 source=virtcontainers/oci
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.749137875Z" level=debug msg="Creating bridges" arch=amd64 name=kata-runtime pid=10639 source=virtcontainers subsystem=qemu
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.749177883Z" level=debug msg="Creating UUID" arch=amd64 name=kata-runtime pid=10639 source=virtcontainers subsystem=qemu
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.760881384Z" level=debug msg="Disable nesting environment checks" arch=amd64 inside-vm=false name=kata-runtime pid=10639 source=virtcontainers subsystem=qemu
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.760981229Z" level=warning msg="shortening QMP socket name" arch=amd64 name=kata-runtime new-name=mon-a66a34e8-9787-4041-b468-e6 original-name=mon-a66a34e8-9787-4041-b468-e634d94a15fe pid=10639 source=virtcontainers subsystem=qemu
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.761022484Z" level=warning msg="shortening QMP socket name" arch=amd64 name=kata-runtime new-name=ctl-a66a34e8-9787-4041-b468-e6 original-name=ctl-a66a34e8-9787-4041-b468-e634d94a15fe pid=10639 source=virtcontainers subsystem=qemu
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.761088153Z" level=debug msg="Could not retrieve anything from storage" arch=amd64 name=kata-runtime pid=10639 source=virtcontainers subsystem=kata_agent
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.761561131Z" level=info msg="Attaching virtual endpoint" arch=amd64 name=kata-runtime pid=10639 source=virtcontainers subsystem=network
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.76751231Z" level=info msg="Starting VM" arch=amd64 name=kata-runtime pid=10639 sandbox-id=52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07 source=virtcontainers subsystem=sandbox
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.767684044Z" level=info msg="Adding extra file [0xc42000ec98 0xc42000eca0 0xc42000eca8 0xc42000ecb0 0xc42000ecb8 0xc42000ecc0 0xc42000ecc8 0xc42000ecd0 0xc42000ec58 0xc42000ec60 0xc42000ec68 0xc42000ec70 0xc42000ec78 0xc42000ec80 0xc42000ec88 0xc42000ec90]" arch=amd64 name=kata-runtime pid=10639 source=virtcontainers subsystem=qmp
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.767749592Z" level=info msg="launching qemu with: [-name sandbox-52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07 -uuid a66a34e8-9787-4041-b468-e634d94a15fe -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/vc/sbs/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07/mon-a66a34e8-9787-4041-b468-e6,server,nowait -qmp unix:/run/vc/sbs/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07/ctl-a66a34e8-9787-4041-b468-e6,server,nowait -m 2048M,slots=2,maxmem=25121M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/sbs/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/home/core/clear-container-image/container.img,size=134217728 -device virtio-scsi-pci,id=scsi0 -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/sbs/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07/kata.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-kataShared,mount_tag=kataShared -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=da:c3:4f:c8:08:8c,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /home/core/clear-container-image/vmlinuz.container -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 quiet systemd.show_status=false panic=1 initcall_debug nr_cpus=4 ip=::::::52891c2ef940064ab2b1652e2015f9fb546bc814ea397c4cc025cbbd08f9af07::off:: init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket -smp 1,cores=1,threads=1,sockets=1,maxcpus=4]" arch=amd64 name=kata-runtime pid=10639 source=virtcontainers subsystem=qmp
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.768894409Z" level=error msg="Unable to launch qemu: exit status 1" arch=amd64 name=kata-runtime pid=10639 source=virtcontainers subsystem=qmp
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.76897353Z" level=error msg="qemu: unknown option 'name'\n" arch=amd64 name=kata-runtime pid=10639 source=virtcontainers subsystem=qmp
May 27 15:40:21 localhost kata-runtime[10639]: time="2018-05-27T15:40:21.76905962Z" level=error msg="qemu: unknown option 'name'\n" command=create name=kata-runtime pid=10639 source=runtime
After this output, it looks like the very first option being passed to qemu
(-name) was unknown to it – my first thought was that maybe the proxy
was needed in actuality, so I decided to just stop and go back to static-kata-runtime
and ensure the proxy
and shim
were also built, installed on the machine, and configured properly.
kata-proxy
and kata-shim
I needed to change the static-kata-runtime
repo I set up to include static builds for kata-proxy
and kata-shim
, as they seemed to be needed. I did the same sort of hacks for kata-runtime
, with some minor modifications due to -ldflags
already being used in the commands. The command looked something like this:
go build -o kata-proxy -ldflags "-linkmode external -extldflags '-static' -X main.version=1.0.0-a69326b63802952b14203ea9c1533d4edb8c1d64-dirty"
After doing that, the ldd output is empty (as it should be for a statically built program):
~/go/src/github.com/kata-containers/shim # ldd /root/go/src/github.com/kata-containers/shim/kata-shim
ldd (0x7f5921fe4000)
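The kata-shim build was the same idea – something like this, with the version/commit ldflags elided since they come from the shim's own Makefile:
go build -o kata-shim -ldflags "-linkmode external -extldflags '-static'"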
After making sure the kata-shim
was built statically as well I copied both binaries (kata-proxy
and kata-shim
) to the actual machine and tried again.
kata-shim
& kata-proxy
So at this point, I went back into the configuration, uncommented all the relevant commented-out lines, and started pointing at the necessary kata-proxy
and kata-shim
binaries in /opt/bin
(in container linux a bunch of system directories are read-only). Upon doing this, the errors didn’t change, and I started looking into the qemu
setup only to realize that the wrong commands were getting passed to qemu
, which seemed weird, so I started taking a look at qemu
itself. First I checked the version (2.12.0, which was recent at the time), and confirmed with the qemu
documentation that the arguments being passed were valid, but eventually I started wondering if I had the right binary in the first place (obtaining it did seem too easy):
$ which qemu
/opt/bin/qemu
$ qemu --version
qemu-x86_64 version 2.12.0 (qemu-2.12.0-1.fc29)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
One thing I tried was to open up the CoreOS toolbox (using the toolbox
command), and run dnf install -y qemu
, and it downloaded a TON of stuff that I didn’t realize was needed. Considering the qemu
binary that I downloaded was <5MB I assumed there’s no way I had gotten the right thing. This is the point at which I actually realized that I had downloaded the wrong qemu
– I had the qemu
user binary but I needed the system binary – i.e. qemu-system-x86_64
. I also found a really informative post about everything.
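If you want the quick way to tell the two apart: qemu-x86_64 is the user-mode emulator (it runs a single Linux binary built for another architecture), while qemu-system-x86_64 emulates a whole machine, which is what kata-runtime needs to boot its VM. The version strings give it away (output abbreviated, from memory):
$ qemu-x86_64 --version
qemu-x86_64 version 2.12.0 ...        # user-mode emulator -- what I had
$ qemu-system-x86_64 --version
QEMU emulator version 2.12.0 ...      # full-system emulator -- what I actually needed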
At this point, you should already know what’s coming next – I’m going to try and compile qemu-system-x86_64
from source.
qemu-system-x86_64
from source
It was great to see that Alpine already had support for QEMU – this made me hopeful that a static binary (one that used musl libc) was possible, even if it wasn’t necessarily easy to get. First I installed the qemu
(userland) package on alpine, and saw what all the userland stuff looked like. That stuff is actually required by the qemu
system packages, which is what I was looking for. The first step is to take a look at what the installed binary actually calls upon:
# file /usr/bin/qemu-system-x86_64
/usr/bin/qemu-system-x86_64: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped
# ldd /usr/bin/qemu-system-x86_64
/lib/ld-musl-x86_64.so.1 (0x7ff3c0292000)
libepoxy.so.0 => /usr/lib/libepoxy.so.0 (0x7ff3bef1f000)
libgbm.so.1 => /usr/lib/libgbm.so.1 (0x7ff3bed12000)
libz.so.1 => /lib/libz.so.1 (0x7ff3beafb000)
libaio.so.1 => /usr/lib/libaio.so.1 (0x7ff3be8f9000)
libnfs.so.11 => /usr/lib/libnfs.so.11 (0x7ff3be6bd000)
libcurl.so.4 => /usr/lib/libcurl.so.4 (0x7ff3be454000)
libssh2.so.1 => /usr/lib/libssh2.so.1 (0x7ff3be22c000)
libbz2.so.1 => /usr/lib/libbz2.so.1 (0x7ff3be01f000)
libpixman-1.so.0 => /usr/lib/libpixman-1.so.0 (0x7ff3bdd8f000)
libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x7ff3bdb37000)
libasound.so.2 => /usr/lib/libasound.so.2 (0x7ff3bd844000)
libvdeplug.so.3 => /usr/lib/libvdeplug.so.3 (0x7ff3bd63e000)
libpng16.so.16 => /usr/lib/libpng16.so.16 (0x7ff3bd410000)
libjpeg.so.8 => /usr/lib/libjpeg.so.8 (0x7ff3bd1b1000)
libnettle.so.6 => /usr/lib/libnettle.so.6 (0x7ff3bcf7d000)
libgnutls.so.30 => /usr/lib/libgnutls.so.30 (0x7ff3bcc45000)
liblzo2.so.2 => /usr/lib/liblzo2.so.2 (0x7ff3bca28000)
libsnappy.so.1 => /usr/lib/libsnappy.so.1 (0x7ff3bc81f000)
libspice-server.so.1 => /usr/lib/libspice-server.so.1 (0x7ff3bc526000)
libusb-1.0.so.0 => /usr/lib/libusb-1.0.so.0 (0x7ff3bc310000)
libusbredirparser.so.1 => /usr/lib/libusbredirparser.so.1 (0x7ff3bc109000)
libglib-2.0.so.0 => /usr/lib/libglib-2.0.so.0 (0x7ff3bbe18000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7ff3bbc06000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7ff3c0292000)
libexpat.so.1 => /usr/lib/libexpat.so.1 (0x7ff3bb9e5000)
libwayland-client.so.0 => /usr/lib/libwayland-client.so.0 (0x7ff3bb7d7000)
libwayland-server.so.0 => /usr/lib/libwayland-server.so.0 (0x7ff3bb5c6000)
libdrm.so.2 => /usr/lib/libdrm.so.2 (0x7ff3bb3b6000)
libssl.so.44 => /lib/libssl.so.44 (0x7ff3bb16a000)
libcrypto.so.42 => /lib/libcrypto.so.42 (0x7ff3badc4000)
libp11-kit.so.0 => /usr/lib/libp11-kit.so.0 (0x7ff3bab68000)
libunistring.so.2 => /usr/lib/libunistring.so.2 (0x7ff3ba804000)
libtasn1.so.6 => /usr/lib/libtasn1.so.6 (0x7ff3ba5f4000)
libhogweed.so.4 => /usr/lib/libhogweed.so.4 (0x7ff3ba3c1000)
libgmp.so.10 => /usr/lib/libgmp.so.10 (0x7ff3ba15d000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7ff3b9e0b000)
libcelt051.so.0 => /usr/lib/libcelt051.so.0 (0x7ff3b9bfe000)
libopus.so.0 => /usr/lib/libopus.so.0 (0x7ff3b99aa000)
libgio-2.0.so.0 => /usr/lib/libgio-2.0.so.0 (0x7ff3b963a000)
libgobject-2.0.so.0 => /usr/lib/libgobject-2.0.so.0 (0x7ff3b93f9000)
libsasl2.so.3 => /usr/lib/libsasl2.so.3 (0x7ff3b91e0000)
libpcre.so.1 => /usr/lib/libpcre.so.1 (0x7ff3b8f85000)
libintl.so.8 => /usr/lib/libintl.so.8 (0x7ff3b8d77000)
libffi.so.6 => /usr/lib/libffi.so.6 (0x7ff3b8b6f000)
libgmodule-2.0.so.0 => /usr/lib/libgmodule-2.0.so.0 (0x7ff3b896b000)
libmount.so.1 => /lib/libmount.so.1 (0x7ff3b8720000)
libblkid.so.1 => /lib/libblkid.so.1 (0x7ff3b84dc000)
libuuid.so.1 => /lib/libuuid.so.1 (0x7ff3b82d6000)
Believe it or not, this list was actually encouraging – it was way smaller than when I did the same exploration on Fedora (inside the container launched by the CoreOS toolbox
command). Time to try and statically build this beast.
qemu-system-x86_64
from tarball source
My first instinct was to try to build the binary starting from the qemu
source code. Here’s what my shell-based exploration looked like:
# pull required packages on alpine
$ apk add --update python alpine-sdk linux-headers zlib-dev glib-dev pixman-dev
# run configure and make in the qemu source code directory
$ ./configure && make # don't forget -j to speed things up
This process failed, with the output below:
/root/qemu-2.12.0/linux-user/syscall.c:6542:22: error: 'F_EXLCK' undeclared here (not in a function)
TRANSTBL_CONVERT(F_EXLCK),
^
/root/qemu-2.12.0/linux-user/syscall.c:6537:51: note: in definition of macro 'TRANSTBL_CONVERT'
#define TRANSTBL_CONVERT(a) { -1, TARGET_##a, -1, a }
^
/root/qemu-2.12.0/linux-user/syscall.c:6543:22: error: 'F_SHLCK' undeclared here (not in a function)
TRANSTBL_CONVERT(F_SHLCK),
^
/root/qemu-2.12.0/linux-user/syscall.c:6537:51: note: in definition of macro 'TRANSTBL_CONVERT'
#define TRANSTBL_CONVERT(a) { -1, TARGET_##a, -1, a }
^
/root/qemu-2.12.0/linux-user/syscall.c: In function 'target_to_host_sigevent':
/root/qemu-2.12.0/linux-user/syscall.c:7132:14: error: 'struct sigevent' has no member named '_sigev_un'; did you mean 'sigev_value'?
host_sevp->_sigev_un._tid = tswap32(target_sevp->_sigev_un._tid);
^~
/root/qemu-2.12.0/linux-user/syscall.c:7132:25: error: '(const bitmask_transtbl *)&<erroneous-expression>' is a pointer; did you mean to use '->'?
host_sevp->_sigev_un._tid = tswap32(target_sevp->_sigev_un._tid);
^
->
/root/qemu-2.12.0/linux-user/syscall.c:7132:5: warning: statement with no effect [-Wunused-value]
host_sevp->_sigev_un._tid = tswap32(target_sevp->_sigev_un._tid);
^~~~~~~~~
make[1]: *** [/root/qemu-2.12.0/rules.mak:66: linux-user/syscall.o] Error 1
make: *** [Makefile:478: subdir-aarch64-linux-user] Error 2
Turns out there’s an issue with compiling qemu
with musl libc. Thanks a lot to Natanael Copa though; he shows up all over the place doing the gymnastics necessary to get things building on Alpine. Rather than trying to patch things up myself, I’m going to get the source from aports and try with that instead, since someone’s already done the hard work (patching).
I’m going to rely heavily on aports
like I did before – knowing that the qemu-system
stuff is an alpine package means that other folks have done a lot of hard work already that I can take advantage of, rather than necessarily downloading the QEMU source tarball off the bat. So basically, I’ll be trying to rebuild the qemu-system-x86_64
package. The alpine documentation on how to work with the aports
tree was fantastic to skim through to get a feel for how things work. In addition to reading up on how to use aports
, I needed to do some user trickery, since normally root
is not allowed to do aports
-based builds (using abuild
). In particular I needed to add a user to do builds with:
$ adduser <user>
$ chown -R <user> <aports dir>
$ chgrp -R <user> <aports dir>
$ addgroup <user> wheel
$ addgroup <user> abuild
$ su - <user>
While you can use abuild
with the -F
option, you have to put it on every command and things still go a little wonky, so I just didn’t bother and added a whole separate user. After making this user I went into main/qemu
and ran abuild -r
, just to make sure things worked (they should, since this package is obviously a working package in the alpine linux repos). There was a bunch of patching required (so basically the fixes required to overcome the musl libc issues), and it was nice to have them available to look at (the applied patches are part of the aports
package distribution). After the basic no-modification build was done, qemu-system
was installed @ ~/packages
.
Now that the regular basic build was working, it was time to try and get the thing to statically compile. After glancing at a few files in the source, it looks like there were two big options determining how things built – --enable-user
and --disable-system
seemed to be what toggled user/system builds, and --enable-static
seemed to be the option that enabled static builds, though the option was only used on the user side (which explains how it was so easy to get the user binaries). The first instinct was to try --enable-static
on the system side. I also noticed that there were a ton of architectures that it built for by default – knowing that I was working on a pretty standard x86_64
machine, and probably always would be, I limited the architectures by modifying the subsystems
variable in the APKBUILD
(a file in the aports
package distribution).
Next thing I needed to do was add some more dependencies, here’s a consolidated list:
$ apk add lzo-dev libseccomp-dev gtk+3.0-dev libcap-ng-dev alsa-lib-dev snappy-dev xen-dev cyrus-sasl-dev xfsprogs-dev jpeg-dev vde2-dev bluez-dev
I thought I was ready to do another build but using abuild
to do the second build was a little confusing – at this point, if you delete the contents of ~/packages
and rebuild, the output doesn’t go there the second time (or it didn’t for me). I had to go into the src
directory generated by abuild -r
and start setting configuration directly there. Since that was bascially just the raw codebase, I figured I’d deal with getting abuild
to work properly later and just see if I could get qemu
to build. Of course, I’m exactly where I was before if I don’t apply the patches developed by the Alpine people – so after some reading I found I could use abuild prepare
to apply the patches I cared about. Next step is to actually run the configure
part of the build:
$ ./configure --static --cpu="x86_64"
After running this I got tons of errors as various static dependencies weren’t found. Yes, you read that right – errors in the configure
step, which is normally fire-and-forget. Here’s an example of one:
cc -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -L/usr/lib -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -m64 -static -g -lsnappy
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lsnappy
collect2: error: ld returned 1 exit status
I checked and found that /usr/lib
contained libsnappy.so
but what I needed was a statically linked version of libsnappy
(ex. libsnappy.a
). But before I jump into the rabbit hole of trying to statically build that library or any of the others, I wondered if I could just cut out more unneeded functionality to avoid building stuff I didn’t need to. Here’s what the list of not-properly-linked libraries looked like:
aac5c740b857:/aports/main/qemu/src/qemu-2.12.0# cat config.log | grep "cannot find"
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lsnappy
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lsasl2
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lncursesw
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lncursesw
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lcursesw
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lncursesw
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lncursesw
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lcursesw
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lssh2
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lssh2
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lbluetooth
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgthread-2.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lglib-2.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lintl
/usr/lib/gcc/x86_64-alpine-linux-musl/6.4.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lintl
That’s a bunch, but like half of them are ncurses
related, and it turns out container linux actually has some of these dependencies already installed. Looking at the list of shared libraries required by qemu-system
but not provided by container linux already gave me the list of things I’d actually have to statically compile.
At this point I started wondering just how many of these I actually needed to get a qemu-system
that would boot up. Obviously stuff like libgnutls
and libdrm
would be necessary, but I’m not sure about libasound
or libwayland-*
). At this point the list kind of freaked me out, so I started removing flags (functionality of qemu-system
) from the build to lessen the load, then my mind wandered….
At this point everything was getting pretty heavy and it seemed like I’d never get anything working so I started wondering if I gave rkt
enough of a shot. It really seemed like running an alternative stage1 was going to be so much easier. Well, it turns out that even that avenue of retreat was impossible, because I discovered that using rkt
for untrusted runtimes seemed to require securityContext: privileged
, supposedly to allow people to disable the feature. This is about where I discovered rktlet
’s getting started guide, but I felt like I was going to run into the exact same issues as in the second post with the restrictive permissions issues (which again, would be more secure to just solve properly than ignore, but I was looking for a quick win, not death by 1000 cuts just yet).
qemu-system-x86_64
OK, after that brief bout of panic, let’s get back to building qemu-system
. After running configure
(as I wrote out earlier), the next build actually locked up my relatively beefy desktop – this time I had added -j
to make, and that was enough to hobble me. With this ominous start I took some time to think of other ways to possibly get what I wanted.
SIDETRACK/DIGRESSION ALERT One thing I wondered was whether it was possible to give access to /dev/kvm
to a non-root user – producing a way for me to “share” the device amongst containers running on the same system without giving them root privileges on the system. I found a stack exchange post that seemed to point to being able to authorize groups but at the time I looked into it, the runAsGroup
feature flag support was still in alpha so I passed on it. This could also be a really really easy way forward for others.
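For completeness, the host side of that idea is just ordinary group-based device permissions – a sketch I did not actually run (and the kubernetes side would still need runAsGroup or similar to pick up the group):
# sketch only -- grant a dedicated group access to /dev/kvm instead of requiring root
$ sudo groupadd --system kvm      # if the group doesn't already exist
$ sudo chgrp kvm /dev/kvm
$ sudo chmod 660 /dev/kvm
$ sudo usermod -aG kvm some-user  # "some-user" being whichever non-root user should get KVM access
# (to survive reboots you'd want a udev rule rather than a one-off chgrp/chmod)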
OK, so back to the grind again – I found an excellent resource on how to build qemu
for only x86_64
, which gave me some confidence. After skimming it I changed my configure
command to:
$ ./configure --target-list=x86_64-softmmu --enable-debug
I was wrong earlier – it wasn’t the --cpu
flag that I needed to set. After doing this, I found out that I could actually fix just about all my problems running configure
by installing glib-static
. With this, I could get the above configure
command to run wonderfully! Unfortunately, there were still problems with the actual make
build, here’s the updated list of missing libraries, from the make
command (the actual build) this time:
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -latk-1.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -latk-bridge-2.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -latspi
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lcairo
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lcairo-gobject
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -ldbus-1
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -ldrm
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lepoxy
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lfontconfig
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lfreetype
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgbm
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgdk-3
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgdk_pixbuf-2.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgraphite2
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgtk-3
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lharfbuzz
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpango-1.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpangocairo-1.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpangoft2-1.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpixman-1
Again, as weird as it sounds, this actually encourages me at this point – this list is also not so bad! Let’s just go off and build all the things! Here’s a quick rundown of the commands to get myself to this point:
$ apk add --update python alpine-sdk linux-headers zlib-dev glib-dev pixman-dev lzo-dev libseccomp-dev gtk+3.0-dev libcap-ng-dev alsa-lib-dev snappy-dev xen-dev cyrus-sasl-dev xfsprogs-dev jpeg-dev vde2-dev bluez-dev
$ git clone git://dev.alpinelinux.org/aports
$ cd aports/main/qemu
$ abuild -rF unpack
$ abuild -F prepare
$ cd src/qemu-2.12.0
$ ./configure --target-list=x86_64-softmmu --enable-debug --cpu=x86_64 --static
atk
First up is atk
, which as far as I can tell is an accessibility toolkit for GTK. I couldn’t find a static binary distribution like atk-static
of it for alpine, but I was able to find the atk-dev
package, so I started from there. Luckily, building the atk
library statically was pretty easy, here’s the step-by-step:
1. cd into main/atk
2. abuild -F unpack to get the source
3. abuild -F prepare (just in case, there aren’t really any patches)
4. ./configure --enable-static
5. make
6. cp ./atk/.libs/* /usr/lib (make install does not copy the static libs to /usr/lib; I had to find . -name "*.a" to find them)
After installing, here’s the list, down by one, indicating slow-but-steady progress:
~/aports/main/qemu/src/qemu-2.12.0 # make -j4 2>&1 | grep "cannot find" | sort | uniq
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -latk-bridge-2.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -latspi
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lcairo
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lcairo-gobject
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -ldbus-1
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -ldrm
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lepoxy
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lfontconfig
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lfreetype
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgbm
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgdk-3
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgdk_pixbuf-2.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgraphite2
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgtk-3
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lharfbuzz
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpango-1.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpangocairo-1.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpangoft2-1.0
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpixman-1
One down, 19 to go! After this, I did take some time to think about whether I actually needed GTK support (or could just disable it), since I’m not going to be using any of those features on an OS that’s used primarily to run containers. After taking a look at the build code, I came up with the following configure
command:
./configure --target-list=x86_64-softmmu --enable-debug --cpu=x86_64 --static --disable-gtk --disable-user
This made things WAY EASIER – check out the updated error list:
~/aports/main/qemu/src/qemu-2.12.0 # make -j4 2>&1 | grep "cannot find" | sort | uniq
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -ldrm
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lepoxy
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgbm
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpixman-1
Yeah, so at this point I was internally and externally yelling “FUCK GTK, THEN!” despite having nothing against GTK. I did waste time building atk
but the time I saved not building all the other things that were required because of it was a serious weight off my shoulders.
libdrm
Top of the updated list of missing libraries is libdrm
so I set off to start building it. Here’s the step-by-step:
1. configure --enable-static errored, complaining about not finding libpciaccess!
2. apk add libpciaccess-dev cleared that up… but it seems like when I build statically it’s going to be an issue (fractally)
3. make
While things ran smoothly, I found a few more binaries than I expected:
~/aports/main/libdrm/src/libdrm-2.4.89 # find . -name "*.a"
./libkms/.libs/libkms.a
./radeon/.libs/libdrm_radeon.a
./nouveau/.libs/libdrm_nouveau.a
./.libs/libdrm.a # <--- there it is
./intel/.libs/libdrm_intel.a
./amdgpu/.libs/libdrm_amdgpu.a
./tests/util/.libs/libutil.a
./tests/kms/.libs/libkms-test.a
I figured I’d only copy the one I needed for the build so I copied libdrm.a
(cp .libs/libdrm.a /usr/lib
) and was on my way. As you’d expect, the build of qemu-system
has one less error!
~/aports/main/qemu/src/qemu-2.12.0 # make -j4 2>&1 | grep "cannot find" | sort | uniq
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lepoxy
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgbm
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpixman-1
libepoxy
Next up is libepoxy
– at this point you likely know the drill so I’ll just boil it down to the hiccups:
1. missing autoconf (apk add autoconf)
2. missing automake (apk add automake)
3. missing util-macros (apk add util-macros)
4. missing libtool (apk add libtool)
After this everything was the same: running ./configure --enable-static and make worked, and the static library went to libs/libepoxy.a. As you’d expect, qemu-system has one less error for me:
~/aports/main/qemu/src/qemu-2.12.0 # make -j4 2>&1 | grep "cannot find" | sort | uniq
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lgbm
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpixman-1
lgbm
(AKA mesa-gbm
)
Despite the library showing up as lgbm
in the error message, the package I was looking for was mesa-gbm
– another seemingly unnecessary library, but I’ll try and build it anyway. Most of it was the same – IIRC I ran abuild unpack
and abuild prepare
, but I ran into an issue with the --enable-llvm
flag being required when building on an r300
series discrete GFX chip (I have a RADEON r370 in my desktop and recognized the designation). Since it really didn’t matter if my qemu-system-x86_64
was able to support advanced graphics, I had two choices: remove the device or enable LLVM. I found a random post on phoronix
that left a good hint on where to go. Rather than try and get it to work I just removed that from the mesa-gbm
build by editing the configure
command I was using:
./configure --enable-static --with-gallium-drivers=
After this I ran into another issue – it turns out the DRI libraries can’t be built statically, like it’s literally impossible:
checking for LIBDRM... yes
configure: error: DRI cannot be build as static library
That’s all I got from configure
, which was pretty disheartening. After reading the Arch Linux qemu
documentation I found out that the gallium drivers (which I disabled) are what supports virtio
’s operation, so this means the configure
command has to change again:
./configure --enable-static --with-dri-drivers= --with-gallium-drivers=virgl --disable-shared --disable-driglx-direct
Running thi sproduced an error however:
configure: error: gbm cannot be build as static library
At this point I thought building mesa-gbm/gbm statically was fucked, but I came across a build file from Yocto Linux’s build of QEMU, and was able to find the configure command that got me across the finish line by disabling OpenGL (again, something I don’t need on a VM that’s just going to run non-graphically 100% of the time):
./configure --target-list=x86_64-softmmu --enable-debug --cpu=x86_64 --static --disable-gtk --disable-user --disable-opengl
After this, we’re only down to ONE missing library in the qemu-system-x86_64 build output!
/usr/lib/gcc/x86_64-alpine-linux-musl/6.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lpixman-1
pixman-dev
Building this was super easy to do statically, very similar to libepoxy
and atk
, so I won’t even write it up here.
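For completeness, though, here’s roughly what that drill looks like – a sketch only, since I didn’t keep notes for pixman, and the source directory and archive paths below are assumptions:

```bash
# Same autotools pattern as libepoxy (sketch; adjust the pixman source dir to
# whatever abuild unpack left you with)
cd ~/aports/main/pixman/src/pixman-*
./configure --enable-static
make
# run `find . -name "*.a"` like before if the archive lands somewhere else
cp pixman/.libs/libpixman-1.a /usr/lib/
```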
Well, at this point, skepticism was through the roof, because I’d seemingly done it – resolved all the dependencies, and was only a make
command away from having a statically compiled qemu-system-x86_64
. So I did what you do when you’re standing on the precipice – I jumped off and ran make
and produced a binary that should have been statically linked. Of course, when things finished successfully I couldn’t believe it, so I turned to file
and ldd
:
~/aports/main/qemu/src/qemu-2.12.0 # file x86_64-softmmu/qemu-system-x86_64
x86_64-softmmu/qemu-system-x86_64: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped
~/aports/main/qemu/src/qemu-2.12.0 # ldd x86_64-softmmu/qemu-system-x86_64
ldd (0x7f1cbd295000)
This is pretty awesome – while I’m running a heavily neutered version of qemu-system
I’m super happy it works, and the binary itself was only ~45MB:
$ du -hs qemu-system-x86_64
45M qemu-system-x86_64
So at this point, paranoia washed over me, and I felt that I needed to save my work in a Dockerfile
as quickly as possible so I could build everything easily and repeatably. Unfortunately, I didn’t act on those instincts so I don’t have a Dockerfile
I can point you at now, but I copied the statically built qemu-system-x86_64
to the server directly. At first, I tried to test if the image actually worked, following the qemu documentation on testing system images:
$ core@localhost ~ $ ./qemu-system-x86_64 or1k-linux-4.10
WARNING: Image format was not specified for 'or1k-linux-4.10' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
Specify the 'raw' format explicitly to remove the restrictions.
qemu: could not load PC BIOS 'bios-256k.bin'
$ core@localhost ~ $ wget https://stable.release.core-os.net/amd64-usr/current/coreos_production_iso_image.iso
--2018-05-29 05:51:32-- https://stable.release.core-os.net/amd64-usr/current/coreos_production_iso_image.iso
Resolving stable.release.core-os.net... 104.16.21.26, 104.16.20.26, 2400:cb00:2048:1::6810:141a, ...
Connecting to stable.release.core-os.net|104.16.21.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 378535936 (361M) [application/x-iso9660-image]
Saving to: 'coreos_production_iso_image.iso'
coreos_production_iso_image.iso 100%[====================================================================================================================================================================>] 361.00M 113MB/s in 3.4s
2018-05-29 05:51:35 (106 MB/s) - 'coreos_production_iso_image.iso' saved [0/0]
$ core@localhost ~ $ ls
coreos_production_iso_image.iso or1k-linux-4.10 qemu-system-x86_64
$ core@localhost ~ $ ./qemu-system-x86_64 -cdrom coreos_production_iso_image.iso
qemu: could not load PC BIOS 'bios-256k.bin'
As you can see, I actually tried with the or1k image and also the CoreOS production ISO in CD-ROM mode, and got the same BIOS error both times, so as far as I can tell QEMU itself is actually working. This is flimsy logic, but not seeing a segfault or any other more serious problem was pretty encouraging at the time. I did find more information on the error that occurred, and that was even more encouraging, given that it came from simply missing files qemu required, not from the qemu binary being broken.
After going through all the steps above again (I still don’t have a proper Dockerfile) to generate all the executables and pieces you need (VM image, VM kernel, kata-runtime configs, etc), it was time to actually start trying to run an untrusted container. I set up all the config for containerd and kata-runtime (so they could talk to each other), pointed everything at the right places, created a pod with the right annotation, and intently watched kubelet (journalctl -xef -u kubelet) and containerd (journalctl -xef -u containerd) to start the debug loop.
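For concreteness, here’s roughly what that wiring looked like – the TOML keys are from the containerd 1.1-era CRI plugin’s untrusted workload support, and the annotation is the one it watches for; the exact binary paths are assumptions, so check the docs for your containerd version:

```bash
# Tell containerd's CRI plugin to use kata-runtime for untrusted workloads
# (containerd 1.1-era config; newer releases use runtime handlers instead)
cat <<'EOF' >> /etc/containerd/config.toml
[plugins.cri.containerd.untrusted_workload_runtime]
  runtime_type = "io.containerd.runtime.v1.linux"
  runtime_engine = "/opt/bin/kata-runtime"
EOF
systemctl restart containerd

# A pod that opts into the untrusted runtime via annotation (this is the vm-shell
# pod you'll see in the logs below)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: vm-shell
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "100000"]
EOF
```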
The first issue I ran into was similar to the one I hit when running qemu alone – I needed a BIOS for this VM image! The error from inside kata-runtime (surfaced through containerd):
ERROR: OCI runtime create failed: qemu: could not load PC BIOS 'bios-256k.bin': unknown"
I was able to find a BIOS in the qemu repository @ https://github.com/qemu/qemu/blob/master/pc-bios/bios-256k.bin. I had to create a wrapper script to start qemu so I could more directly modify what was happening when kata-runtime tried to start it. Here’s how the script started out:
#!/bin/bash
/opt/bin/qemu-system-x86_64 -L /var/lib/kata-containers/bios $@
As you can see, the simplest version just takes the arguments that kata-runtime would normally pass to qemu and injects the -L flag specifying the location of the BIOS directory.
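For reference, getting the BIOS into place amounted to something like this – the raw-download URL form is an assumption on my part, and /var/lib/kata-containers/bios is simply where the wrapper’s -L flag points:

```bash
# Drop bios-256k.bin where the wrapper's -L flag looks for it
sudo mkdir -p /var/lib/kata-containers/bios
curl -sSL https://github.com/qemu/qemu/raw/master/pc-bios/bios-256k.bin \
  | sudo tee /var/lib/kata-containers/bios/bios-256k.bin > /dev/null
```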
no_timer_check file/directory missing
The second issue I ran into was something a little more obscure, but the containerd logs were great at showing what went wrong:
Error: qemu-system-x86_64: -append tsc=reliable: Could not open 'no_timer_check': No such file or directory: unknown
May 29 11:48:17 localhost kata-runtime[25014]: time="2018-05-29T11:48:17.564507816Z" level=info msg="launching qemu with: [-name sandbox-497560af811a91e8ae0d92fd1a43aee8a26a10d380a6e4295d4bcbcebb949d8f -uuid 295a9200-e7fc-470f-a83b-62665cd2192f -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/vc/sbs/497560af811a91e8ae0d92fd1a43aee8a26a10d380a6e4295d4bcbcebb949d8f/mon-295a9200-e7fc-470f-a83b-62,server,nowait -qmp unix:/run/vc/sbs/497560af811a91e8ae0d92fd1a43aee8a26a10d380a6e4295d4bcbcebb949d8f/ctl-295a9200-e7fc-470f-a83b-62,server,nowait -m 2048M,slots=2,maxmem=25121M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/sbs/497560af811a91e8ae0d92fd1a43aee8a26a10d380a6e4295d4bcbcebb949d8f/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/var/lib/kata-containers/container.img,size=134217728 -device virtio-scsi-pci,id=scsi0 -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/sbs/497560af811a91e8ae0d92fd1a43aee8a26a10d380a6e4295d4bcbcebb949d8f/kata.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-kataShared,mount_tag=kataShared -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/497560af811a91e8ae0d92fd1a43aee8a26a10d380a6e4295d4bcbcebb949d8f,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=8e:0d:dd:b9:e0:db,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /var/lib/kata-containers/vmlinuz.container -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 quiet systemd.show_status=false panic=1 initcall_debug nr_cpus=12 ip=::::::497560af811a91e8ae0d92fd1a43aee8a26a10d380a6e4295d4bcbcebb949d8f::off:: init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket -smp 1,cores=1,threads=1,sockets=1,maxcpus=12]" arch=amd64 name=kata-runtime pid=25014 source=virtcontainers subsystem=qmp
Fixing this meant extending the bash script hack – clearly I was going to have to get more and more familiar with how kata-runtime
was trying to start qemu
. There’s a TON of flags in there, hopefully I don’t have to investigate every single one.
I realized that I needed to copy all of the bios files from qemu
’s pc-bios
folder, out of the docker container where I did the build.
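The copy itself was nothing fancy – something along these lines, where qemu-build is a hypothetical name for the build container and the in-container path matches the prompts you’ve seen above:

```bash
# Pull qemu's firmware blobs out of the build container and ship them to the node
docker cp qemu-build:/root/aports/main/qemu/src/qemu-2.12.0/pc-bios/. ./pc-bios/
scp -r ./pc-bios core@my-node:~/
# then on the node:
#   sudo cp ~/pc-bios/* /var/lib/kata-containers/bios/
```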
After lots more hacking here’s what the script looked like:
SCRIPT:
#!/bin/bash
# Capture everything kata-runtime tried to pass to qemu
args=$@
# Everything before the -append flag...
pre_append=$(echo $args | sed 's/\-append.*//')
# ...minus the virtio-9p device my qemu build doesn't have compiled in
pre_append=$(echo $pre_append | sed 's/\-device virtio-9p-pci,fsdev=extra-9p-kataShared,mount_tag=kataShared//')
# Everything after -append (the guest kernel command line)
post_append=$(echo $args | sed 's/.*\-append//')
# Keep a log of the munged invocations for debugging
echo -e "$pre_append $post_append" >> /tmp/qemu-calls
# Re-invoke the real qemu with the BIOS dir injected and the kernel cmdline re-quoted
/opt/bin/qemu-system-x86_64 -L /var/lib/kata-containers/bios $pre_append -append "$post_append"
This is what madness looks like – I’m literally intercepting and modifying the flags passed by kata-runtime and doing string munging to try and get the right flags into qemu.
After all this hacking, I finally run it, and… it starts up! But k8s isn’t reporting on the pod properly, even though it thinks the pod did start. A look at the logs shows nothing immediately wrong:
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.63968976Z" level=info msg="launching qemu with: [-name sandbox-6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a -uuid 0ae6d9a1-66b4-4e3d-b9f4-7578ae12e14e -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/mon-0ae6d9a1-66b4-4e3d-b9f4-75,server,nowait -qmp unix:/run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/ctl-0ae6d9a1-66b4-4e3d-b9f4-75,server,nowait -m 2048M,slots=2,maxmem=25121M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/var/lib/kata-containers/container.img,size=134217728 -device virtio-scsi-pci,id=scsi0 -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/kata.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-kataShared,mount_tag=kataShared -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=96:09:e1:57:c8:f8,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /var/lib/kata-containers/vmlinuz.container -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 quiet systemd.show_status=false panic=1 initcall_debug nr_cpus=12 ip=::::::6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a::off:: init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket no_timer_check -smp 1,cores=1,threads=1,sockets=1,maxcpus=12]" arch=amd64 name=kata-runtime pid=24041 source=virtcontainers subsystem=qmp
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.700216428Z" level=info msg="{\"QMP\": {\"version\": {\"qemu\": {\"micro\": 0, \"minor\": 12, \"major\": 2}, \"package\": \"\"}, \"capabilities\": []}}" arch=amd64 name=kata-runtime pid=24041 source=virtcontainers subsystem=qmp
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.700407212Z" level=info msg="QMP details" arch=amd64 name=kata-runtime pid=24041 qmp-capabilities= qmp-major-version=2 qmp-micro-version=0 qmp-minor-version=12 source=virtcontainers subsystem=qemu
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.700473689Z" level=info msg="{\"execute\":\"qmp_capabilities\"}" arch=amd64 name=kata-runtime pid=24041 source=virtcontainers subsystem=qmp
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.700838482Z" level=info msg="{\"return\": {}}" arch=amd64 name=kata-runtime pid=24041 source=virtcontainers subsystem=qmp
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.700919512Z" level=info msg="VM started" arch=amd64 name=kata-runtime pid=24041 sandbox-id=6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a source=virtcontainers subsystem=sandbox
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.701388061Z" level=info msg="proxy started" arch=amd64 name=kata-runtime pid=24041 proxy-pid=24077 proxy-url="unix:///run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/proxy.sock" sandbox-id=6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a source=virtcontainers subsystem=kata_agent
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.701489259Z" level=warning msg="unsupported address" address="fe80::9409:e1ff:fe57:c8f8/64" arch=amd64 name=kata-runtime pid=24041 source=virtcontainers subsystem=kata_agent unsupported-address-type=ipv6
May 29 13:39:56 localhost kata-runtime[24041]: time="2018-05-29T13:39:56.701609103Z" level=warning msg="unsupported route" arch=amd64 destination="fe80::/64" name=kata-runtime pid=24041 source=virtcontainers subsystem=kata_agent unsupported-route-type=ipv6
Weirdly enough, I see 3 processes when I pgrep for qemu
:
core@localhost ~ $ pgrep -a qemu
23507 /opt/bin/qemu-system-x86_64 -L /var/lib/kata-containers/bios -name sandbox-70e865fe9662e87889534dcfc5486868bdc50ac7c948c6aa42829ac36a5a455c -uuid 048d74e0-4ae7-4bb7-8d11-5fb3416854bb -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/vc/sbs/70e865fe9662e87889534dcfc5486868bdc50ac7c948c6aa42829ac36a5a455c/mon-048d74e0-4ae7-4bb7-8d11-5f,server,nowait -qmp unix:/run/vc/sbs/70e865fe9662e87889534dcfc5486868bdc50ac7c948c6aa42829ac36a5a455c/ctl-048d74e0-4ae7-4bb7-8d11-5f,server,nowait -m 2048M,slots=2,maxmem=25121M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/sbs/70e865fe9662e87889534dcfc5486868bdc50ac7c948c6aa42829ac36a5a455c/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/var/lib/kata-containers/container.img,size=134217728 -device virtio-scsi-pci,id=scsi0 -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/sbs/70e865fe9662e87889534dcfc5486868bdc50ac7c948c6aa42829ac36a5a455c/kata.sock,server,nowait -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/70e865fe9662e87889534dcfc5486868bdc50ac7c948c6aa42829ac36a5a455c,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=62:b4:c0:a3:25:4d,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /var/lib/kata-containers/vmlinuz.container -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 quiet systemd.show_status=false panic=1 initcall_debug nr_cpus=12 ip=::::::70e865fe9662e87889534dcfc5486868bdc50ac7c948c6aa42829ac36a5a455c::off:: init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket no_timer_check -smp 1,cores=1,threads=1,sockets=1,maxcpus=12
23827 /opt/bin/qemu-system-x86_64 -L /var/lib/kata-containers/bios -name sandbox-beb0cfa06a6ce4208484239fbf935667d982718b12ce6e67eb109d6000bb30ae -uuid fa8e69c7-428e-4779-b1e7-ee9ae9b37527 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/vc/sbs/beb0cfa06a6ce4208484239fbf935667d982718b12ce6e67eb109d6000bb30ae/mon-fa8e69c7-428e-4779-b1e7-ee,server,nowait -qmp unix:/run/vc/sbs/beb0cfa06a6ce4208484239fbf935667d982718b12ce6e67eb109d6000bb30ae/ctl-fa8e69c7-428e-4779-b1e7-ee,server,nowait -m 2048M,slots=2,maxmem=25121M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/sbs/beb0cfa06a6ce4208484239fbf935667d982718b12ce6e67eb109d6000bb30ae/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/var/lib/kata-containers/container.img,size=134217728 -device virtio-scsi-pci,id=scsi0 -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/sbs/beb0cfa06a6ce4208484239fbf935667d982718b12ce6e67eb109d6000bb30ae/kata.sock,server,nowait -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/beb0cfa06a6ce4208484239fbf935667d982718b12ce6e67eb109d6000bb30ae,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=ea:68:5e:95:ac:7c,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /var/lib/kata-containers/vmlinuz.container -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 quiet systemd.show_status=false panic=1 initcall_debug nr_cpus=12 ip=::::::beb0cfa06a6ce4208484239fbf935667d982718b12ce6e67eb109d6000bb30ae::off:: init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket no_timer_check -smp 1,cores=1,threads=1,sockets=1,maxcpus=12
24062 /opt/bin/qemu-system-x86_64 -L /var/lib/kata-containers/bios -name sandbox-6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a -uuid 0ae6d9a1-66b4-4e3d-b9f4-7578ae12e14e -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/mon-0ae6d9a1-66b4-4e3d-b9f4-75,server,nowait -qmp unix:/run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/ctl-0ae6d9a1-66b4-4e3d-b9f4-75,server,nowait -m 2048M,slots=2,maxmem=25121M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/var/lib/kata-containers/container.img,size=134217728 -device virtio-scsi-pci,id=scsi0 -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/sbs/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a/kata.sock,server,nowait -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=96:09:e1:57:c8:f8,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /var/lib/kata-containers/vmlinuz.container -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 quiet systemd.show_status=false panic=1 initcall_debug nr_cpus=12 ip=::::::6461cd25700b6f8c285e784bfe4e3df204bbb6feeda709100431d15df952061a::off:: init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket no_timer_check -smp 1,cores=1,threads=1,sockets=1,maxcpus=12
This is awesome, because it means it’s definitely running, despite kubernetes not being able to properly access the pod. I’ve succeeded… a little bit? I tried deleting the old pods to see if the extra qemu processes would go away, and this is when I first faced kubernetes’ PLEG error – basically, the node had become NotReady due to the inability to access and properly control the untrusted workload pods it had started. In the meantime I sudo pkill qemu’d the processes. For future reference, the logged kubernetes errors I was seeing looked like this:
Ready False Tue, 29 May 2018 22:49:26 +0900 Tue, 29 May 2018 22:43:05 +0900 KubeletNotReady PLEG is not healthy: pleg was last seen active 9m29.776835455s ago; threshold is 3m0s
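If you want to spot this condition yourself, it’s visible right off the node object (the node is literally named localhost in my single-node setup):

```bash
# The Ready condition carries the PLEG message when the node flips to NotReady
kubectl get node localhost -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
# or just read the Conditions table
kubectl describe node localhost | grep -A 8 'Conditions:'
```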
After a restart, the kubelet was able to recover from the botched pod deletion. While I’d experienced some small success (the processes were running), I wasn’t sure what was wrong, so I wanted to look back at the hacks I was applying and see if there was something I did wrong:
virtio-9p-pci driver
I thought one of the reasons behind the issue was the virtio-9p-pci driver, which I wasn’t yet using but which IIRC is required by kata-runtime (the command comes in with it specified). If this was true, I had two options:
- Rebuild qemu-system-x86_64 with virtio-9p (virtfs) support compiled in
- Replace the virtio-9p-pci driver in the command with another driver that was included with the binary I built (turns out you can list the supported drivers with `qemu-system-x86_64 --device help`)
$ qemu-system-x86_64 --device help
.... lots more output ...
Storage devices:
name "am53c974", bus PCI, desc "AMD Am53c974 PCscsi-PCI SCSI adapter"
name "dc390", bus PCI, desc "Tekram DC-390 SCSI adapter"
name "floppy", bus floppy-bus, desc "virtual floppy drive"
name "ich9-ahci", bus PCI, alias "ahci"
name "ide-cd", bus IDE, desc "virtual IDE CD-ROM"
name "ide-drive", bus IDE, desc "virtual IDE disk or CD-ROM (legacy)"
name "ide-hd", bus IDE, desc "virtual IDE disk"
name "isa-fdc", bus ISA
name "isa-ide", bus ISA
name "lsi53c810", bus PCI
name "lsi53c895a", bus PCI, alias "lsi"
name "megasas", bus PCI, desc "LSI MegaRAID SAS 1078"
name "megasas-gen2", bus PCI, desc "LSI MegaRAID SAS 2108"
name "nvme", bus PCI, desc "Non-Volatile Memory Express"
name "piix3-ide", bus PCI
name "piix3-ide-xen", bus PCI
name "piix4-ide", bus PCI
name "pvscsi", bus PCI
name "scsi-block", bus SCSI, desc "SCSI block device passthrough"
name "scsi-cd", bus SCSI, desc "virtual SCSI CD-ROM"
name "scsi-disk", bus SCSI, desc "virtual SCSI disk or CD-ROM (legacy)"
name "scsi-generic", bus SCSI, desc "pass through generic scsi device (/dev/sg*)"
name "scsi-hd", bus SCSI, desc "virtual SCSI disk"
name "sdhci-pci", bus PCI
name "usb-bot", bus usb-bus
name "usb-mtp", bus usb-bus, desc "USB Media Transfer Protocol device"
name "usb-storage", bus usb-bus
name "usb-uas", bus usb-bus
name "vhost-scsi", bus virtio-bus
name "vhost-scsi-pci", bus PCI
name "vhost-user-blk", bus virtio-bus
name "vhost-user-blk-pci", bus PCI
name "vhost-user-scsi", bus virtio-bus
name "vhost-user-scsi-pci", bus PCI
name "virtio-blk-device", bus virtio-bus
name "virtio-blk-pci", bus PCI, alias "virtio-blk"
name "virtio-scsi-device", bus virtio-bus
name "virtio-scsi-pci", bus PCI, alias "virtio-scsi"
.... lots more output ...
I figured I’d try the rebuild route first – it turns out there’s a flag in the configure script (that I had previously disabled) called virtfs. It enables VirtFS, which is built on the 9p protocol (as in virtio-9p-pci), so I might be on the right track. The configure run failed immediately, but that’s good, since I was expecting some change:
~/aports/main/qemu/src/qemu-2.12.0 # ./configure --target-list=x86_64-softmmu --enable-debug --cpu=x86_64 --static --disable-gtk --disable-user --disable-opengl --enable-virtfs
ERROR: VirtFS requires libcap devel and libattr devel
And now it’s time to build/install the libcap and libattr development packages – luckily I accomplished this by simply running apk add libcap-dev! Attempting the build after that was pretty easy, and when I next ran the device-listing command I got the following output:
Storage devices:
name "am53c974", bus PCI, desc "AMD Am53c974 PCscsi-PCI SCSI adapter"
name "dc390", bus PCI, desc "Tekram DC-390 SCSI adapter"
name "floppy", bus floppy-bus, desc "virtual floppy drive"
name "ich9-ahci", bus PCI, alias "ahci"
name "ide-cd", bus IDE, desc "virtual IDE CD-ROM"
name "ide-drive", bus IDE, desc "virtual IDE disk or CD-ROM (legacy)"
name "ide-hd", bus IDE, desc "virtual IDE disk"
name "isa-fdc", bus ISA
name "isa-ide", bus ISA
name "lsi53c810", bus PCI
name "lsi53c895a", bus PCI, alias "lsi"
name "megasas", bus PCI, desc "LSI MegaRAID SAS 1078"
name "megasas-gen2", bus PCI, desc "LSI MegaRAID SAS 2108"
name "nvme", bus PCI, desc "Non-Volatile Memory Express"
name "piix3-ide", bus PCI
name "piix3-ide-xen", bus PCI
name "piix4-ide", bus PCI
name "pvscsi", bus PCI
name "scsi-block", bus SCSI, desc "SCSI block device passthrough"
name "scsi-cd", bus SCSI, desc "virtual SCSI CD-ROM"
name "scsi-disk", bus SCSI, desc "virtual SCSI disk or CD-ROM (legacy)"
name "scsi-generic", bus SCSI, desc "pass through generic scsi device (/dev/sg*)"
name "scsi-hd", bus SCSI, desc "virtual SCSI disk"
name "sdhci-pci", bus PCI
name "usb-bot", bus usb-bus
name "usb-mtp", bus usb-bus, desc "USB Media Transfer Protocol device"
name "usb-storage", bus usb-bus
name "usb-uas", bus usb-bus
name "vhost-scsi", bus virtio-bus
name "vhost-scsi-pci", bus PCI
name "vhost-user-blk", bus virtio-bus
name "vhost-user-blk-pci", bus PCI
name "vhost-user-scsi", bus virtio-bus
name "vhost-user-scsi-pci", bus PCI
name "virtio-9p-device", bus virtio-bus
name "virtio-9p-pci", bus PCI, alias "virtio-9p"
name "virtio-blk-device", bus virtio-bus
name "virtio-blk-pci", bus PCI, alias "virtio-blk"
name "virtio-scsi-device", bus virtio-bus
name "virtio-scsi-pci", bus PCI, alias "virtio-scsi"
This is awesome, because now I can remove some hacks – specifically the script munging I was doing to cut out the --device flag that pointed at virtio-9p-pci.
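With a 9p-capable binary in place, the wrapper can presumably shrink back down to just the BIOS injection – basically the first version of the script again:

```bash
#!/bin/bash
# Nothing to strip out of the arguments anymore -- just inject the BIOS directory.
# Quoting "$@" also avoids the word-splitting that caused the no_timer_check error.
exec /opt/bin/qemu-system-x86_64 -L /var/lib/kata-containers/bios "$@"
```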
kata-containers/osbuilder
Around this point I figured out that clearcontainers/osbuilder was actually no longer the builder to be used for kata-containers – huge thanks to @grahamwhaley for letting me know and pointing me in the right direction. I switched over to kata-containers/osbuilder and found everything pretty easy to build, although the commands only ran properly once. When I tried to run them again I got an error noting that /sbin/init wasn’t in the rootfs for some reason. I’m not sure what that’s about, but once is good enough for me!
kata-containers/osbuilder
didn’t (at the time) provide a kernel, so I actually had to use the kernel that I got from the old clear-containers/osbuilder
(I found a github issue about it on kata-containers/builder). In the meantime, I figured out that to re-run the tools you have to delete centos_rootfs
.
OK, so at this point I generated all the new image/kernel/initrd pieces and tried again, but got another error about the kataShared device. My working understanding is that this is the device that enables kata-agent to share information with kubernetes, which is obviously necessary for Kubernetes to communicate with the pods it spawns (that kata-runtime launched/controls). The error I saw was:
ERROR:
May 30 02:54:46 localhost containerd[2485]: time="2018-05-30T02:54:46Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:vm-shell,Uid:f992bbf3-63b3-11e8-9ee1-8c89a517d15e,Namespace:default,Attempt:0,} failed, error" error="failed to start sandbox container: failed to create containerd task: OCI runtime create failed: rpc error: code = Internal desc = Could not mount kataShared to /run/kata-containers/shared/containers/: no such file or directory: unknown"
The fix for this was that I was doing the OS building in the wrong order – I was meant to generate the root fs then generate the image (which is obvious in retrospect).
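For anyone following along, the working order maps to roughly this – the script names and arguments are from memory of the kata-containers/osbuilder repo at the time, so double-check them against its README:

```bash
# From a clone of github.com/kata-containers/osbuilder:
# 1. Build the guest rootfs first (this produces the centos_rootfs directory that
#    has to be deleted before re-running).
sudo ./rootfs-builder/rootfs.sh centos
# 2. Only then turn that rootfs into the container.img that kata-runtime boots.
sudo ./image-builder/image_builder.sh ./centos_rootfs
```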
After all this, the pods that were started were now dying as soon as they were created. This was a step forward in that I got a different error, but obviously the pods being DOA was less than ideal. Here’s the output I saw:
May 30 03:13:57 localhost kata-runtime[28231]: time="2018-05-30T03:13:57.758452267Z" level=info msg="VM started" arch=amd64 name=kata-runtime pid=28231 sandbox-id=bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d source=virtcontainers subsystem=sandbox
May 30 03:13:57 localhost kata-runtime[28231]: time="2018-05-30T03:13:57.758808113Z" level=info msg="proxy started" arch=amd64 name=kata-runtime pid=28231 proxy-pid=28266 proxy-url="unix:///run/vc/sbs/bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d/proxy.sock" sandbox-id=bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d source=virtcontainers subsystem=kata_agent
May 30 03:13:57 localhost kata-runtime[28231]: time="2018-05-30T03:13:57.758898027Z" level=warning msg="unsupported address" address="fe80::94:22ff:fe6f:e50a/64" arch=amd64 name=kata-runtime pid=28231 source=virtcontainers subsystem=kata_agent unsupported-address-type=ipv6
May 30 03:13:57 localhost kata-runtime[28231]: time="2018-05-30T03:13:57.758997876Z" level=warning msg="unsupported route" arch=amd64 destination="fe80::/64" name=kata-runtime pid=28231 source=virtcontainers subsystem=kata_agent unsupported-route-type=ipv6
May 30 03:13:58 localhost containerd[2485]: time="2018-05-30T03:13:58Z" level=info msg="shim reaped" id=bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d
May 30 03:13:58 localhost containerd[2485]: 2018-05-30 03:13:58.941 [INFO][28300] utils.go 379: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d CNI_NETNS=/var/run/netns/cni-0c73f075-4009-af02-c329-e172f17d30ee CNI_ARGS=IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=vm-shell;K8S_POD_INFRA_CONTAINER_ID=bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d;IgnoreUnknown=1 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin INVOCATION_ID=dee1797538c244ccb7107bf2ffe43b7e JOURNAL_STREAM=9:4414399 DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
May 30 03:13:58 localhost containerd[2485]: 2018-05-30 03:13:58.951 [INFO][28300] calico.go 431: Extracted identifiers ContainerID="bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d" Node="localhost" Orchestrator="k8s" WorkloadEndpoint="localhost-k8s-vm--shell-eth0"
May 30 03:13:58 localhost containerd[2485]: 2018-05-30 03:13:58.985 [WARNING][28300] workloadendpoint.go 72: Operation Delete is not supported on WorkloadEndpoint type
May 30 03:13:58 localhost containerd[2485]: 2018-05-30 03:13:58.985 [INFO][28300] k8s.go 361: Endpoint deletion will be handled by Kubernetes deletion of the Pod. ContainerID="bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d" endpoint=&v3.WorkloadEndpoint{TypeMeta:v1.TypeMeta{Kind:"WorkloadEndpoint", APIVersion:"projectcalico.org/v3"}, ObjectMeta:v1.ObjectMeta{Name:"localhost-k8s-vm--shell-eth0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"7ba861c2-63b7-11e8-9ee1-8c89a517d15e", ResourceVersion:"118446", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63663246833, loc:(*time.Location)(0x1ec5320)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"projectcalico.org/namespace":"default", "projectcalico.org/orchestrator":"k8s"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v3.WorkloadEndpointSpec{Orchestrator:"k8s", Workload:"", Node:"localhost", ContainerID:"", Pod:"vm-shell", Endpoint:"eth0", IPNetworks:[]string{"10.244.0.10/32"}, IPNATs:[]v3.IPNAT(nil), IPv4Gateway:"", IPv6Gateway:"", Profiles:[]string{"kns.default"}, InterfaceName:"cali4a530075471", MAC:"", Ports:[]v3.EndpointPort(nil)}}
May 30 03:13:58 localhost containerd[2485]: Calico CNI releasing IP address
May 30 03:13:58 localhost containerd[2485]: 2018-05-30 03:13:58.985 [INFO][28300] utils.go 149: Using a dummy podCidr to release the IP ContainerID="bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d" podCidr="0.0.0.0/0"
May 30 03:13:58 localhost containerd[2485]: Calico CNI deleting device in netns /var/run/netns/cni-0c73f075-4009-af02-c329-e172f17d30ee
May 30 03:13:58 localhost containerd[2485]: time="2018-05-30T03:13:58Z" level=error msg="Failed to destroy network for sandbox "bc375a63edcd3dc5dee09b618455bb47bf0ae65d313e1cece50aeffa7d80325d"" error="failed to get IP addresses for "eth0": <nil>"
May 30 03:13:59 localhost containerd[2485]: time="2018-05-30T03:13:58Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:vm-shell,Uid:7ba861c2-63b7-11e8-9ee1-8c89a517d15e,Namespace:default,Attempt:0,} failed, error" error="failed to start sandbox container: failed to create containerd task: OCI runtime create failed: rpc error: code = Internal desc = Could not mount kataShared to /run/kata-containers/shared/containers/: no such file or directory: unknown"
I didn’t get this error if I used the version of the rootfs/image where the agent is the thing that starts – but if I did that, the pod got stuck forever in the Creating state and never reached Running. I figured I should check inside the kubelet logs to see if there was anything there, and I saw this telling line of output:
May 30 04:09:06 localhost kubelet[2308]: I0530 04:09:06.200582 2308 kubelet_node_status.go:811] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2018-05-30 04:09:06.200546887 +0000 UTC m=+820.245566857 LastTransitionTime:2018-05-30 04:09:06.200546887 +0000 UTC m=+820.245566857 Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m11.681064631s ago; threshold is 3m0s}
This is exactly where the node becoming NotReady (the PLEG error) was taking hold – PLEG loses track of the pods started by kata-runtime, despite everything starting up and seeming fine.
Unfortunately, the previous step is as far as I got. After all the work – getting containerd to call kata-runtime to call qemu-system-x86_64, all the way down to getting a kubernetes pod actually running – I stopped trying to fix things once I hit the PLEG errors, which seemed to be taking out the whole node.
Weirdly enough, all external indicators seemed to point to the fact that the pod was running and qemu was running and everything was in order, but Kubernetes just couldn’t manage it (pun intended). I think with a little more work I could have gotten it completely working, but I’ll leave that to any future explorers – or maybe myself, if I ever come back to Container Linux (maybe Flatcar Linux?) for a third time.
Thanks for reading along – this post has been a doozy to write, but it’s been a long time coming (I’ve left it on ice for a while). Hopefully, if you’re out there experimenting with these technologies on top of Container Linux, you’ll find this post (and the F/OSS repos included within) useful!
This is the last post in the series but certainly not my last time experimenting with untrusted container runtimes! I switched off of Container Linux (to Ubuntu server) because I wasn’t down to experience this much difficulty, so the next time I approach this I expect to glide through everything super easily! Maybe I can even use an Operator like kubevirt
(it’s got some nice documentation as well). Either way, the sky is the limit (it feels like) going back to the mainstream and running my Kubernetes cluster on Ubuntu server. This is probably the last post I’ll write about Container Linux for a long time, but it was fun while it lasted, and I’m glad I did this since I learned a lot along the way.