Fresh Dedicated Server to Single Node Kubernetes Cluster on CoreOS, Part 2: Getting Kubernetes Running

This is the second in a series of blog posts centered around my explorations and experiments with using Kubernetes and CoreOS to power my own small slice of infrastructure. Check out the other posts in the series:

  1. Part 1 (Setting up the server with CoreOS)
  2. Part 2 (Getting Kubernetes running) (this post)
  3. Part 3 (Setting up essential Kubernetes extras)

tl;dr - Read the step-by-step guide on the CoreOS site for setting up Kubernetes; it’s excellent. Although it’s deprecated in favor of using Tectonic to set up Kubernetes, with a little bit of elbow grease it’s still a great guide and a good way to understand how much of Kubernetes is set up.

This post details what I went through to install Kubernetes manually (without using the currently blessed Tectonic-based setup), and my initial experiences and mistakes in learning about and setting up Kubernetes on a CoreOS-equipped machine. In this post I’ll be going through the CoreOS documentation on Kubernetes, along with the step-by-step guide for setting up Kubernetes on CoreOS. Let’s get into it!

Step 0: Read up on Kubernetes

As with any new technology excursion, the first step is to read up thoroughly (otherwise said, RTFM) on what the new technology is and what it does for you. The Kubernetes documentation is fantastic, and that combined with the CoreOS documentation on Kubernetes was more than enough reading to give me a feeling like I understood the world I was about to dive into.

Step 1: Make a decision about whether to use Tectonic

After reading the Kubernetes documentation on CoreOS, I came across a warning advising the use of Tectonic. Visiting the official Tectonic site makes it seem like Tectonic is only available as a paid tool, but that’s not the case: Tectonic is also on GitHub, and very much open source.

When I was first looking at Tectonic, it wasn’t immediately clear to me that “bare metal” referred to just running on a VPS/dedicated hardware somewhere (I would have thought it meant software that was meant to be run right on some hypervisor). Even after looking at the bare metal with Terraform Tectonic documentation, I was still quite unsure whether it was for my use case.

I didn’t go with Tectonic (Kubernetes by way of Terraform) for two reasons:

  1. I wasn’t sure Tectonic was right for my use case, which is just setting up Kubernetes on a rented VPS/dedicated machine
  2. I didn’t want to hide the complexity of setting up Kubernetes just yet

To expand more on the second reason, I’m a big fan of going through the pain that tools are supposed to save you from as a way of furthering understanding of the tool/why you’re using it. I believe that good tools are very often simple tools, and if setting up a tool is so unbearably difficult that it has to be abstracted away, it’s likely going to be very difficult to deal with that tool when things go wrong and you have to peel back one or two layers of abstraction anyway to figure out what’s wrong.

Step 2: Getting ETCD up and running

etcd is a core requirement for Kubernetes clusters. etcd is a distributed key-value store that Kubernetes uses to store various information and configuration. As Kubernetes is meant to make a bunch of machines essentially act as one to handle your compute needs, the underlying machines have to talk to each other and share state/information somehow, and etcd is how that gets done.

etcd comes pre-installed on Container Linux/CoreOS, so I didn’t have to install it, but I did need to enable the appropriate systemd service and make sure it was configured properly. This is actually the point in time where I figured out that the Ignition configuration system that CoreOS uses is not meant to be run repeatedly – I spent a bunch of time trying to figure out how to bake the etcd configuration that I needed into the Ignition config and “re-run” Ignition, only to bash my head into the fact that that’s not how Ignition was ever supposed to work.

Configuring etcd to work with the impending Kubernetes installation was pretty easy given the manual instructions in the Kubernetes documentation. The single-node etcd configuration instructions in the documentation worked just fine for me; I only needed to replace the {PUBLIC_IP} placeholder with my server’s actual IP. I did, however, run into a slightly confusing point: should I be using/updating the etcd2 or the etcd-member systemd service? Both of these services are present on the version of CoreOS I installed, and after reading through the CoreOS FAQ documentation section, it looks like etcd2 is deprecated, so etcd-member should be the right one.

NOTE Since I’m running a single node cluster, I actually didn’t need to make etcd available to other machines. This means I didn’t want it to listen on 0.0.0.0 (all network interfaces); 127.0.0.1 (the loopback network interface, the default) was just fine to use.
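For reference, etcd-member on Container Linux is typically configured through a systemd drop-in that sets ETCD_* environment variables for the service. A minimal sketch of what that can look like for my loopback-only, single-node case (the file path and values here are my own illustration, not lifted from the guide):

```
# /etc/systemd/system/etcd-member.service.d/override.conf (hypothetical path)
[Service]
# Loopback only: nothing else needs to reach etcd on a single-node cluster
Environment="ETCD_LISTEN_CLIENT_URLS=http://127.0.0.1:2379"
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://127.0.0.1:2379"
```

After writing a drop-in like this, sudo systemctl daemon-reload followed by sudo systemctl restart etcd-member picks up the changes.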

To check that etcd was up and running properly, a few calls to systemctl status etcd-member or journalctl -xe -u etcd-member sufficed.

Step 3: Setting up TLS

Of course, as you might expect, if you’re going to automatically coordinate physical machines in a busy data center or over the noisy internet, securely sending traffic between those machines is a necessity. SSL/TLS is the de facto way to protect communications over the internet these days, so naturally, one of the first steps to setting up Kubernetes is setting up the means of secure communication for the impending Kubernetes configuration.

The CoreOS guide to setting up TLS is very easy to go through, and is fairly descriptive of what you need and why. Like much of the rest of the documentation, it’s no longer maintained (deprecated in favor of Tectonic), but it was still accurate as of the writing of this blog post. In my case, since it’s just a single-node cluster, I did not need to set up any worker keypairs (the main node is the only worker). So basically I don’t have much to add here other than “follow the guide”.

Basically, the guide walks you through setting up an API server keypair and a cluster administrator keypair, along with a Certificate Authority, using OpenSSL.
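As a condensed, self-contained sketch of what that OpenSSL dance looks like (the file names follow the guide’s conventions, but I’ve left out the subjectAltName config file the real guide uses for the API server cert, so treat this as illustrative only):

```shell
# Generate a Certificate Authority keypair
openssl genrsa -out ca-key.pem 2048
openssl req -x509 -new -nodes -key ca-key.pem -days 365 \
    -out ca.pem -subj "/CN=kube-ca"

# Generate the API server keypair and sign it with the CA
# (the real guide also passes an openssl config with SANs here)
openssl genrsa -out apiserver-key.pem 2048
openssl req -new -key apiserver-key.pem \
    -out apiserver.csr -subj "/CN=kube-apiserver"
openssl x509 -req -in apiserver.csr -CA ca.pem -CAkey ca-key.pem \
    -CAcreateserial -out apiserver.pem -days 365

# Sanity check: the signed cert should verify against the CA
openssl verify -CAfile ca.pem apiserver.pem
```

The cluster administrator keypair is generated the same way, just with a different common name.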

Step 4: Getting the kubelet system service running

For this step the master node deployment documentation was extremely helpful – I was able to get everything set up with relatively little wandering.

Following the guide for deploying the Kubernetes master node was enough for this, except in my case, since it’s only a single node cluster, I needed to set --register-schedulable=true in the configuration for kubelet. This actually ends up being wrong (Kubernetes has changed the name of this configuration option since the documentation was written), but it was important to be introduced to the idea that Kubernetes doesn’t normally schedule work on the API server that’s supposed to be ordering workers around.

I came across a great guide using CentOS 7 on Medium which I looked at along with the official API documentation on kubelet to suss out what kubelet was and what it was supposed to be doing, maybe they’ll be useful for you as well.

Step 4.1: Choose between Flannel and Calico

The documentation asks you to make a choice between Flannel and Calico, the latter offering you better security controls. Since I wanted to stay with as vanilla a setup as possible for the first iteration, I went with only Flannel. After doing some research, it also looks like Flannel and Calico are set to merge into a combined project called “Canal”. I didn’t think I needed the features of Calico just yet, and I haven’t quite been proven wrong – I’m finding the combination of a good firewall and single-tenancy (I’m the only one using my cluster, so intra-container communication isn’t really an issue right now) to be enough.

Step 4.2: Set up Flannel

Flannel is an important building block of Container Linux/CoreOS and, by extension, Kubernetes – it manages your network around and between the containers that run on the system. Luckily for me, CoreOS comes with Flannel installed, so I got to skip the step of installing Flannel itself (though it certainly didn’t seem very difficult).

Putting configuration for Flannel into etcd

One thing I came across was that version 3 of etcd no longer exposes the old plain-HTTP/1.1 API (the v3 API is gRPC-based), which means that to update etcd you now use a program called etcdctl. For me this ended up looking like:

$ etcdctl set /coreos.com/network/config '{ "Network": "10.3.0.0/16" }'

Note that it should match the network range you may have used earlier in the guide when setting up Flannel.

PERSONAL RANT: I really dislike that HTTP2 doesn’t use text (or support multiple methods or something), because it makes learning how HTTP works so much harder. One of the best things about HTTP1.1 is that I can just SHOW someone an HTTP request in a text editor. I can do things like telnet-ing in and pretending to be a client to show how HTTP works, but after HTTP2 becomes the established default, those options won’t quite work anymore, and while I’m sure things will pop up to replace them, you just can’t beat the simplicity of not having to explain a binary encoding.

Attempting to start flannel

Starry-eyed, I tried sudo systemctl start flanneld, and after a bit of waiting I realized it had failed when I looked at the log output (sudo systemctl status flanneld). The error I saw repeated was:

Aug 06 14:26:12 localhost flannel-wrapper[3082]: E0806 14:26:12.522655    3082 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: unsupported protocol scheme ""

Pretty obvious error – it looks like a missing value for the protocol scheme, so I went back and looked at the /etc/flannel/options.env file that was created as part of Flannel setup. It looked like the line in options.env for etcd endpoints should be something like “http://localhost:2379” (with scheme and port included), so I changed that, and a few restarts later things were working like they should. At first I suspected that options.env wasn’t being read properly or something, but I got a better explanation of what I was doing wrong from an issue that was filed with flanneld.

Here’s what my flanneld options.env (@ /etc/flannel/options.env) looks like:

FLANNELD_IFACE=<my machine ip>
FLANNELD_ETCD_ENDPOINTS=http://localhost:2379

Step 4.3: Start up kubelet.service

My configuration looked pretty much just like the one in the documentation:

[Service]
Environment=KUBELET_IMAGE_TAG=v1.7.2_coreos.0
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/run/kubelet-pod.uuid \
  --volume var-log,kind=host,source=/var/log \
  --mount volume=var-log,target=/var/log \
  --volume dns,kind=host,source=/etc/resolv.conf \
  --mount volume=dns,target=/etc/resolv.conf"
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/usr/bin/mkdir -p /var/log/containers
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --register-node=true \
  --cni-conf-dir=/etc/kubernetes/cni/net.d \
  --network-plugin=${NETWORK_PLUGIN} \
  --container-runtime=docker \
  --allow-privileged=true \
  --read-only-port=0 \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --hostname-override=<ip of the machine> \
  --cluster_dns=10.3.1.1 \
  --cluster_domain=cluster.local
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/run/kubelet-pod.uuid
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Check out the kubelet documentation for what a lot of these options do, and the kubelet-wrapper documentation for what the script that this unit runs actually does.

NOTE the use of register-node=true instead of register-schedulable=true – the interface for that changed a little bit, and the documentation hadn’t been updated yet when I last used it.

The next step was to attempt to start the kubelet systemd service (sudo systemctl start kubelet). After the first attempt to start the kubelet service, it produced an error, noting that it couldn’t connect to the Kubernetes API (which is supposed to be on the same machine) – that makes perfect sense of course, because I hadn’t installed anything by that name at that point!

Step 4.4: Double check the manifests for system-level Kubernetes containers

Kubernetes performs just about all its essential functions outside of distributed configuration (etcd) and inter-container networking (flannel) by using internal system-level services (they’re in the kube-system namespace). These system-level services get created by putting their resource configurations in /etc/kubernetes/manifests (or wherever you configure that folder to be). The Kubernetes API is just another one of those system-level services, and it needs to be properly installed. My manifests folder contains the following:

$ ls /etc/kubernetes/manifests/
kube-apiserver.yaml  kube-controller-manager.yaml  kube-proxy.yaml  kube-scheduler.yaml

Here’s what the kube-apiserver.yaml looked like by the end (note this is what the file looks like in the future, not necessarily at this point in the setup):

apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: quay.io/coreos/hyperkube:v1.7.2_coreos.0
    command:
    - /hyperkube
    - apiserver
    - --bind-address=0.0.0.0
    - --etcd-servers=http://127.0.0.1:2379
    - --allow-privileged=true
    - --service-cluster-ip-range=10.3.0.0/16
    - --secure-port=6443
    - --advertise-address=xxx.xxx.xxx.xxx
    - --feature-gates=PersistentLocalVolumes=true
    - --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota
    - --tls-cert-file=/etc/kubernetes/ssl/apiserver.pem
    - --tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --client-ca-file=/etc/kubernetes/ssl/ca.pem
    - --service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --runtime-config=extensions/v1beta1/networkpolicies=true
    - --runtime-config=batch/v2alpha1=true
    - --anonymous-auth=false
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        port: 8080
        path: /healthz
      initialDelaySeconds: 15
      timeoutSeconds: 15
    ports:
    - containerPort: 6443
      hostPort: 6443
      name: https
    - containerPort: 8080
      hostPort: 8080
      name: local
    volumeMounts:
    - mountPath: /etc/kubernetes/ssl
      name: ssl-certs-kubernetes
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/ssl
    name: ssl-certs-kubernetes
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host

After making sure all these manifests contained the right configuration (it wasn’t too far from the defaults), I found that checking docker ps after starting the kubelet system service showed the containers that were created for each of the system-level services. I’d like to point out that it’s OK that no ports are linked in the PORTS column of the docker ps output (it surprised me and made me think things were broken for a bit).

Another important thing to note – make sure you check that the right release tag is set for all the kube-* manifests. They should be synchronized; otherwise I’d assume you risk some serious interop issues.
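A quick way to eyeball that is to list the unique image references across the manifests – more than one line of output means something is out of sync. The little check_image_tags helper here is my own invention, not part of any official tooling:

```shell
# Print the unique container image references across a directory of manifests.
# A single line of output means every component runs the same release tag.
check_image_tags() {
    grep -h 'image:' "$1"/*.yaml | sort -u
}

# On the server itself (the path is the default from the guide);
# the || true keeps the demo from aborting if the directory isn't there.
[ -d /etc/kubernetes/manifests ] && check_image_tags /etc/kubernetes/manifests || true
```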

Step 4.5: Debug kubelet.service

The kubelet service seemed to be started, and I saw containers running, but the API was still unreachable. This led to a lot of debugging of the systemd service using sudo systemctl status kubelet and sudo journalctl -xe -u kubelet. After searching the logs (and seeing a ton of repeating messages about the API server being unreachable), I found the culprit rather quickly:

Aug 06 15:35:30 localhost kubelet-wrapper[7642]: Flag --api-servers has been deprecated, Use --kubeconfig instead. Will be removed in a future version.
Aug 06 15:35:30 localhost kubelet-wrapper[7642]: Flag --register-schedulable has been deprecated, will be removed in a future version

I really do appreciate the very clear error messages – it is obvious what the problem is, and this made it easy to fix. I can’t say enough good things about projects that take the time to make their software this easy to use. After fixing the issue with the deprecated kubelet options, the next issue I ran into was that the API server itself seemed to be thrashing (starting and stopping repeatedly) in the logs for kubelet.

Aug 06 15:40:34 localhost kubelet-wrapper[8266]: I0806 15:40:34.272073    8266 kuberuntime_manager.go:500] pod "kube-apiserver-xxx.xxx.xxx.xxx_kube-system(924557a9880230aa04b6f0b364e4c745)" container "kube-apiserver" is unhealthy, it will be killed and re-created.
Aug 06 15:40:34 localhost kubelet-wrapper[8266]: I0806 15:40:34.303192    8266 kuberuntime_manager.go:741] checking backoff for container "kube-apiserver" in pod "kube-apiserver-xxx.xxx.xxx.xxx_kube-system(924557a9880230aa04b6f0b364e4c745)"

At this point, that kind of error indicated to me that something with the API server was misconfigured, so it was time to head to /etc/kubernetes/manifests and figure out what was wrong (maybe it’s more deprecated config?). It turned out I had entered the configuration for the etcd server incorrectly here (much like I did earlier in the Flannel config) – the scheme and port needed to be specified (note that in the configuration above the problem is fixed, since it’s from the future).

One more error for good luck:

Aug 06 15:43:48 localhost kubelet-wrapper[8266]: E0806 15:43:48.617385    8266 kubelet.go:1621] Unable to mount volumes for pod "kube-apiserver-xxx.xxx.xxx.xxx_kube-system(924557a9880230aa04b6f0b364e4c745)": timeout expired waiting for volumes to attach/mount for pod "kube-system"/"kube-apiserver-xxx.xxx.xxx.xxx". list of unattached/unmounted volumes=[ssl-certs-kubernetes ssl-certs-host]; skipping pod

At this point I wasn’t sure what was causing this (the volume specification was right, and the files were indeed there), but I restarted kubelet to give the API server a chance to start again, and everything started working. The first taste of success, and it certainly tastes good!

Step 5: Restarting and enabling everything

Once Kubernetes was actually running properly, it was time to make sure the services were always running. As always, that’s a breeze with systemd:

systemctl enable etcd-member
systemctl enable flanneld
systemctl enable kubelet

NOTE At this point I’m not sure everything is working properly, but I can at least tell from the logs that everything is running without errors, so I thought this was a good place to stop and make my changes permanent/survive restarts.

Step 6: Sanity testing

As the documentation was very easy to follow and I didn’t spend days tied up in a problem I didn’t understand, I instantly became very suspicious that things weren’t actually working. My first idea was to run some small pod on Kubernetes to make sure things were working, but I figured before I even tried a test like that, I should confirm that the basics (like access to the API in the first place) were working, which means I ran:

$ curl localhost:8080/version
{
  "major": "1",
  "minor": "7",
  "gitVersion": "v1.7.2+coreos.0",
  "gitCommit": "c6574824e296e68a20d36f00e71fa01a81132b66",
  "gitTreeState": "clean",
  "buildDate": "2017-07-24T23:28:22Z",
  "goVersion": "go1.8.3",
  "compiler": "gc",
  "platform": "linux/amd64"
}

So that’s good news – it looks like the API is running on my machine at the very least, and serving requests. Remember, at the end of the day, Kubernetes manages itself with some program that runs on a port on your computer and takes configuration changes for your cluster over HTTP.

The documentation also suggests running a command that basically uses the API to list the pods:

$ curl -s localhost:10255/pods | jq -r '.items[].metadata.name'
kube-apiserver-xxx.xxx.xxx.xxx
kube-controller-manager-xxx.xxx.xxx.xxx
kube-proxy-xxx.xxx.xxx.xxx
kube-scheduler-xxx.xxx.xxx.xxx

If you’ve never encountered jq, it’s a tool that helps format and deal with JSON output on the command line. It’s a useful tool, especially as JSON continues to be a very popular data exchange format.
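As a tiny illustration of jq (assuming it’s installed), here’s the same filter run against a hand-built sample payload shaped like the kubelet’s /pods response:

```shell
# A hand-written sample shaped like the kubelet /pods response
sample='{"items":[{"metadata":{"name":"kube-apiserver-node1"}},{"metadata":{"name":"kube-proxy-node1"}}]}'

# -r prints raw strings instead of JSON-quoted ones;
# .items[].metadata.name pulls the name out of every item in the array
echo "$sample" | jq -r '.items[].metadata.name'
```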

Getting this output back is very encouraging, and does enough to convince me that the Kubernetes API is running at the very least and accessible locally. At this point, with connectivity established, I discovered (the hard way) that CoreOS only comes with iptables, not ufw, which makes me very sad. I initially set up the API to receive on port 443 (the default SSL/TLS port), so I hit the port just to make sure nothing was exposed that shouldn’t have been, and I got a 4xx Unauthorized right away, so that was a good sign of a reasonable level of security (no other ports were opened to the outside world by kubelet, so far).
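Since ufw isn’t there, raw iptables it is. For what it’s worth, a minimal single-node ruleset in iptables-restore format might look something like this (the ports and policies here are my assumptions for my own setup, not a recommendation from any guide):

```
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
# Always allow loopback and established connections
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# SSH and the secure Kubernetes API port
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 6443 -j ACCEPT
COMMIT
```

Note the FORWARD chain is left open since flannel/docker need to move container traffic through it.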

Step 7: Set up kubectl

Of course, configuration of Kubernetes is not done only through the HTTP API – they’ve also provided a tool called kubectl that gives a command line interface to Kubernetes and has some great developer ergonomics. The documentation on how to set up kubectl is just about all you need to set it up. On my home computer this amounted to:

  1. curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.7.2/bin/linux/amd64/kubectl
  2. Verifying, installing, and running the binary on my system (yay running binaries from the internet… :|)

To access a remote Kubernetes master node (my only node in this case), I needed to copy over some results of the TLS setup to my own machine (home computer).

NOTE A friendly tip: nmap served me very well as a debugging tool throughout my server setup – I often wanted to make absolutely sure only the ports I expected to be open were open on the server, and nmap performed its function beautifully. While it is not OK to go around nmapping and poking around at other people’s servers, doing that to your own is no problem (and was massively helpful to me as a way to sanity check my kubernetes/firewall configuration).

One of the first issues I ran into was a malformed HTTP response error while trying to run kubectl:

Unable to connect to the server: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

The actual issue turned out to be that I forgot to configure kubectl correctly (I forgot the “https://”):

kubectl config set-cluster default-cluster --server=https://<my server ip> --certificate-authority=/absolute/path/to/ca.pem

(Needless to say, this is after going through the documentation and doing all the kubectl config commands there.)
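For completeness, the full sequence of kubectl config commands from the guide looks roughly like this (the cluster/user/context names are the guide’s defaults; the IP and certificate paths are placeholders for your own):

```
kubectl config set-cluster default-cluster \
    --server=https://<my server ip> \
    --certificate-authority=/absolute/path/to/ca.pem
kubectl config set-credentials default-admin \
    --certificate-authority=/absolute/path/to/ca.pem \
    --client-key=/absolute/path/to/admin-key.pem \
    --client-certificate=/absolute/path/to/admin.pem
kubectl config set-context default-system \
    --cluster=default-cluster --user=default-admin
kubectl config use-context default-system
```

These commands just write entries into ~/.kube/config, so you can also inspect that file directly to see what you ended up with.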

After doing these things, kubectl was working from my home computer, accessing the remote server properly:

$ kubectl get nodes
No resources found.

I’ve never been more excited to receive what’s essentially a 404 before in my life.

Step 8: Relax, take a breather, and get ready for what’s next

At this point, after battling through the documentation and shoring up my understanding of Kubernetes, CoreOS, and all the other tech involved along the way, I finally had a running, single node Kubernetes cluster up and accessible from my home machine. I went ahead and had a long nap to try and let things sink in, and got ready for all the things that were immediately next:

  • Getting cluster-level DNS working
  • Creating my first pod/service/deployments configurations and getting them running on the cluster
  • Becoming familiar with tools like kubectl proxy since I wasn’t ready to expose Kubernetes to the internet yet
  • Setting up some dashboard that could make it a little easier to monitor performance/how the cluster is configured
  • Exposing the cluster to the internet

After all this work, there’s of course the work of actually porting my existing programs and infrastructure pieces to properly run on Kubernetes. This means figuring out how to port not only stateless web servers, but also databases, caching layers, utilities that I didn’t build (like piwik), and services like email over SMTP and running them The Right Way ™ on Kubernetes. Stay tuned, because getting Kubernetes running was just the beginning!

Reflections on the steps so far

At this point in my discovery and use of Kubernetes, I was pretty pleased with the documentation (I can only imagine how many man hours went into making it so good), and how the concepts and guides for Kubernetes fit together. I was never lost for very long, and often when I found what I needed to do, it seemed almost obvious, given what I knew about how the infrastructure was supposed to fit together.

For the most part, this post could be summarized as “follow the manual”, but I wanted to share at least some of the issues I ran into, for those who might want to set up Kubernetes but might not want to go the Tectonic route just yet.