tl;dr - UDP support is coming to traefik soon, so I’m updating my cluster’s traefik to be ready to take advantage of it and all the other new features. Going from v1 -> v2.2 (the latest) requires some config changes so I detail them below.
NGINX is one of the most venerated load balancers on the internet, and when I first set up my tiny kubernetes cluster I used it. While it’s got lots of benefits (like being able to debug it by just entering the container and poking around /etc/nginx), I started looking at some of the other options to fill the spot of my kubernetes Ingress Controller. One of those options is Traefik and it’s worked great for me (which means for the most part I’ve not had to deal with it).
Traefik is written in Go, which I was initially fanatical about because of how much simpler it was than some of the other choices (coughJavacough) in the space and how it easily statically compiled, built fast, etc. While I’ve cooled on Golang somewhat (Rust is where it’s at these days), projects like Traefik make it undeniable that Golang has contributed a huge amount to being able to easily write good, efficient software. As far as I can tell Traefik is performant and easy to deploy, and has an approachable code base.
Of course it’s a good idea to stay up to date with newer versions of the software that your infrastructure depends on – security concerns, more efficiency, etc. Upgrading does carry risk, however, and I sure do dislike it whenever I take down my cluster, even if it’s not running anything too important. I’m particularly motivated to upgrade Traefik to v2 because of the prospect of it supporting UDP in the near future. I’ve also posted to reddit about this, and people shared the enthusiasm (if upvotes are any indicator).
Let’s jump right in – in usual Kubernetes style, Traefik is my cluster’s ingress controller – this means it runs as a DaemonSet on every node in my cluster and mediates access from the outside world.
The Traefik DaemonSet gets its configuration in the usual way – from a ConfigMap. Here’s what the current one (for v1) looks like:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-config
  namespace: ingress
data:
  traefik.toml: |
    defaultEntryPoints = ["http","https"]
    debug = false
    logLevel = "INFO"
    #Config to redirect http to https
    [entryPoints]
      [entryPoints.http]
      address = ":80"
      compress = true
      [entryPoints.https]
      address = ":443"
      compress = true
        [entryPoints.https.tls]
      [entryPoints.admin]
      address = ":9999"
      compress = true
        [entryPoints.admin.auth.basic]
        users = [
          "admin:<bcrypt'ed password>",
        ]
    [api]
    entrypoint = "admin"
      [api.statistics]
      recentErrors = 10
    [kubernetes]
    # Only create ingresses where the object has traffic-type: external label
    # labelselector = "traffic-type=external"
    [metrics]
      [metrics.prometheus]
      buckets=[0.1,0.3,1.2,5.0]
      entryPoint = "traefik"
    [ping]
    entryPoint = "http"
    [accessLog]
Side note – it’s awesome that Traefik lets you specify some pre-existing users for the basic auth. All you have to do is drop those <user>:<bcrypt'ed password> entries into entryPoints.admin.auth.basic.users and you’re good to go! It’s pretty amazing. This is one of the things I wish all self-hostable tools supported.
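For anyone wondering where such an entry comes from, here is a minimal sketch of generating one. It assumes the htpasswd utility (shipped in apache2-utils or httpd-tools) is installed; the function name make_basic_auth_entry and the placeholder password are made up for illustration and are not part of my setup.

```shell
# Print a "user:hash" line suitable for a basic auth users list.
# Assumption: htpasswd (apache2-utils / httpd-tools) is available on PATH.
make_basic_auth_entry() {
  user=$1
  password=$2
  if ! command -v htpasswd >/dev/null 2>&1; then
    echo "htpasswd not found; install apache2-utils or httpd-tools" >&2
    return 1
  fi
  # -n: print to stdout instead of writing a file
  # -b: take the password from the command line
  # -B: hash the password with bcrypt
  # grep drops the trailing blank line that htpasswd emits.
  htpasswd -nbB "$user" "$password" | grep -v '^$'
}

# Usage (placeholder password):
#   make_basic_auth_entry admin 'change-me'
```

Passing the password as an argument leaves it visible in the process list and shell history, so for anything serious it is probably better to let htpasswd prompt for it instead.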
Also TOML is the best configuration language out there – change my mind. It’s got the best mix of human readability along with allowing for complex structures. INI config style headers are super important, and the nesting you can do, plus the targeting you can do for a section (ex. [metrics.prometheus]), is awesome.
Traefik has excellent documentation and of course there’s a migration guide for v1 to v2. They even have a tool called traefik-migration-tool for doing the migration! Unfortunately, running it gives me a bunch of warnings about manual conversion:
$ traefik-migration-tool static -i traefik.toml
Compress on entry point "http" must be converted manually. See https://docs.traefik.io/middlewares/compress/
Compress on entry point "https" must be converted manually. See https://docs.traefik.io/middlewares/compress/
TLS on entry point "https" must be converted manually. See https://docs.traefik.io/routing/routers/#tls
Compress on entry point "admin" must be converted manually. See https://docs.traefik.io/middlewares/compress/
The entry point (admin) defined in API must be converted manually. See https://docs.traefik.io/operations/api/
Despite those warnings there is a generated static folder which houses the traefik static configuration, and it looks like this:
$ tree static/
static/
├── new-traefik.toml
└── new-traefik.yml
0 directories, 2 files
Inside, new-traefik.toml looks like this:
[global] # NOTE: SEE the BUG: in this post, this should not have been there, though it was generated
[entryPoints]
[entryPoints.admin]
address = ":9999"
[entryPoints.http]
address = ":80"
[entryPoints.https]
address = ":443"
[providers]
providersThrottleDuration = "2s"
[providers.kubernetesIngress]
throttleDuration = "0s"
[api]
insecure = true
[metrics]
[metrics.prometheus]
buckets = [0.1, 0.3, 1.2, 5.0]
entryPoint = "traefik"
[ping]
entryPoint = "http"
[log]
level = "INFO"
[accessLog]
bufferingSize = 0
And that’s what the new configuration at least should look like at the base level!
For reference, the relevant warning lines:
Compress on entry point "http" must be converted manually. See https://docs.traefik.io/middlewares/compress/
Compress on entry point "https" must be converted manually. See https://docs.traefik.io/middlewares/compress/
Compress on entry point "admin" must be converted manually. See https://docs.traefik.io/middlewares/compress/
It looks like v2 has changed how this is set up – surely the compress middleware/plugin is still present. Taking a look at the Traefik documentation site reveals the Compress middleware that I’m looking for. Interestingly enough, I have two choices on how to implement this – I can either use a Kubernetes CRD like so:
# Enable gzip compression
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: test-compress
spec:
  compress: {}
Alternatively I can include it in the static (file) configuration, like so:
# Enable gzip compression
[http.middlewares]
[http.middlewares.test-compress.compress]
Since I want compression on all the traffic (at least for now), I’m going to go ahead and enable it at the file level. I’m not actually sure that I have all the required CRDs installed (which reminds me, I’ll need to pull and dissect the Traefik install YAML and see if there are any bits that I don’t already have for V2). I won’t show the updated file here, but I added these lines to the TOML file that was generated earlier (also, a section for https).
For reference, the relevant warning lines:
TLS on entry point "https" must be converted manually. See https://docs.traefik.io/routing/routers/#tls
Obviously the first thing to do is to visit and read the page that was so helpfully mentioned in the warning – so I took some time to do that and get introduced to the changes. Well, it looks like TLS configuration has changed a lot, but it’s actually quite unclear how my static configuration needs to change. As the documentation notes, I have an [entryPoints.https] section set up already, listening on :443… If I take a look back at the previous section, the config did not actually require any special TLS options; I believe they were provided via the related k8s Ingress resource.
I think I should be safe leaving this section alone – I don’t need to change the default min versions and other TLS options.
For reference, the relevant warning lines:
The entry point (admin) defined in API must be converted manually. See https://docs.traefik.io/operations/api/
The “admin” site is now called the “dashboard” (maybe it always was?) and has seen a significant refresh – looking at the linked page, the guide lets you know how to set up the “Secure mode”. First you add the [api] section to the static configuration like so:
[api]
dashboard = true
This isn’t really needed, because the dashboard is set to enabled by default. Then we need a routing configuration on Traefik attached to a service called api@internal. It looks like this configuration must be dynamic, however, so I think I’ll need to create a new file (and figure out how to make traefik find it). The configuration I need looks like this:
# Dynamic Configuration
[http.routers.my-api]
rule = "Host(`traefik.domain.com`)"
service = "api@internal"
middlewares = ["auth"]
[http.middlewares.auth.basicAuth]
users = [
"test:$apr1$H6uskkkW$IgXLP6ewTrSuBkTrqE8wj/",
"test2:$apr1$d9hr9HBB$4HxwgUir3HP4EsggP/QNo0",
]
So very similar to the old stuff, but the biggest difference is that this needs to be part of the dynamic configuration. It also looks like the router is under http so I think that should be https since I want the dashboard to be protected by TLS, and I’ll probably change some names around. Well anyway, let’s look at how the dynamic configuration works. It looks like I’ll have to add a provider for it like so:
[providers.file]
directory = "/etc/traefik/config/dynamic"
I’ll also update the ConfigMap (and update the mount path) to have some files that end up in this location. Here is what the changes look like:
traefik.ds.yaml:
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: traefik-ingress-controller
# ... lots of yaml ... #
      - image: traefik:v1.7.20-alpine
        name: traefik-ingress-lb
        ports:
        args:
        - --logLevel=INFO
        - --configfile=/etc/traefik/config/static/traefik.toml
# ... more yaml ... #
        volumeMounts:
        - mountPath: /etc/traefik/config/static
          name: static-config
        - mountPath: /etc/traefik/config/dynamic # this is new
          name: dynamic-config
      volumes:
      - name: static-config
        configMap:
          name: traefik-static-config
      - name: dynamic-config
        configMap:
          name: traefik-dynamic-config
traefik-dynamic.configmap.yaml:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-dynamic
  namespace: ingress
data:
  traefik.toml: |
    # Dynamic Configuration
    [https.routers.api]
    rule = "Host(`traefik.vadosware.io`)"
    service = "api@internal"
    middlewares = ["auth"]
    [https.middlewares.auth.basicAuth]
    users = [
      "admin:<bcrypt'ed password>",
    ]
I’ve updated the volumes and volumeMounts sections to make the split between the static and dynamic configurations clearer, and added the new dynamic entries. I’ll also have to pass in a path to the static config file in the daemonset via the --configFile option to traefik itself.
I use the “Makeinfra” pattern, which means that all my infrastructure is managed with Makefiles – while this seems like a terrible idea, it’s very easy and obvious for me to manage, which makes it a win for me. When I see a “please kubectl -f …” style of install instruction, I use make instead.
In traefik’s case there is a Helm chart to look through. After looking through it, I’ve decided that I probably don’t need to take anything from it because I won’t be using the CRD flavor approach for configuring my services (for now). I feel the opposite from the community on this point, reading the excerpt on the Kubernetes CRD configuration page:
However, as the community expressed the need to benefit from Traefik features without resorting to (lots of) annotations, we ended up writing a Custom Resource Definition (alias CRD in the following) for an IngressRoute type, defined below, in order to provide a better way to configure access to a Kubernetes cluster.
I’m actually totally fine using annotations – so I’ll skip all the changed CRD stuff. Here are the other changes I made:
- Deployment + Service now – but I definitely need a DaemonSet – I want traefik to be active on every node in the cluster (I think the difference in assumption here is that I’d be using a LoadBalancer available from one of the big cloud boys, so I’ll ignore that bit)
- strategy (RollingUpdate)
- readinessProbe and livenessProbe sections along with making sure to add the --ping arg (and if you look back, ping is enabled on the static configuration)
- resources section with more generous requests and limits
- pods was missing (I skipped the CRD stuff)
[global] cannot be a standalone element
Once I tried to deploy I was hit with this error:
$ k logs traefik-ingress-controller-kjhht -n ingress
2020/03/07 02:14:19 command traefik error: global cannot be a standalone element (type *static.Global)
The [global] in the generated Traefik config cannot be a standalone element, evidently… Despite it coming straight out of the generated config – maybe this was something that it just copied from my previous config that needed to be deleted.
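To avoid finding this out from a crash-looping pod, a small pre-deploy cleanup can be sketched in shell. This is an assumption-heavy sketch rather than anything Traefik ships: the function name strip_empty_global is made up, and it only looks at the single line following the [global] header to decide whether the table is empty.

```shell
# Print the TOML file given as $1, dropping a [global] header that has no
# keys under it. The header counts as empty when the next line is another
# table header, a comment, a blank line, or the end of the file.
strip_empty_global() {
  awk '
    held {
      # The line after the held header decides whether the header survives.
      if ($0 !~ /^[ \t]*(\[|#)/ && $0 !~ /^[ \t]*$/) print header
      held = 0
    }
    /^[ \t]*\[global\]/ { header = $0; held = 1; next }
    { print }
  ' "$1"
}

# Usage (file names are placeholders):
#   strip_empty_global static/new-traefik.toml > traefik.toml
```

Because it only peeks one line ahead, a [global] table whose keys come after a comment or blank line would lose its header, so the output is worth a quick look before deploying.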
By creating an http entrypoint I disabled the automatic redirection that Traefik was doing to point all websites to their https:// versions. I noticed this when I checked http://statping.vadosware.io – I got a 404, when in the past the result was always a redirect to https://statping.vadosware.io (which then served the site). As always, the Traefik documentation is fantastic, and had a section I should have read more closely. Unfortunately this part of Traefik got a little harder to use, and others have noticed.
In the end, instead of going with the partial solution, I just upgraded to v2.2.0-rc1, which contains a better fix from a merged PR – a global entry point redirection. Taking advantage of it meant adding the following to my static config:
[entryPoints.http]
address = ":80"
[entryPoints.http.http.redirections.entryPoint]
to = "https"
scheme = "https"
NOTE: the http.http is required because I decided to call my HTTP entrypoint “http” and the subkey that needs to be changed is also called http.
Looks like I picked exactly the right day to upgrade, because this would have been a huge PITA otherwise. After doing that, I kubectl apply the daemonset, and sometimes, to force a re-deploy of the ingress controller, I run a little command I keep handy:
$ kubectl patch ds/traefik-ingress-controller -n ingress -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"deployDate\":\"`date +'%s'`\"}}}}}"
daemonset.apps/traefik-ingress-controller patched
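The backslash escaping in that one-liner is easy to get wrong, so here is a sketch of the same trick with the JSON built by printf instead. The function names deploy_date_patch and force_redeploy are made up for illustration; the patch body is the same one used above.

```shell
# Build the patch that bumps a throwaway label on the pod template.
# $1 is the label value, normally a Unix timestamp.
deploy_date_patch() {
  printf '{"spec":{"template":{"metadata":{"labels":{"deployDate":"%s"}}}}}' "$1"
}

# Patch the DaemonSet so its pod template changes and the pods get replaced.
# Assumes kubectl is pointed at the right cluster.
force_redeploy() {
  kubectl patch ds/traefik-ingress-controller -n ingress \
    -p "$(deploy_date_patch "$(date +%s)")"
}

# Usage:
#   force_redeploy
```

If your kubectl is 1.15 or newer, kubectl rollout restart daemonset/traefik-ingress-controller -n ingress should achieve much the same thing without a hand-written patch.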
After this it took a little while but the HTTP -> HTTPS redirection started working again.
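Rather than refreshing a browser to see whether the redirect is back, the check can be scripted. This is a rough sketch that assumes curl is installed; the function names is_https_redirect and check_redirect are made up, and the hostname in the usage line is just the one from my cluster.

```shell
# Succeed when the status code is a 3xx and the target starts with https://.
is_https_redirect() {
  case "$1" in
    3??) ;;
    *) return 1 ;;
  esac
  case "$2" in
    https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask curl for the status code and the URL it would be redirected to,
# without actually following the redirect.
check_redirect() {
  host=$1
  result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "http://$host/")
  status=${result%% *}
  location=${result#* }
  is_https_redirect "$status" "$location"
}

# Usage:
#   check_redirect statping.vadosware.io && echo "redirecting to https"
```

A 404 like the one I was getting earlier makes check_redirect fail, which is handy in a loop while waiting for the new pods to come up.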
Here are all the added and updated configuration files (unchanged files, like the namespace configuration are not included here) when everything is said & done:
traefik-static.configmap.yaml:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-static
  namespace: ingress
data:
  traefik.toml: |
    [entryPoints]
      [entryPoints.http]
      address = ":80"
        [entryPoints.http.http.redirections.entryPoint]
        to = "https"
        scheme = "https"
      [entryPoints.https]
      address = ":443"
    [providers]
    providersThrottleDuration = "2s"
      [providers.kubernetesIngress]
      throttleDuration = "0s"
      [providers.file]
      directory = "/etc/traefik/config/dynamic"
    [metrics]
      [metrics.prometheus]
      buckets = [0.1, 0.3, 1.2, 5.0]
      entryPoint = "traefik"
    [ping]
    entryPoint = "http"
    [log]
    level = "INFO"
    [accessLog]
    bufferingSize = 0
    [http.middlewares]
      [http.middlewares.all-compress.compress]
    [https.middlewares]
      [https.middlewares.all-compress.compress]
traefik-dynamic.configmap.yaml:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-dynamic
  namespace: ingress
data:
  traefik.toml: |
    # Dynamic Configuration
    [https.routers.api]
    rule = "Host(`traefik.vadosware.io`)"
    service = "api@internal"
    middlewares = ["auth"]
    [https.middlewares.auth.basicAuth]
    users = [
      "admin:<bcrypt'ed password>",
    ]
traefik.rbac.yaml:
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: ingress
traefik.ds.yaml:
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: traefik-ingress-controller
  namespace: ingress
  labels:
    k8s-app: traefik-ingress-lb
spec:
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
      name: traefik-ingress-lb
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-lb
        name: traefik-ingress-lb
    spec:
      hostNetwork: true
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 60
      containers:
      - image: traefik:v2.2
        name: traefik-ingress-lb
        args:
        - --logLevel=INFO
        - --configFile=/etc/traefik/config/static/traefik.toml
        - --ping=true
        # PORTS
        ports:
        - name: http
          containerPort: 80
          hostPort: 80
        - name: https
          containerPort: 443
          hostPort: 443
        # RESOURCES
        resources:
          requests:
            cpu: 500m
            memory: 128Mi
          limits:
            cpu: 2
            memory: 2Gi
        # SECURITY
        securityContext:
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
        # PROBES
        livenessProbe:
          httpGet:
            path: /ping
            port: 80
          failureThreshold: 1
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        readinessProbe:
          httpGet:
            path: /ping
            port: 80
          failureThreshold: 1
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        # VOLUMES
        volumeMounts:
        - mountPath: /etc/traefik/config/static
          name: static-config
        - mountPath: /etc/traefik/config/dynamic
          name: dynamic-config
      volumes:
      - name: static-config
        configMap:
          name: traefik-static
      - name: dynamic-config
        configMap:
          name: traefik-dynamic
And a make command later, everything’s running:
$ make
kubectl apply -f traefik.ns.yaml
namespace/ingress unchanged
kubectl apply -f traefik.serviceaccount.yaml
serviceaccount/traefik-ingress-controller unchanged
kubectl apply -f traefik.rbac.yaml
clusterrole.rbac.authorization.k8s.io/traefik-ingress-controller unchanged
clusterrolebinding.rbac.authorization.k8s.io/traefik-ingress-controller unchanged
kubectl apply -f traefik-static.configmap.yaml
configmap/traefik-static unchanged
kubectl apply -f traefik-dynamic.configmap.yaml
configmap/traefik-dynamic unchanged
kubectl apply -f traefik.ds.yaml
daemonset.apps/traefik-ingress-controller configured
kubectl apply -f traefik.svc.yaml
service/traefik-ingress-service configured
$ k get pods -n ingress
NAME READY STATUS RESTARTS AGE
traefik-ingress-controller-kjhht 1/1 Running 4 2m3s
And if we do a kubectl get all -n ingress:
$ k get all -n ingress
NAME READY STATUS RESTARTS AGE
pod/traefik-ingress-controller-kjhht 1/1 Running 4 8m30s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/traefik-ingress-service NodePort 10.98.134.235 <none> 80:32492/TCP,443:30165/TCP 453d
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/traefik-ingress-controller 1 1 1 1 1 <none> 76d
And a quick peek at the logs for the traefik instance:
$ k logs traefik-ingress-controller-kjhht -n ingress
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/next-totejo: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/totejo: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress monitoring/statping: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/fathom: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
<THOUSANDS OF LINES OF THE ABOVE WARNINGS>
Uh oh, better do that real quick!
extensions/v1beta1 to networking.k8s.io/v1beta1 instead
Traefik was complaining about it quite a bit – so I went through and changed the Ingress resource apiVersions to be networking.k8s.io/v1beta1. After changing my resources though I found a very peculiar behavior – kubectl applying the new apiVersion has no effect, and the resources show up with extensions/v1beta1 instead of what is inside the resource file. At first I thought this was a bug – I don’t expect my resource to be silently changed, but this is how a bunch of features in kubernetes work (ex. admission controllers).
This behavior is, unsurprisingly, not a bug. The version I’m running right now is in between and serves Ingress resources under both API groups, so I guess these warnings will only go away in the future when I update the cluster and extensions/v1beta1 is removed entirely. For now I’m going to just… ignore these warnings.
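For the manifests themselves, the apiVersion change is mechanical enough to script. Here is a minimal sketch, assuming one resource per file with apiVersion and kind at the start of their lines; the function name bump_ingress_api_version is made up for illustration.

```shell
# Print the manifest given as $1 with its apiVersion moved to
# networking.k8s.io/v1beta1, but only if the file declares an Ingress.
# Other kinds that still use extensions/v1beta1 are printed unchanged.
bump_ingress_api_version() {
  if grep -q '^kind: Ingress' "$1"; then
    sed 's|^apiVersion: extensions/v1beta1$|apiVersion: networking.k8s.io/v1beta1|' "$1"
  else
    cat "$1"
  fi
}

# Usage (file names are placeholders):
#   bump_ingress_api_version statping.ingress.yaml > statping.ingress.new.yaml
```

It writes to stdout rather than editing in place, so the result can be diffed against the original before anything gets applied.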
Hopefully this post helps some that might have been procrastinating on switching over from Traefik v1 to v2 – it was a relatively simple process, so I don’t know why I put it off so long (well I know why but that’s a discussion for another time)!