Updating From Traefik V1 to V2.2

tl;dr - UDP support is coming to Traefik soon, so I’m updating my cluster’s Traefik to be ready to take advantage of it and all the other new features. Going from v1 -> v2.2 (the latest) requires some config changes, which I detail below.

NGINX is one of the most venerated load balancers on the internet, and when I first set up my tiny Kubernetes cluster I used it. While it has lots of benefits (like being able to debug it by just entering the container and poking around /etc/nginx), I started looking at other options to fill the spot of my Kubernetes Ingress Controller. One of those options is Traefik, and it’s worked great for me (which mostly means I haven’t had to deal with it).

A sidenote on the use of Go

Traefik is written in Go, which I was initially fanatical about because of how much simpler it was than some of the other choices in the space (*cough* Java *cough*), how easily it compiled to static binaries, how fast it built, etc. While I’ve cooled on Go somewhat (Rust is where it’s at these days), projects like Traefik make it undeniable that Go has contributed hugely to making good, efficient software easy to write. As far as I can tell Traefik is performant, easy to deploy, and has an approachable code base.

Why upgrade to v2?

Of course, it’s a good idea to keep the software your infrastructure depends on up to date (security fixes, efficiency improvements, etc.). Upgrading does carry risk, however, and I sure do dislike taking down my cluster, even if it’s not running anything too important. I’m particularly motivated to upgrade Traefik to v2 because of the prospect of it supporting UDP in the near future. I’ve also posted about this on Reddit, and people seemed to share the enthusiasm (if upvotes are any indicator).

The old config

Let’s jump right in. In usual Kubernetes style, Traefik is my cluster’s ingress controller, which in my setup means it runs as a DaemonSet on every node in the cluster and mediates access from the outside world.

The Traefik DaemonSet gets its configuration the usual way, from a ConfigMap. Here’s what the current one (for v1) looks like:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-config
  namespace: ingress
data:
  traefik.toml: |
    defaultEntryPoints = ["http","https"]
    debug = false
    logLevel = "INFO"

    #Config to redirect http to https
    [entryPoints]
      [entryPoints.http]
        address = ":80"
        compress = true
      [entryPoints.https]
        address = ":443"
        compress = true
        [entryPoints.https.tls]

      [entryPoints.admin]
        address = ":9999"
        compress = true
        [entryPoints.admin.auth.basic]
          users = [
            "admin:<bcrypt'ed password>",
          ]

    [api]
      entrypoint = "admin"

      [api.statistics]
        recentErrors = 10

    [kubernetes]
      # Only create ingresses where the object has traffic-type: external label
      # labelselector = "traffic-type=external"

    [metrics]
      [metrics.prometheus]
      buckets=[0.1,0.3,1.2,5.0]
      entryPoint = "traefik"

    [ping]
      entryPoint = "http"

    [accessLog]

Side note: isn’t it awesome that Traefik lets you specify pre-existing users for basic auth? All you have to do is drop <user>:<bcrypt'ed password> entries into entryPoints.admin.auth.basic.users and you’re good to go! This is one of the things I wish all self-hostable tools supported.
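That <user>:<hash> format carries over to the v2 basicAuth middleware, and Traefik’s docs say the passwords must be hashed with MD5, SHA1 or BCrypt; Apache’s htpasswd produces such entries (for example htpasswd -nB admin for bcrypt). Here’s a small pre-flight check, a sketch rather than anything Traefik provides, that catches a pasted plaintext password before it reaches a ConfigMap. The prefix table is my assumption about how those three formats usually start, and the bcrypt-looking entry is a made-up placeholder.

```python
from typing import Optional

# Hypothetical pre-flight check, not part of Traefik.
# Assumed prefixes for the MD5 (htpasswd apr1), SHA1 and bcrypt formats.
HASH_PREFIXES = {
    "$apr1$": "md5-apr1",
    "{SHA}": "sha1",
    "$2a$": "bcrypt",
    "$2b$": "bcrypt",
    "$2y$": "bcrypt",
}


def hash_scheme(entry: str) -> Optional[str]:
    """Return the hash scheme of a '<user>:<hash>' entry, or None if it looks wrong."""
    user, sep, hashed = entry.partition(":")
    if not sep or not user:
        return None  # not in <user>:<hash> form at all
    for prefix, scheme in HASH_PREFIXES.items():
        if hashed.startswith(prefix):
            return scheme
    return None  # no known hash prefix: probably a plaintext password


if __name__ == "__main__":
    for entry in [
        "test:$apr1$H6uskkkW$IgXLP6ewTrSuBkTrqE8wj/",  # apr1 (MD5) sample entry
        "admin:$2y$05$placeholder-not-a-real-hash",     # made-up bcrypt-shaped placeholder
        "oops:plaintext-password",
    ]:
        print(entry.split(":", 1)[0], "->", hash_scheme(entry))
```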

Also, TOML is the best configuration language out there – change my mind. It has the best mix of human readability and support for complex structures. The INI-style headers are super important, and the nesting plus the ability to target a nested section directly (e.g. [metrics.prometheus]) are awesome.

The new config

Traefik has excellent documentation, and of course there’s a migration guide for v1 to v2. There’s even a tool called traefik-migration-tool for doing the migration! Unfortunately, running it gives me a bunch of warnings about manual conversion:

$ traefik-migration-tool static -i traefik.toml
Compress on entry point "http" must be converted manually. See https://docs.traefik.io/middlewares/compress/
Compress on entry point "https" must be converted manually. See https://docs.traefik.io/middlewares/compress/
TLS on entry point "https" must be converted manually. See https://docs.traefik.io/routing/routers/#tls
Compress on entry point "admin" must be converted manually. See https://docs.traefik.io/middlewares/compress/
The entry point (admin) defined in API must be converted manually. See https://docs.traefik.io/operations/api/

Despite those warnings, the tool does generate a static folder containing the new Traefik static configuration:

$ tree static/
static/
├── new-traefik.toml
└── new-traefik.yml

0 directories, 2 files

Inside, new-traefik.toml looks like this:

[global] # NOTE: the tool generated this, but it has to be removed (see "BUG: [global] cannot be a standalone element" below)

[entryPoints]
  [entryPoints.admin]
    address = ":9999"
  [entryPoints.http]
    address = ":80"
  [entryPoints.https]
    address = ":443"

[providers]
  providersThrottleDuration = "2s"
  [providers.kubernetesIngress]
    throttleDuration = "0s"

[api]
  insecure = true

[metrics]
  [metrics.prometheus]
    buckets = [0.1, 0.3, 1.2, 5.0]
    entryPoint = "traefik"

[ping]
  entryPoint = "http"

[log]
  level = "INFO"

[accessLog]
  bufferingSize = 0

And that’s what the new configuration should look like, at least at the base level!

Fixing the manual changes

Enabling compression

For reference, the relevant warning lines:

Compress on entry point "http" must be converted manually. See https://docs.traefik.io/middlewares/compress/
Compress on entry point "https" must be converted manually. See https://docs.traefik.io/middlewares/compress/
Compress on entry point "admin" must be converted manually. See https://docs.traefik.io/middlewares/compress/

It looks like v2 has changed how this is set up, but surely compression is still supported. Taking a look at the Traefik documentation site reveals the Compress middleware I’m looking for. Interestingly enough, I have two choices for how to set it up. I can either use a Kubernetes CRD like so:

# Enable gzip compression
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: test-compress
spec:
  compress: {}

Alternatively, I can define it in a TOML configuration file, like so:

# Enable gzip compression
[http.middlewares]
  [http.middlewares.test-compress.compress]

Since I want compression on all traffic (at least for now), I’m going to go ahead and enable it at the file level. I’m not actually sure I have all the required CRDs installed (which reminds me: I’ll need to pull and dissect the Traefik install YAML to see if there are any bits I don’t already have for v2). I won’t show the updated file here, but I added these lines (plus a section for https) to the TOML file that was generated earlier.

TLS on the “https” entrypoint

For reference, the relevant warning lines:

TLS on entry point "https" must be converted manually. See https://docs.traefik.io/routing/routers/#tls

Obviously the first thing to do is to read the page so helpfully linked in the warning, so I took some time to get acquainted with the changes. TLS configuration has changed a lot, but it’s quite unclear how my static configuration needs to change. As the documentation notes, I already have an [entryPoints.https] section listening on :443. Looking back at the previous section, the old config didn’t actually require any special TLS options; I believe they were provided via the related k8s Ingress resources.

I think I should be safe leaving this section alone, since I don’t need to change the default minimum TLS versions or other TLS options.

The Admin app

For reference, the relevant warning lines:

The entry point (admin) defined in API must be converted manually. See https://docs.traefik.io/operations/api/

The “admin” site is now called the “dashboard” (maybe it always was?) and has seen a significant refresh. The linked page explains how to set up “secure mode”. First, you add an [api] section to the static configuration like so:

[api]
  dashboard = true

Setting dashboard = true isn’t really needed, because the dashboard is enabled by default (it’s the presence of the [api] section that turns the API on). Then we need a router attached to a special service called api@internal. This routing configuration must be dynamic, however, so I’ll need to create a new file (and figure out how to make Traefik find it). The configuration I need looks like this:

# Dynamic Configuration
[http.routers.my-api]
  rule = "Host(`traefik.domain.com`)"
  service = "api@internal"
  middlewares = ["auth"]

[http.middlewares.auth.basicAuth]
  users = [
    "test:$apr1$H6uskkkW$IgXLP6ewTrSuBkTrqE8wj/",
    "test2:$apr1$d9hr9HBB$4HxwgUir3HP4EsggP/QNo0",
  ]

So it’s very similar to the old setup; the biggest difference is that this needs to be part of the dynamic configuration. The router is also under http, so I think that should be https since I want the dashboard to be protected by TLS, and I’ll probably change some names around. Anyway, let’s look at how dynamic configuration works. It looks like I’ll have to add a provider for it like so:

[providers.file]
  directory = "/etc/traefik/config/dynamic"

I’ll also update the ConfigMaps (and the mount paths) so that some files end up in this location. Here’s what the changes look like:

traefik.ds.yaml:

---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: traefik-ingress-controller
# ... lots of yaml ... #
      - image: traefik:v1.7.20-alpine
        name: traefik-ingress-lb
        ports:
        args:
        - --logLevel=INFO
        - --configFile=/etc/traefik/config/static/traefik.toml
# ... more yaml ... #
        volumeMounts:
          - mountPath: /etc/traefik/config/static
            name: static-config
          - mountPath: /etc/traefik/config/dynamic # this is new
            name: dynamic-config
      volumes:
      - name: static-config
        configMap:
          name: traefik-static
      - name: dynamic-config
        configMap:
          name: traefik-dynamic

traefik-dynamic.configmap.yaml:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-dynamic
  namespace: ingress
data:
  traefik.toml: |
    # Dynamic Configuration
    [https.routers.api]
      rule = "Host(`traefik.vadosware.io`)"
      service = "api@internal"
      middlewares = ["auth"]

    [https.middlewares.auth.basicAuth]
      users = [
        "admin:<bcrypt'ed password>",
      ]

I’ve updated the volumes and volumeMounts sections to make the split between static and dynamic configuration clearer. I’ll also have to pass the path to the static config file to Traefik itself, via the --configFile option in the DaemonSet.
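Since a configMap volume is looked up by name, the volume’s configMap.name has to match the ConfigMap’s metadata.name exactly; if it doesn’t (say traefik-dynamic-config versus traefik-dynamic), the pod sits in ContainerCreating with a mount error instead of starting. Here’s a small sketch of a cross-check that could live next to the Makefile. It assumes the manifests have already been parsed into dicts (for example by a YAML library, not shown), and the names in it are illustrative.

```python
# Hypothetical cross-check between a pod spec's volumes and the ConfigMaps you apply.
# Assumes the manifests were already parsed into plain dicts.


def missing_configmaps(volumes: list[dict], defined_names: set[str]) -> list[str]:
    """Return configMap names referenced by volumes but not defined anywhere."""
    missing = []
    for volume in volumes:
        ref = volume.get("configMap", {}).get("name")
        if ref is not None and ref not in defined_names:
            missing.append(ref)
    return missing


volumes = [
    {"name": "static-config", "configMap": {"name": "traefik-static"}},
    {"name": "dynamic-config", "configMap": {"name": "traefik-dynamic-config"}},  # deliberate mismatch
]
defined = {"traefik-static", "traefik-dynamic"}  # metadata.name of each ConfigMap

print(missing_configmaps(volumes, defined))
```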

Checking the Kubernetes resources for v2.2 against my current ones

I use the “Makeinfra” pattern, which means all my infrastructure is managed with Makefiles. While this might seem like a terrible idea, it’s very easy and obvious for me to manage, which makes it a win for me. When I see a “please kubectl apply -f” instruction, I instead download the file, break it apart into its constituent resources and orchestrate them with make. This helps me understand much more clearly what a given piece of software is doing (and is supposed to be doing). I did this for Traefik v1, but things have very likely changed with v2.2 (especially the CRDs), so it’s time to do it again.

In Traefik’s case there is a Helm chart to look through. After looking through it, I decided I probably don’t need its CRD-related pieces, because I won’t be using the CRD-based approach to configuring my services (for now). Judging by this excerpt from the Kubernetes CRD configuration page, I feel the opposite of the community on this point:

However, as the community expressed the need to benefit from Traefik features without resorting to (lots of) annotations, we ended up writing a Custom Resource Definition (alias CRD in the following) for an IngressRoute type, defined below, in order to provide a better way to configure access to a Kubernetes cluster.

I’m actually totally fine using annotations – so I’ll skip all the changed CRD stuff. Here are the other changes I made:

  • The Traefik Helm chart is a Deployment + Service now, but I definitely need a DaemonSet since I want Traefik active on every node in the cluster (I think the difference in assumption is that the chart expects a LoadBalancer from one of the big cloud boys, so I’ll ignore that bit)
  • Copied over the strategy (RollingUpdate)
  • Copied over readinessProbe and livenessProbe sections along with making sure to add the --ping arg (and if you look back, ping is enabled on the static configuration)
  • Added resources section with more generous requests and limits
  • Updated the RBAC configurations – access to pods was missing (I skipped the CRD stuff)

BUG: [global] cannot be a standalone element

Once I tried to deploy I was hit with this error:

$ k logs traefik-ingress-controller-kjhht -n ingress
2020/03/07 02:14:19 command traefik error: global cannot be a standalone element (type *static.Global)

Evidently, the [global] section in the generated Traefik config cannot be a standalone element, despite coming straight out of the migration tool. Maybe it was carried over from my previous config; either way, it needed to be deleted.

BUG: I forgot to make sure all http requests redirect to https

By creating an http entrypoint, I disabled the automatic redirection Traefik was doing to send all websites to their https:// versions. I noticed this when I checked http://statping.vadosware.io and got a 404, where in the past it always redirected to https://statping.vadosware.io (and served the site). As always, Traefik’s documentation is fantastic, and it had a section I should have read more closely. Unfortunately this part of Traefik got a little harder to use, and others have noticed too.

In the end, instead of going with the partial solution, I just upgraded to v2.2.0-rc1, which contains a better fix from a recently merged PR: a global entry point redirection. Using it just meant adding the following to my static config:

      [entryPoints.http]
        address = ":80"
        [entryPoints.http.http.redirections.entryPoint]
          to = "https"
          scheme = "https"

NOTE: the http.http is there because I named my HTTP entrypoint “http”: the first http is the entrypoint’s name, and the second is the entrypoint’s http options subsection, which is where redirections live.

Looks like I picked exactly the right day to upgrade, because this would have been a huge PITA otherwise. After making the change, I kubectl apply the DaemonSet; when I need to force a re-deploy of the ingress controller, I run a little command I keep handy:

$ kubectl patch ds/traefik-ingress-controller -n ingress -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"deployDate\":\"`date +'%s'`\"}}}}}"
daemonset.apps/traefik-ingress-controller patched

After this it took a little while, but HTTP -> HTTPS redirection started working again.
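The patch command works because any change to the pod template (here a deployDate label) gives the DaemonSet a new revision, which its RollingUpdate strategy then rolls out pod by pod. If the shell quoting gets old, the same body can be built with a JSON library; below is a minimal sketch (the function name is mine). On kubectl 1.15 or newer, kubectl rollout restart ds/traefik-ingress-controller -n ingress does the same job without touching labels.

```python
import json
import time


def redeploy_patch(timestamp: int) -> str:
    """Patch body that sets a deployDate label on the DaemonSet's pod template."""
    # Label values must be strings, hence str(timestamp)
    patch = {"spec": {"template": {"metadata": {"labels": {"deployDate": str(timestamp)}}}}}
    return json.dumps(patch)


if __name__ == "__main__":
    # Pass the output to: kubectl patch ds/traefik-ingress-controller -n ingress -p '<output>'
    print(redeploy_patch(int(time.time())))
```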

The completed configuration files

Here are all the added and updated configuration files once everything is said and done (unchanged files, like the namespace configuration, are not included):

traefik-static.configmap.yaml:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-static
  namespace: ingress
data:
  traefik.toml: |
    [entryPoints]

      [entryPoints.http]
        address = ":80"
        [entryPoints.http.http.redirections.entryPoint]
          to = "https"
          scheme = "https"

      [entryPoints.https]
        address = ":443"

    [providers]
      providersThrottleDuration = "2s"
      [providers.kubernetesIngress]
        throttleDuration = "0s"
      [providers.file]
        directory = "/etc/traefik/config/dynamic"

    [metrics]
      [metrics.prometheus]
        buckets = [0.1, 0.3, 1.2, 5.0]
        entryPoint = "traefik"

    [ping]
      entryPoint = "http"

    [log]
      level = "INFO"

    [accessLog]
      bufferingSize = 0

    [http.middlewares]
      [http.middlewares.all-compress.compress]

    [https.middlewares]
      [https.middlewares.all-compress.compress]

traefik-dynamic.configmap.yaml:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-dynamic
  namespace: ingress
data:
  traefik.toml: |
    # Dynamic Configuration
    [https.routers.api]
      rule = "Host(`traefik.vadosware.io`)"
      service = "api@internal"
      middlewares = ["auth"]

    [https.middlewares.auth.basicAuth]
      users = [
        "admin:<bcrypt'ed password>",
      ]

traefik.rbac.yaml:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
- kind: ServiceAccount
  name: traefik-ingress-controller
  namespace: ingress

traefik.ds.yaml:

---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: traefik-ingress-controller
  namespace: ingress
  labels:
    k8s-app: traefik-ingress-lb
spec:
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
      name: traefik-ingress-lb
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-lb
        name: traefik-ingress-lb
    spec:
      hostNetwork: true
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 60
      containers:
      - image: traefik:v2.2
        name: traefik-ingress-lb
        args:
        - --logLevel=INFO
        - --configFile=/etc/traefik/config/static/traefik.toml
        - --ping=true
        # PORTS
        ports:
        - name: http
          containerPort: 80
          hostPort: 80
        - name: https
          containerPort: 443
          hostPort: 443
        # RESOURCES
        resources:
          requests:
            cpu: 500m
            memory: 128Mi
          limits:
            cpu: 2
            memory: 2Gi
        # SECURITY
        securityContext:
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
        # PROBES
        livenessProbe:
          httpGet:
            path: /ping
            port: 80
          failureThreshold: 1
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        readinessProbe:
          httpGet:
            path: /ping
            port: 80
          failureThreshold: 1
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        # VOLUMES
        volumeMounts:
          - mountPath: /etc/traefik/config/static
            name: static-config
          - mountPath: /etc/traefik/config/dynamic
            name: dynamic-config
      volumes:
      - name: static-config
        configMap:
          name: traefik-static
      - name: dynamic-config
        configMap:
          name: traefik-dynamic

And a make command later, everything’s running:

$ make
kubectl apply -f traefik.ns.yaml
namespace/ingress unchanged
kubectl apply -f traefik.serviceaccount.yaml
serviceaccount/traefik-ingress-controller unchanged
kubectl apply -f traefik.rbac.yaml
clusterrole.rbac.authorization.k8s.io/traefik-ingress-controller unchanged
clusterrolebinding.rbac.authorization.k8s.io/traefik-ingress-controller unchanged
kubectl apply -f traefik-static.configmap.yaml
configmap/traefik-static unchanged
kubectl apply -f traefik-dynamic.configmap.yaml
configmap/traefik-dynamic unchanged
kubectl apply -f traefik.ds.yaml
daemonset.apps/traefik-ingress-controller configured
kubectl apply -f traefik.svc.yaml
service/traefik-ingress-service configured
$ k get pods -n ingress
NAME                               READY   STATUS    RESTARTS   AGE
traefik-ingress-controller-kjhht   1/1     Running   4          2m3s

And if we do a kubectl get all -n ingress:

$ k get all -n ingress
NAME                                   READY   STATUS    RESTARTS   AGE
pod/traefik-ingress-controller-kjhht   1/1     Running   4          8m30s

NAME                              TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/traefik-ingress-service   NodePort   10.98.134.235   <none>        80:32492/TCP,443:30165/TCP   453d

NAME                                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/traefik-ingress-controller   1         1         1       1            1           <none>          76d

And a quick peek at the logs for the traefik instance:

$ k logs traefik-ingress-controller-kjhht -n ingress
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/next-totejo: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/totejo: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress monitoring/statping: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/fathom: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
<THOUSANDS OF LINES OF THE ABOVE WARNINGS>

Uh oh, better do that real quick!
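Before changing anything, it helps to collapse those thousands of lines into the list of Ingresses that actually need touching. Here’s a small standard-library sketch; the regex assumes the message shape shown in the log above, and the sample lines are copied from it (the last one repeated to show the counting).

```python
import re
from collections import Counter

# Matches the "Ingress <namespace>/<name>: the apiVersion '<old>' is deprecated" warnings
WARNING = re.compile(
    r"Ingress (?P<ingress>[\w.-]+/[\w.-]+): "
    r"the apiVersion '(?P<old>[^']+)' is deprecated"
)

# Lines copied from the log above; the last one is repeated to show the counting
SAMPLE_LOG = """\
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/next-totejo: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/totejo: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress monitoring/statping: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
time="2020-03-07T02:24:30Z" level=warning msg="Ingress totejo/totejo: the apiVersion 'extensions/v1beta1' is deprecated, use 'networking.k8s.io/v1beta1' instead."
"""


def deprecated_ingresses(log_lines):
    """Count warnings per (namespace/name, deprecated apiVersion)."""
    counts = Counter()
    for line in log_lines:
        match = WARNING.search(line)
        if match:
            counts[(match["ingress"], match["old"])] += 1
    return counts


counts = deprecated_ingresses(SAMPLE_LOG.splitlines())
for (ingress, old_version), n in sorted(counts.items()):
    print(f"{ingress}: {old_version} ({n} warnings)")
```

Reading sys.stdin instead of SAMPLE_LOG would let the same function sit at the end of a kubectl logs pipe.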

GOTCHA: Converting ingress apiVersion extensions/v1beta1 to networking.k8s.io/v1beta1 instead

Traefik was complaining about it quite a bit, so I went through and changed the apiVersion of my Ingress resources to networking.k8s.io/v1beta1. After changing them, though, I found a very peculiar behavior: kubectl applying the new apiVersion seems to have no effect, and the resources still show up as extensions/v1beta1 instead of what’s in the manifest. At first I thought this was a bug, since I don’t expect my resources to be silently changed, but this is how a bunch of features in Kubernetes work (e.g. admission controllers).

This behavior is, unsurprisingly, not a bug: the API server serves the same Ingress objects under both group versions, and kubectl shows them under the version it prefers. I guess these warnings will go away in the future, when I upgrade the cluster and extensions/v1beta1 is no longer served; right now the version I’m running is in between and supports both, so for now I’m going to just… ignore them.
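For the manifest edit itself, here’s a hypothetical helper that swaps the apiVersion only on documents whose kind is Ingress. For Ingress, networking.k8s.io/v1beta1 has the same schema as extensions/v1beta1, so a straight swap is enough; other kinds served from extensions/v1beta1, like Deployments, moved to different groups with schema changes, which is why they’re left alone. It works on raw text with regular expressions, so it needs no YAML library, at the cost of assuming apiVersion and kind sit at column zero.

```python
import re

OLD = "extensions/v1beta1"
NEW = "networking.k8s.io/v1beta1"

# Document separators and the two lines we care about (multi-line mode)
SEPARATOR = re.compile(r"^---[ \t]*$", re.MULTILINE)
INGRESS_KIND = re.compile(r"^kind:[ \t]*Ingress[ \t]*$", re.MULTILINE)
OLD_API_VERSION = re.compile(rf"^apiVersion:[ \t]*{re.escape(OLD)}[ \t]*$", re.MULTILINE)


def migrate_ingress_api_version(manifest: str) -> str:
    """Hypothetical helper: swap the apiVersion on Ingress documents only."""
    migrated = []
    for doc in SEPARATOR.split(manifest):
        if INGRESS_KIND.search(doc):  # exact match, so IngressRoute is skipped
            doc = OLD_API_VERSION.sub(f"apiVersion: {NEW}", doc)
        migrated.append(doc)
    return "---".join(migrated)


if __name__ == "__main__":
    example = (
        "apiVersion: extensions/v1beta1\n"
        "kind: Ingress\n"
        "metadata:\n"
        "  name: statping\n"
        "---\n"
        "apiVersion: extensions/v1beta1\n"
        "kind: Deployment\n"
    )
    print(migrate_ingress_api_version(example))
```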

Wrapup

Hopefully this post helps anyone who’s been procrastinating on switching from Traefik v1 to v2. It was a relatively simple process, so I don’t know why I put it off so long (well, I know why, but that’s a discussion for another time)!