Improving security easily with Traefik and Kubernetes

tl;dr - I upgraded traefik and added some resources (IngressRoute, Middleware) to get a better security score from Mozilla’s (HTTP) Observatory. The upgrade from 2.2.0-rc1 to 2.3.2 came with a few breaking changes, so it was a bit involved (see the Traefik v1 to v2 docs and also the general v2.x migration docs).

I recently came across an insanely helpful and concise post written by Sam Texas from simplecto.com while surfing the r/Traefik subreddit. While I was vaguely aware of Mozilla Observatory, I had never run it against my own site, and my score was terribad:

original observatory score

While I have some of the absolute basics down, I’m missing a lot of easy wins and configuration that could help improve this score and make my site more secure in general. One of the awesome things about investing in Kubernetes and keeping up (somewhat) this whole time is that I got put on to Traefik, a really amazing L4/L7 load balancer (which is a tiny bit difficult to configure, though I guess nothing is “easy”). The integration between Traefik and Kubernetes is fantastic (Traefik has a lot of fantastic integration) and one of the best things about it is that I can just modify my Ingress to take advantage of the fixes that Sam made.

The Ingress, before any changes

Here’s what the YAML looks like for the ingress that powers this blog:

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: blog
  namespace: vadosware-blog
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    ingress.kubernetes.io/limit-rps: "20"
    ingress.kubernetes.io/proxy-body-size: "10m"
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "traefik"
    traefik.frontend.rule.type: PathPrefixStrip
spec:
  tls:
  - hosts:
    - "www.vadosware.io"
    - "vadosware.io"
    - "mailing-list.vadosware.io"
    secretName: vadosware-blog-blog-tls
  rules:
  - host: "www.vadosware.io"
    http:
      paths:
      - path: "/"
        backend:
          serviceName: blog
          servicePort: 80
  - host: "vadosware.io"
    http:
      paths:
      - path: "/"
        backend:
          serviceName: blog
          servicePort: 80
  - host: "mailing-list.vadosware.io"
    http:
      paths:
      - path: "/"
        backend:
          serviceName: blog
          servicePort: 5000

(I’ve included the entire ingress YAML here to hopefully add something to the body of work; Sam has basically covered it all in his post)

The additions I have to make (that should help) were detailed by Sam:

# Adding in secure headers
- traefik.http.middlewares.securedheaders.headers.forcestsheader=true
- traefik.http.middlewares.securedheaders.headers.sslRedirect=true
- traefik.http.middlewares.securedheaders.headers.STSPreload=true
- traefik.http.middlewares.securedheaders.headers.ContentTypeNosniff=true
- traefik.http.middlewares.securedheaders.headers.BrowserXssFilter=true
- traefik.http.middlewares.securedheaders.headers.STSIncludeSubdomains=true
- traefik.http.middlewares.securedheaders.headers.stsSeconds=63072000
- traefik.http.middlewares.securedheaders.headers.frameDeny=true

There was a little digging on my part through the Traefik Kubernetes Ingress provider documentation.

What should have worked

Here’s what you might expect would have worked:

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: blog
  namespace: vadosware-blog
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    ingress.kubernetes.io/limit-rps: "20"
    ingress.kubernetes.io/proxy-body-size: "10m"
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "traefik"

    traefik.frontend.rule.type: PathPrefixStrip
    # Annotation values must be strings, so booleans and numbers are quoted
    traefik.http.middlewares.securedheaders.headers.forcestsheader: "true"
    traefik.http.middlewares.securedheaders.headers.sslRedirect: "true"
    traefik.http.middlewares.securedheaders.headers.STSPreload: "true"
    traefik.http.middlewares.securedheaders.headers.ContentTypeNosniff: "true"
    traefik.http.middlewares.securedheaders.headers.BrowserXssFilter: "true"
    traefik.http.middlewares.securedheaders.headers.STSIncludeSubdomains: "true"
    traefik.http.middlewares.securedheaders.headers.stsSeconds: "63072000"
    traefik.http.middlewares.securedheaders.headers.frameDeny: "true"
spec:
# ... rest of the ingress ...

However, this didn’t work – I’m using traefik:v2.2.0-rc1 and there was a massive change in how Traefik works, requiring migration from v1. Since I’m not quite ready to do the full migration (I’d have to migrate every single Ingress to an IngressRoute with matching Middleware resources, etc), I had to search quite hard to find this post which references how it was done before.

Unfortunately, the post didn’t work; there’s another issue which points to the fact that Traefik might have never actually implemented custom header support for frontends. This led me down quite the rabbit hole of issues:

Migrating to Traefik v2.3.2

Looks like it’s already time to pay the piper – using v2.2.0-rc1 isn’t good enough because I need a version greater than v2.2.0 for the annotations and features I’d like to use to work. Breaking changes are involved, so I’m going to have to be more careful this time.

To make it a little bit easier to manage, I’m going to:

  • Set up a new traefik instance that is the newer version (v2.3.2) on port 9090 and 9443 for HTTP and HTTPS respectively.
  • Use the example project containous/whoami as a test app, behind a completely separate ingress

With this setup I should be able to test the new ingress and configuration without disturbing the old too much before I’m ready to start switching apps over.

The new DaemonSet configuration

Here’s what the DaemonSet looks like for the new traefik instances:

traefik.ds.yaml:

---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: traefik
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    spec:
      hostNetwork: true
      serviceAccountName: traefik
      terminationGracePeriodSeconds: 60
      containers:
      - image: traefik:v2.3.2
        name: traefik
        # BINARY ARGS
        args:
          - --log.level=WARNING
          - --configFile=/etc/traefik/config/static/traefik.toml
        # PORTS
        ports:
        - name: http
          # TODO: revert to 80
          containerPort: 9090
          hostPort: 9090
        - name: https
          # TODO: revert to 443
          containerPort: 9443
          hostPort: 9443
        # RESOURCES
        resources:
          requests:
            cpu: 500m
            memory: 128Mi
          limits:
            cpu: 2
            memory: 2Gi
        # SECURITY
        securityContext:
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
        # PROBES
        livenessProbe:
          httpGet:
            path: /ping
            port: 9090 # /ping is served on the web entryPoint (TODO: revert to 80)
          failureThreshold: 1
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        readinessProbe:
          httpGet:
            path: /ping
            port: 9090 # /ping is served on the web entryPoint (TODO: revert to 80)
          failureThreshold: 1
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        # VOLUMES
        volumeMounts:
          - mountPath: /etc/traefik/config/static
            name: static-config
      volumes:
      - name: static-config
        configMap:
          name: traefik-static

traefik-static.configmap.yaml:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-static
data:
  traefik.toml: |
    [log]
      level = "INFO"

    [metrics]
      # Prometheus (enabled)
      [metrics.prometheus]
      entrypoint = "prometheus"

    [tracing]
      # Jaeger
      [tracing.jaeger]
        [tracing.jaeger.collector]
          endpoint = "jaeger.monitoring.svc.cluster.local"

    # Ping service
    [ping]
      entryPoint = "web"

    [entryPoints]

      [entryPoints.web]
        address = ":9090"

      [entryPoints.websecure]
        address = ":9443"

      [entryPoints.prometheus]
        address = ":9999"

    [providers]
      providersThrottleDuration = "2s"

      # Kubernetes CRD
      [providers.kubernetescrd]
        ingressClass = "traefik-new"

      # Kubernetes Ingress
      [providers.kubernetesingress]
        ingressClass = "traefik-new"

      # [providers.file]
      #   directory = "/etc/traefik/config/dynamic"

    [api]
      dashboard = true

    [accessLog]
      bufferingSize = 0    

A few bits aren’t pictured here:

  • The kustomization.yaml file that I’m using to inject things like namespace and labels
  • Various resources that I’m using (ServiceAccount, etc)

It shouldn’t be hard to imagine what those other files look like, so I’ll leave them out for now.

Issue: Something is listening on port 8080?

NOTE: Originally, I tried to use ports 8080 and 8443, and when doing that I ran into this issue. If you start with the resource config above you shouldn’t have this problem at all (since it uses 9090 and 9443).

The very first thing that went wrong was that something was somehow listening on port 8080!

$ k logs traefik-7jdfn -n ingress-new
time="2020-11-06T09:10:48Z" level=info msg="Configuration loaded from file: /etc/traefik/config/static/traefik.toml"
time="2020-11-06T09:10:48Z" level=info msg="Traefik version 2.3.2 built on 2020-10-19T18:36:22Z"
time="2020-11-06T09:10:48Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://doc.traefik.io/traefik/contributing/data-collection/\n"
2020/11/06 09:10:48 traefik.go:76: command traefik error: error while building entryPoint traefik: error preparing server: error opening listener: listen tcp :8080: bind: address already in use

After a bit of searching, seeing that the entrypoint was traefik (which is one I hadn’t set), I immediately thought of the api functionality – the docs noted that the default entrypoint was traefik. Since I enabled the dashboard, I quickly disabled it to test and sure enough that’s what it was. The proper way to get around this is to change the entrypoint that prometheus will use, and specify the port you want there.

Here’s what the updates in the ConfigMap looked like:

modified   kubernetes/cluster/ingress/traefik-new/traefik-static.configmap.yaml
@@ -11,6 +11,7 @@ data:
     [metrics]
       # Prometheus (enabled)
       [metrics.prometheus]
+      entrypoint = "prometheus"

     [tracing]
       # Jaeger
@@ -32,6 +33,9 @@ data:

       [entryPoints.websecure]
         address = ":9443"
+
+      [entryPoints.prometheus]
+        address = ":9999"
         # address = ":443" # TODO: REVERT

     [providers]

Creating the whoami Deployment and Service

The Deployment and Service are pretty straightforward, though I definitely took some time to try and remember the format for these resources:

---
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: ingress-new
  name: whoami
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: traefik/whoami
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - name: http
              containerPort: 80

---
kind: Service
apiVersion: v1
metadata:
  namespace: ingress-new
  name: whoami
spec:
  selector:
    app: whoami
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 80

To Ingress or to IngressRoute

Earlier on, while looking at the documentation for the new 2.x releases, plain Kubernetes Ingresses were looking to be deprecated in favor of IngressRoutes, and in the new version much of the old functionality (middleware, for example) wasn’t properly supported. Maybe I was misreading the move from frontends/backends to routers/services/middleware, but it looks like support for middlewares and the like has been restored via annotations on Ingresses and Services.

Creating the new whoami IngressRoute CRD

OK, before we start trying to write an IngressRoute let’s try and walk through what should be happening:

  1. Our machine is running at some IP address (xxx.xxx.xxx.xxx in IPv4)
  2. We’ve persisted the fact that the user-readable address www.vadosware.io should point to that IP address.
  3. whoami is running as a container on this machine, and a Kubernetes service which enables internal access to it has been created

Normally, at this point in the flow, we’d create a Kubernetes Ingress, point it at the Service, give it the right FQDN (let’s say whoami.vadosware.io), and we’d be golden, but the new setup requires a few things:

  • We need to create an IngressRoute as Traefik wants
  • We need to access on a different port than normal – 9090. The URL should look like http://whoami.vadosware.io:9090
  • We should be able to check that HTTPS is also working if we change the address to https://whoami.vadosware.io:9443 as well, though the cert will likely be untrusted

Going off the documentation for traefik this is what it should look like:

whoami.ingressroute.yaml:

---
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: whoami
  namespace: ingress-new
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`whoami.vadosware.io`)
      services:
        - kind: Service
          name: whoami
          port: 80
          passHostHeader: true
      tls:
        secretName: whoami-tls

One thing I did forget to do was put in the new CRDs for the new traefik version, so I took some time to do that as well.

Integrating with cert-manager for TLS

I use the outstanding cert-manager to manage my Let’s Encrypt-powered TLS x509 certificates. I love that cert-manager keys in to and watches regular Ingress resources, but now with traefik v2 – how will it handle IngressRoute resources? There’s not likely to be a direct integration between these two very different projects, but the plumbing could go lots of ways. After some quick searching I found some resources:

Thanks to the community already working around this issue it’s nice and easy for me to figure out:

whoami.certificate.yaml:

---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: whoami
  namespace: ingress-new
spec:
  commonName: whoami.vadosware.io
  secretName: whoami-tls
  dnsNames:
    - whoami.vadosware.io
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

Let’s see if we get the secret we expect to exist after a while:

$ k get secrets -n ingress-new
NAME                  TYPE                                  DATA   AGE
default-token-769mm   kubernetes.io/service-account-token   3      47h
traefik-token-v8gr5   kubernetes.io/service-account-token   3      46h
whoami-tls            kubernetes.io/tls                     3      9m31s

Well, there’s something in there! Not sure if it works yet, time to test it out!

Testing out the IngressRoute

While everything seems to be deployed, when I visit the site in the browser at https://whoami.vadosware.io:9443 I get the usual self-signed cert warning, followed by a 404 page. There are a few issues:

  • Traefik itself is having issues – there are some errors in the traefik pod logs
  • Plain HTTP requests aren’t working – requests to whoami.vadosware.io are redirected to HTTPS (this is normally what I’d want, but in this case I’d like to be able to test whoami without worrying about TLS)
  • HTTPS isn’t working – The Traefik default cert is being returned instead of whatever is in whoami-tls (if there’s anything in it at all)

DEBUG: Errors in traefik logs caused by Jaeger config

If I look in the logs for the traefik pod there are a bunch of repeats of one error in particular:

175.177.46.129 - - [08/Nov/2020:07:37:01 +0000] "GET / HTTP/2.0" - - "-" "-" 1 "-" "-" 0ms
time="2020-11-08T07:37:01Z" level=error msg="Tracing jaeger error: error reporting Jaeger span \"EntryPoint websecure whoami.vadosware.io:9443\": Post \"jaeger.monitoring.svc.cluster.local\": unsupported protocol scheme \"\"" tracingProviderName=jaeger

Well the problem here is pretty clear – I need to specify the protocol scheme on the Jaeger URL! When I look at the jaeger deployment I’m using though, it turns out that I’m using the zipkin collection method (and a super old version of Jaeger) – I’m going to avoid upgrading for now, and just switch the sending method on traefik to Zipkin. Here’s what the changes looked like in the ConfigMap:

@@ -14,10 +14,10 @@ data:
       entrypoint = "prometheus"

     [tracing]
-      # Jaeger
-      [tracing.jaeger]
-        [tracing.jaeger.collector]
-          endpoint = "jaeger.monitoring.svc.cluster.local"
+      # TODO: Switch to proper Jaeger collector
+      [tracing.zipkin]
+        httpEndpoint = "http://jaeger.monitoring.svc.cluster.local:9411/api/v2/spans"
+        sampleRate = 0.1

     # Ping service
     [ping]

In addition to this, I needed to make sure the Pod’s DNS policy was set to ClusterFirstWithHostNet so that my hostNetwork-enabled DS could access things like *.svc.cluster.local. Once that was done, it was time to hit the website again (everything else is still broken but we should have one less problem).
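
For reference, the DNS policy change is a one-line addition to the DaemonSet’s pod spec. This is a minimal sketch showing only the relevant fields of traefik.ds.yaml (everything else stays as above); the placement next to hostNetwork is my assumption about how the file was laid out:

```yaml
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: traefik
spec:
  template:
    spec:
      hostNetwork: true
      # Pods on the host network default to the node's resolver, which
      # can't resolve cluster-internal names like *.svc.cluster.local.
      # ClusterFirstWithHostNet sends lookups to the cluster DNS first.
      dnsPolicy: ClusterFirstWithHostNet
      # ... rest of the pod spec ...
```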

DEBUG: Plain HTTP requests not being allowed through

It looks like I’m going to need to change the static configuration of traefik to allow plain HTTP requests through (not automatically redirect them to websecure), but then make sure to set some sort of HTTPS redirect middleware (RedirectScheme) on every IngressRoute I make afterwards. The changes to the ConfigMap are pretty simple:

@@ -28,8 +28,10 @@ data:
       [entryPoints.web]
         address = ":9090"
         # address = ":80" # TODO: REVERT
-        [entryPoints.web.http.redirections.entryPoint]
-          to = "websecure"
+
+        ## Applications must redirect themselves
+        # [entryPoints.web.http.redirections.entryPoint]
+        #   to = "websecure"

       [entryPoints.websecure]
         address = ":9443"

After changing this and manually entering http://whoami.vadosware.io:9090 in my browser I get the following log lines (startup lines omitted):

175.177.46.129 - - [08/Nov/2020:08:02:47 +0000] "GET / HTTP/1.1" - - "-" "-" 1 "-" "-" 0ms
175.177.46.129 - - [08/Nov/2020:08:02:48 +0000] "GET /favicon.ico HTTP/1.1" - - "-" "-" 2 "-" "-" 0ms

OK, so this is great – HTTP is now not automatically redirected, but I’m still getting the 404 page! I need to figure out why the whoami service isn’t being hit. I wonder if the issue is happening because I left off the namespace configuration in the IngressRoute? I thought (naively, without checking) that it might default to the namespace of the resource, but maybe it went to default… Nope, that’s not it (but I’ll leave the namespace change there just in case).

Turns out I had the Service configuration wrong! I was using 9090 as the port, when there’s no need to:

@@ -36,5 +36,4 @@ spec:
     app: whoami
   ports:
     - protocol: TCP
-      port: 9090
-      targetPort: 80
+      port: 80

After this things were still not working properly, but after some digging, I found the reason – I forgot to set the ingressClass on the IngressRoute to match traefik-new (which is set on Traefik itself to avoid collisions with the old infrastructure):

modified   kubernetes/cluster/ingress/traefik-new/whoami.ingressroute.yaml
@@ -4,6 +4,8 @@ apiVersion: traefik.containo.us/v1alpha1
 metadata:
   name: whoami
   namespace: ingress-new
+  annotations:
+    kubernetes.io/ingress.class: traefik-new
 spec:
   entryPoints:
     - web

After that things worked great – visiting http://whoami.vadosware.io:9090 worked like a charm.

DEBUG: HTTPS not working, default cert being returned

After getting HTTP working, the cert being returned was still the Traefik default cert, unfortunately. It looks like I need to do a bit more configuration, probably on the cert-manager side. Let’s first take a look at the contents of that secret that I thought was fine, by running k get secret whoami-tls -n ingress-new -o json:

{
    "apiVersion": "v1",
    "data": {
        "ca.crt": "",
        "tls.crt": "<REDACTED>",
        "tls.key": "<REDACTED>"
    },
    "kind": "Secret",
    "metadata": {
        "annotations": {
            "cert-manager.io/alt-names": "whoami.vadosware.io",
            "cert-manager.io/certificate-name": "whoami",
            "cert-manager.io/common-name": "whoami.vadosware.io",
            "cert-manager.io/ip-sans": "",
            "cert-manager.io/issuer-kind": "ClusterIssuer",
            "cert-manager.io/issuer-name": "letsencrypt-prod",
            "cert-manager.io/uri-sans": ""
        },
        "creationTimestamp": "2020-11-08T07:22:14Z",
        "name": "whoami-tls",
        "namespace": "ingress-new",
        "resourceVersion": "165576087",
        "selfLink": "/api/v1/namespaces/ingress-new/secrets/whoami-tls",
        "uid": "2049a07c-2328-43f5-bf17-8f4ef61dbcc1"
    },
    "type": "kubernetes.io/tls"
}

Well it certainly does look fine! But for some reason it’s not being served with the page! There are a few things that are off here though: it was a little too easy to get this cert (considering HTTP wasn’t working), so what I think has happened is that the cert was obtained using the OLD traefik instance. While this would normally be worrying, I think it should be fine, as long as I can serve this cert. Going forward though, if I use an ingressClass I’ll need to use the certmanager.k8s.io/acme-http01-ingress-class annotation to let cert-manager create an Ingress that will be properly picked up. I won’t have to worry about that, however, if I switch to letting the one (new) ingress handle all Ingress and IngressRoute objects that show up.
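
As an aside, another place the ingress class can be pinned is on the issuer itself, via the HTTP01 solver config. The following is an untested sketch and not what I actually ran: the email and the account key secret name are placeholders, and the class value assumes the new Traefik is the one that should answer challenges. Keep in mind Let’s Encrypt’s HTTP01 challenge comes in on port 80, so this would only make sense once the new instance is back on the standard ports.

```yaml
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-account-key # hypothetical secret name
    solvers:
      - http01:
          ingress:
            # The temporary challenge Ingress gets this class, so only
            # the matching ingress controller picks it up
            class: traefik-new
```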

Well regardless of all this pontificating, the right cert still isn’t being used – let’s look back at the resource configs…

It looks like one issue is that the certificate, although being created, is either old or in the wrong format:

$ k delete certificate.cert-manager.io/whoami -n ingress-new
Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=Certificate failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found

cert-manager tries to do some sort of conversion and fails… Looks like I’m running cert-manager version 0.15.0 and it’s actually had a 1.0 release (and more) in the meantime! Well I do know that the Secret that contains the cert got made, so I’m going to not worry about upgrading cert-manager for now.

Let’s just double check that what I thought happened actually did – that the old ingress handled the Let’s Encrypt request and I got a good cert:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            04:e8:28:a1:da:35:a3:81:b2:be:82:e2:63:dd:ff:2f:76:78
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
        Validity
            Not Before: Nov  8 06:22:42 2020 GMT
            Not After : Feb  6 06:22:42 2021 GMT
        Subject: CN = whoami.vadosware.io
        Subject Public Key Info:
... rest of the cert ...

Well that’s definitely not the traefik default cert! For some reason the secret isn’t getting picked up correctly… Jesus christ, I had the tls config in the wrong place:

modified   kubernetes/cluster/ingress/traefik-new/whoami.ingressroute.yaml
@@ -19,5 +19,5 @@ spec:
           namespace: ingress-new
           port: 80
           passHostHeader: true
-      tls:
-        secretName: whoami-tls
+  tls:
+    secretName: whoami-tls

With this it worked just fine, immediately! It looks like the TLS and non-TLS IngressRoutes need to be separate, as well. After this was working, I was able to boil down the IngressRoute to the following:

---
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: whoami
  namespace: ingress-new
  annotations:
    kubernetes.io/ingress.class: traefik-new
spec:
  entryPoints:
    - web
  routes:
    - kind: Rule
      match: Host(`whoami.vadosware.io`)
      services:
        - name: whoami
          port: 80
---
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: whoami-tls
  namespace: ingress-new
  annotations:
    kubernetes.io/ingress.class: traefik-new
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`whoami.vadosware.io`)
      services:
        - name: whoami
          port: 80
  tls:
    secretName: whoami-tls

Adding some middlewares

Before we call this new configuration good, let’s do some experimentation with adding some middlewares and making sure they work!

HTTPS redirect via middleware

Since we’re going to have to be enforcing the HTTP->HTTPS redirect on a per-IngressRoute basis, it’s worth figuring out how the Traefik RedirectScheme middleware works. It’s pretty simple actually:

---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: https-redirect
  namespace: ingress-new
spec:
  redirectScheme:
    scheme: https
    port: "9443"

And to make the HTTP IngressRoute use this:

modified   kubernetes/cluster/ingress/traefik-new/whoami.ingressroute.yaml
@@ -12,6 +12,8 @@ spec:
   routes:
     - kind: Rule
       match: Host(`whoami.vadosware.io`)
+      middlewares:
+        - name: https-redirect
       services:
         - name: whoami
           port: 80

A quick check and it works great!

Basic Auth

While we’re here, let’s also try the basic auth middleware.

I ran into a few issues trying to get this set up:

  • One thing that did get me was that htpasswd wasn’t installed on my system – I needed to get apache-tools on the AUR.
  • Defining the Secret properly was surprisingly error-prone

After a while where I couldn’t figure out what was wrong with my Secret, I checked the traefik logs, which showed this:

time="2020-11-09T02:49:57Z" level=error msg="Error while reading basic auth middleware: failed to load auth credentials: found 2 elements for secret 'ingress-new/whoami-basicauth', must be single element exactly" providerName=kubernetescrd middlewareName=ingress-new-whoami-basicauth

The Secret needs to have exactly ONE sub-key/“element” which is users. Here’s what the secret looked like at the end (for only one user):

---
apiVersion: v1
kind: Secret
metadata:
  name: whoami-basicauth
  namespace: ingress-new
data:
  # Which action star stars in the classic movie where "who am I" is screamed from a mountain top?
  users: |2
    YWRtaW46JGFwcjEkUko3MDBtWUUkQkhpZVBhTXNFemM4QWp3VHlrelVSMQoK

If it’s not obvious, don’t use that combination. It took me a bit to realize that the |2 did not indicate that the list had 2 elements; it is YAML’s block indentation indicator, which says how many spaces the text that follows is indented by. To generate that password I used what the Traefik docs suggested:

$ htpasswd -nb admin jackiechan | openssl base64

The password is a reference to this scene (LOUD).
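
For completeness, here is roughly what the Middleware that consumes that Secret looks like. This is a sketch: the middleware and secret names are taken from the error log line above (ingress-new-whoami-basicauth and ingress-new/whoami-basicauth), and the rest follows the basicAuth CRD format as I understand it from the Traefik docs.

```yaml
---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: whoami-basicauth
  namespace: ingress-new
spec:
  basicAuth:
    # Name of a Secret in the same namespace; it must contain exactly
    # one key (users) holding htpasswd-style user:hash lines
    secret: whoami-basicauth
```

It then gets attached to a route the same way as https-redirect, by adding its name under middlewares.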

Improving the TLS options

So way back at the beginning of the post, I wanted to set the following options:

- traefik.http.middlewares.securedheaders.headers.forcestsheader=true
- traefik.http.middlewares.securedheaders.headers.sslRedirect=true
- traefik.http.middlewares.securedheaders.headers.STSPreload=true
- traefik.http.middlewares.securedheaders.headers.BrowserXssFilter=true
- traefik.http.middlewares.securedheaders.headers.STSIncludeSubdomains=true
- traefik.http.middlewares.securedheaders.headers.stsSeconds=63072000
- traefik.http.middlewares.securedheaders.headers.frameDeny=true
- traefik.http.middlewares.securedheaders.headers.contentTypeNosniff=true

Let’s do it through the Headers middleware!

---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: whoami-headers
  namespace: ingress-new
spec:
  headers:
    forceSTSHeader: true
    stsPreload: true
    contentTypeNosniff: true
    browserXssFilter: true
    stsIncludeSubdomains: true
    stsSeconds: 63072000
    frameDeny: true
    sslRedirect: true
    accessControlAllowMethods:
      - "GET"
    accessControlAllowOriginList:
      - "https://whoami.vadosware.io"
    accessControlMaxAge: 100
    addVaryHeader: true

And after adding the new middleware to the IngressRoute’s list, I see the headers in my browser:

headers in Firefox inspector
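
In case it helps, this is roughly what the HTTPS IngressRoute looks like with the headers middleware attached. It’s a sketch based on the whoami-tls route from earlier; attaching it to the websecure route (rather than the plain HTTP one) is my assumption here, since that’s where the HSTS headers matter.

```yaml
---
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: whoami-tls
  namespace: ingress-new
  annotations:
    kubernetes.io/ingress.class: traefik-new
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`whoami.vadosware.io`)
      # Middlewares are looked up by name in this resource's namespace
      middlewares:
        - name: whoami-headers
      services:
        - name: whoami
          port: 80
  tls:
    secretName: whoami-tls
```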

Skipped: Trying the same setup with a single Kubernetes native Ingress resource

I’d love to get into trying to set up the same route with just a Kubernetes native Ingress resource (since the Middlewares are already present, we can just use them all as annotations), but this post is already SUPER long.
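
If you want to try it yourself, an untested sketch of what that Ingress might look like is below. The router.* annotations come from the Traefik v2 Kubernetes Ingress provider docs; the middleware reference uses the namespace-name@kubernetescrd form, and the resource names are just the whoami ones from this post. Treat it as a starting point, not something I’ve verified.

```yaml
---
kind: Ingress
apiVersion: networking.k8s.io/v1beta1
metadata:
  name: whoami
  namespace: ingress-new
  annotations:
    kubernetes.io/ingress.class: traefik-new
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    # <namespace>-<middleware name>@kubernetescrd, comma-separated for several
    traefik.ingress.kubernetes.io/router.middlewares: ingress-new-whoami-headers@kubernetescrd
spec:
  tls:
    - hosts:
        - whoami.vadosware.io
      secretName: whoami-tls
  rules:
    - host: whoami.vadosware.io
      http:
        paths:
          - path: /
            backend:
              serviceName: whoami
              servicePort: 80
```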

Skipped: Prep all your apps for the switch

You can host your new apps on the alternate Traefik by doing the following:

  • List all your ingresses (kubectl get ingress --all-namespaces)
  • Update the Ingresses for the old applications with the new annotations (see the migration guide)
  • Check the usual URLs but at different ports (9090/9443) to ensure the apps will still work

I did find that the new Traefik setup definitely breaks HTTP/HTTPS redirection, but left it for now. The annotations that are supposed to work didn’t, and neither did static config.

Skipped: Switching out the traefik instances

While I could go for a downtime-less switchover by adding some sort of double tier setup (traefik/nginx/whatever that listens on 80 and decides between old and new traefik instances), I’m going to just go with the switch-and-hope-for-the-best approach. I’m going to leave this as an exercise for the reader (if you’re in the same situation) – note that you may or may not have to change a bunch of Ingress objects to IngressRoutes for extra features, and note that you will also need to change the non-standard ports (like 9443 and 9090) back to the standard ones!

One thing I will say is that you must pay attention to the migration guide! The following bits were crucial:

Entrypoints definition for regular kubernetes ingress:

# Static configuration

[entryPoints.web]
  address = ":80"

[entryPoints.websecure]
  address = ":443"
  [entryPoints.websecure.http]
    [entryPoints.websecure.http.tls]

[providers.kubernetesIngress]

And also the per-ingress annotations:

kind: Ingress
apiVersion: networking.k8s.io/v1beta1
metadata:
  name: example-tls
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"

# ... rest of the ingress ... #

In the end, I did go around and change all my Ingresses to IngressRoutes since it was just so much easier to configure that way (instead of dealing with stringly-typed annotations).

Finally, my new and improved Observatory Score

For the blog I needed to tweak the Middleware a bit:

---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: blog-headers
spec:
  headers:
    forceSTSHeader: true
    stsPreload: true
    contentTypeNosniff: true
    browserXssFilter: true
    stsIncludeSubdomains: true
    stsSeconds: 63072000
    frameDeny: true
    sslRedirect: true
    contentSecurityPolicy: |
      default-src 'none';form-action 'none';frame-ancestors 'none';base-uri 'none'
    accessControlAllowMethods:
      - "GET"
      - "POST"
    accessControlAllowOriginList:
      - "https://*.vadosware.io"
      - "https://vadosware.io"
    accessControlMaxAge: 100
    addVaryHeader: true
    referrerPolicy: "same-origin"

And here’s the IngressRoute:

---
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: blog-https
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`vadosware.io`, `www.vadosware.io`)
      middlewares:
        - name: blog-headers
      services:
        - name: blog
          port: 80
  tls:
    secretName: vadosware-blog-tls

With that all set, we’ve got Traefik fully migrated to a newer version and the new IngressRoute, Middleware and other resources in place, so I can re-run the observatory check:

Improved observatory scores

Great success! A better, more secure site and flexible configuration for the future.

Wrap-up

OK, so it was a long winding route to get here, but was it all worth it? Someone with knowledge of NGINX and how easy and well known it is to operate might complain that this is all possible with a properly configured NGINX instance (inside the container hosting this statically generated blog). It totally is – but I thought this was worth making a post about because of how easy it was to set up with Traefik and my Kubernetes cluster. Things got somewhat messy because I wasn’t up to date on my Traefik version, and the Traefik project itself had some large changes it worked through, but the next time this kind of change is needed it will be much simpler: just a few resources here and there.

Deploying web services and websites (this blog included) has become so much easier with Kubernetes (once you’re past the various ramp-ups and movement in underlying libraries) and Traefik has been good to me 99% of the time. I wanted to make this post to highlight how much easier things can be when you take the time to learn/implement complex tooling.