Setting Up Piwik on Kubernetes

Setting up a new Piwik instance on Kubernetes (including migrating old data)

vados

tl;dr - Setting up Piwik is pretty straightforward, since I’ve gone through the trouble of setting up a database before, and Piwik’s web-based setup is pretty convenient. This post is the last in the pipeline that’s related to Kubernetes for a bit.

One of the most useful tools I’ve ever come across is Piwik – it’s an excellent self-hostable tool for doing web analytics like tracking visits to your website (this very site uses it as well). One of the last moves in porting all my infrastructure to Kubernetes was of course to port my Piwik data and figure out how I should be running Piwik on the new Kubernetes cluster. Piwik is a pretty robust piece of software, so I was worried that there would be a lot of moving pieces that I needed to port over.

Turns out it’s not that hard – Here’s what I did:

Step 1: Set up the mysql/mariadb container and transfer over old data

Piwik has an extensive set of user guides which are pretty fantastic. Also check out the Piwik installation documentation FAQ; it’s pretty useful. Since I had an existing Piwik instance to transfer data from, I also needed to read up on how to move Piwik from one server to another.

I actually encountered quite a bit of frustration trying to use the official mysql docker image, and instead went with the official mariadb image. Here’s what I found while trying to work with the mysql container that made me switch:

  • The root user would become inaccessible from inside the container (if you bashed in) after running the Piwik setup once
  • ENV variables didn’t seem to be honored
  • The default encoding error hadn’t been solved; Piwik needs the default encoding to be utf8

Outside of these bullet points I didn’t write much in the notes about what frustrated me, but I’d really suggest you use mariadb instead.

One of the great things about Kubernetes (and Docker in general) is that you can actually do tests of this process WITHOUT too much hassle: just make temporary mariadb-based pods and fire away (and feel no guilt when the pod is torn down and none of the data is saved). At this point, I did a lot of experimentation and didn’t worry too much about getting a proper resource configuration written just yet.
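A throwaway pod for that kind of experiment can be sketched roughly like this. The pod name tmp-mariadb and the fallback password are hypothetical placeholders, and the image tag is simply the one used in the resource configuration later in this post.

```shell
# Start a disposable MariaDB pod for experiments. No volume is attached,
# so everything inside it disappears when the pod is deleted.
# "tmp-mariadb" and the default password are hypothetical placeholders.
# $1 = root password to use (optional).
start_tmp_mariadb() {
  kubectl run tmp-mariadb \
    --image=mariadb:10.3.0 \
    --restart=Never \
    --env="MYSQL_ROOT_PASSWORD=${1:-throwaway}"
}

# Tear the pod down again once you are done experimenting.
stop_tmp_mariadb() {
  kubectl delete pod tmp-mariadb
}
```

Calling start_tmp_mariadb somepassword and later stop_tmp_mariadb gives you a database to poke at with kubectl exec -it tmp-mariadb -- mysql -p, without leaving anything behind.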

NOTE You’re going to want to do the data transfer BEFORE you set up Piwik. Once the deployment is up (you can find the resource configuration later in this article), transfer the data first, and only then kubectl port-forward and finish the web-based setup.

General gist of how I transferred data from the old Piwik instance (basically rehashing the Piwik FAQ):

  1. Go to your previous instance and dump the database
  2. Unzip the piwik_(DATE).sql.gz file you generated from the other database (it’s just a gzipped SQL script)
  3. Put the unzipped file in a data volume accessible to your container (basically just get that file into the mariadb container somehow)
  4. Run mysql -p, put in the password that was specified in the container’s env (in your Kubernetes resource config) and you should get the mariadb console
  5. source <absolute path to backup file> - this command creates the Piwik table(s), updating things along the way
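The steps above can be sketched as a pair of shell functions. This is a rough outline under assumptions rather than the exact commands I ran: the file name piwik_backup.sql.gz, the /tmp target path, and the use of kubectl cp to get the file into the container are all illustrative choices, and the database name is whatever your old instance used.

```shell
# Step 1 happens on the OLD server and is typically a mysqldump, roughly:
#   mysqldump -u root -p <old db name> | gzip > piwik_backup.sql.gz
# (the file name is a placeholder).

# Steps 2-3: unzip the dump and copy it into the mariadb container.
# $1 = gzipped dump, $2 = pod name (both supplied by you).
copy_dump_to_pod() {
  gunzip -k "$1" &&
    kubectl cp "${1%.gz}" "$2:/tmp/piwik_backup.sql" -c piwik-mariadb
}

# Steps 4-5: run the mysql client in the container (it prompts for the root
# password) and source the dump. $1 = pod name, $2 = database name.
restore_dump_in_pod() {
  kubectl exec -it "$1" -c piwik-mariadb -- \
    mysql -p -e "CREATE DATABASE IF NOT EXISTS $2; USE $2; source /tmp/piwik_backup.sql"
}
```

For example, copy_dump_to_pod piwik_backup.sql.gz <pod name> followed by restore_dump_in_pod <pod name> <database name>. The container name piwik-mariadb matches the Deployment later in this post.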

Step 2: Finalize the Kubernetes resource configuration

After I was done experimenting with ephemeral mariadb/piwik containers, the finalized Kubernetes resource configuration looks like this:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: piwik
  annotations:
    ingress.kubernetes.io/class: "nginx"
    ingress.kubernetes.io/ssl-redirect: "true"
    ingress.kubernetes.io/limit-rps: "20"
spec:
  tls:
  - hosts:
    - "piwik.example.com"
    secretName: letsencrypt-certs-all
  rules:
  - host: "piwik.example.com"
    http:
      paths:
      - path: "/.well-known/acme-challenge"
        backend:
          serviceName: letsencrypt-helper-svc
          servicePort: 80
      - path: "/"
        backend:
          serviceName: piwik
          servicePort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: piwik
  labels:
    app: piwik
spec:
  type: LoadBalancer
  selector:
    app: piwik
  ports:
    - name: mysql
      protocol: TCP
      port: 3306
    - name: piwik
      protocol: TCP
      port: 80

---

apiVersion: batch/v2alpha1
kind: CronJob
metadata:
  name: piwik-archive
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: piwik-cron
            image: piwik:apache
            args:
            - /bin/bash
            - -c
            - date; /usr/local/bin/php /var/www/html/console core:archive www-data
          restartPolicy: OnFailure

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: piwik
  labels:
    app: piwik
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: piwik
        env: prod
    spec:
      containers:
      - name: piwik-mariadb
        image: mariadb:10.3.0
        imagePullPolicy: IfNotPresent
        args: ["--character-set-server=utf8mb4", "--collation-server=utf8mb4_unicode_ci"]
        ports:
          - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: youshoulddefinitelychangethis
        volumeMounts:
        - name: piwik-mysql-data
          mountPath: /var/lib/mysql
        - name: piwik-mysql-config
          mountPath: /etc/mysql/conf.d
      - name: piwik
        image: piwik:3.1.1-apache
        imagePullPolicy: Always
        ports:
          - containerPort: 80
        volumeMounts:
        - name: piwik-config
          mountPath: /var/www/html/config
      volumes:
      - name: piwik-config
        hostPath:
          path: /var/data/piwik/config
      - name: piwik-mysql-data
        hostPath:
          path: /var/data/piwik/mysql/data
      - name: piwik-mysql-config
        hostPath:
          path: /var/data/piwik/mysql/config.d

NOTE For the initial set up, I left out the Cron Job resource.

You’ll likely have to change bits like where you store the data and/or the mysql root password, but that configuration kubectl apply’d correctly for me. Something I had forgotten while going through this, and which is useful to remember: containers in the same pod can reach each other over 127.0.0.1/localhost. In the past I’ve made services for the sole reason of exposing a container in a pod to another container in that pod, and that definitely isn’t necessary.
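A quick way to convince yourself of the localhost point is to probe the database port from the piwik container. This is a minimal sketch that assumes bash (with its /dev/tcp redirection) is available in the piwik image; the function name is made up for illustration.

```shell
# Check from inside the piwik container that the mariadb container in the
# same pod answers on 127.0.0.1:3306. Relies on bash's /dev/tcp redirection,
# so it assumes bash exists in the image. $1 = pod name.
db_reachable_from_piwik() {
  kubectl exec "$1" -c piwik -- \
    bash -c 'echo > /dev/tcp/127.0.0.1/3306 && echo "db reachable"'
}
```

If the probe succeeds, 127.0.0.1 is also the database server address to give Piwik during the web-based setup.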

If this resource configuration works for you as it did for me, the next step is to get a feel for setting up Piwik through the web interface by port-forwarding to it with a command like kubectl port-forward <pod name> 5000:80.
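If you don’t want to copy the generated pod name by hand, the lookup and the port-forward can be combined. This is a small sketch that assumes the app=piwik label from the Deployment above and that exactly one such pod is running; the function names are hypothetical.

```shell
# Look up the name of the first pod carrying the app=piwik label
# (the label comes from the Deployment above).
piwik_pod_name() {
  kubectl get pods -l app=piwik -o jsonpath='{.items[0].metadata.name}'
}

# Forward local port 5000 to port 80 of that pod; the setup wizard is then
# reachable at http://localhost:5000 until you press Ctrl-C.
forward_piwik() {
  kubectl port-forward "$(piwik_pod_name)" 5000:80
}
```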

Step 3: Setting up Piwik

Finishing the web-based setup for Piwik is pretty easy. If you’re going to be restoring data from a previous installation, make sure to set the table prefix to what you used before, along with the name of the database itself.

Another important thing to note if you’re doing a transfer is that you need to set the mariadb/mysql root password to the PREVIOUS instance’s mariadb/mysql root password. Once the previous database’s dump is restored, the root password changes to the old one, and after the next restart it will no longer be what you put in the Kubernetes resource config.

After clicking through the web-based setup, Piwik will either recognize that the tables it’s trying to create are already present (if you’re doing a restoration/transfer), or it will make the necessary tables.

To test, visit your piwik instance through external ingress or kubectl port-forward, and attempt to log in and view everything.

EXTRA: Problems I ran into

piwik+apache was easier to use than piwik+nginx+phpfpm

While developing the resource configuration above, I initially started by using the version of piwik that was meant to work with NGINX and PHP-FPM.

A big limitation I ran into with Kubernetes is that I could only find a way to mount folders, NOT individual files. This meant that I spent a lot of time trying to figure out how to cleanly inject configuration into the default NGINX container. It took me more than 30 minutes and I couldn’t come up with anything that was clean (and I definitely didn’t want to build my own container just for that), so I just went with the apache flavor of Piwik instead, the code for which you can find on GitHub.

If you’re unfamiliar with Apache, check it out: it’s an older, very solid alternative to NGINX.

NOTE The PHP-FPM version of Piwik listens on port 9000 and expects FCGI connections from a reverse proxy; it doesn’t serve HTTP directly. I basically abandoned this because I didn’t want to add another nginx container with custom configuration to do the appropriate proxying.
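For reference, one option that was not part of this setup is a ConfigMap mounted with the subPath field, which projects a single key as a single file. The sketch below is untested against the NGINX/PHP-FPM flavor described here; the ConfigMap name piwik-nginx-conf, the volume name nginx-conf, and the local nginx.conf file are all hypothetical.

```shell
# Create a ConfigMap holding one key, nginx.conf, from a local file.
# $1 = path to your local nginx.conf (hypothetical file).
create_nginx_configmap() {
  kubectl create configmap piwik-nginx-conf --from-file=nginx.conf="$1"
}

# Print the pod spec fragment that mounts just that one key as a file,
# leaving the rest of /etc/nginx in the container untouched.
print_subpath_fragment() {
  cat <<'EOF'
        volumeMounts:
        - name: nginx-conf
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
      volumes:
      - name: nginx-conf
        configMap:
          name: piwik-nginx-conf
EOF
}
```

The printed fragment would go into the nginx container’s entry and the pod’s volumes list respectively, in the same shape as the Deployment above.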

Couldn’t avoid web-based initial configuration

Piwik’s documentation around configuring it BEFORE it starts up (by way of some configuration files) was really hard to find (if it exists, I couldn’t find the definitive source). I had limited success dropping a config.ini.php file in the config folder, but Piwik would tell me that it couldn’t read the file for some reason, even when it was a carbon copy of the file that setup creates afterwards.

I figured I just didn’t know enough about Piwik, and stuck to the web-based configuration, which is worse for reproducibility, but Piwik is very much a pet piece of infrastructure so I didn’t kick myself over it for too long.

MariaDB root password changing after restoring data from another Piwik instance

This is mentioned earlier in the guide but is worth a re-mention, since it really surprised me while I was going through it.

If you DID forget, and piwik is giving you errors that it can’t connect to the DB (likely after you re-deploy it once):

  1. Find out/remember what your root password used to be on the old server
  2. Bash into the container (kubectl exec -it piwik-<gibberish> -c piwik -- /bin/bash)
  3. Update the config at /var/www/html/config/config.ini.php so that Piwik uses that password

Of course, to edit that file you’ll have to do a little work, at the very least installing Vim or some other editor (apt-get update && apt-get install vim worked for me); the container didn’t have nano, vi, or vim installed.
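The same steps can be run from outside the container. This is a sketch using only the paths and container name mentioned above; the function names are made up, and it assumes the config file contains a [database] section holding the password entry.

```shell
PIWIK_CONFIG=/var/www/html/config/config.ini.php

# Show the [database] section so you can see which credentials Piwik
# currently uses. $1 = pod name.
show_db_config() {
  kubectl exec "$1" -c piwik -- grep -A 8 '^\[database\]' "$PIWIK_CONFIG"
}

# Install vim in the piwik container (the image ships without an editor)
# and open the config for editing. $1 = pod name.
edit_db_config() {
  kubectl exec "$1" -c piwik -- bash -c 'apt-get update && apt-get install -y vim' &&
    kubectl exec -it "$1" -c piwik -- vim "$PIWIK_CONFIG"
}
```

Keep in mind that anything installed this way is gone when the container is recreated; only the config folder is on a mounted volume.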

Setting up the cron job that will do archiving for piwik

After some thinking it wasn’t hard to come up with what the resource configuration SHOULD look like for the cron job that needs to run with Piwik:

apiVersion: batch/v2alpha1
kind: CronJob
metadata:
  name: piwik-archive
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: piwik-cron
            image: piwik:apache
            args:
            - /bin/bash
            - -c
            - date; /usr/local/bin/php /var/www/html/console core:archive www-data
          restartPolicy: OnFailure

The next thing I realized was that I didn’t actually have Cron Jobs enabled on the cluster, so I needed to enable them. This consisted of adding an additional runtime configuration switch (--runtime-config=batch/v2alpha1=true) to the API server manifest (for me that resides at /etc/kubernetes/manifests/api-server.yaml).
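To check whether the switch took effect once the API server has restarted, you can look for the group version in what the API server advertises. A minimal sketch, with a made-up function name:

```shell
# Succeeds if the API server advertises the batch/v2alpha1 group version,
# i.e. the --runtime-config switch is active.
cronjob_api_enabled() {
  kubectl api-versions | grep -qx 'batch/v2alpha1'
}
```

For example, cronjob_api_enabled && kubectl get cronjobs should list piwik-archive after the Cron Job resource has been applied.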

Random thought: I wish there was a lighter piwik

I only really need Piwik for tracking hits (for now) to the various sites I operate. I really wish there was a simpler version of Piwik that just did that, and maybe used SQLite to make things even easier to move around. Maybe I’ll make something like that some day.

Wrapping up

So all in all it was pretty easy to set up Piwik. I spent almost as much time figuring out how to transfer over the backup (and how Piwik handles it) as actually figuring out what the resource configuration was supposed to look like. That’s a great sign for the usability of a tool like Kubernetes; after paying the startup cost (to learn Kubernetes concepts/technology), I’m being paid back in spades.

The theme on this blog for a good number of posts has been Kubernetes – this is looking like it’s going to come to an end, as this is the last post that was in the pipeline related to Kubernetes for a bit. Hope you’ve enjoyed these posts anywhere near as much as I enjoyed working through and getting used to Kubernetes!
