Running Zulip on Kubernetes


tl;dr - I started up a local Zulip instance on my tiny k8s cluster for some friends and me to use – it was surprisingly challenging to do, so this post contains the recipe (the k8s resource configs).

While everyone is getting really into Slack and other workplace chat programs, the set of really good open source options has been growing steadily. At this point, Slack is so well known and widely used that it’s sort of become a must-have for trendy startups and midsize companies, at least in my experience.

While we’re not a startup or a midsize company, I wanted to get a group chat going with some people that I co-work with – we sometimes use Facebook Messenger, but they recently pushed a bug that actually made loading messages not work on desktop, so I was particularly annoyed with my privacy/convenience tradeoff at that point in time. As far as open source chat servers go, there are quite a few: Zulip, Synapse (Matrix), Rocket.Chat, Mattermost, and more.

While I won’t go into all these solutions and how they differ (mostly because I don’t know), Zulip is one that I’ve been wanting to try because of its approach to threading. It’s pretty widely touted as interesting because of this feature; outside of that, being a working chat program feels like a pretty common bar.

Why not IRC?

Well mostly because I didn’t think about it. I definitely should have, but I’m just going to take the hit to my nerd cred.

Why Kubernetes?

Well, I didn’t exactly choose it – it’s what one of the servers I have is running – and at this point I actually find it easier to deploy things to k8s on this cluster than other ways. These days things are rarely as easy as “download and run this binary”, so containerization and the k8s machinery (services, ingresses, etc.) are very helpful.

I haven’t done much recent k8s work, but the cluster has been stable (this blog is hosted on it) and I do keep my tiny cluster (not even really a cluster) upgraded thanks to the super mega useful kubeadm. Later this year I plan on expanding my cluster even more (basically only my mail server is keeping me from doing it) and using WireGuard for the links between the servers (as the traffic goes over the open internet), so that’s going to be exciting.

The Code

Here are all the resources it took to get everything set up! Note that this code uses the make-infra pattern, so that’s how variable replacement is getting done and how everything is orchestrated (if you want to see the Makefile, shoot me an email and I’ll add it to the post).

It took me roughly 8-16 hours to get this right – hopefully it saves someone out there some time.
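Since the actual Makefile isn’t included here, a hypothetical sketch of the replacement step it performs: the `.pre` manifests contain `${VAR}` placeholders that get filled in before `kubectl apply` runs (file names are real, the Makefile internals are my guess):

```shell
# Minimal stand-in for part of zulip-rabbitmq.deployment.yaml.pre:
cat > zulip-rabbitmq.deployment.yaml.pre <<'EOF'
        - name: RABBITMQ_DEFAULT_PASS
          value: "${ZULIP_RABBITMQ_PASSWORD}"
EOF

ZULIP_RABBITMQ_PASSWORD="s3cret"  # in practice, pulled from a password store

# Render the placeholder into a concrete manifest
sed "s|\${ZULIP_RABBITMQ_PASSWORD}|${ZULIP_RABBITMQ_PASSWORD}|g" \
  zulip-rabbitmq.deployment.yaml.pre > zulip-rabbitmq.deployment.yaml

# The Makefile would then run something like:
#   kubectl apply -f zulip-rabbitmq.deployment.yaml
```

`envsubst` (from gettext) is the other common tool for this; `sed` is used above just to keep the sketch dependency-free.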

General stuff

chat.ns.yaml (the namespace)

---
apiVersion: v1
kind: Namespace
metadata:
  name: chat

chat.ingress.yaml (the inlet from the world wide internet)

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: zulip
  namespace: chat
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    ingress.kubernetes.io/limit-rps: "20"
    ingress.kubernetes.io/proxy-body-size: "25m"
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "traefik"
spec:
  tls:
  - hosts:
      - chat.domain.tld
    secretName: chat-tls
  rules:
  - host: chat.domain.tld
    http:
      paths:
      - backend:
          serviceName: zulip
          servicePort: 80

Memcached

zulip-memcached-data.pvc.yaml (the PVC powering Zulip’s memcached instance)

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zulip-memcached-data
  namespace: chat
  labels:
    app: chat
    tier: data
spec:
  storageClassName: openebs-jiva-non-ha
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

zulip-memcached.deployment.yaml (the deployment of memcached)

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zulip-memcached
  namespace: chat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chat
      component: memcached
  template:
    metadata:
      labels:
        app: chat
        component: memcached
    spec:
      containers:
      - name: memcached
        image: quay.io/sameersbn/memcached:latest
        resources:
          limits:
            cpu: 75m
            memory: 768Mi
        ports:
        - name: memcached
          containerPort: 11211
          protocol: TCP

zulip-memcached.svc.yaml (the Service that ensures memcached is reachable by DNS name from zulip)

---
apiVersion: v1
kind: Service
metadata:
  name: zulip-memcached
  namespace: chat
  labels:
    app: chat
spec:
  selector:
    app: chat
    component: memcached
  ports:
  - name: memcached
    port: 11211
    targetPort: 11211
    protocol: TCP

RabbitMQ

zulip-rabbitmq-data.pvc.yaml (the PVC that holds the rabbitmq data)

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zulip-rabbitmq-data
  namespace: chat
  labels:
    app: chat
    tier: data
spec:
  storageClassName: openebs-jiva-non-ha
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

zulip-rabbitmq.deployment.yaml.pre

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zulip-rabbitmq
  namespace: chat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chat
      component: rabbitmq
  template:
    metadata:
      labels:
        app: chat
        component: rabbitmq
    spec:
      containers:
      - name: rabbitmq
        image: rabbitmq:3.7.7
        resources:
          limits:
            cpu: 1
            memory: 1Gi
        env:
        - name: RABBITMQ_DEFAULT_USER
          value: "zulip"
        - name: RABBITMQ_DEFAULT_PASS
          value: "${ZULIP_RABBITMQ_PASSWORD}"
        volumeMounts:
          - name: data
            mountPath: /var/lib/rabbitmq
        ports:
          - name: rabbitmq
            containerPort: 5672
            protocol: TCP
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: zulip-rabbitmq-data

zulip-rabbitmq.svc.yaml (the Service that ensures rabbitmq is reachable by DNS name from zulip)

---
apiVersion: v1
kind: Service
metadata:
  name: zulip-rabbitmq
  namespace: chat
  labels:
    app: chat
spec:
  selector:
    app: chat
    component: rabbitmq
  ports:
  - name: rabbitmq
    port: 5672
    targetPort: 5672
    protocol: TCP

Redis

zulip-redis-data.pvc.yaml (the PVC that holds the data for redis)

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zulip-redis-data
  namespace: chat
  labels:
    app: chat
    tier: data
spec:
  storageClassName: openebs-jiva-non-ha
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

zulip-redis.deployment.yaml (the deployment of Redis itself)

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zulip-redis
  namespace: chat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chat
      component: redis
  template:
    metadata:
      labels:
        app: chat
        component: redis
    spec:
      containers:
      - name: redis
        image: quay.io/sameersbn/redis:latest
        resources:
          limits:
            cpu: 1
        ports:
        - name: redis
          containerPort: 6379
          protocol: TCP
        volumeMounts:
          - name: data
            mountPath: /var/lib/redis
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: zulip-redis-data

zulip-redis.svc.yaml (the Service that ensures redis is reachable by DNS name from zulip)

---
apiVersion: v1
kind: Service
metadata:
  name: zulip-redis
  namespace: chat
  labels:
    app: chat
spec:
  selector:
    app: chat
    component: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
    protocol: TCP

Postgres

zulip-postgres-command.configmap.yaml (A configmap used to override the initial command for the Postgres instance)

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zulip-postgres-command
  namespace: chat
  labels:
    app: chat
data:
  start.sh:
    |
      echo "=> hacking around missing dictionary files..."
      touch /usr/local/share/postgresql/tsearch_data/english.stop
      touch /usr/local/share/postgresql/tsearch_data/en_us.dict
      touch /usr/local/share/postgresql/tsearch_data/en_us.affix

      wget https://raw.githubusercontent.com/zulip/zulip/master/puppet/zulip/files/postgresql/zulip_english.stop -O /usr/local/share/postgresql/tsearch_data/zulip_english.stop

      echo "=> running docker-entrypoint..."
      exec /docker-entrypoint.sh postgres

zulip-postgres-entrypoint.configmap.yaml (A configmap used to override the postgres setup – ‘entrypoint’ is a bit of a misnomer)

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zulip-postgres-entrypoint
  namespace: chat
  labels:
    app: chat
data:
  setup.sql:
    |
      ALTER ROLE zulip SET search_path TO zulip,public;
      CREATE SCHEMA zulip AUTHORIZATION zulip;
      CREATE EXTENSION pgroonga;
      GRANT USAGE ON SCHEMA pgroonga TO zulip;

zulip-postgres-data.pvc.yaml (The PVC that holds the postgres data)

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zulip-postgres-data
  namespace: chat
  labels:
    app: chat
    tier: data
spec:
  storageClassName: openebs-jiva-non-ha
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

zulip-postgres.deployment.yaml.pre (The postgres deployment itself)

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zulip-postgres
  namespace: chat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chat
      component: postgres
  template:
    metadata:
      labels:
        app: chat
        component: postgres
    spec:
      containers:
      - name: postgresql
        # zulip/zulip-postgres:latest (or :10) does *not* work --
        # the startup script tries to run `su`, but that fails.
        # output is something like the following:
        # | .... other script output ....
        # | ++ su postgres -c psql
        # | su: must be run from a terminal
        #
        # Swapped to using pgroonga's image here + custom setup scripts via mounted configmaps
        image: groonga/pgroonga:latest-alpine-11
        command: ["/bin/bash"]
        args:
          - /pg-command/start.sh
        resources:
          limits:
            cpu: 1
            memory: 4Gi
        env:
        - name: POSTGRES_DB
          value: zulip
        - name: POSTGRES_USER
          value: zulip
        - name: POSTGRES_PASSWORD
          value: "${ZULIP_POSTGRES_PASSWORD}"
        volumeMounts:
          - name: data
            mountPath: /var/lib/postgresql
          - name: pg-entrypoint
            mountPath: /docker-entrypoint-initdb.d
          - name: pg-command
            mountPath: /pg-command
        ports:
        - containerPort: 5432
          name: postgres
          protocol: TCP
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: zulip-postgres-data
      - name: pg-entrypoint
        configMap:
          name: zulip-postgres-entrypoint
      - name: pg-command
        configMap:
          name: zulip-postgres-command

zulip-postgres.svc.yaml (the Service that ensures postgres is reachable by DNS name from zulip)

---
apiVersion: v1
kind: Service
metadata:
  name: zulip-postgres
  namespace: chat
  labels:
    app: chat
spec:
  selector:
    app: chat
    component: postgres
  ports:
  - name: postgres
    port: 5432
    targetPort: 5432
    protocol: TCP

Zulip (finally)

zulip-data.pvc.yaml (the data stored by the zulip server)

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zulip-data
  namespace: chat
  labels:
    app: chat
    component: app
spec:
  storageClassName: openebs-jiva-non-ha
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

zulip.deployment.yaml.pre (the zulip deployment itself)

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zulip
  namespace: chat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chat
      component: zulip
  template:
    metadata:
      labels:
        app: chat
        component: zulip
    spec:
      containers:
      - name: zulip
        image: zulip/docker-zulip:2.0.5-0
        resources:
          limits:
            cpu: 2
            memory: 4Gi
        env:
        # Postgres
        - name: DB_HOST
          value: 'zulip-postgres'

        # Memcached
        - name: SETTING_MEMCACHED_LOCATION
          value: 'zulip-memcached:11211'

        # Redis
        - name: SETTING_REDIS_HOST
          value: 'zulip-redis'

        # RabbitMQ
        - name: SETTING_RABBITMQ_HOST
          value: 'zulip-rabbitmq'
        - name: SETTING_RABBITMQ_USERNAME
          value: 'zulip'
        - name: SETTING_RABBITMQ_PASSWORD
          value: '${ZULIP_RABBITMQ_PASSWORD}'

        # Zulip email settings
        - name: SETTING_EMAIL_HOST
          value: 'mail.vadosware.io'  # E.g. 'smtp.example.com'
        - name: SETTING_EMAIL_USE_TLS
          value: 'True'
        - name: SETTING_EMAIL_PORT
          value: '587'
        - name: SETTING_EMAIL_HOST_USER
          value: 'vados'
        - name: SETTING_NOREPLY_EMAIL_ADDRESS
          value: 'zulip-noreply@vadosware.io'
        - name: SETTING_TOKENIZED_NOREPLY_EMAIL_ADDRESS
          value: 'zulip-noreply-{token}@vadosware.io'

        # Zulip settings
        - name: SETTING_EXTERNAL_HOST
          value: 'chat.vadosware.io'
        - name: SETTING_ZULIP_ADMINISTRATOR
          value: 'vados@vadosware.io'
        - name: ZULIP_AUTH_BACKENDS
          value: 'EmailAuthBackend'
        - name: ZULIP_USER_EMAIL
          value: 'vados@vadosware.io'
        - name: ZULIP_USER_DOMAIN
          value: 'vadosware.io'
        - name: ZULIP_USER_PASS
          value: '${ZULIP_USER_PASSWORD}'
        - name: DISABLE_HTTPS # SSL termination @ the proxy
          value: 'True'
        - name: SSL_CERTIFICATE_GENERATION
          value: 'self-signed'

        # Secrets
        - name: SECRETS_email_password
          value: '${SMTP_PASSWORD}'
        - name: SECRETS_secret_key
          value: '${ZULIP_SECRET_KEY}'
        - name: SECRETS_postgres_password
          value: '${ZULIP_POSTGRES_PASSWORD}'
        - name: SECRETS_rabbitmq_password
          value: '${ZULIP_RABBITMQ_PASSWORD}'

        # Uncomment this when configuring the mobile push notifications service
        # - name: PUSH_NOTIFICATION_BOUNCER_URL
        # value: 'https://push.zulipchat.com'
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        volumeMounts:
          - name: data
            mountPath: /data
        # readinessProbe:
        #   httpGet:
        #     path: /
        #     port: 80
        #     scheme: HTTP
        #   initialDelaySeconds: 120
        #   timeoutSeconds: 12
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: zulip-data
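
A side note of my own (not how the setup above works): with this pattern the substituted passwords end up as plain `value:` fields in the rendered manifests. A hedged alternative sketch is to store them in a Kubernetes Secret and reference them with `secretKeyRef` (the Secret name and key names below are hypothetical):

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: zulip-secrets   # hypothetical name
  namespace: chat
type: Opaque
stringData:
  rabbitmq-password: "change-me"

# ...and then in the zulip container spec, instead of a literal value:
#
#   env:
#   - name: SETTING_RABBITMQ_PASSWORD
#     valueFrom:
#       secretKeyRef:
#         name: zulip-secrets
#         key: rabbitmq-password
```

This keeps secrets out of the rendered manifests, though they’re still only base64-encoded in etcd unless encryption at rest is enabled.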

zulip.svc.yaml (the Service that exposes the zulip instance so that it can be hit by the external-facing Ingress)

---
apiVersion: v1
kind: Service
metadata:
  name: zulip
  namespace: chat
  labels:
    app: chat
spec:
  selector:
    app: chat
    component: zulip
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP

A note on storage/resource usage

If you’re wondering what I’m using for storage, I’m using OpenEBS and have written about it in the past (in another post I compared its performance to hostPath). Another great option (that I used to use) is Rook.

Also, the resource limits are basically random – feel free to swap in your own values.
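If you do want to tune them, one hedged sketch (the requests below are illustrative guesses, not measured values): setting `requests` alongside `limits` gives the scheduler real numbers to place pods with, e.g. for the memcached container:

```yaml
        resources:
          requests:      # what the scheduler reserves on a node
            cpu: 50m
            memory: 256Mi
          limits:        # ceiling before CPU throttling / OOM-kill
            cpu: 75m
            memory: 768Mi
```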

Issues

I only ran into one significant issue setting up the Zulip server once it was running – user creation seems to have a race condition where the request 500s but the user gets created anyway. Basically, the initial setup is pretty inconvenient – the process looks something like this:

$ k exec -it <zulip pod> -n <namespace> -- /bin/bash
root@<zulip pod># su zulip
zulip@<zulip pod>$ /home/zulip/deployments/current/manage.py generate_realm_creation_link

Once you get the realm creation link for the first realm, you can visit it in your browser. Unfortunately it’s kind of hard to add users to realms, so what I’ve had to do is actually create the user first (with the CLI, manage.py create_user). So the process looks something like this:

  1. Start all the resources
  2. Use manage.py to generate the realm creation link
  3. Visit the link, name the initial realm, enter user details (I get a 500 after submitting)
  4. (optionally) Go back to the root page and attempt to log in – if you can’t, you’ll need to jump back in and use manage.py create_user
  5. (optionally) Check the postgres db in tables like zerver_userprofile and zerver_realm to see what state things are in
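For step 5, the check can look something like the queries below (a sketch of my own – run via psql in the postgres pod; Zulip’s schema changes between versions, so verify the column names against yours):

```sql
-- What realms exist?
SELECT id, string_id, name FROM zerver_realm;

-- Which users made it into which realm?
SELECT id, email, realm_id FROM zerver_userprofile;
```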

Wrapping up

Hopefully whoever comes across this post finds it useful! A few things I want to do in the future:

  • Set up Synapse, Rocket.Chat, Mattermost in a similar way and evaluate them
  • Make a PR to Zulip to improve their k8s resources (right now it’s all one Pod)

We’ll see if I ever get to these, but thanks for reading.

Did you find this read beneficial? Send me questions/comments/clarifications.
Want my expertise on your team/project? Send me interesting opportunities!