tl;dr - If you didn’t know how to set up a systemd timer and service to back up your k0s cluster, you no longer have an excuse – scroll down for the code.
Recently, while going through my workloads and making sure everything’s backed up to external storage (I’m using Backblaze B2, and it’s been great so far), I came across the problem of how I should back up the cluster itself. Originally I thought I might go with Velero, but the need to cp backups around at time of backup stopped me (this stuff should be in S3 to start with, but bear with me).
While there are lots of options in the Kubernetes distribution space, I use and prefer k0s – it makes the right amount of decisions and chooses pretty neutral options, which leaves me lots of latitude. Along with making the right decisions and having good defaults, it also layers on some nice functionality via the k0sctl command line tool – one of those features being cluster backup.
Well, if you’re going to back up your cluster, you probably want to back it up continuously, right? It took me longer than I thought it would, so I figured it was worth writing about – and that brings us to this post you’re looking at right now. Let’s get into it.
I found that making an object storage bucket with a write-only key that you can use for backups was pretty reasonable – even if the key were to get exposed, people would only be able to write to (and possibly fill up) my storage. That’s not great, but it’s better than all my backups getting deleted. Unfortunately it looks like Backblaze doesn’t quite prevent a write-only key from overwriting existing files, but with object versioning set up I’m also safe from the issue of someone trying to overwrite my backups.
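If you’re on B2 as well, creating that kind of key with the b2 CLI looks roughly like this – a sketch, assuming a bucket named k8s-mycluster-backups (the key name is up to you, and the exact capability names vary by CLI version, so check the B2 docs):
b2 create-key --bucket k8s-mycluster-backups k8s-backup-writer listBuckets,writeFiles
rclone needs listBuckets to resolve the bucket, so a bare writeFiles capability on its own won’t quite cut it.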
It’s not too hard to whip together a script that runs k0s backup:
#!/bin/bash
set -eo pipefail

echo -e "\n[info] Ensuring backup save path [${K0S_BACKUP_SAVE_PATH}] exists...";
mkdir -p "${K0S_BACKUP_SAVE_PATH}";

echo -e "\n[info] Running k0s backup..."
/usr/bin/k0s backup --save-path="${K0S_BACKUP_SAVE_PATH}"

echo -e "\n[info] Retrieving most recent backup..."
MOST_RECENT_BACKUP=$(ls -Art "${K0S_BACKUP_SAVE_PATH}" | tail -n1 | tr -d '\n')
if [ -z "${MOST_RECENT_BACKUP}" ] ; then
    echo -e "\n[error] Failed to find a most recent backup in [${K0S_BACKUP_SAVE_PATH}]";
    exit 1;
fi
echo -e "\n[info] Most recent backup is [${MOST_RECENT_BACKUP}]"

echo "[info] Sending data to Backblaze account with ID [${B2_ACCOUNT_ID}]..."
rclone copy \
    --b2-account "${B2_ACCOUNT_ID}" \
    --b2-key "${B2_KEY}" \
    "${K0S_BACKUP_SAVE_PATH}/${MOST_RECENT_BACKUP}" \
    ":b2:k8s-${CLUSTER_NAME}-backups/cluster/$(date +%F)"
Easy peasy! Of course, you’ll need rclone installed on the base system, so make sure to add that to your automation.
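Before wiring anything into systemd, it’s worth a one-off manual run of the script with the environment variables it expects (all the values below are placeholders):
sudo env \
  K0S_BACKUP_SAVE_PATH=/tmp/k0s-backups \
  B2_ACCOUNT_ID=<keyID> \
  B2_KEY=<applicationKey> \
  CLUSTER_NAME=mycluster \
  /etc/k0s-node-backup.bash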
systemd Service
I use Ansible, so my templates are Jinja2 templates, but you should recognize the normal unit-file-isms:
# k0s-backups.service.j2
[Unit]
Description=Saves k0s cluster backup
[Service]
Environment=K0S_BACKUP_SAVE_PATH=/tmp/k0s-backups
Environment=B2_ACCOUNT_ID={{ b2_account_id }}
Environment=B2_KEY={{ b2_key }}
Environment=CLUSTER_NAME={{ k0s_cluster_name }}
ExecStart=/etc/k0s-node-backup.bash
[Install]
WantedBy=multi-user.target
As you might imagine, you need to make a few variables available via Ansible here – b2_account_id, b2_key, and k0s_cluster_name. Once you have those in, they’ll be picked up by that script when the service actually runs. Well, when does the service run? I’m glad you asked!
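# k0s-backups.timer.j2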
[Unit]
Description=Backs up the k0s cluster every 12 hours
Requires=k0s-backups.service
[Timer]
Unit=k0s-backups.service
OnUnitInactiveSec=12h
RandomizedDelaySec=5m
[Install]
WantedBy=timers.target
And with that, you’ve got a timer which triggers a service. Not much templating happening here, but let’s not split hairs – is a template with no replacements still a template?
Writing all these files doesn’t amount to much if they never make it onto the machine they’re supposed to run on, so here’s a bit of Ansible to wrap it all together:
#
# Playbook for installing a backup timer on a controller node
#
---
- name: Install backup timer for cluster backups
  hosts: "{{ ansible_limit | default(omit) }}"
  remote_user: root
  vars:
    k8s_node_name: "{{ lookup('env', 'NODE_NAME') }}"
    b2_account_id: "{{ lookup('env', 'B2_ACCOUNT_ID') }}"
    b2_key: "{{ lookup('env', 'B2_KEY') }}"
    k0s_cluster_name: "{{ lookup('env', 'CLUSTER') }}"
  tasks:
    - name: Install rclone
      become: yes
      ansible.builtin.apt:
        name: "{{ packages }}"
        update_cache: yes
        state: present
      vars:
        packages:
          - rclone

    - name: Add the k0s-backups script
      ansible.builtin.template:
        src: ../../templates/k0s-node-backup.bash.j2
        dest: /etc/k0s-node-backup.bash
        owner: root
        group: root
        mode: "0700"

    - name: Add the k0s-backups service
      ansible.builtin.template:
        src: ../../templates/k0s-backups.service.j2
        dest: /etc/systemd/system/k0s-backups.service
        owner: root
        group: root
        mode: "0644"

    - name: Start & Enable k0s-backups.service
      ansible.builtin.systemd:
        name: k0s-backups.service
        state: started
        enabled: yes
        daemon_reload: yes

    - name: Add the k0s-backups timer
      ansible.builtin.template:
        src: ../../templates/k0s-backups.timer.j2
        dest: /etc/systemd/system/k0s-backups.timer
        owner: root
        group: root
        mode: "0644"

    - name: Start & Enable k0s-backups.timer
      ansible.builtin.systemd:
        name: k0s-backups.timer
        state: started
        enabled: yes
        daemon_reload: yes
Note there are many ways to get these variables in – I’ve chosen ENV vars as they fit a bit better with my Makefile-driven orchestration, but any of the other Ansible-approved (or unapproved) methods would work.
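For what it’s worth, the invocation ends up looking something like this (the inventory path, playbook name, and host pattern here are all assumptions – adjust to your layout):
CLUSTER=mycluster B2_ACCOUNT_ID=<keyID> B2_KEY=<applicationKey> NODE_NAME=controller-0 \
  ansible-playbook -i inventory.yml -l controller-0 k0s-backups.yml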
Automation is dandy and Ansible is a very reliable tool, but at this point you’re probably going to want to at least check that your service works and your timer is running (ex. systemctl list-timers).
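A quick sanity-check session might look like this – run the service once by hand, read the logs, and confirm the timer is scheduled:
systemctl start k0s-backups.service
journalctl -u k0s-backups.service
systemctl list-timers k0s-backups.timer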
Once you’ve verified the backup did indeed make it to your object storage, you’re probably going to want to test your backup as well. k0s makes it really easy here – almost as easy as taking the backup:
$ k0s restore <path to compressed backup file>
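One wrinkle: the write-only key won’t let you read the backup back out, so you’ll want a separate read-capable key for restores. With that in hand, a restore test looks roughly like this (the backup file name inside the dated folder is an assumption – check what’s actually there):
rclone copy \
  --b2-account <readKeyID> \
  --b2-key <readApplicationKey> \
  :b2:k8s-mycluster-backups/cluster/2021-09-01 \
  /tmp/k0s-restore
k0s restore /tmp/k0s-restore/k0s_backup_*.tar.gz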
Why not a DaemonSet?
One alternative to a systemd service unit and timer is to run a privileged DaemonSet with some shared namespaces to perform the needed steps, but I shied away from that a little bit since I don’t want the backup-taking mechanism to be implemented via the very thing I’m taking a backup of.
Well, while I’m here I might as well show a basic CronJob for taking a backup of a SQLite database. This is probably the simplest implementation of a SQLite backup (outside of just PVC snapshotting), but it does get a little hairy.
First, start with a ServiceAccount that you’ll be doing the backup-taking with:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: your-app-backups
  namespace: your-app
Now we’ll go ahead and add some RBAC to allow the account to get what we need accomplished in the given namespace:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: your-app-backups
  namespace: your-app
rules:
  - apiGroups:
      - ""
      - apps
      - extensions
      - autoscaling
    resources:
      - deployments
    verbs:
      - get
      - list
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
      - list
  - apiGroups:
      - ""
    resources:
      - "pods/exec"
    verbs:
      - create
Of course, we need to bind this Role to the ServiceAccount we created earlier:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: your-app-backups
  namespace: your-app
roleRef:
  kind: Role
  name: your-app-backups
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: your-app-backups
    namespace: your-app # required for ServiceAccount subjects
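You can sanity-check the RBAC before running anything with kubectl auth can-i, impersonating the ServiceAccount:
kubectl auth can-i list deployments \
  --as=system:serviceaccount:your-app:your-app-backups -n your-app
kubectl auth can-i create pods --subresource=exec \
  --as=system:serviceaccount:your-app:your-app-backups -n your-app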
And now we can finally get to the meat of the work: the CronJob. This script is a bit wasteful (I install rclone and kubectl every time), but it gets the job done:
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: your-app-backups
  namespace: your-app
spec:
  schedule: "0 0,12 * * *" # who doesn't love/hate cron syntax?
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: your-app-backups # You're going to want the ServiceAccount from earlier
          containers:
            - name: job
              image: alpine:3.14.2
              imagePullPolicy: IfNotPresent
              command:
                - /bin/ash
                - -c
                - |
                  echo -e "[info] Installing rclone"
                  apk add rclone curl

                  echo -e "[info] Installing kubectl"
                  curl -LO https://dl.k8s.io/release/v1.21.4/bin/linux/amd64/kubectl
                  chmod +x kubectl
                  mv kubectl /usr/bin/kubectl

                  export BACKUP_FILE_NAME=backup-$(date +%F@%H_%M_%S-%Z)
                  export BACKUP_FILE_PATH=/tmp/${BACKUP_FILE_NAME}
                  echo -e "[info] BACKUP_FILE_NAME=${BACKUP_FILE_NAME}"
                  echo -e "[info] BACKUP_FILE_PATH=${BACKUP_FILE_PATH}"

                  echo "[info] installing sqlite3..."
                  kubectl exec deploy/your-app -n ${NAMESPACE} -- apk add sqlite

                  echo "[info] taking backup..."
                  kubectl exec deploy/your-app -n ${NAMESPACE} -- sqlite3 ${YOUR_APP_SQLITE_DB_PATH} ".backup '${BACKUP_FILE_PATH}'"
                  echo -e "[info] backup taken, @ [${BACKUP_FILE_PATH}] inside your-app pod"

                  echo "[info] copying out backup from container..."
                  export YOUR_APP_POD_NAME=$(kubectl get pods -n ${NAMESPACE} -l app=your-app --template '{{range .items}}{{.metadata.name}}{{end}}')
                  kubectl cp ${YOUR_APP_POD_NAME}:${BACKUP_FILE_PATH} ${BACKUP_FILE_PATH} -n ${NAMESPACE}

                  export BACKUP_SIZE=$(du -hs ${BACKUP_FILE_PATH})
                  echo -e "[info] Backup size: [${BACKUP_SIZE}]"

                  echo -e "[info] Zipping backup..."
                  gzip ${BACKUP_FILE_PATH}

                  echo "[info] saving backup to Backblaze under account [${B2_ACCOUNT_ID}]..."
                  rclone copy \
                    --b2-account $B2_ACCOUNT_ID \
                    --b2-key $B2_KEY \
                    ${BACKUP_FILE_PATH}.gz \
                    :b2:$BUCKET/$NAMESPACE/$RESOURCE_TYPE/$RESOURCE_NAME/$(date +%F)
              env:
                # Info required for backup
                - name: YOUR_APP_SQLITE_DB_PATH
                  value: /var/data/your-app/db.sqlite
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                # S3 folder info
                - name: BUCKET
                  value: your-app-backup-bucket
                - name: RESOURCE_TYPE
                  value: deployment
                - name: RESOURCE_NAME
                  value: your-app-sqlite
                # Rclone (S3 info)
                - name: B2_ACCOUNT_ID
                  valueFrom:
                    secretKeyRef:
                      name: backup-secrets
                      key: B2_ACCOUNT_ID.secret
                - name: B2_KEY
                  valueFrom:
                    secretKeyRef:
                      name: backup-secrets
                      key: B2_KEY.secret
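Rather than waiting up to 12 hours for the schedule to fire, you can trigger a run immediately by creating a one-off Job from the CronJob and tailing its logs:
kubectl create job --from=cronjob/your-app-backups your-app-backups-manual -n your-app
kubectl logs -f job/your-app-backups-manual -n your-app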
And we’re done! Easy peasy SQLite backups that are done the “right” way (using sqlite3’s .backup command rather than copying the live database file), though a PVC snapshot would probably have been enough (assuming you have SQLite’s WAL mode enabled).
As always, make sure to test your backups!
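If you do lean towards PVC snapshots, enabling WAL mode is a one-time pragma against the database file – here’s a rough one-liner reusing the deployment and DB path from the manifest above:
kubectl exec deploy/your-app -n your-app -- \
  sqlite3 /var/data/your-app/db.sqlite 'PRAGMA journal_mode=WAL;'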
And while we’re here, here’s a similarly amateur (but functional) backup of a Postgres database. This is pretty basic (and wasteful, like the previous one), so of course you’ll want to give it some consideration before taking it into your production environment:
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: your-app-backups
spec:
  schedule: "0 0,12 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: job
              image: postgres:13.1-alpine
              imagePullPolicy: IfNotPresent
              command:
                - /bin/ash
                - -xc
                - |
                  apk add rclone

                  export BACKUP_FILE_NAME=backup-$(date +%F@%H_%M_%S-%Z)
                  echo "[info] BACKUP_FILE_NAME=${BACKUP_FILE_NAME}"

                  echo "[info] taking backup..."
                  pg_dump \
                    --username=${PGUSER} \
                    --clean \
                    --create \
                    --no-owner \
                    --format=custom \
                    --file=/tmp/${BACKUP_FILE_NAME}
                  echo -e "[info] backup taken, @ [/tmp/${BACKUP_FILE_NAME}]"

                  echo "[info] starting rclone..."
                  rclone copy \
                    --b2-account $B2_ACCOUNT_ID \
                    --b2-key $B2_KEY \
                    /tmp/${BACKUP_FILE_NAME} \
                    :b2:$BUCKET/$NAMESPACE/$RESOURCE_TYPE/$RESOURCE_NAME/$(date +%F)
              env:
                - name: RESOURCE_TYPE
                  value: deployment
                - name: RESOURCE_NAME
                  value: your-app-pg
                - name: BUCKET
                  value: your-app-backups-bucket
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                # Postgres connection settings (picked up by pg_dump)
                - name: PGHOST
                  value: your-app-pg.your-app.svc.cluster.local
                - name: PGUSER
                  value: yourappuser
                - name: PGDATABASE
                  value: yourappdb
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: your-app-secrets
                      key: DB_PASSWORD.secret
                # Rclone
                - name: B2_ACCOUNT_ID
                  valueFrom:
                    secretKeyRef:
                      name: backup-secrets
                      key: B2_ACCOUNT_ID.secret
                - name: B2_KEY
                  valueFrom:
                    secretKeyRef:
                      name: backup-secrets
                      key: B2_KEY.secret
And as always, test your backup! It’s not hard to spin up a Postgres instance, do a quick restore, and ensure your tables and data are still present.
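A rough local restore test might look like this, assuming Docker and a backup file pulled down from B2 (container name, port, and file name here are all illustrative):
docker run -d --name pg-restore-test -e POSTGRES_PASSWORD=test -p 5433:5432 postgres:13.1-alpine
# give Postgres a few seconds to come up, then restore;
# --create makes pg_restore recreate the dumped DB while connected to the default "postgres" DB
PGPASSWORD=test pg_restore --host=localhost --port=5433 --username=postgres \
  --clean --create --dbname=postgres /tmp/backup-<timestamp>
# check that the tables made it
PGPASSWORD=test psql --host=localhost --port=5433 --username=postgres --dbname=yourappdb -c '\dt'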
Well, this was a pretty quick writeup, but hopefully it gets someone out there off on the right foot – it’s easy to do this stuff in theory, but sitting down to write it always takes a surprisingly long time (for me at least).