tl;dr - CSI is awesome but it doesn’t help you with cross-StorageClass data migration. I wrote a small, painfully procedural script & docker container called pvcloney to perform this task repeatably for myself. The code is also a great crash course in @kubernetes/client-node.
Today’s yak shave is of the storage variety – useful only to people who use/evaluate different storage mechanisms on their clusters. I’ve gone hostPath -> Rook -> Longhorn -> OpenEBS ZFS LocalPV -> OpenEBS LVM -> Rook on ZFS and all over the place, so I’ve got a few old workloads running on storage providers that I’m not planning to use going forward.
As I’ve tinkered with storage, CSI has been really well adopted and is awesome because it has stuff like snapshotting and cloning out of the box. That said, you’re shit outta luck if the clone needs to cross a StorageClass (or namespace) boundary – it’s essentially not something k8s does (which is silly IMO, maybe eventually they’ll do it).
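For reference, a CSI clone is just a new PVC whose dataSource points at an existing one – roughly the shape below (names are illustrative). The catch is that the clone has to land in the same StorageClass and namespace as the source, which is exactly the limitation this post works around:

import * as k8s from "@kubernetes/client-node";

// A CSI clone: a new PVC with a dataSource pointing at an existing PVC.
// This only works within the same namespace *and* StorageClass -- the
// restriction this whole post is about. Names here are illustrative.
const clonePVC: k8s.V1PersistentVolumeClaim = {
  metadata: { name: "source-pvc-clone", namespace: "default" },
  spec: {
    storageClassName: "same-class-as-source", // must match the source PVC's class
    dataSource: { kind: "PersistentVolumeClaim", name: "source-pvc" },
    accessModes: ["ReadWriteOnce"],
    resources: { requests: { storage: "10Gi" } },
  },
};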
I’ve run into this in the past and actually spent hours moving over PVCs one by one… This time I figured I should invest the time to write a quick script to do it, and take some notes as I go. A simple bash script would do, but that’s not why we’re here, so we’re going TypeScript with some bells and whistles thrown in. Nothing crazy, but more than is strictly necessary.
Before we get going I will limit the scope somewhat (sorry to all the fellow yak shavers out there who wanted to see more):
- Filesystem PVC cloning for now, ignoring Block PVCs
- Deployments only (no StatefulSets or DaemonSets)
In the end we’re going to run this script as a Job, so we’ll package it up as a docker container as well.
There are a few ways we could do this, so let’s spend a little bit of time planning.
No rocket science here, just standing on the shoulders of giants.
First let’s think of our requirements:
- It should work on Filesystem PVCs attached to Deployments (the scope above)
- It shouldn’t care which storage providers are involved – just a source PVC and a destination StorageClass
- It should be runnable from the command line or in-cluster as a Job

And the steps should look something like this:
1. Find the Deployment and the source PVC behind the named volume
2. Scale the Deployment down to zero
3. Create a destination PVC in the new StorageClass
4. Patch the new PV’s reclaim policy to Retain so the data survives cleanup
5. Run a Job that copies the data from the source PVC to the destination PVC
6. Point the Deployment at the new PVC and scale it back up
You can find all the code on GitLab @ mrman/pvcloney.
It took me a while to write all the code, and it ended up being ~1000 LOC, mostly because I wrote it in the simplest, most procedural way possible. I’ll go through some examples of the kind of code I wrote below to give you an idea.
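One note on setup: the snippets below use clients named k8sCoreV1API and k8sAppsV1API, which are built from the ambient kubeconfig roughly like this:

import * as k8s from "@kubernetes/client-node";

// Load config from KUBECONFIG / ~/.kube/config (or in-cluster service account)
const kc = new k8s.KubeConfig();
kc.loadFromDefault();

// The API clients used throughout the script
const k8sCoreV1API = kc.makeApiClient(k8s.CoreV1Api);
const k8sAppsV1API = kc.makeApiClient(k8s.AppsV1Api);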
///////////////////////////////
// Step: Find the deployment //
///////////////////////////////

// Find the deployment
try {
  resp = await k8sAppsV1API.readNamespacedDeployment(srcDeploymentName, ns);
  if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }
  srcDeployment = resp.body;
  if (!srcDeployment) { throw new Error("failed to find source deployment"); }

  srcDeploymentOriginalReplicaCount = srcDeployment.spec?.replicas ?? 1;

  volumeSpec = (resp.body.spec?.template?.spec?.volumes ?? []).find(v => v.name === volumeName);
  if (!volumeSpec) { throw new Error(`failed to find volume spec with name [${volumeName}]`); }
  if (!volumeSpec.persistentVolumeClaim) {
    throw new Error(`Volume with name [${volumeName}] is not of type PersistentVolumeClaim`);
  }
} catch (err: any) {
  logger?.error({
    msg: `failed to find valid deployment named [${srcDeploymentName}] in namespace [${ns}]`,
    errMsg: err?.toString(),
    k8sResponse: err?.response,
  });
  throw new Error("Failed to find valid deployment");
}

srcPVCName = volumeSpec.persistentVolumeClaim.claimName;
if (!srcPVCName) { throw new Error("failed to determine source PVC name"); }
if (typeof srcDeploymentOriginalReplicaCount === "undefined") { throw new Error("Failed to retrieve source deployment replicas"); }
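Before the next step, a quick aside: buildDestinationPVC (used below) isn’t excerpted here – it basically copies the source PVC’s access modes and requested size onto a fresh PVC in the new StorageClass. A rough sketch (the real version lives in the repo):

import * as k8s from "@kubernetes/client-node";

// Rough sketch -- the real buildDestinationPVC is in the repo. It reuses the
// source PVC's access modes and requested size, but swaps the StorageClass.
async function buildDestinationPVC(args: {
  ns: string;
  storageClassName: string;
  id: string;
  srcPVC: k8s.V1PersistentVolumeClaim;
}): Promise<k8s.V1PersistentVolumeClaim> {
  return {
    metadata: {
      name: `${args.srcPVC.metadata?.name}-${args.id}`,
      namespace: args.ns,
    },
    spec: {
      storageClassName: args.storageClassName, // the *new* StorageClass
      accessModes: args.srcPVC.spec?.accessModes,
      resources: args.srcPVC.spec?.resources, // same requested storage as the source
    },
  };
}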
//////////////////////////////////
// Step: Create destination PVC //
//////////////////////////////////

// Build the destination PVC resource
let destPVC: k8s.V1PersistentVolumeClaim;
const destPVCResource = await buildDestinationPVC({
  ns,
  storageClassName: copyArgs.dest.storageClass.name,
  id: operationId,
  srcPVC,
});
const destPVCName = destPVCResource.metadata?.name;
if (!destPVCName) { throw new Error("Failed to build PVC resource"); }

// Create a new replacement PVC
try {
  resp = await k8sCoreV1API.createNamespacedPersistentVolumeClaim(ns, destPVCResource);
  if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }
} catch (err: any) {
  logger?.error({
    msg: `failed to create PVC [${destPVCName}] in namespace [${ns}]\nERROR: ${err.toString()}`,
    errMsg: err?.toString(),
    k8sResponse: err?.response,
  });
  throw new Error(`Failed to create PVC [${destPVCName}]`);
}

// If waiting for PVC Pending to end has been specified, then wait
if (copyArgs.copy.waitForPVCPendingEnd) {
  // Wait for PVC status to change from pending
  try {
    logger?.info(`waiting for destination PVC [${destPVCName}] to exit pending state...`);
    await waitFor(
      async () => {
        try {
          const resp = await k8sCoreV1API.readNamespacedPersistentVolumeClaimStatus(destPVCName, ns);
          if (resp.body && resp.body.status?.phase?.toLowerCase() !== "pending") { return true; }
        } catch (err: any) {
          logger?.warn(`error while reading status for destination PVC [${destPVCName}]: ${err.toString()}`);
        }
        return null;
      },
      { timeoutMs: DEFAULT_TIMEOUT_MS },
    );
    logger?.info(`PVC [${destPVCName}] successfully exited pending state...`);
  } catch (err: any) {
    logger?.error(`failed to get destination PVC [${destPVCName}] status in namespace [${ns}]\nERROR: ${err.toString()}`);
    throw err;
  }
}

// Retrieve the newly created PVC
try {
  resp = await k8sCoreV1API.readNamespacedPersistentVolumeClaim(destPVCName, ns);
  if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }
  destPVC = resp.body;
} catch (err: any) {
  logger?.error({
    msg: `failed to retrieve PVC [${destPVCName}] in namespace [${ns}]\nERROR: ${err.toString()}`,
    errMsg: err?.toString(),
    k8sResponse: err?.response,
  });
  throw new Error(`Failed to retrieve PVC [${destPVCName}]`);
}
if (!destPVC) { throw new Error(`Failed to retrieve PVC [${destPVCName}]`); }

logger?.info(`successfully created destination PVC [${destPVCName}]`);
I could have used a watcher here, but polling is pretty easy to implement.
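waitFor also isn’t excerpted above – it’s roughly the polling helper you’d expect (the real one is in the repo):

// Rough sketch of the polling helper used above -- the real waitFor is in the
// repo. It retries the check until it returns a non-null value or times out.
async function waitFor<T>(
  check: () => Promise<T | null | undefined>,
  opts: { timeoutMs: number; intervalMs?: number },
): Promise<T> {
  const intervalMs = opts.intervalMs ?? 1000;
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    const result = await check(); // returning null/undefined means "keep waiting"
    if (result !== null && result !== undefined) { return result; }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`timed out after ${opts.timeoutMs}ms`);
}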
////////////////////////////////////////////////////////
// Step: Ensure the new PV reclaim policy is 'Retain' //
////////////////////////////////////////////////////////

if (!destPVC.spec?.volumeName) { throw new Error("volume name is missing from destination PVC"); }
const destPVName = destPVC.spec?.volumeName;

// Ensure the PV underlying the new PVC is set to be retained after job completion
try {
  const patch = [
    {
      op: "replace",
      path: "/spec/persistentVolumeReclaimPolicy",
      value: "Retain",
    },
  ];

  // Patch the PV (PVs are cluster-scoped, so no namespace is needed)
  resp = await k8sCoreV1API.patchPersistentVolume(
    destPVName,
    patch,
    undefined,
    undefined,
    undefined,
    undefined,
    { "headers": { "Content-Type": k8s.PatchUtils.PATCH_FORMAT_JSON_PATCH }},
  );
  if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }
  logger?.info(`patched PV [${destPVName}] reclaim policy to 'Retain'`);

  // Wait until the reclaim policy reads 'Retain' on the PV
  await waitFor(
    async () => {
      try {
        const resp = await k8sCoreV1API.readPersistentVolumeStatus(destPVName);
        if (resp.body && resp.body.spec?.persistentVolumeReclaimPolicy === "Retain") { return true; }
      } catch (err: any) {
        logger?.warn(`error while reading status of PV [${destPVName}]: ${err.toString()}`);
      }
      return null;
    },
    { timeoutMs: DEFAULT_TIMEOUT_MS },
  );
  logger?.info(`PV [${destPVName}] successfully set to 'Retain'`);
} catch (err: any) {
  logger?.error({
    msg: `failed to patch PV [${destPVName}] reclaim policy to 'Retain'\nERROR: ${err.toString()}`,
    errMsg: err?.toString(),
    k8sResponse: err?.response,
  });
  throw new Error(`Failed to patch PV [${destPVName}]`);
}
For the full source, check src/index.ts in the gitlab repo.
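One piece not excerpted above is the Job that actually moves the bytes – conceptually it’s just a pod that mounts both PVCs and copies across. A rough sketch of the resource the script builds (the image and names here are illustrative, and I’m assuming a BatchV1Api client built like the others – see the repo for the real thing):

// Rough sketch of the copy Job -- a pod that mounts both PVCs and copies
// everything across. Image/names are illustrative, not the repo's exact ones.
const copyJob: k8s.V1Job = {
  metadata: { name: `pvcloney-copy-${operationId}`, namespace: ns },
  spec: {
    backoffLimit: 0,
    template: {
      spec: {
        restartPolicy: "Never",
        containers: [
          {
            name: "copy",
            image: "alpine:3", // anything with cp/rsync will do
            command: ["/bin/sh", "-c", "cp -a /src/. /dest/"],
            volumeMounts: [
              { name: "src", mountPath: "/src", readOnly: true },
              { name: "dest", mountPath: "/dest" },
            ],
          },
        ],
        volumes: [
          { name: "src", persistentVolumeClaim: { claimName: srcPVCName } },
          { name: "dest", persistentVolumeClaim: { claimName: destPVCName } },
        ],
      },
    },
  },
};
const k8sBatchV1API = kc.makeApiClient(k8s.BatchV1Api);
await k8sBatchV1API.createNamespacedJob(ns, copyJob);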
Well that should be enough code for now – let me quickly go over how to use the script.
NOTE: the following section is basically in the repo README verbatim, so you can look there as well.
Here’s how we’d call the script from the command line, assuming you’ve built the repo and want to use make:
$ make run ARGS="--namespace <ns> --source-deployment-name <deployment> --dest-storageclass-name <new storage class> --volume-name <volume attached to deployment>"
This assumes of course that you’re using some tool like kubie to manage your kubectl environment (or just have it configured properly) and have all the necessary RBAC permissions to perform the steps the script does!
Let’s say you dislike make and want to run node yourself:
$ make build
$ node dist/index.js --namespace <ns> --source-deployment-name <deployment> --dest-storageclass-name <new storage class> --volume-name <volume attached to deployment>
Here’s how we’d run the script as a container, on a computer where you don’t want to download the code:
$ docker run --rm \
-v /path/to/your/account.kubeconfig:/k8s/config:ro \
-e KUBECONFIG=/k8s/config \
registry.gitlab.com/mrman/pvcloney/cli:v0.2.1 \
--namespace <namespace> \
--source-deployment-name <deployment> \
--dest-storageclass-name <storageclass> \
--volume-name <volume attached to deployment>
Similar to the command line execution, this assumes the kubeconfig you’re mounting in has all the permissions and is correctly configured to begin with.
To use the tool from Kubernetes, we need to do a bit more (yes, this is also in the repo README). First we set up a ServiceAccount:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pvcloney
  namespace: "<ns>" # REPLACE: your namespace goes here
Then we give that account the permissions it will need:
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pvcloney # ClusterRoles are cluster-scoped, so no namespace here
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumes
      - persistentvolumes/status
    verbs:
      - get
      - create
      - patch
      - delete
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pvcloney
  namespace: <ns> # REPLACE: namespace
rules:
  - apiGroups:
      - apps # Deployments live in the apps API group, not core
    resources:
      - deployments
      - deployments/scale
    verbs:
      - get
      - list
      - patch
      - delete
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
      - persistentvolumeclaims/status
    verbs:
      - get
      - create
      - patch
      - delete
  - apiGroups:
      - batch
    resources:
      - jobs
      - jobs/status
    verbs:
      - get
      - list
      - watch
      - create
      - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pvcloney
  namespace: <ns> # REPLACE: namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pvcloney
subjects:
  - kind: ServiceAccount
    name: pvcloney
    namespace: <ns> # REPLACE: namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pvcloney # cluster-scoped, so no namespace here
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pvcloney
subjects:
  - kind: ServiceAccount
    name: pvcloney
    namespace: <ns> # REPLACE: namespace
And we create a one-time Job:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pvcloney
  namespace: <ns> # REPLACE: namespace
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: pvcloney
      containers:
        - name: pvcloney
          # v0.2.1 as of 2022/05/05 (you should check for newer versions!)
          image: registry.gitlab.com/mrman/pvcloney/cli@sha256:61ddae15b53c1056ecd1ba58f2c35945a4442c55c38cad0fdc428f60b15e5fba
          imagePullPolicy: IfNotPresent
          args:
            - --namespace=<namespace> # REPLACE: your namespace
            - --source-deployment-name=<deployment> # REPLACE: the deployment you need to replace
            - --dest-storageclass-name=<storageclass> # REPLACE: the new storageclass to clone into
            - --volume-name=<volume attached to deployment> # REPLACE: the volume to replace
Obviously there are some replacements to be made here, but these are the pieces.
I’ll probably never get to these, but here are some ideas for improvements.
This script would make for a decent quick k8s operator – being able to define a CRD that triggers this migration in a namespace, and updates the status of the operation as it goes, would be awesome.
It might be overkill, but adding a pluggable status update mechanism to the script would also help – even when I run on the command line I want to maintain status across CLI executions (ex. in $XDG_CONFIG_HOME/.pvc-copy or ~/.pvc-copy). Updating such a general mechanism to interact with a CRD would then be pretty easy.
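Such a mechanism might look something like this hypothetical sketch – a StatusSink interface with a file-backed implementation (a CRD-backed one would be analogous):

import { promises as fs } from "fs";
import * as path from "path";

// Hypothetical sketch of the pluggable status mechanism described above --
// a file-backed sink for CLI runs; a CRD-backed one would implement the
// same interface.
interface StatusSink {
  update(operationId: string, status: { step: string; done: boolean }): Promise<void>;
}

class FileStatusSink implements StatusSink {
  // $XDG_CONFIG_HOME/.pvc-copy, falling back to ~/.pvc-copy
  private dir = path.join(process.env.XDG_CONFIG_HOME ?? process.env.HOME ?? ".", ".pvc-copy");

  async update(operationId: string, status: { step: string; done: boolean }): Promise<void> {
    await fs.mkdir(this.dir, { recursive: true });
    // One JSON file per operation, overwritten as the migration progresses
    await fs.writeFile(path.join(this.dir, `${operationId}.json`), JSON.stringify(status, null, 2));
  }
}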
There are a couple ideas on how we could have slightly less downtime:
- Use rsync’s remote copying to copy to the destination PVC ahead of time
- Write something like rsync-continuous, or at least study it

I’m not sure there’s an easy zero-downtime method for this kind of migration with my current constraints and knowledge – if I were running an enterprise-grade SAN, replication would be a few clicks/commands away (to another SAN).
Of course, the best way would have been to use a more robust filesystem in the first place and be switching to one robust enough to integrate and handle the switchover gracefully.
We’re probably at the limits of my knowledge of storage systems here – it feels like the filesystem interfaces could easily be shimmed at the kernel level to do some replication (though making that replication remote could be quite dangerous), which would make this easy. Being able to transparently (asynchronously at first, and eventually maybe synchronously) shuffle device-level writes to a remote machine seems like it could be done.
Maybe another choice would be using a tool like criu to do a live migration of the process itself, with storage that’s also linked?
Finally I have a script, a docker container, and the k8s resources that I can run in my cluster as a Job that will do this in an automated fashion. Just in time for the likely end of my storage system twiddling on Kubernetes (I really like the setup I’m on now)!
I’m glad I sat down and wrote a script to automate this – hopefully others find it useful as well.