
A script for copying PVC data across StorageClasses


tl;dr - CSI is awesome but it doesn’t help you with cross-StorageClass data migration. I wrote a small, painfully procedural script & docker container called pvcloney to perform this task repeatably for myself. The code is a great crash course in @kubernetes/client-node.

Today’s yak shave is of the storage variety – useful only to people who use/evaluate different storage mechanisms on their clusters.

I’ve gone hostPath -> Rook -> Longhorn -> OpenEBS ZFS LocalPV -> OpenEBS LVM -> Rook on ZFS and all over the place, so I’ve got a few old workloads that are running storage providers that I’m not planning to use going forward.

As I’ve tinkered with storage, CSI has been really well adopted and is awesome because it has stuff like snapshotting and cloning out of the box. That said, you’re shit outta luck if you want to clone across StorageClasses (or namespaces) – it’s essentially not something k8s does (which is silly IMO, maybe eventually they’ll do it).

I’ve run into this in the past and actually spent hours moving over PVCs one by one… This time I figured I should invest the time to write a quick script to do it, and take some notes as I go. A simple bash script would do, but that’s not why we’re here, so we’re going TypeScript with some bells and whistles thrown in. Nothing crazy, but more than is strictly necessary.

Before we get going I will limit the scope somewhat (sorry to all the fellow yak shavers out there who wanted to see more):

  • Focus on Filesystem PVC cloning for now, ignoring Block PVCs
  • Support for Deployments only (no StatefulSets or DaemonSets)
    • This might look a bit weird/useless, but most of my stateful apps are just Deployments with PVCs.

In the end we’re going to run this script as a Job so we’ll package it up as a docker container as well.

The Plan

There are a few ways we could do this, so let’s spend a little bit of time planning.

Simple

No rocket science here, just standing on the shoulders of giants.

First let’s think of our requirements:

  • Need Deployment name
    • Deployments should have only one pod that’s using the storage?
  • Need PVC name

And the steps should look something like this:

  • Find the source Deployment and the PVC behind the volume you point it at
  • Scale the Deployment down to zero so nothing is writing to the volume
  • Create a destination PVC that uses the new StorageClass
  • Make sure the new PV's reclaim policy is set to 'Retain'
  • Run a Job that copies the data from the old PVC to the new one
  • Point the Deployment at the new PVC and scale it back up

The code (step examples)

You can find all the code on GitLab @ mrman/pvcloney.

It took me a while to write all the code, and it ended up being ~1000 LOC, mostly because I wrote it in the simplest, most procedural way possible.

I’ll go through some examples of the kind of code I wrote below to give you an idea.

Finding the right deployment

  ///////////////////////////////
  // Step: Find the deployment //
  ///////////////////////////////

  // Find the deployment
  try {
    resp = await k8sAppsV1API.readNamespacedDeployment(srcDeploymentName, ns);
    if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }

    srcDeployment = resp.body;
    if (!srcDeployment) { throw new Error("failed to find source deployment"); }

    srcDeploymentOriginalReplicaCount = srcDeployment.spec?.replicas ?? 1;

    volumeSpec = (resp.body.spec?.template?.spec?.volumes ?? []).find(v => v.name === volumeName);
    if (!volumeSpec) { throw new Error(`failed to find volume spec with name [${volumeName}]`); }

    if (!volumeSpec.persistentVolumeClaim) {
      throw new Error(`Volume with name [${volumeName}] is not of type PersistentVolumeClaim`);
    }
  } catch (err: any) {
    logger?.error({
      msg: `failed to find valid deployment named [${srcDeploymentName}] in namespace [${ns}]`,
      errMsg: err?.toString(),
      k8sResponse: err?.response,
    });

    throw new Error("Failed to find valid deployment");
  }

  srcPVCName = volumeSpec.persistentVolumeClaim.claimName;
  if (!srcPVCName) { throw new Error("failed to determine source PVC name"); }

  if (typeof srcDeploymentOriginalReplicaCount === "undefined") { throw new Error("Failed to retrieve source deployment replicas"); }
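
One step not excerpted here is scaling the source Deployment down to zero before copying (which is why the original replica count gets saved above, so it can be restored afterwards). A rough sketch of how that might look, reusing the JSON-patch pattern the script uses elsewhere – treat this as an assumed version rather than the repo’s exact code:

  // Scale the source deployment down to zero so nothing writes to the volume
  // during the copy (an assumed sketch; see the repo for the actual code)
  const scaleDownPatch = [
    { op: "replace", path: "/spec/replicas", value: 0 },
  ];

  resp = await k8sAppsV1API.patchNamespacedDeployment(
    srcDeploymentName,
    ns,
    scaleDownPatch,
    undefined,
    undefined,
    undefined,
    undefined,
    { "headers": { "Content-Type": k8s.PatchUtils.PATCH_FORMAT_JSON_PATCH }},
  );
  if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }
  logger?.info(`scaled deployment [${srcDeploymentName}] down to 0 replicas`);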

Creating a PVC, waiting on it, then re-retrieving it

  //////////////////////////////////
  // Step: Create destination PVC //
  //////////////////////////////////

  // Build the destination PVC resource
  let destPVC: k8s.V1PersistentVolumeClaim;
  const destPVCResource = await buildDestinationPVC({
    ns,
    storageClassName: copyArgs.dest.storageClass.name,
    id: operationId,
    srcPVC,
  });
  const destPVCName = destPVCResource.metadata?.name;
  if (!destPVCName) { throw new Error("Failed to build PVC resource"); }

  // Create a new replacement PVC
  try {
    resp = await k8sCoreV1API.createNamespacedPersistentVolumeClaim(ns, destPVCResource);
    if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }

  } catch (err: any) {
    logger?.error({
      msg: `failed to create PVC [${destPVCName}] in namespace [${ns}]\nERROR: ${err.toString()}`,
      errMsg: err?.toString(),
      k8sResponse: err?.response,
    });

    throw new Error(`Failed to create PVC [${destPVCName}]`);
  }

  // If waiting for PVC Pending to end has been specified, then wait
  if (copyArgs.copy.waitForPVCPendingEnd) {

    // Wait for PVC status to change from pending
    try {
      logger?.info(`waiting for destination PVC [${destPVCName}] to exit pending state...`);
      await waitFor(
        async () => {
          try {
            const resp = await k8sCoreV1API.readNamespacedPersistentVolumeClaimStatus(destPVCName, ns);
            if (resp.body && resp.body.status?.phase?.toLowerCase() !== "pending") { return true;  }
          } catch (err: any) {
            logger?.warn(`error while reading status for destination PVC [${destPVCName}]: ${err.toString()}`);
          }
          return null;
        },

        { timeoutMs: DEFAULT_TIMEOUT_MS },
      );
      logger?.info(`PVC [${destPVCName}] successfully exited pending state...`);
    } catch (err: any) {
      logger?.error(`failed to get destination PVC [${destPVCName}] status in namespace [${ns}]\nERROR: ${err.toString()}`);
      throw err;
    }
  }

  // Retrieve the PVC
  try {
    resp = await k8sCoreV1API.readNamespacedPersistentVolumeClaim(destPVCName, ns);
    if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }

    destPVC = resp.body;
  } catch (err: any) {
    logger?.error({
      msg: `failed to retrieve PVC [${destPVCName}] in namespace [${ns}]\nERROR: ${err.toString()}`,
      errMsg: err?.toString(),
      k8sResponse: err?.response,
    });

    throw new Error(`Failed to retrieve PVC [${destPVCName}]`);
  }

  if (!destPVC) { throw new Error(`Failed to retrieve destination PVC [${destPVCName}]`); }
  logger?.info(`successfully created destination PVC [${destPVCName}]`);
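
The buildDestinationPVC helper isn’t shown above. As a rough idea of what it does – an assumed sketch, not the repo’s exact implementation – it carries over the source PVC’s requested size and access modes while swapping in the destination StorageClass (the naming scheme here is hypothetical):

  // Assumed sketch of buildDestinationPVC: copy size/access modes from the
  // source PVC, swap in the destination StorageClass, and derive a new name.
  async function buildDestinationPVC(opts: {
    ns: string;
    storageClassName: string;
    id: string;
    srcPVC: k8s.V1PersistentVolumeClaim;
  }): Promise<k8s.V1PersistentVolumeClaim> {
    return {
      apiVersion: "v1",
      kind: "PersistentVolumeClaim",
      metadata: {
        name: `${opts.srcPVC.metadata?.name}-clone-${opts.id}`, // hypothetical naming
        namespace: opts.ns,
      },
      spec: {
        storageClassName: opts.storageClassName,
        accessModes: opts.srcPVC.spec?.accessModes ?? ["ReadWriteOnce"],
        resources: {
          requests: {
            storage: opts.srcPVC.spec?.resources?.requests?.["storage"] ?? "1Gi",
          },
        },
      },
    };
  }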

I could have used a watcher for the PVC status here, but polling is pretty easy to implement.
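
The waitFor helper used above isn’t shown either; a minimal polling loop along these lines would do the job (again, an assumed sketch rather than the repo’s exact code):

  // Assumed sketch of waitFor: poll `check` until it returns a truthy value,
  // or give up once timeoutMs has elapsed.
  async function waitFor<T>(
    check: () => Promise<T | null>,
    opts: { timeoutMs: number; intervalMs?: number },
  ): Promise<T> {
    const intervalMs = opts.intervalMs ?? 1000;
    const deadline = Date.now() + opts.timeoutMs;

    while (Date.now() < deadline) {
      const result = await check();
      if (result) { return result; }
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }

    throw new Error(`waitFor timed out after ${opts.timeoutMs}ms`);
  }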

Ensuring the reclaim policy for a given PV is “Retain”

  ////////////////////////////////////////////////////////
  // Step: Ensure the new PV reclaim policy is 'Retain' //
  ////////////////////////////////////////////////////////

  if (!destPVC.spec?.volumeName) { throw new Error("volume name is missing from destination PVC"); }
  const destPVName = destPVC.spec?.volumeName;

  // Ensure the PV underlying the new PVC is set to be retained after job completion
  try {
    const patch = [
      {
        op: "replace",
        path: "/spec/persistentVolumeReclaimPolicy",
        value: "Retain",
      },
    ];

    // patch the PV
    resp = await k8sCoreV1API.patchPersistentVolume(
      destPVName,
      patch,
      undefined,
      undefined,
      undefined,
      undefined,
      { "headers": { "Content-Type": k8s.PatchUtils.PATCH_FORMAT_JSON_PATCH }},
    );
    if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }
    logger?.info(`patched PV [${destPVName}] reclaim policy to 'Retain'`);

    // Wait until the status reads reclaim on the PV
    await waitFor(
      async () => {
        try {
          const resp = await k8sCoreV1API.readPersistentVolumeStatus(destPVName);
          if (resp.body && resp.body.spec?.persistentVolumeReclaimPolicy === "Retain") { return true; }
        } catch (err: any) {
          logger?.warn(`error while reading status of PV [${destPVName}]: ${err.toString()}`);
        }
        return null;
      },

      { timeoutMs: DEFAULT_TIMEOUT_MS },
    );
    logger?.info(`PV [${destPVName}] successfully set to 'Retain'`);

  } catch (err: any) {
    logger?.error({
      msg: `failed to patch PV [${destPVName}] reclaim policy to 'Retain'\nERROR: ${err.toString()}`,
      errMsg: err?.toString(),
      k8sResponse: err?.response,
    });

    throw new Error(`Failed to patch PV [${destPVName}]`);
  }
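
The step that actually moves the data isn’t excerpted above: a Job that mounts both PVCs and copies everything across. Here’s a rough sketch of what such a Job could look like – the image, naming, copy command, and the k8sBatchV1API client variable are all illustrative assumptions, so check the repo for the real spec:

  // Assumed sketch of the copy Job: one pod mounting the source PVC read-only
  // and the destination PVC read-write, copying everything across.
  const copyJob: k8s.V1Job = {
    apiVersion: "batch/v1",
    kind: "Job",
    metadata: { name: `pvcloney-copy-${operationId}`, namespace: ns },
    spec: {
      backoffLimit: 0,
      template: {
        spec: {
          restartPolicy: "Never",
          containers: [
            {
              name: "copy",
              image: "alpine:3", // illustrative image choice
              command: ["/bin/sh", "-c", "cp -a /src/. /dest/"],
              volumeMounts: [
                { name: "src", mountPath: "/src", readOnly: true },
                { name: "dest", mountPath: "/dest" },
              ],
            },
          ],
          volumes: [
            { name: "src", persistentVolumeClaim: { claimName: srcPVCName } },
            { name: "dest", persistentVolumeClaim: { claimName: destPVCName } },
          ],
        },
      },
    },
  };

  // Create the Job (the real script then waits for it to complete)
  resp = await k8sBatchV1API.createNamespacedJob(ns, copyJob);
  if (!resp || !resp.body) { throw new Error("Missing/Invalid response from k8s API call"); }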

For the full source, check src/index.ts in the GitLab repo.

The usage

Well that should be enough code for now – let me quickly go over how to use the script.

NOTE: the following section is basically in the repo README verbatim, so you can look there as well.

Command line

Here’s how we’d call the script from the command line, assuming you’ve built the repo and want to use make:

$ make run ARGS="--namespace <ns> --source-deployment-name <deployment> --dest-storageclass-name <new storage class> --volume-name <volume attached to deployment>"

This assumes of course that you’re using some tool like kubie to manage your kubectl environment (or just have it configured properly) and have all the necessary RBAC permissions to perform the steps the script does!

Let’s say you dislike make and want to run node yourself:

$ make build
$ node dist/index.js --namespace <ns> --source-deployment-name <deployment> --dest-storageclass-name <new storage class> --volume-name <volume attached to deployment>

As a container

Here’s how we’d call the script as a container, for machines where you don’t want to download and build the code:

$ docker run --rm \
    -v /path/to/your/account.kubeconfig:/k8s/config:ro \
    -e KUBECONFIG=/k8s/config \
    registry.gitlab.com/mrman/pvcloney/cli:v0.2.1 \
        --namespace <namespace> \
        --source-deployment-name <deployment> \
        --dest-storageclass-name <storageclass> \
        --volume-name <volume attached to deployment>

Similar to the command line execution, this assumes the kubeconfig you’re mounting has all the necessary permissions and is correctly configured to begin with.

In Kubernetes

To use the tool from Kubernetes, we need to do a bit more (yes, this is also in the repo README):

First we set up a ServiceAccount:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pvcloney
  namespace: "<ns>" # REPLACE: your namespace goes here

Then we give that account the permissions it will need:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pvcloney
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumes
      - persistentvolumes/status
    verbs:
      - get
      - create
      - patch
      - delete

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pvcloney
  namespace: <ns> # REPLACE: namespace
rules:
  - apiGroups:
      - ""
    resources:
      - deployments
    verbs:
      - get
      - patch
      - list
      - delete

  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
      - persistentvolumeclaims/status
      - persistentvolumes
    verbs:
      - get
      - create
      - patch
      - delete

  - apiGroups:
      - apps
    resources:
      - deployments/scale
      - deployments
    verbs:
      - get
      - patch
      - delete

  - apiGroups:
      - batch
    resources:
      - jobs
      - jobs/status
    verbs:
      - get
      - list
      - watch
      - create
      - delete

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pvcloney
  namespace: <ns> # REPLACE: namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pvcloney
subjects:
  - kind: ServiceAccount
    name: pvcloney
    namespace: <ns> # REPLACE: namespace

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pvcloney
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pvcloney
subjects:
  - kind: ServiceAccount
    name: pvcloney
    namespace: <ns> # REPLACE: namespace

And we create a one-time Job:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: pvcloney
  namespace: <ns> # REPLACE: namespace
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: pvcloney

      containers:
        - name: pvcloney
          # v0.2.1 as of 2022/05/05 (you should check for newer versions!)
          image: registry.gitlab.com/mrman/pvcloney/cli@sha256:61ddae15b53c1056ecd1ba58f2c35945a4442c55c38cad0fdc428f60b15e5fba
          imagePullPolicy: IfNotPresent
          args:
            - --namespace
            - <namespace> # REPLACE: your namespace
            - --source-deployment-name
            - <deployment> # REPLACE: the deployment you need to replace
            - --dest-storageclass-name
            - <storageclass> # REPLACE: the new storageclass to clone into
            - --volume-name
            - <volume attached to deployment> # REPLACE: the volume to replace

Obviously there are some replacements to be made here, but these are the pieces.

Improvements

I’ll probably never get to these, but here are some ideas for improvements.

Make it an Operator

This script would make for a decent quick k8s operator – being able to define a CRD that triggers this migration in a namespace would be awesome, with the operator updating the status of the operation as it goes.

It might be overkill, but adding a pluggable status update mechanism to the script would help – even when I run it on the command line I want to maintain status across CLI executions (ex. in $XDG_CONFIG_HOME/.pvc-copy or ~/.pvc-copy). Extending such a general mechanism to also interact with a CRD would then be pretty easy.
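
As a sketch of what I mean (entirely hypothetical – none of this is in the repo), the pluggable piece could be a small interface with a file-backed implementation for CLI runs and a CRD-backed one for the operator case:

  import { promises as fs } from "fs";

  // Hypothetical status shape for a migration run
  interface MigrationStatus {
    step: string;
    completedSteps: string[];
    error?: string;
  }

  // Hypothetical pluggable sink the script would write progress through
  interface StatusSink {
    update(status: MigrationStatus): Promise<void>;
  }

  // File-backed sink for CLI runs (ex. pointed at ~/.pvc-copy);
  // a CRD-backed sink would patch the custom resource's status instead.
  class FileStatusSink implements StatusSink {
    constructor(private path: string) {}

    async update(status: MigrationStatus): Promise<void> {
      await fs.writeFile(this.path, JSON.stringify(status, null, 2));
    }
  }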

Ways to have less downtime

There are a couple ideas on how we can have slightly less downtime:

  • Using an ephemeral container to manage operations without disturbing the original workload
  • Use rsync’s remote copying to copy to the destination PVC ahead of time

I’m not sure there’s an easy zero-downtime method for this kind of migration with my current constraints and knowledge – if I were running an enterprise-grade SAN, replication (to another SAN) would be a few clicks/commands away.

Of course, the best way would have been to use a more robust filesystem in the first place and be switching to one robust enough to integrate and handle the switchover gracefully.

We’re probably at the limits of my knowledge of storage systems here – it feels like the filesystem interfaces could be shimmed at the kernel level to do some replication (though making that replication remote could be quite dangerous), which would make this easy. Being able to transparently shuffle device-level writes to a remote machine (asynchronously at first, and eventually maybe synchronously) seems like it could be done.

Maybe another option would be using a tool like criu to migrate the running process itself, with its storage linked and migrated along with it?

Wrapup

Finally I have a script, a docker container, and the k8s resources that I can run in my cluster as a Job to do this in an automated fashion. Just in time for the likely end of my storage system twiddling on Kubernetes (I really like the setup I’m on now)!

I’m glad I sat down and automated this with a script – hopefully others find it useful as well.