tl;dr - I install even more providers, this time OpenEBS cStor, OpenEBS Jiva, OpenEBS LocalPV hostPath, OpenEBS LocalPV ZFS, and LINSTOR via kvaps/kube-linstor. I skipped OpenEBS LocalPV Device because it threw a panic linked to a supposed kernel issue complaining about old kernels. Ain't nobody got time for that. The GitLab repo is finally public though, so you could skip this entire article and go there.
I had more issues with LINSTOR than I thought -- I've updated the LINSTOR section to reflect them, but basically, along with some install-ordering issues, I had somehow forgotten to enable the drbd (the 'd' was missing) kernel module.
NOTE: This is a multi-part blog post!
In part 2 we worked through installing some of the various storage plugins and got a repeatable process – mostly kubectl apply -f
and a little Makefile glue – for installing them (with some bits hardcoded to the server). Now it’s time to install even more storage plugins.
Hopefully this post won’t be too long – surely you’ve had enough exposition at this point – let’s get right into it.
STORAGE_PROVIDER=openebs-cstor
OpenEBS cStor is based on userspace ZFS – see the openebs/cstor repo, which uses libcstor, a storage engine built to work on uZFS. In the past it was a bit iffy and performed slightly slower than the Jiva engine, but I'm looking forward to seeing how it performs these days.
Some pros of using cStor:
And some reasons that I’ve found to not use cStor:
I haven’t actually run cStor in my production cluster up until now – I’ve used Jiva mainly and cStor very briefly during some testing.
The general OpenEBS installation documentation is actually the same for the early bits of most of the products. cStor has some extra steps (it requires the creation of an extra pool object) but I'll mostly go through the general steps here. Some of the early setup steps are already done by way of ansible – verifying that iscsi is installed, that iscsid is running, etc.
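If you want to check those prerequisites by hand on a node (a quick sanity check, assuming a Debian/Ubuntu host with open-iscsi installed via apt), it's just a couple of commands:
# Is the iSCSI initiator installed, and is iscsid enabled + running?
dpkg -l open-iscsi
systemctl is-enabled iscsid
systemctl is-active iscsid
# The initiator name should also be configured
cat /etc/iscsi/initiatorname.iscsi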
Since I don’t use helm (still haven’t found time to evaluate Helm 3), I’m going to be digging through and installing the components by hand. We’ll be using the kubectl
installation method. Here’s the full list of resources you end up with at the end of the day:
$ tree .
.
├── Makefile
├── maya-apiserver.deployment.yaml
├── maya-apiserver.svc.yaml
├── openebs-admission-server.deployment.yaml
├── openebs-localpv-provisioner.deployment.yaml
├── openebs-maya-operator.rbac.yaml
├── openebs-maya-operator.serviceaccount.yaml
├── openebs-ndm-config.configmap.yaml
├── openebs-ndm.ds.yaml
├── openebs-ndm-operator.deployment.yaml
├── openebs.ns.yaml
├── openebs-provisioner.deployment.yaml
└── openebs-snapshot-operator.deployment.yaml
0 directories, 25 files
ndm is the Node Disk Manager, i.e. the thing that discovers and keeps track of the actual physical hardware on every node.
One thing that really stood out while I was splitting these out of the huge openebs-operator.yaml file they want you to download and apply was the opt-out nature of the analytics. I wasn't too happy to find this:
# OPENEBS_IO_ENABLE_ANALYTICS if set to true sends anonymous usage
# events to Google Analytics
- name: OPENEBS_IO_ENABLE_ANALYTICS
  value: "true"
It’s also in the LocalPV provisioner:
- name: OPENEBS_IO_ENABLE_ANALYTICS
  value: "false"
A similar comment wasn’t there but I’m not taking any chances. Even if anonymized, I don’t want to send usage information to Google analytics. OpenEBS should at least set aside some engineering/ops resources to collecting their own analytics themselves, privacy-conscious devs/sysadmins are going to turn this off right away just based on where it is going.
Another setting I wanted to change was removing the default storage classes:
# If OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG is false then OpenEBS default
# storageclass and storagepool will not be created.
- name: OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG
  value: "false"
I want to manage my storage classes with pre-determined names and replication levels, so there's no need to make any default ones. I'll need to do a bit more reading into cStor sparse pools (AKA ZFS sparse datasets – remember, cStor is based on uZFS), but this is stuff I should know anyway if I'm going to administer them.
OK, so making all this stuff is pretty easy – there is some documentation on verifying our installation so let’s do that as well. Looks like verification is just ensuring that the pods are all running well:
$ k get pods
NAME READY STATUS RESTARTS AGE
maya-apiserver-7f969b8db4-cb9tc 1/1 Running 2 44m
openebs-admission-server-78458d9ff6-mx7bn 1/1 Running 0 41m
openebs-localpv-provisioner-d7464d5b-dgfsw 1/1 Running 0 41m
openebs-ndm-jkftf 1/1 Running 0 44m
openebs-ndm-operator-67876b4dc4-n94x6 1/1 Running 0 44m
openebs-provisioner-c666d6b4-djnpj 1/1 Running 0 41m
openebs-snapshot-operator-749db7b5f5-vq6ff 2/2 Running 0 12m
OK, pretty straightforward! The documentation says to check for StorageClasses next, but since I'm going to set up the custom storage classes myself, what I'm going to check instead is that we have 2 BlockDevice custom resources. Remember that I have basically two pieces of storage I want to make available:
- /dev/nvme0n1p5 - A ~396GB partition on my main drive (which has the OS installed)
- /dev/nvme1n1 - An empty (formerly software RAIDed) 512GB disk
$ k get blockdevice
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-8608c47fdaab0450c9d449213c46d7de all-in-one-01 512110190592 Unclaimed Active 45m
Ah-ha! Looks like I have only one block device, the whole disk (you can tell by the SIZE). It looks like I might have to tell OpenEBS about this disk manually, but let's see what changes have been made to the disks:
root@all-in-one-01 ~ # lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0 0 477G 0 disk
nvme0n1 259:1 0 477G 0 disk
├─nvme0n1p1 259:2 0 16G 0 part [SWAP]
├─nvme0n1p2 259:3 0 1G 0 part /boot
├─nvme0n1p3 259:4 0 64G 0 part /
├─nvme0n1p4 259:5 0 1K 0 part
└─nvme0n1p5 259:6 0 396G 0 part
So it looks like nothing has been changed – the disks haven't been partitioned, or labeled, or anything. It looks like OpenEBS doesn't do anything with the disks until you create a BlockDeviceClaim object (see that CLAIMSTATE), probably. That's certainly a nice feature to have! According to the documentation though, NDM should be able to manage partitions:
Currently, NDM out of the box manages disks, partitions, lvm, crypt and other dm devices. If the user need to have blockdevice for other device types like md array or any other unsupported device types, the blockdevice resource can be manually created using the following steps:
So why don’t I see nvme0n1p5
? There’s actually a frequently asked question section on NDM with an example of this case! Excellent work by OpenEBS, having this answer ready to go:
By default NDM excludes partitions mounted at /, /boot and /etc/hosts (which is same as the partition at which kubernetes / docker filesystem exists) and the parent disks of those partitions. In the above example /dev/sdb is excluded because of root partitions on that disk. /dev/sda4 contains the docker filesystem, and hence /dev/sda is also excluded.
So the existence of the / and /boot partitions caused the disk itself to be excluded. Unfortunately I think the fix for me isn't as easy as removing the /etc/hosts entry from the os-disk-exclude-filter in the openebs-ndm-config ConfigMap – I want to use a partition on a disk that should be ignored. This means I'll have to add my own manually-created BlockDevice:
---
apiVersion: openebs.io/v1alpha1
kind: BlockDevice
metadata:
  name: first-disk-partition
  namespace: openebs
  labels:
    kubernetes.io/hostname: all-in-one-01 # TODO: clustered setup needs to have this be different
    ndm.io/managed: "false" # for manual blockdevice creation put false
    ndm.io/blockdevice-type: blockdevice
status:
  claimState: Unclaimed
  state: Active
spec:
  nodeAttributes:
    nodeName: all-in-one-01 # TODO: clustered setup needs to have this be different
  capacity:
    logicalSectorSize: 512
    storage: 425133957120 # TODO: get from `blockdev --getsize64 <device path>`
  details:
    # TODO: obtain this information automatically from `udevadm info`
    deviceType: partition # like disk, partition, lvm, crypt, md
    firmwareRevision: "EXF7201Q"
    model: "SAMSUNG MZVLB512HBJQ-00000"
    serial: "SAMSUNG MZVLB512HBJQ-00000_S4GENX0N425033"
    # compliance: <compliance of disk> #like "SPC-4" # normally get this from smartctl but sometimes it's not there.
    vendor: SAMSUNG
  devlinks:
    - kind: by-path
      path: /dev/disk/by-id/nvme-SAMSUNG_MZVLB512HBJQ-00000_S4GENX0N425033-part5
    - kind: by-path
      path: /dev/disk/by-id/nvme-eui.0025388401b90b26-part5
    - kind: by-path
      path: /dev/disk/by-partuuid/a1b5d104-05
    - kind: by-path
      path: /dev/disk/by-path/pci-0000:01:00.0-nvme-1-part5
    - kind: by-path
      path: /dev/disk/by-uuid/6d734e22-1d33-4d50-8e7e-cf079255f634
  path: /dev/nvme0n1p5 # like /dev/md0
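Filling in those TODOs by hand is just a matter of running the commands referenced in the comments against the device in question – roughly (values will obviously differ per disk):
# Size in bytes for spec.capacity.storage
blockdev --getsize64 /dev/nvme0n1p5
# Logical sector size for spec.capacity.logicalSectorSize
blockdev --getss /dev/nvme0n1p5
# Model/serial/devlinks for spec.details and spec.devlinks
udevadm info -q property -n /dev/nvme0n1p5 | grep -E 'ID_MODEL|ID_SERIAL|DEVLINKS'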
Since I create this right away when the cluster is starting up, I have to do a little bit of work to make sure that I don't try to make this BlockDevice before the CRD for it exists:
blockdevice:
@echo "[info] waiting until blockdevice CRD is installed..."
@until $(KUBECTL) -n openebs get blockdevice; \
do echo "trying again in 20 seconds (ctrl+c to cancel)"; \
sleep 20; \
done
$(KUBECTL) apply -f first-disk-partition.blockdevice.yaml
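Another way to handle the wait (assuming the CRD is named blockdevices.openebs.io, which is what the stock manifests create) would be to lean on kubectl wait instead of a retry loop:
# Block until the BlockDevice CRD is registered and established, then apply
kubectl wait --for=condition=Established crd/blockdevices.openebs.io --timeout=120s
kubectl -n openebs apply -f first-disk-partition.blockdevice.yaml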
Alternatively, I could have just made the CRDs myself to ensure they exist, but I'm OK with this code since it runs all the way at the end and is not too terrible. At some point I'm going to have to fix all those TODOs so that I can more easily adapt this script to a cluster-driven, or at least node-name/hard-drive-type agnostic, setup… That's work for another day. With this done, we now have two BlockDevices just like we want:
$ k get blockdevice
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-8608c47fdaab0450c9d449213c46d7de all-in-one-01 512110190592 Unclaimed Active 26m
first-disk-partition all-in-one-01 425133957120 Unclaimed Active 24s
StoragePool and StorageClasses
OK, now that we've got our control/data plane and our BlockDevices registered, let's make a StoragePool out of them! Earlier we turned off the automatic pool creation, so we're going to need to create the pools manually. Since this is the cStor section, I'll outline only the cStor-relevant resources here. This is a good place to take a gander at the cStor storage pools documentation.
First up is the disk-based StoragePool, which spans over both BlockDevices – this is where all the storage will come from:
disk-pool.storagepool.yaml:
---
apiVersion: openebs.io/v1alpha1
kind: StoragePoolClaim
metadata:
  name: cstor-disk-pool
  namespace: openebs
  annotations:
    cas.openebs.io/config: |
      - name: PoolResourceRequests
        value: |-
          memory: 1Gi
      - name: PoolResourceLimits
        value: |-
          memory: 4Gi
      - name: AuxResourceRequests
        value: |-
          memory: 0.5Gi
          cpu: 100m
          ephemeral-storage: 50Mi
      - name: AuxResourceLimits
        value: |-
          memory: 0.5Gi
          cpu: 100m
spec:
  name: cstor-disk-pool
  type: disk
  poolSpec:
    poolType: mirrored
  blockDevices:
    blockDeviceList:
      - first-disk-partition
      - __PARTITION_NAME__
So everything looks good, except there's an issue – I can't make the pool without looking up the second block device's name! Looks like I should just make the second disk's BlockDevice manually as well, so I can give it a name. And here's that second disk config:
---
apiVersion: openebs.io/v1alpha1
kind: BlockDevice
metadata:
  name: second-disk
  namespace: openebs
  labels:
    kubernetes.io/hostname: all-in-one-01 # TODO: clustered setup needs to have this be different
    ndm.io/managed: "false" # for manual blockdevice creation put false
    ndm.io/blockdevice-type: blockdevice
status:
  claimState: Unclaimed
  state: Active
spec:
  nodeAttributes:
    nodeName: all-in-one-01 # TODO: clustered setup needs to have this be different
  path: /dev/nvme1n1
  capacity:
    logicalSectorSize: 512
    storage: 512110190592 # TODO: get from `blockdev --getsize64 <device path>`
  details:
    # TODO: obtain this information automatically from `udevadm info`
    deviceType: partition # like disk, partition, lvm, crypt, md
    firmwareRevision: "EXF7201Q"
    model: "SAMSUNG MZVLB512HBJQ-00000"
    serial: "SAMSUNG MZVLB512HBJQ-00000_S4GENX0N425033"
    # compliance: <compliance of disk> #like "SPC-4" # normally get this from smartctl but sometimes it's not there.
    vendor: SAMSUNG
  devlinks: # udevadm info -q property -n <device path>
    - kind: by-path
      path: /dev/disk/by-id/nvme-SAMSUNG_MZVLB512HBJQ-00000_S4GENX0N425034
    - kind: by-path
      path: /dev/disk/by-path/pci-0000:07:00.0-nvme-1
    - kind: by-path
      path: /dev/disk/by-id/nvme-eui.0025388401b90b27
OK, now let’s make that StoragePoolClaim
and see if it comes up:
$ k get storagepoolclaim
NAME AGE
cstor-disk-pool 4m8s
$ k get storagepool
No resources found
$ k get bd
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
first-disk-partition all-in-one-01 425133957120 Unclaimed Active 31m
second-disk all-in-one-01 512110190592 Unclaimed Active 8m29s
Uh oh, something’s gone wrong – the claim got created but the pool is not, and the BlockDevice
s are still unclaimed!
Before I get to debugging this though, I want to point out something – I've chosen to make a disk-based pool, but at this point I think I might actually like to use a sparse pool instead. Note that OpenEBS cStor "sparse" pools are not the same as ZFS sparse pools! See their warning:
Note: Starting with 0.9, cStor Sparse pool and its Storage Class are not created by default. If you need to enable the cStor Sparse pool for development or test environments, you should have the above Default Storage Configuration enabled as well as cStor sparse pool enabled using the instructions mentioned here.
So in production, it looks like you're going to want to be using disk-based pools. Coming back to the StoragePoolClaim problem, k describe spc returns no events, so there's nothing to analyze there. Oh but wait, I've actually checked the wrong place – there is a CStorPool object I should be checking instead:
$ k get csp
NAME ALLOCATED FREE CAPACITY STATUS READONLY TYPE AGE
cstor-disk-pool-0omx PoolCreationFailed false mirrored 5m16s
Well there’s some nice feedback – storagepool
supposedly includes cstorstoragepool
but I guess if the creation failed then it can’t! Let’s see what k describe cstor-disk-pool-0omx
says:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 6m54s CStorPool Received Resource create event
Normal Synced 6m54s (x2 over 6m54s) CStorPool Received Resource modify event
Warning FailCreate 24s (x14 over 6m54s) CStorPool Pool creation failed zpool create command failed error: invalid vdev specification
use '-f' to override the following errors:
mirror contains devices of different sizes
: exit status 1
Well, we’re working with ZFS all right! The disks in the pool have to be similarly sized! Once thing that’s been cut out of Part 1 is all the experimentation I did to make a zpool just to realize that Ceph on top of ZFS was silly. Back then I actually devised a fairly easy to way to create the similiarly sized disk – copying the partition table from one disk to the other:
root@all-in-one-01 ~ # lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0 0 477G 0 disk
nvme0n1 259:1 0 477G 0 disk
├─nvme0n1p1 259:2 0 16G 0 part [SWAP]
├─nvme0n1p2 259:3 0 1G 0 part /boot
├─nvme0n1p3 259:4 0 64G 0 part /
├─nvme0n1p4 259:5 0 1K 0 part
└─nvme0n1p5 259:6 0 396G 0 part
root@all-in-one-01 ~ # sgdisk -R /dev/nvme1n1 /dev/nvme0n1
***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.
***************************************************************
The operation has completed successfully.
root@all-in-one-01 ~ # lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0 0 477G 0 disk
├─nvme1n1p1 259:11 0 16G 0 part
├─nvme1n1p2 259:12 0 1G 0 part
├─nvme1n1p3 259:13 0 64G 0 part
└─nvme1n1p5 259:14 0 396G 0 part
nvme0n1 259:1 0 477G 0 disk
├─nvme0n1p1 259:2 0 16G 0 part [SWAP]
├─nvme0n1p2 259:3 0 1G 0 part /boot
├─nvme0n1p3 259:4 0 64G 0 part /
├─nvme0n1p4 259:5 0 1K 0 part
└─nvme0n1p5 259:6 0 396G 0 part
Though terribly inefficient, I can use the same partition off of both disks and they are guaranteed to have the same size, so ZFS will be happy! Good thing OpenEBS hasn't done anything with my disks yet, since it's back to the drawing board – I'll need to update the ansible scripts to copy the partition table and change the BlockDevices I'm creating. For now, since I've done it manually, there's no harm in a little check after deleting the StoragePoolClaim and re-doing my BlockDevices.
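For reference, the ansible change boils down to a couple of commands per node – a sketch (note that sgdisk -R takes the target disk first, and that the copied table should get fresh GUIDs afterwards):
# Copy nvme0n1's partition table onto nvme1n1 (target first, then source)
sgdisk -R /dev/nvme1n1 /dev/nvme0n1
# Randomize GUIDs so the two disks don't share partition/disk identifiers
sgdisk -G /dev/nvme1n1
# Make sure the kernel sees the new partitions
partprobe /dev/nvme1n1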
Along with making sure to right-size the disks there's one more problem – if you have disks that you want to manage manually, you must exclude them by path. If you don't, NDM will compete with you and be unable to use the disks because they could be used by other pools (your StoragePoolClaim will never get made). It's documented in the "NDM related" section of the troubleshooting docs. You need to modify the path-filter configuration in the openebs-ndm-config ConfigMap like so:
- key: path-filter
  name: path filter
  state: true
  include: ""
  exclude: "/dev/loop,/dev/fd0,/dev/sr0,/dev/ram,/dev/dm-,/dev/md,/dev/rbd,/dev/zd,/dev/nvme0n1p5,/dev/nvme1n1"
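After changing the ConfigMap, the NDM pods need to be restarted to pick up the new filters – something like this (assuming the stock name=openebs-ndm label on the DaemonSet pods):
# Bounce the NDM DaemonSet pods so they re-read openebs-ndm-config
kubectl -n openebs delete pod -l name=openebs-ndm
# ...then confirm only the devices you expect show up
kubectl -n openebs get blockdevice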
Assuming at this point you see exactly the list of BlockDevices you want (make sure there aren't two with identical size), you shouldn't run into any problems. After right-sizing the partitions and recreating everything, I can try again at creating the StoragePoolClaim, and I can see the CStorPool I want:
$ k get csp
NAME ALLOCATED FREE CAPACITY STATUS READONLY TYPE AGE
cstor-disk-pool-bpe9 140K 394G 394G Healthy false mirrored 33m
That capacity is what we’d expect – inefficient, but it reflects the total mirrored capacity. Finally, we can make some StorageClass
es to represent some use cases – I’ll show the replicated StorageClass
below:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
provisioner: openebs.io/provisioner-iscsi
metadata:
  name: openebs-cstor-replicated
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: cstor-disk-pool
      - name: ReplicaCount
        value: "2" # TODO: clustered setup we could have a multi-node pool with >2 disks
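The non-replicated ("single") class is the same thing with a replica count of one – a sketch, with the name matching what shows up in the listing below:
cat <<'EOF' | kubectl apply -f -
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
provisioner: openebs.io/provisioner-iscsi
metadata:
  name: openebs-cstor-single
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: cstor-disk-pool
      - name: ReplicaCount
        value: "1"
EOF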
Once all the storage classes are created let’s make sure they show up again:
$ k get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
openebs-cstor-replicated openebs.io/provisioner-iscsi Delete Immediate false 83s
openebs-cstor-single openebs.io/provisioner-iscsi Delete Immediate false 85s
openebs-snapshot-promoter volumesnapshot.external-storage.k8s.io/snapshot-promoter Delete Immediate false 3h33m
OK now that we’ve got cStor all set up, Let’s make the usual test PVC + Pod combinations, here’s an example for the non-replicated (“single”) StorageClass
:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-single
  namespace: default
spec:
  storageClassName: openebs-cstor-single
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-single
  namespace: default
spec:
  containers:
    - name: alpine
      image: alpine
      command: ["ash", "-c", "while true; do sleep infinity; done"]
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: 0.5
          memory: "512Mi"
        limits:
          cpu: 0.5
          memory: "512Mi"
      volumeMounts:
        - mountPath: /var/data
          name: data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-single
The PVC, resulting PV and Pod get created nice and easy after the work we’ve done:
$ k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-single Bound pvc-d92c89e0-899b-4038-9073-ffc1e31dc5c9 1Gi RWO openebs-cstor-single 84s
$ k get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-d92c89e0-899b-4038-9073-ffc1e31dc5c9 1Gi RWO Delete Bound default/test-single openebs-cstor-single 85s
$ k get pod
NAME READY STATUS RESTARTS AGE
test-single 1/1 Running 0 85s
And here's the output of the usual basic persistence test:
$ k exec -it test-single -n default -- /bin/ash
/ # echo "this is a test file" > /var/data/test-file.txt
/ #
$ k delete pod test-single -n default
pod "test-single" deleted
$ k get pv -n default
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-d92c89e0-899b-4038-9073-ffc1e31dc5c9 1Gi RWO Delete Bound default/test-single openebs-cstor-single 3m9s
$ make test-single
make[1]: Entering directory '/home/mrman/code/foss/k8s-storage-provider-benchmarks/kubernetes/openebs/cstor'
kubectl --kubeconfig=/home/mrman/code/foss/k8s-storage-provider-benchmarks/ansible/output/all-in-one-01.k8s.storage-benchmarks.experiments.vadosware.io/var/lib/k0s/pki/admin.conf apply -f test-single.pvc.yaml
persistentvolumeclaim/test-single unchanged
kubectl --kubeconfig=/home/mrman/code/foss/k8s-storage-provider-benchmarks/ansible/output/all-in-one-01.k8s.storage-benchmarks.experiments.vadosware.io/var/lib/k0s/pki/admin.conf apply -f test-single.pod.yaml
pod/test-single created
make[1]: Leaving directory '/home/mrman/code/foss/k8s-storage-provider-benchmarks/kubernetes/openebs/cstor'
$ k exec -it test-single -n default -- /bin/ash
/ # cat /var/data/test-file.txt
this is a test file
Perfect – the PVC is definitely holding data between pod restarts. To keep this post from getting too long, this will be the only time I’ll actually print the output of the basic persistence test. From now on, I’ll just refer to having run it.
STORAGE_PROVIDER=openebs-jiva
If I remember correctly, Jiva is the oldest (?) implementation that OpenEBS has worked on. It's based on Longhorn, though they differ in some places (see the note about modified architectural changes). In general, this is probably the easiest possible way to use storage on Kubernetes today, I think – having writes go straight to other pods and then down to sparse files as a backend, and shipping all the data around with iSCSI, is pretty brilliant. Jiva is what I have deployed to run this blog right now, and it's been purring along for a couple years at this point (so much so that I can't upgrade to the newest version easily anymore!).
Here are some pros for Jiva as far as I'm concerned:
Here are some points against Jiva:
- Only a limited set of filesystems can be used (ext4 and xfs are supported)
Jiva is the incumbent, so most of this testing is really about finding something that I'm fine using instead of OpenEBS Jiva.
I had to update my ansible automation to partition & mount drives for Jiva to use… It's a bit weird that it can't use raw disks (it feels like it could handle the partitioning and filesystem creation itself).
StoragePool and StorageClasses
NOTE: Since the control/data plane setup for Jiva is identical for the most part to cStor I've excluded it (and the code is reused anyway)
The StoragePool
for Jiva is really easy to set up:
---
apiVersion: openebs.io/v1alpha1
kind: StoragePool
metadata:
  name: second-disk
  namespace: openebs
  type: hostdir
spec:
  path: "/second-disk"
One thing I realized while setting this up is that Jiva seems to only be able to use one disk for StoragePools! There was an issue about it a long time ago, it turns out. I'm somewhat surprised, because in my own head I don't remember it working like this, but now that I think about it, maybe I picked this in the past because I left the software RAID in place, so it was actually fine? That does leave me with some questions though, like how storage pools on other nodes are picked – do all the mount points have to be named the same? I filed a ticket since I think this is something that should be mentioned in the documentation.
In my case, where there is a part of a disk and a whole 'nother disk to attach, maybe it makes the most sense to combine the first and second disks into an LVM logical volume, or maybe use software RAID (mdadm) or something. ZFS looks to be off the table though. It looks like btrfs does support extent mapping, so maybe that's the way to go? I'm going to leave that for another day.
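If I ever do go the LVM route for this, the node-side setup would look roughly like the following – a sketch only (I haven't run this here; the volume group and mount point names are made up):
# Pool the spare partition and the second disk into one volume group
pvcreate /dev/nvme0n1p5 /dev/nvme1n1
vgcreate vg_jiva /dev/nvme0n1p5 /dev/nvme1n1
# One big logical volume, formatted and mounted for a Jiva hostdir StoragePool
lvcreate -l 100%FREE -n jiva_pool vg_jiva
mkfs.ext4 /dev/vg_jiva/jiva_pool
mkdir -p /jiva-pool && mount /dev/vg_jiva/jiva_pool /jiva-pool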
Making the StorageClass
was similarly easy:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
provisioner: openebs.io/provisioner-iscsi
metadata:
  name: openebs-jiva-d2-replicated
  annotations:
    openebs.io/cas-type: jiva
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "2"
      - name: StoragePool
        value: second-disk
It’s a bit annoying that I have to have 2 sets of storage classes for each disk – opebs-jiva-d1-[single|replicated]
and openebs-jiva-d2-[single|replicated]
but I guess things could be worse.
With the StoragePool and the StorageClasses set up, we're free to make our PersistentVolumeClaims, PersistentVolumes and Pods. I won't include the basic persistence test here, but it works quite easily.
STORAGE_PROVIDER=openebs-localpv-hostpath
We'll never know how many people rely on hostPath volumes in production, but theoretically they should be avoided there. If you're going to use them anyway, it's nice to at least be able to dynamically provision them.
I won’t spend too much time rehashing all of the pros/cons of hostPath
volumes here but in general:
Pros:
Cons:
StoragePool and StorageClasses
NOTE: Since the control/data plane setup for Jiva is identical for the most part to cStor I've excluded it (and the code is reused anyway)
As usual OpenEBS does have a good documentation page on this, so give that a read for a full guide. There's actually no setup outside of the common OpenEBS setup, since the openebs-localpv-provisioner Deployment is included. The "volumes" (folders) will be created under /var/openebs/local on the node that the PVCs are provisioned on. We do still need to make a StorageClass, so here's what that looks like:
apiVersion: storage.k8s.io/v1
kind: StorageClass
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
metadata:
  name: local-hostpath
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
With this StorageClass in place we can go make the PVC and Pod like we usually do (which I won't go into again), and we're off to the races, nice and simple. This will very likely serve as the ballast against which to test performance – I wonder if there's even a point in making "real" hostPath pods to test with when we get to it.
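Side note: if I ever want these hostPath volumes to land somewhere other than /var/openebs/local (say on the second disk), my understanding from the docs is that there's a BasePath config key for exactly that – a hedged sketch with a hypothetical mount point:
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
metadata:
  name: local-hostpath-second-disk
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /second-disk/openebs-local
EOF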
STORAGE_PROVIDER=openebs-localpv-device
LocalPV Device is very similar to hostPath but better – it uses locally provisioned bind-mounted loopback devices, which essentially interpret files on the filesystem as disks. It's a clever idea and one that I had myself very early on – it seemed like an obvious, easy-to-make local provisioner.
Some pros of using LocalPV devices:
There is one potentially large issue with LocalPV though:
Performance looks to be halved for disks that sync on every write, and there is a whole class of programs (databases) that try to sync very often, so that's worth watching out for.
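That difference is easy to see for yourself with fio once a volume is mounted – e.g. comparing a sync-on-every-write workload against a buffered one (illustrative commands; the path is a placeholder):
# Database-like workload: fsync after every write
fio --name=sync-writes --directory=/var/data --rw=randwrite --bs=4k \
    --size=256m --ioengine=psync --fsync=1 --runtime=30 --time_based
# Same workload without per-write fsync, for comparison
fio --name=buffered-writes --directory=/var/data --rw=randwrite --bs=4k \
    --size=256m --ioengine=psync --runtime=30 --time_based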
StoragePool and StorageClasses
NOTE: Since the control/data plane setup for Jiva is identical for the most part to cStor I've excluded it (and the code is reused anyway)
Same as with hostPath – there's great documentation, and there's not much setup to do outside of making the StorageClass. There are two choices of underlying filesystem to pick from though, ext4 and xfs – I wonder whether it can use any installed filesystem, e.g. whether it could use btrfs as well. Here's what that looks like:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
metadata:
  name: openebs-localpv-device
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: device
I ran into some issues while trying to reuse a cluster where openebs-localpv-hostpath
was installed though:
2021-04-07T10:04:19.272Z ERROR app/provisioner_blockdevice.go:54 {"eventcode": "local.pv.provision.failure", "msg": "Failed to provision Local PV", "rname": "pvc-b4e9930d-7c4f-482f-9c67-a23e4c87726a", "reason": "Block device initialization failed", "storagetype": "device"}
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).ProvisionBlockDevice
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/provisioner_blockdevice.go:54
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).Provision
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/provisioner.go:131
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).provisionClaimOperation
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:1280
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).syncClaim
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:1019
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).syncClaimHandler
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:988
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).processNextClaimWorkItem.func1
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:895
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).processNextClaimWorkItem
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:917
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).runClaimWorker
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:869
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
Weirdly enough, I’m not sure these two actually work well/coexist together – you’d think they did, but it’s a bit weird that I’m getting an error at all. There’s at least one block device (who knows why there isn’t two):
$ k get bd
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
second-disk all-in-one-01 512110190592 Claimed Active 7m43s
After a good ol' hard refresh, things… got worse:
$ k logs -f openebs-localpv-provisioner-d7464d5b-dqw8x
I0407 11:24:47.433831 1 start.go:70] Starting Provisioner...
I0407 11:24:47.445732 1 start.go:132] Leader election enabled for localpv-provisioner
I0407 11:24:47.446084 1 leaderelection.go:242] attempting to acquire leader lease openebs/openebs.io-local...
I0407 11:24:47.450595 1 leaderelection.go:252] successfully acquired lease openebs/openebs.io-local
I0407 11:24:47.450667 1 controller.go:780] Starting provisioner controller openebs.io/local_openebs-localpv-provisioner-d7464d5b-dqw8x_5b405baf-42da-4cbe-9249-6fdd835d80e1!
I0407 11:24:47.450692 1 event.go:281] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"openebs", Name:"openebs.io-local", UID:"137cd575-e424-4750-8da0-ccb8464c8087", APIVersion:"v1", ResourceVersion:"1148", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' openebs-localpv-provisioner-d7464d5b-dqw8x_5b405baf-42da-4cbe-9249-6fdd835d80e1 became leader
I0407 11:24:47.550786 1 controller.go:829] Started provisioner controller openebs.io/local_openebs-localpv-provisioner-d7464d5b-dqw8x_5b405baf-42da-4cbe-9249-6fdd835d80e1!
---- After a device PV gets made ----
I0407 11:28:03.568607 1 controller.go:1211] provision "default/test-single" class "openebs-localpv-device": started
I0407 11:28:03.574190 1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-single", UID:"600fb03a-ecc0-44a3-ba48-6c8058a096c1", APIVersion:"v1", ResourceVersion:"1796", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/test-single"
I0407 11:28:03.580632 1 helper_blockdevice.go:175] Getting Block Device Path from BDC bdc-pvc-600fb03a-ecc0-44a3-ba48-6c8058a096c1
E0407 11:28:08.589110 1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 88 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1667180, 0xc0006a6380)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1667180, 0xc0006a6380)
/usr/local/go/src/runtime/panic.go:969 +0x166
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).getBlockDevicePath(0xc00032c000, 0xc0002ff540, 0x0, 0x7, 0xc00012f488, 0x1743d01, 0x8, 0xc00028d838)
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/helper_blockdevice.go:212 +0x751
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).ProvisionBlockDevice(0xc00032c000, 0xc00044ef00, 0xc0000445d0, 0x28, 0xc000144fc0, 0xc000324000, 0xc0002cc280, 0x6, 0x174dd96, 0x10)
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/provisioner_blockdevice.go:51 +0x2b1
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).Provision(0xc00032c000, 0xc00044ef00, 0xc0000445d0, 0x28, 0xc000144fc0, 0xc000324000, 0xc, 0xc0002ce140, 0x4b)
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/provisioner.go:131 +0x610
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).provisionClaimOperation(0xc00011c6c0, 0xc000144fc0, 0x252ee00, 0x0, 0x0, 0xc0007385a0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:1280 +0x1594
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).syncClaim(0xc00011c6c0, 0x1725580, 0xc000144fc0, 0xc0000444b0, 0x24)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:1019 +0xd1
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).syncClaimHandler(0xc00011c6c0, 0xc0000444b0, 0x24, 0x413c33, 0xc000861cf8)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:988 +0xb3
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).processNextClaimWorkItem.func1(0xc00011c6c0, 0x14eafc0, 0xc000714020, 0x0, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:895 +0xe0
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).processNextClaimWorkItem(0xc00011c6c0, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:917 +0x53
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).runClaimWorker(0xc00011c6c0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:869 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00044b650)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5f
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00044b650, 0x3b9aca00, 0x0, 0x1, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc00044b650, 0x3b9aca00, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).Run.func1
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:825 +0x42f
panic: runtime error: index out of range [0] with length 0 [recovered]
panic: runtime error: index out of range [0] with length 0
goroutine 88 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x105
panic(0x1667180, 0xc0006a6380)
/usr/local/go/src/runtime/panic.go:969 +0x166
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).getBlockDevicePath(0xc00032c000, 0xc0002ff540, 0x0, 0x7, 0xc00012f488, 0x1743d01, 0x8, 0xc00028d838)
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/helper_blockdevice.go:212 +0x751
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).ProvisionBlockDevice(0xc00032c000, 0xc00044ef00, 0xc0000445d0, 0x28, 0xc000144fc0, 0xc000324000, 0xc0002cc280, 0x6, 0x174dd96, 0x10)
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/provisioner_blockdevice.go:51 +0x2b1
github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app.(*Provisioner).Provision(0xc00032c000, 0xc00044ef00, 0xc0000445d0, 0x28, 0xc000144fc0, 0xc000324000, 0xc, 0xc0002ce140, 0x4b)
/go/src/github.com/openebs/dynamic-localpv-provisioner/cmd/provisioner-localpv/app/provisioner.go:131 +0x610
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).provisionClaimOperation(0xc00011c6c0, 0xc000144fc0, 0x252ee00, 0x0, 0x0, 0xc0007385a0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:1280 +0x1594
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).syncClaim(0xc00011c6c0, 0x1725580, 0xc000144fc0, 0xc0000444b0, 0x24)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:1019 +0xd1
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).syncClaimHandler(0xc00011c6c0, 0xc0000444b0, 0x24, 0x413c33, 0xc000861cf8)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:988 +0xb3
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).processNextClaimWorkItem.func1(0xc00011c6c0, 0x14eafc0, 0xc000714020, 0x0, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:895 +0xe0
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).processNextClaimWorkItem(0xc00011c6c0, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:917 +0x53
sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).runClaimWorker(0xc00011c6c0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:869 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00044b650)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5f
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00044b650, 0x3b9aca00, 0x0, 0x1, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc00044b650, 0x3b9aca00, 0x0)
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/sig-storage-lib-external-provisioner/controller.(*ProvisionController).Run.func1
/go/src/github.com/openebs/dynamic-localpv-provisioner/vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:825 +0x42f
runtime: note: your Linux kernel may be buggy
runtime: note: see https://golang.org/wiki/LinuxKernelSignalVectorBug
runtime: note: mlock workaround for kernel bug failed with errno 12
So at this point, I’m not even going to try and look into this, I’m running a very new kernel and I’m unlikely to use LocalPV Device seriously in production. In fact, I’m going to take out the LocalPV device all together.
STORAGE_PROVIDER=openebs-localpv-zfs
This is a particularly appealing choice for running databases on – ZFS is pretty awesome for rock-solid data reliability and protection. The idea of being able to just set up ZFS (a 2-drive mirror) to replace the software RAID that is normally installed on Hetzner, and use OpenEBS to provision out of the zpools for local-only storage, is awesome. The only thing that would be missing is how to replicate writes for high availability. Of course, in the case where you run some piece of software with application-level replication features (ex. postgres w/ logical replication, streaming replication, etc), you can actually just mostly take backups (from the "master" node) and be OK. Anyway, I think this option flies under the radar for many people but is actually really compelling.
This is an incomplete list of pros to ZFS:
And here are some cons, some influenced by my particular platform/settings:
ZFS is a pretty rock solid file system and it’s exciting to get a chance to use it easily in clusters equipped with the right operator.
The early “common” code for setting up OpenEBS actually doesn’t apply at all to the LocalPVs powered by ZFS. I had to make some changes to the ansible
code to make sure that a zpool was created though:
- name: Create a ZFS mirrored pool from first and second partition
  tags: [ "drive-partition-prep" ]
  # when: storage_plugin in target_plugins and nvme_disk_0.stat.exists and nvme_disk_1.stat.exists and nvme_disk_0_partition_5.stat.exists and nvme_disk_1_partition_5.stat.exists
  command: |
    zpool create tank mirror /dev/nvme0n1p5 /dev/nvme1n1p5
  vars:
    target_plugins:
      - openebs-localpv-zfs
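A quick check on the node confirms the pool came up as a two-device mirror before handing it over to the CSI driver:
# Verify the mirrored pool exists and is healthy
zpool status tank
zpool list tank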
As far as the Kubernetes resources go, I did take some liberties with the resources before applying them. One change I made was moving the resources to the openebs
namespace. There’s a warning on the repo that goes like this:
You have access to install RBAC components into kube-system namespace. The OpenEBS ZFS driver components are installed in kube-system namespace to allow them to be flagged as system critical components.
Well luckily for me this restriction was flagged as a bug in 2018, and the restriction was lifted by v1.17. The documentation on guaranteed scheduling has also been updated, so I just went around setting priorityClassName
on the structurally important bits. Here’s the full list of resources I needed:
$ tree .
.
├── Makefile
├── openebs-localpv-zfs-default-ext4.storageclass.yaml
├── openebs-localpv-zfs-default.storageclass.yaml
├── openebs-zfs-bin.configmap.yaml
├── openebs-zfs-controller-sa.rbac.yaml
├── openebs-zfs-controller-sa.serviceaccount.yaml
├── openebs-zfs-controller.statefulset.yaml
├── openebs-zfs-csi-driver.csidriver.yaml
├── openebs-zfs-node.ds.yaml
├── openebs-zfs-node-sa.rbac.yaml
├── openebs-zfs-node-sa.serviceaccount.yaml
├── volumesnapshotclass.crd.yaml
├── volumesnapshotcontents.crd.yaml
├── volumesnapshot.crd.yaml
├── zfsbackup.crd.yaml
├── zfsrestore.crd.yaml
├── zfssnapshot.crd.yaml
└── zfsvolume.crd.yaml
0 directories, 18 files
So ~17 distinct resources required – not a huge amount, yet not a small amount either. If we don't count the CRDs and StorageClasses then it's even fewer, which is nice. Structurally, the important components are the ZFS controller (openebs-zfs-controller.statefulset.yaml), the ZFS per-node DaemonSet (openebs-zfs-node.ds.yaml) and the CSI driver (openebs-zfs-csi-driver.csidriver.yaml).
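For what it's worth, the priorityClassName tweak is a one-liner per workload – e.g. patching the node DaemonSet (name taken from the file listing above; adjust to taste):
kubectl -n openebs patch daemonset openebs-zfs-node --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/priorityClassName", "value": "system-node-critical"}]'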
StorageClasses
So the storage class looks like this:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-localpv-zfs-default-ext4
provisioner: zfs.csi.openebs.io
parameters:
  recordsize: "16k"
  compression: "lz4"
  atime: "off"
  dedup: "off"
  fstype: "ext4"
  logbias: "throughput"
  poolname: "tank"
  xattr: "sa"
All the ZFS settings that I mentioned in part 2 that might be good are in there, but there's one interesting thing – the fstype is specifiable! This particular storage class will make ext4 the filesystem, but the default (represented by openebs-localpv-zfs-default) is actually to give out ZFS-formatted PersistentVolumes.
With the StorageClasses made, we're free to make our PersistentVolumeClaims, PersistentVolumes and Pods. Remember, since the replication is happening at the ZFS level (the mirrored tank pool), I don't have to make any *-replicated.*.yaml pods/PVCs – all the pods will be replicated and will have disk-level high durability/availability (though if one disk goes down, we're in a super dangerous but working position). ZFS also doesn't do synchronous replication (and it's up to you to set up asynchronous replication via zfs send/zfs recv), so there's no node-level high availability built in.
One thing I ran into was the need for topologies to be specified. When I first created a Pod + PVC I saw the following events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 37s (x9 over 4m50s) zfs.csi.openebs.io_openebs-zfs-controller-0_0c369586-c312-4551-a297-88fe247b0c79 External provisioner is provisioning volume for claim "default/test-single"
Warning ProvisioningFailed 37s (x9 over 4m50s) zfs.csi.openebs.io_openebs-zfs-controller-0_0c369586-c312-4551-a297-88fe247b0c79 failed to provision volume with StorageClass "openebs-localpv-zfs-default": error generating accessibility requirements: no available topology found
Normal ExternalProvisioning 5s (x21 over 4m50s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "zfs.csi.openebs.io" or manually created by system administrator
So the PVC couldn’t be created because there was no available topology. Looking at the docs it didn’t seem like the topology options were required, but they are:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-localpv-zfs-default
provisioner: zfs.csi.openebs.io
volumeBindingMode: WaitForFirstConsumer
parameters:
  recordsize: "16k"
  compression: "lz4"
  atime: "off"
  dedup: "off"
  fstype: "zfs"
  logbias: "throughput"
  poolname: "tank"
  xattr: "sa"
allowedTopologies:
  - matchLabelExpressions:
      - key: zfs-support
        values:
          - "yes"
I guess this makes sense since not every node will necessarily have ZFS tools installed, so I’d rather do this than disable --strict-topology
on the controller. Of course we’ll have to label the node:
$ k label node all-in-one-01 zfs-support=yes
node/all-in-one-01 labeled
$ k get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
all-in-one-01 Ready <none> 54m v1.20.5-k0s1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=all-in-one-01,kubernetes.io/os=linux,zfs-support=yes
OK great, except this didn’t work! The error is different now, but there’s still something wrong:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 15s persistentvolume-controller waiting for first consumer to be created before binding
Normal Provisioning 6s (x4 over 13s) zfs.csi.openebs.io_openebs-zfs-controller-0_0c369586-c312-4551-a297-88fe247b0c79 External provisioner is provisioning volume for claim "default/test-single"
Warning ProvisioningFailed 6s (x4 over 13s) zfs.csi.openebs.io_openebs-zfs-controller-0_0c369586-c312-4551-a297-88fe247b0c79 failed to provision volume with StorageClass "openebs-localpv-zfs-default": error generating accessibility requirements: no topology key found on CSINode all-in-one-01
Normal ExternalProvisioning 5s (x3 over 13s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "zfs.csi.openebs.io" or manually created by system administrator
Has the CSINode not properly taken in the Node labels? It looks like editing the node after the fact (maybe without a restart) wasn't going to update the CSINode properly, so I went ahead and did a full reset… which didn't work. Looks like it's time for more digging!
The first thing that sticks out is that the CSINode
that matches my node (all-in-one-01
) has no drivers:
$ k get csinode
NAME DRIVERS AGE
all-in-one-01 0 9h
That’s a bit bizarre! Luckily for me there’s an issue filed that shows what’s wrong – it’s our old friend /var/lib/kubelet
! I added another comment to the issue in k0s
and called it a day. I also added a small documentation PR for openebs/zfs-localpv
. If the k0s
/k3s
crew don’t fix this early they’re going to be pelted by mistaken reports of issues down the road – every important cluster/node-level utility that tries to access /var/lib/kubelet
is going to have this issue.
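The workaround boils down to pointing the ZFS node DaemonSet's kubelet hostPath mounts at k0s's kubelet directory – something like this (a sketch; k0s keeps kubelet data under /var/lib/k0s/kubelet):
# Swap the default kubelet dir for k0s's in the node DaemonSet, then re-apply
sed -i 's|/var/lib/kubelet|/var/lib/k0s/kubelet|g' openebs-zfs-node.ds.yaml
kubectl apply -f openebs-zfs-node.ds.yaml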
With the registration fix in place, we've got the CSI driver registered:
$ k get csinode
NAME DRIVERS AGE
all-in-one-01 1 10h
And the pod + PVC combination is working properly:
$ k get pod
NAME READY STATUS RESTARTS AGE
test-single 1/1 Running 0 8s
$ k exec -it test-single -n default -- /bin/ash
/ # echo "hello ZFS!" > /var/data/hello-zfs.txt
/ # ls /var/data/
hello-zfs.txt
OK since everything is working now, rather than rehashing the Pod and PVC specs I’ll include the output of some zfs
subcommands to show what’s happened after I spun up a Pod
with the right PersistentVolumeClaim
:
root@all-in-one-01 ~ # zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 153K 382G 24K /tank
tank/pvc-14327c05-a1b6-4359-8a62-32ebb6db80e2 24K 1024M 24K legacy
Oh look there’s the PVC we made as a data set! Awesome. Now we’ve got all the power of ZFS. This might be the best bang-for-buck storage plugin there is – You could provision and manage ZFS pools with this quite efficiently. Some automation around zfs send
/zfs recv
, or running sanoid
as a DaemonSet
could be really awesome.
LINSTOR (drbd) - STORAGE_PROVIDER=linstor
LINSTOR is a somewhat new (to me) entry into the field – drbd is venerable old technology which has been in the kernel since December 2009 (version 2.6.33). The way it works looks to be relatively simple, and it's free and open source software. The corporate apparatus around it is a bit stodgy (take a look at their website and you'll see), but luckily we don't have to buy an enterprise license to take it for a spin and see how it performs against the rest.
After looking for some videos on LINSTOR I came across Case-Study: 600 Nodes on DRBD + LINSTOR for Kubernetes, OpenNebula and Proxmox which I watched – pretty exciting stuff. It certainly scales so that’s nice. There’s also a quick video on the HA abilities of drbd
. It looks like one of the features they were working on in that 600 node talk, backups to S3, got implemented? If LINSTOR storage performance turns out to be good I’m going to be pretty excited to use it, or at least have it installed alongside the other options.
Along with that it looks like another project I like a lot, KubeVirt works with DRBD block devices, including live migration.
Pros of LINSTOR:
- Built on venerable, in-kernel technology (drbd)
Cons of LINSTOR:
One thing about LINSTOR was that I was quite confused about the various projects that existed which were trying to integrate it with Kubernetes. There were two main ones I found:
I was a bit confused about how they fit together, so I filed an issue @ kvaps/kube-linstor (you might recognize/remember kvaps – he gave that talk on the 600-node LINSTOR cluster), and kvaps himself was nice enough to respond and clarify things for me. Reproduced below:
Actually kube-linstor (this project) started earlier than the official piraeus-operator. Both projects have the same goals to containerize linstor-server and other components and provide ready solution for running them in Kubernetes. But they are doing that different ways, eg:
- kube-linstor - is just standard Helm-chart, it consists of static YAML-manifests, very simple, but it has no some functionality inherent in piraeus-operator, eg. autoconfiguration of storage-pools (work in progress) and auto-injecting DRBD kernel-module.
- piraeus-operator - implements the operator pattern and allows to bootstrap LINSTOR cluster and configure it using Kubernetes Custom Resources.
That made things very clear, so thanks to kvaps for that! I think I'm going to go with kvaps/kube-linstor since it looks more straightforward/simple and might be a little easier for me to get started with – both projects look great though.
One major benefit of using kvaps/kube-linstor is that it optionally includes TLS setup for the control plane – I'm not currently using any automatic mutual TLS solution (Cilium, Istio, Calico, etc) so this is nice for me. Unfortunately, I'm not using Helm, and writing the code to generate the TLS certs, store them, and use them in the scripts is not a complication I want, so I'm going to be forgoing that bit. In production what I'll do is turn on the wireguard integration and/or use the automatic mutual TLS features of something like linkerd or Cilium that is mostly set-and-forget.
One major benefit of using the piraeus-data-store/piraeus-operator is that it has some facilities for automatic drive detection (and manual detection if you want). piraeus-operator is a bit more automated and "Kubernetes native" – almost too much so; starting the controller is a CRD and not a DaemonSet – so it's also worth giving a shot… It's hard to decide between these two, but since I want to maximize the amount of LINSTOR I learn while doing this (using this as a sort of intro to the technology in general), I'll go with the slightly-more-manual kvaps/kube-linstor.
Anyway, here’s the full listing of files required (remember this is using the slightly more manual kvaps/kube-linstor
):
$ tree .
.
├── controller-client.configmap.yaml
├── controller-config.secret.yaml
├── controller.deployment.yaml
├── controller.rbac.yaml
├── controller.serviceaccount.yaml
├── controller.svc.yaml
├── csi-controller.deployment.yaml
├── csi-controller.rbac.yaml
├── csi-controller.serviceaccount.yaml
├── csi-node.ds.yaml
├── csi-node.rbac.yaml
├── csi-node.serviceaccount.yaml
├── linstor-control-plane.psp.yaml
├── linstor.ns.yaml
├── Makefile
├── satellite.configmap.yaml
├── satellite.ds.yaml
├── satellite.rbac.yaml
└── satellite.serviceaccount.yaml
0 directories, 19 files
So quite a few files/things to keep track of, but the big structural elements are easy to make out (I’ve taken tons of liberties with naming of files and resources).
There are a few things I need to do at the node level to prepare them to run LINSTOR. In the previous post I went over how I installed as much LINSTOR related software from apt
as I could, hopefully this won’t come back to bite me like pre-installing ceph
things did. It shouldn’t since most documentation calls for prepping the given node(s) with the appropriate kernel modules and things. Anyway, here’s the LINSTOR-specific Ansible code:
- name: Install ZFS
  when: storage_plugin in target_plugins
  ansible.builtin.apt:
    name: zfsutils-linux
    update_cache: yes
    state: present
  vars:
    target_plugins:
      - openebs-localpv-zfs
      - linstor-bd9
- name: Install LVM
  when: storage_plugin in target_plugins
  block:
    - name: Install lvm2
      ansible.builtin.apt:
        name: lvm2
        update_cache: yes
        state: present
    - name: Ensure rbd kernel module is installed
      community.general.modprobe:
        name: rbd
        state: present
  vars:
    target_plugins:
      - rook-ceph-lvm
      - linstor-rbd9
- name: Add drbd9 apt repositories
  when: storage_plugin in target_plugins
  ansible.builtin.apt_repository:
    repo: ppa:linbit/linbit-drbd9-stack
    state: present
  vars:
    target_plugins:
      - linstor-rbd9
- name: Install LINSTOR components
  when: storage_plugin in target_plugins
  block:
    - name: Install drbd packages
      ansible.builtin.apt:
        name:
          - drbd-dkms
          - drbd-utils
        update_cache: yes
        state: present
    - name: Install linstor components
      ansible.builtin.apt:
        name:
          - linstor-controller
          - linstor-satellite
          - linstor-client
        update_cache: yes
        state: present
    - name: Ensure drbd kernel module is installed
      community.general.modprobe:
        name: drbd
        state: present
  vars:
    target_plugins:
      - linstor-rbd9
I’ve installed both LVM and ZFS to make it possible to go either way on LINSTOR’s underlying storage. As far the disk layout, I just made sure to leave the single disk partition and the second disk as empty as possible, and refer to the original documentation for the configuration as directed in the docs.
LINSTOR’s user guide is the place to go for information on provisioning disks for LINSTOR, so I went there and read up a bit. The section on storage pools makes it pretty easy, and with the linstor
CLI tool already installed (thanks apt
!) I only have to run ~1 command outside of the safety of the LINSTOR documentation on-ramp, which is making the LVM Volume Group (which I messed with back in part 2). LINSTOR can use either LVM or ZFS, so I figured I would go with LVM since we are already testing ZFS-on-bare-metal with OpenEBS’s LocalPV ZFS.
LINSTOR expects referenced LVM VolumeGroup
s to already be present, so we’re going to have to do a bit of setup ourselves. I wasn’t 100% sure on the differences and tradeoffs between thick and thin LVM pools, so I needed to spend some time reading up on them – I found a few good resources:
Reading these I think I’m going with thin provisioning. Another option that we can set on LINSTOR LVM pools is the LVM RAID level. I’m pretty happy with RAID1 (mirroring) across the disks, so I’ll go with that. Here’s roughly what this looks like in ansible
YAML:
- name: Create an LVM thin provisioned pool for LINSTOR
when: storage_plugin in target_plugins
tags: [ "drive-partition-prep" ]
block:
- name: Create Volume Group with both drives
# NOTE: LINSTOR requires/expects similar LVM VG/ZPool naming to share across nodes
command: |
vgcreate vg_nvme /dev/nvme0n1p5 /dev/nvme1n1
- name: Initialize LINSTOR storage-pool for disk one
shell: |
linstor storage-pool create lvmthin linstor-{{ k8s_node_name }} pool_nvme vg_nvme
vars:
target_plugins:
- linstor-drbd9
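For anyone squinting at that linstor invocation: per the LINSTOR user guide the general shape is storage-pool create <driver> <node> <pool-name> <backing-pool>, and for lvmthin the backing pool is given in VG/ThinLV form – so a cleaner version of the command above would look roughly like this (names here are placeholders):
# create an lvmthin-backed LINSTOR storage pool on a satellite node
# (pool_nvme is the LINSTOR pool name; vg_nvme/lv_thin_nvme is the backing LVM thin pool)
linstor storage-pool create lvmthin all-in-one-01 pool_nvme vg_nvme/lv_thin_nvme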
While I was figuring this out I leaned on the LINSTOR documentation and another good guide out there from 2019.
So a huge deep breath, and it’s time to apply -f
all the files. I immediately found some issues with my YAML, which I went and fixed, but then I was met with this nasty error:
$ k logs satellite-6q76d
LINSTOR, Module Satellite
Version: 1.11.1 (fe95a94d86c66c6c9846a3cf579a1a776f95d3f4)
Build time: 2021-02-11T14:40:43+00:00
Java Version: 11
Java VM: Debian, Version 11.0.9.1+1-post-Debian-1deb10u2
Operating system: Linux, Version 5.4.0-67-generic
Environment: amd64, 1 processors, 15528 MiB memory reserved for allocations
System components initialization in progress
07:25:09.212 [main] INFO LINSTOR/Satellite - SYSTEM - ErrorReporter DB version 1 found.
07:25:09.213 [main] INFO LINSTOR/Satellite - SYSTEM - Log directory set to: '/logs'
07:25:09.230 [main] WARN io.sentry.dsn.Dsn - *** Couldn't find a suitable DSN, Sentry operations will do nothing! See documentation: https://docs.sentry.io/clients/java/ ***
07:25:09.234 [Main] INFO LINSTOR/Satellite - SYSTEM - Loading API classes started.
07:25:09.366 [Main] INFO LINSTOR/Satellite - SYSTEM - API classes loading finished: 132ms
07:25:09.366 [Main] INFO LINSTOR/Satellite - SYSTEM - Dependency injection started.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/linstor-server/lib/guice-4.2.3.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
07:25:09.800 [Main] INFO LINSTOR/Satellite - SYSTEM - Dependency injection finished: 434ms
07:25:10.098 [Main] INFO LINSTOR/Satellite - SYSTEM - Initializing main network communications service
07:25:10.098 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
07:25:10.099 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'FileEventService' of type FileEventService
07:25:10.099 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'SnapshotShippingService' of type SnapshotShippingService
07:25:10.099 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'DeviceManager' of type DeviceManager
07:25:10.105 [Main] WARN LINSTOR/Satellite - SYSTEM - NetComService: Connector NetComService: Binding the socket to the IPv6 anylocal address failed, attempting fallback to IPv4
07:25:10.105 [Main] ERROR LINSTOR/Satellite - SYSTEM - NetComService: Connector NetComService: Attempt to fallback to IPv4 failed
07:25:10.123 [Main] ERROR LINSTOR/Satellite - SYSTEM - Initialization of the com.linbit.linstor.netcom.TcpConnectorService service instance 'NetComService' failed. [Report number 606EAFD4-928FB-000000]
07:25:10.128 [Main] ERROR LINSTOR/Satellite - SYSTEM - Initialisation of SatelliteNetComServices failed. [Report number 606EAFD4-928FB-000001]
07:25:10.129 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Shutdown in progress
07:25:10.129 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Shutting down service instance 'DeviceManager' of type DeviceManager
07:25:10.129 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Waiting for service instance 'DeviceManager' to complete shutdown
07:25:10.130 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Shutting down service instance 'SnapshotShippingService' of type SnapshotShippingService
07:25:10.130 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Waiting for service instance 'SnapshotShippingService' to complete shutdown
07:25:10.130 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Shutting down service instance 'FileEventService' of type FileEventService
07:25:10.130 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Waiting for service instance 'FileEventService' to complete shutdown
07:25:10.131 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Shutting down service instance 'TimerEventService' of type TimerEventService
07:25:10.131 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Waiting for service instance 'TimerEventService' to complete shutdown
07:25:10.131 [Thread-2] INFO LINSTOR/Satellite - SYSTEM - Shutdown complete
At this point I’d be suspicious if anything worked on the first try, so I’m mostly heartened that the logging is good and orderly for LINSTOR! But it looks like LINSTOR might be written in Java..? Oh no, say it ain’t so – I didn’t even notice it on the LINBIT/linstor-server
GitHub project. Oh well, I won’t let my bias against Java spoil this free high-quality software I’m about to partake of!
The error at hand is that the NetComService
has failed – it could not bind to the IPv6 port (there shouldn’t be a problem with that) and so it tried IPv4 but failed there too (again, weird). This is probably a PSP binding problem (PSP bindings happen through RBAC)… And after fixing those, nothing changed (though the Role
was missing a namespace
). You know what this is – this is another case of pre-installing biting me in the ass. Let’s see if there’s something already listening on port 3366 on the machine:
root@all-in-one-01 ~ # lsof -i :3366
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 25432 root 132u IPv6 119091 0t0 TCP *:3366 (LISTEN)
Of course – I made the same mistake as with ceph
– installing the package from apt
is not a good idea, because it will start a LINSTOR instance for you. I removed the installation of linstor-controller
and linstor-satellite
from the ansible
code and did a from-zero re-install of the machine.
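(In hindsight a full re-install is probably overkill just to free up the port – presumably stopping and disabling the host-level services would have done it too, something like:)
# stop the apt-installed daemons so they stop squatting on LINSTOR's ports
systemctl disable --now linstor-controller linstor-satellite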
I had one more issue – be careful of ReplicaSet
s getting stuck with 0 pods but still being present – if you look at the controller
pod logs, it’s possible that the “leader” doesn’t actually exist:
$ k get pods
NAME READY STATUS RESTARTS AGE
controller-58dcfb884d-2jpg7 1/1 Running 0 19s
controller-58dcfb884d-c8v9d 1/1 Running 0 19s
csi-node-jmrcd 2/3 CrashLoopBackOff 5 3m9s
satellite-kkgmh 1/1 Running 0 5h47m
$ k logs -f controller-58dcfb884d-c8v9d
time="2021-04-08T14:07:17Z" level=info msg="running k8s-await-election" version=v0.2.2
I0408 14:07:17.385975 1 leaderelection.go:242] attempting to acquire leader lease linstor/controller...
time="2021-04-08T14:07:17Z" level=info msg="long live our new leader: 'controller-587cb449d7-vvjp2'!"
This is probably an unlikely occurrence (I probably had just the right failures in just the right order), but worth knowing about. After clearing that up, I was able to get everything running:
$ k get pods
NAME READY STATUS RESTARTS AGE
controller-58dcfb884d-2jpg7 1/1 Running 0 11h
controller-58dcfb884d-c8v9d 1/1 Running 0 11h
csi-node-c6n5q 3/3 Running 1 24s
satellite-dz66f 1/1 Running 0 2m51s
OK awesome, now that the control plane is up, I’ve realized that I’m missing something… I need to define storage pools that LINSTOR will use!
LINSTOR can use a variety of databases:
I don’t want to introduce too much non-essential complexity into deploying LINSTOR though, so I’m going to go with ETCD. Looking at the configuration it looks like when etcd
is used for the DB, kube-linstor
will actually use the available ETCD instance that Kubernetes is running on, and you can supply a prefix to “isolate” your writes from everything else. I’m going to use an etcd prefix
of "linstor"
(configured in controller-config.secret.yaml
).
As I haven’t seen any complaints about the DB being used just yet in the logs, I’m assuming I’m good on this front!
StoragePools for LINSTOR
One thing that kvaps/kube-linstor
doesn’t do (or seem to do, as far as I can tell) is create the storage pools you’ll actually be using. The piraeus-operator
does do this, but I didn’t choose that, so… Theoretically LINSTOR is capable of creating the storage pools automatically and dealing with disks itself, but in my case I want to follow the LINSTOR documentation’s suggestion of one pool per backend disk for disk-level replication – they mention it but don’t really describe it, and I can’t tell if I should be making two LVM volume groups (with one volume each) or exposing one mirrored volume group (which will waste some space, though theoretically in a larger cluster the space wouldn’t get wasted cross-machine). The best way to make sure I get the layout I want is to configure it manually.
Luckily for me there’s a PR on adding storage pools via init on csi-node
which lays it all out very well for me. While I don’t need the helm
machinations, I’ve adapted that code to work for my setup as an init container:
# Configure storage pools
# (see https://github.com/kvaps/kube-linstor/pull/31)
- name: add-storage-pools
image: ghcr.io/kvaps/linstor-satellite:v1.11.1-1
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -exc
# TODO: Maybe get this JSON by mounting a ConfigMap? Automatic discovery would be much better
# TODO: clustered deployments will have a problem with this.
# First drive
- |
curl -s -f http://${CONTROLLER_HOST}.${CONTROLLER_NAMESPACE}:${CONTROLLER_PORT}/v1/nodes/${NODE_NAME} || exit 1
curl -s -f \
-H "Content-Type: application/json" \
-d "{\"storage_pool_name\":\"linstor-${NODE_NAME}-0\",\"provider_kind\":\"LVM_THIN\",\"props\":{\"StorDriver/LvmVg\":\"vg_linstor_0\",\"StorDriver/ThinPool\":\"lv_thin_linstor_0\"}}" \
http://${CONTROLLER_HOST}.${CONTROLLER_NAMESPACE}:${CONTROLLER_PORT}/v1/nodes/${NODE_NAME}/storage-pools || true
# Second drive
- |
curl -s -f http://${CONTROLLER_HOST}.${CONTROLLER_NAMESPACE}:${CONTROLLER_PORT}/v1/nodes/${NODE_NAME} || exit 1
curl -s -f \
-H "Content-Type: application/json" \
-d "{\"storage_pool_name\":\"linstor-${NODE_NAME}-1\",\"provider_kind\":\"LVM_THIN\",\"props\":{\"StorDriver/LvmVg\":\"vg_linstor_1\",\"StorDriver/ThinPool\":\"lv_thin_linstor_1\"}}" \
http://${CONTROLLER_HOST}.${CONTROLLER_NAMESPACE}:${CONTROLLER_PORT}/v1/nodes/${NODE_NAME}/storage-pools || true
env:
- name: CONTROLLER_PORT
value: "3370"
- name: CONTROLLER_HOST
value: "controller"
- name: CONTROLLER_NAMESPACE
value: linstor
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
A bit manual, but in a world where I’m dealing with mostly homogeneous nodes I think I’m OK. With the consistent naming I should also be able to get re-use across all the first and second drives – though it’s not clear to me whether LINSTOR keys off the logical volume name or the volume group name for sharing. If it only looked at the LV name I could probably name them all lv_nvme
and get the whole cluster fully pooled, but as far as I can tell right now all the first disks would pool together and all the second disks would pool together. I’m only running on one node here so it doesn’t really matter, but it will later.
I did have to do some adjustment to the ansible side to pre-provision the LVM pieces though, so I’ll share that as well:
- name: Create an LVM thin provisioned pool for LINSTOR
when: storage_plugin in target_plugins and nvme_disk_0_partition_5.stat.exists and nvme_disk_1.stat.exists
tags: [ "drive-partition-prep" ]
block:
# NOTE: LINSTOR requires/expects similar LVM VG/ZPool naming to share across nodes
# NOTE: To get per-device isolation, we need to create a storage pool per backend device, not sure if this includes VGs and thin LVs
# (see: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-a_storage_pool_per_backend_device)
- name: Create Volume Group for disk one partition 5
community.general.lvg:
vg: vg_linstor_0
pvs: /dev/nvme0n1p5
pvresize: yes # maximum available size
- name: Create Volume Group for disk 2
community.general.lvg:
vg: vg_linstor_1
pvs: /dev/nvme1n1
pvresize: yes # maximum available size
- name: Create thinpool for disk one partition 5
community.general.lvol:
vg: vg_linstor_0
thinpool: lv_thin_linstor_0
size: 100%FREE
- name: Create thinpool for disk two
community.general.lvol:
vg: vg_linstor_1
thinpool: lv_thin_linstor_1
size: 100%FREE
vars:
target_plugins:
- linstor-drbd9
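If you’d rather do (or debug) the LVM side by hand, the equivalent raw commands are roughly the following, assuming the same device paths and names as above:
# one VG per backend device, so each disk can be its own LINSTOR storage pool
vgcreate vg_linstor_0 /dev/nvme0n1p5
vgcreate vg_linstor_1 /dev/nvme1n1

# one thin pool per VG, using all of the available space
lvcreate -l 100%FREE --thinpool lv_thin_linstor_0 vg_linstor_0
lvcreate -l 100%FREE --thinpool lv_thin_linstor_1 vg_linstor_1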
StorageClasses
OK, so now that the control and data planes are set up we can make the storage classes:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: linstor-single
provisioner: linstor.csi.linbit.com
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
- key: zfs-support
values:
- "yes"
parameters:
# # CSI related parameters
# csi.storage.k8s.io/fstype: xfs
# LINSTOR parameters
placementCount: "1" # aka `autoPlace`, replica count
# resourceGroup: "full-example"
# storagePool: "my-storage-pool"
# disklessStoragePool: "DfltDisklessStorPool"
# layerList: "drbd,storage"
# placementPolicy: "AutoPlace"
# allowRemoteVolumeAccess: "true"
# encryption: "true"
# nodeList: "diskful-a,diskful-b"
# clientList: "diskless-a,diskless-b"
# replicasOnSame: "zone=a"
# replicasOnDifferent: "rack"
# disklessOnRemaining: "false"
# doNotPlaceWithRegex: "tainted.*"
# fsOpts: "nodiscard"
# mountOpts: "noatime"
# postMountXfsOpts: "extsize 2m"
# # DRBD parameters
# DrbdOptions/*: <x>
Lots of options I’m not going to touch there, and I wonder how it would be to jam all the ZFS options into mountOpts
, but for now I’m happy it’s finished with nothing else going wrong!
I’ll spare you the PVC + Pod YAML and persistence test, but I’ve developed some intuition for testing whether a CSI storage plugin is working and that’s worth sharing:
$ k get csidriver
NAME ATTACHREQUIRED PODINFOONMOUNT MODES AGE
linstor.csi.linbit.com true true Persistent 55m
$ k get csinode
NAME DRIVERS AGE
all-in-one-01 1 75m
So up until now I’ve been using test-single
and test-replicated
as my storage class names but I’m going to change them – I couldn’t think of the right naming for non-replicated disks, but that’s just because I’m not a graybeard sysadmin! In RAID terminology the closest thing to the right term is “RAID0”, but there’s also the term JBOD. Since RAID0 seems to almost always mean striping I can’t use that, but jbod
works for the non-replicated case. So basically, test-single.storageclass.yaml
-> jbod.storageclass.yaml
and test-replicated.storageclass.yaml
-> raid1.storageclass.yaml everywhere. Not a huge thing, but I thought it was worth describing.
Unfortunately not everything matches up nicely – for example, in LocalPV ZFS the disk underneath is mirrored, so it doesn’t make sense to make a jbod
StorageClass
, but what can you do – I think I’ve yak shaved enough.
OpenEBS continued its tradition of being easy to set up, and LINSTOR wasn’t far behind! I have to admit I’m still a little sore from the difficulties of the Ceph setup, but it was refreshing that these were so much easier to install. Most people just never look at the resources and kubectl apply -f
the large file, but that’s crazy to me, because when something goes wrong you’d have absolutely no idea which pieces were which or how big the problem domain is.
Anyway, hopefully now everyone has some reference material on the installation of these tools. Finally we can get on to actually running the tests – I’m pretty sure simply running some fio
pods (and even booting up postgres
/etc pods for testing) will be much easier than this setup was. It’s the whole point of using Kubernetes – set this stuff up once, and the abstractions above (PersistentVolumeClaim
s, PersistentVolume
s, Pod
s, Deployment
s) are super easy to use.
So it looks like I let some broken code slip through – the LINSTOR setup is broken/flaky. It turns out the order in which you start the components really matters (piraeus-operator
may have been better at this…) – controller
comes first, then satellite
(which registers the node), then the csi-node
(which registers storage pools), and then the rest of it. After a hard refresh it was very finicky. You have to be sure you can get at least this:
$ k exec -it deploy/controller -- /bin/bash
root@controller-869dcf7955-6xtgb:/# linstor node list
╭─────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞═════════════════════════════════════════════════════════════════╡
┊ all-in-one-01 ┊ SATELLITE ┊ xx.xxx.xx.xxx:3366 (PLAIN) ┊ Online ┊
╰─────────────────────────────────────────────────────────────────╯
Once you have the node registered (which means satellite
is fine), the next thing to check is if you have the storage pools (generally the state of the csi-node
deployment would tell you this):
root@controller-869dcf7955-6xtgb:/# linstor storage-pool list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ all-in-one-01 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Nope, we are definitely missing a pool made of disks in there. If you look at the output of the add-storage-pools
init container there might be some hints:
[Default|linstor] mrman 17:50:01 [linstor] $ k logs ds/csi-node -c add-storage-pools
+ curl -s -f http://controller.linstor:3370/v1/nodes/all-in-one-01
+ exit 1
{"name":"all-in-one-01","type":"SATELLITE","props":{"CurStltConnName":"default","NodeUname":"all-in-one-01"},"net_interfaces":[{"name":"default","address":"XXX.XXX.XXX.XXX","satellite_port":3366,"satellite_encryption_type":"PLAIN","is_active":true,"uuid":"2ef1e6f7-c2df-46c2-a346-6b7443ed3e43"}],"connection_status":"ONLINE","uuid":"b0b7ff34-4232-465c-b97a-6df0f8320cbe","storage_providers":["DISKLESS","LVM","LVM_THIN","FILE","FILE_THIN","OPENFLEX_TARGET"],"resource_layers":["LUKS","NVME","WRITECACHE","CACHE","OPENFLEX","STORAGE"],"unsupported_providers":{"SPDK":["IO exception occured when running 'rpc.py get_spdk_version': Cannot run program \"rpc.py\": error=2, No such file or directory"],"ZFS_THIN":["'cat /sys/module/zfs/version' returned with exit code 1"],"ZFS":["'cat /sys/module/zfs/version' returned with exit code 1"]},"unsupported_layers":{"DRBD":["DRBD version has to be >= 9. Current DRBD version: 0.0.0"]}}
And if we clean that up…
{
"name": "all-in-one-01",
"type": "SATELLITE",
"props": {
"CurStltConnName": "default",
"NodeUname": "all-in-one-01"
},
"net_interfaces": [
{
"name": "default",
"address": "XXX.XXX.XXX.XXX",
"satellite_port": 3366,
"satellite_encryption_type": "PLAIN",
"is_active": true,
"uuid": "2ef1e6f7-c2df-46c2-a346-6b7443ed3e43"
}
],
"connection_status": "ONLINE",
"uuid": "b0b7ff34-4232-465c-b97a-6df0f8320cbe",
"storage_providers": [
"DISKLESS",
"LVM",
"LVM_THIN",
"FILE",
"FILE_THIN",
"OPENFLEX_TARGET"
],
"resource_layers": [
"LUKS",
"NVME",
"WRITECACHE",
"CACHE",
"OPENFLEX",
"STORAGE"
],
"unsupported_providers": {
"SPDK": [
"IO exception occured when running 'rpc.py get_spdk_version': Cannot run program \"rpc.py\": error=2, No such file or directory"
],
"ZFS_THIN": [
"'cat /sys/module/zfs/version' returned with exit code 1"
],
"ZFS": [
"'cat /sys/module/zfs/version' returned with exit code 1"
]
},
"unsupported_layers": {
"DRBD": [
"DRBD version has to be >= 9. Current DRBD version: 0.0.0"
]
}
}
Well a few unexpected errors there, but it looks like I can ignore the unsupported_providers
section – zfs
isn’t installed so of course it couldn’t get the version. One section that I probably can’t ignore is the unsupported_layers
section – drbd
is definitely supposed to be on the machine – the current drbd
version should not be 0.0.0
(if that’s even a valid version). Why would it be unable to find DRBD?
I definitely installed drbd
… Right? Well I found an issue similar to mine and of course I skipped to the bit where they check the hostname (because the issue filer did have the drbd9
kernel module installed), and sure enough my hostname does match my LINSTOR node name (the node that registered)… But actually, if I run lsmod | grep -i drbd9
I get:
$ lsmod | grep -i drbd9
Welp, looks like I installed but forgot to enable (via modprobe
) the drbd
kernel module:
modified ansible/storage-plugin-setup.yml
@@ -132,7 +132,7 @@
state: present
- name: Ensure rbd kernel module is installed
community.general.modprobe:
- name: rbd
+ name: drbd
state: present
vars:
target_plugins:
- linstor-drbd9
Off by a single ’d’ (rbd
was required for some other things so there’s another legitimate modprobe
elsewhere). Another thing – to make sure this kernel module gets loaded on every restart, I need to make sure there’s something in /etc/modules-load.d
(note: /etc/modprobe.d only sets module options; it doesn’t load modules at boot):
- name: Ensure drbd kernel module is loaded on boot
ansible.builtin.copy:
dest: "/etc/modules-load.d/drbd.conf"
content: |
drbd
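And to double check that it actually loads (and that it’s the v9 module rather than the in-kernel one), something like:
modprobe drbd

# /proc/drbd shows up once the module is loaded; the version line should start with 9.x
cat /proc/drbd
modinfo drbd | grep -E '^(filename|version)'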
I fixed rbd
and nvme-tcp
as well. After that I got a slightly different error (always a good sign):
"unsupported_layers":{
"DRBD": [
"DRBD version has to be >= 9. Current DRBD version: 8.4.11"
]
}
}
Well that’s not good – why don’t I have v9? Turns out it wasn’t enough just to install drbd-dkms
and add the ppa:linbit/linbit-drbd9-stack
repository – I found an excellent guide. You also need to make sure the kernel headers are installed so that DKMS can actually build the module – after installing linux-generic
I saw this output (the important bit is towards the end):
root@all-in-one-01 ~ # apt install linux-generic
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
linux-headers-5.4.0-70 linux-headers-5.4.0-70-generic linux-headers-generic
The following NEW packages will be installed:
linux-generic linux-headers-5.4.0-70 linux-headers-5.4.0-70-generic linux-headers-generic
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.4 MB of archives.
After this operation, 85.9 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://de.archive.ubuntu.com/ubuntu focal-updates/main amd64 linux-headers-5.4.0-70 all 5.4.0-70.78 [11.0 MB]
Get:2 http://de.archive.ubuntu.com/ubuntu focal-updates/main amd64 linux-headers-5.4.0-70-generic amd64 5.4.0-70.78 [1,400 kB]
Get:3 http://de.archive.ubuntu.com/ubuntu focal-updates/main amd64 linux-headers-generic amd64 5.4.0.70.73 [2,428 B]
Get:4 http://de.archive.ubuntu.com/ubuntu focal-updates/main amd64 linux-generic amd64 5.4.0.70.73 [1,896 B]
Fetched 12.4 MB in 0s (27.9 MB/s)
Selecting previously unselected package linux-headers-5.4.0-70.
(Reading database ... 45993 files and directories currently installed.)
Preparing to unpack .../linux-headers-5.4.0-70_5.4.0-70.78_all.deb ...
Unpacking linux-headers-5.4.0-70 (5.4.0-70.78) ...
Selecting previously unselected package linux-headers-5.4.0-70-generic.
Preparing to unpack .../linux-headers-5.4.0-70-generic_5.4.0-70.78_amd64.deb ...
Unpacking linux-headers-5.4.0-70-generic (5.4.0-70.78) ...
Selecting previously unselected package linux-headers-generic.
Preparing to unpack .../linux-headers-generic_5.4.0.70.73_amd64.deb ...
Unpacking linux-headers-generic (5.4.0.70.73) ...
Selecting previously unselected package linux-generic.
Preparing to unpack .../linux-generic_5.4.0.70.73_amd64.deb ...
Unpacking linux-generic (5.4.0.70.73) ...
Setting up linux-headers-5.4.0-70 (5.4.0-70.78) ...
Setting up linux-headers-5.4.0-70-generic (5.4.0-70.78) ...
/etc/kernel/header_postinst.d/dkms:
* dkms: running auto installation service for kernel 5.4.0-70-generic
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
make -j12 KERNELRELEASE=5.4.0-70-generic -C src/drbd KDIR=/lib/modules/5.4.0-70-generic/build.....
cleaning build area...
DKMS: build completed.
drbd.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-70-generic/updates/dkms/
drbd_transport_tcp.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-70-generic/updates/dkms/
depmod...
DKMS: install completed.
...done.
Setting up linux-headers-generic (5.4.0.70.73) ...
Setting up linux-generic (5.4.0.70.73) ...
So I guess it just… didn’t do anything before? Do I have v9 now? What version was I pulling before? I’m rebooting to be sure, and considering DKMS sometimes doesn’t build properly I may have to do a build from scratch. Luckily for me, I did not have to, and after a reboot the DRBD-specific errors are gone from the init container. The other errors still show up though:
"unsupported_providers": {
"SPDK": [
"IO exception occured when running 'rpc.py get_spdk_version': Cannot run program \"rpc.py\": error=2, No such file or directory"
],
"ZFS_THIN": [
"'cat /sys/module/zfs/version' returned with exit code 1"],"ZFS":["'cat /sys/module/zfs/version' returned with exit code 1"
]
}
I guess I’ll fix these by just installing spdk and zfs. ZFS is easy to fix (apt install zfsutils-linux
), but SPDK is very bleeding-edge tech… I’m not sure I want to try and install that just yet… Also, the error it’s giving me has almost nothing to do with spdk itself but with some script called rpc.py
that tries to get the SPDK version? Yikes. As you might expect, after installing ZFS I’m left with only the SPDK error stopping the init container from making progress:
"unsupported_providers": {
"SPDK": [
"IO exception occured when running 'rpc.py get_spdk_version': Cannot run program \"rpc.py\": error=2, No such file or directory"
]
}
I could just modify the script to skip this check (having any unsupported provider seems to make the curl
fail), but I think I’ll install SPDK – I do have NVMe drives and there might be some cool features unlocked by having SPDK available. Jumping down the rabbit hole, I ended up installing spdk
(skipped out on dpdk
since there were some issues similar to this person’s). I won’t get into it here but you can find out all about it in the ansible
code if you’re really interested. Of course, no good deed goes unpunished:
"unsupported_providers": {
"SPDK": [
"'rpc.py get_spdk_version' returned with exit code 1"
]
}
If I exec
into the container and try to run it myself I get the following:
[Default|linstor] mrman 20:57:09 [linstor] $ k exec -it ds/satellite -- /bin/bash
root@all-in-one-01:/# rpc.py get_spdk_version
Traceback (most recent call last):
File "/usr/local/sbin/rpc.py", line 3, in <module>
from rpc.client import print_dict, print_json, JSONRPCException
File "/usr/local/sbin/rpc.py", line 3, in <module>
from rpc.client import print_dict, print_json, JSONRPCException
ModuleNotFoundError: No module named 'rpc.client'; 'rpc' is not a package
Welp, I’m done trying to get this working now, and I’ve gained a little more hate for interpreted languages. I bet this stupid rpc script doesn’t even do much, but it was too much for linstor
to re-implement in Java, so now they’re calling Python from a Java application. I’m just going to ignore the check in the storage pool creation init container:
- name: add-storage-pools
image: ghcr.io/kvaps/linstor-satellite:v1.11.1-1
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -exc
# TODO: Maybe get this JSON by mounting a ConfigMap? Automatic discovery would be much better
# TODO: clustered deployments will have a problem with this.
# First drive
- |
curl -s -f http://${CONTROLLER_HOST}.${CONTROLLER_NAMESPACE}:${CONTROLLER_PORT}/v1/nodes/${NODE_NAME} || echo "ERROR: failed to retrieve node [${NODE_NAME}] from controller... proceeding anyway..." && true
curl -s -f \
-H "Content-Type: application/json" \
-d "{\"storage_pool_name\":\"linstor-${NODE_NAME}-0\",\"provider_kind\":\"LVM_THIN\",\"props\":{\"StorDriver/LvmVg\":\"vg_linstor_0\",\"StorDriver/ThinPool\":\"lv_thin_linstor_0\"}}" \
http://${CONTROLLER_HOST}.${CONTROLLER_NAMESPACE}:${CONTROLLER_PORT}/v1/nodes/${NODE_NAME}/storage-pools || true
Not ideal, but working:
$ k get pods --watch
NAME READY STATUS RESTARTS AGE
controller-869dcf7955-6xtgb 1/1 Running 6 3h39m
csi-node-f4hsc 0/3 PodInitializing 0 3s
satellite-h7d7p 1/1 Running 0 18m
csi-node-f4hsc 3/3 Running 0 3s
And what does the cluster think?
$ k exec -it deploy/controller -- /bin/bash
root@controller-869dcf7955-6xtgb:/# linstor storage-pool list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ all-in-one-01 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ linstor-all-in-one-01-0 ┊ all-in-one-01 ┊ LVM_THIN ┊ vg_linstor_0/lv_thin_linstor_0 ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
Node: 'all-in-one-01', storage pool: 'linstor-all-in-one-01-0' - Failed to query free space from storage pool
Cause:
Unable to parse free thin sizes
Well, we almost did it! The information looks almost right but the storage pool free-space query is failing… I wonder why. Well, at least someone else has run into this before, on piraeus-server
so I’m not the only one. Also, it’s becoming clearer and clearer that the lack of self-healing is a problem – the order in which these things start is really important, such that if the csi-node
starts before the satellite
, it does not register the storage pool, and I essentially have to restart the rollout. I might as well fix this, otherwise things will be inconsistent, so I took some time and added some initContainer
s that wait for various bits to start.
It looks like the LVM issues were solved by a restart of the node (which probably will be OK with a hard reset) and I had to add some code to make sure things started up in the right order – for example in satellite
I now wait for controller
with an init container:
initContainers:
## Wait for controller to be ready -- it must be up before the satellite can register with it
- name: wait-for-controller
image: bitnami/kubectl
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -exc
- |
n=0
until [ $n -ge 30 ]; do
REPLICA_COUNT=$(kubectl get deploy/${CONTROLLER_DEPLOYMENT_NAME} -n ${CONTROLLER_NAMESPACE} -o template --template='{{ .status.availableReplicas }}')
if [ "${REPLICA_COUNT}" -gt "0" ] ; then
echo "[info] found ${REPLICA_COUNT} available replicas."
break
fi
# count the attempt so the loop actually gives up after 30 tries
n=$((n + 1))
echo -n "[info] waiting 10 seconds before trying again..."
sleep 10
done
env:
- name: CONTROLLER_DEPLOYMENT_NAME
value: "controller"
- name: CONTROLLER_NAMESPACE
value: linstor
So LINSTOR’s been a much bigger pain than I expected, but at least it’s now reporting the right status:
[Default|linstor] mrman 23:34:30 [linstor] $ k exec -it deploy/controller -- linstor storage-pool list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ all-in-one-01 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ linstor-all-in-one-01-0 ┊ all-in-one-01 ┊ LVM_THIN ┊ vg_linstor_0/lv_thin_linstor_0 ┊ 395.74 GiB ┊ 395.74 GiB ┊ True ┊ Ok ┊
┊ linstor-all-in-one-01-1 ┊ all-in-one-01 ┊ LVM_THIN ┊ vg_linstor_1/lv_thin_linstor_1 ┊ 476.70 GiB ┊ 476.70 GiB ┊ True ┊ Ok ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
So after this I ran into tons of issues (I even filed an issue on piraeusdatastore/piraeus-operator
since evidently that’s where all the linstor server issues should go) and solved them one by one (see the GitLab repo, unfortunately a bunch of this work happened on the add-tests
branch), but ultimately the last issue came down to a misconfigured port on the csi-controller
:
modified kubernetes/linstor/csi-controller.deployment.yaml
@@ -137,7 +137,7 @@ spec:
port: 9808
env:
- name: LS_CONTROLLERS
- value: "http://controller:3366"
+ value: "http://controller:3370"
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
The controller listens for REST API access on port 3370, not 3366 (3366 is the satellite’s plain TCP port).
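A quick way to sanity check which port you should be pointing at is to hit the REST API from inside the cluster – /v1/nodes is the same endpoint the add-storage-pools init container uses:
# from a pod inside the cluster: 3370 answers with JSON, 3366 is the satellite's plain TCP port
curl -s http://controller.linstor:3370/v1/nodes | head -c 200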
Along with that I also ran into some placement issues (LINSTOR expects more than one node, which is reasonable) – once those were sorted out, the PVCs were working again. It turns out that to force LINSTOR to put replicas on the same node you need to use replicasOnSame
, and set some AUX properties:
$ k exec -it deploy/controller -- linstor node set-property all-in-one-01 --aux node all-in-one-01
SUCCESS:
Successfully set property key(s): Aux/node
WARNING:
The property 'Aux/node' has no effect since the node 'all-in-one-01' does not support DRBD 9
SUCCESS:
Description:
Node 'all-in-one-01' modified.
Details:
Node 'all-in-one-01' UUID is: 44ff6a28-5499-4457-a5b0-87e6b8e55899
SUCCESS:
(all-in-one-01) Node changes applied.
Once you’ve done that, you have to restrict the StorageClass
to target the node. Oh, one more thing… I did have to install DRBD9 manually (huge thanks to the guide @ nethence.com).
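For context, “restricting the StorageClass to target the node” means something roughly like the following – a sketch only, since the exact replicasOnSame value semantics are the part I ended up fighting with (the key=value form here mirrors the commented zone=a example in the StorageClass earlier):
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-raid1
provisioner: linstor.csi.linbit.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  placementCount: "2"
  # pin replicas to the node carrying the Aux/node property set above
  # (assumes the same key=value form as the commented replicasOnSame example)
  replicasOnSame: "node=all-in-one-01"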
After even more painful, painful experimentation, it turns out you need to make Resource Groups for this stuff to work:
$ k exec -it deploy/controller -- linstor resource-group create --storage-pool=linstor-all-in-one-01-0,linstor-all-in-one-01-1 --place-count=1 jbod
SUCCESS:
Description:
New resource group 'jbod' created.
Details:
Resource group 'jbod' UUID is: c56afd58-479d-4e26-945d-fdf5b1f27f2a
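Presumably the matching jbod StorageClass then just points at this resource group via the resourceGroup parameter (one of the options commented out in the StorageClass above) – a rough sketch:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: jbod
provisioner: linstor.csi.linbit.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  # hand placement decisions over to the resource group created above
  resourceGroup: "jbod"
  placementCount: "1"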
This led to success though, finally. It’s been like ~1-2 solid days to get to this point:
$ k exec -it deploy/controller -- linstor resource-group list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceGroup ┊ SelectFilter ┊ VlmNrs ┊ Description ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp ┊ PlaceCount: 2 ┊ ┊ ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ jbod ┊ PlaceCount: 1 ┊ ┊ ┊
┊ ┊ StoragePool(s): linstor-all-in-one-01-1 ┊ ┊ ┊
┊ ┊ LayerStack: ['DRBD', 'STORAGE'] ┊ ┊ ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ raid1 ┊ PlaceCount: 2 ┊ ┊ ┊
┊ ┊ StoragePool(s): linstor-all-in-one-01-0,linstor-all-in-one-01-1 ┊ ┊ ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Unfortunately, it looks like raid1 just doesn’t work across disks on the same node, despite how good this looks – I spent hours (honestly the better part of a day) trying to get LINSTOR to believe that replicas could be placed across disks and it just didn’t work. I tried just about every permutation of placementPolicy
, placementCount
, nodeList
, clientList
that I could think of, and it just didn’t work. So at least for the single node I’m just not going to attempt to test RAID1 with LINSTOR. Since single-replica (single placeCount
) was working just fine, in production it’s probably a better idea to just use LVM to RAID1 the disks underneath and let LINSTOR provision from that.
This is definitely more work than I expected to do for LINSTOR, and I’m not comfortable with that, because Ceph is the more widely adopted and trusted solution in the industry – at this point I think I’ve spent more time (with less intermediate progress) figuring out how LINSTOR is supposed to work on k8s than I did on Rook/Ceph. Anyway, hopefully this update helps anyone who was wondering why the LINSTOR stuff didn’t quite work.