tl;dr - outline of some approaches I’ve taken to storage on my small k8s cluster, why I can’t just use Rook (which is/was primarily Ceph underneath), and setup & evaluation of OpenEBS. OpenEBS is working great and is wonderfully simple – I’m probably going to be using it for everything from now on.
Discovering Rook (and as a result Ceph, which was Rook’s first underlying system) was a huge moment for me in figuring out how to do interesting things with Kubernetes. My “cluster” is super small (only the one node!), but I always wanted to get away from the hackiness of hostPath volumes and use something that was a little more dynamic.
Using Rook with Ceph underneath meant that you needed to hand over an entire disk to Ceph to manage. I re-read both the Ceph and Rook documentation countless times because both of them seem to suggest you can just run “in a folder”, but I’m convinced that what they mean is in a folder on a dedicated disk. There’s also the fundamental problem that Linux doesn’t give you a clean way to constrain the size of a folder. Either way, giving an entire disk to Ceph to manage is doable – you can just let ansible do the heavy lifting of wiping the second drive and reformatting it. In my case things are a little more difficult because Hetzner’s dedicated servers come with software-RAID1’d disks (RAID1 = multiple copies of the same data). This meant I had to spend some time learning about software RAID on Linux in general and learning to disassemble it.
Relatively recently this setup bit me (I’ll get into the how/why later) and I reverted to the indignity that is managed hostPath volumes again (it’s not that bad, I just make sure to keep all data in a central location like /var/data, organized per project). Recovering the data is also really easy if you follow this path, because you can just SSH onto the machine (and if it doesn’t boot, go to recovery mode) and rsync the whole folder out (don’t forget rsync’s compression options!).
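For reference, pulling that folder down to another machine is basically a one-liner (the hostname and paths here are placeholders, not my actual setup):
$ rsync -azP root@my-hetzner-box:/var/data/ ./var-data-backup/
-a preserves permissions/ownership, -z is the compression I mentioned, and -P gives you progress output plus resumable transfers.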
You might be wondering why I’d go for a solution like Rook/Ceph when I could just use any of the other awesome volume types that Kubernetes supports. Well there are a few reasons:
local volumes are awesome but don’t support dynamic provisioning as of now.
And of course, tools like Rook/Ceph are what I want to get used to going forward, because they do things like handle replication of data under the covers for you (so essentially RAIDx), and I’m preparing for the day when I enter the normal case of the 99% of k8s operators who run more than one node and data starts flying everywhere. I have a fetish for good yet general solutions, so I’d rather run Rook/Ceph on one node and then figure out how to run it on more, than commit to a one-node solution that isn’t really manageable in a multi-node environment (given that I’ll be multi-node sooner rather than later).
I only recently heard of OpenEBS through a random comment by u/dirtypete1981 on reddit – I had no idea OpenEBS existed, and the idea of Container Attached Storage is interesting, although “CAS” is already a terribly overloaded term in computer science. After reading up on the concepts (or watching the FOSDEM 2018 talk by Jeffrey Molanus), I was interested in trying it out – while I don’t know that a case can be made for the CAS approach being faster than traditional approaches, the flexibility is self-evident.
The CAS approach is kind of like Ceph turned inside out – the OSD/Mons and other internal stuff are exposed as part of your infrastructure instead of behind the Ceph curtain. Could OpenEBS be my solution to small-scale but general storage-for-my-workloads problems? (Spoilers: the answer is yes, which is why this blog post exists).
tl;dr - After prepping for a Kubernetes 1.12 upgrade, I carelessly updated the OS (and grub2) as well.
I don’t know why I keep doing this to myself (it’s not the first time), but in the middle of getting ready to migrate to Kubernetes 1.12, I did an apt-get update && apt-get upgrade. Nothing better than adding one big upgrade while you perform another. While apt was doing its thing I noticed that grub2-install was trying to configure itself and asking me for input. I picked some settings that I thought were correct (and that the config tool LGTM’d), but it turns out that grub/grub2 basically doesn’t support proper installation for LVM/software RAID. That, or I’m just not smart enough to get it to work and need more gray hairs in my beard; either way my setup was borked – say goodbye to that sweet Kubernetes 1.12 upgrade.
Cue hours of downtime. I spent lots of time running around the internet frantically searching terms like “grub”, “raid1”, “mdadm”, and “grub-install”, trying to figure out how I could get grub to realize where it should be booting from.
I will spoil it for you now though: in the end I had to get my data off and rebuild the server completely. The silver lining is that my ansible infrastructure code (you can find an unmaintained snapshot of it on gitlab) was able to get me from a fresh 18.04 install to k8s 1.12 (might as well do the upgrade if I’m remaking the cluster) very quickly with no manual intervention. I did choose to remove the code that disassembled the software RAID (you can approximate it by scaling your RAID array down to 1 disk, then doing stuff with the second one, without going into Hetzner rescue mode – roughly the sketch below) – I’m going to leave the drives RAID1’d on Hetzner boxes from now on.
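For the curious, the disassembly I removed boiled down to something like the following (device names are examples – don’t run this on an array you care about):
root@Ubuntu-1810-cosmic-64-minimal ~ # mdadm /dev/md0 --fail /dev/sdb1
root@Ubuntu-1810-cosmic-64-minimal ~ # mdadm /dev/md0 --remove /dev/sdb1
root@Ubuntu-1810-cosmic-64-minimal ~ # mdadm --grow /dev/md0 --raid-devices=1 --force
root@Ubuntu-1810-cosmic-64-minimal ~ # wipefs -a /dev/sdb1
After that, the second disk is free to hand over to Ceph (or whatever else wants a raw device).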
Well let’s pretend that it was going to work – here are some helpful resources I found along the way:
The first link helped me mount disks properly in Hetzner’s rescue mode after observing that the machine wouldn’t boot. After requesting a live connection to my machine, I saw that it was stuck at the Hetzner PXE boot screen (I really wish Hetzner let you bring your own PXE boot setup), trying the local disk but never succeeding. My first instinct was of course to try and get my data off this possibly-borked server.
After searching and trying many things, checking the drives for errors, wiping and re-building the raid partitions and messing with grub configuration, I gave up. As I said before, in the end I didn’t win this particular battle, but the silver lining is that I got to test my infrastructure code and it didn’t rot very much at all.
Now that I’m not trying to undo the RAID so I can give a full disk to Ceph any more, I started to wonder what my other options were. While I was knee-deep in incomplete help threads and beginner-level instructions, I realized that another way I could have solved this problem was to create a loop-based virtual disk and give that to Ceph. So now here are the options I know of:
- hostPath / local volumes
- loop-based virtual disk
- OpenEBS
By the title/flow of this article you probably know which one I’m going to investigate, but I do want to note that the virtual disk solution actually seems really promising for dynamic provisioning on a per-node level, because it seems nestable. Instead of trying to deal with size-constraining folders on disk, why not make one “big” virtual disk (let’s say 500GB), mount that, then partition it into smaller virtual disks? I get the feeling I could hack together an operator to do this and provide the ever-elusive “dynamic hostPath/local volumes” very easily (a rough sketch of the mechanics is below). Eventually I’ll find time to explore that idea, but that time isn’t this time.
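For the record, that virtual-disk idea looks roughly like this (paths and sizes are made up – this is the concept, not something I’m actually running):
root@Ubuntu-1810-cosmic-64-minimal ~ # truncate -s 500G /var/disks/pool.img
root@Ubuntu-1810-cosmic-64-minimal ~ # losetup --find --show /var/disks/pool.img  # prints the loop device it picked, e.g. /dev/loop0
root@Ubuntu-1810-cosmic-64-minimal ~ # mkfs.ext4 /dev/loop0
root@Ubuntu-1810-cosmic-64-minimal ~ # mkdir -p /mnt/pool && mount /dev/loop0 /mnt/pool
From there an operator could carve smaller image files (or partitions) out of /mnt/pool and hand them out as size-constrained volumes.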
Before choosing to go with OpenEBS I took a step back to evaluate why I want to solve this problem at all. At the end of the day I want dynamically provisioned, properly size-constrained volumes that I don’t have to manage by hand (i.e. no more managed hostPaths).
One of the first things I looked at was OpenEBS’s list of features, and they’re pretty great. Some highlights:
Snapshots are a huge differentiator if true. Rook is still working on them according to their roadmap (scheduled for v0.9, which isn’t out yet as I write this). I was also really impressed by the architecture docs – they’re pretty concise and informative. Reading through the docs it’s looking like OpenEBS is going to offer me a way to have dynamically allocated drives & PV/PVCs without the static provisioning that you’d need for local volumes.
NOTE I just realised that Rook 0.9 is out now, so they should have snapshots.
Obviously, there’s a lot of tech to read up on here if you’re new to the space. At this point I basically know enough about Kubernetes + Rook + Ceph to be dangerous, after reading documentation and setting up/fiddling. Here’s a loose list of things you may want to read up on/know about:
Skimming these resources is obviously enough for our purposes – it would take months/years to become an actual expert, never mind getting the actual in-the-trenches experience. Importantly, we need to keep the user-level goal in mind, which I can try and encapsulate with this statement:
When I start a pod, if there is space either on a local disk or some network-attached storage I’ve purchased, I want a PVC to be automatically created for it, and I want to automatically have data replicated as the pod makes use of the filesystem
The idea is simple of course, but the devil is in the details, and there are deals (tradeoffs) to be made all over. Storage systems can be good for some use cases but bad for others – one system might be great for storing & replicating pictures, but bad for storing and replicating the kind of writes a Write Ahead Log (like the one Postgres, or your favorite database, keeps) would perform. I didn’t choose GlusterFS when I was first looking into distributed storage mostly because of reports (that never seemed to get rebutted) that it was less than ideal for running databases on. What I’m looking for is a solution with decent general-case behavior. I’m not Google or a tech giant; I don’t run applications that write thousands of times a second, but I do want to enable easy operations.
OK enough exposition let’s get to installing OpenEBS.
The OpenEBS documentation has a section on installing OpenEBS, as you’d expect, which we’re going to follow. We’ll use the default Jiva storage engine, which seems to work with a local folder on the machine by default. I’m basing this understanding on the following quote:
OpenEBS can be used to create Storage Pool on a host disk or an externally mounted disk. This Storage Pool can be used to create Jiva volume which can be utilized to run applications. By default, Jiva volume will be deployed on host path. If you are using an external disk, see storage pool for more details about creating a storage pool with an external disk.
Hopefully they don’t mean this in the same way Rook/Ceph did, and I can just give OpenEBS a folder on-disk (which, again, is actually two disks software-RAIDed together) and OpenEBS will manage sizing and dynamic provisioning of data and expose it via iSCSI. Which brings me to one of the hard requirements of OpenEBS – you need open-iscsi installed, as noted in the prerequisites:
root@Ubuntu-1810-cosmic-64-minimal ~ # sudo apt-get install open-iscsi
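While we’re here, it’s worth making sure the daemon is actually enabled and running (the unit is called iscsid on Ubuntu – the name may differ on other distros):
root@Ubuntu-1810-cosmic-64-minimal ~ # systemctl enable iscsid && systemctl start iscsid
root@Ubuntu-1810-cosmic-64-minimal ~ # systemctl status iscsid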
As for the Kubernetes parts, they recommend that you install with Helm or by running kubectl on a monolithic YAML file like this:
kubectl apply -f https://openebs.github.io/charts/openebs-operator-0.8.0.yaml
As usual, I don’t ever do that, but instead pull down and split up the monolithic YAML file to get an idea of what’s running.
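If you want to do the same, something like this gets you the raw pieces to reorganize (csplit just cuts on the --- document separators; naming the resulting files sensibly is up to you):
$ curl -sLO https://openebs.github.io/charts/openebs-operator-0.8.0.yaml
$ csplit --quiet --prefix=openebs-part- openebs-operator-0.8.0.yaml '/^---$/' '{*}'
$ ls openebs-part-*
Here’s what the split-up (and renamed) version looks like for me (I use the makeinfra pattern):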
infra/kubernetes/cluster/storage/openebs/openebs.ns.yaml
:
---
apiVersion: v1
kind: Namespace
metadata:
name: openebs
infra/kubernetes/cluster/storage/openebs/openebs.serviceaccount.yaml
:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: openebs-maya-operator
namespace: openebs
infra/kubernetes/cluster/storage/openebs/openebs.configmap.yaml
:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: openebs-ndm-config
namespace: openebs
data:
# udev-probe is default or primary probe which should be enabled to run ndm
# filterconfigs contails configs of filters - in their form fo include
# and exclude comma separated strings
node-disk-manager.config: |
probeconfigs:
- key: udev-probe
name: udev probe
state: true
- key: smart-probe
name: smart probe
state: true
filterconfigs:
- key: os-disk-exclude-filter
name: os disk exclude filter
state: true
exclude: "/,/etc/hosts,/boot"
- key: vendor-filter
name: vendor filter
state: true
include: ""
exclude: "CLOUDBYT,OpenEBS"
- key: path-filter
name: path filter
state: true
include: ""
exclude: "loop,/dev/fd0,/dev/sr0,/dev/ram,/dev/dm-,/dev/md"
infra/kubernetes/cluster/storage/openebs/openebs.rbac.yaml
:
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: openebs-maya-operator
rules:
- apiGroups: ["*"]
resources: ["nodes", "nodes/proxy"]
verbs: ["*"]
- apiGroups: ["*"]
resources: ["namespaces", "services", "pods", "deployments", "events", "endpoints", "configmaps", "jobs"]
verbs: ["*"]
- apiGroups: ["*"]
resources: ["storageclasses", "persistentvolumeclaims", "persistentvolumes"]
verbs: ["*"]
- apiGroups: ["volumesnapshot.external-storage.k8s.io"]
resources: ["volumesnapshots", "volumesnapshotdatas"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apiextensions.k8s.io"]
resources: ["customresourcedefinitions"]
verbs: [ "get", "list", "create", "update", "delete"]
- apiGroups: ["*"]
resources: [ "disks"]
verbs: ["*" ]
- apiGroups: ["*"]
resources: [ "storagepoolclaims", "storagepools"]
verbs: ["*" ]
- apiGroups: ["*"]
resources: [ "castemplates", "runtasks"]
verbs: ["*" ]
- apiGroups: ["*"]
resources: [ "cstorpools", "cstorvolumereplicas", "cstorvolumes"]
verbs: ["*" ]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: openebs-maya-operator
namespace: openebs
subjects:
- kind: ServiceAccount
name: openebs-maya-operator
namespace: openebs
- kind: User
name: system:serviceaccount:default:default
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: openebs-maya-operator
apiGroup: rbac.authorization.k8s.io
infra/kubernetes/cluster/storage/openebs/openebs-api-server.deployment.yaml
:
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: maya-apiserver
namespace: openebs
spec:
replicas: 1
template:
metadata:
labels:
name: maya-apiserver
spec:
serviceAccountName: openebs-maya-operator
containers:
- name: maya-apiserver
imagePullPolicy: IfNotPresent
image: quay.io/openebs/m-apiserver:0.8.0
ports:
- containerPort: 5656
env:
# OPENEBS_IO_KUBE_CONFIG enables maya api service to connect to K8s
# based on this config. This is ignored if empty.
# This is supported for maya api server version 0.5.2 onwards
#- name: OPENEBS_IO_KUBE_CONFIG
# value: "/home/ubuntu/.kube/config"
# OPENEBS_IO_K8S_MASTER enables maya api service to connect to K8s
# based on this address. This is ignored if empty.
# This is supported for maya api server version 0.5.2 onwards
#- name: OPENEBS_IO_K8S_MASTER
# value: "http://172.28.128.3:8080"
# OPENEBS_IO_INSTALL_DEFAULT_CSTOR_SPARSE_POOL decides whether default cstor sparse pool should be
# configured as a part of openebs installation.
# If "true" a default cstor sparse pool will be configured, if "false" it will not be configured.
- name: OPENEBS_IO_INSTALL_DEFAULT_CSTOR_SPARSE_POOL
value: "true"
# OPENEBS_NAMESPACE provides the namespace of this deployment as an
# environment variable
- name: OPENEBS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# OPENEBS_SERVICE_ACCOUNT provides the service account of this pod as
# environment variable
- name: OPENEBS_SERVICE_ACCOUNT
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
# OPENEBS_MAYA_POD_NAME provides the name of this pod as
# environment variable
- name: OPENEBS_MAYA_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: OPENEBS_IO_JIVA_CONTROLLER_IMAGE
value: "quay.io/openebs/jiva:0.8.0"
- name: OPENEBS_IO_JIVA_REPLICA_IMAGE
value: "quay.io/openebs/jiva:0.8.0"
- name: OPENEBS_IO_JIVA_REPLICA_COUNT
value: "3"
- name: OPENEBS_IO_CSTOR_TARGET_IMAGE
value: "quay.io/openebs/cstor-istgt:0.8.0"
- name: OPENEBS_IO_CSTOR_POOL_IMAGE
value: "quay.io/openebs/cstor-pool:0.8.0"
- name: OPENEBS_IO_CSTOR_POOL_MGMT_IMAGE
value: "quay.io/openebs/cstor-pool-mgmt:0.8.0"
- name: OPENEBS_IO_CSTOR_VOLUME_MGMT_IMAGE
value: "quay.io/openebs/cstor-volume-mgmt:0.8.0"
- name: OPENEBS_IO_VOLUME_MONITOR_IMAGE
value: "quay.io/openebs/m-exporter:0.8.0"
# OPENEBS_IO_ENABLE_ANALYTICS if set to true sends anonymous usage
# events to Google Analytics
- name: OPENEBS_IO_ENABLE_ANALYTICS
value: "false"
# OPENEBS_IO_ANALYTICS_PING_INTERVAL can be used to specify the duration (in hours)
# for periodic ping events sent to Google Analytics. Default is 24 hours.
#- name: OPENEBS_IO_ANALYTICS_PING_INTERVAL
# value: "24h"
livenessProbe:
exec:
command:
- /usr/local/bin/mayactl
- version
initialDelaySeconds: 30
periodSeconds: 60
readinessProbe:
exec:
command:
- /usr/local/bin/mayactl
- version
initialDelaySeconds: 30
periodSeconds: 60
infra/kubernetes/cluster/storage/openebs/openebs-provisioner.deployment.yaml
:
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: openebs-provisioner
namespace: openebs
spec:
replicas: 1
template:
metadata:
labels:
name: openebs-provisioner
spec:
serviceAccountName: openebs-maya-operator
containers:
- name: openebs-provisioner
imagePullPolicy: IfNotPresent
image: quay.io/openebs/openebs-k8s-provisioner:0.8.0
env:
# OPENEBS_IO_K8S_MASTER enables openebs provisioner to connect to K8s
# based on this address. This is ignored if empty.
# This is supported for openebs provisioner version 0.5.2 onwards
#- name: OPENEBS_IO_K8S_MASTER
# value: "http://10.128.0.12:8080"
# OPENEBS_IO_KUBE_CONFIG enables openebs provisioner to connect to K8s
# based on this config. This is ignored if empty.
# This is supported for openebs provisioner version 0.5.2 onwards
#- name: OPENEBS_IO_KUBE_CONFIG
# value: "/home/ubuntu/.kube/config"
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: OPENEBS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
# that provisioner should forward the volume create/delete requests.
# If not present, "maya-apiserver-service" will be used for lookup.
# This is supported for openebs provisioner version 0.5.3-RC1 onwards
#- name: OPENEBS_MAYA_SERVICE_NAME
# value: "maya-apiserver-apiservice"
livenessProbe:
exec:
command:
- pgrep
- ".*openebs"
initialDelaySeconds: 30
periodSeconds: 60
infra/kubernetes/cluster/storage/openebs/openebs-snapshot-operator.deployment.yaml
:
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: openebs-snapshot-operator
namespace: openebs
spec:
replicas: 1
strategy:
type: Recreate
template:
metadata:
labels:
name: openebs-snapshot-operator
spec:
serviceAccountName: openebs-maya-operator
containers:
- name: snapshot-controller
image: quay.io/openebs/snapshot-controller:0.8.0
imagePullPolicy: IfNotPresent
env:
- name: OPENEBS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
livenessProbe:
exec:
command:
- pgrep
- ".*controller"
initialDelaySeconds: 30
periodSeconds: 60
# OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
# that snapshot controller should forward the snapshot create/delete requests.
# If not present, "maya-apiserver-service" will be used for lookup.
# This is supported for openebs provisioner version 0.5.3-RC1 onwards
#- name: OPENEBS_MAYA_SERVICE_NAME
# value: "maya-apiserver-apiservice"
- name: snapshot-provisioner
image: quay.io/openebs/snapshot-provisioner:0.8.0
imagePullPolicy: IfNotPresent
env:
- name: OPENEBS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
# that snapshot provisioner should forward the clone create/delete requests.
# If not present, "maya-apiserver-service" will be used for lookup.
# This is supported for openebs provisioner version 0.5.3-RC1 onwards
#- name: OPENEBS_MAYA_SERVICE_NAME
# value: "maya-apiserver-apiservice"
livenessProbe:
exec:
command:
- pgrep
- ".*provisioner"
initialDelaySeconds: 30
periodSeconds: 60
infra/kubernetes/cluster/storage/openebs/openebs-disk-manager.ds.yaml
:
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: openebs-ndm
namespace: openebs
spec:
template:
metadata:
labels:
name: openebs-ndm
spec:
# By default the node-disk-manager will be run on all kubernetes nodes
# If you would like to limit this to only some nodes, say the nodes
# that have storage attached, you could label those node and use
# nodeSelector.
#
# e.g. label the storage nodes with - "openebs.io/nodegroup"="storage-node"
# kubectl label node <node-name> "openebs.io/nodegroup"="storage-node"
#nodeSelector:
# "openebs.io/nodegroup": "storage-node"
serviceAccountName: openebs-maya-operator
hostNetwork: true
containers:
- name: node-disk-manager
command:
- /usr/sbin/ndm
- start
image: quay.io/openebs/node-disk-manager-amd64:v0.2.0
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
volumeMounts:
- name: config
mountPath: /host/node-disk-manager.config
subPath: node-disk-manager.config
readOnly: true
- name: udev
mountPath: /run/udev
- name: procmount
mountPath: /host/mounts
- name: sparsepath
mountPath: /var/openebs/sparse
env:
# pass hostname as env variable using downward API to the NDM container
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
# specify the directory where the sparse files need to be created.
# if not specified, then sparse files will not be created.
- name: SPARSE_FILE_DIR
value: "/var/openebs/sparse"
# Size(bytes) of the sparse file to be created.
- name: SPARSE_FILE_SIZE
value: "10737418240"
# Specify the number of sparse files to be created
- name: SPARSE_FILE_COUNT
value: "1"
livenessProbe:
exec:
command:
- pgrep
- ".*ndm"
initialDelaySeconds: 30
periodSeconds: 60
volumes:
- name: config
configMap:
name: openebs-ndm-config
- name: udev
hostPath:
path: /run/udev
type: Directory
# mount /proc/1/mounts (mount file of process 1 of host) inside container
# to read which partition is mounted on / path
- name: procmount
hostPath:
path: /proc/1/mounts
- name: sparsepath
hostPath:
path: /var/openebs/sparse
infra/kubernetes/cluster/storage/openebs/openebs.svc.yaml
:
---
apiVersion: v1
kind: Service
metadata:
name: maya-apiserver-service
namespace: openebs
spec:
ports:
- name: api
port: 5656
protocol: TCP
targetPort: 5656
selector:
name: maya-apiserver
sessionAffinity: None
And a very basic Makefile
to tie it all together:
infra/kubernetes/cluster/storage/openebs/Makefile
:
.PHONY: install uninstall
KUBECTL := kubectl
install: namespace serviceaccount rbac configmap api-server provisioner snapshot-operator node-disk-manager svc
namespace:
$(KUBECTL) apply -f openebs.ns.yaml
serviceaccount:
$(KUBECTL) apply -f openebs.serviceaccount.yaml
configmap:
$(KUBECTL) apply -f openebs.configmap.yaml
rbac:
$(KUBECTL) apply -f openebs.rbac.yaml
svc:
$(KUBECTL) apply -f openebs.svc.yaml
api-server:
$(KUBECTL) apply -f openebs-api-server.deployment.yaml
provisioner:
$(KUBECTL) apply -f openebs-provisioner.deployment.yaml
snapshot-operator:
$(KUBECTL) apply -f openebs-snapshot-operator.deployment.yaml
node-disk-manager:
$(KUBECTL) apply -f openebs-disk-manager.ds.yaml
uninstall:
$(KUBECTL) delete -f openebs.svc.yaml
$(KUBECTL) delete -f openebs-disk-manager.ds.yaml
$(KUBECTL) delete -f openebs-snapshot-operator.deployment.yaml
$(KUBECTL) delete -f openebs-provisioner.deployment.yaml
$(KUBECTL) delete -f openebs-api-server.deployment.yaml
$(KUBECTL) delete -f openebs.configmap.yaml
$(KUBECTL) delete -f openebs.rbac.yaml
$(KUBECTL) delete -f openebs.serviceaccount.yaml
$(KUBECTL) delete -f openebs.ns.yaml
OK, now that it’s installed let’s check if everything looks good:
$ make
kubectl apply -f openebs.ns.yaml
namespace/openebs created
kubectl apply -f openebs.serviceaccount.yaml
serviceaccount/openebs-maya-operator created
kubectl apply -f openebs.rbac.yaml
clusterrole.rbac.authorization.k8s.io/openebs-maya-operator created
clusterrolebinding.rbac.authorization.k8s.io/openebs-maya-operator created
kubectl apply -f openebs.configmap.yaml
configmap/openebs-ndm-config created
kubectl apply -f openebs-api-server.deployment.yaml
deployment.apps/maya-apiserver created
kubectl apply -f openebs-provisioner.deployment.yaml
deployment.apps/openebs-provisioner created
kubectl apply -f openebs-snapshot-operator.deployment.yaml
deployment.apps/openebs-snapshot-operator created
kubectl apply -f openebs-disk-manager.ds.yaml
daemonset.extensions/openebs-ndm created
kubectl apply -f openebs.svc.yaml
service/maya-apiserver-service created
$ # ... wait some time ...
$ kubectl get all -n openebs
NAME READY STATUS RESTARTS AGE
pod/cstor-sparse-pool-o9mk-7b585d7b8d-bgc4q 2/2 Running 0 2m33s
pod/maya-apiserver-78c59c89c-5h674 1/1 Running 0 3m16s
pod/openebs-ndm-29d9g 1/1 Running 0 2m50s
pod/openebs-provisioner-77dd68645b-tv98t 1/1 Running 5 3m14s
pod/openebs-snapshot-operator-85dd4d7c94-hbbd8 2/2 Running 0 3m12s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/maya-apiserver-service ClusterIP 10.110.168.61 <none> 5656/TCP 34s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/openebs-ndm 1 1 1 1 1 <none> 2m51s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cstor-sparse-pool-o9mk 1/1 1 1 2m34s
deployment.apps/maya-apiserver 1/1 1 1 3m18s
deployment.apps/openebs-provisioner 1/1 1 1 3m15s
deployment.apps/openebs-snapshot-operator 1/1 1 1 3m13s
NAME DESIRED CURRENT READY AGE
replicaset.apps/cstor-sparse-pool-o9mk-7b585d7b8d 1 1 1 2m34s
replicaset.apps/maya-apiserver-78c59c89c 1 1 1 3m17s
replicaset.apps/openebs-provisioner-77dd68645b 1 1 1 3m15s
replicaset.apps/openebs-snapshot-operator-85dd4d7c94 1 1 1 3m13s
Well that certainly looks good to me – no errors, and the node management daemon set is running without issue. Let’s try and test it out.
Now that we have the system theoretically in a working state, let’s make a Pod with a PersistentVolumeClaim to validate. I want to note here that StatefulSets and PersistentVolumeClaims are separate concepts. I often see people mention them as if the only way to use a PersistentVolumeClaim is to have a StatefulSet – but this has more to do with how the other options work (i.e. a Deployment): it’s perfectly possible to have a single Deployment use a PVC, but you can’t have more than one replica, because the second instance/replica would try to mount the same PV (a minimal sketch is below). StatefulSets offer more, like consistent/distinct startup semantics and naming, and that’s what makes them well suited for less flexible stateful workloads.
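To make that concrete, here’s a minimal sketch of a single-replica Deployment using a PVC – the names are hypothetical and the claim is assumed to already exist:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pvc-demo
  namespace: default
spec:
  # must stay at 1 – a second replica would try to mount the same ReadWriteOnce volume
  replicas: 1
  selector:
    matchLabels:
      app: pvc-demo
  template:
    metadata:
      labels:
        app: pvc-demo
    spec:
      containers:
        - name: pvc-demo
          image: alpine
          command: ["ash", "-c", "while true; do sleep 60s; done"]
          volumeMounts:
            - mountPath: /var/data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: some-existing-claim # hypothetical, point this at a real PVC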
The Makefile is a little disingenuous because of how the operator works: a bunch of Custom Resource Definitions (CRDs) also got installed, as well as things like StorageClasses. Since we’ll need to pick a StorageClass to be able to make our PersistentVolumeClaim, let’s list them:
$ kubectl get sc
NAME PROVISIONER AGE
openebs-cstor-sparse openebs.io/provisioner-iscsi 124m
openebs-jiva-default openebs.io/provisioner-iscsi 125m
openebs-snapshot-promoter volumesnapshot.external-storage.k8s.io/snapshot-promoter 124m
Let’s use the openebs-jiva-default – now we can write our resource definitions for our PersistentVolumeClaim and Pod:
openebs-test.allinone.yaml
:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-test-data
namespace: default
labels:
app: pvc-test
spec:
storageClassName: openebs-jiva-default
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
name: pvc-test
namespace: default
labels:
app: pvc-test
spec:
containers:
- name: pvc-test
image: alpine
command: ["ash", "-c", "while true; do sleep 60s; done"]
imagePullPolicy: IfNotPresent
resources:
requests:
cpu: 0.25
memory: "256Mi"
limits:
cpu: 0.50
memory: "512Mi"
volumeMounts:
- mountPath: /var/data
name: data
volumes:
- name: data
persistentVolumeClaim:
claimName: pvc-test-data
Shortly after kubectl apply -f
ing that file:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84 2/2 Running 0 25s
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-2fr6d 0/1 Pending 0 25s
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-j7swl 1/1 Running 0 25s
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-sndh2 0/1 Pending 0 25s
pvc-test 0/1 ContainerCreating 0 12s
OK so here we see the CAS concept taking off – there are a bunch of Pods being started that manage the data being shuffled around – if you look closely you can see the -ctrl- and -rep- in the pod names. I assume that’s 3 data-holding replicas + 1 controller for the one PVC. I did nothing to tell OpenEBS that I only have one node, so it’s running in the usual HA pattern.
After waiting a bit for some of the Pending
containers to come out of pending and the pvc-test
pod to get created I realized there was something wrong. A quick kubectl describe pod pvc-test
reveals the problem:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m3s default-scheduler Successfully assigned default/pvc-test to ubuntu-1810-cosmic-64-minimal
Normal SuccessfulAttachVolume 3m3s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e"
Warning FailedMount 63s (x8 over 2m42s) kubelet, ubuntu-1810-cosmic-64-minimal MountVolume.WaitForAttach failed for volume "pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to sendtargets to portal 10.108.192.121:3260 output: iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: No portals found
, err exit status 21
Warning FailedMount 60s kubelet, ubuntu-1810-cosmic-64-minimal Unable to mount volumes for pod "pvc-test_default(6ead8128-10b4-11e9-9cf0-8c89a517d15e)": timeout expired waiting for volumes to attach or mount for pod "default"/ "pvc-test". list of unmounted volumes= [data]. list of unattached volumes= [data default-token-lsfvf]
Well, this is par for the course – things very rarely work the first time – so let’s get in and solve the issues. Before we go on though, let’s check what those other pods are doing:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m16s (x37 over 8m19s) default-scheduler 0/1 nodes are available: 1 node (s) didn't match pod affinity/anti-affinity, 1 node (s) didn't satisfy existing pods anti-affinity rules.
OK, so the pod couldn’t start because its anti-affinity requirements couldn’t be met. This actually isn’t a problem but is expected behavior – OpenEBS runs 3 replicas for fault tolerance, and I’m flying in the face of that since I’m only using one node. I love tools that I can predict/reason/guess about armed with only documentation knowledge – in this case it was just a guess, but this is a great sign. Rather than re-configure OpenEBS to make fewer replicas right now, I’m going to just ignore the 2 pending containers and focus on the connection issues.
Since we’re having connectivity issues, let’s make sure I don’t have any NetworkPolicy set (I use and love kube-router in my cluster) that’s preventing the communication:
$ kubectl get networkpolicy
No resources found.
OK, all’s clear on that front. Let’s figure out what is behind 10.108.192.121, the address my pod is trying to talk to:
$ kubectl get pods -o=wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84 2/2 Running 0 13m 10.244.0.137 ubuntu-1810-cosmic-64-minimal <none> <none>
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-2fr6d 0/1 Pending 0 13m <none> <none> <none> <none>
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-j7swl 1/1 Running 0 13m 10.244.0.136 ubuntu-1810-cosmic-64-minimal <none> <none>
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-rep-6ff7fd654d-sndh2 0/1 Pending 0 13m <none> <none> <none> <none>
pvc-test 0/1 ContainerCreating 0 12m <none> ubuntu-1810-cosmic-64-minimal <none> <none>
The wider output (-o=wide) lets us see where the running pods live (again, the Pending pods are OK, since we’re in a very not-HA situation) – and that the IP we’re trying to connect to isn’t any one of these pods. But if you stop and think about it, of course it isn’t one of these pods – Pod IPs can shift, and if you want a reliable pointer to another pod what you need is a Service. Let’s check the IPs of our services:
$ kubectl get svc -o=wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 26d <none>
pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-svc ClusterIP 10.108.192.121 <none> 3260/TCP,9501/TCP,9500/TCP 15m openebs.io/controller=jiva-controller,openebs.io/persistent-volume=pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e
BINGO! As you might expect with the CAS model, we have a Service exposing the hard drive interface that our Pod will use, to make it accessible. Now we need to figure out why our Pod can’t seem to talk to this service. Let’s dig deeper into the service and make sure it has Endpoints attached:
$ kubectl describe svc pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-svc
Name: pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-svc
Namespace: default
Labels: openebs.io/cas-template-name=jiva-volume-create-default-0.8.0
openebs.io/cas-type=jiva
openebs.io/controller-service=jiva-controller-svc
openebs.io/persistent-volume=pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e
openebs.io/persistent-volume-claim=pvc-test-data
openebs.io/storage-engine-type=jiva
openebs.io/version=0.8.0
pvc=pvc-test-data
Annotations: <none>
Selector: openebs.io/controller=jiva-controller,openebs.io/persistent-volume=pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e
Type: ClusterIP
IP: 10.108.192.121
Port: iscsi 3260/TCP
TargetPort: 3260/TCP
Endpoints: 10.244.0.137:3260
Port: api 9501/TCP
TargetPort: 9501/TCP
Endpoints: 10.244.0.137:9501
Port: exporter 9500/TCP
TargetPort: 9500/TCP
Endpoints: 10.244.0.137:9500
Session Affinity: None
Events: <none>
All of this looks fine and dandy to me – in particular, there are endpoints for the pods that did start up. Everything looks fine as far as Kubernetes concepts go, so let’s look back at the error message for some hints:
Warning FailedMount 63s (x8 over 2m42s) kubelet, ubuntu-1810-cosmic-64-minimal MountVolume.WaitForAttach failed for volume "pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to sendtargets to portal 10.108.192.121:3260 output: iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: cannot make connection to 10.108.192.121: Connection refused
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: No portals found
So it looks like the iscsi subsystem tried to connect to the Kubernetes service @ 10.108.192.121:3260 (which goes to the Endpoint for the -ctrl- pod at 10.244.0.137). Let’s see what’s happening in the Pod with that IP address – we see it’s Running, but how are things going?
$ kubectl logs pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84
Error from server (BadRequest): a container name must be specified for pod pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84, choose one of: [pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-con maya-volume-exporter]
OK, so I need to pick one of the internal containers, how about pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-con
:
$ kubectl logs pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84 -c pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-con
time="2019-01-05T06:37:57Z" level=info msg="REPLICATION_FACTOR: 3"
time="2019-01-05T06:37:57Z" level=info msg="Starting controller with frontendIP: , and clusterIP: 10.108.192.121"
time="2019-01-05T06:37:57Z" level=info msg="resetting controller"
time="2019-01-05T06:37:57Z" level=info msg="Listening on :9501"
time="2019-01-05T06:38:11Z" level=info msg="List Replicas"
time="2019-01-05T06:38:11Z" level=info msg="List Replicas"
time="2019-01-05T06:38:11Z" level=info msg="Register Replica for address 10.244.0.136"
time="2019-01-05T06:38:11Z" level=info msg="Register Replica, Address: 10.244.0.136 Uptime: 15.399307176s State: closed Type: Backend RevisionCount: 0"
time="2019-01-05T06:38:11Z" level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"
10.244.0.136 - - [05/Jan/2019:06:38:11 +0000] "POST /v1/register HTTP/1.1" 200 0
time="2019-01-05T06:38:16Z" level=info msg="Register Replica for address 10.244.0.136"
time="2019-01-05T06:38:16Z" level=info msg="Register Replica, Address: 10.244.0.136 Uptime: 20.396328606s State: closed Type: Backend RevisionCount: 0"
10.244.0.136 - - [05/Jan/2019:06:38:16 +0000] "POST /v1/register HTTP/1.1" 200 0
time="2019-01-05T06:38:16Z" level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"
time="2019-01-05T06:38:21Z" level=info msg="Register Replica for address 10.244.0.136"
<the last ~3 lines loop forever>
OK, this was actually the hypothesis I was starting to form in my head – in particular, the fact that I haven’t told OpenEBS how many replicas it’d actually be able to create (which is my fault, since I only have one node, and not 3) might be causing some issues. It’s only a warning, but the repeating nature suggests that registration never completes because of this mismatch. Since this isn’t quite a smoking gun, let’s check the other container’s logs:
$ kubectl logs pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-ctrl-76548fb456-dnw84 -c maya-volume-exporter
I0105 06:38:01.357964 1 command.go:97] Starting maya-exporter ...
I0105 06:38:01.358045 1 logs.go:43] Initialising maya-exporter for the jiva
I0105 06:38:01.358175 1 exporter.go:39] Starting http server....
Well, absolutely no visible problems there… so let’s go ahead and reduce the replication factor that OpenEBS is using and see if that fixes things. It took a little digging after re-reading the docs on deploying Jiva, but the StorageClass we’re using for the PersistentVolumeClaim is where we can make this change. Let’s make a new one based on the existing default:
$ kubectl get sc openebs-jiva-default -o=yaml > openebs-jiva-non-ha.storageclass.yaml
$ emacs -nw openebs-jiva-non-ha.storageclass.yaml
.... make edits ...
And here’s what I ended up with:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: openebs-jiva-non-ha
annotations:
cas.openebs.io/config: |
- name: ReplicaCount
value: "1"
- name: StoragePool
value: default
#- name: TargetResourceLimits
# value: |-
# memory: 1Gi
# cpu: 100m
#- name: AuxResourceLimits
# value: |-
# memory: 0.5Gi
# cpu: 50m
#- name: ReplicaResourceLimits
# value: |-
# memory: 2Gi
openebs.io/cas-type: jiva
provisioner: openebs.io/provisioner-iscsi
reclaimPolicy: Delete
volumeBindingMode: Immediate
Those limits definitely seem like a good idea but I’m ignoring them for now (the default is the same way). After kubectl applying this StorageClass and pointing our PVC at the new storageClassName, we can delete everything (kubectl delete -f openebs-test.allinone.yaml), update our Makefile, and re-make everything.
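For clarity, the PVC change is a one-liner in openebs-test.allinone.yaml – just point the claim at the class we created above:
spec:
  storageClassName: openebs-jiva-non-ha
After we do: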
$ kubectl apply -f openebs-test.allinone.yaml
persistentvolumeclaim/pvc-test-data created
pod/pvc-test created
... after waiting a few seconds ...
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e-ctrl-5b5d84cd8f-v5zcn 2/2 Running 0 37s
pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e-rep-84897dfc97-t59bb 1/1 Running 0 37s
pvc-test 0/1 ContainerCreating 0 37s
sjr-pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-tcj7-6d484 0/1 Completed 0 5m18s
Great, so we’ve got no pending -rep- pods, and one sjr-pvc pod that I’ve never seen before, but it seems that pod just gets left behind after cleanup happens. More important is making sure pvc-test makes it out of the ContainerCreating state; let’s inspect it:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m14s (x3 over 2m14s) default-scheduler pod has unbound immediate PersistentVolumeClaims
Normal Scheduled 2m14s default-scheduler Successfully assigned default/pvc-test to ubuntu-1810-cosmic-64-minimal
Normal SuccessfulAttachVolume 2m14s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e"
Warning FailedCreatePodSandBox 8s (x9 over 116s) kubelet, ubuntu-1810-cosmic-64-minimal Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:265: starting container process caused "process_linux.go:348: container init caused \"read init-p: connection reset by peer\"": unknown
Well, good news and bad news: the Pod was able to attach its volume, but it looks like containerd is having some issues… which may have nothing to do with OpenEBS. Let’s take a detour.
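First the basics – on a systemd-managed host the check is something like:
root@Ubuntu-1810-cosmic-64-minimal ~ # systemctl status containerd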
Checking containerd
’s systemd
status says it’s running fine, so let’s try and start a pod without a PVC:
Normal Scheduled 16s default-scheduler Successfully assigned default/no-pvc-test to ubuntu-1810-cosmic-64-minimal
Warning FailedCreatePodSandBox 3s (x2 over 15s) kubelet, ubuntu-1810-cosmic-64-minimal Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:265: starting container process caused "process_linux.go:348: container init caused \"read init-p: connection reset by peer\"": unknown
Alright, it looks like something is just wrong with containerd, which is good news because it means OpenEBS is ostensibly working, but bad news because it’s a bit of a chink in the armor – not being able to create new Pods is definitely not ideal if I were running in a more serious production environment. To avoid a full machine reboot, let’s take a look at the kubelet logs:
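(I’m on a systemd/kubeadm-style setup, so journalctl is the natural way to get at these – roughly:)
root@Ubuntu-1810-cosmic-64-minimal ~ # journalctl -u kubelet --since "10 minutes ago"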
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: E0105 08:28:09.535560 1586 kuberuntime_sandbox.go:65] CreatePodSandbox for pod "no-pvc-test_default(3f8d2a77-10bb-11e9-9cf0-8c89a517d15e)" failed: rpc error: code = Unknown desc = failed to start sandbox contai
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: E0105 08:28:09.535587 1586 kuberuntime_manager.go:662] createPodSandbox for pod "no-pvc-test_default(3f8d2a77-10bb-11e9-9cf0-8c89a517d15e)" failed: rpc error: code = Unknown desc = failed to start sandbox conta
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: E0105 08:28:09.535659 1586 pod_workers.go:190] Error syncing pod 3f8d2a77-10bb-11e9-9cf0-8c89a517d15e ("no-pvc-test_default(3f8d2a77-10bb-11e9-9cf0-8c89a517d15e)"), skipping: failed to "CreatePodSandbox" for "n
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: W0105 08:28:09.708588 1586 manager.go:1195] Failed to process watch event {EventType:0 Name:/kubepods/burstable/pod13bea637-10ba-11e9-9cf0-8c89a517d15e/83cce6ed723480f83227706c155fc6f6ead206c4587b64e4c5084416bb
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: W0105 08:28:09.709232 1586 container.go:409] Failed to create summary reader for "/kubepods/burstable/pod3f8d2a77-10bb-11e9-9cf0-8c89a517d15e/55d52719eb084edcbb77f64167d9de7cce6e25f54990364d5fe7e1c8819d437d": n
Jan 05 07:28:09 Ubuntu-1810-cosmic-64-minimal kubelet[1586]: E0105 08:28:09.753328 1586 dns.go:132] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 213.133.98.98 213.133.99.99 213.133.100.100
This is only partial but you can see things aren’t going well. Unfortunately both kubelet
and containerd
are not getting back into good states after restarting, so I’m just going to restart the box :(. I believe this has happened before and the quickest fix was to just restart everything, and this time I will absolutely not try and upgrade the entire system.
Well, I did all that only to realize that the issue was more nuanced – the resources I set in the pod specification were bad. With some binary-search-comment-and-uncomment, I realized my memory specification was wrong. Here’s the no-pvc-test Pod after the fix:
---
apiVersion: v1
kind: Pod
metadata:
name: no-pvc-test
namespace: default
labels:
app: no-pvc-test
spec:
containers:
- name: no-pvc-test
image: alpine
command: ["ash", "-c", "while true; do sleep 60s; done"]
imagePullPolicy: IfNotPresent
resources:
requests:
cpu: 0.25
memory: "512Mi"
limits:
cpu: 0.50
memory: "512Mi"
Whoops! Looks like a classic case of user error.
I went back and fixed the other resources and everything worked out just fine; all the pods are running (with the PVC):
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e-ctrl-5b5d84cd8f-v5zcn 2/2 Running 2 38m
pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e-rep-84897dfc97-t59bb 1/1 Running 1 38m
pvc-test 1/1 Running 0 114s
sjr-pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e-tcj7-6d484 0/1 Completed 0 43m
sjr-pvc-af0c599e-10b9-11e9-9cf0-8c89a517d15e-hdu8-vjf6v 0/1 Completed 0 39m
Let’s try and kubectl exec
our way in to try and write some data:
$ kubectl exec -it pvc-test ash
/ # ls /var/data
lost+found
/ # echo "HELLO WORLD" > /var/data/hello-world.txt
/ # ls /var/data
hello-world.txt lost+found
Now, let’s delete only the pod (careful: don’t delete the PVC, since we have the reclaimPolicy set to Delete, though it probably wouldn’t get reclaimed fast enough). After we delete the pod, we should be able to restart it and it will pick up the same volume:
$ kubectl delete pod pvc-test
pod "pvc-test" deleted
$ kubectl apply -f openebs-test.allinone.yaml
persistentvolumeclaim/pvc-test-data unchanged
pod/pvc-test created
$ kubectl exec -it pvc-test ash
/ # ls /var/data
hello-world.txt lost+found
/ # cat /var/data/hello-world.txt
HELLO WORLD
We did it! We’ve got awesome persistent volumes working with OpenEBS and have a great non HA (but could easily go HA) setup. We’re standing on the shoulders of many giants, and things definitely look pretty good from up here!
If we look at the on-disk representation, we can check out the files in /var/openebs
:
root@Ubuntu-1810-cosmic-64-minimal ~ # tree /var/openebs/
/var/openebs/
├── pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e
│ ├── revision.counter
│ ├── volume-head-000.img
│ ├── volume-head-000.img.meta
│ └── volume.meta
├── pvc-5e74411c-10b4-11e9-9cf0-8c89a517d15e
│ └── scrubbed.txt
├── pvc-af0c599e-10b9-11e9-9cf0-8c89a517d15e
│ └── scrubbed.txt
├── shared-cstor-sparse-pool
│ ├── cstor-sparse-pool.cache
│ ├── uzfs.sock
│ └── zrepl.lock
└── sparse
└── 0-ndm-sparse.img
5 directories, 10 files
Looks like OpenEBS is basically doing that “loop-based disk image maintenance” idea I had, or something similar (and I’m sure way more robustly) – this might just be the best solution I’ve come across so far for storage with Kubernetes. Let’s check out what some of these files are:
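(The output below comes from file(1) pointed at the volume’s backing directory – roughly something like this:)
root@Ubuntu-1810-cosmic-64-minimal ~ # file /var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e /var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/*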
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e: directory
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/volume-head-000.img: Linux rev 1.0 ext4 filesystem data, UUID=4820dfb0-7574-47b4-91b0-39a31580fbf2 (needs journal recovery) (extents) (64bit) (large files) (huge files)
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/volume-head-000.img.meta: ASCII text
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/volume.meta: ASCII text
/var/openebs/pvc-1367d8a1-10ba-11e9-9cf0-8c89a517d15e/revision.counter: ASCII text, with no line terminators
Pretty awesome, straightforward and predictable stuff!
Thus concludes our whirlwind tour through setting up OpenEBS. As you can see, the work was pretty light on our side – things just worked – and that’s thanks to a lot of hard work from the team behind OpenEBS and committers to the project (and all the other giants we’re standing on).
Going forward it looks like I’m going to be using OpenEBS over Rook for my bare metal clusters (on Hetzner at least) – it was/is a blast to try and keep up with this area and see how it evolves over time.