Can the OpenEBS iSCSI target run on the same node as the Pod attached to the PV? - openebs

Is there any way to force the OpenEBS iSCSI target to run on the same node as the Pod attached to the PV?
Consider the following scenario:
NODE A: Pod + mounted PV + OpenEBS replica
NODE B: OpenEBS replica + iSCSI target
Write traffic goes from node A to node B (the iSCSI target), node B writes to its local replica, and the data is then sent back to node A to be written to its replica as well. Reads likewise always generate traffic from node A to node B, even though node A holds a full replica that could serve them locally.

This can be achieved using the Target Affinity Policy, which co-locates the volume target pod on the same node as the workload by means of a shared label:
labels:
  openebs.io/target-affinity: <application-unique-label>
You can specify the target affinity in both the application and the OpenEBS PVC as follows.
For the application Pod, it will be similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: fio-cstor
  labels:
    name: fio-cstor
    openebs.io/target-affinity: fio-cstor
For the OpenEBS PVC, it will be similar to the following:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: fio-cstor-claim
  labels:
    openebs.io/target-affinity: fio-cstor
Note: This feature works only where there is a 1:1 mapping between an application and a PVC. It is not recommended for StatefulSets, where the PVC is specified as a template.

Related

After resizing PV/PVC in AKS using standard LRS storage class, Artifactory > monitoring > storage section still shows old storage space. How to fix?

Issue:
I need to increase the filestore size after resizing my PV, but the filestore size is not changing. Even if I set the PV/PVC to 800Gi or 300Gi, it stays stuck at 205.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: artifactory-pv-claim
  namespace: test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 250Gi
  volumeMode: Filesystem
  storageClassName: "custom-artifactory"
To resize the PVC, edit it to change the storage request and ask for more space. For the update to take effect, the following is required:
File system expansion must be triggered by terminating the pod using the volume. More specifically:
Edit the PVC to request more space. Once the underlying volume has been expanded by the storage provider, the PersistentVolume object will reflect the updated size, and the PVC will have the FileSystemResizePending condition.
(Screenshots from my test, omitted here, show the reported capacity unchanged after editing the PVC and updated only once the pod was recreated.)

AKS PersistentVolume Affinity?

Disclaimer: This question is very specific about the platforms used and the use case we are trying to solve with them. It also compares two approaches we currently use, at least at a development stage, and are trying to evaluate but perhaps don't fully understand yet. I am asking for guidance on this very specific topic.
A) We are running a Kafka cluster as Kafka tasks on DC/OS, where data persistence is maintained via local disk storage provisioned on the very same host as the corresponding Kafka broker instance.
B) We are trying to run Kafka on Kubernetes (via Strimzi Operator), specifically Azure Kubernetes Service (AKS) and are struggling to get reliable Data Persistence using the StorageClasses you get in AKS. We tried three possibilities:
(Default) Azure Disk
Azure File
emptyDir
I see two major issues with Azure Disk: while we are able to set the Kafka pod affinity such that pods do not end up in the same maintenance zone / on the same host, we have no instrument to bind the corresponding PersistentVolume anywhere near the pod. There is nothing like node affinity for Azure Disks. It is also fairly common that an Azure Disk ends up on another host than its corresponding pod, which might then be limited by network bandwidth.
With Azure File we don't have issues with maintenance zones going down temporarily, but as a high-latency storage option it doesn't seem to be a good fit, and Kafka also has trouble deleting/updating files on retention.
So I ended up using ephemeral storage, which is commonly NOT recommended but doesn't come with the problems above. The volume "lives" near the pod and is available to it as long as the pod itself runs on any node. In the maintenance case, pod AND volume die together. As long as I am able to maintain a quorum, I don't see where this might cause issues.
Is there anything like podAffinity for PersistentVolumes as Azure-Disk is per definition Node bound?
What are the major downsides in using emptyDir for persistence in a Kafka Cluster on Kubernetes?
Is there anything like podAffinity for PersistentVolumes as Azure-Disk
is per definition Node bound?
As far as I know, there is nothing like pod affinity for PersistentVolumes backed by Azure Disk. An Azure Disk must be attached to a node, so if the pod moves to another node, it can't use the volume on that disk until the disk is re-attached. Only Azure File shares can follow the pod across nodes.
What are the major downsides in using emptyDir for persistence in a
Kafka Cluster on Kubernetes?
You can take a look at the emptyDir documentation; a typical use is scratch space, such as for a disk-based merge sort.
The main thing you need to watch out for on AKS is disk space: you need to calculate the required space, and perhaps attach multiple Azure Disks to the nodes.
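One guard against a broker filling the node disk is to put a sizeLimit on the emptyDir volume, which evicts the pod when the limit is exceeded. A minimal sketch (names and the image are illustrative, not from the question):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kafka-scratch-demo    # illustrative name
spec:
  containers:
    - name: broker
      image: nginx            # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/lib/kafka
  volumes:
    - name: data
      emptyDir:
        sizeLimit: 10Gi       # pod is evicted if usage exceeds this limit
```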
Starting off - I'm not sure what you mean about an Azure Disk ending up on a node other than where the pod is assigned - that shouldn't be possible, per my understanding (for completeness, you can do this on a VM with the shared disks feature outside of AKS, but as far as I'm aware that's not supported in AKS for dynamic disks at the time of writing). If you're looking at the volume.kubernetes.io/selected-node annotation on the PVC, I don't believe that's updated after initial creation.
You can achieve the configuration you're looking for by using a StatefulSet with anti-affinity. Consider a StatefulSet that creates three pods, which must be in different availability zones. I'm deploying this to an AKS cluster with a node pool (nodepool2) with two nodes per AZ:
❯ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{","}{.metadata.labels.topology\.kubernetes\.io\/zone}{"\n"}{end}'
aks-nodepool1-25997496-vmss000000,0
aks-nodepool2-25997496-vmss000000,westus2-1
aks-nodepool2-25997496-vmss000001,westus2-2
aks-nodepool2-25997496-vmss000002,westus2-3
aks-nodepool2-25997496-vmss000003,westus2-1
aks-nodepool2-25997496-vmss000004,westus2-2
aks-nodepool2-25997496-vmss000005,westus2-3
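The StatefulSet itself is not reproduced in full here, but a minimal sketch with the relevant zone anti-affinity and a volume claim template consistent with the PVC names and sizes shown below might look like this (container image and service name are assumptions):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: echo
spec:
  serviceName: echo
  replicas: 3
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: echo
              topologyKey: topology.kubernetes.io/zone  # at most one pod per AZ
      containers:
        - name: echo
          image: k8s.gcr.io/echoserver:1.4    # placeholder image
  volumeClaimTemplates:
    - metadata:
        name: demo            # yields PVCs demo-echo-0, demo-echo-1, ...
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: managed-premium
        resources:
          requests:
            storage: 1Gi
```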
Once the statefulset is deployed and spun up, you can see each pod was assigned to one of the nodepool2 nodes:
❯ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
echo-0 1/1 Running 0 3m42s 10.48.36.102 aks-nodepool2-25997496-vmss000001 <none> <none>
echo-1 1/1 Running 0 3m19s 10.48.36.135 aks-nodepool2-25997496-vmss000002 <none> <none>
echo-2 1/1 Running 0 2m55s 10.48.36.72 aks-nodepool2-25997496-vmss000000 <none> <none>
Each pod created a PVC based on the template:
❯ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
demo-echo-0 Bound pvc-bf6104e0-c05e-43d4-9ec5-fae425998f9d 1Gi RWO managed-premium 25m
demo-echo-1 Bound pvc-9d9fbd5f-617a-4582-abc3-ca34b1b178e4 1Gi RWO managed-premium 25m
demo-echo-2 Bound pvc-d914a745-688f-493b-9b82-21598d4335ca 1Gi RWO managed-premium 24m
Let's take a look at one of the PVs that was created:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk
    volumehelper.VolumeDynamicallyCreatedByKey: azure-disk-dynamic-provisioner
  creationTimestamp: "2021-04-05T14:08:12Z"
  finalizers:
    - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: westus2
    failure-domain.beta.kubernetes.io/zone: westus2-3
  name: pvc-9d9fbd5f-617a-4582-abc3-ca34b1b178e4
  resourceVersion: "19275047"
  uid: 945ad69a-92cc-4d8d-96f4-bdf0b80f9965
spec:
  accessModes:
    - ReadWriteOnce
  azureDisk:
    cachingMode: ReadOnly
    diskName: kubernetes-dynamic-pvc-9d9fbd5f-617a-4582-abc3-ca34b1b178e4
    diskURI: /subscriptions/02a062c5-366a-4984-9788-d9241055dda2/resourceGroups/rg-sandbox-aks-mc-sandbox0-westus2/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-9d9fbd5f-617a-4582-abc3-ca34b1b178e4
    fsType: ""
    kind: Managed
    readOnly: false
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: demo-echo-1
    namespace: zonetest
    resourceVersion: "19275017"
    uid: 9d9fbd5f-617a-4582-abc3-ca34b1b178e4
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/region
              operator: In
              values:
                - westus2
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
                - westus2-3
  persistentVolumeReclaimPolicy: Delete
  storageClassName: managed-premium
  volumeMode: Filesystem
status:
  phase: Bound
As you can see, that PV has a required nodeAffinity for nodes in failure-domain.beta.kubernetes.io/zone with value westus2-3. This ensures that the pod that owns that PV will only ever get placed on a node in westus2-3, and that PV will be bound to the node the disk is running on when the pod is started.
At this point, I deleted all the pods to get them on the other nodes:
❯ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
echo-0 1/1 Running 0 4m4s 10.48.36.168 aks-nodepool2-25997496-vmss000004 <none> <none>
echo-1 1/1 Running 0 3m30s 10.48.36.202 aks-nodepool2-25997496-vmss000005 <none> <none>
echo-2 1/1 Running 0 2m56s 10.48.36.42 aks-nodepool2-25997496-vmss000003 <none> <none>
There's no way to see it via Kubernetes, but you can see via the Azure portal that managed disk kubernetes-dynamic-pvc-bf6104e0-c05e-43d4-9ec5-fae425998f9d, which backs PV pvc-bf6104e0-c05e-43d4-9ec5-fae425998f9d, which backs PVC zonetest/demo-echo-0, is listed as Managed by: aks-nodepool2-25997496-vmss_4, so it has been detached and re-attached to the node where the pod is now running.
(Portal screenshot showing the disk attached to node 4.)
If I were to remove nodes such that I didn't have nodes in AZ 3, I wouldn't be able to start pod echo-1, since it's bound to a disk in AZ 3, which can't be attached to a node not in AZ 3.

Is it possible to mount a shared Azure disk in Azure Kubernetes to multiple PODs/Nodes?

I want to mount an Azure shared disk into multiple deployments/nodes, based on this:
https://learn.microsoft.com/en-us/azure/virtual-machines/disks-shared
So, I created a shared disk in Azure Portal and when trying to mount it to deployments in Kubernetes I got an error:
"Multi-Attach error for volume "azuredisk" Volume is already used by pod(s)..."
Is it possible to use Shared Disk in Kubernetes? If so how?
Thanks for tips.
Yes, you can, and the capability is GA.
An Azure Shared Disk can be mounted as ReadWriteMany, which means you can mount it to multiple nodes and pods. It requires the Azure Disk CSI driver, and the caveat is that currently only Raw Block volumes are supported, thus the application is responsible for managing the control of writes, reads, locks, caches, mounts, and fencing on the shared disk, which is exposed as a raw block device. This means that you mount the raw block device (disk) to a pod container as a volumeDevice rather than a volumeMount.
The documentation examples mostly point to how to create a StorageClass to dynamically provision a shared Azure Disk, but I have also created one statically and mounted it to multiple pods on different nodes.
Dynamically Provision Shared Azure Disk
Create Storage Class and PVC
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi
provisioner: disk.csi.azure.com
parameters:
  skuname: Premium_LRS  # currently shared disks are only available with premium SSD
  maxShares: "2"
  cachingMode: None     # ReadOnly cache is not available for premium SSD with maxShares > 1
reclaimPolicy: Delete
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-azuredisk
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 256Gi  # minimum size of a shared disk is 256 GiB (P15)
  volumeMode: Block
  storageClassName: managed-csi
Create a deployment with 2 replicas, specifying volumeDevices and devicePath in the spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: deployment-azuredisk
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      name: deployment-azuredisk
    spec:
      containers:
        - name: deployment-azuredisk
          image: mcr.microsoft.com/oss/nginx/nginx:1.17.3-alpine
          volumeDevices:
            - name: azuredisk
              devicePath: /dev/sdx
      volumes:
        - name: azuredisk
          persistentVolumeClaim:
            claimName: pvc-azuredisk
Use a Statically Provisioned Azure Shared Disk
Use an Azure Shared Disk that has been provisioned through ARM, the Azure Portal, or the Azure CLI.
Define a PersistentVolume (PV) that references the DiskURI and DiskName:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: azuredisk-shared-block
spec:
  capacity:
    storage: "256Gi"  # 256 is the minimum size allowed for a shared disk
  volumeMode: Block   # PV and PVC volumeMode must be 'Block'
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskURI: /subscriptions/<subscription>/resourcegroups/<group>/providers/Microsoft.Compute/disks/<disk-name>
    diskName: <disk-name>
    cachingMode: None  # caching mode must be 'None'
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-azuredisk-managed
spec:
  resources:
    requests:
      storage: 256Gi
  volumeMode: Block
  accessModes:
    - ReadWriteMany
  volumeName: azuredisk-shared-block  # the name of the PV above
Mounting this PVC is the same for both dynamically and statically provisioned shared disks. Reference the deployment above.
Note
Only raw block devices (volumeMode: Block) are supported by the shared disk feature. The Kubernetes application must manage coordination and control of writes, reads, locks, caches, mounts, and fencing on the shared disk, which is exposed as a raw block device. Multi-node read-write is not supported by common file systems (e.g. ext4, xfs); it is only supported by cluster file systems.
details: https://github.com/kubernetes-sigs/azuredisk-csi-driver/tree/master/deploy/example/sharedisk

Where is data stored in OpenEBS?

Setup: OpenEBS version 0.6
I would like to know where OpenEBS stores data in Jiva.
Where exactly does it store the data? Does it store it locally on a Kubernetes node's mount path, or on a disk attached to the pool?
It depends on the storage pool referred to in the StorageClass.
If the pool is "default" and the storage engine chosen is Jiva, then the storage space for volume replica data is carved out of the container image space of the replica pod itself ("/mnt/openebs_disk").
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-standard
  annotations:
    cas.openebs.io/config: |
      - name: ControllerImage
        value: openebs/jiva:0.7.0-RC2
      - name: ReplicaImage
        value: openebs/jiva:0.7.0-RC2
      - name: VolumeMonitorImage
        value: openebs/m-exporter:0.7.0-RC2
      - name: ReplicaCount
        value: "1"
      - name: StoragePool
        value: default
      - name: FSType
        value: "xfs"
If the storage pool referred to in the StorageClass is NOT "default", then the data is stored on an external disk or group of disks mounted at a given path inside the replica pod.
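For that non-default case, the pool itself is declared as a StoragePool custom resource pointing at a host path backed by the external disk(s), and referenced from the StorageClass via the StoragePool config key. A sketch, assuming the openebs.io/v1alpha1 API of that era (the name and path are illustrative):

```yaml
apiVersion: openebs.io/v1alpha1
kind: StoragePool
metadata:
  name: jiva-pool          # referenced from the StorageClass as StoragePool value
type: hostdir              # pool backed by a directory on the host
spec:
  path: /mnt/disks/ssd0    # mount point of the external disk on the node
```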

How does PersistentVolume work with hostPath?

I have deployed GitLab to my Azure Kubernetes cluster with persistent storage defined the following way:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: gitlab-data
  namespace: gitlab
spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/tmp/gitlab-data"
That worked fine for some days. Suddenly all my data stored in GitLab is gone, and I don't know why. I was assuming that a hostPath-defined PersistentVolume is really persistent, because it is saved on a node and somehow replicated to all existing nodes. But my data is now lost and I cannot figure out why. I looked up the uptime of each node, and there was no restart. I logged in to the nodes and checked the path, and as far as I can see the data is gone.
So how do PersistentVolume mounts work in Kubernetes? Is the data really persisted on the nodes? How do multiple nodes share the data if a deployment is split across multiple nodes? Is hostPath reliable persistent storage?
hostPath doesn't share or replicate data between nodes, and once your pod starts on another node, the data will be lost. You should consider using some external shared storage instead.
Here's the related quote from the official docs:
HostPath (single node testing only – local storage is not supported in any way and WILL NOT WORK in a multi-node cluster)
from kubernetes.io/docs/user-guide/persistent-volumes/
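As one example of such external shared storage on AKS, a PVC against the built-in azurefile StorageClass gives a ReadWriteMany volume that survives pod rescheduling. A sketch (the claim name is illustrative, and the class name depends on what the cluster provides):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-data-shared   # illustrative name
  namespace: gitlab
spec:
  accessModes:
    - ReadWriteMany          # backed by an SMB share, usable from multiple nodes
  storageClassName: azurefile
  resources:
    requests:
      storage: 8Gi
```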