Kubernetes AKS Persistent Volume Disk Claims To Multiple Nodes - azure

How can I attach 100GB Persistent Volume Disk to Each Node in the AKS Kubernetes Cluster?
We are using Kubernetes on Azure using AKS.
We have a scenario where we need to attach Persistent Volumes to each Node in our AKS Cluster. We run 1 Docker Container on each Node in the Cluster.
The reason to attach volumes Dynamically is to increase the IOPS available and available amount of Storage that each Docker container needs to do its job.
The program running inside of each Docker container works against very large input data files (10GB) and writes out even larger output files(50GB).
We could mount Azure File Shares, but Azure FileShares is limited to 60MB/ps which is too slow for us to move around this much raw data. Once the program running in the Docker image has completed, it will move the output file (50GB) to Blob Storage. The total of all output files may exceed 1TB from all the containers.
I was thinking that if we can attach a Persistent Volume to each Node we can increase our available disk space as well as the IOPS without having to go to a high vCPU/RAM VM configuration (ie. DS14_v2). Our program is more I/O intensive vs CPU.
All the Docker images running in the Pod are exactly the same where they read a message from a Queue that tells it a specific input file to work against.
I've followed the docs to create a StorageClass, Persistent Volume Claims and Persistent Volume and run this against 1 POD. https://learn.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv
However, when I create a Deployment and Scale the number of Pods from 1 to 2 I receive the error (in production we'd scale to as many nodes as necessary ~100)
Multi-Attach error for volume
"pvc-784496e4-869d-11e8-8984-0a58ac1f1e06" Volume is already used by
pod(s) pv-deployment-67fd8b7b95-fjn2n
I realize that an Azure Disk can only be attached to a SingleNode (ReadWriteOnce) however I'm not sure how to create multiple disks and attach them to each Node at the time we load up the Kubernetes Cluster and begin our work.
Persistent Volume Claim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: azure-managed-disk
spec:
accessModes:
- ReadWriteOnce
storageClassName: managed-premium
resources:
requests:
storage: 100Gi
This is my Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: pv-deployment
labels:
app: nginx
spec:
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: myfrontend
image: nginx
volumeMounts:
- name: volume
mountPath: /mnt/azure
resources:
limits:
cpu: ".7"
memory: "2.5G"
requests:
cpu: ".7"
memory: "2.5G"
volumes:
- name: volume
persistentVolumeClaim:
claimName: azure-managed-disk
If I knew that I was going to scale to 100 Nodes, would I have to create a .yaml files with 100 Deployments and be explicit for each Deployment to use a specific Volume Claim?
For example in my volume claim I'd have azure-claim-01, azure-claim-02, etc. and in each Deployment I would have to make claim to each named Volume Claim
volumes:
- name: volume
persistentVolumeClaim:
claimName: azure-claim-01
I can't quite get my head around how I can do all this dynamically?
Can you recommend a better way to achieve the desired result?

You should use the StatefulSetand volumeClaimTemplates configuration like following:
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
spec:
type: LoadBalancer
ports:
- port: 80
targetPort: 80
selector:
app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
serviceName: "nginx"
replicas: 4
updateStrategy:
type: RollingUpdate
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: k8s.gcr.io/nginx-slim:0.8
ports:
- containerPort: 80
volumeMounts:
- name: persistent-storage
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: persistent-storage
annotations:
volume.beta.kubernetes.io/storage-class: hdd
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 2Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: hdd
provisioner: kubernetes.io/azure-disk
parameters:
skuname: Standard_LRS
kind: managed
cachingMode: ReadOnly
You will get Persistent Volume for every Node:
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON
AGE
pvc-0e651011-7647-11e9-bbf5-c6ab19063099 2Gi RWO Delete Bound default/persistent-storage-web-0 hdd
51m
pvc-17181607-7648-11e9-bbf5-c6ab19063099 2Gi RWO Delete Bound default/persistent-storage-web-1 hdd
49m
pvc-4d488893-7648-11e9-bbf5-c6ab19063099 2Gi RWO Delete Bound default/persistent-storage-web-2 hdd
48m
pvc-6aff2a4d-7648-11e9-bbf5-c6ab19063099 2Gi RWO Delete Bound default/persistent-storage-web-3 hdd
47m
And every Node will create dedicated Persistent Volume Claim:
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistent-storage-web-0 Bound pvc-0e651011-7647-11e9-bbf5-c6ab19063099 2Gi RWO hdd 55m
persistent-storage-web-1 Bound pvc-17181607-7648-11e9-bbf5-c6ab19063099 2Gi RWO hdd 48m
persistent-storage-web-2 Bound pvc-4d488893-7648-11e9-bbf5-c6ab19063099 2Gi RWO hdd 46m
persistent-storage-web-3 Bound pvc-6aff2a4d-7648-11e9-bbf5-c6ab19063099 2Gi RWO hdd 45m

I would consider using DaemonSet. This would allow your pods to only run on each node, hence ReadWriteOnce will take effect. The constraint will be, you cannot scale your application more than the number of nodes you have.

Related

Memory Sev3 alerts in Azure kubernetes cluster with default resource allocations for 2 pods - mySQL db and Spring app

I deployed few days ago 2 services into Azure Kubernetes cluster. I set up cluster with 1 node, virtual machine parameters: B2s: 2 Cores, 4 GB RAM, 8 GB Temporary storage. Then I placed 2 pods on the same node:
MySQL database with 4Gib storage persistent volume, 5 tables at the moment
Spring boot java application
There is no replicas.
Take a look on kubectl output regarding the deployed pods:
The purpose is to create internal application in company where I work which will be used by company team. There won't be a lot of data in DB.
When we started to test connection with API from front-end I received memory alert like below:
mySQL deployment yaml file looks like:
apiVersion: v1
kind: Service
metadata:
name: mysql-db-testing-service
namespace: testing
spec:
type: LoadBalancer
ports:
- port: 3307
targetPort: 3306
selector:
app: mysql-db-testing
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: mysql-db-testing
namespace: testing
spec:
selector:
matchLabels:
app: mysql-db-testing
replicas: 1
strategy:
type: Recreate
template:
metadata:
labels:
app: mysql-db-testing
spec:
containers:
- name: mysql-db-container-testing
image: mysql:8.0.31
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysqldb-secret-testing
key: password
ports:
- containerPort: 3306
name: mysql-port
volumeMounts:
- mountPath: "/var/lib/mysql"
name: mysql-persistent-storage
volumes:
- name: mysql-persistent-storage
persistentVolumeClaim:
claimName: azure-managed-disk-pvc-mysql-testing
nodeSelector:
env: preprod
Spring app deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: spring-app-api-testing
namespace: testing
labels:
app: spring-app-api-testing
spec:
replicas: 1
selector:
matchLabels:
app: spring-app-api-testing
template:
metadata:
labels:
app: spring-app-api-testing
spec:
containers:
- name: spring-app-api-testing
image: techradaracr.azurecr.io/technology-radar-be:$(Build.BuildId)
env:
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
name: mysqldb-secret-testing
key: password
- name: MYSQL_PORT
valueFrom:
configMapKeyRef:
name: spring-app-testing-config-map
key: mysql_port
- name: MYSQL_HOST
valueFrom:
configMapKeyRef:
name: spring-app-testing-config-map
key: mysql_host
nodeSelector:
env: preprod
---
apiVersion: v1
kind: Service
metadata:
labels:
app: spring-app-api-testing
k8s-app: spring-app-api-testing
name: spring-app-api-testing-service
namespace: testing
spec:
ports:
- name: http
port: 80
protocol: TCP
targetPort: 8080
type: LoadBalancer
selector:
app: spring-app-api-testing
First I deployed MySQl database, then java Spring API.
I guess the problem is with default resource allocation and MySQL db is using 90 % of overall RAM memory. That's why I'm receiving memory alert.
I know that there are sections for resources allocation in yaml config:
resources:
requests:
cpu: 250m
memory: 64Mi
limits:
cpu: 500m
memory: 256Mi
for minimum and maximum cpu and memory resources. Question is how many of them should I allocate for spring app and how many for mySQL database in order to avoid memory problems?
I would be grateful for help.
First of all, running the whole cluster on only one VM defeats the purpose of using Kubernetes, specially that you are using a small SKU for the VMSS. Have you considered running the application outside of k8s ?
To answer your question, there is no given formula or set values for the request/limits. The values you choose for resource requests and limits will depend on the specific requirements of your application and the resources available in your cluster.
In detail, you should consider the workload characteristics, cluster capacity, performance (if the values are too small, the application will struggle) and cost.
Please refer to the best practices here: https://learn.microsoft.com/en-us/azure/aks/developer-best-practices-resource-management

Mount a shared Azure disk in Azure Kubernetes to multiple windows PODs

I want to attach a shared disk to multiple windows containers on AKS.
From post learned that it can be done for Linux containers.
I am trying to do the same with windows container but it's failing to mount a shared disk, with below error
MapVolume.MapPodDevice failed for volume "pvc-6e07bdca-2126-4a5b-806a-026016c3798d" : rpc error: code = Internal desc = Could not mount "2" at "\var\lib\kubelet\plugins\kubernetes.io\csi\volumeDevices\publish\pvc-6e07bdca-2126-4a5b-806a-026016c3798d\4e44da87-ea33-4d85-a7db-076db0883bcf": rpc error: code = Unknown desc = not an absolute Windows path: 2
Error occured
Used below to dynamically provision Shared Azure Disk
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: managed-csi-custom
provisioner: disk.csi.azure.com
parameters:
skuname: Premium_LRS
maxShares: "2"
cachingMode: None
reclaimPolicy: Delete
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc-azuredisk-dynamic
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 4Gi
volumeMode: Block
storageClassName: managed-csi-custom
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: test-shared-disk
name: deployment-azuredisk
spec:
replicas: 2
selector:
matchLabels:
app: test-shared-disk
template:
metadata:
labels:
app: test-shared-disk
name: deployment-azuredisk
spec:
nodeSelector:
role: windowsgeneral
containers:
- name: deployment-azuredisk
image: mcr.microsoft.com/dotnet/framework/runtime:4.8-windowsservercore-ltsc2019
volumeDevices:
- name: azuredisk
devicePath: "D:\test"
volumes:
- name: azuredisk
persistentVolumeClaim:
claimName: pvc-azuredisk-dynamic
Is it possible to mount shared disk for windows container on AKS? Thanks for help.
Azure shared disks is an Azure managed disks feature that enables attaching an Azure disk to agent nodes simultaneously. But it doesn't apply for window node pool only
To overcome this issue or mounting Azure Disk CSI driver to window node you need to provisoned or create the window node pool first.
Please refer this MS tutorial to add a Windows node pool.
After you have a Windows node pool, you can now use the same built-in storage classes managed-csi to mount the DISK.
For more information and Validating Volume Mapping you can refer this MS Document

Create a new volume when pod restart in a statefulset

I'm using azure aks to create a statefulset with volume using azure disk provisioner.
I'm trying to find a way to write my statefulset YAML file in a way that when a pod restarts, it will get a new Volume and the old volume will be deleted.
I know I can delete volumes manually, but is there any ways to tell Kubernetes to do this via statefulset yaml?
Here is my Yaml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: janusgraph
labels:
app: janusgraph
spec:
...
...
template:
metadata:
labels:
app: janusgraph
spec:
containers:
- name: janusgraph
...
...
volumeMounts:
- name: data
mountPath: /var/lib/janusgraph
livenessProbe:
httpGet:
port: 8182
path: ?gremlin=g.V(123).count()
initialDelaySeconds: 120
periodSeconds: 10
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: "default"
resources:
requests:
storage: 7Gi
If you want your data to be deleted when the pod restarts, you can use an ephemeral volume like EmptyDir.
When a Pod is removed/restarted for any reason, the data in the emptyDir is deleted forever.
Sample:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx # has to match .spec.template.metadata.labels
serviceName: "nginx"
replicas: 3 # by default is 1
template:
metadata:
labels:
app: nginx # has to match .spec.selector.matchLabels
spec:
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: k8s.gcr.io/nginx-slim:0.8
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumes:
- name: www
emptyDir: {}
N.B.:
By default, emptyDir volumes are stored on whatever medium is backing the node - that might be disk or SSD or network storage, depending on your environment. However, you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead.

Azure csi disk FailedAttachVolume issue : could not get disk name from disk URL

I am using azure csi disk driver method for implementing K8 persistent volume. I have installed azure-csi-drivers in my K8 cluster and using below mentioned files as end-to-end testing purpose but my deployment is getting failed due to following error :
Warning FailedAttachVolume 23s (x7 over 55s) attachdetach-controller
AttachVolume.Attach failed for volume "pv-azuredisk-csi" : rpc error:
code = NotFound desc = Volume not found, failed with error: could not
get disk name from
/subscriptions/464f9a13-7g6o-730g-hqi4-6ld2802re6z1/resourcegroups/560d_RTT_HOT_ENV_RG/providers/Microsoft.Compute/disks/560d-RTT-PVDisk,
correct format:
./subscriptions/(?:.)/resourceGroups/(?:.*)/providers/Microsoft.Compute/disks/(.+)
Note: I have checked multiple times, my URL is correct but I am not sure if underscore in resource group name is creating any problem, RG = "560d_RTT_HOT_ENV_RG". Please suggest if anyone have any idea what is going wrong?
K8 version : 14.9
CSI drivers : v0.3.0
My YAML files are :
csi-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-azuredisk-csi
namespace: azure-static-diskpv-csi-fss
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
csi:
driver: disk.csi.azure.com
readOnly: false
volumeHandle: /subscriptions/464f9a13-7g6o-730g-hqi4-6ld2802re6z1/resourcegroups/560d_RTT_HOT_ENV_RG/providers/Microsoft.Compute/disks/560d-RTT-PVDisk
volumeAttributes:
cachingMode: ReadOnly
fsType: ext4
-------------------------------------------------------------------------------------------------
csi-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc-azuredisk-csi
namespace: azure-static-diskpv-csi-fss
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
volumeName: pv-azuredisk-csi
storageClassName: ""
nginx-csi-pod.yaml
kind: Pod
apiVersion: v1
metadata:
name: nginx-azuredisk-csi
namespace: azure-static-diskpv-csi-fss
spec:
nodeSelector:
beta.kubernetes.io/os: linux
containers:
image: nginx
name: nginx-azuredisk-csi
command:
"/bin/sh"
"-c"
while true; do echo $(date) >> /mnt/azuredisk/outfile; sleep 1; done
volumeMounts:
name: azuredisk01
mountPath: "/mnt/azuredisk"
volumes:
name: azuredisk01
persistentVolumeClaim:
claimName: pvc-azuredisk-csi
It seems you create the disk in another resource group, not the AKS nodes group. So you must grant the Azure Kubernetes Service (AKS) service principal for your cluster the Contributor role to the disk's resource group firstly. For more details, see Create an Azure disk.
Update:
Finally, I found out the reason why it cannot find the volume. I think it's a stupid definition. It's case sensitive about the resource Id of the disk which you used for the persist volume. So you need to change your csi-pv.yaml file like below:
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-azuredisk-csi
namespace: azure-static-diskpv-csi-fss
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
csi:
driver: disk.csi.azure.com
readOnly: false
volumeHandle: /subscriptions/464f9a13-7g6o-730g-hqi4-6ld2802re6z1/resourcegroups/560d_rtt_hot_env_rg/providers/Microsoft.Compute/disks/560d-RTT-PVDisk
volumeAttributes:
cachingMode: ReadOnly
fsType: ext4
In addition, the first paragraph of the answer is also important.
Update:
Here are the screenshots of the result that the static disk for the CSI driver works on my side:

Kubernetes Unable to mount volumes for pod with timeout

I am trying to mount an NFS volume to my pods but with no success.
I have a server running the nfs mount point, when I try to connect to it from some other running server
sudo mount -t nfs -o proto=tcp,port=2049 10.0.0.4:/export /mnt works fine
Another thing worth mentioning is when I remove the volume from the deployment and the pod is running. I log into it and i can telnet to 10.0.0.4 with ports 111 and 2049 successfully. so there really doesnt seem to be any communication problems
as well as:
showmount -e 10.0.0.4
Export list for 10.0.0.4:
/export/drive 10.0.0.0/16
/export 10.0.0.0/16
So I can assume that there is no network or configuration problems between the server and the client (I am using Amazon and the server that i tested on is in the same security group as the k8s minions)
P.S:
The server is a simple ubuntu->50gb disk
Kubernetes v1.3.4
So I start creating my PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteMany
nfs:
server: 10.0.0.4
path: "/export"
And my PVC
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: nfs-claim
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 50Gi
here is how kubectl describes them:
Name: nfs
Labels: <none>
Status: Bound
Claim: default/nfs-claim
Reclaim Policy: Retain
Access Modes: RWX
Capacity: 50Gi
Message:
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 10.0.0.4
Path: /export
ReadOnly: false
No events.
AND
Name: nfs-claim
Namespace: default
Status: Bound
Volume: nfs
Labels: <none>
Capacity: 0
Access Modes:
No events.
pod deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: mypod
labels:
name: mypod
spec:
replicas: 1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
name: mypod
labels:
# Important: these labels need to match the selector above, the api server enforces this constraint
name: mypod
spec:
containers:
- name: abcd
image: irrelevant to the question
ports:
- containerPort: 80
env:
- name: hello
value: world
volumeMounts:
- mountPath: "/mnt"
name: nfs
volumes:
- name: nfs
persistentVolumeClaim:
claimName: nfs-claim
When I deploy my POD i get the following:
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-claim
ReadOnly: false
default-token-6pd57:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-6pd57
QoS Tier: BestEffort
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
13m 13m 1 {default-scheduler } Normal Scheduled Successfully assigned xxx-2140451452-hjeki to ip-10-0-0-157.us-west-2.compute.internal
11m 7s 6 {kubelet ip-10-0-0-157.us-west-2.compute.internal} Warning FailedMount Unable to mount volumes for pod "xxx-2140451452-hjeki_default(93ca148d-6475-11e6-9c49-065c8a90faf1)": timeout expired waiting for volumes to attach/mount for pod "xxx-2140451452-hjeki"/"default". list of unattached/unmounted volumes=[nfs]
11m 7s 6 {kubelet ip-10-0-0-157.us-west-2.compute.internal} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "xxx-2140451452-hjeki"/"default". list of unattached/unmounted volumes=[nfs]
Tried everything I know, and everything i can think of. What am i missing or doing wrong here?
I tested version 1.3.4 and 1.3.5 of Kubernetes and NFS mount didn't work for me. Later I switched to the 1.2.5 and that version gave me some more detailed info ( kubectl describe pod ...). It turned out that 'nfs-common' is missing in the hyperkube image. After I added nfs-common to all container instances based on hyperkube image on master and worker nodes the NFS share started to work normally (mount was successful). So that's the case here. I tested it in practice and it solved my problem.

Resources