Volume node affinity conflicts in Argo workflows - azure

I have an Argo workflow with two steps: the first runs on Linux and the second runs on Windows.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: my-workflow-v1.13
spec:
  entrypoint: process
  volumeClaimTemplates:
    - metadata:
        name: workdir
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
  arguments:
    parameters:
      - name: jobId
        value: 0
  templates:
    - name: process
      steps:
        - - name: prepare
            template: prepare
        - - name: win-step
            template: win-step
    - name: win-step
      nodeSelector:
        kubernetes.io/os: windows
      container:
        image: mcr.microsoft.com/windows/nanoserver:1809
        command: ["cmd", "/c"]
        args: ["dir", "C:\\workdir\\source"]
        volumeMounts:
          - name: workdir
            mountPath: /workdir
    - name: prepare
      nodeSelector:
        kubernetes.io/os: linux
      inputs:
        artifacts:
          - name: src
            path: /opt/workdir/source.zip
            s3:
              endpoint: minio:9000
              insecure: true
              bucket: "{{workflow.parameters.jobId}}"
              key: "source.zip"
              accessKeySecret:
                name: my-minio-cred
                key: accesskey
              secretKeySecret:
                name: my-minio-cred
                key: secretkey
      script:
        image: garthk/unzip:latest
        imagePullPolicy: IfNotPresent
        command: [sh]
        source: |
          unzip /opt/workdir/source.zip -d /opt/workdir/source
        volumeMounts:
          - name: workdir
            mountPath: /opt/workdir
Both steps share a volume.
To achieve that in Azure Kubernetes Service, I had to create two node pools: one for Linux nodes and another for Windows nodes.
The problem is that when I queue the workflow, sometimes it completes, and sometimes the win-step (the step that runs in the Windows container) hangs or fails with this message:
1 node(s) had volume node affinity conflict
I've read that this can happen because the volume gets provisioned in a specific zone while the Windows container (being in a different pool) gets scheduled in a different zone that has no access to that volume, but I couldn't find a solution for that.
Please help.

the first runs on Linux and the second runs on Windows
I doubt you can mount the same volume on both a Linux node (typically an ext4 file system) and a Windows node; Azure Windows containers use the NTFS file system.
So the volume you try to mount in the second step is located on a node pool that does not match your nodeSelector.
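One possible workaround (my assumption, not something the answer above confirms for this setup) is to back workdir with Azure Files instead of an Azure Disk: SMB shares support ReadWriteMany, can be mounted by both Linux and Windows nodes, and are not pinned to a single availability zone. A minimal sketch of the volumeClaimTemplates change, assuming the AKS built-in azurefile-csi storage class is available in your cluster:

```yaml
# Hypothetical sketch: replace the disk-backed claim with an Azure Files (SMB) claim.
# "azurefile-csi" is the AKS built-in storage class name; verify it in your cluster
# with `kubectl get storageclass`.
volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      storageClassName: azurefile-csi
      accessModes: [ "ReadWriteMany" ]   # SMB shares are not zone-pinned
      resources:
        requests:
          storage: 1Gi
```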

Related

Volume Mount in SparkApplication resource not working

I am toying with the Spark operator in Kubernetes, and I am trying to create a SparkApplication resource with the following manifest.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: spark-jobs
spec:
  batchScheduler: volcano
  batchSchedulerOptions:
    priorityClassName: routine
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "<image_name>"
  imagePullPolicy: Always
  mainApplicationFile: local:///spark-files/csv_data.py
  arguments:
    - "10"
  sparkVersion: "3.0.0"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  timeToLiveSeconds: 86400
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: driver-sa
    volumeMounts:
      - name: sparky-data
        mountPath: /spark-data
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 3.0.0
    volumeMounts:
      - name: sparky-data
        mountPath: /spark-data
  volumes:
    - name: sparky-data
      hostPath:
        path: /spark-data
I am running this in kind, where I have defined a volume mount to my local system where the data to be processed is present. I can see the volume mounted on the kind nodes. But when I create the above resource, the driver pod crashes with the error 'no such path'. I printed the contents of the driver pod's root directory and could not see the mounted volume. What is the problem here, and how do I fix it?
The issue is related to permissions. When mounting a volume into a pod, you need to make sure the permissions are set correctly: specifically, the user or group running the application in the pod must have permission to access the data. You should also make sure that the path to the volume is valid and that the volume is properly mounted. To check whether a path exists, you can use the exec command:
kubectl exec <pod_name> -- ls
Try adding a security context, which defines privilege and access-control settings for a pod.
For more information follow this document.
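As a sketch of what that security context might look like in a plain pod spec (the numeric IDs below are placeholders you would match to the user your Spark image runs as, and field support on the SparkApplication driver/executor spec may differ by operator version):

```yaml
# Hypothetical sketch of a pod-level security context
spec:
  securityContext:
    runAsUser: 1000   # the UID your application process runs as
    fsGroup: 1000     # this group is applied to mounted volumes,
                      # giving the process read/write access to them
```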

Running Terraform on Azure Container Instances (ACI)

I am trying to run some Terraform code on Azure Container Instances, but I can't get it to actually start the container.
I am running it from the CLI, using a YAML file. There is a main.tf with a hello-world config.
This is the yml
apiVersion: '2019-12-01'
location: ukwest
name: file-share-demo
properties:
  containers:
    - name: hellofiles
      properties:
        environmentVariables: []
        command: [
          terraform -chdir=/data/terraform/hello/ apply --auto-approve
        ]
        image: hashicorp/terraform
        resources:
          requests:
            cpu: 1.0
            memoryInGB: 1.5
        volumeMounts:
          - mountPath: /data
            name: filesharevolume
  osType: Linux
  restartPolicy: Never
  volumes:
    - name: filesharevolume
      azureFile:
        sharename: scripts
        storageAccountName: 'name'
        storageAccountKey: 'key'
tags: {}
type: Microsoft.ContainerInstance/containerGroups
I passed all kinds of commands, but I can't get it to launch...
The command needed to get this working is
["/bin/sh", "-c", "/bin/terraform -chdir=/apps apply --auto-approve"]
If you try to split it into separate command options, only the terraform part runs. I have no clue why, but that works!
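Applied to the container group YAML above, that means replacing the command entry like this (the -chdir path is an assumption based on the file share mount shown in the question; adjust it to your layout). The likely explanation is that ACI's command field is an exec form, not run through a shell, so a single string containing flags is treated as one executable name:

```yaml
containers:
  - name: hellofiles
    properties:
      image: hashicorp/terraform
      # Exec-form entries are not parsed by a shell, so wrap the whole
      # invocation in "/bin/sh -c" and let the shell split the arguments.
      command: ["/bin/sh", "-c", "/bin/terraform -chdir=/data/terraform/hello apply --auto-approve"]
```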

Kubernetes - "Mount Volume Failed" when trying to deploy

I deployed my first container and got the info:
deployment.apps/frontarena-ads-deployment created
But then I saw that container creation was stuck in Waiting status.
I checked the events using kubectl describe pod frontarena-ads-deployment-5b475667dd-gzmlp and saw a MountVolume error which I cannot figure out:
Warning  FailedMount  9m24s  kubelet  MountVolume.SetUp failed for volume "ads-filesharevolume" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/85aa3bfa-341a-4da1-b3de-fb1979420028/volumes/kubernetes.io~azure-file/ads-filesharevolume --scope -- mount -t cifs -o username=frontarenastorage,password=mypassword,file_mode=0777,dir_mode=0777,vers=3.0 //frontarenastorage.file.core.windows.net/azurecontainershare /var/lib/kubelet/pods/85aa3bfa-341a-4da1-b3de-fb1979420028/volumes/kubernetes.io~azure-file/ads-filesharevolume
Output: Running scope as unit run-rf54d5b5f84854777956ae0e25810bb94.scope.
mount error(115): Operation now in progress
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)
Before running the deployment, I created a secret from the already created Azure file share, which I referenced within the YAML.
$AKS_PERS_STORAGE_ACCOUNT_NAME="frontarenastorage"
$STORAGE_KEY="mypassword"
kubectl create secret generic fa-fileshare-secret --from-literal=azurestorageaccountname=$AKS_PERS_STORAGE_ACCOUNT_NAME --from-literal=azurestorageaccountkey=$STORAGE_KEY
In that file share I have folders and files which I need to mount, and I reference azurecontainershare in the YAML.
My YAML looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontarena-ads-deployment
  labels:
    app: frontarena-ads-deployment
spec:
  replicas: 1
  template:
    metadata:
      name: frontarena-ads-aks-test
      labels:
        app: frontarena-ads-aks-test
    spec:
      containers:
        - name: frontarena-ads-aks-test
          image: faselect-docker.dev/frontarena/ads:test1
          imagePullPolicy: Always
          ports:
            - containerPort: 9000
          volumeMounts:
            - name: ads-filesharevolume
              mountPath: /opt/front/arena/host
      volumes:
        - name: ads-filesharevolume
          azureFile:
            secretName: fa-fileshare-secret
            shareName: azurecontainershare
            readOnly: false
      imagePullSecrets:
        - name: fa-repo-secret
  selector:
    matchLabels:
      app: frontarena-ads-aks-test
The issue was caused by the AKS cluster and the Azure File Share being deployed in different Azure regions. If they are in the same region, you will not have this issue.

Copy file from cron job's pod to local directory in AKS

I have created a cron job which runs every 60 minutes. In the job's container I have mounted an emptyDir volume named detailed-logs, and my container writes a CSV file to the path detailed-logs\logs.csv.
I am trying to copy this file from the pod to my local machine using kubectl cp podname:detailed-logs\logs.csv \k8slogs\logs.csv, but it throws the error:
path "detailed-logs\logs.csv" not found (no such file or directory).
Once the job runs successfully, the pod created by the job goes to Completed state; could that be the issue?
The file you are referring to will not persist once your pod completes running. What you can do is back up the file while the cron job is running. The two solutions I can suggest are either attaching a persistent volume to the job pod, or uploading the file somewhere while the job runs.
USE A PERSISTENT VOLUME
Here you can create a PV through a quick ReadWriteOnce PersistentVolumeClaim:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
Then you can mount it onto the pod using the following:
...
volumeMounts:
  - name: persistent-storage
    mountPath: /detailed-logs
volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: my-pvc
...
UPLOAD FILE
The way I do it is to run the job in a container that has the aws-cli installed and then store my file on AWS S3; you can choose another platform:
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-sh
data:
  backup.sh: |-
    #!/bin/bash
    aws s3 cp /myText.txt s3://bucketName/
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: s3-backup
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: aws-kubectl
              image: expert360/kubectl-awscli:v1.11.2
              env:
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: s3-creds
                      key: access-key-id
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: s3-creds
                      key: secret-access-key
              command:
                - /bin/sh
                - -c
              args: ["sh /backup.sh"]
              volumeMounts:
                - name: backup-sh
                  mountPath: /backup.sh
                  readOnly: true
                  subPath: backup.sh
          volumes:
            - name: backup-sh
              configMap:
                name: backup-sh
          restartPolicy: Never

Kubernetes azureFile never showing up

Github Issue
I'm using Azure ACS with the Kubernetes orchestrator and Windows agents,
but I keep running into an issue when I try to use an azureFile volume: it never seems to find my share.
The volume remains unknown, and when trying to browse to the website it gives access denied,
but this is probably because the folder is empty.
I'll show you my .yaml file and storage structure; I'm pretty sure my secret is correct, I double-checked it.
pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: azurepod
  labels:
    Volumes: ok
spec:
  containers:
    - image: XXXX
      name: aspvolumes
      volumeMounts:
        - mountPath: C:\site
          name: asp-website-volume
  imagePullSecrets:
    - name: crcatregistry
  nodeSelector:
    OS: windows
  volumes:
    - name: asp-website-volume
      azureFile:
        secretName: azure-secret
        shareName: asptestsite
        readOnly: false
The k8s Azure file mount on Windows nodes is not ready yet; the code has been merged into v1.9 (see https://github.com/Azure/kubernetes/pull/11), and this feature relies on a new Windows version which has not been published yet.
