chmod error while writing outputs with Spark on Kubernetes - apache-spark

I'm working on a POC for setting up a Spark cluster that uses Kubernetes for resource management, on AKS (Azure Kubernetes Service). I'm using spark-submit to submit PySpark applications to k8s in cluster mode, and I've been able to get applications running fine.
I set up an Azure file share to store the application scripts, plus a Persistent Volume and a Persistent Volume Claim pointing to this file share so that Spark can access the scripts from Kubernetes. This works fine for applications that don't write any output, like the pi.py example that ships with the Spark source code, but writing any kind of output fails in this setup. I tried running a script to compute word counts, and the line
wordCounts.saveAsTextFile(f"./output/counts")
causes an exception (wordCounts here is an RDD):
Traceback (most recent call last):
File "/opt/spark/work-dir/wordcount2.py", line 14, in <module>
wordCounts.saveAsTextFile(f"./output/counts")
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1570, in saveAsTextFile
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o65.saveAsTextFile.
: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted
The directory "counts" has been created from the spark application just fine, so it seems like it has required permissions, but this subsequent chmod that spark tries to perform internally fails. I haven't been able to figure out the cause and what exact configuration I'm missing in my commands that's causing this. Any help would be greatly appreciated.
The kubectl version I'm using is
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"881d4a5a3c0f4036c714cfb601b377c4c72de543", GitTreeState:"clean", BuildDate:"2021-10-21T05:13:01Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
The Spark version is 2.4.5, and the command I'm using is:
<SPARK_PATH>/bin/spark-submit --master k8s://<HOST>:443 \
--deploy-mode cluster \
--name spark-pi3 \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14 \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--verbose /opt/spark/work-dir/wordcount2.py
The PV and PVC are pretty basic. The PV yml is:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-fileshare-pv
  labels:
    usage: azure-fileshare-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  azureFile:
    secretName: azure-storage-secret
    shareName: dssparktestfs
    readOnly: false
    secretNamespace: spark-operator
The PVC yml is:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: azure-fileshare-pvc
  # Set this annotation to NOT let Kubernetes automatically create
  # a persistent volume for this volume claim.
  annotations:
    volume.beta.kubernetes.io/storage-class: ""
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    # To make sure we match the claim with the exact volume, match the label
    matchLabels:
      usage: azure-fileshare-pv
Let me know if more info is needed.

The owner and user are root.
It looks like you've mounted your volume as root. Your problem:
chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted
occurs because you are trying to change the permissions of a file that you do not own, so you need to change the owner of the file first.
The easiest solution is to run chown on the resource you want to access (a minimal sketch of that approach follows below). However, this is often not feasible, since it can lead to privilege escalation, and the image itself may block it. In that case you can define a security context.
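For completeness, here is a hypothetical sketch of the chown approach, using an init container that takes ownership of the mount before the main container starts. The image, UID/GID, and names below are placeholder assumptions (they are not taken from the question), and whether chown succeeds at all depends on the underlying volume type:
apiVersion: v1
kind: Pod
metadata:
  name: chown-example
spec:
  initContainers:
    - name: fix-permissions
      image: busybox
      # Placeholder UID/GID: change ownership of the mount to the user the main container runs as
      command: ["sh", "-c", "chown -R 185:185 /opt/spark/work-dir"]
      securityContext:
        runAsUser: 0
      volumeMounts:
        - name: work-dir
          mountPath: /opt/spark/work-dir
  containers:
    - name: app
      image: docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14
      volumeMounts:
        - name: work-dir
          mountPath: /opt/spark/work-dir
  volumes:
    - name: work-dir
      persistentVolumeClaim:
        claimName: azure-fileshare-pvc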
A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:
Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
Security Enhanced Linux (SELinux): Objects are assigned security labels.
Running as privileged or unprivileged.
Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
AppArmor: Use program profiles to restrict the capabilities of individual programs.
Seccomp: Filter a process's system calls.
AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether the no_new_privs flag gets set on the container process. AllowPrivilegeEscalation is true always when the container is: 1) run as Privileged OR 2) has CAP_SYS_ADMIN.
readOnlyRootFilesystem: Mounts the container's root filesystem as read-only.
The above bullets are not a complete set of security context settings -- please see SecurityContext for a comprehensive list.
For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features
You can Configure volume permission and ownership change policy for Pods.
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup. You can use the fsGroupChangePolicy field inside a securityContext to control the way that Kubernetes checks and manages ownership and permissions for a volume.
Here is an example:
securityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000
  fsGroupChangePolicy: "OnRootMismatch"
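To show where these fields sit in a full manifest, here is a minimal, hypothetical Pod spec that applies the securityContext at the pod level together with the claim from the question (the pod name is illustrative only; how you attach such settings to the Spark driver and executor pods depends on your Spark version and setup):
apiVersion: v1
kind: Pod
metadata:
  name: security-context-example
spec:
  # Pod-level securityContext: volume contents are made group-owned by fsGroup (2000)
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: app
      image: docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14
      volumeMounts:
        - name: work-dir
          mountPath: /opt/spark/work-dir
  volumes:
    - name: work-dir
      persistentVolumeClaim:
        claimName: azure-fileshare-pvc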
See also this similar question.

Related

Node container unable to locate Hashicorp Vault secrets file on startup on AWS EKS 1.24

We have a small collection of Kubernetes pods which run React/Next.js UIs in a Node 16 Alpine container (node:16.18.1-alpine3.15 to be precise). All of this runs in AWS EKS 1.23. We make use of annotations on these pods in order to inject secrets from Hashicorp Vault at startup. The annotations pull the desired secrets from Vault and write them to a file on the pod. An example of said annotations is below:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/agent-init-first: "true"
vault.hashicorp.com/agent-pre-populate-only: "true"
vault.hashicorp.com/role: "onejourney-ui"
vault.hashicorp.com/agent-inject-secret-config: "secret/data/onejourney-ui"
vault.hashicorp.com/agent-inject-template-config: |
  {{- with secret "secret/data/onejourney-ui" -}}
  export AUTH0_CLIENT_ID="{{ .Data.data.auth0_client_id }}"
  export SENTRY_DSN="{{ .Data.data.sentry_admin_dsn }}"
  {{- end }}
When the pod starts up, we source this file (which is created by default at /vault/secrets/config) to set environment variables and then delete the file. We do that with the following pod arguments in our Helm chart:
node:
  args:
    - /bin/sh
    - -c
    - source /vault/secrets/config; rm -rf /vault/secrets/config; yarn start-admin;
We recently upgraded some of our AWS EKS clusters from 1.23 to 1.24. After doing so, we noticed that our Node applications were failing to start and entering a crash loop. Looking in the logs of these containers, the problem seemed to be that the pod was unable to locate the secrets file anymore.
Interestingly, the Vault init container completed successfully and shows that the file was successfully created...
Out of curiosity, I removed the node args that source the file, which allowed the container to start successfully. When exec'ing into the pod, I found that the file WAS in fact present and had the content I was expecting. The file also had the correct owner and permissions, just as we see in a working instance on EKS 1.23.
We have other containers (php-fpm) which consume secrets in the same manner; however, these were not affected on 1.24, only the node containers were. I saw no namespace, pod, or deployment annotations added that could have been a possible cause. After rolling the cluster back down to EKS 1.23, the deployment worked as expected.
I'm left scratching my head as to why the pod is unable to source that file on 1.24. Any suggestions on what to check or a possible cause would be greatly appreciated.

Kubernetes: /dev/vfio/0: no such file or directory

I am trying to deploy my Kubernetes cluster but am stuck with ContainerCreatingError on one particular pod.
When I run describe on the pod, I find the following error:
spec: failed to generate spec: lstat /dev/vfio/0: no such file or directory
I followed this Reddit post (binding to the vfio-pci module), but it didn't work as expected. Any help here is much appreciated.

Why does an anonymous httptrigger azure function throw a 500 internal server error when 'code' is a param in query string?

I have a Function App that is running in a container in Kubernetes. One of my endpoints is an httptrigger with anonymous access. However, the query string contains a parameter named code (supplied by a 3rd-party vendor, so I have no control over its name) that causes the app to throw a 500 error with no log indicating what happened. The odd part is that if I deploy the same function to an Azure Function App, everything works as expected. So my question is: what configuration or environment variables need to be set in order for this to behave correctly?
Related to this as a follow up question - Azure Function running in AKS throws 500 on query string parameter for http trigger function
The issue turned out to be that the runtime tries to write files to the azure-functions-host/Secrets directory for anonymous functions where code is a parameter in the query string. Due to the way Kubernetes mounts secret volumes, when it creates the directory it sets the permissions read-only, even if readOnly is false.
As a workaround I ended up creating the directory in the Dockerfile:
# To enable ssh & remote debugging on app service change the base image to the one below
# FROM mcr.microsoft.com/azure-functions/dotnet:3.0-appservice
FROM mcr.microsoft.com/azure-functions/dotnet:3.0
ENV AzureWebJobsScriptRoot=/home/site/wwwroot \
AzureFunctionsJobHost__Logging__Console__IsEnabled=true \
FUNCTIONS_WORKER_RUNTIME=dotnet
EXPOSE 80 443
RUN mkdir azure-functions-host/Secrets
COPY . /home/site/wwwroot
In the kubernetes deployment file I mounted the specific file to that directory so that the mount action did not mess with the directory permissions.
volumeMounts:
  - name: functionhostkeys-store
    mountPath: "/azure-functions-host/Secrets/host.json"
    subPath: "host.json"
    readOnly: false
This approach allowed the runtime to still write to that directory as needed but allowed me to manage my function keys in Azure KeyVault and mount them at runtime in a known configuration.
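For context, the volumeMounts snippet above assumes a matching entry in the pod's volumes section; a hypothetical sketch of that backing volume (the secret name and key below are placeholders) might look like:
volumes:
  - name: functionhostkeys-store
    secret:
      # Placeholder: a Kubernetes Secret holding the host.json function keys
      secretName: function-host-keys
      items:
        - key: host.json
          path: host.json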

Dapr - vaultTokenMountPath issue

I am trying to run the Dapr secret management quickstart using Vault in a k8s environment.
https://github.com/dapr/quickstarts/tree/master/secretstore
I applied the following component YAML for Vault:
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: vault
spec:
  type: secretstores.hashicorp.vault
  version: v1
  metadata:
    - name: vaultAddr
      value: vault:8270 # Optional. Default: "https://127.0.0.1:8200"
    - name: skipVerify # Optional. Default: false
      value: true
    - name: vaultTokenMountPath # Required. Path to token file.
      value: root/tmp/
The token file is created under the root/tmp path, and I tried to execute the service. I am getting a permission denied error (even though I have given all the read/write permissions). I tried applying permissions to the file, but I am still not able to access it. Can anyone please provide a solution?
Your YAML did not format well, but it looks like your value for vaultTokenMountPath is incomplete. It needs to point to the file, not just the folder root/tmp/. I created a file called vault.txt and copied my root token into it, so the path in your case would be root/tmp/vault.txt.
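In other words, with a token file named vault.txt as above, the relevant metadata entry would look roughly like this:
- name: vaultTokenMountPath # Required. Path to the token file itself, not a folder.
  value: root/tmp/vault.txt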
I was able to make it work in WSL2 by pointing to a file (/tmp/token in my case).
I was unable to make it work in kubernetes as I did not find any way to inject file in the DAPR sidecar, opened issue on github for this: https://github.com/dapr/components-contrib/issues/794

How can I remove or ignore unwanted .snapshot in mounted volume?

I am running a Kubernetes cluster with NFS NAStorage, and when I mount volumes they get a .snapshot directory created at the mountpoint. This causes problems, for example when using Helm charts, as these don't expect an unknown read-only directory in certain paths (e.g. chown ... <dir> can fail, crashing the container).
When installing the Graylog Helm Chart, I noticed the initContainer for the graylog pod crashing due to chown: ... Read-only file system after running the following chown line:
chown -R 1100:1100 /usr/share/graylog/data/
where the following volume is mounted:
...
volumeMounts:
  - mountPath: /usr/share/graylog/data/journal
    name: journal
...
I tried working around this by modifying the command to fail "quietly", making it run : (the shell no-op) upon failure:
chown -fR 1100:1100 /usr/share/graylog/data/ || :
This made the initContainer succeed, but now the main container crashes instead, this time due to the mere presence of the .snapshot dir.
...
kafka.common.KafkaException: Found directory /usr/share/graylog/data/journal/.snapshot, '.snapshot' is not in the form of topic-partition
If a directory does not contain Kafka topic data it should not exist in Kafka's log directory
...
I have tried modifying the mount point of the volume, too, moving it up one level, but this causes new issues:
...
volumeMounts:
  - mountPath: /usr/share/graylog/data
    name: data-journal
...
com.github.joschi.jadconfig.ValidationException: Parent path /usr/share/graylog/data/journal for Node ID file at /usr/share/graylog/data/journal/node-id is not a directory
I would have expected there to be some way of disabling the creation of the .snapshot directory, ideally a way to unmount it or never mount it in the first place. That, or some way to have the container properly ignore the directory entirely, so that it doesn't interfere with the processes in the container, since it seems its mere presence can be seriously disruptive. However, I have yet to find anything of the sort, and I can't seem to find anyone who has had a similar issue (the introduction of Volume Snapshots in Kubernetes has not made searching easier, to say the least).
Edit 1
I tried (semi-successfully; I get the Parent path ... is not a directory error above) to implement subPath: journal, thus circumventing the .snapshot directory (or so I believe), but this still means potentially editing every chart used in my cluster. Hopefully an alternative at a higher level can be found.
volumeMounts:
  - mountPath: /usr/share/graylog/data/journal
    name: data-journal
    subPath: journal
Edit 2
I am running a bare-metal cluster, with MetalLB and Nginx as loadbalancer+ingress controller.
The storage solution is provided by a third party provider, and it is from their backup solution that the .snapshot directory is made.
My imagined workaround
Since this will mainly be a problem when using Helm Charts or other deployments where volume mounts will be more or less out of our control, I will look into applying a "kustomization" that adds a single line to each volumeMount, adding
...
subPath: mount
or something like that. By doing that, I should be separating the actual mount point in the volume and the directory that actually gets mounted in the container by one level, keeping the .snapshot directory hidden in the abstract volume object. I will post my findings and the potential kustomization that may come of it, in case anyone else runs into a similar problem.
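A rough sketch of the kind of kustomization I have in mind (the target kind, name, and list indices below are guesses and would need to match the actual rendered objects):
# kustomization.yaml (sketch): add subPath to a volumeMount via a JSON6902 patch
resources:
  - graylog-template.yaml
patches:
  - target:
      kind: StatefulSet
      name: graylog
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/volumeMounts/0/subPath
        value: mount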
If someone thinks of a more streamlined solution, it is still very welcome - I'm sure it is possible to improve upon this one.
We finally got this fixed by the storage service provider, after they figured out which configuration needed to be applied. If anyone has run into the same problem and needs to know which configuration, please reach out and I will ask our service provider.
The workaround that worked before we got the configuration fixed was as follows:
(Including --namespace is optional)
Install mongodb-replicaset and elasticsearch (v 6.8.1)
$ helm install --name mongodb-replicaset --namespace graylog stable/mongodb-replicaset
# We add the elastic repo since the 'stable' repo will be deprecated further on
$ helm repo add elastic https://helm.elastic.co
# We run elasticsearch version 6.8.1 since Graylog v3 currently is incompatible with later versions.
$ helm install elastic/elasticsearch --name elasticsearch --namespace graylog --set imageTag=6.8.1
# Wait for deployments to complete, then you can test to see all went well
$ helm test mongodb-replicaset
$ helm test elasticsearch
Extract the Graylog deployment template
$ helm fetch --untar stable/graylog
$ helm dependency update graylog
$ helm template graylog -n graylog -f graylog-values.yaml > graylog-template.yaml
# graylog-values.yaml
tags:
  install-mongodb: false
  install-elasticsearch: false
graylog:
  mongodb:
    uri: "mongodb://mongodb-replicaset-0.mongodb-replicaset.graylog.svc.cluster.local:27017/graylog?replicaSet=rs0"
  elasticsearch:
    hosts: "http://elasticsearch-client.graylog.svc.cluster.local:9200"
  # + any further values
Add namespace: graylog to all objects in graylog-template.yaml
Add subPath: mount to all volumeMounts where a persistent volume is used (in this case name: journal) in graylog-template.yaml
...
volumeMounts:
  - mountPath: /usr/share/graylog/data/journal
    name: journal
+   subPath: mount
...
volumeMounts:
  - mountPath: /usr/share/graylog/data/journal
    name: journal
+   subPath: mount
...
volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: journal
This can be done quickly in vim by typing :g/name: <volume-name>/norm osubPath: mount. Please note the lack of a space between "o" and "subPath", and note that this will add the line to the volumeClaimTemplate as well, which needs to be removed. "mount" can also be called something else.
Deploy
$ kubectl apply -f graylog-template.yaml
