Some spark-submit config options not reflected in k8s pod - apache-spark

I'm using spark-submit to create a spark driver pod on my k8s cluster. When I run
bin/spark-submit \
--master k8s://https://my-cluster-url:443 \
--deploy-mode cluster \
--name spark-test \
--class com.my.main.Class \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.allocation.batch.size=3 \
--conf spark.kubernetes.namespace=my-namespace \
--conf spark.kubernetes.container.image.pullSecrets=my-cr-secret \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.my-pvc.mount.path=/var/service/src/main/resources/ \
--conf spark.kubernetes.container.image=my-registry.io/spark-test:test-3.0.0 \
local:///var/service/my-service-6.3.0-RELEASE.jar
spark-submit successfully creates a pod in my k8s cluster. However, many of the config options I specified are not reflected. For example, the pod does not have a volume mounted at /var/service/src/main/resources/ despite the existence of a persistentVolumeClaim on the cluster called my-pvc. Further, the pod has not been given the specified image pull secret my-cr-secret, causing an ImagePullBackOff error. On the other hand, the pod is properly created in the my-namespace namespace, and the pull policy is Always.
I have attempted this using Spark 3.0.0 and 2.4.5.
Why are some config options not reflected in the pod created on my cluster?

Figured out the issue:
I currently have Spark 2.3.1 installed locally, and the SPARK_HOME variable points to /usr/local/spark. For this project I downloaded a distribution of Spark 2.4.5. I was in the 2.4.5 directory and running bin/spark-submit, which should have (as far as I can tell) used the spark-submit bundled with 2.4.5. However, running bin/spark-submit --version revealed that the version being run was 2.3.1. The configurations being ignored in my question above were not available in 2.3.1.
Simply changing SPARK_HOME to point to the new directory fixed the issue.
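The fix above can be sketched as follows, assuming the 2.4.5 distribution was unpacked to /opt/spark-2.4.5-bin-hadoop2.7 (a hypothetical path; substitute your own):

```shell
# SPARK_HOME still pointing at the old install is the culprit; repoint it
# at the distribution you actually want to use (path is an example):
export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
echo "SPARK_HOME is now: $SPARK_HOME"

# Afterwards, verify which version actually runs:
#   "$SPARK_HOME/bin/spark-submit" --version   # should now report 2.4.5
```

Checking `spark-submit --version` first is a quick way to catch this class of problem before debugging individual config options.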

Related

Standard way to store/upload application jar on Spark cluster on Kubernetes

I have a Spark-based Kubernetes cluster where I am using spark-submit to submit jobs to the cluster as needed.
e.g.
spark-submit \
--master spark://my-spark-master-svc:7077 \
--class com.Main \
examples/jars/my-spark-application.jar
Here I have uploaded the file my-spark-application.jar using kubectl cp into the directory examples/jars on the master Pod/container before running the spark-submit command.
Another option could be by mounting a Volume on the cluster and share the jar on the volume that way.
What is the typical way to share the application jar with the spark cluster while using spark-submit on Kubernetes?
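The kubectl cp approach described above can be sketched like this (the pod name, namespace, and local path are examples, not taken from the question):

```shell
# Copy the application jar into the master pod before submitting:
kubectl cp target/my-spark-application.jar \
  default/my-spark-master-0:/opt/spark/examples/jars/my-spark-application.jar

# Then submit, referencing the in-pod path:
# spark-submit --master spark://my-spark-master-svc:7077 \
#   --class com.Main /opt/spark/examples/jars/my-spark-application.jar
```

A shared volume, as the question suggests, avoids re-copying the jar on every submit; serving the jar over HTTP and referencing it by URL is another commonly used option.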

Why am I not able to run sparkPi example on a Kubernetes (K8s) cluster?

I have a K8s cluster up and running, on VMs inside VMWare Workstation, as of now. I'm trying to deploy a Spark application natively using the official documentation from here. However, I also landed on this article which made it clearer, I felt.
Now, earlier my setup was running inside nested VMs, basically my machine is on Win10 and I had an Ubuntu VM inside which I had 3 more VMs running for the cluster (not the best idea, I know).
When I tried to run my setup by following the article mentioned, I first created a service account inside the cluster called spark, then created a clusterrolebinding called spark-role with edit as the clusterrole, and assigned it to the spark service account so that the Spark driver pod has sufficient permissions.
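The RBAC steps described above, written out as kubectl commands (the default namespace is assumed; these mirror the setup in the official Spark-on-Kubernetes docs):

```shell
# Create a service account for the Spark driver:
kubectl create serviceaccount spark

# Grant it the built-in "edit" cluster role so the driver can create
# and manage executor pods:
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=default:spark
```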
I then try to run the example SparkPi job using this command line:
bin/spark-submit \
--master k8s://https://<k8-cluster-ip>:<k8-cluster-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=kmaster:5000/spark:latest \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 100
And it fails within a few seconds of creating the driver pod: the pod goes into the Running state and after about 3 seconds goes into the Error state.
On giving the command kubectl logs spark-pi-driver this is the log I get.
The second Caused by: is always either as mentioned above i.e:
Caused by: java.net.SocketException: Broken pipe (Write failed) or,
Caused by: okhttp3.internal.http2.ConnectionShutdownException
Log #2 for reference.
After running into dead-ends with this, I tried giving --deploy-mode client to see if it makes a difference and get more verbose logs. You can read the difference between client and cluster mode from here.
On deploying the job in client mode it still fails. However, now I see that each time the driver (running not as a pod but as a process on the local machine) tries to create an executor pod, it loops infinitely, creating executor pods with a count number appended to the pod name as the previous one goes into a terminated state. Also, I can now see the Spark UI on port 4040, but the job doesn't move forward, as it's stuck trying to create even a single executor pod.
I get this log.
To me, this makes it pretty apparent that it's a resource crunch maybe?
So to be sure, I deleted the nested VMs, set up 2 new VMs on my main machine, connected them using a NAT network, and set up the same K8s cluster.
But now when I try to do the exact same thing it fails with the same error (Broken Pipe/ShutdownException), except now it tells me that it fails even at creating a driver-pod.
This is the log for reference.
Now I can't even fetch logs as to why it fails, because it's never even created.
I've broken my head over this and can't figure out why it's failing. Now, I tried out a lot of things to rule them out but so far nothing has worked except one (which is a completely different solution).
I tried the spark-on-k8-operator from GCP from here and it worked for me. I wasn't able to see the Spark UI as it runs briefly but it prints the Pi value in the shell window, so I know it works.
I'm guessing, that even this spark-on-k8s-operator 'internally' does the same thing but I really need to be able to deploy it natively, or at least know why it fails.
Any help here will be appreciated (I know it's a long post). Thank you.
Make sure the kubernetes version that you are deploying is compatible with the Spark version that you are using.
Apache Spark uses the Kubernetes Client library to communicate with the kubernetes cluster.
As of today, the latest stable Spark version is 2.4.5, which includes kubernetes client version 4.6.3.
Checking the compatibility matrix of the Kubernetes Client (here), the supported kubernetes versions go all the way up to v1.17.0.
Based on my personal experience Apache Spark 2.4.5 works well with kubernetes version v1.15.3. I have had problems with more recent versions.
When an unsupported kubernetes version is used, the logs you get look like the ones you are describing:
Caused by: java.net.SocketException: Broken pipe (Write failed) or,
Caused by: okhttp3.internal.http2.ConnectionShutdownException
I faced the exact same issue with v1.18.0; downgrading to v1.15.3 made it work:
minikube start --cpus=4 --memory=4048 --kubernetes-version v1.15.3
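Two quick diagnostics for this compatibility question, assuming a standard Spark layout under $SPARK_HOME (the jar name in the comment is illustrative):

```shell
# Which kubernetes-client does this Spark distribution bundle?
ls "$SPARK_HOME/jars" | grep kubernetes-client   # e.g. kubernetes-client-4.6.x.jar

# Which Kubernetes version does the cluster run? Compare the server version
# reported here against the kubernetes-client compatibility matrix.
kubectl version
```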
The Spark on K8s operator example uses a Spark image (from gcr.io) that works. You can find the image tag in spark-on-k8s-operator/examples/spark-pi.yaml:
spec:
...
image: "gcr.io/spark-operator/spark:v2.4.5"
...
I replaced the image config in my bin/spark-submit command and it worked for me:
bin/spark-submit \
--master k8s://https://192.168.99.100:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=512m \
--conf spark.executor.instances=2 \
--conf spark.executor.memory=512m \
--conf spark.executor.cores=1 \
--conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v2.4.5 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar

Spark-submit failing to resolve --package dependency when behind a HTTP proxy

Below is my spark-submit command
/usr/bin/spark-submit \
--class "<class_name>" \
--master yarn \
--queue default \
--deploy-mode cluster \
--conf "spark.driver.extraJavaOptions=-DENVIRONMENT=pt -Dhttp.proxyHost=<proxy_ip> -Dhttp.proxyPort=8080 -Dhttps.proxyHost=<proxy_ip> -Dhttps.proxyPort=8080" \
--conf "spark.executor.extraJavaOptions=-DENVIRONMENT=pt -Dhttp.proxyHost=<proxy_ip> -Dhttp.proxyPort=8080 -Dhttps.proxyHost=<proxy_ip> -Dhttps.proxyPort=8080" \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
--driver-memory 3G \
--executor-memory 4G \
--num-executors 2 \
--executor-cores 3 <jar_file>
The spark-submit command times out resolving the package dependency.
Replacing --packages with --jars works, but I would like to get to the bottom of why --packages is not working for me. Also, for http.proxyHost and https.proxyHost, should I specify only the IP address, without http:// or https://?
Edit
Please note the following
The machine I am deploying from and the spark cluster are both behind an HTTP proxy.
I know what the difference between --jars and --packages is. I want to get the --packages option to work in my case.
I have tested the HTTP proxy settings on my machine: I can reach the internet and can do a curl. For some reason it feels like spark-submit is not picking up the HTTP proxy settings.
The difference between --packages and --jars, in a nutshell: --packages uses Maven coordinates and a built-in resolver to fetch the artifacts (and their dependencies) from the local Maven repo and remote repositories, while --jars is a plain list of jar files to include on the classpath, which means you have to make sure those jars are also available to the executor nodes yourself.
More detailed info can be found in the spark-submit help:
--jars JARS Comma-separated list of jars to include on the driver and executor classpaths.
--packages Comma-separated list of maven coordinates of jars to include
on the driver and executor classpaths. Will search the local
maven repo, then maven central and any additional remote
repositories given by --repositories. The format for the
coordinates should be groupId:artifactId:version.
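One possible explanation for the original timeout, offered as a guess rather than a confirmed diagnosis: --packages resolution happens in the spark-submit launcher JVM before the driver starts, so spark.driver.extraJavaOptions (which applies to the driver process, in cluster mode running on the cluster) never reaches the resolver. The launcher JVM does read the SPARK_SUBMIT_OPTS environment variable, so the proxy flags can be passed there instead (the proxy host/port below are placeholders):

```shell
# Pass the proxy JVM flags to the spark-submit launcher JVM itself,
# not just to the driver/executor processes. 10.0.0.1:8080 is a placeholder.
export SPARK_SUBMIT_OPTS="-Dhttp.proxyHost=10.0.0.1 -Dhttp.proxyPort=8080 -Dhttps.proxyHost=10.0.0.1 -Dhttps.proxyPort=8080"

# Then run the same spark-submit command; dependency resolution in the
# launcher should now go through the proxy.
```

This would also explain why --jars works: it skips remote resolution entirely.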

Deploy Spark into Kubernetes Cluster

I'm a newbie in the Kubernetes & Spark environment.
I've been asked to deploy Spark inside Kubernetes so that it can scale horizontally automatically.
The problem is, I can't deploy the SparkPi example from the official website (https://spark.apache.org/docs/latest/running-on-kubernetes#cluster-mode).
I've already followed the instructions, but the pods fail to execute.
Here is the explanation:
I have already run: kubectl proxy
When I execute:
spark-submit \
--master k8s://https://localhost:6445 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=xnuxer88/spark-kubernetes-bash-test-entry:v1 \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.2.jar
I get the error:
Error: Could not find or load main class org.apache.spark.examples.SparkPi
When I check the Docker image (by creating a container from it), I can find the file.
Is there any missing instruction that I forgot to follow?
Please Help.
Thank You.
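One way to double-check a "Could not find or load main class" error like the one above, assuming Docker is available locally (the image tag is the one from the question; the paths follow the standard Spark image layout):

```shell
# List the example jars actually baked into the image:
docker run --rm xnuxer88/spark-kubernetes-bash-test-entry:v1 \
  ls /opt/spark/examples/jars/

# Confirm the class is really inside the jar (run inside the container,
# or on a local copy of the jar):
# jar tf spark-examples_2.11-2.3.2.jar | grep SparkPi
```

If the jar and class are present in the image, the mismatch is usually between the local:/// path given to spark-submit and the path inside the container, or between the local Spark version and the one the image was built for.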

Using a k8s cluster as spark cluster manager on Spark 2.3.0

I was trying to submit an example job to a k8s cluster from the binary release of Spark 2.3.0; the submit command is shown below. However, I get a wrong-master error every time. I am really sure my k8s cluster is working fine.
bin/spark-submit \
--master k8s://https://<k8s-master-ip> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image=<image-built-from-dockerfile> \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
local:///opt/examples/jars/spark-examples_2.11-2.3.0.jar
and the error comes out
Error: Master must either be yarn or start with spark, mesos, local
and this is the output of kubectl cluster-info
Kubernetes master is running at https://192.168.0.10:6443
KubeDNS is running at https://192.168.0.10:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Because I am not good at English, there may be some grammar mistakes, but I will do my best to answer your question. My fix was to check $SPARK_HOME and change it to the path of your "apache-spark-on-k8s" distribution, because spark-submit uses ${SPARK_HOME} by default to run your command. Maybe you have two Spark environments on the same machine, just like I did, so the command always uses your original Spark. I hope this answer helps you.
