Spark on K8's Issues loading jar - apache-spark

I am trying to run a sample spark application(provided in the spark examples jar) on kubernetes and trying to understand the behavior. In this process, I did the following,
Built a running kubernetes cluster with 3 nodes (1 master and 2 child) with adequate resources(10 cores, 64Gigs mem, 500GB disk). Note that I don't have internet access on my nodes.
Installed Spark distribution - spark-2.3.3-bin-hadoop2.7
As there is no internet access on the node, I preloaded a spark image( from gcr.io/cloud-solutions-images/spark:v2.3.0-gcs) into the docker on the node running kubernetes master
Running spark-submit to k8's as follows,
./bin/spark-submit --master k8s://https://test-k8:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar
However, it fails with the below error,
Error: Could not find or load main class org.apache.spark.examples.SparkPi
In regards to the above I have below questions:
Do we need to provide Kubernetes a distribution of spark? and is that what we are doing with?
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs
If I have my own spark example, for say processing events from Kafka. What should be my approach?
Any help in debugging the above Error and answering my follow up questions is thankful.

spark.kubernetes.container.image should be an image that has both the spark binaries & the application code. In my case, as I don't have access to the internet from my nodes. Doing the following let spark driver pick the correct jar.
So, this is what I did,
In my local computer, I did a docker build
docker build -t spark_pi_test:v1.0 -f kubernetes/dockerfiles/spark/Dockerfile .
Above built me a docker image in my local computer.
tar'd the built docker image,
docker save spark_pi_test:v1.0 > spark_pi_test_v1.0.tar
scp'd the tar ball to all 3 kube nodes.
docker load the tar ball on all 3 kube nodes.
docker load < spark_pi_test_v1.0.tar
Then I submitted the spark job as follows,
./bin/spark-submit --master k8s://https://test-k8:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.container.image=spark_pi_test:v1.0 --conf spark.kubernetes.driver.pod.name=spark-pi-driver --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent local:///opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar 100000
The above jar path is the path in the docker container.
For reference to DockerFile,
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile

Related

Apache Spark application is not seen in Spark Web UI (Java)

I am trying to run an ETL job using Apache Spark (Java) in Kubernetes cluster. The Application is running, and data is getting inserted into database (mysql). But, the application is not seen in Spark Web UI.
The command I used for submitting the application is:
./spark-submit --class com.xxxx.etl.EtlApplication \
--name MyETL \
--master k8s://XXXXXXXXXX.xxx.us-west-2.eks.amazonaws.com:443 \
--conf "spark.kubernetes.container.image=YYYYYY.yyy.ecr.us-west-2.amazonaws.com/spark-poc:32" \
--conf "spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://my-spark-master-headless.default.svc.cluster.local:7077" \
--conf "spark.kubernetes.authenticate.driver.serviceAccountName=my-spark" \
--conf "spark.kubernetes.driver.request.cores=256m" \
--conf "spark.kubernetes.driver.limit.cores=512m" \
--conf "spark.kubernetes.executor.request.cores=256m" \
--conf "spark.kubernetes.executor.limit.cores=512m" \
--deploy-mode cluster \
local:///opt/bitnami/spark/examples/jars/EtlApplication-with-dependencies.jar 1000
I use a jenkins job to build my code and move the jar to /opt/bitnami/spark/examples/jars folder in the container inside the cluster.
The job is seen running in the pod when I check with kubectl get pods, and is seen on taking localhost:4040 after mapping the port to localhost using kubectl port-forward pod/myetl-df26f5843cb88da7-driver 4040:4040
Tried the same spark-submit command with Spark example jar (which came along with Spark installation in the container):
./spark-submit --class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.container.image=YYYYYY.yyy.ecr.us-west-2.amazonaws.com/spark-poc:5" \
--master k8s://XXXXXXXXXX.xxx.us-west-2.eks.amazonaws.com:443 \
--conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://my-spark-master-headless.default.svc.cluster.local:7077" \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=my-spark \
--deploy-mode cluster \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.3.0.jar 1000
This time this application is getting listed in the Spark Web UI. I tried several options, and on removing the line --conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://my-spark-master-headless.default.svc.cluster.local:7077", the SparkPi example application is also not displayed in Spark Web UI.
Am I missing something? Do I need to change my java code to accept spark.kubernetes.driverEnv.SPARK_MASTER_URL? Tried several options buut nothing works.
Thanks in advance.

Submitting a spark job to a kubernetes cluster using bitnami spark docker image

I have a local setup with minikube and I'm trying to use spark-submit to submit a job to a local Kubernetes. The idea here is to use my local machine's spark-submit to submit to the kubernetes master which will handle creating a spark cluster and taking it down when the work is finished.
I'm using the image bitnami/spark:3.2.1 and the following command:
./bin/spark-submit --master k8s://https://127.0.0.1:52388 \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=bitnami/spark:3.2.1 \
--class org.apache.spark.examples.JavaSparkPi \
--name spark-pi \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.2.1.jar
This does not seem to work and the logs in the spark driver are:
[...]
Caused by: java.io.IOException: Failed to connect to spark-master:7077
[...]
and
[...]
Caused by: java.net.UnknownHostException: spark-master
[...]
If I use the docker-image-tool.sh to build a custom spark docker image with the python bindings and use that, it works perfectly. How is bitnami's image special and why doesn't it recognise that the master in this case is kubernetes?
I also tried using the option conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://127.0.0.1:7077 when submitting but the error was similar to above.

Spark executor pods quickly in error after their creation using kubernetes as master

When I launch the SparkPi example on a self-hosted kubernetes cluster, the executor pods are quickly created -> have an error status -> are deleted -> are replaced by new executors pods.
I tried the same command on a Google Kubernetes Engine with success. I check the RBAC rolebinding to make sure that the service account has right to create the pod.
Guessing when the next executor pod will be ready, I can see using kubectl describe pod <predicted_executor_pod_with_number> that the pod is actually created:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1s default-scheduler Successfully assigned default/examplepi-1563878435019-exec-145 to slave-node04
Normal Pulling 0s kubelet, slave-node04 Pulling image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Pulled 0s kubelet, slave-node04 Successfully pulled image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Created 0s kubelet, slave-node04 Created container executor
This is my spark-submit call:
/opt/spark/bin/spark-submit \
--master k8s://https://mycustomk8scluster:6443 \
--name examplepi \
--deploy-mode cluster \
--driver-memory 2G \
--executor-memory 2G \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/work-dir/log4j.properties \
--conf spark.kubernetes.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.executor.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.pod.name=pi-driver \
--conf spark.driver.allowMultipleContexts=true \
--conf spark.kubernetes.local.dirs.tmpfs=true \
--class com.olameter.sdi.imagery.IngestFromGrpc \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
I expect that the required executor (2) should be created. If the driver script cannot create it, I would at least expect some log to be able to diagnose the issue.
The issue was related to Hadoop + Spark integration. I was using Spark binary without Hadoop spark-2.4.3-bin-without-hadoop.tgz+ Hadoop 3.1.2. The configuration using environment variables seemed to be problematic for the Spark Executor.
I compiled Spark with Hadoop 3.1.2 to solve this issue. See: https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn.

Spark submit from windows vs. linux

I'm experiencing with Spark (2.3.0) over Kubernetes in the past few days.
I've tested the example SparkPi from both linux and windows machines and found the linux spark-submit to run ok and give my proper results (spoiler: Pi is roughly 3.1402157010785055)
while on windows spark fails with class path issues (Could not find or load main class org.apache.spark.examples.SparkPi)
I've noticed that when running spark-submit from linux the classpath looks like that:
-cp ':/opt/spark/jars/*:/var/spark-data/spark-jars/spark-examples_2.11-2.3.0.jar:/var/spark-data/spark-jars/spark-examples_2.11-2.3.0.jar'
While on windows, the logs show a bit different version:
-cp ':/opt/spark/jars/*:/var/spark-data/spark-jars/spark-examples_2.11-2.3.0.jar;/var/spark-data/spark-jars/spark-examples_2.11-2.3.0.jar'
Note the : vs. ; in the classpath which I think is the cause for this issue.
Suggestions how to spark-submit from windows machine without the classpath issue?
This is our spark-submit command:
bin/spark-submit \
--master k8s://https://xxx.xxx.xxx.xxx:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.driver.memory=1G \
--conf spark.driver.cores=1 \
--conf spark.executor.instances=5 \
--conf spark.executor.cores=1 \
--conf spark.executor.memory=500m \
--conf spark.kubernetes.container.image=spark:2.3.0 \
http://xxx.xxx.xxx.xxx:9080/spark-examples_2.11-2.3.0.jar
Thanks
As a workaround, you can overwrite the environment variable SPARK_MOUNTED_CLASSPATH in the script $SPARK_HOME/kubernetes/dockerfiles/spark/entrypoint.sh, such that the wrong semicolon is replaced by the correct colon.
Then you need to rebuild the docker image, e.g., with $SPARK_HOME/bin/docker-image-tool.sh. After that, spark-submit on Windows should work.
See also Spark issue tracker: https://issues.apache.org/jira/browse/SPARK-24599

Exception in thread "main" org.apache.spark.SparkException: Must specify the driver container image

I am trying to do spark-submit on minikube(Kubernetes) from local machine CLI with command
spark-submit --master k8s://https://127.0.0.1:8001 --name cfe2
--deploy-mode cluster --class com.yyy.Test --conf spark.executor.instances=2 --conf spark.kubernetes.container.image docker.io/anantpukale/spark_app:1.1 local://spark-0.0.1-SNAPSHOT.jar
I have a simple spark job jar built on verison 2.3.0. I also have containerized it in docker and minikube up and running on virtual box.
Below is exception stack:
Exception in thread "main" org.apache.spark.SparkException: Must specify the driver container image at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep$$anonfun$3.apply(BasicDriverConfigurationStep.scala:51) at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep$$anonfun$3.apply(BasicDriverConfigurationStep.scala:51) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep.<init>(BasicDriverConfigurationStep.scala:51)
at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:82)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 2018-04-06 13:33:52 INFO ShutdownHookManager:54 - Shutdown hook called 2018-04-06 13:33:52 INFO ShutdownHookManager:54 - Deleting directory C:\Users\anant\AppData\Local\Temp\spark-6da93408-88cb-4fc7-a2de-18ed166c3c66
Look like bug with default value for parameters spark.kubernetes.driver.container.image, that must be spark.kubernetes.container.image. So try specify driver/executor container image directly:
spark.kubernetes.driver.container.image
spark.kubernetes.executor.container.image
From the source code, the only available conf options are:
spark.kubernetes.container.image
spark.kubernetes.driver.container.image
spark.kubernetes.executor.container.image
And I noticed that Spark 2.3.0 has changed a lot in terms of k8s implementation compared to 2.2.0. For example, instead of specifying driver and executor separately, the official starter's guide is to use a single image given to spark.kubernetes.container.image.
See if this works:
spark-submit \
--master k8s://http://127.0.0.1:8001 \
--name cfe2 \
--deploy-mode cluster \
--class com.oracle.Test \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=docker/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.authenticate.submission.oauthToken=YOUR_TOKEN \
--conf spark.kubernetes.authenticate.submission.caCertFile=PATH_TO_YOUR_CERT \
local://spark-0.0.1-SNAPSHOT.jar
The token and cert can be found on k8s dashboard. Follow the instructions to make Spark 2.3.0 compatible docker images.

Resources