Spark job in Kubernetes stuck in RUNNING state - apache-spark

I'm submitting Spark jobs in Kubernetes running locally (Docker desktop). I'm able to submit the jobs and see their final output in the screen.
However, even if they're completed, the driver and executor pods are still in a RUNNING state.
The base images used to submit the Spark jobs to kubernetes are the ones that come with Spark, as described in the docs.
This is what my spark-submit command looks like:
~/spark-2.4.3-bin-hadoop2.7/bin/spark-submit \
--master k8s://https://kubernetes.docker.internal:6443 \
--deploy-mode cluster \
--name my-spark-job \
--conf spark.kubernetes.container.image=my-spark-job \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.submission.waitAppCompletion=false \
local:///opt/spark/work-dir/my-spark-job.py
And this is what kubectl get pods returns:
NAME READY STATUS RESTARTS AGE
my-spark-job-1568669908677-driver 1/1 Running 0 11m
my-spark-job-1568669908677-exec-1 1/1 Running 0 10m
my-spark-job-1568669908677-exec-2 1/1 Running 0 10m

Figured it out. I forgot to stop the Spark Context. My script looks like this now, and at completion, the driver goes into Completed status and the drivers get deleted.
sc = SparkContext()
sqlContext = SQLContext(sc)
# code
sc.stop()

Related

Submitting a spark job to a kubernetes cluster using bitnami spark docker image

I have a local setup with minikube and I'm trying to use spark-submit to submit a job to a local Kubernetes. The idea here is to use my local machine's spark-submit to submit to the kubernetes master which will handle creating a spark cluster and taking it down when the work is finished.
I'm using the image bitnami/spark:3.2.1 and the following command:
./bin/spark-submit --master k8s://https://127.0.0.1:52388 \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=bitnami/spark:3.2.1 \
--class org.apache.spark.examples.JavaSparkPi \
--name spark-pi \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.2.1.jar
This does not seem to work and the logs in the spark driver are:
[...]
Caused by: java.io.IOException: Failed to connect to spark-master:7077
[...]
and
[...]
Caused by: java.net.UnknownHostException: spark-master
[...]
If I use the docker-image-tool.sh to build a custom spark docker image with the python bindings and use that, it works perfectly. How is bitnami's image special and why doesn't it recognise that the master in this case is kubernetes?
I also tried using the option conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://127.0.0.1:7077 when submitting but the error was similar to above.

Dataproc cluster runs a maximum of 5 jobs in parallel, ignoring available resources

I am loading data from 1200 MS SQL Server tables into BigQuery with a spark job. It's all part of an orchestrated ETL process where the spark job consists of scala code which receives messages from PubSub. So 1200 messages are being received over a period of around an hour. Each message triggers code (async) which reads data from a table, with minor transformations, and writes to BigQuery. The process itself works fine. My problem is that the number of active jobs in spark never goes above 5, in spite of a lot of "jobs" waiting and plenty of resources being available.
I've tried upping the spark.driver.cores to 30, but there is no change. Also, this setting, while visible in the Google Console, doesn't seem to make it through to the actual spark job (when viewed in the spark UI). Here is the spark job running in the console:
And here are the spark job properties:
It's a pretty big cluster, with plenty of resources to spare:
Here is the command line for creating the cluster:
gcloud dataproc clusters create odsengine-cluster \
--properties dataproc:dataproc.conscrypt.provider.enable=false,spark:spark.executor.userClassPathFirst=true,spark:spark.driver.userClassPathFirst=true \
--project=xxx \
--region europe-north1 \
--zone europe-north1-a \
--subnet xxx \
--master-machine-type n1-standard-4 \
--worker-machine-type m1-ultramem-40 \
--master-boot-disk-size 30GB \
--worker-boot-disk-size 2000GB \
--image-version 1.4 \
--master-boot-disk-type=pd-ssd \
--worker-boot-disk-type=pd-ssd \
--num-workers=2 \
--scopes cloud-platform \
--initialization-actions gs://xxx/cluster_init/init_actions.sh
And the command line for submitting the spark job:
gcloud dataproc jobs submit spark \
--project=velliv-dwh-development \
--cluster odsengine-cluster \
--region europe-north1 \
--jars gs://velliv-dwh-dev-bu-dcaods/OdsEngine_2.11-0.1.jar \
--class Main \
--properties \
spark.executor.memory=35g,\
spark.executor.cores=2,\
spark.executor.memoryOverhead=2g,\
spark.dynamicAllocation.enabled=true,\
spark.shuffle.service.enabled=true,\
spark.driver.cores=30\
-- yarn
I am aware that I could look into using partitioning to spread out the load of large individual tables, and I've also had that working in another scenario with success, but in this case I just want to load many tables at once without partitioning each table.
Regarding "a lot of jobs waiting and plenty of resources being available", I'd suggest you check Spark log, YARN web UI and log to see if there are pending applications and why. It also helps to check the cluster web UI's monitoring tab for YARN resource utilization.
Regarding the spark.driver.cores problem, it is effective only in cluster mode, see this doc:
Number of cores to use for the driver process, only in cluster mode
Spark driver runs in client mode by default in Dataproc, which means the driver runs on the master node outside of YARN. You can run the driver in cluster mode as a YARN container with property spark.submit.deployMode=cluster.

Spark executor pods quickly in error after their creation using kubernetes as master

When I launch the SparkPi example on a self-hosted kubernetes cluster, the executor pods are quickly created -> have an error status -> are deleted -> are replaced by new executors pods.
I tried the same command on a Google Kubernetes Engine with success. I check the RBAC rolebinding to make sure that the service account has right to create the pod.
Guessing when the next executor pod will be ready, I can see using kubectl describe pod <predicted_executor_pod_with_number> that the pod is actually created:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1s default-scheduler Successfully assigned default/examplepi-1563878435019-exec-145 to slave-node04
Normal Pulling 0s kubelet, slave-node04 Pulling image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Pulled 0s kubelet, slave-node04 Successfully pulled image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Created 0s kubelet, slave-node04 Created container executor
This is my spark-submit call:
/opt/spark/bin/spark-submit \
--master k8s://https://mycustomk8scluster:6443 \
--name examplepi \
--deploy-mode cluster \
--driver-memory 2G \
--executor-memory 2G \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/work-dir/log4j.properties \
--conf spark.kubernetes.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.executor.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.pod.name=pi-driver \
--conf spark.driver.allowMultipleContexts=true \
--conf spark.kubernetes.local.dirs.tmpfs=true \
--class com.olameter.sdi.imagery.IngestFromGrpc \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
I expect that the required executor (2) should be created. If the driver script cannot create it, I would at least expect some log to be able to diagnose the issue.
The issue was related to Hadoop + Spark integration. I was using Spark binary without Hadoop spark-2.4.3-bin-without-hadoop.tgz+ Hadoop 3.1.2. The configuration using environment variables seemed to be problematic for the Spark Executor.
I compiled Spark with Hadoop 3.1.2 to solve this issue. See: https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn.

What is happening when starting a Spark application on Kubernetes

I read this: Running Spark on Kubernetes.
I want to know more details about the interaction between Kubernetes Controller/Scheduler and Spark runtime when launching a Spark job on K8s.
Specially, assuming we launch an Spark app by :
bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--..............
My question is: the K8s may not be able to allocate 5 executors (or called containers/pods) immediately due to unavailability of cluster resources at the moment the Spark app is launched. Which way does Spark app take? (1) Spark starts running tasks as soon as possible when there is at least one executor is allocated. (2) Spark won't launch any tasks until all of the 5 executors have been allocated.
If you know Hadoop YARN, it would be great if you could also answer the question in the scenario of running Spark app on Hadoop YARN(DynamicAllocation Disabled) and point out the difference.

spark executors on mesos are unbalanced when setting role quota

I am going to run spark job on mesos, and I want to limit resource of specified role.
I try to run 5 executors in my job
spark-shell \
--master mesos://zk://host1:2181,host2:2181,host3:2181/mesos \
--conf spark.executor.cores=1 \
--conf spark.cores.max=5 \
--conf spark.mesos.role=myrole
It works well that I can get many resource offers to distribute executors when quota setting is disable.
18/01/25 13:35:49 DEBUG MesosCoarseGrainedSchedulerBackend: Received 4 resource offers.
If I enable quota setting(http://mesos.apache.org/documentation/latest/quota/), then I always get only 1 resource offer.
18/01/25 13:36:31 DEBUG MesosCoarseGrainedSchedulerBackend: Received 1 resource offers.
I have no idea what happened there
My environment:
spark 2.2
mesos 1.4.1 (master*3, slave*5)
CentOS 7.3

Resources