I have a local setup with minikube and I'm trying to use spark-submit to submit a job to a local Kubernetes. The idea here is to use my local machine's spark-submit to submit to the kubernetes master which will handle creating a spark cluster and taking it down when the work is finished.
I'm using the image bitnami/spark:3.2.1 and the following command:
./bin/spark-submit --master k8s://https://127.0.0.1:52388 \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=bitnami/spark:3.2.1 \
--class org.apache.spark.examples.JavaSparkPi \
--name spark-pi \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.2.1.jar
This does not seem to work and the logs in the spark driver are:
[...]
Caused by: java.io.IOException: Failed to connect to spark-master:7077
[...]
and
[...]
Caused by: java.net.UnknownHostException: spark-master
[...]
If I use the docker-image-tool.sh to build a custom spark docker image with the python bindings and use that, it works perfectly. How is bitnami's image special and why doesn't it recognise that the master in this case is kubernetes?
I also tried using the option conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://127.0.0.1:7077 when submitting but the error was similar to above.
When I launch the SparkPi example on a self-hosted kubernetes cluster, the executor pods are quickly created -> have an error status -> are deleted -> are replaced by new executors pods.
I tried the same command on a Google Kubernetes Engine with success. I check the RBAC rolebinding to make sure that the service account has right to create the pod.
Guessing when the next executor pod will be ready, I can see using kubectl describe pod <predicted_executor_pod_with_number> that the pod is actually created:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1s default-scheduler Successfully assigned default/examplepi-1563878435019-exec-145 to slave-node04
Normal Pulling 0s kubelet, slave-node04 Pulling image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Pulled 0s kubelet, slave-node04 Successfully pulled image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Created 0s kubelet, slave-node04 Created container executor
This is my spark-submit call:
/opt/spark/bin/spark-submit \
--master k8s://https://mycustomk8scluster:6443 \
--name examplepi \
--deploy-mode cluster \
--driver-memory 2G \
--executor-memory 2G \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/work-dir/log4j.properties \
--conf spark.kubernetes.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.executor.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.pod.name=pi-driver \
--conf spark.driver.allowMultipleContexts=true \
--conf spark.kubernetes.local.dirs.tmpfs=true \
--class com.olameter.sdi.imagery.IngestFromGrpc \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
I expect that the required executor (2) should be created. If the driver script cannot create it, I would at least expect some log to be able to diagnose the issue.
The issue was related to Hadoop + Spark integration. I was using Spark binary without Hadoop spark-2.4.3-bin-without-hadoop.tgz+ Hadoop 3.1.2. The configuration using environment variables seemed to be problematic for the Spark Executor.
I compiled Spark with Hadoop 3.1.2 to solve this issue. See: https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn.
I am trying to run a sample spark application(provided in the spark examples jar) on kubernetes and trying to understand the behavior. In this process, I did the following,
Built a running kubernetes cluster with 3 nodes (1 master and 2 child) with adequate resources(10 cores, 64Gigs mem, 500GB disk). Note that I don't have internet access on my nodes.
Installed Spark distribution - spark-2.3.3-bin-hadoop2.7
As there is no internet access on the node, I preloaded a spark image( from gcr.io/cloud-solutions-images/spark:v2.3.0-gcs) into the docker on the node running kubernetes master
Running spark-submit to k8's as follows,
./bin/spark-submit --master k8s://https://test-k8:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar
However, it fails with the below error,
Error: Could not find or load main class org.apache.spark.examples.SparkPi
In regards to the above I have below questions:
Do we need to provide Kubernetes a distribution of spark? and is that what we are doing with?
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs
If I have my own spark example, for say processing events from Kafka. What should be my approach?
Any help in debugging the above Error and answering my follow up questions is thankful.
spark.kubernetes.container.image should be an image that has both the spark binaries & the application code. In my case, as I don't have access to the internet from my nodes. Doing the following let spark driver pick the correct jar.
So, this is what I did,
In my local computer, I did a docker build
docker build -t spark_pi_test:v1.0 -f kubernetes/dockerfiles/spark/Dockerfile .
Above built me a docker image in my local computer.
tar'd the built docker image,
docker save spark_pi_test:v1.0 > spark_pi_test_v1.0.tar
scp'd the tar ball to all 3 kube nodes.
docker load the tar ball on all 3 kube nodes.
docker load < spark_pi_test_v1.0.tar
Then I submitted the spark job as follows,
./bin/spark-submit --master k8s://https://test-k8:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.container.image=spark_pi_test:v1.0 --conf spark.kubernetes.driver.pod.name=spark-pi-driver --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent local:///opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar 100000
The above jar path is the path in the docker container.
For reference to DockerFile,
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
I am trying to use spark on k8s.
Launched minikube
minikube --memory 8192 --cpus 2 start
and build spark master version (fresh fetched) and build docker image and push to docker hub and issued command.
$SPARK_HOME/bin/spark-submit \
--master k8s://192.168.99.100:8443 \
--deploy-mode cluster --name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=ruseel/spark:testing \
local:///tmp/spark-examples_2.11-2.4.0-SNAPSHOT-shaded.jar
But pod log said
...
+ case "$SPARK_K8S_CMD" in
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[#]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
+ exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -cp ':/opt/spark/jars/*' -Xms -Xmx -Dspark.driver.bindAddress=172.17.0.4
Invalid initial heap size: -Xms
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
How can I run this command?
Spark master's new entrypoint.sh is not using $SPARK_DRIVER_MEMORY.
It seems to be removed in this commit. So this error doesn't raised anymore for me.
I have spring based based spark 2.3.0 application. I am trying to do spark submit on kubernetes(minikube).
I have Virtual Box with docker and minikube running.
opt/spark/bin/spark-submit --master k8s://https://192.168.99.101:8443 --name cfe2 --deploy-mode cluster --class com.yyy.Application --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=docker.io/anantpukale/spark_app:1.3 local://CashFlow-spark2.3.0-shaded.jar
Below is the stack trace:
start time: N/A
container images: N/A
phase: Pending
status: []
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: docker.io/anantpukale/spark_app:1.3
phase: Pending
status: [ContainerStatus(containerID=null, image=docker.io/anantpukale/spark_app:1.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO Client:54 - Waiting for application cfe2 to finish...
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: anantpukale/spark_app:1.3
phase: Failed
status: [ContainerStatus(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, image=anantpukale/spark_app:1.3, imageID=docker-pullable://anantpukale/spark_app#sha256:f61b3ef65c727a3ebd8a28362837c0bc90649778b668f78b6a33b7c0ce715227, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, exitCode=127, finishedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), message=invalid header field value **"oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"driver\\\": executable file not found in $PATH\"\n"**, reason=ContainerCannotRun, signal=null, startedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: anantpukale/spark_app:1.3
Container state: Terminated
Exit code: 127
2018-04-11 09:57:52 INFO Client:54 - Application cfe2 finished.
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Shutdown hook called
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-d5813d6e-a4af-4bf6-b1fc-dc43c75cd643
Below is image of my docker file.
Error trace suggest that something in docker i have triggered with command "docker".
dockerfile
I was running into this issue. It is related to the docker image ENTRYPOINT. In spark 2.3.0 when using Kubernetes there now is an example of a Dockerfile which uses a specific script in the ENTRYPOINT found in kubernetes/dockerfiles/. If the docker image doesn't use that specific script as the ENTRYPOINT then the container doesn't start up properly. Spark Kubernetes Docker documentation
in you Dockerfile use ENV PATH="/opt/spark/bin:${PATH}" instead of your line.
Is it possible for you to login into the container using
#>docker run -it --rm docker.io/anantpukale/spark_app:1.3 sh
and try running the main program or command that you want to submit.
Based on this output we can try to investigate further .
along with #hichamx suggested changes with below code worked for me to overcome "exec: \"driver\" issue.
spark-submit \
--master k8s://http://127.0.0.1:8001 \
--name cfe2 \
--deploy-mode cluster \
--class com.oracle.Test \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=docker/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.driver.container.image=docker.io/kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.container.image=docker.io/kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
local://spark-0.0.1-SNAPSHOT.jar
Though this gave Error: Exit Code :127 and spark-kubernetes-driver terminated.