How to use local Docker images when submitting Spark jobs (2.3) natively to Kubernetes? - apache-spark

I am trying to submit a Spark Job on Kubernetes natively using Apache Spark 2.3.
When I use a Docker image on Docker Hub (for Spark 2.2), it works:
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
However, when I try to build a local Docker image,
sudo docker build -t spark:2.3 -f kubernetes/dockerfiles/spark/Dockerfile .
and submit the job as:
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark:2.3 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
I get the following error, i.e. "repository docker.io/spark not found: does not exist or no pull access" (reason=ErrImagePull):
status: [ContainerStatus(containerID=null, image=spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=rpc error: code = 2 desc = repository docker.io/spark not found: does not exist or no pull access, reason=ErrImagePull, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-03-15 11:09:54 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: spark-pi-3a1a6e8ce615395fa7df81eac06d58ed-driver
namespace: default
labels: spark-app-selector -> spark-8d9fdaba274a4eb69e28e2a242fe86ca, spark-role -> driver
pod uid: 5271602b-2841-11e8-a78e-fa163ed09d5f
creation time: 2018-03-15T11:09:25Z
service account name: default
volumes: default-token-v4vhk
node name: mlaas-p4k3djw4nsca-minion-1
start time: 2018-03-15T11:09:25Z
container images: spark:2.3
phase: Pending
status: [ContainerStatus(containerID=null, image=spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=Back-off pulling image "spark:2.3", reason=ImagePullBackOff, additionalProperties={}), additionalProperties={}), additionalProperties={})]
Also, I tried to run a local Docker registry as described in:
https://docs.docker.com/registry/deploying/#run-a-local-registry
docker run -d -p 5000:5000 --restart=always --name registry registry:2
sudo docker tag spark:2.3 localhost:5000/spark:2.3
sudo docker push localhost:5000/spark:2.3
I can do this successfully:
docker pull localhost:5000/spark:2.3
However, when I submit the Spark job:
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=localhost:5000/spark:2.3 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
I again got ErrImagePull:
status: [ContainerStatus(containerID=null, image=localhost:5000/spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=rpc error: code = 2 desc = Error while pulling image: Get http://localhost:5000/v1/repositories/spark/images: dial tcp [::1]:5000: getsockopt: connection refused, reason=ErrImagePull, additionalProperties={}), additionalProperties={}), additionalProperties={})]
Is there a way in Spark 2.3 to use local Docker images when submitting jobs natively to Kubernetes?
Thank you in advance.

I guess you are using something like minikube to set up a local Kubernetes cluster, which in most cases uses a virtual machine to spawn the cluster.
So when Kubernetes tries to pull an image from a localhost address, it connects to the virtual machine's local address, not to your computer's address. Moreover, your local registry is bound only to localhost and is not accessible from the virtual machine.
The idea of the fix is to make your local Docker registry accessible to your Kubernetes cluster and to allow pulling images from a local insecure registry.
So, first of all, bind your docker registry on your PC to all interfaces:
docker run -d -p 0.0.0.0:5000:5000 --restart=always --name registry registry:2
Then check the local IP address of your PC. It will be something like 172.X.X.X or 10.X.X.X. How to check it depends on your OS, so just look it up if you don't know how to get it.
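One way to look it up (an illustration, not part of the original answer; the first command assumes a Linux host):
hostname -I | awk '{print $1}'
# or, on macOS:
ipconfig getifaddr en0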
Then start your minikube with an additional option:
minikube start --insecure-registry="<your-local-ip-address>:5000", where 'your-local-ip-address' is your local IP address.
Now you can try to run the Spark job with the new registry address, and Kubernetes will be able to pull your image:
spark.kubernetes.container.image=<your-local-ip-address>:5000/spark:2.3
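Putting it together, the submit command from the question would then look roughly like this (a sketch: 172.16.0.10 is only a placeholder for your local IP address; the image already pushed to localhost:5000 is served by the same registry, so only the image reference changes):
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=172.16.0.10:5000/spark:2.3 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar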

Related

Submitting a spark job to a kubernetes cluster using bitnami spark docker image

I have a local setup with minikube and I'm trying to use spark-submit to submit a job to a local Kubernetes. The idea here is to use my local machine's spark-submit to submit to the kubernetes master which will handle creating a spark cluster and taking it down when the work is finished.
I'm using the image bitnami/spark:3.2.1 and the following command:
./bin/spark-submit --master k8s://https://127.0.0.1:52388 \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=bitnami/spark:3.2.1 \
--class org.apache.spark.examples.JavaSparkPi \
--name spark-pi \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.2.1.jar
This does not seem to work and the logs in the spark driver are:
[...]
Caused by: java.io.IOException: Failed to connect to spark-master:7077
[...]
and
[...]
Caused by: java.net.UnknownHostException: spark-master
[...]
If I use the docker-image-tool.sh to build a custom spark docker image with the python bindings and use that, it works perfectly. How is bitnami's image special and why doesn't it recognise that the master in this case is kubernetes?
I also tried the option --conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://127.0.0.1:7077 when submitting, but the error was similar to the above.

Spark executor pods quickly in error after their creation using kubernetes as master

When I launch the SparkPi example on a self-hosted Kubernetes cluster, the executor pods are quickly created -> get an error status -> are deleted -> are replaced by new executor pods.
I tried the same command on Google Kubernetes Engine with success. I checked the RBAC rolebinding to make sure that the service account has the right to create pods.
By guessing when the next executor pod will be ready, I can see using kubectl describe pod <predicted_executor_pod_with_number> that the pod is actually created:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1s default-scheduler Successfully assigned default/examplepi-1563878435019-exec-145 to slave-node04
Normal Pulling 0s kubelet, slave-node04 Pulling image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Pulled 0s kubelet, slave-node04 Successfully pulled image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Created 0s kubelet, slave-node04 Created container executor
This is my spark-submit call:
/opt/spark/bin/spark-submit \
--master k8s://https://mycustomk8scluster:6443 \
--name examplepi \
--deploy-mode cluster \
--driver-memory 2G \
--executor-memory 2G \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/work-dir/log4j.properties \
--conf spark.kubernetes.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.executor.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.pod.name=pi-driver \
--conf spark.driver.allowMultipleContexts=true \
--conf spark.kubernetes.local.dirs.tmpfs=true \
--class com.olameter.sdi.imagery.IngestFromGrpc \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
I expect the required executors (2) to be created. If the driver script cannot create them, I would at least expect some log to be able to diagnose the issue.
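As a side note (not part of the fix below), one way to capture diagnostics from such short-lived executors is to watch the pods and dump a failing one before the driver replaces it, e.g.:
kubectl get pods -w
# while a failed executor pod still exists:
kubectl logs examplepi-1563878435019-exec-145
kubectl describe pod examplepi-1563878435019-exec-145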
The issue was related to the Hadoop + Spark integration. I was using a Spark binary built without Hadoop (spark-2.4.3-bin-without-hadoop.tgz) together with Hadoop 3.1.2. The configuration via environment variables seemed to be problematic for the Spark executor.
I compiled Spark with Hadoop 3.1.2 to solve this issue. See: https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn.
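Roughly, the build looks like this (a sketch only; the exact profiles and flags depend on the Spark version, see the linked page):
# build a Spark distribution against Hadoop 3.1.2 so the executors ship with
# matching Hadoop classes
./dev/make-distribution.sh --name with-hadoop-3.1 --tgz \
-Pkubernetes -Dhadoop.version=3.1.2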

Spark on K8's Issues loading jar

I am trying to run a sample Spark application (provided in the Spark examples jar) on Kubernetes and trying to understand the behavior. In this process, I did the following:
Built a running Kubernetes cluster with 3 nodes (1 master and 2 child nodes) with adequate resources (10 cores, 64 GB memory, 500 GB disk). Note that I don't have internet access on my nodes.
Installed the Spark distribution spark-2.3.3-bin-hadoop2.7.
As there is no internet access on the nodes, I preloaded a Spark image (from gcr.io/cloud-solutions-images/spark:v2.3.0-gcs) into Docker on the node running the Kubernetes master.
Ran spark-submit to k8s as follows:
./bin/spark-submit --master k8s://https://test-k8:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar
However, it fails with the below error,
Error: Could not find or load main class org.apache.spark.examples.SparkPi
In regard to the above, I have the following questions:
Do we need to provide Kubernetes with a distribution of Spark? And is that what we are doing with the following?
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs
If I have my own Spark example, say for processing events from Kafka, what should be my approach?
Any help in debugging the above error and answering my follow-up questions would be appreciated.
spark.kubernetes.container.image should be an image that has both the Spark binaries and the application code. In my case, as I don't have access to the internet from my nodes, doing the following let the Spark driver pick up the correct jar.
So, this is what I did:
On my local computer, I ran a docker build:
docker build -t spark_pi_test:v1.0 -f kubernetes/dockerfiles/spark/Dockerfile .
The above built a Docker image on my local computer.
Then I tar'd the built Docker image:
docker save spark_pi_test:v1.0 > spark_pi_test_v1.0.tar
scp'd the tar ball to all 3 kube nodes.
docker load the tar ball on all 3 kube nodes.
docker load < spark_pi_test_v1.0.tar
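For reference, the scp and docker load steps could be scripted roughly like this (the node hostnames kube-node1..3 are placeholders):
for node in kube-node1 kube-node2 kube-node3; do
  scp spark_pi_test_v1.0.tar "$node":/tmp/
  ssh "$node" 'docker load < /tmp/spark_pi_test_v1.0.tar'
done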
Then I submitted the spark job as follows,
./bin/spark-submit --master k8s://https://test-k8:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark_pi_test:v1.0 \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar 100000
The above jar path is the path inside the Docker container.
For reference, the Dockerfile is here:
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile

spark on k8s - Error 'Invalid initial heap size: -Xms'

I am trying to use spark on k8s.
Launched minikube
minikube --memory 8192 --cpus 2 start
and built the Spark master version (freshly fetched), built a Docker image, pushed it to Docker Hub, and issued this command:
$SPARK_HOME/bin/spark-submit \
--master k8s://192.168.99.100:8443 \
--deploy-mode cluster --name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=ruseel/spark:testing \
local:///tmp/spark-examples_2.11-2.4.0-SNAPSHOT-shaded.jar
But pod log said
...
+ case "$SPARK_K8S_CMD" in
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
+ exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -cp ':/opt/spark/jars/*' -Xms -Xmx -Dspark.driver.bindAddress=172.17.0.4
Invalid initial heap size: -Xms
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
How can I run this command?
Spark master's new entrypoint.sh no longer uses $SPARK_DRIVER_MEMORY.
It seems to have been removed in this commit, so this error is no longer raised for me.
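For context, the failure is just an empty shell variable being expanded into a bare JVM flag; a minimal illustration (not the actual entrypoint.sh):
SPARK_DRIVER_MEMORY=""
echo java -Xms${SPARK_DRIVER_MEMORY} -Xmx${SPARK_DRIVER_MEMORY}
# prints "java -Xms -Xmx"; a bare -Xms makes the JVM fail with
# "Invalid initial heap size: -Xms"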

starting container process caused \"exec: \\\"driver\\\": executable file not found in $PATH\"\n"

I have a Spring-based Spark 2.3.0 application. I am trying to do a spark-submit on Kubernetes (minikube).
I have VirtualBox with Docker and minikube running.
opt/spark/bin/spark-submit \
--master k8s://https://192.168.99.101:8443 \
--name cfe2 \
--deploy-mode cluster \
--class com.yyy.Application \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=docker.io/anantpukale/spark_app:1.3 \
local://CashFlow-spark2.3.0-shaded.jar
Below is the stack trace:
start time: N/A
container images: N/A
phase: Pending
status: []
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: docker.io/anantpukale/spark_app:1.3
phase: Pending
status: [ContainerStatus(containerID=null, image=docker.io/anantpukale/spark_app:1.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO Client:54 - Waiting for application cfe2 to finish...
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: anantpukale/spark_app:1.3
phase: Failed
status: [ContainerStatus(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, image=anantpukale/spark_app:1.3, imageID=docker-pullable://anantpukale/spark_app#sha256:f61b3ef65c727a3ebd8a28362837c0bc90649778b668f78b6a33b7c0ce715227, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, exitCode=127, finishedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), message=invalid header field value **"oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"driver\\\": executable file not found in $PATH\"\n"**, reason=ContainerCannotRun, signal=null, startedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: anantpukale/spark_app:1.3
Container state: Terminated
Exit code: 127
2018-04-11 09:57:52 INFO Client:54 - Application cfe2 finished.
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Shutdown hook called
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-d5813d6e-a4af-4bf6-b1fc-dc43c75cd643
Below is an image of my Dockerfile.
The error trace suggests that the container is being started with a command ("driver") that my Docker image does not handle.
[Dockerfile image]
I was running into this issue. It is related to the Docker image's ENTRYPOINT. In Spark 2.3.0, when using Kubernetes, there is now an example Dockerfile which uses a specific script as the ENTRYPOINT, found in kubernetes/dockerfiles/. If the Docker image doesn't use that specific script as the ENTRYPOINT, the container doesn't start up properly. See the Spark Kubernetes Docker documentation.
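A quick way to verify which ENTRYPOINT an image actually uses (an inspection step added for illustration; the stock Spark 2.3 image should point at /opt/entrypoint.sh):
docker inspect --format '{{.Config.Entrypoint}}' docker.io/anantpukale/spark_app:1.3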
In your Dockerfile, use ENV PATH="/opt/spark/bin:${PATH}" instead of your line.
Is it possible for you to log into the container using
#>docker run -it --rm docker.io/anantpukale/spark_app:1.3 sh
and try running the main program or command that you want to submit?
Based on that output we can try to investigate further.
Along with the changes suggested by #hichamx, the code below worked for me to overcome the "exec: \"driver\"" issue.
spark-submit \
--master k8s://http://127.0.0.1:8001 \
--name cfe2 \
--deploy-mode cluster \
--class com.oracle.Test \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=docker/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.driver.container.image=docker.io/kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.container.image=docker.io/kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
local://spark-0.0.1-SNAPSHOT.jar
Though this gave Error: Exit Code: 127 and spark-kubernetes-driver terminated.
