I am trying to run simple Spark code on a Kubernetes cluster using the native Kubernetes deployment feature in Spark 2.3.
I have a Kubernetes cluster running. At this point, the Spark code does not read or write any data: it creates an RDD from a list and prints the result, just to validate that Spark can run on Kubernetes. I also copied the Spark app jar into the container image.
Below is the command I run, followed by its output.
bin/spark-submit \
--master k8s://https://k8-master \
--deploy-mode cluster \
--name sparkapp \
--class com.sparrkonk8.rdd.MockWordCount \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=myapp/sparkapp:1.0.0 \
local:///SparkApp.jar
2018-03-06 10:31:28 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: sparkapp-6e475a6ae18d3b7a89ca2b5f6ae7aae4-driver
namespace: default
labels: spark-app-selector -> spark-9649dd66e9a946d989e2136d342ef249, spark-role -> driver
pod uid: 6d3e98cf-2153-11e8-85af-1204f474c8d2
creation time: 2018-03-06T15:31:23Z
service account name: default
volumes: default-token-vwxvr
node name: 192-168-1-1.myapp.engg.com
start time: 2018-03-06T15:31:23Z
container images: dockerhub.com/myapp/sparkapp:1.0.0
phase: Failed
status: [ContainerStatus(containerID=docker://3617a400e4604600d5fcc69df396facafbb2d9cd485a63bc324c1406e72f0d35, image=dockerhub.com/myapp/sparkapp:1.0.0, imageID=docker-pullable://dockerhub.com/sparkapp@sha256:f051d86384422dff3e8c8a97db823de8e62af3ea88678da4beea3f58cdb924e5, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://3617a400e4604600d5fcc69df396facafbb2d9cd485a63bc324c1406e72f0d35, exitCode=1, finishedAt=Time(time=2018-03-06T15:31:24Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2018-03-06T15:31:24Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-03-06 10:31:28 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: myapp/sparkapp:1.0.0
Container state: Terminated
Exit code: 1
Below is the Spark configuration the driver pod was submitted with; I pulled it from the Kubernetes UI. @TobiSH, let me know if this helps with troubleshooting my issue.
SPARK_DRIVER_MEMORY: 1g
SPARK_DRIVER_CLASS: com.sparrkonk8.rdd.MockWordCount
SPARK_DRIVER_ARGS:
SPARK_DRIVER_BIND_ADDRESS:
SPARK_MOUNTED_CLASSPATH: /SparkApp.jar:/SparkApp.jar
SPARK_JAVA_OPT_0: -Dspark.kubernetes.executor.podNamePrefix=sparkapp-028d46fa109e309b8dfe1a4eceb46b61
SPARK_JAVA_OPT_1: -Dspark.app.name=sparkapp
SPARK_JAVA_OPT_2: -Dspark.kubernetes.driver.pod.name=sparkapp-028d46fa109e309b8dfe1a4eceb46b61-driver
SPARK_JAVA_OPT_3: -Dspark.executor.instances=5
SPARK_JAVA_OPT_4: -Dspark.submit.deployMode=cluster
SPARK_JAVA_OPT_5: -Dspark.driver.blockManager.port=7079
SPARK_JAVA_OPT_6: -Dspark.kubernetes.container.image=docker.com/myapp/sparkapp:1.0.0
SPARK_JAVA_OPT_7: -Dspark.app.id=spark-5e3beb5109174f40a84635b786789c30
SPARK_JAVA_OPT_8: -Dspark.master= k8s://https://k8-master
SPARK_JAVA_OPT_9: -Dspark.driver.host=sparkapp-028d46fa109e309b8dfe1a4eceb46b61-driver-svc.default.svc
SPARK_JAVA_OPT_10: -Dspark.jars=/opt/spark/work-dir/SparkApp.jar,/opt/spark/work-dir/SparkApp.jar
SPARK_JAVA_OPT_11: -Dspark.driver.port=7078
Since there are no logs, the container is crashing immediately after it is created. I recommend first running the app with a local master to make sure everything on the Spark side works, and then trying again with Kubernetes as the master.
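A minimal sketch of that two-step check, assuming you drive spark-submit from Python. The helper build_submit_cmd and the host jar path target/SparkApp.jar are hypothetical; the class, image, and master come from the question. The local run drops --deploy-mode cluster because spark-submit rejects cluster mode with a local master, and it uses a path on your own machine because local:// refers to a path inside the container image.

```python
import shlex
from typing import Dict, List, Optional


def build_submit_cmd(
    master: str,
    app_jar: str,
    main_class: str,
    name: str,
    conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build a spark-submit argument list (hypothetical helper).

    Cluster deploy mode is only added for k8s:// masters, because
    spark-submit rejects --deploy-mode cluster with a local master.
    """
    cmd = ["bin/spark-submit", "--master", master]
    if master.startswith("k8s://"):
        cmd += ["--deploy-mode", "cluster"]
    cmd += ["--name", name, "--class", main_class]
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)
    return cmd


# Step 1: local master, jar on your machine (the path is an assumption).
local_cmd = build_submit_cmd(
    "local[*]", "target/SparkApp.jar", "com.sparrkonk8.rdd.MockWordCount", "sparkapp"
)
# Step 2: the same app on Kubernetes, using the jar baked into the image.
k8s_cmd = build_submit_cmd(
    "k8s://https://k8-master",
    "local:///SparkApp.jar",
    "com.sparrkonk8.rdd.MockWordCount",
    "sparkapp",
    {
        "spark.executor.instances": "5",
        "spark.kubernetes.container.image": "myapp/sparkapp:1.0.0",
    },
)
print(shlex.join(local_cmd))
print(shlex.join(k8s_cmd))
```

Either list can be passed to subprocess.run(cmd, check=True) from the Spark installation directory; only move on to the second command once the first one prints the expected result.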
Related
I am trying to run the following command to submit a Spark application to a Kubernetes cluster:
/opt/spark/bin/spark-submit \
--master k8s://https://<spark-master-ip>:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=newfrontdocker/spark:v3.0.1-j14 \
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar 100
When I run it, the driver container runs briefly, then terminates with this error:
container status:
container name: spark-kubernetes-driver
container image: newfrontdocker/spark:v3.0.1-j14
container state: terminated
container started at: 2021-07-17T11:49:46Z
container finished at: 2021-07-17T11:49:48Z
exit code: 101
termination reason: Error
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Application status for spark-c15c11340f204794b51bf8d79397bf9e (phase: Failed)
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
container name: spark-kubernetes-driver
container image: newfrontdocker/spark:v3.0.1-j14
container state: terminated
container started at: 2021-07-17T11:49:46Z
container finished at: 2021-07-17T11:49:48Z
exit code: 101
termination reason: Error
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Application spark-pi with submission ID default:spark-pi-4ed9627ab44c778d-driver finished
21/07/17 06:49:14 INFO ShutdownHookManager: Shutdown hook called
21/07/17 06:49:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-3d1a7ff3-1dc9-4db0-acfa-ed52706122b6
What does exit code 101 mean, and what do I need to fix so that I can run Spark apps on the Kubernetes cluster?
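In my case, the cause was an incorrect jar path. Exit code 101 is what spark-submit returns when it cannot load the application's main class, which is consistent with the jar not being found in the image.
I suspect the path should be:
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar
(3.0.1 instead of 3.1.1), because you are using the newfrontdocker/spark:v3.0.1-j14 image.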
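To catch this kind of mismatch before submitting, here is a rough sketch that compares the Spark version in the examples jar name with the version in the image tag. It assumes the tag starts with the Spark version (as v3.0.1-j14 does); the function names are illustrative and not part of any Spark tooling.

```python
import re
from typing import Optional

# Examples jars are named spark-examples_<scala version>-<spark version>.jar.
JAR_RE = re.compile(r"spark-examples_\d+\.\d+-(\d+\.\d+\.\d+)\.jar$")
# Assumption: the image tag starts with the Spark version, e.g. "v3.0.1-j14".
TAG_RE = re.compile(r"v?(\d+\.\d+\.\d+)")


def jar_spark_version(jar_uri: str) -> Optional[str]:
    match = JAR_RE.search(jar_uri)
    return match.group(1) if match else None


def image_spark_version(image: str) -> Optional[str]:
    # Take the text after the last ':' (the tag); an untagged image usually yields None.
    tag = image.rsplit(":", 1)[-1]
    match = TAG_RE.match(tag)
    return match.group(1) if match else None


def versions_match(jar_uri: str, image: str) -> bool:
    jar_version = jar_spark_version(jar_uri)
    return jar_version is not None and jar_version == image_spark_version(image)


image = "newfrontdocker/spark:v3.0.1-j14"
print(versions_match("local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar", image))  # False
print(versions_match("local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar", image))  # True
```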
When I try to submit my app with spark-submit, I get the following error.
Please help me resolve the problem.
Error:
pod name: newdriver
namespace: default
labels: spark-app-selector -> spark-a17960c79886423383797eaa77f9f706, spark-role -> driver
pod uid: 0afa41ae-4e4c-47be-86a3-1ef77739506c
creation time: 2020-05-06T14:11:29Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-tks2g
node name: minikube
start time: 2020-05-06T14:11:29Z
phase: Running
container status:
container name: spark-kubernetes-driver
container image: spark-py:v3.0
container state: running
container started at: 2020-05-06T14:11:31Z
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.17.0.2:8443/api/v1/namespaces/default/pods. Message: pods "newtrydriver" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=pods, name=newtrydriver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "newtrydriver" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:449)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:819)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:334)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
at org.apache.spark.deploy.k8s.submit.Client.$anonfun$run$2(KubernetesClientApplication.scala:130)
at org.apache.spark.deploy.k8s.submit.Client.$anonfun$run$2$adapted(KubernetesClientApplication.scala:129)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:129)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/05/06 14:11:34 INFO ShutdownHookManager: Shutdown hook called
20/05/06 14:11:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-b7ea9c80-6040-460a-ba43-5c6e656d3039
Configuration for submitting the job to k8s:
./spark-submit \
--master k8s://https://172.17.0.2:8443 \
--deploy-mode cluster \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image=spark-py:v3.0 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--name newtry \
--conf spark.kubernetes.driver.pod.name=newdriver \
local:///opt/spark/examples/src/main/python/spark-submit-old.py
I am running Spark on k8s in cluster mode.
There is no other pod named newdriver running on my minikube.
Please check whether there is a pod named newdriver in the default namespace by running kubectl get pods --namespace default --show-all (newer kubectl versions deprecate --show-all and list completed pods by default). You probably still have a Terminated or Completed Spark driver pod with this name left over from a previous run. If so, delete it by running kubectl delete pod newdriver --namespace default, and then launch the new Spark job again.
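Alternatively, avoid the collision altogether by giving each run a fresh driver pod name. The sketch below is hypothetical and assumes the name only needs to be a valid DNS-1123 label (lowercase alphanumerics and '-', at most 63 characters).

```python
import re
import uuid

# Assumption: keep the name within a DNS-1123 label (63 characters).
MAX_NAME_LEN = 63
SUFFIX_LEN = 8


def unique_driver_pod_name(base: str) -> str:
    """Return '<base>-<random hex>' so reruns never reuse a leftover pod name."""
    # Lowercase, replace characters outside [a-z0-9-], and trim edge hyphens.
    clean = re.sub(r"[^a-z0-9-]", "-", base.lower()).strip("-") or "driver"
    # Leave room for the '-' separator and the random suffix.
    clean = clean[: MAX_NAME_LEN - SUFFIX_LEN - 1].rstrip("-")
    return f"{clean}-{uuid.uuid4().hex[:SUFFIX_LEN]}"


print(unique_driver_pod_name("newdriver"))  # e.g. newdriver-<8 hex characters>
```

Pass the result as --conf spark.kubernetes.driver.pod.name=<name>. Simply omitting that setting also works: Spark then generates a unique driver pod name per submission, like the sparkapp-6e475a6ae18d3b7a89ca2b5f6ae7aae4-driver name in the first log above.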
I'm having the following issue while trying to run Spark on Kubernetes when the app jar is stored in an Azure Blob Storage container:
2018-10-18 08:48:54 INFO DAGScheduler:54 - Job 0 failed: reduce at SparkPi.scala:38, took 1.743177 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 10.244.1.11, executor 2): org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1086)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:538)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1366)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3242)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3291)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3259)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:470)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1897)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:694)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:476)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:755)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:747)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:747)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:863)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1081)
... 24 more
The command I use to launch the job is:
/opt/spark/bin/spark-submit \
--master k8s://<my-k8s-master> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=<my-image-built-with-wasb> \
--conf spark.kubernetes.namespace=<my-namespace> \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.driver.secrets.spark=/opt/spark/conf \
--conf spark.kubernetes.executor.secrets.spark=/opt/spark/conf \
wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar 10000
I have a k8s secret named spark with the following content:
apiVersion: v1
kind: Secret
metadata:
  name: spark
  labels:
    app: spark
    stack: service
type: Opaque
data:
  core-site.xml: |-
    {% filter b64encode %}
    <configuration>
        <property>
            <name>fs.azure.account.key.<my-account-name>.blob.core.windows.net</name>
            <value><my-account-key></value>
        </property>
        <property>
            <name>fs.AbstractFileSystem.wasb.Impl</name>
            <value>org.apache.hadoop.fs.azure.Wasb</value>
        </property>
    </configuration>
    {% endfilter %}
The driver pod manages to download the jar dependencies stored in the Azure Blob Storage container, as this log snippet shows:
2018-10-18 08:48:16 INFO Utils:54 - Fetching wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp8575879929413871510.tmp
2018-10-18 08:48:16 INFO SparkPodInitContainer:54 - Finished downloading application dependencies.
How can I get the executor pods to pick up the credentials stored in the core-site.xml file that is mounted from the k8s secret? What am I missing?
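I solved it by adding the following configuration to spark-submit. Spark copies every spark.hadoop.* property into the Hadoop configuration (with the spark.hadoop. prefix stripped), so the executors receive the account key without relying on the mounted core-site.xml: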
--conf spark.hadoop.fs.AbstractFileSystem.wasb.Impl=org.apache.hadoop.fs.azure.Wasb
--conf spark.hadoop.fs.azure.account.key.${STORAGE_ACCOUNT_NAME}.blob.core.windows.net=${STORAGE_ACCOUNT_KEY}
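If you assemble the submit command from a script, a small sketch like this keeps the two settings together. It assumes the caller supplies the account name and key (for example from the STORAGE_ACCOUNT_NAME and STORAGE_ACCOUNT_KEY environment variables used above); azure_blob_confs is a hypothetical helper.

```python
from typing import Dict, List


def azure_blob_confs(account: str, key: str) -> List[str]:
    """Build the spark-submit --conf arguments for WASB access (sketch).

    Spark strips the "spark.hadoop." prefix and copies these properties into
    the Hadoop Configuration on the driver and the executors.
    """
    props: Dict[str, str] = {
        "spark.hadoop.fs.AbstractFileSystem.wasb.Impl": "org.apache.hadoop.fs.azure.Wasb",
        f"spark.hadoop.fs.azure.account.key.{account}.blob.core.windows.net": key,
    }
    args: List[str] = []
    for name, value in props.items():
        args += ["--conf", f"{name}={value}"]
    return args
```

For example, azure_blob_confs(os.environ["STORAGE_ACCOUNT_NAME"], os.environ["STORAGE_ACCOUNT_KEY"]) returns four arguments to splice into the spark-submit argument list before the application jar. Keep in mind that values passed with --conf end up in the driver's configuration (compare the SPARK_JAVA_OPT_* variables in the first question), so the key may be visible to anyone who can inspect the driver pod.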
I have a Spring-based Spark 2.3.0 application, and I am trying to spark-submit it to Kubernetes (minikube).
I have VirtualBox with Docker and minikube running.
opt/spark/bin/spark-submit \
--master k8s://https://192.168.99.101:8443 \
--name cfe2 \
--deploy-mode cluster \
--class com.yyy.Application \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=docker.io/anantpukale/spark_app:1.3 \
local://CashFlow-spark2.3.0-shaded.jar
Below is the log output:
start time: N/A
container images: N/A
phase: Pending
status: []
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: docker.io/anantpukale/spark_app:1.3
phase: Pending
status: [ContainerStatus(containerID=null, image=docker.io/anantpukale/spark_app:1.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO Client:54 - Waiting for application cfe2 to finish...
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: anantpukale/spark_app:1.3
phase: Failed
status: [ContainerStatus(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, image=anantpukale/spark_app:1.3, imageID=docker-pullable://anantpukale/spark_app@sha256:f61b3ef65c727a3ebd8a28362837c0bc90649778b668f78b6a33b7c0ce715227, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, exitCode=127, finishedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), message=invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"driver\\\": executable file not found in $PATH\"\n", reason=ContainerCannotRun, signal=null, startedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: anantpukale/spark_app:1.3
Container state: Terminated
Exit code: 127
2018-04-11 09:57:52 INFO Client:54 - Application cfe2 finished.
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Shutdown hook called
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-d5813d6e-a4af-4bf6-b1fc-dc43c75cd643
Below is an image of my Dockerfile.
The error trace suggests that something in my Docker setup is being triggered with the command "docker".
[Image: Dockerfile]
I ran into this issue too. It is related to the Docker image's ENTRYPOINT. Spark 2.3.0 ships an example Dockerfile for Kubernetes under kubernetes/dockerfiles/ whose ENTRYPOINT is a specific script (entrypoint.sh). Spark starts the driver container with the argument driver, which that script interprets; if the image does not use the script as its ENTRYPOINT, Docker tries to execute driver directly, which produces the "executable file not found in $PATH" error and exit code 127. See the Spark Kubernetes Docker documentation.
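As a quick sanity check, here is a rough sketch that reports which ENTRYPOINT a Dockerfile ends up with. It is a simplification (no line continuations, no multi-stage awareness) and the function name is hypothetical. It only accepts the exec form, because Docker's shell-form ENTRYPOINT ignores container arguments, so the driver argument would never reach the script.

```python
import json


def entrypoint_status(dockerfile_text: str) -> str:
    """Classify the effective ENTRYPOINT of a Dockerfile (rough sketch).

    Returns "inherited" if no ENTRYPOINT is declared (the base image's one
    applies), "spark" if the last one is the exec form of an entrypoint.sh
    script, and "other" otherwise. Line continuations are not handled.
    """
    entry = None
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split(None, 1)
        if parts and parts[0].upper() == "ENTRYPOINT":
            # Only the last ENTRYPOINT instruction takes effect.
            entry = parts[1] if len(parts) > 1 else ""
    if entry is None:
        return "inherited"
    try:
        argv = json.loads(entry)
    except json.JSONDecodeError:
        # Not valid JSON, so Docker treats it as shell form: it runs via
        # /bin/sh -c and drops the container arguments such as "driver".
        return "other"
    if isinstance(argv, list) and argv and str(argv[0]).endswith("entrypoint.sh"):
        return "spark"
    return "other"
```

"inherited" means the Dockerfile declares no ENTRYPOINT of its own, which is fine as long as the base image is the one built from Spark's own Dockerfile.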
In your Dockerfile, use ENV PATH="/opt/spark/bin:${PATH}" instead of your line.
Can you log in to the container using
docker run -it --rm docker.io/anantpukale/spark_app:1.3 sh
and try running the main program or command that you want to submit?
Based on that output, we can investigate further.
Along with @hichamx's suggested changes, the command below worked for me to get past the exec: "driver" issue.
spark-submit \
--master k8s://http://127.0.0.1:8001 \
--name cfe2 \
--deploy-mode cluster \
--class com.oracle.Test \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=docker/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.driver.container.image=docker.io/kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.container.image=docker.io/kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
local://spark-0.0.1-SNAPSHOT.jar
That said, it still gave exit code 127, and the spark-kubernetes-driver container terminated.
I am trying to submit a Spark job to Kubernetes natively using Apache Spark 2.3.
When I use a Docker image on Docker Hub (for Spark 2.2), it works:
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
However, when I try to build a local Docker image,
sudo docker build -t spark:2.3 -f kubernetes/dockerfiles/spark/Dockerfile .
and submit the job as:
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark:2.3 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
I get the following error: "repository docker.io/spark not found: does not exist or no pull access" (reason=ErrImagePull):
status: [ContainerStatus(containerID=null, image=spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=rpc error: code = 2 desc = repository docker.io/spark not found: does not exist or no pull access, reason=ErrImagePull, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-03-15 11:09:54 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: spark-pi-3a1a6e8ce615395fa7df81eac06d58ed-driver
namespace: default
labels: spark-app-selector -> spark-8d9fdaba274a4eb69e28e2a242fe86ca, spark-role -> driver
pod uid: 5271602b-2841-11e8-a78e-fa163ed09d5f
creation time: 2018-03-15T11:09:25Z
service account name: default
volumes: default-token-v4vhk
node name: mlaas-p4k3djw4nsca-minion-1
start time: 2018-03-15T11:09:25Z
container images: spark:2.3
phase: Pending
status: [ContainerStatus(containerID=null, image=spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=Back-off pulling image "spark:2.3", reason=ImagePullBackOff, additionalProperties={}), additionalProperties={}), additionalProperties={})]
Also, I tried to run a local Docker registry as described in:
https://docs.docker.com/registry/deploying/#run-a-local-registry
docker run -d -p 5000:5000 --restart=always --name registry registry:2
sudo docker tag spark:2.3 localhost:5000/spark:2.3
sudo docker push localhost:5000/spark:2.3
I can do this successfully:
docker pull localhost:5000/spark:2.3
However, when I submit the Spark job:
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=localhost:5000/spark:2.3 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
I again got ErrImagePull:
status: [ContainerStatus(containerID=null, image=localhost:5000/spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=rpc error: code = 2 desc = Error while pulling image: Get http://localhost:5000/v1/repositories/spark/images: dial tcp [::1]:5000: getsockopt: connection refused, reason=ErrImagePull, additionalProperties={}), additionalProperties={}), additionalProperties={})]
Is there a way in Spark 2.3 to use local Docker images when submitting jobs natively to Kubernetes?
Thank you in advance.
I guess you are using something like minikube to set up a local Kubernetes cluster; in most cases it uses a virtual machine to run the cluster.
So when Kubernetes tries to pull an image from a localhost address, it connects to the virtual machine's own local address, not to your computer. Moreover, a registry bound only to localhost on your PC is not reachable from the virtual machine.
The idea of a fix is to make your local docker registry accessible for your Kubernetes and to allow pull images from local insecure registry.
First, bind the Docker registry on your PC to all interfaces (Docker's -p option already does this by default, but being explicit makes the intent clear):
docker run -d -p 0.0.0.0:5000:5000 --restart=always --name registry registry:2
Then find your PC's local IP address. It will be something like 172.X.X.X or 10.X.X.X; how to look it up depends on your OS (for example, ip addr on Linux or ipconfig on Windows).
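Next, start minikube with an additional option, where <your-local-ip-address> is the IP address you found in the previous step:
minikube start --insecure-registry="<your-local-ip-address>:5000"
This flag is applied when the minikube VM is created, so if your cluster already exists you may need to run minikube delete first.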
Now run the Spark job with the new registry address, and Kubernetes should be able to pull your image:
spark.kubernetes.container.image=<your-local-ip-address>:5000/spark:2.3
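A small sketch of building that image reference, assuming an IPv4 host address; registry_image and the address 10.0.0.5 are hypothetical. It refuses loopback addresses because, inside the minikube VM, localhost points at the VM itself rather than at your PC.

```python
import ipaddress


def registry_image(host_ip: str, repo: str, tag: str, port: int = 5000) -> str:
    """Return '<ip>:<port>/<repo>:<tag>' for a registry running on the host PC."""
    if host_ip.strip().lower() == "localhost":
        raise ValueError("use the host's LAN IP address, not localhost")
    # Raises ValueError (AddressValueError) for anything that is not IPv4.
    addr = ipaddress.IPv4Address(host_ip.strip())
    if addr.is_loopback:
        # Inside the minikube VM, 127.x.x.x is the VM, not your PC.
        raise ValueError(f"{host_ip} is a loopback address")
    return f"{addr}:{port}/{repo}:{tag}"


image = registry_image("10.0.0.5", "spark", "2.3")
print(f"--conf spark.kubernetes.container.image={image}")
```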