Spark Kubernetes Error: Pod Already Exists - apache-spark

When I try to submit my app through spark-submit, I get the following error. Please help me resolve the problem.
Error:
pod name: newdriver
namespace: default
labels: spark-app-selector -> spark-a17960c79886423383797eaa77f9f706, spark-role -> driver
pod uid: 0afa41ae-4e4c-47be-86a3-1ef77739506c
creation time: 2020-05-06T14:11:29Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-tks2g
node name: minikube
start time: 2020-05-06T14:11:29Z
phase: Running
container status:
container name: spark-kubernetes-driver
container image: spark-py:v3.0
container state: running
container started at: 2020-05-06T14:11:31Z
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.17.0.2:8443/api/v1/namespaces/default/pods. Message: pods "newtrydriver" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=pods, name=newtrydriver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "newtrydriver" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:449)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:819)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:334)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
at org.apache.spark.deploy.k8s.submit.Client.$anonfun$run$2(KubernetesClientApplication.scala:130)
at org.apache.spark.deploy.k8s.submit.Client.$anonfun$run$2$adapted(KubernetesClientApplication.scala:129)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:129)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/05/06 14:11:34 INFO ShutdownHookManager: Shutdown hook called
20/05/06 14:11:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-b7ea9c80-6040-460a-ba43-5c6e656d3039
Configuration for submitting the job to Kubernetes:
./spark-submit \
--master k8s://https://172.17.0.2:8443 \
--deploy-mode cluster \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image=spark-py:v3.0 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--name newtry \
--conf spark.kubernetes.driver.pod.name=newdriver \
local:///opt/spark/examples/src/main/python/spark-submit-old.py
I am running Spark on Kubernetes in cluster mode.
No other pod named newdriver is running on my minikube.

Please check if there is a Pod named newdriver in namespace default by running kubectl get pods --namespace default --show-all. You probably already have a Terminated or Completed Spark driver Pod with this name left over from previous runs. If so, delete it by running kubectl delete pod newdriver --namespace default and then try to launch the Spark job again.
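A minimal check-and-clean sequence, assuming the default namespace (note that on recent kubectl releases the --show-all flag has been removed and completed pods are listed by default):
# list pods, including Completed/Terminated driver pods left over from earlier runs
kubectl get pods --namespace default
# delete the stale driver pod if it shows up
kubectl delete pod newdriver --namespace default
# then re-run the spark-submit command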

Related

How to deal with error code 101 for Spark-submit on Kubernetes

I am trying to run the following command to submit a Spark application to a Kubernetes cluster:
/opt/spark/bin/spark-submit --master k8s://https://<spark-master-ip>:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.container.image=newfrontdocker/spark:v3.0.1-j14 local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar 100
When I try to run this command, the container runs briefly before terminating with this particular error message:
container status:
container name: spark-kubernetes-driver
container image: newfrontdocker/spark:v3.0.1-j14
container state: terminated
container started at: 2021-07-17T11:49:46Z
container finished at: 2021-07-17T11:49:48Z
exit code: 101
termination reason: Error
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Application status for spark-c15c11340f204794b51bf8d79397bf9e (phase: Failed)
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
container name: spark-kubernetes-driver
container image: newfrontdocker/spark:v3.0.1-j14
container state: terminated
container started at: 2021-07-17T11:49:46Z
container finished at: 2021-07-17T11:49:48Z
exit code: 101
termination reason: Error
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Application spark-pi with submission ID default:spark-pi-4ed9627ab44c778d-driver finished
21/07/17 06:49:14 INFO ShutdownHookManager: Shutdown hook called
21/07/17 06:49:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-3d1a7ff3-1dc9-4db0-acfa-ed52706122b6
What is exit code 101, and what do I need to do to fix this issue so I can run Spark apps on the Kubernetes cluster?
In my case the reason was that the jar path was incorrect.
I guess the path should be:
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar
(3.0.1 instead of 3.1.1), because you are using the newfrontdocker/spark:v3.0.1-j14 image.
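One way to confirm the exact path, as a rough sketch assuming Docker is available on the submitting machine, is to list the example jars shipped inside that image:
# the version in the jar filename should match what you pass to spark-submit
docker run --rm newfrontdocker/spark:v3.0.1-j14 ls /opt/spark/examples/jars/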

Why does Spark fail with "No File System for scheme: local"?

I am trying to submit a Spark job to a Spark cluster set up on AWS EKS:
NAME READY STATUS RESTARTS AGE
spark-master-5f98d5-5kdfd 1/1 Running 0 22h
spark-worker-878598b54-jmdcv 1/1 Running 2 3d11h
spark-worker-878598b54-sz6z6 1/1 Running 2 3d11h
I am using the below manifest:
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-on-eks
spec:
  template:
    spec:
      containers:
        - name: spark
          image: repo:spark-appv6
          command: [
            "/bin/sh",
            "-c",
            "/opt/spark/bin/spark-submit \
            --master spark://192.XXX.XXX.XXX:7077 \
            --deploy-mode cluster \
            --name spark-app \
            --class com.xx.migration.convert.TestCase \
            --conf spark.kubernetes.container.image=repo:spark-appv6 \
            --conf spark.kubernetes.namespace=spark-pi \
            --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi \
            --conf spark.executor.instances=2 \
            local:///opt/spark/examples/jars/testing-jar-with-dependencies.jar"
          ]
      serviceAccountName: spark-pi
      restartPolicy: Never
  backoffLimit: 4
and am getting the below error log:
20/12/25 10:06:41 INFO Utils: Successfully started service 'driverClient' on port 34511.
20/12/25 10:06:41 INFO TransportClientFactory: Successfully created connection to /192.XXX.XXX.XXX:7077 after 37 ms (0 ms spent in bootstraps)
20/12/25 10:06:41 INFO ClientEndpoint: Driver successfully submitted as driver-20201225100641-0011
20/12/25 10:06:41 INFO ClientEndpoint: ... waiting before polling master for driver state
20/12/25 10:06:46 INFO ClientEndpoint: ... polling master for driver state
20/12/25 10:06:46 INFO ClientEndpoint: State of driver-2020134340641-0011 is ERROR
20/12/25 10:06:46 ERROR ClientEndpoint: Exception from cluster was: java.io.IOException: No FileSystem for scheme: local
java.io.IOException: No FileSystem for scheme: local
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:535)
at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:166)
at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:177)
at org.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:96)
20/12/25 10:06:46 INFO ShutdownHookManager: Shutdown hook called
20/12/25 10:06:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-d568b819-fe8e-486f-9b6f-741rerf87cf1
Also, when I try to submit the job in client mode without the container parameter, it is submitted successfully, but the job keeps running and spins up multiple executors on the worker nodes.
Spark version: 3.0.0
When I use k8s://http://Spark-Master-ip:7077 as the master, I get the following error:
20/12/28 06:59:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/12/28 06:59:12 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
20/12/28 06:59:12 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
20/12/28 06:59:13 WARN WatchConnectionManager: Exec Failure
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at okio.Okio$2.read(Okio.java:140)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:109)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Please help with the above requirement. Thanks.
Assuming you're not using the Spark-on-k8s operator, the master should be:
k8s://https://kubernetes.default.svc.cluster.local
If not, you can get your master address by running:
$ kubectl cluster-info
Kubernetes master is running at https://kubernetes.docker.internal:6443
EDIT:
In spark-on-k8s cluster mode, k8s://<api_server_host>:<k8s-apiserver-port> should be provided (note that adding the port is a must!).
In spark-on-k8s the role of "master" (in Spark) is played by Kubernetes itself, which is responsible for allocating resources to run your driver and workers.
The real reason for the exception:
java.io.IOException: No FileSystem for scheme: local
was that a Worker of the Spark Standalone cluster wanted to downloadUserJar, but simply didn't recognize the local URI scheme.
This is because Spark Standalone does not understand it and, unless I'm mistaken, the only cluster environments that support this local URI scheme are Spark on YARN and Spark on Kubernetes.
And that's where you can connect the dots as to why this exception was sorted out by changing the master URL: the OP wanted to deploy the Spark application to Kubernetes (and followed the rules for Spark on Kubernetes), while the master URL was spark://192.XXX.XXX.XXX:7077, which is for Spark Standalone.
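As a sketch of the fix under that reading (the API server host and port are placeholders you would take from kubectl cluster-info), the submit inside the Job manifest would target the Kubernetes API server instead of the Standalone master, which is also what makes the local:/// scheme usable:
/opt/spark/bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-app \
--class com.xx.migration.convert.TestCase \
--conf spark.kubernetes.container.image=repo:spark-appv6 \
--conf spark.kubernetes.namespace=spark-pi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi \
--conf spark.executor.instances=2 \
local:///opt/spark/examples/jars/testing-jar-with-dependencies.jar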

How to fix: pods "" is forbidden: User "system:anonymous" cannot watch resource "pods" in API group "" in the namespace "default"

I am trying to run Spark over Kubernetes. I have set up my RBAC using the below commands:
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
Spark command from outside the Kubernetes cluster:
bin/spark-submit --master k8s://https://<master_ip>:6443 --deploy-mode cluster --conf spark.kubernetes.authenticate.submission.caCertFile=/usr/local/spark/spark-2.4.5-bin-hadoop2.7/ca.crt --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=bitnami/spark:latest test.py
error:
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: pods "test-py-1590306482639-driver" is forbidden: User "system:anonymous" cannot watch resource "pods" in API group "" in the namespace "default"
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:206)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:134)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:350)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:759)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:738)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:69)
at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(KubernetesClientApplication.scala:140)
at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(KubernetesClientApplication.scala:140)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2542)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/05/24 07:48:04 INFO ShutdownHookManager: Shutdown hook called
20/05/24 07:48:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-f0eeb957-a02e-458f-8778-21fb2307cf42
Spark Docker image source: docker pull bitnami/spark
I am also providing my crt file here, which is present on the master of the Kubernetes cluster. I am trying to run the spark-submit command from another GCP instance.
Can someone please help me? I have been stuck on this for the last couple of days.
Edit
I have created another clusterrole with cluster-admin permissions, but it is still not working.
spark.kubernetes.authenticate applies only to deploy-mode client, and you are running with deploy-mode cluster.
Depending on how you authenticate to the Kubernetes cluster, you might need to provide different config parameters starting with spark.kubernetes.authenticate.submission (these config parameters apply when running with deploy-mode cluster). Look in your ~/.kube/config file and search for the user entry. For example, if the user section specifies
access-token: XXXX
then pass spark.kubernetes.authenticate.submission.oauthToken.
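For example, a sketch of the extra submission-side flag added to the existing spark-submit command (the token value is a placeholder taken from your kubeconfig; you are already passing spark.kubernetes.authenticate.submission.caCertFile):
--conf spark.kubernetes.authenticate.submission.oauthToken=<access-token-from-kubeconfig> \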

starting container process caused \"exec: \\\"driver\\\": executable file not found in $PATH\"\n"

I have a Spring-based Spark 2.3.0 application. I am trying to do a spark-submit on Kubernetes (minikube).
I have VirtualBox with Docker and minikube running.
opt/spark/bin/spark-submit --master k8s://https://192.168.99.101:8443 --name cfe2 --deploy-mode cluster --class com.yyy.Application --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=docker.io/anantpukale/spark_app:1.3 local://CashFlow-spark2.3.0-shaded.jar
Below is the stack trace:
start time: N/A
container images: N/A
phase: Pending
status: []
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: docker.io/anantpukale/spark_app:1.3
phase: Pending
status: [ContainerStatus(containerID=null, image=docker.io/anantpukale/spark_app:1.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO Client:54 - Waiting for application cfe2 to finish...
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: anantpukale/spark_app:1.3
phase: Failed
status: [ContainerStatus(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, image=anantpukale/spark_app:1.3, imageID=docker-pullable://anantpukale/spark_app#sha256:f61b3ef65c727a3ebd8a28362837c0bc90649778b668f78b6a33b7c0ce715227, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, exitCode=127, finishedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), message=invalid header field value **"oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"driver\\\": executable file not found in $PATH\"\n"**, reason=ContainerCannotRun, signal=null, startedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: anantpukale/spark_app:1.3
Container state: Terminated
Exit code: 127
2018-04-11 09:57:52 INFO Client:54 - Application cfe2 finished.
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Shutdown hook called
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-d5813d6e-a4af-4bf6-b1fc-dc43c75cd643
Below is an image of my Dockerfile.
The error trace suggests that something in my Docker image is being invoked with the command "driver".
[Dockerfile image]
I was running into this issue. It is related to the Docker image ENTRYPOINT. In Spark 2.3.0, when using Kubernetes, there is now an example Dockerfile which uses a specific script as the ENTRYPOINT, found under kubernetes/dockerfiles/. If the Docker image doesn't use that specific script as the ENTRYPOINT, the container doesn't start up properly. See the Spark Kubernetes Docker documentation.
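A quick way to check what the image actually uses, as a sketch assuming Docker is available where the image was built:
# print the ENTRYPOINT baked into the image; for Spark 2.3 Kubernetes images it should point at the entrypoint script shipped under kubernetes/dockerfiles/
docker inspect --format '{{.Config.Entrypoint}}' docker.io/anantpukale/spark_app:1.3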
In your Dockerfile, use ENV PATH="/opt/spark/bin:${PATH}" instead of your line.
Is it possible for you to log into the container using
#> docker run -it --rm docker.io/anantpukale/spark_app:1.3 sh
and try running the main program or command that you want to submit?
Based on that output we can try to investigate further.
Along with the changes suggested by #hichamx, the below code worked for me to overcome the "exec: \"driver\"" issue:
spark-submit \
--master k8s://http://127.0.0.1:8001 \
--name cfe2 \
--deploy-mode cluster \
--class com.oracle.Test \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=docker/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.driver.container.image=docker.io/kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.container.image=docker.io/kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
local://spark-0.0.1-SNAPSHOT.jar
Though this gave Error: Exit Code: 127 and spark-kubernetes-driver terminated.

Spark execution on kubernetes - driver pod fails

I am trying to run simple Spark code on a Kubernetes cluster using the Spark 2.3 native Kubernetes deployment feature.
I have a Kubernetes cluster running. At this time, the Spark code does not read or write data. It creates an RDD from a list and prints out the result, just to validate the ability to run Spark on Kubernetes. The Spark app jar is also copied into the Kubernetes container image.
Below is the command I run.
bin/spark-submit --master k8s://https://k8-master --deploy-mode cluster --name sparkapp --class com.sparrkonk8.rdd.MockWordCount --conf spark.executor.instances=5 --conf spark.kubernetes.container.image=myapp/sparkapp:1.0.0 local:///SparkApp.jar
2018-03-06 10:31:28 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: sparkapp-6e475a6ae18d3b7a89ca2b5f6ae7aae4-driver
namespace: default
labels: spark-app-selector -> spark-9649dd66e9a946d989e2136d342ef249, spark-role -> driver
pod uid: 6d3e98cf-2153-11e8-85af-1204f474c8d2
creation time: 2018-03-06T15:31:23Z
service account name: default
volumes: default-token-vwxvr
node name: 192-168-1-1.myapp.engg.com
start time: 2018-03-06T15:31:23Z
container images: dockerhub.com/myapp/sparkapp:1.0.0
phase: Failed
status: [ContainerStatus(containerID=docker://3617a400e4604600d5fcc69df396facafbb2d9cd485a63bc324c1406e72f0d35, image=dockerhub.com/myapp/sparkapp:1.0.0, imageID=docker-pullable://dockerhub.com/sparkapp#sha256:f051d86384422dff3e8c8a97db823de8e62af3ea88678da4beea3f58cdb924e5, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://3617a400e4604600d5fcc69df396facafbb2d9cd485a63bc324c1406e72f0d35, exitCode=1, finishedAt=Time(time=2018-03-06T15:31:24Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2018-03-06T15:31:24Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-03-06 10:31:28 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: myapp/sparkapp:1.0.0
Container state: Terminated
Exit code: 1
Below is the Spark config that the driver pod submits. I pulled this from the K8s UI. #TobiSH, let me know if this helps with troubleshooting my issue.
SPARK_DRIVER_MEMORY: 1g
SPARK_DRIVER_CLASS: com.sparrkonk8.rdd.MockWordCount
SPARK_DRIVER_ARGS:
SPARK_DRIVER_BIND_ADDRESS:
SPARK_MOUNTED_CLASSPATH: /SparkApp.jar:/SparkApp.jar
SPARK_JAVA_OPT_0: -Dspark.kubernetes.executor.podNamePrefix=sparkapp-028d46fa109e309b8dfe1a4eceb46b61
SPARK_JAVA_OPT_1: -Dspark.app.name=sparkapp
SPARK_JAVA_OPT_2: -Dspark.kubernetes.driver.pod.name=sparkapp-028d46fa109e309b8dfe1a4eceb46b61-driver
SPARK_JAVA_OPT_3: -Dspark.executor.instances=5
SPARK_JAVA_OPT_4: -Dspark.submit.deployMode=cluster
SPARK_JAVA_OPT_5: -Dspark.driver.blockManager.port=7079
SPARK_JAVA_OPT_6: -Dspark.kubernetes.container.image=docker.com/myapp/sparkapp:1.0.0
SPARK_JAVA_OPT_7: -Dspark.app.id=spark-5e3beb5109174f40a84635b786789c30
SPARK_JAVA_OPT_8: -Dspark.master= k8s://https://k8-master
SPARK_JAVA_OPT_9: -Dspark.driver.host=sparkapp-028d46fa109e309b8dfe1a4eceb46b61-driver-svc.default.svc
SPARK_JAVA_OPT_10: -Dspark.jars=/opt/spark/work-dir/SparkApp.jar,/opt/spark/work-dir/SparkApp.jar
SPARK_JAVA_OPT_11: -Dspark.driver.port=7078
Since there are no logs, the driver is crashing immediately upon container creation. I recommend running this with a local master configuration to ensure everything on the Spark side is good, and then trying it again with Kubernetes as the master.
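A rough sketch of that two-step check (class, image, and jar names are taken from the question; adjust the local jar path to wherever SparkApp.jar lives on the submitting machine):
# 1. sanity-check the application with a local master
bin/spark-submit --master "local[*]" --class com.sparrkonk8.rdd.MockWordCount ./SparkApp.jar
# 2. once that works, submit to Kubernetes again and inspect the failed driver pod for clues
kubectl get pods
kubectl describe pod <driver-pod-name>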
