How to deal with error code 101 for Spark-submit on Kubernetes - apache-spark

I am trying to run the following command to submit a Spark application to a Kubernetes cluster:
/opt/spark/bin/spark-submit --master k8s://https://<spark-master-ip>:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.container.image=newfrontdocker/spark:v3.0.1-j14 local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar 100
When I run this command, the container runs briefly before terminating with this error message:
container status:
container name: spark-kubernetes-driver
container image: newfrontdocker/spark:v3.0.1-j14
container state: terminated
container started at: 2021-07-17T11:49:46Z
container finished at: 2021-07-17T11:49:48Z
exit code: 101
termination reason: Error
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Application status for spark-c15c11340f204794b51bf8d79397bf9e (phase: Failed)
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
container name: spark-kubernetes-driver
container image: newfrontdocker/spark:v3.0.1-j14
container state: terminated
container started at: 2021-07-17T11:49:46Z
container finished at: 2021-07-17T11:49:48Z
exit code: 101
termination reason: Error
21/07/17 06:49:13 INFO LoggingPodStatusWatcherImpl: Application spark-pi with submission ID default:spark-pi-4ed9627ab44c778d-driver finished
21/07/17 06:49:14 INFO ShutdownHookManager: Shutdown hook called
21/07/17 06:49:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-3d1a7ff3-1dc9-4db0-acfa-ed52706122b6
What does exit code 101 mean, and what do I need to do to fix this issue so I can run Spark apps on the Kubernetes cluster?

In my case the reason was that the jar path was incorrect.
I guess that the path should be:
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar
(3.0.1 instead of 3.1.1), because you are using the newfrontdocker/spark:v3.0.1-j14 image.
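For example, the corrected submit command would differ only in the jar version (everything else stays as in your command; you can confirm the exact file name by listing the directory in the image, e.g. docker run --rm --entrypoint ls newfrontdocker/spark:v3.0.1-j14 /opt/spark/examples/jars):
/opt/spark/bin/spark-submit --master k8s://https://<spark-master-ip>:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.container.image=newfrontdocker/spark:v3.0.1-j14 local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar 100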

Related

Spark Kubernetes Error : Pod Already Exists

When I try to submit my app through spark-submit I get the following error. Please help me resolve the problem.
Error:
pod name: newdriver
namespace: default
labels: spark-app-selector -> spark-a17960c79886423383797eaa77f9f706, spark-role -> driver
pod uid: 0afa41ae-4e4c-47be-86a3-1ef77739506c
creation time: 2020-05-06T14:11:29Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-tks2g
node name: minikube
start time: 2020-05-06T14:11:29Z
phase: Running
container status:
container name: spark-kubernetes-driver
container image: spark-py:v3.0
container state: running
container started at: 2020-05-06T14:11:31Z
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.17.0.2:8443/api/v1/namespaces/default/pods. Message: pods "newtrydriver" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=pods, name=newtrydriver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "newtrydriver" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:510)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:449)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:413)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:241)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:819)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:334)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
at org.apache.spark.deploy.k8s.submit.Client.$anonfun$run$2(KubernetesClientApplication.scala:130)
at org.apache.spark.deploy.k8s.submit.Client.$anonfun$run$2$adapted(KubernetesClientApplication.scala:129)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:129)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/05/06 14:11:34 INFO ShutdownHookManager: Shutdown hook called
20/05/06 14:11:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-b7ea9c80-6040-460a-ba43-5c6e656d3039
Configuration for Submitting the job to k8s
./spark-submit \
--master k8s://https://172.17.0.2:8443 \
--deploy-mode cluster \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image=spark-py:v3.0 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--name newtry \
--conf spark.kubernetes.driver.pod.name=newdriver \
local:///opt/spark/examples/src/main/python/spark-submit-old.py
I am running Spark on k8s in cluster mode. There is no other Pod with the name newdriver running on my minikube.
Please check if there is a Pod named newdriver in namespace default by running kubectl get pods --namespace default --show-all. You probably already have a Terminated or Completed Spark driver Pod with this name left over from previous runs. If so, delete it by running kubectl delete pod newdriver --namespace default and then try to launch the Spark job again.
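For instance, the check-and-cleanup sequence could look like this (note that on recent kubectl versions the --show-all flag has been removed and completed pods are listed by default):
kubectl get pods --namespace default
kubectl delete pod newdriver --namespace default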

Unable to see output or error messages for Spark on Kubernetes

I am trying to run a simple Spark application using the Kubernetes master, but I don't get the intended output/processing, nor do I see any error messages. The final pod phase is 'Failed' and the exit code is 101. The pod logs show the usual log4j warnings, but nothing else.
I am running minikube v1.0.1 on Windows (amd64) on my office laptop using Hyper-V. I have already increased the number of CPUs and the memory of the minikube VM to 3 and 4 GB as recommended.
I made sure that the applications run fine with Spark Standalone. The first application, 'Hello', is supposed to print a 'Hello' message. The second application, 'Calculate Monthly Revenue', is supposed to read data from Teradata over JDBC, aggregate it, and write the result back to a Teradata table over JDBC.
I also made sure that 'hello minikube' works fine.
In all the code snippets below, ... indicates portions omitted for brevity, >>> indicates command prompt.
>>> spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --name Hello --class Hello --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local://hello_2.12-0.1.0-SNAPSHOT.jar
log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/05/20 16:59:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: hello-1558351748442-driver
...
phase: Pending
status: []
...
19/05/20 16:59:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: hello-1558351748442-driver
...
phase: Failed
status: [ContainerStatus(containerID=docker://464c9c0e23d543f20954d373218c9cefefc31107711cbd2ada4d93bb31ce4d80, image=rahulvkulkarni/default:spark-td-run, imageID=docker-pullable://rahulvkulkarni/default#sha256:1de9951c4ac9f0b5f26efa3949e1effa779b0605066f2043738402ce20e8179b, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://464c9c0e23d543f20954d373218c9cefefc31107711cbd2ada4d93bb31ce4d80, exitCode=101, finishedAt=2019-05-17T18:26:41Z, message=null, reason=Error, signal=null, startedAt=2019-05-17T18:26:40Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
19/05/20 16:59:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
Container name: spark-kubernetes-driver
Container image: rahulvkulkarni/default:spark-td-run
Container state: Terminated
Exit code: 101
19/05/20 16:59:13 INFO Client: Application Hello finished.
...
>>> kubectl logs hello-1558351748442-driver
++ id -u
...
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.5 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class Hello spark-internal
19/05/17 18:26:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.spark.deploy.SparkSubmit$$anon$2).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
What does exit code 101 mean? How do I find the actual error?
Then I tried to configure log4j for detailed logging as described in 'How to stop INFO messages displaying on spark console?'. I renamed and used the log4j.properties template provided in the conf directory, but spark-submit is not able to find the log4j.properties file that I have already included in the Docker build.
>>> spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --files /opt/spark/conf/log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties" --name "Calculate Monthly Revenue" --class mthRev --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local://mthrev_2.10-0.1-SNAPSHOT.jar <username> <password> <server name>
19/05/20 20:02:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/05/20 20:02:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: calculate-monthly-revenue-1558362771110-driver
...
Container name: spark-kubernetes-driver
Container image: rahulvkulkarni/default:spark-td-run
Container state: Terminated
Exit code: 1
>>> kubectl logs -c spark-kubernetes-driver calculate-monthly-revenue-1558362771110-driver
++ id -u
...
log4j:ERROR Could not read configuration file from URL [file:/opt/spark/conf/log4j.properties].
java.io.FileNotFoundException: /opt/spark/conf/log4j.properties (No such file or directory)
...
log4j:ERROR Ignoring configuration file [file:/opt/spark/conf/log4j.properties].
19/05/17 21:30:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: C:
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.fs.Path.<init>(Path.java:93)
at org.apache.hadoop.fs.Globber.glob(Globber.java:211)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
at org.apache.spark.deploy.DependencyUtils$.org$apache$spark$deploy$DependencyUtils$$resolveGlobPath(DependencyUtils.scala:192)
at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:147)
at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:145)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:145)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$4.apply(SparkSubmit.scala:355)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$4.apply(SparkSubmit.scala:355)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 2: C:
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.failExpecting(URI.java:2854)
at java.net.URI$Parser.parse(URI.java:3057)
at java.net.URI.<init>(URI.java:746)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 23 more
[INFO tini (1)] Main child exited normally (with status '1')
I tried several variations of specifying the log4j.properties file: a local file on my Windows laptop (file:///C$/Users//spark-2.4.3-bin-hadoop2.7/conf/log4j.properties and file:///C:/Users//spark-2.4.3-bin-hadoop2.7/conf/log4j.properties), and a local file in the Linux container (file:///opt/spark/conf/log4j.properties). But I keep getting the message:
log4j:ERROR Could not read configuration file from URL [file:/C$/Users/<my-username>/spark-2.4.3-bin-hadoop2.7/conf/log4j.properties].
The IllegalArgumentException went away when I tried the path without the colon (C:), i.e. either the Linux path or the Windows path with C$.
But I still don't get the desired output of my program, and I don't know whether there is an error or what it is.
There was a typo in the spark-submit command in the specification of the application jar: I was using only two forward slashes, local://hello_2.12-0.1.0-SNAPSHOT.jar. Hence Spark was not able to locate it and (I think) ignored it silently, so it had no work to do and produced no message. I'd expect it to give a warning at least.
Changed it to three slashes and it moved ahead:
local:///hello_2.12-0.1.0-SNAPSHOT.jar
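For reference, the full corrected command would then be identical to the original apart from the jar URI (all other options unchanged from the question):
spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --name Hello --class Hello --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local:///hello_2.12-0.1.0-SNAPSHOT.jar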
I now have another issue related to Kubernetes RBAC, which I will solve separately. The log4j issue still remains, but is not a concern for me now.
I solved this by deploying the config file to blob storage:
https://$(container_name).blob.core.windows.net/jars/log4jconfig1
and passing its URL to spark-submit:
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=https://<container_name>.blob.core.windows.net/jars/log4jconfig1" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=https://<container_name>.blob.core.windows.net/jars/log4jconfig1" \

Spark in Yarn Cluster Mode - Yarn client reports FAILED even when job completes successfully

I am experimenting with running Spark in yarn cluster mode (v2.3.0). We have traditionally been running in yarn client mode, but some jobs are submitted from .NET web services, so we have to keep a host process running in the background when using client mode (HostingEnvironment.QueueBackgroundWorkTime...). We are hoping we can execute these jobs in a more "fire and forget" style.
Our jobs continue to run successfully, but we see a curious entry in the logs where the yarn client that submits the job to the application manager is always reporting failure:
18/11/29 16:54:35 INFO yarn.Client: Application report for application_1539978346138_110818 (state: RUNNING)
18/11/29 16:54:36 INFO yarn.Client: Application report for application_1539978346138_110818 (state: RUNNING)
18/11/29 16:54:37 INFO yarn.Client: Application report for application_1539978346138_110818 (state: FINISHED)
18/11/29 16:54:37 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: <ip address>
ApplicationMaster RPC port: 0
queue: root.default
start time: 1543510402372
final status: FAILED
tracking URL: http://server.host.com:8088/proxy/application_1539978346138_110818/
user: p800s1
Exception in thread "main" org.apache.spark.SparkException: Application application_1539978346138_110818 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1153)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1568)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/11/29 16:54:37 INFO util.ShutdownHookManager: Shutdown hook called
We always create a SparkSession and always return sys.exit(0) (although that appears to be ignored by the Spark framework regardless of how we submit a job). We also have our own internal error logging that routes to Kafka/ElasticSearch. No errors are reported during the job run.
Here's an example of the submit command: spark2-submit --keytab /etc/keytabs/p800s1.ktf --principal p800s1@OURDOMAIN.COM --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 4g --class com.path.to.MainClass /path/to/UberJar.jar arg1 arg2
This seems to be harmless noise, but I don't like noise that I don't understand. Has anyone experienced something similar?

starting container process caused \"exec: \\\"driver\\\": executable file not found in $PATH\"\n"

I have a Spring-based Spark 2.3.0 application. I am trying to do a spark-submit on Kubernetes (minikube).
I have VirtualBox with Docker and minikube running.
opt/spark/bin/spark-submit --master k8s://https://192.168.99.101:8443 --name cfe2 --deploy-mode cluster --class com.yyy.Application --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=docker.io/anantpukale/spark_app:1.3 local://CashFlow-spark2.3.0-shaded.jar
Below is the stack trace:
start time: N/A
container images: N/A
phase: Pending
status: []
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: docker.io/anantpukale/spark_app:1.3
phase: Pending
status: [ContainerStatus(containerID=null, image=docker.io/anantpukale/spark_app:1.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO Client:54 - Waiting for application cfe2 to finish...
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: cfe2-c4f95aaeaefb3564b8106ad86e245457-driver
namespace: default
labels: spark-app-selector -> spark-dab914d1d34b4ecd9b747708f667ec2b, spark-role -> driver
pod uid: cc3b39e1-3d6e-11e8-ab1d-080027fcb315
creation time: 2018-04-11T09:57:51Z
service account name: default
volumes: default-token-v48xb
node name: minikube
start time: 2018-04-11T09:57:51Z
container images: anantpukale/spark_app:1.3
phase: Failed
status: [ContainerStatus(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, image=anantpukale/spark_app:1.3, imageID=docker-pullable://anantpukale/spark_app#sha256:f61b3ef65c727a3ebd8a28362837c0bc90649778b668f78b6a33b7c0ce715227, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://40eae507eb9b615d3dd44349e936471157428259f583ec6a8ba3bd99d80b013e, exitCode=127, finishedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), message=invalid header field value **"oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"driver\\\": executable file not found in $PATH\"\n"**, reason=ContainerCannotRun, signal=null, startedAt=Time(time=2018-04-11T09:57:52Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-04-11 09:57:52 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: anantpukale/spark_app:1.3
Container state: Terminated
Exit code: 127
2018-04-11 09:57:52 INFO Client:54 - Application cfe2 finished.
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Shutdown hook called
2018-04-11 09:57:52 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-d5813d6e-a4af-4bf6-b1fc-dc43c75cd643
Below is an image of my Dockerfile.
The error trace suggests that something in my Docker setup is being invoked with a command it cannot find.
[image: dockerfile]
I was running into this issue. It is related to the Docker image ENTRYPOINT. In Spark 2.3.0, when using Kubernetes, there is now an example Dockerfile which uses a specific script as the ENTRYPOINT, found under kubernetes/dockerfiles/ in the Spark distribution. If the Docker image doesn't use that specific script as the ENTRYPOINT, the container doesn't start up properly. See the Spark Kubernetes Docker documentation.
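As a rough sketch, the relevant part of such a Dockerfile looks roughly like the one bundled with the Spark distribution (base image and copied paths are illustrative; adapt them to your build context):
FROM openjdk:8-alpine
# The bundled Dockerfile also installs bash and tini, which entrypoint.sh needs
COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY conf /opt/spark/conf
# The script that interprets the "driver" / "executor" arguments passed by spark-submit
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
ENV SPARK_HOME /opt/spark
WORKDIR /opt/spark/work-dir
# Without this ENTRYPOINT the runtime tries to exec "driver" directly and fails with
# "executable file not found in $PATH"
ENTRYPOINT [ "/opt/entrypoint.sh" ]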
In your Dockerfile, use ENV PATH="/opt/spark/bin:${PATH}" instead of your line.
Is it possible for you to log in to the container using
#> docker run -it --rm docker.io/anantpukale/spark_app:1.3 sh
and try running the main program or command that you want to submit?
Based on this output we can try to investigate further.
Along with the changes suggested by @hichamx, the code below worked for me to overcome the "exec: \"driver\"" issue.
spark-submit \
--master k8s://http://127.0.0.1:8001 \
--name cfe2 \
--deploy-mode cluster \
--class com.oracle.Test \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=docker/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.driver.container.image=docker.io/kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.container.image=docker.io/kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
local://spark-0.0.1-SNAPSHOT.jar
Though this gave Error: Exit Code: 127 and spark-kubernetes-driver terminated.

Spark execution on kubernetes - driver pod fails

I am trying to run simple Spark code on a Kubernetes cluster using the Spark 2.3 native Kubernetes deployment feature.
I have a Kubernetes cluster running. At this time, the Spark code does not read or write data; it creates an RDD from a list and prints out the result, just to validate the ability to run Spark on Kubernetes. I have also copied the Spark app jar into the Kubernetes container image.
Below is the command I run.
bin/spark-submit --master k8s://https://k8-master --deploy-mode cluster --name sparkapp --class com.sparrkonk8.rdd.MockWordCount --conf spark.executor.instances=5 --conf spark.kubernetes.container.image=myapp/sparkapp:1.0.0 local:///SparkApp.jar
2018-03-06 10:31:28 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: sparkapp-6e475a6ae18d3b7a89ca2b5f6ae7aae4-driver
namespace: default
labels: spark-app-selector -> spark-9649dd66e9a946d989e2136d342ef249, spark-role -> driver
pod uid: 6d3e98cf-2153-11e8-85af-1204f474c8d2
creation time: 2018-03-06T15:31:23Z
service account name: default
volumes: default-token-vwxvr
node name: 192-168-1-1.myapp.engg.com
start time: 2018-03-06T15:31:23Z
container images: dockerhub.com/myapp/sparkapp:1.0.0
phase: Failed
status: [ContainerStatus(containerID=docker://3617a400e4604600d5fcc69df396facafbb2d9cd485a63bc324c1406e72f0d35, image=dockerhub.com/myapp/sparkapp:1.0.0, imageID=docker-pullable://dockerhub.com/sparkapp#sha256:f051d86384422dff3e8c8a97db823de8e62af3ea88678da4beea3f58cdb924e5, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://3617a400e4604600d5fcc69df396facafbb2d9cd485a63bc324c1406e72f0d35, exitCode=1, finishedAt=Time(time=2018-03-06T15:31:24Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2018-03-06T15:31:24Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-03-06 10:31:28 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: myapp/sparkapp:1.0.0
Container state: Terminated
Exit code: 1
Below is the Spark config that the driver pod submits. I pulled this from the K8s UI. @TobiSH, let me know if this helps for troubleshooting my issue.
SPARK_DRIVER_MEMORY: 1g
SPARK_DRIVER_CLASS: com.sparrkonk8.rdd.MockWordCount
SPARK_DRIVER_ARGS:
SPARK_DRIVER_BIND_ADDRESS:
SPARK_MOUNTED_CLASSPATH: /SparkApp.jar:/SparkApp.jar
SPARK_JAVA_OPT_0: -Dspark.kubernetes.executor.podNamePrefix=sparkapp-028d46fa109e309b8dfe1a4eceb46b61
SPARK_JAVA_OPT_1: -Dspark.app.name=sparkapp
SPARK_JAVA_OPT_2: -Dspark.kubernetes.driver.pod.name=sparkapp-028d46fa109e309b8dfe1a4eceb46b61-driver
SPARK_JAVA_OPT_3: -Dspark.executor.instances=5
SPARK_JAVA_OPT_4: -Dspark.submit.deployMode=cluster
SPARK_JAVA_OPT_5: -Dspark.driver.blockManager.port=7079
SPARK_JAVA_OPT_6: -Dspark.kubernetes.container.image=docker.com/myapp/sparkapp:1.0.0
SPARK_JAVA_OPT_7: -Dspark.app.id=spark-5e3beb5109174f40a84635b786789c30
SPARK_JAVA_OPT_8: -Dspark.master= k8s://https://k8-master
SPARK_JAVA_OPT_9: -Dspark.driver.host=sparkapp-028d46fa109e309b8dfe1a4eceb46b61-driver-svc.default.svc
SPARK_JAVA_OPT_10: -Dspark.jars=/opt/spark/work-dir/SparkApp.jar,/opt/spark/work-dir/SparkApp.jar
SPARK_JAVA_OPT_11: -Dspark.driver.port=7078
Since there are no logs, this means it is crashing immediately upon container creation. I recommend trying to run this with the local master configuration to ensure everything on the Spark side is good, and then trying it again via Kubernetes as the master.
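For example, a quick local sanity check of the same application could look like this (assuming SparkApp.jar is available on the submitting machine; the path is illustrative):
bin/spark-submit --master local[*] --class com.sparrkonk8.rdd.MockWordCount /path/to/SparkApp.jar
If that runs cleanly, the problem is more likely in the Kubernetes image or submission configuration than in the application code itself.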
