spark-submit custom jar on kubernetes fails with: failed to load class

I cannot run a custom Spark application on Kubernetes.
I have followed the setup and the steps in examples like these, and I can run the spark-pi example.
I even rebuilt the Spark image so that it contains my xxx.jar in both /opt/spark/examples/jars and /opt/spark/jars, but I still get the "failed to load class" error. Any ideas what I may have missed? This is extra baffling to me because I checked that the jar is in the image right next to the example jars, and those work fine.
I run spark-submit like this.
--master k8s://https://localhost:6443
--deploy-mode cluster
--conf spark.executor.instances=3
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
--conf spark.kubernetes.container.image=spark2:latest
--conf spark.kubernetes.container.image.pullPolicy=Never
--name myApp
Update: Added stacktrace:
spark.driver.bindAddress= --deploy-mode client --properties-file /opt/spark/conf/ --class local:///opt/spark/examples/jars/xxx.jar
20/05/13 11:02:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error: Failed to load class
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).

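Verify that the jar you are referring to contains the Application.class file at the expected path by unpacking it with jar xf <jar name>.jar (or listing its contents with jar tf <jar name>.jar), and check that this is the class named in your spark-submit command.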
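The same check can be scripted, since a jar is an ordinary zip archive. The following is a minimal sketch rather than a definitive tool: the helper names and the class name com.example.Application used in the usage note are hypothetical and should be replaced with your own.

```python
import zipfile


def class_entry(class_name: str) -> str:
    """Map a fully qualified class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar (a zip archive) contains the compiled class."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry(class_name) in jar.namelist()
```

For example, jar_contains_class("xxx.jar", "com.example.Application") returns False when the class was compiled into a different package than the one passed to --class, which is one possible reason for a "Failed to load class" error.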


Spark --jars option added jar are not working

I am trying to add the Redshift jars using the spark-submit --jars option.
I am running this command on Spark 2.1.0:
spark-submit --class Test --master spark://xyz.local:7077 --executor-cores 4 --total-executor-cores 32 --executor-memory 6G --driver-memory 4G --driver-cores 2 --deploy-mode cluster -jars s3a://d11-batch-jobs-on-spark/jars/redshift-jdbc42-,s3a://mybucket/jars/spark-redshift_2.11-3.0.0-preview1.jar s3a://mybucket/jars/app.jar
In my code I read from a Redshift table, but I get:
ClassNotFoundException: com.databricks.spark.redshift.DefaultSource
What am I doing wrong?
I've had issues using --jars as well.
My advice is to use --packages instead of --jars for packages that are in the Maven repository, because it also resolves the other dependencies within those packages.
spark-submit --packages <groupId>:<artifactId>:<version>
In your case, leaving out all other options and args, it'd look like this:
spark-submit --packages
You can find the IDs and version in the XML-style dependency snippet that Maven provides after you follow the link to your desired version.
The accepted answer to this question provides more info on --jars and --packages.
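To make the <groupId>:<artifactId>:<version> format concrete, here is a small sketch that checks each coordinate has exactly three non-empty parts and joins them into the comma-separated value that --packages takes. The function name is hypothetical, and the coordinate in the usage note is inferred from the jar file name and exception in the question, so treat its groupId as an assumption.

```python
def packages_value(coordinates):
    """Join Maven coordinates into the comma-separated value for --packages.

    Each coordinate must look like groupId:artifactId:version.
    """
    for coord in coordinates:
        parts = coord.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(
                f"expected groupId:artifactId:version, got {coord!r}"
            )
    return ",".join(coordinates)
```

Calling packages_value(["com.databricks:spark-redshift_2.11:3.0.0-preview1"]) returns the string unchanged, while a bare jar file name is rejected because it is not a Maven coordinate.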

How to share files on master node to executors in Spark, How to use --files argument?

Can someone please explain how I can ship files from the master to all executors using the --files argument in spark-submit?
/bin/spark-submit --master yarn --queue development --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=128G --files /keras/mnist.npz
But this gives me an error. I am new to Spark.
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
You didn't specify the application resource (the application jar or Python file) in this command; --files only ships extra files alongside it. Find more details in Running Spark on YARN.
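The general shape of the command is spark-submit [options] <application jar or Python file> [application arguments]. A minimal sketch of that ordering follows; the function name and the file train.py in the usage note are hypothetical and not taken from the question.

```python
def build_submit_command(options, app_resource, app_args=()):
    """Assemble a spark-submit argument list in the expected order.

    Options come first, then the application resource (jar or Python
    file), then the arguments passed through to the application itself.
    """
    if not app_resource:
        raise ValueError("missing application resource (jar or Python file)")
    return ["spark-submit", *options, app_resource, *app_args]
```

For instance, build_submit_command(["--master", "yarn", "--files", "/keras/mnist.npz"], "train.py") places the hypothetical train.py after all options, whereas the command in the question ends after --files and so has no application resource at all.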

Exception in thread "main" org.apache.spark.SparkException: Must specify the driver container image

I am trying to run spark-submit against minikube (Kubernetes) from my local machine's CLI with this command:
spark-submit --master k8s:// --name cfe2
--deploy-mode cluster --class com.yyy.Test --conf spark.executor.instances=2 --conf spark.kubernetes.container.image local://spark-0.0.1-SNAPSHOT.jar
I have a simple Spark job jar built on version 2.3.0. I have also containerized it in Docker, and minikube is up and running on VirtualBox.
Below is the exception stack trace:
Exception in thread "main" org.apache.spark.SparkException: Must specify the driver container image
at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep$$anonfun$3.apply(BasicDriverConfigurationStep.scala:51)
at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep$$anonfun$3.apply(BasicDriverConfigurationStep.scala:51)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.k8s.submit.steps.BasicDriverConfigurationStep.<init>(BasicDriverConfigurationStep.scala:51)
at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:82)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-04-06 13:33:52 INFO ShutdownHookManager:54 - Shutdown hook called
2018-04-06 13:33:52 INFO ShutdownHookManager:54 - Deleting directory C:\Users\anant\AppData\Local\Temp\spark-6da93408-88cb-4fc7-a2de-18ed166c3c66
This looks like a bug in the default value of the parameter spark.kubernetes.driver.container.image, which should fall back to spark.kubernetes.container.image. So try specifying the driver/executor container image directly:
From the source code, the only available conf options are:
I also noticed that the k8s implementation in Spark 2.3.0 has changed a lot compared to 2.2.0. For example, instead of specifying driver and executor images separately, the official getting-started guide uses a single image passed via spark.kubernetes.container.image.
See if this works:
spark-submit \
--master k8s:// \
--name cfe2 \
--deploy-mode cluster \
--class \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=docker/anantpukale/spark_app:1.1 \
--conf spark.kubernetes.authenticate.submission.oauthToken=YOUR_TOKEN \
--conf spark.kubernetes.authenticate.submission.caCertFile=PATH_TO_YOUR_CERT \
The token and cert can be found on the k8s dashboard. Follow the instructions to build Docker images that are compatible with Spark 2.3.0.
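One thing worth checking in the original command is that every --conf is written as key=value; there, spark.kubernetes.container.image appears without =<image>, which may be why no driver image is found. The sketch below generates the flags from a dictionary so that a missing value is caught early. The function name is hypothetical, and rejecting empty values is a policy of this sketch, not something spark-submit itself is claimed to do.

```python
def conf_flags(settings):
    """Turn a mapping of Spark properties into --conf key=value flags.

    Raises ValueError when a property has no value, so that a flag such as
    "--conf spark.kubernetes.container.image" without "=<image>" cannot
    slip through unnoticed.
    """
    flags = []
    for key, value in settings.items():
        if not key or value is None or str(value) == "":
            raise ValueError(f"conf {key!r} needs a non-empty value")
        flags += ["--conf", f"{key}={value}"]
    return flags
```

With the image name from the answer above, conf_flags({"spark.kubernetes.container.image": "docker/anantpukale/spark_app:1.1"}) yields the two arguments --conf and spark.kubernetes.container.image=docker/anantpukale/spark_app:1.1.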

Dependency is not distributed to Spark cluster

I'm trying to execute a Spark job on a Mesos cluster that depends on the spark-cassandra-connector library, but it keeps failing with
Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/package$
As I understand from spark documentation
JARs and files are copied to the working directory for each SparkContext on the executor nodes.
Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates with --packages.
But it seems that only the task jar, pucker-assembly-1.0.jar, is distributed.
I'm running spark 1.6.1 with scala 2.10.6.
And here's spark-submit command I'm executing:
spark-submit --deploy-mode cluster
--master mesos://localhost:57811
--conf spark.ssl.noCertVerification=true
--packages datastax:spark-cassandra-connector:1.5.1-s_2.10
--driver-cores 3
--driver-memory 4000M
--class SimpleApp
So why isn't spark-cassandra-connector distributed to all my Spark executors?
You should use the correct Maven coordinate syntax:
--packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0
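Note the _2.10 suffix on the artifactId: Scala libraries published to Maven usually append the Scala binary version (the first two components of a Scala 2.x version such as 2.10.6) to the artifact name. A small sketch of how such a coordinate is put together, with hypothetical helper names and assuming a Scala 2.x version string:

```python
def scala_binary_version(full_version):
    """Reduce a Scala 2.x version like '2.10.6' to its binary version '2.10'."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def scala_coordinate(group_id, artifact, scala_version, version):
    """Build groupId:artifactId_<scalaBinaryVersion>:version."""
    suffix = scala_binary_version(scala_version)
    return f"{group_id}:{artifact}_{suffix}:{version}"
```

scala_coordinate("com.datastax.spark", "spark-cassandra-connector", "2.10.6", "1.6.0") produces the coordinate used in the answer above.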

How to give dependent jars to spark submit in cluster mode

I am running Spark in cluster deploy mode. Below is the command:
dse spark-submit -v --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--executor-memory 512M \
--total-executor-cores 3 \
--deploy-mode "cluster" \
--master spark://$MASTER:7077 \
--jars=$JARS \
--supervise \
--class "com.testclass" $APP_JAR input.json \
--files "/home/test/input.json"
The above command works fine in client mode, but when I use it in cluster mode I get a class-not-found exception:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
In client mode the dependent jars are copied to the /var/lib/spark/work directory, whereas in cluster mode they are not. Please help me solve this.
I am using NFS and have mounted the same directory on all the Spark nodes under the same name, but I still get the error. How is it able to pick up the application jar, which is in the same directory, but not the dependent jars?
In client mode the dependent jars are getting copied to the
/var/lib/spark/work directory whereas in cluster mode it is not.
In cluster mode the driver program runs inside the cluster rather than on the local machine (as it does in client mode), so the dependent jars must be accessible from within the cluster; otherwise the driver program and the executors will throw a java.lang.NoClassDefFoundError.
When you use spark-submit, the application jar along with any jars included with the --jars option is automatically transferred to the cluster.
So add your extra jars to --jars and they will be copied to the cluster automatically.
Please refer to the "Advanced Dependency Management" section in the link below:
As the Spark documentation says, either:
Keep all jars and dependencies at the same local path on all nodes in the cluster, or
Keep the jars in a distributed file system that all nodes have access to.
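Which of the two options applies shows up in the URI scheme of each jar. As a rough sketch of the distinction described in the "Advanced Dependency Management" section (the function name and the returned labels are made up for illustration, and the grouping is deliberately coarse):

```python
from urllib.parse import urlparse


def jar_location(uri):
    """Classify a jar URI by where Spark would expect to find it."""
    scheme = urlparse(uri).scheme
    if scheme == "local":
        # local:/ URIs are expected to exist as a file on each worker node.
        return "present-on-every-node"
    if scheme in ("", "file"):
        # Plain paths and file:/ URIs refer to a node's own filesystem.
        return "local-filesystem"
    # hdfs:, http:, https:, ftp: and similar are pulled from the given URI.
    return "shared-or-remote"
```

For example, jar_location("local:///opt/spark/jars/dep.jar") returns "present-on-every-node", and jar_location("hdfs://namenode/jars/dep.jar") returns "shared-or-remote"; both paths here are hypothetical.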
Spark properties
