Could not load class | Spark-submit Intellij - apache-spark

I know this has been asked here before, but I think my case is a bit different.
I have a simple working project in IntelliJ.
When I run it from IntelliJ, the program works fine and I can see the results, but whenever I export it as a jar and run it locally through spark-submit, it fails with the error "failed to load class".
I'm running it with: spark-submit --class com.CarbonEmission --master local[*] MyPath\TestSparkJar.jar
My sbt build file was posted as an image.
I've been stuck on this for several days now, and I hope someone can help.

The class you pass to --class was not added to the exported jar. Extract the jar and check whether the class file is actually in it.
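If you would rather not unpack the jar by hand, the same check can be sketched in a few lines of Python. This is only an illustration: a jar is a zip archive, and a class named com.CarbonEmission is normally stored as the entry com/CarbonEmission.class. The function name jar_contains_class is my own, not part of Spark or sbt.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the fully qualified class_name.

    jar_contains_class is a hypothetical helper name used for illustration.
    """
    # com.CarbonEmission is stored in the archive as com/CarbonEmission.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling jar_contains_class with the path to TestSparkJar.jar and "com.CarbonEmission" tells you whether spark-submit has any chance of finding the class. If it returns False, the problem is in how the jar was exported, not in the spark-submit command.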

Related

Spark-shell does not import specified jar file

I am a complete beginner to all of this, so pardon me if I'm missing some totally obvious step. I installed Spark 3.1.2 and Cassandra 3.11.11, and I'm trying to connect them by following a guide I found, in which I built a fat jar for execution. In that guide, when they run the spark-shell command with the jar file, the following line appears near the start of the output:
INFO SparkContext: Added JAR file:/home/chbatey/dev/tmp/spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar at http://192.168.0.34:51235/jars/spark-
15/01/26 16:16:10 INFO SparkILoop: Created spark context..
I followed all of the steps properly, but no line like that shows up in my shell. To confirm that the jar hasn't been added, I tried the sample program from that website, and it throws this error:
java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
What should I do? I'm using spark-cassandra-connector-3.1.0
You don't need to compile it yourself, just follow official documentation - use --packages to automatically download all dependencies:
spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0
The error occurs because the connector jar on its own doesn't contain its dependencies, so you would have to list all of them yourself (the Java driver, etc.). If you still want to use the --jars option, download the assembly version of the connector instead; it contains all the necessary dependencies.
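For reference, the value given to --packages is a Maven coordinate in the form groupId:artifactId:version. For Scala libraries the artifact name conventionally ends with the Scala binary version (the _2.12 above), which should match the Scala version your Spark build uses. A rough Python sketch that takes a coordinate apart so you can check it; parse_coordinate and Coordinate are names I made up, not part of any Spark tooling:

```python
from typing import NamedTuple, Optional


class Coordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str
    scala_suffix: Optional[str]  # e.g. "2.12", or None for plain Java artifacts


def parse_coordinate(text):
    """Split a groupId:artifactId:version string (hypothetical helper)."""
    parts = text.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected groupId:artifactId:version, got %r" % text)
    group_id, artifact_id, version = parts
    # Scala artifacts conventionally end in _<scala binary version>
    _, sep, suffix = artifact_id.rpartition("_")
    scala_suffix = suffix if sep and suffix[:1].isdigit() else None
    return Coordinate(group_id, artifact_id, version, scala_suffix)
```

For the coordinate above this gives a scala_suffix of "2.12" and a version of "3.1.0". A plain Java artifact such as org.postgresql:postgresql:42.1.1 has no suffix, so scala_suffix is None.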

How to add a third-party library to Spark running on a local machine

I am listening to an Event Hub stream. I followed an article on attaching the library to a cluster (Databricks), and my code runs fine there.
For debugging I am running the code on my local machine/cluster, but it fails because of the missing library. How can I add a library when running on a local machine?
I tried sparkContext.addFile(fullPathToJar), but I still get the same error.
You can use spark-submit --packages
Example: spark-submit --packages org.postgresql:postgresql:42.1.1
You would need to find the package you are using and check that it is compatible with your Spark version.
With a single jar file you'd use spark-submit --jars instead.
I used spark-submit --packages {package} and it works.
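To make the difference between the two options concrete, here is a small Python sketch that assembles the spark-submit argument list. Both --packages (Maven coordinates) and --jars (jar file paths) take comma-separated values. The function name build_submit_command and the script name my_stream_job.py are placeholders of mine, not anything Spark provides.

```python
def build_submit_command(app, packages=(), jars=(), master="local[*]"):
    """Build a spark-submit argument list (hypothetical helper for illustration)."""
    cmd = ["spark-submit", "--master", master]
    if packages:
        # Maven coordinates, resolved and downloaded together with their dependencies
        cmd += ["--packages", ",".join(packages)]
    if jars:
        # Local jar files, shipped as-is (their dependencies are not resolved)
        cmd += ["--jars", ",".join(jars)]
    cmd.append(app)
    return cmd
```

For example, build_submit_command("my_stream_job.py", packages=["org.postgresql:postgresql:42.1.1"]) returns a list you can pass straight to subprocess.run, which is equivalent to typing spark-submit --master local[*] --packages org.postgresql:postgresql:42.1.1 my_stream_job.py.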

How to deploy war file in spark-submit command (spark)

I am using
spark-submit --class main.Main --master local[2] /user/sampledata/parser-0.0.1-SNAPSHOT.jar
to run Java Spark code. Is it possible to run this code from a war file instead of a jar? I am looking to deploy it on Tomcat.
I tried a war file, but it gives a class not found exception.

Error: Unrecognized option: --packages

I'm porting an existing script from BigInsights to Spark on Bluemix. I'm trying to run the following against Spark on Bluemix:
./spark-submit.sh --vcap ./vcap.json --deploy-mode cluster \
--master https://x.x.x.x:8443 --jars ./truststore.jar \
--packages org.elasticsearch:elasticsearch-spark_2.10:2.3.0 \
./export_to_elasticsearch.py ...
However, I get the following error:
Error: Unrecognized option: --packages
How can I pass the --packages parameter?
Bluemix uses a customized Spark version, with a customized spark-submit.sh script that supports only a subset of the original script's parameters. You can see all the configuration properties and parameters you can use in its documentation.
Additionally, you can download the Bluemix version of the script from this link, and there you can see that there is no --packages argument.
So the problem with your approach is that the Bluemix version of spark-submit does not accept the --packages parameter, probably for security reasons. As an alternative, you can download the jar for the package you want (and maybe a fat jar for its dependencies) and upload them using the --jars parameter. Note: to avoid having to upload the jar files each time you call spark-submit, you can pre-upload them using curl. The details of this procedure can be found on this link.
Adding to Daniel's post: when using the pre-upload method, you might want to upload your package to "${cluster_master_url}/tenant/data/libs", since the Spark service sets these four Spark properties, "spark.driver.extraClassPath", "spark.driver.extraLibraryPath", "spark.executor.extraClassPath", and "spark.executor.extraLibraryPath", to ./data/libs/*
Reference: https://console.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index-gentopic3.html#spark-submit_properties

How to launch programs in Apache Spark?

I have a “myprogram.py” and a “myprogram.scala” that I need to run on my Spark machine. How can I upload and launch them?
I have been using the shell to do my transformations and call actions, but now I want to launch a complete program on the Spark machine instead of entering single commands every time. I also believe that will make it easier to change my program than re-entering commands in the shell.
I did a standalone installation of Spark 1.4.1 on Ubuntu 14.04, on a single machine, not a cluster.
I went through the Spark docs online, but I only found instructions on how to do this on a cluster. Please help me with that.
Thank you.
The documentation to do this (as commented above) is available here: http://spark.apache.org/docs/latest/submitting-applications.html
However, the code you need is here:
# Run application locally on 8 cores
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100
You'll need to compile the Scala file using sbt (documentation here: http://www.scala-sbt.org/0.13/tutorial/index.html)
Here's some information on the build.sbt file you'll need in order to grab the right dependencies: http://spark.apache.org/docs/latest/quick-start.html
Once the scala file is compiled, you'll send the resulting jar using the above submit command.
Put simply:
In a Linux terminal, cd to the directory where Spark is unpacked/installed.
Note that this folder normally contains subfolders such as “bin”, “conf”, “lib”, “logs” and so on.
To run the Python program locally with simple/default settings, type the command:
./bin/spark-submit --master local[*] myprogram.py
More complete descriptions are available in the documentation, as zero323 and ApolloFortyNine described.
