Spark connecting to Kafka gets an error - apache-spark

Why am I getting the java.lang.NoClassDefFoundError error on spark-submit?
The stack trace is shown below:
[appadm#elk01 spark]$ bin/spark-submit --class "com.ipponusa.SparkStringConsumer" --master localhost:9092 samples/my-app-1.0-SNAPSHORT.jar 1
Hello
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:26)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:61)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:44)
at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:345)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)

You have to use the same version of Scala as the one with which you compiled your Spark code. Check them, and you'll probably need to downgrade your Scala version.
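As a sketch of how to check and line the versions up (the master URL, package version, and the _2.10 suffix below are illustrative assumptions, not taken from the question): spark-submit prints the Scala version it was built with, and every Spark/Kafka artifact you compile against must carry the matching suffix.
$ bin/spark-submit --version
# the banner includes a line like "Using Scala version 2.10.x" -- that is the version your artifacts must match
$ bin/spark-submit --class "com.ipponusa.SparkStringConsumer" \
    --master "local[*]" \
    --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.0 \
    samples/my-app-1.0-SNAPSHORT.jar 1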

Related

Azure Blob Storage Spark

I'm trying to connect Spark to Azure Blob Storage (wasbs).
I added the following jars to the Hadoop classpath:
com.microsoft.azure_azure-storage-7.0.0.jar
org.apache.hadoop_hadoop-annotations-3.1.2.jar
org.apache.hadoop_hadoop-auth-3.1.2.jar
org.apache.hadoop_hadoop-azure-3.1.2.jar
org.apache.hadoop_hadoop-common-3.1.2.jar
org.eclipse.jetty_jetty-http-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-io-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-security-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-server-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-servlet-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-webapp-9.3.24.v20180605.jar
org.eclipse.jetty_jetty-xml-9.3.24.v20180605.jar
and I try to run spark-submit with:
spark-submit --class mainClass --jars jars/org.apache.hadoop_hadoop-azure-3.1.2.jar,jars/com.microsoft.azure_azure-storage-7.0.0.jar,jars/org.apache.hadoop_hadoop-common-3.1.2.jar myjar.jar
and I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/ajax/JSON$Convertor
If I remove hadoop-common from the spark-submit --jars, I get:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities
and if I add --jars jars/* to include all the jar files along with jetty-util, I get
java.lang.ClassNotFoundException: my.package.MainClass
I saw similar posts that point to multiple versions of Jetty, but I can't find other versions anywhere.
For the first exception, you're missing jetty-util:
https://mvnrepository.com/artifact/org.eclipse.jetty/jetty-util/9.3.24.v20180605
You should also verify that hadoop classpath returns what you want.
For the remaining exceptions, verify that you can run hadoop fs -ls wasb://path on each potential Spark executor.
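As a sketch, this is the same spark-submit with the jetty-util jar from that link added to --jars (the jetty-util file name follows the question's naming convention and is an assumption, as are the wasb placeholders):
spark-submit --class mainClass \
  --jars jars/org.apache.hadoop_hadoop-azure-3.1.2.jar,jars/com.microsoft.azure_azure-storage-7.0.0.jar,jars/org.apache.hadoop_hadoop-common-3.1.2.jar,jars/org.eclipse.jetty_jetty-util-9.3.24.v20180605.jar \
  myjar.jar
# quick sanity checks on each node before submitting:
hadoop classpath
hadoop fs -ls wasb://<container>@<account>.blob.core.windows.net/<path>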

spark2-submit throwing error with multiple packages (--packages)

I'm trying to submit the following Spark2 job on a CDH 5.16 cluster, and it only takes the first parameter of the --packages option and throws an error for the second one:
spark2-submit --packages com.databricks:spark-xml_2.11:0.4.1, com.databricks:spark-csv_2.11:1.5.0 /path/to/python-script
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR com.databricks:spark-csv_2.11:1.5.0 with URI com.databricks. Please specify a class through --class.
at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:224)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:911)
at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:911)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I'm running this job on a CDH 5.16 cluster where Spark is installed with the Spark2 CSD.
Thanks in advance.
Don't put a space between the packages:
spark2-submit --packages com.databricks:spark-xml_2.11:0.4.1,com.databricks:spark-csv_2.11:1.5.0 /path/to/python-script

java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession in Scala Eclipse

I have written a Spark job in Java in Scala Eclipse. If I submit the job with the spark-submit command it works fine, but when I run the application in Scala Eclipse, it gives the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
The error is thrown at the line of code below:
val SPARK_SESSION = SparkSession.builder().appName("spark-job1").master("local[*]").getOrCreate()

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration while running spark Hbase using java API

I am running a simple application that gets data from HBase in Spark using the Java API,
running the spark-submit command, e.g.:
bin/spark-submit --master spark://192.168.43.75:7077 --class com.scry.NLPAnnotationController --driver-class-path /usr/lib/hbase/hbase-0.98.22-hadoop2/conf:$SPARK_HOME/lib_managed/jars/*.jar:$HBASE_CLASSPATH/*.jar --jars $SPARK_HOME/lib_managed/jars/*.jar:$HBASE_CLASSPATH/*.jar /home/deepak/hbase.jar
It gives an error like:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Please help me resolve this issue.
Thanks in advance,
Deepak
In the console where you are running spark-submit, first execute the command below and then run spark-submit:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_CLASSPATH
If this works, then add this entry to your hadoop-env.sh (see the sketch below).
Hope it helps...
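A minimal sketch of persisting the setting once the export works (the hadoop-env.sh location is illustrative; use the path your distribution actually uses):
echo 'export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_CLASSPATH' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh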

Error starting pyspark with options (without Spark packages)

Can anyone tell me why I'm getting the errors below? According to the
README for the pyspark-cassandra connector, what I am trying below should work (without Spark packages): https://github.com/TargetHolding/pyspark-cassandra
$ pyspark_jar="$HOME/devel/sandbox/Learning/Spark/pyspark-cassandra/target/scala-2.10/pyspark-cassandra-assembly-0.2.2.jar"
$ pyspark_egg="$HOME/devel/sandbox/Learning/Spark/pyspark-cassandra/target/pyspark_cassandra-0.2.2-py2.7.egg"
$ pyspark --jars $pyspark_jar --py_files $pyspark_egg --conf spark.cassandra.connection.host=localhost
This results in:
Exception in thread "main" java.lang.IllegalArgumentException: pyspark does not support any application options.
at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:222)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildPySparkShellCommand(SparkSubmitCommandBuilder.java:239)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:113)
at org.apache.spark.launcher.Main.main(Main.java:74)
Figured out the problem. I needed to use --py-files instead of --py_files.
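With that change, the original command from the question becomes:
$ pyspark --jars $pyspark_jar --py-files $pyspark_egg --conf spark.cassandra.connection.host=localhost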
