Runtime error: ClassNotFound exception with Spark - apache-spark

I want to run an OOTB example of Spark streaming (version 1.6) by itself, to build on it. I am able to compile and run the example as is, bundled with the other code samples.
I.e. :
./bin/run-example streaming.StatefulNetworkWordCount localhost 9999
However i am unable to do so (same code) in my own project.
Any help ?
build.sbt:
import sbtassembly.AssemblyKeys
name := "stream-test"
version := "1.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.0"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.10"
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.4" % "test"
assemblyJarName in assembly := "stream_test_" + version.value + ".jar"
assemblyMergeStrategy in assembly := {
case PathList("org", "apache", xs # _*) => MergeStrategy.last
case PathList("com", "google", xs # _*) => MergeStrategy.last
case PathList("com", "esotericsoftware", xs # _*) => MergeStrategy.last
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
This compiles fine. However, getting an error upon running:
(note that i am using spark 1.6 to run this):
$ ../../../app/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --jars /app/spark-streaming_2.11-1.6.0.jar --master local[4] --class "StatefulNetworkWordCount" ./target/scala-2.10/stream-test_2.10-1.0.jar localhost 9999
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/02/10 22:16:30 INFO SparkContext: Running Spark version 1.4.1
2016-02-10 22:16:32.451 java[86932:5664316] Unable to load realm info from SCDynamicStore
16/02/10 22:16:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
..
16/02/10 22:16:39 INFO Utils: Successfully started service 'sparkDriver' on port 60720.
16/02/10 22:16:40 INFO SparkEnv: Registering MapOutputTracker
16/02/10 22:16:40 INFO SparkEnv: Registering BlockManagerMaster
16/02/10 22:16:42 INFO SparkUI: Started SparkUI at http://xxx:4040
16/02/10 22:16:43 INFO SparkContext: Added JAR file://app/spark-streaming_2.11-1.6.0.jar at http://xxx:60721/jars/spark-streaming_2.11-1.6.0.jar with timestamp 1455171403485
16/02/10 22:16:43 INFO SparkContext: Added JAR file:/projects/spark/test/./target/scala-2.10/stream-test_2.10-1.0.jar at http://xxx:60721/jars/stream-test_2.10-1.0.jar with timestamp 1455171403562
16/02/10 22:16:44 INFO Executor: Starting executor ID driver on host localhost
16/02/10 22:16:44 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60722.
..
16/02/10 22:16:44 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.streaming.dstream.PairDStreamFunctions.mapWithState(Lorg/apache/spark/streaming/StateSpec;Lscala/reflect/ClassTag;Lscala/reflect/ClassTag;)Lorg/apache/spark/streaming/dstream/MapWithStateDStream;
at StatefulNetworkWordCount$.main(StatefulNetworkWordCount.scala:50)
at StatefulNetworkWordCount.main(StatefulNetworkWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This method is in the Jar's class, so i dont understand ..

Found the answer ..
Even though i was running spark-submit from 1.6, my SPARK_HOME was still pointing to the previous version of Spark, 1.4.
So setting SPARK_HOME to the 1.6 version instead, same as the spark-submit to run the code, solved the problem.

Related

Error while deploy: Class org.apache.hadoop.fs.LocalFileSystem not found

I am trying to write a parquet file in Scala/sbt. My code works fine on my computer but always fails when deploy on a server with Jenkins.
I have the following error:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:288)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.parquet.hadoop.util.HadoopOutputFile.fromPath(HadoopOutputFile.java:58)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:677)
at com.github.mjakubowski84.parquet4s.ParquetWriter$.internalWriter(ParquetWriter.scala:129)
at com.github.mjakubowski84.parquet4s.ParquetWriterImpl.<init>(ParquetWriter.scala:186)
at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.build(ParquetWriter.scala:111)
at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.writeAndClose(ParquetWriter.scala:113)
at ParquetExport$.$anonfun$tryExport$1(ParquetExport.scala:307)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
... 29 more
I first tried to use spark:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0"
And then changed to code to work with these:
libraryDependencies ++= Seq(
"com.github.mjakubowski84" %% "parquet4s-core" % "2.6.0",
"org.apache.hadoop" % "hadoop-client" % "2.10.2"
)
And still encountering the same error.
Setting the Hadoop configuration didn't help:
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getname)
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getname)
Neither did changing the ClassLoader:
Thread.currentThread.setContextClassLoader(getClass.getClassLoader)
Everything work fine in local but not on the server. Any idea?
I already faced the same issue.
If you look at your logs :
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
You understand that it can't find the name of the class LocalFileSystem you set to your hadoop config.
Try to change this :
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getname)
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getname)
To :
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
hadoopConfig.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")

Though I have setMaster as local, my spark application gives error

I have the following application (I am starting and stopping spark) in Windows. I use Scala-IDE(Eclipse). I get "A master URL must be set in your configuration" error even though I have set it here. I use spark-2.4.4 version.
Can someone please help me to fix this issue.
import org.apache.spark._;
import org.apache.spark.sql._;
object SampleApp {
def main(args: Array[String]) {
val conf = new SparkConf()
.setMaster("local[*]")
.setAppName("Simple Application")
val sc = new SparkContext(conf)
sc.stop()
}
}
The error is:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/10/28 22:58:56 INFO SparkContext: Running Spark version 2.4.4
19/10/28 22:58:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/10/28 22:58:56 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
19/10/28 22:58:56 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.SparkContext.postApplicationEnd(SparkContext.scala:2416)
at org.apache.spark.SparkContext.$anonfun$stop$2(SparkContext.scala:1931)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1931)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:585)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
19/10/28 22:58:56 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
if you are using version 2.4.4 try this:
import org.apache.spark.sql.SparkSession
object SampleApp {
def main(args: Array[String]) {
val spark = SparkSession
.builder
.master("local[*]")
.appName("test")
.getOrCreate()
println(spark.sparkContext.version)
spark.stop()
}
}

Exception while integrating spark with Kafka

Doing Spark-kafka streaming on word-count. Built a jar using sbt.
When I do spark-submit the following exception is throwing.
Exception in thread "streaming-start" java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.isDirectory()Z
at org.apache.spark.streaming.util.FileBasedWriteAheadLog.initializeOrRecover(FileBasedWriteAheadLog.scala:245)
at org.apache.spark.streaming.util.FileBasedWriteAheadLog.<init>(FileBasedWriteAheadLog.scala:80)
at org.apache.spark.streaming.util.WriteAheadLogUtils$$anonfun$2.apply(WriteAheadLogUtils.scala:142)
at org.apache.spark.streaming.util.WriteAheadLogUtils$$anonfun$2.apply(WriteAheadLogUtils.scala:142)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.streaming.util.WriteAheadLogUtils$.createLog(WriteAheadLogUtils.scala:141)
at org.apache.spark.streaming.util.WriteAheadLogUtils$.createLogForDriver(WriteAheadLogUtils.scala:99)
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker$$anonfun$createWriteAheadLog$1.apply(ReceivedBlockTracker.scala:256)
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker$$anonfun$createWriteAheadLog$1.apply(ReceivedBlockTracker.scala:254)
at scala.Option.map(Option.scala:146)
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.createWriteAheadLog(ReceivedBlockTracker.scala:254)
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.<init>(ReceivedBlockTracker.scala:77)
at org.apache.spark.streaming.scheduler.ReceiverTracker.<init>(ReceiverTracker.scala:109)
at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:87)
at org.apache.spark.streaming.StreamingContext$$anonfun$liftedTree1$1$1.apply$mcV$sp(StreamingContext.scala:583)
at org.apache.spark.streaming.StreamingContext$$anonfun$liftedTree1$1$1.apply(StreamingContext.scala:578)
at org.apache.spark.streaming.StreamingContext$$anonfun$liftedTree1$1$1.apply(StreamingContext.scala:578)
at org.apache.spark.util.ThreadUtils$$anon$2.run(ThreadUtils.scala:126)
18/03/27 12:43:55 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#12010fd1{/streaming,null,AVAILABLE,#Spark}
18/03/27 12:43:55 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#552ed807{/streaming/json,null,AVAILABLE,#Spark}
18/03/27 12:43:55 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#7318daf8{/streaming/batch,null,AVAILABLE,#Spark}
18/03/27 12:43:55 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#3f1ddac2{/streaming/batch/json,null,AVAILABLE,#Spark}
18/03/27 12:43:55 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#37864b77{/static/streaming,null,AVAILABLE,#Spark}
18/03/27 12:43:55 INFO streaming.StreamingContext: StreamingContext started
my spark submit:
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --class "KafkaWordCount" --master local[4] scala_project_2.11-1.0.jar localhost:2181 test-consumer-group word-count 1
scala_version: 2.11.8
spark_version: 2.2.0
sbt_version: 1.0.3
object KafkaWordCount {
def main(args: Array[String]) {
val (zkQuorum, group, topics, numThreads) = ("localhost:2181", "test-consumer-group", "word-count", 1)
val sparkConf = new SparkConf()
.setMaster("local[*]")
.setAppName("KafkaWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(2))
ssc.checkpoint("checkpoint")
val topicMap = topics.split(",").map((_, numThreads)).toMap
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
val words = lines.flatMap(_.split(" "))
words.foreachRDD(rdd => println("#####################rdd###################### " + rdd.first))
val wordCounts = words.map(x => (x, 1L))
.reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
}
}

Error connecting to Zookeeper with Spark Streaming Structured

My Spark Streaming Structured keeps disconnecting from Zookeeper when trying to read from a Kafkatopic:
WARN clients.NetworkClient: Bootstrap broker [zk host]:2181 disconnected
When I check the ZK logs, I see this error being prompted all the time:
Exception causing close of session 0x0 due to java.io.EOFException
I´m running on Cloudera 5.11 with Spark 2.1, these are my SBT libraries:
val sparkVer = "2.1.0"
Seq(
"org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources(),
"org.apache.spark" %% "spark-streaming" % sparkVer % "provided",
"org.apache.spark" %% "spark-sql" % sparkVer % "provided",
"org.apache.spark" % "spark-sql-kafka-0-10_2.11" % sparkVer
)
This is my submit command:
# Set KAFKA to 0.10 see (https://community.cloudera.com/t5/Data-Ingestion-Integration/KafkaConsumer-subscribe-0-9-vs-0-10-in-Structured-streaming/td-p/60161)
export SPARK_KAFKA_VERSION=0.10
spark2-submit --class myMainClass --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 myapp.jar topic2345 [zk host 1]:2181,[zk host 2]:2181
And this is the code creating the stream:
private def createKafkaStrem(spark: SparkSession, args: Array[String]) = {
spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", args(1))
.option("subscribe", args(0))
.load()
}
EDIT: After activating the DEBUG ouput, this is the complete error stack:
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:974)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets$1.apply(KafkaSource.scala:374)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets$1.apply(KafkaSource.scala:372)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply$mcV$sp(KafkaSource.scala:442)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply(KafkaSource.scala:441)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply(KafkaSource.scala:441)
at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:79)
at org.apache.spark.sql.kafka010.KafkaSource.withRetriesWithoutInterrupt(KafkaSource.scala:440)
at org.apache.spark.sql.kafka010.KafkaSource.org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets(KafkaSource.scala:372)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$initialPartitionOffsets$1.apply(KafkaSource.scala:141)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$initialPartitionOffsets$1.apply(KafkaSource.scala:138)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets$lzycompute(KafkaSource.scala:138)
at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets(KafkaSource.scala:121)
at org.apache.spark.sql.kafka010.KafkaSource.getOffset(KafkaSource.scala:157)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9$$anonfun$apply$5.apply(StreamExecution.scala:391)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9$$anonfun$apply$5.apply(StreamExecution.scala:391)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:265)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:46)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9.apply(StreamExecution.scala:390)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9.apply(StreamExecution.scala:388)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch(StreamExecution.scala:388)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:362)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply$mcV$sp(StreamExecution.scala:260)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:265)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:46)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:43)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:252)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:187)
18/03/21 10:47:27 DEBUG clients.NetworkClient: Node -2 disconnected.
18/03/21 10:47:27 WARN clients.NetworkClient: Bootstrap broker [zk host]:2181 disconnected
18/03/21 10:47:27 DEBUG clients.NetworkClient: Sending metadata request {topics=[topic2345]} to node -1
18/03/21 10:47:27 DEBUG network.Selector: Connection with /[zk host] disconnected
kafka.bootstrap.servers takes a list of Kafka brokers, not a Zookeeper quorum.
The "new" Kafka Consumer API does not use a Zookeeper connection string

Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory

I'm getting the above error message when I'm trying to run spark submit comment :
spark-submit --class "retail.DataValidator" --master local --executor-memory 2g --total-executor-cores 2 sample-spark-180417_2.11-1.0.jar /home/hduser/Downloads/inputfiles/ /home/hduser/output/
ERROR Message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory
at retail.DataValidator$.main(DataValidator.scala:12)
at retail.DataValidator.main(DataValidator.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
build.sbt file:
name := "sample-spark-180417"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
libraryDependencies += "org.apache.spark" % "spark-hive_2.11" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.42"
libraryDependencies += "org.scala-lang" % "scala-swing" % "2.10+"
I'm not having any maven dependencies or pom.xml file.
Thanks
As it's not a fat jar, the spark cluster is not having typesafe jar in its classpath.
Submit the spark job as::
spark-submit --jars ./typesafe-***.jar --class "retail.DataValidator" --master local --executor-memory 2g --total-executor-cores 2 sample-spark-180417_2.11-1.0.jar /home/hduser/Downloads/inputfiles/ /home/hduser/output/
It will keep that jar in classpath, and submit the job.

Resources