Error connecting to Zookeeper with Spark Structured Streaming - apache-spark

My Spark Structured Streaming job keeps disconnecting from Zookeeper when trying to read from a Kafka topic:
WARN clients.NetworkClient: Bootstrap broker [zk host]:2181 disconnected
When I check the Zookeeper logs, I see this error repeated constantly:
Exception causing close of session 0x0 due to java.io.EOFException
I'm running on Cloudera 5.11 with Spark 2.1; these are my SBT library dependencies:
val sparkVer = "2.1.0"
Seq(
  "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources(),
  "org.apache.spark" %% "spark-streaming" % sparkVer % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVer % "provided",
  "org.apache.spark" % "spark-sql-kafka-0-10_2.11" % sparkVer
)
This is my submit command:
# Set KAFKA to 0.10 see (https://community.cloudera.com/t5/Data-Ingestion-Integration/KafkaConsumer-subscribe-0-9-vs-0-10-in-Structured-streaming/td-p/60161)
export SPARK_KAFKA_VERSION=0.10
spark2-submit --class myMainClass --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 myapp.jar topic2345 [zk host 1]:2181,[zk host 2]:2181
And this is the code creating the stream:
private def createKafkaStream(spark: SparkSession, args: Array[String]) = {
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", args(1))
    .option("subscribe", args(0))
    .load()
}
EDIT: After activating the DEBUG output, this is the complete error stack:
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:974)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets$1.apply(KafkaSource.scala:374)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets$1.apply(KafkaSource.scala:372)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply$mcV$sp(KafkaSource.scala:442)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply(KafkaSource.scala:441)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply(KafkaSource.scala:441)
at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:79)
at org.apache.spark.sql.kafka010.KafkaSource.withRetriesWithoutInterrupt(KafkaSource.scala:440)
at org.apache.spark.sql.kafka010.KafkaSource.org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets(KafkaSource.scala:372)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$initialPartitionOffsets$1.apply(KafkaSource.scala:141)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$initialPartitionOffsets$1.apply(KafkaSource.scala:138)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets$lzycompute(KafkaSource.scala:138)
at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets(KafkaSource.scala:121)
at org.apache.spark.sql.kafka010.KafkaSource.getOffset(KafkaSource.scala:157)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9$$anonfun$apply$5.apply(StreamExecution.scala:391)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9$$anonfun$apply$5.apply(StreamExecution.scala:391)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:265)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:46)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9.apply(StreamExecution.scala:390)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9.apply(StreamExecution.scala:388)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch(StreamExecution.scala:388)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:362)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply$mcV$sp(StreamExecution.scala:260)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:265)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:46)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:43)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:252)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:187)
18/03/21 10:47:27 DEBUG clients.NetworkClient: Node -2 disconnected.
18/03/21 10:47:27 WARN clients.NetworkClient: Bootstrap broker [zk host]:2181 disconnected
18/03/21 10:47:27 DEBUG clients.NetworkClient: Sending metadata request {topics=[topic2345]} to node -1
18/03/21 10:47:27 DEBUG network.Selector: Connection with /[zk host] disconnected

kafka.bootstrap.servers takes a list of Kafka brokers (host:port, 9092 by default), not a Zookeeper quorum.
The "new" Kafka consumer API used by Structured Streaming does not use a Zookeeper connection string at all.

Related

Error while deploy: Class org.apache.hadoop.fs.LocalFileSystem not found

I am trying to write a Parquet file in Scala/sbt. My code works fine on my computer but always fails when deployed on a server with Jenkins.
I have the following error:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:288)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.parquet.hadoop.util.HadoopOutputFile.fromPath(HadoopOutputFile.java:58)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:677)
at com.github.mjakubowski84.parquet4s.ParquetWriter$.internalWriter(ParquetWriter.scala:129)
at com.github.mjakubowski84.parquet4s.ParquetWriterImpl.<init>(ParquetWriter.scala:186)
at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.build(ParquetWriter.scala:111)
at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.writeAndClose(ParquetWriter.scala:113)
at ParquetExport$.$anonfun$tryExport$1(ParquetExport.scala:307)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
... 29 more
I first tried to use Spark:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0"
And then changed the code to work with these:
libraryDependencies ++= Seq(
  "com.github.mjakubowski84" %% "parquet4s-core" % "2.6.0",
  "org.apache.hadoop" % "hadoop-client" % "2.10.2"
)
I'm still encountering the same error.
Setting the Hadoop configuration didn't help:
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
Neither did changing the ClassLoader:
Thread.currentThread.setContextClassLoader(getClass.getClassLoader)
Everything works fine locally but not on the server. Any ideas?
I already faced the same issue.
If you look at your logs:
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
You can see that it can't find the LocalFileSystem class from the name set in your Hadoop config.
Try changing this:
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
To this:
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
hadoopConfig.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")

Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory

I'm getting the above error message when I try to run the spark-submit command:
spark-submit --class "retail.DataValidator" --master local --executor-memory 2g --total-executor-cores 2 sample-spark-180417_2.11-1.0.jar /home/hduser/Downloads/inputfiles/ /home/hduser/output/
ERROR Message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory
at retail.DataValidator$.main(DataValidator.scala:12)
at retail.DataValidator.main(DataValidator.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
build.sbt file:
name := "sample-spark-180417"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
libraryDependencies += "org.apache.spark" % "spark-hive_2.11" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.42"
libraryDependencies += "org.scala-lang" % "scala-swing" % "2.10+"
I don't have any Maven dependencies or a pom.xml file.
Thanks
Since it's not a fat jar, the Spark cluster doesn't have the Typesafe config jar on its classpath.
Submit the Spark job as:
spark-submit --jars ./typesafe-***.jar --class "retail.DataValidator" --master local --executor-memory 2g --total-executor-cores 2 sample-spark-180417_2.11-1.0.jar /home/hduser/Downloads/inputfiles/ /home/hduser/output/
That keeps the jar on the classpath when the job is submitted.
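Alternatively, you can build a fat jar with sbt-assembly so the Typesafe config classes are bundled and nothing extra has to be passed at submit time. A rough sketch (the plugin version is an assumption, not taken from the question):
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt: mark the Spark artifacts as "provided" so only application
// dependencies (typesafe config, mysql-connector, ...) end up in the jar
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
Then run sbt assembly and submit the generated sample-spark-180417-assembly-1.0.jar instead of the plain packaged jar.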

spark-solr CDH job fails with java.lang.VerifyError at solrj.impl.HttpClientUtil

I'm running CDH 5.7.1 and submitting a Spark job that uses spark-solr 2.0.1 in yarn-cluster mode, and the job fails with the error below, which arises from a (possibly) binary compatibility issue in org.apache.solr.client.solrj.impl.HttpClientUtil. The job runs fine in local mode, allowing me to index a Spark DataFrame to Solr 6.1.0.
I'm building the fat jar that is deployed to the cluster with the sbt-assembly plugin; see below for my build.sbt configuration.
I've tried various things to fix this, such as specifying the --jars argument to spark-submit to provide the appropriate solr-solrj and httpclient jars, as well as numerous build.sbt changes to pin the versions of solrj and httpclient explicitly, all with no luck.
Any insights or solutions would be greatly appreciated.
Spark job history error from yarn-cluster mode:
16/07/21 02:10:07 ERROR yarn.ApplicationMaster: User class threw exception: com.google.common.util.concurrent.ExecutionError: java.lang.VerifyError: Bad return type
Exception Details:
Location:
org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient; #58: areturn
Reason:
Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
Current Frame:
bci: #58
flags: { }
locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/http/conn/ClientConnectionManager', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/DefaultHttpClient' }
stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
Bytecode:
0x0000000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
0x0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
0x0000020: b600 0a2c b600 0bb6 000c b900 0d02 002b
0x0000030: b800 104e 2d2c b800 0f2d b0
Stackmap Table:
append_frame(#47,Object[#143])
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
at com.lucidworks.spark.util.SolrSupport$.getCachedCloudClient(SolrSupport.scala:93)
at com.lucidworks.spark.util.SolrSupport$.getSolrBaseUrl(SolrSupport.scala:97)
at com.lucidworks.spark.util.SolrQuerySupport$.getUniqueKey(SolrQuerySupport.scala:82)
at com.lucidworks.spark.rdd.SolrRDD.<init>(SolrRDD.scala:32)
at com.lucidworks.spark.SolrRelation.<init>(SolrRelation.scala:63)
at solr.DefaultSource.createRelation(DefaultSource.scala:26)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at com.tempurer.intelligence.searchindexer.SolrIndexer3$.runJob(SolrIndexer3.scala:127)
at com.tempurer.intelligence.searchindexer.SolrIndexer3$.main(SolrIndexer3.scala:80)
at com.tempurer.intelligence.searchindexer.SolrIndexer3.main(SolrIndexer3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: java.lang.VerifyError: Bad return type
I'm building the Spark app as a fat jar with the sbt-assembly plugin and here is my build.sbt file:
name := "search-indexer"
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.10.6"
resolvers ++= Seq(
"Cloudera CDH 5.0" at "https://repository.cloudera.com/artifactory/cloudera-repos"
)
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-common" % "2.6.0-cdh5.7.0" % "provided",
"org.apache.hadoop" % "hadoop-hdfs" % "2.6.0-cdh5.7.0" % "provided",
"org.apache.hive" % "hive-exec" % "1.1.0-cdh5.7.0",
"org.apache.spark" % "spark-core_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-sql_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-catalyst_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-mllib_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-graphx_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-streaming_2.10" % "1.6.0-cdh5.7.0" % "provided",
"com.databricks" % "spark-avro_2.10" % "2.0.1",
"com.databricks" % "spark-csv_2.10" % "1.4.0",
"com.lucidworks.spark" % "spark-solr" % "2.0.1",
"com.fasterxml.jackson.core" % "jackson-core" % "2.8.0", // Solves runtime error: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
"org.scalatest" % "scalatest_2.10" % "2.2.4" % "test"
)
// See: https://github.com/sbt/sbt-assembly
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}
More info, including the logged YARN context can be found in this github project ticket I opened: https://github.com/lucidworks/spark-solr/issues/75
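Not a confirmed fix, but one technique that sometimes resolves this kind of VerifyError (an older httpclient on the CDH/YARN classpath winning over the one solrj was compiled against) is shading the org.apache.http packages inside the assembly so solrj sees its own copy. A sketch using sbt-assembly 0.14.x shade rules (the shaded package name is arbitrary):
assemblyShadeRules in assembly := Seq(
  // relocate httpclient classes bundled in the fat jar so they can't clash with the cluster's copy
  ShadeRule.rename("org.apache.http.**" -> "shaded.apachehttp.@1").inAll
)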

Runtime error: ClassNotFound exception with Spark

I want to run an out-of-the-box Spark Streaming example (version 1.6) by itself, to build on it. I am able to compile and run the example as is, bundled with the other code samples,
i.e.:
./bin/run-example streaming.StatefulNetworkWordCount localhost 9999
However, I am unable to do so (with the same code) in my own project.
Any help?
build.sbt:
import sbtassembly.AssemblyKeys
name := "stream-test"
version := "1.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.0"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.10"
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.4" % "test"
assemblyJarName in assembly := "stream_test_" + version.value + ".jar"
assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
This compiles fine. However, I get an error when running it
(note that I am using Spark 1.6 to run it):
$ ../../../app/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --jars /app/spark-streaming_2.11-1.6.0.jar --master local[4] --class "StatefulNetworkWordCount" ./target/scala-2.10/stream-test_2.10-1.0.jar localhost 9999
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/02/10 22:16:30 INFO SparkContext: Running Spark version 1.4.1
2016-02-10 22:16:32.451 java[86932:5664316] Unable to load realm info from SCDynamicStore
16/02/10 22:16:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
..
16/02/10 22:16:39 INFO Utils: Successfully started service 'sparkDriver' on port 60720.
16/02/10 22:16:40 INFO SparkEnv: Registering MapOutputTracker
16/02/10 22:16:40 INFO SparkEnv: Registering BlockManagerMaster
16/02/10 22:16:42 INFO SparkUI: Started SparkUI at http://xxx:4040
16/02/10 22:16:43 INFO SparkContext: Added JAR file://app/spark-streaming_2.11-1.6.0.jar at http://xxx:60721/jars/spark-streaming_2.11-1.6.0.jar with timestamp 1455171403485
16/02/10 22:16:43 INFO SparkContext: Added JAR file:/projects/spark/test/./target/scala-2.10/stream-test_2.10-1.0.jar at http://xxx:60721/jars/stream-test_2.10-1.0.jar with timestamp 1455171403562
16/02/10 22:16:44 INFO Executor: Starting executor ID driver on host localhost
16/02/10 22:16:44 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60722.
..
16/02/10 22:16:44 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.streaming.dstream.PairDStreamFunctions.mapWithState(Lorg/apache/spark/streaming/StateSpec;Lscala/reflect/ClassTag;Lscala/reflect/ClassTag;)Lorg/apache/spark/streaming/dstream/MapWithStateDStream;
at StatefulNetworkWordCount$.main(StatefulNetworkWordCount.scala:50)
at StatefulNetworkWordCount.main(StatefulNetworkWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This method is in the jar's class, so I don't understand.
Found the answer:
even though I was running spark-submit from 1.6, my SPARK_HOME was still pointing to the previous version of Spark, 1.4.
Setting SPARK_HOME to the same 1.6 installation that provides spark-submit solved the problem.
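In other words, something like this keeps the environment and the submit script in sync (the install path is an example, not taken from the question):
# check which Spark installation the environment points at
echo $SPARK_HOME
# point it at the same 1.6.0 distribution used for spark-submit
export SPARK_HOME=/app/spark-1.6.0-bin-hadoop2.6
$SPARK_HOME/bin/spark-submit --master local[4] --class StatefulNetworkWordCount \
  ./target/scala-2.10/stream-test_2.10-1.0.jar localhost 9999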

Spark Core (1.5.2) and scala.collection.mutable.Map [duplicate]

I have a simple Scala object file with the following content:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object X {
  def main(args: Array[String]) {
    val params = Map[String, String]("abc" -> "22")
    println("Creating Spark Configuration")
    val conf = new SparkConf().setAppName("X")
    val sc = new SparkContext(conf)
    val txtFileLines = sc.textFile("/tmp/x.txt", 2).cache()
    val count = txtFileLines.count()
    println("Count" + count)
  }
}
My build.sbt looks like:
name := "x"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
I then do sbt package to create x.jar under target/scala-2.11/
When I execute the above code as:
spark-submit --class X --master local[2] x.jar
I get the following error:
Creating Spark Configuration
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at Sweeper$.main(Sweeper.scala:14)
at Sweeper.main(Sweeper.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
As you are using Scala 2.11 in your project, you should use the spark-core library built for Scala 2.11.
You can download spark-core_2.11 from http://mvnrepository.com/search?q=Spark
and reference the spark-core_2.11 jar in your project.
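For example (a sketch of the dependency line; note that the Spark distribution running spark-submit must also be a Scala 2.11 build for the binary versions to line up):
scalaVersion := "2.11.7"
// explicit artifact name pins the Scala 2.11 build of spark-core
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.5.2" % "provided"
// equivalent: %% appends the Scala binary suffix from scalaVersion automatically
// libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"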

Resources