Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory - apache-spark

I'm getting the above error message when I'm trying to run spark submit comment :
spark-submit --class "retail.DataValidator" --master local --executor-memory 2g --total-executor-cores 2 sample-spark-180417_2.11-1.0.jar /home/hduser/Downloads/inputfiles/ /home/hduser/output/
ERROR Message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory
at retail.DataValidator$.main(DataValidator.scala:12)
at retail.DataValidator.main(DataValidator.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
build.sbt file:
name := "sample-spark-180417"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
libraryDependencies += "org.apache.spark" % "spark-hive_2.11" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.42"
libraryDependencies += "org.scala-lang" % "scala-swing" % "2.10+"
I'm not having any maven dependencies or pom.xml file.
Thanks

As it's not a fat jar, the spark cluster is not having typesafe jar in its classpath.
Submit the spark job as::
spark-submit --jars ./typesafe-***.jar --class "retail.DataValidator" --master local --executor-memory 2g --total-executor-cores 2 sample-spark-180417_2.11-1.0.jar /home/hduser/Downloads/inputfiles/ /home/hduser/output/
It will keep that jar in classpath, and submit the job.

Related

Error while deploy: Class org.apache.hadoop.fs.LocalFileSystem not found

I am trying to write a parquet file in Scala/sbt. My code works fine on my computer but always fails when deploy on a server with Jenkins.
I have the following error:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:288)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.parquet.hadoop.util.HadoopOutputFile.fromPath(HadoopOutputFile.java:58)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:677)
at com.github.mjakubowski84.parquet4s.ParquetWriter$.internalWriter(ParquetWriter.scala:129)
at com.github.mjakubowski84.parquet4s.ParquetWriterImpl.<init>(ParquetWriter.scala:186)
at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.build(ParquetWriter.scala:111)
at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.writeAndClose(ParquetWriter.scala:113)
at ParquetExport$.$anonfun$tryExport$1(ParquetExport.scala:307)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
... 29 more
I first tried to use spark:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0"
And then changed to code to work with these:
libraryDependencies ++= Seq(
"com.github.mjakubowski84" %% "parquet4s-core" % "2.6.0",
"org.apache.hadoop" % "hadoop-client" % "2.10.2"
)
And still encountering the same error.
Setting the Hadoop configuration didn't help:
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getname)
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getname)
Neither did changing the ClassLoader:
Thread.currentThread.setContextClassLoader(getClass.getClassLoader)
Everything work fine in local but not on the server. Any idea?
I already faced the same issue.
If you look at your logs :
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
You understand that it can't find the name of the class LocalFileSystem you set to your hadoop config.
Try to change this :
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getname)
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getname)
To :
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
hadoopConfig.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")

Hive Warehouse Connector + Spark = signer information does not match signer information of other classes in the same package

I'm trying to use hive warehouse connector and spark on hdp 3.1 and getting exception even with simplest example (below).
The class causing problems: JaninoRuntimeException - is in org.codehaus.janino:janino:jar:3.0.8 (dependency of spark_sql) and in com.hortonworks.hive:hive-warehouse-connector_2.11:jar.
I've tried to exclude janino library from spark_sql, but this resulted in missing other classes from janino. And I need hwc to for the new functionality.
Anyone had same error? Any ideas how to deal with it?
I'm getting error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
Caused by: java.lang.SecurityException: class "org.codehaus.janino.JaninoRuntimeException"'s signer information does not match signer information of other classes in the same package
at java.lang.ClassLoader.checkCerts(ClassLoader.java:898)
at java.lang.ClassLoader.preDefineClass(ClassLoader.java:668)
at java.lang.ClassLoader.defineClass(ClassLoader.java:761)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:197)
at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:36)
at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1321)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3277)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
at Main$.main(Main.scala:15)
at Main.main(Main.scala)
... 5 more
my sbt file:
name := "testHwc"
version := "0.1"
scalaVersion := "2.11.11"
resolvers += "Hortonworks repo" at "http://repo.hortonworks.com/content/repositories/releases/"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.1.1.3.1.0.0-78"
// https://mvnrepository.com/artifact/com.hortonworks.hive/hive-warehouse-connector
libraryDependencies += "com.hortonworks.hive" %% "hive-warehouse-connector" % "1.0.0.3.1.0.0-78"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.2.3.1.0.0-78"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.2.3.1.0.0-78"
And the source code:
import com.hortonworks.hwc.HiveWarehouseSession
import org.apache.spark.sql.SparkSession
object Main {
def main(args: Array[String]): Unit = {
val ss = SparkSession.builder()
.config("spark.sql.hive.hiveserver2.jdbc.url", "nnn")
.master("local[*]").getOrCreate()
import ss.sqlContext.implicits._
val rdd = ss.sparkContext.makeRDD(Seq(1, 2, 3, 4, 5, 6, 7))
rdd.toDF("col1").show()
val hive = HiveWarehouseSession.session(ss).build()
}
}
After some investigation I've discovered that the presence of error depends on the order of libraries in classpath.
For unknown reason when I was running this project in IntelliJ IDEA the classpath was always with random order and the app was failing and succeeding randomly.
In the end - HiveWarehouseConnector jar in classpath should be after Spark Sql jar.
UPDATE
As suggested in this answer - the order inside IntelliJ IDEA can be changed in dependencies tab.
Otherwise - I was not able to solve this issue for IntelliJ - the order was always random, but when i executed program outside of IntelliJ - I set the order I needed.
In maven projects, I solve it like this
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.1</version>
<exclusions>
<exclusion>
<groupId>org.codehaus.janino</groupId>
<artifactId>janino</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.codehaus.janino</groupId>
<artifactId>janino</artifactId>
<version>3.0.10</version>
</dependency>

Error connecting to Zookeeper with Spark Streaming Structured

My Spark Streaming Structured keeps disconnecting from Zookeeper when trying to read from a Kafkatopic:
WARN clients.NetworkClient: Bootstrap broker [zk host]:2181 disconnected
When I check the ZK logs, I see this error being prompted all the time:
Exception causing close of session 0x0 due to java.io.EOFException
I´m running on Cloudera 5.11 with Spark 2.1, these are my SBT libraries:
val sparkVer = "2.1.0"
Seq(
"org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources(),
"org.apache.spark" %% "spark-streaming" % sparkVer % "provided",
"org.apache.spark" %% "spark-sql" % sparkVer % "provided",
"org.apache.spark" % "spark-sql-kafka-0-10_2.11" % sparkVer
)
This is my submit command:
# Set KAFKA to 0.10 see (https://community.cloudera.com/t5/Data-Ingestion-Integration/KafkaConsumer-subscribe-0-9-vs-0-10-in-Structured-streaming/td-p/60161)
export SPARK_KAFKA_VERSION=0.10
spark2-submit --class myMainClass --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 myapp.jar topic2345 [zk host 1]:2181,[zk host 2]:2181
And this is the code creating the stream:
private def createKafkaStrem(spark: SparkSession, args: Array[String]) = {
spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", args(1))
.option("subscribe", args(0))
.load()
}
EDIT: After activating the DEBUG ouput, this is the complete error stack:
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:974)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets$1.apply(KafkaSource.scala:374)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets$1.apply(KafkaSource.scala:372)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply$mcV$sp(KafkaSource.scala:442)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply(KafkaSource.scala:441)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$withRetriesWithoutInterrupt$1.apply(KafkaSource.scala:441)
at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:79)
at org.apache.spark.sql.kafka010.KafkaSource.withRetriesWithoutInterrupt(KafkaSource.scala:440)
at org.apache.spark.sql.kafka010.KafkaSource.org$apache$spark$sql$kafka010$KafkaSource$$fetchLatestOffsets(KafkaSource.scala:372)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$initialPartitionOffsets$1.apply(KafkaSource.scala:141)
at org.apache.spark.sql.kafka010.KafkaSource$$anonfun$initialPartitionOffsets$1.apply(KafkaSource.scala:138)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets$lzycompute(KafkaSource.scala:138)
at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets(KafkaSource.scala:121)
at org.apache.spark.sql.kafka010.KafkaSource.getOffset(KafkaSource.scala:157)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9$$anonfun$apply$5.apply(StreamExecution.scala:391)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9$$anonfun$apply$5.apply(StreamExecution.scala:391)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:265)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:46)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9.apply(StreamExecution.scala:390)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$9.apply(StreamExecution.scala:388)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch(StreamExecution.scala:388)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:362)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply$mcV$sp(StreamExecution.scala:260)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:265)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:46)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:257)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:43)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:252)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:187)
18/03/21 10:47:27 DEBUG clients.NetworkClient: Node -2 disconnected.
18/03/21 10:47:27 WARN clients.NetworkClient: Bootstrap broker [zk host]:2181 disconnected
18/03/21 10:47:27 DEBUG clients.NetworkClient: Sending metadata request {topics=[topic2345]} to node -1
18/03/21 10:47:27 DEBUG network.Selector: Connection with /[zk host] disconnected
kafka.bootstrap.servers takes a list of Kafka brokers, not a Zookeeper quorum.
The "new" Kafka Consumer API does not use a Zookeeper connection string

spark-solr CDH job fails with java.lang.VerifyError at solrj.impl.HttpClientUtil

I'm running CDH 5.7.1 and submitting a Spark job that uses spark-solr 2.0.1 in yarn-cluster mode and the job is failing with the error below that arises from a (possibly) binary compatibility issue in org.apache.solr.client.solrj.impl.HttpClientUtil. The job runs fine in local mode allowing me to index a Spark dataframe to Solr 6.1.0.
I'm building the fat jar that is deployed to the cluster with the sbt-assembly plugin, see below for my build.sbt configuration.
I've tried various things to fix this such as specifying the --jars argument with spark-submit to provide the appropriate solr-solrj and httpclient jars with no luck, as well as numerous build.sbt changes to specify versions of solrj and httpclient explicitly, again with no luck.
Any insights or solutions would be greatly appreciated.
Spark job history error from yarn-cluster mode:
16/07/21 02:10:07 ERROR yarn.ApplicationMaster: User class threw exception: com.google.common.util.concurrent.ExecutionError: java.lang.VerifyError: Bad return type
Exception Details:
Location:
org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient; #58: areturn
Reason:
Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
Current Frame:
bci: #58
flags: { }
locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/http/conn/ClientConnectionManager', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/DefaultHttpClient' }
stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
Bytecode:
0x0000000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
0x0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
0x0000020: b600 0a2c b600 0bb6 000c b900 0d02 002b
0x0000030: b800 104e 2d2c b800 0f2d b0
Stackmap Table:
append_frame(#47,Object[#143])
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
at com.lucidworks.spark.util.SolrSupport$.getCachedCloudClient(SolrSupport.scala:93)
at com.lucidworks.spark.util.SolrSupport$.getSolrBaseUrl(SolrSupport.scala:97)
at com.lucidworks.spark.util.SolrQuerySupport$.getUniqueKey(SolrQuerySupport.scala:82)
at com.lucidworks.spark.rdd.SolrRDD.<init>(SolrRDD.scala:32)
at com.lucidworks.spark.SolrRelation.<init>(SolrRelation.scala:63)
at solr.DefaultSource.createRelation(DefaultSource.scala:26)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at com.tempurer.intelligence.searchindexer.SolrIndexer3$.runJob(SolrIndexer3.scala:127)
at com.tempurer.intelligence.searchindexer.SolrIndexer3$.main(SolrIndexer3.scala:80)
at com.tempurer.intelligence.searchindexer.SolrIndexer3.main(SolrIndexer3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: java.lang.VerifyError: Bad return type
I'm building the Spark app as a fat jar with the sbt-assembly plugin and here is my build.sbt file:
name := "search-indexer"
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.10.6"
resolvers ++= Seq(
"Cloudera CDH 5.0" at "https://repository.cloudera.com/artifactory/cloudera-repos"
)
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-common" % "2.6.0-cdh5.7.0" % "provided",
"org.apache.hadoop" % "hadoop-hdfs" % "2.6.0-cdh5.7.0" % "provided",
"org.apache.hive" % "hive-exec" % "1.1.0-cdh5.7.0",
"org.apache.spark" % "spark-core_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-sql_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-catalyst_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-mllib_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-graphx_2.10" % "1.6.0-cdh5.7.0" % "provided",
"org.apache.spark" % "spark-streaming_2.10" % "1.6.0-cdh5.7.0" % "provided",
"com.databricks" % "spark-avro_2.10" % "2.0.1",
"com.databricks" % "spark-csv_2.10" % "1.4.0",
"com.lucidworks.spark" % "spark-solr" % "2.0.1",
"com.fasterxml.jackson.core" % "jackson-core" % "2.8.0", // Solves runtime error: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
"org.scalatest" % "scalatest_2.10" % "2.2.4" % "test"
)
// See: https://github.com/sbt/sbt-assembly
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
}
More info, including the logged YARN context can be found in this github project ticket I opened: https://github.com/lucidworks/spark-solr/issues/75

Spark Core (1.5.2) and Scala.collection.mutable.Map [duplicate]

I have a simple scala object file with the following content:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object X {
def main(args: Array[String]) {
val params = Map[String, String](
"abc" -> "22",)
println("Creating Spark Configuration");
val conf = new SparkConf().setAppName("X")
val sc = new SparkContext(conf)
val txtFileLines = sc.textFile("/tmp/x.txt", 2).cache()
val count = txtFileLines.count()
println("Count" + count)
}
}
My build.sbt looks like:
name := "x"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
I then do sbt package to create x.jar under target/scala-2.11/
When I execute the above code as:
spark-submit --class X --master local[2] x.jar
I get the following error:
Creating Spark Configuration
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at Sweeper$.main(Sweeper.scala:14)
at Sweeper.main(Sweeper.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
As you are using Scala 2.11 in your project. You should use spark core library build for Scala 2.11.
Can download spark-core_2.11 from here http://mvnrepository.com/search?q=Spark
Refer spark-core_2.11 jar in project.

Resources