Logging issue in Apache Spark Streaming

I am getting the exception below while calling KafkaUtils.createStream().
Here are my Spark dependencies; the same code was working with the older Spark Streaming version 1.5.2.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.11</artifactId>
<version>1.6.2</version>
</dependency>
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:91)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:66)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:110)
at org.apache.spark.streaming.kafka.KafkaUtils.createStream(KafkaUtils.scala)
at com.tcs.iux.core.config.RealtimeProcessing.startSpark(RealtimeProcessing.java:78)
at com.tcs.iux.core.processor.StartRealTimeProcessing.main(StartRealTimeProcessing.java:32)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more

Logging was made private in Spark 2.0.
Use version 2.0.0 of spark-streaming-kafka_2.11 so that it matches your Spark version:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.11</artifactId>
<version>2.0.0</version>
</dependency>

You're mixing Spark versions, which is never a good idea. Spark Streaming and its external Kafka driver need to be aligned.
The dependencies have changed a bit: there is now experimental support for Kafka 0.10.0.x. If you're using Kafka 0.8.2.x, you'll need:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
<version>2.0.0</version>
</dependency>
If you want to try out Kafka 0.10.0.x, you'll need:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
<version>2.0.0</version>
</dependency>
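Note that the 0-10 integration no longer offers the receiver-based createStream; you would switch to the direct API instead. Below is a minimal Java sketch of that call, assuming an existing JavaStreamingContext named jssc; the broker address, group id, and topic name are placeholders:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Kafka consumer settings (broker and group id are placeholders).
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "my-consumer-group");

// Receiver-less direct stream over the subscribed topic.
JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(
            Collections.singletonList("my-topic"), kafkaParams));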

Related

ClassNotFoundException: com.fasterxml.jackson.annotation.JsonMerge when starting Spark

While there are many similar errors with Spark, I am still at a loss as to how to fix the following. Note that I do not have any Jackson Maven entries; only Spark, which pulls in Jackson as transitive dependencies:
19/07/14 17:42:49 ERROR MetricsSystem: Sink class org.apache.spark.metrics.sink.MetricsServlet cannot be instantiated
19/07/14 17:42:49 ERROR SparkContext: Error initializing SparkContext.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:200)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:514)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
..
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:131)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
Caused by: java.lang.NoClassDefFoundError: com/fasterxml/jackson/annotation/JsonMerge
at com.fasterxml.jackson.databind.introspect.JacksonAnnotationIntrospector.<clinit>(JacksonAnnotationIntrospector.java:50)
at com.fasterxml.jackson.databind.ObjectMapper.<clinit>(ObjectMapper.java:291)
at org.apache.spark.metrics.sink.MetricsServlet.<init>(MetricsServlet.scala:48)
... 41 more
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.annotation.JsonMerge
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 44 more
The only reference to Jackson at all is in an AWS dependency, and it is an exclusion:
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.7.4</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
</dependency>
How can this Jackson issue (probably a versioning conflict) be resolved? Note: I have tried with Spark 2.4.2 and then downgraded to 2.3.0: same errors. I cannot downgrade further than that.
I added the following to the pom.xml. I have no idea why this dependency was not pulled in automatically:
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.9.8</version>
</dependency>
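If pulling in jackson-annotations alone does not fully settle things, another option (a sketch, not from the original answer) is to pin all three core Jackson artifacts to the same version in dependencyManagement so the transitive versions cannot drift apart:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.9.8</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.9.8</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.9.8</version>
</dependency>
</dependencies>
</dependencyManagement>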

NoClassDefFoundError: com/aventstack/extentreports/reporter/ExtentHtmlReporter

I am trying to generate a report in my Maven project using Cucumber and ExtentReports. I am using Java 8. Here are my dependencies:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>3.4.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/info.cukes/cucumber-java -->
<dependency>
<groupId>info.cukes</groupId>
<artifactId>cucumber-java</artifactId>
<version>1.2.5</version>
</dependency>
<!-- https://mvnrepository.com/artifact/info.cukes/cucumber-junit -->
<dependency>
<groupId>info.cukes</groupId>
<artifactId>cucumber-junit</artifactId>
<version>1.2.5</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/com.vimalselvam/cucumber-extentsreport -->
<dependency>
<groupId>com.vimalselvam</groupId>
<artifactId>cucumber-extentsreport</artifactId>
<version>3.1.1</version>
</dependency>
</dependencies>
and then I used the runner class below to run all my *.feature files:
import org.junit.runner.RunWith;
import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;
@RunWith(Cucumber.class)
@CucumberOptions(
features = {"src/test/resources/features" },
monochrome = true,
plugin = {"com.vimalselvam.cucumber.listener.ExtentCucumberFormatter:output/report.html"},
glue = {"stepDefinitions"}
)
public class TestRunnerJUnit {
}
But when I run the test runner, it complains with:
cucumber.runtime.CucumberException: java.lang.NoClassDefFoundError: com/aventstack/extentreports/reporter/ExtentHtmlReporter
at cucumber.runtime.formatter.PluginFactory.instantiate(PluginFactory.java:114)
at cucumber.runtime.formatter.PluginFactory.create(PluginFactory.java:87)
at cucumber.runtime.RuntimeOptions.getPlugins(RuntimeOptions.java:245)
at cucumber.runtime.RuntimeOptions$1.invoke(RuntimeOptions.java:291)
at com.sun.proxy.$Proxy11.done(Unknown Source)
at cucumber.runtime.junit.JUnitReporter.done(JUnitReporter.java:227)
at cucumber.api.junit.Cucumber.run(Cucumber.java:101)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:89)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:41)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:541)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:763)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:463)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:209)
Caused by: java.lang.NoClassDefFoundError: com/aventstack/extentreports/reporter/ExtentHtmlReporter
at com.vimalselvam.cucumber.listener.ExtentCucumberFormatter.setExtentHtmlReport(ExtentCucumberFormatter.java:55)
at com.vimalselvam.cucumber.listener.ExtentCucumberFormatter.<init>(ExtentCucumberFormatter.java:38)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at cucumber.runtime.formatter.PluginFactory.instantiate(PluginFactory.java:107)
... 12 more
Update:
When I add the dependency below:
<dependency>
<groupId>com.aventstack</groupId>
<artifactId>extentreports</artifactId>
<version>3.1.1</version>
</dependency>
I receive this new error:
java.lang.IllegalArgumentException: testName cannot be null or empty
at com.aventstack.extentreports.ExtentTest.<init>(ExtentTest.java:80)
at com.aventstack.extentreports.ExtentReports.createTest(ExtentReports.java:105)
at com.aventstack.extentreports.ExtentReports.createTest(ExtentReports.java:145)
at com.vimalselvam.cucumber.listener.ExtentCucumberFormatter.feature(ExtentCucumberFormatter.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at cucumber.runtime.Utils$1.call(Utils.java:40)
at cucumber.runtime.Timeout.timeout(Timeout.java:16)
at cucumber.runtime.Utils.invoke(Utils.java:34)
at cucumber.runtime.RuntimeOptions$1.invoke(RuntimeOptions.java:294)
at com.sun.proxy.$Proxy11.feature(Unknown Source)
at cucumber.runtime.junit.JUnitReporter.feature(JUnitReporter.java:184)
at cucumber.runtime.junit.FeatureRunner.run(FeatureRunner.java:69)
at cucumber.api.junit.Cucumber.runChild(Cucumber.java:95)
at cucumber.api.junit.Cucumber.runChild(Cucumber.java:38)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at cucumber.api.junit.Cucumber.run(Cucumber.java:100)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:89)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:41)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:541)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:763)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:463)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:209)
Try replacing this:
<dependency>
<groupId>com.aventstack</groupId>
<artifactId>extentreports</artifactId>
<version>3.1.1</version>
</dependency>
with this one
<dependency>
<groupId>com.aventstack</groupId>
<artifactId>extentreports</artifactId>
<version>3.0.7</version>
</dependency>
the component "cucumber-extentsreport" and "extentreports" are different things.
I had the same issue and solved by using "extentreports" version 3.0.7 with "cucumber-extentsreport" version 3.1.1
Good luck!
I found the answer.
Everything was fine, but I needed to add a value after the Feature keyword in the *.feature file.
The Feature keyword must be followed by text on the same line; if you put the text on the next line, you will get this error.
Feature: Check Welcome-message
As a user, I should find a welcome-message in my mailbox, and my age in my profile.
Yes, this worked for me too.

spark kinesis failing on cloudera with java.lang.AbstractMethodError

Below is my POM file. I am writing a Spark Streaming application with AWS Kinesis:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>amazon-kinesis-client</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
<version>1.6.0</version>
</dependency>
I am facing the exception below when running the Spark program on Cloudera 5.10:
17/04/27 05:34:04 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 58.0 (TID 179, hadoop1.local, executor 5): java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:50)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.log(KinesisCheckpointer.scala:39)
at org.apache.spark.Logging$class.logDebug(Logging.scala:62)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.logDebug(KinesisCheckpointer.scala:39)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.startCheckpointerThread(KinesisCheckpointer.scala:119)
at org.apache.spark.streaming.kinesis.KinesisCheckpointer.<init>(KinesisCheckpointer.scala:50)
at org.apache.spark.streaming.kinesis.KinesisReceiver.onStart(KinesisReceiver.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2000)
at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2000)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This runs perfectly fine on EMR 4.4, but fails on CDH. Any suggestions?
The underlying problem seems to be the use of org.apache.spark.Logging:
NOTE: DO NOT USE this class outside of Spark.
It is intended as an internal utility.
This will likely be changed or removed in future releases.
http://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/Logging.html
This is fixed in 2.0.0 as mentioned in https://issues.apache.org/jira/browse/SPARK-9307.
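If you do move to Spark 2.x (for example via the CDH Spark 2 parcel), the Kinesis artifact has to move with it. As a rough sketch, the aligned dependencies would look like this for Scala 2.11:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kinesis-asl_2.11</artifactId>
<version>2.0.0</version>
</dependency>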

Spark-Cassandra Connector Always Defaulting to 127.0.1.1

The Spark connector for Cassandra keeps trying to connect to 127.0.1.1:9042, even though I am hardcoding the address.
Even hardcoding it with
conf.set("cassandra.connection.host", "37.61.205.66") does not work. I do not want the Cassandra CQL port to run on 127.0.1.1. What are the solutions?
POM.xml:
<dependencies>
<!-- Scala and Spark dependencies -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.5.0-RC1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.5.0-RC1</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.0.0-rc1</version>
</dependency>
Error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:163)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:82)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:121)
at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:322)
at com.datastax.spark.connector.cql.Schema$.tableFromCassandra(Schema.scala:342)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:50)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:137)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:232)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.RDD$$anonfun$distinct$2.apply(RDD.scala:401)
at org.apache.spark.rdd.RDD$$anonfun$distinct$2.apply(RDD.scala:401)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.distinct(RDD.scala:400)
at com.access_company.twine.OnlineGatewayCount$.main(OnlineGatewayCount.scala:93)
at com.access_company.twine.OnlineGatewayCount.main(OnlineGatewayCount.scala)
... 6 more
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.1.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.1.1] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:233)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:403)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:156)
The proper setting is prefixed with 'spark'. Please see the docs.
conf.set("spark.cassandra.connection.host", cassandraHost)

Getting java.lang.NoClassDefFoundError: kafka/serializer/StringDecoder Exception while streaming kafka from Spark streaming

I am trying to read Kafka streaming data from a Spark Streaming application; while reading the data I am getting the following exception:
16/12/24 11:09:05 INFO storage.BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NoClassDefFoundError: kafka/serializer/StringDecoder
at com.inndata.RSVPSDataStreaming.KafkaToSparkStreaming.main(KafkaToSparkStreaming.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.serializer.StringDecoder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
Here is my version info:
spark: 1.6.2
kafka: 0.8.2
Here is pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.11 -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>0.8.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>1.6.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8_2.11 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.8.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-8-assembly_2.10</artifactId>
<version>2.0.0-preview</version>
</dependency>
It seems like an implicit String Encoder is needed.
Try applying this:
import org.apache.spark.sql.Encoder
implicit val stringpEncoder = org.apache.spark.sql.Encoders.kryo[String]
You can find more about Encoders here and in the official documentation here.
You are using incompatible and duplicated artifact versions. Remember that when using Spark:
all Scala artifacts have to use the same major Scala version (2.10, 2.11);
all Spark artifacts have to use the same major Spark version (1.6, 2.0).
In your build definition you mix spark-streaming 1.6 with Kafka integration artifacts built for Spark 2.0, and you include a duplicated spark-streaming-kafka assembly for Scala 2.10 while the remaining dependencies are for Scala 2.11.
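For illustration, one consistent combination (a sketch, and not the only valid one) is to keep everything on Spark 1.6.2 with Scala 2.11 and drop the duplicated assembly artifact:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>1.6.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.11</artifactId>
<version>1.6.2</version>
</dependency>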
