java.sql.SQLException: ERROR 2007 (INT09): Outdated jars - apache-spark

I am new to Spark and Kafka. We have a requirement to integrate Kafka + Spark + HBase (with Phoenix).
ERROR:
Exception in thread "main" java.sql.SQLException: ERROR 2007 (INT09): Outdated jars. The following servers require an updated phoenix.jar to be put in the classpath of HBase:
I ended up with the above error. Could anybody please help me resolve this issue?
Below is the error log:
jdbc:phoenix:localhost.localdomain:2181:/hbase-unsecure
testlocalhost.localdomain:6667
18/03/05 16:18:52 INFO Metrics: Initializing metrics system: phoenix
18/03/05 16:18:52 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-phoenix.properties,hadoop-metrics2.properties
18/03/05 16:18:52 INFO MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
18/03/05 16:18:52 INFO MetricsSystemImpl: phoenix metrics system started
18/03/05 16:18:52 INFO ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
18/03/05 16:18:52 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x161f6fc5e4800a3
18/03/05 16:18:52 INFO ZooKeeper: Session: 0x161f6fc5e4800a3 closed
18/03/05 16:18:52 INFO ClientCnxn: EventThread shut down
Exception in thread "main" java.sql.SQLException: ERROR 2007 (INT09): Outdated jars. The following servers require an updated phoenix.jar to be put in the classpath of HBase: region=SYSTEM.CATALOG,,1519831518459.b16e566d706c68469922eba74844a444., hostname=localhost,16020,1520282812066, seqNum=59
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:476)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:1272)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:1107)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1429)
at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2574)
at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:1024)
at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:212)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:358)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:341)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:339)
at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1492)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2437)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2382)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2382)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:149)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at com.spark.kafka.PhoenixJdbcClient.getConnection(PhoenixJdbcClient.scala:41)
at com.spark.kafka.PhoenixJdbcClient.currentTableSchema(PhoenixJdbcClient.scala:595)
at com.spark.kafka.SparkHBaseClient$.main(SparkHBaseClient.scala:47)
at com.spark.kafka.SparkHBaseClient.main(SparkHBaseClient.scala)
18/03/05 16:18:52 INFO SparkContext: Invoking stop() from shutdown hook
18/03/05 16:18:52 INFO SparkUI: Stopped Spark web UI at http://192.168.1.103:4040
18/03/05 16:18:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/03/05 16:18:53 INFO MemoryStore: MemoryStore cleared
18/03/05 16:18:53 INFO BlockManager: BlockManager stopped
18/03/05 16:18:53 INFO BlockManagerMaster: BlockManagerMaster stopped
18/03/05 16:18:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/03/05 16:18:53 INFO SparkContext: Successfully stopped SparkContext
18/03/05 16:18:53 INFO ShutdownHookManager: Shutdown hook called
18/03/05 16:18:53 INFO ShutdownHookManager: Deleting directory /tmp/spark-c8dd26fc-74dd-40fb-a339-8c5dda36b973
We are using Ambari Server 2.6.1.3 with HDP-2.6.3.0 and the components below:
Hbase-1.1.2
kafka-0.10.1
spark-2.2.0
phoenix
Below are the POM artifacts I have added for HBase and Phoenix.
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-protocol</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-spark</artifactId>
<version>4.10.0-HBase-1.2</version>
</dependency>

Try the following:
1. Copy the Phoenix server jar to all HBase region servers (into the HBase lib folder).
2. Restart the HBase master.
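On an HDP cluster like this one, that copy step might look roughly like the sketch below. The /usr/hdp/current/... paths and the hbase-daemon.sh restart are assumptions based on a standard HDP layout (restarting through Ambari works just as well); the important point is that the server jar dropped into the HBase lib folder must be the same Phoenix version as the client jar your Spark job uses.
# Run on every HBase RegionServer (and on the HBase Master); paths assume a standard HDP layout.
cp /usr/hdp/current/phoenix-client/phoenix-*-server.jar /usr/hdp/current/hbase-regionserver/lib/
# Restart HBase so the RegionServers pick up the new jar (normally done through Ambari;
# the hbase-daemon.sh equivalent is shown here).
su - hbase -c '/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh restart regionserver'
su - hbase -c '/usr/hdp/current/hbase-master/bin/hbase-daemon.sh restart master'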

Related

Spark-submit fails when using kafka structured streaming in pyspark 2.3.1
But the same code works in the pyspark shell, so I want to know how to solve this.
from pyspark.sql.types import *
from pyspark.sql import SparkSession
topic="topicname"
spark = SparkSession\
    .builder\
    .appName("test_{}".format(topic))\
    .getOrCreate()
source_df = spark.readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", "ip:6667")\
    .option("subscribe", topic)\
    .option("failOnDataLoss", "false")\
    .option("maxOffsetsPerTrigger", 30000)\
    .load()
query = source_df.selectExpr("CAST(key AS STRING)")\
    .writeStream\
    .format("json")\
    .option("checkpointLocation", "/data/testdata_test")\
    .option("path", "/data/testdata_test_checkpoint")\
    .start()
The commands are like this:
//use this command -> fail
spark-submit --master yarn --jars hdfs:///vcrm_data/spark-sql-kafka-0-10_2.11-2.3.1.jar,hdfs:///vcrm_data/kafka-clients-1.1.1.3.2.0.0-520.jar test.py
// use this command and then paste the code into the shell -> success
pyspark --jars hdfs:///vcrm_data/spark-sql-kafka-0-10_2.11-2.3.1.jar,hdfs:///vcrm_data/kafka-clients-2.6.0.jar
My Spark environment is:
HDP (Hortonworks) 3.0.1 (Spark 2.3.1),
Kafka 1.1.1
Spark-submit log
20/08/24 20:19:01 INFO AppInfoParser: Kafka version: 2.6.0
20/08/24 20:19:01 INFO AppInfoParser: Kafka commitId: 62abe01bee039651
20/08/24 20:19:01 INFO AppInfoParser: Kafka startTimeMs: 1598267941674
20/08/24 20:19:01 INFO KafkaConsumer: [Consumer clientId=consumer-spark-kafka-source-6d5eb2af-8039-4073-a3f1-3ba44d01fedc--47946854-driver-0-2, groupId=spark-kafka-source-6d5eb2af-8039-4073-a3f1-3ba44d01fedc--47946854-driver-0] Subscribed to topic(s): test
20/08/24 20:19:01 INFO SparkContext: Invoking stop() from shutdown hook
20/08/24 20:19:01 INFO MicroBatchExecution: Starting new streaming query.
20/08/24 20:19:01 INFO AbstractConnector: Stopped Spark#644a2858{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/08/24 20:19:01 INFO SparkUI: Stopped Spark web UI at http://p0gn001.io:4040
20/08/24 20:19:01 INFO YarnClientSchedulerBackend: Interrupting monitor thread
20/08/24 20:19:01 INFO YarnClientSchedulerBackend: Shutting down all executors
20/08/24 20:19:01 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/08/24 20:19:01 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
20/08/24 20:19:01 INFO YarnClientSchedulerBackend: Stopped
20/08/24 20:19:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/08/24 20:19:01 INFO MemoryStore: MemoryStore cleared
20/08/24 20:19:01 INFO BlockManager: BlockManager stopped
20/08/24 20:19:01 INFO BlockManagerMaster: BlockManagerMaster stopped
20/08/24 20:19:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/08/24 20:19:01 INFO SparkContext: Successfully stopped SparkContext
20/08/24 20:19:01 INFO ShutdownHookManager: Shutdown hook called
20/08/24 20:19:01 INFO ShutdownHookManager: Deleting directory /tmp/temporaryReader-ea374033-f3fc-4bca-9c15-4cccaa7da8ac
20/08/24 20:19:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-075415ca-8e98-4bb0-916c-a89c4d4f9d1f
20/08/24 20:19:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-83270204-3330-4361-bf56-a82c47d8c96f
20/08/24 20:19:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-075415ca-8e98-4bb0-916c-a89c4d4f9d1f/pyspark-eb26535b-1bf6-495e-83a1-4bbbdc658c7a

Spark Cassandra Connector Maven Build Issue

Hi, I am trying to write a Spark application which reads data from Cassandra. My Scala version is 2.11 and my Spark version is 2.2.0. Unfortunately I am facing a build issue. It says "missing or invalid dependency detected while loading class file 'package.class'", and I do not know what is causing it.
Here's my POM file:
<properties>
<maven.compiler.source>1.6</maven.compiler.source>
<maven.compiler.target>1.6</maven.compiler.target>
<encoding>UTF-8</encoding>
<!--scala.tools.version>2.11.8</scala.tools.version-->
<scala.version>2.11.8</scala.version>
</properties>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<!-- see http://davidb.github.com/scala-maven-plugin -->
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.1.3</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
<configuration>
<args>
<!--arg>-make:transitive</arg-->
<arg>-dependencyfile</arg>
<arg>${project.build.directory}/.scala_dependencies</arg>
</args>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.13</version>
<configuration>
<useFile>false</useFile>
<disableXmlReport>true</disableXmlReport>
<!-- If you have classpath issue like NoDefClassError,... -->
<!-- useManifestOnlyJar>false</useManifestOnlyJar -->
<includes>
<include>**/*Test.*</include>
<include>**/*Suite.*</include>
</includes>
</configuration>
</plugin>
<!-- "package" command plugin -->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4.1</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<!-- Scala and Spark dependencies -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-xml</artifactId>
<version>2.11.0-M4</version>
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-parser-combinators_2.11</artifactId>
<version>1.0.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.11</artifactId>
<version>2.0.7</version>
</dependency>
<!--dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.11</artifactId>
<version>1.5.0-RC1</version>
</dependency-->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.12</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.7.1</version>
</dependency>
</dependencies>
I am getting the following error
[INFO] --- maven-resources-plugin:2.3:resources (default-resources) # search-count ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO]
[INFO] --- maven-compiler-plugin:2.0.2:compile (default-compile) # search-count ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- scala-maven-plugin:3.1.3:compile (default) # search-count ---
[WARNING] Expected all dependencies to require Scala version: 2.11.8
[WARNING] search-count:search-count:0.0.1-SNAPSHOT requires scala version: 2.11.8
[WARNING] org.scala-lang.modules:scala-parser-combinators_2.11:1.0.2 requires scala version: 2.11.1
[WARNING] Multiple versions of scala libraries detected!
[ERROR] error: missing or invalid dependency detected while loading class file 'package.class'.
[INFO] Could not access type DataFrame in value org.apache.spark.sql.package,
[INFO] because it (or its dependencies) are missing. Check your build definition for
[INFO] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[INFO] A full rebuild may help if 'package.class' was compiled against an incompatible version of org.apache.spark.sql.package.
[ERROR] one error found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 9.052s
[INFO] Finished at: Wed Apr 04 11:33:51 CEST 2018
[INFO] Final Memory: 22M/425M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.3:compile (default) on project search-count: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1(Exit value: 1) -> [Help 1]
Any ideas what could be the problem?
Console logs after running my app
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/04 14:15:31 INFO SparkContext: Running Spark version 2.2.0
18/04/04 14:15:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/04 14:15:32 WARN Utils: Your hostname, obel-pc0083 resolves to a loopback address: 127.0.1.1; using 10.96.20.75 instead (on interface eth0)
18/04/04 14:15:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/04/04 14:15:32 INFO SparkContext: Submitted application: Online Gateway Count
18/04/04 14:15:32 INFO Utils: Successfully started service 'sparkDriver' on port 45111.
18/04/04 14:15:32 INFO SparkEnv: Registering MapOutputTracker
18/04/04 14:15:32 INFO SparkEnv: Registering BlockManagerMaster
18/04/04 14:15:32 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/04/04 14:15:32 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/04/04 14:15:32 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e7cfde5b-87f0-4447-a19e-771d100d7422
18/04/04 14:15:32 INFO MemoryStore: MemoryStore started with capacity 1137.6 MB
18/04/04 14:15:32 INFO SparkEnv: Registering OutputCommitCoordinator
18/04/04 14:15:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/04/04 14:15:32 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.96.20.75:4040
18/04/04 14:15:33 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://10.96.20.75:7077...
18/04/04 14:15:33 INFO TransportClientFactory: Successfully created connection to /10.96.20.75:7077 after 59 ms (0 ms spent in bootstraps)
18/04/04 14:15:33 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20180404141533-0009
18/04/04 14:15:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39062.
18/04/04 14:15:33 INFO NettyBlockTransferService: Server created on 10.96.20.75:39062
18/04/04 14:15:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/04/04 14:15:33 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180404141533-0009/0 on worker-20180403185515-10.96.20.75-38166 (10.96.20.75:38166) with 4 cores
18/04/04 14:15:33 INFO StandaloneSchedulerBackend: Granted executor ID app-20180404141533-0009/0 on hostPort 10.96.20.75:38166 with 4 cores, 1024.0 MB RAM
18/04/04 14:15:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.96.20.75, 39062, None)
18/04/04 14:15:33 INFO BlockManagerMasterEndpoint: Registering block manager 10.96.20.75:39062 with 1137.6 MB RAM, BlockManagerId(driver, 10.96.20.75, 39062, None)
18/04/04 14:15:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.96.20.75, 39062, None)
18/04/04 14:15:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.96.20.75, 39062, None)
18/04/04 14:15:33 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180404141533-0009/0 is now RUNNING
18/04/04 14:15:33 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
18/04/04 14:15:34 INFO Native: Could not load JNR C Library, native system calls through this library will not be available (set this logger level to DEBUG to see the full stack trace).
18/04/04 14:15:34 INFO ClockFactory: Using java.lang.System clock to generate timestamps.
18/04/04 14:15:35 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
18/04/04 14:15:36 INFO Cluster: New Cassandra host /10.96.20.75:9042 added
18/04/04 14:15:36 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
18/04/04 14:15:36 INFO SparkContext: Starting job: count at SearchCount.scala:47
18/04/04 14:15:36 INFO DAGScheduler: Registering RDD 4 (distinct at SearchCount.scala:47)
18/04/04 14:15:36 INFO DAGScheduler: Got job 0 (count at SearchCount.scala:47) with 6 output partitions
18/04/04 14:15:36 INFO DAGScheduler: Final stage: ResultStage 1 (count at SearchCount.scala:47)
18/04/04 14:15:36 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/04/04 14:15:36 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/04/04 14:15:36 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[4] at distinct at SearchCount.scala:47), which has no missing parents
18/04/04 14:15:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 9.6 KB, free 1137.6 MB)
18/04/04 14:15:37 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.2 KB, free 1137.6 MB)
18/04/04 14:15:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.96.20.75:39062 (size: 5.2 KB, free: 1137.6 MB)
18/04/04 14:15:37 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
18/04/04 14:15:37 INFO DAGScheduler: Submitting 6 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[4] at distinct at SearchCount.scala:47) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5))
18/04/04 14:15:37 INFO TaskSchedulerImpl: Adding task set 0.0 with 6 tasks
18/04/04 14:15:37 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.96.20.75:43727) with ID 0
18/04/04 14:15:37 INFO BlockManagerMasterEndpoint: Registering block manager 10.96.20.75:46125 with 366.3 MB RAM, BlockManagerId(0, 10.96.20.75, 46125, None)
18/04/04 14:15:37 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.96.20.75, executor 0, partition 0, NODE_LOCAL, 12327 bytes)
18/04/04 14:15:38 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.96.20.75, executor 0, partition 1, NODE_LOCAL, 11729 bytes)
18/04/04 14:15:38 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 10.96.20.75, executor 0, partition 2, NODE_LOCAL, 13038 bytes)
18/04/04 14:15:38 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 10.96.20.75, executor 0, partition 3, NODE_LOCAL, 12445 bytes)
18/04/04 14:15:38 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 10.96.20.75, executor 0, partition 4, NODE_LOCAL, 12209 bytes)
18/04/04 14:15:38 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, 10.96.20.75, executor 0, partition 5, NODE_LOCAL, 6864 bytes)
18/04/04 14:15:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.96.20.75, executor 0): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:309)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Edit: I really missed that 1.5.0-RC1 was commented out...
It should be enough to specify the spark-cassandra-connector dependency - it already depends on spark-core & spark-sql. But if you use Spark 2.x, you need to use a 2.x version of spark-cassandra-connector (although it declares a dependency on Spark 2.0.2, it should work with 2.2.0).
I don't know where you got version 1.5.0-RC1 from - it's very old...
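One way to verify that the connector really brings spark-core and spark-sql with it, and to track down the "Multiple versions of scala libraries detected!" warning from the build log, is Maven's dependency tree (a standard maven-dependency-plugin goal):
# Show which Spark and Scala artifacts end up on the classpath and through which dependency they arrive.
mvn dependency:tree -Dincludes=org.apache.spark
mvn dependency:tree -Dincludes=org.scala-lang,org.scala-lang.modules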

Spark UI's kill is not killing Driver

I am trying to kill my Spark-Kafka streaming job from the Spark UI. It is able to kill the application, but the driver is still running.
Can anyone help me with this? My other streaming jobs are fine; only one of the streaming jobs gives this problem every time.
I can't kill the driver through the command line or the Spark UI. The Spark Master is alive.
The output I collected from the logs is:
16/10/25 03:14:25 INFO BlockManagerMaster: Removed 0 successfully in removeExecutor
16/10/25 03:14:25 INFO SparkUI: Stopped Spark web UI at http://***:4040
16/10/25 03:14:25 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/10/25 03:14:25 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/10/25 03:14:35 INFO AppClient: Stop request to Master timed out; it may already be shut down.
16/10/25 03:14:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/10/25 03:14:35 INFO MemoryStore: MemoryStore cleared
16/10/25 03:14:35 INFO BlockManager: BlockManager stopped
16/10/25 03:14:35 INFO BlockManagerMaster: BlockManagerMaster stopped
16/10/25 03:14:35 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/10/25 03:14:35 INFO SparkContext: Successfully stopped SparkContext
16/10/25 03:14:35 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: KILLED
at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:438)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:124)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:172)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/10/25 03:14:35 WARN NettyRpcEnv: Ignored message: true
16/10/25 03:14:35 WARN AppClient$ClientEndpoint: Connection to master:7077 failed; waiting for master to reconnect...
16/10/25 03:14:35 WARN AppClient$ClientEndpoint: Connection to master:7077 failed; waiting for master to reconnect...
Get the running driverId from the Spark UI, and hit the POST REST call (on the Spark master REST port, typically 6066) to kill the pipeline. I have tested this with Spark 1.6.1:
curl -X POST http://localhost:6066/v1/submissions/kill/driverId
Hope it helps...

Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/spark/Logging

I'm new to Spark. I attempted to run a Spark app (.jar) on CDH 5.8.0-0 on Oracle VirtualBox 5.1.4r110228 which leveraged Spark Streaming to perform sentiment analysis on Twitter. I have my Twitter account created and all (4) required tokens generated. I was blocked by the NoClassDefFoundError exception.
I've been googling around for a couple of days. The best advice I found so far was in the URL below but apparently my environment is still missing something.
http://javarevisited.blogspot.com/2011/06/noclassdeffounderror-exception-in.html#ixzz4Ia99dsp0
What does it mean when a library shows up at compile time but is missing at runtime? How can we fix this?
Which library provides Logging? I came across an article stating that this Logging class was to be deprecated. Besides that, I do see log4j in my environment.
In my CDH 5.8, I'm running these versions of software:
Spark-2.0.0-bin-hadoop2.7 / spark-core_2.10-2.0.0
jdk-8u101-linux-x64 / jre-bu101-linux-x64
I appended the detail of the exception at the end. Here is the procedure I performed to execute the app and some verification I did after hitting the exception:
Unzip twitter-streaming.zip (the Spark app)
cd twitter-streaming
run ./sbt/sbt assembly
Update env.sh with your Twitter account
$ cat env.sh
export SPARK_HOME=/home/cloudera/spark-2.0.0-bin-hadoop2.7
export CONSUMER_KEY=<my_consumer_key>
export CONSUMER_SECRET=<my_consumer_secret>
export ACCESS_TOKEN=<my_twitterapp_access_token>
export ACCESS_TOKEN_SECRET=<my_twitterapp_access_token>
The submit.sh script wraps the spark-submit command with the required credential info from env.sh:
$ cat submit.sh
source ./env.sh
$SPARK_HOME/bin/spark-submit --class "TwitterStreamingApp" --master local[*] ./target/scala-2.10/twitter-streaming-assembly-1.0.jar $CONSUMER_KEY $CONSUMER_SECRET $ACCESS_TOKEN $ACCESS_TOKEN_SECRET
The log of the assembly process:
[cloudera#quickstart twitter-streaming]$ ./sbt/sbt assembly
Launching sbt from sbt/sbt-launch-0.13.7.jar
[info] Loading project definition from /home/cloudera/workspace/twitter-streaming/project
[info] Set current project to twitter-streaming (in build file:/home/cloudera/workspace/twitter-streaming/)
[info] Including: twitter4j-stream-3.0.3.jar
[info] Including: twitter4j-core-3.0.3.jar
[info] Including: scala-library.jar
[info] Including: unused-1.0.0.jar
[info] Including: spark-streaming-twitter_2.10-1.4.1.jar
[info] Checking every *.class/*.jar file's SHA-1.
[info] Merging files...
[warn] Merging 'META-INF/LICENSE.txt' with strategy 'first'
[warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.spark-project.spark/unused/pom.properties' with strategy 'first'
[warn] Merging 'META-INF/maven/org.spark-project.spark/unused/pom.xml' with strategy 'first'
[warn] Merging 'log4j.properties' with strategy 'discard'
[warn] Merging 'org/apache/spark/unused/UnusedStubClass.class' with strategy 'first'
[warn] Strategy 'discard' was applied to 2 files
[warn] Strategy 'first' was applied to 4 files
[info] SHA-1: 69146d6fdecc2a97e346d36fafc86c2819d5bd8f
[info] Packaging /home/cloudera/workspace/twitter-streaming/target/scala-2.10/twitter-streaming-assembly-1.0.jar ...
[info] Done packaging.
[success] Total time: 6 s, completed Aug 27, 2016 11:58:03 AM
Not sure exactly what it means but everything looked good when I ran Hadoop NativeCheck:
$ hadoop checknative -a
16/08/27 13:27:22 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
16/08/27 13:27:22 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
Here is the console log of my exception:
$ ./submit.sh
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/28 20:13:23 INFO SparkContext: Running Spark version 2.0.0
16/08/28 20:13:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/28 20:13:24 WARN Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/08/28 20:13:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/28 20:13:24 INFO SecurityManager: Changing view acls to: cloudera
16/08/28 20:13:24 INFO SecurityManager: Changing modify acls to: cloudera
16/08/28 20:13:24 INFO SecurityManager: Changing view acls groups to:
16/08/28 20:13:24 INFO SecurityManager: Changing modify acls groups to:
16/08/28 20:13:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); groups with view permissions: Set(); users with modify permissions: Set(cloudera); groups with modify permissions: Set()
16/08/28 20:13:25 INFO Utils: Successfully started service 'sparkDriver' on port 37550.
16/08/28 20:13:25 INFO SparkEnv: Registering MapOutputTracker
16/08/28 20:13:25 INFO SparkEnv: Registering BlockManagerMaster
16/08/28 20:13:25 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-37a0492e-67e3-4ad5-ac38-40448c25d523
16/08/28 20:13:25 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/08/28 20:13:25 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/28 20:13:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/28 20:13:25 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
16/08/28 20:13:25 INFO SparkContext: Added JAR file:/home/cloudera/workspace/twitter-streaming/target/scala-2.10/twitter-streaming-assembly-1.1.jar at spark://10.0.2.15:37550/jars/twitter-streaming-assembly-1.1.jar with timestamp 1472440405882
16/08/28 20:13:26 INFO Executor: Starting executor ID driver on host localhost
16/08/28 20:13:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41264.
16/08/28 20:13:26 INFO NettyBlockTransferService: Server created on 10.0.2.15:41264
16/08/28 20:13:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 41264)
16/08/28 20:13:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:41264 with 366.3 MB RAM, BlockManagerId(driver, 10.0.2.15, 41264)
16/08/28 20:13:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 41264)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.streaming.twitter.TwitterUtils$.createStream(TwitterUtils.scala:44)
at TwitterStreamingApp$.main(TwitterStreamingApp.scala:42)
at TwitterStreamingApp.main(TwitterStreamingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 23 more
16/08/28 20:13:26 INFO SparkContext: Invoking stop() from shutdown hook
16/08/28 20:13:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
16/08/28 20:13:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/28 20:13:26 INFO MemoryStore: MemoryStore cleared
16/08/28 20:13:26 INFO BlockManager: BlockManager stopped
16/08/28 20:13:26 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/28 20:13:26 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/28 20:13:26 INFO SparkContext: Successfully stopped SparkContext
16/08/28 20:13:26 INFO ShutdownHookManager: Shutdown hook called
16/08/28 20:13:26 INFO ShutdownHookManager: Deleting directory /tmp/spark-5e29c3b2-74c2-4d89-970f-5be89d176b26
I understand my post was lengthy. Your advice or insights are highly appreciated!!
-jsung8
Use: spark-core_2.11-1.5.2.jar
I had the same problem described by @jsung8 and tried to find the .jar suggested by @youngstephen, but could not. However, linking in spark-core_2.11-1.5.2.jar instead of spark-core_2.11-1.5.2.logging.jar resolved the exception in the way @youngstephen suggested.
org/apache/spark/Logging was removed after Spark 1.5.2.
Since your spark-core version is 2.0, the simplest solution is:
download a single spark-core_2.11-1.5.2.logging.jar and put it in the jars directory under your Spark root directory.
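In the asker's setup that amounts to something like the lines below (a sketch; the jar itself has to be obtained separately, and SPARK_HOME is the one exported in env.sh):
# Put the compatibility jar on Spark's classpath by dropping it into $SPARK_HOME/jars.
cp spark-core_2.11-1.5.2.logging.jar "$SPARK_HOME/jars/"
# With the SPARK_HOME used above this is /home/cloudera/spark-2.0.0-bin-hadoop2.7/jars/.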
Anyway, it solved my problem; hope it helps.
One reason that may cause this problem is a library and class conflict.
I faced this problem and solved it using some Maven exclusions:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.0</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
<version>2.0.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
You're using an old version of the Spark Twitter connector. This class from your stack trace hints at that:
org.apache.spark.streaming.twitter.TwitterUtils
Spark removed that integration in version 2.0. You're using the one from an old Spark version that references the old Logging class which moved to a different package in Spark 2.0.
If you want to use Spark 2.0, you'll need to use the Twitter connector from the Bahir project.
Rather than downgrading the Spark core version to 1.5 because of the error below,
java.lang.NoClassDefFoundError: org/apache/spark/Logging
http://bahir.apache.org/docs/spark/2.0.0/spark-streaming-twitter/ provides the better solution. By adding the dependency below, my issue was resolved:
<dependency>
<groupId>org.apache.bahir</groupId>
<artifactId>spark-streaming-twitter_2.11</artifactId>
<version>2.0.0</version>
</dependency>
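If you prefer not to bake the connector into the assembly jar, the same Bahir artifact can also be pulled in at submit time with --packages. The sketch below reuses the variables from submit.sh earlier on this page; note the _2.11 suffix, so the application jar would also need to be built for Scala 2.11 rather than 2.10 to match Spark 2.0 (the scala-2.11 path shown is illustrative):
# Hypothetical submit command: fetch the Bahir Twitter connector from Maven Central at launch time.
$SPARK_HOME/bin/spark-submit --class "TwitterStreamingApp" --master local[*] \
  --packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0 \
  ./target/scala-2.11/twitter-streaming-assembly-1.0.jar \
  $CONSUMER_KEY $CONSUMER_SECRET $ACCESS_TOKEN $ACCESS_TOKEN_SECRET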

spark-cassandra java.lang.NoClassDefFoundError: com/datastax/spark/connector/japi/CassandraJavaUtil

16/04/26 16:58:46 DEBUG ProtobufRpcEngine: Call: complete took 3ms
Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/japi/CassandraJavaUtil
at com.baitic.mcava.lecturahdfssaveincassandra.TratamientoCSV.main(TratamientoCSV.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.japi.CassandraJavaUtil
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
16/04/26 16:58:46 INFO SparkContext: Invoking stop() from shutdown hook
16/04/26 16:58:46 INFO SparkUI: Stopped Spark web UI at http://10.128.0.5:4040
16/04/26 16:58:46 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/04/26 16:58:46 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/04/26 16:58:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/04/26 16:58:46 INFO MemoryStore: MemoryStore cleared
16/04/26 16:58:46 INFO BlockManager: BlockManager stopped
16/04/26 16:58:46 INFO BlockManagerMaster: BlockManagerMaster stopped
16/04/26 16:58:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/04/26 16:58:46 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/04/26 16:58:46 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/04/26 16:58:46 INFO SparkContext: Successfully stopped SparkContext
16/04/26 16:58:46 INFO ShutdownHookManager: Shutdown hook called
16/04/26 16:58:46 INFO ShutdownHookManager: Deleting directory /srv/spark/tmp/spark-2bf57fa2-a2d5-4f8a-980c-994e56b61c44
16/04/26 16:58:46 DEBUG Client: stopping client from cache: org.apache.hadoop.ipc.Client#3fb9a67f
16/04/26 16:58:46 DEBUG Client: removing client from cache: org.apache.hadoop.ipc.Client#3fb9a67f
16/04/26 16:58:46 DEBUG Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client#3fb9a67f
16/04/26 16:58:46 DEBUG Client: Stopping client
16/04/26 16:58:46 DEBUG Client: IPC Client (2107841088) connection to mcava-master/10.128.0.5:54310 from baiticpruebas2: closed
16/04/26 16:58:46 DEBUG Client: IPC Client (2107841088) connection to mcava-master/10.128.0.5:54310 from baiticpruebas2: stopped, remaining connections 0
16/04/26 16:58:46 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
I wrote this simple code:
// String pathDatosTratados="hdfs://mcava-master:54310/srv/hadoop/data/spark/DatosApp/medidasSensorTratadas.txt";
String jarPath ="hdfs://mcava-master:54310/srv/hadoop/data/spark/original-LecturaHDFSsaveInCassandra-1.0-SNAPSHOT.jar";
String jar="hdfs://mcava-master:54310/srv/hadoop/data/spark/spark-cassandra-connector-assembly-1.6.0-M1-4-g6f01cfe.jar";
String jar2="hdfs://mcava-master:54310/srv/hadoop/data/spark/spark-cassandra-connector-java-assembly-1.6.0-M1-4-g6f01cfe.jar";
String[] jars= new String[3];
jars[0]=jarPath;
jars[2]=jar;
jars[1]=jar2;
SparkConf conf=new SparkConf().setAppName("TratamientoCSV").setJars(jars);
conf.set("spark.cassandra.connection.host", "10.128.0.5");
conf.set("spark.kryoserializer.buffer.max","512");
conf.set("spark.kryoserializer.buffer","256");
// conf.setJars(jars);
JavaSparkContext sc= new JavaSparkContext(conf);
JavaRDD<String> input= sc.textFile(pathDatos);
I also put the path to the Cassandra driver in spark-defaults.conf:
spark.driver.extraClassPath hdfs://mcava-master:54310/srv/hadoop/data/spark/spark-cassandra-connector-java-assembly-1.6.0-M1-4-g6f01cfe.jar
spark.executor.extraClassPath hdfs://mcava-master:54310/srv/hadoop/data/spark/spark-cassandra-connector-java-assembly-1.6.0-M1-4-g6f01cfe.jar
I also passed the --jars flag with the path to the driver jar, but I always get the same error and I do not understand why.
I am working on Google Compute Engine.
Try adding the package when you submit your app:
$SPARK_HOME/bin/spark-submit --packages datastax:spark-cassandra-connector:1.6.0-M2-s_2.11 ....
I added this argument to solve the problem: --packages datastax:spark-cassandra-connector:1.6.0-M2-s_2.10.
At least for the 3.0+ Spark Cassandra Connector, the official assembly jar works well for me. It has all the necessary dependencies.
I solved the problem... I made a fat jar with all the dependencies, so it is not necessary to reference the Cassandra connector jars, only the fat jar.
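A sketch of that approach, assuming a Maven build that produces a jar-with-dependencies (for example with the maven-assembly-plugin shown earlier on this page); the jar name and the standalone master URL below are illustrative, and the main class is the one from the stack trace above:
# Build one self-contained jar that already bundles the Cassandra connector classes ...
mvn clean package
# ... then submit only that jar; no --jars or spark.*.extraClassPath entries for the connector are needed.
spark-submit --class com.baitic.mcava.lecturahdfssaveincassandra.TratamientoCSV \
  --master spark://mcava-master:7077 \
  target/LecturaHDFSsaveInCassandra-1.0-SNAPSHOT-jar-with-dependencies.jar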
I used Spark in my Java program and had the same issue.
The problem was that I didn't include spark-cassandra-connector in my project's Maven dependencies:
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.11</artifactId>
<version>2.0.7</version> <!-- Check actual version in maven repo -->
</dependency>
After that I built a fat jar with all my dependencies - and it worked!
Maybe it will help someone.
