Spark Core (1.5.2) and Scala.collection.mutable.Map [duplicate] - apache-spark

I have a simple scala object file with the following content:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object X {
def main(args: Array[String]) {
val params = Map[String, String](
"abc" -> "22",)
println("Creating Spark Configuration");
val conf = new SparkConf().setAppName("X")
val sc = new SparkContext(conf)
val txtFileLines = sc.textFile("/tmp/x.txt", 2).cache()
val count = txtFileLines.count()
println("Count" + count)
}
}
My build.sbt looks like:
name := "x"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
I then do sbt package to create x.jar under target/scala-2.11/
When I execute the above code as:
spark-submit --class X --master local[2] x.jar
I get the following error:
Creating Spark Configuration
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at Sweeper$.main(Sweeper.scala:14)
at Sweeper.main(Sweeper.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

As you are using Scala 2.11 in your project. You should use spark core library build for Scala 2.11.
Can download spark-core_2.11 from here http://mvnrepository.com/search?q=Spark
Refer spark-core_2.11 jar in project.

Related

Error while deploy: Class org.apache.hadoop.fs.LocalFileSystem not found

I am trying to write a parquet file in Scala/sbt. My code works fine on my computer but always fails when deploy on a server with Jenkins.
I have the following error:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:288)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.parquet.hadoop.util.HadoopOutputFile.fromPath(HadoopOutputFile.java:58)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:677)
at com.github.mjakubowski84.parquet4s.ParquetWriter$.internalWriter(ParquetWriter.scala:129)
at com.github.mjakubowski84.parquet4s.ParquetWriterImpl.<init>(ParquetWriter.scala:186)
at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.build(ParquetWriter.scala:111)
at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.writeAndClose(ParquetWriter.scala:113)
at ParquetExport$.$anonfun$tryExport$1(ParquetExport.scala:307)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
... 29 more
I first tried to use spark:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0"
And then changed to code to work with these:
libraryDependencies ++= Seq(
"com.github.mjakubowski84" %% "parquet4s-core" % "2.6.0",
"org.apache.hadoop" % "hadoop-client" % "2.10.2"
)
And still encountering the same error.
Setting the Hadoop configuration didn't help:
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getname)
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getname)
Neither did changing the ClassLoader:
Thread.currentThread.setContextClassLoader(getClass.getClassLoader)
Everything work fine in local but not on the server. Any idea?
I already faced the same issue.
If you look at your logs :
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
You understand that it can't find the name of the class LocalFileSystem you set to your hadoop config.
Try to change this :
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getname)
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getname)
To :
val hadoopConfig = new Configuration()
hadoopConfig.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
hadoopConfig.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")

I save a DataFrame in Hbase and I get: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/TableDescriptor

I created a project on Apache Spark.
Version:
scala 2.11.8
apache spark 2.3.0
apache hbase 1.2.0
hortonworks shc 1.1.0.3.1.2.0-4 (the hortonworks connector)
I need to save a simple DataFrame in an HBase table. For this I started HBase 1.2.0 in Docker container (https://github.com/zhao-y/docker-hbase-pseudo) and created the following table:
$ hbase(main):002:0> create "table1", "cf1", "cf2", "cf3", "cf4", "cf5", "cf6", "cf7", "cf8"
$ 0 row (s) in 1.4440 seconds
To save a DataFrame in Hbase I use: https://github.com/hortonworks-spark/shc
I declared the catalog exactly as in the example
I created a catalog-based dataframe
I tried to save dataframe in hbase as in example:
dataFrame.write.options(
Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
.format("org.apache.spark.sql.execution.datasources.hbase")
.save()
Code:
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.junit.Test
class SparkTest {
case class HBaseRecord(
col0: String,
col1: Boolean,
col2: Double,
col3: Float,
col4: Int,
col5: Long,
col6: Short,
col7: String,
col8: Byte)
object HBaseRecord {
def apply(i: Int, t: String): HBaseRecord = {
val s = s"""row${"%03d".format(i)}"""
HBaseRecord(s,
i % 2 == 0,
i.toDouble,
i.toFloat,
i,
i.toLong,
i.toShort,
s"String$i: $t",
i.toByte)
}
}
#Test
def bar(): Unit = {
val sparkSession = SparkSession.builder
.appName("SparkTest")
.master("local[*]")
.config("spark.testing.memory", 2147480000)
.getOrCreate()
val data = (0 to 255).map { i => HBaseRecord(i, "extra") }
val dataFrame = sparkSession.createDataFrame(data)
dataFrame.show
dataFrame.write.options(
Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
.format("org.apache.spark.sql.execution.datasources.hbase")
.save()
}
}
Error:
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/TableDescriptor
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:63)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
at SparkTest.bar(SparkTest.scala:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.TableDescriptor
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 41 more
val sparkSession = SparkSession.builder
.appName("SparkTest")
.master("local[*]")
.config("spark.testing.memory", 2147480000)
.getOrCreate()
means you are running that in local and your hbase client jar is missing. (if its there in classpath then you can change the scope to runtime rather than compile)
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>2.1.4</version>
</dependency>
if you are using intellij to run locally, you can see hbase client jar is present in the .iml file.
normal way of runnning in cluster or client modes(not local) would be hbase claasspath add it to
export HBASE_CLASSPATH=$HBASE_CLASSPATH:`hbase classpath`
which will add all the hbase jars in to the classpath
to see/print all the jars in classpath below will be helpful to understand which jars in your classpath.
def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
case null => Array()
case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
case _ => urlsinclasspath(cl.getParent)
}
Caller would be...
val urls = urlsinclasspath(getClass.getClassLoader).foreach(println)

Spark Streaming works in Local mode but "stages fail" with "could not initialize class" in Client/Cluster mode

I have a Spark + Kafka streaming app that runs fine in Local mode, however when I try to launch it in yarn + local/cluster mode I get several errors like below
The first error I always see is
WARN TaskSetManager: Lost task 1.1 in stage 3.0 (TID 9, ip-xxx-24-129-36.ec2.internal, executor 2): java.lang.NoClassDefFoundError: Could not initialize class TestStreaming$
at TestStreaming$$anonfun$main$1$$anonfun$apply$1.apply(TestStreaming.scala:60)
at TestStreaming$$anonfun$main$1$$anonfun$apply$1.apply(TestStreaming.scala:59)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Next error I get is
ERROR JobScheduler: Error running job streaming job 1541786030000 ms.0
followed by
java.lang.NoClassDefFoundError: Could not initialize class
Spark version 2.1.0
Scala 2.11
Kafka version 10
Part of my code when I launch it loads the config in main. I pass this config file at runtime with -conf AFTER the jar (see below). I'm not quite sure but must I pass this config to the executors as well?
I launch my streaming app with the command below. One shows Local mode, the other shows client mode.
runJar = myProgram.jar
loggerPath=/path/to/log4j.properties
mainClass=TestStreaming
logger=-DPHDTKafkaConsumer.app.log4j=$loggerPath
confFile=application.conf
-----------Local Mode----------
SPARK_KAFKA_VERSION=0.10 nohup spark2-submit --driver-java-options
"$logger" --conf "spark.executor.extraJavaOptions=$logger" --class
$mainClass --master local[4] $runJar -conf $confFile &
-----------Client Mode----------
SPARK_KAFKA_VERSION=0.10 nohup spark2-submit --master yarn --conf >"spark.executor.extraJavaOptions=$logger" --conf >"spark.driver.extraJavaOptions=$logger" --class $mainClass $runJar -conf >$confFile &
Here is my code below. Been battling this for over a week now.
import Util.UtilFunctions
import UtilFunctions.config
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.log4j.Logger
object TestStreaming extends Serializable {
#transient lazy val logger: Logger = Logger.getLogger(getClass.getName)
def main(args: Array[String]) {
logger.info("Starting app")
UtilFunctions.loadConfig(args)
UtilFunctions.loadLogger()
val props: Map[String, String] = setKafkaProperties()
val topic = Set(config.getString("config.TOPIC_NAME"))
val conf = new SparkConf()
.setAppName(config.getString("config.SPARK_APP_NAME"))
.set("spark.streaming.backpressure.enabled", "true")
val spark = SparkSession.builder()
.config(conf)
.getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
ssc.sparkContext.setLogLevel("INFO")
ssc.checkpoint(config.getString("config.SPARK_CHECKPOINT_NAME"))
val kafkaStream = KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topic, props))
val distRecordsStream = kafkaStream.map(record => (record.key(), record.value()))
distRecordsStream.window(Seconds(10), Seconds(10))
distRecordsStream.foreachRDD(rdd => {
if(!rdd.isEmpty()) {
rdd.foreach(record => {
println(record._2) //value from kafka
})
}
})
ssc.start()
ssc.awaitTermination()
ssc.stop()
}
def setKafkaProperties(): Map[String, String] = {
val deserializer = "org.apache.kafka.common.serialization.StringDeserializer"
val zookeeper = config.getString("config.ZOOKEEPER")
val offsetReset = config.getString("config.OFFSET_RESET")
val brokers = config.getString("config.BROKERS")
val groupID = config.getString("config.GROUP_ID")
val autoCommit = config.getString("config.AUTO_COMMIT")
val maxPollRecords = config.getString("config.MAX_POLL_RECORDS")
val maxPollIntervalms = config.getString("config.MAX_POLL_INTERVAL_MS")
val props = Map(
"bootstrap.servers" -> brokers,
"zookeeper.connect" -> zookeeper,
"group.id" -> groupID,
"key.deserializer" -> deserializer,
"value.deserializer" -> deserializer,
"enable.auto.commit" -> autoCommit,
"auto.offset.reset" -> offsetReset,
"max.poll.records" -> maxPollRecords,
"max.poll.interval.ms" -> maxPollIntervalms)
props
}
}

Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory

I'm getting the above error message when I'm trying to run spark submit comment :
spark-submit --class "retail.DataValidator" --master local --executor-memory 2g --total-executor-cores 2 sample-spark-180417_2.11-1.0.jar /home/hduser/Downloads/inputfiles/ /home/hduser/output/
ERROR Message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory
at retail.DataValidator$.main(DataValidator.scala:12)
at retail.DataValidator.main(DataValidator.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
build.sbt file:
name := "sample-spark-180417"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
libraryDependencies += "org.apache.spark" % "spark-hive_2.11" % "2.1.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.42"
libraryDependencies += "org.scala-lang" % "scala-swing" % "2.10+"
I'm not having any maven dependencies or pom.xml file.
Thanks
As it's not a fat jar, the spark cluster is not having typesafe jar in its classpath.
Submit the spark job as::
spark-submit --jars ./typesafe-***.jar --class "retail.DataValidator" --master local --executor-memory 2g --total-executor-cores 2 sample-spark-180417_2.11-1.0.jar /home/hduser/Downloads/inputfiles/ /home/hduser/output/
It will keep that jar in classpath, and submit the job.

Unable to find CassandraSQLContext

I am using spark 1.6 prebuild with hadoop 2.6
spark cassandra connector 1.6
cassandra 2.1.12
I wrote a simple scala program to run simple select count(*) query on cassandra here is my code
import org.apache.spark.{SparkContext, SparkConf}
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra.CassandraSQLContext
import org.apache.spark.sql._
object Hi {
def main(args: Array[String])
{
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "172.16.4.196")
val sc = new SparkContext("spark://naresh-pc:7077", "test", conf)
val csc = new CassandraSQLContext(sc)
val rdd1 = csc.sql("SELECT count(*) from cw.testdata")
println(rdd1.count)
println(rdd1.first)
}
}
it is successfully building with sbt assembly and creating jar
when i submit using spark submit
it gives the following error
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/cassandra/CassandraSQLContext
at Hi$.main(trySpark.scala:15)
at Hi.main(trySpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.CassandraSQLContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Any help on this ?
Moreover when i run with spark-shell it works fine

Resources