Just installed spark and scala. Returns unsupported class file major version: 58 - apache-spark

I just installed Scala and Spark. I ran the following code in the Spark shell:
scala> val data = Array(1,2,3,4,5)
scala> val rdd1 = sc.parallelize(data)
scala> rdd1.collect()
It returns the following error message:
java.lang.IllegalArgumentException: Unsupported class file major version 58
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:307)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:306)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:306)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2100)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1409)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.RDD.take(RDD.scala:1382)
... 49 elided
I have installed Java 8, spark-2.4.5-bin-hadoop2.6, and JDK 14.0.1. I am on Windows 10. The error message is totally incomprehensible to me. Any advice would be appreciated.

This error is caused by using the wrong Java version with Spark: class file major version 58 corresponds to Java 14, which Spark 2.4.x does not support, since classes and bytecode formats change between JDK releases. Spark 2.4 requires Java 8, so make sure spark-shell runs on JDK 8 rather than JDK 14.
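You can confirm which JVM the Spark shell actually picked up directly from the prompt; these are standard Java system properties, and for Spark 2.4 the first one should report 1.8.x:
scala> System.getProperty("java.version")
scala> System.getProperty("java.home")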
If you want to set the Java environment for Spark on YARN, you can set it when launching spark-submit/spark-shell by adding the following to the command:
--conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0_121 \
Otherwise, specify the Java version in the Spark environment configuration by adding JAVA_HOME to conf/spark-env.sh (see https://spark.apache.org/docs/latest/configuration.html).
Note that conf/spark-env.sh does not exist by default when Spark is installed; copy conf/spark-env.sh.template to create it, and make sure the copy is executable.
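For example, the file only needs one line pointing at a Java 8 installation (the path below is illustrative; use the location of your own JDK 8):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
On Windows the analogous file is conf/spark-env.cmd, using set JAVA_HOME=... instead of export.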

Related

Unrecognized Hadoop major version number

I am trying to initialize an Apache Spark instance on Windows 10 to run a local test. My problem is that during the initialization of the Spark instance I get an error message. This code has worked for me many times previously, so I am guessing something might have changed in the dependencies or the configuration. I am running JDK 1.8.0_192, Hadoop should be 3.0.0, and the Spark version is 2.4.0. I am also using Maven as the build tool, if that is relevant.
Here is the way I am setting up the session:
def withSparkSession(testMethod: SparkSession => Any) {
  val uuid = UUID.randomUUID().toString
  val pathRoot = s"C:/data/temp/spark-testcase/$uuid" // TODO: make this independent from Windows
  val derbyRoot = s"C:/data/temp/spark-testcase/derby_system_root"
  // TODO: clear me up -- Derby based metastore should be cleared up
  System.setProperty("derby.system.home", s"${derbyRoot}")

  val conf = new SparkConf()
    .set("testcase.root.dir", s"${pathRoot}")
    .set("spark.sql.warehouse.dir", s"${pathRoot}/test-hive-dwh")
    .set("spark.sql.catalogImplementation", "hive")
    .set("hive.exec.scratchdir", s"${pathRoot}/hive-scratchdir")
    .set("hive.exec.dynamic.partition.mode", "nonstrict")
    .setMaster("local[*]")
    .setAppName("Spark Hive Test case")

  val spark = SparkSession.builder()
    .config(conf)
    .enableHiveSupport()
    .getOrCreate()

  try {
    testMethod(spark)
  }
  finally {
    spark.sparkContext.stop()
    println(s"Deleting test case root directory: $pathRoot")
    deleteRecursively(nioPaths.get(pathRoot))
  }
}
And this is the error message I receive:
An exception or error caused a run to abort.
java.lang.ExceptionInInitializerError
at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1117)
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:866)
.
.
.
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSpecLike$$anon$1.apply(FunSpecLike.scala:454)
at org.scalatest.TestSuite$class.withFixture(TestSuite.scala:196)
at org.scalamock.scalatest.AbstractMockFactory$$anonfun$withFixture$1.apply(AbstractMockFactory.scala:35)
at org.scalamock.scalatest.AbstractMockFactory$$anonfun$withFixture$1.apply(AbstractMockFactory.scala:34)
at org.scalamock.MockFactoryBase$class.withExpectations(MockFactoryBase.scala:41)
at org.scalamock.scalatest.AbstractMockFactory$class.withFixture(AbstractMockFactory.scala:34)
at org.scalatest.FunSpecLike$class.invokeWithFixture$1(FunSpecLike.scala:451)
at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:464)
at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:464)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
at org.scalatest.FunSpecLike$class.runTest(FunSpecLike.scala:464)
at org.scalatest.FunSpec.runTest(FunSpec.scala:1630)
at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:497)
at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:497)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:373)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:410)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
at org.scalatest.FunSpecLike$class.runTests(FunSpecLike.scala:497)
at org.scalatest.FunSpec.runTests(FunSpec.scala:1630)
at org.scalatest.Suite$class.run(Suite.scala:1147)
at org.scalatest.FunSpec.org$scalatest$FunSpecLike$$super$run(FunSpec.scala:1630)
at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:501)
at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:501)
at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
at org.scalatest.FunSpecLike$class.run(FunSpecLike.scala:501)
at org.scalatest.FunSpec.run(FunSpec.scala:1630)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1346)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1340)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1506)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
at org.scalatest.tools.Runner$.run(Runner.scala:850)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:43)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:26)
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-cdh6.3.4
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
... 64 more
Process finished with exit code 2
So far I have tried changing the JDK version to jdk1.8.0_181 and jdk11+28-x64. I also tried deleting the HADOOP_HOME environment variable from the system, but that didn't help either. (Currently it is set to C:\Data\devtools\hadoop-win\3.0.0)
If you're on Windows, you shouldn't be pulling in CDH dependencies (3.0.0-cdh6.3.4), as Cloudera doesn't support Windows, last I checked.
But you should be using Spark 3 if you have Hadoop 3+, and keep HADOOP_HOME, as that is definitely necessary.
Also, only Hadoop 3.3.4 introduced Java 11 runtime support, so Java 8 is what you should stick with.
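A quick way to confirm which Hadoop build actually ends up on the test classpath (for example, whether it is the CDH artifact) is to print it from the test itself; VersionInfo is part of hadoop-common:
println(org.apache.hadoop.util.VersionInfo.getVersion)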
I have solved the problem. During the project's development we also added HBase to the build, which pulled in a different Hadoop version (from Cloudera) as its dependency, so the versions got mixed up. Removing the HBase dependency from the pom.xml solved the problem.
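If you run into a similar mix-up, one way to see which Hadoop artifacts each dependency drags in is Maven's dependency tree (the -Dincludes filter just narrows the output):
mvn dependency:tree -Dincludes=org.apache.hadoop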

Zeppelin Null Pointer Exception

I wrote this simple code in my Zeppelin notebook:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("csv").option("header", "true").load("hdfs:///user/admin/foo/2018.csv")
df.printSchema()
Earlier it was not able to find spark-csv, so I added it as a dependency to the spark1 and spark2 interpreters. But when I run this code I get an error:
java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:614)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
This file has just 300 rows, so I don't think it causes any memory issues. I have a 4-node cluster; how can I determine where the log file with a more detailed error resides?
OK, I resolved it. It seems Zeppelin uses Scala 2.10, and I had added the csv dependency built for Scala 2.11, which caused the null pointer error.
I changed my dependency to the 2.10 build, restarted the interpreter, and now it works fine.
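A quick way to confirm which Scala version an interpreter is running (and therefore which artifact suffix to use, e.g. _2.10 vs _2.11) is to print it from a notebook paragraph, assuming the default %spark interpreter:
%spark
println(scala.util.Properties.versionString)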

Spark read s3 using sc.textFile("s3a://bucket/filePath"). java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager

I have added the below jars to the spark/jars path.
hadoop-aws-2.7.3.jar
aws-java-sdk-s3-1.11.126.jar
aws-java-sdk-core-1.11.126.jar
spark-2.1.0
In spark-shell
scala> sc.hadoopConfiguration.set("fs.s3a.access.key", "***")
scala> sc.hadoopConfiguration.set("fs.s3a.secret.key", "***")
scala> val f = sc.textFile("s3a://bucket/README.md")
scala> f.count
java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
... 48 elided
"java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager" is raised by mismatched jar? (hadoop-aws, aws-java-sdk)
To access data stored in Amazon S3 from Spark applications should use Hadoop file APIs. So is hadoop-aws.jar contains the Hadoop file APIS or must run hadoop env ?
Mismatched JARs; the AWS SDK is pretty brittle across versions.
The Hadoop S3A code is in the hadoop-aws JAR; it also needs hadoop-common. Hadoop 2.7 is built against AWS S3 SDK 1.10.6. (Update: no, it's 1.7.4; the move to 1.10.6 went into Hadoop 2.8, see HADOOP-12269.)
You must use that version. If you want to use the 1.11 JARs, then you will need to check out the Hadoop source tree and build branch-2 yourself. The good news: that uses the shaded AWS SDK, so its versions of Jackson and Joda Time don't break things. Oh, and if you check out Spark master and build with the -Phadoop-cloud profile, it pulls the right stuff in to set Spark's dependencies up correctly.
Update: Oct 1 2017: Hadoop 2.9.0-alpha and 3.0-beta-1 use 1.11.199; assume the shipping versions will be that or more recent.
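If you would rather not track the matching JAR versions by hand, one option (assuming a Spark build against Hadoop 2.7) is to let spark-shell resolve hadoop-aws and the AWS SDK it was built against transitively, instead of copying JARs into spark/jars:
spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3
With Hadoop 2.7.x this pulls in aws-java-sdk 1.7.4, the version the S3A code expects.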

Spark submit throws error while using Hive tables

I have a strange error. I am trying to write data to Hive; it works well in spark-shell, but when I use spark-submit it throws a "database/table not found in default" error.
Following is the code I am trying to run via spark-submit; I am using a custom build of Spark 2.0.0.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext.table("spark_schema.iris_ori")
Following is the command I am using:
/home/ec2-user/Spark_Source_Code/spark/bin/spark-submit --class TreeClassifiersModels --master local[*] /home/ec2-user/Spark_Snapshots/Spark_2.6/TreeClassifiersModels/target/scala-2.11/treeclassifiersmodels_2.11-1.0.3.jar /user/ec2-user/Input_Files/defPath/iris_spark SPECIES~LBL+PETAL_LENGTH+PETAL_WIDTH RAN_FOREST 0.7 123 12
Following is the error:
16/05/20 09:05:18 INFO SparkSqlParser: Parsing command: spark_schema.measures_20160520090502
Exception in thread "main" org.apache.spark.sql.AnalysisException: Database 'spark_schema' does not exist;
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:37)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:195)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:360)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:464)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:458)
at TreeClassifiersModels$.main(TreeClassifiersModels.scala:71)
at TreeClassifiersModels.main(TreeClassifiersModels.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:726)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:183)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:208)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:122)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The issue was because of a deprecation that happened in Spark 2.0.0: HiveContext was deprecated there. To read/write Hive tables on Spark 2.0.0 we need to use a SparkSession, as follows.
val sparkSession = SparkSession.withHiveSupport(sc)
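If withHiveSupport is not available in your Spark 2.x build, the equivalent builder-based form is as follows (a minimal sketch; the table name is taken from the question):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("TreeClassifiersModels")
  .enableHiveSupport()
  .getOrCreate()

// Reads the Hive table through the session's catalog
spark.table("spark_schema.iris_ori")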

Spark-Cassandra Connector : Failed to open native connection to Cassandra

I am new to Spark and Cassandra. On trying to submit a spark job, I am getting an error while connecting to Cassandra.
Details:
Versions:
Spark: 1.3.1 (built for Hadoop 2.6 or later: spark-1.3.1-bin-hadoop2.6)
Cassandra : 2.0
Spark-Cassandra-Connector: 1.3.0-M1
scala : 2.10.5
Spark and Cassandra are on a virtual cluster.
Cluster details:
Spark Master : 192.168.101.13
Spark Slaves : 192.168.101.11 and 192.168.101.12
Cassandra Nodes: 192.168.101.11 (seed node) and 192.168.101.12
I am trying to submit a job through my client machine (laptop) - 172.16.0.6.
After googling for this error, I have made sure that I can ping all the machines on the cluster from the client machine (Spark master/slaves and Cassandra nodes) and have also disabled the firewall on all machines. But I am still struggling with this error.
Cassandra.yaml
listen_address: 192.168.101.11 (192.168.101.12 on other cassandra node)
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 192.168.101.11 (192.168.101.12 on other cassandra node)
rpc_port: 9160
I am trying to run a minimal sample job
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector._
val rdd = sc.cassandraTable("test", "words")
rdd.toArray.foreach(println)
To submit the job, I use spark-shell (:paste the code in spark shell):
spark-shell --jars "/home/ameya/.m2/repository/com/datastax/spark/spark-cassandra-connector_2.10/1.3.0-M1/spark-cassandra-connector_2.10-1.3.0-M1.jar","/home/ameya/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.5/cassandra-driver-core-2.1.5.jar","/home/ameya/.m2/repository/com/google/collections/google-collections/1.0/google-collections-1.0.jar","/home/ameya/.m2/repository/io/netty/netty/3.8.0.Final/netty-3.8.0.Final.jar","/home/ameya/.m2/repository/com/google/guava/guava/14.0.1/guava-14.0.1.jar","/home/ameya/.m2/repository/io/dropwizard/metrics/metrics-core/3.1.0/metrics-core-3.1.0.jar","/home/ameya/.m2/repository/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar","/home/ameya/.m2/repository/com/google/collections/google-collections/1.0/google-collections-1.0.jar","/home/ameya/.m2/repository/io/netty/netty/3.8.0.Final/netty-3.8.0.Final.jar","/home/ameya/.m2/repository/com/google/guava/guava/14.0.1/guava-14.0.1.jar","/home/ameya/.m2/repository/org/apache/cassandra/cassandra-clientutil/2.1.5/cassandra-clientutil-2.1.5.jar","/home/ameya/.m2/repository/joda-time/joda-time/2.3/joda-time-2.3.jar","/home/ameya/.m2/repository/org/apache/cassandra/cassandra-thrift/2.1.3/cassandra-thrift-2.1.3.jar","/home/ameya/.m2/repository/org/joda/joda-convert/1.2/joda-convert-1.2.jar","/home/ameya/.m2/repository/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar","/home/ameya/.m2/repository/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar" --master spark://192.168.101.13:7077 --conf spark.cassandra.connection.host=192.168.101.11 --conf spark.cassandra.auth.username=cassandra --conf spark.cassandra.auth.password=cassandra
The error I am getting:
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
**java.io.IOException: Failed to open native connection to Cassandra at {192.168.101.11}:9042**
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:181)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:167)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:167)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:76)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:104)
at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:115)
at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:243)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:49)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:59)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:59)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:148)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:59)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:118)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
at org.apache.spark.rdd.RDD.collect(RDD.scala:813)
at org.apache.spark.rdd.RDD.toArray(RDD.scala:833)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
at $iwC$$iwC$$iwC.<init>(<console>:54)
at $iwC$$iwC.<init>(<console>:56)
at $iwC.<init>(<console>:58)
at <init>(<console>:60)
at .<init>(<console>:64)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
**Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /192.168.101.11:9042 (com.datastax.driver.core.TransportException: [/192.168.101.11:9042] Connection has been closed))**
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:223)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1236)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:333)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:174)
... 71 more
Can anyone point out what am I doing wrong here ?
You did not specify spark.cassandra.connection.host; by default, Spark assumes that the Cassandra host is the same as the Spark master node.
var sc: SparkContext = _
val conf = new SparkConf().setAppName("Cassandra Demo").setMaster(master)
  .set("spark.cassandra.connection.host", "192.168.101.11")
sc = new SparkContext(conf)
val rdd = sc.cassandraTable("test", "words")
rdd.toArray.foreach(println)
It should work if you have properly set the seed node in cassandra.yaml.
I struggled with this issue overnight and finally got a combination that works. I am writing it down for those who may run into a similar issue.
First of all, this is a version issue with the cassandra-driver-core dependency, but tracking down the exact combination that works took me quite a bit of time.
Secondly, this is the combination that works for me:
Spark 1.6.2 with Hadoop 2.6, cassandra 2.1.5 (Ubuntu 14.04, Java 1.8),
In build.sbt (sbt assembly, scalaVersion := "2.10.5"), use:
"com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0",
"com.datastax.cassandra" % "cassandra-driver-core" % "2.1.5"
Thirdly, let me clarify my frustrations. With spark-cassandra-connector 1.5.0, I can run the assembly with spark-submit using --master "local[2]" on the same machine with a remote Cassandra connection without any problem. Any combination of connector 1.5.0 or 1.6.0 with Cassandra 2.0, 2.1, 2.2, or 3.4 works well. But if I try to submit the job to a cluster from the same machine (NodeManager) with --master yarn --deploy-mode cluster, then I always run into the problem: Failed to open native connection to Cassandra at {192.168.122.12}:9042
What is going on here? Can anyone from DataStax take a look at this issue? I can only guess it has something to do with "cqlversion", which should match the version of the Cassandra cluster.
Does anybody know a better solution?
It finally worked. Steps:
set listen_address to the private IP of the EC2 instance.
do not set any broadcast_address.
set rpc_address to 0.0.0.0.
set broadcast_rpc_address to the public IP of the EC2 instance.
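In cassandra.yaml terms that corresponds to something like the following (the IPs are placeholders for your instance's addresses):
listen_address: <ec2-private-ip>
# broadcast_address: (leave unset)
rpc_address: 0.0.0.0
broadcast_rpc_address: <ec2-public-ip>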
The issue is resolved. It was due to a mess-up with the dependencies. I built a jar with dependencies and passed it to spark-submit, instead of specifying the dependent jars separately.
This is an issue with the version of the cassandra-driver-core jar dependency.
The provided Cassandra version is 2.0.
The provided cassandra-driver-core jar version is 2.1.5.
The driver jar should match the version of the Cassandra cluster that is running.
In this case, the included jar file should be cassandra-driver-core-2.0.0.jar.
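In build.sbt terms (mirroring the sbt lines earlier in this thread; verify the exact driver release against your cluster), that would look something like:
"com.datastax.cassandra" % "cassandra-driver-core" % "2.0.0"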
