I'm running spark using the following Docker command:
docker run -it \
-p 8088:8088 -p 8042:8042 -p 50070:50070 \
-v "$(PWD)"/log4j.properties:/usr/local/spark/conf/log4j.properties \
-v "$(PWD)":/app -h sandbox sequenceiq/spark:1.6.0 bash
Running spark-submit --version reports version 1.6.0
My spark-submit command is the following:
spark-submit --class io.jobi.GithubDay \
--master local[*] \
--name "Daily Github Push Counter" \
/app/min-spark_2.11-1.0.jar \
"file:///app/data/github-archive/*.json" \
"/app/data/ghEmployees.txt" \
"file:///app/data/emp-gh-push-output" "json"
build.sbt
name := """min-spark"""
version := "1.0"
scalaVersion := "2.11.7"
lazy val sparkVersion = "1.6.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
)
// Change this to another test framework if you prefer
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.4" % "test"
GithubDay.scala
package io.jobi
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import scala.io.Source.fromFile
/**
* Created by hammer on 7/15/16.
*/
object GithubDay {
  def main(args: Array[String]): Unit = {
    println("Application arguments: ")
    args.foreach(println)

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    try {
      println("args(0): " + args(0))
      val ghLog = sqlContext.read.json(args(0))
      val pushes = ghLog.filter("type = 'PushEvent'")
      val grouped = pushes.groupBy("actor.login").count()
      val ordered = grouped.orderBy(grouped("count").desc)

      val employees = Set() ++ (
        for {
          line <- fromFile(args(1)).getLines()
        } yield line.trim
      )
      val bcEmployees = sc.broadcast(employees)

      import sqlContext.implicits._
      println("register function")
      val isEmployee = sqlContext.udf.register("SetContainsUdf", (u: String) => bcEmployees.value.contains(u))
      println("registered udf")
      val filtered = ordered.filter(isEmployee($"login"))
      println("applied filter")
      filtered.write.format(args(3)).save(args(2))
    } finally {
      sc.stop()
    }
  }
}
I build with sbt clean package, but the output when I run it is:
Application arguments:
file:///app/data/github-archive/*.json
/app/data/ghEmployees.txt
file:///app/data/emp-gh-push-output
json
args(0): file:///app/data/github-archive/*.json
imported implicits
defined isEmp
register function
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
at io.jobi.GithubDay$.main(GithubDay.scala:53)
at io.jobi.GithubDay.main(GithubDay.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
From what I've read, a NoSuchMethodError usually comes from version incompatibilities, but I'm building against 1.6.0 and deploying to 1.6.0, so I don't understand what's happening.
Unless you've compiled Spark yourself, version 1.6.0 is compiled with Scala 2.10.x out of the box. This is stated in the docs (which say 1.6.2, but the same applies to 1.6.0):
Spark runs on Java 7+, Python 2.6+ and R 3.1+. For the Scala API,
Spark 1.6.2 uses Scala 2.10. You will need to use a compatible Scala
version (2.10.x).
You want:
scalaVersion := "2.10.6"
One hint is that the missing method belongs to a Scala reflection class: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)
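For reference, the build.sbt from the question would then look like this (a minimal sketch; any 2.10.x patch release should work, and the Spark dependencies stay the same):

name := """min-spark"""

version := "1.0"

// Spark 1.6.x's published artifacts are built against Scala 2.10
scalaVersion := "2.10.6"

lazy val sparkVersion = "1.6.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
)

After the change, rebuild with sbt clean package; note that the artifact name becomes min-spark_2.10-1.0.jar, so the jar path in the spark-submit command has to be updated to match.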
Related
I tried to write a simple unit test for a Spark join, following the tutorial Apache Spark Unit Testing Part 2 — Spark SQL.
I added all the dependencies from the reference, plus:
"org.apache.spark" %% "spark-core" % "2.4.7" % Test,
"org.apache.spark" %% "spark-core" % "2.4.7" % Test classifier "tests",
"org.scalatest" %% "scalatest" % "3.2.3" % Test,
but I can't extend QueryTest because it involves AnyFunSuite in a private class. The code now looks like this:
class TestScalaCheck extends AnyFunSuite with SharedSparkSession {
  import testImplicits._

  test("join - join using") {
    val df = Seq(1, 2, 3).map(i => (i, i.toString)).toDF("int", "str")
    val df2 = Seq(1, 2, 3).map(i => (i, (i + 1).toString)).toDF("int", "str")
    checkAnswer(
      df.join(df2, "int"),
      Row(1, "1", "2") :: Row(2, "2", "3") :: Row(3, "3", "5") :: Nil)
  }
}
And it generates the exception message:
An exception or error caused a run to abort: org.apache.spark.sql.test.SharedSparkSession.eventually(Lorg/scalatest/concurrent/PatienceConfiguration$Timeout;Lorg/scalatest/concurrent/PatienceConfiguration$Interval;Lscala/Function0;Lorg/scalactic/source/Position;)Ljava/lang/Object;
java.lang.NoSuchMethodError: org.apache.spark.sql.test.SharedSparkSession.eventually(Lorg/scalatest/concurrent/PatienceConfiguration$Timeout;Lorg/scalatest/concurrent/PatienceConfiguration$Interval;Lscala/Function0;Lorg/scalactic/source/Position;)Ljava/lang/Object;
at org.apache.spark.sql.test.SharedSparkSession.afterEach(SharedSparkSession.scala:135)
at org.apache.spark.sql.test.SharedSparkSession.afterEach$(SharedSparkSession.scala:129)
at lesson2.TestScalaCheck.afterEach(TestScalaCheck.scala:14)
at org.scalatest.BeforeAndAfterEach.$anonfun$runTest$1(BeforeAndAfterEach.scala:247)
at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:377)
at org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:373)
at org.scalatest.SucceededStatus$.whenCompleted(Status.scala:462)
at org.scalatest.Status.withAfterEffect(Status.scala:373)
at org.scalatest.Status.withAfterEffect$(Status.scala:371)
at org.scalatest.SucceededStatus$.withAfterEffect(Status.scala:434)
at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:246)
at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
at lesson2.TestScalaCheck.runTest(TestScalaCheck.scala:14)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
at org.scalatest.Suite.run(Suite.scala:1112)
at org.scalatest.Suite.run$(Suite.scala:1094)
at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
at lesson2.TestScalaCheck.org$scalatest$BeforeAndAfterAll$$super$run(TestScalaCheck.scala:14)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at lesson2.TestScalaCheck.run(TestScalaCheck.scala:14)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
at org.scalatest.tools.Runner$.run(Runner.scala:798)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:25)
Could you help me to resolve it?
The original code is - DataFrameJoinSuite.scala
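For context, SharedSparkSession and QueryTest are shipped in Spark's test jars rather than in the main artifacts, so they are usually pulled in via the tests classifier on spark-sql and spark-catalyst as well (a sketch, assuming Spark 2.4.7):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"     % "2.4.7" % Test classifier "tests",
  "org.apache.spark" %% "spark-catalyst" % "2.4.7" % Test classifier "tests",
  "org.apache.spark" %% "spark-sql"      % "2.4.7" % Test classifier "tests"
)

The NoSuchMethodError itself points at a ScalaTest binary mismatch: Spark 2.4.x's test jars are compiled against the ScalaTest 3.0.x line, so mixing them with scalatest 3.2.3 can produce exactly this kind of missing-signature error.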
From what I've learned so far, I think Spark SQL can create Hive tables, since Spark SQL grew out of Shark, which in turn came from Hive (hive create table).
Can spark-sql also create tables in other NoSQL stores, such as HBase, Cassandra, or Elasticsearch?
I searched the relevant documentation but did not find a spark-sql API for creating such tables. Will it be supported in the future?
Creating indexes in Elasticsearch, or tables in HBase and Cassandra, directly through spark-sql is not possible at the moment.
You can, however, use the hbase-client library to interact natively with HBase: https://mvnrepository.com/artifact/org.apache.hbase/hbase-client
1. Creating a table in HBase
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

// Admin is obtained from a Connection here, which works with the hbase-client 2.x
// dependency listed below (the original snippet used new HBaseAdmin(conf) and left
// conf and myTable undefined).
val conf = HBaseConfiguration.create()   // assumes hbase-site.xml is on the classpath
val connection = ConnectionFactory.createConnection(conf)
val admin = connection.getAdmin
val myTable = TableName.valueOf("htd")   // assumed name; "htd" is the table written in step 2
if (!admin.tableExists(myTable)) {
  val htd = new HTableDescriptor(myTable)
  htd.addFamily(new HColumnDescriptor("id"))
  htd.addFamily(new HColumnDescriptor("name"))
  htd.addFamily(new HColumnDescriptor("country"))
  htd.addFamily(new HColumnDescriptor("pincode"))
  admin.createTable(htd)
}
2. Writing to the HBase table that was created
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor}
import org.apache.spark.sql.SparkSession
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.client.Connection
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "cmaster.localcloud.com:9092,cworker2.localcloud.com:9092,cworker1.localcloud.com:9092",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "use_a_separate_group_id_for_each_stream",
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("test")
val spark = SparkSession.builder().master("local[8]").appName("KafkaSparkHBasePipeline").getOrCreate()
spark.sparkContext.setLogLevel("OFF")

// The original snippet uses kafkaStream without showing how it is created; a streaming
// context and a direct stream are assumed here so the example is self-contained.
val ssc = new StreamingContext(spark.sparkContext, Seconds(5))
val kafkaStream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

val result = kafkaStream.map(record => (record.key, record.value))

result.foreachRDD(x => {
  // Each Kafka record value is a comma-separated line: id,name,country,pincode
  val cols = x.map(x => x._2.split(","))
  val arr = cols.map(x => {
    val id = x(0)
    val name = x(1)
    val country = x(2)
    val pincode = x(3)
    (id, name, country, pincode)
  })
  arr.foreachPartition { iter =>
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "cmaster.localcloud.com")
    conf.set("hbase.rootdir", "hdfs://localhost:8020/hbase")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("zookeeper.znode.parent", "/hbase")
    conf.set("hbase.unsafe.stream.capability.enforce", "false")
    conf.set("hbase.cluster.distributed", "true")
    val conn = ConnectionFactory.createConnection(conf)
    import org.apache.hadoop.hbase.TableName
    val tableName = "htd"
    val table = TableName.valueOf(tableName)
    val HbaseTable = conn.getTable(table)
    val cfPersonal = "personal"
    iter.foreach(x => {
      val keyValue = "Key_" + x._1
      val id = new Put(Bytes.toBytes(keyValue))
      val name = x._2.toString
      val country = x._3.toString
      val pincode = x._4.toString
      id.addColumn(Bytes.toBytes(cfPersonal), Bytes.toBytes("name"), Bytes.toBytes(name))
      id.addColumn(Bytes.toBytes(cfPersonal), Bytes.toBytes("country"), Bytes.toBytes(country))
      id.addColumn(Bytes.toBytes(cfPersonal), Bytes.toBytes("pincode"), Bytes.toBytes(pincode))
      HbaseTable.put(id)
    })
    HbaseTable.close()
    conn.close()
  }
})

// Not shown in the original: the streaming context has to be started for the pipeline to run.
ssc.start()
ssc.awaitTermination()
3. Dependencies I used for this project
name := "KafkaSparkHBasePipeline"
version := "0.1"
scalaVersion := "2.11.8"
resolvers += "Mavenrepository" at "https://mvnrepository.com"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.2"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.2"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "2.2.0"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.4.3"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.3"
4. Verifying the data in HBase
[smart@cmaster sathyadev]$ hbase shell
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
HBase Shell
Use "help" to get list of supported commands. Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html
Version 2.1.0-cdh6.2.1, rUnknown, Wed Sep 11 01:05:56 PDT 2019
Took 0.0064 seconds
hbase(main):001:0> scan 'htd';
ROW        COLUMN+CELL
Key_1010   column=personal:country, timestamp=1584125319078, value=USA
Key_1010   column=personal:name, timestamp=1584125319078, value=Mark
Key_1010   column=personal:pincode, timestamp=1584125319078, value=54321
Key_1011   column=personal:country, timestamp=1584125320073, value=CA
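The same native-client approach applies to Cassandra: the table is created through the DataStax Java driver rather than through spark-sql. A minimal sketch, assuming the 3.x driver and a local node (the keyspace and table names here are made up for illustration):

import com.datastax.driver.core.Cluster

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect()
// Plain CQL DDL; spark-sql itself offers no create-table API for Cassandra
session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
session.execute("CREATE TABLE IF NOT EXISTS demo.users (id text PRIMARY KEY, name text, country text, pincode text)")
session.close()
cluster.close()

For Elasticsearch, index creation likewise goes through its own REST API or client rather than through spark-sql.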
Trying to migrate code from Spark 1.6, Scala 2.10 to Spark 2.4, Scala 2.11.
Cannot get the code to compile. Showing dependency versions, minimal example and compilation error below.
// Dependencies
, "org.apache.spark" %% "spark-core" % "2.4.0"
, "org.apache.spark" %% "spark-sql" % "2.4.0"
, "org.apache.hbase" % "hbase-server" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-common" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-spark" % "1.2.0-cdh5.14.4"
// Minimal example
package spark2.hbase
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
object ConnectToHBase {

  def main(args: Array[String]): Unit = {

    implicit val spark: SparkSession = SparkSession.builder.appName("Connect to HBase from Spark 2")
      .config("spark.master", "local")
      .getOrCreate()
    implicit val sc: SparkContext = spark.sparkContext

    val hbaseConf = HBaseConfiguration.create()
    val hbaseContext = new HBaseContext(sc, hbaseConf)
  }
}
// Compilation error
[error] missing or invalid dependency detected while loading class file 'HBaseContext.class'.
[error] Could not access type Logging in package org.apache.spark,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'HBaseContext.class' was compiled against an incompatible version of org.apache.spark.
This works:
lazy val sparkVer = "2.4.0-cdh6.2.0"
lazy val hbaseVer = "2.1.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVer
, "org.apache.spark" %% "spark-sql" % sparkVer
, "org.apache.spark" %% "spark-streaming" % sparkVer
, "org.apache.hbase" % "hbase-common" % hbaseVer
, "org.apache.hbase" % "hbase-client" % hbaseVer
, "org.apache.hbase.connectors.spark" % "hbase-spark" % "1.0.0"
)
The essential pieces here are using Cloudera CDH 6 (not 5) and using a different hbase-spark artifact (org.apache.hbase.connectors.spark), because the CDH 5 version of hbase-spark cannot work with Spark 2.
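Since sparkVer above points at a CDH build, the Cloudera repository typically has to be added as a resolver as well (a sketch; adjust to your setup):

resolvers += "Cloudera repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"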
I am trying to save data in cassandra in a standalone mode from spark. By running following command:
bin/spark-submit --packages datastax:spark-cassandra-connector:1.6.0-s_2.10 \
--class "pl.japila.spark.SparkMeApp" --master local /home/hduser2/code14/target/scala-2.10/simple-project_2.10-1.0.jar
My build.sbt file is :-
**name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies += "datastax" % "spark-cassandra-connector" % "1.6.0-s_2.10"
libraryDependencies ++= Seq(
"org.apache.cassandra" % "cassandra-thrift" % "3.5" ,
"org.apache.cassandra" % "cassandra-clientutil" % "3.5",
"com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0"
)
My Spark code is :-
package pl.japila.spark
import org.apache.spark.sql._
import com.datastax.spark.connector._
import com.datastax.driver.core._
import com.datastax.spark.connector.cql._
import org.apache.spark.{SparkContext, SparkConf}
import com.datastax.driver.core.QueryOptions._
import com.datastax.spark.connector.rdd._

object SparkMeApp {
  def main(args: Array[String]) {
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext("local", "test", conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    val rdd = sc.cassandraTable("test", "kv")
    val collection = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
    collection.saveToCassandra("test", "kv", SomeColumns("key", "value"))
  }
}
And I got this error:-
Exception in thread "main" java.lang.NoSuchMethodError: com.datastax.driver.core.QueryOptions.setRefreshNodeIntervalMillis(I)Lcom/datastax/driver/core/QueryOptions;**
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:49)
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:92)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:153)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
Versions used are :-
Spark - 1.6.0
Scala - 2.10.4
cassandra-driver-core jar - 3.0.0
cassandra version 2.2.7
spark-cassandra connector - 1.6.0-s_2.10
SOMEBODY PLEASE HELP !!
I would start by removing
libraryDependencies ++= Seq(
"org.apache.cassandra" % "cassandra-thrift" % "3.5" ,
"org.apache.cassandra" % "cassandra-clientutil" % "3.5",
"com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0"
)
since the libraries the connector depends on are pulled in automatically by the --packages dependency.
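The trimmed build.sbt would then look roughly like this (a sketch; marking the Spark artifacts and the connector as "provided" is my own suggestion, since spark-submit supplies Spark at runtime and the connector arrives via --packages):

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided",
  // compile-time only; at runtime the connector and its dependencies come from --packages
  "datastax" % "spark-cassandra-connector" % "1.6.0-s_2.10" % "provided"
)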
Then I would test the package resolution by launching the spark-shell with
./bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
and check that the following resolutions happen correctly:
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found datastax#spark-cassandra-connector;1.6.0-s_2.10 in spark-packages
found org.apache.cassandra#cassandra-clientutil;3.0.2 in list
found com.datastax.cassandra#cassandra-driver-core;3.0.0 in list
...
[2.10.5] org.scala-lang#scala-reflect;2.10.5
:: resolution report :: resolve 627ms :: artifacts dl 10ms
:: modules in use:
com.datastax.cassandra#cassandra-driver-core;3.0.0 from list in [default]
com.google.guava#guava;16.0.1 from list in [default]
com.twitter#jsr166e;1.1.0 from list in [default]
datastax#spark-cassandra-connector;1.6.0-s_2.10 from spark-packages in [default]
...
If these appear to resolve correctly but everything still doesn't work, I would try clearing out the cache for these artifacts.
It raises the following error:
Caused by: java.lang.NoSuchMethodError:
org.apache.hadoop.hbase.util.Addressing.getIpAddress()Ljava/net/InetAddress;
while I can successfully connect to HBase from the spark-shell. Does anyone know where the problem is?
The detailed error:
15/07/01 18:57:57 ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: java.lang.reflect.InvocationTargetException
java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
at com.koudai.resys.tmp.HbaseLearning$.main(HbaseLearning.scala:22)
at com.koudai.resys.tmp.HbaseLearning.main(HbaseLearning.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 9 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.Addressing.getIpAddress()Ljava/net/InetAddress;
at org.apache.hadoop.hbase.client.ClientIdGenerator.getIpAddressBytes(ClientIdGenerator.java:83)
at org.apache.hadoop.hbase.client.ClientIdGenerator.generateClientId(ClientIdGenerator.java:43)
at org.apache.hadoop.hbase.client.PerClientRandomNonceGenerator.<init>(PerClientRandomNonceGenerator.java:37)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:682)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:630)
... 14 more
The sbt config:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0" % "provided"
libraryDependencies += "org.apache.hbase" % "hbase" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-hadoop2-compat" % "1.1.0.1"
The running code:
val sc = new SparkConf().setAppName("hbase#user")
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("hbase.zookeeper.quorum", "idc02-rs-sfa-10")
// the error is raised from here
val conn = ConnectionFactory.createConnection(conf)
Using reflection to list the methods of org.apache.hadoop.hbase.util.Addressing shows that it is the HBase 0.94 version. Where could it be coming from?
parsePort
createHostAndPortStr
createInetSocketAddressFromHostAndPortStr
getIpAddress
getIp4Address
getIp6Address
parseHostname
isLocalAddress
wait
wait
wait
equals
toString
hashCode
getClass
notify
notifyAll
The problem is a classpath conflict: an hbase-0.94 jar exists somewhere on the classpath and conflicts with hbase-1.1.0.1.
For others hitting this, here are two ways to identify such problems:
Way 1: use reflection to identify which version of the class is loaded
// Lists the methods of the Addressing class actually loaded at runtime;
// the method set differs between HBase versions.
val methods = new org.apache.hadoop.hbase.util.Addressing().getClass.getMethods
methods.foreach(method => println(method.getName))
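A closely related trick, not part of the original answer but often quicker, is to print the location of the jar the class was actually loaded from, which points directly at the conflicting artifact:

// Prints the jar or directory that org.apache.hadoop.hbase.util.Addressing was loaded from.
// getCodeSource can be null for bootstrap classes, hence the Option.
val source = Option(classOf[org.apache.hadoop.hbase.util.Addressing].getProtectionDomain.getCodeSource)
source.foreach(cs => println(cs.getLocation))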
Way 2: print the classpath on the cluster node to debug
def urlses(cl: ClassLoader): Array[java.net.URL] = cl match {
  case null => Array()
  case u: java.net.URLClassLoader => u.getURLs() ++ urlses(cl.getParent)
  case _ => urlses(cl.getParent)
}

val urls = urlses(getClass.getClassLoader)
urls.foreach(url => println(url.toString))
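Note that running this on the driver only shows the driver's classpath. To see what the executors load (which is where a stale HBase jar often hides on YARN), the same kind of check can be run inside a task; a sketch, assuming a live SparkContext named sc (the snippet above only builds a SparkConf):

// Runs on one executor and reports which jar it loaded the HBase Addressing class from.
val executorJar = sc.parallelize(Seq(1), 1).map { _ =>
  Option(classOf[org.apache.hadoop.hbase.util.Addressing].getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("<no code source>")
}.collect()
executorJar.foreach(println)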