Spark 1.4.0 cannot connect to HBase 1.1.0.1 - apache-spark

It raises the error
Caused by: java.lang.NoSuchMethodError:
org.apache.hadoop.hbase.util.Addressing.getIpAddress()Ljava/net/InetAddress;
while I can successfully connect to HBase from the spark-shell. Does anyone know where the problem is?
The detailed error:
15/07/01 18:57:57 ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: java.lang.reflect.InvocationTargetException
java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
at com.koudai.resys.tmp.HbaseLearning$.main(HbaseLearning.scala:22)
at com.koudai.resys.tmp.HbaseLearning.main(HbaseLearning.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 9 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.Addressing.getIpAddress()Ljava/net/InetAddress;
at org.apache.hadoop.hbase.client.ClientIdGenerator.getIpAddressBytes(ClientIdGenerator.java:83)
at org.apache.hadoop.hbase.client.ClientIdGenerator.generateClientId(ClientIdGenerator.java:43)
at org.apache.hadoop.hbase.client.PerClientRandomNonceGenerator.<init>(PerClientRandomNonceGenerator.java:37)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:682)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:630)
... 14 more
The sbt config:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0" % "provided"
libraryDependencies += "org.apache.hbase" % "hbase" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-hadoop2-compat" % "1.1.0.1"
The running code:
import org.apache.spark.SparkConf
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

val sc = new SparkConf().setAppName("hbase#user")
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("hbase.zookeeper.quorum", "idc02-rs-sfa-10")
// the error is raised from here
val conn = ConnectionFactory.createConnection(conf)
Using reflection to list the methods of org.apache.hadoop.hbase.util.Addressing shows it is the HBase 0.94 version. Where could it be coming from?
parsePort
createHostAndPortStr
createInetSocketAddressFromHostAndPortStr
getIpAddress
getIp4Address
getIp6Address
parseHostname
isLocalAddress
wait
wait
wait
equals
toString
hashCode
getClass
notify
notifyAll

The problem is a classpath conflict: an hbase-0.94 jar exists somewhere on the classpath and conflicts with hbase-1.1.0.1.
Two ways to identify such problems, for others who hit this:
Way 1: use reflection to identify which version of the class is actually loaded
val methods = new org.apache.hadoop.hbase.util.Addressing().getClass.getMethods
methods.foreach(method=>println(method.getName))
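A complementary check (not part of the original answer; just standard JDK reflection): ask the JVM which jar the suspicious class was actually loaded from.

// Print the code source (jar location) of the Addressing class that was loaded.
// getCodeSource can be null for bootstrap-loaded classes, hence the Option.
val src = classOf[org.apache.hadoop.hbase.util.Addressing]
  .getProtectionDomain.getCodeSource
println(Option(src).map(_.getLocation).getOrElse("unknown code source"))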
Way 2: print the classpath on the cluster node to debug
def urlses(cl: ClassLoader): Array[java.net.URL] = cl match {
  case null => Array()
  case u: java.net.URLClassLoader => u.getURLs() ++ urlses(cl.getParent)
  case _ => urlses(cl.getParent)
}

val urls = urlses(getClass.getClassLoader)
urls.foreach(url => println(url.toString))
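If the stale hbase-0.94 turns out to be dragged in transitively by the application's own build (rather than injected by the cluster), an sbt exclusion is one way to fix it; the coordinates below are placeholders, not a library identified in the original post. If the old jar comes from the cluster classpath instead, the Spark options spark.driver.userClassPathFirst / spark.executor.userClassPathFirst are the usual knobs to experiment with.

// Hypothetical exclusion sketch: replace "com.example" % "legacy-lib" with whatever
// dependency is found to pull in the old HBase artifacts.
libraryDependencies += ("com.example" % "legacy-lib" % "1.0")
  .excludeAll(ExclusionRule(organization = "org.apache.hbase"))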

Related

Unable to run spark-testing-base test with Spark v3.2.1

I was trying to run Spark unit tests with spark-testing-base and scalatest and got the following exceptions:
[error] sbt.ForkMain$ForkError: java.lang.IncompatibleClassChangeError: Expected instance not static method org.scalatest.Assertions.assertionsHelper()Lorg/scalatest/Assertions$AssertionsHelper;
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.verifyOutput(StreamingSuiteBase.scala:77)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.verifyOutput$(StreamingSuiteBase.scala:61)
[error] at com.central.spark.aggregation.streaming.BaseAggregatorSuite.verifyOutput(BaseAggregatorSuite.scala:23)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.$anonfun$testOperation$1(StreamingSuiteBase.scala:162)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.$anonfun$testOperation$1$adapted(StreamingSuiteBase.scala:158)
[error] at com.holdenkarau.spark.testing.StreamingSuiteCommon.withOutputAndStreamingContext(StreamingSuiteCommon.scala:122)
[error] at com.holdenkarau.spark.testing.StreamingSuiteCommon.withOutputAndStreamingContext$(StreamingSuiteCommon.scala:114)
[error] at com.central.spark.aggregation.streaming.BaseAggregatorSuite.withOutputAndStreamingContext(BaseAggregatorSuite.scala:23)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.testOperation(StreamingSuiteBase.scala:158)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.testOperation$(StreamingSuiteBase.scala:149)
[error] at com.central.spark.aggregation.streaming.BaseAggregatorSuite.testOperation(BaseAggregatorSuite.scala:23)
[error] at com.central.spark.aggregation.streaming.BaseAggregatorSuite.$anonfun$new$1(BaseAggregatorSuite.scala:89)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) (truncated)
The dependencies and their versions are:
lazy val coreTestDeps = Seq(
  "org.mockito" % "mockito-all" % "1.10.19" % "test",
  "org.scalatest" %% "scalatest" % "3.2.12" % "it,test",
  "net.sf.opencsv" % "opencsv" % "2.3" % "test",
  "org.json4s" %% "json4s-native" % "3.7.0-M11" % "it,test",
  "org.json4s" %% "json4s-jackson" % "3.7.0-M11" % "it,test",
  "org.apache.spark" %% "spark-streaming" % "3.2.1" % "provided" classifier "tests",
  "org.apache.spark" %% "spark-core" % "3.2.1" % "provided" classifier "tests",
  "com.holdenkarau" %% "spark-testing-base" % "3.2.0_1.1.1" % "test",
  "org.elasticsearch.client" % "elasticsearch-rest-high-level-client" % "7.9.3" % "it,test"
)
I tried downgrading org.scalatest to 3.0.9 and other versions, but it did not work. I have the following in build.sbt, as suggested in the spark-testing-base repo:
scalaVersion := "2.12.15",
Test / parallelExecution := false,
Test / fork := true,
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MetaspaceSize=2048M", "-XX:+CMSClassUnloadingEnabled")
Any suggestion on how I should proceed? I'd appreciate any help.
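No answer is shown for this question. As a hedged diagnostic (my suggestion, not from the thread): an IncompatibleClassChangeError like this usually means the scalatest binary on the test classpath differs from the one spark-testing-base was compiled against, so a first step is to see which scalatest version sbt actually resolves (for example with the evicted task, or dependencyTree on sbt 1.4+) and, if needed, pin a single version.

// build.sbt sketch: force one scalatest version across configurations so the project's
// tests and spark-testing-base agree on the same binary (3.2.12 is the version already
// declared above; adjust to whatever spark-testing-base expects).
dependencyOverrides += "org.scalatest" %% "scalatest" % "3.2.12"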

SingleStore Spark Connector:NULLPointer Exception while Read/Write Operations

Read/write operations work from the Spark shell, but throw a NullPointerException when executed locally from the development IDE.
val df = spark.read
  .format("singlestore")
  .option("ddlEndpoint", "host:port")
  .option("user", "xxxxx")
  .option("password", "xxxxx")
  .option("database", "xxxxx")
  .load("schema.table_name")
I am getting the below error:
Exception in thread "main" java.lang.NullPointerException
at com.singlestore.spark.JdbcHelpers$ConnectionHelpers.withStatement(JdbcHelpers.scala:26)
at com.singlestore.spark.JdbcHelpers$.getSinglestoreVersion(JdbcHelpers.scala:322)
at com.singlestore.spark.SQLGen$SQLGenContext$.getSinglestoreVersion(SQLGen.scala:532)
at com.singlestore.spark.SQLGen$SQLGenContext$.apply(SQLGen.scala:550)
at com.singlestore.spark.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:197)
The below dependencies are set in the build.sbt file.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.7"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.7"
libraryDependencies += "com.singlestore" % "singlestore-spark-connector_2.11" % "3.1.2-spark-2.4.7"
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
Could someone please help me resolve this? Thanks!
It looks like you need to update one of your dependencies in the build.sbt file.
Try replacing:
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
with
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "2.+"

Spark 2 Connect to HBase

I am trying to migrate code from Spark 1.6 / Scala 2.10 to Spark 2.4 / Scala 2.11 and cannot get the code to compile. Dependency versions, a minimal example, and the compilation error are shown below.
// Dependencies
, "org.apache.spark" %% "spark-core" % "2.4.0"
, "org.apache.spark" %% "spark-sql" % "2.4.0"
, "org.apache.hbase" % "hbase-server" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-common" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-spark" % "1.2.0-cdh5.14.4"
// Minimal example
package spark2.hbase

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object ConnectToHBase {

  def main(args: Array[String]): Unit = {
    implicit val spark: SparkSession = SparkSession.builder
      .appName("Connect to HBase from Spark 2")
      .config("spark.master", "local")
      .getOrCreate()
    implicit val sc: SparkContext = spark.sparkContext

    val hbaseConf = HBaseConfiguration.create()
    val hbaseContext = new HBaseContext(sc, hbaseConf)
  }
}
// Compilation error
[error] missing or invalid dependency detected while loading class file 'HBaseContext.class'.
[error] Could not access type Logging in package org.apache.spark,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'HBaseContext.class' was compiled against an incompatible version of org.apache.spark.
This works:
lazy val sparkVer = "2.4.0-cdh6.2.0"
lazy val hbaseVer = "2.1.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVer
, "org.apache.spark" %% "spark-sql" % sparkVer
, "org.apache.spark" %% "spark-streaming" % sparkVer
, "org.apache.hbase" % "hbase-common" % hbaseVer
, "org.apache.hbase" % "hbase-client" % hbaseVer
, "org.apache.hbase.connectors.spark" % "hbase-spark" % "1.0.0"
)
The essential piece here is using Cloudera CDH 6 (not 5) and a different version of "hbase-spark", because the CDH 5 hbase-spark module was built against Spark 1.x, which still exposed the org.apache.spark.Logging trait that Spark 2 removed from the public API (hence the compilation error above); CDH 5 cannot work with Spark 2.
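For reference, a minimal usage sketch on top of the working dependency set above; the table name "test_table", column family "cf", and qualifier "col" are illustrative assumptions, not taken from the post.

import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes

// Continuing inside ConnectToHBase.main once hbaseContext has been created:
val rdd = sc.parallelize(Seq(("row1", "value1"), ("row2", "value2")))
hbaseContext.bulkPut[(String, String)](
  rdd,
  TableName.valueOf("test_table"),   // assumed, pre-existing table
  { case (rowKey, value) =>
    new Put(Bytes.toBytes(rowKey))
      .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
  }
)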

spark sql error when reading data from Avro Table

When I try reading data from an Avro table using Spark SQL, I get this error:
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:121)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:83)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.<init>(AvroObjectInspectorGenerator.java:56)
This is my sbt file:
val sparkVersion = "2.4.2"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % sparkVersion
)
libraryDependencies += "com.databricks" %% "spark-avro" % "4.0.0"
Do I need to add any dependencies? The code works fine in Hive, but Spark is having issues.
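No answer is shown for this question. Since most issues on this page come down to classpath conflicts, one hedged first step (my suggestion, not from the thread) is to check which jar actually supplies the failing Hive Avro SerDe class at runtime:

// Print the code source of the class that throws the NPE, to rule out a
// mismatched hive-serde/avro jar on the Spark driver classpath.
val cls = Class.forName("org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator")
println(Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation).getOrElse("unknown code source"))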

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SQLContext

I am using IntelliJ IDEA 2016.3.
import sbt.Keys._
import sbt._

object ApplicationBuild extends Build {

  object Versions {
    val spark = "1.6.3"
  }

  val projectName = "example-spark"

  val common = Seq(
    version := "1.0",
    scalaVersion := "2.11.7"
  )

  val customLibraryDependencies = Seq(
    "org.apache.spark" %% "spark-core" % Versions.spark % "provided",
    "org.apache.spark" %% "spark-sql" % Versions.spark % "provided",
    "org.apache.spark" %% "spark-hive" % Versions.spark % "provided",
    "org.apache.spark" %% "spark-streaming" % Versions.spark % "provided",
    "org.apache.spark" %% "spark-streaming-kafka" % Versions.spark
      exclude("log4j", "log4j")
      exclude("org.spark-project.spark", "unused"),
    "com.typesafe.scala-logging" %% "scala-logging" % "3.1.0",
    "org.slf4j" % "slf4j-api" % "1.7.10",
    "org.slf4j" % "slf4j-log4j12" % "1.7.10"
      exclude("log4j", "log4j"),
    "log4j" % "log4j" % "1.2.17" % "provided",
    "org.scalatest" %% "scalatest" % "2.2.4" % "test"
  )
I have been getting the below runtime exception, even though I declared all the dependencies correctly as shown above.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SQLContext
at example.SparkSqlExample.main(SparkSqlExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SQLContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
I investigated this further on the web and found that it is usually due to inappropriate entries in build.sbt or version mismatches. But in my case everything looks correct, as shown above.
Please suggest what I did wrong here.
I guess this is because you marked your dependencies as "provided", but apparently you (or IDEA) don't actually provide them.
Try removing the "provided" option or, my preferred way, move the class with the main method to src/test/scala.
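A third option, if the "provided" scope has to stay so that spark-submit keeps working, is the common sbt trick of letting run use the compile classpath, which still contains provided dependencies; a sketch assuming sbt 0.13.13+ syntax:

// Make `sbt run` use the compile classpath (which includes "provided" deps)
// instead of the runtime classpath (which drops them).
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated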
