Spark SQL error when reading data from Avro table - apache-spark

When I try to read data from an Avro table using spark-sql, I get this error:
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:121)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:83)
at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.<init>(AvroObjectInspectorGenerator.java:56)
This is my sbt file:
val sparkVersion = "2.4.2"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % sparkVersion
)
libraryDependencies += "com.databricks" %% "spark-avro" % "4.0.0"
Do I need to add any dependencies? The code works fine in Hive, but Spark is having issues.
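For context, the read is essentially of this shape (a minimal sketch only; the database/table names are placeholders, and the session has Hive support enabled since the table is registered in Hive):
import org.apache.spark.sql.SparkSession

// Sketch: read a Hive-managed Avro table through spark-sql.
// "db.avro_table" is a placeholder for the actual table name.
val spark = SparkSession.builder()
  .appName("read-avro-table")
  .enableHiveSupport() // Hive-registered tables are read through the Hive Avro SerDe
  .getOrCreate()

val df = spark.sql("SELECT * FROM db.avro_table")
df.show()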

Related

SingleStore Spark Connector: NullPointerException during Read/Write Operations

Read/write operations work from the spark-shell, but throw a NullPointerException when executed locally from the development IDE.
val df = spark.read
.format("singlestore")
.option("ddlEndpoint", "host:port")
.option("user", "xxxxx")
.option("password","xxxxx")
.option("database","xxxxx")
.load("schema.table_name")
I am getting the below error:
Exception in thread "main" java.lang.NullPointerException
at com.singlestore.spark.JdbcHelpers$ConnectionHelpers.withStatement(JdbcHelpers.scala:26)
at com.singlestore.spark.JdbcHelpers$.getSinglestoreVersion(JdbcHelpers.scala:322)
at com.singlestore.spark.SQLGen$SQLGenContext$.getSinglestoreVersion(SQLGen.scala:532)
at com.singlestore.spark.SQLGen$SQLGenContext$.apply(SQLGen.scala:550)
at com.singlestore.spark.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:197)
The following dependencies are set in the build.sbt file.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.7"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.7"
libraryDependencies += "com.singlestore" % "singlestore-spark-connector_2.11" % "3.1.2-spark-2.4.7"
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
Could someone please help me resolve this? Thanks!
It looks like you need to update one of your dependencies in the build.sbt file.
Try replacing:
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
with
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "2.+"

Spark 2 Connect to HBase

Trying to migrate code from Spark 1.6 / Scala 2.10 to Spark 2.4 / Scala 2.11.
I cannot get the code to compile. Dependency versions, a minimal example, and the compilation error are shown below.
// Dependencies
, "org.apache.spark" %% "spark-core" % "2.4.0"
, "org.apache.spark" %% "spark-sql" % "2.4.0"
, "org.apache.hbase" % "hbase-server" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-common" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-spark" % "1.2.0-cdh5.14.4"
// Minimal example
package spark2.hbase
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
object ConnectToHBase {
def main(args: Array[String]): Unit = {
implicit val spark: SparkSession = SparkSession.builder.appName("Connect to HBase from Spark 2")
.config("spark.master", "local")
.getOrCreate()
implicit val sc: SparkContext = spark.sparkContext
val hbaseConf = HBaseConfiguration.create()
val hbaseContext = new HBaseContext(sc, hbaseConf)
}
}
// Compilation error
[error] missing or invalid dependency detected while loading class file 'HBaseContext.class'.
[error] Could not access type Logging in package org.apache.spark,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'HBaseContext.class' was compiled against an incompatible version of org.apache.spark.
This works:
lazy val sparkVer = "2.4.0-cdh6.2.0"
lazy val hbaseVer = "2.1.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVer
, "org.apache.spark" %% "spark-sql" % sparkVer
, "org.apache.spark" %% "spark-streaming" % sparkVer
, "org.apache.hbase" % "hbase-common" % hbaseVer
, "org.apache.hbase" % "hbase-client" % hbaseVer
, "org.apache.hbase.connectors.spark" % "hbase-spark" % "1.0.0"
)
The essential piece here is using Cloudera CDH 6 (not CDH 5) and a different version of "hbase-spark", because CDH 5 cannot work with Spark 2.
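With these dependencies the minimal example above should compile unchanged, since the hbase-connectors artifact keeps HBaseContext in the same org.apache.hadoop.hbase.spark package. A small usage sketch (the table name "my_table" is just a placeholder for illustration):
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Scan

// Sketch: scan the (hypothetical) table "my_table" into an RDD of (row key, Result) pairs.
val rdd = hbaseContext.hbaseRDD(TableName.valueOf("my_table"), new Scan())
println(rdd.count())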

Spark Session Catalog Failure

I'm reading data in batch from a Cassandra database and also in streaming from Azure EventHubs, using the Scala Spark API.
session.read
.format("org.apache.spark.sql.cassandra")
.option("keyspace", keyspace)
.option("table", table)
.option("pushdown", pushdown)
.load()
&
session.readStream
.format("eventhubs")
.options(eventHubsConf.toMap)
.load()
Everything was running fine, but now I get this exception out of nowhere...
User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(Lscala/Function0;Lscala/Function0;Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;Lorg/apache/spark/sql/internal/SQLConf;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/spark/sql/catalyst/parser/ParserInterface;Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:132)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:131)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:428)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:233)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
I don't know what changed exactly, but here are my dependencies:
ThisBuild / scalaVersion := "2.11.11"
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
"org.apache.logging.log4j" % "log4j-core" % "2.11.1",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.10",
"com.microsoft.azure" % "azure-eventhubs" % "2.3.0",
"com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
"org.scala-lang.modules" %% "scala-java8-compat" % "0.9.0",
"com.twitter" % "jsr166e" % "1.1.0",
"com.holdenkarau" %% "spark-testing-base" % "2.4.0_0.12.0" % Test,
"MrPowers" % "spark-fast-tests" % "0.19.2-s_2.11" % Test
)
Anyone have a clue?
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(
Lscala/Function0;Lscala/Function0;
Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;
Lorg/apache/spark/sql/internal/SQLConf;
Lorg/apache/hadoop/conf/Configuration;
Lorg/apache/spark/sql/catalyst/parser/ParserInterface;
Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
This suggests to me that one of the libraries was compiled against a version of Spark different from the one currently on the runtime path. The above method signature matches the Spark 2.4.0 signature, see
https://github.com/apache/spark/blob/v2.4.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L56-L63
but not the Spark 2.3.0 signature.
My guess would be that there is a Spark 2.3.0 runtime somewhere. Perhaps you are running the application with spark-submit from a Spark 2.3.0 install?
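One quick sanity check (just a sketch): print the Spark version that is actually on the runtime classpath and compare it with the 2.4.0 you compile against; running spark-submit --version on the machine that launches the job gives the same information from the command line.
// Inside the job, using the session from the question:
println(s"Runtime Spark version: ${session.version}") // compare with sparkVersion = "2.4.0" in build.sbt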

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SQLContext

I am using IntelliJ IDEA 2016.3.
import sbt.Keys._
import sbt._
object ApplicationBuild extends Build {
object Versions {
val spark = "1.6.3"
}
val projectName = "example-spark"
val common = Seq(
version := "1.0",
scalaVersion := "2.11.7"
)
val customLibraryDependencies = Seq(
"org.apache.spark" %% "spark-core" % Versions.spark % "provided",
"org.apache.spark" %% "spark-sql" % Versions.spark % "provided",
"org.apache.spark" %% "spark-hive" % Versions.spark % "provided",
"org.apache.spark" %% "spark-streaming" % Versions.spark % "provided",
"org.apache.spark" %% "spark-streaming-kafka" % Versions.spark
exclude("log4j", "log4j")
exclude("org.spark-project.spark", "unused"),
"com.typesafe.scala-logging" %% "scala-logging" % "3.1.0",
"org.slf4j" % "slf4j-api" % "1.7.10",
"org.slf4j" % "slf4j-log4j12" % "1.7.10"
exclude("log4j", "log4j"),
"log4j" % "log4j" % "1.2.17" % "provided",
"org.scalatest" %% "scalatest" % "2.2.4" % "test"
)
I have been getting the runtime exception below, even though I declared all the dependencies correctly, as shown above.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SQLContext
at example.SparkSqlExample.main(SparkSqlExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SQLContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
I investigated this further on the web and found that it is mainly caused by inappropriate entries in build.sbt or by version mismatches. But in my case everything looks good, as shown above.
Please suggest where I went wrong here.
I guess this is because you marked your dependencies as "provided", but apparently you (or IDEA) don't actually provide them.
Try removing the "provided" option, or (my preferred way) move the class with the main method to src/test/scala.
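A third option, if you want to keep "provided" for packaging but still launch from sbt or IDEA, is the usual runTask override (a sketch in sbt 0.13-style syntax to match the build above):
// Make `sbt run` use the compile classpath, which still includes "provided" dependencies.
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated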

Spark 1.4.0 cannot connect to HBase 1.1.0.1

It raises the error
Caused by: java.lang.NoSuchMethodError:
org.apache.hadoop.hbase.util.Addressing.getIpAddress()Ljava/net/InetAddress;
while I can successfully connect to HBase using the spark-shell. Could anyone tell me where the problem is?
The detailed error:
15/07/01 18:57:57 ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: java.lang.reflect.InvocationTargetException
java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
at com.koudai.resys.tmp.HbaseLearning$.main(HbaseLearning.scala:22)
at com.koudai.resys.tmp.HbaseLearning.main(HbaseLearning.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 9 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.Addressing.getIpAddress()Ljava/net/InetAddress;
at org.apache.hadoop.hbase.client.ClientIdGenerator.getIpAddressBytes(ClientIdGenerator.java:83)
at org.apache.hadoop.hbase.client.ClientIdGenerator.generateClientId(ClientIdGenerator.java:43)
at org.apache.hadoop.hbase.client.PerClientRandomNonceGenerator.<init>(PerClientRandomNonceGenerator.java:37)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:682)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:630)
... 14 more
The sbt config:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0" % "provided"
libraryDependencies += "org.apache.hbase" % "hbase" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.1.0.1"
libraryDependencies += "org.apache.hbase" % "hbase-hadoop2-compat" % "1.1.0.1"
The running code:
val sc = new SparkConf().setAppName("hbase#user")
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("hbase.zookeeper.quorum", "idc02-rs-sfa-10")
// the error raised from here
val conn = ConnectionFactory.createConnection(conf)
Using reflection to list the methods of org.apache.hadoop.hbase.util.Addressing shows that it is the HBase 0.94 version. Where could it be coming from?
parsePort
createHostAndPortStr
createInetSocketAddressFromHostAndPortStr
getIpAddress
getIp4Address
getIp6Address
parseHostname
isLocalAddress
wait
wait
wait
equals
toString
hashCode
getClass
notify
notifyAll
The problem is a classpath conflict: an hbase-0.94 jar exists somewhere and conflicts with hbase-1.1.0.1.
For others hitting this, here are some ways to identify such problems:
Way 1: use reflection to identify the class
val methods = new org.apache.hadoop.hbase.util.Addressing().getClass.getMethods
methods.foreach(method=>println(method.getName))
Way 2: print the classpath on the cluster node to debug
def urlses(cl: ClassLoader): Array[java.net.URL] = cl match {
case null => Array()
case u: java.net.URLClassLoader => u.getURLs() ++ urlses(cl.getParent)
case _ => urlses(cl.getParent)
}
val urls = urlses(getClass.getClassLoader)
urls.foreach(url=>println(url.toString))
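Way 3 (a sketch using plain JVM APIs): ask the classloader which jar the conflicting class was actually loaded from; the printed path usually points straight at the offending 0.94 artifact.
// Prints the jar (or directory) that Addressing was loaded from.
// getCodeSource can be null for JDK bootstrap classes, but not for an HBase jar.
val source = classOf[org.apache.hadoop.hbase.util.Addressing]
  .getProtectionDomain.getCodeSource.getLocation
println(source)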