I'm reading data in batch from a Cassandra database and in streaming from Azure Event Hubs, using the Spark Scala API.
session.read
.format("org.apache.spark.sql.cassandra")
.option("keyspace", keyspace)
.option("table", table)
.option("pushdown", pushdown)
.load()
&
session.readStream
.format("eventhubs")
.options(eventHubsConf.toMap)
.load()
Everything was running fine, but now I get this exception out of nowhere:
User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(Lscala/Function0;Lscala/Function0;Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;Lorg/apache/spark/sql/internal/SQLConf;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/spark/sql/catalyst/parser/ParserInterface;Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:132)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:131)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:428)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:233)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
I don't know exactly what changed, but here are my dependencies:
ThisBuild / scalaVersion := "2.11.11"
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
"org.apache.logging.log4j" % "log4j-core" % "2.11.1",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.10",
"com.microsoft.azure" % "azure-eventhubs" % "2.3.0",
"com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
"org.scala-lang.modules" %% "scala-java8-compat" % "0.9.0",
"com.twitter" % "jsr166e" % "1.1.0",
"com.holdenkarau" %% "spark-testing-base" % "2.4.0_0.12.0" % Test,
"MrPowers" % "spark-fast-tests" % "0.19.2-s_2.11" % Test
)
Anyone have a clue?
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(
Lscala/Function0;Lscala/Function0;
Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;
Lorg/apache/spark/sql/internal/SQLConf;
Lorg/apache/hadoop/conf/Configuration;
Lorg/apache/spark/sql/catalyst/parser/ParserInterface;
Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
This suggests to me that one of the libraries was compiled against a version of Spark different from the one currently on the runtime path. The method signature above does match the Spark 2.4.0 signature, see
https://github.com/apache/spark/blob/v2.4.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L56-L63
but it does not match the Spark 2.3.0 signature.
My guess would be that there is a Spark 2.3.0 runtime somewhere. Perhaps you are running the application with spark-submit from a Spark 2.3.0 install?
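As a quick way to confirm this (a minimal sketch; session is the SparkSession from the question and the spark-submit path depends on your install), compare the version of the install that launches the job with the version the application actually sees at runtime:
// On the machine that launches the job:
//   $SPARK_HOME/bin/spark-submit --version

// Inside the application, log the version on the runtime classpath
println(s"Runtime Spark version: ${session.version}")
println(s"Scala version: ${scala.util.Properties.versionString}")
If the two disagree, aligning the runtime install with the Spark 2.4.0 build dependencies (or vice versa) should make the NoSuchMethodError go away.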
Related
Read/write operations work from the Spark shell, but throw a NullPointerException when executed locally from the development IDE.
val df = spark.read
.format("singlestore")
.option("ddlEndpoint", "host:port")
.option("user", "xxxxx")
.option("password","xxxxx")
.option("database","xxxxx")
.load("schema.table_name")
I am getting the below error:
Exception in thread "main" java.lang.NullPointerException
at com.singlestore.spark.JdbcHelpers$ConnectionHelpers.withStatement(JdbcHelpers.scala:26)
at com.singlestore.spark.JdbcHelpers$.getSinglestoreVersion(JdbcHelpers.scala:322)
at com.singlestore.spark.SQLGen$SQLGenContext$.getSinglestoreVersion(SQLGen.scala:532)
at com.singlestore.spark.SQLGen$SQLGenContext$.apply(SQLGen.scala:550)
at com.singlestore.spark.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:197)
The following dependencies are set in the build.sbt file:
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.7"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.7"
libraryDependencies += "com.singlestore" % "singlestore-spark-connector_2.11" % "3.1.2-spark-2.4.7"
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
Could someone please help me resolve this? Thanks!
It looks like you need to update one of the dependencies in your build.sbt file.
Try replacing:
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
with
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "2.+"
When I try to run a Spark Structured Streaming application with Kafka integration, I keep getting this error:
ERROR MicroBatchExecution: Query [id = ff14fce6-71d3-4616-bd2d-40f07a85a74b, runId = 42670f29-21a9-4f7e-abd0-66ead8807282] terminated with error
java.lang.IllegalStateException: No entry found for connection 2147483647
Why does this happen? Could it be a dependency problem?
My build.sbt file looks like this:
name := "SparkAirflowK8s"
version := "0.1"
scalaVersion := "2.12.7"
val sparkVersion = "2.4.0"
val circeVersion = "0.11.0"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.9.8"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.8"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.12" % "2.9.8"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
resolvers += "confluent" at "http://packages.confluent.io/maven/"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
"org.apache.kafka" %% "kafka" % "2.1.0",
"org.scalatest" %% "scalatest" % "3.2.0-SNAP10" % "it, test",
"org.scalacheck" %% "scalacheck" % "1.14.0" % "it, test",
"io.kubernetes" % "client-java" % "3.0.0" % "it",
"org.json" % "json" % "20180813",
"io.circe" %% "circe-core" % circeVersion,
"io.circe" %% "circe-generic" % circeVersion,
"io.circe" %% "circe-parser" % circeVersion,
"org.apache.avro" % "avro" % "1.8.2",
"io.confluent" % "kafka-avro-serializer" % "5.0.1"
)
Here is the relevant part of the code:
val sparkConf = new SparkConf()
.setMaster(args(0))
.setAppName("KafkaSparkJob")
val sparkSession = SparkSession
.builder()
.config(sparkConf)
.getOrCreate()
val avroStream = sparkSession.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "topic1")
.load()
val outputExample = avroStream
.writeStream
.outputMode("append")
.format("console")
.start()
outputExample.awaitTermination()
I changed localhost to the NodePort service defined for the Kafka deployment, and now this exception no longer appears.
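For illustration, a minimal sketch of the changed reader (the service name kafka-svc and port 31090 are hypothetical placeholders for whatever your NodePort service actually exposes):
val avroStream = sparkSession.readStream
  .format("kafka")
  // Point at the Kafka NodePort service instead of localhost; name and port are placeholders
  .option("kafka.bootstrap.servers", "kafka-svc:31090")
  .option("subscribe", "topic1")
  .load()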
I am new to SBT. I am trying to create a project with a simple producer and consumer using Spark and Scala. Do I need to add anything else to this SBT file? I am using IntelliJ IDEA, Spark 2.2, CDH 5.10, and Kafka 0.10.
import sbt.Keys._
import sbt._
name := "consumer"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0.cloudera1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0.cloudera1"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0.cloudera1"
resolvers ++= Vector(
"Cloudera repo" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
)
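For what it's worth, the dependencies listed above are enough for a basic DStream consumer. Below is a minimal sketch using the spark-streaming-kafka-0-10 API; the broker address, group id, and topic name are placeholders you would replace with your own:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object Consumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("consumer").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Kafka connection settings; broker address and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker-host:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "test-consumer-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Subscribe to a placeholder topic and print the value of each incoming record
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("test-topic"), kafkaParams))
    stream.map(_.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}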
I am creating a Spark 2.0.1 project and want to use the Spark test-jars in my SBT project.
build.sbt:
scalaVersion := "2.11.0"
val sparkVersion = "2.0.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "compile",
"org.apache.spark" %% "spark-sql" % sparkVersion % "compile",
"org.scalatest" %% "scalatest" % "2.2.6" % "test",
"org.apache.spark" %% "spark-core" % sparkVersion % "test" classifier "tests",
"org.apache.spark" %% "spark-sql" % sparkVersion % "test" classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % "test" classifier "tests"
)
My Test code:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.SharedSQLContext
class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
import testImplicits._
test("function current_date") {
val df1 = Seq((1, 2), (3, 1)).toDF("a", "b")
// Rest of test code and assertion using checkAnswer method
}
}
But when I try to run the tests using:
sbt clean test
I get the following errors:
[info] Compiling 1 Scala source to /tstprg/test/target/scala-2.11/test-classes...
[error] bad symbolic reference to org.apache.spark.sql.catalyst.expressions.PredicateHelper encountered in class file 'PlanTest.class'.
[error] Cannot access type PredicateHelper in package org.apache.spark.sql.catalyst.expressions. The current classpath may be
[error] missing a definition for org.apache.spark.sql.catalyst.expressions.PredicateHelper, or PlanTest.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:7: illegal inheritance;
[error] self-type facts.LoaderTest does not conform to org.apache.spark.sql.QueryTest's selftype org.apache.spark.sql.QueryTest
[error] class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
[error] ^
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:7: illegal inheritance;
[error] self-type facts.LoaderTest does not conform to org.apache.spark.sql.test.SharedSQLContext's selftype org.apache.spark.sql.test.SharedSQLContext
[error] class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
[error] ^
[error] bad symbolic reference to org.apache.spark.sql.Encoder encountered in class file 'SQLImplicits.class'.
[error] Cannot access type Encoder in package org.apache.spark.sql. The current classpath may be
[error] missing a definition for org.apache.spark.sql.Encoder, or SQLImplicits.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:11: bad symbolic reference to org.apache.spark.sql.catalyst.plans.logical encountered in class file 'SQLTestUtils.class'.
[error] Cannot access term logical in package org.apache.spark.sql.catalyst.plans. The current classpath may be
[error] missing a definition for org.apache.spark.sql.catalyst.plans.logical, or SQLTestUtils.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] val df1 = Seq((1, 2), (3, 1)).toDF("a", "b")
[error] ^
[error] 5 errors found
[error] (test:compileIncremental) Compilation failed
Can anybody who has used the Spark test-jars for unit testing with SBT tell me what I am missing?
Note: this test works fine when I run it through the IntelliJ IDE.
Try using the scopes as shown below:
version := "0.1"
scalaVersion := "2.11.11"
val sparkVersion = "2.3.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "test-sources",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.0",
"org.scalatest" %% "scalatest" % "3.0.4" % "test",
"org.typelevel" %% "cats-core" % "1.1.0",
"org.typelevel" %% "cats-effect" % "1.0.0-RC2",
"org.apache.spark" %% "spark-streaming" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion % Provided exclude ("net.jpountz.lz4", "lz4"),
"com.pusher" % "pusher-java-client" % "1.8.0") ```
Try changing the scope of the dependencies you marked as test, as shown below,
scalaVersion := "2.11.0"
val sparkVersion = "2.0.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.scalatest" %% "scalatest" % "2.2.6",
"org.apache.spark" %% "spark-core" % sparkVersion ,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-catalyst" % sparkVersion
)
or by adding "compile" as the scope.
I'm new to Spark, and I'm getting this error when I try to save data to Cassandra.
I have imported StreamingContext._ and SparkContext._, but I still get the error.
These are the dependencies I'm using:
"org.apache.spark" %% "spark-core" % "1.5.2",
"org.apache.spark" %% "spark-streaming" % "1.5.2",
"com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0",
"org.apache.spark" %% "spark-sql" % "1.5.2"
Thank you
To be able to use saveToCassandra on a DStream you have to import DStreamFunctions, for example with:
import com.datastax.spark.connector.streaming._
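With that import in scope, the call looks roughly like this (a minimal sketch; the keyspace, table, and column names are hypothetical, and wordCounts stands for whichever DStream you are saving):
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

// The implicit DStreamFunctions adds saveToCassandra to the DStream
wordCounts.saveToCassandra("my_keyspace", "word_counts", SomeColumns("word", "count"))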