MicroBatchExecution Spark Structured Streaming with Kafka 2.4.0

When I try to run a Spark Structured Streaming application with Kafka integration, I keep getting this error:
ERROR MicroBatchExecution: Query [id = ff14fce6-71d3-4616-bd2d-40f07a85a74b, runId = 42670f29-21a9-4f7e-abd0-66ead8807282] terminated with error
java.lang.IllegalStateException: No entry found for connection 2147483647
Why does this happen? Could it be a dependency problem?
My build.sbt file looks like this:
name := "SparkAirflowK8s"
version := "0.1"
scalaVersion := "2.12.7"
val sparkVersion = "2.4.0"
val circeVersion = "0.11.0"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.9.8"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.8"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.12" % "2.9.8"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
resolvers += "confluent" at "http://packages.confluent.io/maven/"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
"org.apache.kafka" %% "kafka" % "2.1.0",
"org.scalatest" %% "scalatest" % "3.2.0-SNAP10" % "it, test",
"org.scalacheck" %% "scalacheck" % "1.14.0" % "it, test",
"io.kubernetes" % "client-java" % "3.0.0" % "it",
"org.json" % "json" % "20180813",
"io.circe" %% "circe-core" % circeVersion,
"io.circe" %% "circe-generic" % circeVersion,
"io.circe" %% "circe-parser" % circeVersion,
"org.apache.avro" % "avro" % "1.8.2",
"io.confluent" % "kafka-avro-serializer" % "5.0.1"
)
Here is the relevant part of the code:
val sparkConf = new SparkConf()
.setMaster(args(0))
.setAppName("KafkaSparkJob")
val sparkSession = SparkSession
.builder()
.config(sparkConf)
.getOrCreate()
val avroStream = sparkSession.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "topic1")
.load()
val outputExample = avroStream
.writeStream
.outputMode("append")
.format("console")
.start()
outputExample.awaitTermination()

I changed localhost to the NodePort service defined for the Kafka deployment, and the exception no longer appears.
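For illustration, a minimal sketch of that change, assuming a Kubernetes NodePort Service named kafka-service exposing port 9092 (the service name and port are placeholders, not from the original code):
val avroStream = sparkSession.readStream
  .format("kafka")
  // Point the Kafka client at the Service address instead of localhost;
  // "kafka-service:9092" stands in for the actual NodePort service.
  .option("kafka.bootstrap.servers", "kafka-service:9092")
  .option("subscribe", "topic1")
  .load()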

Related

SingleStore Spark Connector: NullPointerException during read/write operations

Read/write operations work from the spark-shell, but throw a NullPointerException when executed locally from the development IDE.
val df = spark.read
.format("singlestore")
.option("ddlEndpoint", "host:port")
.option("user", "xxxxx")
.option("password","xxxxx")
.option("database","xxxxx")
.load("schema.table_name")
I am getting the below error:
Exception in thread "main" java.lang.NullPointerException
at com.singlestore.spark.JdbcHelpers$ConnectionHelpers.withStatement(JdbcHelpers.scala:26)
at com.singlestore.spark.JdbcHelpers$.getSinglestoreVersion(JdbcHelpers.scala:322)
at com.singlestore.spark.SQLGen$SQLGenContext$.getSinglestoreVersion(SQLGen.scala:532)
at com.singlestore.spark.SQLGen$SQLGenContext$.apply(SQLGen.scala:550)
at com.singlestore.spark.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:197)
The following dependencies are set in the build.sbt file.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.7"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.7"
libraryDependencies += "com.singlestore" % "singlestore-spark-connector_2.11" % "3.1.2-spark-2.4.7"
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
Could someone please help me resolve this? Thanks!
It looks like you need to update one of your dependencies in the build.sbt file.
Try replacing:
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
with
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "2.+"

Spark 2 Connect to HBase

Trying to migrate code from Spark 1.6, Scala 2.10 to Spark 2.4, Scala 2.11.
I cannot get the code to compile. Dependency versions, a minimal example, and the compilation error are shown below.
// Dependencies
, "org.apache.spark" %% "spark-core" % "2.4.0"
, "org.apache.spark" %% "spark-sql" % "2.4.0"
, "org.apache.hbase" % "hbase-server" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-common" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-spark" % "1.2.0-cdh5.14.4"
// Minimal example
package spark2.hbase
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
object ConnectToHBase {
def main(args: Array[String]): Unit = {
implicit val spark: SparkSession = SparkSession.builder.appName("Connect to HBase from Spark 2")
.config("spark.master", "local")
.getOrCreate()
implicit val sc: SparkContext = spark.sparkContext
val hbaseConf = HBaseConfiguration.create()
val hbaseContext = new HBaseContext(sc, hbaseConf)
}
}
// Compilation error
[error] missing or invalid dependency detected while loading class file 'HBaseContext.class'.
[error] Could not access type Logging in package org.apache.spark,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'HBaseContext.class' was compiled against an incompatible version of org.apache.spark.
This works:
lazy val sparkVer = "2.4.0-cdh6.2.0"
lazy val hbaseVer = "2.1.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVer
, "org.apache.spark" %% "spark-sql" % sparkVer
, "org.apache.spark" %% "spark-streaming" % sparkVer
, "org.apache.hbase" % "hbase-common" % hbaseVer
, "org.apache.hbase" % "hbase-client" % hbaseVer
, "org.apache.hbase.connectors.spark" % "hbase-spark" % "1.0.0"
)
The essential piece here is using Cloudera CDH 6 (not 5) and a different version of "hbase-spark": the CDH 5 hbase-spark module was built against Spark 1.x, which still exposed the public org.apache.spark.Logging trait. That trait was removed in Spark 2, which is why the compiler reports "Could not access type Logging in package org.apache.spark"; CDH 5's hbase-spark cannot work with Spark 2.

Spark Session Catalog Failure

I'm reading data in batch from a Cassandra database and also in streaming from Azure Event Hubs, using the Scala Spark API.
session.read
.format("org.apache.spark.sql.cassandra")
.option("keyspace", keyspace)
.option("table", table)
.option("pushdown", pushdown)
.load()
and
session.readStream
.format("eventhubs")
.options(eventHubsConf.toMap)
.load()
Everything was running fine, but now I get this exception out of nowhere:
User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(Lscala/Function0;Lscala/Function0;Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;Lorg/apache/spark/sql/internal/SQLConf;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/spark/sql/catalyst/parser/ParserInterface;Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:132)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:131)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:428)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:233)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
I don't know exactly what changed, but here are my dependencies:
ThisBuild / scalaVersion := "2.11.11"
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
"org.apache.logging.log4j" % "log4j-core" % "2.11.1",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.10",
"com.microsoft.azure" % "azure-eventhubs" % "2.3.0",
"com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
"org.scala-lang.modules" %% "scala-java8-compat" % "0.9.0",
"com.twitter" % "jsr166e" % "1.1.0",
"com.holdenkarau" %% "spark-testing-base" % "2.4.0_0.12.0" % Test,
"MrPowers" % "spark-fast-tests" % "0.19.2-s_2.11" % Test
)
Anyone have a clue?
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(
Lscala/Function0;Lscala/Function0;
Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;
Lorg/apache/spark/sql/internal/SQLConf;
Lorg/apache/hadoop/conf/Configuration;
Lorg/apache/spark/sql/catalyst/parser/ParserInterface;
Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
This suggests to me that one of the libraries was compiled against a version of Spark different from the one currently on the runtime path. The method signature above matches the Spark 2.4.0 constructor, see
https://github.com/apache/spark/blob/v2.4.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L56-L63
but not the Spark 2.3.0 signature.
My guess would be that there is a Spark 2.3.0 runtime somewhere. Perhaps you are running the application with spark-submit from a Spark 2.3.0 install?
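A quick way to confirm which version is actually on the runtime path is to print it from the running application; a minimal sketch, where spark is the active SparkSession:
// Version reported by the running SparkSession (i.e. the runtime Spark).
println(s"Runtime Spark version: ${spark.version}")
// Version of the spark-core jar actually on the classpath.
println(org.apache.spark.SPARK_VERSION)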

SBT file for Spark Kafka

I am new to SBT. I am trying to create a project with a simple producer and consumer using Spark and Scala. Do I need to add anything else in this SBT file? I'm using IntelliJ IDEA, Spark 2.2, CDH 5.10, and Kafka 0.10.
import sbt.Keys._
import sbt._
name := "consumer"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0.cloudera1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0.cloudera1"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0.cloudera1"
resolvers ++= Vector(
"Cloudera repo" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
)
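As a rough sketch of the kind of consumer these dependencies already support, assuming the DStream-based spark-streaming-kafka-0-10 API (the broker address, topic name, and consumer group below are placeholders):
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
object SimpleConsumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("consumer")
    val ssc = new StreamingContext(conf, Seconds(5))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",      // placeholder broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",                 // placeholder consumer group
      "auto.offset.reset" -> "latest"
    )
    // Subscribe to one topic and print each record's value every batch interval.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topic1"), kafkaParams))
    stream.map(_.value).print()
    ssc.start()
    ssc.awaitTermination()
  }
}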

How to use Test Jars for spark in SBT

I am creating a Spark 2.0.1 project and want to use the Spark test jars in my SBT project.
build.sbt:
scalaVersion := "2.11.0"
val sparkVersion = "2.0.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "compile",
"org.apache.spark" %% "spark-sql" % sparkVersion % "compile",
"org.scalatest" %% "scalatest" % "2.2.6" % "test",
"org.apache.spark" %% "spark-core" % sparkVersion % "test" classifier "tests",
"org.apache.spark" %% "spark-sql" % sparkVersion % "test" classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % "test" classifier "tests"
)
My test code:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.SharedSQLContext
class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
import testImplicits._
test("function current_date") {
val df1 = Seq((1, 2), (3, 1)).toDF("a", "b")
// Rest of test code and assertion using checkAnswer method
}
}
But when I try to run the test using:
sbt clean test
I get the following errors:
[info] Compiling 1 Scala source to /tstprg/test/target/scala-2.11/test-classes...
[error] bad symbolic reference to org.apache.spark.sql.catalyst.expressions.PredicateHelper encountered in class file 'PlanTest.class'.
[error] Cannot access type PredicateHelper in package org.apache.spark.sql.catalyst.expressions. The current classpath may be
[error] missing a definition for org.apache.spark.sql.catalyst.expressions.PredicateHelper, or PlanTest.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:7: illegal inheritance;
[error] self-type facts.LoaderTest does not conform to org.apache.spark.sql.QueryTest's selftype org.apache.spark.sql.QueryTest
[error] class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
[error] ^
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:7: illegal inheritance;
[error] self-type facts.LoaderTest does not conform to org.apache.spark.sql.test.SharedSQLContext's selftype org.apache.spark.sql.test.SharedSQLContext
[error] class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
[error] ^
[error] bad symbolic reference to org.apache.spark.sql.Encoder encountered in class file 'SQLImplicits.class'.
[error] Cannot access type Encoder in package org.apache.spark.sql. The current classpath may be
[error] missing a definition for org.apache.spark.sql.Encoder, or SQLImplicits.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:11: bad symbolic reference to org.apache.spark.sql.catalyst.plans.logical encountered in class file 'SQLTestUtils.class'.
[error] Cannot access term logical in package org.apache.spark.sql.catalyst.plans. The current classpath may be
[error] missing a definition for org.apache.spark.sql.catalyst.plans.logical, or SQLTestUtils.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] val df1 = Seq((1, 2), (3, 1)).toDF("a", "b")
[error] ^
[error] 5 errors found
[error] (test:compileIncremental) Compilation failed
Can anybody who has tried using the Spark test jars for unit testing with SBT help me figure out what I am missing?
Note: this test works fine when I run it through the IntelliJ IDE.
Try using the scopes as shown below:
version := "0.1"
scalaVersion := "2.11.11"
val sparkVersion = "2.3.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "test-sources",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.0",
"org.scalatest" %% "scalatest" % "3.0.4" % "test",
"org.typelevel" %% "cats-core" % "1.1.0",
"org.typelevel" %% "cats-effect" % "1.0.0-RC2",
"org.apache.spark" %% "spark-streaming" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion % Provided exclude ("net.jpountz.lz4", "lz4"),
"com.pusher" % "pusher-java-client" % "1.8.0") ```
Try changing the scope of your dependencies marked as "test", as shown below,
scalaVersion := "2.11.0"
val sparkVersion = "2.0.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.scalatest" %% "scalatest" % "2.2.6",
"org.apache.spark" %% "spark-core" % sparkVersion ,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-catalyst" % sparkVersion
)
or adding "compile".
