Unable to run spark-testing-base test with Spark v3.2.1 - apache-spark

I was trying to run Spark UT with spark-testing-base and scalatest, getting following exceptions:
[error] sbt.ForkMain$ForkError: java.lang.IncompatibleClassChangeError: Expected instance not static method org.scalatest.Assertions.assertionsHelper()Lorg/scalatest/Assertions$AssertionsHelper;
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.verifyOutput(StreamingSuiteBase.scala:77)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.verifyOutput$(StreamingSuiteBase.scala:61)
[error] at com.central.spark.aggregation.streaming.BaseAggregatorSuite.verifyOutput(BaseAggregatorSuite.scala:23)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.$anonfun$testOperation$1(StreamingSuiteBase.scala:162)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.$anonfun$testOperation$1$adapted(StreamingSuiteBase.scala:158)
[error] at com.holdenkarau.spark.testing.StreamingSuiteCommon.withOutputAndStreamingContext(StreamingSuiteCommon.scala:122)
[error] at com.holdenkarau.spark.testing.StreamingSuiteCommon.withOutputAndStreamingContext$(StreamingSuiteCommon.scala:114)
[error] at com.central.spark.aggregation.streaming.BaseAggregatorSuite.withOutputAndStreamingContext(BaseAggregatorSuite.scala:23)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.testOperation(StreamingSuiteBase.scala:158)
[error] at com.holdenkarau.spark.testing.StreamingSuiteBase.testOperation$(StreamingSuiteBase.scala:149)
[error] at com.central.spark.aggregation.streaming.BaseAggregatorSuite.testOperation(BaseAggregatorSuite.scala:23)
[error] at com.central.spark.aggregation.streaming.BaseAggregatorSuite.$anonfun$new$1(BaseAggregatorSuite.scala:89)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) (truc)
The dependency and their versions are:
lazy val coreTestDeps = Seq(
"org.mockito" % "mockito-all" % 1.10.19 % "test",
"org.scalatest" %% "scalatest" % 3.2.12 % "it,test",
"net.sf.opencsv" % "opencsv" % 2.3 % "test",
"org.json4s" %% "json4s-native" % 3.7.0-M11 % "it,test",
"org.json4s" %% "json4s-jackson" % 3.7.0-M11 % "it,test",
"org.apache.spark" %% "spark-streaming" % 3.2.1 % "provided" classifier "tests",
"org.apache.spark" %% "spark-core" % 3.2.1 % "provided" classifier "tests",
"com.holdenkarau" %% "spark-testing-base" % "3.2.0_1.1.1" % "test",
"org.elasticsearch.client" % "elasticsearch-rest-high-level-client" % 7.9.3 % "it,test"
)
I tried downgrading the org.scalatest to 3.0.9 and other version, But it is not working. Have followings in build.sbt as suggested at spark-testing-base repo:
scalaVersion := "2.12.15",
Test / parallelExecution := false,
Test / fork := true,
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MetaspaceSize=2048M", "-XX:+CMSClassUnloadingEnabled")
Any suggestion on how should i proceed?
Appreciate any help.

Related

SingleStore Spark Connector:NULLPointer Exception while Read/Write Operations

Read/Write Operations are working from spark shell. But throwing NULLPointer Exception when executed from Development IDE locally.
val df = spark.read
.format("singlestore")
.option("ddlEndpoint", "host:port")
.option("user", "xxxxx")
.option("password","xxxxx")
.option("database","xxxxx")
.load("schema.table_name")
I am getting the below error:
Exception in thread "main" java.lang.NullPointerException
at com.singlestore.spark.JdbcHelpers$ConnectionHelpers.withStatement(JdbcHelpers.scala:26)
at com.singlestore.spark.JdbcHelpers$.getSinglestoreVersion(JdbcHelpers.scala:322)
at com.singlestore.spark.SQLGen$SQLGenContext$.getSinglestoreVersion(SQLGen.scala:532)
at com.singlestore.spark.SQLGen$SQLGenContext$.apply(SQLGen.scala:550)
at com.singlestore.spark.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:332)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:197)
Below dependencies are set in the sbt.build file.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.7"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.7"
libraryDependencies += "com.singlestore" % "singlestore-spark-connector_2.11" % "3.1.2-spark-2.4.7"
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
Could someone please help me to resolve this . Thanks!
It looks like you need to update one of your dependencies in the sbt.build file.
Try replacing:
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "3.0.4"
with
libraryDependencies += "org.mariadb.jdbc" % "mariadb-java-client" % "2.+"

Spark 2 Connect to HBase

Trying to migrate code from Spark 1.6, Scala 2.10 to Spark 2.4, Scala 2.11.
Cannot get the code to compile. Showing dependency versions, minimal example and compilation error below.
// Dependencies
, "org.apache.spark" %% "spark-core" % "2.4.0"
, "org.apache.spark" %% "spark-sql" % "2.4.0"
, "org.apache.hbase" % "hbase-server" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-common" % "1.2.0-cdh5.14.4"
, "org.apache.hbase" % "hbase-spark" % "1.2.0-cdh5.14.4"
// Minimal example
package spark2.hbase
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
object ConnectToHBase {
def main(args: Array[String]): Unit = {
implicit val spark: SparkSession = SparkSession.builder.appName("Connect to HBase from Spark 2")
.config("spark.master", "local")
.getOrCreate()
implicit val sc: SparkContext = spark.sparkContext
val hbaseConf = HBaseConfiguration.create()
val hbaseContext = new HBaseContext(sc, hbaseConf)
}
}
// Compilation error
[error] missing or invalid dependency detected while loading class file 'HBaseContext.class'.
[error] Could not access type Logging in package org.apache.spark,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'HBaseContext.class' was compiled against an incompatible version of org.apache.spark.
This works:
lazy val sparkVer = "2.4.0-cdh6.2.0"
lazy val hbaseVer = "2.1.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVer
, "org.apache.spark" %% "spark-sql" % sparkVer
, "org.apache.spark" %% "spark-streaming" % sparkVer
, "org.apache.hbase" % "hbase-common" % hbaseVer
, "org.apache.hbase" % "hbase-client" % hbaseVer
, "org.apache.hbase.connectors.spark" % "hbase-spark" % "1.0.0"
)
The essential piece here is using Cloudera CDH 6 (not 5) and using a different version of "hbase-spark" because CDH 5 cannot work with Spark 2.

Spark Session Catalog Failure

I'm reading data in batch from a Cassandra database & also in streaming from Azure EventHubs using Scala Spark API.
session.read
.format("org.apache.spark.sql.cassandra")
.option("keyspace", keyspace)
.option("table", table)
.option("pushdown", pushdown)
.load()
&
session.readStream
.format("eventhubs")
.options(eventHubsConf.toMap)
.load()
Everything was running fine, but now I get this exception out frow nowhere...
User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(Lscala/Function0;Lscala/Function0;Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;Lorg/apache/spark/sql/internal/SQLConf;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/spark/sql/catalyst/parser/ParserInterface;Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:132)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:131)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:428)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:233)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
I don't know what changed exactly but here is my dependencies :
ThisBuild / scalaVersion := "2.11.11"
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
"org.apache.logging.log4j" % "log4j-core" % "2.11.1",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.10",
"com.microsoft.azure" % "azure-eventhubs" % "2.3.0",
"com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
"org.scala-lang.modules" %% "scala-java8-compat" % "0.9.0",
"com.twitter" % "jsr166e" % "1.1.0",
"com.holdenkarau" %% "spark-testing-base" % "2.4.0_0.12.0" % Test,
"MrPowers" % "spark-fast-tests" % "0.19.2-s_2.11" % Test
)
Anyone have a clue ?
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init(
scala/Function0;Lscala/Function0;
Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;
Lorg/apache/spark/sql/internal/SQLConf;
Lorg/apache/hadoop/conf/Configuration;
Lorg/apache/spark/sql/catalyst/parser/ParserInterface;
Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)
Suggests to me that one of the ilbraries was compiled against a version of Spark that is different than the one that is currently on the runtime path. Since the above method signature does match the Spark 2.4.0 signature see
https://github.com/apache/spark/blob/v2.4.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L56-L63
But not the Spark 2.3.0 Signature.
My guess would be there is a runtime Spark 2.3.0 somewhere? Perhaps you are running the application using Spark-Submit from a Spark 2.3.0 install?

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SQLContext

I am using IntelliJ 2016.3 version.
import sbt.Keys._
import sbt._
object ApplicationBuild extends Build {
object Versions {
val spark = "1.6.3"
}
val projectName = "example-spark"
val common = Seq(
version := "1.0",
scalaVersion := "2.11.7"
)
val customLibraryDependencies = Seq(
"org.apache.spark" %% "spark-core" % Versions.spark % "provided",
"org.apache.spark" %% "spark-sql" % Versions.spark % "provided",
"org.apache.spark" %% "spark-hive" % Versions.spark % "provided",
"org.apache.spark" %% "spark-streaming" % Versions.spark % "provided",
"org.apache.spark" %% "spark-streaming-kafka" % Versions.spark
exclude("log4j", "log4j")
exclude("org.spark-project.spark", "unused"),
"com.typesafe.scala-logging" %% "scala-logging" % "3.1.0",
"org.slf4j" % "slf4j-api" % "1.7.10",
"org.slf4j" % "slf4j-log4j12" % "1.7.10"
exclude("log4j", "log4j"),
"log4j" % "log4j" % "1.2.17" % "provided",
"org.scalatest" %% "scalatest" % "2.2.4" % "test"
)
I have been getting below run time exception., even though i mentioned all the dependencies correctly as shown above.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SQLContext
at example.SparkSqlExample.main(SparkSqlExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SQLContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
Investigated more on this web.And found that this is mainly due to in-appropriate entries in buld.sbt or version mismatches.But in my case everything looks good as shown above.
Please suggest where did i do wrong here?
I guess this is because you marked your dependencies as "provided", but apparently you (or IDEA) don't provide them.
Try to remove the "provided" option or (my preferred way): move the class with the main method to src/test/scala

How to use Test Jars for spark in SBT

I am creating Spark 2.0.1 project and want to use Spark test-jars in my SBT Project.
build.sbt:
scalaVersion := "2.11.0"
val sparkVersion = "2.0.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "compile",
"org.apache.spark" %% "spark-sql" % sparkVersion % "compile",
"org.scalatest" %% "scalatest" % "2.2.6" % "test",
"org.apache.spark" %% "spark-core" % sparkVersion % "test" classifier "tests",
"org.apache.spark" %% "spark-sql" % sparkVersion % "test" classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % "test" classifier "tests"
)
My Test code:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.SharedSQLContext
class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
import testImplicits._
test("function current_date") {
val df1 = Seq((1, 2), (3, 1)).toDF("a", "b")
// Rest of test code and assertion using checkAnswer method
}
}
But when i try to run test using:
sbt clean test
It get following errors:
[info] Compiling 1 Scala source to /tstprg/test/target/scala-2.11/test-classes...
[error] bad symbolic reference to org.apache.spark.sql.catalyst.expressions.PredicateHelper encountered in class file 'PlanTest.class'.
[error] Cannot access type PredicateHelper in package org.apache.spark.sql.catalyst.expressions. The current classpath may be
[error] missing a definition for org.apache.spark.sql.catalyst.expressions.PredicateHelper, or PlanTest.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:7: illegal inheritance;
[error] self-type facts.LoaderTest does not conform to org.apache.spark.sql.QueryTest's selftype org.apache.spark.sql.QueryTest
[error] class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
[error] ^
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:7: illegal inheritance;
[error] self-type facts.LoaderTest does not conform to org.apache.spark.sql.test.SharedSQLContext's selftype org.apache.spark.sql.test.SharedSQLContext
[error] class LoaderTest extends org.apache.spark.sql.QueryTest with SharedSQLContext {
[error] ^
[error] bad symbolic reference to org.apache.spark.sql.Encoder encountered in class file 'SQLImplicits.class'.
[error] Cannot access type Encoder in package org.apache.spark.sql. The current classpath may be
[error] missing a definition for org.apache.spark.sql.Encoder, or SQLImplicits.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] /tstprg/test/src/test/scala/facts/LoaderTest.scala:11: bad symbolic reference to org.apache.spark.sql.catalyst.plans.logical encountered in class file 'SQLTestUtils.class'.
[error] Cannot access term logical in package org.apache.spark.sql.catalyst.plans. The current classpath may be
[error] missing a definition for org.apache.spark.sql.catalyst.plans.logical, or SQLTestUtils.class may have been compiled against a version that's
[error] incompatible with the one found on the current classpath.
[error] val df1 = Seq((1, 2), (3, 1)).toDF("a", "b")
[error] ^
[error] 5 errors found
[error] (test:compileIncremental) Compilation failed
Can anybody who tried using test-jars of spark to unit test using SBT help what i am missing?
Note: This test works fine when I run through IntelliJ IDE.
Try to use the scope as mentioned below
version := "0.1"
scalaVersion := "2.11.11"
val sparkVersion = "2.3.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "test-sources",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "test-sources",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.0",
"org.scalatest" %% "scalatest" % "3.0.4" % "test",
"org.typelevel" %% "cats-core" % "1.1.0",
"org.typelevel" %% "cats-effect" % "1.0.0-RC2",
"org.apache.spark" %% "spark-streaming" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion % Provided exclude ("net.jpountz.lz4", "lz4"),
"com.pusher" % "pusher-java-client" % "1.8.0") ```
Try changing scope of your dependencies marked as test like below
scalaVersion := "2.11.0"
val sparkVersion = "2.0.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.scalatest" %% "scalatest" % "2.2.6",
"org.apache.spark" %% "spark-core" % sparkVersion ,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-catalyst" % sparkVersion
)
or adding "compile".

Resources