AbstractMethodError exception while instantiating JavaStreamingContext - apache-spark

I am getting an AbstractMethodError exception while creating a JavaStreamingContext.
My dependency POM is below. I am unable to find a clue; can anyone suggest what is going wrong here?
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.1</version>
<scope>provided</scope>
</dependency>
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.5</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.2.2</version>
</dependency>
Exception in thread "main" java.lang.AbstractMethodError
at org.apache.spark.util.ListenerBus$class.$init$(ListenerBus.scala:35)
at org.apache.spark.streaming.scheduler.StreamingListenerBus.<init>(StreamingListenerBus.scala:30)
at org.apache.spark.streaming.scheduler.JobScheduler.<init>(JobScheduler.scala:57)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:184)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:76)
at org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:130)

You're mixing several versions of Spark here.
First of all, if you're using Apache Spark 2.3.1 and Kafka 0.10+, I suggest the following:
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.1</version>
<scope>provided</scope>
</dependency>
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.5</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<!-- Keep the same Spark version as before -->
<version>2.3.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
<!-- Keep the same Spark version as before -->
<version>2.3.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<!-- Support for Kafka 0.8 is deprecated as of Spark 2.3.1, and you already added the dependencies for Kafka 0.10+ above -->
<version><your_kafka_version_0.10+></version>
</dependency>
It would be nice to know how you build/deploy your application. Depending on your runtime environment, you may want to add the provided scope to some dependencies to prevent conflicts between the built package and the libraries already present on the cluster.
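As a minimal sketch of that idea, assuming a user-defined spark.version property (not part of the original POM), you can align every Spark artifact on one version and mark them provided for cluster deployment:
<properties>
<spark.version>2.3.1</spark.version>
</properties>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<!-- every Spark artifact resolves to the same version -->
<version>${spark.version}</version>
<!-- provided: the cluster already ships Spark's jars -->
<scope>provided</scope>
</dependency>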
Hope it helps

Related

spark-submit: NoSuchMethodError: com.fasterxml.jackson.databind.JsonMappingException

The method involved belongs to SparkSession; its name is getOrCreate().
The detailed exception is:
java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JsonMappingException.<init>(Ljava/io/Closeable;Ljava/lang/String;)V
at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
at com.fasterxml.jackson.module.scala.JacksonModule.setupModule$(JacksonModule.scala:46)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:17)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:718)
at org.apache.spark.util.JsonProtocol$.<init>(JsonProtocol.scala:62)
at org.apache.spark.util.JsonProtocol$.<clinit>(JsonProtocol.scala)
at org.apache.spark.scheduler.EventLoggingListener.initEventLog(EventLoggingListener.scala:89)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:84)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:610)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
at com.hiido.server.service.impl.SparkSqlJob.executing(SparkSqlJob.java:56)
at com.hiido.server.service.impl.SparkSqlJob.main(SparkSqlJob.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:737)
Someone said it happened due to a version conflict; I disagree, because I have checked my versions: Spark is spark-core_2.12 version 3.2.1, Jackson is 2.12.3, and the Spark distribution is 3.2.1-bin-hadoop2.7.
I have no idea about this problem.
This problem only happens on the Spark cluster; when I run locally it's OK.
Thanks.
Additional:
This is my pom.xml; I only show my dependencies, sorry.
<properties>
<java.version>1.8</java.version>
<geospark.version>1.2.0</geospark.version>
<spark.compatible.version>2.3</spark.compatible.version>
<!--<spark.version>2.3.4</spark.version>-->
<spark.version>3.2.1</spark.version>
<hadoop.version>2.7.2</hadoop.version>
<geotools.version>19.0</geotools.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- https://mvnrepository.com/artifact/io.netty/netty-all -->
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.68.Final</version>
</dependency>
<!-- <dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.4.4</version>
</dependency>-->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.12.3</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.12.3</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.12.3</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.codehaus.janino</groupId>
<artifactId>commons-compiler</artifactId>
<version>3.0.8</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.12.15</version>
</dependency>
<!-- geospark -->
<dependency>
<groupId>org.datasyslab</groupId>
<artifactId>geospark</artifactId>
<version>${geospark.version}</version>
</dependency>
<dependency>
<groupId>org.datasyslab</groupId>
<artifactId>geospark-sql_${spark.compatible.version}</artifactId>
<version>${geospark.version}</version>
</dependency>
<dependency>
<groupId>org.datasyslab</groupId>
<artifactId>geospark-viz_${spark.compatible.version}</artifactId>
<version>${geospark.version}</version>
</dependency>
<!-- geospark -->
<dependency>
<groupId>org.apache.sedona</groupId>
<artifactId>sedona-core-3.0_2.12</artifactId>
<version>1.1.1-incubating</version>
</dependency>
<dependency>
<groupId>org.apache.sedona</groupId>
<artifactId>sedona-sql-3.0_2.12</artifactId>
<version>1.1.1-incubating</version>
</dependency>
<dependency>
<groupId>org.apache.sedona</groupId>
<artifactId>sedona-viz-3.0_2.12</artifactId>
<version>1.1.1-incubating</version>
</dependency>
<dependency>
<groupId>org.datasyslab</groupId>
<artifactId>sernetcdf</artifactId>
<version>0.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<!-- <version>2.3.4</version>-->
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.12</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.2.1</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.12</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.12</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>${hadoop.version}</version>
<scope>${dependency.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
<scope>${dependency.scope}</scope>
</dependency>
<dependency>
<groupId>net.sourceforge.javacsv</groupId>
<artifactId>javacsv</artifactId>
<version>2.0</version>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>42.2.5</version>
</dependency>
<dependency>
<groupId>org.codehaus.janino</groupId>
<artifactId>janino</artifactId>
<version>3.0.8</version>
</dependency>
<dependency>
<groupId>org.geotools</groupId>
<artifactId>gt-grid</artifactId>
<version>${geotools.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.locationtech.spatial4j/spatial4j -->
<dependency>
<groupId>org.locationtech.spatial4j</groupId>
<artifactId>spatial4j</artifactId>
<version>0.8</version>
</dependency>
<!-- JTS is essentially only used for polygons. -->
<!-- https://mvnrepository.com/artifact/org.locationtech.jts/jts-core -->
<dependency>
<groupId>org.locationtech.jts</groupId>
<artifactId>jts-core</artifactId>
<version>1.18.1</version>
</dependency>
</dependencies>
This is most likely caused by a wrong Spark and Hadoop packaging strategy in your Maven pom.xml.
Your pom.xml includes many packages that shouldn't be put in 'compile' scope, e.g. the Spark and Hadoop dependencies. Spark clusters usually already ship all of these libs; if you mistakenly include them in your jar, it is uncertain which Jackson version will be used. Please change them to 'provided' scope. What Spark and Sedona developers usually do is use compile scope for local testing and change it to provided scope when deploying to a cluster.
You usually don't need to include Hadoop dependencies, since the Spark jars already come with many Hadoop dependencies; including them again will lead to many Jackson version conflicts.
Your GeoSpark dependency is wrong. Please remove both your old GeoSpark and Sedona dependencies and follow the instructions here: https://sedona.apache.org/setup/maven-coordinates/#use-sedona-fat-jars
Here is a runnable example of a Sedona + Spark project: https://github.com/apache/incubator-sedona/blob/master/examples/sql/build.sbt#L59 . Although it is written in sbt, a POM follows the same logic.
Please pay close attention to the dependency scope and exclude parts.
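As a sketch of what that compile-vs-provided switch can look like in a pom.xml (the spark.scope property and the local profile are hypothetical names, not taken from the linked example):
<properties>
<!-- default: rely on the cluster's Spark/Hadoop jars -->
<spark.scope>provided</spark.scope>
</properties>
<profiles>
<!-- activate for local testing with: mvn package -Plocal -->
<profile>
<id>local</id>
<properties>
<spark.scope>compile</spark.scope>
</properties>
</profile>
</profiles>
...
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark.scope}</scope>
</dependency>
With this layout, the same POM builds a slim jar for the cluster by default and a self-contained one for local runs.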

Spark structured streaming - error while storing data into Azure datalake gen1

I got stuck on one problem and am currently trying to find a solution.
The problem is related to storing the streaming data output to Azure Data Lake. Below is the exception that I am getting while storing the data:
Exception in thread "main" org.apache.hadoop.fs.InvalidPathException: Invalid path name Wrong FS: adl://<azure-data-lake>.azuredatalakestore.net/eventstore/_spark_metadata, expected: adl://<azure-data-lake>.azuredatalakestore.net/
at org.apache.hadoop.fs.AbstractFileSystem.checkPath(AbstractFileSystem.java:383)
at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:110)
at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1120)
at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1116)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1116)
at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1581)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileContextManager.exists(HDFSMetadataLog.scala:390)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:65)
at org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
at org.apache.spark.sql.execution.streaming.FileStreamSinkLog.<init>(FileStreamSinkLog.scala:85)
at org.apache.spark.sql.execution.streaming.FileStreamSink.<init>(FileStreamSink.scala:95)
at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:316)
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:293)
Below are my pom dependencies
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.0</version>
</dependency>
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.microsoft.azure/azure-eventhubs-spark -->
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-eventhubs-spark_2.11</artifactId>
<version>2.3.12</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.4.3</version>
</dependency>
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-eventhubs</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-data-lake-store-sdk</artifactId>
<version>2.2.8</version>
</dependency>
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-eventhubs-eph</artifactId>
<version>2.4.0</version>
</dependency>
Any help regarding this will be appreciated.
Finally, I was able to resolve the issue by adding the appropriate Maven dependencies.
The dependencies that I used are the following (see the POM sketch after the list):
hadoop-common - v3.8.1
azure-data-lake-store-sdk - v2.3.7
hadoop-azure-datalake v3.2.1
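Expressed as POM coordinates, this looks roughly like the sketch below; the group IDs are the usual ones on Maven Central, and the versions are copied from the list above, so verify them against your Hadoop distribution:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<!-- version as reported above -->
<version>3.8.1</version>
</dependency>
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-data-lake-store-sdk</artifactId>
<version>2.3.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-azure-datalake</artifactId>
<version>3.2.1</version>
</dependency>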
Hope this will help others to resolve this kind of issue.
Thanks
Avinash

xgboost4j-spark missing or invalid dependency detected while loading class file 'XGBoost.class'

I use XGBoost with Spark.
Scala version: 2.11.8
Java version: 1.7
Spark version: 2.1.0
Maven dependencies:
<dependencies>
<!-- https://mvnrepository.com/artifact/ml.dmlc/xgboost4j-spark -->
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark</artifactId>
<version>0.8</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
I use IDEA. When I write code there is no error, but when I package my project into a jar, it shows that error.
The reason is a missing xgboost4j dependency; just add it in Maven:
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j</artifactId>
<version>0.8</version>
</dependency>
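For clarity, a sketch of the two artifacts side by side; the key point is that xgboost4j and xgboost4j-spark should stay on the same version (0.8 in this thread):
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j</artifactId>
<version>0.8</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark</artifactId>
<!-- keep in lockstep with xgboost4j -->
<version>0.8</version>
</dependency>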

spark 1.6 hive context setConf Issue

I am having trouble running a SQL statement that loads data into a partitioned table in a Hive context. I did set dynamic partitioning to true, but I am still having the issue.
SQL:
insert overwrite table target_table PARTITION (column1,column2) select * , deletion_flag ,'2018-12-23' as date_feed from source_table
Hive setConf:
hiveContext.setConf("hive.exec.dynamic.partition","true")
hiveContext.setConf("hive.exec.max.dynamic.partitions","2048")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
Error:
org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean
Maven dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0</version>
</dependency>
Thanks
I solved the issue after getting all the Maven dependencies from the Cloudera repo.
<dependencies>
<!-- Scala and Spark dependencies -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0-cdh5.9.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.0-cdh5.9.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.0-cdh5.9.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0-cdh5.9.2</version>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.10</artifactId>
<version>3.0.0-SNAP4</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.10</artifactId>
<version>1.4.1</version>
</dependency>
<dependency>
<groupId>commons-dbcp</groupId>
<artifactId>commons-dbcp</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.10</artifactId>
<version>0.2.0</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.0.12</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.11.172</version>
</dependency>
<dependency>
<groupId>com.github.scopt</groupId>
<artifactId>scopt_2.10</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>javax.mail</groupId>
<artifactId>mail</artifactId>
<version>1.4</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>maven-hadoop</id>
<name>Hadoop Releases</name>
<url>https://repository.cloudera.com/content/repositories/releases/</url>
</repository>
<repository>
<id>cloudera-repos</id>
<name>Cloudera Repos</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>

Save twitter data streams to Cassandra

I am trying to save real-time tweets to Cassandra; however, Eclipse is showing a marker on this line:
CassandraJavaUtil.javaFunctions(tweets).writerBuilder("Twitter","tweets", CassandraJavaUtil.mapToRow(tweets.class)).saveToCassandra();
Eclipse does not recognize mapToRow() and javaFunctions() even though I've imported all the packages.
I am using the 1.4.0 Cassandra connector in order to be compatible with my Spark version.
I've seen people using this approach:
javaFunctions(, .class).saveToCassandra("java_api", "sales");
but the documentation says it is deprecated in the newer versions.
<dependencies>
<dependency>
<groupId>org.twitter4j</groupId>
<artifactId>twitter4j-core</artifactId>
<version>3.0.3</version>
</dependency>
<!--Spark Cassandra Connector -->
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.4.0-M1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.4.0-M1</version>
</dependency>
<!--Spark -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.4.0</version>
</dependency>
<!-- Twitter -->
<dependency>
<groupId>org.twitter4j</groupId>
<artifactId>twitter4j-stream</artifactId>
<version>3.0.3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-twitter_2.10</artifactId>
<version>1.4.0</version>
</dependency>
