Error java.lang.NoSuchFieldError: NO_INTS - apache-spark

I get the following error when running a Spark Streaming application that reads data from Kinesis:
Exception in thread "Kinesis Receiver 0" java.lang.NoSuchFieldError: NO_INTS
at com.fasterxml.jackson.dataformat.cbor.CBORParser.<init>(CBORParser.java:285)
at com.fasterxml.jackson.dataformat.cbor.CBORParserBootstrapper.constructParser(CBORParserBootstrapper.java:91)
at com.fasterxml.jackson.dataformat.cbor.CBORFactory._createParser(CBORFactory.java:392)
at com.fasterxml.jackson.dataformat.cbor.CBORFactory.createParser(CBORFactory.java:308)
at com.fasterxml.jackson.dataformat.cbor.CBORFactory.createParser(CBORFactory.java:295)
at com.fasterxml.jackson.dataformat.cbor.CBORFactory.createParser(CBORFactory.java:26)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2294)
at com.amazonaws.protocol.json.JsonContent.parseJsonContent(JsonContent.java:72)
at com.amazonaws.protocol.json.JsonContent.<init>(JsonContent.java:64)
at com.amazonaws.protocol.json.JsonContent.createJsonContent(JsonContent.java:54)
at com.amazonaws.http.JsonErrorResponseHandler.handle(JsonErrorResponseHandler.java:89)
at com.amazonaws.http.JsonErrorResponseHandler.handle(JsonErrorResponseHandler.java:40)
at com.amazonaws.http.AwsErrorResponseHandler.handleAse(AwsErrorResponseHandler.java:53)
at com.amazonaws.http.AwsErrorResponseHandler.handle(AwsErrorResponseHandler.java:41)
at com.amazonaws.http.AwsErrorResponseHandler.handle(AwsErrorResponseHandler.java:26)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1781)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1383)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1359)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)
at com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2809)
at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2776)
at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2765)
at com.amazonaws.services.kinesis.AmazonKinesisClient.executeListShards(AmazonKinesisClient.java:1557)
at com.amazonaws.services.kinesis.AmazonKinesisClient.listShards(AmazonKinesisClient.java:1528)
at com.amazonaws.services.kinesis.clientlibrary.proxies.KinesisProxy.listShards(KinesisProxy.java:325)
at com.amazonaws.services.kinesis.clientlibrary.proxies.KinesisProxy.getShardList(KinesisProxy.java:440)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisShardSyncer.getShardList(KinesisShardSyncer.java:349)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisShardSyncer.syncShardLeases(KinesisShardSyncer.java:159)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisShardSyncer.checkAndCreateLeasesForNewShards(KinesisShardSyncer.java:112)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask.call(ShardSyncTask.java:84)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.initialize(Worker.java:683)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.run(Worker.java:614)
at org.apache.spark.streaming.kinesis.KinesisReceiver$$anon$1.run(KinesisReceiver.scala:191)
Here is the command that was executed:
spark-submit --jars spark-streaming-kinesis-asl_2.11-2.4.5.jar,amazon-kinesis-client-1.13.2.jar,aws-java-sdk-kinesis-1.11.745.jar,aws-java-sdk-core-1.11.745.jar,aws-java-sdk-sts-1.11.745.jar,aws-java-sdk-1.11.745.jar,aws-java-sdk-dynamodb-1.11.745.jar,aws-java-sdk-cloudwatch-1.11.745.jar,jackson-core-2.9.8.jar,jackson-dataformat-cbor-2.9.8.jar,jackson-databind-2.9.8.jar snowplow_spark/src/main.py
And the code is very basic:
kinesisStream = KinesisUtils.createStream(
    ssc, kinesisAppName=appName, streamName=streamName, endpointUrl=endpointUrl,
    regionName=regionName, initialPositionInStream=InitialPositionInStream.LATEST,
    checkpointInterval=10)
I have been stuck on this for days and have no idea what to do. I know that the Jackson version used by Spark does not match the one used by the AWS SDK, but I don't know which version to put in --jars.

Not sure if you have resolved this already, but the issue https://issues.apache.org/jira/browse/SPARK-25455 is related. I ran into the same problem and got it to work with the dependency below in pom.xml. All the AWS libraries in my project are version 2.16.0. Hope this helps.
<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-cbor</artifactId>
    <version>2.6.7</version>
</dependency>
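The version mismatch can also be checked mechanically before submitting. Below is a minimal sketch, not part of the original answer: it parses jar file names from the --jars list in the question and reports Jackson artifacts whose version differs from an expected one. The helper names (parse_jar, jackson_mismatches) are hypothetical, and the expected version 2.6.7 is simply the one from the pom.xml snippet above; whether it matches what your Spark distribution bundles is an assumption to verify against its jars directory.

```python
import re

# Matches "<artifact>-<version>.jar", where the version starts with a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")


def parse_jar(name):
    """Split a jar file name into (artifact, version); None if it does not fit."""
    match = JAR_NAME.match(name)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def jackson_mismatches(jars_arg, expected_version):
    """Return {artifact: version} for Jackson jars that differ from expected_version."""
    mismatches = {}
    for name in jars_arg.split(","):
        parsed = parse_jar(name.strip())
        if parsed is None:
            continue
        artifact, version = parsed
        if artifact.startswith("jackson-") and version != expected_version:
            mismatches[artifact] = version
    return mismatches


# The --jars value from the question, shortened to the entries relevant here.
JARS = (
    "spark-streaming-kinesis-asl_2.11-2.4.5.jar,"
    "aws-java-sdk-core-1.11.745.jar,"
    "jackson-core-2.9.8.jar,"
    "jackson-dataformat-cbor-2.9.8.jar,"
    "jackson-databind-2.9.8.jar"
)

if __name__ == "__main__":
    # 2.6.7 is an assumed target version, taken from the answer's pom.xml.
    for artifact, version in sorted(jackson_mismatches(JARS, "2.6.7").items()):
        print(f"{artifact}: found {version}, expected 2.6.7")
```

Matching on file names is only a heuristic: it says nothing about Jackson classes shaded inside other jars.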

Related

Spark spark-sql-kafka - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

I am experimenting with Spark reading from a Kafka topic, following the "Structured Streaming + Kafka Integration Guide".
Spark version: 3.2.1
Scala version: 2.12.15
Following the guide's instructions for including the dependencies in spark-shell, I start my shell:
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1
However, once I run something like the following in my shell:
val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers","http://HOST:PORT").option("subscribe", "my-topic").load()
I get the following exception:
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer
Any ideas how to overcome this issue?
My assumption was that with --packages, all transitive dependencies would be loaded as well, but this does not seem to be the case. From the logs, the package appears to load successfully, including the kafka-clients dependency:
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
resolving dependencies :: org.apache.spark#spark-submit-parent-3b04f646-471c-4cc8-88fb-7e32bc3226ed;1.0
confs: [default]
found org.apache.spark#spark-sql-kafka-0-10_2.12;3.2.1 in central
found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.2.1 in central
found org.apache.kafka#kafka-clients;2.8.0 in central
found org.lz4#lz4-java;1.7.1 in central
found org.xerial.snappy#snappy-java;1.1.8.4 in central
found org.slf4j#slf4j-api;1.7.30 in central
found org.apache.hadoop#hadoop-client-runtime;3.3.1 in central
found org.spark-project.spark#unused;1.0.0 in central
found org.apache.hadoop#hadoop-client-api;3.3.1 in central
found org.apache.htrace#htrace-core4;4.1.0-incubating in central
found commons-logging#commons-logging;1.1.3 in central
found com.google.code.findbugs#jsr305;3.0.0 in central
found org.apache.commons#commons-pool2;2.6.2 in central
The logs seem fine, but you can try including the kafka-clients dependency in the --packages argument as well.
Otherwise, I'd suggest building an uber jar instead of downloading the libraries every time you submit the app.
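To list kafka-clients explicitly, you need its Maven coordinate, which can be read off the "found" lines in the resolution log. The following is a rough sketch under that assumption: the helper names (resolved_coordinates, packages_argument) are hypothetical, and the sample log is a three-line excerpt from the question. Whether adding the coordinate actually fixes the error is not confirmed here.

```python
import re

# One "found" line of the resolution log has the shape: group#artifact;version
FOUND_LINE = re.compile(
    r"found\s+(?P<group>[^#\s]+)#(?P<artifact>[^;\s]+);(?P<version>\S+)\s+in\s+\S+"
)


def resolved_coordinates(log_text):
    """Turn 'found group#artifact;version in repo' lines into group:artifact:version."""
    coords = []
    for line in log_text.splitlines():
        match = FOUND_LINE.search(line)
        if match:
            coords.append(
                f"{match.group('group')}:{match.group('artifact')}:{match.group('version')}"
            )
    return coords


def packages_argument(coords):
    """Join coordinates into the comma-separated value that --packages expects."""
    return ",".join(coords)


# Excerpt of the log shown in the question.
LOG = """\
found org.apache.spark#spark-sql-kafka-0-10_2.12;3.2.1 in central
found org.apache.kafka#kafka-clients;2.8.0 in central
found org.apache.commons#commons-pool2;2.6.2 in central
"""

if __name__ == "__main__":
    wanted = [
        c for c in resolved_coordinates(LOG)
        if c.startswith(("org.apache.spark:", "org.apache.kafka:"))
    ]
    print("spark-shell --packages " + packages_argument(wanted))
```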

Unable to save model in Apache Spark -- Py4JJavaError

We're getting an error while trying to save a model:
model.save('DT')
Py4JJavaError: An error occurred while calling o822.save.
: org.apache.spark.SparkException: Job aborted.
Complete error stack: http://dpaste.com/16Y07B9
Is there anything we missed? It creates the folder but does not write anything into it.
OS: Windows 10
TIA
It turns out I was using the Spark 3.0.0 preview release, which caused the trouble. Switching to 2.4.5 resolved it.

Zeppelin + Spark: Reading Parquet from S3 throws NoSuchMethodError: com.fasterxml.jackson

With the Zeppelin 0.7.2 binaries from the main download and Spark 2.1.0 with Hadoop 2.6, the following paragraph:
val df = spark.read.parquet(DATA_URL).filter(FILTER_STRING).na.fill("")
produces the following:
java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:594)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:235)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
... 47 elided
This error does not happen in the normal spark-shell, only in Zeppelin. I have attempted the following fixes, none of which had any effect:
- Downloading the Jackson 2.6.2 jars to the Zeppelin lib folder and restarting
- Adding Jackson 2.9 dependencies from the Maven repositories to the interpreter settings
- Deleting the Jackson jars from the Zeppelin lib folder
Googling turns up no similar situations. Please don't hesitate to ask for more information or make suggestions. Thanks!
I had the same problem. I added com.amazonaws:aws-java-sdk and org.apache.hadoop:hadoop-aws as dependencies for the Spark interpreter. These dependencies bring in their own versions of com.fasterxml.jackson.core:*, which conflict with Spark's.
You must also exclude com.fasterxml.jackson.core:* from the other dependencies. Here is an example Spark interpreter dependency section from ${ZEPPELIN_HOME}/conf/interpreter.json:
"dependencies": [
    {
        "groupArtifactVersion": "com.amazonaws:aws-java-sdk:1.7.4",
        "local": false,
        "exclusions": ["com.fasterxml.jackson.core:*"]
    },
    {
        "groupArtifactVersion": "org.apache.hadoop:hadoop-aws:2.7.1",
        "local": false,
        "exclusions": ["com.fasterxml.jackson.core:*"]
    }
]
Another way is to load the Jackson dependency directly in a notebook cell:
%dep
z.load("com.fasterxml.jackson.core:jackson-core:2.6.2")
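If the interpreter has many dependencies, adding the exclusion by hand is error-prone. Here is a minimal sketch of doing it programmatically. It assumes a list shaped like the "dependencies" section shown above; the helper name with_jackson_excluded is hypothetical, and where that list sits inside the real interpreter.json is not shown in this answer, so locating it is left out.

```python
import json

JACKSON_EXCLUSION = "com.fasterxml.jackson.core:*"


def with_jackson_excluded(dependencies):
    """Return a copy of the dependency list in which every entry excludes Jackson core."""
    result = []
    for dep in dependencies:
        updated = dict(dep)  # shallow copy so the input stays untouched
        exclusions = list(updated.get("exclusions", []))
        if JACKSON_EXCLUSION not in exclusions:
            exclusions.append(JACKSON_EXCLUSION)
        updated["exclusions"] = exclusions
        result.append(updated)
    return result


# Sample input: the first entry lacks the exclusion, the second already has it.
SAMPLE = json.loads("""
[
  {"groupArtifactVersion": "com.amazonaws:aws-java-sdk:1.7.4", "local": false},
  {"groupArtifactVersion": "org.apache.hadoop:hadoop-aws:2.7.1", "local": false,
   "exclusions": ["com.fasterxml.jackson.core:*"]}
]
""")

if __name__ == "__main__":
    print(json.dumps(with_jackson_excluded(SAMPLE), indent=2))
```

Back up interpreter.json before writing any generated section into it, and restart the interpreter afterwards.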

Spark JobServer NullPointerException

I'm trying to start a spark jobserver, here are the steps I'm following:
I configure the local.sh based on the template.
Then I run ./bin/server_deploy.sh and it finishes without any error.
Configure local.conf.
Run ./bin/server_start.sh in the deploy server.
But when I do the last step I get the following error:
Error: Exception thrown by the agent : java.lang.NullPointerException
Note: I'm using Spark 1.4.1 and version 0.5.2 of the jobserver (https://github.com/spark-jobserver/spark-jobserver/tree/v0.5.2).
Any idea how I can fix this (or at least debug it)?
Thanks
The error log does not provide much information.
I encountered the same error. In my case, another instance of the JobServer was still running (and somehow ./bin/server_stop.sh did not catch it). It worked after I manually killed the other process.
Hint: see the question "Error: Exception thrown by the agent : java.lang.NullPointerException when starting Java application".
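One way to look for such a leftover process is to probe the ports the server is configured to use before starting it. The sketch below is an illustration only: port_in_use is a hypothetical helper, the idea that the stale instance still holds a port is an assumption rather than something stated above, and EXAMPLE_PORT is a placeholder to replace with the ports from your own local.conf and local.sh.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


# Placeholder value; use the ports from your own configuration.
EXAMPLE_PORT = 8090

if __name__ == "__main__":
    if port_in_use(EXAMPLE_PORT):
        print(f"Port {EXAMPLE_PORT} is taken; another instance may still be running.")
    else:
        print(f"Port {EXAMPLE_PORT} is free.")
```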

Spark SQL: TwitterUtils Streaming fails for unknown reason

I am using the latest Spark master and am additionally loading these jars:
- spark-streaming-twitter_2.10-1.1.0-SNAPSHOT.jar
- twitter4j-core-4.0.2.jar
- twitter4j-stream-4.0.2.jar
My simple test program that I execute in the shell looks as follows:
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
System.setProperty("twitter4j.oauth.consumerKey", "jXgXF...")
System.setProperty("twitter4j.oauth.consumerSecret", "mWPvQRl1....")
System.setProperty("twitter4j.oauth.accessToken", "26176....")
System.setProperty("twitter4j.oauth.accessTokenSecret", "J8Fcosm4...")
var ssc = new StreamingContext(sc, Seconds(1))
var tweets = TwitterUtils.createStream(ssc, None)
var statuses = tweets.map(_.getText)
statuses.print()
ssc.start()
However, I don't get any tweets. The main error I see is:
14/08/04 10:52:35 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StatusListener;)V
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:72)
....
And then for each iteration:
INFO scheduler.ReceiverTracker: Stream 0 received 0 blocks
I'm not sure where the problem lies.
How can I verify that my twitter credentials are correctly recognized?
Might there be another jar missing?
NoSuchMethodError should always cause you to ask whether you are running with the same versions of libraries and classes that you compiled with.
If you look at the pom.xml file for the Spark examples module, you'll see that it uses twitter4j 3.0.3. You're bringing incompatible 4.0.2 with you at runtime and that breaks it.
Yes, Sean Owen has given the correct reason. I added two dependencies to the pom.xml file:
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-core</artifactId>
    <version>3.0.6</version>
</dependency>
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-stream</artifactId>
    <version>3.0.6</version>
</dependency>
This changes the twitter4j version from 4.0.x to 3.0.x (http://mvnrepository.com/artifact/org.twitter4j/twitter4j-core), which resolves the incompatibility.
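The rule of thumb from the answers (run with the library version you compiled against) can be turned into a quick sanity check. This is a sketch with hypothetical helper names, and comparing major versions is only a heuristic: a matching major version does not guarantee binary compatibility.

```python
def major_version(version):
    """Return the leading numeric component of a dotted version string."""
    return int(version.split(".")[0])


def same_major(compiled_against, runtime):
    """True if both version strings share the same major version."""
    return major_version(compiled_against) == major_version(runtime)


if __name__ == "__main__":
    # Versions mentioned in this thread: Spark's module used 3.0.3,
    # the question loaded 4.0.2, and the fix pinned 3.0.6.
    for runtime in ("4.0.2", "3.0.6"):
        verdict = "ok" if same_major("3.0.3", runtime) else "likely incompatible"
        print(f"twitter4j {runtime} vs 3.0.3: {verdict}")
```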
