Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging - apache-spark

I am writing Java Spark code that creates a SparkSession with the method below, in order to run a few SQL operations:
public static SparkSession getSparkSession() {
    return SparkSession.builder().enableHiveSupport().getOrCreate();
}
All dependencies (spark-core, spark-streaming, spark-sql) are built for Spark 3.2 and Scala 2.12.
I get the following error when running the job:
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
I understand from online support platforms that this can be resolved by moving to lower versions of Spark and Scala. However, our systems have been upgraded to the higher versions across all platforms, so I am looking for a workaround that resolves this ClassNotFoundException without downgrading.
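A ClassNotFoundException at this point means the class loader could not see the class at run time, whatever the build file declares. Before changing versions, it may help to confirm that from inside the same JVM. The following is only a minimal diagnostic sketch using plain JDK calls; the class name ClasspathCheck and the method describe are made up for illustration and are not part of Spark.

```java
import java.security.CodeSource;

public class ClasspathCheck {
    /** Returns the jar or directory a class was loaded from, or a note if it cannot be loaded. */
    public static String describe(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the JDK's bootstrap loader have no code source
            if (src == null || src.getLocation() == null) {
                return "JDK runtime (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND on the runtime classpath";
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace above
        String name = args.length > 0 ? args[0] : "org.apache.spark.internal.Logging";
        System.out.println(name + " -> " + describe(name));
    }
}
```

Run it with exactly the classpath the job uses. If it reports the class as not found, the Spark jars are missing at run time (rather than being the wrong version), which points at how the job is packaged or launched.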

Related

Elastic Search Conflict with Spark Connector, rest high level client no such field error

I am using Elasticsearch 7.9.2, and it conflicts with the Nebula Spark Connector, a Spark connector for Nebula Graph. I have seen reports of Spark conflicting with Elasticsearch, but none with a working solution.
Failed to instantiate [org.elasticsearch.client.RestHighLevelClient]: Factory method 'restHighLevelClient' threw exception; nested exception is java.lang.NoSuchFieldError: INSTANCE
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:185)
at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:653)
... 47 common frames omitted
Caused by: java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalAddressResolver.<init>(PoolingNHttpClientConnectionManager.java:619)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.<init>(PoolingNHttpClientConnectionManager.java:165)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.<init>(PoolingNHttpClientConnectionManager.java:149)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.<init>(PoolingNHttpClientConnectionManager.java:121)
at org.apache.http.impl.nio.client.HttpAsyncClientBuilder.build(HttpAsyncClientBuilder.java:668)
at java.security.AccessController.doPrivileged(Native Method)
at org.elasticsearch.client.RestClientBuilder.createHttpClient(RestClientBuilder.java:219)
at java.security.AccessController.doPrivileged(Native Method)
at org.elasticsearch.client.RestClientBuilder.build(RestClientBuilder.java:191)
at org.elasticsearch.client.RestHighLevelClient.<init>(RestHighLevelClient.java:287)
at org.elasticsearch.client.RestHighLevelClient.<init>(RestHighLevelClient.java:279)
at com.bybit.byassets.collection.config.ElasticSearchConfig.restHighLevelClient(ElasticSearchConfig.java:54)
at com.bybit.byassets.collection.config.ElasticSearchConfig$$EnhancerBySpringCGLIB$$e050e7f6.CGLIB$restHighLevelClient$0(<generated>)
at com.bybit.byassets.collection.config.ElasticSearchConfig$$EnhancerBySpringCGLIB$$e050e7f6$$FastClassBySpringCGLIB$$43cd2bed.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244)
at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:331)
at com.bybit.byassets.collection.config.ElasticSearchConfig$$EnhancerBySpringCGLIB$$e050e7f6.restHighLevelClient(<generated>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
... 48 common frames omitted
I have tried shading the nebula spark connector jar, but the error still persists. Any help is appreciated. Thank you!
This looks similar to this question and this.
Could you check every dependency that pulls in httpClient? One of them is probably bringing in an old version from before the INSTANCE field was introduced, and that httpClient should be excluded.
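One way to check for such a duplicate at run time is to ask the class loader for every location that provides a given class. This is a rough sketch using only JDK calls; DuplicateClassFinder and copiesOf are hypothetical names, and the default class name is simply one taken from the stack trace above, so substitute whichever httpClient class you suspect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class DuplicateClassFinder {
    /** Lists every classpath location that provides the given class. */
    public static List<URL> copiesOf(String className) {
        // Class files are stored as resources: a.b.C -> a/b/C.class
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: this default is just a class named in the trace, not necessarily the culprit
        String name = args.length > 0
                ? args[0]
                : "org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager";
        List<URL> copies = copiesOf(name);
        System.out.println(name + ": " + copies.size() + " location(s)");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
    }
}
```

If more than one location is printed, the first entry is normally the one that gets loaded, and the jar names show which dependency needs an exclusion. Note that a shaded (relocated) copy will not show up under the original package name.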

Spark Listener EventLoggingListener threw an exception / ConcurrentModificationException

In our application (Spark 2.0.1) we have this exception popping up frequently.
I can't find anything about this.
What could be the cause?
16/10/27 11:18:24 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.to(TraversableLike.scala:590)
at scala.collection.AbstractTraversable.to(Traversable.scala:104)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:294)
at scala.collection.AbstractTraversable.toList(Traversable.scala:104)
at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:314)
at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$5.apply(JsonProtocol.scala:291)
at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$5.apply(JsonProtocol.scala:291)
at scala.Option.map(Option.scala:146)
at org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:291)
at org.apache.spark.util.JsonProtocol$$anonfun$taskInfoToJson$12.apply(JsonProtocol.scala:283)
at org.apache.spark.util.JsonProtocol$$anonfun$taskInfoToJson$12.apply(JsonProtocol.scala:283)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:283)
at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:145)
at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:76)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:137)
at org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:157)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:45)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1249)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
EDIT: One more piece of information: our application is long-running, and to recover from a potentially failed Spark context we call the SparkBuilder.getOrCreate() method between two "jobs". Could this interfere with the listeners?
It's a known problem in Spark 2.0.1 (SPARK-17816) and will be fixed with Spark 2.0.2/2.1.0 (related pull request).
To get rid of the exception without waiting for Spark 2.0.2/2.1.0, clone the latest, unstable spark version and build apache-spark manually.
Update: They released Spark 2.0.2!
We also just upgraded to Spark 2.0.1 and started seeing the same exception. We narrowed the cause down to a section of Python code containing the following idiom:
a = spark_context.textFile('..')
a = a.map(stuff)
b = a.filter(stuff).map(stuff)
I've had issues in the past with variable self-assignment in Spark, but after upgrading to 2.0.1 the problem got really acute and we started seeing ConcurrentModification exceptions.
The fix for us was simply changing the code to not do any self-assignments.
A similar issue surfaced in Spark 3.1.0. It is related to a race condition in EventLoggingListener and is described in the following bug reports:
https://issues.apache.org/jira/browse/SPARK-34731
https://issues.apache.org/jira/browse/SPARK-32027
The issue was fixed in Spark 3.1.2, so upgrading Spark from 3.1.0/3.1.1 to 3.1.2 would solve it. Alternatively, it is possible to avoid the error by disabling event logging altogether:
spark.eventLog.enabled=false

Hive on Spark CDH5.7 Execution Error

I've updated my cluster to CDH 5.7 recently and I am trying to run a Hive query processing on Spark.
I have configured the Hive client to use the Spark execution engine and Hive Dependency on a Spark Service from Cloudera Manager.
Via HUE, I am running a simple select query, but I always get this error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Following are the logs for the same:
ERROR operation.Operation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:374)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:180)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:72)
at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:232)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:245)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Any help to solve this would be great!
This problem is due to an open JIRA issue: https://issues.apache.org/jira/browse/HIVE-11519. You should use another serialization tool.
Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
is not the real error message. Turn on DEBUG logging by running the Hive CLI like this:
bin/hive --hiveconf hive.root.logger=DEBUG,console
and you will get more detailed logs. For example, this is what I got before:
16/03/17 13:55:43 [fxxxxxxxxxxxxxxxx4 main]: INFO exec.SerializationUtilities: Serializing MapWork using kryo
java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
This is caused by dependency conflicts; see https://issues.apache.org/jira/browse/HIVE-13301 for details.

java.lang.ClassCastException: kafka.cluster.BrokerEndPoint cannot be cast to kafka.cluster.Broker

I am on Kafka 0.9.0.0 and Spark version 1.5. While trying to read from kafka, I get the exception below:
Exception in thread "main" java.lang.ClassCastException: kafka.cluster.BrokerEndPoint cannot be cast to kafka.cluster.Broker
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6$$anonfun$apply$7.apply(KafkaCluster.scala:90)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:90)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$2$$anonfun$3$$anonfun$apply$6.apply(KafkaCluster.scala:87)
at
Has anyone experienced this, and how can it be fixed? Could it be caused by a version incompatibility? I have seen this before (Spark Streaming Kafka stream), which is why I suspect version issues.

Cassandra 1.1 or 1.2 for production usage?

We are encountering random SSTable corruption with 1.2.3/1.2.4 (DataStax Community Edition) on single-node development machines under a mixed read/write load, using a data model whose rows are wide in terms of the number of columns. Writes are more frequent than reads, though. The problem manifests with stack traces like:
ERROR [ReadStage:13899] 2013-04-24 07:09:00,770 CassandraDaemon.java (line 132) Exception in thread Thread[ReadStage:13899,5,main]
java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1582)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:106)
... many more
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readFully(Unknown Source)
... many more
or
java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
Unfortunately, we don't have a reproducible test case yet, because this happens randomly (e.g. after a few days) and not immediately.
I have also researched similar issues with 1.2 in this/other forum(s).
The question is: what is your experience with Cassandra 1.2 in production, or would you recommend 1.1 instead? 1.2.4 is the most recent release in the 1.2 series to date.
While we encounter these issues on single-node development environments, things might get backed up when running everything in a cluster served by several nodes, but in our opinion things should run on a single node without corruption as well.
Any hints are much appreciated. Thanks.
I have had better experience with cassandra-1.1 in production. The current version, 1.2.6, still does not pass our heavy preproduction testing.
