Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException - apache-spark

I've setup a 3-node cluster (1-master & 2-workers) of Hadoop with Yarn along with Spark.
My Pyspark scripts need org.elasticsearch.spark in order to write to Elasticsearch. I'm providing this with parameter --packages org.elasticsearch:elasticsearch-spark-30_2.12:8.4.1 while executing my Pyspark script , that is while executing using spark-submit .
Stuck with this error :
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/javax/ws/rs/core/NoContentException
at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:60)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:200)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:191)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1327)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 13 more
What have I tried :
I have tried to add all the paths listed on this answer - https://stackoverflow.com/a/25393369/6490744 - doesn't work.
I had Hadoop-3.1.1, after checking https://github.com/apache/incubator-kyuubi/issues/2904 (they've mentioned that the issue is resolved in Hadoop 3.3.3) I have upgraded to 3.3.3. But the issue still persists.
I have also tried by manually downloading the jar to my spark/jars directory using wget -U "Any User Agent" https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-spark-30_2.12/8.4.1/elasticsearch-spark-30_2.12-8.4.1.jar => after downloading, tried to do spark-submit without passing --packages (since I have the jar in path).
All of this has been giving me the same error

After 2 hours of struggle, got the clue from - https://github.com/apache/incubator-kyuubi/issues/2904#issuecomment-1158643036 :
I had yarn.timeline-service.enabled set to true in my /etc/hadoop/yarn-site.xml - updated to false , now the error is gone.
Wonder how to setup the yarn-timeline-server now

Related

Spark2.4.6 without hadoop: A JNI error has occurred

On my windows machine, I am trying to use spark 2.4.6 without hadoop using -
spark-2.4.6-bin-without-hadoop-scala-2.12.tgz
After setting the SPARK_HOME, HADOOP_HOME and also SPARK_DIST_CLASSPATH with information from the post linked here
when i try to start the spark-shell, I get this error -
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
The link referenced above seems and many others point to SPARK_DIST_CLASSPATH , but I already have this in my system variables as -
$HADOOP_HOME;$HADOOP_HOME\etc\hadoop*;$HADOOP_HOME\share\hadoop\common\lib*;$HADOOP_HOME\share\hadoop\common*;$HADOOP_HOME\share\hadoop\hdfs*;$HADOOP_HOME\share\hadoop\hdfs\lib*;$HADOOP_HOME\share\hadoop\hdfs*;$HADOOP_HOME\share\hadoop\yarn\lib*;$HADOOP_HOME\share\hadoop\yarn*;$HADOOP_HOME\share\hadoop\mapreduce\lib*;$HADOOP_HOME\share\hadoop\mapreduce*;$HADOOP_HOME\share\hadoop\tools\lib*;
I also have this line in the spark-env.sh of the spark -
export SPARK_DIST_CLASSPATH=$(C:\opt\spark\hadoop-2.7.3\bin\hadoop classpath)
HADOOP_HOME = C:\opt\spark\hadoop-2.7.3
SPARK_HOME = C:\opt\spark\spark-2.4.6-bin-without-hadoop-scala-2.12
When I tried the spark 2.4.5 that came with hadoop seems to work just fine. This tells me there is something wrong with the way I have my hadoop set up. What am I missing here?
Thanks!
Found a solution here.
Go to your ~/.bashrc
Add export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Apply environment using source ~/.bashrc

Apache Spark : How to add default and specific dependency?

My spark-defaults.conf :
#a package I need everytime
spark.jars.packages org.influxdb:influxdb-java:2.14
When I launch a job :
spark-shell --master yarn --num-executors 6 --packages "a random package that I need only for this job specifically"
I get this error :
java.lang.NoClassDefFoundError: org/influxdb/InfluxDBFactory
at ch.cern.sparkmeasure.InfluxDBSink.<init>(influxdbsink.scala:53)
at ch.cern.sparkmeasure.InfluxDBSinkExtended.<init>(influxdbsink.scala:232)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2688)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
... 62 elided
Caused by: java.lang.ClassNotFoundException: org.influxdb.InfluxDBFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 88 more
I think --packages overriding packages in spark-default.conf so I get this error
I don't want to include every time influxdb packages.
I want to have fixe packages in spark-default.conf and dynamic package when I launch a job with --packages but it seems incompatible.
Any idea ?
That class is indeed part of influxdb-java, so you shouldn't be getting that error. Especially if your code isn't even trying to use Influx packages
However, if you have packages specific to a certain application, you should actually package them as part of that application. Then others will be able to run it as well and not remember the specific invocation of the spark submit options
If you are using a build system for your code, look into creating an uber jar
I'm not familiar with the semantics of spark-defaults or if it's overridden, but if you do have packages that always need to be part of Spark applications, you should just download the JARs directly into the Spark workers classpaths, not make applications download them every time

Spark 2.3 java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric

SPARK 2.3 is throwing following exception. Can anyone please help!! I tried adding the JARs
308 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
at org.apache.spark.network.util.NettyMemoryMetrics.registerMetrics(NettyMemoryMetrics.java:80)
at org.apache.spark.network.util.NettyMemoryMetrics.(NettyMemoryMetrics.java:76)
at org.apache.spark.network.client.TransportClientFactory.(TransportClientFactory.java:109)
at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
at org.apache.spark.rpc.netty.NettyRpcEnv.(NettyRpcEnv.scala:71)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
at org.apache.spark.SparkContext.(SparkContext.scala:423)
at org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at com.voicebase.etl.HBasePhoenixPerformance2.main(HBasePhoenixPerformance2.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:706)
315 [main] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:486)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:800)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:799)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:824)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
This is because Hadoop binaries compiled with an older version and need us to just replace them. I haven't faced any issues with Hadoop by replacing them.
You need to replace netty-3.6.2.Final.jar and netty-all-4.0.23.Final.jar from path $HADOOP_HOME\share\hadoop with netty-all-4.1.17.Final.jar and netty-3.9.9.Final.jar.
This issue plagues due to mismatch of the version that Hadoop and Spark are compiled on for Netty. So you can follow this.
Similar Issue , solved by manually compiling the Spark by using specific version of Netty
And the other one as recommended by Suhas , by copying the content of SPARK_HOME/jars folder to the various lib folder or only the one in yarn inside HADOOP_HOME/share/hadoop solves the problem also. But it's a dirty fix. So maybe use latest version of both or manually compile them.
An older version of Netty was required by the aws-java-sdk. Deleting all the netty jars and removing the aws-java-sdk from the project solved the problem.
Issue has been resolved by adding the below netty jars in the dependencies,
"io.netty" % "netty-all" % "4.1.68.Final"
"io.netty" % "netty-buffer" % "4.1.68.Final"
And excluding all existing netty jars by adding excludeAll code.
val excludeNettyBufferBinding = ExclusionRule(organization = "io.netty.buffer")
excludeAll(excludeNettyBufferBinding)

IllegalArgumentException with Spark 1.6

I'm running Spark 1.6.0 on CDH 5.7 and I've upgraded my original application from 1.4.1 to 1.6.0. When I try to run my application (which previously worked fine) I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:221)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$3.apply(Client.scala:473)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$3.apply(Client.scala:471)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:471)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:469)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:469)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:725)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:143)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1023)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1083)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I submit the application with:
--jars is a comma-separated list of jars (with absolute paths)
--files is a comma-separated list of files (with absolute paths)
--driver-class-path is a colon-separated list of resources (without the full path, just the file names)
I have tried it with full paths for the driver (and executor) class paths, but that gives me the same issue. All files and jars submitted with the app exist, I checked.
Could this be related to the issue with duplicates in the distributed cache or is this another issue?
From the source code I see that the only calls to require without a custom message (as in the stack trace) are related to the distribute() method. If so, how can I run applications without upgrading Spark?
This is the exception which results from having the same path/URI appear twice in the argument to the --files option.

Hadoop HDFS test running issue - org.apache.hadoop.conf.Configuration NoClassDefFoundError

I'm working with Hadoop 0.21.0. and trying to run the hdfs_test application that comes alongside the C API library. After many problems I was able to compile hdfs_test. Now when I'm running it:
./hdfs_test
I'm getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:153)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 1 more
Can't construct instance of class org.apache.hadoop.conf.Configuration
Oops! Failed to connect to hdfs!
Any help is appreciated.. thanks
Like any other Java program you need the dependencies in the classpath or inside the jar. Hadoop also has an HADOOP_CLASSPATH to tell the cluster where to find dependencies in map-reduce tasks. Also see How to run a Hadoop program?

Resources