Spark2.4.6 without hadoop: A JNI error has occurred - apache-spark

On my windows machine, I am trying to use spark 2.4.6 without hadoop using -
spark-2.4.6-bin-without-hadoop-scala-2.12.tgz
After setting the SPARK_HOME, HADOOP_HOME and also SPARK_DIST_CLASSPATH with information from the post linked here
when i try to start the spark-shell, I get this error -
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
The link referenced above seems and many others point to SPARK_DIST_CLASSPATH , but I already have this in my system variables as -
$HADOOP_HOME;$HADOOP_HOME\etc\hadoop*;$HADOOP_HOME\share\hadoop\common\lib*;$HADOOP_HOME\share\hadoop\common*;$HADOOP_HOME\share\hadoop\hdfs*;$HADOOP_HOME\share\hadoop\hdfs\lib*;$HADOOP_HOME\share\hadoop\hdfs*;$HADOOP_HOME\share\hadoop\yarn\lib*;$HADOOP_HOME\share\hadoop\yarn*;$HADOOP_HOME\share\hadoop\mapreduce\lib*;$HADOOP_HOME\share\hadoop\mapreduce*;$HADOOP_HOME\share\hadoop\tools\lib*;
I also have this line in the spark-env.sh of the spark -
export SPARK_DIST_CLASSPATH=$(C:\opt\spark\hadoop-2.7.3\bin\hadoop classpath)
HADOOP_HOME = C:\opt\spark\hadoop-2.7.3
SPARK_HOME = C:\opt\spark\spark-2.4.6-bin-without-hadoop-scala-2.12
When I tried the spark 2.4.5 that came with hadoop seems to work just fine. This tells me there is something wrong with the way I have my hadoop set up. What am I missing here?
Thanks!

Found a solution here.
Go to your ~/.bashrc
Add export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Apply environment using source ~/.bashrc

Related

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException

I've setup a 3-node cluster (1-master & 2-workers) of Hadoop with Yarn along with Spark.
My Pyspark scripts need org.elasticsearch.spark in order to write to Elasticsearch. I'm providing this with parameter --packages org.elasticsearch:elasticsearch-spark-30_2.12:8.4.1 while executing my Pyspark script , that is while executing using spark-submit .
Stuck with this error :
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/javax/ws/rs/core/NoContentException
at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:60)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:200)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:191)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1327)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 13 more
What have I tried :
I have tried to add all the paths listed on this answer - https://stackoverflow.com/a/25393369/6490744 - doesn't work.
I had Hadoop-3.1.1, after checking https://github.com/apache/incubator-kyuubi/issues/2904 (they've mentioned that the issue is resolved in Hadoop 3.3.3) I have upgraded to 3.3.3. But the issue still persists.
I have also tried by manually downloading the jar to my spark/jars directory using wget -U "Any User Agent" https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-spark-30_2.12/8.4.1/elasticsearch-spark-30_2.12-8.4.1.jar => after downloading, tried to do spark-submit without passing --packages (since I have the jar in path).
All of this has been giving me the same error
After 2 hours of struggle, got the clue from - https://github.com/apache/incubator-kyuubi/issues/2904#issuecomment-1158643036 :
I had yarn.timeline-service.enabled set to true in my /etc/hadoop/yarn-site.xml - updated to false , now the error is gone.
Wonder how to setup the yarn-timeline-server now

Directory expansion does not work in standalone deployment mode : Apache Spark

I'm trying to deploy a spark streaming consuming Kafka topic job on a standalone spark cluster using the following command:
./bin/spark-submit --class MaxwellCdc.MaxwellSreaming
~/cdc/cdc_2.11-0.1.jar --jars ~/cdc/kafka_2.11-0.10.0.1.jar,
~/cdc/kafka-clients-0.10.0.1.jar,~/cdc/mysql-connector-java-5.1.12.jar,
~/cdc/spark-streaming-kafka-0-10_2.11-2.2.1.jar
and getting this exception:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/kafka/common/serialization/StringDeserializer
at MaxwellCdc.MaxwellSreaming$.main(MaxwellSreaming.scala:30)
at MaxwellCdc.MaxwellSreaming.main(MaxwellSreaming.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
org.apache.kafka.common.serialization.StringDeserializer
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Any help would be appreciated.
Quoting from the documentation:
When using spark-submit, the application jar along with any jars
included with the --jars option will be automatically transferred to
the cluster. URLs supplied after --jars must be separated by commas.
That list is included in the driver and executor classpaths.
Directory expansion does not work with --jars..
What is directory expansion?
Expanding a file name means converting a relative file name to an absolute one. Since this is done relative to a default directory, you must specify the default directory name as well as the file name to be expanded. It also involves expanding abbreviations like ~/.
Therefore, try providing the absolute path for all the jars that are being provided with --jars option. I hope this helps.

Spark 2.3 java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric

SPARK 2.3 is throwing following exception. Can anyone please help!! I tried adding the JARs
308 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
at org.apache.spark.network.util.NettyMemoryMetrics.registerMetrics(NettyMemoryMetrics.java:80)
at org.apache.spark.network.util.NettyMemoryMetrics.(NettyMemoryMetrics.java:76)
at org.apache.spark.network.client.TransportClientFactory.(TransportClientFactory.java:109)
at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
at org.apache.spark.rpc.netty.NettyRpcEnv.(NettyRpcEnv.scala:71)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
at org.apache.spark.SparkContext.(SparkContext.scala:423)
at org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at com.voicebase.etl.HBasePhoenixPerformance2.main(HBasePhoenixPerformance2.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:706)
315 [main] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:486)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:800)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:799)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:824)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
This is because Hadoop binaries compiled with an older version and need us to just replace them. I haven't faced any issues with Hadoop by replacing them.
You need to replace netty-3.6.2.Final.jar and netty-all-4.0.23.Final.jar from path $HADOOP_HOME\share\hadoop with netty-all-4.1.17.Final.jar and netty-3.9.9.Final.jar.
This issue plagues due to mismatch of the version that Hadoop and Spark are compiled on for Netty. So you can follow this.
Similar Issue , solved by manually compiling the Spark by using specific version of Netty
And the other one as recommended by Suhas , by copying the content of SPARK_HOME/jars folder to the various lib folder or only the one in yarn inside HADOOP_HOME/share/hadoop solves the problem also. But it's a dirty fix. So maybe use latest version of both or manually compile them.
An older version of Netty was required by the aws-java-sdk. Deleting all the netty jars and removing the aws-java-sdk from the project solved the problem.
Issue has been resolved by adding the below netty jars in the dependencies,
"io.netty" % "netty-all" % "4.1.68.Final"
"io.netty" % "netty-buffer" % "4.1.68.Final"
And excluding all existing netty jars by adding excludeAll code.
val excludeNettyBufferBinding = ExclusionRule(organization = "io.netty.buffer")
excludeAll(excludeNettyBufferBinding)

failure running sparkclr-submit.cmd

I am trying to run a simple spark clr program as local debug mode using VS2012 on my windows environment.
Please find the below steps that i did,
Downloaded v1.6.100 from the following page and extracted to my D drive
https://github.com/Microsoft/Mobius/releases
and in my D drive, the folder looks like this,
D:\SparkClr\spark-clr_2.10-1.6.100
Set up the following environment variables,
SPARK_HOME = D:\SparkClr\spark-clr_2.10-1.6.100\runtime
SPARKCLR_HOME = D:\SparkClr\spark-clr_2.10-1.6.100\runtime
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_92
HADOOP_HOME = D:\HadoopDirectory (winutils.exe is present in D:\HadoopDirectory\bin)
Downloaded sparkclr nuget package
In order to set "CSharpBackendPortNumber" in app.config in my local VS program, I need to run in debug mode as per, https://github.com/Microsoft/Mobius/blob/master/notes/running-mobius-app.md#debug-mode
but when i run 'sparkclr-submit.cmd debug' from D:\SparkClr\spark-clr_2.10-1.6.100\runtime\scripts
I am getting the following exception,
D:\SparkClr\spark-clr_2.10-1.6.100\runtime\scripts>sparkclr-submit.cmd debug
'"D:\SparkClr\spark-clr_2.10-1.6.100\runtime\bin\load-spark-env.cmd"' is not rec
ognized as an internal or external command,
operable program or batch file.
SPARKCLR_JAR=spark-clr_2.10-1.6.100.jar
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
at org.apache.spark.deploy.csharp.CSharpRunner.main(CSharpRunner.scala)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more
Could you please tell me whether i am missing something?
Thanks
SPARK_HOME environment variable should point to Spark directory. You have it pointing to Mobius directory.

Hadoop HDFS test running issue - org.apache.hadoop.conf.Configuration NoClassDefFoundError

I'm working with Hadoop 0.21.0. and trying to run the hdfs_test application that comes alongside the C API library. After many problems I was able to compile hdfs_test. Now when I'm running it:
./hdfs_test
I'm getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:153)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 1 more
Can't construct instance of class org.apache.hadoop.conf.Configuration
Oops! Failed to connect to hdfs!
Any help is appreciated.. thanks
Like any other Java program you need the dependencies in the classpath or inside the jar. Hadoop also has an HADOOP_CLASSPATH to tell the cluster where to find dependencies in map-reduce tasks. Also see How to run a Hadoop program?

Resources