I am getting this error while trying to write txt file to local path in windows.
Error:
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Spark , hadoop versions : spark-3.0.3-bin-hadoop2.7.
winutils is placed in C:\winutils\bin
hadoop.dll is placed in C:\winutils\bin and c:\System32
Environment Variables set
HADOOP_HOME C:\winutils
Path %HADOOP_HOME%\bin
Tried restarting
I found below cause and solution:
Root Cause:
Gradle dependency was with higher version of spark. I have spark 3.0.3 installed but here it was 3.2.0
implementation 'org.apache.spark:spark-core_2.13:3.2.0'
Fix:
Replaced with
implementation 'org.apache.spark:spark-core_2.12:3.0.3'
i'm trying to learn a big data online course and came across the problem while installing apache spark.
i've done everything correctly but when i try to run spark-submit it seems that there is an issue with java i guess.
when i run this:
(base) C:\SparkCourse>spark-submit ratings-counter.py
i get this error:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:54)
at org.apache.spark.internal.config.package$.<init>(package.scala:1095)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:157)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:115)
at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:1022)
at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:1022)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module #5b94b04d
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:188)
at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:181)
at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:56)
... 13 more
any ideas?
Cheers!
I reinstalled windows and started everything from scratch.
Installed jdk version 8, and this version of spark "spark-3.0.3-bin-hadoop2.7.tgz". Indicated all the paths correctly and. It worked as i can open pyspark shell and do spark-submit for example, but there still is a lot of text in the cmd that i can't get rid of.
I'm trying to create a Windows 10 developer vm with a Conda environment and PySpark but seeing constant problems with getting Spark & winutils to work.
Environment:
Windows 10 19042 (Fully Patched)
MiniConda
PySaprk 3.1.1
Java 11.0.11
I have created C:\Hadoop\bin and downloaded winutils from here https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin (I've also tried 3.2.0).
HADOOP_HOME is C:\Hadoop and Path has %HADOOP_HOME\bin in it. JAVA_HOME is correct.
This code works:
location = 'C:/myfiles/file.csv'
df = spark.read.format("csv").options(header=True).load(location)
This code fails:
location = 'C:/myfiles/'
df = spark.read.format("csv").options(header=True).load(location)
Error message:
An error occurred while calling o35.load.
: java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:645)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1230)
at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1435)
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:493)
Winutils is being picked up because if I delete it the first example above then breaks as well in the expected way.
It seems that winutils is incompatible with Spark 3.1.1 and specifically a folder of files? I find it hard to believe that.
Bizarrely though I have another machine with pySpark 3.1.1 and this version of winutils and it works! Same Java version as well. I'm lost - I've even copied the winutils files from teh working machine to this one and it still didn't work.
Can anyone guide me on what that error means at least to help me understand where the issue could be?
In my case, I was able to resolve the issue by adding the hadoop.dll from the same link as provided in the question above(here) to the same location as winutils. Just keep a check there is no version mismatch.
I've downloaded a prebuilt version of Spark on my mac (OS Mavericks), but when I try to open an interactive shell, typing bin/pyspark, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/launcher/Main
Caused by: java.lang.ClassNotFoundException: org.apache.spark.launcher.Main
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
I have googled every part of the error and checkout out some other stack overflow threads, but I can't find anything that addresses this error. Any idea what's going on/how to fix it?
One idea I have is that scala is a dependency that I need to download separately...but I really don't know.
I had the same issue before, and it turned out to be a permission issue and I'm under user who has no access to the spark files (root downloaded spark).
Another possibility is, you downloaded the source code and did not build the project from source code :P
Hope it helps.
I've downloaded the prebuild version of spark 1.4.0 without hadoop (with user-provided Haddop). When I ran the spark-shell command, I got this error:
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/
FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSpa
rkProperties$1.apply(SparkSubmitArguments.scala:111)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSpa
rkProperties$1.apply(SparkSubmitArguments.scala:111)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkPropert
ies(SparkSubmitArguments.scala:111)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArgume
nts.scala:97)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStr
eam
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 7 more
I've searched on Internet, it is said that HADOOP_HOME has not been set yet in spark-env.cmd. But I cannot find spark-env.cmd in the spark installation folder.
I've traced the spark-shell command and it seems that there are no HADOOP_CONFIG in there. I've tried to add the HADOOP_HOME on environment variable but it still give the same exception.
Actually I don't really using the hadoop. I downloaded hadoop as a workaround as suggested in this question
I am using windows 8 and scala 2.10.
Any help will be appreciated. Thanks.
The "without Hadoop" in the Spark's build name is misleading: it means the build is not tied to a specific Hadoop distribution, not that it is meant to run without it: the user should indicate where to find Hadoop (see https://spark.apache.org/docs/latest/hadoop-provided.html)
One clean way to fix this issue is to:
Obtain Hadoop Windows binaries. Ideally build them, but this is painful (for some hints see: Hadoop on Windows Building/ Installation Error). Otherwise Google some up, for instance currently you can download 2.6.0 from here: http://www.barik.net/archive/2015/01/19/172716/
Create a spark-env.cmd file looking like this (modify Hadoop path to match your installation):
#echo off
set HADOOP_HOME=D:\Utils\hadoop-2.7.1
set PATH=%HADOOP_HOME%\bin;%PATH%
set SPARK_DIST_CLASSPATH=<paste here the output of %HADOOP_HOME%\bin\hadoop classpath>
Put this spark-env.cmd either in a conf folder located at the same level as your Spark base folder (which may look weird), or in a folder indicated by the SPARK_CONF_DIR environment variable.
I had the same problem, in fact it's mentioned on the Getting started page of Spark how to handle it:
### in conf/spark-env.sh ###
# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
If you want to use your own hadoop follow one of the 3 options, copy and paste it into spark-env.sh file :
1- if you have the hadoop on your PATH
2- you want to show hadoop binary explicitly
3- you can also show hadoop configuration folder
http://spark.apache.org/docs/latest/hadoop-provided.html
I too had the issue,
export SPARK_DIST_CLASSPATH=`hadoop classpath`
resolved the issue.
I ran into the same error when trying to get familiar with spark. My understanding of the error message is that while spark doesn't need a hadoop cluster to run, it does need some of the hadoop classes. Since I was just playing around with spark and didn't care what version of hadoop libraries are used, I just downloaded a spark binary pre-built with a version of hadoop (2.6) and things started working fine.
linux
ENV SPARK_DIST_CLASSPATH="$HADOOP_HOME/etc/hadoop/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/tools/lib/*"
windows
set SPARK_DIST_CLASSPATH=%HADOOP_HOME%\etc\hadoop\*;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\mapreduce\lib\*;%HADOOP_HOME%\share\hadoop\mapreduce\*;%HADOOP_HOME%\share\hadoop\tools\lib\*
Enter into SPARK_HOME -> conf
copy spark-env.sh.template file and rename it to spark-env.sh
Inside this file you can set the parameters for spark.
Run below from your package dir just before running spark-submit -
export SPARK_DIST_CLASSPATH=`hadoop classpath`
I finally find a solution to remove the exception.
In spark-class2.cmd, add :
set HADOOP_CLASS1=%HADOOP_HOME%\share\hadoop\common\*
set HADOOP_CLASS2=%HADOOP_HOME%\share\hadoop\common\lib\*
set HADOOP_CLASS3=%HADOOP_HOME%\share\hadoop\mapreduce\*
set HADOOP_CLASS4=%HADOOP_HOME%\share\hadoop\mapreduce\lib\*
set HADOOP_CLASS5=%HADOOP_HOME%\share\hadoop\yarn\*
set HADOOP_CLASS6=%HADOOP_HOME%\share\hadoop\yarn\lib\*
set HADOOP_CLASS7=%HADOOP_HOME%\share\hadoop\hdfs\*
set HADOOP_CLASS8=%HADOOP_HOME%\share\hadoop\hdfs\lib\*
set CLASSPATH=%HADOOP_CLASS1%;%HADOOP_CLASS2%;%HADOOP_CLASS3%;%HADOOP_CLASS4%;%HADOOP_CLASS5%;%HADOOP_CLASS6%;%HADOOP_CLASS7%;%HADOOP_CLASS8%;%LAUNCH_CLASSPATH%
Then, change :
"%RUNNER%" -cp %CLASSPATH%;%LAUNCH_CLASSPATH% org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
to :
"%RUNNER%" -Dhadoop.home.dir=*hadoop-installation-folder* -cp %CLASSPATH% %JAVA_OPTS% %*
It works fine with me, but I'm not sure this is the best solution.
You should add these jars in you code:
common-cli-1.2.jar
hadoop-common-2.7.2.jar
Thank you so much. That worked great, but I had to add the spark jars to the classpath as well:
;c:\spark\lib*
Also, the last line of the cmd file is missing the word "echo"; so it should say:
echo %SPARK_CMD%
I had the same issue ....Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/
FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSpa
rkProperties$1.apply(SparkSubmitArguments.scala:111)...
Then I realized that I had installed the spark version without hadoop. I installed the "with-hadoop" version the problem went away.
for my case
running spark job locally differs from running it on cluster. on cluster you might have a different dependency/context to follow. so essentially in your pom.xml you might have dependencies declared as provided.
when running locally, you don't need these provided dependencies. just uncomment them and rebuild again.
I encountered the same error. I wanted to install spark on my windows PC and therefore downloaded the without hadoop version of spark, but turns out you need the hadoop libraries! so download any hadoop spark version and set the environment variables.
I got this error because the file was copied from Windows.
Resolve it using
dos2unix file_name
I think you need spark-core dependency of maven. It worked fine for me.
I used:
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce
It's work for me!
I added hadoop-client-runtime-3.3.2.jar to my user library.