Failure running sparkclr-submit.cmd - apache-spark

I am trying to run a simple SparkCLR program in local debug mode using VS2012 on my Windows environment.
Here are the steps I followed:
Downloaded v1.6.100 from https://github.com/Microsoft/Mobius/releases and extracted it to my D drive, where the folder looks like this:
D:\SparkClr\spark-clr_2.10-1.6.100
Set up the following environment variables:
SPARK_HOME = D:\SparkClr\spark-clr_2.10-1.6.100\runtime
SPARKCLR_HOME = D:\SparkClr\spark-clr_2.10-1.6.100\runtime
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_92
HADOOP_HOME = D:\HadoopDirectory (winutils.exe is present in D:\HadoopDirectory\bin)
Downloaded the sparkclr NuGet package.
In order to set "CSharpBackendPortNumber" in app.config in my local VS program, I need to run in debug mode, as described at https://github.com/Microsoft/Mobius/blob/master/notes/running-mobius-app.md#debug-mode.
But when I run 'sparkclr-submit.cmd debug' from D:\SparkClr\spark-clr_2.10-1.6.100\runtime\scripts, I get the following exception:
D:\SparkClr\spark-clr_2.10-1.6.100\runtime\scripts>sparkclr-submit.cmd debug
'"D:\SparkClr\spark-clr_2.10-1.6.100\runtime\bin\load-spark-env.cmd"' is not recognized as an internal or external command,
operable program or batch file.
SPARKCLR_JAR=spark-clr_2.10-1.6.100.jar
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
at org.apache.spark.deploy.csharp.CSharpRunner.main(CSharpRunner.scala)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more
Could you please tell me whether I am missing something?
Thanks!

The SPARK_HOME environment variable should point to the Spark directory. You have it pointing to the Mobius directory.
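For example, after downloading a matching Apache Spark distribution (Mobius 1.6.100 targets the Spark 1.6 line) and extracting it, the two variables would look roughly like this; the Spark path below is only a placeholder for wherever you extract it:

set SPARK_HOME=D:\spark-1.6.1-bin-hadoop2.6
set SPARKCLR_HOME=D:\SparkClr\spark-clr_2.10-1.6.100\runtime

With SPARK_HOME pointing at an actual Spark install, bin\load-spark-env.cmd exists and Spark's assembly jar supplies scala/collection/Seq, which is exactly what the two errors above are complaining about.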

Related

Spark 2.4.6 without Hadoop: A JNI error has occurred

On my Windows machine, I am trying to use Spark 2.4.6 without Hadoop, using spark-2.4.6-bin-without-hadoop-scala-2.12.tgz.
After setting SPARK_HOME, HADOOP_HOME, and SPARK_DIST_CLASSPATH with information from the post linked here, when I try to start the spark-shell I get this error:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
The link referenced above, and many others, point to SPARK_DIST_CLASSPATH, but I already have this in my system variables as:
$HADOOP_HOME;$HADOOP_HOME\etc\hadoop\*;$HADOOP_HOME\share\hadoop\common\lib\*;$HADOOP_HOME\share\hadoop\common\*;$HADOOP_HOME\share\hadoop\hdfs\*;$HADOOP_HOME\share\hadoop\hdfs\lib\*;$HADOOP_HOME\share\hadoop\hdfs\*;$HADOOP_HOME\share\hadoop\yarn\lib\*;$HADOOP_HOME\share\hadoop\yarn\*;$HADOOP_HOME\share\hadoop\mapreduce\lib\*;$HADOOP_HOME\share\hadoop\mapreduce\*;$HADOOP_HOME\share\hadoop\tools\lib\*;
I also have this line in Spark's spark-env.sh:
export SPARK_DIST_CLASSPATH=$(C:\opt\spark\hadoop-2.7.3\bin\hadoop classpath)
HADOOP_HOME = C:\opt\spark\hadoop-2.7.3
SPARK_HOME = C:\opt\spark\spark-2.4.6-bin-without-hadoop-scala-2.12
When I tried Spark 2.4.5, which came with Hadoop, it seemed to work just fine. This tells me there is something wrong with the way I have my Hadoop set up. What am I missing here?
Thanks!
Found a solution here.
Go to your ~/.bashrc and add:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Then apply the environment using source ~/.bashrc.
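Since this question is about Windows, note that spark-env.sh is never read by the .cmd launchers; the Windows counterpart is %SPARK_HOME%\conf\spark-env.cmd. A sketch of the equivalent fix there, assuming hadoop.cmd is reachable under %HADOOP_HOME%\bin:

rem conf\spark-env.cmd: capture the output of 'hadoop classpath'
for /f "delims=" %%i in ('%HADOOP_HOME%\bin\hadoop classpath') do set SPARK_DIST_CLASSPATH=%%i

The original spark-env.sh line also fails on its own terms: $(...) command substitution is bash syntax, so that value never reaches spark-shell on Windows.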

Running spark-shell on windows results in unusable shell [duplicate]

I got the following error when starting the spark-shell. I'm going to use Spark to process data in SQL Server. Can I ignore the errors?
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
tl;dr You'd rather not.
Well, it may be possible, but given you've just started your journey into Spark land, the effort would not pay off.
Windows has never been a developer-friendly OS to me, and whenever I teach people Spark on Windows I take it for granted that we'll have to go through the winutils.exe setup, and often also through how to work on the command line.
Please install winutils.exe as follows:
Run cmd as administrator
Download the winutils.exe binary from the https://github.com/steveloughran/winutils repository (use hadoop-2.7.1 for Spark 2)
Save winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin
Set HADOOP_HOME to reflect the directory with winutils.exe (without bin), e.g. set HADOOP_HOME=c:\hadoop
Set PATH environment variable to include %HADOOP_HOME%\bin
Create c:\tmp\hive directory
Execute winutils.exe chmod -R 777 \tmp\hive
Open spark-shell and run spark.range(1).show to see a one-row dataset.
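Put together, the whole setup from an administrator Command Prompt is a handful of commands (c:\hadoop is just an example location, and spark-shell is assumed to be on PATH):

set HADOOP_HOME=c:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
mkdir c:\tmp\hive
winutils.exe chmod -R 777 \tmp\hive
spark-shell

Note that set only affects the current session; make HADOOP_HOME permanent via setx or the System Properties dialog.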

Getting this error when I try to run jar file that I created : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Appender

I am getting an exception when I try to run the jar file that I created for my Java application. I am using log4j for logging, and I created a custom log that records per cron job transactions.
I run the jar through a shell script, with the properties files placed outside the jar. The command I use is:
java -jar app.jar $1
The Java application has two properties files:
1) app.properties
2) sublog4j.properties
The sublog4j.properties file has entries like this:
log4j.appender.log=package.CustomFileAppender
log4j.appender.log.File=/serverpath/error.log
I have a gut feeling that I am getting the error because of package.CustomFileAppender. It is a Java class inside app.jar, but I don't know how to create a custom appender and use it in an external properties file.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Appender
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
at java.lang.Class.getMethod0(Class.java:2856)
at java.lang.Class.getMethod(Class.java:1668)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.Appender
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
I was getting this error because I was not building the jar file correctly. I created an executable jar file and also copied the source code in when creating the jar; that is how the issue got resolved.
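Two notes that may help others who hit this. First, java -jar ignores both the CLASSPATH variable and -cp, so unless log4j is bundled inside app.jar (or listed in its manifest Class-Path), org.apache.log4j.Appender cannot be found at runtime. An alternative to repackaging is an explicit classpath plus main class; the jar name and class name below are placeholders:

java -cp "app.jar:log4j-1.2.17.jar" com.example.Main "$1"

Second, a custom appender in log4j 1.x is just a subclass of an existing appender, so a class like the package.CustomFileAppender referenced in sublog4j.properties could look roughly like this sketch (package and extra behaviour are assumptions):

import org.apache.log4j.FileAppender;
import org.apache.log4j.spi.LoggingEvent;

// Minimal custom appender: it inherits all file handling from FileAppender,
// so log4j configures it from the properties file exactly like a plain
// FileAppender (log4j.appender.log.File=... keeps working).
public class CustomFileAppender extends FileAppender {
    @Override
    public void append(LoggingEvent event) {
        // Hook point: adjust or filter the event before it is written.
        super.append(event);
    }
}

Because properties-based configuration instantiates the appender by its fully qualified class name, the class only needs to be on the runtime classpath; keeping it inside app.jar, as described above, is fine.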

Hadoop HDFS test running issue - org.apache.hadoop.conf.Configuration NoClassDefFoundError

I'm working with Hadoop 0.21.0 and trying to run the hdfs_test application that comes alongside the C API library. After many problems, I was able to compile hdfs_test. Now when I run it:
./hdfs_test
I'm getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:153)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 1 more
Can't construct instance of class org.apache.hadoop.conf.Configuration
Oops! Failed to connect to hdfs!
Any help is appreciated. Thanks!
Like any other Java program, you need the dependencies on the classpath or inside the jar. Hadoop also has a HADOOP_CLASSPATH variable to tell the cluster where to find dependencies in map-reduce tasks. Also see How to run a Hadoop program?
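For hdfs_test specifically, that classpath rule means one concrete thing: libhdfs starts an in-process JVM and builds its classpath from the CLASSPATH environment variable, so every Hadoop jar, plus the lib/ dependencies that include commons-logging, must be listed there before launching. A sketch, assuming HADOOP_HOME points at the 0.21.0 install (the glob patterns are placeholders for wherever your distribution keeps its jars):

export CLASSPATH=$HADOOP_HOME/conf
for jar in $HADOOP_HOME/*.jar $HADOOP_HOME/lib/*.jar; do
  CLASSPATH=$CLASSPATH:$jar
done
./hdfs_test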
