Running spark-shell on Windows results in unusable shell [duplicate] - apache-spark

I got the following error when starting the spark-shell. I'm going to use Spark to process data in SQL Server. Can I ignore the errors?
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: (null) entry in command string: null ls -F C:\tmp\hive

tl;dr You'd rather not.
Well, it may be possible, but given that you've just started your journey into Spark, the effort would not pay off.
Windows has never been a developer-friendly OS to me, and whenever I teach Spark to people who use Windows, I take it for granted that we'll have to go through the winutils.exe setup, and often also cover how to work on the command line.
Please install winutils.exe as follows:
Run cmd as administrator
Download the winutils.exe binary from the https://github.com/steveloughran/winutils repository (use hadoop-2.7.1 for Spark 2)
Save the winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin
Set HADOOP_HOME to the parent of the bin directory holding winutils.exe, e.g. set HADOOP_HOME=c:\hadoop
Add %HADOOP_HOME%\bin to the PATH environment variable
Create the c:\tmp\hive directory
Execute winutils.exe chmod -R 777 \tmp\hive
Open spark-shell and run spark.range(1).show to see a one-row dataset.
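Putting the steps above together, a minimal cmd session might look like this (c:\hadoop is only an example location; copy the downloaded winutils.exe into c:\hadoop\bin before the chmod step):
mkdir c:\hadoop\bin
rem copy the downloaded winutils.exe into c:\hadoop\bin here
set HADOOP_HOME=c:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
mkdir c:\tmp\hive
winutils.exe chmod -R 777 \tmp\hive
spark-shell
Note that set only lasts for the current cmd session; use setx (or the System Properties dialog) to make the HADOOP_HOME and PATH changes permanent.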

Related

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException

I've set up a 3-node Hadoop cluster (1 master and 2 workers) with YARN, along with Spark.
My PySpark scripts need org.elasticsearch.spark in order to write to Elasticsearch. I'm providing this with the parameter --packages org.elasticsearch:elasticsearch-spark-30_2.12:8.4.1 when executing my PySpark script with spark-submit.
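For reference, the submit command looks roughly like this (the script name here is a placeholder):
spark-submit \
  --master yarn \
  --packages org.elasticsearch:elasticsearch-spark-30_2.12:8.4.1 \
  my_pyspark_script.py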
I'm stuck with this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/javax/ws/rs/core/NoContentException
at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:60)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:200)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:191)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1327)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 13 more
What I have tried:
I tried adding all the paths listed in this answer - https://stackoverflow.com/a/25393369/6490744 - but it doesn't work.
I had Hadoop 3.1.1; after checking https://github.com/apache/incubator-kyuubi/issues/2904 (they mention that the issue is resolved in Hadoop 3.3.3), I upgraded to 3.3.3. But the issue still persists.
I have also tried manually downloading the jar into my spark/jars directory using wget -U "Any User Agent" https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-spark-30_2.12/8.4.1/elasticsearch-spark-30_2.12-8.4.1.jar, then running spark-submit without passing --packages (since the jar is already on the path).
All of this gives me the same error.
After 2 hours of struggle, I got the clue from https://github.com/apache/incubator-kyuubi/issues/2904#issuecomment-1158643036:
I had yarn.timeline-service.enabled set to true in my /etc/hadoop/yarn-site.xml; after updating it to false, the error is gone.
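For anyone hitting the same issue, the relevant block in /etc/hadoop/yarn-site.xml looks like this after the change:
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>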
Now I wonder how to set up the YARN Timeline Server.

Accumulo not getting initialised

I am trying to initialise Accumulo, which I am configuring on hadoop2.0.0-cdh4.4.0.
I am installing from tarballs on a MacBook.
I get the following error when initialising Accumulo with bin/accumulo init:
java.io.IOException: Mkdirs failed to create /accumulo/instance_id
The log says:
2014-05-24 01:24:33,935 [util.Initialize] FATAL: Failed to initialize filesystem
java.io.IOException: Mkdirs failed to create /accumulo/instance_id
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:447)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:867)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:829)
at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1129)
at org.apache.accumulo.server.util.Initialize.initFileSystem(Initialize.java:269)
at org.apache.accumulo.server.util.Initialize.initialize(Initialize.java:213)
at org.apache.accumulo.server.util.Initialize.doInit(Initialize.java:199)
at org.apache.accumulo.server.util.Initialize.main(Initialize.java:545)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.accumulo.start.Main$1.run(Main.java:103)
at java.lang.Thread.run(Thread.java:744)
2014-05-24 01:24:33,937 [conf.Configuration] WARN : fs.default.name is deprecated. Instead, use fs.defaultFS
2014-05-24 01:24:33,937 [util.Initialize] FATAL: Default filesystem value ('fs.defaultFS' or 'fs.default.name') was found in the Hadoop configuration
2014-05-24 01:24:33,938 [util.Initialize] FATAL: Please ensure that the Hadoop core-site.xml is on the classpath using 'general.classpaths' in accumulo-site.xml
Please advise; I already tried to fix this by creating /accumulo and /user/accumulo on HDFS and giving them 777 permissions as well.
The root cause is that the Hadoop jars and configuration are not being placed on Accumulo's classpath. I'm not familiar with how Cloudera packages their Hadoop artifacts.
If you notice in your stack trace, it lists out the ChecksumFileSystem class instead of the DistributedFileSystem. This means that Accumulo doesn't know about the HDFS instance you're trying to write to and is falling back to using the local file system (that's what the ChecksumFileSystem is doing).
To fix this, check a couple of things in your Accumulo configuration files. First, make sure that you have correctly defined HADOOP_PREFIX and HADOOP_CONF_DIR in accumulo-env.sh. Second, make sure that the paths you have configured for general.classpaths in accumulo-site.xml all exist, specifically the ones that reference HADOOP_PREFIX and HADOOP_CONF_DIR.
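As a sketch, the relevant lines in accumulo-env.sh would look something like this (the paths are examples; point them at wherever your CDH tarballs put Hadoop and its configuration):
export HADOOP_PREFIX=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf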

Pig Cassandra cluster ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit

I am trying to run Cassandra 0.8.5, Hadoop 0.2.0 and Pig 0.8.1. I am running a very simple Pig script:
rows = LOAD 'cassandra://pygmalion/$CF' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
counted = foreach (group rows all) generate COUNT($1);
dump counted;
It works well when I run in local mode, but when I run in MapReduce mode I always get the error message below. I have run out of ideas; any help or hint will be greatly appreciated.
pig_813399.log:
Backend error message
java.io.IOException: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:225)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:349)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:611)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.ClassNotFoundException: org.apache.c
Pig Stack Trace
ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias counted. Backend error : Unable to recreate exception from backed error: java.io.IOException: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit
at org.apache.pig.PigServer.openIterator(PigServer.java:753)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:615)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.loadScript(GruntParser.java:477)
at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:422)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:692)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:425)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
at org.apache.pig.Main.run(Main.java:455)
at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: java.lang.ClassNotFoundException: org.apache.cassandra.hadoop.ColumnFamilySplit
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:382)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1209)
at org.apache.pig.PigServer.storeEx(PigServer.java:885)
at org.apache.pig.PigServer.store(PigServer.java:827)
at org.apache.pig.PigServer.openIterator(PigServer.java:739)
It sounds like you need to update the HADOOP_CLASSPATH environment variable to include the Cassandra jars. You will need to restart the Hadoop processes after you do:
export HADOOP_CLASSPATH=/path/to/cassandra/lib/*:$HADOOP_CLASSPATH
See http://wiki.apache.org/cassandra/HadoopSupport for other possible problems with running Hadoop/Pig against Cassandra.
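A possible alternative, if restarting the Hadoop processes is inconvenient: Pig can ship jars with the job itself via REGISTER at the top of the script (the path below is an example; point it at your Cassandra lib directory):
REGISTER /path/to/cassandra/lib/apache-cassandra-0.8.5.jar;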
