After installing Spark and running
C:\spark-2.3.1-bin-hadoop2.7\bin>spark-shell
I am getting the following error - any advice?
C:\spark-2.3.1-bin-hadoop2.7\bin>spark-shell
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/C:/spark-2.3.1-bin-hadoop2.7/jars/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2018-08-05 01:29:36 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programmatically, settings.usejavacp.value = true.
Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programmatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.NullPointerException
I think you don't have the right Java or Scala version.
Please note that Spark 2.3.1 runs on
Java 8+,
Python 2.7+/3.4+ and
R 3.1+.
For the Scala API, Spark 2.3.1 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
Please check the below two things:
1. Check the Java version installed on the machine from which you are submitting your Spark application:
sudo update-alternatives --config java
sudo update-alternatives --config javac
2. Check the Scala version:
scala -version
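Since the question here is on Windows, a rough equivalent check from a Command Prompt might look like this (the JDK path below is only an assumption; point JAVA_HOME at wherever a JDK 8 is actually installed):
java -version
echo %JAVA_HOME%
rem Assumed JDK 8 location - adjust to your install, then retry spark-shell
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_181
set PATH=%JAVA_HOME%\bin;%PATH%
spark-shell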
Related
I get this error from groovy -v:
Error: Could not find or load main class org.codehaus.groovy.tools.GroovyStarter
Caused by: java.lang.ClassNotFoundException: org.codehaus.groovy.tools.GroovyStarter
Java version: openjdk version "17.0.6" 2023-01-17 LTS
I got that from Microsoft: https://learn.microsoft.com/en-us/java/openjdk/install
Groovy version: 4.0.8
I got Groovy from jfrog.io:
apache-groovy-binary-4.0.8.zip 22-01-23 01:00:24
I have set JAVA_HOME and GROOVY_HOME. Both are appended to the PATH.
C:\Users\<Myusername>\dev\languages\apache-groovy-sdk-4.0.8\groovy-4.0.8\bin
Groovy was unzipped into this folder.
I was trying to start using Groovy by checking its version number.
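A quick way to sanity-check this setup from a Command Prompt (using the variable names described above; the exact values will differ per machine) is:
echo %JAVA_HOME%
echo %GROOVY_HOME%
where java
where groovy
groovy -v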
I am trying to work with spark-sql, but I get the following errors:
error: missing or invalid dependency detected while loading class file 'package.class'. Could not access term annotation in package org.apache.spark, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.) A full rebuild may help if 'package.class' was compiled against an incompatible version of org.apache.spark.
warning: Class org.apache.spark.annotation.InterfaceStability not found - continuing with a stub.
error: missing or invalid dependency detected while loading class file 'SparkSession.class'. Could not access term annotation in package org.apache.spark, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.) A full rebuild may help if 'SparkSession.class' was compiled against an incompatible version of org.apache.spark.
My configuration:
Scala 2.11.8
spark-core_2.11-2.1.0
spark-sql_2.11-2.1.0
Note: I use SparkSession.
After digging into the error message, I worked out how to solve this kind of error.
For example:
Error - Symbol 'term org.apache.spark.annotation' is missing... A full rebuild may help if 'SparkSession.class' was compiled against an incompatible version of org.apache.spark
Open SparkSession.class and search for "import org.apache.spark.annotation."; you will find import org.apache.spark.annotation.{DeveloperApi, Experimental, InterfaceStability}. Clearly these classes are missing from the classpath, so you need to find the artifact that contains them.
So open https://search.maven.org and search with c:"DeveloperApi" AND g:"org.apache.spark"; you will find that the missing artifact is spark-tags, as @Prakash answered.
In my situation, just adding the spark-catalyst and spark-tags dependencies to pom.xml works.
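The extra pom.xml entries would look roughly like this (the versions are assumed to match the 2.1.0 artifacts listed above; adjust them to your Spark version):
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-catalyst_2.11</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_2.11</artifactId>
<version>2.1.0</version>
</dependency>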
But it's weird: why doesn't Maven auto-resolve the transitive dependencies here?
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.0</version>
<scope>provided</scope>
</dependency>
If I use the above dependency, only spark-core_2.11-2.2.0.jar shows up in the Maven dependencies, while if I change the version to 2.1.0 or 2.3.0, all the transitive dependencies are there.
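You can see exactly what Maven resolved for a given version with the dependency tree goal (the includes filter just narrows the output to Spark artifacts):
mvn dependency:tree -Dincludes=org.apache.spark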
You need to include the following artifacts to avoid the dependency issues:
spark-unsafe_2.11-2.1.1
spark-tags_2.11-2.1.1
I am using Spark 1.4.1.
I can use spark-submit without a problem.
But when I ran ~/spark/bin/spark-shell
I got the error below.
I have configured SPARK_HOME and JAVA_HOME.
However, it was OK with Spark 1.2:
15/10/08 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.AssertionError: assertion failed: null
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I was having the same problem running Spark, but I found it was my fault for not configuring Scala properly.
Make sure you have Java, Scala and sbt installed and Spark is built:
Edit your .bashrc file
vim .bashrc
Set your env variables:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$JAVA_HOME/bin:$PATH
export SCALA_HOME=/usr/local/src/scala/scala-2.11.5
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/usr/local/src/apache/spark.2.0.0/spark
export PATH=$SPARK_HOME/bin:$PATH
Source your settings
. .bashrc
Check Scala:
scala -version
Make sure the REPL starts:
scala
If your REPL starts, try starting your Spark shell again:
./path/to/spark/bin/spark-shell
You should get the Spark REPL.
You could try running
spark-shell -usejavacp
It didn't work for me, but it did work for someone in the comments on Spark issue SPARK-18778.
Have you installed Scala and sbt?
The log says it didn't find the main class.
I built Spark 1.6 SNAPSHOT from sources with no issues:
$ mvn3 clean package -DskipTests
I'm running:
OS X 10.10.5.
Java 1.8
Maven 3.3.3
Spark 1.6 SNAPSHOT
Scala 2.11.7
Zinc 0.3.5.3
Hadoop 3.0 SNAPSHOT
I added the following dependency to my pom.xml (to try to resolve the warning about native libraries):
<dependency>
<groupId>com.googlecode.netlib-java</groupId>
<artifactId>netlib</artifactId>
<version>1.1</version>
</dependency>
Environment variables:
HADOOP_INSTALL=/Users/davidlaxer/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
HADOOP_CONF_DIR=/Users/davidlaxer/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop
HADOOP_OPTS=-Djava.library.path=/Users/davidlaxer/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/lib/native
CLASSPATH=/users/davidlaxer/trunk/core/src/test/java/:/Users/davidlaxer/hadoop/hadoop-dist/target/hadoop-dist-3.0.0-SNAPSHOT.jar:/Users/davidlaxer/clojure/target:/Users/davidlaxer/hadoop/lib/native:
SPARK_LIBRARY_PATH=/Users/davidlaxer/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/lib/native
When I try to launch Spark with spark-shell, I get the following error:
./spark-shell
Exception in thread "main" java.lang.IllegalStateException: Library directory '/Users/davidlaxer/spark/lib_managed/jars' does not exist.
at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:249)
at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:227)
at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:115)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:196)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:121)
at org.apache.spark.launcher.Main.main(Main.java:86)
I reverted to Spark 1.5 and didn't have the problem:
git clone git://github.com/apache/spark.git -b branch-1.5
They suggested that I update Scala, so I did:
$ scala -version
Scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL
But this error remains:
my_project $ sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading project definition from /home/alex/Documents/projects/android/my_app123/sub_project1/project
[info] Compiling 1 Scala source to /home/alex/Documents/projects/android/my_app123/sub_project1/project/target/scala-2.9.2/sbt-0.12/classes...
[error] error while loading CharSequence, class file '/usr/local/java/jdk1.8.0_05/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken
[error] (bad constant pool tag 15 at byte 1501)
[error] one error found
[error] (compile:compile) Compilation failed
Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q
So the error is the same. Also, for some reason the path contains ...target/scala-2.9.2/sbt-0.12/classes; why is it using Scala 2.9.2? I tried deleting the target directory, but it reappeared with the same scala-2.9.2.
The version of Scala installed on your system is irrelevant if you use sbt. What matters is that the following setting be present in your build.sbt file:
scalaVersion := "2.11.4"
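A minimal build.sbt with that setting might look like this (the project name and version are placeholders):
name := "my-project"

version := "0.1.0"

scalaVersion := "2.11.4"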