Cannot start spark-shell - apache-spark

I am using Spark 1.4.1.
I can use spark-submit without any problem, but when I run ~/spark/bin/spark-shell I get the error below.
I have configured SPARK_HOME and JAVA_HOME. However, it worked fine with Spark 1.2.
15/10/08 02:40:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.AssertionError: assertion failed: null
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I was having the same problem running Spark, but I found it was my fault for not configuring Scala properly.
Make sure you have Java, Scala, and sbt installed and that Spark is built:
Edit your .bashrc file
vim .bashrc
Set your env variables:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$JAVA_HOME/bin:$PATH
export SCALA_HOME=/usr/local/src/scala/scala-2.11.5
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/usr/local/src/apache/spark.2.0.0/spark
export PATH=$SPARK_HOME/bin:$PATH
Source your settings
. .bashrc
Check Scala:
scala -version
Make sure the REPL starts:
scala
If your REPL starts, try starting your Spark shell again:
./path/to/spark/bin/spark-shell
You should get the Spark REPL.
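If the shell still fails after that, a quick sanity check of what is actually on the PATH can help. This is a minimal sketch; the expected output lines are only illustrative:
# Confirm the variables point where you expect
echo "$JAVA_HOME" "$SCALA_HOME" "$SPARK_HOME"
# Confirm which binaries are actually being picked up
which java scala spark-shell
java -version
scala -version
# Inside the Spark REPL, the SparkContext reports the Spark version:
# scala> sc.version
# res0: String = 1.4.1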

You could try running
spark-shell -usejavacp
It didn't work for me, but it did work for someone in the comments on Spark issue 18778.
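If passing the flag directly doesn't reach the REPL, another form that is sometimes suggested is setting it as a JVM system property on the driver. This is only a sketch, assuming your Spark version honours the scala.usejavacp property:
spark-shell --conf "spark.driver.extraJavaOptions=-Dscala.usejavacp=true"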

Have you installed Scala and sbt?
The log says the compiler could not find scala.runtime.

Related

Cassandra with PySpark and Python >=3.6

I'm new to Cassandra and PySpark. Initially I installed Cassandra 3.11.1, OpenJDK 1.8, PySpark 3.x, and Scala 1.12. After running my Python server I was getting a lot of errors, as shown below.
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o33.load.
: java.lang.NoClassDefFoundError: scala/Product$class
at com.datastax.spark.connector.util.ConfigParameter.<init>(ConfigParameter.scala:7)
at com.datastax.spark.connector.rdd.ReadConf$.<init>(ReadConf.scala:33)
at com.datastax.spark.connector.rdd.ReadConf$.<clinit>(ReadConf.scala)
at org.apache.spark.sql.cassandra.DefaultSource$.<init>(DefaultSource.scala:134)
at org.apache.spark.sql.cassandra.DefaultSource$.<clinit>(DefaultSource.scala)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 23 more
I didn't know exactly what this error was, but after some research I realized that the PySpark Cassandra connection was having some issues, so I also checked the versions. During my research I saw that Cassandra versions other than 4.x are not compatible with Python 3.9. I uninstalled Cassandra and tried to install the Cassandra 4 distribution, but that throws another set of errors after running the command:
wget http://mirror.cogentco.com/pub/apache/cassandra/4.0-beta2/apache-cassandra-4.0-beta2-bin.tar.gz
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cassandra : Depends: python3 (>= 3.6) but 3.5.1-3 is to be installed
Recommends: ntp but it is not going to be installed or
time-daemon
E: Unable to correct problems, you have held broken packages.
Can someone help me understand the issue? How can I install Cassandra and PySpark along with Python 3.9? Is there any version incompatibility here?
Update based on the answer:
I have updated my versions on another machine.
Currently, I'm using the following versions: PySpark 3.0.1, Cassandra 4.0, cqlsh 5.0.1, Python 3.6, Scala 2.12.
I tried using connector versions 3.0.0 and 3.1.0; both give me errors:
UNRESOLVED DEPENDENCY: com.datastax.spark#spark-cassandra-connector_2.12;3.0.0: not found
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.datastax.spark#spark-cassandra-connector_2.12;3.0.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1389)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:308)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
.......
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
The connection string used was --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 --conf spark.cassandra.connection.host=127.0.0.1 pyspark-shell, as the PySpark version is now 3.0.1.
You are using the wrong version of the Cassandra connector: if you are using PySpark 3.x, then you need the corresponding connector version, 3.0 or 3.1. The version you specified is for old versions of Spark:
pyspark --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0
P.S. Cassandra 4.0 has also been released already, so it makes no sense to use beta2.
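Once the matching connector is on the classpath, a read in PySpark looks roughly like the following. This is only a sketch: the keyspace and table names (test_ks, test_table) are placeholders, and it assumes no Spark session is already running so the package config takes effect.
from pyspark.sql import SparkSession

# Start a session with the connector package and the Cassandra host configured.
spark = (SparkSession.builder
         .appName("cassandra-read-example")
         .config("spark.jars.packages",
                 "com.datastax.spark:spark-cassandra-connector_2.12:3.1.0")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

# Read a table through the connector's data source.
df = (spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(table="test_table", keyspace="test_ks")
      .load())
df.show()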

Can't do dataframe show() or use sql :Zeppelin

I'm getting a dependency issue every time I call df.show() or run an SQL query in the Spark and PySpark interpreters in Zeppelin (0.9.0). This is the error that I get:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2550)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2764)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
at org.apache.spark.sql.Dataset.show(Dataset.scala:751)
at org.apache.spark.sql.Dataset.show(Dataset.scala:710)
at org.apache.spark.sql.Dataset.show(Dataset.scala:719)
... 45 elided
I looked into this and tried all the changes suggested in previously asked questions about the same issue.
Here is the list of things that I tried:
Excluding the jackson-databind jar in the Spark interpreter
Deleting the jackson-databind jar from the Zeppelin lib
Deleting the jackson-databind, jackson-annotations, and jackson-core jars from the Zeppelin lib
Replacing the jar with the conflicting version in Spark, and vice versa
Changing the Spark version: 2.4.0, 2.4.4, 2.4.5
Building the project from source
I still couldn't resolve this issue. Does anyone have an idea regarding how to go about this?

How to use spark-avro package to read avro file from spark-shell?

I'm trying to use the spark-avro package as described in Apache Avro Data Source Guide.
When I submit the following command:
val df = spark.read.format("avro").load("~/foo.avro")
I get an error:
java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 49 elided
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.FileFormat.$init$(Lorg/apache/spark/sql/execution/datasources/FileFormat;)V
at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 62 more
I've tried different versions of the org.apache.spark:spark-avro_2.12 package (2.4.0, 2.4.1, and 2.4.2), and I currently use Spark 2.4.1, but none of them worked.
I start my spark-shell with the following command:
spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0
tl;dr Spark 2.4.x provides built-in support for reading and writing Apache Avro data, but the spark-avro module is external and is not included in spark-submit or spark-shell by default, so you have to make sure that spark-shell and the --packages dependency use the same Scala version (e.g. 2.12).
The reason for the exception is that your spark-shell comes from a Spark distribution built against Scala 2.11.12, while --packages specifies a dependency built for Scala 2.12 (org.apache.spark:spark-avro_2.12:2.4.0).
Use --packages org.apache.spark:spark-avro_2.11:2.4.0 and you should be fine.
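To check which Scala version your Spark distribution was built with, and therefore whether the _2.11 or _2.12 artifact matches, the version banner is enough; the output line below is only an illustration:
spark-submit --version
# prints, among other things, a line like:
#   Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, ...
Inside spark-shell itself, scala.util.Properties.versionString reports the same Scala version.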
Just in case someone is interested: for pyspark 2.7 and Spark 2.4.3, the package below works:
bin/pyspark --packages org.apache.spark:spark-avro_2.11:2.4.3
One more thing I noticed when I had the same issue: it runs fine the first time and shows the error thereafter. So clear the cache by adding an rm command to the Dockerfile; that was sufficient in my case.
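The answer doesn't say which cache it removed. If it is the dependency cache, packages pulled in with --packages are normally resolved into the Ivy cache under ~/.ivy2, so a Dockerfile step along these lines is one plausible reading (the paths are an assumption, not something stated in the answer):
# Dockerfile: drop cached Ivy artifacts so --packages resolves fresh
RUN rm -rf /root/.ivy2/cache /root/.ivy2/jars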

How to use Apache Spark in Arch?

I want to resolve the error I get when I run spark-shell.
I am on Arch, installing Apache Spark from the AUR.
When I run the command spark-shell I get the following error:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I also installed slf4j from the AUR, but it does not help either. How can I run Spark on Arch?
It seems like you have a problem regarding Java which I can't help you with...
Did you do the correct configuration for Spark?
cd /etc/profile.d
sudo nano apache-spark.sh
Change SPARK_HOME=/opt/apache-spark to SPARK_HOME=/opt/apache-spark/bin
That worked for me.
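For context, /etc/profile.d/apache-spark.sh is just a shell fragment sourced at login. A minimal sketch of such a file is below; the path is an assumption (wherever the AUR package installs Spark), and whichever directory you point SPARK_HOME at, the part that matters for launching spark-shell is having its bin directory on the PATH:
# /etc/profile.d/apache-spark.sh (sketch; adjust the install path)
export SPARK_HOME=/opt/apache-spark
export PATH=$PATH:$SPARK_HOME/bin
Open a new login shell (or source the file) so the variables take effect.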

Javafx error on Ubuntu while using netbeans

When trying to run a basic JavaFX application on Ubuntu Linux, I am seeing the following error. The error appears whether I use the command line or NetBeans to run the application.
Exception in thread "main" java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: Can't load library: /home/venkat/.m2/repository/com/oracle/javafx/javafx/2.1.0-beta/i386/libglass.so
at com.sun.javafx.tk.quantum.QuantumToolkit.startup(QuantumToolkit.java:277)
at com.sun.javafx.application.PlatformImpl.startup(PlatformImpl.java:90)
at com.sun.javafx.application.LauncherImpl.launchApplication1(LauncherImpl.java:163)
at com.sun.javafx.application.LauncherImpl.access$000(LauncherImpl.java:47)
at com.sun.javafx.application.LauncherImpl$1.run(LauncherImpl.java:115)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.UnsatisfiedLinkError: Can't load library: /home/venkat/.m2/repository/com/oracle/javafx/javafx/2.1.0-beta/i386/libglass.so
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1828)
at java.lang.Runtime.load0(Runtime.java:792)
at java.lang.System.load(System.java:1059)
at com.sun.glass.utils.NativeLibLoader.loadLibraryFullPath(NativeLibLoader.java:143)
at com.sun.glass.utils.NativeLibLoader.loadLibraryInternal(NativeLibLoader.java:56)
at com.sun.glass.utils.NativeLibLoader.loadLibrary(NativeLibLoader.java:31)
at com.sun.glass.ui.Application$1.run(Application.java:75)
at java.security.AccessController.doPrivileged(Native Method)
at com.sun.glass.ui.Application.loadNativeLibrary(Application.java:73)
at com.sun.glass.ui.Application.loadNativeLibrary(Application.java:85)
at com.sun.glass.ui.gtk.GtkPlatformFactory.<clinit>(GtkPlatformFactory.java:23)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at com.sun.glass.ui.PlatformFactory.getPlatformFactory(PlatformFactory.java:20)
at com.sun.glass.ui.Application.Run(Application.java:108)
at com.sun.javafx.tk.quantum.QuantumToolkit.startup(QuantumToolkit.java:267)
... 5 more
This exception is caused because the JavaFX native libraries cannot be found on the library path.
NetBeans
You can solve the problem by adding a JVM argument to the run profile:
-Djava.library.path=/home/venkat/Programs/javafx/2.1.0-beta/rt/lib/i386/
Command line
If you face the same problem while trying to run the app on the command line, the following exports should fix it:
export JAVAFX_HOME=/home/venkat/Programs/javafx/2.1.0-beta
export CLASSPATH=$JAVAFX_HOME/rt/lib/jfxrt.jar
export LD_LIBRARY_PATH=/home/venkat/Programs/javafx/2.1.0-beta/rt/lib/i386/
The first two environment variables above fix a classpath problem that appears when the JavaFX runtime cannot be found.
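Putting the pieces together, a command-line launch after those exports might look like this; the main class name and the target/classes output directory are purely hypothetical placeholders:
# jfxrt.jar comes from CLASSPATH, the native libs from LD_LIBRARY_PATH
java -cp "$CLASSPATH:target/classes" com.example.HelloFX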
