Will Zeppelin 0.6.0 work with Spark 1.4.1? - apache-spark

I have installed Zeppelin 0.6.0 on my cluster, which has Spark 1.4.1 (HDP 2.3). As per the release notes I see that it supports Spark 1.6, but I am not sure whether it is backward compatible.
When I try to run sc.version in the notebook, I can see that a Spark job is submitted in YARN, but it fails right away with the following error in the application log:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
My SPARK_HOME path is correct, so I am zeroing in on an incompatibility issue. My configuration is:
export MASTER=yarn-client
export SPARK_YARN_JAR=/usr/hdp/current/spark-client/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
export SPARK_HOME=/usr/hdp/current/spark-client
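For reference, the ExecutorLauncher error above appears in the YARN application log; one way to pull that log, assuming the YARN CLI is available (the application id below is a placeholder for whatever YARN assigned to the notebook's job):
# list recent applications to find the id of the failed job
yarn application -list -appStates FAILED
# dump the aggregated log for that application (placeholder id)
yarn logs -applicationId application_1234567890123_0001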

I finally found the solution for this. Zeppelin 0.6.0 does work with Spark 1.4.1. The problem was caused by the interpreter configuration. The zeppelin-zeppelin-servername.log and .out files were helpful in resolving it: I had added a highcharts artifact to the Spark interpreter, and Zeppelin could not find that jar file. After providing the correct path, I was able to run highcharts and resolve this issue as well.
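A minimal sketch of that log check, assuming a default install where the logs live under the Zeppelin installation's logs directory (the path below is a placeholder, and the file names follow the zeppelin-user-hostname pattern mentioned above):
# look for interpreter start-up errors, e.g. a missing artifact jar path
tail -n 200 /path/to/zeppelin/logs/zeppelin-zeppelin-servername.log
tail -n 200 /path/to/zeppelin/logs/zeppelin-zeppelin-servername.out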

Related

Spark Error: IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

I'm a newbie with this, so please be a little patient.
I'm running a Spark job to write some data into HBase, and I get this error:
2022-06-22 12:45:22:901 | ERROR | Caused by: java.lang.IllegalAccessError:
class org.apache.hadoop.hdfs.web.HftpFileSystem
cannot access its superinterface
org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
I read Error through remote Spark Job: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem, and since I'm using Gradle instead of Maven, I tried to exclude the class org.apache.hadoop.hdfs.web.HftpFileSystem like this:
compileOnly ("org.apache.spark:spark-core_$scala_major:$spark_version"){
exclude group: "org.apache.hadoop", module: "hadoop-client"
}
Compilation works fine, but execution fails in exactly the same way.
These are my versions:
spark_version = 2.4.7
hadoop_version = 3.1.1
Everything I read is about conflicts between Spark and Hadoop, so:
How can I fix this? The only idea I have is to exclude the class from the spark-core dependency and add the right version of the Hadoop dependency.
Where can I find some reference about which versions are compatible (to set the right version of the Hadoop library)?
Could this be solved by the infra team changing something in the cluster?
I am not sure if I understood the issue correctly.
Thanks.
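One quick check that bears on the version question above, assuming the Hadoop CLI is available on a cluster node: the versions the cluster actually ships can be read off directly, and the hadoop-client dependency in build.gradle pinned to match.
# print the Hadoop build the cluster is running
hadoop version
# print the Hadoop version bundled with the YARN client
yarn version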

Apache Spark Error Could not find or load main class C:\spark\jars\aircompressor-0.8.jar

I am new to Apache Spark and recently installed it, but I got an error:
Error: Could not find or load main class C:\spark\jars\aircompressor-0.8.jar
I checked that the file is present there, and I set up the environment variables and everything else necessary to run Spark successfully.
I ran into this and solved it by updating my installation of Java from https://www.java.com/en/download/.
I upgraded to Version 8, Update 201.
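For anyone checking the same thing, the Java version that actually ends up on the PATH can be confirmed directly (the exact build string will vary by install):
java -version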

Jar conflicts between apache spark and hadoop

I am trying to set up and run a Spark cluster running on top of YARN and using HDFS.
I first set up Hadoop for HDFS using hadoop-3.1.0.
Then I configured YARN and started both.
I was able to upload data to HDFS, and YARN also seems to work fine.
Then I installed spark-2.3.0-bin-without-hadoop on my master only and tried to submit an application.
Since it is Spark without Hadoop, I had to modify spark-env.sh, adding the following line as mentioned in the documentation:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
Using only this line I got the following exception:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
This, I guess, means that it does not find the Spark libraries, so I added the Spark jars to the classpath:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath):/usr/local/spark/jars/*
But now I get the following exception:
com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.7.8
As it turns out, Hadoop 3.1.0 provides Jackson 2.7.8 while Spark 2.3.0 provides Jackson 2.6.7. As I see it, both are now on the classpath, resulting in a conflict.
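A quick way to see which Jackson jars each side contributes, assuming the install paths from this post (adjust them to your layout):
# Jackson jars pulled in via the Hadoop classpath (--glob expands the wildcard entries)
/usr/local/hadoop/bin/hadoop classpath --glob | tr ':' '\n' | grep -i jackson
# Jackson jars bundled with the Spark distribution
ls /usr/local/spark/jars | grep -i jackson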
Since it seems I really need both the Hadoop and the Spark libraries to submit anything, I do not know how to get around this problem.
In How is Hadoop-3.0.0's compatibility with older versions of Hive, Pig, Sqoop and Spark, there was an answer from @JacekLaskowski that Spark is not supported on Hadoop 3. As far as I know, nothing has changed in that area in the last 6 months.

(Zeppelin + Livy) SparkUI.appUIAddress(), something must be wrong

I'm trying to configure Livy with Zeppelin following these docs:
https://zeppelin.apache.org/docs/0.7.3/interpreter/livy.html
However when I run:
%livy.spark
sc.version
I got the following error:
java.lang.RuntimeException: No result can be extracted from 'java.lang.NoSuchMethodException: org.apache.spark.ui.SparkUI.appUIAddress()', something must be wrong
I use Zeppelin 0.7.3, Spark 2.2.1, and Livy 0.4.0.
Spark is running on YARN (Hadoop 2.9.0). This is a vanilla install; I don't use a distribution like Cloudera/HDP. All of this software runs on one server.
I can run the example org.apache.spark.examples.SparkPi in spark-shell with --master yarn without any problem, so I can confirm that Spark runs well on YARN.
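A related sanity check worth running in this situation, assuming Livy is listening on its default port 8998 on the same host (adjust the URL to whatever zeppelin.livy.url points at):
# confirm the Livy REST endpoint that Zeppelin talks to is reachable
curl http://localhost:8998/sessions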
Any help would be appreciated.
Thanks,
yusata.
This problem results from a method deprecation in Spark 2.2: appUIAddress no longer exists in Spark 2.2.
As you can see in this PR, https://github.com/apache/zeppelin/pull/2231, the issue has already been solved on the Zeppelin side.
If you somehow still encounter the problem, I think either downgrading Spark or using a newer version of Zeppelin could solve it.

How to use GraphFrames inside Spark on an HDInsight cluster

I have set up a Spark cluster on HDInsight and am trying to use GraphFrames using this tutorial.
I have already used the custom scripts during cluster creation to enable GraphX on the Spark cluster, as described here.
When I run the notebook,
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.graphframes._
I get the following error:
<console>:45: error: object graphframes is not a member of package org
import org.graphframes._
^
I tried to install graphframes from the Spark terminal via Jupyter using the following command:
$SPARK_HOME/bin/spark-shell --packages graphframes:graphframes:0.1.0-spark1.5
but I am still unable to get it working. I am new to Spark and HDInsight, so can someone please point out what else I need to install on this cluster to get this working?
Today, this works in spark-shell but doesn't work in a Jupyter notebook. So when you run this:
$SPARK_HOME/bin/spark-shell --packages graphframes:graphframes:0.1.0-spark1.5
It works (at least on a Spark 1.6 cluster version) in the context of that spark-shell session.
But in Jupyter there is currently no way to load packages. This feature is going to be added to the Jupyter notebooks in the clusters soon. In the meantime you can use spark-shell, spark-submit, etc.
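For instance, a batch job can resolve the same package at submit time; a minimal sketch, where the application class and jar names are placeholders for your own:
# --packages pulls graphframes from Maven, just as in the spark-shell example above
$SPARK_HOME/bin/spark-submit \
  --packages graphframes:graphframes:0.1.0-spark1.5 \
  --class com.example.YourGraphApp \
  your-graph-app.jar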
Once you upload or import the graphframes libraries from the Maven repository, you need to restart your cluster in order to attach the library.
That is how it worked for me.
