Livy and Elasticsearch-Spark: Multiple ES-Hadoop versions detected

I'm trying to read from Elasticsearch in a Livy job using the elasticsearch-spark jar. When I upload the jar to a Livy client (like the example here), I get this error and I'm not sure how to parse it.
Caused by: java.lang.RuntimeException: java.lang.Error: Multiple ES-Hadoop
versions detected in the classpath; please use only one
jar:file:/tmp/tmp7d6epaqu/__livy__/elasticsearch-spark-20_2.11-6.2.2.jar
jar:file:/tmp/rsc-tmp3492512103399411501/__livy__/elasticsearch-spark-20_2.11-6.2.2.jar
I'm not sure what the temp directories are or why it's recognizing two jars when I'm only importing the one (if I remove the dependency from my pom, it complains about JavaEsSpark not existing). What am I doing wrong, and what do I need to do to fix this?
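In setups like this, a likely cause is that the same jar reaches the Spark driver twice: once uploaded through the Livy client and once pulled in some other way (listed in the session's jars configuration, or bundled into the job jar because the pom declares it at compile scope). Below is a minimal sketch, assuming a hypothetical Livy endpoint, of shipping the connector to the session exactly once while keeping the pom dependency at provided scope:

import java.io.File
import java.net.URI

import org.apache.livy.LivyClientBuilder

object UploadEsSparkJar {
  def main(args: Array[String]): Unit = {
    // Hypothetical Livy endpoint; replace with your cluster's URL.
    val client = new LivyClientBuilder()
      .setURI(new URI("http://livy-host:8998"))
      .build()
    try {
      // Ship the ES-Spark connector to the session exactly once. If the same
      // artifact is also packaged into the job jar (compile scope in the pom)
      // or listed again in the session's jars, ES-Hadoop sees two copies and
      // throws "Multiple ES-Hadoop versions detected", so keep the pom
      // dependency at "provided" scope and rely on this uploaded copy.
      client.uploadJar(new File("elasticsearch-spark-20_2.11-6.2.2.jar")).get()
      // ... submit the job with client.submit(...) here ...
    } finally {
      client.stop(true)
    }
  }
}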

Related

Spark Error: IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

I'm a newbie with this, so please be a little patient.
I'm running a Spark job to write some data into HBase, and I get this error:
2022-06-22 12:45:22:901 | ERROR | Caused by: java.lang.IllegalAccessError:
class org.apache.hadoop.hdfs.web.HftpFileSystem
cannot access its superinterface
org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
I read Error through remote Spark Job: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem, and since I'm using Gradle instead of Maven, I tried to exclude the Hadoop module that provides org.apache.hadoop.hdfs.web.HftpFileSystem like this...
compileOnly("org.apache.spark:spark-core_$scala_major:$spark_version") {
    exclude group: "org.apache.hadoop", module: "hadoop-client"
}
Compilation works fine, but execution fails in exactly the same way.
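Before changing the build any further, it can help to check at runtime which jar actually supplies each of the two classes involved; this IllegalAccessError usually indicates they are being loaded from mismatched Hadoop jars. A small diagnostic sketch (the object name is mine), run with the same classpath the failing job uses:

import org.apache.hadoop.util.VersionInfo

object HadoopClasspathCheck {
  // Print where a class is loaded from, to spot which jar wins at runtime.
  def locate(className: String): Unit = {
    val resource = "/" + className.replace('.', '/') + ".class"
    val url = Option(getClass.getResource(resource))
    println(s"$className -> ${url.map(_.toString).getOrElse("not on classpath")}")
  }

  def main(args: Array[String]): Unit = {
    locate("org.apache.hadoop.hdfs.web.HftpFileSystem")
    locate("org.apache.hadoop.hdfs.web.TokenAspect")
    println(s"Hadoop version on classpath: ${VersionInfo.getVersion}")
  }
}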
These are my versions:
spark_version = 2.4.7
hadoop_version = 3.1.1
Everything I read is about conflicts between Spark and Hadoop.
How can I fix this? All I can think of is excluding the class from the spark-core dependency and adding the right version of the Hadoop dependency.
Where can I find a reference on which versions are compatible (to set the right version of the Hadoop libraries)?
Can this be solved by the infra team changing something on the cluster?
I am not sure if I understood the issue correctly.
Thanks.

How do I import custom libraries in Databricks notebooks?

I uploaded a jar library to my cluster in Databricks following this tutorial; however, I have been unable to import the library or use its methods from a Databricks notebook. I have been unable to find forums or documentation that address this topic, so I'm unsure if it's even possible at this point.
I am able to run the jar file as a job in Databricks, I just haven't been able to import the jar library into the Notebook to run it from there.
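For reference, once a jar has been installed on the cluster through the Libraries UI, using it from a Scala notebook cell is normally just a plain import. The package, class, and method names below are placeholders for whatever the uploaded jar actually exposes (you can list its contents with jar tf mylib.jar):

// In a Scala notebook cell, after the jar has been attached to the cluster.
// "com.example.mylib", "MyHelper" and "greet" are hypothetical names; use the
// package and classes your jar actually contains.
import com.example.mylib.MyHelper

val helper = new MyHelper()
println(helper.greet("Databricks"))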
I also tried running the jar file using the %sh magic command but received the following JNI error:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Function0
This error is generally caused by the Scala version. I would recommend upgrading the Scala version and then trying to import the custom library again.
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Function0
Refer to Apache Spark Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
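A minimal sketch of how the Scala version is typically pinned in the build so it matches the cluster's runtime; the versions below are placeholders, so check the Databricks runtime release notes for the Scala and Spark versions it actually ships:

// build.sbt (sketch): compile against the same Scala/Spark versions the
// cluster runs, and mark Spark as "provided" so the cluster's own copies
// are used at runtime instead of ones bundled into the application jar.
scalaVersion := "2.12.15"   // placeholder: match the cluster's Scala version

libraryDependencies +=
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided"   // placeholder Spark version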

Jar conflicts between apache spark and hadoop

I'm trying to set up and run a Spark cluster on top of YARN, using HDFS.
I first set up Hadoop for HDFS using hadoop-3.1.0.
Then I configured YARN and started both.
I was able to upload data to HDFS, and YARN also seems to work fine.
Then I installed spark-2.3.0-bin-without-hadoop on my master only and tried to submit an application.
Since it is Spark without Hadoop, I had to modify spark-env.sh, adding the following line as mentioned in the documentation:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
Using only this line I got the following exception:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I guess this means that it cannot find the Spark libraries. So I added the Spark jars to the classpath:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath):/usr/local/spark/jars/*
But now I get the following Exception:
com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.7.8
As it turns out, Hadoop 3.1.0 provides Jackson 2.7.8 while Spark 2.3.0 provides Jackson 2.6.7. As I see it, both are now in the classpath resulting in a conflict.
Since it seems I really need both the Hadoop and Spark libraries to submit anything, I do not know how to get around that problem.
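One way to confirm which Jackson ends up winning on the classpath (and which jar it comes from) is a quick check from spark-shell, started the same way as the failing submit; a sketch:

// Paste into spark-shell to see which jar provides Jackson at runtime.
val databindJar = Option(classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain.getCodeSource).map(_.getLocation)
println(s"jackson-databind loaded from: ${databindJar.getOrElse("unknown")}")
println(s"jackson-core version: ${com.fasterxml.jackson.core.json.PackageVersion.VERSION}")
println(s"jackson-databind version: ${com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION}")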
In How is Hadoop-3.0.0's compatibility with older versions of Hive, Pig, Sqoop and Spark there was an answer from @JacekLaskowski saying that Spark is not supported on Hadoop 3. As far as I know, nothing has changed in that area in the last 6 months.

Load liquibase classpath failed

I generated an app with JHipster, but when I run it I get the following error:
.m2\repository\org\liquibase\liquibase-core\3.5.3\liquibase-core-3.5.3.jar referenced one or more files that do not exist: .m2\repository\org\liquibase\liquibase-core\3.5.3\lib\snakeyaml-1.13.jar
The error happens in my IDE and when I run mvnw.
I can't find any solutions or workarounds.
OK, it turns out my application actually failed to start because my application-dev.yml was malformed. But the error message was really not explicit, and it made me believe the error came from the Liquibase classpath.

Will Zeppelin 0.6.0 work with Spark 1.4.1?

I have installed Zeppelin 0.6.0 on my cluster, which has Spark 1.4.1 (HDP 2.3). As per the release notes, I see that it supports Spark 1.6, but I'm not sure if it is backward compatible.
When I try to run sc.version in the notebook, I can see that the Spark job is submitted to YARN, but it fails right away with the following error in the application log: Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
My SPARK_HOME path is correct, so I'm zeroing in on an incompatibility issue.
export MASTER=yarn-client
export SPARK_YARN_JAR=/usr/hdp/current/spark-client/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
export SPARK_HOME=/usr/hdp/current/spark-client
I finally found the solution for this. Zeppelin 0.6 works with Spark 1.4.1. This was happening because of the interpreter configuration. The zeppelin-zeppelin-servername.log and .out log files were helpful in resolving it. I had added a highcharts artifact to the Spark interpreter, and it was not able to find that jar file. After providing the correct path, I was able to run highcharts and resolve this issue as well.
