java.lang.NoSuchMethodError when loading external fat JARs in Zeppelin

While trying to run code that uses several fat JARs built with sbt assembly (which share some common submodules), I'm running into this nasty java.lang.NoSuchMethodError.
The JARs are built on EMR itself (not uploaded from some other environment), so a version conflict in libraries, Spark, or Scala is unlikely.
My EMR environment:
Release label: emr-5.11.0
Hadoop distribution: Amazon 2.7.3
Applications: Spark 2.2.1, Zeppelin 0.7.3, Ganglia 3.7.2, Hive 2.3.2, Livy 0.4.0, Sqoop 1.4.6, Presto 0.187
Project configurations:
Scala 2.11.11
Spark 2.2.1
SBT 1.0.3

It turned out that the real culprits were the shared submodules in those JARs.
Two fat JARs built from projects containing common submodules were causing the conflict. Removing one of those JARs resolved the issue.
I'm not sure whether this conflict happens only under particular circumstances or always occurs when such JARs (with the same submodules) are loaded into the Zeppelin interpreter, so I'm still waiting for a proper explanation.
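One way to check for this kind of conflict is to list every location on the classpath that provides a given class: two or more entries for the same class name mean that several fat JARs carry their own copy of it. The following is a minimal Java sketch, assuming you can run it (or equivalent code) against the same class loader as the failing code. `ClassCopies` is a hypothetical name, and because the Zeppelin interpreter may use its own class loader, results from a standalone run are only indicative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists every classpath location that provides a given class.
final class ClassCopies {

    private ClassCopies() {}

    // Converts a binary class name such as "com.example.Foo" into the resource path "com/example/Foo.class".
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every URL under which the loader (and its parents) can see the class file.
    // More than one entry means several JARs bundle their own copy of that class.
    static List<URL> findCopies(ClassLoader loader, String className) {
        try {
            return Collections.list(loader.getResources(toResourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder default; pass the class named in the NoSuchMethodError as the first argument.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> copies = findCopies(loader, className);
        copies.forEach(System.out::println);
        if (copies.size() > 1) {
            System.out.println("WARNING: " + copies.size() + " copies of " + className + " are visible");
        }
    }
}
```

A class loader defines only one of those copies per class name, typically whichever it finds first, so code compiled against a different version of a shared submodule can end up calling a method that the loaded copy does not have.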

Related

Spark netty Version Mismatch on HDInsight Cluster

I am currently having an issue when running my Spark job remotely in an HDInsight cluster:
My project has a dependency on netty-all and here is what I explicitly specify for it in the pom file:
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.1.51.Final</version>
</dependency>
The final built jar includes this package with the specified version and running the Spark job on my local machine works fine. However, when I try to run it in the remote HDInsight cluster, the job throws the following exception:
java.lang.NoSuchMethodError: io.netty.handler.ssl.SslProvider.isAlpnSupported(Lio/netty/handler/ssl/SslProvider;)Z
I believe this is due to a netty version mismatch: Spark picks up the old netty version (netty-all-4.1.17) from its default system classpath on the remote cluster rather than the newer netty package bundled in the uber JAR.
I have tried different ways to resolve this issue but they don't seem to work well:
Relocating classes using the Maven Shade plugin:
More details and the issues I ran into are described here: Missing Abstract Class using Maven Shade Plugin for Relocating Classes
Spark configurations:
spark.driver.extraClassPath=<path to netty-all-4.1.50.Final.jar>
spark.executor.extraClassPath=<path to netty-all-4.1.50.Final.jar>
Are there any other solutions to this issue, or am I missing any steps here?
You will need to ensure that only Netty 4.1.50.Final or higher is on the classpath.
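To confirm which netty copy actually wins at runtime, you can ask the JVM where a class was loaded from and whether the method from the error message is present. A minimal Java sketch, assuming it is called from driver code (an executor-side check would have to run inside a task); `WhichJar` and its helpers are hypothetical names, and the reflection check only handles reference-type parameters.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a class came from and whether a method exists on it.
final class WhichJar {

    private WhichJar() {}

    // Returns the JAR or directory a class was loaded from, or a marker for JDK/bootstrap classes.
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap/JDK>";
        }
        return source.getLocation().toString();
    }

    // True if the class, as resolved by the loader, exposes a public method (static or not)
    // with the given parameter types. Parameter types must be fully qualified reference types.
    static boolean hasPublicMethod(ClassLoader loader, String className, String methodName, String... paramTypeNames) {
        try {
            Class<?> cls = Class.forName(className, false, loader);
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = Class.forName(paramTypeNames[i], false, loader);
            }
            cls.getMethod(methodName, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = WhichJar.class.getClassLoader();
        // Class and method taken from the NoSuchMethodError above; netty must be on the classpath.
        String sslProvider = "io.netty.handler.ssl.SslProvider";
        try {
            Class<?> cls = Class.forName(sslProvider, false, loader);
            System.out.println(sslProvider + " loaded from " + locationOf(cls));
        } catch (ClassNotFoundException e) {
            System.out.println(sslProvider + " is not on the classpath");
            return;
        }
        boolean present = hasPublicMethod(loader, sslProvider, "isAlpnSupported", sslProvider);
        System.out.println("isAlpnSupported(SslProvider) available: " + present);
    }
}
```

If the printed location is the cluster's netty-all-4.1.17 JAR rather than your uber JAR, the old version is shadowing the one you packaged.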

Which Spark version should I download to run on top of Hadoop 3.1.2?

On the Spark download page, we can choose between releases 3.0.0-preview and 2.4.4.
For release 3.0.0-preview, the package types are:
Pre-built for Apache Hadoop 2.7
Pre-built for Apache Hadoop 3.2 and later
Pre-built with user-provided Apache Hadoop
Source code
For release 2.4.4, the package types are:
Pre-built for Apache Hadoop 2.7
Pre-built for Apache Hadoop 2.6
Pre-built with user-provided Apache Hadoop
Pre-built with Scala 2.12 and user-provided Apache Hadoop
Source code
Since there is no "Pre-built for Apache Hadoop 3.1.2" option, should I download the "Pre-built with user-provided Apache Hadoop" package or the source code?
If you are comfortable building from source, that is your best option.
Otherwise, since you already have a Hadoop cluster, pick "user-provided" and copy the relevant core-site.xml, hive-site.xml, yarn-site.xml, and hdfs-site.xml into $SPARK_CONF_DIR; it will hopefully mostly work. The user-provided build also expects SPARK_DIST_CLASSPATH to point at your Hadoop JARs, typically by setting it to the output of hadoop classpath in conf/spark-env.sh.
Note: DataFrames don't work on Hadoop 3 until Spark 3.x (see SPARK-18673).

Oozie Spark action with Spark 2 code fails with "jackson version too old 2.4.4"

I am trying to run an Oozie Spark action that runs Spark 2.x code. I followed the steps in Hortonworks' HDP documentation; however, the Spark action fails with the error "jackson version too old 2.4.4".
The spark2 Oozie sharelib contains version 2.6.5 of the Jackson JARs, but Oozie's own oozie sharelib contains version 2.4.4.
As a result, the job sometimes runs fine but sometimes fails, citing the version mismatch or a NoSuchMethodError (again due to the mismatched JARs).
I don't want to delete the 2.4.4 JARs from Oozie's oozie sharelib, but I am wondering why these JARs are added to the classpath when the Spark action runs. Is there a way to add only the JARs from /user/oozie/share/lib//spark2 and keep any other JARs off the classpath?
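Because the failure is intermittent, it helps to see which artifacts appear on the action's classpath in more than one version. A minimal Java sketch, assuming Maven-style file names (`artifactId-version.jar`) and that the classpath is visible through the `java.class.path` system property, which may not hold inside an Oozie launcher or a YARN container that uses its own class loader; `DuplicateJars` is a hypothetical name.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds artifacts that appear on the classpath in more than one version.
final class DuplicateJars {

    // Assumes Maven-style names: <artifactId>-<version>.jar, where the version starts with a digit.
    private static final Pattern JAR_NAME = Pattern.compile("^(.+?)-(\\d[^/]*)\\.jar$");

    private DuplicateJars() {}

    // Maps each artifactId to the set of versions found, keeping only artifacts with 2+ versions.
    static Map<String, TreeSet<String>> conflicts(List<String> entries) {
        Map<String, TreeSet<String>> versions = new TreeMap<>();
        for (String entry : entries) {
            String name = new File(entry).getName();
            Matcher m = JAR_NAME.matcher(name);
            if (m.matches()) {
                versions.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        versions.values().removeIf(set -> set.size() < 2);
        return versions;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path", "");
        List<String> entries = Arrays.asList(cp.split(File.pathSeparator));
        conflicts(entries).forEach((artifact, vs) -> System.out.println(artifact + " -> " + vs));
    }
}
```

Given entries such as /a/jackson-databind-2.4.4.jar and /b/jackson-databind-2.6.5.jar, the map returned by `conflicts` contains jackson-databind mapped to [2.4.4, 2.6.5]; directory entries and non-Maven file names are ignored.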

Spark2.2 interpreter on Zeppelin 0.7.0 running on HDP 2.6

I'm simply trying to test a Zeppelin interpreter that runs Spark 2.2 on YARN with Zeppelin 0.7.0 (HDP 2.6), but I keep getting:
java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
All I am running is:
%spark2
sc.version
With the same Spark 2.2, I can run spark-submit jobs and spark-shell sessions on YARN (locally and remotely), but I can't make Zeppelin use this new version of Spark. Does HDP's Zeppelin only support Spark 2.1 and 1.6? (My Spark 2.2 is a custom installation.)
The only thing that makes me believe this is the following line, which I can see in the logs when testing the Zeppelin notebook:
Added JAR file:/usr/hdp/current/zeppelin-server/interpreter/spark/zeppelin-spark_2.10-0.7.0.2.6.0.3-8.jar
which appears to be an HDP-specific Zeppelin JAR.
Please help.
Yes, you are right. I hit a similar issue while running Zeppelin 0.7.0 and Spark 2.2.0 on Mesos. In fact, have a look at this commit:
https://github.com/apache/zeppelin/commit/28310c2b95785d8b9e63bc0adc5a26df8b3c9dec
Support seems to have been added in 0.7.3, so try upgrading Zeppelin. I built Zeppelin from the master branch and it worked for me, but the v0.7.3 tag should work as well.

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built it locally. It works great from an interactive shell like spark-shell or spark-sql.
Now I want to connect Zeppelin to my Spark 1.5 build, following this install manual. I published the custom Spark build to the local Maven repository and set the custom Spark version in the Zeppelin build command. The build finished successfully, but when I try to run basic things like sc inside a notebook, it throws:
akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
Version 2.3.4 is set in pom.xml and spark/pom.xml, but simply changing them won’t even let me get a build.
If I rebuild Zeppelin with the standard -Dspark.version=1.4.1, everything works.
Update 2016-01
Spark 1.6 support has landed in master and is available under the -Pspark-1.6 profile.
Update 2015-09
Spark 1.5 support has landed in master and is available under the -Pspark-1.5 profile.
Work on supporting Spark 1.5 in Apache Zeppelin (incubating) was done in PR apache/incubator-zeppelin#269, which will land in master soon.
For now, building from the Spark_1.5 branch with -Pspark-1.5 should do the trick.
