Spark notebook in Hue 3.11

I am trying to set up the Spark notebook in Hue (version 3.11) with Spark 2.0.0, using Livy 0.2.0.
With Spark 1.6.1 the notebook is working perfectly fine.
Livy only supports Scala 2.10 builds of Spark, so I built Spark 2.0.0 with Scala 2.10.6. When I open spark-shell (2.0.0), it clearly says "Using Scala version 2.10.6".
But the Spark notebook does not work with this build. When I enter 1+1 in the Spark notebook and execute it, I get the error below.
What could be wrong here? This is the exception from the logs:
java.util.concurrent.ExecutionException: com.cloudera.livy.rsc.rpc.RpcException: java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.render(Lorg/json4s/JsonAST$JValue;)Lorg/json4s/JsonAST$JValue;
    com.cloudera.livy.repl.ReplDriver$$anonfun$handle$2.apply(ReplDriver.scala:78)
    com.cloudera.livy.repl.ReplDriver$$anonfun$handle$2.apply(ReplDriver.scala:78)
    scala.Option.map(Option.scala:145)
    com.cloudera.livy.repl.ReplDriver.handle(ReplDriver.scala:78)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:606)
    com.cloudera.livy.rsc.rpc.RpcDispatcher.handleCall(RpcDispatcher.java:130)
    com.cloudera.livy.rsc.rpc.RpcDispatcher.channelRead0(RpcDispatcher.java:77)
    io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
    io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    java.lang.Thread.run(Thread.java:745)
(error 500)
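A NoSuchMethodError names the missing method in JVM descriptor form. Here Livy's ReplDriver expected org.json4s.jackson.JsonMethods$.render to take a JValue and return a JValue. The error means the method existed when the calling code was compiled but is absent from the class actually loaded at runtime. That suggests Livy 0.2.0 was compiled against a different json4s than the one on Spark 2.0.0's classpath.

The Python sketch below decodes such descriptors using the JVM specification's descriptor grammar. decode_method_descriptor is a hypothetical helper, not part of any Spark or Livy tooling.

```python
from typing import List, Tuple

# Primitive field types from the JVM descriptor grammar ('V' is handled separately).
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean",
}


def _parse_field(desc: str, i: int) -> Tuple[str, int]:
    """Parse one field type starting at desc[i]; return (readable type, next index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    ch = desc[i]
    if ch == "L":  # object type: L<internal/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    elif ch in PRIMITIVES:
        name = PRIMITIVES[ch]
        i += 1
    else:
        raise ValueError(f"unexpected {ch!r} at position {i}")
    return name + "[]" * dims, i


def decode_method_descriptor(desc: str) -> Tuple[List[str], str]:
    """Turn e.g. '(Lfoo/Bar;I)V' into (['foo.Bar', 'int'], 'void')."""
    if not desc.startswith("("):
        raise ValueError("a method descriptor starts with '('")
    params, i = [], 1
    while desc[i] != ")":
        name, i = _parse_field(desc, i)
        params.append(name)
    if desc[i + 1:] == "V":  # void return type
        return params, "void"
    ret, end = _parse_field(desc, i + 1)
    if end != len(desc):
        raise ValueError("unexpected characters after the return type")
    return params, ret


if __name__ == "__main__":
    print(decode_method_descriptor(
        "(Lorg/json4s/JsonAST$JValue;)Lorg/json4s/JsonAST$JValue;"))
    # prints (['org.json4s.JsonAST$JValue'], 'org.json4s.JsonAST$JValue')
```

Comparing the decoded signature with the json4s jar that Spark actually ships is one way to confirm the mismatch.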

This solved my problem.
Download the latest Livy code from GitHub and build it with this Maven command:
mvn clean package -DskipTests -Dspark-2.0 -Dscala-2.11
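The fix rebuilds Livy with the Spark 2.0 and Scala 2.11 profiles. Scala releases are binary compatible only within the same major.minor line. As a result, every Scala artifact on the classpath (Spark, Livy, connectors) should carry the same _2.10 or _2.11 suffix.

Below is a minimal sketch for spotting mixed suffixes in a list of jar file names. It assumes the usual name_<scala>-<version>.jar naming. scala_suffixes and mismatched_jars are hypothetical helpers, and the file names in the demo are made up.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Assumed naming convention: <name>_<scala major.minor>-<version>.jar,
# e.g. "spark-core_2.11-2.0.0.jar". Jars without such a suffix are ignored.
SCALA_SUFFIX = re.compile(r"(?P<name>.+?)_(?P<scala>\d+\.\d+)-[^/]*\.jar")


def scala_binary_version(full_version: str) -> str:
    """'2.10.6' -> '2.10': binary compatibility is decided by major.minor."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def scala_suffixes(jar_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Group jar file names by the Scala binary version in their suffix."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for jar in jar_names:
        m = SCALA_SUFFIX.fullmatch(jar)
        if m:
            groups[m.group("scala")].add(jar)
    return dict(groups)


def mismatched_jars(jar_names: Iterable[str], runtime_scala: str) -> List[str]:
    """Jars whose Scala suffix differs from the runtime's Scala binary version."""
    wanted = scala_binary_version(runtime_scala)
    return sorted(
        jar
        for version, jars in scala_suffixes(jar_names).items()
        if version != wanted
        for jar in jars
    )


if __name__ == "__main__":
    # Hypothetical file names mixing a Scala 2.10 Spark with a 2.11 Livy build.
    names = ["spark-core_2.10-2.0.0.jar", "livy-repl_2.11-0.2.0.jar", "guava-14.0.1.jar"]
    print(mismatched_jars(names, "2.10.6"))  # prints ['livy-repl_2.11-0.2.0.jar']
```

In practice you could pass os.listdir(os.path.join(os.environ["SPARK_HOME"], "jars")), since Spark 2.x keeps its jars in that directory. Pair it with the Scala version that spark-shell reports at startup.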

I'm not sure this is even possible.
According to the release notes, Hue 3.11 does not work with Spark 2.0 (it works with Spark 1.6).

Related

spark-cassandra connector issue

I am using Spark 1.6.2 with Scala 2.10.5.
I have installed Cassandra locally and downloaded spark-cassandra-connector_2.10-1.6.2.jar from https://spark-packages.org/package/datastax/spark-cassandra-connector
But when I try to start the Spark shell with the connector to reach Cassandra, I get this error.
Can someone tell me whether I downloaded the wrong version of the connector or whether there is some other issue?
Use : instead of _ between spark-cassandra-connector and 1.6.2, and remove the ; character after the connector version:
spark-shell --packages datastax:spark-cassandra-connector:1.6.2-s_2.10
It's better, though, to use the latest 1.6.x release, 1.6.11, instead of 1.6.2.
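spark-shell's --packages option takes a comma-separated list of Maven coordinates of the form groupId:artifactId:version. That is presumably why the underscore and the trailing ; break the lookup.

The sketch below validates a single coordinate. It also extracts the Scala binary version, assuming the -s_<scala> version suffix seen in the spark-packages coordinate above, so you can compare it with the shell's own (2.10 for Scala 2.10.5). parse_coordinate and spark_packages_scala are hypothetical helpers. The character check is deliberately conservative and is not the resolver's full grammar.

```python
import re
from typing import NamedTuple, Optional


class Coordinate(NamedTuple):
    group: str
    artifact: str
    version: str


# Conservative per-segment check (letters, digits, '.', '_', '-'); an assumption,
# not the full grammar accepted by Spark's dependency resolver.
SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def parse_coordinate(text: str) -> Coordinate:
    """Split 'group:artifact:version', rejecting stray characters such as ';'."""
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected group:artifact:version, got {text!r}")
    for part in parts:
        if not SEGMENT.fullmatch(part):
            raise ValueError(f"invalid segment {part!r} in {text!r}")
    return Coordinate(*parts)


def spark_packages_scala(version: str) -> Optional[str]:
    """Scala binary version from a '-s_2.10'-style version suffix, else None."""
    m = re.search(r"-s_(\d+\.\d+)$", version)
    return m.group(1) if m else None


if __name__ == "__main__":
    coord = parse_coordinate("datastax:spark-cassandra-connector:1.6.2-s_2.10")
    print(coord.version, spark_packages_scala(coord.version))  # prints 1.6.2-s_2.10 2.10
    try:
        parse_coordinate("datastax:spark-cassandra-connector_1.6.2;")
    except ValueError as err:
        print("rejected:", err)
```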

java.lang.NoSuchMethodError when loading external FAT-JARs in Zeppelin

While trying to run a piece of code that uses some fat JARs built with sbt assembly (which share some common submodules), I'm running into this nasty java.lang.NoSuchMethodError.
The JAR is built on EMR itself (not uploaded from some other environment), so a version conflict between libraries, Spark, Scala, etc. is unlikely.
My EMR environment:
Release label: emr-5.11.0
Hadoop distribution: Amazon 2.7.3
Applications: Spark 2.2.1, Zeppelin 0.7.3, Ganglia 3.7.2, Hive 2.3.2, Livy 0.4.0, Sqoop 1.4.6, Presto 0.187
Project configurations:
Scala 2.11.11
Spark 2.2.1
SBT 1.0.3
It turned out that the real culprits were the shared submodules in those jars.
Two fat jars built out of projects containing common submodules were leading to this conflict. Removing one of those jars resolved the issue.
I'm not sure whether this conflict happens only under particular circumstances or always occurs when such jars (with the same submodules) are loaded into the Zeppelin interpreter, so I'm still waiting for a proper explanation.
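Here is one plausible mechanism, offered as an assumption rather than the thread's confirmed explanation. When two fat jars bundle the same shared submodule, both contain identically named class files, and a standard classloader typically uses whichever copy it finds first. If the two copies were built from different revisions of the submodule, code from the other jar can call a method the loaded copy lacks, which produces NoSuchMethodError.

A quick check is to list class entries that appear in more than one jar. duplicate_classes is a hypothetical helper built on Python's zipfile module, since a JAR is a ZIP archive.

```python
import zipfile
from collections import defaultdict
from typing import Dict, Iterable, List


def duplicate_classes(jar_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map every .class entry found in more than one jar to the jars that contain it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:  # a JAR is a ZIP archive
            # set() guards against a jar listing the same entry twice
            for entry in set(jar.namelist()):
                if entry.endswith(".class"):
                    owners[entry].append(path)
    return {cls: jars for cls, jars in owners.items() if len(jars) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python dupes.py first-assembly.jar second-assembly.jar
    for cls, jars in sorted(duplicate_classes(sys.argv[1:]).items()):
        print(cls, "->", ", ".join(jars))
```

Class names are reported in path form (com/example/Foo.class). An empty result means the jars share no classes, which would point away from this mechanism.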

Spark2.2 interpreter on Zeppelin 0.7.0 running on HDP 2.6

I'm simply trying to test a Zeppelin interpreter that runs Spark 2.2 on YARN with Zeppelin 0.7.0 (HDP 2.6), but I repeatedly get:
java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
All I am running is
%spark2
sc.version
With the same Spark 2.2 I can run spark-submit jobs and spark-shell operations on YARN (locally and remotely), but I can't make Zeppelin use this new version of Spark. Does Zeppelin-HDP only support Spark 2.1 and 1.6? (My Spark 2.2 is a custom installation.)
The only thing that makes me believe this is that I can see the following in the logs when testing the Zeppelin notebook:
Added JAR file:/usr/hdp/current/zeppelin-server/interpreter/spark/zeppelin-spark_2.10-0.7.0.2.6.0.3-8.jar
which appears to be an HDP-specific Zeppelin JAR.
Please help.
Yes, you are right. I hit a similar issue while running Zeppelin 0.7.0 and Spark 2.2.0 on Mesos. In fact, have a look at this commit:
https://github.com/apache/zeppelin/commit/28310c2b95785d8b9e63bc0adc5a26df8b3c9dec
Support seems to have been added in 0.7.3, so try upgrading Zeppelin. I built Zeppelin from the master branch and it worked for me, but the v0.7.3 tag should work fine as well.
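A ClassNotFoundException means no jar visible to the interpreter provides the named class. Before or after upgrading, it can help to check which jars in a given directory contain com.sun.jersey.api.client.config.ClientConfig. The jars/ directory of the custom Spark 2.2 install is one candidate.

Below is a minimal sketch. jars_providing is a hypothetical helper, and which directory you scan is up to you.

```python
import os
import sys
import zipfile
from typing import List


def jars_providing(class_name: str, jar_dir: str) -> List[str]:
    """Return the jars under jar_dir whose entries include class_name (dotted form)."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for root, dirs, files in os.walk(jar_dir):
        dirs.sort()  # walk subdirectories in a deterministic order
        for name in sorted(files):
            if not name.endswith(".jar"):
                continue
            path = os.path.join(root, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
            except zipfile.BadZipFile:
                continue  # skip files that are named *.jar but are not ZIP archives
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python find_class.py /path/to/spark-2.2/jars
    target = "com.sun.jersey.api.client.config.ClientConfig"
    found = jars_providing(target, sys.argv[1])
    print("\n".join(found) if found else f"no jar provides {target}")
```

An empty result for the directory Zeppelin's interpreter actually loads from would be consistent with the exception in the question.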

Running R on Amazon EMR with Spark 1.6 and Zeppelin 0.5.6

I am trying to set up the R interpreter in Zeppelin, which is currently running on EMR. Zeppelin works perfectly and I am able to write scripts in Scala and Python. When I use %r, %sparkR or %knitr, I receive an error: "r interpreter not found"
The applications running on my emr-4.7.2 cluster are: Hive 1.0.0, Zeppelin-Sandbox 0.5.6, Spark 1.6.2, Pig 0.14.0
The interpreter settings make no mention of R, so I figure I am missing something, but I don't know what.
Any pointers greatly appreciated.
Zeppelin on Amazon EMR (up to at least emr-5.0.0) does not support the SparkR interpreter.
See the Zeppelin section of the Amazon EMR Release Guide for more information.

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql.
Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom Spark build to the local maven repository and set the custom Spark version in the Zeppelin build command. The build process finished successfully but when I try to run basic things like sc inside notebook, it throws:
akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
Version 2.3.4 is set in pom.xml and spark/pom.xml, but simply changing those values makes the build fail.
If I rebuild Zeppelin with the standard -Dspark.version=1.4.1, everything works.
Update 2016-01
Spark 1.6 support has landed in master and is available under the -Pspark-1.6 profile.
Update 2015-09
Spark 1.5 support has landed in master and is available under the -Pspark-1.5 profile.
Work on supporting Spark 1.5 in Apache Zeppelin (incubating) was done in PR apache/incubator-zeppelin#269, which will land in master soon.
For now, building from the Spark_1.5 branch with -Pspark-1.5 should do the trick.
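The same recipe can be expressed as a small helper that derives the Maven arguments from a Spark version string. This sketch rests on several assumptions:

- The -Pspark-<major>.<minor> profile naming is taken from the updates in this answer.
- The profile, rather than -Dspark.version alone, is presumed to align dependent versions such as Akka (2.3.4 versus 2.3.11 in the error).
- zeppelin_build_args is a hypothetical name.

Check the profiles in your checkout's pom.xml before relying on it.

```python
from typing import List


def zeppelin_build_args(spark_version: str) -> List[str]:
    """Maven arguments selecting the Zeppelin Spark profile for spark_version.

    Assumptions: profiles are named -Pspark-<major>.<minor> as in the answer,
    and the profile (not -Dspark.version alone) aligns dependent versions such
    as Akka. Check the profiles in your checkout's pom.xml.
    """
    parts = spark_version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unrecognised Spark version {spark_version!r}")
    major_minor = f"{parts[0]}.{parts[1]}"
    return [
        "mvn", "clean", "package",
        f"-Pspark-{major_minor}",            # e.g. -Pspark-1.5
        f"-Dspark.version={spark_version}",  # the exact (possibly custom) build
        "-DskipTests",
    ]


if __name__ == "__main__":
    print(" ".join(zeppelin_build_args("1.5.0-SNAPSHOT")))
    # prints mvn clean package -Pspark-1.5 -Dspark.version=1.5.0-SNAPSHOT -DskipTests
```

For a custom build, subprocess.run(zeppelin_build_args(version), check=True) from the root of a Zeppelin checkout would run the command. Here version is whatever you published to the local Maven repository.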
