Hive version compatibility with Spark - apache-spark

After various failed tries to use my Hive (1.2.1) with my Spark (Spark 1.4.1 built for Hadoop 2.2.0) I decided to try to build again Spark with Hive.
I would like to know what is the latest Hive version that can be used to build Spark at this point.
When downloading Spark 1.5 source and trying:
mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-1.2.1 -Phive-thriftserver -DskipTests clean package
I get :
The requested profile "hive-1.2.1" could not be activated because it does not exist.
Any help appreciated

Check your spark 1.5 pom.xml it contains hive 1.2.1 version therefore I don't thing you need to specify the hive version explicitly. Simply use mvn without hive version and it should work.

I'd recommend you to go through this compatibility chart :
http://hortonworks.com/wp-content/uploads/2016/03/asparagus-chart-hdp24.png

Spark website maintains good docs by version number regarding building with Hive support.
e.g. for v1.5 https://spark.apache.org/docs/1.5.0/building-spark.html
Listed example shows 2.4 but as the other answer pointed out above you can leave off the Phive-1.2.1 but according to the docs, if you do that with Spark 1.5.0 it will Build with Hive 0.13 Bindings by default.
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
Index of all versions: https://spark.apache.org/docs/
Latest version: https://spark.apache.org/docs/latest/building-spark.html
It appears that it defaults to Hive 1.2.1 bindings from Spark version 1.6.2 onwards. Default doesn't necessarily indicate support limitation though,

Related

Elasticsearch for spark 3.0

Im getting issues while using spark3.0 for reading elastic.
My elasticsearch version 7.6.0
I used elastic jar of the same version.
Please suggest a solution.
Spark 3.0.0 relies on Scala 2.12, which is not yet supported by Elasticsearch-hadoop. This and a few further issues prevent us from using Spark 3.0.0 together with Elasticsearch. If you want to compile it yourself, there is a pull-request on elasticsearch-hadoop (https://github.com/elastic/elasticsearch-hadoop/pull/1308) which should at least allow using scala 2.12. Not sure if it will fix the other issues as well.
It's officially released for spark 3.0
Enhancements:
https://www.elastic.co/guide/en/elasticsearch/hadoop/7.12/eshadoop-7.12.0.html
Maven Repository:
https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-spark-30_2.12/7.12.0
It is not official for now, but you can compile the dependency on
https://github.com/elastic/elasticsearch, the steps are
git clone https://github.com/elastic/elasticsearch.git
cd elasticsearch-hadoop/
vim ~/.bashrc
export JAVA8_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
source ~/.bashrc
./gradlew elasticsearch-spark-30:distribution --console=plain
and finally you can find .jar package in folder: "elasticsearch-hadoop\spark\sql-30\build\distributions", elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar is the es packages

java.lang.NoSuchMethodError when loading external FAT-JARs in Zeppelin

While trying to run a piece of code that used some FAT JARs (that share some common submodules) built using sbt assembly, I'm running into this nasty java.lang.NoSuchMethodError
The JAR is built on EMR itself (and not uploaded from some other environment), so version conflict in libraries / Spark / Scala etc is unlikely
My EMR environment:
Release label: emr-5.11.0
Hadoop distribution: Amazon 2.7.3
Applications: Spark 2.2.1, Zeppelin 0.7.3, Ganglia 3.7.2, Hive 2.3.2, Livy 0.4.0, Sqoop 1.4.6, Presto 0.187
Project configurations:
Scala 2.11.11
Spark 2.2.1
SBT 1.0.3
It turned out that the real culprit were the shared submodules in those jars.
Two fat jars built out of projects containing common submodules were leading to this conflict. Removing one of those jars resolved the issue.
I'm not sure if this conflict happened only under some particular circumstances or would always occur upon uploading such jars (that have same submodules) in Zeppelin interpreter, so still waiting for proper explanation.

Spark notebook in Hue 3.11

I am trying to setup Spark notebook in HUE(version 3.11) with Spark 2.0.0 using Livy 0.2.0.
With Spark 1.6.1 the notebook is working perfectly fine.
Livy only supports Scala 2.10 builds of Spark.So I did a build of Spark-2.0.0 with Scala-2.10.6.When I open up spark-shell(2.0.0) it clears says "Using Scala version 2.10.6".
But Spark notebook is not working with this build.In the Spark notebook when I do 1+1 and execute it , it gives the following error.
What could be wrong here?Below is the exception in the logs
"java.util.concurrent.ExecutionException: com.cloudera.livy.rsc.rpc.RpcException: java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.render(Lorg/json4s/JsonAST$JValue;)Lorg/json4s/JsonAST$JValue;\ncom.cloudera.livy.repl.ReplDriver$$anonfun$handle$2.apply(ReplDriver.scala:78)\ncom.cloudera.livy.repl.ReplDriver$$anonfun$handle$2.apply(ReplDriver.scala:78)\nscala.Option.map(Option.scala:145)\ncom.cloudera.livy.repl.ReplDriver.handle(ReplDriver.scala:78)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\ncom.cloudera.livy.rsc.rpc.RpcDispatcher.handleCall(RpcDispatcher.java:130)\ncom.cloudera.livy.rsc.rpc.RpcDispatcher.channelRead0(RpcDispatcher.java:77)\nio.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)\nio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)\nio.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)\nio.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)\nio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)\nio.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)\nio.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)\nio.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)\nio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)\nio.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)\nio.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)\nio.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)\nio.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\nio.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\nio.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\nio.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\nio.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\njava.lang.Thread.run(Thread.java:745)" (error 500)
This solved my problem.
Download latest Livy code from Git hub.Use the below maven build command
mvn clean package -DskipTests -Dspark-2.0 -Dscala-2.11
I'm not sure that this is even possible.
Relying on release notes, Hue 3.11 not works with Spark 2.0 (it works with Spark 1.6).

How to build latest version of Spark for a MapR version?

I am running an old MapR cluster, mapr3.
How can I build a custom distribution for Spark 1.5.x for mapr3?
What I understand is getting the right hadoop.version is the key step to make everything work.
I went back to version spark 1.3.1 and found the mapr3 profile in which it had hadoop.version=1.0.3-mapr-3.0.3. To build a complete distribution, the following command will work if you have JAVA_HOME set already:
./make-distribution.sh --name custom-spark --tgz -Dhadoop.version=1.0.3-mapr-3.0.3 -Phadoop-1 -DskipTests

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql.
Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom Spark build to the local maven repository and set the custom Spark version in the Zeppelin build command. The build process finished successfully but when I try to run basic things like sc inside notebook, it throws:
akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
Version 2.3.4 is set in pom.xml and spark/pom.xml, but simply changing them won’t even let me get a build.
If I rebuild Zeppelin with the standard -Dspark.vesion=1.4.1, everything works.
Update 2016-01
Spark 1.6 support has landed to master and is available under -Pspark-1.6 profile.
Update 2015-09
Spark 1.5 support has landed to master and is available under -Pspark-1.5 profile.
Work on supporting Spark 1.5 in Apache Zeppelin (incubating) was done under this PR apache/incubator-zeppelin#269 which will lend to master soon.
For now, building from Spark_1.5 branch with -Pspark-1.5 should do the trick.

Resources