Apache Zeppelin 0.7.3 Helium volume-leaflet - node.js

I am using Apache Zeppelin 0.7.3 and would like to use the volume-leaflet visualization.
volume-leaflet npm package info
The above npm package info states at the bottom of the page:
Compatibility
Requires Zeppelin 0.8.0-SNAPSHOT+
So the npm package apparently requires Zeppelin 0.8.0, but I can find no information on Zeppelin's web page about how to download or install 0.8. The latest available version of Zeppelin is 0.7.3. What am I missing here?
And yes, I have tried volume-leaflet with 0.7.3 but had some challenges.
Thanks in advance for any feedback.

Zeppelin 0.8 is still in development. The current documentation can be found here: https://zeppelin.apache.org/docs/0.8.0-SNAPSHOT/. I am not aware of any nightly builds, so you will need to build Zeppelin on your own; see How to build.
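A minimal build sketch, assuming the standard Maven toolchain (profiles and flags may need adjusting for your environment):
git clone https://github.com/apache/zeppelin.git
cd zeppelin
# build the 0.8.0-SNAPSHOT distribution, skipping the tests
mvn clean package -DskipTests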
However, some Helium plugins work with older Zeppelin versions, even if they claim not to. You can try this by adding the package specification to helium.json. I explained this at a conference recently.
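As an illustration, a package specification could look roughly like this; the file location, field values, and version are assumptions, so check the Helium documentation for your Zeppelin version and the package's npm metadata:
# hypothetical example: register the npm package in Zeppelin's local Helium registry
cat > $ZEPPELIN_HOME/helium/volume-leaflet.json <<'EOF'
{
  "type": "VISUALIZATION",
  "name": "volume-leaflet",
  "description": "Leaflet map visualization",
  "artifact": "volume-leaflet@1.0.0",
  "license": "Apache-2.0"
}
EOF
After a restart, the package should show up on Zeppelin's Helium page, where you can enable it.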

Related

Uninstall a pre-installed Java library from Databricks

There are some pre-installed Java libraries in Azure Databricks: https://docs.databricks.com/release-notes/runtime/6.6.html#installed-java-and-scala-libraries-scala-211-cluster-version
Is there a way to uninstall such libraries?
I have a library conflict: I need to use another version of the spring-core library.
Databricks includes a number of default Java and Scala libraries. You can replace any of these libraries with another version by using a cluster-scoped init script to remove the default library jar and then install the version you require.
Important Note: Removing default libraries and installing new versions may cause instability or completely break your Databricks cluster. You should thoroughly test any new library version in your environment before running production jobs.
See: Databricks - Replace a default library jar.
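A rough sketch of such a cluster-scoped init script; the jar pattern and the DBFS staging path are assumptions, so verify the actual file names on your runtime before relying on it:
#!/bin/bash
# remove the spring-core jar shipped with the Databricks runtime
# (the exact jar name varies by runtime version)
rm /databricks/jars/*spring-core*.jar
# install the replacement version previously uploaded to DBFS (hypothetical path)
cp /dbfs/FileStore/jars/spring-core-5.2.8.RELEASE.jar /databricks/jars/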

Elasticsearch for Spark 3.0

I'm getting issues while using Spark 3.0 to read from Elasticsearch.
My Elasticsearch version is 7.6.0, and I used the Elasticsearch jar of the same version.
Please suggest a solution.
Spark 3.0.0 relies on Scala 2.12, which is not yet supported by elasticsearch-hadoop. This and a few further issues prevent us from using Spark 3.0.0 together with Elasticsearch. If you want to compile it yourself, there is a pull request on elasticsearch-hadoop (https://github.com/elastic/elasticsearch-hadoop/pull/1308) which should at least allow using Scala 2.12. I am not sure whether it will fix the other issues as well.
It's now officially released for Spark 3.0.
Enhancements:
https://www.elastic.co/guide/en/elasticsearch/hadoop/7.12/eshadoop-7.12.0.html
Maven Repository:
https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-spark-30_2.12/7.12.0
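With the released artifact, pulling the connector in via --packages should be all that is needed; a minimal sketch, with host and port as placeholders:
# start a Spark 3.0 shell with the Scala 2.12 connector from Maven Central
spark-shell --packages org.elasticsearch:elasticsearch-spark-30_2.12:7.12.0 \
  --conf spark.es.nodes=your-es-host \
  --conf spark.es.port=9200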
It is not official for now, but you can compile the dependency yourself from https://github.com/elastic/elasticsearch-hadoop. The steps are:
git clone https://github.com/elastic/elasticsearch-hadoop.git
cd elasticsearch-hadoop/
# the build needs a JDK 8 location; put this export in ~/.bashrc (adjust the path to your JDK) and source it
export JAVA8_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
./gradlew elasticsearch-spark-30:distribution --console=plain
Finally, you can find the .jar package in the folder elasticsearch-hadoop/spark/sql-30/build/distributions; elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar is the ES package.
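The locally built jar can then be handed to Spark directly, for example:
# hypothetical invocation using the snapshot jar built above
spark-shell --jars elasticsearch-hadoop/spark/sql-30/build/distributions/elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar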

Installing Spark 2 on CDH 5.* with RPM?

I have a Cloudera CDH 5.11 cluster installed from RPM packages (we don't want to use Cloudera Manager or parcels). Has anyone found/built Spark 2 RPM packages for CDH? It seems Cloudera only ships Spark 2 as parcels.
You won't find any. For now, the doc "Spark 2 Known Issues" clearly states:
Package Install is not Supported
The Cloudera Distribution of Apache Spark 2 is only installable as a parcel.
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#ki_package_install
The best way is to use Spark on YARN instead of the standalone Spark Master/Worker setup. You are then free to use any Spark version you like, independent of what the vendor ships.
What you need to do is package the Spark History Server, so you can look at jobs after they finish. And if you want to use Dynamic Allocation, you need the Spark Shuffle Service configured in YARN.
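The usual wiring looks roughly like this; these are the standard Spark and YARN property names, with example values:
# spark-defaults.conf
spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true
# yarn-site.xml on every NodeManager (the spark-<version>-yarn-shuffle.jar must be on the NodeManager classpath)
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>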
Looks like I can't comment, so excuse this post as an answer.
Is it possible to install the Spark 2 parcel on an RPM-installed cluster using Cloudera Manager?
As of CDH 6.0, Spark 2 is included in the RPMs. Problem solved.

How to upgrade Apache Spark version

Currently, I have Spark 1.5.0 installed on AWS, set up using the spark-ec2.sh script.
Now I want to upgrade my Spark version to 1.5.1. How do I do this? Is there an upgrade procedure, or do I have to build it from scratch using the spark-ec2 script? In that case I will lose all my existing configuration.
Please advise.
Thanks
1.5.1 has configuration fields identical to 1.5.0, and I am not aware of any automation tools, but the upgrade should be trivial: copying over $SPARK_HOME/conf should suffice. Back up the old files nevertheless.
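In shell terms, the upgrade could look roughly like this; paths and the Hadoop build are examples, so pick the package matching your cluster:
# back up the old configuration
cp -r $SPARK_HOME/conf ~/spark-1.5.0-conf-backup
# fetch and unpack the 1.5.1 release
wget https://archive.apache.org/dist/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
tar xzf spark-1.5.1-bin-hadoop2.6.tgz
# carry the configuration over and repoint SPARK_HOME
cp $SPARK_HOME/conf/* spark-1.5.1-bin-hadoop2.6/conf/
export SPARK_HOME=$PWD/spark-1.5.1-bin-hadoop2.6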

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql.
Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom Spark build to the local Maven repository and set the custom Spark version in the Zeppelin build command. The build process finished successfully, but when I try to run basic things like sc inside a notebook, it throws:
akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
Version 2.3.4 is set in pom.xml and spark/pom.xml, but simply changing them does not even give me a successful build.
If I rebuild Zeppelin with the standard -Dspark.version=1.4.1, everything works.
Update 2016-01
Spark 1.6 support has landed in master and is available under the -Pspark-1.6 profile.
Update 2015-09
Spark 1.5 support has landed in master and is available under the -Pspark-1.5 profile.
Work on supporting Spark 1.5 in Apache Zeppelin (incubating) was done under the PR apache/incubator-zeppelin#269, which will land in master soon.
For now, building from the Spark_1.5 branch with -Pspark-1.5 should do the trick.
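A minimal sketch, assuming a standard Maven setup (add the build profiles your deployment needs):
git clone https://github.com/apache/incubator-zeppelin.git
cd incubator-zeppelin
git checkout Spark_1.5
# build against the Spark 1.5 profile, skipping the tests
mvn clean package -Pspark-1.5 -DskipTests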
