Currently, I have Spark 1.5.0 installed on AWS using the spark-ec2.sh script.
Now I want to upgrade my Spark version to 1.5.1. How do I do this? Is there an upgrade procedure, or do I have to build it from scratch using the spark-ec2 script? In that case I will lose all my existing configuration.
Please advise.
Thanks
1.5.1 has identical configuration fields to 1.5.0. I am not aware of any automation tools, but the upgrade should be trivial: copying $SPARK_HOME/conf across should suffice. Back up the old files, nevertheless.
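For example, a minimal sketch of that copy, assuming the new 1.5.1 install sits next to the old one (the paths are illustrative, not spark-ec2 defaults):

# back up the old configuration first
cp -r $SPARK_HOME/conf ~/spark-conf-backup
# copy it into the fresh 1.5.1 installation
cp -r ~/spark-conf-backup/. /path/to/spark-1.5.1/conf/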
There are some pre-installed Java libraries in Azure Databricks: https://docs.databricks.com/release-notes/runtime/6.6.html#installed-java-and-scala-libraries-scala-211-cluster-version
Is there a way to uninstall such libraries?
I have a library conflict and need to use another version of the spring-core library.
Databricks includes a number of default Java and Scala libraries. You can replace any of these libraries with another version by using a cluster-scoped init script to remove the default library jar and then install the version you require.
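A minimal sketch of such an init script, assuming spring-core is the jar being replaced (the exact file names under /databricks/jars vary by runtime version, and the DBFS path and replacement version are assumptions):

#!/bin/bash
# remove the default spring-core jar shipped with the runtime
rm -f /databricks/jars/*spring-core*.jar
# install the version you require, previously uploaded to DBFS
cp /dbfs/FileStore/jars/spring-core-5.2.0.RELEASE.jar /databricks/jars/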
Important Note: Removing default libraries and installing new versions may cause instability or completely break your Databricks cluster. You should thoroughly test any new library version in your environment before running production jobs.
Refer to: Databricks - Replace a default library jar.
I'm getting issues while using Spark 3.0 for reading from Elasticsearch.
My Elasticsearch version is 7.6.0.
I used the elasticsearch-hadoop jar of the same version.
Please suggest a solution.
Spark 3.0.0 relies on Scala 2.12, which is not yet supported by elasticsearch-hadoop. This and a few further issues prevent us from using Spark 3.0.0 together with Elasticsearch. If you want to compile it yourself, there is a pull request on elasticsearch-hadoop (https://github.com/elastic/elasticsearch-hadoop/pull/1308) which should at least allow using Scala 2.12. Not sure whether it will fix the other issues as well.
It's now officially released for Spark 3.0.
Enhancements:
https://www.elastic.co/guide/en/elasticsearch/hadoop/7.12/eshadoop-7.12.0.html
Maven Repository:
https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-spark-30_2.12/7.12.0
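For example, the released artifact can be pulled straight from Maven Central when starting a session; a minimal sketch, assuming a local Elasticsearch node (the address and port are assumptions):

spark-shell --packages org.elasticsearch:elasticsearch-spark-30_2.12:7.12.0 \
  --conf spark.es.nodes=localhost \
  --conf spark.es.port=9200

From there, reading an index via spark.read.format("es").load("index-name") should work as with the 2.x connectors.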
It is not official for now, but you can compile the dependency from
https://github.com/elastic/elasticsearch-hadoop; the steps are
git clone https://github.com/elastic/elasticsearch-hadoop.git
cd elasticsearch-hadoop/
vim ~/.bashrc   # add the following export line, then save and exit
export JAVA8_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
source ~/.bashrc
./gradlew elasticsearch-spark-30:distribution --console=plain
and finally you can find the .jar package in the folder elasticsearch-hadoop/spark/sql-30/build/distributions; elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar is the ES package.
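Once built, a sketch of wiring the snapshot jar into a job (the paths and the application jar name are placeholders):

spark-submit --jars /path/to/elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar your-app.jar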
I am using Apache Zeppelin 0.7.3 and would like to use the volume-leaflet visualization.
volume leaflet npm package info
The above npm package info states at the bottom of the page:
Compatibility
Requires Zeppelin 0.8.0-SNAPSHOT+
So the npm package apparently requires Zeppelin 0.8.0, but I can find no information on Zeppelin's web page on how to download/install 0.8. The latest available version of Zeppelin is 0.7.3. What am I missing here?
And yes, I have tried volume-leaflet with 0.7.3 but had some challenges.
Thanks in advance for any feedback.
Zeppelin 0.8 is still in development. The current documentation can be found here: https://zeppelin.apache.org/docs/0.8.0-SNAPSHOT/. I am not aware of any nightly builds, so you will need to build Zeppelin on your own; see How to build.
However, some of the Helium plugins work with older Zeppelin versions, even if they claim not to. You can try this by adding the package specification to helium.json, as sketched below. I explained that at a conference recently.
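For example, a sketch of the entry you would add to your local Helium registry file; the version, description, and icon here are assumptions, so check the npm page for the actual values, and note that the registry file location differs between Zeppelin versions:

{
  "type": "VISUALIZATION",
  "name": "volume-leaflet",
  "description": "Leaflet map visualization for Zeppelin",
  "artifact": "volume-leaflet@1.0.0",
  "license": "Apache-2.0",
  "icon": "<i class='fa fa-map'></i>"
}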
I am currently stuck on data migration: I want to migrate data from an Oracle database to Cassandra.
I have the following tools installed on Linux:
DSE 4.8
Hadoop 2.7.3
Sqoop 1.4.6
I am not sure why my Sqoop version does not have cql-import or any Cassandra-related commands.
Following are the available commands I can see in the sqoop help output:
Available commands:
codegen
create-hive-table
eval
export
help
import
import-all-tables
import-mainframe
job
list-databases
list-tables
merge
metastore
version
I have searched throughout the net and found the following links to the latest Sqoop versions, but the cql-import tool is missing in all of them.
https://www-eu.apache.org/dist/sqoop/
http://mirrors.ibiblio.org/apache/sqoop/1.4.6/
It would be very helpful if anyone has a link to a Sqoop version which supports Cassandra data-migration commands like cql-import.
Edited:
One more point to add: I have manually configured Hadoop and Sqoop.
Thanks in advance
I have a Cloudera CDH 5.11 cluster installed from RPM packages (we don't want to use Cloudera Manager or parcels). Has anyone found/built Spark 2 RPM packages for CDH? It seems Cloudera only ships Spark 2 as parcels.
You won't. For now, the doc "Spark 2 Known Issues" clearly states:
Package Install is not Supported
The Cloudera Distribution of Apache Spark 2 is only installable as a parcel.
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#ki_package_install
The best way is to use Spark on YARN instead of the Spark Master/Worker deployment. You are free to use any Spark version you like, independent of what the vendor ships.
What you need to do is package the Spark History Server yourself so you can look at jobs after they finish. And if you want to use Dynamic Allocation, you need the Spark Shuffle Service configured in YARN, as sketched below.
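A minimal sketch of the YARN side, following the standard Spark-on-YARN documentation (the aux-services value assumes the MapReduce shuffle is also in use):

<!-- yarn-site.xml on every NodeManager -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

In spark-defaults.conf, set spark.shuffle.service.enabled true and spark.dynamicAllocation.enabled true; the spark-<version>-yarn-shuffle jar also has to be on each NodeManager's classpath.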
Looks like I can't comment, so excuse this post as an answer.
Is it possible to install the Spark 2 parcel on an RPM-installed cluster using CM?
From CDH 6.0, Spark 2 is included as RPMs. Problem solved.