Installing Spark 1.4.1 in CDH 5.4.2 - apache-spark

I am very new to Spark and I want to install the latest version on my VM. Can anyone guide me on how to install Spark 1.4.1 on my Cloudera VM (version 5.4.2)? I currently have Spark 1.3.0 installed, the default that ships with CDH 5.4.2.
Thank you.

Officially, you will need to wait for Cloudera to release (and support) a newer version of Spark with CDH.
If you need a newer version of Spark before then, you can download Spark yourself and install it alongside CDH:
http://spark.apache.org/downloads.html
A separate Spark installation can still use the other CDH Hadoop services (e.g. HDFS, Hive, etc.).
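For example, here is a minimal sketch of such a side-by-side install; the download URL and the /etc/hadoop/conf path are assumptions based on the usual CDH layout, so adjust them for your VM:
wget http://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.6.tgz
tar -xzf spark-1.4.1-bin-hadoop2.6.tgz
cd spark-1.4.1-bin-hadoop2.6
# point the standalone Spark at the CDH client configs so it can reach HDFS and YARN
export HADOOP_CONF_DIR=/etc/hadoop/conf
# yarn-client is the Spark 1.x master syntax; later versions use --master yarn --deploy-mode client
./bin/spark-shell --master yarn-client
The CDH-managed Spark 1.3.0 stays untouched; you just launch jobs from the new directory instead.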

Related

How to install Apache Spark on Windows 10?

I am trying to install spark-3.3.1-bin-hadoop3 ("Pre-built for Apache Hadoop 3.3 and later"), which I downloaded from https://spark.apache.org/downloads.html, on my Windows 10 machine.
But when I search for winutils, the latest version I can find is 3.3.1, here: https://github.com/kontext-tech/winutils. There are also winutils builds for lower versions on other GitHub pages.
Since the latest version of Hadoop is 3.3.4, I don't know what to do. Do I need to install Hadoop before installing Spark? And which GitHub page is the official source for winutils?
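For reference, the usual winutils setup does not require a full Hadoop installation; Spark on Windows only needs winutils.exe on disk with HADOOP_HOME pointing at it. A rough sketch for Command Prompt (C:\hadoop is an assumed location, and the kontext-tech builds are community-maintained rather than official):
rem assumes winutils.exe (a Hadoop 3.3.x build) was copied to C:\hadoop\bin
setx HADOOP_HOME "C:\hadoop"
rem put the bin folder on PATH so Spark can find winutils.exe (open a new shell afterwards)
setx PATH "%PATH%;C:\hadoop\bin"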

Install Spark 3.* on HDP with Ambari

We need to install Spark 3.* on HDP 3.1.5, but I can't find any instructions.
I found this article:
https://community.cloudera.com/t5/Community-Articles/Steps-to-install-supplementary-Spark-on-HDP-cluster/ta-p/244199
Does it work for Spark 3? And how do I add the service to Ambari?
Any help would be appreciated.
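If it helps, the side-by-side pattern from that article looks roughly like the sketch below on HDP; the hdp.version build string here is an assumption, so take the exact value from hdp-select status on your cluster:
export HADOOP_CONF_DIR=/etc/hadoop/conf
# HDP classpaths embed ${hdp.version}; without it YARN containers may fail to start
./bin/spark-shell --master yarn \
  --driver-java-options "-Dhdp.version=3.1.5.0-152" \
  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.5.0-152
A supplementary Spark installed this way runs against the cluster without being registered as an Ambari-managed service.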

spark 3.3.0 and kafka-clients library

I have successfully used Spark 3.2.1 with kafka-clients 3.2.1.
Can kafka-clients 3.2.1 also be used with Spark 3.3.0? I see no kafka-clients 3.3.0 in Maven.
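If it helps, the usual pattern is to depend on Spark's own Kafka connector rather than pinning kafka-clients yourself; the connector's POM pulls in the kafka-clients version it was built and tested against transitively. A minimal sketch (my_streaming_app.py is a placeholder):
# the connector artifact is versioned with Spark itself, not with Kafka
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 \
  my_streaming_app.py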

Problem installing pyspark version 2.3

I have been trying to install pyspark 2.3 for the last couple of days, but so far I have only found versions 3.0.1 and 2.4.7. I am trying to run code implemented in pyspark 2.3 as part of my project, and porting it to version 3.0.1 looks difficult. Is pyspark 2.3 still available, and if so, where can I get it?
Pyspark 2.3 should still be available via conda-forge.
Check out https://anaconda.org/conda-forge/pyspark/files?version=2.3.2
There you will find these packages (and more) for direct download:
linux-64/pyspark-2.3.2-py36_1000.tar.bz2
win-64/pyspark-2.3.2-py36_1000.tar.bz2
If you don't want the raw packages, you can also install it via conda:
conda install -c conda-forge pyspark=2.3.2
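Since the packages listed above are py36 builds, it is safest to install into a Python 3.6 environment. A minimal sketch (the environment name pyspark23 is arbitrary):
conda create -n pyspark23 python=3.6
conda activate pyspark23
conda install -c conda-forge pyspark=2.3.2
# quick sanity check that the expected version is importable
python -c "import pyspark; print(pyspark.__version__)"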

Configuring CDH cluster with Python 3

We are using CDH 5.8.3 (community version) and we want to add support for Python 3.5+ to our cluster.
I know that Cloudera and Anaconda publish a parcel for Python, but that parcel ships Python 2.7.
What is the recommended way to enable Python 3+ on a CDH cluster?
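One common approach, short of waiting for an official Python 3 parcel, is to install the same Python 3 distribution at the same path on every node and point PySpark at it via environment variables. A minimal sketch, assuming Anaconda3 lives at /opt/anaconda3 on all hosts:
# the interpreter must exist at the same path on every worker node
export PYSPARK_PYTHON=/opt/anaconda3/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/anaconda3/bin/python
pyspark
The same variables can also be set cluster-wide in spark-env.sh so that every job picks them up automatically.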
