spark-shell always showing old Spark version - apache-spark

I am running macOS and have two versions of Spark: 3.2.1, which I installed earlier via sdkman, and 3.2.0, which I downloaded manually. I want to switch to 3.2.0, so I updated SPARK_HOME in my .zshrc file as shown below and commented out (and also tried removing) the 3.2.1 entry, but spark-shell --version still reports 3.2.1.
I also tried the following:
I removed the old SPARK_HOME and kept only the new one.
I reloaded the .zshrc file with source ~/.zshrc.
I tried adding SPARK_HOME under .bash_profile instead.
I completely removed SPARK_HOME from everywhere, but it still shows the old version 3.2.1.
vim ~/.zshrc
export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
#export SPARK_HOME=/Users/A2001/.sdkman/candidates/spark/3.2.1
export SPARK_HOME=/Users/A2001/Downloads/spark-3.2.0-bin-hadoop3.2
export MAVEN_HOME=~/apache-maven-3.8.6
export PATH=$PATH:$SPARK_HOME/bin/:$SPARK_HOME/sbin:$MAVEN_HOME/bin
Output:
spark-shell --version
version 3.2.1
Expected o/p :
version 3.2.0
I want to be able to switch between the new and old versions frequently, depending on the use case.

Update SPARK_HOME, source the .zshrc/.bash_profile, and close/restart the terminal; it will work.
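For example, here is a minimal sketch of the switch in ~/.zshrc, using the paths from the question. Note that it prepends $SPARK_HOME/bin instead of appending it (as the question's PATH line does), so the 3.2.0 binaries shadow any 3.2.1 entry that sdkman's init or an earlier export may have left in PATH; the hash -r and which commands are just sanity checks to run in a fresh shell.
export SPARK_HOME=/Users/A2001/Downloads/spark-3.2.0-bin-hadoop3.2
export PATH="$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH"   # prepend so 3.2.0 wins over earlier entries
# then, in a new terminal (or after source ~/.zshrc):
hash -r                 # drop the shell's cached command lookups
which spark-shell       # should resolve inside spark-3.2.0-bin-hadoop3.2
spark-shell --version   # should now report 3.2.0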

Related

Can't start spark-shell on windows 10 Spark 3.2.0 install

Issue
When I try to run spark-shell I get a huge error message, which you can see here:
https://pastebin.com/8D6RGxUJ
Install
I used this tutorial, but I already had Python and Java installed. I used Spark 3.2.0 instead.
Config :
Windows 10
HADOOP_HOME : C:\hadoop
downloaded from https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.0/bin
JAVA_HOME : C:\PROGRA~2\Java\jre1.8.0_311
SPARK_HOME : C:\Spark\spark-3.2.0-bin-hadoop3.2
In PATH:
%SPARK_HOME%\bin
%HADOOP_HOME%\bin
My guess is that you have to put winutils.exe into the %SPARK_HOME%\bin folder. I discovered that after starting from scratch and following this tutorial!
Following this answer to a similar question, I downgraded from Spark 3.2.1 to 3.0.3, and this seems to have solved the problem.
I managed to solve the problem with the following configuration:
Spark: spark-3.2.1-bin-hadoop2.7
Hadoop: winutils.exe and hadoop.dll (version 2.7.7 for both)
JDK: jdk-18.0.1
And I recommend putting the environment variables under User, not System.

SPARK/pyspark - not running hive.HiveSessionStateBuilder

I have a problem running pyspark from the Windows command line, and I don't know what is causing the issue.
Spark-shell is running normally.
JAVA_HOME is set to C:\Java, where I have installed JDK java version "1.8.0_161"
SPARK_HOME is set to C:\Users\heyde\Anaconda3\Lib\site-packages\pyspark, where I have installed it through pip in Anaconda
Also I have added C:\Users\heyde\Anaconda3\Lib\site-packages\pyspark\bin and C:\Java\bin to the system PATH.
Console / spark-shell output: (screenshots from the original question not included)

Unable to find JAVA_HOME

I copied the latest Java 1.8 to a Red Hat Linux server. I ran java -version and it returned version 1.7.0_131. I updated .bashrc and the JRE to the latest version 1.8. When I run java -version it still says version 1.7.0_31. I need to know where this is being picked up from. I have checked .profile, .bashrc, and JAVA_HOME; they all point to the location where I updated to 1.8. Greatly appreciate all your help.
Type which java and you will probably see (at least after you have followed all the symlinks) that the Java executable is taken from somewhere other than your newly set JAVA_HOME. You need to create the appropriate symlinks to version 1.8 too, like this:
ln -s /your/path/to/v18/bin/java /usr/bin/java
Be aware that existing applications might use the 1.7 Java version and you might break them when you set /usr/bin/java (or whatever path the which command showed you) to the 1.8 version.
FYI, JAVA_HOME is not meant to be used by your shell to locate the java command; it is used by other software that requires Java to know where to find it.
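A rough sketch of the checks described above (the paths are illustrative, not taken from the server in question):
which java                                # e.g. /usr/bin/java
readlink -f "$(which java)"               # follow the symlink chain to the real binary
ls -l /usr/bin/java                       # see what the symlink currently points to
sudo ln -sf /your/path/to/v18/bin/java /usr/bin/java   # repoint it (see the caveat above)
java -version                             # should now report 1.8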
Do it like this once:
export JAVA_HOME=/jdk/path
If you need it permanently, add the line above to your .bashrc file or profile, then open a new terminal or run:
source ~/.bashrc

Cassandra : cqlsh not working on version 3.3 nor 2.2.5

I am new to Cassandra and am trying to set it up on Linux (kernel 2.6.18-404.el5) with 64-bit Java 8. I have tried Cassandra versions 3.3 and 2.2.5 and am getting an error:
cqlsh
File "./cqlsh.py", line 686
ssl_options=sslhandling.ssl_settings(hostname, CONFIG_FILE) if ssl else None, ^
SyntaxError: invalid syntax
It is reporting a syntax error on 'if'
You need to install Python 2.7, probably in parallel to your system installation so as not to break your Linux distribution. For CentOS 6.5 the instructions here worked like a charm for me. As CentOS is a Red Hat clone, it should work for you as well.
Install the build dependencies.
Download the Python 2.7 sources.
Configure and build Python; note that you have to make altinstall so as not to mess with your system installation of Python!
Verify the installation: try python2.7 --version; if everything worked out OK, this will print the version info of your Python 2.7 installation.
Then edit the cqlsh shell script. It contains just one line of code. At the start of that line replace python with python2.7, save the file, and you should now be able to run cqlsh.
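A hedged sketch of those steps on CentOS/RHEL (the exact Python point release, download URL, and install prefix are illustrative; the important part is make altinstall, which installs python2.7 alongside the system Python instead of replacing it):
sudo yum groupinstall -y "Development tools"          # build dependencies
sudo yum install -y zlib-devel openssl-devel
curl -O https://www.python.org/ftp/python/2.7.18/Python-2.7.18.tgz
tar xzf Python-2.7.18.tgz && cd Python-2.7.18
./configure --prefix=/usr/local
make
sudo make altinstall                                  # installs /usr/local/bin/python2.7 only
python2.7 --version                                   # verify the installation
# finally, change "python" to "python2.7" at the start of the one-line cqlsh wrapper script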

Error - Apache Cassandra 2.0.5 & Datastax OpsCenter 4.1

I am using Cassandra 2.0.5 on CentOS 6.5, and OpsCenter 4 worked fine until I updated OpsCenter to version 4.1. I access the OpsCenter page, click on "manage existing cluster", give the IP address of my node (127.0.0.1), and it gives me the following: "Error creating cluster: max() arg is an empty sequence".
Any clues?
The bug is in 4.1.0 and affects those running Python 2.6. The complete fix is 4.1.1 (http://www.datastax.com/dev/blog/opscenter-4-1-1-now-available). To work around the issue on 4.1.0, disable the auto-update feature and manually re-populate the latest definitions; this only needs to be done once, and is not needed on 4.1.1, which is the best fix. See the known issues in the release notes (http://www.datastax.com/documentation/opscenter/4.1/opsc/release_notes/opscReleaseNotes410.html).
Add the following to opscenterd.conf to disable auto-update:
[definitions]
auto_update = False
Manually download the definition files
for tarball installs:
cd ./conf/definitions
for package installs:
cd /etc/opscenter/definitions
Apply the latest definitions
curl https://opscenter.datastax.com/definitions/4.1.0/definition_files.tgz | tar xz
Restart opscenterd
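Put together for a package install, the workaround looks roughly like this (the opscenterd.conf path and the service restart command are the usual package-install defaults and may differ on your system; it also assumes opscenterd.conf has no [definitions] section yet):
sudo sh -c 'printf "\n[definitions]\nauto_update = False\n" >> /etc/opscenter/opscenterd.conf'
cd /etc/opscenter/definitions
curl https://opscenter.datastax.com/definitions/4.1.0/definition_files.tgz | sudo tar xz
sudo service opscenterd restart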
I just had the same problem today. I downloaded an older version of OpsCenter (specifically version 4.0.2) from http://rpm.datastax.com/community/noarch/ and the error is gone.
I am also using the same Cassandra version, also on CentOS.
