I've installed spark-2.3.0-bin-hadoop2.7 on Ubuntu, and I think it has some problem with the Java path. When I run "spark-submit --version", "spark-shell", or "pyspark", I get the following error:
/usr/local/spark-2.3.0-bin-hadoop2.7/bin/spark-class: line 71: /usr/lib/jvm/java-8-openjdk-amd-64/jre/bin/java: No such file or directory
It seems "/bin/java" is problematic, but I'm not sure where to change the configuration. The spark-class file has the following lines:
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
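For reference, the surrounding block in spark-class is roughly this (paraphrased from Spark 2.3, so treat it as a sketch):

if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
else
  if [ "$(command -v java)" ]; then
    RUNNER="java"    # fall back to whatever java is on the PATH
  else
    echo "JAVA_HOME is not set" >&2
    exit 1
  fi
fi

So when JAVA_HOME is set but points at a non-existent path, Spark never falls back to the working java on my PATH.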
When I try to check /etc/environment, I get:
bash: /etc/environment: Permission denied
What I now have in ~/.bashrc (opened with gedit) is:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd-64/jre
export PATH=$PATH:JAVA_HOME/bin
This is the current java setup that I have:
root@ubuntu:~# update-alternatives --config java
There is only one alternative in link group java (providing /usr/bin/java): /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
Nothing to configure.
bashrc has the following:
export PATH=$PATH:/usr/share/scala-2.11.8/bin
export SPARK_HOME=/usr/local/spark-2.3.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
Please suggest:
which files I need to change, and
how I need to change them.
Java Home
Your JAVA_HOME should be set to your JDK, not its jre subdirectory:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd-64/jre
should be
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Note the directory name as well: your update-alternatives output shows amd64 (no hyphen), while your .bashrc has amd-64, which is exactly why bin/java is not found. Your PATH line is also missing a $; it should read export PATH=$PATH:$JAVA_HOME/bin.
Here is the Oracle doc on JAVA_HOME (which should apply to OpenJDK as well):
https://docs.oracle.com/cd/E19182-01/820-7851/inst_cli_jdk_javahome_t/
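As a quick sanity check (a sketch; the path follows your update-alternatives output above):

ls -l /usr/lib/jvm/java-8-openjdk-amd64/bin/java    # must exist for spark-class to work
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -version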
Spark Environment Variables
The JAVA_HOME should also be set in the $SPARK_HOME/conf/spark-env.sh
https://spark.apache.org/docs/latest/configuration.html#environment-variables
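A minimal sketch of that, assuming the corrected JAVA_HOME from above (Spark reads spark-env.sh, not the .template file):

cp "$SPARK_HOME/conf/spark-env.sh.template" "$SPARK_HOME/conf/spark-env.sh"
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> "$SPARK_HOME/conf/spark-env.sh"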
Related
There are 2 Java versions installed on my VM, and I want to force it to use "java-1.8.0-openjdk" from a shell script.
# sudo alternatives --config java
There are 2 programs which provide 'java'.
Selection Command
-----------------------------------------------
* 1 java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64/jre/bin/java)
+ 2 java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.13.0.8-4.el8_5.x86_64/bin/java)
I tried the following:
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64/
export JAVA_HOME
export PATH=$PATH:$JAVA_HOME
java -version
Output: openjdk version "11.0.13" 2021-10-19 LTS
Help here would be really appreciated.
Suggesting to try again with this:
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64/
export JAVA_HOME
export PATH=$PATH:$JAVA_HOME/bin
java -version
The difference is in the 3rd line: /bin was added to the PATH.
If that does not work, try this (thanks to a comment from @Jeff Schaller):
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64/
export JAVA_HOME
export PATH=$PATH:$JAVA_HOME/jre/bin
java -version
In your question you specify a very exact Java version. Java versions are updated automatically whenever there is a Linux update, so pinning JAVA_HOME to an exact version string is bad practice.
The folder /usr/lib/jvm contains soft links to the current JVMs, probably /usr/lib/jvm/jre-1.8.0.
Suggesting to inspect the Java versions installed on your system:
ls -l /usr/lib/jvm
in order to set JAVA_HOME to the link for the current Java version:
JAVA_HOME=/usr/lib/jvm/jre-1.8.0
export JAVA_HOME
export PATH=$JAVA_HOME/bin:$PATH
which java
java -version
$JAVA_HOME/bin/java -version
Now change the order of the PATH entries and test again:
JAVA_HOME=/usr/lib/jvm/jre-1.8.0
export JAVA_HOME
export PATH=$PATH:$JAVA_HOME/bin
which java
java -version
$JAVA_HOME/bin/java -version
The difference is in the PATH order: the shell runs the first match it finds, so whichever java directory comes first wins. Hope you can appreciate the PATH mechanism.
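A quick way to see which entry wins (type -a is a standard bash builtin, nothing specific to this setup):

type -a java    # lists every java found on the PATH, in search order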
First, check whether the Java path from your .bashrc is being overridden by another shell script's Java path somewhere. Print $PATH and verify that the Java 11 path is not included in it; if the Java 11 path appears in $PATH before Java 8, it is being overridden by a shell script somewhere.
echo $PATH
Check whether the $JAVA_HOME variable is also set in .bash_profile, ~/.profile, or /etc/profile. (Or else grep for 'java-11-openjdk-11.0.13.0.8-4.el8_5.x86_64/bin/java' to find the shell scripts where $JAVA_HOME is configured with Java 11, and remove those settings.)
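For the grep step, something like this should work (the file list is my guess at the usual startup scripts; adjust to your system):

grep -rn 'java-11-openjdk' ~/.bash_profile ~/.profile /etc/profile /etc/profile.d/ 2>/dev/null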
Then reboot the machine and try sudo alternatives --config java again
I needed to install Hadoop in order to have Spark running on my WSL2 Ubuntu for school projects. I installed Hadoop 3.3.1 and Spark 3.2.1 following these two tutorials:
Hadoop Tutorial on Kontext.tech
Spark Tutorial on Kontext.tech
I correctly set up env variables in my .bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
export PATH=$PATH:$JAVA_HOME
export HADOOP_HOME=~/hadoop/hadoop-3.3.1
export SPARK_HOME=~/hadoop/spark-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:/usr/local/hadoop/bin/
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$SPARK_HOME/bin:$PATH
# Configure Spark to use Hadoop classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
As well as in ~/hadoop/spark-3.2.1/conf/spark-env.sh.template:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/
However, when I launch spark-shell, I get this error:
/home/adrien/hadoop/spark-3.2.1/bin/spark-class: line 71: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin//bin/java: No such file or directory
/home/adrien/hadoop/spark-3.2.1/bin/spark-class: line 96: CMD: bad array subscript
There seems to be a mess-up in a redefinition of the $PATH variable, but I can't figure out where it is. Can you help me solve it, please? I don't know Hadoop or Spark well, and I have never had to install them before.
First, certain Spark packages come with Hadoop, so you don't need to download them separately. More specifically, Spark is built against Hadoop 3.2 for now, so using the latest Hadoop version might cause its own problems.
For your problem, JAVA_HOME should not end in /bin or /bin/java. Check the linked post again...
If you used apt install for java, you shouldn't really need to set JAVA_HOME or the PATH for Java, either, as the package manager will do this for you. Or you can use https://sdkman.io
Note: Java 11 is preferred
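For example, on Ubuntu the package manager wires up /usr/bin/java through the alternatives system, and you can still recover the real JDK path if another tool needs JAVA_HOME (a sketch, not specific to your setup):

sudo apt install openjdk-11-jdk
readlink -f "$(which java)"    # resolves the alternatives symlinks to the real JDK binary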
You also need to remove .template from any config files for them to actually be used... However, JAVA_HOME is automatically detected by spark-submit, so it's completely optional in spark-env.sh
Same applies for hadoop-env.sh
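For the .template point, a minimal sketch using the paths from your question:

# Spark only reads the file without the .template suffix
mv ~/hadoop/spark-3.2.1/conf/spark-env.sh.template ~/hadoop/spark-3.2.1/conf/spark-env.sh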
Also remove /usr/local/hadoop/bin/ from your PATH since it doesn't appear you've put anything in that location
Machine: Linux Santiago
Hadoop Version: Apache hadoop 2.7.1
Mode Of Installation: Pseudo Distributed
Done: I downloaded the tar file and just extracted it
Checks: I set my JAVA_HOME in the bashrc file and checked it by echoing JAVA_HOME. It works.
Statement which caused the error: bin/hadoop namenode -format
Error: JAVA_HOME is not set
Any idea?
Note down the following things:
1) Set JAVA_HOME in hadoop-env.sh
2) For newbies: don't forget to export JAVA_HOME
Then the formatting will be successful.
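For example (the conf path is the standard one for a Hadoop 2.x tarball install; the JVM path is a placeholder, use your own):

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64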
I am trying to check my installation of Hadoop. I did create the environment variables, and when I call printenv I do see my HADOOP_HOME and PATH variables printed and correct (/home/hadoop and $HADOOP_HOME/bin respectively).
If I go to home/hadoop in the terminal and call ls, I see the hadoop file there. If I try to run it by calling hadoop, it still tells me command not found.
First day on Linux, so there may be a stupid answer to this problem.
Your current working directory is probably not part of your PATH.
That is the default on Linux systems.
If you are in the same directory as your hadoop file, run the command with a relative path, like: ./hadoop
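For example (assuming the binary really is in /home/hadoop as you describe):

cd /home/hadoop
./hadoop version    # ./ makes the shell run the file here instead of searching the PATH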
HOME DIRECTORY:
/home/hadoop is a home directory created by Linux, similar to Documents and Settings on Windows.
Open your terminal and type:
ls -l /home/hadoop
Post your result for this command: ls -l /home/hadoop
SETTING GLOBAL PATH:
Go to /home/hadoop and open .bashrc in a text editor.
Add these lines at the end:
export HADOOP_HOME=/path/to/your/hadoop/installation/folder
export PATH=$PATH:$HADOOP_HOME/bin
Save and exit. Now type this in your terminal:
echo $PATH
echo $HADOOP_HOME
If these commands show the correct directories, try the hadoop command. It should work.
Post your result for these commands: echo $PATH and echo $HADOOP_HOME
Go to the Hadoop-x.x.x/bin folder
Check that the hadoop executable is there
Run ./hadoop version
You must run the "hadoop version" command.
If the Hadoop setup is fine, then you should see the following result:
Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
For an installation-related guide, you can refer here:
Hadoop Environment Setup
Link to my quora answer https://qr.ae/TWngHN
Hope this helps.
Thanks
Enter which hadoop in your terminal. If you see a path as the output, hadoop is set in the PATH of your system. If you get something similar to this,
/usr/bin/which: no hadoop in (/usr/local/hadoop....
you might not have set up everything properly. Modify /etc/bash.bashrc with
export HADOOP_HOME=/path/to/hadoop/folder
(no spaces around the =) and add it to the PATH using export PATH=$PATH:$HADOOP_HOME/bin
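Then reload the file in your current shell and verify (a sketch):

source /etc/bash.bashrc
which hadoop    # should now print the full path to the hadoop binary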
You may be editing the wrong ~/.bashrc file.
Open a terminal, run sudo gedit ~/.bashrc, and add these commands:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Note: You must not use sudo gedit ~/.bashrc.sh; the two files work differently on newer OS versions.
I have installed CDH4 in pseudo-distributed mode on CentOS without any problems, but when I am installing it on Ubuntu 12.04 I am getting some errors with setting my JAVA_HOME environment variable.
I installed the JDK and have JAVA_HOME set correctly in /etc/profile.d and in ~/.bashrc using the following lines:
export JAVA_HOME=/usr/local/java/latest
export PATH=${JAVA_HOME}/bin:$PATH
I know that it is redundant to define it in both places, but apparently setting it in /etc/profile.d wasn't working. From my user, when I run echo $JAVA_HOME I get:
/usr/local/java/latest
With sudo, when I run sudo -E echo $JAVA_HOME, I get:
/usr/local/java/latest
If you are wondering, I am specifying the -E option for sudo to preserve my environment.
So my real problem is when I am trying to start HDFS, using the following command:
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
I get the following error:
* Starting Hadoop datanode:
Error: JAVA_HOME is not set and could not be found.
Running the same command with the -E option gives me the same result. Has anyone had this problem?
Thanks in advance.
After some research, I found the answer to my question.
I am using CDH4 and have hadoop installed in pseudo-distributed mode.
To fix my JAVA_HOME problems, I created the hadoop-env.sh file in /etc/hadoop/conf.pseudo.mr1.
The file contained the line:
export JAVA_HOME=/usr/local/java/latest
where /usr/local/java/latest is the path to my Java installation.
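A side note on the sudo -E test from the question: $JAVA_HOME there is expanded by the calling shell before sudo even runs, so it says nothing about root's environment, and service starts init scripts with a scrubbed environment anyway, which is why hadoop-env.sh is the right place for the setting. To see what root actually gets:

sudo -E sh -c 'echo $JAVA_HOME'    # single quotes defer the expansion to root's shell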