Machine: Linux Santiago
Hadoop Version: Apache Hadoop 2.7.1
Mode of Installation: Pseudo-distributed
Done: I downloaded the tar file and just extracted it.
Checks: I set JAVA_HOME in my .bashrc file and verified it by echoing JAVA_HOME. It works.
Statement which caused the error: bin/hadoop namenode -format
Error: JAVA_HOME is not set
Any idea?
Note the following:
1) Set JAVA_HOME in hadoop-env.sh.
2) For newbies: don't forget to export JAVA_HOME.
Then formatting will succeed.
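For example, a minimal sketch assuming you are in the extracted Hadoop 2.7.1 directory and OpenJDK 8 sits at a typical Debian/Ubuntu location (the JDK path is an assumption; point it at your own install):

# In etc/hadoop/hadoop-env.sh under the extracted Hadoop directory
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK location; adjust to your system

# Then, from the Hadoop directory, formatting should work:
bin/hadoop namenode -format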
I needed to install Hadoop in order to have Spark running on my WSL2 Ubuntu for school projects. I installed Hadoop 3.3.1 and Spark 3.2.1 following these two tutorials:
Hadoop Tutorial on Kontext.tech
Spark Tutorial on Kontext.tech
I set up the environment variables in my .bashrc as follows:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
export PATH=$PATH:$JAVA_HOME
export HADOOP_HOME=~/hadoop/hadoop-3.3.1
export SPARK_HOME=~/hadoop/spark-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:/usr/local/hadoop/bin/
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$SPARK_HOME/bin:$PATH
# Configure Spark to use Hadoop classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
As well as in ~/hadoop/spark-3.2.1/conf/spark-env.sh.template:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/
However, when I launch spark-shell, I get this error:
/home/adrien/hadoop/spark-3.2.1/bin/spark-class: line 71: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin//bin/java: No such file or directory
/home/adrien/hadoop/spark-3.2.1/bin/spark-class: line 96: CMD: bad array subscript
There seems to be a mix-up in the redefinition of the $PATH variable, but I can't figure out where it is. Can you help me solve it, please? I don't know Hadoop; I know Spark well, but I have never had to install either of them.
First, certain Spark packages come bundled with Hadoop, so you don't need to download Hadoop separately. More specifically, Spark is built against Hadoop 3.2 for now, so using the latest Hadoop version might cause its own problems.
For your problem, JAVA_HOME should not end in /bin or /bin/java. Check the linked post again...
If you used apt install for java, you shouldn't really need to set JAVA_HOME or the PATH for Java, either, as the package manager will do this for you. Or you can use https://sdkman.io
Note: Java 11 is preferred
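For example, a minimal sketch using Ubuntu's packaged OpenJDK 11 (standard Ubuntu package name; adjust if you prefer another JDK):

# Installs OpenJDK 11 and registers /usr/bin/java via update-alternatives
sudo apt install openjdk-11-jdk
# Confirm which java binary is active
update-alternatives --list java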
You also need to remove .template from any config files for them to actually be used. However, JAVA_HOME is automatically detected by spark-submit, so it's completely optional in spark-env.sh.
The same applies to hadoop-env.sh.
Also remove /usr/local/hadoop/bin/ from your PATH, since it doesn't appear you've put anything in that location.
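Putting those points together, a corrected .bashrc might look like the sketch below (based on the paths from your question, with JAVA_HOME trimmed back to the JDK root; swap in the Java 11 directory if you switch):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64        # no trailing /jre/bin/java
export HADOOP_HOME=~/hadoop/hadoop-3.3.1
export SPARK_HOME=~/hadoop/spark-3.2.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
# Configure Spark to use the Hadoop classpath (hadoop must already be on PATH at this point)
export SPARK_DIST_CLASSPATH=$(hadoop classpath)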
I am trying to set up SPARK2 on my Cloudera cluster. For that, I have JDK 1.8.
I have installed Scala 2.11.8 using the rpm file.
I have downloaded and extracted Spark 2.2.0 into my home directory: /home/cloudera.
I made changes to the PATH variable in .bashrc accordingly.
But when I try to execute spark-shell from the home directory /home/cloudera, it says no such file or directory, as can be seen below:
[cloudera@quickstart ~]$ spark-shell
/home/cloudera/spark/bin/spark-class: line 71: /usr/java/jdk1.7.0_67-cloudera/bin/java: No such file or directory
[cloudera@quickstart ~]$
Could anyone let me know how I can fix this problem and configure it properly?
Java/JVM applications (and spark-shell in particular) use the java binary to launch themselves. Therefore they need to know where it is located, which is usually done via the JAVA_HOME environment variable.
In your case it is not explicitly overridden, so Cloudera's default Java distribution path is used (even though it points to a location that no longer exists).
You need to set JAVA_HOME to the correct Java distribution directory for the user under which you want to launch spark-shell and other applications.
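For example, a sketch for that user's ~/.bashrc (the JDK directory below is an assumption; use the path where your JDK 1.8 is actually installed):

export JAVA_HOME=/usr/java/jdk1.8.0_131   # assumed install location of your JDK 1.8
export PATH=$JAVA_HOME/bin:$PATH

Run source ~/.bashrc (or open a new shell) so spark-shell picks up the new value.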
I've installed spark-2.3.0-bin-hadoop2.7 on Ubuntu, and I didn't think there was any problem with the Java path. However, when I run "spark-submit --version", "spark-shell", or "pyspark", I get the following error:
/usr/local/spark-2.3.0-bin-hadoop2.7/bin/spark-class: line 71: /usr/lib/jvm/java-8-openjdk-amd-64/jre/bin/java: No such file or directory
It seems "/bin/java" is problematic, but I'm not sure where to change the configuration. The spark-class file has the following lines:
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
The /etc/environment is:
bash: /etc/environment: Permission denied
What I now have in ~/.bashrc (opened with gedit) is:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd-64/jre
export PATH=$PATH:JAVA_HOME/bin
This is the current java setup that I have:
root@ubuntu:~# update-alternatives --config java
There is only one alternative in link group java (providing /usr/bin/java): /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
Nothing to configure.
My .bashrc also has the following:
export PATH=$PATH:/usr/share/scala-2.11.8/bin
export SPARK_HOME=/usr/local/spark-2.3.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
Please suggest which files I need to change and how I need to change them.
Java Home
Your JAVA_HOME should be set to your JDK, not the JRE:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd-64/jre
should be
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
(Note the directory name is java-8-openjdk-amd64, with no hyphen before 64, matching your update-alternatives output.)
Here is the Oracle doc on JAVA_HOME (which should apply to OpenJDK as well):
https://docs.oracle.com/cd/E19182-01/820-7851/inst_cli_jdk_javahome_t/
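For example, a corrected ~/.bashrc sketch using the paths already quoted in the question (note the $ in front of JAVA_HOME on the PATH line, which was missing):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:/usr/share/scala-2.11.8/bin
export SPARK_HOME=/usr/local/spark-2.3.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

Then run source ~/.bashrc (or open a new terminal) before trying spark-shell again.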
Spark Environmental Variables
The JAVA_HOME should also be set in the $SPARK_HOME/conf/spark-env.sh
https://spark.apache.org/docs/latest/configuration.html#environment-variables
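A minimal sketch, assuming the default Spark layout (copy the template first so Spark actually reads the file):

cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> $SPARK_HOME/conf/spark-env.sh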
I tried installing Apache Spark on my 64-bit Windows 7 machine.
I used the guides -
Installing Spark on Windows 10
How to run Apache Spark on Windows 7
Installing Apache Spark on Windows 7 environment
This is what I did -
Install Scala
Set environment variable SCALA_HOME and add %SCALA_HOME%\bin to Path
Result: scala command works on command prompt
Unpack pre-built Spark
Set environment variable SPARK_HOME and add %SPARK_HOME%\bin to Path
Download winutils.exe
Place winutils.exe under C:/hadoop/bin
Set environment variable HADOOP_HOME and add %HADOOP_HOME%\bin to Path
I already have JDK 8 installed.
Now, the problem is, when I run spark-shell from C:/spark-2.1.1-bin-hadoop2.7/bin, I get this -
"C:\Program Files\Java\jdk1.8.0_131\bin\java" -cp "C:\spark-2.1.1-bin-hadoop2.7\conf\;C:\spark-2.1.1-bin-hadoop2.7\jars\*" "-Dscala.usejavacp=true" -Xmx1g org spark.repl.Main --name "Spark shell" spark-shell
Is it an error? Am I doing something wrong?
Thanks!
I had the same issue when trying to install Spark locally on Windows 7. Please make sure the paths below are correct, and I am sure it will work for you.
Create the JAVA_HOME variable: C:\Program Files\Java\jdk1.8.0_181 (the JDK directory itself, not its \bin subfolder)
Add the following part to your path: ;%JAVA_HOME%\bin
Create the SPARK_HOME variable: C:\spark-2.3.0-bin-hadoop2.7
Add the following part to your path: ;%SPARK_HOME%\bin
The most important part: the Hadoop path should contain a bin folder with winutils.exe in it, i.e. C:\Hadoop\bin. Make sure winutils.exe is located inside that folder.
Create the HADOOP_HOME variable: C:\Hadoop
Add the following part to your path: ;%HADOOP_HOME%\bin
Now you can open cmd, run spark-shell, and it will work.
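For a quick check in a single Command Prompt session, you can set the same variables temporarily and launch spark-shell (the values are the examples from this thread; adjust versions and paths to your machine):

set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_181
set SPARK_HOME=C:\spark-2.3.0-bin-hadoop2.7
set HADOOP_HOME=C:\Hadoop
set PATH=%PATH%;%JAVA_HOME%\bin;%SPARK_HOME%\bin;%HADOOP_HOME%\bin
spark-shell

These set commands affect only the current window; for a permanent setup, use the environment variables dialog as described above.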
I have installed CDH4 in pseudo-distributed mode on CentOS without any problems, but when installing it on Ubuntu 12.04 I am getting some errors with setting my JAVA_HOME environment variable.
I installed the JDK and have JAVA_HOME set correctly in /etc/profile.d and in ~/.bashrc using the following lines:
export JAVA_HOME=/usr/local/java/latest
export PATH=${JAVA_HOME}/bin:$PATH
I know it is redundant to define it in both places, but apparently setting it only in /etc/profile.d wasn't working. From my user, when I type $ echo $JAVA_HOME I get:
/usr/local/java/latest
With sudo, when I run $ sudo -E echo $JAVA_HOME, I get:
/usr/local/java/latest
If you are wondering, I am specifying the -E option for sudo to preserve my environment.
So my real problem is when I am trying to start HDFS, using the following command:
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
I get the following error:
* Starting Hadoop datanode:
Error: JAVA_HOME is not set and could not be found.
Running the same command with the -E option gives me the same result. Has anyone had this problem?
Thanks in advance.
After some research, I found the answer to my question.
I am using CDH4 and have hadoop installed in pseudo-distributed mode.
To fix my JAVA_HOME problems, I created the hadoop-env.sh file in /etc/hadoop/conf.pseudo.mr1.
The file contained the line:
export JAVA_HOME=/usr/local/java/latest
Where /usr/local/java/latest is the path to my Java installation (the directory JAVA_HOME should point to).
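Putting it together, a sketch of the full fix under the CDH4 pseudo-distributed layout quoted above:

# /etc/hadoop/conf.pseudo.mr1/hadoop-env.sh
export JAVA_HOME=/usr/local/java/latest

# Restart the HDFS services so the init scripts pick up the new JAVA_HOME
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x restart ; done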