PATH variable not working for hadoop - linux

I installed hadoop in the following path,
/home/myname/hadoop-2.7.2
/home/myname/hadoop-2.7.2/bin/hadoop
contains the executable file "hadoop"
Now, I set my $PATH variable in .bashrc, and when I run echo $PATH, I get
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:
/home/myname/hadoop-2.7.2/:
/home/myname/hadoop-2.7.2/bin:
/home/myname/hadoop-2.7.2/sbin
I did some formatting here. When I run bin/hadoop, I get "No such file or directory", but when I run hadoop, I get the expected result.
Not sure what I did wrong here.

That's because bin/hadoop is a relative path: the shell looks for it under the current directory and does not search PATH. The executable is here:
/home/myname/hadoop-2.7.2/bin/hadoop
so to run bin/hadoop you have to be in the /home/myname/hadoop-2.7.2 directory.
The bare hadoop command works from anywhere thanks to the /home/myname/hadoop-2.7.2/bin entry in your PATH variable.
If you want to make the command available to all users, consider moving the folder to /opt, for example.
If you are using Debian or a Debian-based distro, take a look at this command:
http://linux.die.net/man/8/update-alternatives
I'm confused about what you want to achieve, though, since running hadoop already gives you the expected result.
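To see the difference between the two ways of calling the program, here is a small sketch. It does not touch a real Hadoop install: it builds a throwaway directory with a stub bin/hadoop script (both are made up for the demo) and calls it three ways.

```shell
# Hypothetical stand-in for /home/myname/hadoop-2.7.2, built in a temp dir.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/bin"
printf '#!/bin/sh\necho "hadoop stub"\n' > "$demo_home/bin/hadoop"
chmod +x "$demo_home/bin/hadoop"

# 1) A name containing a slash is a path relative to the current directory.
#    From the install directory it resolves; PATH is not consulted.
from_install_dir=$(cd "$demo_home" && bin/hadoop)

# 2) From any other directory the same relative path does not exist.
elsewhere=$(mktemp -d)
if (cd "$elsewhere" && bin/hadoop) 2>/dev/null; then
    from_elsewhere="found"
else
    from_elsewhere="No such file or directory"
fi

# 3) A bare name is looked up in PATH, so it works from any directory.
from_path=$(cd "$elsewhere" && PATH="$demo_home/bin:$PATH" && hadoop)

echo "$from_install_dir | $from_elsewhere | $from_path"
rm -rf "$demo_home" "$elsewhere"
```

Case 2 is what the question ran into: bin/hadoop was typed from a directory that has no bin/hadoop underneath it.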

Related

Can't get spark to start

I have successfully installed and run apache spark in the past on my machine. Today I returned to it and tried to run it using : bin/spark-shell in the spark directory (bin file exists in this dir) but I am getting:
bin is not recognized as an internal or external command,
operable program or batch file.
It's running on the Windows 10 cmd shell, in case this is helpful. What could cause this?
I believe we need more info to be able to answer your question.
Using './' specifies a path relative to your current working directory (in Bash or PowerShell).
Are you running this in the cmd shell, PowerShell, or a bash shell?
What directory are you working in when trying to execute your command?
Is there a bin folder in your current directory? (Check with the ls or dir command.)
JAVA_HOME was outdated... I had updated java without updating the path! That was the problem.
Check version of java installed and location where environment variable JAVA_HOME is pointing to.
In my case JAVA_HOME = C:\Program Files\Java\jdk1.7.0_79 (this is the old version)
The cause of this issue was that I installed a new version of JDK and removed the previous installation but JAVA_HOME was pointing to the old environment which was missing.
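A stale JAVA_HOME like this can be caught with a quick check. The sketch below is for a Unix-like shell (on Windows the equivalent would be looking for %JAVA_HOME%\bin\java.exe); check_java_home is a made-up helper name, not something Java or Spark ships.

```shell
# check_java_home is a hypothetical helper. It reports whether the given
# directory looks like a JDK/JRE root, i.e. contains an executable bin/java.
check_java_home() {
    if [ -z "$1" ]; then
        echo "JAVA_HOME is not set"
        return 1
    elif [ ! -d "$1" ]; then
        echo "JAVA_HOME points to a missing directory: $1"
        return 1
    elif [ ! -x "$1/bin/java" ]; then
        echo "no executable bin/java under: $1"
        return 1
    fi
    echo "JAVA_HOME looks valid: $1"
}
```

Call it as check_java_home "${JAVA_HOME:-}" after upgrading or removing a JDK; the second branch is the situation described above.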

Unable to run spark-shell from bin

I'm new to spark, I downloaded precompiled spark.
When I try to run spark-shell from bin folder on command line, it returns
:cd /users/denver/spark-1.6/bin
:spark-shell
command not found
But if I run it like
:cd /users/denver/spark-1.6
:./bin/spark-shell
it launches spark ..
Can you please let me know why it throws an error in the first case?
You cannot run spark-shell in the first case because of how the shell finds commands.
The terminal searches for executables in the directories listed in $PATH, a Unix environment variable that lists directories containing binaries (such as ls, echo, or gcc). The current directory is not searched unless it is listed there. If you call an executable that's not in a $PATH directory (such as spark-shell), you need to give its path explicitly, either absolute or relative.
In the terminal, . is a synonym for the current working directory, so ./bin/spark-shell works. You could equally well call ./some/path/bin/spark-shell. In the first case, ./spark-shell from inside the bin folder would also work.
Hope it helps.
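If it helps to picture the lookup, here is a rough sketch of what the shell does with a bare command name. find_in_path is a made-up helper for illustration (the real built-in check is command -v), and it only approximates the shell's own search.

```shell
# find_in_path is a hypothetical helper, not part of Spark or the shell.
# It walks a colon-separated directory list the way the shell walks $PATH
# and prints the first entry that holds an executable file with that name.
find_in_path() {
    name=$1
    dirs=${2:-$PATH}
    found=""
    old_ifs=$IFS
    IFS=:
    for dir in $dirs; do
        [ -n "$dir" ] || dir=.        # an empty entry means the current dir
        if [ -z "$found" ] && [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            found=$dir/$name
        fi
    done
    IFS=$old_ifs
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
    else
        printf '%s: not found in PATH\n' "$name" >&2
        return 1
    fi
}
```

Running find_in_path spark-shell from /users/denver/spark-1.6/bin would report "not found" unless that directory is listed in $PATH, which is exactly the first case in the question.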
On Linux:
Add to your ~/.bashrc:
export SPARK_HOME=/home/das/spark-1.6.2-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Run source ~/.bashrc.
Relaunch the terminal.
Then spark-shell works from anywhere.
If it doesn't work, add the same lines to ~/.profile.
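One side effect of the lines above is that every source ~/.bashrc appends the same directories again. A possible variant, assuming you are fine with a small helper in your .bashrc (path_append is a made-up name), only appends what is missing:

```shell
# path_append is a hypothetical helper: it appends a directory to PATH only
# if it is not already listed, so re-sourcing ~/.bashrc does not grow PATH.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present, nothing to do
        *) PATH="${PATH:+$PATH:}$1" ;;
    esac
}

# Path taken from the answer above; adjust it to your own install location.
export SPARK_HOME=/home/das/spark-1.6.2-bin-hadoop2.6
path_append "$SPARK_HOME/bin"
path_append "$SPARK_HOME/sbin"
export PATH
```

The effect on PATH is the same as the plain export line the first time it runs; later runs leave PATH unchanged.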
This happened to me too. Using ./bin/spark-shell after changing to the spark directory should solve the issue.

Starting hadoop - command not found

I have zero experience with hadoop and am trying to set up hadoop in an EC2 environment. After formatting the filesystem, I tried to start hadoop and it keeps saying command not found.
I think I have tried every advice I found on stackoverflow previous questions/answers.
Here is the line I am having trouble with:
[root@ip-172-31-22-92 ~]# start-hadoop.sh
-bash: start-hadoop.sh: command not found
I have tried all the following commands (which I found on previous answers)
[root@ip-172-31-22-92 ~]# hadoop-daemon.sh start namenode
-bash: hadoop-daemon.sh: command not found
[root@ip-172-31-22-92 ~]# ./start-all.sh
-bash: ./start-all.sh: No such file or directory
[root@ip-172-31-22-92 ~]# cd /usr/local/hadoop/
-bash: cd: /usr/local/hadoop/: No such file or directory
Honestly, I don't know what I am doing wrong. Plus, I am doing this as root... is this right? It seems like I should be a regular user...?! (Discard this question if I just sounded dumber.)
I am not sure whether you have downloaded/installed the hadoop package or not, so let me walk you through the process of it briefly:
Download the latest package using wget:
wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Extract the package in the directory where you downloaded it:
tar xzf hadoop-2.7.1.tar.gz
Change into the extracted directory:
cd hadoop-2.7.1
Now you should be able to find or start the hadoop daemons using:
sbin/start-all.sh
You can find the scripts you are trying to use in the sbin folder of the extracted directory (hadoop-2.7.1).
Make sure you follow the official documentation to complete the setup, because I haven't covered installing Java or configuring hadoop, which are extensively covered at the following link:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
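If you are not sure where the start scripts ended up after extracting, a small search helps. This is only a sketch; list_start_scripts is a made-up helper name, and the directory you pass in is whatever you extracted the tarball to.

```shell
# list_start_scripts is a hypothetical helper, not something Hadoop ships.
# It prints every start-*.sh script found under the given directory.
list_start_scripts() {
    find "$1" -type f -name 'start-*.sh' | sort
}
```

For example, list_start_scripts hadoop-2.7.1 run from the download directory should list the scripts under sbin, and each printed path can then be run directly.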
The scripts in this repository could help you understand the steps to install hadoop: https://github.com/lalosam/EasyHadoop (hadoop.sh). You could try downloading and executing it. The script should download the hadoop library and configure it as a pseudo cluster. The start-hadoop and stop-hadoop scripts start and stop all the services required for hadoop.
First, you may have to add your HADOOP_HOME variable to your .bashrc file.
Example:
export HADOOP_HOME=/usr/local/bigdata/hadoop/hadoop-1.2.1
export CLASSPATH=$JAVA_HOME:/usr/local/bigdata/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar
export PATH=$PATH:$HADOOP_HOME/bin
Then open a new session and execute start-all.sh (with $HADOOP_HOME/bin on PATH, no ./ prefix is needed).

Ubuntu: hadoop command not found

I am trying to check my installation of hadoop. I did create the environment variables and when I call printenv, I do see my HADOOP_HOME and PATH variables printed and correct (home/hadoop and HADOOP_HOME/bin respectively).
If I go to home/hadoop in the terminal and call ls, I see the hadoop file there. If I try to run it by calling hadoop, it still tells me command not found.
First day on Linux, so there may be a stupid answer to this problem.
Your current working directory is probably not part of your PATH.
That is the default on Linux systems.
If you are in the same directory as your hadoop file, run the command with a relative path, like: ./hadoop
HOME DIRECTORY:
/home/hadoop is a home directory created by Linux, similar to Documents and Settings in Windows.
Open your terminal and type:
ls -l /home/hadoop
Post your result for this command: ls -l /home/hadoop
SETTING GLOBAL PATH:
Go to /home/hadoop and open .bashrc in text editor.
Add these lines at the end:
export HADOOP_HOME=/path/to/your/hadoop/installation/folder
export PATH=$PATH:$HADOOP_HOME/bin
Save and exit. Open a new terminal (or run source ~/.bashrc), then type this:
echo $PATH
echo $HADOOP_HOME
If these commands show the correct directories, try the hadoop command. It should work.
Post your result for these command: echo $PATH and echo $HADOOP_HOME
Go to the Hadoop-x.x.x/bin folder,
check for the hadoop file there,
and run ./hadoop version
You must run the “hadoop version” command.
If the hadoop setup is fine, you should see a result like the following:
Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
For installation related guide you can refer here:
Hadoop Environment Setup
Link to my quora answer https://qr.ae/TWngHN
Hope this helps.
Thanks
Enter which hadoop in your terminal. If you see a path as output, hadoop is on your system's PATH. If you get something similar to this,
/usr/bin/which: no hadoop in (/usr/local/hadoop.... you might not have set everything up properly. Modify /etc/bash.bashrc with
export HADOOP_HOME=/path/to/hadoop/folder (no spaces around =) and add it to PATH using export PATH=$PATH:$HADOOP_HOME/bin
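Two slips are easy to make when typing these lines: putting spaces around = and dropping the $ in front of HADOOP_HOME. A short sketch of the wrong and right forms follows; the path is only the placeholder from the answer, so substitute your own.

```shell
# Wrong: spaces around "=" make export see three separate arguments, and
# bash rejects "=" as an invalid identifier:
#   export HADOOP_HOME = /path/to/hadoop/folder
# Wrong: without "$", the literal text HADOOP_HOME/bin is appended:
#   export PATH=$PATH:HADOOP_HOME/bin

# Right (placeholder path; substitute your real install folder):
export HADOOP_HOME=/path/to/hadoop/folder
export PATH=$PATH:$HADOOP_HOME/bin

# The last PATH entry should now be the expanded directory.
echo "${PATH##*:}"    # prints /path/to/hadoop/folder/bin
```

If the last entry prints as HADOOP_HOME/bin instead, the $ was dropped.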
You may be editing the wrong ~/.bashrc file.
Open a terminal, run sudo gedit ~/.bashrc, and add these lines:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Note: You must not use sudo gedit ~/.bashrc.sh; the two files are treated differently on newer OS versions.

Why does spark-submit and spark-shell fail with "Failed to find Spark assembly JAR. You need to build Spark before running this program."?

I was trying to run spark-submit and I get
"Failed to find Spark assembly JAR.
You need to build Spark before running this program."
When I try to run spark-shell I get the same error.
What do I have to do in this situation?
On Windows, I found that if it is installed in a directory that has a space in the path (C:\Program Files\Spark) the installation will fail. Move it to the root or another directory with no spaces.
Your Spark package doesn't include compiled Spark code. That's why you got the error message from the spark-submit and spark-shell scripts.
You have to download one of the pre-built versions listed in the "Choose a package type" section of the Spark download page.
Try running mvn -DskipTests clean package first to build Spark.
If your Spark binaries are in a folder whose name has spaces (for example, "Program Files (x86)"), it doesn't work. I changed it to "Program_Files", and then the spark-shell command worked in cmd.
In my case, I installed Spark with pip3 install pyspark on macOS, and the error was caused by an incorrect SPARK_HOME variable. It works when I run a command like the one below:
PYSPARK_PYTHON=python3 SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark python3 wordcount.py a.txt
Go to SPARK_HOME. Note that your SPARK_HOME variable should not include /bin at the end. Add /bin only when you add it to PATH, like this: export PATH=$SPARK_HOME/bin:$PATH
Run export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" to allot more memory to maven.
Run ./build/mvn -DskipTests clean package and be patient. It took my system 1 hour and 17 minutes to finish this.
Run ./dev/make-distribution.sh --name custom-spark --pip. This is just for python/pyspark. You can add more flags for Hive, Kubernetes, etc.
Running pyspark or spark-shell will now start pyspark and spark respectively.
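Several answers here come down to the same two SPARK_HOME mistakes: a trailing /bin and a space in the path. A quick sanity check might look like the sketch below. check_spark_home is a made-up helper, and it only inspects the string, not the install itself.

```shell
# check_spark_home is a hypothetical helper, not part of Spark.
# It flags a SPARK_HOME value that is empty, contains a space,
# or ends in /bin (or \bin for a Windows-style path).
check_spark_home() {
    case $1 in
        "")
            echo "SPARK_HOME is empty"; return 1 ;;
        *" "*)
            echo "SPARK_HOME contains a space"; return 1 ;;
        */bin|*/bin/|*\\bin|*\\bin\\)
            echo "SPARK_HOME should not end in bin"; return 1 ;;
        *)
            echo "ok" ;;
    esac
}
```

Call it as check_spark_home "${SPARK_HOME:-}" before running spark-shell; anything other than ok points at one of the problems described in this thread.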
If you downloaded the binary and are getting this exception,
check whether your SPARK_HOME path contains spaces, like "apache spark"/bin.
Removing the spaces fixes it.
Just to add to @jurban1997's answer.
If you are running Windows, make sure that the SPARK_HOME and SCALA_HOME environment variables are set up correctly. SPARK_HOME should point to the folder that contains bin\spark-shell.cmd.
For a Windows machine with the pre-built version as of today (21.01.2022):
To verify all the edge cases you may have and avoid tedious guesswork about what exactly is not configured properly:
Find spark-class2.cmd and open it with a text editor.
Inspect the arguments of commands starting with call or if exist by typing the arguments in Command Prompt like this:
Open Command Prompt. (For PowerShell you need to print the var another way.)
Copy-paste %SPARK_HOME%\bin\ as is and press enter.
If you see something like bin\bin in the path displayed, then you have appended /bin to your environment variable %SPARK_HOME%.
Now you have to add the path to spark/bin to your PATH variable, or it will not find the spark-submit command.
Try out and correct every path variable that the script in this file uses, and you should be good to go.
After that, enter spark-submit ... You may now encounter the missing hadoop winutils.exe problem, for which you can get the tool and paste it where spark-submit.cmd is located.
Spark Installation:
For a Windows machine:
Download spark-2.1.1-bin-hadoop2.7.tgz from this site: https://spark.apache.org/downloads.html
Unzip it, paste your spark folder in the C:\ drive, and set the environment variable.
If you don’t have Hadoop,
you need to create a Hadoop folder, create a bin folder in it, and then copy the winutils.exe file into it.
Download the winutils file from https://codeload.github.com/gvreddy1210/64bit/zip/master
and paste the winutils.exe file into the Hadoop\bin folder, then set the environment variable for c:\hadoop\bin;
Create a tmp\hive folder in the C:\ drive and give full permission to this folder like this:
C:\Windows\system32>C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive
Open a command prompt, first run C:\hadoop\bin> winutils.exe, and then navigate to C:\spark\bin>
Run spark-shell

Resources