Starting hadoop - command not found - linux

I have zero experience with hadoop and am trying to set up hadoop in an EC2 environment. After formatting the filesystem, I tried to start hadoop and it keeps saying command not found.
I think I have tried every piece of advice I found in previous Stack Overflow questions/answers.
Here is the line I am having trouble with:
[root#ip-172-31-22-92 ~]# start-hadoop.sh
-bash: start-hadoop.sh: command not found
I have tried all of the following commands (which I found in previous answers):
[root#ip-172-31-22-92 ~]# hadoop-daemon.sh start namenode
-bash: hadoop-daemon.sh: command not found
[root#ip-172-31-22-92 ~]# ./start-all.sh
-bash: ./start-all.sh: No such file or directory
[root#ip-172-31-22-92 ~]# cd /usr/local/hadoop/
-bash: cd: /usr/local/hadoop/: No such file or directory
Honestly, I don't know what I am doing wrong. Plus, I am doing this as root... is this right? It seems like I should be doing this as a regular user...?! (Disregard this question if I just sounded dumb.)

I am not sure whether you have downloaded/installed the hadoop package or not, so let me walk you through the process briefly:
Download the latest package using wget:
wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Extract the package in the directory where you downloaded it:
tar xzf hadoop-2.7.1.tar.gz
Change into the extracted directory:
cd hadoop-2.7.1
Now you will be able to start the hadoop daemons using:
sbin/start-all.sh
You can find the scripts you are trying to use in the sbin folder of the extracted directory (hadoop-2.7.1).
Make sure you follow the proper documentation to complete the setup properly, because I haven't covered installing Java or configuring hadoop here; both are covered extensively in the following documentation link:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
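For reference, a minimal sketch of the Java prerequisite on a RHEL-style EC2 host (the package name and JAVA_HOME path below are assumptions; check what your distribution actually provides):
sudo yum install -y java-1.8.0-openjdk-devel          # install a JDK; package name varies by distribution
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk       # example path; verify with readlink -f $(which java)
echo "export JAVA_HOME=$JAVA_HOME" >> hadoop-2.7.1/etc/hadoop/hadoop-env.sh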

The scripts in this repository could help you understand the steps to install hadoop: https://github.com/lalosam/EasyHadoop (hadoop.sh). You could try downloading it and executing it. The script should download the hadoop library and configure it as a pseudo cluster. The start-hadoop and stop-hadoop scripts start and stop all the services required for hadoop.
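If you want to try those scripts, a rough sketch would be the following (assuming hadoop.sh sits at the top level of the repository and is meant to be run directly):
git clone https://github.com/lalosam/EasyHadoop.git
cd EasyHadoop
chmod +x hadoop.sh       # make the script executable if it isn't already
./hadoop.sh              # should fetch hadoop and configure it as a pseudo cluster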

First, you may have to add your HADOOP_HOME variable to your .bashrc file.
For example:
export HADOOP_HOME=/usr/local/bigdata/hadoop/hadoop-1.2.1
export CLASSPATH=$JAVA_HOME:/usr/local/bigdata/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar
export PATH=$PATH:$HADOOP_HOME/bin
Then open a new session and execute ./start-all.sh
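To check the new variables without opening a new session, something like this should work (a sketch based on the example paths above):
source ~/.bashrc
which hadoop             # should print /usr/local/bigdata/hadoop/hadoop-1.2.1/bin/hadoop
hadoop version           # should report Hadoop 1.2.1
Note that in Hadoop 1.x start-all.sh also lives in $HADOOP_HOME/bin, so the PATH line above covers it.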

Related

start-all.sh, start-dfs.sh command not found

I am using Ubuntu 16.04 LTS and have installed hadoop 2.7.2. The output of
hadoop version
is
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /usr/local/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
and when I run
whereis hadoop
it gives the output
hadoop: /usr/local/hadoop /usr/local/hadoop-2.7.2/bin/hadoop.cmd /usr/local/hadoop-2.7.2/bin/hadoop
But when I run the command
start-all.sh
it says command not found.
Also, when I run
start-dfs.sh
it also says command not found.
I am able to run these commands when I navigate to the hadoop directory, but I want to run them without navigating into the hadoop directory.
Your problem is that bash doesn't know where to look for start-all.sh.
You can fix this by opening $HOME/.bashrc and adding a line that looks like this:
PATH=$PATH:/usr/local/hadoop/sbin
This tells bash that it should look in '/usr/local/hadoop/sbin' for start-all.sh.
Note:
Changes to $HOME/.bashrc will not take effect in any terminals that are currently open.
If you need the changes to take effect in a terminal that is currently open, run
source $HOME/.bashrc
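For example, with the install location shown by the whereis output above, the whole change can be made like this (a sketch; confirm the sbin path on your machine first):
ls /usr/local/hadoop-2.7.2/sbin/start-all.sh          # confirm where the scripts actually live
echo 'export PATH=$PATH:/usr/local/hadoop-2.7.2/sbin:/usr/local/hadoop-2.7.2/bin' >> $HOME/.bashrc
source $HOME/.bashrc
which start-all.sh                                    # should now print the sbin path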
I had to search for it with find.
find / -iname start-all.sh 2> /dev/null
It found:
/usr/local/sbin/start-all.sh
/usr/local/Cellar/hadoop/3.3.4/libexec/sbin/start-all.sh
/usr/local/Cellar/hadoop/3.3.4/sbin/start-all.sh
So in addition to the previous answer, the variable in $HOME/.bashrc looks like this:
PATH=$PATH:/usr/local/sbin/
Note: I'm not sure which of these directories should be added to PATH

Ubuntu: hadoop command not found

I am trying to check my installation of hadoop. I did create the environment variables and when I call printenv, I do see my HADOOP_HOME and PATH variables printed and correct (home/hadoop and HADOOP_HOME/bin respectively).
If I go to home/hadoop in the terminal and call ls, I see the hadoop file there. If I try to run it by calling hadoop, it still tells me command not found.
First day on Linux, so there may be a stupid answer to this problem.
Your current working directory is probably not part of your path.
That is the default on Linux systems.
If you are in the same directory as your hadoop file, run the command with a relative path, like: ./hadoop
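For example, using the directory from the question:
cd /home/hadoop
./hadoop version         # runs the hadoop script in the current directory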
HOME DIRECTORY:
/home/hadoop is a home directory created by Linux, similar to Documents and Settings in Windows.
Open your terminal and type:
ls -l /home/hadoop
Post your result for this command: ls -l /home/hadoop
SETTING GLOBAL PATH:
Go to /home/hadoop and open .bashrc in text editor.
Add these lines at the end:
export HADOOP_HOME=/path/to/your/hadoop/installation/folder
export PATH=$PATH:$HADOOP_HOME/bin
Save and exit. Now type this in your terminal:
echo $PATH
echo $HADOOP_HOME
If these commands show the correct directories, try the hadoop command. It should work.
Post your result for these commands: echo $PATH and echo $HADOOP_HOME
Go to the Hadoop-x.x.x/bin folder
check that the hadoop executable is there
run ./hadoop version
You must run the "hadoop version" command.
If the hadoop setup is fine, then you should see the following result:
Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
For an installation guide you can refer here:
Hadoop Environment Setup
Link to my Quora answer: https://qr.ae/TWngHN
Hope this helps.
Thanks
Enter which hadoop in your terminal. If you see a path as the output, hadoop is set in your system's PATH. If you get something similar to this,
usr/bin/which: no hadoop in (/usr/local/hadoop.... you might not have set everything up properly. Modify /etc/bash.bashrc with
export HADOOP_HOME=/path/to/hadoop/folder and add it to PATH using export PATH=$PATH:$HADOOP_HOME/bin
You may be editing the wrong ~/.bashrc file.
Open a terminal, run sudo gedit ~/.bashrc, and add these commands:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Note: make sure you edit ~/.bashrc and not ~/.bashrc.sh; the two are different files and behave differently on newer OS versions.
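As a quick check after saving the file (a sketch, assuming hadoop really is installed under /usr/local/hadoop):
source ~/.bashrc
which hadoop start-dfs.sh    # hadoop should resolve from bin, start-dfs.sh from sbin
hadoop version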

How to run Cassandra (cqlsh) from anywhere

The official Cassandra documentation (https://wiki.apache.org/cassandra/GettingStarted) states that, to start the service, you use
'bin/cassandra -f'
Then use
'bin/cqlsh'
to access it. But to use cqlsh this way I always have to go to the bin folder. What is the procedure to make it work so that I can type 'cqlsh' from anywhere in the console (without having to be in the bin folder of the Cassandra setup)?
(just like we access python directly from anywhere by just typing python3 in console )
To get this to work, you have to add your Cassandra bin directory to your $PATH.
From a terminal prompt, check the contents of your $PATH.
$ echo $PATH
On my Ubuntu VM, this is what I see:
/usr/local/apache-maven/apache-maven-3.1.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/jvm/jdk1.7.0_45/bin
Since you mention Python3, I'll check the location of that on my system as well:
$ which python3
/usr/bin/python3
As you can see, Python3 is in my /usr/bin directory, and /usr/bin is in my $PATH, which is why simply typing python3 works for me (and you as well).
There are a few ways to get your Cassandra bin directory into your $PATH. There is some debate about which is the "correct" way to accomplish this. So in lieu of telling you how I would do it, I will provide a link to a question on AskUbuntu that details something like 3 ways to add a directory into your $PATH: How to add a directory to my path?
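As one common example (a sketch; the install path is an assumption, so point it at wherever your Cassandra tarball actually lives):
echo 'export PATH=$PATH:/opt/cassandra/bin' >> ~/.bashrc
source ~/.bashrc
cqlsh                    # should now start from any directory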
Use cassandra -f in your root folder and then you should be able to use cqlsh anywhere you have cassandra installed

Why does spark-submit and spark-shell fail with "Failed to find Spark assembly JAR. You need to build Spark before running this program."?

I was trying to run spark-submit and I get
"Failed to find Spark assembly JAR.
You need to build Spark before running this program."
When I try to run spark-shell I get the same error.
What do I have to do in this situation?
On Windows, I found that if it is installed in a directory that has a space in the path (C:\Program Files\Spark) the installation will fail. Move it to the root or another directory with no spaces.
Your Spark package doesn't include compiled Spark code. That's why you got the error message from the spark-submit and spark-shell scripts.
You have to download one of the pre-built versions from the "Choose a package type" section of the Spark download page.
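For example, once you have picked a pre-built package from the download page (the file name below is a placeholder; use the exact archive you downloaded):
tar xzf spark-x.y.z-bin-hadoopA.B.tgz
cd spark-x.y.z-bin-hadoopA.B
./bin/spark-shell        # works because the pre-built package ships the compiled jars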
Try running mvn -DskipTests clean package first to build Spark.
If your spark binaries are in a folder whose name contains spaces (for example, "Program Files (x86)"), it won't work. I changed it to "Program_Files", and then the spark-shell command worked in cmd.
In my case, I install spark by pip3 install pyspark on macOS system, and the error caused by incorrect SPARK_HOME variable. It works when I run command like below:
PYSPARK_PYTHON=python3 SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark python3 wordcount.py a.txt
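If you are not sure what to use for SPARK_HOME with a pip install, the package location can be printed like this (assuming pyspark was installed for the python3 you are running):
python3 -c "import pyspark, os; print(os.path.dirname(pyspark.__file__))"
The printed directory is what SPARK_HOME should point to.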
Go to SPARK_HOME. Note that your SPARK_HOME variable should not include /bin at the end. Keep that in mind when you're adding it to PATH like this: export PATH=$SPARK_HOME/bin:$PATH
Run export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" to allot more memory to maven.
Run ./build/mvn -DskipTests clean package and be patient. It took my system 1 hour and 17 minutes to finish this.
Run ./dev/make-distribution.sh --name custom-spark --pip. This is just for python/pyspark. You can add more flags for Hive, Kubernetes, etc.
Running pyspark or spark-shell will now start pyspark and spark respectively.
If you have downloaded the binary and are getting this exception,
then check whether your SPARK_HOME path contains spaces, like "apache spark"/bin.
Just removing the spaces will make it work.
Just to add to jurban1997's answer:
If you are running Windows, make sure that the SPARK_HOME and SCALA_HOME environment variables are set up right. SPARK_HOME should be set so that {SPARK_HOME}\bin\spark-shell.cmd exists.
For a Windows machine with the pre-built version as of today (21.01.2022):
In order to verify all the edge cases you may have and avoid tedious guesswork about what exactly is not configured properly:
Find spark-class2.cmd and open it with a text editor
Inspect the arguments of commands starting with call or if exists by typing the arguments in Command Prompt like this:
Open Command Prompt. (For PowerShell you need to print the var another way)
Copy-paste %SPARK_HOME%\bin\ as is and press enter.
If you see something like bin\bin in the path displayed now then you have appended /bin in your environment variable %SPARK_HOME%.
Now you have to add the path to spark/bin to your PATH variable, or it will not find the spark-submit command.
Try out and correct every path variable that the script in this file uses and you should be good to go.
After that, enter spark-submit ... You may now encounter the missing hadoop winutils.exe problem, for which you can go get the tool and paste it where spark-submit.cmd is located.
Spark Installation:
For Window machine:
Download spark-2.1.1-bin-hadoop2.7.tgz from this site https://spark.apache.org/downloads.html
Unzip it, paste your spark folder into the C:\ drive, and set the environment variable.
If you don't have Hadoop,
you need to create a Hadoop folder, create a bin folder in it, and then copy and paste the winutils.exe file into it.
Download the winutils file from https://codeload.github.com/gvreddy1210/64bit/zip/master
and paste the winutils.exe file into the Hadoop\bin folder, and set the environment variable for C:\hadoop\bin;
create a tmp\hive folder in the C:\ drive and give full permission to this folder, like:
C:\Windows\system32>C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive
Open a command prompt, first run C:\hadoop\bin\winutils.exe, and then navigate to C:\spark\bin and
run spark-shell
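If you prefer setting the environment variables from the command line instead of the System Properties dialog, a sketch using the example layout above (open a new Command Prompt afterwards so the values are picked up):
setx SPARK_HOME "C:\spark"
setx HADOOP_HOME "C:\hadoop"
rem then add C:\spark\bin and C:\hadoop\bin to the Path variable in the Environment Variables dialog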

Can not run hadoop localy using cygwin

I am new to hadoop, so I am following this tutorial to install a single node on my computer. My OS is Windows 7, so I installed Cygwin. In the tutorial I've been asked to:
Try the following command:
$ bin/hadoop
but when I try it I get this response:
-bash: binhadoop: command not found
The file "hadoop" is in the directory and have not changed it. As I mentioned I am new to Hadoop and Linux\Cygwin so maybe its something trivial that I do wrong.
Thanks
The command would be to cd into the hadoop directory and then run bin/hadoop. I would suggest using Linux to make things easier.
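One more thing worth checking: the error shows binhadoop rather than bin/hadoop, which suggests the slash got lost when the command was typed. Running the script with an explicit relative path from the extracted hadoop directory should work (the directory name below is hypothetical; use whatever the tutorial had you extract):
cd hadoop-1.2.1          # hypothetical extracted directory name
./bin/hadoop version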
