HDFS start-all.sh by root or non-root user - linux

I am learning Hadoop and would like to try pseudo-distributed operation.
When I use start-all.sh to start the Hadoop daemons, should I run it as a non-root user like foo-user, or as root?
Using root causes no problems, however, I am a little bit concerned about it.
Using a non-root user, foo-user, it complains that it doesn't have permission to the PID files:
/var/run/hadoop/hadoop-foo-user-namenode.pid: permission denied
/var/run/hadoop/hadoop-foo-user-tasktracker-foohost.pid: permission denied
It was trying to create these two files in the directory /var/run/hadoop.
I tried vim /var/run/hadoop/testfile and couldn't save, so it turns out that foo-user doesn't have permission to write in /var/run/hadoop.
I checked the permission of /var/run/hadoop
drwxrwxr-x root hadoop 4096 Feb 8 23:42 hadoop
foo-user is in the group hadoop, so it should have write permission to /var/run/hadoop. Indeed, several other PID files are created there, like the ...jobtracker.pid one.
So should I use root for start-all.sh, or is there something wrong with the permissions? (I am really confused.)

It's not recommended to start Hadoop as root; the following is quoted from Yahoo's Hadoop tutorial:
The user who owns the Hadoop instances will need to have read and
write access to each of these directories. It is not necessary for all
users to have access to these directories. Set permissions with chmod
as appropriate. In a large-scale environment, it is recommended that
you create a user named "hadoop" on each node for the express purpose
of owning and running Hadoop tasks. For a single individual's machine,
it is perfectly acceptable to run Hadoop under your own username. It
is not recommended that you run Hadoop as root.
Even though foo-user is in the group hadoop in the Linux filesystem, you still need to make sure
that foo-user is also a group member in HDFS (by default the group is called supergroup), you'll see what the group is when you do hadoop fs -ls path_to_your_data.
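For example, something like the following would show the Linux group membership on one side and the HDFS owner/group on the other (the output described in the comments is illustrative, not taken from your cluster):
groups foo-user                    # Linux side: lists the Unix groups, hadoop should be among them
hadoop fs -ls path_to_your_data    # HDFS side: the owner and group columns, e.g. foo-user supergroup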

The group as well as the user needs to be hadoop. Here you have:
drwxrwxr-x root hadoop 4096 Feb 8 23:42 hadoop
so change the owner root to hadoop (currently I don't have access to any Linux machine, so I can't give the exact commands), then make sure that the hadoop user is able to create files and directories within /var/run/hadoop. I strongly recommend running it as a non-root user.
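A minimal sketch of that change (run as root; the group name hadoop matches the listing above, and the hadoop account is assumed to exist):
chown -R hadoop:hadoop /var/run/hadoop
ls -ld /var/run/hadoop    # expect something like: drwxrwxr-x hadoop hadoop ... hadoop
After that, any member of the hadoop group, including foo-user, should be able to create the PID files there.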

Related

How to transfer data from local file system (linux) to a Hadoop Cluster made on Google Cloud Platform

I am a beginner in Hadoop, I made a Hadoop Cluster (one master and two slaves) on Google Cloud Platform.
I accessed the master of the cluster from the local file system (Linux) using: ssh -i key key@public_ip_of_master
Then I did sudo su - inside the cluster because the Hadoop functions only appear while being root.
Then I started HDFS using start-dfs.sh and start-all.sh.
Now the problem is that I want to transfer files from the local Linux file system to the Hadoop cluster and vice versa, using the following command (run inside the cluster as root):
root@master:~# hdfs dfs -put /home/abas1/Desktop/chromFa.tar.gz /Hadoop_File
The problem is that the local path /home/abas1/Desktop/chromFa.tar.gz is never recognized, and I cannot figure out what to do.
I am sure I am missing something trivial but I do not know what it is. I have to use either -copyFromLocal or -put.
local path is never recognized
That is not a Hadoop problem, then. You are on the master node (over SSH) as the root user. There is a /root folder with files, and probably no /home/abas1.
In other words, run ls -l /home, and you will see which local files are actually available there.
To have files on the master server that you can upload from that terminal session, you will want to scp them there first from your local machine.
Exit the SSH session
scp -i key /home/abas1/Desktop/chromFa.tar.gz root@master-ip:/tmp
ssh -i key root@master-ip
Then you can do this
hdfs dfs -mkdir /Hadoop_File
ls -l /tmp | grep chromFa    # for example, to check the file is there
hdfs dfs -put /tmp/chromFa.tar.gz /Hadoop_File/
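You can then confirm the upload with, for example:
hdfs dfs -ls /Hadoop_File/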
Hadoop functions only appear while being root.
Please do not use root for interacting with Hadoop services. Create unique user accounts for HDFS, YARN, ZooKeeper, etc., with restricted permissions, like you would for any other Unix process.
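A rough sketch of what that could look like, assuming simple (non-Kerberos) authentication and that an hdfs superuser account exists (on a hand-rolled cluster where the daemons were started as root, the superuser is root instead); the account name hdfsops is purely illustrative:
sudo useradd -m hdfsops                                      # hypothetical application account
sudo -u hdfs hdfs dfs -mkdir -p /user/hdfsops                # HDFS home, created by the HDFS superuser
sudo -u hdfs hdfs dfs -chown hdfsops:hdfsops /user/hdfsops
sudo -u hdfsops hdfs dfs -put /tmp/chromFa.tar.gz /user/hdfsops/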
Using DataProc will do this for you... and you can still SSH into it, so you should really consider using it instead of a manual GCE cluster.

Store a file from Spark on Windows to HDFS

I have installed Hadoop/YARN in a Linux VM on my local Windows machine. On the same Windows machine (not in the VM) I have installed Spark. When running Spark on Windows, I can read files stored in HDFS (in the Linux VM):
val lines = sc.textFile("hdfs://MyIP:9000/Data/sample.txt")
While saving a file to HDFS using saveAsTextFile("hdfs://MyIP:9000/Data/Output"), I am getting the error below:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=LocalWindowsUser, access=WRITE,
inode="/Data":hadoop:supergroup:drwxr-xr-x.
I guess it's because the Windows and Linux users are different, and the Windows user doesn't have permission to write files in the Linux VM's HDFS.
What is the correct way to store files from Windows to HDFS (Linux VM) using Spark?
Your problem is that the username you are using to access HDFS does not have write permission.
The directory /Data has the permissions rwxr-xr-x, which translates to mode 755. Your user LocalWindowsUser is neither the owner (hadoop) nor a member of the group (supergroup), so it falls into the "other" category and only gets read and execute permissions.
Possible solutions:
Solution 1:
Since this is a local system under your full control, change the permissions to allow everyone access. Execute this command inside the VM as the user hadoop:
hdfs dfs -chmod -R 777 /Data
Solution 2:
Create an environment variable in Windows and set the username:
set HADOOP_USER_NAME=hadoop
The username really should be the user hdfs. Try that also if necessary.
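For example, the variable has to be set in the same Windows console session before Spark is launched (the value hadoop is an assumption; use whichever account owns /Data in your HDFS, and this assumes Spark's bin directory is on PATH):
set HADOOP_USER_NAME=hadoop
spark-shell
With that in place, the same saveAsTextFile call should be executed against HDFS as the hadoop user rather than as LocalWindowsUser.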

How to allow the root user to write files into HDFS

I have installed Hadoop on CentOS 7. A daemon service written in Python is trying to make a directory in HDFS, but is getting the permission error below:
mkdir: Permission denied: user=root, access=WRITE, inode="/rep_data/store/data/":hadoop:supergroup:drwxr-xr-x
It looks like my service is running under the root account.
So I would like to know how to give the root user permission to make directories and write files.
If you are trying to create a directory under the HDFS root, i.e. /, you may face this type of issue. You can create directories in your HDFS home without any issues.
To create a directory under the root, execute a command like the following (run as the HDFS superuser):
sudo -u hdfs hdfs dfs -mkdir /directory/name
To create a directory in your HDFS home, execute the command below:
hdfs dfs -mkdir /user/user_home/directory/name
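Following that advice for the root account specifically, a sketch (assuming hadoop is the HDFS superuser here, since it owns the inode in the error message):
sudo -u hadoop hdfs dfs -mkdir -p /user/root
sudo -u hadoop hdfs dfs -chown root:root /user/root
The Python daemon, running as root, could then create its directories under /user/root.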
This is probably an issue because you are not the superuser.
A workaround is to enable Access Control Lists (ACLs) in HDFS and give permissions to your user.
To enable support for ACLs, set dfs.namenode.acls.enabled to true in the NameNode configuration.
For more info check: link
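For example, after setting dfs.namenode.acls.enabled to true and restarting the NameNode, an ACL entry for root could be added to the directory from the error message (the hadoop account is assumed to be the HDFS superuser here, as it owns the inode):
sudo -u hadoop hdfs dfs -setfacl -m user:root:rwx /rep_data/store/data
hdfs dfs -getfacl /rep_data/store/data    # verify the new ACL entry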

Perforce Client permissions

I downloaded the Perforce client on my Ubuntu 11 system a while ago. The p4v binary on my system is found at:
-r-xr-xr-x 1 root root 1578 2011-08-29 12:46 /usr/bin/p4v
After reading the documentation for p4v, I realized it should not be owned by root.
How do I change its ownership and also allow my "user" all the access rights, without reinstalling p4v?
Thanks
If you "chown" P4V without changing the permissions, your user will still be able to use it.
For example:
chown user:group /usr/bin/p4v
The permissions granted to the file allow read and execute to the owner, anyone in the group and everyone else with access to the system. Changing the user and group will not change the permissions, so your user should still be able to execute the binary.
In general, the ownership of a binary that is executable by all shouldn't matter. The danger comes if the owner of the process is root (or any other privileged user). For example, "mount" will be owned by root, and can be run by any user to show the mounted file-systems. Only when it is run by root can it change your file-system structure.
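A minimal sketch, assuming your login is called myuser and that you have sudo rights (only root can chown a root-owned file):
ls -l /usr/bin/p4v                       # currently: -r-xr-xr-x root root
sudo chown myuser:myuser /usr/bin/p4v    # take ownership
ls -l /usr/bin/p4v                       # permissions unchanged, now owned by myuser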

Newly created folder permission rights issue

I have XAMPP on Fedora and changed the owner of /opt/lampp/htdocs to root. I did this because whenever someone created a new folder through sharing, they didn't have permission to dynamically create folders or files or to write images. So I ran:
chmod -R 777 /opt/lampp/htdocs
But whenever the system restarted, I had to run this command again. To avoid running it over and over, I changed the owner of /opt/lampp/htdocs and ran:
chmod -R 777 /opt/lampp/htdocs
Now, whenever the server restarts, the assigned permissions no longer need to be set again. That part is resolved.
The remaining issue is that the old directories can be written to, but if any network user creates a new directory under htdocs, that new directory's permissions have to be changed before it is usable.
Previously created, and usable by a script to create files:
drwxrwxrwx 2 root root 4096 2011-06-15 14:09 aaa
Newly created, cannot be used by a script to create an image or to write anything:
drwxr-xr-x 2 root root 4096 2011-06-17 15:17 aaaa
drwxr-xr-x is what I get for every newly created folder in htdocs, which is really annoying :(
Just to let you know, the owner and rights of my htdocs are:
drwxrwxrwx 101 root root 4096 2011-06-17 15:17 htdocs
Why is it so? Can anybody please help me figure this problem out? I am waiting anxiously for a quick response.
First off, you should investigate what permissions you really need - chmodding everything to 777 is a security risk, as it will allow any user to write inside your web root.
However, to address your actual question about the default permissions when a user creates a new folder, you want to adjust the default "umask", which determines those permissions.
This question has some information for changing it for the Apache user (if a "network user" is a user creating new files and directories through the httpd process):
Setting the umask of the Apache user
If you need to adjust it for other users or processes, the solution will be similar.
Good luck!
Edit
Since you're on Fedora, try this: (from the question I linked above)
[root ~]$ echo "umask 002" >> /etc/sysconfig/httpd
[root ~]$ service httpd restart
The first command will add that line to /etc/sysconfig/httpd, which is a permanent configuration file, and the second command will make it active.
You are tackling the problem from the wrong side. Restore your Apache configuration to use apache.apache as the default user/group, and set your Samba server to use those credentials when someone writes to your document root.
If you are using NFS or another POSIX-compatible filesystem, use chmod g+s to keep all files readable by your Apache server.
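A hedged sketch of what that could look like, run as root (apache:apache as the user/group is taken from the advice above; XAMPP's bundled Apache may actually run as a different account such as daemon, so check the User/Group directives in its httpd.conf first):
chown -R apache:apache /opt/lampp/htdocs              # hand the tree back to the web server account
find /opt/lampp/htdocs -type d -exec chmod g+s {} +   # new files and dirs inherit the group
chmod -R g+w /opt/lampp/htdocs                        # let the group write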
Try it:
# umask 000
Have a good time!
