How to allow the root user to write files into HDFS - linux

I have installed Hadoop on CentOS 7. A daemon service written in Python is trying to make a directory in HDFS, but it gets the permission error below.
mkdir: Permission denied: user=root, access=WRITE, inode="/rep_data/store/data/":hadoop:supergroup:drwxr-xr-x
It looks like my service is running under the root account.
So I would like to know how to give the root user permission to create directories and write files.

If you are trying to create a directory under the HDFS root, i.e. /, you may face this type of issue. You can create directories in your HDFS home without any problems.
To create a directory under the root, execute a command like the following as the hdfs superuser:
sudo -u hdfs hdfs dfs -mkdir /directory/name
To create a directory in your HDFS home, execute the command below:
hdfs dfs -mkdir /user/user_home/directory/name
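Since the service in the question runs as root and needs to write under /rep_data/store/data, another option is to create that path as the superuser and hand its ownership to root. A minimal sketch, assuming the HDFS superuser account is hdfs and you have sudo rights on the NameNode host:
sudo -u hdfs hdfs dfs -mkdir -p /rep_data/store/data
sudo -u hdfs hdfs dfs -chown -R root /rep_data/store/data
# root can now create subdirectories and write files under this path
hdfs dfs -mkdir /rep_data/store/data/new_dir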

This is probably an issue because you are not the HDFS superuser.
A workaround is to enable Access Control Lists (ACLs) in HDFS and grant permissions to your user.
To enable support for ACLs, set dfs.namenode.acls.enabled to true in the NameNode configuration.
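For example, after setting dfs.namenode.acls.enabled to true in hdfs-site.xml and restarting the NameNode, an ACL entry can grant the root user write access to the directory from the question. A sketch, run as the HDFS superuser (the hdfs account name is an assumption):
sudo -u hdfs hdfs dfs -setfacl -m user:root:rwx /rep_data/store/data
sudo -u hdfs hdfs dfs -getfacl /rep_data/store/data
# the output should now list user:root:rwx alongside the base permissions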

Related

Trying to move a CSV file from the local file system to the Hadoop file system

I am trying to copy a CSV file from my local file system to Hadoop, but I am not able to do it successfully, and I am not sure which permissions I need to change. As I understand it, the hdfs superuser does not have access to /home/naya/dataFiles/BlackFriday.csv.
hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: Permission denied: user=naya, access=WRITE, inode="/tmp":hdfs:supergroup:drwxr-xr-x
sudo -u hdfs hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: `/home/naya/dataFiles/BlackFriday.csv': No such file or directory
Any help is highly appreciated. I want to do it via the command-line utility. I can do it via Cloudera Manager from the Hadoop side, but I want to understand what is happening behind the commands.
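The two errors point at different causes: the HDFS /tmp directory is owned by hdfs with mode 755, so naya cannot write there, and when the put runs via sudo -u hdfs, the hdfs Linux user cannot read files under /home/naya. One common resolution, sketched here under the assumption that the hdfs superuser account is usable from this node, is to give naya an HDFS home directory and run the put as naya:
sudo -u hdfs hdfs dfs -mkdir -p /user/naya
sudo -u hdfs hdfs dfs -chown naya:naya /user/naya
# now run the copy as naya into a directory naya owns
hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /user/naya/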

Error when trying to save a Spark dataframe to an HDFS file

I'm using Ubuntu.
When I try to save a dataframe to HDFS (Spark Scala):
processed.write.format("json").save("hdfs://localhost:54310/mydata/enedis/POC/processed.json")
I get this error:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/mydata/enedis/POC":hadoop_amine:supergroup:drwxr-xr-x
You are trying to write data as the root user, but the HDFS directory /mydata/enedis/POC is owned by the hadoop_amine user, and only that user has write permission on it.
Change the permissions on the HDFS directory to allow the root user to write to /mydata/enedis/POC:
# log in as the hadoop_amine user, then execute the command below
hdfs dfs -chmod -R 777 /mydata/enedis/POC
(Or)
Initialize the Spark shell as the hadoop_amine user; then there is no need to change the permissions of the directory.
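A sketch of both variants, assuming hadoop_amine exists as a Linux user on the machine running the shell and the cluster uses simple (non-Kerberos) authentication:
# start the shell directly as hadoop_amine
sudo -u hadoop_amine spark-shell
# or keep your current login and override the HDFS identity for this session
export HADOOP_USER_NAME=hadoop_amine
spark-shell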

SPARK Application + HDFS + User Airflow is not the owner of inode=alapati

We are running a Spark application on a Hadoop cluster (HDP version 2.6.5 from Hortonworks).
From the logs we can see the following diagnostics:
User: airflow
Application Type: SPARK
User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied. user=airflow is not the owner of inode=alapati
The log does not say clearly what we need to look for in HDFS in order to find out why we get Permission denied.
It looks like user=airflow does not have permission to write data into HDFS.
By default the /user/ directory is owned by "hdfs" with 755 permissions. As a result, only hdfs can write to that directory.
You have two options:
Change the Spark user name from airflow to hdfs, or
if you still need to use user=airflow, create a home directory for airflow:
sudo -u hdfs hadoop fs -mkdir /user/airflow
sudo -u hdfs hadoop fs -chown airflow /user/airflow
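To confirm the new home directory is in place and writable by airflow, a quick check could look like this (assuming airflow also exists as a Linux user on the node where you run the commands):
sudo -u hdfs hadoop fs -ls /user
# try creating an empty file as airflow in the new home directory
sudo -u airflow hadoop fs -touchz /user/airflow/_write_test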

Store a file from Spark on Windows to HDFS

I have installed Hadoop/YARN in a Linux VM on my local Windows machine. On the same Windows machine (not in the VM) I have installed Spark. When running Spark on Windows, I can read files stored in HDFS (in the Linux VM).
val lines = sc.textFile("hdfs://MyIP:9000/Data/sample.txt")
While saving a file to HDFS using saveAsTextFile("hdfs://MyIP:9000/Data/Output"), I get the error below:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=LocalWindowsUser, access=WRITE,
inode="/Data":hadoop:supergroup:drwxr-xr-x.
I guess it's because the Windows and Linux users are different, and the Windows user doesn't have permission to write files into HDFS in the Linux VM.
What is the correct way to store files from Windows to HDFS (Linux VM) using Spark?
Your problem is that the username you are using to access HDFS in write mode does not have write permission.
The directory /Data has the permissions rwxr-xr-x, which translates to mode 755. Your username is LocalWindowsUser, which falls under "other" and therefore only has read and execute permissions.
Possible solutions:
Solution 1:
Since this is a local system under your full control, change the permissions to allow everyone access. Execute this command inside the VM as the user hadoop:
hdfs dfs -chmod -R 777 /Data
Solution 2:
Create an environment variable in Windows and set the username:
set HADOOP_USER_NAME=hadoop
Depending on your setup, the right username might be hdfs instead; try that as well if necessary.
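Note that the variable must be visible to the process that launches Spark. A sketch from a Windows command prompt, assuming spark-shell is on the PATH and the cluster uses simple (non-Kerberos) authentication:
set HADOOP_USER_NAME=hadoop
spark-shell
Then retry the saveAsTextFile("hdfs://MyIP:9000/Data/Output") call from the question; the write should now be performed as the hadoop user.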

HDFS start-all.sh by root or non-root user

I am learning Hadoop and would like to try the pseudo-distributed operation.
When I use start-all.sh to start the Hadoop daemons, should I use a non-root user like foo-user, or should I use root?
Using root works without problems; however, I am a little concerned about it.
Using the non-root user foo-user, it complains that it doesn't have permission to write these files:
/var/run/hadoop/hadoop-foo-user-namenode.pid: permission denied
/var/run/hadoop/hadoop-foo-user-tasktracker-foohost.pid: permission denied
It was trying to create these two files in the directory /var/run/hadoop.
I tried vim /var/run/hadoop/testfile and couldn't save, so it turns out that foo-user doesn't have permission to write to /var/run/hadoop.
I checked the permissions of /var/run/hadoop:
drwxrwxr-x root hadoop 4096 Feb 8 23:42 hadoop
foo-user is in the group hadoop, so it should have write permission to /var/run/hadoop. Indeed, several other pid files are created there, like the ...jobtracker.pid.
So should I use root for start-all.sh, or is there something wrong with the permissions? (I am really confused.)
It's not recommended to start Hadoop as root; the following is quoted from Yahoo's Hadoop tutorial:
The user who owns the Hadoop instances will need to have read and
write access to each of these directories. It is not necessary for all
users to have access to these directories. Set permissions with chmod
as appropriate. In a large-scale environment, it is recommended that
you create a user named "hadoop" on each node for the express purpose
of owning and running Hadoop tasks. For a single individual's machine,
it is perfectly acceptable to run Hadoop under your own username. It
is not recommended that you run Hadoop as root.
Even though foo-user is in the group hadoop in the Linux filesystem, you still need to make sure that foo-user is also a member of the corresponding group in HDFS (by default the group is called supergroup). You can see what the group is when you run hadoop fs -ls path_to_your_data.
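A quick way to check the group ownership of your data, and (on Hadoop 2 and later) which groups the NameNode resolves for a user; path_to_your_data and foo-user are the placeholders from this question:
hadoop fs -ls path_to_your_data
hdfs groups foo-user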
The group as well as the user needs to be hadoop. Here you have:
drwxrwxr-x root hadoop 4096 Feb 8 23:42 hadoop
So change the owner root to hadoop (currently I don't have access to any Linux machine, so I can't give the exact commands; see the sketch below), then make sure that the hadoop user is able to create files and directories within /var/run/hadoop. I strongly recommend running it as a non-root user.
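A minimal sketch of that ownership change, assuming the hadoop user and group already exist and foo-user is a member of the hadoop group:
sudo chown -R hadoop:hadoop /var/run/hadoop
# verify that foo-user can now create files there
sudo -u foo-user touch /var/run/hadoop/testfile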
