We are running a Spark application on a Hadoop cluster (HDP version 2.6.5 from Hortonworks).
From the logs we can see the following diagnostics:
User: airflow
Application Type: SPARK
User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied. user=airflow is not the owner of inode=alapati
The log does not clearly say what we need to look for in HDFS in order to find out why we get Permission denied.
It looks like user=airflow doesn't have permission to write data into HDFS.
By default the /user/ directory is owned by "hdfs" with 755 permissions. As a result only hdfs can write to that directory.
You have two options:
Change the Spark user name from airflow to hdfs, or
if you still need to use user=airflow, create a home directory for airflow:
sudo -u hdfs hadoop fs -mkdir /user/airflow
sudo -u hdfs hadoop fs -chown airflow /user/airflow
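To confirm the new home directory is now owned by airflow (a quick sanity check; the exact listing format varies by Hadoop version):
hadoop fs -ls /user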
Related
I am trying to copy a CSV file from my local file system to Hadoop, but I am not able to do it successfully. I am not sure which permissions I need to change. As I understand it, the hdfs superuser does not have access to /home/naya/dataFiles/BlackFriday.csv.
hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: Permission denied: user=naya, access=WRITE, inode="/tmp":hdfs:supergroup:drwxr-xr-x
sudo -u hdfs hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: `/home/naya/dataFiles/BlackFriday.csv': No such file or directory
Any help is highly appreciated. I want to do it via the command-line utility; I can do it via Cloudera Manager from the Hadoop side, but I want to understand what is happening behind the commands.
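One possible workaround (a sketch, not from the original thread; it assumes you have sudo on the client machine): the first put fails because naya cannot write to the HDFS /tmp directory, and the second fails because the hdfs user cannot read files under /home/naya. Either copy the file somewhere the hdfs user can read it and run the put as hdfs, or open up the HDFS /tmp directory so naya can write to it:
cp /home/naya/dataFiles/BlackFriday.csv /tmp/BlackFriday.csv
sudo -u hdfs hdfs dfs -put /tmp/BlackFriday.csv /tmp
# or, alternatively, make the HDFS /tmp world-writable with the sticky bit (like a normal /tmp) and put as naya:
sudo -u hdfs hdfs dfs -chmod 1777 /tmp
hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp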
I'm using Ubuntu.
I'm using the Spark dependency in IntelliJ.
I get Command 'spark' not found, but can be installed with: .. (when I enter spark in the shell).
I have two users, amine and hadoop_amine (where Hadoop HDFS is set up).
When I try to save a DataFrame to HDFS (Spark Scala):
processed.write.format("json").save("hdfs://localhost:54310/mydata/enedis/POC/processed.json")
I got this error:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/mydata/enedis/POC":hadoop_amine:supergroup:drwxr-xr-x
Try changing the permissions of the HDFS directory, or simply change your Spark user.
To change the directory permissions you can use the hdfs command line, like this:
hdfs dfs -chmod ...
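For example, to open write access on the directory from the question (a sketch; -R 777 is the bluntest option, and chown to the writing user is usually preferable):
hdfs dfs -chmod -R 777 /mydata/enedis/POC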
In spark-submit you can use the --proxy-user option.
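For instance (a sketch; the jar and main class are placeholders, and proxying requires that the submitting user is allowed to impersonate hadoop_amine via the hadoop.proxyuser.* settings):
spark-submit --proxy-user hadoop_amine --class com.example.App app.jar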
And lastly, you can run spark-submit or spark-shell as the proper user, like this:
sudo -u hadoop_amine spark-submit ...
I'm using Ubuntu
When I try to save a DataFrame to HDFS (Spark Scala):
processed.write.format("json").save("hdfs://localhost:54310/mydata/enedis/POC/processed.json")
I got this error:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/mydata/enedis/POC":hadoop_amine:supergroup:drwxr-xr-x
You are trying to write data as the root user, but the HDFS directory (/mydata/enedis/POC) only allows the hadoop_amine user to write to it.
Change the permissions on the HDFS directory to allow the root user to write to /mydata/enedis/POC:
# Log in as the hadoop_amine user, then execute the command below
hdfs dfs -chmod -R 777 /mydata/enedis/POC
(Or)
Initialize the Spark shell as the hadoop_amine user; then there is no need to change the permissions of the directory.
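For example (a sketch, assuming hadoop_amine is a login user on the machine running Spark):
sudo -u hadoop_amine spark-shell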
I have installed Hadoop/YARN in a Linux VM on my local Windows machine. On the same Windows machine (not in the VM) I have installed Spark. When running Spark on Windows, I can read files stored in HDFS (in the Linux VM).
val lines = sc.textFile("hdfs://MyIP:9000/Data/sample.txt")
But while saving a file to HDFS using saveAsTextFile("hdfs://MyIP:9000/Data/Output"), I am getting the error below:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=LocalWindowsUser, access=WRITE,
inode="/Data":hadoop:supergroup:drwxr-xr-x.
I guess it's because the Windows and Linux users are different, and the Windows user doesn't have permission to write files to the Linux VM.
What is the correct way to store files from Windows to HDFS (Linux VM) using Spark?
Your problem is that the username you are using to access HDFS does not have write permission.
The directory /Data has the permissions rwxr-xr-x, which translates to mode 755. Your username is LocalWindowsUser, which falls under "other" and therefore only has read and execute permissions.
Possible solutions:
Solution 1:
Since this is a local system under your full control, change the permissions to allow everyone access. Execute this command while inside the VM as the user hadoop:
hdfs dfs -chmod -R 777 /Data
Solution 2:
Create an environment variable in Windows and set the username:
set HADOOP_USER_NAME=hadoop
The username really should be the user hdfs. Try that also if necessary.
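For example, from the same Windows command prompt you launch Spark from (a sketch; the variable must be set before the JVM starts so Hadoop's simple authentication picks it up):
set HADOOP_USER_NAME=hadoop
spark-shell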
I have installed Hadoop on CentOS 7. A daemon service written in Python is trying to make a directory in HDFS, but it is getting the permission error below.
mkdir: Permission denied: user=root, access=WRITE, inode="/rep_data/store/data/":hadoop:supergroup:drwxr-xr-x
It looks like my service is running under the root account.
So I would like to know how to give the root user permission to make directories and write files.
If you are trying to create a directory directly under the HDFS root, i.e. /, you may face this type of issue; you can create directories under your HDFS home without any problems.
To create a directory under the root, execute a command like the following:
sudo -u hdfs hdfs dfs -mkdir /directory/name
To create a directory in your HDFS home, execute the command below:
hdfs dfs -mkdir /user/user_home/directory/name
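If the HDFS home directory for the service user does not exist yet, the hdfs superuser can create it first (a sketch, assuming the service keeps running as root):
sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root /user/root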
This is probably an issue because you are not the superuser.
A workaround is to enable Access Control Lists (ACLs) in HDFS and grant permissions to your user.
To enable support for ACLs, set dfs.namenode.acls.enabled to true in the NameNode configuration.
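After enabling the property (and restarting the NameNode), you can grant the user access with setfacl, for example (a sketch; the path is taken from the error message):
hdfs dfs -setfacl -R -m user:root:rwx /rep_data/store/data
hdfs dfs -getfacl /rep_data/store/data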
For more info, check the HDFS permissions and ACLs documentation.