Error when trying to save a Spark dataframe to an HDFS file - apache-spark

I'm using Ubuntu.
When I try to save a dataframe to HDFS (Spark Scala):
processed.write.format("json").save("hdfs://localhost:54310/mydata/enedis/POC/processed.json")
I get this error:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/mydata/enedis/POC":hadoop_amine:supergroup:drwxr-xr-x

You are trying to write data as the root user, but the HDFS directory (/mydata/enedis/POC) only grants write permission to the hadoop_amine user.
Change the permissions on the HDFS directory to allow the root user to write to /mydata/enedis/POC:
# log in as the hadoop_amine user, then execute the command below
hdfs dfs -chmod -R 777 /mydata/enedis/POC
(Or)
Initialize the Spark shell as the hadoop_amine user; then there is no need to change the permissions of the directory.
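For instance, a minimal sketch of the second option, assuming hadoop_amine is a local Unix account and the cluster uses simple (non-Kerberos) authentication:
# run the shell as hadoop_amine so Spark writes to HDFS as that user
sudo -u hadoop_amine spark-shell
# alternatively, keep your current shell user but tell the Hadoop client
# which HDFS identity to use (only honored when Kerberos is not enabled)
export HADOOP_USER_NAME=hadoop_amine
spark-shell
Note that chmod -R 777 opens the directory to every user; running Spark as the directory's owner is usually the safer fix.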

Related

Trying to move a csv file from local file system to hadoop file system

I am trying to copy a csv file from my local file system to Hadoop, but I am not able to do it successfully. I am not sure which permissions I need to change. As I understand it, the HDFS superuser does not have access to /home/naya/dataFiles/BlackFriday.csv.
hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: Permission denied: user=naya, access=WRITE, inode="/tmp":hdfs:supergroup:drwxr-xr-x
sudo -u hdfs hdfs dfs -put /home/naya/dataFiles/BlackFriday.csv /tmp
# Error: put: `/home/naya/dataFiles/BlackFriday.csv': No such file or directory
Any help is highly appreciated. I want to do it via the command-line utility. I can do it via Cloudera Manager on the Hadoop side, but I want to understand what is happening behind the commands.
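The thread gives no accepted fix, but the two errors come from two different permission checks. A hedged reading, with a sketch of one workaround (paths taken from the question):
# 1) as naya: HDFS denies the write, because /tmp in HDFS is owned by hdfs
#    (drwxr-xr-x) and naya has no WRITE access to it
# 2) as hdfs via sudo: HDFS would allow the write, but the hdfs Unix user
#    cannot read /home/naya, hence the "No such file or directory" error
# workaround: stage the file somewhere the hdfs user can read it
cp /home/naya/dataFiles/BlackFriday.csv /tmp/BlackFriday.csv
chmod o+r /tmp/BlackFriday.csv
sudo -u hdfs hdfs dfs -put /tmp/BlackFriday.csv /tmp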

How to initialize the spark shell with a specific user to save data to HDFS with Apache Spark

I'm using Ubuntu.
I'm using the Spark dependency in IntelliJ.
When I enter spark in a shell, I get: Command 'spark' not found, but can be installed with: ..
I have two users, amine and hadoop_amine (where Hadoop HDFS is set up).
When I try to save a dataframe to HDFS (Spark Scala):
processed.write.format("json").save("hdfs://localhost:54310/mydata/enedis/POC/processed.json")
I get this error:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/mydata/enedis/POC":hadoop_amine:supergroup:drwxr-xr-x
Try to change the permissions of the HDFS directory, or simply change your Spark user!
To change the directory permissions you can use the hdfs command line, like this:
hdfs dfs -chmod ...
With spark-submit you can use the --proxy-user option.
And finally, you can run spark-submit or spark-shell as the proper user, like this:
sudo -u hadoop_amine spark-submit ...
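For example, a hedged sketch of both routes (the class and jar names are placeholders, not from the question):
# run the whole submission as hadoop_amine
sudo -u hadoop_amine spark-submit --class com.example.MyApp myapp.jar
# or impersonate hadoop_amine while submitting as your own user; this only
# works if hadoop.proxyuser.<you>.hosts/groups is configured on the cluster
spark-submit --proxy-user hadoop_amine --class com.example.MyApp myapp.jar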

SPARK Application + HDFS + User Airflow is not the owner of inode=alapati

We are running a Spark application on a Hadoop cluster (HDP version 2.6.5 from Hortonworks).
From the logs we can see the following diagnostics:
User: airflow
Application Type: SPARK
User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied. user=airflow is not the owner of inode=alapati
The log does not state clearly what we need to look for in HDFS to find out why we get Permission denied.
It looks like user=airflow does not have permission to write data into HDFS.
By default the /user/ directory is owned by "hdfs" with 755 permissions. As a result only hdfs can write to that directory.
You can use one of two options:
change the Spark user name from airflow to hdfs, or
if you still need to use user=airflow, create a home directory for airflow and make airflow its owner:
sudo -u hdfs hadoop fs -mkdir /user/airflow
sudo -u hdfs hadoop fs -chown airflow /user/airflow
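A quick sanity check afterwards (sketch):
# the new home directory should now be owned by airflow
hadoop fs -ls /user | grep airflow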

spark sql permission denied user=anonymous

The error that Spark SQL returns when the denied user is anonymous:
src:hdfs://ournamenode:8020/tmp/hive/spark-root_hive_2017-02-24_00-40-48_944_8869995689545229744-1/-ext-10000/load_date=20170223/part-07262, dest: hdfs://ournamenode:8020/user/hive/warehouse/p13n.db/message_viewed_new/load_date=20170223/part-07262, Status:true
chmod: changing permissions of 'hdfs://ournamenode:8020/user/hive/warehouse/p13n.db/message_viewed_new/load_date=20170223/part-07262': Permission denied. user=anonymous is not the owner of inode=part-07262
Is there any way to fix it?
I'm assuming you are using beeline to run the queries. One fix is to specify the username with the -n option.
The permission denied issue arises because the "spark-warehouse" directory is created under the "anonymous" user, while the files in this directory are created under the user who is running the beeline command.
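A minimal sketch of the -n fix, assuming HiveServer2 listens on hiveserver-host:10000 (the host and username are placeholders, not from the question):
# pass an explicit username so the session is no longer "anonymous"
beeline -u jdbc:hive2://hiveserver-host:10000 -n hive -e "SELECT 1"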

How to allow the root user to write files into HDFS

I have installed Hadoop on CentOS 7. A daemon service written in Python is trying to make a directory in HDFS, but it is getting the permission error below:
mkdir: Permission denied: user=root, access=WRITE, inode="/rep_data/store/data/":hadoop:supergroup:drwxr-xr-x
It looks like my service is running under the root account, so I would like to know how to give the root user permission to make directories and write files.
If you are trying to create a directory under the HDFS root, i.e. /, you may face this type of issue. You can create directories in your HDFS home without any issues.
To create a directory under the root, execute a command like the following:
sudo -u hdfs hdfs dfs -mkdir /directory/name
To create a directory in your HDFS home, execute the command below:
hdfs dfs -mkdir /user/user_home/directory/name
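Applied to the path from the question, a hedged sketch (assuming you can sudo to the hdfs superuser):
# create the directory as the HDFS superuser, then hand it over to root
sudo -u hdfs hdfs dfs -mkdir -p /rep_data/store/data
sudo -u hdfs hdfs dfs -chown -R root /rep_data/store/data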
This is probably an issue because you are not the super user.
A workaround is to enable Access Control Lists (ACLs) in HDFS and grant permissions to your user.
To enable support for ACLs, set dfs.namenode.acls.enabled to true in the NameNode configuration.
For more info, check the HDFS permissions and ACLs documentation.
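A sketch of the ACL route, assuming dfs.namenode.acls.enabled is already set to true (the path is taken from the question):
# grant the root user rwx on the directory without changing its owner
hdfs dfs -setfacl -m user:root:rwx /rep_data/store/data
# verify the resulting ACL entries
hdfs dfs -getfacl /rep_data/store/data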
