Permission error when using sparklyr with Hadoop - apache-spark

I am trying to get sparklyr to work on a cluster with Hadoop. When I run sc <- spark_connect(master = "yarn-client", version = "2.8.5")
I get this error message:
Error in force(code) :
Failed during initialize_connection: org.apache.hadoop.security.AccessControlException: Permission denied: user=rstudio, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:189)
...
The user rstudio is what I created for RStudio server. How do I fix the permissions to get it to work?

Using the Hadoop superuser (which looks like it's hdfs in your case), you need to create an HDFS home directory (/user/rstudio) for your rstudio user and change its ownership so that rstudio is the owner. See http://www.hadooplessons.info/2017/12/creating-home-directory-for-user-in-hdfs-hdpca.html?m=1 for details.
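A minimal sketch of those two steps, assuming the superuser is hdfs (as shown in the error) and that an rstudio group exists; adjust the group if yours differs:

# Run as the HDFS superuser to create the home directory and hand it over
sudo -u hdfs hdfs dfs -mkdir -p /user/rstudio
sudo -u hdfs hdfs dfs -chown rstudio:rstudio /user/rstudio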

Related

Spark 2.4 Got an error when resolving hostNames Falling back to /default-rack

When running an application in client mode, the driver logs print the INFO messages below. Any idea how to resolve this? Are there any Spark configs to update, or something missing?
[INFO ][dispatcher-event-loop-29][SparkRackResolver:54] Got an error when resolving hostNames. Falling back to /default-rack for all
The job runs fine, and this message does not appear in the executor logs.
Check this bug:
https://issues.apache.org/jira/browse/SPARK-28005
If you want to suppress this in the logs, you can try adding this to your log4j.properties:
log4j.logger.org.apache.spark.deploy.yarn.SparkRackResolver=ERROR
This can happen when using spark-submit with master yarn in client (local) deploy mode, i.e. not using --deploy-mode cluster, and the path to the topology.py script in your core-site.xml is incorrect.
The path to core-site.xml can be set via the HADOOP_CONF_DIR (or YARN_CONF_DIR) environment variable.
Check the path set as the value of the net.topology.script.file.name parameter in core-site.xml.
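For example, assuming HADOOP_CONF_DIR points at your Hadoop configuration directory, you can check it with:

grep -A 1 'net.topology.script.file.name' "$HADOOP_CONF_DIR/core-site.xml"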
If the path is incorrect, deploying the driver in local mode will fail with the following warning:
23/01/15 18:39:43 WARN ScriptBasedMapping: Exception running /home/alexander/xxx/.conf/topology.py 10.15.21.199
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/home/john"): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
...
23/01/15 18:39:43 INFO SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all

Running Derby as a server on Linux using JDK11

I am at my wits' end!
I have a minimal install of Ubuntu Server 18.04 and OpenJDK 11 (headless).
Downloaded, to a local folder are the java 9+ binaries for Derby (db-derby-10.15.2.0-bin)
Path and Environment settings are all correct!
When I start the server with startNetworkServer -h 0.0.0.0, I get an error when doing a simple connect using the ij command-line tool:
ij> connect 'jdbc:derby://localhost:1527/dbname;create=true';
ERROR XJ041: DERBY SQL error: ERRORCODE: 40000, SQLSTATE: XJ041, SQLERRMC: Failed to create database 'dbname', see the next exception for details.::SQLSTATE: XBM01::SQLSTATE: XJ001
The derby.log file makes reference to:
java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "getenv.SOURCE_DATE_EPOCH")
Looking further into this error, I learned that I somehow need a security policy. I found this website, which seemed to be the answer to my problems: https://www.javacodegeeks.com/2020/04/apache-derby-database-jvm-security-policy.html
Following these pretty straightforward instructions, I get:
java.security.AccessControlException: access denied
org.apache.derby.shared.common.security.SystemPermission( "engine", "usederbyinternals" )
For the next person who has this strange problem (it seems to happen with some regularity), here's a simple workaround, copied from this FAQ page at Chalmers Institute of Technology:
Q: When we try to create a database in Derby and the database explorer in NetBeans, we get one or more of the following error(s):
An error occurred while creating the database:
java.sql.NonTransientConnectionException: DERBY SQL error: ERRORCODE:
40000, SQLSTATE: XJ041, SQLERRMC: ...
Caused by: java.security.AccessControlException: access denied
("java.lang.RuntimePermission" "getenv.SOURCE_DATE_EPOCH")
A: This is some kind of misconfiguration in the JVM with a very aggressive security policy that doesn't allow applications to fetch the system time (since the epoch). The solution is to edit ~/.java.policy or [java.home]/lib/security/java.policy and add the following:
grant {
permission java.lang.RuntimePermission "getenv.SOURCE_DATE_EPOCH", "read";
};
If you are on Windows, you can read about where this policy file is supposed to be located here:
https://docs.oracle.com/javase/7/docs/technotes/guides/security/PolicyFiles.html
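The second access-denied error from the question can be addressed the same way, by also granting the permission named in that stack trace. A sketch, using the permission class exactly as it appears in the error (the class name can differ between Derby versions):

grant {
permission org.apache.derby.shared.common.security.SystemPermission "engine", "usederbyinternals";
};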
Apache Derby is a database management system built for a multi-user environment. Therefore, when you run startNetworkServer -h 0.0.0.0, you are telling it by default to enforce certain security checks, which is why it does not let you make an insecure connection such as ij> connect 'jdbc:derby://172.16.17.31:1527/BBDD_server;create=true';
You are connecting without specifying a username and password, so you should either connect with a username and password, or start the server without any security:
startNetworkServer -h 0.0.0.0 -noSecurityManager
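If you would rather keep the security manager enabled, here is a sketch of the first option; user= and password= are standard Derby connection URL attributes, and the values here are placeholders:

ij> connect 'jdbc:derby://localhost:1527/dbname;create=true;user=dbadmin;password=secret';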
More help:
https://db.apache.org/derby/docs/10.4/adminguide/tadminnetservopen.html
https://db.apache.org/derby/docs/10.4/adminguide/tadminnetservbasic.html

Hadoop Permission denied on Bash on Ubuntu on Windows

I'm trying to install Hadoop in order to use the HDFS service. I'm doing it in Bash on Ubuntu on Windows (not a VM):
https://www.microsoft.com/fr-fr/store/p/ubuntu/9nblggh4msv6
The tutorial that I followed was this one (which is really similar to most tutorials for Hadoop installation):
https://www.youtube.com/watch?v=Nb1sinaTlmo
Everything goes well until I try to run start-dfs.sh, but then I get these error messages:
17/12/12 22:19:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-DESKTOP-QG5RB4T.out
localhost: nice: cannot set niceness: Permission denied
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-DESKTOP-QG5RB4T.out
localhost: nice: cannot set niceness: Permission denied
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-DESKTOP-QG5RB4T.out
0.0.0.0: nice: cannot set niceness: Permission denied
17/12/12 22:19:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I made sure that ssh localhost works, I gave permissions to the hadoop folder with chmod -R 755 hadoop/, and I gave root privileges to the user I created for the Hadoop installation.
Any leads or solutions for this problem would be greatly appreciated.
I have the exact same problem and, for the love of god, I can't figure it out.
It has been about two months and nothing has helped. I have asked many professionals and they can't figure it out either.
The only tip I received was to check the Windows SSH Broker and disable it, but as far as I can tell that can't be done, because it's an integral part of Windows. You can only partially disable it, which changes absolutely nothing.
I hope you can find the solution to this problem. I'm sorry I can't help you any further, but I would be glad to know the solution as well.
Edit your /etc/passwd file, giving your {hadoop} user root permissions (replacing the 1001 values with 0).
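For illustration only, the change described above would look roughly like this in /etc/passwd (the hduser name and the 1001 IDs are assumptions; note that giving a user UID/GID 0 effectively makes it root, with the security implications that brings):

before: hduser:x:1001:1001::/home/hduser:/bin/bash
after: hduser:x:0:0::/home/hduser:/bin/bash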
Maybe you can look at this answer: https://github.com/Linuxbrew/brew/issues/695#issuecomment-386121530.
It says "You can safely ignore this warning. It's an upstream bug in Microsoft Windows."

SaveAsTextFile() results in Mkdirs failed - Apache Spark

Hello, I currently have a Spark cluster with 3 worker nodes. I also have an NFS share mounted on /var/nfs with 777 permissions for testing. I'm trying to run the following code to count the words in a text file:
root#master:/home/usuario# MASTER="spark://10.0.0.1:7077" spark-shell
val inputFile = sc.textFile("/var/nfs/texto.txt")
val counts = inputFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.toDebugString
counts.cache()
counts.count()
counts.saveAsTextFile("/home/usuario/output");
But spark gives me the following error:
Caused by: java.io.IOException: Mkdirs failed to create
file:/var/nfs/output-4/_temporary/0/_temporary/attempt_20170614094558_0007_m_000000_20
(exists=false, cwd=file:/opt/spark/work/app-20170614093824-0005/2)
I have searched many websites but I cannot find the solution for my case. Any help is appreciated.
When you start spark-shell with MASTER set to a valid cluster master URL (and not local[*]), Spark treats bare paths as HDFS paths and performs its I/O against HDFS, not the local filesystem.
You have mounted the locations on the local filesystem, and those paths do not exist in HDFS.
That's why the error says exists=false.
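One way around it, sketched below, is to put the input into HDFS first and then use HDFS paths for both the input and the output (the /user/usuario paths are only examples). Alternatively, you can keep the NFS mount and prefix both paths with file:// (for example file:///var/nfs/texto.txt), which only behaves well if /var/nfs is mounted and writable at the same path on every worker.

# Copy the input into HDFS, then read and write HDFS paths from spark-shell
hdfs dfs -mkdir -p /user/usuario
hdfs dfs -put /var/nfs/texto.txt /user/usuario/texto.txt

After that, read it with sc.textFile("/user/usuario/texto.txt") and save the result to an HDFS path such as /user/usuario/output.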
I had the same issue. Check the ownership of your directory again:
sudo chown -R owner-user:owner-group directory

Spark program on Windows cluster fails with error CreateProcess error=5, Access is denied

I am trying to execute a program on a Spark v2.0.0 cluster on my Windows 10 laptop. There is a master node on port 31080 and a slave node on port 32080. The cluster uses the Standalone manager, and I am using JDK 1.8 with a custom work directory for the slave.
When the program is submitted via spark-submit or through Eclipse > Run, I get the error below, and the executor goes into a loop (a new executor is created and fails continuously). Please advise.
Executor updated: app-20160906203653-0001/0 is now RUNNING
Executor updated: app-20160906203653-0001/0 is now FAILED (java.io.IOException: Cannot run program ""D:\jdk1.8.0_101"\bin\java"
(in directory "D:\spark-work\app-20160906203653-0001\0"):
CreateProcess error=5, Access is denied)
Executor app-20160906203653-0001/0 removed: java.io.IOException: Cannot run program ""D:\jdk1.8.0_101"\bin\java" (in directory
"D:\spark-work\app-20160906203653-0001\0"): CreateProcess error=5,
Access is denied
Removal of executor 0 requested
Got the answer. I was starting my master and slaves through Windows batch scripts. These invoked an env script that set JAVA_HOME, SCALA_HOME, and SPARK_HOME, with the paths enclosed in double quotes; hence the issue. Removing the double quotes fixed it, and no admin privileges or other changes were needed.
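For anyone hitting the same error, here is a sketch of the kind of change involved in the env batch script (the path follows the error message above):

REM Problematic: the quotes become part of the value, producing ""D:\jdk1.8.0_101"\bin\java"
set JAVA_HOME="D:\jdk1.8.0_101"
REM Fixed: no quotes around the path
set JAVA_HOME=D:\jdk1.8.0_101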
