Spark job submitted via Livy throws GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) - apache-spark

I am trying to launch my Spark batch job using Livy. From the logs, I see that the job starts running but fails when it tries to access the Hive metastore with the following Kerberos error:
GSSException: No valid credentials provided (Mechanism level: Failed
to find any kerberos tgt)
The same job runs fine when I launch it using a spark-submit command. However, in the spark-submit command I pass the keytab and principal (--keytab, --principal).
I tried passing the keytab and principal in the Livy REST call using the parameters spark.yarn.keytab and spark.yarn.principal. Adding these options throws the following error:
Error: only one of --proxy-user or --principal can be provided
even though I do not provide the proxyUser parameter in my curl request.
Kindly let me know how to resolve this issue.
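For reference, a minimal sketch of such a Livy batch request, assuming the standard POST /batches endpoint (host, jar path, class name, and keytab path below are placeholders, not the actual values):

# Sketch: submit a batch through Livy, passing the Kerberos keytab/principal
# as Spark conf entries; host, paths, and class name are hypothetical.
curl -X POST http://livy-host:8998/batches \
  -H "Content-Type: application/json" \
  -d '{
        "file": "hdfs:///user/me/my-batch-job.jar",
        "className": "com.example.MyBatchJob",
        "conf": {
          "spark.yarn.keytab": "/etc/security/keytabs/me.keytab",
          "spark.yarn.principal": "me@EXAMPLE.COM"
        }
      }'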

Related

Spark Structured Streaming job failing in cluster mode

I am using spark-sql-2.4.1 in my application.
While writing data to an HDFS folder I am facing this issue in my Spark Streaming application.
Error:
yarn.Client: Deleted staging directory hdfs://dev/user/xyz/.sparkStaging/application_1575699597805_47
20/02/24 14:02:15 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user= xyz, access=WRITE, inode="/tmp/hadoop-admin":admin:supergroup:drwxr-xr-x
.
.
.
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xyz, access=WRITE, inode="/tmp/hadoop-admin":admin:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:350)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:251)
I face this issue only when I run in yarn-cluster mode, i.e.
--master yarn \
--deploy-mode cluster \
But when I run in "yarn-client" mode it runs fine, i.e.
--master yarn \
--deploy-mode client \
What is the root cause of this problem?
The fundamental question here: why is it trying to write to "/tmp/hadoop-admin/" instead of the respective user directory, i.e. hdfs://qa2/user/xyz/?
I have come across this fix:
https://issues.apache.org/jira/browse/SPARK-26825
How can I implement it in my Spark SQL application?
The only difference between the working --deploy-mode client and the failing --deploy-mode cluster cases is the location of the driver. In client deploy mode, the driver runs on the machine where you execute spark-submit (usually an edge node that is configured to use the YARN cluster but is not part of it), while in cluster deploy mode the driver runs inside the YARN cluster (on one of the nodes under YARN's control).
It looks like you've got a misconfigured edge node.
I would not be surprised if this had nothing to do with the streaming query (Spark Structured Streaming) and a regular Spark SQL-only application failed in exactly the same way.
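As a side note, Hadoop client properties can be overridden per application through Spark's spark.hadoop.* passthrough, so one hedged workaround sketch (not a fix for the misconfigured node itself, and the target path is hypothetical) would be:

# Sketch: override the Hadoop base temp directory for this one application
# via the spark.hadoop.* passthrough; the path is hypothetical and this only
# papers over the node misconfiguration described above.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.hadoop.hadoop.tmp.dir=/user/xyz/tmp \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar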

How to distcp between a MapR filesystem and an HDInsight Blob Storage

I'm trying to execute the distcp command below; however, it is throwing an exception:
hadoop distcp date_load=201901* wasb://dev3-spark#clusterdev.blob.core.windows.net/luiz/producao/performance/performance_report
The thrown exception is as follows:
19/02/06 13:34:53 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
19/02/06 13:34:53 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
19/02/06 13:34:53 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
19/02/06 13:34:53 ERROR tools.DistCp: Invalid arguments:
org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Container dev3-spark in account clusterdev.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:938)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:438)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1048)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2693)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:98)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2773)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2755)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:411)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:309)
at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.apache.hadoop.fs.azure.AzureException: Container dev3-spark in account clusterdev.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:730)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:933)
... 12 more
Invalid arguments: org.apache.hadoop.fs.azure.AzureException: Container dev3-spark in account clusterdev.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
You can distcp from your on-premises cluster to your Azure storage account:
% hadoop distcp hdfs://<yourHostName>:9001/user/<yourUser>/<yourDirectory> wasbs://<yourStorageContainer>@<YourStorageAccount>.blob.core.windows.net/<yourDestinationDirectory>/
Hope this helps.
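If the target container is private, the storage account key usually has to be supplied as well so the WASB connector does not fall back to anonymous access; a hedged sketch (account, container, and key are placeholders):

# Sketch: pass the storage account key so the connector does not attempt
# anonymous access; account name, container, and key are placeholders.
% hadoop distcp \
    -D fs.azure.account.key.<YourStorageAccount>.blob.core.windows.net=<yourStorageAccountKey> \
    hdfs://<yourHostName>:9001/user/<yourUser>/<yourDirectory> \
    wasbs://<yourStorageContainer>@<YourStorageAccount>.blob.core.windows.net/<yourDestinationDirectory>/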

Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled

I am trying to configure a secure Hadoop pseudo-distributed cluster (to verify it works properly) in Azure using Azure AD Domain Services.
OS - Windows Server 2012 R2 Datacenter
Hadoop Version - 2.7.2
I am able to run
hadoop fs -ls /
and the example MapReduce job works fine:
yarn jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-*.jar pi 16 10000
But when I run
hdfs fsck /
it gives:
Connecting to namenode via https://node1:50470/fsck?ugi=Kumar&path=%2F
Exception in thread "main" java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 403, message: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:335)
at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:73)
at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:152)
at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:377)
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 403, message: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:274)
at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214)
at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
at org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:161)
at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:333)
... 10 more
When I access the NameNode web UI, it shows:
GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)
But the same configuration works fine with a local Windows Active Directory.
Could someone help me resolve this error and get it working?
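One commonly cited cause of the "AES256 CTS mode with HMAC SHA1-96 is not supported/enabled" message is a JRE without the unlimited-strength JCE policy, so a hedged check/fix sketch for this Windows setup (only one possible cause, and the file locations are assumptions) would be:

REM Sketch: install the JCE Unlimited Strength policy files so the JVM can
REM handle AES256 Kerberos tickets (only one possible cause of the error;
REM the JAVA_HOME layout and the location of the downloaded jars are assumed).
copy local_policy.jar "%JAVA_HOME%\jre\lib\security\"
copy US_export_policy.jar "%JAVA_HOME%\jre\lib\security\"
REM On JDK 8u161 or later the same effect can be achieved by setting
REM crypto.policy=unlimited in %JAVA_HOME%\jre\lib\security\java.security.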

new Spark StreamingContext fails with HDFS errors

I'm using DC/OS installed via Azure ACS, and I installed HDFS and Spark via the dcos tool with default options.
Creating a Spark StreamingContext gives:
16/07/22 01:51:04 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/07/22 01:51:04 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Exception in thread "main" java.lang.IllegalArgumentException:
java.net.UnknownHostException: namenode1.hdfs.mesos
I expect I have to redeploy the Spark package with dcos package install with --options= but I can't figure out what the hdfs.config-url should be. The https://docs.mesosphere.com/1.7/usage/service-guides/spark/install/#hdfs docs seem out of date.
Yes, it is out of date. We'll fix that.
DC/OS HDFS now serves its config on http://hdfs.marathon.mesos:[port]/v1/connect
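A quick way to verify this, keeping the unspecified port as a placeholder, is to fetch the endpoint and then point the Spark package's hdfs.config-url option at it when reinstalling (the exact options.json layout is not confirmed here):

# Sketch: inspect the config DC/OS HDFS serves; [port] is left as a
# placeholder because it is not specified above.
curl http://hdfs.marathon.mesos:[port]/v1/connect
# Then reinstall Spark with an options file whose hdfs.config-url points at
# that URL (the exact option key layout is an assumption, not confirmed here).
dcos package install spark --options=options.json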

Spark + Mesos cluster mode, who uploads the jar?

I'm trying to run Spark applications in Mesos cluster mode. (I've got client mode working but would still like to try cluster mode.)
I have launched spark-mesos-dispatcher on the Mesos master node.
When I submit the assembly at local path /tmp/assembly.jar using the following command,
bin/spark-submit --master mesos://dispatcher:7077 --deploy-mode cluster --class com.example.Example /tmp/assembly.jar
It fails because the file /tmp/assembly.jar does not exist on the Mesos slave nodes.
I1129 10:47:43.839771 5884 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/9d725348-931a-48fb-96f7-d29a4b09f3e8-S9\/deploy","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"\/tmp\/assembly.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/9d725348-931a-48fb-96f7-d29a4b09f3e8-S9\/frameworks\/9d725348-931a-48fb-96f7-d29a4b09f3e8-0291\/executors\/driver-20151129104742-0008\/runs\/31bf5840-226e-4b87-ae76-d14bd2f17950","user":"user"}
I1129 10:47:43.840710 5884 fetcher.cpp:369] Fetching URI '/tmp/assembly.jar'
I1129 10:47:43.840721 5884 fetcher.cpp:243] Fetching directly into the sandbox directory
I1129 10:47:43.840731 5884 fetcher.cpp:180] Fetching URI '/tmp/assembly.jar'
I1129 10:47:43.840737 5884 fetcher.cpp:160] Copying resource with command:cp '/tmp/assembly.jar' '/var/lib/mesos/slaves/9d725348-931a-48fb-96f7-d29a4b09f3e8-S9/frameworks/9d725348-931a-48fb-96f7-d29a4b09f3e8-0291/executors/driver-20151129104742-0008/runs/31bf5840-226e-4b87-ae76-d14bd2f17950/assembly.jar'
cp: cannot stat `/tmp/assembly.jar': No such file or directory
Failed to fetch '/tmp/assembly.jar': Failed to copy with command 'cp '/tmp/assembly.jar' '/var/lib/mesos/slaves/9d725348-931a-48fb-96f7-d29a4b09f3e8-S9/frameworks/9d725348-931a-48fb-96f7-d29a4b09f3e8-0291/executors/driver-20151129104742-0008/runs/31bf5840-226e-4b87-ae76-d14bd2f17950/assembly.jar'', exit status: 256
Failed to synchronize with slave (it's probably exited)
In the case of YARN cluster mode, Spark's YARN client implementation will upload the application jar to HDFS so that the driver and all executors have access to it, but I could not find such code in RestSubmissionClient, which is used by the Mesos and Standalone cluster modes.
Who does the uploading in this case? Or do I need to manually put the application assembly somewhere accessible via an HTTP URI?
From my understanding you could use the SparkContext addJar() method to add a JAR file path that is local to the driver application; it will then be distributed to the executor nodes (in client mode).
As you state that you want to use cluster mode, I'd suggest that you have a look at the Spark Jobserver project, which should make the running of Spark applications on Mesos easier than with the built-in tools.
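For completeness, the workaround the asker hints at (placing the assembly at a URI every Mesos agent can fetch and submitting that URI instead of a local path) would look roughly like this; the HDFS locations are hypothetical:

# Sketch: host the assembly somewhere reachable from every agent and submit
# that URI instead of the local /tmp path (the hdfs:// paths are hypothetical).
hdfs dfs -put /tmp/assembly.jar hdfs://namenode:9000/apps/assembly.jar
bin/spark-submit --master mesos://dispatcher:7077 --deploy-mode cluster \
  --class com.example.Example hdfs://namenode:9000/apps/assembly.jar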
