Spark doesn't acknowledge Kerberos authentication and the application fails when a delegation token is issued - apache-spark

I'm using Spark to read data files from HDFS.
When I perform a Spark action, an exception is raised:
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
In the logs, before the exception is thrown, I can see:
WARN [hadoop.security.UserGroupInformation] PriviledgedActionException as:principal (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
[21/02/22 17:27:17.439] WARN [hadoop.ipc.Client] Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
[21/02/22 17:27:17.440] WARN [hadoop.security.UserGroupInformation] PriviledgedActionException as:principal (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
Which is really weird, because I set up Kerberos in the spark-submit configuration with the keytab and principal settings:
spark.master yarn-cluster
spark.app.name app
spark.submit.deployMode cluster
spark.yarn.principal pincipal/principal.com
spark.yarn.keytab all.keytab
spark.driver.memory 4G
spark.executor.memory 8G
spark.executor.instances 4
spark.executor.cores 8
spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url jdbc:phoenix:m1,m2,m3,m4:2181:/hbase
spark.driver.extraJavaOptions -XX:MaxPermSize=1024M -Dlog4j.configuration=log4j.xml
spark.executor.extraJavaOptions -Dlog4j.configuration=log4j.xml
I don't understand why issuing the delegation token wouldn't be possible, since Kerberos authentication is set up.
I also don't understand why it displays those warnings as if my Spark authentication mode were set to SIMPLE. Is Spark ignoring my config?
I have two environments, one on which the application works properly, but I don't know which config I should look at.
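For context, the (auth:SIMPLE) tag in those warnings normally means that the Hadoop client configuration visible to the process does not declare Kerberos. On a Kerberized cluster, the core-site.xml on the driver and executor classpath would usually contain an entry like the illustrative snippet below (not taken from the question's environment):

<!-- Illustrative only: tells Hadoop clients to authenticate with Kerberos -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>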

Related

How to refresh the Kerberos ticket in a running Structured Streaming Spark application once every 7 days?

I've been running a Structured Streaming application to join 2 streams from Kafka and push to a third stream. The application fails once every 7 days as the HDFS_DELEGATION_TOKEN expires. I'm using a JAAS file to pass the relevant configuration.
RegistryClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="./user.keytab"
storeKey=true
useTicketCache=false
principal="uder#Principal";
};
Pass the below parameters in the spark-submit command.
--conf spark.yarn.keytab=/path/to/file.keytab
--conf spark.yarn.principal=principalName@domain
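A hedged sketch of how these pieces are often combined in a single spark-submit call; every path, class name, and the realm below are placeholders rather than values from the question. The JAAS file and a copy of the keytab are shipped to the YARN containers with --files (so the relative keyTab="./user.keytab" path resolves), while spark.yarn.principal and spark.yarn.keytab let YARN renew the HDFS delegation tokens; the keytab handed to spark.yarn.keytab is a differently named local copy so the same file is not added to the distributed cache twice.

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files ./jaas.conf,./user.keytab \
  --conf spark.yarn.principal=user@EXAMPLE.COM \
  --conf spark.yarn.keytab=/local/path/user-copy.keytab \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
  --class com.example.StreamingApp app.jar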

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Token has expired

I have an issue with a Spark Streaming job. This job does the following:
- Read the streaming data
- Join the streaming data with a Hive table (the underlying data is in HBase).
The program runs for 2-3 weeks but then fails with the message below.
19/04/27 05:56:16 WARN security.UserGroupInformation: PriviledgedActionException as:ndc_common (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Token has expired
19/04/27 05:56:16 WARN ipc.RpcClientImpl: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Token has expired
Can someone please explain how to resolve this issue? The cluster is Kerberized.
Thanks
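As a hedged aside (not from the original thread): for long-running jobs, one commonly suggested mitigation is to log in from a keytab and periodically re-login so that a fresh TGT, and from it fresh service tokens, can be obtained. A minimal Scala sketch, where the principal, realm, and keytab path are placeholders and the keytab is assumed to have been shipped to the container (e.g. via --files):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Log in once from the keytab (placeholder principal and path).
val hadoopConf = new Configuration()
UserGroupInformation.setConfiguration(hadoopConf)
UserGroupInformation.loginUserFromKeytab("ndc_common@EXAMPLE.COM", "./ndc_common.keytab")

// Call periodically, e.g. from the driver before each batch:
def refreshKerberosLogin(): Unit = {
  // Re-acquires a TGT from the keytab if the current one is close to expiry.
  UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()
}

Note that this only keeps the Kerberos TGT fresh; HDFS delegation-token renewal is usually handled by passing the principal and keytab to spark-submit, as in the previous answer.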

Kerberos ticket renewal for a Spark streaming job that communicates with Kafka

I have a long running Spark streaming job that runs on a kerberized Hadoop cluster. It fails every few days with the following error:
Diagnostics: token (token for XXXXXXX: HDFS_DELEGATION_TOKEN owner=XXXXXXXXX@XX.COM, renewer=yarn, realUser=, issueDate=XXXXXXXXXXXXXXX, maxDate=XXXXXXXXXX, sequenceNumber=XXXXXXXX, masterKeyId=XXX) can't be found in cache
I tried adding the --keytab and --principal options to spark-submit, but we already have options that do the same thing: we already pass in the keytab and principal (through a JAAS config) with the following:
'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka_client_jaas.conf -Djava.security.krb5.conf=krb5.conf -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12' \
The same goes for spark.executor.extraJavaOptions. If we add the --principal and --keytab options, it results in an "attempt to add file (keytab) multiple times to distributed cache" error.
There are 2 ways that you can do it.
Have a shell script that regenerates the ticket from the keytab at a regular interval.
[RECOMMENDED] Pass your keytab to Spark, with strict access for the Spark user only, and it can automatically regenerate the tickets for you (see the sketch below). Visit the Cloudera community page for more details. It's just a handful of simple steps and you can get going!
Hope that helps!
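A minimal sketch of the recommended option, with placeholder paths and principal: restrict the keytab to the submitting user, then hand it to spark-submit so Spark/YARN can keep regenerating tickets and delegation tokens for the lifetime of the job.

# lock the keytab down to the user that submits the job
chmod 400 /home/spark/app.keytab

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal app_user@EXAMPLE.COM \
  --keytab /home/spark/app.keytab \
  --class com.example.StreamingApp app.jar

If the same keytab also has to be shipped via --files (for instance for a Kafka JAAS config), using a differently named copy for --keytab is the usual way around the "added multiple times to distributed cache" error quoted above.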

Spark cluster-mode issue reading a Hive-HBase table in a Kerberized environment

Error description
We are not able to execute our Spark job in yarn-cluster or yarn-client mode, though it works fine in local mode.
This issue occurs when we try to read the Hive-HBase tables in a Kerberized cluster.
What we have tried so far
Passing all the HBase JARs in the --jars parameter of spark-submit:
--jars /usr/hdp/current/hive-client/lib/hive-hbase-handler-1.2.1000.2.5.3.16-1.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar
Passing hbase-site.xml and hive-site.xml in the --files parameter of spark-submit:
--files /usr/hdp/2.5.3.16-1/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml,/home/pasusr/pasusr.keytab
Doing Kerberos authentication inside the application. In the code we are explicitly passing the keytab:
import java.io.IOException
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}
import org.apache.hadoop.security.UserGroupInformation

// configuration, principle and keyTab are defined elsewhere in the application
UserGroupInformation.setConfiguration(configuration)
val ugi: UserGroupInformation =
  UserGroupInformation.loginUserFromKeytabAndReturnUGI(principle, keyTab)
UserGroupInformation.setLoginUser(ugi)
// Open the HBase connection as the authenticated user
ugi.doAs(new PrivilegedExceptionAction[Connection] {
  @throws[IOException]
  override def run(): Connection =
    ConnectionFactory.createConnection(configuration)
})
Passing the keytab information in the spark-submit
Passing the HBase JARs in spark.driver.extraClassPath and spark.executor.extraClassPath
Error Log
18/03/20 15:33:24 WARN TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection. This is usually due to directly creating an HTable, which is deprecated. Instead, you should create a Connection object and then request a Table instance from it. If you don't need the Table instance for your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
18/03/20 15:47:38 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 406, hadoopnode.server.name): java.lang.IllegalStateException: Error while configuring input job properties
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:444)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:342)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=50, exceptions:
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:679)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
I was able to resolve this by adding the following configuration to spark-env.sh:
export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar
And by removing spark.driver.extraClassPath and spark.executor.extraClassPath, in which I had been passing the above JARs, from the spark-submit command.

Enabling SSL between Apache spark and Kafka broker

I am trying to enable SSL between Apache Spark 1.4.1 and Kafka 0.9.0.0. I am using the spark-streaming-kafka_2.10 JAR to connect to Kafka, and the KafkaUtils.createDirectStream method to read data from a Kafka topic.
Initially I got an OOM issue, which I resolved by increasing the driver memory; after that I started seeing the issue below. I have done a bit of reading and found out that spark-streaming-kafka_2.10 uses the Kafka 0.8.2.1 API, which doesn't support SSL (Kafka supports SSL only from 0.9.0.0 onwards). Are there any alternatives to enable SSL between Spark 1.4.1 and Kafka 0.9.0.0?
http://grokbase.com/p/kafka/users/158wy0wtxk/ssl-between-kafka-and-spark-streaming-api
SSL between Kafka and Spark
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
https://issues.apache.org/jira/browse/SPARK-15089
Here is the log
WARN utils.VerifiableProperties: Property security.protocol is not valid
16/10/24 18:25:09 WARN utils.VerifiableProperties: Property ssl.truststore.location is not valid
16/10/24 18:25:09 WARN utils.VerifiableProperties: Property ssl.truststore.password is not valid
16/10/24 18:25:09 INFO utils.VerifiableProperties: Property zookeeper.connect is overridden to
16/10/24 18:25:09 INFO consumer.SimpleConsumer: Reconnect due to error
Exception in thread "main" org.apache.spark.SparkException: java.io.EOFException
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:99)
Support for Kafka 0.10 (which has SSL support) was added in Spark 2.0. We have to use the Maven artifact spark-streaming-kafka-0-10_2.10.
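A minimal Scala sketch of the 0-10 direct stream with SSL (broker address, topic, group id, and truststore details are placeholders); the SSL properties are passed straight through to the underlying Kafka 0.10 consumer instead of being rejected as they were by the 0.8 client:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val ssc = new StreamingContext(new SparkConf().setAppName("kafka-ssl-example"), Seconds(10))

// Consumer properties, including the SSL settings the 0.8.2.1 client flagged as "not valid"
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9093",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "kafka-ssl-example-group",
  "security.protocol" -> "SSL",
  "ssl.truststore.location" -> "/path/to/truststore.jks",
  "ssl.truststore.password" -> "changeit"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

stream.map(_.value).print()
ssc.start()
ssc.awaitTermination()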

Resources