org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Token has expired - apache-spark

I have an issue with a Spark Streaming job. The job does the following:
- Read the streaming data
- Join the streaming data with a Hive table (the underlying data is present in HBase)
The program runs for 2-3 weeks, but then it fails with the message below.
19/04/27 05:56:16 WARN security.UserGroupInformation: PriviledgedActionException as:ndc_common (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Token has expired
19/04/27 05:56:16 WARN ipc.RpcClientImpl: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Token has expired
Can someone please explain how to resolve this issue, given that the cluster is kerberized?
Thanks
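A common mitigation for long-running streaming jobs on a kerberized cluster is to submit the job with --principal and --keytab so credentials can be re-obtained instead of relying on an initial kinit, and to make sure the HBase configuration and client jars are visible to the driver so Spark can fetch HBase tokens. Below is a minimal in-driver sketch of keytab-based re-login; the principal and keytab path are placeholders, not values from this job.

import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabRelogin {
    public static void main(String[] args) throws IOException {
        // Log in from a keytab instead of relying on a ticket obtained via kinit,
        // which stops working once the ticket/token lifetime is exceeded.
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "ndc_common@EXAMPLE.COM",                    // placeholder principal
                "/etc/security/keytabs/ndc_common.keytab");  // placeholder keytab path

        // Call this periodically from the driver (e.g. on a timer); it re-logs in
        // from the keytab when the TGT is close to expiry and is a no-op otherwise.
        ugi.checkTGTAndReloginFromKeytab();
    }
}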

Related

Spark doesn't acknowledge Kerberos authentication and the application fails when a delegation token is issued

I'm using spark to read data files from hdfs.
When I do a spark action, a spark exception is raised:
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
In the logs before the exception is thrown I can see:
WARN [hadoop.security.UserGroupInformation] PriviledgedActionException as:principal (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
[21/02/22 17:27:17.439] WARN [hadoop.ipc.Client] Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
[21/02/22 17:27:17.440] WARN [hadoop.security.UserGroupInformation] PriviledgedActionException as:principal (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
This is really weird, because I set up the spark-submit config for Kerberos using the --keytab and --principal options:
spark.master yarn-cluster
spark.app.name app
spark.submit.deployMode cluster
spark.yarn.principal pincipal/principal.com
spark.yarn.keytab all.keytab
spark.driver.memory 4G
spark.executor.memory 8G
spark.executor.instances 4
spark.executor.cores 8
spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url jdbc:phoenix:m1,m2,m3,m4:2181:/hbase
spark.driver.extraJavaOptions -XX:MaxPermSize=1024M -Dlog4j.configuration=log4j.xml
spark.executor.extraJavaOptions -Dlog4j.configuration=log4j.xml
I don't understand why the delegation token couldn't be issued, since Kerberos auth is set up.
I also don't understand why it displays those warnings as if the authentication mode of my Spark job were SIMPLE. Is Spark ignoring my config?
I have 2 environments, one on which the application works properly, but I don't know which config I should look at.
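Worth checking in a case like this: (auth:SIMPLE) usually means the Hadoop client never picked up hadoop.security.authentication=kerberos, typically because core-site.xml/hdfs-site.xml are not on the classpath of the failing environment; the StandbyException likewise hints that the client is addressing a concrete NameNode host rather than the HA nameservice from hdfs-site.xml. A small diagnostic sketch (nothing job-specific assumed) that prints what the client actually resolves:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class AuthCheck {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml/hdfs-site.xml from the classpath; if they are
        // missing, authentication silently falls back to "simple".
        Configuration conf = new Configuration();
        System.out.println("hadoop.security.authentication = "
                + conf.get("hadoop.security.authentication"));

        // Make UserGroupInformation use the same configuration, then inspect it.
        UserGroupInformation.setConfiguration(conf);
        System.out.println("security enabled: " + UserGroupInformation.isSecurityEnabled());

        UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
        System.out.println("current user: " + ugi.getUserName()
                + ", auth method: " + ugi.getAuthenticationMethod()
                + ", has kerberos credentials: " + ugi.hasKerberosCredentials());
    }
}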

Client cannot authenticate via: [TOKEN, KERBEROS]

From my Spark application I am trying to distcp from HDFS to S3. My app does some processing on data and writes it to HDFS, and I am trying to push that data to S3 via distcp. I am facing the error below. Any pointers will be helpful.
org.apache.hadoop.security.UserGroupInformation doAs -
PriviledgedActionException as: (auth:SIMPLE) cause:org.apache.hadoop.security.
Failed on local exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException:
Client cannot authenticate via: [TOKEN, KERBEROS];
I was already doing kinit. Wrapping the call in ugi.doAs with a new privileged action fixed this issue.
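A minimal sketch of that fix, assuming keytab-based login (the principal, keytab path, and HDFS URI are placeholders): authenticate explicitly and run the filesystem work inside ugi.doAs so the Kerberos credentials are attached to the RPC calls.

import java.net.URI;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class DoAsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Authenticate explicitly from a keytab (placeholder principal/path).
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "app_user@EXAMPLE.COM", "/etc/security/keytabs/app_user.keytab");

        // Run the HDFS work as the authenticated user so the RPC calls carry
        // the Kerberos/TOKEN credentials instead of falling back to SIMPLE.
        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            System.out.println("exists: " + fs.exists(new Path("/user/app_user/output")));
            // The DistCp invocation or S3 upload would be run here as well.
            return null;
        });
    }
}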

Elasticsearch spark connection for structured-streaming

I'm trying to make a connection to Elasticsearch from my Spark program.
My Elasticsearch host uses HTTPS, and I found no connection property for that.
We are using the Spark Structured Streaming Java API, and the connection details are as follows:
SparkSession spark = SparkSession.builder()
        .config(ConfigurationOptions.ES_NET_HTTP_AUTH_USER, "username")
        .config(ConfigurationOptions.ES_NET_HTTP_AUTH_PASS, "password")
        .config(ConfigurationOptions.ES_NODES, "my_host_url")
        .config(ConfigurationOptions.ES_PORT, "9200")
        .config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_LOCATION, "C:\\certs\\elastic\\truststore.jks")
        .config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_PASS, "my_password")
        .config(ConfigurationOptions.ES_NET_SSL_KEYSTORE_TYPE, "jks")
        .master("local[2]")
        .appName("spark_elastic")
        .getOrCreate();
spark.conf().set("spark.sql.shuffle.partitions", 2);
spark.conf().set("spark.default.parallelism", 2);
And I'm getting the following error
19/07/01 12:26:00 INFO HttpMethodDirector: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server 10.xx.xxx.xxx failed to respond
19/07/01 12:26:00 INFO HttpMethodDirector: Retrying request
19/07/01 12:26:00 ERROR NetworkClient: Node [10.xx.xxx.xxx:9200] failed (The server 10.xx.xxx.xxx failed to respond); no other nodes left - aborting...
19/07/01 12:26:00 ERROR StpMain: Error
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:344)
Probably it's because it tries to initiate the connection over HTTP, but in my case I need an HTTPS connection, and I'm not sure how to configure that.
The error happened because Spark was not able to locate the truststore file. It seems we need to add a "file:\\" prefix for the path to be accepted.
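Putting both observations together, a hedged sketch of the SSL-related settings: es.net.ssl enabled via ES_NET_USE_SSL from the same ConfigurationOptions class, and the truststore passed as a file: URI. Host, credentials, and path below are placeholders.

import org.apache.spark.sql.SparkSession;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;

public class EsHttpsExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .config(ConfigurationOptions.ES_NODES, "my_host_url")
                .config(ConfigurationOptions.ES_PORT, "9200")
                // es.net.ssl: switch the REST client to HTTPS
                .config(ConfigurationOptions.ES_NET_USE_SSL, "true")
                // file: URI so the truststore can be located on the local filesystem
                .config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_LOCATION,
                        "file:///C:/certs/elastic/truststore.jks")
                .config(ConfigurationOptions.ES_NET_SSL_TRUST_STORE_PASS, "my_password")
                .config(ConfigurationOptions.ES_NET_HTTP_AUTH_USER, "username")
                .config(ConfigurationOptions.ES_NET_HTTP_AUTH_PASS, "password")
                .master("local[2]")
                .appName("spark_elastic")
                .getOrCreate();

        spark.stop();
    }
}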

Error opening block StreamChunkId : BlockNotFoundException

I am getting some transient exceptions using Spark Streaming with Amazon Kinesis and storage level "MEMORY_AND_DISK_2". We are using Spark 2.2.0 with emr-5.9.0.
19/05/22 01:56:16 ERROR TransportRequestHandler: Error opening block StreamChunkId{streamId=438690479801, chunkIndex=0} for request from /10.1.100.56:38074
org.apache.spark.storage.BlockNotFoundException: Block broadcast_13287_piece0 not found
I have checked that there are no lost nodes in the EMR cluster, and the HDFS utilization percentage is 35%.
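For context, this is roughly how that storage level is passed to the receiver with the KinesisUtils API in Spark 2.2; the application, stream, endpoint, and region names are placeholders rather than values from the failing job.

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kinesis.KinesisUtils;

public class KinesisStorageLevelExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kinesis-example");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(10000));

        // MEMORY_AND_DISK_2 keeps received blocks in memory, spills to disk,
        // and replicates each block to a second executor.
        JavaReceiverInputDStream<byte[]> stream = KinesisUtils.createStream(
                jssc,
                "my-kinesis-app",                          // placeholder application name
                "my-stream",                               // placeholder stream name
                "https://kinesis.us-east-1.amazonaws.com", // placeholder endpoint
                "us-east-1",                               // placeholder region
                InitialPositionInStream.LATEST,
                new Duration(10000),                       // checkpoint interval
                StorageLevel.MEMORY_AND_DISK_2());

        stream.print();
        jssc.start();
        jssc.awaitTermination();
    }
}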

We are running a map reduce/spark job to bulk load HBase data in one of the environments

We are running a map reduce/spark job to bulk load HBase data in one of the environments.
While running it, the connection to the HBase ZooKeeper cannot be initialized, throwing the following error.
16/05/10 06:36:10 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=c321shu.int.westgroup.com:2181,c149jub.int.westgroup.com:2181,c167rvm.int.westgroup.com:2181 sessionTimeout=90000 watcher=hconnection-0x74b47a30, quorum=c321shu.int.westgroup.com:2181,c149jub.int.westgroup.com:2181,c167rvm.int.westgroup.com:2181, baseZNode=/hbase
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Opening socket connection to server c321shu.int.westgroup.com/10.204.152.28:2181. Will not attempt to authenticate using SASL (unknown error)
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.204.24.16:35740, server: c321shu.int.westgroup.com/10.204.152.28:2181
16/05/10 06:36:10 INFO zookeeper.ClientCnxn: Session establishment complete on server c321shu.int.westgroup.com/10.204.152.28:2181, sessionid = 0x5534bebb441bd3f, negotiated timeout = 60000
16/05/10 06:36:11 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table ecpdevv1patents:NormNovusDemo
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Tue May 10 06:36:11 CDT 2016, org.apache.hadoop.hbase.client.RpcRetryingCaller#3927df20, java.io.IOException: Call to c873gpv.int.westgroup.com/10.204.67.9:60020 failed on local exception: java.io.EOFException
We have executed the same job in Titan DEV too but are facing the same problem. Please let us know if anyone has faced this problem before.
Details are:
• Earlier the job was failing to connect to localhost/127.0.0.1:2181. Hence only the property hbase.zookeeper.quorum has been set in the map reduce code to c149jub.int.westgroup.com,c321shu.int.westgroup.com,c167rvm.int.westgroup.com, which we got from hbase-site.xml (see the sketch after this list).
• We are using jars of CDH version 5.3.3.
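For illustration, setting the quorum programmatically as described in the first bullet might look like the sketch below. It uses an HBase 1.x-style client API for brevity; the CDH 5.3.3 jars mentioned above ship an older 0.98-era client whose connection call differs, so treat this as a sketch rather than a drop-in snippet.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseQuorumExample {
    public static void main(String[] args) throws Exception {
        // Start from hbase-site.xml if it is on the classpath, then override
        // the quorum explicitly, as was done in the job described above.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum",
                "c149jub.int.westgroup.com,c321shu.int.westgroup.com,c167rvm.int.westgroup.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        // HBase 1.x-style connection; 0.98-era clients use
        // HConnectionManager.createConnection(conf) instead.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("ecpdevv1patents:NormNovusDemo"))) {
            System.out.println("Connected, table: " + table.getName());
        }
    }
}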
