spark submit with phoenix - apache-spark

I'm Trying to connect to phoenix using spark submit on secured cluster I have this Exception
ConnectionQueryServicesImpl: Trying to connect to a secure cluster
with keytab:/hbase-secure 17/06/29 11:54:44 ERROR Executor: Exception
in task 0.0 in stage 0.0 (TID 0) java.sql.SQLException: ERROR 103
(08004): Unable to establish connection.
On my program I want to save to phoenix a dataframe (the program part that generate the exception) I already compiled and saved to phoenix from my intelliji but it's not working from spark submit even with master local
this is the part of code that generate the exception
df.write
.format("org.apache.phoenix.spark")
.mode("overwrite")
.option("table", output)
.option("zkUrl", "sv005689.info.ratp:2181")
.save()
this my spark submit command
spark-submit --class TEST.PHOENIX --master local --executor-memory 10G --num-executors 50 --conf spark.ui.port=14040 spark-assembly-0.0.2.jar
this is my exception
17/06/29 11:54:43 INFO ConnectionQueryServicesImpl: Trying to connect to a secure cluster with keytab:/hbase-secure
17/06/29 11:54:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:422)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:392)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:211)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2269)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2248)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2248)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:233)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:135)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:82)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:70)
at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getUpsertColumnMetadataList(PhoenixConfigurationUtil.java:230)
at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:45)
at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:41)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Login failure for 2181 from keytab /hbase-secure: javax.security.auth.login.LoginException: Unable to obtain password from user
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:987)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:280)
at org.apache.hadoop.hbase.security.User$SecureHadoopUser.login(User.java:386)
at org.apache.hadoop.hbase.security.User.login(User.java:253)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:380)
... 27 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
Can someone help me to resolve this issue
thank you for your help

Related

Spark submit in cluster mode, exception in thread "main"

My spark submit syntax is:
spark-submit --queue regular --deploy-mode cluster --conf spark.locality.wait=5000000ms --num-executors 100 --executor-memory 40G job.py
And the exception happened after job run successfully for a while, which is:
Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_1635856758535_5228470 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1150)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1530)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
More importantly, in the job.py script, there's no local file read or write. Only parquet read or write.
And I cannot even track the application UI as the application was closed because of the exception. If anyone else encountered similar issue and has any ideas, please advice any solutions. Thanks!

NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V

I am using <spark.version>3.0.2</spark.version> with detla version 0.8.0 in my project.
and running with
export SPARK_HOME=/pkg/spark-3.0.2-bin-hadoop2.7-hive1.2
$SPARK_HOME/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--packages io.delta:delta-core_2.12:0.8.0,org.apache.hadoop:hadoop-common:2.9.2,org.apache.hadoop:hadoop-aws:2.9.2,org.apache.hudi:hudi-spark-bundle_2.12:0.6.0
I am getting below error
User class threw exception: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V
at org.apache.hadoop.fs.s3a.S3AFileSystem.addDeprecatedKeys(S3AFileSystem.java:183)
at org.apache.hadoop.fs.s3a.S3AFileSystem.<clinit>(S3AFileSystem.java:187)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:171)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:232)
What is wrong here ?? any clue how to fix it ?
Tried with
export SPARK_HOME=/pkg/spark-3.0.2-bin-hadoop2.9.1-custom
Getting error
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 4 ($anonfun$call$1 at DatabricksLogging.scala:77) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: java.lang.IllegalArgumentException: Unknown message type: 9 at org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Decoder.fromByteBuffer(BlockTransferMessage.java:71) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ... 1 more
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1602)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2236)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at org.apache.spark.sql.delta.DeltaLog$.recordDeltaOperation(DeltaLog.scala:368)
at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:470)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
Your Spark is compiled with Hadoop 2.7, but you're trying to execute your code with Hadoop 2.9. Remove Hadoop 2.9 coordinates from --packages command

Spark/Yarn - Connection error RetryingBlockFetcher trying to fetch blocks from random port

I am trying to set up spark on yarn on AWS machines. My spark.driver.port is 32975. I see the below error in the yarn container logs . It is trying to connect to master resource manager port 35653. I am not sure what block is it trying to fetch from port 35653. Could someone help
Spark command
spark-submit --deploy-mode client --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.4.jar 10
Hadoop version: 3.x
spark version: 2.4.4
2019-12-01 19:09:54,590 ERROR shuffle.RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Connecting to xyz.com/xx.xx.xx.xx:35653 timed out (120000 ms)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:114)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:124)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:98)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:757)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:162)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:151)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:231)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Please check if hadoop/yarn in working.
You should start hadoop first and then check if hadoop is running or not just by doing jps in the terminal.
hadoop start-all.sh
jps

spark application completes with SUCCESS status when an exception is thrown

I am running a spark application on yarn, which my goal is do some ETL from jdbc to elasticsearch.
However, when I check the log ,there is some errors like,this error is due to network problem :
17/12/01 00:35:19 WARN scheduler.TaskSetManager: Lost task 1317.0 in stage 0.0 (TID 1381, worker50.hadoop, executor 1): org.apache.spark.util.TaskCompletionListenerException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[192.168.200.154:8201, 192.168.200.156:9200, 192.168.200.155:8201]]
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:124)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This means that connection failed and lost some data in this process.The job finalStatus should be failed, but spark returned me with {"state":"FINISHED","finalStatus":"SUCCEEDED"}
WHY? My spark version is 2.2.0

Executor shows up on the spark UI even on killing the worker and stages keep on failing with java.io.IOException

I am running a spark streaming application with spark version 1.4.0
If I kill the worker (using kill -9) when my job is running, then the worker and executor both on that node dies,but it still shows up in the executors tab of the UI. The number of active tasks sometimes shows as negative on those executors.
Because of this the jobs keep on failing with the following exception
16/04/01 23:54:20 WARN TaskSetManager: Lost task 141.0 in stage 19859.0 (TID 190333, 192.168.33.96): java.io.IOException: Failed to connect to /192.168.33.97:63276
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.33.97:63276
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
On relaunching the worker a new executor is allocated but the old (dead) executor's entry is still there and the stages fail with "java.io.IOException: Failed to connect to " error.

Resources