Unable to kill spark application through spark-submit on spark standalone cluster with spark authentication and encryption enabled - apache-spark

I am not able to kill a Spark application through the spark-submit command on a Spark standalone cluster with Spark authentication and encryption enabled. The command:
bin/spark-class org.apache.spark.deploy.Client kill spark://host:7077 driver-20200728102235-0005
The error I get:
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.ClientApp$$anonfun$7.apply(Client.scala:243)
at org.apache.spark.deploy.ClientApp$$anonfun$7.apply(Client.scala:243)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.deploy.ClientApp.start(Client.scala:243)
at org.apache.spark.deploy.Client$.main(Client.scala:225)
at org.apache.spark.deploy.Client.main(Client.scala)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown challenge message.
at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:109)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:180)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)

On your Spark client, try setting these values: -Dspark.authenticate=true -Dspark.network.crypto.enabled=true inside settings such as spark.executor.extraJavaOptions, spark.executorEnv.JAVA_TOOL_OPTIONS, and so on.
Also check that your secret is stored in the spark.executorEnv._SPARK_AUTH_SECRET environment variable.
If that doesn't work, I suggest adding your spark-submit configuration to this question.
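A minimal sketch of that suggestion, run on the client that issues the kill (this is an assumption about your setup; the shared-secret value is a placeholder and must match the secret the cluster was started with):
# pass the auth settings to the JVM that spark-class launches, and expose the shared secret
export JAVA_TOOL_OPTIONS="-Dspark.authenticate=true -Dspark.network.crypto.enabled=true"
export _SPARK_AUTH_SECRET="<shared-secret>"
bin/spark-class org.apache.spark.deploy.Client kill spark://host:7077 driver-20200728102235-0005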

Related

Apache Spark Cluster\Windows: Getting Connection refused: no further information from worker node

I need help getting Spark working on a Windows cluster of 3 nodes. I was able to download, configure, and run the master node and the worker nodes; the worker nodes register successfully with the master, and I can see both worker nodes in the Master UI. When I try to submit a job using:
spark-submit --master spark://IP:7077 hello_world.py
Spark continuously tries to start executors, but they all fail with exit code 1, and it does not stop until I kill it. When I check the log in the UI for each worker, I see the following error:
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:424)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:413)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:444)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
at scala.collection.immutable.Range.foreach(Range.scala:158)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:442)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
... 4 more
Caused by: java.io.IOException: Failed to connect to <Master DNS>/<Master IP>:56785
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: <Master DNS>/<Master IP>:56785
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
I'm using Spark spark-3.3.1-bin-hadoop3, and the application to run is hello_world.py.
Please help.
Thanks

How to get application status in spark standalone cluster?

As per the official Spark documentation, we can check the status using:
spark-submit --master spark://IP-ADDRESS:PORT --status SUBMISSION_ID
However, when I tried this, it was not able to establish a connection. The master URL and the port are valid, and my Spark is running as a standalone cluster with the default config.
spark-submit --master spark://10.xx.xx8.xx5:7077 --status driver-20191119160551-0003
19/11/19 17:45:10 WARN RestSubmissionClient: Unable to connect to server spark://10.xx.xx8.xx5:7077.
Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:165)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:148)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.deploy.rest.RestSubmissionClient.requestSubmissionStatus(RestSubmissionClient.scala:148)
at org.apache.spark.deploy.SparkSubmit.requestStatus(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:88)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.deploy.rest.SubmitRestConnectionException: No response from server
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:285)
at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$get(RestSubmissionClient.scala:195)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:152)
... 11 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:278)
... 13 more

spark submit failed when reading from Avro files

When trying to run my spark jar using spark-submit, I got an error.
The line that causes it:
spark.read.format("com.databricks.spark.avro").load("/home/user/samples/events")
The stack trace is:
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.execution.datasources.orc.OrcFileFormat not found
at java.util.ServiceLoader.fail(ServiceLoader.java:239)
at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:529)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:301)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
I can't understand what is missing and why it refers to the ORC format. I made sure that the class exists in the spark/jars folder:
"org/apache/spark/sql/execution/datasources/orc/OrcFileFormat"
Spark version: 2.3.2
Thanks :-)

Failed to submit the Spark Application to Spark REST URL

When submitting the Spark application to the Spark REST URL, I always get an exception like the following:
18/04/13 11:54:29 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /10.11.9.2:6066 is closed
18/04/13 11:54:29 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 10.11.9.2:6066
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/04/13 11:55:09 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
at io.kf.etl.context.Context$$anonfun$getSparkSession$2.apply(Context.scala:76)
at io.kf.etl.context.Context$$anonfun$getSparkSession$2.apply(Context.scala:59)
at scala.Option.map(Option.scala:146)
at io.kf.etl.context.Context$.getSparkSession(Context.scala:59)
at io.kf.etl.context.Context$.sparkSession$lzycompute(Context.scala:20)
at io.kf.etl.context.Context$.sparkSession(Context.scala:20)
at io.kf.etl.processors.common.inject.ProcessorInjectModule.sparkSession$lzycompute(ProcessorInjectModule.scala:8)
at io.kf.etl.processors.common.inject.ProcessorInjectModule.sparkSession(ProcessorInjectModule.scala:8)
at io.kf.etl.processors.download.inject.DownloadInjectModule.getContext(DownloadInjectModule.scala:40)
at io.kf.etl.processors.download.inject.DownloadInjectModule.getProcessor(DownloadInjectModule.scala:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:104)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
at io.kf.etl.ETLMain$.delayedEndpoint$io$kf$etl$ETLMain$1(ETLMain.scala:42)
at io.kf.etl.ETLMain$delayedInit$body.apply(ETLMain.scala:17)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at io.kf.etl.ETLMain$.main(ETLMain.scala:17)
at io.kf.etl.ETLMain.main(ETLMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
18/04/13 11:55:09 INFO SparkContext: SparkContext already stopped.
18/04/13 11:55:09 INFO SparkContext: Successfully stopped SparkContext
I am running Spark 2.2.1 on macOS.
The configurations look like this:
SPARK_LOCAL_IP=10.11.9.2
SPARK_MASTER_HOST=10.11.9.2
The submission command line is:
${SPARK_HOME}/bin/spark-submit --master spark://10.11.9.2:6066 --deploy-mode cluster --class ....
If I submit the application to port 7077, everything is fine.
The hidden REST API is not supposed to be used with spark-submit. Instead, all arguments and the job definition should be submitted as an HTTP request to http://rest-ip:6066/v1/submissions/create (see the sketch below).
Apache spark rest API
Triggering spark jobs with REST
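For reference, a rough sketch of such a request (the jar path, main class, app name, and Spark version below are placeholders, not values from this question):
curl -X POST http://10.11.9.2:6066/v1/submissions/create \
  --header "Content-Type: application/json" \
  --data '{
    "action": "CreateSubmissionRequest",
    "appResource": "file:/path/to/your-app.jar",
    "mainClass": "com.example.YourMain",
    "appArgs": [],
    "clientSparkVersion": "2.2.1",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
      "spark.master": "spark://10.11.9.2:6066",
      "spark.app.name": "your-app",
      "spark.submit.deployMode": "cluster",
      "spark.jars": "file:/path/to/your-app.jar"
    }
  }'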
I figured it out myself.
The submission command line is fine, but when I initialized the SparkSession, I also passed spark://10.11.9.2:6066 as the master string.
If I pass spark://10.11.9.2:7077 instead, everything works well.
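A minimal sketch of the working setup described above (the app name is a placeholder):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("spark://10.11.9.2:7077") // the RPC port, not the 6066 REST port used by spark-submit
  .appName("my-app")
  .getOrCreate()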

Warning when running spark example on mesos: Could not find CoarseGrainedScheduler

I am new to Spark and recently deployed my first Spark cluster on Mesos.
As I am using Python to develop applications, I tried to run the pi example on my cluster. The result is shown successfully, but I got the following warnings.
16/10/18 17:28:54 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)] in 1 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
... 4 more
Also, the state of one of the workers is KILLED.
This is how I submit the application:
$SPARK_HOME/bin/spark-submit --master mesos://<MESOS_HOST>:<MESOS_PORT> $SPARK_HOME/examples/src/main/python/pi.py 1000
Could anyone give me some advice? Thanks in advance!
