Warning when running Spark example on Mesos: Could not find CoarseGrainedScheduler

I am new to Spark and recently deployed my first Spark cluster on Mesos.
Since I develop my applications in Python, I tried to run the pi example on the cluster. The result is printed successfully, but I get the following warning:
16/10/18 17:28:54 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)] in 1 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
... 4 more
In addition, the state of one of the workers is shown as killed.
This is how I submit the application:
$SPARK_HOME/bin/spark-submit --master mesos://<MESOS_HOST>:<MESOS_PORT> $SPARK_HOME/examples/src/main/python/pi.py 1000
Could anyone give me some advice? Thanks in advance!

Related

Spark submit in cluster mode, exception in thread "main"

My spark-submit command is:
spark-submit --queue regular --deploy-mode cluster --conf spark.locality.wait=5000000ms --num-executors 100 --executor-memory 40G job.py
The exception occurred after the job had been running successfully for a while:
Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_1635856758535_5228470 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1150)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1530)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
More importantly, the job.py script does not read or write any local files; it only reads and writes Parquet.
I cannot even check the application UI, because the application was closed by the exception. If anyone has encountered a similar issue and has any ideas, please advise. Thanks!

Unable to kill spark application through spark-submit on spark standalone cluster with spark authentication and encryption enabled

I am not able to kill a Spark application through the spark-submit command on a Spark standalone cluster with Spark authentication and encryption enabled. The command is:
bin/spark-class org.apache.spark.deploy.Client kill spark://host:7077 driver-20200728102235-0005
It fails with the following error:
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.ClientApp$$anonfun$7.apply(Client.scala:243)
at org.apache.spark.deploy.ClientApp$$anonfun$7.apply(Client.scala:243)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.deploy.ClientApp.start(Client.scala:243)
at org.apache.spark.deploy.Client$.main(Client.scala:225)
at org.apache.spark.deploy.Client.main(Client.scala)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown challenge message.
at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:109)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:180)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
On your Spark client, try setting -Dspark.authenticate=true -Dspark.network.crypto.enabled=true in properties such as spark.executor.extraJavaOptions, spark.executorEnv.JAVA_TOOL_OPTIONS and so on.
Also check that your secret is stored in the spark.executorEnv._SPARK_AUTH_SECRET environment variable.
If that doesn't work, I suggest adding your spark-submit configuration to this question.
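As a minimal sketch, the two flags mentioned in this answer can be rendered into one JVM options string. The helper name `build_java_opts` is hypothetical; only the two property names come from the answer, and where the resulting string has to be set depends on your deployment.

```python
def build_java_opts(props):
    """Render a dict of Spark properties as space-separated -Dkey=value flags."""
    return " ".join(f"-D{key}={value}" for key, value in props.items())


# Property names taken from the answer; the values enable auth and encryption.
auth_props = {
    "spark.authenticate": "true",
    "spark.network.crypto.enabled": "true",
}

java_opts = build_java_opts(auth_props)
print(java_opts)
```

Keeping the properties in one dict makes it easier to pass exactly the same settings to the client and to the cluster, which is what an "Unknown challenge message" error suggests may be mismatched.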

Spark Application Level logs in EMR step

I'm running a Spark application as an EMR step, but the job failed with an error that I want to inspect. I checked stderr, but it gives no detailed information about the error. It only says:
Exception in thread "main" org.apache.spark.SparkException: Application application_1593934145491_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/07/05 07:50:37 INFO ShutdownHookManager: Shutdown hook called
Can anyone help me with this? I want to see the application-level logs.
After enabling debugging mode and running the script on the client, I was able to see the Spark application-level logs in Steps/Step_ID/stdout.gz.
They should always be under /container, but if you cannot find them, try to SSH into the master node and run spark-submit there.
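Another place to look, assuming YARN log aggregation is enabled on the cluster, is the aggregated container log for the application ID shown in the stack trace. The sketch below only builds the `yarn logs -applicationId` command; the helper name `yarn_logs_command` is hypothetical, and running the command on the master node is an assumption.

```python
def yarn_logs_command(application_id):
    """Build the argument list for fetching aggregated YARN logs."""
    return ["yarn", "logs", "-applicationId", application_id]


# Application ID copied from the stack trace in the question.
cmd = yarn_logs_command("application_1593934145491_0002")
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd)` on a node where the `yarn` CLI is available.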

Exception thrown in awaitResult when an app is submitted to Standalone in cluster mode with port 6066

I am working with Spark 2.3.1. A Spark app is submitted to a standalone cluster at spark://10.101.3.128:6066 in cluster mode. The app does not work. There is an ERROR in the driver's log file stdout under the work directory:
2020-04-15 15:30:34 ERROR TransportResponseHandler:144 - Still have 1 requests outstanding when connection from /10.101.3.128:6066 is closed
2020-04-15 15:30:34 WARN StandaloneAppClient$ClientEndpoint:87 - Failed to connect to master 10.101.3.128:6066
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
....
There is no more information. The master does listen on port 6066, since the app was submitted through it. Why can it not be connected to here?
I found a workaround: I use spark://10.101.3.128:6066 in the submit script, but spark://10.101.3.128:7077 in the Spark config property file. With that, there is no problem.
Is it a bug? I want to know whether this issue has been fixed in any Spark release or will be fixed in the future.
Thanks.
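For context on the workaround: in a standalone cluster, 6066 is the default REST submission port and 7077 is the default master RPC port, so the two URLs address different services on the same host. A minimal sketch of deriving the RPC URL from the submission URL follows; the helper name `rpc_master_url` is hypothetical, and it assumes the cluster uses the default ports.

```python
from urllib.parse import urlsplit

REST_PORT = 6066  # default standalone REST submission port
RPC_PORT = 7077   # default standalone master RPC port


def rpc_master_url(submit_url, rpc_port=RPC_PORT):
    """Return the same master host with the RPC port instead of the submit port."""
    parts = urlsplit(submit_url)
    return f"{parts.scheme}://{parts.hostname}:{rpc_port}"


config_master = rpc_master_url("spark://10.101.3.128:6066")
print(config_master)
```

The submit script would keep the 6066 URL, while the value computed here is what goes into the config property file, as in the workaround.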

spark application completes with SUCCESS status when an exception is thrown

I am running a Spark application on YARN whose goal is to do some ETL from JDBC to Elasticsearch.
However, when I check the log, there are some errors like the following, which are caused by a network problem:
17/12/01 00:35:19 WARN scheduler.TaskSetManager: Lost task 1317.0 in stage 0.0 (TID 1381, worker50.hadoop, executor 1): org.apache.spark.util.TaskCompletionListenerException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[192.168.200.154:8201, 192.168.200.156:9200, 192.168.200.155:8201]]
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:124)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This means that the connection failed and some data was lost in the process. The job's finalStatus should be FAILED, but Spark returned {"state":"FINISHED","finalStatus":"SUCCEEDED"}.
Why? My Spark version is 2.2.0.
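One possible explanation, offered as an assumption rather than a diagnosis of this job: a "Lost task" warning reports a single failed task attempt, and Spark retries a failed task up to `spark.task.maxFailures` times (default 4) before failing the job. The pure-Python sketch below only models that rule; `job_succeeds` and its input format are hypothetical and are not part of Spark.

```python
def job_succeeds(attempt_outcomes, max_failures=4):
    """Model the retry rule: a job fails once any task fails max_failures times.

    attempt_outcomes holds, per task, a list of booleans for successive attempts.
    """
    for outcomes in attempt_outcomes:
        failures = 0
        succeeded = False
        for ok in outcomes:
            if ok:
                succeeded = True
                break
            failures += 1
            if failures >= max_failures:
                return False  # task exhausted its attempts, job fails
        if not succeeded:
            return False  # task never finished successfully
    return True


# One task loses its first attempt and succeeds on retry; the other succeeds at once.
print(job_succeeds([[False, True], [True]]))
# One task fails four times in a row.
print(job_succeeds([[False, False, False, False]]))
```

Under this model, a log full of lost-task warnings is compatible with a SUCCEEDED final status as long as every task eventually completes on a retry.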
