All executors finish with state KILLED and exitStatus 1 - apache-spark

I am trying to set up a local Spark cluster. I am using Spark 2.4.4 on a Windows 10 machine.
To start the master and one worker, I run:
spark-class org.apache.spark.deploy.master.Master
spark-class org.apache.spark.deploy.worker.Worker 172.17.1.230:7077
After submitting an application to the cluster, it finishes successfully, but the Spark web UI reports that the application was KILLED. That is also what the worker logs show. I have tried running my own examples as well as the examples included in the Spark installation; they all get killed with exitStatus 1.
To start the JavaSparkPi example from the Spark installation folder, I run:
Spark> spark-submit --master spark://172.17.1.230:7077 --class org.apache.spark.examples.JavaSparkPi .\examples\jars\spark-examples_2.11-2.4.4.jar
Part of the log output after the calculation finishes:
20/01/19 18:55:11 INFO DAGScheduler: Job 0 finished: reduce at JavaSparkPi.java:54, took 4.183853 s
Pi is roughly 3.13814
20/01/19 18:55:11 INFO SparkUI: Stopped Spark web UI at http://Nikola-PC:4040
20/01/19 18:55:11 INFO StandaloneSchedulerBackend: Shutting down all executors
20/01/19 18:55:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
20/01/19 18:55:11 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/01/19 18:55:11 WARN TransportChannelHandler: Exception in connection from /172.17.1.230:58560
java.io.IOException: An existing connection was forcibly closed by the remote host
The stderr log of the completed application ends with:
20/01/19 18:55:11 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 910 bytes result sent to driver
20/01/19 18:55:11 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 910 bytes result sent to driver
20/01/19 18:55:11 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
The worker log shows:
20/01/19 18:55:06 INFO ExecutorRunner: Launch command: "C:\Program Files\Java\jdk1.8.0_231\bin\java" "-cp" "C:\Users\nikol\Spark\bin\..\conf\;C:\Users\nikol\Spark\jars\*" "-Xmx1024M" "-Dspark.driver.port=58484" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler#Nikola-PC:58484" "--executor-id" "0" "--hostname" "172.17.1.230" "--cores" "12" "--app-id" "app-20200119185506-0001" "--worker-url" "spark://Worker#172.17.1.230:58069"
20/01/19 18:55:11 INFO Worker: Asked to kill executor app-20200119185506-0001/0
20/01/19 18:55:11 INFO ExecutorRunner: Runner thread for executor app-20200119185506-0001/0 interrupted
20/01/19 18:55:11 INFO ExecutorRunner: Killing process!
20/01/19 18:55:11 INFO Worker: Executor app-20200119185506-0001/0 finished with state KILLED exitStatus 1
I have tried Spark 2.4.4 builds for both Hadoop 2.6 and Hadoop 2.7. The problem remains in both cases.
This problem is the same as the one in the related question below.

Related

Executor finished with state KILLED exitStatus 1

After starting the master and a worker on a single machine...
spark-class org.apache.spark.deploy.master.Master -i 127.0.0.1 -p 7070
spark-class org.apache.spark.deploy.worker.Worker 127.0.0.1:7070
and submitting the following Spark job...
spark-submit --class Main --master spark://127.0.0.1:7070 --deploy-mode client /path/to/app.jar
the application executes successfully, but the executor is for some reason forcibly killed:
19/05/10 09:28:31 INFO Worker: Asked to kill executor app-20190510092810-0000/0
19/05/10 09:28:31 INFO ExecutorRunner: Runner thread for executor app-20190510092810-0000/0 interrupted
19/05/10 09:28:31 INFO ExecutorRunner: Killing process!
19/05/10 09:28:31 INFO Worker: Executor app-20190510092810-0000/0 finished with state KILLED exitStatus 1
Is this normal behavior? If not, how can I prevent this from happening?
I am using Spark 2.4.0.

org.apache.spark.SparkException: Job aborted due to stage failure: Task in stage failed, Lost task in stage: ExecutorLostFailure (executor 4 lost)

I built MonoSpark (based on Spark 1.3.1) with JDK 1.7 and Hadoop 2.6.2 using this command (I edited my pom.xml so that the command would work):
./make-distribution.sh --tgz -Phadoop-2.6 -Dhadoop.version=2.6.2
This produces a tgz file named 'spark-1.3.1-SNAPSHOT-bin-2.6.2.tgz'.
I put the tgz file on my Hadoop cluster, which has one master and 4 slaves.
Then I start Spark with the command:
$SPARK_HOME/sbin/start-all.sh
Spark comes up fine, with 4 workers and 1 master. However, when I use spark-submit to run an example:
./bin/spark-submit --class org.apache.spark.examples.JavaWordCount --master spark://master:7077 lib/spark-examples-1.3.1-*-hadoop2.6.2.jar input/README.md
I get this error on my driver:
......other useless logs.....
19/03/31 22:24:41 ERROR cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
19/03/31 22:24:46 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor#slave3:55311] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
19/03/31 22:24:50 ERROR scheduler.TaskSchedulerImpl: Lost executor 3 on slave1: remote Akka client disassociated
19/03/31 22:24:54 ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
.......other useless logs......
Exception in thread "main" 19/03/31 22:24:54 ERROR cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 4
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, slave4): ExecutorLostFailure (executor 4 lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1325)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1314)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1313)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1313)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:714)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:714)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1526)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1487)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
The worker node error log is below:
19/03/31 22:25:11 INFO worker.Worker: Asked to launch executor app-20190331222434-0000/2 for JavaWordCount
19/03/31 22:25:19 INFO worker.Worker: Executor app-20190331222434-0000/2 finished with state EXITED message Command exited with code 50 exitStatus 50
19/03/31 22:25:19 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor#slave4:37919] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
19/03/31 22:25:19 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.0.2.27%3A35254-2#299045174] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
19/03/31 22:25:19 INFO worker.Worker: Asked to launch executor app-20190331222434-0000/4 for JavaWordCount
19/03/31 22:25:19 INFO worker.ExecutorRunner: Launch command: "/usr/local/java/jdk1.8.0_101/bin/java" "-cp" "/home/zxd/monotask_jdk1.7/spark-1.3.1-SNAPSHOT-bin-2.6.2/sbin/../conf:/home/zxd/monotask_jdk1.7/spark-1.3.1-SNAPSHOT-bin-2.6.2/lib/spark-assembly-1.3.1-SNAPSHOT-hadoop2.6.2.jar:/home/zxd/hadoop/hadoop-2.6.2/etc/hadoop:/home/zxd/monotask_jdk1.7/spark-1.3.1-SNAPSHOT-bin-2.6.2/sbin/../conf:/home/zxd/monotask_jdk1.7/spark-1.3.1-SNAPSHOT-bin-2.6.2/lib/spark-assembly-1.3.1-SNAPSHOT-hadoop2.6.2.jar:/home/zxd/hadoop/hadoop-2.6.2/etc/hadoop" "-Dspark.driver.port=42211" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver#master:42211/user/CoarseGrainedScheduler" "--executor-id" "4" "--hostname" "slave4" "--cores" "4" "--app-id" "app-20190331222434-0000" "--worker-url" "akka.tcp://sparkWorker#slave4:55970/user/Worker"
19/03/31 22:25:32 INFO worker.Worker: Executor app-20190331222434-0000/4 finished with state EXITED message Command exited with code 50 exitStatus 50
19/03/31 22:25:32 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor#slave4:60559] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
19/03/31 22:25:32 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.0.2.27%3A35260-3#479615849] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
19/03/31 22:25:32 INFO worker.Worker: Asked to launch executor app-20190331222434-0000/7 for JavaWordCount
19/03/31 22:25:32 INFO worker.ExecutorRunner: Launch command: "/usr/local/java/jdk1.8.0_101/bin/java" "-cp" "/home/zxd/monotask_jdk1.7/spark-1.3.1-SNAPSHOT-bin-2.6.2/sbin/../conf:/home/zxd/monotask_jdk1.7/spark-1.3.1-SNAPSHOT-bin-2.6.2/lib/spark-assembly-1.3.1-SNAPSHOT-hadoop2.6.2.jar:/home/zxd/hadoop/hadoop-2.6.2/etc/hadoop:/home/zxd/monotask_jdk1.7/spark-1.3.1-SNAPSHOT-bin-2.6.2/sbin/../conf:/home/zxd/monotask_jdk1.7/spark-1.3.1-SNAPSHOT-bin-2.6.2/lib/spark-assembly-1.3.1-SNAPSHOT-hadoop2.6.2.jar:/home/zxd/hadoop/hadoop-2.6.2/etc/hadoop" "-Dspark.driver.port=42211" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver#master:42211/user/CoarseGrainedScheduler" "--executor-id" "7" "--hostname" "slave4" "--cores" "4" "--app-id" "app-20190331222434-0000" "--worker-url" "akka.tcp://sparkWorker#slave4:55970/user/Worker"
19/03/31 22:25:32 INFO worker.Worker: Asked to kill executor app-20190331222434-0000/7
19/03/31 22:25:32 INFO worker.ExecutorRunner: Runner thread for executor app-20190331222434-0000/7 interrupted
19/03/31 22:25:32 INFO worker.ExecutorRunner: Killing process!
19/03/31 22:25:32 INFO worker.Worker: Executor app-20190331222434-0000/7 finished with state KILLED exitStatus 143
19/03/31 22:25:32 INFO worker.Worker: Cleaning up local directories for application app-20190331222434-0000
Could this be related to the Hadoop version? Maybe I used the wrong Hadoop or JDK version to build Spark.
I hope someone can give me some suggestions. Thanks.
I found this error in the executor log:
java.lang.UnsupportedOperationException: Datanode-side support for getVolumeBlockLocations() must also be enabled in the client configuration.
I set dfs.datanode.hdfs-blocks-metadata.enabled to true in hadoop-site.xml and restarted the Hadoop cluster. Finally, it works for me.
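For reference, the property entry looks roughly like this (a sketch; on recent Hadoop versions this setting would normally go in hdfs-site.xml):
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>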
The executor's error log is in the work directory:
cd $SPARK_HOME/work/appxxxx/xx    (xx is a number)
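For example, plugging in the app and executor IDs from the worker log above, the executor's stderr would typically be at something like:
cd $SPARK_HOME/work/app-20190331222434-0000/4
cat stderr    # the standalone worker writes each executor's stdout and stderr here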

Spark Jobserver fails just by receiving a job request

Jobserver 0.7.0 has 4 GB of RAM available and 10 GB for the context; the system has 3 more GB free. The context had been running for a while, and when it received a request it failed without any error. The request is the same as others that were processed while it was up; it is not a special one. The following is the Jobserver log; as you can see, the last successful job finished at 03:08:23,341, and when the next one was received the driver commanded a shutdown.
[2017-05-16 03:08:23,340] INFO output.FileOutputCommitter [] [] - Saved output of task 'attempt_201705160308_0321_m_000199_0' to file:/value_iq/spark-warehouse/spark_cube_users_v/tenant_id=7/_temporary/0/task_201705160308_0321_m_000199
[2017-05-16 03:08:23,340] INFO pred.SparkHadoopMapRedUtil [] [] - attempt_201705160308_0321_m_000199_0: Committed
[2017-05-16 03:08:23,341] INFO he.spark.executor.Executor [] [] - Finished task 199.0 in stage 321.0 (TID 49474). 2738 bytes result sent to driver
[2017-05-16 03:39:02,195] INFO arseGrainedExecutorBackend [] [] - Driver commanded a shutdown
[2017-05-16 03:39:02,239] INFO storage.memory.MemoryStore [] [] - MemoryStore cleared
[2017-05-16 03:39:02,254] INFO spark.storage.BlockManager [] [] - BlockManager stopped
[2017-05-16 03:39:02,363] ERROR arseGrainedExecutorBackend [] [] - RECEIVED SIGNAL TERM
[2017-05-16 03:39:02,404] INFO k.util.ShutdownHookManager [] [] - Shutdown hook called
[2017-05-16 03:39:02,412] INFO k.util.ShutdownHookManager [] [] - Deleting directory /tmp/spark-556033e2-c456-49d6-a43c-ef2cd3494b71/executor-b3ceaf84-e66a-45ed-acfe-1052ab1de2f8/spark-87671e4f-54da-47d7-a077-eb5f75d07e39
The Spark worker just logs the following:
17/05/15 19:25:54 INFO ExternalShuffleBlockResolver: Registered executor AppExecId{appId=app-20170515192550-0004, execId=0} with ExecutorShuffleInfo{localDirs=[/tmp/spark-556033e2-c456-49d6-a43c-ef2cd3494b71/executor-b3ceaf84-e66a-45ed-acfe-1052ab1de2f8/blockmgr-eca888c0-4e63-421c-9e61-d959ee45f8e9], subDirsPerLocalDir=64, shuffleManager=org.apache.spark.shuffle.sort.SortShuffleManager}
17/05/16 03:39:02 INFO Worker: Asked to kill executor app-20170515192550-0004/0
17/05/16 03:39:02 INFO ExecutorRunner: Runner thread for executor app-20170515192550-0004/0 interrupted
17/05/16 03:39:02 INFO ExecutorRunner: Killing process!
17/05/16 03:39:02 INFO Worker: Executor app-20170515192550-0004/0 finished with state KILLED exitStatus 0
17/05/16 03:39:02 INFO Worker: Cleaning up local directories for application app-20170515192550-0004
17/05/16 03:39:07 INFO ExternalShuffleBlockResolver: Application app-20170515192550-0004 removed, cleanupLocalDirs = true
17/05/16 03:39:07 INFO ExternalShuffleBlockResolver: Cleaning up executor AppExecId{appId=app-20170515192550-0004, execId=0}'s 1 local dirs
And the Master log:
17/05/16 03:39:02 INFO Master: Received unregister request from application app-20170515192550-0004
17/05/16 03:39:02 INFO Master: Removing app app-20170515192550-0004
17/05/16 03:39:02 INFO Master: 157.97.107.150:33928 got disassociated, removing it.
17/05/16 03:39:02 INFO Master: 157.97.107.150:55444 got disassociated, removing it.
17/05/16 03:39:02 WARN Master: Got status update for unknown executor app-20170515192550-0004/0
Before receiving this request Spark wasn't executing any other job; the context was using 5.3 GB of 10 GB and the driver 1.3 GB of 4 GB.
What does "Driver commanded a shutdown" mean?
Is there a log property that can be changed to see more detail in the logs?
How can a simple request just break the context?

Spark UI's kill is not killing Driver

I am trying to kill my Spark-Kafka streaming job from the Spark UI. It is able to kill the application, but the driver is still running.
Can anyone help me with this? My other streaming jobs are fine; only this one streaming job gives this problem every time.
I can't kill the driver through the command line or the Spark UI. The Spark master is alive.
The output I collected from the logs is:
16/10/25 03:14:25 INFO BlockManagerMaster: Removed 0 successfully in removeExecutor
16/10/25 03:14:25 INFO SparkUI: Stopped Spark web UI at http://***:4040
16/10/25 03:14:25 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/10/25 03:14:25 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/10/25 03:14:35 INFO AppClient: Stop request to Master timed out; it may already be shut down.
16/10/25 03:14:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/10/25 03:14:35 INFO MemoryStore: MemoryStore cleared
16/10/25 03:14:35 INFO BlockManager: BlockManager stopped
16/10/25 03:14:35 INFO BlockManagerMaster: BlockManagerMaster stopped
16/10/25 03:14:35 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/10/25 03:14:35 INFO SparkContext: Successfully stopped SparkContext
16/10/25 03:14:35 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: KILLED
at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:438)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:124)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:172)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/10/25 03:14:35 WARN NettyRpcEnv: Ignored message: true
16/10/25 03:14:35 WARN AppClient$ClientEndpoint: Connection to master:7077 failed; waiting for master to reconnect...
16/10/25 03:14:35 WARN AppClient$ClientEndpoint: Connection to master:7077 failed; waiting for master to reconnect...
Get the running driverId from the Spark UI, then send a POST to the REST endpoint (on the Spark master REST port, typically 6066) to kill the pipeline. I have tested this with Spark 1.6.1:
curl -X POST http://localhost:6066/v1/submissions/kill/driverId
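For example, with a hypothetical driver ID of the form shown under "Running Drivers" on the master UI:
curl -X POST http://<master-host>:6066/v1/submissions/kill/driver-20161025031400-0000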
Hope it helps...

Can't connect from application to the standalone cluster

I'm trying to connect from my application to Spark's standalone cluster. I want to do this on a single machine.
I start the standalone master with the command:
bash start-master.sh
Then I start one worker with the command:
bash spark-class org.apache.spark.deploy.worker.Worker spark://PC:7077 -m 512m
(I allocated 512 MB for it).
At the master's web UI:
http://localhost:8080
I see that the master and the worker are running.
Then I try to connect from my application to the cluster with the following code:
JavaSparkContext sc = new JavaSparkContext("spark://PC:7077", "myapplication");
When I run the application, it crashes with the following error message:
14/11/01 22:53:26 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
14/11/01 22:53:26 INFO spark.SparkContext: Starting job: collect at App.java:115
14/11/01 22:53:26 INFO scheduler.DAGScheduler: Got job 0 (collect at App.java:115) with 2 output partitions (allowLocal=false)
14/11/01 22:53:26 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at App.java:115)
14/11/01 22:53:26 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/11/01 22:53:26 INFO scheduler.DAGScheduler: Missing parents: List()
14/11/01 22:53:26 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:109), which has no missing parents
14/11/01 22:53:27 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:109)
14/11/01 22:53:27 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/11/01 22:53:42 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/01 22:53:46 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
14/11/01 22:53:57 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/01 22:54:06 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
14/11/01 22:54:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/01 22:54:26 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/11/01 22:54:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/11/01 22:54:26 INFO scheduler.DAGScheduler: Failed to run collect at App.java:115
Exception in thread "main" 14/11/01 22:54:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
Any ideas what is going on?
P.S. I'm using the pre-built version of Spark, spark-1.1.0-bin-hadoop2.4.
Thank you.
Make sure that both the standalone workers and the Spark driver are connected to the Spark master on the exact address listed in its web UI / printed in its startup log message. Spark uses Akka for some of its control-plane communication and Akka can be really picky about hostnames, so these need to match exactly.
There are several options to control which hostnames / network interfaces the driver and master will bind to. Probably the simplest option is to set the SPARK_LOCAL_IP environment variable to control the address that the Master / Driver will bind to. See http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html for an overview of the other settings that affect network address binding.
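For example, on a single machine one way to apply that advice might look like the following sketch (the loopback address is an assumption about this setup; adjust it to whatever address the master actually advertises):
# in conf/spark-env.sh — pin the address Spark binds to
export SPARK_LOCAL_IP=127.0.0.1
Then restart the master, note the exact spark://host:port URL it prints in its log and web UI, and pass that same URL both to the worker (bash spark-class org.apache.spark.deploy.worker.Worker spark://...:7077) and to the application's JavaSparkContext.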
