Error while running spark client mode on mesos using docker - apache-spark

We have a 3-node Mesos cluster. The master service was started on machine 1 with the following command:
sudo ./bin/mesos-master.sh --ip=machine1-ip --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --zk=zk://machine1-ip:2181/mesos --quorum=1
and the agent services were started on the other two machines with the following commands:
sudo ./bin/mesos-agent.sh --containerizers=docker --master=zk://machine1-ip:2181/mesos --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --ip=machine2-ip --no-systemd_enable_support
sudo ./bin/mesos-agent.sh --containerizers=docker --master=zk://machine1-ip:2181/mesos --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --ip=machine3-ip --no-systemd_enable_support
The following property was set on machine1:
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
We are trying to run a Spark job using a Docker image.
Note that we did not set "SPARK_EXECUTOR_URI" on machine1 because, as we understand it, the executor runs inside the Docker container and not on the agent machine, so this property should not be required.
The spark-submit command (run from machine 1) is:
/home/mapr/newSpark/spark-2.4.0-bin-hadoop2.7/bin/spark-submit \
--master mesos://machine1:5050 \
--deploy-mode client \
--class com.learning.spark.WordCount \
--conf spark.mesos.executor.docker.image=mesosphere/spark:2.4.0-2.2.1-3-hadoop-2.7 \
/home/mapr/mesos/wordcount.jar hdfs://machine2:8020/hdfslocation/input.txt hdfs://machine2:8020/hdfslocation/output
We get the following error on spark-submit:
Mesos task log:
I1211 20:27:55.040856 5996 exec.cpp:162] Version: 1.7.0
I1211 20:27:55.064775 6016 exec.cpp:236] Executor registered on agent 44c2e848-cd06-4546-b0e9-15537084df1b-S1
I1211 20:27:55.068828 6018 executor.cpp:130] Registered docker executor on company-i0058.company.co.in
I1211 20:27:55.069756 6016 executor.cpp:186] Starting task 3
/bin/sh: 1: /home/mapr/newSpark/spark-2.4.0-bin-hadoop2.7/./bin/spark-class: not found
I1211 20:27:57.669881 6017 executor.cpp:736] Container exited with status 127
I1211 20:27:58.672829 6019 process.cpp:926] Stopped the socket accept loop
Messages on the terminal:
2018-12-11 20:27:49 INFO SparkContext:54 - Running Spark version 2.4.0
2018-12-11 20:27:49 INFO SparkContext:54 - Submitted application: WordCount
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing view acls to: mapr
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing modify acls to: mapr
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing view acls groups to:
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing modify acls groups to:
2018-12-11 20:27:49 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mapr); groups with view permissions: Set(); users with modify permissions: Set(mapr); groups with modify permissions: Set()
2018-12-11 20:27:49 INFO Utils:54 - Successfully started service 'sparkDriver' on port 48069.
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering MapOutputTracker
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-12-11 20:27:49 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-12-11 20:27:49 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-12-11 20:27:49 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-3a4afff7-b050-45ba-bb50-c9f4ec5cc031
2018-12-11 20:27:49 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-12-11 20:27:49 INFO log:192 - Logging initialized #3157ms
2018-12-11 20:27:50 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2018-12-11 20:27:50 INFO Server:419 - Started #3273ms
2018-12-11 20:27:50 INFO AbstractConnector:278 - Started ServerConnector#1cfd1875{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-12-11 20:27:50 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6f0628de{/jobs,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#2b27cc70{/jobs/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6f6a7463{/jobs/job,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#79f227a9{/jobs/job/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6ca320ab{/stages,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#50d68830{/stages/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1e53135d{/stages/stage,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6754ef00{/stages/stage/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#619bd14c{/stages/pool,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#323e8306{/stages/pool/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#a23a01d{/storage,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#4acf72b6{/storage/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#7561db12{/storage/rdd,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3301500b{/storage/rdd/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#24b52d3e{/environment,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#15deb1dc{/environment/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6e9c413e{/executors,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#57a4d5ee{/executors/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#5af5def9{/executors/threadDump,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3a45c42a{/executors/threadDump/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#36dce7ed{/static,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#4b770e40{/,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#78e16155{/api,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#19868320{/jobs/job/kill,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#50b0bc4c{/stages/stage/kill,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://machine1:4040
2018-12-11 20:27:50 INFO SparkContext:54 - Added JAR file:/home/mapr/mesos/wordcount.jar at spark://machine1:48069/jars/wordcount.jar with timestamp 1544540270193
I1211 20:27:50.557170 7462 sched.cpp:232] Version: 1.7.0
I1211 20:27:50.560644 7454 sched.cpp:336] New master detected at master#machine1:5050
I1211 20:27:50.561132 7454 sched.cpp:356] No credentials provided. Attempting to register without authentication
I1211 20:27:50.571651 7456 sched.cpp:744] Framework registered with 5260e4c8-de1c-4772-b5a7-340480594ef4-0000
2018-12-11 20:27:50 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56351.
2018-12-11 20:27:50 INFO NettyBlockTransferService:54 - Server created on machine1:56351
2018-12-11 20:27:50 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-12-11 20:27:50 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, machine1, 56351, None)
2018-12-11 20:27:50 INFO BlockManagerMasterEndpoint:54 - Registering block manager machine1:56351 with 366.3 MB RAM, BlockManagerId(driver, machine1, 56351, None)
2018-12-11 20:27:50 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, machine1, 56351, None)
2018-12-11 20:27:50 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, machine1, 56351, None)
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#73ba6fe6{/metrics/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO MesosCoarseGrainedSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2018-12-11 20:27:51 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_STARTING
2018-12-11 20:27:51 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_STARTING
2018-12-11 20:27:51 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 288.1 KB, free 366.0 MB)
2018-12-11 20:27:51 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.1 KB, free 366.0 MB)
2018-12-11 20:27:51 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on machine1:56351 (size: 25.1 KB, free: 366.3 MB)
2018-12-11 20:27:51 INFO SparkContext:54 - Created broadcast 0 from textFile at WordCount.scala:22
2018-12-11 20:27:52 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-12-11 20:27:52 INFO FileInputFormat:249 - Total input paths to process : 1
2018-12-11 20:27:53 INFO deprecation:1173 - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2018-12-11 20:27:53 INFO HadoopMapRedCommitProtocol:54 - Using output committer class org.apache.hadoop.mapred.FileOutputCommitter
2018-12-11 20:27:53 INFO FileOutputCommitter:108 - File Output Committer Algorithm version is 1
2018-12-11 20:27:53 INFO SparkContext:54 - Starting job: runJob at SparkHadoopWriter.scala:78
2018-12-11 20:27:53 INFO DAGScheduler:54 - Registering RDD 3 (map at WordCount.scala:24)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Got job 0 (runJob at SparkHadoopWriter.scala:78) with 2 output partitions
2018-12-11 20:27:53 INFO DAGScheduler:54 - Final stage: ResultStage 1 (runJob at SparkHadoopWriter.scala:78)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 0)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Missing parents: List(ShuffleMapStage 0)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24), which has no missing parents
2018-12-11 20:27:53 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_RUNNING
2018-12-11 20:27:53 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 5.0 KB, free 366.0 MB)
2018-12-11 20:27:53 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.9 KB, free 366.0 MB)
2018-12-11 20:27:53 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on machine1:56351 (size: 2.9 KB, free: 366.3 MB)
2018-12-11 20:27:53 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1161
2018-12-11 20:27:53 INFO DAGScheduler:54 - Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
2018-12-11 20:27:53 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-12-11 20:27:53 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_RUNNING
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_FAILED
2018-12-11 20:27:54 INFO BlockManagerMaster:54 - Removal of executor 0 requested
2018-12-11 20:27:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 0
2018-12-11 20:27:54 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 0 from BlockManagerMaster.
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_FAILED
2018-12-11 20:27:54 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 1 from BlockManagerMaster.
2018-12-11 20:27:54 INFO BlockManagerMaster:54 - Removal of executor 1 requested
2018-12-11 20:27:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 1
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_STARTING
2018-12-11 20:27:55 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_STARTING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_RUNNING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_FAILED
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Blacklisting Mesos slave b92da3e9-a9c4-422a-babe-c5fb0f33e027-S0 due to too many failures; is Spark installed on it?
2018-12-11 20:27:57 INFO BlockManagerMaster:54 - Removal of executor 2 requested
2018-12-11 20:27:57 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 2
2018-12-11 20:27:57 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 2 from BlockManagerMaster.
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_RUNNING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_FAILED
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Blacklisting Mesos slave 44c2e848-cd06-4546-b0e9-15537084df1b-S1 due to too many failures; is Spark installed on it?
2018-12-11 20:27:57 INFO BlockManagerMaster:54 - Removal of executor 3 requested
2018-12-11 20:27:57 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 3 from BlockManagerMaster.
2018-12-11 20:27:57 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 3
2018-12-11 20:28:08 WARN TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
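A note on the failure above: the "/home/mapr/newSpark/spark-2.4.0-bin-hadoop2.7/./bin/spark-class: not found" line in the Mesos task log shows the executor being launched with the driver's Spark home, a path that does not exist inside the Docker image. Below is a minimal sketch of pointing executors at the in-image installation instead; spark.mesos.executor.home is a standard Spark-on-Mesos property, but the /opt/spark path is an assumption about this particular image's layout:
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the same WordCount submission, but telling Mesos executors
// where Spark lives inside the container rather than on the driver host.
val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("mesos://machine1:5050")
  .set("spark.mesos.executor.docker.image", "mesosphere/spark:2.4.0-2.2.1-3-hadoop-2.7")
  // Assumption: the image ships Spark under /opt/spark; adjust to match the image.
  .set("spark.mesos.executor.home", "/opt/spark")
val sc = new SparkContext(conf)
The same property can equally be passed as --conf spark.mesos.executor.home=/opt/spark on the spark-submit command line.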

Related

not seeing output of spark-submit

I am running some of the examples that come with Spark using spark-submit, on an Ubuntu virtual machine. The class I am trying to run is the following:
package org.apache.spark.examples

import scala.math.random

import org.apache.spark.sql.SparkSession

object SparkPi {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName("Spark Pi")
      .getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}
To run the above code, I am using the spark-submit script as follows:
manu#manu-VirtualBox:~/spark-2.4.0-bin-hadoop2.7$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local ./examples/jars/spark-examples_2.11-2.4.0.jar 10
The output I see is the following (apologies for the big trace dump), but I don't see the printed line "Pi is roughly ...". I don't see any errors either. Why am I not seeing the output?
2019-02-02 10:56:43 WARN Utils:66 - Your hostname, manu-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
2019-02-02 10:56:43 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-02-02 10:56:44 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-02-02 10:56:45 INFO SparkContext:54 - Running Spark version 2.4.0
2019-02-02 10:56:45 INFO SparkContext:54 - Submitted application: Spark Pi
2019-02-02 10:56:45 INFO SecurityManager:54 - Changing view acls to: manu
2019-02-02 10:56:45 INFO SecurityManager:54 - Changing modify acls to: manu
2019-02-02 10:56:45 INFO SecurityManager:54 - Changing view acls groups to:
2019-02-02 10:56:45 INFO SecurityManager:54 - Changing modify acls groups to:
2019-02-02 10:56:45 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(manu); groups with view permissions: Set(); users with modify permissions: Set(manu); groups with modify permissions: Set()
2019-02-02 10:56:46 INFO Utils:54 - Successfully started service 'sparkDriver' on port 32995.
2019-02-02 10:56:46 INFO SparkEnv:54 - Registering MapOutputTracker
2019-02-02 10:56:46 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-02-02 10:56:46 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-02-02 10:56:46 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-02-02 10:56:46 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-13d95f47-51a8-4d27-8ebd-15cb0ee3d61a
2019-02-02 10:56:46 INFO MemoryStore:54 - MemoryStore started with capacity 413.9 MB
2019-02-02 10:56:46 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-02-02 10:56:46 INFO log:192 - Logging initialized #4685ms
2019-02-02 10:56:47 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2019-02-02 10:56:47 INFO Server:419 - Started #5030ms
2019-02-02 10:56:47 INFO AbstractConnector:278 - Started ServerConnector#46c6297b{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-02-02 10:56:47 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3f2049b6{/jobs,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6b85300e{/jobs/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3aaf4f07{/jobs/job,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#18e8473e{/jobs/job/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#5a2f016d{/stages,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1a38ba58{/stages/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3ad394e6{/stages/stage,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1deb2c43{/stages/stage/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3bb9efbc{/stages/pool,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1cefc4b3{/stages/pool/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#2b27cc70{/storage,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6f6a7463{/storage/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1bdaa23d{/storage/rdd,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#79f227a9{/storage/rdd/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6ca320ab{/environment,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#50d68830{/environment/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1e53135d{/executors,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#7674a051{/executors/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3a7704c{/executors/threadDump,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6754ef00{/executors/threadDump/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#619bd14c{/static,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#106faf11{/,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#70f43b45{/api,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#2c282004{/jobs/job/kill,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#22ee2d0{/stages/stage/kill,null,AVAILABLE,#Spark}
2019-02-02 10:56:47 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
2019-02-02 10:56:47 INFO SparkContext:54 - Added JAR file:/home/manu/spark-2.4.0-bin-hadoop2.7/./examples/jars/spark-examples_2.11-2.4.0.jar at spark://10.0.2.15:32995/jars/spark-examples_2.11-2.4.0.jar with timestamp 1549105007905
2019-02-02 10:56:48 INFO Executor:54 - Starting executor ID driver on host localhost
2019-02-02 10:56:48 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42123.
2019-02-02 10:56:48 INFO NettyBlockTransferService:54 - Server created on 10.0.2.15:42123
2019-02-02 10:56:48 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-02-02 10:56:48 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, 10.0.2.15, 42123, None)
2019-02-02 10:56:48 INFO BlockManagerMasterEndpoint:54 - Registering block manager 10.0.2.15:42123 with 413.9 MB RAM, BlockManagerId(driver, 10.0.2.15, 42123, None)
2019-02-02 10:56:48 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 10.0.2.15, 42123, None)
2019-02-02 10:56:48 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 42123, None)
2019-02-02 10:56:49 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#7e46d648{/metrics/json,null,AVAILABLE,#Spark}
2019-02-02 10:56:49 INFO SparkContext:54 - Starting job: reduce at SparkPi.scala:38
2019-02-02 10:56:50 INFO DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
2019-02-02 10:56:50 INFO DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2019-02-02 10:56:50 INFO DAGScheduler:54 - Parents of final stage: List()
2019-02-02 10:56:50 INFO DAGScheduler:54 - Missing parents: List()
2019-02-02 10:56:50 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2019-02-02 10:56:50 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB)
2019-02-02 10:56:50 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 MB)
2019-02-02 10:56:50 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 10.0.2.15:42123 (size: 1256.0 B, free: 413.9 MB)
2019-02-02 10:56:50 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161
2019-02-02 10:56:50 INFO DAGScheduler:54 - Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
2019-02-02 10:56:50 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 10 tasks
2019-02-02 10:56:51 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:51 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2019-02-02 10:56:51 INFO Executor:54 - Fetching spark://10.0.2.15:32995/jars/spark-examples_2.11-2.4.0.jar with timestamp 1549105007905
2019-02-02 10:56:51 INFO TransportClientFactory:267 - Successfully created connection to /10.0.2.15:32995 after 110 ms (0 ms spent in bootstraps)
2019-02-02 10:56:51 INFO Utils:54 - Fetching spark://10.0.2.15:32995/jars/spark-examples_2.11-2.4.0.jar to /tmp/spark-3c47ed54-5a7a-4785-84e3-4b834b94b238/userFiles-f31cec9c-5bb9-41d0-b8c3-e18abe2be54a/fetchFileTemp4213110830681726950.tmp
2019-02-02 10:56:51 INFO Executor:54 - Adding file:/tmp/spark-3c47ed54-5a7a-4785-84e3-4b834b94b238/userFiles-f31cec9c-5bb9-41d0-b8c3-e18abe2be54a/spark-examples_2.11-2.4.0.jar to class loader
2019-02-02 10:56:52 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 910 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 1113 ms on localhost (executor driver) (1/10)
2019-02-02 10:56:52 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 867 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO Executor:54 - Running task 2.0 in stage 0.0 (TID 2)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 271 ms on localhost (executor driver) (2/10)
2019-02-02 10:56:52 INFO Executor:54 - Finished task 2.0 in stage 0.0 (TID 2). 824 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO Executor:54 - Running task 3.0 in stage 0.0 (TID 3)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 199 ms on localhost (executor driver) (3/10)
2019-02-02 10:56:52 INFO Executor:54 - Finished task 3.0 in stage 0.0 (TID 3). 867 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO Executor:54 - Running task 4.0 in stage 0.0 (TID 4)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 3.0 in stage 0.0 (TID 3) in 204 ms on localhost (executor driver) (4/10)
2019-02-02 10:56:52 INFO Executor:54 - Finished task 4.0 in stage 0.0 (TID 4). 824 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 4.0 in stage 0.0 (TID 4) in 178 ms on localhost (executor driver) (5/10)
2019-02-02 10:56:52 INFO Executor:54 - Running task 5.0 in stage 0.0 (TID 5)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 5.0 in stage 0.0 (TID 5). 824 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 5.0 in stage 0.0 (TID 5) in 145 ms on localhost (executor driver) (6/10)
2019-02-02 10:56:53 INFO Executor:54 - Running task 6.0 in stage 0.0 (TID 6)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 6.0 in stage 0.0 (TID 6). 867 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 6.0 in stage 0.0 (TID 6) in 212 ms on localhost (executor driver) (7/10)
2019-02-02 10:56:53 INFO Executor:54 - Running task 7.0 in stage 0.0 (TID 7)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 7.0 in stage 0.0 (TID 7). 867 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 7.0 in stage 0.0 (TID 7) in 152 ms on localhost (executor driver) (8/10)
2019-02-02 10:56:53 INFO Executor:54 - Running task 8.0 in stage 0.0 (TID 8)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 8.0 in stage 0.0 (TID 8). 867 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 8.0 in stage 0.0 (TID 8) in 103 ms on localhost (executor driver) (9/10)
2019-02-02 10:56:53 INFO Executor:54 - Running task 9.0 in stage 0.0 (TID 9)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 9.0 in stage 0.0 (TID 9). 867 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 9.0 in stage 0.0 (TID 9) in 79 ms on localhost (executor driver) (10/10)
2019-02-02 10:56:53 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2019-02-02 10:56:53 INFO DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 3.287 s
2019-02-02 10:56:53 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 3.700842 s
Pi is roughly 3.142931142931143
2019-02-02 10:56:53 INFO AbstractConnector:318 - Stopped Spark#46c6297b{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-02-02 10:56:53 INFO SparkUI:54 - Stopped Spark web UI at http://10.0.2.15:4040
2019-02-02 10:56:53 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-02-02 10:56:53 INFO MemoryStore:54 - MemoryStore cleared
2019-02-02 10:56:53 INFO BlockManager:54 - BlockManager stopped
2019-02-02 10:56:53 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2019-02-02 10:56:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-02-02 10:56:53 INFO SparkContext:54 - Successfully stopped SparkContext
2019-02-02 10:56:53 INFO ShutdownHookManager:54 - Shutdown hook called
2019-02-02 10:56:53 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-3c47ed54-5a7a-4785-84e3-4b834b94b238
2019-02-02 10:56:54 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-2dcaa58b-d605-40dc-8dd8-df55607f1a59
It seems to me, as stated by @ruslangm, that the expected output is actually there:
Maybe we misunderstood the question.
Instead of printing to the console, try saving the result to a file: with so much stdout produced during execution it is hard to pick the result out, but it is visible in your output.
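A minimal sketch of that suggestion, reusing count and n from the SparkPi code above (the /tmp/spark-pi-result output path is illustrative):
// Sketch: persist the result so it is not lost in the stdout noise.
val pi = 4.0 * count / (n - 1)
spark.sparkContext.parallelize(Seq(pi), 1).saveAsTextFile("/tmp/spark-pi-result")
The saved part file can then be inspected after the job finishes.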

Pyspark hangs on simple command

PySpark hangs with the input below.
Note that it does not hang in the Scala console.
Python 3.6.5 (default, Jun 17 2018, 12:13:06)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2018-06-21 10:27:37 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
Using Python version 3.6.5 (default, Jun 17 2018 12:13:06)
SparkSession available as 'spark'.
>>> sc.parallelize((1,1)).count() <-----------HANGS!
Does anyone have any idea why this is happening? I tried reinstalling everything (Java, Spark, Homebrew) and even deleted the entire /usr/local directory. I am out of ideas.
A different test program:
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
x = sc.parallelize((1,1)).count()
print("count: ", x)
Output from spark-submit with a similar test Python file:
2018-06-21 10:31:47 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-21 10:31:47 INFO SparkContext:54 - Running Spark version 2.3.1
2018-06-21 10:31:47 INFO SparkContext:54 - Submitted application: test_spark.py
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing view acls to: jonedoe
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing modify acls to: jonedoe
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing view acls groups to:
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing modify acls groups to:
2018-06-21 10:31:47 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jonedoe); groups with view permissions: Set(); users with modify permissions: Set(jonedoe); groups with modify permissions: Set()
2018-06-21 10:31:47 INFO Utils:54 - Successfully started service 'sparkDriver' on port 61556.
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering MapOutputTracker
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-06-21 10:31:47 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-06-21 10:31:47 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-06-21 10:31:47 INFO DiskBlockManager:54 - Created local directory at /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/blockmgr-5c0bfcf2-9009-46b5-bcd7-4fa5ec605a89
2018-06-21 10:31:47 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-06-21 10:31:48 INFO log:192 - Logging initialized #2297ms
2018-06-21 10:31:48 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-06-21 10:31:48 INFO Server:414 - Started #2378ms
2018-06-21 10:31:48 INFO AbstractConnector:278 - Started ServerConnector#84802a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-21 10:31:48 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#79c67e6f{/jobs,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6889c329{/jobs/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3a8c9a58{/jobs/job,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6e04f8ff{/jobs/job/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#4832ee9d{/stages,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1632f399{/stages/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#398a3a30{/stages/stage,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#2eb62024{/stages/stage/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#4685c478{/stages/pool,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#31053558{/stages/pool/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#537d3185{/storage,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#4c559cce{/storage/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#249b3738{/storage/rdd,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3c2c6906{/storage/rdd/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6e7861f{/environment,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#66b4d9e1{/environment/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1b6b10f8{/executors,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#44502eca{/executors/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#7ebd8f21{/executors/threadDump,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3e862ac6{/executors/threadDump/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#7d29113e{/static,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#388c37ce{/,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#22374681{/api,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#dcbeb70{/jobs/job/kill,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#322ceede{/stages/stage/kill,null,AVAILABLE,#Spark}
2018-06-21 10:31:48 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://ip-192-168-65-180.ec2.internal:4040
2018-06-21 10:31:48 INFO SparkContext:54 - Added file file:/Users/jonedoe/code/test_spark.py at file:/Users/jonedoe/code/test_spark.py with timestamp 1529602308500
2018-06-21 10:31:48 INFO Utils:54 - Copying /Users/jonedoe/code/test_spark.py to /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/spark-99983724-420e-4bc0-ad1f-3bc41bba9114/userFiles-999bdcde-1e5d-4e9a-98ce-c6ecdaee0739/test_spark.py
2018-06-21 10:31:48 INFO Executor:54 - Starting executor ID driver on host localhost
2018-06-21 10:31:48 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61557.
2018-06-21 10:31:48 INFO NettyBlockTransferService:54 - Server created on ip-192-168-65-180.ec2.internal:61557
2018-06-21 10:31:48 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-06-21 10:31:48 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManagerMasterEndpoint:54 - Registering block manager ip-192-168-65-180.ec2.internal:61557 with 366.3 MB RAM, BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#2d1fafea{/metrics/json,null,AVAILABLE,#Spark}
2018-06-21 10:31:49 INFO SparkContext:54 - Starting job: count at /Users/jonedoe/code/test_spark.py:4
2018-06-21 10:31:49 INFO DAGScheduler:54 - Got job 0 (count at /Users/jonedoe/code/test_spark.py:4) with 8 output partitions
2018-06-21 10:31:49 INFO DAGScheduler:54 - Final stage: ResultStage 0 (count at /Users/jonedoe/code/test_spark.py:4)
2018-06-21 10:31:49 INFO DAGScheduler:54 - Parents of final stage: List()
2018-06-21 10:31:49 INFO DAGScheduler:54 - Missing parents: List()
2018-06-21 10:31:49 INFO DAGScheduler:54 - Submitting ResultStage 0 (PythonRDD[1] at count at /Users/jonedoe/code/test_spark.py:4), which has no missing parents
2018-06-21 10:31:49 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 5.0 KB, free 366.3 MB)
2018-06-21 10:31:49 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.4 KB, free 366.3 MB)
2018-06-21 10:31:49 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on ip-192-168-65-180.ec2.internal:61557 (size: 3.4 KB, free: 366.3 MB)
2018-06-21 10:31:49 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-06-21 10:31:49 INFO DAGScheduler:54 - Submitting 8 missing tasks from ResultStage 0 (PythonRDD[1] at count at /Users/jonedoe/code/test_spark.py:4) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
2018-06-21 10:31:49 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 8 tasks
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7858 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7858 bytes)
2018-06-21 10:31:49 INFO Executor:54 - Running task 3.0 in stage 0.0 (TID 3)
2018-06-21 10:31:49 INFO Executor:54 - Running task 2.0 in stage 0.0 (TID 2)
2018-06-21 10:31:49 INFO Executor:54 - Running task 4.0 in stage 0.0 (TID 4)
2018-06-21 10:31:49 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-06-21 10:31:49 INFO Executor:54 - Running task 6.0 in stage 0.0 (TID 6)
2018-06-21 10:31:49 INFO Executor:54 - Running task 7.0 in stage 0.0 (TID 7)
2018-06-21 10:31:49 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-06-21 10:31:49 INFO Executor:54 - Running task 5.0 in stage 0.0 (TID 5)
2018-06-21 10:31:49 INFO Executor:54 - Fetching file:/Users/jonedoe/code/test_spark.py with timestamp 1529602308500
2018-06-21 10:31:49 INFO Utils:54 - /Users/jonedoe/code/test_spark.py has been previously copied to /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/spark-99983724-420e-4bc0-ad1f-3bc41bba9114/userFiles-999bdcde-1e5d-4e9a-98ce-c6ecdaee0739/test_spark.py
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 397, boot = 389, init = 8, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 399, boot = 396, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 406, boot = 403, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 413, boot = 410, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 420, boot = 417, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 426, boot = 423, init = 2, finish = 1
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 433, boot = 430, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 441, boot = 437, init = 3, finish = 1
2018-06-21 10:31:49 INFO Executor:54 - Finished task 5.0 in stage 0.0 (TID 5). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 2.0 in stage 0.0 (TID 2). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 3.0 in stage 0.0 (TID 3). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 6.0 in stage 0.0 (TID 6). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 7.0 in stage 0.0 (TID 7). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 4.0 in stage 0.0 (TID 4). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 1310 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 1310 bytes result sent to driver
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 5.0 in stage 0.0 (TID 5) in 580 ms on localhost (executor driver) (1/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 3.0 in stage 0.0 (TID 3) in 586 ms on localhost (executor driver) (2/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 587 ms on localhost (executor driver) (3/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 6.0 in stage 0.0 (TID 6) in 583 ms on localhost (executor driver) (4/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 4.0 in stage 0.0 (TID 4) in 586 ms on localhost (executor driver) (5/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 7.0 in stage 0.0 (TID 7) in 584 ms on localhost (executor driver) (6/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 608 ms on localhost (executor driver) (7/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 590 ms on localhost (executor driver) (8/8)
2018-06-21 10:31:50 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-06-21 10:31:50 INFO DAGScheduler:54 - ResultStage 0 (count at /Users/jonedoe/code/test_spark.py:4) finished in 0.774 s
2018-06-21 10:31:50 INFO DAGScheduler:54 - Job 0 finished: count at /Users/jonedoe/code/test_spark.py:4, took 0.825530 s
HANGS AFTER HERE.........
It looks like my antivirus (Bitdefender) was the culprit.
For some reason it was blocking Spark.

Apache Spark: worker can't connect to master but can ping and ssh from worker to master

I'm trying to set up an 8-node cluster on 8 RHEL 7.3 x86 machines using Spark 2.0.1. start-master.sh goes through fine:
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host lambda.foo.net --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/08 04:26:46 INFO Master: Started daemon with process name: 22181#lambda.foo.net
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for TERM
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for HUP
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for INT
16/12/08 04:26:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/08 04:26:46 INFO SecurityManager: Changing view acls to: root
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls to: root
16/12/08 04:26:46 INFO SecurityManager: Changing view acls groups to:
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls groups to:
16/12/08 04:26:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/12/08 04:26:46 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
16/12/08 04:26:46 INFO Master: Starting Spark master at spark://lambda.foo.net:7077
16/12/08 04:26:46 INFO Master: Running Spark version 2.0.1
16/12/08 04:26:46 INFO Utils: Successfully started service 'MasterUI' on port 8080.
16/12/08 04:26:46 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://19.341.11.212:8080
16/12/08 04:26:46 INFO Utils: Successfully started service on port 6066.
16/12/08 04:26:46 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
16/12/08 04:26:46 INFO Master: I have been elected leader! New state: ALIVE
But when I try to bring up the workers using start-slaves.sh, what I see in the workers' logs is:
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://lambda.foo.net:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/08 04:30:00 INFO Worker: Started daemon with process name: 14649#hawk040os4.foo.net
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for TERM
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for HUP
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for INT
16/12/08 04:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/08 04:30:00 INFO SecurityManager: Changing view acls to: root
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls to: root
16/12/08 04:30:00 INFO SecurityManager: Changing view acls groups to:
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls groups to:
16/12/08 04:30:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/12/08 04:30:00 INFO Utils: Successfully started service 'sparkWorker' on port 35858.
16/12/08 04:30:00 INFO Worker: Starting Spark worker 15.242.22.179:35858 with 24 cores, 1510.2 GB RAM
16/12/08 04:30:00 INFO Worker: Running Spark version 2.0.1
16/12/08 04:30:00 INFO Worker: Spark home: /usr/local/bin/spark-2.0.1-bin-hadoop2.7
16/12/08 04:30:00 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/12/08 04:30:00 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://15.242.22.179:8081
16/12/08 04:30:00 INFO Worker: Connecting to master lambda.foo.net:7077...
16/12/08 04:30:00 WARN Worker: Failed to connect to master lambda.foo.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to lambda.foo.net/19.341.11.212:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
... 4 more
Caused by: java.net.NoRouteToHostException: No route to host: lambda.foo.net/19.341.11.212:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/12/08 04:30:12 INFO Worker: Retrying connection to master (attempt # 1)
16/12/08 04:30:12 INFO Worker: Connecting to master lambda.foo.net:7077...
16/12/08 04:30:12 WARN Worker: Failed to connect to master lambda.foo.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
So it says "No route to host", but I can successfully ping the master from the worker node, as well as ssh from the worker to the master node.
Why does Spark say "No route to host"?
Problem solved: the firewall was blocking the packets.
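A quick way to confirm this kind of block from the worker (a sketch; the hostname and port come from the logs above) is a raw TCP connect to the master port, since ping and ssh succeeding only proves ICMP and port 22 are open:
import java.net.{InetSocketAddress, Socket}

// Sketch: try a plain TCP connection to the Spark master port from the worker.
// "No route to host" here, while ping works, points at a firewall rule.
val socket = new Socket()
try {
  socket.connect(new InetSocketAddress("lambda.foo.net", 7077), 5000)
  println("port 7077 is reachable")
} finally {
  socket.close()
}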

Apache Spark job reading from Cassandra table stalls on launch (spark-1.3.1)

We've been having intermittent issues with Spark 1.3.1 and the DataStax Cassandra connector causing jobs to stall indefinitely when they are launched.
EDIT: I also tried the same approach with Spark 1.2.1 and the packaged 1.2.1 spark-cassandra-connector_2.10 and it resulted in the same symptoms.
We are using the following dependency:
var sparkCas = "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.3.0-SNAPSHOT"
Our job code:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
import org.joda.time.DateTime

object ConnTransform {
  private val AppName = "ConnTransformCassandra"

  def main(args: Array[String]) {
    val start = new DateTime(2015, 5, 27, 1, 0, 0)
    val master = if (args.length >= 1) args(0) else "local[*]"
    // Create the spark context.
    val sc = {
      val conf = new SparkConf()
        .setAppName(AppName)
        .setMaster(master)
        .set("spark.cassandra.connection.host", "10.10.101.202,10.10.102.139,10.10.103.74")
      new SparkContext(conf)
    }
    // Utils is our own helper object (not shown here).
    sc.cassandraTable("alpha_dev", "conn")
      .select("data")
      .where("timep = ?", start)
      .where("sensorid IN ?", Utils.sensors)
      .map(Utils.deserializeRow)
      .saveAsTextFile("output/raw_data")
  }
}
As you can see, the code is pretty simple (it used to be more complex, but we have been stripping it down to narrow the root cause of this issue).
Now, this job worked earlier today: data was successfully written to the specified directory. However, when it is run now, the job starts, gets to the point just before it starts processing blocks, and sits there indefinitely.
The output below shows the log messages seen so far; at the time of writing the job has been stalled for almost an hour. If we set the logging level to DEBUG, the only thing visible after that point are heartbeat pings between Akka workers.
ubuntu#ip-10-10-102-53:~/projects/icespark$ /home/ubuntu/spark/spark-1.3.1/bin/spark-submit --class com.splee.spark.ConnTransform splee-analytics-assembly-0.1.0.jar
15/05/27 21:15:21 INFO SparkContext: Running Spark version 1.3.1
15/05/27 21:15:21 INFO SecurityManager: Changing view acls to: ubuntu
15/05/27 21:15:21 INFO SecurityManager: Changing modify acls to: ubuntu
15/05/27 21:15:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
15/05/27 21:15:22 INFO Slf4jLogger: Slf4jLogger started
15/05/27 21:15:22 INFO Remoting: Starting remoting
15/05/27 21:15:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#ip-10-10-102-53.us-west-2.compute.internal:51977]
15/05/27 21:15:22 INFO Utils: Successfully started service 'sparkDriver' on port 51977.
15/05/27 21:15:22 INFO SparkEnv: Registering MapOutputTracker
15/05/27 21:15:22 INFO SparkEnv: Registering BlockManagerMaster
15/05/27 21:15:22 INFO DiskBlockManager: Created local directory at /tmp/spark-2466ff66-bb50-4d52-9d34-1801d69889b9/blockmgr-60e75214-1ba6-410c-a564-361263636e5c
15/05/27 21:15:22 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/05/27 21:15:22 INFO HttpFileServer: HTTP File server directory is /tmp/spark-72f1e849-c298-49ee-936c-e94c462f3df2/httpd-f81c2326-e5f1-4f33-9557-074f2789c4ee
15/05/27 21:15:22 INFO HttpServer: Starting HTTP Server
15/05/27 21:15:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/27 21:15:22 INFO AbstractConnector: Started SocketConnector@0.0.0.0:55357
15/05/27 21:15:22 INFO Utils: Successfully started service 'HTTP file server' on port 55357.
15/05/27 21:15:22 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/27 21:15:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/27 21:15:22 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/27 21:15:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/27 21:15:22 INFO SparkUI: Started SparkUI at http://ip-10-10-102-53.us-west-2.compute.internal:4040
15/05/27 21:15:22 INFO SparkContext: Added JAR file:/home/ubuntu/projects/icespark/splee-analytics-assembly-0.1.0.jar at http://10.10.102.53:55357/jars/splee-analytics-assembly-0.1.0.jar with timestamp 1432761322942
15/05/27 21:15:23 INFO Executor: Starting executor ID <driver> on host localhost
15/05/27 21:15:23 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@ip-10-10-102-53.us-west-2.compute.internal:51977/user/HeartbeatReceiver
15/05/27 21:15:23 INFO NettyBlockTransferService: Server created on 58479
15/05/27 21:15:23 INFO BlockManagerMaster: Trying to register BlockManager
15/05/27 21:15:23 INFO BlockManagerMasterActor: Registering block manager localhost:58479 with 265.1 MB RAM, BlockManagerId(<driver>, localhost, 58479)
15/05/27 21:15:23 INFO BlockManagerMaster: Registered BlockManager
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.28:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.28 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.60:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.60 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.154:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.154 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.145:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.145 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.78:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.78 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.200:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.200 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.73:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.73 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.205:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.205 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.205:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.205 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.74:9042 added
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.202:9042 added
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.139:9042 added
15/05/27 21:15:24 INFO CassandraConnector: Connected to Cassandra cluster: Splee Dev
15/05/27 21:15:25 INFO CassandraConnector: Disconnected from Cassandra cluster: Splee Dev
If anyone has any ideas about what could be causing this job (which previously produced results) to stall in this way, and can shed some light on the situation, it would be much appreciated.
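(For what it's worth, a generic way to see where a stalled driver is blocked is to take a thread dump of the driver JVM; jps and jstack ship with the JDK:
# Find the driver process, then dump its threads:
jps -lm | grep ConnTransform
jstack <driver-pid> > driver-threads.txt
Threads stuck waiting on Cassandra, or parked on a lock, usually stand out clearly in the dump.)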

Spark fail when running pi.py example with yarn-client mode

I can successfully run the Java version of the Pi example as follows.
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
However, the Python version fails with the error information below. I used yarn-client mode; the pyspark command line in yarn-client mode returns the same output. Can anyone help me figure out this problem?
nlp@yyy2:~/spark$ ./bin/spark-submit --master yarn-client examples/src/main/python/pi.py
15/01/05 17:22:26 INFO spark.SecurityManager: Changing view acls to: nlp
15/01/05 17:22:26 INFO spark.SecurityManager: Changing modify acls to: nlp
15/01/05 17:22:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp)
15/01/05 17:22:26 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/01/05 17:22:26 INFO Remoting: Starting remoting
15/01/05 17:22:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@yyy2:42747]
15/01/05 17:22:26 INFO util.Utils: Successfully started service 'sparkDriver' on port 42747.
15/01/05 17:22:26 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/05 17:22:26 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/05 17:22:26 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150105172226-aeae
15/01/05 17:22:26 INFO storage.MemoryStore: MemoryStore started with capacity 265.1 MB
15/01/05 17:22:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/05 17:22:27 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-cbe0079b-79c5-426b-b67e-548805423b11
15/01/05 17:22:27 INFO spark.HttpServer: Starting HTTP Server
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/05 17:22:27 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57169
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'HTTP file server' on port 57169.
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/05 17:22:27 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/01/05 17:22:27 INFO ui.SparkUI: Started SparkUI at http://yyy2:4040
15/01/05 17:22:27 INFO client.RMProxy: Connecting to ResourceManager at yyy14/10.112.168.195:8032
15/01/05 17:22:27 INFO yarn.Client: Requesting a new application from cluster with 6 NodeManagers
15/01/05 17:22:27 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/05 17:22:27 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/05 17:22:27 INFO yarn.Client: Setting up container launch context for our AM
15/01/05 17:22:27 INFO yarn.Client: Preparing resources for our AM container
15/01/05 17:22:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 24 for xxx on ha-hdfs:hzdm-cluster1
15/01/05 17:22:28 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/lib/spark-assembly-1.2.0-hadoop2.5.2.jar -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/spark-assembly-1.2.0-hadoop2.5.2.jar
15/01/05 17:22:29 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/pi.py
15/01/05 17:22:29 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/05 17:22:29 INFO spark.SecurityManager: Changing view acls to: nlp
15/01/05 17:22:29 INFO spark.SecurityManager: Changing modify acls to: nlp
15/01/05 17:22:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp)
15/01/05 17:22:29 INFO yarn.Client: Submitting application 23 to ResourceManager
15/01/05 17:22:30 INFO impl.YarnClientImpl: Submitted application application_1420444011562_0023
15/01/05 17:22:31 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:31 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1420449749969
final status: UNDEFINED
tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
user: nlp
15/01/05 17:22:32 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:33 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:34 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:35 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:36 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@yyy16:52855/user/YarnAM#435880073]
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> yyy14, PROXY_URI_BASES -> http://yyy14:8070/proxy/application_1420444011562_0023), /proxy/application_1420444011562_0023
15/01/05 17:22:36 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/01/05 17:22:37 INFO yarn.Client: Application report for application_1420444011562_0023 (state: RUNNING)
15/01/05 17:22:37 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: yyy16
ApplicationMaster RPC port: 0
queue: root.default
start time: 1420449749969
final status: UNDEFINED
tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
user: nlp
15/01/05 17:22:37 INFO cluster.YarnClientSchedulerBackend: Application application_1420444011562_0023 has started running.
15/01/05 17:22:37 INFO netty.NettyBlockTransferService: Server created on 35648
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/01/05 17:22:37 INFO storage.BlockManagerMasterActor: Registering block manager yyy2:35648 with 265.1 MB RAM, BlockManagerId(<driver>, yyy2, 35648)
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Registered BlockManager
15/01/05 17:22:37 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@yyy16:52855] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/01/05 17:22:38 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null}
15/01/05 17:22:38 INFO ui.SparkUI: Stopped Spark web UI at http://yyy2:4040
15/01/05 17:22:38 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Stopped
15/01/05 17:22:39 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/01/05 17:22:39 INFO storage.MemoryStore: MemoryStore cleared
15/01/05 17:22:39 INFO storage.BlockManager: BlockManager stopped
15/01/05 17:22:39 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/01/05 17:22:39 INFO spark.SparkContext: Successfully stopped SparkContext
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/01/05 17:22:57 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
Traceback (most recent call last):
  File "/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py", line 29, in <module>
    sc = SparkContext(appName="PythonPi")
  File "/home/nlp/spark/python/pyspark/context.py", line 105, in __init__
    conf, jsc)
  File "/home/nlp/spark/python/pyspark/context.py", line 153, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/home/nlp/spark/python/pyspark/context.py", line 201, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
  File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
If you're running this example on Java 8, this may be due to Java 8 allocating far more virtual memory than it needs, which trips YARN's memory checks: https://issues.apache.org/jira/browse/YARN-4714
You can force YARN to skip these checks by setting the following properties in yarn-site.xml:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
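Note that these are NodeManager settings, so they only take effect after the NodeManagers are restarted; the exact command depends on how Hadoop is packaged on your cluster, for example:
# Assumed service name; substitute your distribution's equivalent:
sudo systemctl restart hadoop-yarn-nodemanager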
Try it with the deploy-mode parameter, like this:
--deploy-mode cluster
I had a problem like yours, and with this parameter it worked.
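For the submit command in the question, that would look like this on Spark 1.2, where yarn-cluster is the shorthand for YARN with cluster deploy mode (on newer versions you would write --master yarn --deploy-mode cluster instead; also note that some early Spark releases did not support Python applications in cluster deploy mode):
./bin/spark-submit --master yarn-cluster examples/src/main/python/pi.py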
I experienced a similar problem using spark-submit and yarn-client (I got the same NPE/stack trace). Tuning down my memory settings did the trick; it seems to fail like this when you try to allocate too much memory. I would start by removing the --executor-memory and --driver-memory switches.
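Applied to the Java example from the question, that would mean dropping the memory flags and letting the YARN defaults apply first:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--num-executors 3 \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10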
I reduced the number of cores in the Advanced spark-env settings to make it work.
I ran into this issue running spark-shell on HDP 2.3 with Spark 1.3.1:
spark-shell \
--master yarn-client \
--driver-memory 4g \
--executor-memory 4g \
--executor-cores 1 \
--num-executors 4
The solution for me was to set the Spark config value:
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557
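You can set it either on the command line or persistently; the version string below is the one from my cluster, so substitute your own HDP build:
spark-shell --master yarn-client --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557
# or in conf/spark-defaults.conf:
spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.0.0-2557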
