I am running some of the examples that come with Spark using spark-submit, on an Ubuntu virtual machine. The class I am trying to run is the following:
import scala.math.random
import org.apache.spark.sql.SparkSession

object SparkPi {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName("Spark Pi")
      .getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
      // sample a point uniformly from the square [-1, 1] x [-1, 1]
      val x = random * 2 - 1
      val y = random * 2 - 1
      // a point lands inside the unit circle with probability pi/4
      if (x*x + y*y <= 1) 1 else 0
    }.reduce(_ + _)
    // count/(n - 1) estimates pi/4 ("1 until n" yields n - 1 points), so multiply by 4
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}
To run the above code, I am using the spark-submit script as follows:
manu@manu-VirtualBox:~/spark-2.4.0-bin-hadoop2.7$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local ./examples/jars/spark-examples_2.11-2.4.0.jar 10
The output I see is the following (apologies for the big trace dump), but I don't see the "Pi is roughly" print. I don't see any errors either. Why am I not seeing the output?
2019-02-02 10:56:43 WARN Utils:66 - Your hostname, manu-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
2019-02-02 10:56:43 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-02-02 10:56:44 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-02-02 10:56:45 INFO SparkContext:54 - Running Spark version 2.4.0
2019-02-02 10:56:45 INFO SparkContext:54 - Submitted application: Spark Pi
2019-02-02 10:56:45 INFO SecurityManager:54 - Changing view acls to: manu
2019-02-02 10:56:45 INFO SecurityManager:54 - Changing modify acls to: manu
2019-02-02 10:56:45 INFO SecurityManager:54 - Changing view acls groups to:
2019-02-02 10:56:45 INFO SecurityManager:54 - Changing modify acls groups to:
2019-02-02 10:56:45 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(manu); groups with view permissions: Set(); users with modify permissions: Set(manu); groups with modify permissions: Set()
2019-02-02 10:56:46 INFO Utils:54 - Successfully started service 'sparkDriver' on port 32995.
2019-02-02 10:56:46 INFO SparkEnv:54 - Registering MapOutputTracker
2019-02-02 10:56:46 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-02-02 10:56:46 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-02-02 10:56:46 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-02-02 10:56:46 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-13d95f47-51a8-4d27-8ebd-15cb0ee3d61a
2019-02-02 10:56:46 INFO MemoryStore:54 - MemoryStore started with capacity 413.9 MB
2019-02-02 10:56:46 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-02-02 10:56:46 INFO log:192 - Logging initialized @4685ms
2019-02-02 10:56:47 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2019-02-02 10:56:47 INFO Server:419 - Started @5030ms
2019-02-02 10:56:47 INFO AbstractConnector:278 - Started ServerConnector@46c6297b{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-02-02 10:56:47 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3f2049b6{/jobs,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b85300e{/jobs/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3aaf4f07{/jobs/job,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@18e8473e{/jobs/job/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5a2f016d{/stages,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1a38ba58{/stages/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3ad394e6{/stages/stage,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1deb2c43{/stages/stage/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3bb9efbc{/stages/pool,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1cefc4b3{/stages/pool/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2b27cc70{/storage,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f6a7463{/storage/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1bdaa23d{/storage/rdd,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@79f227a9{/storage/rdd/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ca320ab{/environment,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50d68830{/environment/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1e53135d{/executors,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7674a051{/executors/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a7704c{/executors/threadDump,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6754ef00{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@619bd14c{/static,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@106faf11{/,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@70f43b45{/api,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2c282004{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22ee2d0{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-02-02 10:56:47 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
2019-02-02 10:56:47 INFO SparkContext:54 - Added JAR file:/home/manu/spark-2.4.0-bin-hadoop2.7/./examples/jars/spark-examples_2.11-2.4.0.jar at spark://10.0.2.15:32995/jars/spark-examples_2.11-2.4.0.jar with timestamp 1549105007905
2019-02-02 10:56:48 INFO Executor:54 - Starting executor ID driver on host localhost
2019-02-02 10:56:48 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42123.
2019-02-02 10:56:48 INFO NettyBlockTransferService:54 - Server created on 10.0.2.15:42123
2019-02-02 10:56:48 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-02-02 10:56:48 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, 10.0.2.15, 42123, None)
2019-02-02 10:56:48 INFO BlockManagerMasterEndpoint:54 - Registering block manager 10.0.2.15:42123 with 413.9 MB RAM, BlockManagerId(driver, 10.0.2.15, 42123, None)
2019-02-02 10:56:48 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 10.0.2.15, 42123, None)
2019-02-02 10:56:48 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 42123, None)
2019-02-02 10:56:49 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7e46d648{/metrics/json,null,AVAILABLE,@Spark}
2019-02-02 10:56:49 INFO SparkContext:54 - Starting job: reduce at SparkPi.scala:38
2019-02-02 10:56:50 INFO DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
2019-02-02 10:56:50 INFO DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2019-02-02 10:56:50 INFO DAGScheduler:54 - Parents of final stage: List()
2019-02-02 10:56:50 INFO DAGScheduler:54 - Missing parents: List()
2019-02-02 10:56:50 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2019-02-02 10:56:50 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB)
2019-02-02 10:56:50 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 MB)
2019-02-02 10:56:50 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 10.0.2.15:42123 (size: 1256.0 B, free: 413.9 MB)
2019-02-02 10:56:50 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1161
2019-02-02 10:56:50 INFO DAGScheduler:54 - Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
2019-02-02 10:56:50 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 10 tasks
2019-02-02 10:56:51 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:51 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2019-02-02 10:56:51 INFO Executor:54 - Fetching spark://10.0.2.15:32995/jars/spark-examples_2.11-2.4.0.jar with timestamp 1549105007905
2019-02-02 10:56:51 INFO TransportClientFactory:267 - Successfully created connection to /10.0.2.15:32995 after 110 ms (0 ms spent in bootstraps)
2019-02-02 10:56:51 INFO Utils:54 - Fetching spark://10.0.2.15:32995/jars/spark-examples_2.11-2.4.0.jar to /tmp/spark-3c47ed54-5a7a-4785-84e3-4b834b94b238/userFiles-f31cec9c-5bb9-41d0-b8c3-e18abe2be54a/fetchFileTemp4213110830681726950.tmp
2019-02-02 10:56:51 INFO Executor:54 - Adding file:/tmp/spark-3c47ed54-5a7a-4785-84e3-4b834b94b238/userFiles-f31cec9c-5bb9-41d0-b8c3-e18abe2be54a/spark-examples_2.11-2.4.0.jar to class loader
2019-02-02 10:56:52 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 910 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 1113 ms on localhost (executor driver) (1/10)
2019-02-02 10:56:52 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 867 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO Executor:54 - Running task 2.0 in stage 0.0 (TID 2)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 271 ms on localhost (executor driver) (2/10)
2019-02-02 10:56:52 INFO Executor:54 - Finished task 2.0 in stage 0.0 (TID 2). 824 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO Executor:54 - Running task 3.0 in stage 0.0 (TID 3)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 199 ms on localhost (executor driver) (3/10)
2019-02-02 10:56:52 INFO Executor:54 - Finished task 3.0 in stage 0.0 (TID 3). 867 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO Executor:54 - Running task 4.0 in stage 0.0 (TID 4)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 3.0 in stage 0.0 (TID 3) in 204 ms on localhost (executor driver) (4/10)
2019-02-02 10:56:52 INFO Executor:54 - Finished task 4.0 in stage 0.0 (TID 4). 824 bytes result sent to driver
2019-02-02 10:56:52 INFO TaskSetManager:54 - Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:52 INFO TaskSetManager:54 - Finished task 4.0 in stage 0.0 (TID 4) in 178 ms on localhost (executor driver) (5/10)
2019-02-02 10:56:52 INFO Executor:54 - Running task 5.0 in stage 0.0 (TID 5)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 5.0 in stage 0.0 (TID 5). 824 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 5.0 in stage 0.0 (TID 5) in 145 ms on localhost (executor driver) (6/10)
2019-02-02 10:56:53 INFO Executor:54 - Running task 6.0 in stage 0.0 (TID 6)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 6.0 in stage 0.0 (TID 6). 867 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 6.0 in stage 0.0 (TID 6) in 212 ms on localhost (executor driver) (7/10)
2019-02-02 10:56:53 INFO Executor:54 - Running task 7.0 in stage 0.0 (TID 7)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 7.0 in stage 0.0 (TID 7). 867 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 7.0 in stage 0.0 (TID 7) in 152 ms on localhost (executor driver) (8/10)
2019-02-02 10:56:53 INFO Executor:54 - Running task 8.0 in stage 0.0 (TID 8)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 8.0 in stage 0.0 (TID 8). 867 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 7866 bytes)
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 8.0 in stage 0.0 (TID 8) in 103 ms on localhost (executor driver) (9/10)
2019-02-02 10:56:53 INFO Executor:54 - Running task 9.0 in stage 0.0 (TID 9)
2019-02-02 10:56:53 INFO Executor:54 - Finished task 9.0 in stage 0.0 (TID 9). 867 bytes result sent to driver
2019-02-02 10:56:53 INFO TaskSetManager:54 - Finished task 9.0 in stage 0.0 (TID 9) in 79 ms on localhost (executor driver) (10/10)
2019-02-02 10:56:53 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2019-02-02 10:56:53 INFO DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 3.287 s
2019-02-02 10:56:53 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 3.700842 s
Pi is roughly 3.142931142931143
2019-02-02 10:56:53 INFO AbstractConnector:318 - Stopped Spark@46c6297b{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-02-02 10:56:53 INFO SparkUI:54 - Stopped Spark web UI at http://10.0.2.15:4040
2019-02-02 10:56:53 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-02-02 10:56:53 INFO MemoryStore:54 - MemoryStore cleared
2019-02-02 10:56:53 INFO BlockManager:54 - BlockManager stopped
2019-02-02 10:56:53 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2019-02-02 10:56:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-02-02 10:56:53 INFO SparkContext:54 - Successfully stopped SparkContext
2019-02-02 10:56:53 INFO ShutdownHookManager:54 - Shutdown hook called
2019-02-02 10:56:53 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-3c47ed54-5a7a-4785-84e3-4b834b94b238
2019-02-02 10:56:54 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-2dcaa58b-d605-40dc-8dd8-df55607f1a59
It seems to me, as stated by @ruslangm, that the expected output is actually there:
Maybe we didn't get the question.
Instead of printing to the console, try saving the result to a file. There is a lot of console output during execution, so it is difficult to spot the result in it, but I can see the result in your output.
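For example, a minimal sketch of that idea applied to the SparkPi code above (the /tmp output path is arbitrary; count and n are the variables from the example):
import java.io.PrintWriter

// Sketch: write the result to a file so it is not buried in the log output.
val pi = 4.0 * count / (n - 1)
val writer = new PrintWriter("/tmp/pi-result.txt") // arbitrary path
try writer.write(s"Pi is roughly $pi") finally writer.close()
Alternatively, since println writes to stdout while Spark's default log4j configuration sends its logging to stderr, piping stdout through grep "Pi is roughly" also isolates the line without any code change.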
Related
We have a 3-node Mesos cluster. The master service was started on machine 1 using the command below:
sudo ./bin/mesos-master.sh --ip=machine1-ip --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --zk=zk://machine1-ip:2181/mesos --quorum=1
and agent services on the other 2 machines using the commands below:
sudo ./bin/mesos-agent.sh --containerizers=docker --master=zk://machine1-ip:2181/mesos --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --ip=machine2-ip --no-systemd_enable_support
sudo ./bin/mesos-agent.sh --containerizers=docker --master=zk://machine1-ip:2181/mesos --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --ip=machine3-ip --no-systemd_enable_support
The property below was set on machine1:
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
We are trying to run a Spark job using a Docker image.
Note that we did not set "SPARK_EXECUTOR_URI" on machine1 because, as per our understanding, the executor runs inside the Docker container and not on the slave machine, so this property is not required.
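(For context, a minimal sketch of the related configuration: spark.mesos.executor.home tells executors where Spark is installed from their point of view, i.e. inside the Docker image here. The /opt/spark/dist path is an assumption about the image layout, and setting it in code is equivalent to passing --conf flags to spark-submit.)
import org.apache.spark.sql.SparkSession

// Sketch only; "/opt/spark/dist" is an assumed path inside the Docker image.
val spark = SparkSession.builder
  .appName("WordCount")
  .config("spark.mesos.executor.docker.image", "mesosphere/spark:2.4.0-2.2.1-3-hadoop-2.7")
  .config("spark.mesos.executor.home", "/opt/spark/dist")
  .getOrCreate()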
The command used for spark-submit is below (from machine 1):
/home/mapr/newSpark/spark-2.4.0-bin-hadoop2.7/bin/spark-submit \
--master mesos://machine1:5050 \
--deploy-mode client \
--class com.learning.spark.WordCount \
--conf spark.mesos.executor.docker.image=mesosphere/spark:2.4.0-2.2.1-3-hadoop-2.7 \
/home/mapr/mesos/wordcount.jar hdfs://machine2:8020/hdfslocation/input.txt hdfs://machine2:8020/hdfslocation/output
We are getting the error below on spark-submit:
Mesos task log:
I1211 20:27:55.040856 5996 exec.cpp:162] Version: 1.7.0
I1211 20:27:55.064775 6016 exec.cpp:236] Executor registered on agent 44c2e848-cd06-4546-b0e9-15537084df1b-S1
I1211 20:27:55.068828 6018 executor.cpp:130] Registered docker executor on company-i0058.company.co.in
I1211 20:27:55.069756 6016 executor.cpp:186] Starting task 3
/bin/sh: 1: /home/mapr/newSpark/spark-2.4.0-bin-hadoop2.7/./bin/spark-class: not found
I1211 20:27:57.669881 6017 executor.cpp:736] Container exited with status 127
I1211 20:27:58.672829 6019 process.cpp:926] Stopped the socket accept loop
Messages on the terminal:
2018-12-11 20:27:49 INFO SparkContext:54 - Running Spark version 2.4.0
2018-12-11 20:27:49 INFO SparkContext:54 - Submitted application: WordCount
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing view acls to: mapr
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing modify acls to: mapr
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing view acls groups to:
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing modify acls groups to:
2018-12-11 20:27:49 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mapr); groups with view permissions: Set(); users with modify permissions: Set(mapr); groups with modify permissions: Set()
2018-12-11 20:27:49 INFO Utils:54 - Successfully started service 'sparkDriver' on port 48069.
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering MapOutputTracker
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-12-11 20:27:49 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-12-11 20:27:49 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-12-11 20:27:49 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-3a4afff7-b050-45ba-bb50-c9f4ec5cc031
2018-12-11 20:27:49 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-12-11 20:27:49 INFO log:192 - Logging initialized @3157ms
2018-12-11 20:27:50 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2018-12-11 20:27:50 INFO Server:419 - Started @3273ms
2018-12-11 20:27:50 INFO AbstractConnector:278 - Started ServerConnector@1cfd1875{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-12-11 20:27:50 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f0628de{/jobs,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2b27cc70{/jobs/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f6a7463{/jobs/job,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@79f227a9{/jobs/job/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ca320ab{/stages,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50d68830{/stages/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1e53135d{/stages/stage,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6754ef00{/stages/stage/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@619bd14c{/stages/pool,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@323e8306{/stages/pool/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@a23a01d{/storage,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4acf72b6{/storage/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7561db12{/storage/rdd,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3301500b{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@24b52d3e{/environment,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@15deb1dc{/environment/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e9c413e{/executors,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@57a4d5ee{/executors/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5af5def9{/executors/threadDump,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a45c42a{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36dce7ed{/static,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b770e40{/,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@78e16155{/api,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@19868320{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50b0bc4c{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://machine1:4040
2018-12-11 20:27:50 INFO SparkContext:54 - Added JAR file:/home/mapr/mesos/wordcount.jar at spark://machine1:48069/jars/wordcount.jar with timestamp 1544540270193
I1211 20:27:50.557170 7462 sched.cpp:232] Version: 1.7.0
I1211 20:27:50.560644 7454 sched.cpp:336] New master detected at master@machine1:5050
I1211 20:27:50.561132 7454 sched.cpp:356] No credentials provided. Attempting to register without authentication
I1211 20:27:50.571651 7456 sched.cpp:744] Framework registered with 5260e4c8-de1c-4772-b5a7-340480594ef4-0000
2018-12-11 20:27:50 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56351.
2018-12-11 20:27:50 INFO NettyBlockTransferService:54 - Server created on machine1:56351
2018-12-11 20:27:50 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-12-11 20:27:50 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, impetus-i0053.impetus.co.in, 56351, None)
2018-12-11 20:27:50 INFO BlockManagerMasterEndpoint:54 - Registering block manager machine1:56351 with 366.3 MB RAM, BlockManagerId(driver, impetus-i0053.impetus.co.in, 56351, None)
2018-12-11 20:27:50 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, machine1, 56351, None)
2018-12-11 20:27:50 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, machine1, 56351, None)
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@73ba6fe6{/metrics/json,null,AVAILABLE,@Spark}
2018-12-11 20:27:50 INFO MesosCoarseGrainedSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2018-12-11 20:27:51 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_STARTING
2018-12-11 20:27:51 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_STARTING
2018-12-11 20:27:51 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 288.1 KB, free 366.0 MB)
2018-12-11 20:27:51 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.1 KB, free 366.0 MB)
2018-12-11 20:27:51 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on machine1:56351 (size: 25.1 KB, free: 366.3 MB)
2018-12-11 20:27:51 INFO SparkContext:54 - Created broadcast 0 from textFile at WordCount.scala:22
2018-12-11 20:27:52 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-12-11 20:27:52 INFO FileInputFormat:249 - Total input paths to process : 1
2018-12-11 20:27:53 INFO deprecation:1173 - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2018-12-11 20:27:53 INFO HadoopMapRedCommitProtocol:54 - Using output committer class org.apache.hadoop.mapred.FileOutputCommitter
2018-12-11 20:27:53 INFO FileOutputCommitter:108 - File Output Committer Algorithm version is 1
2018-12-11 20:27:53 INFO SparkContext:54 - Starting job: runJob at SparkHadoopWriter.scala:78
2018-12-11 20:27:53 INFO DAGScheduler:54 - Registering RDD 3 (map at WordCount.scala:24)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Got job 0 (runJob at SparkHadoopWriter.scala:78) with 2 output partitions
2018-12-11 20:27:53 INFO DAGScheduler:54 - Final stage: ResultStage 1 (runJob at SparkHadoopWriter.scala:78)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 0)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Missing parents: List(ShuffleMapStage 0)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24), which has no missing parents
2018-12-11 20:27:53 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_RUNNING
2018-12-11 20:27:53 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 5.0 KB, free 366.0 MB)
2018-12-11 20:27:53 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.9 KB, free 366.0 MB)
2018-12-11 20:27:53 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on machine1:56351 (size: 2.9 KB, free: 366.3 MB)
2018-12-11 20:27:53 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1161
2018-12-11 20:27:53 INFO DAGScheduler:54 - Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
2018-12-11 20:27:53 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-12-11 20:27:53 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_RUNNING
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_FAILED
2018-12-11 20:27:54 INFO BlockManagerMaster:54 - Removal of executor 0 requested
2018-12-11 20:27:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 0
2018-12-11 20:27:54 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 0 from BlockManagerMaster.
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_FAILED
2018-12-11 20:27:54 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 1 from BlockManagerMaster.
2018-12-11 20:27:54 INFO BlockManagerMaster:54 - Removal of executor 1 requested
2018-12-11 20:27:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 1
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_STARTING
2018-12-11 20:27:55 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_STARTING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_RUNNING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_FAILED
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Blacklisting Mesos slave b92da3e9-a9c4-422a-babe-c5fb0f33e027-S0 due to too many failures; is Spark installed on it?
2018-12-11 20:27:57 INFO BlockManagerMaster:54 - Removal of executor 2 requested
2018-12-11 20:27:57 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 2
2018-12-11 20:27:57 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 2 from BlockManagerMaster.
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_RUNNING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_FAILED
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Blacklisting Mesos slave 44c2e848-cd06-4546-b0e9-15537084df1b-S1 due to too many failures; is Spark installed on it?
2018-12-11 20:27:57 INFO BlockManagerMaster:54 - Removal of executor 3 requested
2018-12-11 20:27:57 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 3 from BlockManagerMaster.
2018-12-11 20:27:57 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 3
2018-12-11 20:28:08 WARN TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
PySpark hangs with the input below.
Note that it does not hang in the Scala console.
Python 3.6.5 (default, Jun 17 2018, 12:13:06)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2018-06-21 10:27:37 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
Using Python version 3.6.5 (default, Jun 17 2018 12:13:06)
SparkSession available as 'spark'.
>>> sc.parallelize((1,1)).count() <-----------HANGS!
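For comparison, the equivalent call in the Scala shell (spark-shell), which as noted above completes normally, looks like this:
scala> sc.parallelize(Seq(1, 1)).count()
res0: Long = 2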
Does anyone have any idea why this is happening? I tried reinstalling everything (Java, Spark, Homebrew) and deleting the entire /usr/local directory. I'm all out of ideas.
Different test program
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
x = sc.parallelize((1,1)).count()
print("count: ", x)
Output from spark-submit, with a similar test Python file:
2018-06-21 10:31:47 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-21 10:31:47 INFO SparkContext:54 - Running Spark version 2.3.1
2018-06-21 10:31:47 INFO SparkContext:54 - Submitted application: test_spark.py
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing view acls to: jonedoe
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing modify acls to: jonedoe
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing view acls groups to:
2018-06-21 10:31:47 INFO SecurityManager:54 - Changing modify acls groups to:
2018-06-21 10:31:47 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jonedoe); groups with view permissions: Set(); users with modify permissions: Set(jonedoe); groups with modify permissions: Set()
2018-06-21 10:31:47 INFO Utils:54 - Successfully started service 'sparkDriver' on port 61556.
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering MapOutputTracker
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-06-21 10:31:47 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-06-21 10:31:47 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-06-21 10:31:47 INFO DiskBlockManager:54 - Created local directory at /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/blockmgr-5c0bfcf2-9009-46b5-bcd7-4fa5ec605a89
2018-06-21 10:31:47 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-06-21 10:31:47 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-06-21 10:31:48 INFO log:192 - Logging initialized @2297ms
2018-06-21 10:31:48 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-06-21 10:31:48 INFO Server:414 - Started @2378ms
2018-06-21 10:31:48 INFO AbstractConnector:278 - Started ServerConnector@84802a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-06-21 10:31:48 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@79c67e6f{/jobs,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6889c329{/jobs/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a8c9a58{/jobs/job,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e04f8ff{/jobs/job/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4832ee9d{/stages,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1632f399{/stages/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@398a3a30{/stages/stage,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2eb62024{/stages/stage/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4685c478{/stages/pool,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31053558{/stages/pool/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@537d3185{/storage,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4c559cce{/storage/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@249b3738{/storage/rdd,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c2c6906{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e7861f{/environment,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@66b4d9e1{/environment/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1b6b10f8{/executors,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@44502eca{/executors/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7ebd8f21{/executors/threadDump,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e862ac6{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7d29113e{/static,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@388c37ce{/,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22374681{/api,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@dcbeb70{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@322ceede{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-06-21 10:31:48 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://ip-192-168-65-180.ec2.internal:4040
2018-06-21 10:31:48 INFO SparkContext:54 - Added file file:/Users/jonedoe/code/test_spark.py at file:/Users/jonedoe/code/test_spark.py with timestamp 1529602308500
2018-06-21 10:31:48 INFO Utils:54 - Copying /Users/jonedoe/code/test_spark.py to /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/spark-99983724-420e-4bc0-ad1f-3bc41bba9114/userFiles-999bdcde-1e5d-4e9a-98ce-c6ecdaee0739/test_spark.py
2018-06-21 10:31:48 INFO Executor:54 - Starting executor ID driver on host localhost
2018-06-21 10:31:48 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61557.
2018-06-21 10:31:48 INFO NettyBlockTransferService:54 - Server created on ip-192-168-65-180.ec2.internal:61557
2018-06-21 10:31:48 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-06-21 10:31:48 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManagerMasterEndpoint:54 - Registering block manager ip-192-168-65-180.ec2.internal:61557 with 366.3 MB RAM, BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, ip-192-168-65-180.ec2.internal, 61557, None)
2018-06-21 10:31:48 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2d1fafea{/metrics/json,null,AVAILABLE,@Spark}
2018-06-21 10:31:49 INFO SparkContext:54 - Starting job: count at /Users/jonedoe/code/test_spark.py:4
2018-06-21 10:31:49 INFO DAGScheduler:54 - Got job 0 (count at /Users/jonedoe/code/test_spark.py:4) with 8 output partitions
2018-06-21 10:31:49 INFO DAGScheduler:54 - Final stage: ResultStage 0 (count at /Users/jonedoe/code/test_spark.py:4)
2018-06-21 10:31:49 INFO DAGScheduler:54 - Parents of final stage: List()
2018-06-21 10:31:49 INFO DAGScheduler:54 - Missing parents: List()
2018-06-21 10:31:49 INFO DAGScheduler:54 - Submitting ResultStage 0 (PythonRDD[1] at count at /Users/jonedoe/code/test_spark.py:4), which has no missing parents
2018-06-21 10:31:49 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 5.0 KB, free 366.3 MB)
2018-06-21 10:31:49 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.4 KB, free 366.3 MB)
2018-06-21 10:31:49 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on ip-192-168-65-180.ec2.internal:61557 (size: 3.4 KB, free: 366.3 MB)
2018-06-21 10:31:49 INFO SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-06-21 10:31:49 INFO DAGScheduler:54 - Submitting 8 missing tasks from ResultStage 0 (PythonRDD[1] at count at /Users/jonedoe/code/test_spark.py:4) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
2018-06-21 10:31:49 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 8 tasks
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7858 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7839 bytes)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7858 bytes)
2018-06-21 10:31:49 INFO Executor:54 - Running task 3.0 in stage 0.0 (TID 3)
2018-06-21 10:31:49 INFO Executor:54 - Running task 2.0 in stage 0.0 (TID 2)
2018-06-21 10:31:49 INFO Executor:54 - Running task 4.0 in stage 0.0 (TID 4)
2018-06-21 10:31:49 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-06-21 10:31:49 INFO Executor:54 - Running task 6.0 in stage 0.0 (TID 6)
2018-06-21 10:31:49 INFO Executor:54 - Running task 7.0 in stage 0.0 (TID 7)
2018-06-21 10:31:49 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-06-21 10:31:49 INFO Executor:54 - Running task 5.0 in stage 0.0 (TID 5)
2018-06-21 10:31:49 INFO Executor:54 - Fetching file:/Users/jonedoe/code/test_spark.py with timestamp 1529602308500
2018-06-21 10:31:49 INFO Utils:54 - /Users/jonedoe/code/test_spark.py has been previously copied to /private/var/folders/gq/tm5q47gn6x363h5m_c86my_00000gp/T/spark-99983724-420e-4bc0-ad1f-3bc41bba9114/userFiles-999bdcde-1e5d-4e9a-98ce-c6ecdaee0739/test_spark.py
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 397, boot = 389, init = 8, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 399, boot = 396, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 406, boot = 403, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 413, boot = 410, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 420, boot = 417, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 426, boot = 423, init = 2, finish = 1
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 433, boot = 430, init = 3, finish = 0
2018-06-21 10:31:49 INFO PythonRunner:54 - Times: total = 441, boot = 437, init = 3, finish = 1
2018-06-21 10:31:49 INFO Executor:54 - Finished task 5.0 in stage 0.0 (TID 5). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 2.0 in stage 0.0 (TID 2). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 3.0 in stage 0.0 (TID 3). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 6.0 in stage 0.0 (TID 6). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 7.0 in stage 0.0 (TID 7). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 4.0 in stage 0.0 (TID 4). 1267 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 1310 bytes result sent to driver
2018-06-21 10:31:49 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 1310 bytes result sent to driver
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 5.0 in stage 0.0 (TID 5) in 580 ms on localhost (executor driver) (1/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 3.0 in stage 0.0 (TID 3) in 586 ms on localhost (executor driver) (2/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 587 ms on localhost (executor driver) (3/8)
2018-06-21 10:31:49 INFO TaskSetManager:54 - Finished task 6.0 in stage 0.0 (TID 6) in 583 ms on localhost (executor driver) (4/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 4.0 in stage 0.0 (TID 4) in 586 ms on localhost (executor driver) (5/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 7.0 in stage 0.0 (TID 7) in 584 ms on localhost (executor driver) (6/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 608 ms on localhost (executor driver) (7/8)
2018-06-21 10:31:50 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 590 ms on localhost (executor driver) (8/8)
2018-06-21 10:31:50 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-06-21 10:31:50 INFO DAGScheduler:54 - ResultStage 0 (count at /Users/jonedoe/code/test_spark.py:4) finished in 0.774 s
2018-06-21 10:31:50 INFO DAGScheduler:54 - Job 0 finished: count at /Users/jonedoe/code/test_spark.py:4, took 0.825530 s
HANGS AFTER HERE.........
Looks like my anti-virus (Bitdefender) was the culprit.
For some reason it was blocking Spark.
I'm trying to set up an 8-node cluster on 8 RHEL 7.3 x86 machines using Spark 2.0.1. start-master.sh goes through fine:
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host lambda.foo.net --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/08 04:26:46 INFO Master: Started daemon with process name: 22181@lambda.foo.net
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for TERM
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for HUP
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for INT
16/12/08 04:26:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/08 04:26:46 INFO SecurityManager: Changing view acls to: root
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls to: root
16/12/08 04:26:46 INFO SecurityManager: Changing view acls groups to:
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls groups to:
16/12/08 04:26:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/12/08 04:26:46 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
16/12/08 04:26:46 INFO Master: Starting Spark master at spark://lambda.foo.net:7077
16/12/08 04:26:46 INFO Master: Running Spark version 2.0.1
16/12/08 04:26:46 INFO Utils: Successfully started service 'MasterUI' on port 8080.
16/12/08 04:26:46 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://19.341.11.212:8080
16/12/08 04:26:46 INFO Utils: Successfully started service on port 6066.
16/12/08 04:26:46 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
16/12/08 04:26:46 INFO Master: I have been elected leader! New state: ALIVE
But when I try to bring up the workers using start-slaves.sh, what I see in the workers' logs is:
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://lambda.foo.net:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/08 04:30:00 INFO Worker: Started daemon with process name: 14649@hawk040os4.foo.net
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for TERM
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for HUP
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for INT
16/12/08 04:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/08 04:30:00 INFO SecurityManager: Changing view acls to: root
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls to: root
16/12/08 04:30:00 INFO SecurityManager: Changing view acls groups to:
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls groups to:
16/12/08 04:30:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/12/08 04:30:00 INFO Utils: Successfully started service 'sparkWorker' on port 35858.
16/12/08 04:30:00 INFO Worker: Starting Spark worker 15.242.22.179:35858 with 24 cores, 1510.2 GB RAM
16/12/08 04:30:00 INFO Worker: Running Spark version 2.0.1
16/12/08 04:30:00 INFO Worker: Spark home: /usr/local/bin/spark-2.0.1-bin-hadoop2.7
16/12/08 04:30:00 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/12/08 04:30:00 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://15.242.22.179:8081
16/12/08 04:30:00 INFO Worker: Connecting to master lambda.foo.net:7077...
16/12/08 04:30:00 WARN Worker: Failed to connect to master lambda.foo.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to lambda.foo.net/19.341.11.212:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
... 4 more
Caused by: java.net.NoRouteToHostException: No route to host: lambda.foo.net/19.341.11.212:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/12/08 04:30:12 INFO Worker: Retrying connection to master (attempt # 1)
16/12/08 04:30:12 INFO Worker: Connecting to master lambda.foo.net:7077...
16/12/08 04:30:12 WARN Worker: Failed to connect to master lambda.foo.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
So it says "No route to host". But I can successfully ping the master from the worker node, as well as ssh from the worker to the master node.
Why does Spark say "No route to host"?
Problem solved: the firewall was blocking the packets.
I can successfully run the Java version of the Pi example as follows:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
However, the Python version failed with the following error information. I used yarn-client mode; the pyspark command line in yarn-client mode returned the same info. Can anyone help me figure out this problem?
nlp#yyy2:~/spark$ ./bin/spark-submit --master yarn-client examples/src/main/python/pi.py
15/01/05 17:22:26 INFO spark.SecurityManager: Changing view acls to: nlp
15/01/05 17:22:26 INFO spark.SecurityManager: Changing modify acls to: nlp
15/01/05 17:22:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp)
15/01/05 17:22:26 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/01/05 17:22:26 INFO Remoting: Starting remoting
15/01/05 17:22:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@yyy2:42747]
15/01/05 17:22:26 INFO util.Utils: Successfully started service 'sparkDriver' on port 42747.
15/01/05 17:22:26 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/05 17:22:26 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/05 17:22:26 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150105172226-aeae
15/01/05 17:22:26 INFO storage.MemoryStore: MemoryStore started with capacity 265.1 MB
15/01/05 17:22:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/05 17:22:27 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-cbe0079b-79c5-426b-b67e-548805423b11
15/01/05 17:22:27 INFO spark.HttpServer: Starting HTTP Server
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/05 17:22:27 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57169
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'HTTP file server' on port 57169.
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/05 17:22:27 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/01/05 17:22:27 INFO ui.SparkUI: Started SparkUI at http://yyy2:4040
15/01/05 17:22:27 INFO client.RMProxy: Connecting to ResourceManager at yyy14/10.112.168.195:8032
15/01/05 17:22:27 INFO yarn.Client: Requesting a new application from cluster with 6 NodeManagers
15/01/05 17:22:27 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/05 17:22:27 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/05 17:22:27 INFO yarn.Client: Setting up container launch context for our AM
15/01/05 17:22:27 INFO yarn.Client: Preparing resources for our AM container
15/01/05 17:22:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 24 for xxx on ha-hdfs:hzdm-cluster1
15/01/05 17:22:28 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/lib/spark-assembly-1.2.0-hadoop2.5.2.jar -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/spark-assembly-1.2.0-hadoop2.5.2.jar
15/01/05 17:22:29 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/pi.py
15/01/05 17:22:29 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/05 17:22:29 INFO spark.SecurityManager: Changing view acls to: nlp
15/01/05 17:22:29 INFO spark.SecurityManager: Changing modify acls to: nlp
15/01/05 17:22:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp)
15/01/05 17:22:29 INFO yarn.Client: Submitting application 23 to ResourceManager
15/01/05 17:22:30 INFO impl.YarnClientImpl: Submitted application application_1420444011562_0023
15/01/05 17:22:31 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:31 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1420449749969
final status: UNDEFINED
tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
user: nlp
15/01/05 17:22:32 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:33 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:34 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:35 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:36 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@yyy16:52855/user/YarnAM#435880073]
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> yyy14, PROXY_URI_BASES -> http://yyy14:8070/proxy/application_1420444011562_0023), /proxy/application_1420444011562_0023
15/01/05 17:22:36 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/01/05 17:22:37 INFO yarn.Client: Application report for application_1420444011562_0023 (state: RUNNING)
15/01/05 17:22:37 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: yyy16
ApplicationMaster RPC port: 0
queue: root.default
start time: 1420449749969
final status: UNDEFINED
tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
user: nlp
15/01/05 17:22:37 INFO cluster.YarnClientSchedulerBackend: Application application_1420444011562_0023 has started running.
15/01/05 17:22:37 INFO netty.NettyBlockTransferService: Server created on 35648
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/01/05 17:22:37 INFO storage.BlockManagerMasterActor: Registering block manager yyy2:35648 with 265.1 MB RAM, BlockManagerId(<driver>, yyy2, 35648)
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Registered BlockManager
15/01/05 17:22:37 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@yyy16:52855] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/01/05 17:22:38 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null}
15/01/05 17:22:38 INFO ui.SparkUI: Stopped Spark web UI at http://yyy2:4040
15/01/05 17:22:38 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Stopped
15/01/05 17:22:39 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/01/05 17:22:39 INFO storage.MemoryStore: MemoryStore cleared
15/01/05 17:22:39 INFO storage.BlockManager: BlockManager stopped
15/01/05 17:22:39 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/01/05 17:22:39 INFO spark.SparkContext: Successfully stopped SparkContext
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/01/05 17:22:57 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
Traceback (most recent call last):
  File "/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py", line 29, in <module>
    sc = SparkContext(appName="PythonPi")
  File "/home/nlp/spark/python/pyspark/context.py", line 105, in __init__
    conf, jsc)
  File "/home/nlp/spark/python/pyspark/context.py", line 153, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/home/nlp/spark/python/pyspark/context.py", line 201, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
  File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
If you're running this example on Java 8, this may be due to Java 8 allocating far more virtual memory than earlier JVMs, which trips YARN's container memory checks: https://issues.apache.org/jira/browse/YARN-4714
You can force YARN to ignore this by setting the following properties in yarn-site.xml:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
Try the deploy-mode parameter, like this:
--deploy-mode cluster
I had a problem like yours, and with this parameter it worked.
I experienced a similar problem using spark-submit and yarn-client (I got the same NPE/stack trace). Tuning down my memory settings did the trick; it seems to fail like this when you try to allot too much memory. I would start by removing the --executor-memory and --driver-memory switches.
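For instance, rather than passing big --executor-memory/--driver-memory values, you can request a modest executor size programmatically and let the driver default. A sketch, with the value and app name as placeholders; note that driver memory in client mode must still be set before the JVM starts (on the command line or in spark-defaults.conf):
import org.apache.spark.{SparkConf, SparkContext}

// Keep the executor request well under YARN's per-container limit (8192 MB in
// the log above), remembering Spark adds a memory overhead on top of this value.
val conf = new SparkConf()
  .setAppName("ModestMemorySketch") // hypothetical app name
  .set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)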
I reduced the number of cores in the Advanced spark-env settings to make it work.
I ran into this issue running (HDP 2.3, Spark 1.3.1):
spark-shell \
--master yarn-client \
--driver-memory 4g \
--executor-memory 4g \
--executor-cores 1 \
--num-executors 4
The solution for me was to set the Spark config value:
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557
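With spark-shell the usual route is passing --conf spark.yarn.am.extraJavaOptions=... or adding an entry in conf/spark-defaults.conf. In application code the same value can go on the SparkConf, which takes effect in yarn-client mode because the AM is launched after the driver starts. A sketch, with the app name as a placeholder:
import org.apache.spark.{SparkConf, SparkContext}

// Pass the HDP stack version down to the YARN ApplicationMaster's JVM
val conf = new SparkConf()
  .setAppName("HdpVersionSketch") // hypothetical app name
  .set("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.3.0.0-2557")
val sc = new SparkContext(conf)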
I am trying GraphX with the LiveJournal data, https://snap.stanford.edu/data/soc-LiveJournal1.html.
I have a cluster of 10 computing nodes. Each computing node has 64 GB of RAM and 32 cores.
When I run the PageRank algorithm using 9 worker nodes, it's slower than running it using just 1 worker node. I suspect I am not utilizing all the memory and/or cores due to some configuration issue.
I went through the configuration, tuning, and programming guides for Spark.
I am using spark-shell to run the script, which I invoke with:
./spark-shell --executor-memory 50g
I had the workers and master running. When I start spark-shell, I get the following logs:
14/07/09 17:26:10 INFO Slf4jLogger: Slf4jLogger started
14/07/09 17:26:10 INFO Remoting: Starting remoting
14/07/09 17:26:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@node0472.local:60035]
14/07/09 17:26:10 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@node0472.local:60035]
14/07/09 17:26:10 INFO SparkEnv: Registering MapOutputTracker
14/07/09 17:26:10 INFO SparkEnv: Registering BlockManagerMaster
14/07/09 17:26:10 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140709172610-7f5e
14/07/09 17:26:10 INFO MemoryStore: MemoryStore started with capacity 294.4 MB.
14/07/09 17:26:10 INFO ConnectionManager: Bound socket to port 45700 with id = ConnectionManagerId(node0472.local,45700)
14/07/09 17:26:10 INFO BlockManagerMaster: Trying to register BlockManager
14/07/09 17:26:10 INFO BlockManagerInfo: Registering block manager node0472.local:45700 with 294.4 MB RAM
14/07/09 17:26:10 INFO BlockManagerMaster: Registered BlockManager
14/07/09 17:26:10 INFO HttpServer: Starting HTTP Server
14/07/09 17:26:10 INFO HttpBroadcast: Broadcast server started at http://172.16.104.72:48116
14/07/09 17:26:10 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7b4a7c3c-9fc9-4a64-b2ac-5f328abe9265
14/07/09 17:26:10 INFO HttpServer: Starting HTTP Server
14/07/09 17:26:11 INFO SparkUI: Started SparkUI at http://node0472.local:4040
14/07/09 17:26:12 INFO AppClient$ClientActor: Connecting to master spark://node0472.local:7077...
14/07/09 17:26:12 INFO SparkILoop: Created spark context..
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140709172612-0007
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/0 on worker-20140709162149-node0476.local-53728 (node0476.local:53728) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/0 on hostPort node0476.local:53728 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/1 on worker-20140709162145-node0475.local-56009 (node0475.local:56009) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/1 on hostPort node0475.local:56009 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/2 on worker-20140709162141-node0474.local-58108 (node0474.local:58108) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/2 on hostPort node0474.local:58108 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/3 on worker-20140709170011-node0480.local-49021 (node0480.local:49021) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/3 on hostPort node0480.local:49021 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/4 on worker-20140709165929-node0479.local-53886 (node0479.local:53886) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/4 on hostPort node0479.local:53886 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/5 on worker-20140709170036-node0481.local-60958 (node0481.local:60958) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/5 on hostPort node0481.local:60958 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/6 on worker-20140709162151-node0477.local-44550 (node0477.local:44550) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/6 on hostPort node0477.local:44550 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/7 on worker-20140709162138-node0473.local-42025 (node0473.local:42025) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/7 on hostPort node0473.local:42025 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor added: app-20140709172612-0007/8 on worker-20140709162156-node0478.local-52943 (node0478.local:52943) with 32 cores
14/07/09 17:26:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709172612-0007/8 on hostPort node0478.local:52943 with 32 cores, 50.0 GB RAM
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/1 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/0 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/2 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/3 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/6 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/4 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/5 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/8 is now RUNNING
14/07/09 17:26:12 INFO AppClient$ClientActor: Executor updated: app-20140709172612-0007/7 is now RUNNING
Spark context available as sc.
scala> 14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0479.local:47343/user/Executor#1253632521] with ID 4
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0474.local:39431/user/Executor#1607018658] with ID 2
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0481.local:53722/user/Executor#-1846270627] with ID 5
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0477.local:40185/user/Executor#-111495591] with ID 6
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0473.local:36426/user/Executor#652192289] with ID 7
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0480.local:37230/user/Executor#-1581927012] with ID 3
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0475.local:46363/user/Executor#-182973444] with ID 1
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0476.local:58053/user/Executor#609775393] with ID 0
14/07/09 17:26:18 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@node0478.local:55152/user/Executor#-2126598605] with ID 8
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0474.local:60025 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0473.local:33992 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0481.local:46513 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0477.local:37455 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0475.local:33829 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0479.local:56433 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0480.local:38134 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0476.local:46284 with 28.8 GB RAM
14/07/09 17:26:19 INFO BlockManagerInfo: Registering block manager node0478.local:43187 with 28.8 GB RAM
According to the logs, I believe my application was registered on the workers and each executor had 50 GB of RAM. Now, I run the following Scala code in my terminal to load the data and compute PageRank:
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Load the edge list and cache the graph so both PageRank runs reuse it
val startgraphloading = System.currentTimeMillis
val graph = GraphLoader.edgeListFile(sc, "filepath").cache()
val endgraphloading = System.currentTimeMillis

// Time one static PageRank iteration
val startpr1 = System.currentTimeMillis
val prGraph1 = graph.staticPageRank(1)
val endpr1 = System.currentTimeMillis

// Time five static PageRank iterations
val startpr2 = System.currentTimeMillis
val prGraph5 = graph.staticPageRank(5)
val endpr2 = System.currentTimeMillis

val loadingt = endgraphloading - startgraphloading
val firstt = endpr1 - startpr1
val secondt = endpr2 - startpr2
println(s"loading: $loadingt ms, 1 iteration: $firstt ms, 5 iterations: $secondt ms")
When I look at memory usage on each node, only 2-3 of the computing nodes' RAM is actually being used. Is that expected? It runs faster with only 1 worker than with 9 workers.
I am using Spark standalone cluster mode. Is there any issue with my configuration?
Thanks in advance :)
I figured out the problem after looking at the Spark code. It was an issue in my script where I use GraphX:
val graph = GraphLoader.edgeListFile(sc, "filepath").cache();
When I looked at the signature of edgeListFile, it has minEdgePartitions=1. I assumed it was just a lower bound, but in practice it determines how many partitions the edge list is split into. I set it to the number of partitions I want (one per node), and done. Another thing to take care of, as mentioned in the GraphX programming guide: if you haven't built Spark 1.0 from the main branch, you should use your own partitionBy function. If the graph is not partitioned properly, it will cause issues.
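A sketch of the fix (the file path and partition count are placeholders, and RandomVertexCut is just one of the built-in strategies):
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// Split the edge list into roughly one partition per worker so every node
// gets work, then repartition the graph explicitly before running PageRank
val graph = GraphLoader.edgeListFile(sc, "filepath", minEdgePartitions = 9)
  .partitionBy(PartitionStrategy.RandomVertexCut)
  .cache()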
It took me a while to figure this out. Hope this info saves someone some time :)