Spark on YARN runs indefinitely - apache-spark
I had Spark (2.2 on Hadoop 2.7) jobs running and had to restart the sparkmaster machine. Now Spark jobs on YARN get submitted, move to ACCEPTED and then RUNNING, but never end.
Cluster: 1 + 3 nodes. The ResourceManager and NameNode run on the sparkmaster node; a NodeManager and DataNode run on each of the 3 worker nodes.
Executor Log:
/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/15 08:58:02 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: 130256#cassandralake1node3.localdomain
17/12/15 08:58:02 INFO util.SignalUtils: Registered signal handler for TERM
17/12/15 08:58:02 INFO util.SignalUtils: Registered signal handler for HUP
17/12/15 08:58:02 INFO util.SignalUtils: Registered signal handler for INT
17/12/15 08:58:03 WARN util.Utils: Your hostname, cassandralake1node3.localdomain resolves to a loopback address: 127.0.0.1; using 10.204.211.105 instead (on interface em1)
17/12/15 08:58:03 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/12/15 08:58:03 INFO spark.SecurityManager: Changing view acls to: root
17/12/15 08:58:03 INFO spark.SecurityManager: Changing modify acls to: root
17/12/15 08:58:03 INFO spark.SecurityManager: Changing view acls groups to:
17/12/15 08:58:03 INFO spark.SecurityManager: Changing modify acls groups to:
17/12/15 08:58:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/12/15 08:58:03 INFO client.TransportClientFactory: Successfully created connection to /10.204.211.105:40866 after 85 ms (0 ms spent in bootstraps)
17/12/15 08:58:04 INFO spark.SecurityManager: Changing view acls to: root
17/12/15 08:58:04 INFO spark.SecurityManager: Changing modify acls to: root
17/12/15 08:58:04 INFO spark.SecurityManager: Changing view acls groups to:
17/12/15 08:58:04 INFO spark.SecurityManager: Changing modify acls groups to:
17/12/15 08:58:04 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/12/15 08:58:04 INFO client.TransportClientFactory: Successfully created connection to /10.204.211.105:40866 after 1 ms (0 ms spent in bootstraps)
17/12/15 08:58:04 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1513329182871_0010/blockmgr-15ae52df-c267-427e-b8f1-ef1c84059740
17/12/15 08:58:04 INFO memory.MemoryStore: MemoryStore started with capacity 1311.0 MB
17/12/15 08:58:04 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler#10.204.211.105:40866
17/12/15 08:58:04 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/12/15 08:58:04 INFO executor.Executor: Starting executor ID 1 on host cassandranode3
17/12/15 08:58:04 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35983.
17/12/15 08:58:04 INFO netty.NettyBlockTransferService: Server created on cassandranode3:35983
17/12/15 08:58:04 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/12/15 08:58:04 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(1, cassandranode3, 35983, None)
17/12/15 08:58:04 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(1, cassandranode3, 35983, None)
17/12/15 08:58:04 INFO storage.BlockManager: external shuffle service port = 7337
17/12/15 08:58:04 INFO storage.BlockManager: Registering executor with local external shuffle service.
17/12/15 08:58:04 INFO client.TransportClientFactory: Successfully created connection to cassandranode3/10.204.211.105:7337 after 1 ms (0 ms spent in bootstraps)
17/12/15 08:58:04 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(1, cassandranode3, 35983, None)
Driver Log:
O util.Utils: Using initial executors = 2, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
17/12/15 09:50:06 INFO yarn.YarnAllocator: Will request 2 executor container(s), each with 1 core(s) and 3072 MB memory (including 1024 MB of overhead)
17/12/15 09:50:06 INFO yarn.YarnAllocator: Submitted 2 unlocalized container requests.
17/12/15 09:50:06 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/12/15 09:50:07 INFO impl.AMRMClientImpl: Received new token for : cassandranode2:38628
17/12/15 09:50:07 INFO impl.AMRMClientImpl: Received new token for : cassandranode3:39212
17/12/15 09:50:07 INFO yarn.YarnAllocator: Launching container container_1513329182871_0011_01_000002 on host cassandranode2 for executor with ID 1
17/12/15 09:50:07 INFO yarn.YarnAllocator: Launching container container_1513329182871_0011_01_000003 on host cassandranode3 for executor with ID 2
17/12/15 09:50:07 INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
17/12/15 09:50:07 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
17/12/15 09:50:07 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
17/12/15 09:50:07 INFO impl.ContainerManagementProtocolProxy: Opening proxy : cassandranode3:39212
17/12/15 09:50:07 INFO impl.ContainerManagementProtocolProxy: Opening proxy : cassandranode2:38628
17/12/15 09:50:09 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.204.211.105:47622) with ID 2
17/12/15 09:50:09 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 1)
17/12/15 09:50:09 INFO storage.BlockManagerMasterEndpoint: Registering block manager cassandranode3:33779 with 1311.0 MB RAM, BlockManagerId(2, cassandranode3, 33779, None)
17/12/15 09:50:11 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.204.211.103:43578) with ID 1
17/12/15 09:50:11 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 2)
17/12/15 09:50:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager cassandranode2:37931 with 1311.0 MB RAM, BlockManagerId(1, cassandranode2, 37931, None)
17/12/15 09:50:11 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/12/15 09:50:11 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/12/15 09:50:11 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1513329182871_0011/container_1513329182871_0011_01_000001/spark-warehouse').
17/12/15 09:50:11 INFO internal.SharedState: Warehouse path is 'file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1513329182871_0011/container_1513329182871_0011_01_000001/spark-warehouse'.
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#e087bd4{/SQL,null,AVAILABLE,#Spark}
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#c93af1f{/SQL/json,null,AVAILABLE,#Spark}
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#53fd3a5d{/SQL/execution,null,AVAILABLE,#Spark}
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#7dcd6778{/SQL/execution/json,null,AVAILABLE,#Spark}
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#3a25ecc9{/static/sql,null,AVAILABLE,#Spark}
17/12/15 09:50:12 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
17/12/15 09:51:09 INFO spark.ExecutorAllocationManager: Request to remove executorIds: 2
17/12/15 09:51:11 INFO spark.ExecutorAllocationManager: Request to remove executorIds: 1
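While it hangs, the application can also be inspected from the YARN side with the standard CLI (the application id is taken from the driver log above), for example:
yarn application -status application_1513329182871_0011
yarn application -kill application_1513329182871_0011
Once the application has finished or been killed, the aggregated container logs can be pulled with yarn logs -applicationId application_1513329182871_0011.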
spark-defaults.conf
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir file:///home/sparkeventlogs
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 5g
spark.driver.cores 1
spark.yarn.am.memory 2048m
spark.yarn.am.cores 1
spark.submit.deployMode cluster
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.driver.maxResultSize 20g
spark.jars.packages datastax:spark-cassandra-connector:2.0.5-s_2.11
spark.cassandra.connection.host 10.204.211.101,10.204.211.103,10.204.211.105
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
spark.driver.extraJavaOptions -Dhdp.version=2.7.4
spark.cassandra.read.timeout_ms 180000
spark.yarn.stagingDir hdfs:///tmp
spark.network.timeout 2400
spark.yarn.driver.memoryOverhead 2048
spark.yarn.executor.memoryOverhead 1024
yarn.resourcemanager.app.timeout.minutes=-1
spark.yarn.submit.waitAppCompletion true
spark.sql.inMemoryColumnarStorage.compressed true
spark.sql.inMemoryColumnarStorage.batchSize 10000
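Since spark.dynamicAllocation.enabled and spark.shuffle.service.enabled are both set, the external shuffle service must also be registered as a NodeManager auxiliary service on the worker nodes (the executor log above shows it answering on port 7337, so it appears to be in place). For reference, the yarn-site.xml entries documented for Spark's YarnShuffleService look like this; the exact aux-services list should match whatever the cluster already uses:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>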
Spark Submit command:
spark-submit --class com.swcassandrautil.popstatsclone.popihits --master yarn --deploy-mode cluster --executor-cores 1 --executor-memory 2g --conf spark.dynamicAllocation.initialExecutors=2 --conf spark.dynamicAllocation.maxExecutors=8 --conf spark.dynamicAllocation.minExecutors=2 --conf spark.memory.fraction=0.75 --conf spark.memory.storageFraction=0.75 /scala/statscloneihits/target/scala-2.11/popstatscloneihits_2.11-1.0.jar "/mnt/data/tmp/xyz*" "\t";
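The driver class itself is not shown here; the following is only a hypothetical skeleton of what it roughly does (input path and delimiter come from the two arguments above, the processing is a placeholder). It is included mainly because the YARN application only leaves the RUNNING state once the SparkContext is stopped and main() returns, so that is one thing worth verifying in the real class:

package com.swcassandrautil.popstatsclone

import org.apache.spark.sql.SparkSession

object popihits {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)   // e.g. "/mnt/data/tmp/xyz*"
    val delimiter = args(1)   // e.g. "\t"

    val spark = SparkSession.builder().appName("popstatsclone").getOrCreate()

    // Placeholder logic: count occurrences of the first column.
    val counts = spark.sparkContext
      .textFile(inputPath)
      .map(line => (line.split(delimiter)(0), 1L))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)

    // Without an explicit stop() (or with non-daemon threads still running),
    // the application can stay in RUNNING indefinitely.
    spark.stop()
  }
}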
Any input would be appreciated.
Thanks
Related
Spark Standalone Mode, application runs, but executor is killed with exitStatus 1
I am new to Apache Spark and was trying to run the example Pi Calculation application on my local spark setup (using Standalone Cluster). Both the Master, Slave and Driver are running on my local machine. What I am noticing is that, the PI is calculated successfully, however in the slave logs I see that the Worker/Executor is being killed with exitStatus 1. I do not see any errors/exceptions logged to the console otherwise. I tried finding help on similar issue, but most of the search hits were referring to exitStatus 137 etc. (e.g: Spark application kills executor) I have failed miserably to understand why the Worker is being killed instead of completing the execution with 'EXITED' state. I think it's related to how I am executing the app, but am not quite clear what am I doing wrong. Can someone guide me on identifying the root cause? Given below is the code I am using for PI calculation and the logs of the master, slave, driver respsectively. PI Calculation Application package sparky import org.apache.spark.scheduler._ import org.apache.spark.sql.SparkSession import scala.math.random object Application { def runSpark(args: Array[String] ): Unit = { val spark = SparkSession .builder .appName("Spark Pi") .getOrCreate() spark.sparkContext.addSparkListener(new MyListener()) val slices = if (args.length > 0) args(0).toInt else 2 val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow val count = spark.sparkContext.parallelize(1 until n, slices).map { i => val x = random * 2 - 1 val y = random * 2 - 1 if (x * x + y * y <= 1) 1 else 0 }.reduce(_ + _) println("Pi is roughly " + 4.0 * count / (n - 1)) spark.stop() } def main(args: Array[String]) = { Application.runSpark(args) } } Master Console Output C:\Servers\apache-spark\2.2.0\bin λ start-master.cmd -h 0.0.0.0 C:\Platforms\Java\jdk1.8.0_65\bin\java -cp "C:\Servers\apache-spark\2.2.0\bin\..\conf\;C:\Servers\apache-spark\2.2.0\bin\..\jars\*" -Xmx1g org.apache.spark.deploy.master.Master 18/01/25 09:01:30,099 INFO Master: Started daemon with process name: 14900#somemachine 18/01/25 09:01:30,580 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/01/25 09:01:30,680 INFO SecurityManager: Changing view acls to: someuser 18/01/25 09:01:30,681 INFO SecurityManager: Changing modify acls to: someuser 18/01/25 09:01:30,682 INFO SecurityManager: Changing view acls groups to: 18/01/25 09:01:30,683 INFO SecurityManager: Changing modify acls groups to: 18/01/25 09:01:30,684 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(someuser); groups with view permissions: Set(); users with modify permissions: Set(someuser); groups with modify permissions: Set() 18/01/25 09:01:31,711 INFO Utils: Successfully started service 'sparkMaster' on port 7077. 18/01/25 09:01:31,829 INFO Master: Starting Spark master at spark://0.0.0.0:7077 18/01/25 09:01:31,833 INFO Master: Running Spark version 2.2.0 18/01/25 09:01:31,903 INFO log: Logging initialized #2692ms 18/01/25 09:01:31,960 INFO Server: jetty-9.3.z-SNAPSHOT 18/01/25 09:01:32,025 INFO Server: Started #2816ms 18/01/25 09:01:32,057 INFO AbstractConnector: Started ServerConnector#106ca013{HTTP/1.1,[http/1.1]}{0.0.0.0:8080} 18/01/25 09:01:32,058 INFO Utils: Successfully started service 'MasterUI' on port 8080. 
18/01/25 09:01:32,087 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#41cc88b{/app,null,AVAILABLE,#Spark} 18/01/25 09:01:32,088 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#1c63bda6{/app/json,null,AVAILABLE,#Spark} 18/01/25 09:01:32,089 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#45ae273f{/,null,AVAILABLE,#Spark} 18/01/25 09:01:32,090 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#7a319c60{/json,null,AVAILABLE,#Spark} 18/01/25 09:01:32,098 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#23510beb{/static,null,AVAILABLE,#Spark} 18/01/25 09:01:32,099 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#462c632c{/app/kill,null,AVAILABLE,#Spark} 18/01/25 09:01:32,101 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#436ef27b{/driver/kill,null,AVAILABLE,#Spark} 18/01/25 09:01:32,104 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://192.168.56.1:8080 18/01/25 09:01:32,119 INFO Server: jetty-9.3.z-SNAPSHOT 18/01/25 09:01:32,130 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#6f7d1cba{/,null,AVAILABLE} 18/01/25 09:01:32,134 INFO AbstractConnector: Started ServerConnector#3f9e9637{HTTP/1.1,[http/1.1]}{0.0.0.0:6066} 18/01/25 09:01:32,134 INFO Server: Started #2925ms 18/01/25 09:01:32,134 INFO Utils: Successfully started service on port 6066. 18/01/25 09:01:32,135 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066 18/01/25 09:01:32,358 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#7b3e5adb{/metrics/master/json,null,AVAILABLE,#Spark} 18/01/25 09:01:32,362 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#139cbe00{/metrics/applications/json,null,AVAILABLE,#Spark} 18/01/25 09:01:32,399 INFO Master: I have been elected leader! New state: ALIVE 18/01/25 09:01:41,225 INFO Master: Registering worker 192.168.56.1:48591 with 4 cores, 14.4 GB RAM 18/01/25 09:01:53,510 INFO Master: Registering app Spark Pi 18/01/25 09:01:53,515 INFO Master: Registered app Spark Pi with ID app-20180125090153-0000 18/01/25 09:01:53,569 INFO Master: Launching executor app-20180125090153-0000/0 on worker worker-20180125090140-192.168.56.1-48591 18/01/25 09:02:00,262 INFO Master: Received unregister request from application app-20180125090153-0000 18/01/25 09:02:00,269 INFO Master: Removing app app-20180125090153-0000 18/01/25 09:02:00,323 WARN Master: Got status update for unknown executor app-20180125090153-0000/0 18/01/25 09:02:00,338 INFO Master: 127.0.0.1:48625 got disassociated, removing it. 18/01/25 09:02:00,345 INFO Master: 192.168.56.1:48620 got disassociated, removing it. Slave Console Output C:\Servers\apache-spark\2.2.0\bin λ start-slave.cmd -h 0.0.0.0 C:\Platforms\Java\jdk1.8.0_65\bin\java -cp "C:\Servers\apache-spark\2.2.0\bin\..\conf\;C:\Servers\apache-spark\2.2.0\bin\..\jars\*" -Xmx1g org.apache.spark.deploy.worker.Worker spark://0.0.0.0:7077 18/01/25 09:01:38,054 INFO Worker: Started daemon with process name: 14532#somemachine 18/01/25 09:01:38,546 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 18/01/25 09:01:38,644 INFO SecurityManager: Changing view acls to: someuser 18/01/25 09:01:38,645 INFO SecurityManager: Changing modify acls to: someuser 18/01/25 09:01:38,646 INFO SecurityManager: Changing view acls groups to: 18/01/25 09:01:38,647 INFO SecurityManager: Changing modify acls groups to: 18/01/25 09:01:38,648 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(someuser); groups with view permissions: Set(); users with modify permissions: Set(someuser); groups with modify permissions: Set() 18/01/25 09:01:39,655 INFO Utils: Successfully started service 'sparkWorker' on port 48591. 18/01/25 09:01:40,521 INFO Worker: Starting Spark worker 192.168.56.1:48591 with 4 cores, 14.4 GB RAM 18/01/25 09:01:40,526 INFO Worker: Running Spark version 2.2.0 18/01/25 09:01:40,527 INFO Worker: Spark home: C:\Servers\apache-spark\2.2.0\bin\.. 18/01/25 09:01:40,586 INFO log: Logging initialized #3430ms 18/01/25 09:01:40,636 INFO Server: jetty-9.3.z-SNAPSHOT 18/01/25 09:01:40,657 INFO Server: Started #3503ms 18/01/25 09:01:40,787 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082. 18/01/25 09:01:40,797 INFO AbstractConnector: Started ServerConnector#24c54ec4{HTTP/1.1,[http/1.1]}{0.0.0.0:8082} 18/01/25 09:01:40,797 INFO Utils: Successfully started service 'WorkerUI' on port 8082. 18/01/25 09:01:40,832 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#6e86345{/logPage,null,AVAILABLE,#Spark} 18/01/25 09:01:40,833 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#43dbfd42{/logPage/json,null,AVAILABLE,#Spark} 18/01/25 09:01:40,834 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#768b7729{/,null,AVAILABLE,#Spark} 18/01/25 09:01:40,836 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#382e7183{/json,null,AVAILABLE,#Spark} 18/01/25 09:01:40,844 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#459d7b70{/static,null,AVAILABLE,#Spark} 18/01/25 09:01:40,845 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#5bf4fc9c{/log,null,AVAILABLE,#Spark} 18/01/25 09:01:40,849 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://192.168.56.1:8082 18/01/25 09:01:40,853 INFO Worker: Connecting to master 0.0.0.0:7077... 
18/01/25 09:01:40,885 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#4e93ba9d{/metrics/json,null,AVAILABLE,#Spark} 18/01/25 09:01:40,971 INFO TransportClientFactory: Successfully created connection to /0.0.0.0:7077 after 82 ms (0 ms spent in bootstraps) 18/01/25 09:01:41,246 INFO Worker: Successfully registered with master spark://0.0.0.0:7077 18/01/25 09:01:53,621 INFO Worker: Asked to launch executor app-20180125090153-0000/0 for Spark Pi 18/01/25 09:01:53,661 INFO SecurityManager: Changing view acls to: someuser 18/01/25 09:01:53,663 INFO SecurityManager: Changing modify acls to: someuser 18/01/25 09:01:53,664 INFO SecurityManager: Changing view acls groups to: 18/01/25 09:01:53,668 INFO SecurityManager: Changing modify acls groups to: 18/01/25 09:01:53,669 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(someuser); groups with view permissions: Set(); users with modify permissions: Set(someuser); groups with modify permissions: Set() 18/01/25 09:01:53,695 INFO ExecutorRunner: Launch command: "C:\Platforms\Java\jdk1.8.0_65\bin\java" "-cp" "C:\Servers\apache-spark\2.2.0\bin\..\conf\;C:\Servers\apache-spark\2.2.0\bin\..\jars\*" "-Xmx1024M" "-Dspark.driver.port=48620" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler#192.168.56.1:48620" "--executor-id" "0" "--hostname" "192.168.56.1" "--cores" "4" "--app-id" "app-20180125090153-0000" "--worker-url" "spark://Worker#192.168.56.1:48591" 18/01/25 09:02:00,297 INFO Worker: Asked to kill executor app-20180125090153-0000/0 18/01/25 09:02:00,303 INFO ExecutorRunner: Runner thread for executor app-20180125090153-0000/0 interrupted 18/01/25 09:02:00,305 INFO ExecutorRunner: Killing process! 18/01/25 09:02:00,323 INFO Worker: Executor app-20180125090153-0000/0 finished with state KILLED exitStatus 1 18/01/25 09:02:00,336 INFO ExternalShuffleBlockResolver: Application app-20180125090153-0000 removed, cleanupLocalDirs = true 18/01/25 09:02:00,340 INFO Worker: Cleaning up local directories for application app-20180125090153-0000 Driver Console Output 9:01:47 AM: Executing task 'submitToSpark'... C:\Applications\scala\sparky\app\build\libs\sparky-app-0.0.1.jar :app:compileJava NO-SOURCE :app:compileScala UP-TO-DATE :app:processResources NO-SOURCE :app:classes UP-TO-DATE :app:jar UP-TO-DATE :runner:submitToSpark C:\Platforms\Java\jdk1.8.0_65\bin\java -cp "C:\Servers\apache-spark\2.2.0\bin\..\conf\;C:\Servers\apache-spark\2.2.0\bin\..\jars\*" -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://localhost:7077 C:\Applications\scala\sparky\app\build\libs\sparky-app-0.0.1.jar 18/01/25 09:01:51,111 INFO SparkContext: Running Spark version 2.2.0 18/01/25 09:01:51,465 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 18/01/25 09:01:51,677 INFO SparkContext: Submitted application: Spark Pi 18/01/25 09:01:51,711 INFO SecurityManager: Changing view acls to: someuser 18/01/25 09:01:51,712 INFO SecurityManager: Changing modify acls to: someuser 18/01/25 09:01:51,712 INFO SecurityManager: Changing view acls groups to: 18/01/25 09:01:51,713 INFO SecurityManager: Changing modify acls groups to: 18/01/25 09:01:51,714 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(someuser); groups with view permissions: Set(); users with modify permissions: Set(someuser); groups with modify permissions: Set() 18/01/25 09:01:52,639 INFO Utils: Successfully started service 'sparkDriver' on port 48620. 18/01/25 09:01:52,669 INFO SparkEnv: Registering MapOutputTracker 18/01/25 09:01:52,695 INFO SparkEnv: Registering BlockManagerMaster 18/01/25 09:01:52,699 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 18/01/25 09:01:52,700 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 18/01/25 09:01:52,712 INFO DiskBlockManager: Created local directory at C:\Users\someuser\AppData\Local\Temp\blockmgr-f9908c61-a91a-43d5-8d24-e0fd86d55d1c 18/01/25 09:01:52,740 INFO MemoryStore: MemoryStore started with capacity 366.3 MB 18/01/25 09:01:52,808 INFO SparkEnv: Registering OutputCommitCoordinator 18/01/25 09:01:52,924 INFO log: Logging initialized #3539ms 18/01/25 09:01:53,009 INFO Server: jetty-9.3.z-SNAPSHOT 18/01/25 09:01:53,038 INFO Server: Started #3654ms 18/01/25 09:01:53,067 INFO AbstractConnector: Started ServerConnector#21a5fd96{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 18/01/25 09:01:53,067 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
18/01/25 09:01:53,099 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#40bffbca{/jobs,null,AVAILABLE,#Spark} 18/01/25 09:01:53,100 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#6c4f9535{/jobs/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,100 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#30c31dd7{/jobs/job,null,AVAILABLE,#Spark} 18/01/25 09:01:53,101 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#c1fca1e{/jobs/job/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,102 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#344344fa{/stages,null,AVAILABLE,#Spark} 18/01/25 09:01:53,103 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#70e659aa{/stages/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,103 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#285f09de{/stages/stage,null,AVAILABLE,#Spark} 18/01/25 09:01:53,105 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#48e64352{/stages/stage/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,106 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#4362d7df{/stages/pool,null,AVAILABLE,#Spark} 18/01/25 09:01:53,106 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#1c25b8a7{/stages/pool/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,107 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#750fe12e{/storage,null,AVAILABLE,#Spark} 18/01/25 09:01:53,108 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#3e587920{/storage/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,108 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#24f43aa3{/storage/rdd,null,AVAILABLE,#Spark} 18/01/25 09:01:53,109 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#1e11bc55{/storage/rdd/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,110 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#70e0accd{/environment,null,AVAILABLE,#Spark} 18/01/25 09:01:53,112 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#6ab72419{/environment/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,112 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#4fdfa676{/executors,null,AVAILABLE,#Spark} 18/01/25 09:01:53,113 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#5be82d43{/executors/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,114 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#345e5a17{/executors/threadDump,null,AVAILABLE,#Spark} 18/01/25 09:01:53,115 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#443dbe42{/executors/threadDump/json,null,AVAILABLE,#Spark} 18/01/25 09:01:53,125 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#1734f68{/static,null,AVAILABLE,#Spark} 18/01/25 09:01:53,125 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#31c269fd{/,null,AVAILABLE,#Spark} 18/01/25 09:01:53,127 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#47747fb9{/api,null,AVAILABLE,#Spark} 18/01/25 09:01:53,128 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#70eecdc2{/jobs/job/kill,null,AVAILABLE,#Spark} 18/01/25 09:01:53,129 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#7db0565c{/stages/stage/kill,null,AVAILABLE,#Spark} 18/01/25 09:01:53,133 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.56.1:4040 18/01/25 09:01:53,174 INFO SparkContext: Added JAR file:/C:/Applications/scala/sparky/app/build/libs/sparky-app-0.0.1.jar at spark://192.168.56.1:48620/jars/sparky-app-0.0.1.jar with timestamp 1516888913174 18/01/25 09:01:53,318 INFO StandaloneAppClient$ClientEndpoint: Connecting to master 
spark://localhost:7077... 18/01/25 09:01:53,389 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:7077 after 42 ms (0 ms spent in bootstraps) 18/01/25 09:01:53,554 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20180125090153-0000 18/01/25 09:01:53,577 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 48642. 18/01/25 09:01:53,578 INFO NettyBlockTransferService: Server created on 192.168.56.1:48642 18/01/25 09:01:53,582 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 18/01/25 09:01:53,590 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.56.1, 48642, None) 18/01/25 09:01:53,595 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:48642 with 366.3 MB RAM, BlockManagerId(driver, 192.168.56.1, 48642, None) 18/01/25 09:01:53,600 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.56.1, 48642, None) 18/01/25 09:01:53,601 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.56.1, 48642, None) 18/01/25 09:01:53,667 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180125090153-0000/0 on worker-20180125090140-192.168.56.1-48591 (192.168.56.1:48591) with 4 cores 18/01/25 09:01:53,668 INFO StandaloneSchedulerBackend: Granted executor ID app-20180125090153-0000/0 on hostPort 192.168.56.1:48591 with 4 cores, 1024.0 MB RAM 18/01/25 09:01:53,901 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#74fef3f7{/metrics/json,null,AVAILABLE,#Spark} 18/01/25 09:01:55,026 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180125090153-0000/0 is now RUNNING 18/01/25 09:01:55,096 INFO EventLoggingListener: Logging events to file:///C:/Dustbin/spark-events/app-20180125090153-0000 18/01/25 09:01:55,127 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 18/01/25 09:01:55,218 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/Applications/scala/sparky/runner/spark-warehouse/'). 18/01/25 09:01:55,219 INFO SharedState: Warehouse path is 'file:/C:/Applications/scala/sparky/runner/spark-warehouse/'. 
18/01/25 09:01:55,228 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#50a691d3{/SQL,null,AVAILABLE,#Spark} 18/01/25 09:01:55,228 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#3b95d13c{/SQL/json,null,AVAILABLE,#Spark} 18/01/25 09:01:55,229 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#54d901aa{/SQL/execution,null,AVAILABLE,#Spark} 18/01/25 09:01:55,230 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#573284a5{/SQL/execution/json,null,AVAILABLE,#Spark} 18/01/25 09:01:55,233 INFO ContextHandler: Started o.s.j.s.ServletContextHandler#507b79f7{/static/sql,null,AVAILABLE,#Spark} 18/01/25 09:01:56,232 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint 18/01/25 09:01:56,609 INFO SparkContext: Starting job: reduce at Application.scala:29 18/01/25 09:01:56,636 INFO DAGScheduler: Got job 0 (reduce at Application.scala:29) with 2 output partitions 18/01/25 09:01:56,637 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at Application.scala:29) 18/01/25 09:01:56,638 INFO DAGScheduler: Parents of final stage: List() 18/01/25 09:01:56,640 INFO DAGScheduler: Missing parents: List() 18/01/25 09:01:56,654 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at Application.scala:25), which has no missing parents 18/01/25 09:01:56,815 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1800.0 B, free 366.3 MB) 18/01/25 09:01:56,980 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1168.0 B, free 366.3 MB) 18/01/25 09:01:56,984 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.1:48642 (size: 1168.0 B, free: 366.3 MB) 18/01/25 09:01:56,988 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006 18/01/25 09:01:57,016 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at Application.scala:25) (first 15 tasks are for partitions Vector(0, 1)) 18/01/25 09:01:57,018 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 18/01/25 09:01:58,617 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.56.1:48660) with ID 0 18/01/25 09:01:58,661 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.56.1, executor 0, partition 0, PROCESS_LOCAL, 4829 bytes) 18/01/25 09:01:58,665 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.56.1, executor 0, partition 1, PROCESS_LOCAL, 4829 bytes) 18/01/25 09:01:59,242 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:48678 with 366.3 MB RAM, BlockManagerId(0, 192.168.56.1, 48678, None) 18/01/25 09:01:59,819 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.1:48678 (size: 1168.0 B, free: 366.3 MB) 18/01/25 09:02:00,139 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1500 ms on 192.168.56.1 (executor 0) (1/2) 18/01/25 09:02:00,142 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1478 ms on 192.168.56.1 (executor 0) (2/2) 18/01/25 09:02:00,143 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 18/01/25 09:02:00,150 INFO DAGScheduler: ResultStage 0 (reduce at Application.scala:29) finished in 3.109 s 18/01/25 09:02:00,156 INFO DAGScheduler: Job 0 finished: reduce at Application.scala:29, took 3.546255 s Pi is roughly 3.1363756818784094 18/01/25 09:02:00,168 INFO AbstractConnector: Stopped Spark#21a5fd96{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 
18/01/25 09:02:00,170 INFO SparkUI: Stopped Spark web UI at http://192.168.56.1:4040 18/01/25 09:02:00,247 INFO StandaloneSchedulerBackend: Shutting down all executors 18/01/25 09:02:00,249 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down 18/01/25 09:02:00,269 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 18/01/25 09:02:00,300 INFO MemoryStore: MemoryStore cleared 18/01/25 09:02:00,301 INFO BlockManager: BlockManager stopped 18/01/25 09:02:00,321 INFO BlockManagerMaster: BlockManagerMaster stopped 18/01/25 09:02:00,328 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 18/01/25 09:02:00,353 INFO SparkContext: Successfully stopped SparkContext 2018-01-25 09:02:00.353 18/01/25 09:02:00,358 INFO ShutdownHookManager: Shutdown hook called 18/01/25 09:02:00,360 INFO ShutdownHookManager: Deleting directory C:\Users\someuser\AppData\Local\Temp\spark-ac6369a0-abb8-476e-a527-91e0a8011302 BUILD SUCCESSFUL in 13s 3 actionable tasks: 1 executed, 2 up-to-date 9:02:01 AM: Task execution finished 'submitToSpark'.
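The MyListener class used in the PI application above is not included in the question; any SparkListener subclass would make that snippet compile, for example a minimal one that just logs job completions (illustrative only):

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}

class MyListener extends SparkListener {
  // Log each job's id and result when it finishes.
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    println(s"Job ${jobEnd.jobId} ended with result ${jobEnd.jobResult}")
  }
}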
Spark/Python - Slave in Cluster is not used
I'm new to Spark. I have the master(192.168.33.10), and slave(192.168.33.12) cluster setup locally, and I'm wrote to following script to demo that both master and slave are running the get_ip_wrap() on its own machine. However, when I run with the command ./bin/spark-submit ip.py, I only see the 192.168.33.10 in the output, I was expecting 192.168.33.12 in the output as well. I have also included the trace for my master and work output file as well. import socket import fcntl import struct from pyspark import SparkContext, SparkConf from pyspark.sql import SparkSession def get_ip_address(ifname): s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) return socket.inet_ntoa(fcntl.ioctl( s.fileno(), 0x8915, # SIOCGIFADDR struct.pack('256s', ifname[:15]) )[20:24]) def get_ip_wrap(num): return get_ip_address('eth1') #spark = SparkSession\ # .builder\ # .appName("PythonALS")\ # .getOrCreate() #sc = spark.sparkContext conf = SparkConf().setAppName('appName').setMaster('spark://vagrant-ubuntu-trusty-64:7077') sc = SparkContext(conf=conf) data = [x for x in range(0, 50)] distData = sc.parallelize(data) result = distData.map(get_ip_wrap) print result.collect() vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./sbin/start-master.sh starting org.apache.spark.deploy.master.Master, logging to /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.master.Master-1-vagrant-ubuntu-trusty-64.out vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./sbin/start-slave.sh spark://vagrant-ubuntu-trusty-64:7077 starting org.apache.spark.deploy.worker.Worker, logging to /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.worker.Worker-1-vagrant-ubuntu-trusty-64.out vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./bin/spark-submit ip.py Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 17/05/27 17:08:09 INFO SparkContext: Running Spark version 2.1.1 17/05/27 17:08:09 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0 17/05/27 17:08:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/05/27 17:08:10 INFO SecurityManager: Changing view acls to: vagrant 17/05/27 17:08:10 INFO SecurityManager: Changing modify acls to: vagrant 17/05/27 17:08:10 INFO SecurityManager: Changing view acls groups to: 17/05/27 17:08:10 INFO SecurityManager: Changing modify acls groups to: 17/05/27 17:08:10 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vagrant); groups with view permissions: Set(); users with modify permissions: Set(vagrant); groups with modify permissions: Set() 17/05/27 17:08:10 INFO Utils: Successfully started service 'sparkDriver' on port 59290. 
17/05/27 17:08:10 INFO SparkEnv: Registering MapOutputTracker 17/05/27 17:08:10 INFO SparkEnv: Registering BlockManagerMaster 17/05/27 17:08:10 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 17/05/27 17:08:10 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 17/05/27 17:08:10 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ad008702-6e92-4e60-ab27-a582b1ba9fb9 17/05/27 17:08:10 INFO MemoryStore: MemoryStore started with capacity 413.9 MB 17/05/27 17:08:11 INFO SparkEnv: Registering OutputCommitCoordinator 17/05/27 17:08:11 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 17/05/27 17:08:11 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042. 17/05/27 17:08:11 INFO Utils: Successfully started service 'SparkUI' on port 4042. 17/05/27 17:08:11 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4042 17/05/27 17:08:11 INFO SparkContext: Added file file:/home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py at spark://10.0.2.15:59290/files/ip.py with timestamp 1495904891756 17/05/27 17:08:11 INFO Utils: Copying /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py to /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f/userFiles-dc94d72e-15d3-4d84-87b9-27e87dcb0f6a/ip.py 17/05/27 17:08:11 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://vagrant-ubuntu-trusty-64:7077... 17/05/27 17:08:11 INFO TransportClientFactory: Successfully created connection to vagrant-ubuntu-trusty-64/10.0.2.15:7077 after 20 ms (0 ms spent in bootstraps) 17/05/27 17:08:12 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170527170812-0000 17/05/27 17:08:12 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53124. 
17/05/27 17:08:12 INFO NettyBlockTransferService: Server created on 10.0.2.15:53124 17/05/27 17:08:12 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 17/05/27 17:08:12 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 53124, None) 17/05/27 17:08:12 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20170527170812-0000/0 on worker-20170527170800-10.0.2.15-54829 (10.0.2.15:54829) with 1 cores 17/05/27 17:08:12 INFO StandaloneSchedulerBackend: Granted executor ID app-20170527170812-0000/0 on hostPort 10.0.2.15:54829 with 1 cores, 1024.0 MB RAM 17/05/27 17:08:12 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:53124 with 413.9 MB RAM, BlockManagerId(driver, 10.0.2.15, 53124, None) 17/05/27 17:08:12 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 53124, None) 17/05/27 17:08:12 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 53124, None) 17/05/27 17:08:12 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20170527170812-0000/0 is now RUNNING 17/05/27 17:08:12 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 17/05/27 17:08:13 INFO SparkContext: Starting job: collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31 17/05/27 17:08:13 INFO DAGScheduler: Got job 0 (collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31) with 2 output partitions 17/05/27 17:08:13 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31) 17/05/27 17:08:13 INFO DAGScheduler: Parents of final stage: List() 17/05/27 17:08:13 INFO DAGScheduler: Missing parents: List() 17/05/27 17:08:13 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31), which has no missing parents 17/05/27 17:08:13 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.1 KB, free 413.9 MB) 17/05/27 17:08:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.8 KB, free 413.9 MB) 17/05/27 17:08:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:53124 (size: 2.8 KB, free: 413.9 MB) 17/05/27 17:08:13 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996 17/05/27 17:08:13 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31) 17/05/27 17:08:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 17/05/27 17:08:15 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:40762) with ID 0 17/05/27 17:08:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.0.2.15, executor 0, partition 0, PROCESS_LOCAL, 6136 bytes) 17/05/27 17:08:15 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:33949 with 413.9 MB RAM, BlockManagerId(0, 10.0.2.15, 33949, None) 17/05/27 17:08:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:33949 (size: 2.8 KB, free: 413.9 MB) 17/05/27 17:08:16 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.0.2.15, executor 0, partition 1, PROCESS_LOCAL, 6136 bytes) 17/05/27 17:08:16 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1050 ms on 10.0.2.15 (executor 0) (1/2) 17/05/27 17:08:16 INFO DAGScheduler: ResultStage 0 
(collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31) finished in 2.504 s 17/05/27 17:08:16 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 119 ms on 10.0.2.15 (executor 0) (2/2) 17/05/27 17:08:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 17/05/27 17:08:16 INFO DAGScheduler: Job 0 finished: collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31, took 2.981746 s ['192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10'] 17/05/27 17:08:16 INFO SparkContext: Invoking stop() from shutdown hook 17/05/27 17:08:16 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4042 17/05/27 17:08:16 INFO StandaloneSchedulerBackend: Shutting down all executors 17/05/27 17:08:16 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down 17/05/27 17:08:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 17/05/27 17:08:16 INFO MemoryStore: MemoryStore cleared 17/05/27 17:08:16 INFO BlockManager: BlockManager stopped 17/05/27 17:08:16 INFO BlockManagerMaster: BlockManagerMaster stopped 17/05/27 17:08:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
17/05/27 17:08:16 INFO SparkContext: Successfully stopped SparkContext 17/05/27 17:08:16 INFO ShutdownHookManager: Shutdown hook called 17/05/27 17:08:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f/pyspark-021d6ed2-91d0-481b-b528-108581abe66c 17/05/27 17:08:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ cat /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.master.Master-1-vagrant-ubuntu-trusty-64.out Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -cp /home/vagrant/spark-2.1.1-bin-hadoop2.7/conf/:/home/vagrant/spark-2.1.1-bin-hadoop2.7/jars/* -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --host vagrant-ubuntu-trusty-64 --port 7077 --webui-port 8080 ======================================== Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 17/05/27 17:07:44 INFO Master: Started daemon with process name: 9384#vagrant-ubuntu-trusty-64 17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for TERM 17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for HUP 17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for INT 17/05/27 17:07:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/05/27 17:07:45 INFO SecurityManager: Changing view acls to: vagrant 17/05/27 17:07:45 INFO SecurityManager: Changing modify acls to: vagrant 17/05/27 17:07:45 INFO SecurityManager: Changing view acls groups to: 17/05/27 17:07:45 INFO SecurityManager: Changing modify acls groups to: 17/05/27 17:07:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vagrant); groups with view permissions: Set(); users with modify permissions: Set(vagrant); groups with modify permissions: Set() 17/05/27 17:07:45 INFO Utils: Successfully started service 'sparkMaster' on port 7077. 17/05/27 17:07:45 INFO Master: Starting Spark master at spark://vagrant-ubuntu-trusty-64:7077 17/05/27 17:07:45 INFO Master: Running Spark version 2.1.1 17/05/27 17:07:45 INFO Utils: Successfully started service 'MasterUI' on port 8080. 17/05/27 17:07:45 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://10.0.2.15:8080 17/05/27 17:07:45 INFO Utils: Successfully started service on port 6066. 17/05/27 17:07:45 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066 17/05/27 17:07:46 INFO Master: I have been elected leader! New state: ALIVE 17/05/27 17:08:00 INFO Master: Registering worker 10.0.2.15:54829 with 1 cores, 2.8 GB RAM 17/05/27 17:08:12 INFO Master: Registering app appName 17/05/27 17:08:12 INFO Master: Registered app appName with ID app-20170527170812-0000 17/05/27 17:08:12 INFO Master: Launching executor app-20170527170812-0000/0 on worker worker-20170527170800-10.0.2.15-54829 17/05/27 17:08:16 INFO Master: Received unregister request from application app-20170527170812-0000 17/05/27 17:08:16 INFO Master: Removing app app-20170527170812-0000 17/05/27 17:08:16 INFO Master: 10.0.2.15:51703 got disassociated, removing it. 17/05/27 17:08:16 INFO Master: 10.0.2.15:59290 got disassociated, removing it. 
17/05/27 17:08:16 WARN Master: Got status update for unknown executor app-20170527170812-0000/0 vagrant#vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$
Spark not doing any work on slave: Initial job has not accepted any resources
I am trying to do a very simple setup with Spark using SSH tunneling and I can't make it work. I have master running on my PC, with this setup ./sbin/start-master.sh -h localhost -p 7077 (if not stated otherwise, everything else is default). On my slave PC (IP is 192.168.0.222), which is in other domain and I don't have a root access to it, I made ssh -N -L localhost:7078:localhost:7077 myMasterPCSSHalias and run slave with ./sbin/start-slave.sh spark://localhost:7078. I can now see this slave on the dashboard at http://localhost:8080/ in my browser. I see that it has 14GB of free memory. When I then try e.g. this example: ./bin/spark-submit --master spark://localhost:7077 examples/src/main/python/pi.py 10 it hangs on this message until I kill it (you can see the full log message below): WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources I am sure I am not using more resources than I have available, the problem still persists even though I use --executor-memory 512m and running executor is just signalling RUNNING state. The only thing in error log is this: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 16/05/09 22:45:44 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 16/05/09 22:45:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 16/05/09 22:45:45 INFO SecurityManager: Changing view acls to: hnykdan1,dan 16/05/09 22:45:45 INFO SecurityManager: Changing modify acls to: hnykdan1,dan 16/05/09 22:45:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hnykdan1, dan); users with modify permissions: Set(hnykdan1, dan) and in slave log is this: 16/05/09 22:48:56 INFO Worker: Asked to launch executor app-20160509224034-0013/0 for PythonPi 16/05/09 22:48:56 INFO SecurityManager: Changing view acls to: hnykdan1 16/05/09 22:48:56 INFO SecurityManager: Changing modify acls to: hnykdan1 16/05/09 22:48:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hnykdan1); users with modify permissions: Set(hnykdan1) 16/05/09 22:48:56 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java" "-cp" "/home/hnykdan1/spark/conf/:/home/hnykdan1/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/home/hnykdan1/spark/lib/datanucleus-core-3.2.10.jar:/home/hnykdan1/spark/lib/datanucleus-api-jdo-3.2.6.jar:/home/hnykdan1/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=37450" "-XX:MaxPermSize=256m" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler#192.168.0.222:37450" "--executor-id" "0" "--hostname" "147.32.8.103" "--cores" "8" "--app-id" "app-20160509224034-0013" "--worker-url" "spark://Worker#147.32.8.103:54894" Everything looks quite normal and I don't know where might be a problem. Do I need to tunnel even the other way around? It runs fine when I run slave locally in the exactly same fashion. Thanks Full Log from console Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 16/05/09 22:28:21 INFO SparkContext: Running Spark version 1.6.1 16/05/09 22:28:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable
16/05/09 22:28:22 INFO SecurityManager: Changing view acls to: dan
16/05/09 22:28:22 INFO SecurityManager: Changing modify acls to: dan
16/05/09 22:28:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(dan); users with modify permissions: Set(dan)
16/05/09 22:28:22 INFO Utils: Successfully started service 'sparkDriver' on port 34508.
16/05/09 22:28:23 INFO Slf4jLogger: Slf4jLogger started
16/05/09 22:28:23 INFO Remoting: Starting remoting
16/05/09 22:28:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#192.168.0.222:44359]
16/05/09 22:28:23 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 44359.
16/05/09 22:28:23 INFO SparkEnv: Registering MapOutputTracker
16/05/09 22:28:23 INFO SparkEnv: Registering BlockManagerMaster
16/05/09 22:28:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-db4c3293-423f-4966-a479-b69a90439da9
16/05/09 22:28:23 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/09 22:28:23 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/09 22:28:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/09 22:28:24 INFO SparkUI: Started SparkUI at http://192.168.0.222:4040
16/05/09 22:28:24 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d532a9c1-0455-4937-ad27-b47abb2a65e8/httpd-aa031b8c-f605-41c3-aabe-fc4fe01bdcf8
16/05/09 22:28:24 INFO HttpServer: Starting HTTP Server
16/05/09 22:28:24 INFO Utils: Successfully started service 'HTTP file server' on port 41770.
16/05/09 22:28:24 INFO Utils: Copying /home/hnykdan1/spark/examples/src/main/python/pi.py to /tmp/spark-d532a9c1-0455-4937-ad27-b47abb2a65e8/userFiles-14720bed-cd41-4b15-9bd3-38dbf4f268ff/pi.py
16/05/09 22:28:24 INFO SparkContext: Added file file:/home/hnykdan1/spark/examples/src/main/python/pi.py at http://192.168.0.222:41770/files/pi.py with timestamp 1462825704629
16/05/09 22:28:24 INFO AppClient$ClientEndpoint: Connecting to master spark://localhost:7077...
16/05/09 22:28:24 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160509222824-0011
16/05/09 22:28:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44617.
16/05/09 22:28:24 INFO NettyBlockTransferService: Server created on 44617
16/05/09 22:28:24 INFO AppClient$ClientEndpoint: Executor added: app-20160509222824-0011/0 on worker-20160509214654-147.32.8.103-54894 (147.32.8.103:54894) with 8 cores
16/05/09 22:28:24 INFO BlockManagerMaster: Trying to register BlockManager
16/05/09 22:28:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160509222824-0011/0 on hostPort 147.32.8.103:54894 with 8 cores, 1024.0 MB RAM
16/05/09 22:28:24 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.222:44617 with 511.1 MB RAM, BlockManagerId(driver, 192.168.0.222, 44617)
16/05/09 22:28:24 INFO BlockManagerMaster: Registered BlockManager
16/05/09 22:28:25 INFO AppClient$ClientEndpoint: Executor updated: app-20160509222824-0011/0 is now RUNNING
16/05/09 22:28:25 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/05/09 22:28:25 INFO SparkContext: Starting job: reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39
16/05/09 22:28:25 INFO DAGScheduler: Got job 0 (reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39) with 10 output partitions
16/05/09 22:28:25 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39)
16/05/09 22:28:25 INFO DAGScheduler: Parents of final stage: List()
16/05/09 22:28:25 INFO DAGScheduler: Missing parents: List()
16/05/09 22:28:25 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39), which has no missing parents
16/05/09 22:28:26 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.0 KB, free 4.0 KB)
16/05/09 22:28:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.7 KB, free 6.7 KB)
16/05/09 22:28:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.222:44617 (size: 2.7 KB, free: 511.1 MB)
16/05/09 22:28:26 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/05/09 22:28:26 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (PythonRDD[1] at reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39)
16/05/09 22:28:26 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
16/05/09 22:28:41 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:28:56 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:29:11 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:29:26 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:29:41 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:29:56 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:30:11 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:30:26 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Since you have checked that the resources are available, the next most likely problem is that the executor cannot connect back to the driver. When a job is submitted, the driver starts a server from which the executors download the jar(s). Admittedly, the error message (Initial job has not accepted any resources...) does not look like a network problem, but this is a known issue, discussed for example here: https://github.com/databricks/spark-knowledgebase/issues/9
It is probably related to the network (security group rules). As a crude test, I got it working by opening the master and the workers to all TCP traffic, inbound and outbound.
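If opening everything up confirms the diagnosis, you can usually tighten the rules again by pinning the ports the driver advertises, so that only those need to be reachable from the workers. A minimal sketch in Scala, not the asker's code; the host and port values below are simply the ones that appear in the log above, and the same properties can be passed to spark-submit with --conf:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: fix the driver's advertised address and ports so the firewall /
// security group can allow just these instead of all TCP traffic.
val conf = new SparkConf()
  .setAppName("pi")
  .set("spark.driver.host", "192.168.0.222")   // address the workers can reach back to
  .set("spark.driver.port", "34508")           // driver RPC port the executors connect to
  .set("spark.blockManager.port", "44617")     // block transfer service port
val sc = new SparkContext(conf)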
Kafka message consumption with Spark
I am using the HDP-2.3 sandbox and consuming Kafka messages from a spark-submit job. I put some messages into Kafka like this:

kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic webevent

or

kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic test --new-producer < myfile.txt

Now I need to consume those messages from a Spark job, as shown below:

./bin/spark-submit --master spark://192.168.255.150:7077 --executor-memory 512m --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar 192.168.255.150:2181 webevent 10

where 2181 is the ZooKeeper port. I get the error shown below (please guide me on how to consume the messages from Kafka):

16/05/02 15:21:30 INFO SparkContext: Running Spark version 1.3.1
16/05/02 15:21:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/02 15:21:31 INFO SecurityManager: Changing view acls to: root
16/05/02 15:21:31 INFO SecurityManager: Changing modify acls to: root
16/05/02 15:21:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/05/02 15:21:31 INFO Slf4jLogger: Slf4jLogger started
16/05/02 15:21:31 INFO Remoting: Starting remoting
16/05/02 15:21:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#sandbox.hortonworks.com:53950]
16/05/02 15:21:32 INFO Utils: Successfully started service 'sparkDriver' on port 53950.
16/05/02 15:21:32 INFO SparkEnv: Registering MapOutputTracker
16/05/02 15:21:32 INFO SparkEnv: Registering BlockManagerMaster
16/05/02 15:21:32 INFO DiskBlockManager: Created local directory at /tmp/spark-c70b08b9-41a3-42c8-9d83-bc4258e299c6/blockmgr-c2d86de6-34a7-497c-8018-d3437a100e87
16/05/02 15:21:32 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/05/02 15:21:32 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a8f7ade9-292c-42c4-9e54-43b3b3495b0c/httpd-65d36d04-1e2a-4e69-8d20-295465100070
16/05/02 15:21:32 INFO HttpServer: Starting HTTP Server
16/05/02 15:21:32 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/02 15:21:32 INFO AbstractConnector: Started SocketConnector#0.0.0.0:37014
16/05/02 15:21:32 INFO Utils: Successfully started service 'HTTP file server' on port 37014.
16/05/02 15:21:32 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/02 15:21:32 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/02 15:21:32 INFO AbstractConnector: Started SelectChannelConnector#0.0.0.0:4040
16/05/02 15:21:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/02 15:21:32 INFO SparkUI: Started SparkUI at http://sandbox.hortonworks.com:4040
16/05/02 15:21:33 INFO SparkContext: Added JAR file:/usr/hdp/2.3.0.0-2130/spark/lib/spark-examples-1.4.1-hadoop2.4.0.jar at http://192.168.255.150:37014/jars/spark-examples-1.4.1-hadoop2.4.0.jar with timestamp 1462202493866
16/05/02 15:21:34 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#192.168.255.150:7077/user/Master...
16/05/02 15:21:34 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160502152134-0000
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor added: app-20160502152134-0000/0 on worker-20160502150437-sandbox.hortonworks.com-36920 (sandbox.hortonworks.com:36920) with 1 cores
16/05/02 15:21:34 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160502152134-0000/0 on hostPort sandbox.hortonworks.com:36920 with 1 cores, 512.0 MB RAM
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor updated: app-20160502152134-0000/0 is now RUNNING
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor updated: app-20160502152134-0000/0 is now LOADING
16/05/02 15:21:34 INFO NettyBlockTransferService: Server created on 43440
16/05/02 15:21:34 INFO BlockManagerMaster: Trying to register BlockManager
16/05/02 15:21:34 INFO BlockManagerMasterActor: Registering block manager sandbox.hortonworks.com:43440 with 265.4 MB RAM, BlockManagerId(<driver>, sandbox.hortonworks.com, 43440)
16/05/02 15:21:34 INFO BlockManagerMaster: Registered BlockManager
16/05/02 15:21:35 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/05/02 15:21:35 INFO VerifiableProperties: Verifying properties
16/05/02 15:21:35 INFO VerifiableProperties: Property group.id is overridden to
16/05/02 15:21:35 INFO VerifiableProperties: Property zookeeper.connect is overridden to
16/05/02 15:21:35 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Error: application failed with exception
org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
	at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
	at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
	at scala.util.Either.fold(Either.scala:97)
	at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:415)
	at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:532)
	at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
	at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:71)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Or, when I use this instead:

./bin/spark-submit --master spark://192.168.255.150:7077 --executor-memory 512m --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar 192.168.255.150:6667 webevent 10

where 6667 is the port Kafka produces messages on, I get this error:

16/05/02 15:27:26 INFO SimpleConsumer: Reconnect due to socket error: java.nio.channels.ClosedChannelException
Error: application failed with exception
org.apache.spark.SparkException: java.nio.channels.ClosedChannelException
	at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
	at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)

I don't know if this helps:

./bin/spark-submit --class consumer.kafka.client.Consumer --master spark://192.168.255.150:7077 --executor-memory 1G lib/kafka-spark-consumer-1.0.6.jar 10
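For reference, the stack trace points at KafkaUtils.createDirectStream, and the direct-stream API talks to the Kafka brokers themselves (metadata.broker.list), not to ZooKeeper. The following is only a hedged Scala sketch of such a consumer, using the broker address and topic name from the question; the object name WebEventCount and the 10-second batch interval are illustrative assumptions, not the asker's code:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object WebEventCount {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("WebEventCount"), Seconds(10))
    // Direct stream: give it the broker list (port 6667 on the HDP sandbox), not the ZooKeeper port.
    val kafkaParams = Map("metadata.broker.list" -> "sandbox.hortonworks.com:6667")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("webevent"))
    // Count the messages received in each batch.
    stream.map(_._2).count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}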
Scala Spark App submitted to yarn-cluster and unregistered with SUCCEEDED without doing anything
Goal
Run our Scala Spark app jar in yarn-cluster mode. It works in standalone cluster mode and with yarn-client, but for some reason it does not run to completion in yarn-cluster mode.

Details
The last portion of the code it seems to execute is the assignment of the initial value to the DataFrame when reading the input file. It looks like it does not do anything after that. None of the logs look abnormal and there are no warnings or errors either. The application suddenly gets unregistered with status SUCCEEDED and everything gets killed. In any other deployment mode (e.g. yarn-client, standalone cluster mode) everything runs smoothly to completion.

15/07/22 15:57:00 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED

I have also run this job on Spark 1.3.x and 1.4.x, on a vanilla Spark/YARN cluster and on a CDH 5.4.3 cluster, all with the same results. What could possibly be the issue? The job was run with the command below, and the input file is accessible through HDFS.

bin/spark-submit --master yarn-cluster --class AssocApp ../associationRulesScala/target/scala-2.10/AssociationRule_2.10.4-1.0.0.SNAPSHOT.jar hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv

Code snippets
This is the code in the area where the DataFrame is loaded. It prints the log message "Uploading Dataframe..." but nothing happens after that. Refer to the driver's logs below.

//...
logger.info("Uploading Dataframe from %s".format(filename))
sparkParams.sqlContext.csvFile(filename)
MDC.put("jobID", jobID.takeRight(3))
logger.info("Extracting Unique Vals from each of %d columns...".format(frame.columns.length))
private val uniqueVals = frame.columns.zipWithIndex.map(colname =>
  (colname._2, colname._1, frame.select(colname._1).distinct.cache)).
//...

Driver logs
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-root/nm-local-dir/usercache/root/filecache/60/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/07/22 15:56:52 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/07/22 15:56:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1434116948302_0097_000001
15/07/22 15:56:55 INFO spark.SecurityManager: Changing view acls to: root
15/07/22 15:56:55 INFO spark.SecurityManager: Changing modify acls to: root
15/07/22 15:56:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
15/07/22 15:56:56 INFO AssocApp$: Starting new Association Rules calculation. From File: hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv
15/07/22 15:56:56 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
15/07/22 15:56:57 INFO associationRules.primaryPackageSpark: Uploading Dataframe from hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv
15/07/22 15:56:57 INFO spark.SparkContext: Running Spark version 1.4.0
15/07/22 15:56:57 INFO spark.SecurityManager: Changing view acls to: root
15/07/22 15:56:57 INFO spark.SecurityManager: Changing modify acls to: root
15/07/22 15:56:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/22 15:56:57 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/22 15:56:57 INFO Remoting: Starting remoting
15/07/22 15:56:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#119.81.232.13:41459]
15/07/22 15:56:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 41459.
15/07/22 15:56:57 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/22 15:56:57 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/22 15:56:57 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/blockmgr-f0e66040-1fdb-4a05-87e1-160194829f84
15/07/22 15:56:57 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/07/22 15:56:58 INFO spark.HttpFileServer: HTTP File server directory is /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/httpd-79b304a1-3cf4-4951-9e22-bbdfac435824
15/07/22 15:56:58 INFO spark.HttpServer: Starting HTTP Server
15/07/22 15:56:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/22 15:56:58 INFO server.AbstractConnector: Started SocketConnector#0.0.0.0:36021
15/07/22 15:56:58 INFO util.Utils: Successfully started service 'HTTP file server' on port 36021.
15/07/22 15:56:58 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/07/22 15:56:58 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/07/22 15:56:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/22 15:56:58 INFO server.AbstractConnector: Started SelectChannelConnector#0.0.0.0:53274
15/07/22 15:56:58 INFO util.Utils: Successfully started service 'SparkUI' on port 53274.
15/07/22 15:56:58 INFO ui.SparkUI: Started SparkUI at http://119.XX.XXX.XX:53274
15/07/22 15:56:58 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/22 15:56:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34498.
15/07/22 15:56:59 INFO netty.NettyBlockTransferService: Server created on 34498
15/07/22 15:56:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/22 15:56:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager 119.81.232.13:34498 with 267.3 MB RAM, BlockManagerId(driver, 119.81.232.13, 34498)
15/07/22 15:56:59 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/22 15:56:59 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#-819146876])
15/07/22 15:56:59 INFO client.RMProxy: Connecting to ResourceManager at sparkMaster-hk/119.81.232.24:8030
15/07/22 15:56:59 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/07/22 15:57:00 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
15/07/22 15:57:00 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
15/07/22 15:57:00 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 5000
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
15/07/22 15:57:00 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1434116948302_0097
15/07/22 15:57:00 INFO storage.DiskBlockManager: Shutdown hook called
15/07/22 15:57:00 INFO util.Utils: Shutdown hook called
15/07/22 15:57:00 INFO util.Utils: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/httpd-79b304a1-3cf4-4951-9e22-bbdfac435824
15/07/22 15:57:00 INFO util.Utils: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/userFiles-e01b4dd2-681c-4108-aec6-879774652c7a
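Worth noting from the log: "Final app status: SUCCEEDED, exitCode: 0" is recorded almost immediately after the user application thread starts, which in yarn-cluster mode typically means the user class's main method returned (or handed the work off asynchronously) before any action ran. For comparison only, here is a minimal, hypothetical skeleton of a yarn-cluster driver in Scala; the class name, CSV options, and the use of the spark-csv package are illustrative assumptions, not the asker's actual code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object AssocAppSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AssocAppSketch"))
    val sqlContext = new SQLContext(sc)
    // Load the CSV into a DataFrame (assumes com.databricks:spark-csv is on the classpath).
    val frame = sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true")
      .load(args(0)) // e.g. hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv
    // Force an action before main returns; in yarn-cluster the ApplicationMaster
    // unregisters (with SUCCEEDED) as soon as the user thread finishes.
    println(s"rows = ${frame.count()}")
    sc.stop()
  }
}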