Can't run spark-submit with an application jar on a Mesos cluster - apache-spark

Mesosphere did a great job of simplifying the process of running Spark on Mesos. I am using this guide to set up a development Mesos cluster on Google Compute Engine:
https://mesosphere.com/docs/tutorials/run-spark-on-mesos/
I can run the example from the guide using spark-shell (finding numbers less than 10). However, when I attempt to submit an application that otherwise works fine with Spark locally, it blows up with TASK_FAILED messages (e.g. CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_FAILED).
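For reference, the guide's spark-shell exercise is roughly the following sketch (from memory; the exact code in the guide may differ):
val numbers = sc.parallelize(1 to 1000)
numbers.filter(_ < 10).collect()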
Here's the command I'm using with the provided Spark Pi example.
./spark-submit --class org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050 ~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100
And the output:
jclouds@development-5159-d9:~/learning-spark$ ~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050 ~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/22 16:44:02 INFO SparkContext: Running Spark version 1.3.0
15/03/22 16:44:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/22 16:44:03 INFO SecurityManager: Changing view acls to: jclouds
15/03/22 16:44:03 INFO SecurityManager: Changing modify acls to: jclouds
15/03/22 16:44:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jclouds); users with modify permissions: Set(jclouds)
15/03/22 16:44:03 INFO Slf4jLogger: Slf4jLogger started
15/03/22 16:44:03 INFO Remoting: Starting remoting
15/03/22 16:44:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:60301]
15/03/22 16:44:03 INFO Utils: Successfully started service 'sparkDriver' on port 60301.
15/03/22 16:44:03 INFO SparkEnv: Registering MapOutputTracker
15/03/22 16:44:03 INFO SparkEnv: Registering BlockManagerMaster
15/03/22 16:44:03 INFO DiskBlockManager: Created local directory at /tmp/spark-27fad7e3-4ad7-44d6-845f-4a09ac9cce90/blockmgr-a558b7be-0d72-49b9-93fd-5ef8731b314b
15/03/22 16:44:03 INFO MemoryStore: MemoryStore started with capacity 265.0 MB
15/03/22 16:44:04 INFO HttpFileServer: HTTP File server directory is /tmp/spark-de9ac795-381b-4acd-a723-a9a6778773c9/httpd-7115216c-0223-492b-ae6f-4134ba7228ba
15/03/22 16:44:04 INFO HttpServer: Starting HTTP Server
15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 16:44:04 INFO AbstractConnector: Started SocketConnector@0.0.0.0:36663
15/03/22 16:44:04 INFO Utils: Successfully started service 'HTTP file server' on port 36663.
15/03/22 16:44:04 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 16:44:04 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/22 16:44:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/22 16:44:04 INFO SparkUI: Started SparkUI at http://development-5159-d9.c.learning-spark.internal:4040
15/03/22 16:44:04 INFO SparkContext: Added JAR file:/home/jclouds/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar at http://10.173.40.36:36663/jars/spark-examples-1.3.0-hadoop2.4.0.jar with timestamp 1427042644934
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY instead. Future releases will not support JNI bindings via MESOS_NATIVE_LIBRARY.
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY instead. Future releases will not support JNI bindings via MESOS_NATIVE_LIBRARY.
I0322 16:44:05.035423 308 sched.cpp:137] Version: 0.21.1
I0322 16:44:05.038136 309 sched.cpp:234] New master detected at master#10.173.40.36:5050
I0322 16:44:05.039261 309 sched.cpp:242] No credentials provided. Attempting to register without authentication
I0322 16:44:05.040351 310 sched.cpp:408] Framework registered with 20150322-040336-606645514-5050-2744-0019
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Registered as framework ID 20150322-040336-606645514-5050-2744-0019
15/03/22 16:44:05 INFO NettyBlockTransferService: Server created on 44177
15/03/22 16:44:05 INFO BlockManagerMaster: Trying to register BlockManager
15/03/22 16:44:05 INFO BlockManagerMasterActor: Registering block manager development-5159-d9.c.learning-spark.internal:44177 with 265.0 MB RAM, BlockManagerId(<driver>, development-5159-d9.c.learning-spark.internal, 44177)
15/03/22 16:44:05 INFO BlockManagerMaster: Registered BlockManager
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/22 16:44:05 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_RUNNING
15/03/22 16:44:05 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 100 output partitions (allowLocal=false)
15/03/22 16:44:05 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
15/03/22 16:44:05 INFO DAGScheduler: Parents of final stage: List()
15/03/22 16:44:05 INFO DAGScheduler: Missing parents: List()
15/03/22 16:44:05 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S1"
due to too many failures; is Spark installed on it?
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S0"
due to too many failures; is Spark installed on it?
15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1848) called with curMem=0, maxMem=277842493
15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1848.0 B, free 265.0 MB)
15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1296) called with curMem=1848, maxMem=277842493
15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1296.0 B, free 265.0 MB)
15/03/22 16:44:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on development-5159-d9.c.learning-spark.internal:44177 (size: 1296.0 B, free: 265.0 MB)
15/03/22 16:44:05 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/22 16:44:05 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now TASK_RUNNING
15/03/22 16:44:05 INFO DAGScheduler: Submitting 100 missing tasks from Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)
15/03/22 16:44:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 100 tasks
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S2"
due to too many failures; is Spark installed on it?
15/03/22 16:44:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I suspect it may have something to do with the Mesos slave nodes not finding the application jar, but when I put it in HDFS and provide the URL to it, spark-submit tells me it will "Skip remote jar":
jclouds@development-5159-d9:~/learning-spark$ ~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050 hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar 100
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Warning: Skip remote jar hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar.
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:266)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
--
EDIT: Just to bring this to a conclusion, hbogert on the Spark user list pointed me in the direction of checking the Spark logs on one of my slave nodes, and the problem was as clear as day.
jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$ cat stderr
I0329 20:34:26.107267 10026 exec.cpp:132] Version: 0.21.1
I0329 20:34:26.109591 10031 exec.cpp:206] Executor registered on slave 20150322-040336-606645514-5050-2744-S1
sh: 1: /home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class: not found
jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$ cat stdout
Registered executor on 10.217.7.180
Starting task 1
Forked command at 10036
sh -c ' "/home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class" org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:54746/user/CoarseGrainedScheduler --executor-id 20150322-040336-606645514-5050-2744-S1 --hostname 10.217.7.180 --cores 10 --app-id 20150322-040336-606645514-5050-2744-0037'
Command exited with status 127 (pid: 10036)
Related:
How to run Hadoop on a Mesos cluster?
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-failing-on-cluster-td8369.html

It's hard to tell without seeing the stderr output in the Mesos sandbox logs, but usually you need to make sure that MESOS_NATIVE_JAVA_LIBRARY is set (in spark-env.sh) and that spark.executor.uri (in spark-defaults.conf) points to a Spark tarball the slaves can download. Otherwise, you need to have Spark installed at the same location on each slave.
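For example, a minimal sketch of those two settings (the library path and tarball URL below are placeholders to adapt):
# conf/spark-env.sh
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
# conf/spark-defaults.conf
spark.executor.uri hdfs://10.173.40.36/spark/spark-1.3.0-bin-hadoop2.4.tgz
With spark.executor.uri set, each slave downloads and unpacks that tarball itself, so Spark does not need to be pre-installed at the same path on every node.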

Related

Permission denied error when setting up local Spark instance and running pyspark

I am setting up a local Spark instance on Windows to use with PySpark, as described in this guide (but with spark-3.0.0 / Hadoop 2.7 instead): https://phoenixnap.com/kb/install-spark-on-windows-10.
I can start up Spark with:
C:\Spark\spark-3.0.0-bin-hadoop2.7\bin>spark-shell.cmd
and connect to it with http://localhost:4040/ in my browser (I see the Spark GUI).
But when I run the SparkPi example with
C:\Spark\spark-3.0.0-bin-hadoop2.7\examples>run-example SparkPi
it throws a Permission Denied error, as in this trace:
21/03/08 10:51:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/08 10:51:04 INFO SparkContext: Running Spark version 3.0.0
21/03/08 10:51:04 INFO ResourceUtils: ==============================================================
21/03/08 10:51:04 INFO ResourceUtils: Resources for spark.driver:
21/03/08 10:51:04 INFO ResourceUtils: ==============================================================
21/03/08 10:51:04 INFO SparkContext: Submitted application: Spark Pi
21/03/08 10:51:04 INFO SecurityManager: Changing view acls to: #####
21/03/08 10:51:04 INFO SecurityManager: Changing modify acls to: #####
21/03/08 10:51:04 INFO SecurityManager: Changing view acls groups to:
21/03/08 10:51:04 INFO SecurityManager: Changing modify acls groups to:
21/03/08 10:51:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(#####); groups with view permissions: Set(); users with modify permissions: Set(#####); groups with modify permissions: Set()
21/03/08 10:51:05 INFO Utils: Successfully started service 'sparkDriver' on port 63213.
21/03/08 10:51:05 INFO SparkEnv: Registering MapOutputTracker
21/03/08 10:51:05 INFO SparkEnv: Registering BlockManagerMaster
21/03/08 10:51:05 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/03/08 10:51:05 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/03/08 10:51:05 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/03/08 10:51:05 INFO DiskBlockManager: Created local directory at C:\Users\#####\AppData\Local\Temp\blockmgr-dce03954-27a7-484d-8e54-f552b21433f7
21/03/08 10:51:05 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
21/03/08 10:51:05 INFO SparkEnv: Registering OutputCommitCoordinator
21/03/08 10:51:05 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/03/08 10:51:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://WORKSTATION.DOMAIN.EXT:4040
21/03/08 10:51:05 INFO SparkContext: Added JAR file:///C:/Spark/spark-3.0.0-bin-hadoop2.7/examples/jars/scopt_2.12-3.7.1.jar at spark://WORKSTATION.DOMAIN.EXT:63213/jars/scopt_2.12-3.7.1.jar with timestamp 1615197065578
21/03/08 10:51:05 INFO SparkContext: Added JAR file:///C:/Spark/spark-3.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.12-3.0.0.jar at spark://WORKSTATION.DOMAIN.EXT:63213/jars/spark-examples_2.12-3.0.0.jar with timestamp 1615197065579
21/03/08 10:51:05 INFO Executor: Starting executor ID driver on host WORKSTATION.DOMAIN.EXT
21/03/08 10:51:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 63260.
21/03/08 10:51:05 INFO NettyBlockTransferService: Server created on WORKSTATION.DOMAIN.EXT:63260
21/03/08 10:51:05 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/08 10:51:05 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, NLLR4000250910.solon.prd, 63260, None)
21/03/08 10:51:05 INFO BlockManagerMasterEndpoint: Registering block manager NLLR4000250910.solon.prd:63260 with 366.3 MiB RAM, BlockManagerId(driver, WORKSTATION.DOMAIN.EXT, 63260, None)
21/03/08 10:51:05 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, NLLR4000250910.solon.prd, 63260, None)
21/03/08 10:51:05 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, NLLR4000250910.solon.prd, 63260, None)
21/03/08 10:51:06 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
21/03/08 10:51:06 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
21/03/08 10:51:06 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
21/03/08 10:51:06 INFO DAGScheduler: Parents of final stage: List()
21/03/08 10:51:06 INFO DAGScheduler: Missing parents: List()
21/03/08 10:51:06 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
21/03/08 10:51:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.1 KiB, free 366.3 MiB)
21/03/08 10:51:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1816.0 B, free 366.3 MiB)
21/03/08 10:51:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on WORKSTATION.DOMAIN.EXT:63260 (size: 1816.0 B, free: 366.3 MiB)
21/03/08 10:51:06 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1200
21/03/08 10:51:06 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
21/03/08 10:51:06 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/03/08 10:51:06 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, WORKSTATION.DOMAIN.EXT, executor driver, partition 0, PROCESS_LOCAL, 7393 bytes)
21/03/08 10:51:06 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, WORKSTATION.DOMAIN.EXT, executor driver, partition 1, PROCESS_LOCAL, 7393 bytes)
21/03/08 10:51:06 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/03/08 10:51:06 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/03/08 10:51:06 INFO Executor: Fetching spark://WORKSTATION.DOMAIN.EXT:63213/jars/spark-examples_2.12-3.0.0.jar with timestamp 1615197065579
21/03/08 10:51:06 ERROR Utils: Aborting task
java.io.IOException: Failed to connect to WORKSTATION.DOMAIN.EXT/192.168.#.#:63213
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:392)
at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:360)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:359)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:719)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:535)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:869)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:860)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:860)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:404)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: Permission denied: no further information: WORKSTATION.DOMAIN.EXT/192.168.#.#:63213
Caused by: java.net.SocketException: Permission denied: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Unknown Source)
[snip]
When running it on a different machine with seemingly the same config, where it works fine, this is the trace at the point where the exception is thrown in the trace above:
[snip]
21/03/08 08:00:22 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/03/08 08:00:22 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/03/08 08:00:22 INFO Executor: Fetching spark://WORKSTATION.DOMAIN.EXT:63646/jars/spark-examples_2.12-3.0.0.jar with timestamp 1615186820489
21/03/08 08:00:22 INFO TransportClientFactory: Successfully created connection to WORKSTATION.DOMAIN.EXT/10.121.#.#:63646 after 86 ms (0 ms spent in bootstraps)
21/03/08 08:00:22 INFO Utils: Fetching spark://WORKSTATION.DOMAIN.EXT:63646/jars/spark-examples_2.12-3.0.0.jar to C:\Users\#####\AppData\Local\Temp\spark-54a13d9f-9064-4f34-ba81-af49b18d9a0c\userFiles-24c3eabc-02a4-4aca-8abb-424431c6442f\fetchFileTemp5258763437798623210.tmp
21/03/08 08:00:24 INFO Executor: Adding file:/C:/Users/#####/AppData/Local/Temp/spark-54a13d9f-9064-4f34-ba81-af49b18d9a0c/userFiles-24c3eabc-02a4-4aca-8abb-424431c6442f/spark-examples_2.12-3.0.0.jar to class loader
[snip]
At first this seemed like a firewall issue, but adding the executing java.exe as an exception to the firewall didn't solve it.
Does anyone know what I should try next to get this issue resolved?
Finally I was able to solve it by setting SPARK_LOCAL_IP to localhost in my environment variables: go to your Windows environment variables and set SPARK_LOCAL_IP=localhost.
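For example, either as a persistent user-level variable from a command prompt, or in Spark's Windows config file (the spark-env.cmd path below assumes the install layout from the question):
rem one-off, persists for the current user
setx SPARK_LOCAL_IP localhost
rem or add this line to C:\Spark\spark-3.0.0-bin-hadoop2.7\conf\spark-env.cmd
set SPARK_LOCAL_IP=localhost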

how to properly submit spark jobs on a stand-alone cluster

I just built a Spark 2.0 stand-alone single-node cluster on Ubuntu 14.
I am trying to submit a PySpark job:
~/spark/spark-2.0.0$ bin/spark-submit --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --master spark://ip-10-180-191-14:7077 examples/src/main/python/pi.py
Spark gives me this message:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Here is the complete output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/27 17:45:18 INFO SparkContext: Running Spark version 2.0.0
16/07/27 17:45:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/27 17:45:18 INFO SecurityManager: Changing view acls to: ubuntu
16/07/27 17:45:18 INFO SecurityManager: Changing modify acls to: ubuntu
16/07/27 17:45:18 INFO SecurityManager: Changing view acls groups to:
16/07/27 17:45:18 INFO SecurityManager: Changing modify acls groups to:
16/07/27 17:45:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
16/07/27 17:45:19 INFO Utils: Successfully started service 'sparkDriver' on port 36842.
16/07/27 17:45:19 INFO SparkEnv: Registering MapOutputTracker
16/07/27 17:45:19 INFO SparkEnv: Registering BlockManagerMaster
16/07/27 17:45:19 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e25f3ae9-be1f-4ea3-8f8b-b3ff3ec7e978
16/07/27 17:45:19 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/07/27 17:45:19 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/27 17:45:19 INFO log: Logging initialized @1986ms
16/07/27 17:45:19 INFO Server: jetty-9.2.16.v20160414
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@4674e929{/jobs,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1adab7c7{/jobs/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@26296937{/jobs/job,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7ef4a753{/jobs/job/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1f282405{/stages,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5083cca8{/stages/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@3d8e675e{/stages/stage,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@661b8183{/stages/stage/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@384d9949{/stages/pool,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7665e464{/stages/pool/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@381fc961{/storage,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@2325078{/storage/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@566116a6{/storage/rdd,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@f7e9eca{/storage/rdd/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@496c0a85{/environment,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@59cd2240{/environment/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@747dbf9{/executors,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7c349d15{/executors/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@55259834{/executors/threadDump,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@65ca7ff2{/executors/threadDump/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5c6be8a1{/static,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@4ef1a0c{/,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7df2d69d{/api,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@4b71033e{/stages/stage/kill,null,AVAILABLE}
16/07/27 17:45:19 INFO ServerConnector: Started ServerConnector@646986bc{HTTP/1.1}{0.0.0.0:4040}
16/07/27 17:45:19 INFO Server: Started @2150ms
16/07/27 17:45:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/27 17:45:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.180.191.14:4040
16/07/27 17:45:19 INFO Utils: Copying /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py to /tmp/spark-ee1ceb06-a7c4-4b18-8577-adb02f97f31e/userFiles-565d5e0b-5879-40d3-8077-d9d782156818/pi.py
16/07/27 17:45:19 INFO SparkContext: Added file file:/home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py at spark://10.180.191.14:36842/files/pi.py with timestamp 1469641519759
16/07/27 17:45:19 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://ip-10-180-191-14:7077...
16/07/27 17:45:19 INFO TransportClientFactory: Successfully created connection to ip-10-180-191-14/10.180.191.14:7077 after 25 ms (0 ms spent in bootstraps)
16/07/27 17:45:20 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20160727174520-0006
16/07/27 17:45:20 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39047.
16/07/27 17:45:20 INFO NettyBlockTransferService: Server created on 10.180.191.14:39047
16/07/27 17:45:20 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.180.191.14, 39047)
16/07/27 17:45:20 INFO BlockManagerMasterEndpoint: Registering block manager 10.180.191.14:39047 with 366.3 MB RAM, BlockManagerId(driver, 10.180.191.14, 39047)
16/07/27 17:45:20 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.180.191.14, 39047)
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@2bc4029c{/metrics/json,null,AVAILABLE}
16/07/27 17:45:20 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@60378632{/SQL,null,AVAILABLE}
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@6491578b{/SQL/json,null,AVAILABLE}
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@9ae3f78{/SQL/execution,null,AVAILABLE}
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@3c80379{/SQL/execution/json,null,AVAILABLE}
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@245146b3{/static/sql,null,AVAILABLE}
16/07/27 17:45:20 INFO SharedState: Warehouse path is 'file:/home/ubuntu/spark/spark-2.0.0/spark-warehouse'.
16/07/27 17:45:20 INFO SparkContext: Starting job: reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43
16/07/27 17:45:20 INFO DAGScheduler: Got job 0 (reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43) with 2 output partitions
16/07/27 17:45:20 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43)
16/07/27 17:45:20 INFO DAGScheduler: Parents of final stage: List()
16/07/27 17:45:20 INFO DAGScheduler: Missing parents: List()
16/07/27 17:45:20 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43), which has no missing parents
16/07/27 17:45:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.6 KB, free 366.3 MB)
16/07/27 17:45:21 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.0 KB, free 366.3 MB)
16/07/27 17:45:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.180.191.14:39047 (size: 3.0 KB, free: 366.3 MB)
16/07/27 17:45:21 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/07/27 17:45:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43)
16/07/27 17:45:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/07/27 17:45:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/07/27 17:45:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I'm not running Spark on top of Hadoop or YARN, just by itself, stand-alone.
What can I do to make Spark process these jobs?
Try setting master to local like this, in order to use local mode:
~/spark/spark-2.0.0$ bin/spark-submit --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --master local[2] examples/src/main/python/pi.py
You may also need the --py-files option. See the spark-submit options.
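For example, if pi.py depended on extra local Python modules, you could ship them alongside it (deps.zip here is a hypothetical archive of those modules):
~/spark/spark-2.0.0$ bin/spark-submit --master local[2] --py-files deps.zip examples/src/main/python/pi.py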
Setting the master to local as suggested above will only make your program run in local mode - which is fine for beginners and small loads on a single computer - but it will not run on a cluster.
What you need to do in order to run your program on a real cluster (possibly on multiple machines) is to start a master and slaves using the scripts in the Spark installation's sbin directory:
<spark-install-dir>/sbin/start-master.sh
Your slaves (you must have at least one) should be started using:
<spark-install-dir>/sbin/start-slave.sh spark://<master-address>:7077
This way you will be able to run in real cluster mode - the UI will show you your workers, jobs, etc.
You will see the main UI on port 8080 on the master machine.
Port 4040 on the machine that runs the driver will show you the application UI.
Port 8081 will show you the workers' UI (if you run several slaves on the same machine, the ports will be 8081 for the first, 8082 for the second, etc.).
You can run as many slaves as you want from as many machines - and supply the number of cores for each slave (it is possible to run several slaves on the same machine - just give them an appropriate amount of cores/RAM so that you do not confuse the scheduler). A full sequence is sketched below.
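Putting it together on a single machine, using the paths and hostname from the question, the sequence would look roughly like this:
# start the master (cluster UI on port 8080)
~/spark/spark-2.0.0/sbin/start-master.sh
# start one slave/worker against it (worker UI on port 8081)
~/spark/spark-2.0.0/sbin/start-slave.sh spark://ip-10-180-191-14:7077
# submit the job to the cluster master
~/spark/spark-2.0.0/bin/spark-submit --master spark://ip-10-180-191-14:7077 examples/src/main/python/pi.py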

Scala application using spark cassandra connector hangs

I am developing a test application in IntelliJ using Scala and the Spark Cassandra Connector. Here is my build.sbt:
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
libraryDependencies += "org.apache.spark".%%("spark-sql") % "1.6.1"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0-M2"
I have created a 4-node Cassandra cluster using ccm. The keyspace I created has replication factor 3. Here is the code in my Scala app:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("SparkCassandra")
  // set the Cassandra contact point to your local address
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)
val rdd = sc.cassandraTable("excelsior", "emp") // full-table scan of excelsior.emp
val total = rdd.count()
println(total)
println("exiting now:")
sc.stop()
But the Spark job hangs at the following line:
CassandraConnector: Disconnected from Cassandra cluster: cluster4nodes
Only 3 of the 4 tasks complete. Here is the full log:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/08 14:23:26 INFO SparkContext: Running Spark version 1.6.0
16/07/08 14:23:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/08 14:23:27 WARN Utils: Your hostname, renuka-Inspiron-3542 resolves to a loopback address: 127.0.1.1; using 192.168.1.189 instead (on interface wlan0)
16/07/08 14:23:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/07/08 14:23:27 INFO SecurityManager: Changing view acls to: renuka
16/07/08 14:23:27 INFO SecurityManager: Changing modify acls to: renuka
16/07/08 14:23:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(renuka); users with modify permissions: Set(renuka)
16/07/08 14:23:28 INFO Utils: Successfully started service 'sparkDriver' on port 41027.
16/07/08 14:23:28 INFO Slf4jLogger: Slf4jLogger started
16/07/08 14:23:28 INFO Remoting: Starting remoting
16/07/08 14:23:28 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.189:46329]
16/07/08 14:23:28 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 46329.
16/07/08 14:23:29 INFO SparkEnv: Registering MapOutputTracker
16/07/08 14:23:29 INFO SparkEnv: Registering BlockManagerMaster
16/07/08 14:23:29 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-d1db2ea9-3fba-4b64-ad2b-80deddd3f05a
16/07/08 14:23:29 INFO MemoryStore: MemoryStore started with capacity 1091.3 MB
16/07/08 14:23:29 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/08 14:23:29 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/08 14:23:29 INFO SparkUI: Started SparkUI at http://192.168.1.189:4040
16/07/08 14:23:30 INFO Executor: Starting executor ID driver on host localhost
16/07/08 14:23:30 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40956.
16/07/08 14:23:30 INFO NettyBlockTransferService: Server created on 40956
16/07/08 14:23:30 INFO BlockManagerMaster: Trying to register BlockManager
16/07/08 14:23:30 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40956 with 1091.3 MB RAM, BlockManagerId(driver, localhost, 40956)
16/07/08 14:23:30 INFO BlockManagerMaster: Registered BlockManager
16/07/08 14:23:30 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.2:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.2 (datacenter1)
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.3:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.3 (datacenter1)
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.4:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.4 (datacenter1)
16/07/08 14:23:31 INFO CassandraConnector: Connected to Cassandra cluster: cluster4nodes
16/07/08 14:23:31 INFO SparkContext: Starting job: count at hello.scala:36
16/07/08 14:23:32 INFO DAGScheduler: Got job 0 (count at hello.scala:36) with 4 output partitions
16/07/08 14:23:32 INFO DAGScheduler: Final stage: ResultStage 0 (count at hello.scala:36)
16/07/08 14:23:32 INFO DAGScheduler: Parents of final stage: List()
16/07/08 14:23:32 INFO DAGScheduler: Missing parents: List()
16/07/08 14:23:32 INFO DAGScheduler: Submitting ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18), which has no missing parents
16/07/08 14:23:32 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.2 KB, free 7.2 KB)
16/07/08 14:23:32 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.7 KB, free 10.9 KB)
16/07/08 14:23:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40956 (size: 3.7 KB, free: 1091.2 MB)
16/07/08 14:23:32 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/07/08 14:23:32 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18)
16/07/08 14:23:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
16/07/08 14:23:32 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
16/07/08 14:23:32 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/07/08 14:23:32 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/07/08 14:23:34 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2089 ms on localhost (1/4)
16/07/08 14:23:34 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2179 ms on localhost (2/4)
16/07/08 14:23:34 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2117 ms on localhost (3/4)
16/07/08 14:23:41 INFO CassandraConnector: Disconnected from Cassandra cluster: cluster4nodes
If I create a keyspace with replication factor 4 on the 4-node cluster, the app works fine and never hangs. Am I missing anything in the configuration? Thanks in advance.

Spark not doing any work on slave: Initial job has not accepted any resources

I am trying to do a very simple setup with Spark using SSH tunneling and I can't make it work.
I have the master running on my PC, started with ./sbin/start-master.sh -h localhost -p 7077 (if not stated otherwise, everything else is default).
On my slave PC (IP is 192.168.0.222), which is in another domain and to which I don't have root access, I ran ssh -N -L localhost:7078:localhost:7077 myMasterPCSSHalias and started the slave with ./sbin/start-slave.sh spark://localhost:7078. I can now see this slave on the dashboard at http://localhost:8080/ in my browser, and I see that it has 14GB of free memory.
When I then try e.g. this example:
./bin/spark-submit --master spark://localhost:7077 examples/src/main/python/pi.py 10
it hangs on this message until I kill it (you can see the full log message below):
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I am sure I am not using more resources than I have available; the problem persists even when I use --executor-memory 512m, and the running executor just signals the RUNNING state. The only thing in the error log is this:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/09 22:45:44 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/05/09 22:45:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/09 22:45:45 INFO SecurityManager: Changing view acls to: hnykdan1,dan
16/05/09 22:45:45 INFO SecurityManager: Changing modify acls to: hnykdan1,dan
16/05/09 22:45:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hnykdan1, dan); users with modify permissions: Set(hnykdan1, dan)
and in the slave log is this:
16/05/09 22:48:56 INFO Worker: Asked to launch executor app-20160509224034-0013/0 for PythonPi
16/05/09 22:48:56 INFO SecurityManager: Changing view acls to: hnykdan1
16/05/09 22:48:56 INFO SecurityManager: Changing modify acls to: hnykdan1
16/05/09 22:48:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hnykdan1); users with modify permissions: Set(hnykdan1)
16/05/09 22:48:56 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java" "-cp" "/home/hnykdan1/spark/conf/:/home/hnykdan1/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/home/hnykdan1/spark/lib/datanucleus-core-3.2.10.jar:/home/hnykdan1/spark/lib/datanucleus-api-jdo-3.2.6.jar:/home/hnykdan1/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=37450" "-XX:MaxPermSize=256m" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.0.222:37450" "--executor-id" "0" "--hostname" "147.32.8.103" "--cores" "8" "--app-id" "app-20160509224034-0013" "--worker-url" "spark://Worker@147.32.8.103:54894"
Everything looks quite normal and I don't know where the problem might be. Do I need to tunnel in the other direction as well? It runs fine when I run the slave locally in exactly the same fashion. Thanks.
Full Log from console
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/09 22:28:21 INFO SparkContext: Running Spark version 1.6.1
16/05/09 22:28:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/09 22:28:22 INFO SecurityManager: Changing view acls to: dan
16/05/09 22:28:22 INFO SecurityManager: Changing modify acls to: dan
16/05/09 22:28:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(dan); users with modify permissions: Set(dan)
16/05/09 22:28:22 INFO Utils: Successfully started service 'sparkDriver' on port 34508.
16/05/09 22:28:23 INFO Slf4jLogger: Slf4jLogger started
16/05/09 22:28:23 INFO Remoting: Starting remoting
16/05/09 22:28:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.222:44359]
16/05/09 22:28:23 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 44359.
16/05/09 22:28:23 INFO SparkEnv: Registering MapOutputTracker
16/05/09 22:28:23 INFO SparkEnv: Registering BlockManagerMaster
16/05/09 22:28:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-db4c3293-423f-4966-a479-b69a90439da9
16/05/09 22:28:23 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/09 22:28:23 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/09 22:28:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/09 22:28:24 INFO SparkUI: Started SparkUI at http://192.168.0.222:4040
16/05/09 22:28:24 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d532a9c1-0455-4937-ad27-b47abb2a65e8/httpd-aa031b8c-f605-41c3-aabe-fc4fe01bdcf8
16/05/09 22:28:24 INFO HttpServer: Starting HTTP Server
16/05/09 22:28:24 INFO Utils: Successfully started service 'HTTP file server' on port 41770.
16/05/09 22:28:24 INFO Utils: Copying /home/hnykdan1/spark/examples/src/main/python/pi.py to /tmp/spark-d532a9c1-0455-4937-ad27-b47abb2a65e8/userFiles-14720bed-cd41-4b15-9bd3-38dbf4f268ff/pi.py
16/05/09 22:28:24 INFO SparkContext: Added file file:/home/hnykdan1/spark/examples/src/main/python/pi.py at http://192.168.0.222:41770/files/pi.py with timestamp 1462825704629
16/05/09 22:28:24 INFO AppClient$ClientEndpoint: Connecting to master spark://localhost:7077...
16/05/09 22:28:24 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160509222824-0011
16/05/09 22:28:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44617.
16/05/09 22:28:24 INFO NettyBlockTransferService: Server created on 44617
16/05/09 22:28:24 INFO AppClient$ClientEndpoint: Executor added: app-20160509222824-0011/0 on worker-20160509214654-147.32.8.103-54894 (147.32.8.103:54894) with 8 cores
16/05/09 22:28:24 INFO BlockManagerMaster: Trying to register BlockManager
16/05/09 22:28:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160509222824-0011/0 on hostPort 147.32.8.103:54894 with 8 cores, 1024.0 MB RAM
16/05/09 22:28:24 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.222:44617 with 511.1 MB RAM, BlockManagerId(driver, 192.168.0.222, 44617)
16/05/09 22:28:24 INFO BlockManagerMaster: Registered BlockManager
16/05/09 22:28:25 INFO AppClient$ClientEndpoint: Executor updated: app-20160509222824-0011/0 is now RUNNING
16/05/09 22:28:25 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/05/09 22:28:25 INFO SparkContext: Starting job: reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39
16/05/09 22:28:25 INFO DAGScheduler: Got job 0 (reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39) with 10 output partitions
16/05/09 22:28:25 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39)
16/05/09 22:28:25 INFO DAGScheduler: Parents of final stage: List()
16/05/09 22:28:25 INFO DAGScheduler: Missing parents: List()
16/05/09 22:28:25 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39), which has no missing parents
16/05/09 22:28:26 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.0 KB, free 4.0 KB)
16/05/09 22:28:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.7 KB, free 6.7 KB)
16/05/09 22:28:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.222:44617 (size: 2.7 KB, free: 511.1 MB)
16/05/09 22:28:26 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/05/09 22:28:26 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (PythonRDD[1] at reduce at /home/hnykdan1/spark/examples/src/main/python/pi.py:39)
16/05/09 22:28:26 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
16/05/09 22:28:41 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:28:56 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:29:11 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:29:26 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:29:41 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:29:56 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:30:11 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/05/09 22:30:26 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Since you checked that you have the resources, the next most likely problem is that the executor cannot connect back to the driver. When submitting a job, the driver starts a server that the executor connects to in order to download the jar(s).
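If that is what's happening here, one approach is to pin the driver-side ports and tunnel them from the slave back to the driver machine, mirroring the tunnel already used for the master. This is only a sketch under that assumption; the port values are arbitrary (taken from the logs above), and spark.driver.port, spark.fileserver.port and spark.blockManager.port are the relevant settings in Spark 1.6 (later versions differ):
# on the driver PC: fix the driver ports and advertise localhost to executors
./bin/spark-submit --master spark://localhost:7077 \
  --conf spark.driver.host=localhost \
  --conf spark.driver.port=37450 \
  --conf spark.fileserver.port=41770 \
  --conf spark.blockManager.port=44617 \
  examples/src/main/python/pi.py 10
# on the slave PC: forward those ports to the driver machine as well
ssh -N -L 37450:localhost:37450 -L 41770:localhost:41770 -L 44617:localhost:44617 myMasterPCSSHalias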
Yes, the error message (Initial job has not accepted any resources...) does not look network-related, but network issues are a known cause, discussed for example here:
https://github.com/databricks/spark-knowledgebase/issues/9
It's probably related to the network (security group rules). As a crude test, I just made it work by opening the master and workers to all TCP traffic (inbound/outbound).

Kafka message consumption with spark

I am using the HDP 2.3 sandbox to consume Kafka messages by running a spark-submit job.
I am putting some messages into Kafka as below:
kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic webevent
OR
kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic test --new-producer < myfile.txt
Now I need to consume the above messages from a Spark job, as shown below:
./bin/spark-submit --master spark://192.168.255.150:7077 --executor-memory 512m --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar 192.168.255.150:2181 webevent 10
where 2181 is the ZooKeeper port.
I am getting the error shown below (please guide me on how to consume these messages from Kafka):
16/05/02 15:21:30 INFO SparkContext: Running Spark version 1.3.1
16/05/02 15:21:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/02 15:21:31 INFO SecurityManager: Changing view acls to: root
16/05/02 15:21:31 INFO SecurityManager: Changing modify acls to: root
16/05/02 15:21:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/05/02 15:21:31 INFO Slf4jLogger: Slf4jLogger started
16/05/02 15:21:31 INFO Remoting: Starting remoting
16/05/02 15:21:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@sandbox.hortonworks.com:53950]
16/05/02 15:21:32 INFO Utils: Successfully started service 'sparkDriver' on port 53950.
16/05/02 15:21:32 INFO SparkEnv: Registering MapOutputTracker
16/05/02 15:21:32 INFO SparkEnv: Registering BlockManagerMaster
16/05/02 15:21:32 INFO DiskBlockManager: Created local directory at /tmp/spark-c70b08b9-41a3-42c8-9d83-bc4258e299c6/blockmgr-c2d86de6-34a7-497c-8018-d3437a100e87
16/05/02 15:21:32 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/05/02 15:21:32 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a8f7ade9-292c-42c4-9e54-43b3b3495b0c/httpd-65d36d04-1e2a-4e69-8d20-295465100070
16/05/02 15:21:32 INFO HttpServer: Starting HTTP Server
16/05/02 15:21:32 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/02 15:21:32 INFO AbstractConnector: Started SocketConnector@0.0.0.0:37014
16/05/02 15:21:32 INFO Utils: Successfully started service 'HTTP file server' on port 37014.
16/05/02 15:21:32 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/02 15:21:32 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/02 15:21:32 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/05/02 15:21:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/02 15:21:32 INFO SparkUI: Started SparkUI at http://sandbox.hortonworks.com:4040
16/05/02 15:21:33 INFO SparkContext: Added JAR file:/usr/hdp/2.3.0.0-2130/spark/lib/spark-examples-1.4.1-hadoop2.4.0.jar at http://192.168.255.150:37014/jars/spark-examples-1.4.1-hadoop2.4.0.jar with timestamp 1462202493866
16/05/02 15:21:34 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@192.168.255.150:7077/user/Master...
16/05/02 15:21:34 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160502152134-0000
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor added: app-20160502152134-0000/0 on worker-20160502150437-sandbox.hortonworks.com-36920 (sandbox.hortonworks.com:36920) with 1 cores
16/05/02 15:21:34 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160502152134-0000/0 on hostPort sandbox.hortonworks.com:36920 with 1 cores, 512.0 MB RAM
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor updated: app-20160502152134-0000/0 is now RUNNING
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor updated: app-20160502152134-0000/0 is now LOADING
16/05/02 15:21:34 INFO NettyBlockTransferService: Server created on 43440
16/05/02 15:21:34 INFO BlockManagerMaster: Trying to register BlockManager
16/05/02 15:21:34 INFO BlockManagerMasterActor: Registering block manager sandbox.hortonworks.com:43440 with 265.4 MB RAM, BlockManagerId(<driver>, sandbox.hortonworks.com, 43440)
16/05/02 15:21:34 INFO BlockManagerMaster: Registered BlockManager
16/05/02 15:21:35 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/05/02 15:21:35 INFO VerifiableProperties: Verifying properties
16/05/02 15:21:35 INFO VerifiableProperties: Property group.id is overridden to
16/05/02 15:21:35 INFO VerifiableProperties: Property zookeeper.connect is overridden to
16/05/02 15:21:35 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Error: application failed with exception
org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at scala.util.Either.fold(Either.scala:97)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:415)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:532)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
OR
when I use this:
./bin/spark-submit --master spark://192.168.255.150:7077 --executor-memory 512m --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar 192.168.255.150:6667 webevent 10
where 6667 is Kafka's message-producing (broker) port, I am getting this error:
16/05/02 15:27:26 INFO SimpleConsumer: Reconnect due to socket error: java.nio.channels.ClosedChannelException
Error: application failed with exception
org.apache.spark.SparkException: java.nio.channels.ClosedChannelException
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
I don't know if this can help:
./bin/spark-submit --class consumer.kafka.client.Consumer --master spark://192.168.255.150:7077 --executor-memory 1G lib/kafka-spark-consumer-1.0.6.jar 10
