Spark fails when running the pi.py example in yarn-client mode - apache-spark

I can successfully run the Java version of the Pi example as follows:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
However, the Python version fails with the error output below. I used yarn-client mode; the pyspark shell in yarn-client mode returns the same error. Can anyone help me figure out this problem?
nlp@yyy2:~/spark$ ./bin/spark-submit --master yarn-client examples/src/main/python/pi.py
15/01/05 17:22:26 INFO spark.SecurityManager: Changing view acls to: nlp
15/01/05 17:22:26 INFO spark.SecurityManager: Changing modify acls to: nlp
15/01/05 17:22:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp)
15/01/05 17:22:26 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/01/05 17:22:26 INFO Remoting: Starting remoting
15/01/05 17:22:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#yyy2:42747]
15/01/05 17:22:26 INFO util.Utils: Successfully started service 'sparkDriver' on port 42747.
15/01/05 17:22:26 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/05 17:22:26 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/05 17:22:26 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150105172226-aeae
15/01/05 17:22:26 INFO storage.MemoryStore: MemoryStore started with capacity 265.1 MB
15/01/05 17:22:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/05 17:22:27 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-cbe0079b-79c5-426b-b67e-548805423b11
15/01/05 17:22:27 INFO spark.HttpServer: Starting HTTP Server
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/05 17:22:27 INFO server.AbstractConnector: Started SocketConnector#0.0.0.0:57169
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'HTTP file server' on port 57169.
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/05 17:22:27 INFO server.AbstractConnector: Started SelectChannelConnector#0.0.0.0:4040
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/01/05 17:22:27 INFO ui.SparkUI: Started SparkUI at http://yyy2:4040
15/01/05 17:22:27 INFO client.RMProxy: Connecting to ResourceManager at yyy14/10.112.168.195:8032
15/01/05 17:22:27 INFO yarn.Client: Requesting a new application from cluster with 6 NodeManagers
15/01/05 17:22:27 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/05 17:22:27 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/05 17:22:27 INFO yarn.Client: Setting up container launch context for our AM
15/01/05 17:22:27 INFO yarn.Client: Preparing resources for our AM container
15/01/05 17:22:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 24 for xxx on ha-hdfs:hzdm-cluster1
15/01/05 17:22:28 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/lib/spark-assembly-1.2.0-hadoop2.5.2.jar -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/spark-assembly-1.2.0-hadoop2.5.2.jar
15/01/05 17:22:29 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/pi.py
15/01/05 17:22:29 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/05 17:22:29 INFO spark.SecurityManager: Changing view acls to: nlp
15/01/05 17:22:29 INFO spark.SecurityManager: Changing modify acls to: nlp
15/01/05 17:22:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp)
15/01/05 17:22:29 INFO yarn.Client: Submitting application 23 to ResourceManager
15/01/05 17:22:30 INFO impl.YarnClientImpl: Submitted application application_1420444011562_0023
15/01/05 17:22:31 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:31 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1420449749969
final status: UNDEFINED
tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
user: nlp
15/01/05 17:22:32 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:33 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:34 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:35 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:36 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED)
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM#yyy16:52855/user/YarnAM#435880073]
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> yyy14, PROXY_URI_BASES -> http://yyy14:8070/proxy/application_1420444011562_0023), /proxy/application_1420444011562_0023
15/01/05 17:22:36 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/01/05 17:22:37 INFO yarn.Client: Application report for application_1420444011562_0023 (state: RUNNING)
15/01/05 17:22:37 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: yyy16
ApplicationMaster RPC port: 0
queue: root.default
start time: 1420449749969
final status: UNDEFINED
tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
user: nlp
15/01/05 17:22:37 INFO cluster.YarnClientSchedulerBackend: Application application_1420444011562_0023 has started running.
15/01/05 17:22:37 INFO netty.NettyBlockTransferService: Server created on 35648
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/01/05 17:22:37 INFO storage.BlockManagerMasterActor: Registering block manager yyy2:35648 with 265.1 MB RAM, BlockManagerId(<driver>, yyy2, 35648)
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Registered BlockManager
15/01/05 17:22:37 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM#yyy16:52855] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/01/05 17:22:38 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null}
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null}
15/01/05 17:22:38 INFO ui.SparkUI: Stopped Spark web UI at http://yyy2:4040
15/01/05 17:22:38 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Stopped
15/01/05 17:22:39 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/01/05 17:22:39 INFO storage.MemoryStore: MemoryStore cleared
15/01/05 17:22:39 INFO storage.BlockManager: BlockManager stopped
15/01/05 17:22:39 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/01/05 17:22:39 INFO spark.SparkContext: Successfully stopped SparkContext
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/01/05 17:22:57 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
Traceback (most recent call last):
File "/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py", line 29, in <module>
sc = SparkContext(appName="PythonPi")
File "/home/nlp/spark/python/pyspark/context.py", line 105, in __init__
conf, jsc)
File "/home/nlp/spark/python/pyspark/context.py", line 153, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "/home/nlp/spark/python/pyspark/context.py", line 201, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)

If you're running this example on Java 8, this may be due to Java 8's excessive memory allocation strategy: https://issues.apache.org/jira/browse/YARN-4714
You can force YARN to ignore this by setting the following properties in yarn-site.xml:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
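If the underlying cause is indeed the YARN container being killed for exceeding its memory limit, another option (rather than disabling the checks) is to request more container overhead from the Spark side. A minimal sketch with illustrative values in MB, using the Spark 1.x property names:
./bin/spark-submit --master yarn-client \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  examples/src/main/python/pi.py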

Try the deploy-mode parameter, like this:
--deploy-mode cluster
I had a problem like yours, and with this parameter it worked.
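Applied to the command from the question, that looks roughly like the line below. This is only a sketch: in cluster mode the driver runs inside the YARN ApplicationMaster, so any Python traceback ends up in the YARN application logs rather than on the console, and very old releases (such as the 1.2.x line used here) may not support cluster deploy mode for Python applications at all.
./bin/spark-submit --master yarn --deploy-mode cluster examples/src/main/python/pi.py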

I experienced a similar problem using spark-submit and yarn-client (I got the same NPE/stacktrace). Tuning down my memory settings did the trick. It seems to fail like this when you try to allot too much memory. I would start by removing the --executor-memory and --driver-memory switches.
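In terms of the flags used in the (working) Java invocation above, that amounts to dropping the explicit memory switches and letting Spark fall back to its defaults. A rough sketch applying the same idea to the Python job:
./bin/spark-submit --master yarn-client \
  --num-executors 3 \
  --executor-cores 1 \
  --queue thequeue \
  examples/src/main/python/pi.py \
  10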

I reduced the number of cores in the Advanced spark-env to make it work.

I ran into this issue running spark-shell on HDP 2.3 with Spark 1.3.1:
spark-shell \
--master yarn-client \
--driver-memory 4g \
--executor-memory 4g \
--executor-cores 1 \
--num-executors 4
The solution for me was to set the Spark config value:
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557
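For reference, the same value can be passed directly on the command line with --conf (or put in conf/spark-defaults.conf). A sketch mirroring the invocation above; the hdp.version value is specific to that HDP install:
spark-shell \
  --master yarn-client \
  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557 \
  --driver-memory 4g \
  --executor-memory 4g \
  --executor-cores 1 \
  --num-executors 4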

Related

How to solve "no org.apache.spark.deploy.worker.Worker to stop" issue?

I am using a Spark standalone cluster in Google Cloud, composed of 1 master and 4 worker nodes. When I start the cluster I can see the master and workers running, but when I run stop-all I get the following issue. Maybe this is the reason I cannot run spark-submit. How can I solve this issue? The terminal output is below.
sparkuser@master:/opt/spark/logs$ jps
1867 Jps
sparkuser@master:/opt/spark/logs$ start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.master.Master-1-master.out
worker4: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.worker.Worker-1-worker4.out
worker1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.worker.Worker-1-worker1.out
worker2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.worker.Worker-1-worker2.out
worker3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.worker.Worker-1-worker3.out
sparkuser@master:/opt/spark/logs$ jps -lm
1946 sun.tools.jps.Jps -lm
1886 org.apache.spark.deploy.master.Master --host master --port 7077 --webui-port 8080
sparkuser@master:/opt/spark/logs$ cat spark-sparkuser-org.apache.spark.deploy.master.Master-1-master.out
Spark Command: /usr/lib/jvm/jdk1.8.0_202/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host master --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/10/13 04:28:23 INFO Master: Started daemon with process name: 1886#master
22/10/13 04:28:23 INFO SignalUtils: Registering signal handler for TERM
22/10/13 04:28:23 INFO SignalUtils: Registering signal handler for HUP
22/10/13 04:28:23 INFO SignalUtils: Registering signal handler for INT
22/10/13 04:28:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/13 04:28:24 INFO SecurityManager: Changing view acls to: sparkuser
22/10/13 04:28:24 INFO SecurityManager: Changing modify acls to: sparkuser
22/10/13 04:28:24 INFO SecurityManager: Changing view acls groups to:
22/10/13 04:28:24 INFO SecurityManager: Changing modify acls groups to:
22/10/13 04:28:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sparkuser); groups with view permissions: Set(); users with modify permissions: Set(sparkuser); groups with modify permissions: Set()
22/10/13 04:28:24 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
22/10/13 04:28:24 INFO Master: Starting Spark master at spark://master:7077
22/10/13 04:28:24 INFO Master: Running Spark version 3.2.2
22/10/13 04:28:25 INFO Utils: Successfully started service 'MasterUI' on port 8080.
22/10/13 04:28:25 INFO MasterWebUI: Bound MasterWebUI to 127.0.0.1, and started at http://localhost:8080
22/10/13 04:28:25 INFO Master: I have been elected leader! New state: ALIVE
sparkuser@master:/opt/spark/logs$ stop-all.sh
worker2: no org.apache.spark.deploy.worker.Worker to stop
worker4: no org.apache.spark.deploy.worker.Worker to stop
worker1: no org.apache.spark.deploy.worker.Worker to stop
worker3: no org.apache.spark.deploy.worker.Worker to stop
stopping org.apache.spark.deploy.master.Master
sparkuser@master:/opt/spark/logs$

Error while running spark client mode on mesos using docker

We have a 3-node Mesos cluster. The master service was started on machine 1 using the command below:
sudo ./bin/mesos-master.sh --ip=machine1-ip --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --zk=zk://machine1-ip:2181/mesos --quorum=1
and the agent services on the other 2 machines using the commands below:
sudo ./bin/mesos-agent.sh --containerizers=docker --master=zk://machine1-ip:2181/mesos --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --ip=machine2-ip --no-systemd_enable_support
sudo ./bin/mesos-agent.sh --containerizers=docker --master=zk://machine1-ip:2181/mesos --work_dir=/home/mapr/mesos/mesos-1.7.0/build/workDir --ip=machine3-ip --no-systemd_enable_support
The following property was set on machine 1:
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
We are trying to run a Spark job using a Docker image.
Note that we did not set "SPARK_EXECUTOR_URI" on machine 1 because, as we understand it, the executor runs inside the Docker container and not on the slave machine, so this property should not be required.
The spark-submit command used (from machine 1) is below:
/home/mapr/newSpark/spark-2.4.0-bin-hadoop2.7/bin/spark-submit \
--master mesos://machine1:5050 \
--deploy-mode client \
--class com.learning.spark.WordCount \
--conf spark.mesos.executor.docker.image=mesosphere/spark:2.4.0-2.2.1-3-hadoop-2.7 \
/home/mapr/mesos/wordcount.jar hdfs://machine2:8020/hdfslocation/input.txt hdfs://machine2:8020/hdfslocation/output
We are getting the error below on spark-submit:
Mesos task log:
I1211 20:27:55.040856 5996 exec.cpp:162] Version: 1.7.0
I1211 20:27:55.064775 6016 exec.cpp:236] Executor registered on agent 44c2e848-cd06-4546-b0e9-15537084df1b-S1
I1211 20:27:55.068828 6018 executor.cpp:130] Registered docker executor on company-i0058.company.co.in
I1211 20:27:55.069756 6016 executor.cpp:186] Starting task 3
/bin/sh: 1: /home/mapr/newSpark/spark-2.4.0-bin-hadoop2.7/./bin/spark-class: not found
I1211 20:27:57.669881 6017 executor.cpp:736] Container exited with status 127
I1211 20:27:58.672829 6019 process.cpp:926] Stopped the socket accept loop
messages on the terminal:
2018-12-11 20:27:49 INFO SparkContext:54 - Running Spark version 2.4.0
2018-12-11 20:27:49 INFO SparkContext:54 - Submitted application: WordCount
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing view acls to: mapr
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing modify acls to: mapr
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing view acls groups to:
2018-12-11 20:27:49 INFO SecurityManager:54 - Changing modify acls groups to:
2018-12-11 20:27:49 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mapr); groups with view permissions: Set(); users with modify permissions: Set(mapr); groups with modify permissions: Set()
2018-12-11 20:27:49 INFO Utils:54 - Successfully started service 'sparkDriver' on port 48069.
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering MapOutputTracker
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-12-11 20:27:49 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-12-11 20:27:49 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-12-11 20:27:49 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-3a4afff7-b050-45ba-bb50-c9f4ec5cc031
2018-12-11 20:27:49 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-12-11 20:27:49 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-12-11 20:27:49 INFO log:192 - Logging initialized #3157ms
2018-12-11 20:27:50 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2018-12-11 20:27:50 INFO Server:419 - Started #3273ms
2018-12-11 20:27:50 INFO AbstractConnector:278 - Started ServerConnector#1cfd1875{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-12-11 20:27:50 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6f0628de{/jobs,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#2b27cc70{/jobs/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6f6a7463{/jobs/job,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#79f227a9{/jobs/job/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6ca320ab{/stages,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#50d68830{/stages/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#1e53135d{/stages/stage,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6754ef00{/stages/stage/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#619bd14c{/stages/pool,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#323e8306{/stages/pool/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#a23a01d{/storage,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#4acf72b6{/storage/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#7561db12{/storage/rdd,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3301500b{/storage/rdd/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#24b52d3e{/environment,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#15deb1dc{/environment/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#6e9c413e{/executors,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#57a4d5ee{/executors/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#5af5def9{/executors/threadDump,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#3a45c42a{/executors/threadDump/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#36dce7ed{/static,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#4b770e40{/,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#78e16155{/api,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#19868320{/jobs/job/kill,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#50b0bc4c{/stages/stage/kill,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://machine1:4040
2018-12-11 20:27:50 INFO SparkContext:54 - Added JAR file:/home/mapr/mesos/wordcount.jar at spark://machine1:48069/jars/wordcount.jar with timestamp 1544540270193
I1211 20:27:50.557170 7462 sched.cpp:232] Version: 1.7.0
I1211 20:27:50.560644 7454 sched.cpp:336] New master detected at master#machine1:5050
I1211 20:27:50.561132 7454 sched.cpp:356] No credentials provided. Attempting to register without authentication
I1211 20:27:50.571651 7456 sched.cpp:744] Framework registered with 5260e4c8-de1c-4772-b5a7-340480594ef4-0000
2018-12-11 20:27:50 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56351.
2018-12-11 20:27:50 INFO NettyBlockTransferService:54 - Server created on machine1:56351
2018-12-11 20:27:50 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-12-11 20:27:50 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, impetus-i0053.impetus.co.in, 56351, None)
2018-12-11 20:27:50 INFO BlockManagerMasterEndpoint:54 - Registering block manager machine1:56351 with 366.3 MB RAM, BlockManagerId(driver, impetus-i0053.impetus.co.in, 56351, None)
2018-12-11 20:27:50 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, machine1, 56351, None)
2018-12-11 20:27:50 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, machine1, 56351, None)
2018-12-11 20:27:50 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler#73ba6fe6{/metrics/json,null,AVAILABLE,#Spark}
2018-12-11 20:27:50 INFO MesosCoarseGrainedSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2018-12-11 20:27:51 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_STARTING
2018-12-11 20:27:51 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_STARTING
2018-12-11 20:27:51 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 288.1 KB, free 366.0 MB)
2018-12-11 20:27:51 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.1 KB, free 366.0 MB)
2018-12-11 20:27:51 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on machine1:56351 (size: 25.1 KB, free: 366.3 MB)
2018-12-11 20:27:51 INFO SparkContext:54 - Created broadcast 0 from textFile at WordCount.scala:22
2018-12-11 20:27:52 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-12-11 20:27:52 INFO FileInputFormat:249 - Total input paths to process : 1
2018-12-11 20:27:53 INFO deprecation:1173 - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2018-12-11 20:27:53 INFO HadoopMapRedCommitProtocol:54 - Using output committer class org.apache.hadoop.mapred.FileOutputCommitter
2018-12-11 20:27:53 INFO FileOutputCommitter:108 - File Output Committer Algorithm version is 1
2018-12-11 20:27:53 INFO SparkContext:54 - Starting job: runJob at SparkHadoopWriter.scala:78
2018-12-11 20:27:53 INFO DAGScheduler:54 - Registering RDD 3 (map at WordCount.scala:24)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Got job 0 (runJob at SparkHadoopWriter.scala:78) with 2 output partitions
2018-12-11 20:27:53 INFO DAGScheduler:54 - Final stage: ResultStage 1 (runJob at SparkHadoopWriter.scala:78)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 0)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Missing parents: List(ShuffleMapStage 0)
2018-12-11 20:27:53 INFO DAGScheduler:54 - Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24), which has no missing parents
2018-12-11 20:27:53 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_RUNNING
2018-12-11 20:27:53 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 5.0 KB, free 366.0 MB)
2018-12-11 20:27:53 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.9 KB, free 366.0 MB)
2018-12-11 20:27:53 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on machine1:56351 (size: 2.9 KB, free: 366.3 MB)
2018-12-11 20:27:53 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1161
2018-12-11 20:27:53 INFO DAGScheduler:54 - Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
2018-12-11 20:27:53 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-12-11 20:27:53 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_RUNNING
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 0 is now TASK_FAILED
2018-12-11 20:27:54 INFO BlockManagerMaster:54 - Removal of executor 0 requested
2018-12-11 20:27:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 0
2018-12-11 20:27:54 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 0 from BlockManagerMaster.
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 1 is now TASK_FAILED
2018-12-11 20:27:54 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 1 from BlockManagerMaster.
2018-12-11 20:27:54 INFO BlockManagerMaster:54 - Removal of executor 1 requested
2018-12-11 20:27:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 1
2018-12-11 20:27:54 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_STARTING
2018-12-11 20:27:55 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_STARTING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_RUNNING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 2 is now TASK_FAILED
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Blacklisting Mesos slave b92da3e9-a9c4-422a-babe-c5fb0f33e027-S0 due to too many failures; is Spark installed on it?
2018-12-11 20:27:57 INFO BlockManagerMaster:54 - Removal of executor 2 requested
2018-12-11 20:27:57 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 2
2018-12-11 20:27:57 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 2 from BlockManagerMaster.
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_RUNNING
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Mesos task 3 is now TASK_FAILED
2018-12-11 20:27:57 INFO MesosCoarseGrainedSchedulerBackend:54 - Blacklisting Mesos slave 44c2e848-cd06-4546-b0e9-15537084df1b-S1 due to too many failures; is Spark installed on it?
2018-12-11 20:27:57 INFO BlockManagerMaster:54 - Removal of executor 3 requested
2018-12-11 20:27:57 INFO BlockManagerMasterEndpoint:54 - Trying to remove executor 3 from BlockManagerMaster.
2018-12-11 20:27:57 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asked to remove non-existent executor 3
2018-12-11 20:28:08 WARN TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Apache Spark: worker can't connect to master but can ping and ssh from worker to master

I'm trying to set up an 8-node cluster on 8 RHEL 7.3 x86 machines using Spark 2.0.1. start-master.sh goes through fine:
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host lambda.foo.net --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/08 04:26:46 INFO Master: Started daemon with process name: 22181#lambda.foo.net
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for TERM
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for HUP
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for INT
16/12/08 04:26:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/08 04:26:46 INFO SecurityManager: Changing view acls to: root
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls to: root
16/12/08 04:26:46 INFO SecurityManager: Changing view acls groups to:
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls groups to:
16/12/08 04:26:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/12/08 04:26:46 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
16/12/08 04:26:46 INFO Master: Starting Spark master at spark://lambda.foo.net:7077
16/12/08 04:26:46 INFO Master: Running Spark version 2.0.1
16/12/08 04:26:46 INFO Utils: Successfully started service 'MasterUI' on port 8080.
16/12/08 04:26:46 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://19.341.11.212:8080
16/12/08 04:26:46 INFO Utils: Successfully started service on port 6066.
16/12/08 04:26:46 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
16/12/08 04:26:46 INFO Master: I have been elected leader! New state: ALIVE
But when I try to bring up the workers using start-slaves.sh, what I see in the worker logs is:
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://lambda.foo.net:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/08 04:30:00 INFO Worker: Started daemon with process name: 14649#hawk040os4.foo.net
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for TERM
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for HUP
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for INT
16/12/08 04:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/08 04:30:00 INFO SecurityManager: Changing view acls to: root
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls to: root
16/12/08 04:30:00 INFO SecurityManager: Changing view acls groups to:
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls groups to:
16/12/08 04:30:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/12/08 04:30:00 INFO Utils: Successfully started service 'sparkWorker' on port 35858.
16/12/08 04:30:00 INFO Worker: Starting Spark worker 15.242.22.179:35858 with 24 cores, 1510.2 GB RAM
16/12/08 04:30:00 INFO Worker: Running Spark version 2.0.1
16/12/08 04:30:00 INFO Worker: Spark home: /usr/local/bin/spark-2.0.1-bin-hadoop2.7
16/12/08 04:30:00 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/12/08 04:30:00 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://15.242.22.179:8081
16/12/08 04:30:00 INFO Worker: Connecting to master lambda.foo.net:7077...
16/12/08 04:30:00 WARN Worker: Failed to connect to master lambda.foo.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to lambda.foo.net/19.341.11.212:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
... 4 more
Caused by: java.net.NoRouteToHostException: No route to host: lambda.foo.net/19.341.11.212:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/12/08 04:30:12 INFO Worker: Retrying connection to master (attempt # 1)
16/12/08 04:30:12 INFO Worker: Connecting to master lambda.foo.net:7077...
16/12/08 04:30:12 WARN Worker: Failed to connect to master lambda.foo.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
So it says "No route to host", but I can successfully ping the master from the worker node, as well as ssh from the worker to the master node.
Why does Spark say "No route to host"?
Problem solved: the firewall was blocking the packets.
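For anyone hitting the same thing on RHEL 7, a hedged sketch of opening the relevant ports with firewalld on the master node (7077 is the standalone master RPC port and 8080 the master web UI in this setup; adjust to your configuration):
sudo firewall-cmd --permanent --add-port=7077/tcp
sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --reload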

Spark UI is inaccessible from host for sequenceiq/spark:1.6.0

I used the commands given below. I am able to run the Spark Pi example and standalone programs, but when I try to open the Spark UI it is not accessible from the host. Any help?
docker pull sequenceiq/spark:1.6.0
docker build --rm -t sequenceiq/spark:1.6.0 .
docker run -it -p 8088:8088 -p 8042:8042 -p 4040:4040 -h sandbox sequenceiq/spark:1.6.0 bash
docker run -i -t -v /scratch:/mnt -e 8088 -e 8042 -e 4040 sequenceiq/spark:1.6.0 /bin/bash
docker run -d -h sandbox sequenceiq/spark:1.6.0 -d
#run the spark shell
spark-shell \
--master yarn-client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1
#execute the following command, which should return 1000
scala> sc.parallelize(1 to 1000).count()
Spark shell snippet:
16/12/08 22:42:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#172.17.0.5:43371]
16/12/08 22:42:27 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 43371.
16/12/08 22:42:27 INFO spark.SparkEnv: Registering MapOutputTracker
16/12/08 22:42:27 INFO spark.SparkEnv: Registering BlockManagerMaster
16/12/08 22:42:27 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-61bf4e4d-e0f9-48a9-924f-8971c3d916e7
16/12/08 22:42:27 INFO storage.MemoryStore: MemoryStore started with capacity 511.5 MB
16/12/08 22:42:28 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/12/08 22:42:28 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/12/08 22:42:28 INFO server.AbstractConnector: Started SelectChannelConnector#0.0.0.0:4040
16/12/08 22:42:28 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/12/08 22:42:28 INFO ui.SparkUI: Started SparkUI at http://172.17.0.5:4040
16/12/08 22:42:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Try setting SPARK_LOCAL_IP to the host's IP and then try to access the UI.
Got the answer: I was running the Spark shell with YARN as the master, so the job was visible under the ResourceManager UI.
If I run the Spark shell with local as the master, the Spark UI works fine.
Check which master you are using and open the corresponding UI.
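A sketch of the first suggestion, assuming the container is started with -p 4040:4040 as in the question; <host-ip> is a placeholder for the Docker host's address:
export SPARK_LOCAL_IP=<host-ip>   # placeholder; the address the Spark UI should bind/advertise
spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1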

Apache Spark job reading from Cassandra table stalls on launch (spark-1.3.1)

We've been having intermittent issues with Spark 1.3.1 and the datastax Cassandra connector causing jobs to stall indefinitely when they are launched.
EDIT: I also tried the same approach with Spark 1.2.1 and the packaged 1.2.1 spark-cassandra-connector_2.10 and it resulted in the same symptoms.
We are using the following dependency:
var sparkCas = "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.3.0-SNAPSHOT"
Our job code:
// Imports assumed for this snippet (not shown in the original post):
import org.apache.spark.{SparkConf, SparkContext}
import org.joda.time.DateTime
import com.datastax.spark.connector._

object ConnTransform {
  private val AppName = "ConnTransformCassandra"

  def main(args: Array[String]) {
    val start = new DateTime(2015, 5, 27, 1, 0, 0)
    val master = if (args.length >= 1) args(0) else "local[*]"

    // Create the spark context.
    val sc = {
      val conf = new SparkConf()
        .setAppName(AppName)
        .setMaster(master)
        .set("spark.cassandra.connection.host", "10.10.101.202,10.10.102.139,10.10.103.74")
      new SparkContext(conf)
    }

    // Read the Cassandra table, deserialize each row, and write the result out as text files.
    sc.cassandraTable("alpha_dev", "conn")
      .select("data")
      .where("timep = ?", start)
      .where("sensorid IN ?", Utils.sensors)
      .map(Utils.deserializeRow)
      .saveAsTextFile("output/raw_data")
  }
}
As you can see, the code is pretty simple (it used to be more complex, but we've been stripping it down to narrow down the root cause of this issue).
Now, this job worked earlier today - data was successfully written to the directory specified. However, when it is run now, we see the job start, get to the point just before it starts processing blocks, and sit there indefinitely.
The output below shows the log messages seen so far; at the time of writing the job has been stalled for almost an hour. If we set the logging level to DEBUG, the only thing visible after that point are heartbeat pings between Akka workers.
ubuntu@ip-10-10-102-53:~/projects/icespark$ /home/ubuntu/spark/spark-1.3.1/bin/spark-submit --class com.splee.spark.ConnTransform splee-analytics-assembly-0.1.0.jar
15/05/27 21:15:21 INFO SparkContext: Running Spark version 1.3.1
15/05/27 21:15:21 INFO SecurityManager: Changing view acls to: ubuntu
15/05/27 21:15:21 INFO SecurityManager: Changing modify acls to: ubuntu
15/05/27 21:15:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
15/05/27 21:15:22 INFO Slf4jLogger: Slf4jLogger started
15/05/27 21:15:22 INFO Remoting: Starting remoting
15/05/27 21:15:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#ip-10-10-102-53.us-west-2.compute.internal:51977]
15/05/27 21:15:22 INFO Utils: Successfully started service 'sparkDriver' on port 51977.
15/05/27 21:15:22 INFO SparkEnv: Registering MapOutputTracker
15/05/27 21:15:22 INFO SparkEnv: Registering BlockManagerMaster
15/05/27 21:15:22 INFO DiskBlockManager: Created local directory at /tmp/spark-2466ff66-bb50-4d52-9d34-1801d69889b9/blockmgr-60e75214-1ba6-410c-a564-361263636e5c
15/05/27 21:15:22 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/05/27 21:15:22 INFO HttpFileServer: HTTP File server directory is /tmp/spark-72f1e849-c298-49ee-936c-e94c462f3df2/httpd-f81c2326-e5f1-4f33-9557-074f2789c4ee
15/05/27 21:15:22 INFO HttpServer: Starting HTTP Server
15/05/27 21:15:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/27 21:15:22 INFO AbstractConnector: Started SocketConnector#0.0.0.0:55357
15/05/27 21:15:22 INFO Utils: Successfully started service 'HTTP file server' on port 55357.
15/05/27 21:15:22 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/27 21:15:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/27 21:15:22 INFO AbstractConnector: Started SelectChannelConnector#0.0.0.0:4040
15/05/27 21:15:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/27 21:15:22 INFO SparkUI: Started SparkUI at http://ip-10-10-102-53.us-west-2.compute.internal:4040
15/05/27 21:15:22 INFO SparkContext: Added JAR file:/home/ubuntu/projects/icespark/splee-analytics-assembly-0.1.0.jar at http://10.10.102.53:55357/jars/splee-analytics-assembly-0.1.0.jar with timestamp 1432761322942
15/05/27 21:15:23 INFO Executor: Starting executor ID <driver> on host localhost
15/05/27 21:15:23 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver#ip-10-10-102-53.us-west-2.compute.internal:51977/user/HeartbeatReceiver
15/05/27 21:15:23 INFO NettyBlockTransferService: Server created on 58479
15/05/27 21:15:23 INFO BlockManagerMaster: Trying to register BlockManager
15/05/27 21:15:23 INFO BlockManagerMasterActor: Registering block manager localhost:58479 with 265.1 MB RAM, BlockManagerId(<driver>, localhost, 58479)
15/05/27 21:15:23 INFO BlockManagerMaster: Registered BlockManager
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.28:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.28 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.60:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.60 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.154:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.154 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.145:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.145 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.78:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.78 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.200:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.200 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.73:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.73 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.205:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.205 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.205:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.205 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.74:9042 added
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.202:9042 added
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.139:9042 added
15/05/27 21:15:24 INFO CassandraConnector: Connected to Cassandra cluster: Splee Dev
15/05/27 21:15:25 INFO CassandraConnector: Disconnected from Cassandra cluster: Splee Dev
If anyone has any ideas about what could be causing this job (which previously produced results) to stall in this way, and can shed some light on the situation, it would be much appreciated.
