Spark submitted application not shown in YARN web UI - apache-spark

I have a node where I have installed Spark in YARN mode. When I run an application with
sudo ./usr/bin/spark-submit --master yarn --deploy-mode client MySparkCode.py
it runs fine.
When I connect to the Spark history server at http://localhost:18089/ I can see my submitted application.
But when I go to the YARN ResourceManager web UI at http://localhost:8088/cluster/apps, my application does not show up at all. Did I do something wrong? Shouldn't my application be shown there?

Try cluster mode:
sudo ./usr/bin/spark-submit --master yarn --deploy-mode cluster MySparkCode.py
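Either way, you can cross-check what YARN actually knows about from the command line. A minimal check, assuming the yarn CLI is on your PATH (the -appStates filter just makes finished applications visible too):
# List every application the ResourceManager has tracked, including finished ones
yarn application -list -appStates ALL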

Related

What is the command to call Spark2 from the shell

I have two Spark services in my cluster: one named Spark (version 1.6) and another named Spark2 (version 2.0). I am able to start Spark with the command below.
spark-shell --master yarn
But I am not able to connect to the Spark2 service, even after setting export SPARK_MAJOR_VERSION=2.
Can someone help me with this?
I'm using a CDH cluster, and the following command works for me:
spark2-shell --queue <queue-name-if-any> --deploy-mode client
If I remember correctly, SPARK_MAJOR_VERSION only works with spark-submit.
You would need to find the Spark2 installation directory to use the other spark-shell.
It sounds like you are on an HDP cluster, so look under /usr/hdp.
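As a hedged sketch, assuming the usual HDP layout (the spark2-client symlink is the HDP convention; verify the path on your cluster):
# See what is installed under the HDP root
ls /usr/hdp/current/
# Invoke the Spark2 shell through its own installation directory
/usr/hdp/current/spark2-client/bin/spark-shell --master yarn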

How does a MasterNode fit into a Spark cluster?

I'm getting a little confused about how to set up my Spark configuration for workloads using YARN as the resource manager. I've got a small cluster spun up right now with 1 master node and 2 core nodes.
Do I include the master node when calculating the number of executors or no?
Do I leave out 1 core for every node to account for Yarn management?
Am I supposed to designate the master node for anything in particular in Spark configurations?
The master node shouldn't be taken into account when calculating the number of executors.
Each node is actually an EC2 instance with an operating system, so you have to leave one or more cores per node for system tasks and YARN agents (a concrete sizing sketch follows this answer).
The master node can be used to run the Spark driver. To do this, start the EMR job in client mode from the master node by adding the arguments --master yarn --deploy-mode client to the spark-submit command. Keep the following in mind:
Cluster mode allows you to submit work using S3 URIs. Client mode requires that you put the application on the local file system of the cluster master node.
To do all the preparation work (copying libs, scripts, etc. to the master node) you can set up a separate step, and then run the spark-submit --master yarn --deploy-mode client command as the next step.
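To make the sizing concrete, here is a hedged sketch for a hypothetical pair of core nodes with 8 vCPUs and 32 GB of memory each (all numbers and MyApp.py are illustrative, not taken from the question): reserving 1 core and a couple of GB per node for the OS and YARN daemons leaves 7 cores and roughly 28 GB per node for executors.
# Hypothetical sizing: 2 core nodes x 8 vCPUs / 32 GB, master node excluded.
# One 7-core executor per node; memory is kept below the node limit to leave
# room for the YARN memory overhead added on top of --executor-memory.
spark-submit --master yarn --deploy-mode client \
  --num-executors 2 \
  --executor-cores 7 \
  --executor-memory 24g \
  MyApp.py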

Possible to get output from Spark App submitted in cluster mode?

Is it possible to get output from my Spark App submitted in cluster mode? If so, how?
I'm running a simple Spark application using Python. The program just sets up a SparkContext and prints "This app ran successfully" to the screen. When I submit this app with the following:
spark-submit --deploy-mode client --master local[*] foo.py
it runs successfully and prints out the message.
However, when I run the same app with:
spark-submit --deploy-mode cluster --master yarn-cluster foo.py
it runs successfully, but I get no output.
While I've been using Spark for a few months now, I'm relatively new to submitting apps in cluster mode, so any help/documentation would be great!
You can save "This app ran successfully" to an external storage system, such as HDFS:
# Write the message as a single-partition text file to HDFS
sc.parallelize(['This app ran successfully'], 1).saveAsTextFile(path='hdfs:///somewhere/you/want')
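Alternatively, in YARN cluster mode the driver's stdout goes to the YARN container logs rather than to your terminal, so the printed message can usually be retrieved after the fact. A sketch, assuming log aggregation is enabled (the application id below is a placeholder for whatever YARN assigned to your run):
# Find the application id of the finished run
yarn application -list -appStates FINISHED
# Pull the aggregated container logs, which include the driver's stdout
yarn logs -applicationId application_1234567890123_0001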

Spark Mesos Cluster Mode using Dispatcher

I have only a single machine and want to run Spark jobs in Mesos cluster mode. It might make more sense to run with a cluster of nodes, but I mainly want to test out Mesos first to check whether it's able to utilize resources more efficiently (run multiple Spark jobs at the same time without static partitioning). I have tried a number of ways, but without success. Here is what I did:
Built Mesos and ran both the Mesos master and slaves (2 slaves on the same machine):
sudo ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --port=5051 --work_dir=/tmp/mesos1
sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --port=5052 --work_dir=/tmp/mesos2
Ran the spark-mesos-dispatcher:
sudo ./sbin/start-mesos-dispatcher.sh --master mesos://localhost:5050
Then submitted the app with the dispatcher as the master URL:
spark-submit --master mesos://localhost:7077 <other-config> <jar file>
But it doesn't work:
E0925 17:30:30.158846 807608320 socket.hpp:174] Shutdown failed on fd=61: Socket is not connected [57]
E0925 17:30:30.159545 807608320 socket.hpp:174] Shutdown failed on fd=62: Socket is not connected [57]
If I use spark-submit --deploy-mode cluster, then I get another error message:
Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server
It works perfectly if I don't use the dispatcher but use the Mesos master URL directly: --master mesos://localhost:5050 (client mode). According to the documentation, cluster mode is not supported for Mesos clusters, but they give other instructions for cluster mode here. So it's kind of confusing. My questions are:
How can I get it to work?
Should I use client mode instead of cluster mode if I submit the app/jar directly from the master node?
If I have a single computer, should I spawn one or more Mesos slave processes? Basically, I have a number of Spark jobs and don't want to do static partitioning of resources. But when using Mesos without static partitioning, it seems to be much slower?
Thanks.
There seem to be two things you're confusing: launching a Spark application in a cluster (as opposed to locally) and launching the driver into the cluster.
From the top of Submitting Applications:
The spark-submit script in Spark’s bin directory is used to launch applications on a cluster. It can use all of Spark’s supported cluster managers through a uniform interface so you don’t have to configure your application specially for each one.
So, Mesos is one of the supported cluster managers and hence you can run Spark apps on a Mesos cluster.
What Mesos, at the time of writing, does not support is launching the driver into the cluster; this is what the command-line argument --deploy-mode of ./bin/spark-submit specifies. Since the default value of --deploy-mode is client, you can just omit it, or if you want to specify it explicitly, use:
./bin/spark-submit --deploy-mode client ...
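Applied to your setup, a minimal client-mode submission straight to the Mesos master (skipping the dispatcher) would look like the following; the class name and jar are placeholders for your own application:
./bin/spark-submit --deploy-mode client \
  --master mesos://127.0.0.1:5050 \
  --class "SimpleApp" SimpleApp.jar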
I tried your scenario, and it works.
One thing I did differently: I used the machine's IP address instead of "localhost" and "127.0.0.1".
So just try again, and check in a browser whether http://your_dispatcher:8081 exists.
This is my spark-submit command:
$spark-submit --deploy-mode cluster --master mesos://192.168.11.79:7077 --class "SimpleApp" SimpleAppV2.jar
If it succeeds, you will see something like the following:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151006164749-0001",
  "success" : true
}
When I got the same error log as yours, I rebooted the machine and retried your steps. It also worked.
Try using port 6066 instead of 7077. Newer versions of Spark prefer the REST API for submitting jobs.
See https://issues.apache.org/jira/browse/SPARK-5388
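A hedged sketch of that suggestion, reusing the host and jar from the answer above (both are placeholders for your own setup, and the port is worth verifying in the dispatcher's startup log):
spark-submit --deploy-mode cluster \
  --master mesos://192.168.11.79:6066 \
  --class "SimpleApp" SimpleAppV2.jar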

Spark - How to run a standalone cluster locally

Is it possible to run a Spark standalone cluster locally on just one machine (which is fundamentally different from just developing jobs locally, i.e., local[*])?
So far I have been running 2 different VMs to build a cluster; what if I could run a standalone cluster on the very same machine, having for instance three different JVMs running?
Could something like having multiple loopback addresses do the trick?
Yes, you can do it: launch one master and one worker node and you are good to go.
Launch the master:
./sbin/start-master.sh
Launch a worker:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077 -c 1 -m 512M
Run the SparkPi example:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://localhost:7077 lib/spark-examples-1.2.1-hadoop2.4.0.jar
Apache Spark Standalone Mode Documentation
A small update: as of the latest version (2.1.0), the default is to bind the master to the hostname, so when starting a worker locally, use the output of hostname:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://`hostname`:7077 -c 1 -m 512M
And to run an example, simply run the following command:
bin/run-example SparkPi
If you can't find the ./sbin/start-master.sh file on your machine, you can also start the master with:
./bin/spark-class org.apache.spark.deploy.master.Master
More simply:
./sbin/start-all.sh
On your local machine a master and one worker will be launched.
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://localhost:7077 \
examples/jars/spark-examples_2.12-3.0.1.jar 10000
This submits a sample application. For monitoring via the web UIs:
Master UI: http://localhost:8080
Worker UI: http://localhost:8081
Application UI: http://localhost:4040
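If you want something closer to a real cluster, with several worker JVMs on the same machine, a minimal sketch is to set SPARK_WORKER_INSTANCES in conf/spark-env.sh before running start-all.sh (the core and memory values here are illustrative, and this variable is deprecated in recent Spark releases, so check your version's documentation):
# conf/spark-env.sh -- illustrative values, adjust to your machine
export SPARK_WORKER_INSTANCES=3   # launch three worker JVMs locally
export SPARK_WORKER_CORES=1       # cores per worker
export SPARK_WORKER_MEMORY=1g     # memory per worker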
