Spark History Server: list of running jobs

I am using Cloudera 5.4.1 with Spark 1.3.0. When I go to the Spark History Server, I can see a list of completed jobs and a list of incomplete jobs. However, many of the jobs listed as incomplete are ones that were killed.
So how does one see the list of jobs that are actually running, rather than the ones that were killed?
Also, how does one kill a running Spark job using the application ID taken from the History Server?

The following is from the Cloudera documentation:
To access the web application UI of a running Spark application, open http://spark_driver_host:4040 in a web browser. If multiple applications are running on the same host, the web application binds to successive ports beginning with 4040 (4041, 4042, and so on). The web application is available only for the duration of the application.
For CDH 5.4.x
For CDH 5.9.x
To answer your second question: you can use the YARN CLI to kill the Spark application. For example:
yarn application -kill <application ID>
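If you don't already have the application ID, the YARN CLI can also list what is currently running (the state filter below is optional):
yarn application -list -appStates RUNNING
Then pass the chosen ID to yarn application -kill as shown above.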

Related

How do you kill a Spark job from the CLI?

Killing Spark job using command Prompt
This is the thread that I hoped would answer my question, but all four answers explain how to kill the entire application.
How can I stop a single job, a count for example?
I can do it from the Spark web UI by clicking "kill" on the respective job, so I suppose it must also be possible to list running jobs and interact with them directly via the CLI.
Practically speaking, I am working in a notebook with PySpark on a Glue endpoint. If I kill the application, the entire endpoint dies and I have to spin up a new cluster. I just want to stop a job. Cancelling it within the notebook only detaches the notebook from it; the job keeps running, blocking any further commands from being executed.
The Spark History Server provides a REST API, but unfortunately it only exposes monitoring capabilities for applications, jobs, stages, etc.
There is also a REST submission interface that can submit, kill, and check the status of applications. It is undocumented AFAIK, and it is only supported on Spark standalone and Mesos clusters, not on YARN. (That's why there is no "kill" link on the Jobs UI screen for Spark on YARN, I guess.)
So you can try using that "hidden" API, but if you know your application's Spark UI URL and the id of the job you want to kill, the easier way is something like:
$ curl -G http://<Spark-Application-UI-host:port>/jobs/job/kill/?id=<job_id>
Since I don't work with Glue, I'd be interested to find out myself how it's going to react, because the kill normally results in org.apache.spark.SparkException: Job <job_id> cancelled.
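If you first need to look up the id of the job to kill, the documented monitoring REST API on the running application's UI can list the currently running jobs; the id field of each returned job is what the kill URL above expects (host, port and app id below are placeholders):
$ curl "http://<Spark-Application-UI-host:port>/api/v1/applications/<app_id>/jobs?status=running"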
Building on the answer by mazaneicha: for Spark 2.4.6 in standalone mode, with jobs submitted in client mode, the curl request to kill an app with a known application ID is
curl -d "id=<your_appID>&terminate=true" -X POST <your_spark_master_url>/app/kill/
We had a similar problem with people not disconnecting their notebooks from the cluster and hence hogging resources.
We get the list of running applications by parsing the web UI. I'm pretty sure there are less painful ways to manage a Spark cluster...
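One slightly less painful option, assuming a standalone master (host and port below are placeholders), is the master UI's JSON view, which reports workers and running applications in machine-readable form:
$ curl http://<spark-master-host>:8080/json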
List the job in Linux and kill it.
I would do
ps -ef | grep spark-submit
if it was started using spark-submit. Get the PID from the output and then
kill -9 <PID>
Kill a running job as follows:
Open the Spark application UI.
Go to the Jobs tab.
Find the job among the running jobs.
Click the kill link and confirm.
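Note that the kill link only shows up when spark.ui.killEnabled is true (it is true by default); if it is missing, it can be enabled at submit time, for example (everything except the flag is a placeholder):
spark-submit --conf spark.ui.killEnabled=true <other options> <your application>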

How can I see the aggregated logs for a Spark standalone cluster

With Spark running on YARN, I could simply use yarn logs -applicationId <appId> to see the aggregated log after a Spark job is finished. What is the equivalent for a Spark standalone cluster?
Via the Web Interface:
Spark's standalone mode offers a web-based user interface to monitor the cluster. The master and each worker has its own web UI that shows cluster and job statistics. By default you can access the web UI for the master at port 8080. The port can be changed either in the configuration file or via command-line options.
In addition, detailed log output for each job is also written to the work directory of each slave node (SPARK_HOME/work by default). You will see two files for each job, stdout and stderr, with all output it wrote to its console.
Please find more information in Monitoring and Instrumentation.
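For example, on a worker node you can inspect the per-application, per-executor logs directly under that work directory (the application and executor ids below are placeholders):
ls $SPARK_HOME/work/<app-id>/<executor-id>/
tail -f $SPARK_HOME/work/<app-id>/<executor-id>/stderr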

Where can I see how many Spark jobs are running on a server?

I submitted a Spark job on a Linux server and can see from the console whether it is running or not.
But in production, multiple Spark jobs are submitted and running on the server.
So in that case, where can I see how many Spark jobs are running?
You can get the list of running applications from the command line (assuming that you are using YARN):
yarn application --list
More about YARN command-line operations.
Every SparkContext launches a web UI, by default on port 4040 of the host you submit your application from. For more application-monitoring details check this link.
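If you want something scriptable rather than a browser, each driver UI also exposes the monitoring REST API; for example (the host is a placeholder):
curl http://<driver-host>:4040/api/v1/applications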

What is the difference between web UIs on 4040 and 8080?

There are two different web UIs (one is for standalone mode only). Can I use the web UI on port 4040 when I am launching Spark in standalone mode? (For example: with spark-class.cmd org.apache.spark.deploy.master.Master the web UI on 8080 is working, but 4040 is not.) What is the main difference between these UIs?
Is it possible for me to launch Spark (without Hadoop, HDFS, YARN, etc.), keep it up, and submit my jars (classes) to it? I want to watch job statistics after a job finishes. I am trying something like this:
Server: Spark\bin>spark-class.cmd org.apache.spark.deploy.master.Master
Worker: Spark\bin>spark-class.cmd org.apache.spark.deploy.worker.Worker spark://169.254.8.45:7077 --cores 4 --memory 512M
Submit: Spark\bin>spark-submit.cmd --class demo.TreesSample --master spark://169.254.8.45:7077 file:///E:/spark-demo/target/demo.jar
It runs and brings up a new web UI on port 4040 for this task, but I don't see anything in the Master's UI on 8080.
Currently I'm using Win7 x64 with spark-1.5.2-bin-hadoop2.6. I can switch to Linux if it matters.
You should be able to change the web UI port for standalone Master using spark.master.ui.port or SPARK_MASTER_WEBUI_PORT as described in Configuring Ports for Network Security / Standalone mode only.
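For example, with the way you are starting the master, the web UI port could also be overridden directly on the command line (8081 is just an illustration):
Spark\bin>spark-class.cmd org.apache.spark.deploy.master.Master --webui-port 8081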
The standalone Master's web UI is the management console of a cluster manager (one that happens to be part of Apache Spark, but could have been a separate product, like Hadoop YARN or Apache Mesos). Having said that, it can be confusing what the two web UIs have in common, and the answer is: nothing.
The Spark driver's web UI shows the progress of your computations (jobs, stages, storage for RDD persistence, broadcasts, accumulators), while the standalone Master's web UI tells you the current state of your "operating environment" (aka the Spark Standalone cluster).
I leave the other part of your question, about the History Server, to @Sumit's answer.
Yes, you can launch Spark as a standalone server, without Hadoop or HDFS. As soon as you submit your job to the master, the job will show up in the master's UI, either under the running or the completed applications.
You can also enable the History Server to preserve job statistics and analyze them at a later time:
./sbin/start-history-server.sh
Refer here for more details on enabling the History Server.
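One thing to keep in mind: the History Server only shows applications that wrote event logs, so the job also has to be submitted with event logging enabled, and spark.history.fs.logDirectory has to point at the same directory. A minimal sketch, using the default local directory as an illustration:
spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:///tmp/spark-events <your application>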

How can I verify that DSE Spark Shell is distributing across the cluster

Is it possible to verify from within the Spark shell whether it is connected to the cluster or just running in local mode? I'm hoping to use that to investigate the following problem:
I've used DSE to set up a small 3-node Cassandra Analytics cluster. I can log onto any of the 3 servers, run dse spark, and bring up the Spark shell. I have also verified that all 3 servers have the Spark master configured by running dsetool sparkmaster.
However, when I run any task using the Spark shell, it appears that it is only running locally. I ran a small test command:
val rdd = sc.cassandraTable("test", "test_table")
rdd.count
When I check the Spark Master web page, I see that only one server is running the job.
I suspect that when I run dse spark, it is running the shell in local mode. I looked up how to specify a master for the Spark 0.9.1 shell, and even when I use MASTER=<sparkmaster> dse spark (from the Programming Guide), it still runs only in local mode.
Here's a walkthrough, assuming you've started a DSE 4.5.1 cluster with 3 nodes, all set to Analytics (Spark) mode.
Once the cluster is up and running, you can determine which node is the Spark Master with the command dsetool sparkmaster. This command just prints the current master; it does not affect which node is the master and does not start or stop it.
Point a web browser to the Spark Master web UI at the given IP address on port 7080. You should see 3 workers in the ALIVE state and no running applications. (You may see some DEAD workers or completed applications if previous Spark jobs have run on this cluster.)
Now, on one node, bring up the Spark shell with dse spark. If you check the Spark Master web UI, you should see one running application named "Spark shell". It will probably show 1 core allocated (the default).
If you click the application ID link ("app-2014..."), you'll see the details for that app, including one executor (worker). Any commands you give the Spark shell will run on this worker.
The default configuration limits the Spark master to allocating only 1 core per application, so the work will only be given to a single node.
To change this, log in to the Spark master node and edit (with sudo) the file /etc/dse/spark/spark-env.sh. Find the line that sets SPARK_MASTER_OPTS and remove the -Dspark.deploy.defaultCores=1 portion. Then restart DSE on this node (sudo service dse restart).
Once it comes back up, check the Spark Master web UI and repeat the test with the Spark shell. You should see that it has been allocated more cores, and any jobs it performs will run on multiple nodes.
In a production environment you'd want to set the number of cores more carefully so that a single job doesn't take all the resources.
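For instance, rather than deleting the setting entirely, you could keep a more generous per-application default on the master. A sketch of what the edited line in /etc/dse/spark/spark-env.sh might look like (the exact line in your file may differ, and 4 is just an illustration):
export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -Dspark.deploy.defaultCores=4"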
