How do you kill a Spark job from the CLI? - apache-spark

Killing Spark job using command Prompt
This is the thread that I hoped would answer my question. But all four answers explain how to kill the entire application.
How can I stop a single job, like a count, for example?
I can do it from the Spark web UI by clicking "kill" on the respective job. I suppose it must also be possible to list running jobs and interact with them directly via the CLI.
Practically speaking, I am working in a notebook with PySpark on a Glue endpoint. If I kill the application, the entire endpoint dies and I have to spin up a new cluster. I just want to stop a job. Cancelling it within the notebook only detaches the notebook from the job; the job keeps running and blocks any further commands from being executed.

The Spark History Server provides a REST API. Unfortunately, it only exposes monitoring capabilities for applications, jobs, stages, etc.
There is also a REST submission interface that can submit, kill, and check the status of applications. It is undocumented AFAIK, and is only supported on Spark standalone and Mesos clusters, not on YARN. (That's why there is no "kill" link in the Jobs UI for Spark on YARN, I guess.)
So you can try using that "hidden" API, but if you know your application's Spark UI URL and the id of the job you want to kill, the easier way is something like:
$ curl -G http://<Spark-Application-UI-host:port>/jobs/job/kill/?id=<job_id>
Since I don't work with Glue, I'd be interested to find out myself how it's going to react, because the kill normally results in org.apache.spark.SparkException: Job <job_id> cancelled.
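
For what it's worth, here is a rough notebook-side equivalent of that curl call, as an untested sketch only: uiWebUrl and statusTracker() are standard SparkContext members, the requests library is assumed to be available, the UI's kill links must be enabled (spark.ui.killEnabled), and whether any of this is reachable on a Glue endpoint is exactly the open question above.

import requests

# Assumes the notebook exposes the SparkContext as sc (use spark.sparkContext otherwise).
ui = sc.uiWebUrl                                  # e.g. http://<driver-host>:4040
active = sc.statusTracker().getActiveJobsIds()    # ids of currently running jobs
print("Active job ids:", active)

if active:
    # Same endpoint as the curl example above; kills the first active job.
    requests.get(f"{ui}/jobs/job/kill/", params={"id": active[0]})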

Building on the answer by mazaneicha, it appears that, for Spark 2.4.6 in standalone mode with jobs submitted in client mode, the curl request to kill an app with a known application ID is
curl -d "id=<your_appID>&terminate=true" -X POST <your_spark_master_url>/app/kill/
We had a similar problem with people not disconnecting their notebooks from the cluster and hence hogging resources.
We get the list of running applications by parsing the web UI. I'm pretty sure there are less painful ways to manage a Spark cluster...
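
For reference, a minimal sketch of that approach in Python, under the assumption that the standalone master also serves its status as JSON at /json/ (worth verifying on your Spark version); the master URL is a placeholder, and the kill endpoint is the one from the curl example above.

import requests

master_url = "http://your-spark-master:8080"   # placeholder standalone master UI

# List the running applications instead of scraping the HTML page.
status = requests.get(f"{master_url}/json/").json()
running = status.get("activeapps", [])
for app in running:
    print(app["id"], app["name"])

# Kill one of them (e.g. an application left behind by an idle notebook).
if running:
    requests.post(f"{master_url}/app/kill/",
                  data={"id": running[0]["id"], "terminate": "true"})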

List the job's process in Linux and kill it.
If it was started using spark-submit, I would do
ps -ef | grep spark-submit
Get the PID from the output and then
kill -9 <PID>

Kill a running job by:
Open the Spark application UI.
Go to the Jobs tab.
Find the job among the running jobs.
Click the kill link and confirm.

Related

Unable to kill Running Queries in Spark UI and Write to sql server never happens

I have a simple piece of code which reads an entire Hive table and loads it into SQL Server from Azure Databricks.
df = spark.sql("select * from Mechanics")
df.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite").option("url", sql_connection_properties) \
    .option("dbtable", glb.mechanics_invoices) \
    .option("user", username).option("password", pwd) \
    .option("bulkCopyBatchSize", 10000).save()
On executing this cell, the command keeps running for minutes and I am unable to kill it, because in the Spark UI neither a job nor a stage gets created. I can only see Running Queries (1), with no option to kill.
If I try the same for another table, or for glb.mechanics_invoices_temp, it runs successfully.
I tried to check for locks in SQL Server, but I don't have the privileges to do so.
What mistake am I making here?
The Hive table contains 2700490 rows.
How can I kill such processes, either from PySpark code or from the UI?
Any suggestions to make the above code performant and fail-safe, especially by using options like bulkCopyTableLock, bulkCopyTimeout, partitions, etc.?
Any resources to understand the above situation better?
I appreciate your feedback and suggestions.
Thank you.
How to kill such processes either by pyspark code or from UI
You can kill the Spark application from the web UI by accessing the application master page of the Spark job:
Open the Spark cluster UI (master).
Expand Running Applications.
Find the job you want to kill.
Select kill to stop the job.
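
If you need to do it from PySpark code rather than the UI, one option (a sketch, not part of the original answer) is to tag the write with a job group before it starts, so it can be cancelled from elsewhere. setJobGroup and cancelJobGroup are standard SparkContext methods; the group name is just a placeholder, and note that if the write hangs before any Spark job is even submitted, as described above, cancelling the group may not help.

sc = spark.sparkContext

# Tag everything submitted from this thread with a job group before writing.
sc.setJobGroup("mechanics-write", "bulk copy to SQL Server", interruptOnCancel=True)
df.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", sql_connection_properties) \
    .option("dbtable", glb.mechanics_invoices) \
    .option("user", username).option("password", pwd) \
    .save()

# Later, from another cell or notebook attached to the same cluster:
sc.cancelJobGroup("mechanics-write")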

How to shutdown a spark application without knowing driver Id via spark-submit

We are using DSE Analytics.
I am trying to schedule a Spark job with crontab, via spark-submit. Basically this job should run every night.
When the job is submitted on subsequent runs, the existing application should be killed first; I am having trouble finding a way to do that,
because I am unable to find the application ID or the driver ID of the submitted job, so I cannot shut it down gracefully.
I understand that the Spark Master web UI can be used to find the submission ID, but if I am going to set up a cron job for this, I can't get the ID from the UI.
Is there a proper way to do this?
We are running DSE 6.7 with Analytics running in a dedicated DC.
Any help would be appreciated.
Because you're running it this way, the driver is deployed in client mode, meaning that it executes on your local machine, so you can kill it with just the kill command. You can find the process ID with something like this:
ps -aef|grep com.spark.Test|grep -v grep|awk '{print $2}'
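
If this has to happen unattended from cron, the same idea can be wrapped in a small script. The sketch below just automates the ps/grep pipeline from the answer; the class name, jar path, and choice of SIGTERM are placeholders and assumptions.

import os
import signal
import subprocess

# Find PIDs of any driver still running for this job (same pattern as above).
pids = subprocess.run(
    "ps -aef | grep com.spark.Test | grep -v grep | awk '{print $2}'",
    shell=True, capture_output=True, text=True).stdout.split()

for pid in pids:
    os.kill(int(pid), signal.SIGTERM)   # stop the previous night's driver

# Then submit tonight's run (placeholder class and jar path).
subprocess.run(["spark-submit", "--class", "com.spark.Test", "/path/to/job.jar"])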

Kill Spark Job or terminate EMR Cluster if job takes longer than expected

I have a Spark job that periodically hangs, leaving my AWS EMR cluster in a state where an application is RUNNING but the cluster is really stuck. I know that if my job doesn't get stuck, it'll finish in 5 hours or less. If it's still running after that, it's a sign that the job is stuck. YARN and the Spark UI are still responsive; it's just that an executor gets stuck on a task.
Background: I'm using an ephemeral EMR cluster that performs only one step before terminating, so it's not a problem to kill it off if I notice this job is hanging.
What's the easiest way to kill the task, job, or cluster in this case? Ideally this would not involve setting up some extra service to monitor the job; ideally there would be some kind of Spark / YARN / EMR setting I could use.
Note: I've tried using spark speculation to unblock the stuck spark job, but that doesn't help.
EMR has a Bootstrap Actions feature where you can run scripts that start up when the cluster is initialized. I've used this feature along with a startup script that monitors how long the cluster has been online and terminates it after a certain time.
I use a script based on this one for the bootstrap action: https://github.com/thomhopmans/themarketingtechnologist/blob/master/6_deploy_spark_cluster_on_aws/files/terminate_idle_cluster.sh
Basically, make a script that checks /proc/uptime to see how long the EC2 machine has been online; once the uptime surpasses your time limit, you can send a shutdown command to the cluster.
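
A minimal sketch of that watchdog in Python (the shell script linked above does essentially the same thing). The 5-hour limit mirrors the question, and shutting the node down to end the run is an assumption carried over from the linked script.

import subprocess
import time

MAX_UPTIME_SECONDS = 5 * 60 * 60   # "5 hours or less", per the question

while True:
    # The first field of /proc/uptime is the machine's uptime in seconds.
    with open("/proc/uptime") as f:
        uptime = float(f.read().split()[0])
    if uptime > MAX_UPTIME_SECONDS:
        # Shut the node down; on an ephemeral, single-step cluster this
        # effectively ends the run.
        subprocess.run(["sudo", "shutdown", "-h", "now"])
        break
    time.sleep(60)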

From where can I see how many Spark jobs are running on the server?

I submitted a Spark job on a Linux server and can see in the console whether it is running or not.
But in production, multiple Spark jobs are submitted and running on the server.
So in that case, where can I see how many Spark jobs are running?
You can get the list of running applications from the command line (assuming that you are using YARN):
yarn application --list
More about YARN command-line operations can be found in the YARN commands documentation.
Every SparkContext launches a web UI, by default on port 4040, on the host from which you submit your application. For more application-monitoring details, see the Spark monitoring documentation.
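
If you prefer to query this programmatically rather than from the CLI, the YARN ResourceManager also exposes the application list over REST. A sketch, with the ResourceManager host and port as placeholders:

import requests

rm = "http://resourcemanager-host:8088"   # placeholder ResourceManager address

resp = requests.get(f"{rm}/ws/v1/cluster/apps", params={"states": "RUNNING"})
resp.raise_for_status()

# "apps" is null when nothing is running, hence the defensive get().
for app in (resp.json().get("apps") or {}).get("app", []):
    print(app["id"], app["name"], app["state"])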

Spark History Server ... list of running jobs

I am using Cloudera 5.4.1 with Spark 1.3.0. When I go to the Spark History Server, I can see a list of completed jobs and a list of incomplete jobs.
However, many of the jobs listed as incomplete are ones which were killed.
So how does one see the list of "running" jobs, not the ones which were killed?
Also, how does one kill a running Spark job, given the application ID from the History Server?
The following is from the Cloudera documentation:
To access the web application UI of a running Spark application, open http://spark_driver_host:4040 in a web browser. If multiple applications are running on the same host, the web application binds to successive ports beginning with 4040 (4041, 4042, and so on). The web application is available only for the duration of the application.
See the version-specific Cloudera documentation for CDH 5.4.x and CDH 5.9.x.
Answer to your second question:
You can use the YARN CLI to kill the Spark application, e.g.:
yarn application -kill <application ID>
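
For completeness, the ResourceManager REST API has a counterpart to that command (a sketch; the ResourceManager address and application ID are placeholders, and the cluster must permit state changes through the REST API):

import requests

rm = "http://resourcemanager-host:8088"       # placeholder ResourceManager address
app_id = "application_1234567890123_0001"     # placeholder application ID

resp = requests.put(f"{rm}/ws/v1/cluster/apps/{app_id}/state",
                    json={"state": "KILLED"})
print(resp.status_code, resp.text)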
