I have been killing YARN applications using the command yarn application -kill <app_id>.
I submitted a job which is currently in the NEW_SAVING state and I want to kill it.
When I try yarn application -kill I get the message below continuously:
INFO impl.YarnClientImpl: Waiting for application application_XXXX_XXXX to be killed.
Any idea how I can kill it forcefully?
The output of 'yarn application -list' contains the following information about YARN applications:
Application-Id
Application-Name
Application-Type
User
Queue
State
Final-State
Progress
Tracking-URL
You can list the applications and filter with awk on the required column. For example, to list the IDs of applications in the NEW_SAVING state:
yarn application -list | awk '$6 == "NEW_SAVING" { print $1 }' > applications_list.txt
Then you can iterate through the file and kill the applications as below:
while read -r p; do
  echo "$p"
  yarn application -kill "$p"
done < applications_list.txt
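If you prefer a one-liner, the same list-and-kill flow can be collapsed into a single pipeline (a sketch assuming GNU xargs, since -r, which skips the kill when the list is empty, is a GNU extension):
yarn application -list | awk '$6 == "NEW_SAVING" { print $1 }' | xargs -r -n 1 yarn application -kill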
Do you have access to the YARN Cluster UI? You could kill the application from the UI. Usually, that works better for me than yarn application -kill.
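If the UI isn't reachable, another option worth trying is the ResourceManager's application state REST endpoint (a sketch, assuming the RM listens on <rm-host>:8088 and no extra authentication is required; replace application_XXXX_XXXX with your application id):
# ask the ResourceManager to move the application to the KILLED state
curl -X PUT -H "Content-Type: application/json" -d '{"state":"KILLED"}' "http://<rm-host>:8088/ws/v1/cluster/apps/application_XXXX_XXXX/state"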
I am adding Airflow to a web application that manually adds a directory containing business logic to the PYTHON_PATH env var, as well as doing additional system-level setup that I want to be consistent across all servers in my cluster. I've been successfully running Celery for this application with RMQ as the broker and Redis as the task results backend for a while, and have prior experience running Airflow with LocalExecutor.
Instead of using Puckel's image, I have an entrypoint for a base backend image that runs a different service based on the SERVICE env var. It looks like this:
if [ $SERVICE == "api" ]; then
# upgrade to the data model
flask db upgrade
# start the web application
python wsgi.py
fi
if [ $SERVICE == "worker" ]; then
celery -A tasks.celery.celery worker --loglevel=info --uid=nobody
fi
if [ $SERVICE == "scheduler" ]; then
celery -A tasks.celery.celery beat --loglevel=info
fi
if [ $SERVICE == "airflow" ]; then
airflow initdb
airflow scheduler
airflow webserver
fi
I have an .env file that I build the containers with, which defines my Airflow parameters:
AIRFLOW_HOME=/home/backend/airflow
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN=mysql+pymysql://${MYSQL_USER}:${MYSQL_ROOT_PASSWORD}@${MYSQL_HOST}:${MYSQL_PORT}/airflow?charset=utf8mb4
AIRFLOW__CELERY__BROKER_URL=amqp://${RABBITMQ_DEFAULT_USER}:${RABBITMQ_DEFAULT_PASS}@${RABBITMQ_HOST}:5672
AIRFLOW__CELERY__RESULT_BACKEND=redis://${REDIS_HOST}
With how my entrypoint is set up currently, it doesn't make it to the webserver. Instead, it runs the scheduler in the foreground without ever invoking the webserver. I can change this to
airflow initdb
airflow scheduler -D
airflow webserver
Now the webserver runs, but it isn't aware of the scheduler, which is now running as a daemon.
Airflow does, however, know that I'm using a CeleryExecutor and looks for the dags in the right place:
airflow | [2020-07-29 21:48:35,006] {default_celery.py:88} WARNING - You have configured a result_backend of redis://redis, it is highly recommended to use an alternative result_backend (i.e. a database).
airflow | [2020-07-29 21:48:35,010] {__init__.py:50} INFO - Using executor CeleryExecutor
airflow | [2020-07-29 21:48:35,010] {dagbag.py:396} INFO - Filling up the DagBag from /home/backend/airflow/dags
airflow | [2020-07-29 21:48:35,113] {default_celery.py:88} WARNING - You have configured a result_backend of redis://redis, it is highly recommended to use an alternative result_backend (i.e. a database).
I can solve this by going inside the container and manually firing up the scheduler.
The trick seems to be running both processes in the foreground within the container, but I'm stuck on how to do that inside the entrypoint. I've checked out Puckel's entrypoint code, but it's not obvious to me what he's doing. I'm sure that with just a slight tweak this will be off to the races... Thanks in advance for the help. Also, if there's any major anti-pattern that I'm at risk of running into here, I'd love to get the feedback so that I can implement Airflow properly. This is my first time implementing CeleryExecutor, and there's a decent amount involved.
Try using nohup: https://en.wikipedia.org/wiki/Nohup
nohup airflow scheduler > scheduler.log &
In your case, you would update your entrypoint as follows:
if [ $SERVICE == "airflow" ]; then
airflow initdb
nohup airflow scheduler > scheduler.log &
nohup airflow webserver
fi
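An alternative sketch that avoids nohup entirely (assuming you don't need the scheduler.log file) is to background the scheduler in the entrypoint shell and exec the webserver so it stays the container's foreground process:
if [ $SERVICE == "airflow" ]; then
airflow initdb
# run the scheduler in the background of the entrypoint shell
airflow scheduler &
# exec replaces the shell so the webserver receives signals and keeps the container alive
exec airflow webserver
fi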
I am trying to run a bash script that calls spark-submit to run a PySpark script, but it was not successful. I want to check the YARN logs using "yarn logs -applicationId <application ID>". My question is: how can I find the appropriate application id?
Below are some parts of the error I got:
1. Using YARN logs:
In the logs you can see the tracking URL: http://<nn>:8088/proxy/application_*****/
If you copy and open that link, you can see all the logs for the application in the ResourceManager.
2. Using the Spark application:
From the SparkContext we can get the applicationId:
print(spark.sparkContext.applicationId)
3. Using the yarn application command:
Use the yarn application -list command to get all the running YARN applications on the cluster, then use:
yarn application --help
-appStates <States> Works with -list to filter applications
based on input comma-separated list of
application states. The valid application
state can be one of the following:
ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN
NING,FINISHED,FAILED,KILLED
-appTypes <Types> Works with -list to filter applications
based on input comma-separated list of
application types.
-help Displays help for all commands.
-kill <Application ID> Kills the application.
-list List applications. Supports optional use
of -appTypes to filter applications based
on application type, and -appStates to
filter applications based on application
state.
-movetoqueue <Application ID> Moves the application to a different
queue.
-queue <Queue Name> Works with the movetoqueue command to
specify which queue to move an
application to.
-status <Application ID> Prints the status of the application.
List all the finished applications:
yarn application -appStates FINISHED -list
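For example, to pull out the id of a running application by its name (here "my_app_name" is just a placeholder for whatever name you gave your job):
yarn application -list -appStates RUNNING | grep "my_app_name" | awk '{ print $1 }'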
You can also use curl to get the required details of your application from the YARN REST API.
state="RUNNING" // RUNNING, FAILED, COMPLETED.
user="" // userid from where you started job.
applicationTypes="spark" // Type of application
applicationName="<application_name>" // Your application name
url="http://<host_name>:8088/ws/v1/cluster/apps?state=${state}&user=${user}&applicationTypes=${applicationTypes}" // Build Rest API
applicationId=$(curl "${url}" | python -m json.tool | jq -r '.apps."app" | .[] | select(.name | contains('\"${applicationName}\"')) | .id')
Output
> echo $applicationId
application_1593019621736_42096
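You can then feed that id straight into the logs command from the question (application.log is just an arbitrary output file name):
yarn logs -applicationId "${applicationId}" > application.log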
A Spark application can run many jobs. My Spark is running on YARN, version 2.2.0.
How can I get the running status and other info for a given application id, possibly using the REST API?
This might be late, but I'm putting it here for convenience. Hope it helps. You can use the REST API command below to get the status of any job running on YARN.
curl --negotiate -s -u : -X GET 'http://resourcemanagerhost:8088/ws/v1/cluster/apps/application_121766109986_12343/state'
O/P - {"state":"RUNNING"}
Throughout the job's life cycle the state will move through NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED.
You can use jq for a formatted output.
curl --negotiate -s -u : -X GET 'http://resourcemanagerhost:8088/ws/v1/cluster/apps/application_121766109986_12343' | jq .app.state
O/P - "RUNNING"
YARN has a Cluster Applications API. This shows the state along with other information. To use it:
$ curl 'RMURL/ws/v1/cluster/apps/APP_ID'
with your application id as APP_ID.
It provides the application's state along with other details such as the user, queue, progress, and tracking URL.
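If you need to wait for a terminal state, a small polling sketch against that same endpoint (assuming the ResourceManager host/port and application id from the earlier examples, and jq installed) could look like:
appid=application_121766109986_12343
while true; do
  # fetch the current state of the application from the ResourceManager
  state=$(curl --negotiate -s -u : "http://resourcemanagerhost:8088/ws/v1/cluster/apps/${appid}/state" | jq -r .state)
  echo "current state: ${state}"
  case "$state" in
    FINISHED|FAILED|KILLED) break ;;
  esac
  sleep 10
done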
I am in the process of creating some scripts to deploy my node.js based application via continuous integration, and I am having trouble seeing the right way to stop the node process.
I start the application via a start-dev.sh script:
#!/bin/sh
scripts_dir=`dirname $0`
cd "${scripts_dir}/"..
npm start &
echo $! > app.pid
And then I was hoping to stop it via:
#!/bin/sh
scripts_dir=`dirname $0`
cd "${scripts_dir}/"..
echo killing pid `cat app.pid`
kill -9 `cat app.pid`
The issue I am finding is that npm is no longer running at this point, so the pid isn't useful for stopping the process tree. The only workaround I can think of at this point is to skip npm completely for launch and simply call node directly.
Can anyone suggest an appropriate way to deal with this? Is foregoing npm for launching a good approach in this context?
Forever can do the process management stuff for you.
forever start app.js
forever stop app.js
Try to avoid relying on npm start outside of development; it just adds an additional layer between you and node.
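If you do want to stick with plain shell scripts, a minimal sketch of the question's start script that bypasses npm (assuming server.js is a placeholder for whatever npm start actually launches) is:
#!/bin/sh
scripts_dir=`dirname $0`
cd "${scripts_dir}/"..
# launch node directly so $! is the PID of the actual node process
node server.js &
echo $! > app.pid
The stop script from the question then works unchanged, because app.pid now points at the node process itself.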
Just use supervisor. An example conf looks like this:
[program:long_script]
command=/usr/bin/node SOURCE_FOLDER/EXECUTABLE_JAVASCRIPT_FILE.js
autostart=true
autorestart=true
stderr_logfile=/var/log/long.err.log
stdout_logfile=/var/log/long.out.log
where
SOURCE_FOLDER is the folder for your project
EXECUTABLE_JAVASCRIPT_FILE is the file to be run
You can check the post here.
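After saving the conf (commonly under /etc/supervisor/conf.d/, which is an assumption about your layout), you can load and start it with:
supervisorctl reread
supervisorctl update
supervisorctl start long_script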
I am wondering if it is possible to submit, monitor, and kill Spark applications from another service.
My requirements are as follows:
I wrote a service that:
parses user commands
translates them into understandable arguments for an already-prepared Spark-SQL application
submits the application along with its arguments to the Spark cluster using spark-submit from ProcessBuilder
It plans to run the generated applications' drivers in cluster mode.
Other requirements:
Query the application's status, for example, the percentage of work remaining
Kill queries accordingly
What I found in the Spark standalone documentation suggests killing an application using:
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>
And it says I should find the driver ID through the standalone Master web UI at http://<master url>:8080.
So, what am I supposed to do?
Related SO questions:
Spark application finished callback
Deploy Apache Spark application from another application in Java, best practice
You could use a shell script to do this.
The deploy script:
#!/bin/bash
spark-submit --class "xx.xx.xx" \
--deploy-mode cluster \
--supervise \
--executor-memory 6G hdfs:///spark-stat.jar > output 2>&1
cat output
And you will get output like this:
16/06/23 08:37:21 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://node-1:6066.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160623083722-0026. Polling submission state...
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160623083722-0026 in spark://node-1:6066.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: State of driver driver-20160623083722-0026 is now RUNNING.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160621162532-192.168.1.200-7078 at 192.168.1.200:7078.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
"action" : "CreateSubmissionResponse",
"message" : "Driver successfully submitted as driver-20160623083722-0026",
"serverSparkVersion" : "1.6.0",
"submissionId" : "driver-20160623083722-0026",
"success" : true
}
And based on this, create your kill-driver script:
#!/bin/bash
driverid=`cat output | grep submissionId | grep -Po 'driver-\d+-\d+'`
spark-submit --master spark://node-1:6066 --kill $driverid
Make sure you give the scripts execute permission by using chmod +x.
A "dirty" trick to kill spark apps is by kill the jps named SparkSubmit. The main problem is that the app will be "killed" but at spark master log it will appear as "finished"...
user@user:~$ jps
20894 Jps
20704 SparkSubmit
user@user:~$ kill 20704
To be honest I don't like this solution, but for now it's the only way I know to kill an app.
Here's what I do:
To submit apps, use the (hidden) Spark REST Submission API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
This way you get a DriverID (under submissionId) which you can use to kill your job later (you shouldn't kill the application, especially if you're using "supervise" in standalone mode)
This API also lets you query the Driver Status
Query status for apps using the (also hidden) UI Json API: http://[master-node]:[master-ui-port]/json/
This service exposes all information available on the master UI in JSON format.
You can also use the "public" REST API to query applications on the master or executors on each worker, but this won't expose drivers (at least not as of Spark 1.6).
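For completeness, a rough sketch of the kill and status calls against that hidden submission API (assuming the standalone master's REST port is the default 6066 and driver-20160623083722-0026 is the submissionId you got back):
# kill the driver
curl -X POST http://<master-host>:6066/v1/submissions/kill/driver-20160623083722-0026
# query the driver status
curl http://<master-host>:6066/v1/submissions/status/driver-20160623083722-0026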
You can fire YARN commands from ProcessBuilder to list the applications, then filter them based on the application name that is available to you, extract the appId, and then use YARN commands to poll the status, kill it, etc.
You can find the driver id in [spark]/work/. The id is the directory name. Kill the job with spark-submit.
I also had the same kind of problem, where I needed to map my application-id and driver-id and add them to a CSV for another application's availability in standalone mode.
I was able to get the application id easily by using the command sparkContext.applicationId.
In order to get the driver-id I thought of using the shell command pwd. When your program runs, the driver logs are written to a directory named with the driver-id, so I extracted the folder name to get the driver-id.
import scala.sys.process._

// run `pwd` inside the driver; in standalone mode the driver's working directory is named after the driver id
val pwdCmd = "pwd"
val getDriverId = pwdCmd.!!
val driverId = getDriverId.trim.split("/").last
kill -9 $(jps | grep SparkSubmit | grep -Eo '[0-9]{1,7}')