Manually stopping Spark Workers - apache-spark

Is there a way to stop a Spark worker through the terminal? I'm aware of the scripts start-all.sh, stop-all.sh, stop-workers.sh, etc. However, every time I run start-all.sh, residual workers from a previous Spark cluster instance seem to be spawned as well. I know this because the worker ID contains the date and timestamp of when the worker was created.
So when I run start-all.sh today, I see the same 7 or so workers that were created at the beginning of April.
Is there a way to kill these earlier workers? Or perhaps a way to grep for their process names?

This has happened to me in the past, and what I usually do is:
1) Find the process ID:
ps aux | grep spark
2) Kill it:
sudo kill pid1

You can do either:
for pid in $(ps aux | grep spark | awk '{print $2}'); do kill -9 $pid; done
or
for pid in $(jps | grep Worker | awk '{print $1}'); do kill -9 $pid; done
I would suggest the second one so that you don't kill something by accident. The first one also picks up the PID of the grep process itself, plus anything else you might have running whose command line contains "spark".
Explanation (might be helpful for new users):
In the pipeline, we have ps (lists the currently running processes) or jps (the JVM Process Status Tool, which lists the PIDs of the running Java virtual machines together with their main class names). Then we grep, and from the matching output we extract the PID with awk.
In the loop, kill is called with signal -9 (SIGKILL, i.e. forced termination; it is not recommended and should be used only as a last resort). Bye bye process.
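The difference between the two loops can be made concrete. The sketch below assumes a POSIX shell with awk, and ps output in the `ps -eo pid=,args=` form (the PID followed by the full command line). It keeps only java processes whose command line names Spark's Worker class, so neither the grep/awk process nor unrelated "spark" processes are selected. The function name `spark_worker_pids` is hypothetical. It is mainly useful when jps is not installed, since jps ships with the JDK and not with every Java runtime.

```shell
# spark_worker_pids (hypothetical helper)
# Reads "PID COMMAND ARGS..." lines (the format of `ps -eo pid=,args=`) on stdin
# and prints the PID of every java process whose command line contains Spark's
# standalone Worker class. The awk process itself is skipped because its
# command (field 2) is awk, not java.
spark_worker_pids() {
  awk '$2 ~ /(^|\/)java$/ && index($0, "org.apache.spark.deploy.worker.Worker") { print $1 }'
}

# Usage (SIGTERM first; GNU xargs -r skips kill when nothing matched):
#   ps -eo pid=,args= | spark_worker_pids | xargs -r kill
```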

To kill a Spark worker using the Spark daemon script:
ssh <host-name-where-worker-is-running>
spark-3.3.0-bin-hadoop3/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker 1
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
To start it again (the Worker also needs the URL of the master it should register with):
spark-3.3.0-bin-hadoop3/sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 spark://<master-host>:7077
Ideally, spark-3.3.0-bin-hadoop3 should be your ${SPARK_HOME}. Also, make sure to clear your memory caches when you kill the Spark worker/executor.
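The stop/start pair can be wrapped so the path comes from `SPARK_HOME` instead of being hard-coded. This is a sketch assuming bash and a `SPARK_HOME` that points at the unpacked Spark distribution. `spark_worker` is a hypothetical name, and the cache-dropping step is left out because it needs root. A Worker started this way also needs the master URL (for example spark://<master-host>:7077, 7077 being the default master port), so the wrapper passes any extra arguments through to the Worker.

```shell
# spark_worker start|stop INSTANCE [WORKER_ARGS...]   (hypothetical helper)
# Thin wrapper around Spark's sbin/spark-daemon.sh for the standalone Worker class.
spark_worker() {
  if [ $# -lt 2 ]; then
    echo "usage: spark_worker start|stop INSTANCE [worker args...]" >&2
    return 2
  fi
  local action=$1 instance=$2
  shift 2
  case "$action" in
    start|stop) ;;
    *) echo "unknown action: $action" >&2; return 2 ;;
  esac
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # Remaining arguments (e.g. the master URL needed by "start") are passed through.
  "$SPARK_HOME/sbin/spark-daemon.sh" "$action" \
    org.apache.spark.deploy.worker.Worker "$instance" "$@"
}

# Usage, on the host where the worker runs:
#   export SPARK_HOME=~/spark-3.3.0-bin-hadoop3
#   spark_worker stop 1
#   spark_worker start 1 spark://<master-host>:7077
```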

Related

Linux "kill -9 <PID>" for all processes? [duplicate]

This question already has answers here:
Kill all processes for a given user
(5 answers)
Closed 3 years ago.
I have a bunch of processes on my school's server that have been running for about a week without being terminated. I found out that I could use "kill -9 [PID]" for each of the PIDs, but it took me a while to kill each of them individually.
If, for instance, I have hundreds of processes I want to forcefully kill, is there a way to kill them all instantly?
You don't need to kill them one by one; Linux has a number of commands for this. Use the following with caution: killall, or you could try pkill -U UID or pkill -U username.
Note that pkill -U kills all of that user's processes, including your own terminal session; if you are connected over SSH, you will be kicked out!
You can kill processes by grepping for your application name. For example:
ps aux | grep kpark06 | awk '{print $2}' | xargs sudo kill -9
man kill:
kill [options] <pid> [...]
<pid> can be a list. You can put a giant space-separated list of processes after kill, like kill 123 543.
A PID of -1 is special; it indicates all processes except the kill process itself and init.
So, kill -9 -1 will get everything, but that could easily be more than you expect. Having no idea what else is running there, I would only kill all the processes if I were prepared to restart the server.
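Here is a safe way to try the list form. The sketch below starts three throwaway `sleep` jobs and stops them with a single `kill` call. It uses the default SIGTERM rather than -9, and explicit PIDs rather than the all-processes -1 described above. It assumes bash or any POSIX shell.

```shell
# Start three disposable background processes and remember their PIDs.
sleep 300 & p1=$!
sleep 300 & p2=$!
sleep 300 & p3=$!

# One kill invocation, several PIDs; with no signal option, kill sends SIGTERM.
kill "$p1" "$p2" "$p3"

# Reap them; wait returns a non-zero status for killed jobs, so ignore it.
wait "$p1" "$p2" "$p3" 2>/dev/null || true
```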
If these processes have something in common, you may want killall, which can filter the processes to kill by age, user, and name/context regular expression, as well as asking for confirmation.

How to stop a given process with the linux task spooler

I am not sure whether this question relates to Linux in particular.
I use the tsp command to run small batches of processes:
tsp ./myScript.sh
which then shows up as running:
ID State Output E-Level Times(r/u/s) Command [run=1/5]
1 running /tmp/ts-out.woHIKK ./myScript.sh
But how can I kill this process? The only killing option in tsp, -K, seems to kill the whole tsp server. It could be done with
ps aux | grep '[m]yScript.sh' | awk '{print $2}' | xargs kill
But isn't there a tsp way to do it? It seems like an obvious option to me.
You can use kill:
kill $(tsp -p 1)
where 1 is the job ID.
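If you do this often, the one-liner can be wrapped so that an unknown job ID does not turn into a bare `kill` with no arguments. This sketch assumes bash and that `tsp -p ID` prints only the job's PID, as used above. `kill_tsp_job` is a hypothetical helper name.

```shell
# kill_tsp_job ID [SIGNAL]   (hypothetical helper)
# Looks up the PID of task-spooler job ID and signals it (SIGTERM by default).
kill_tsp_job() {
  local id=$1 sig=${2:-TERM} pid
  pid=$(tsp -p "$id") || return 1
  if [ -z "$pid" ]; then
    echo "no PID for tsp job $id" >&2
    return 1
  fi
  kill "-$sig" "$pid"
}

# Usage:
#   kill_tsp_job 1
```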

How to identify a job given from your user account and kill it

I submitted a job on a remote server yesterday from home. The command was
sh run.sh >& out &
run.sh executes a program (average.f) more than 1000 times recursively.
Today, in my office, I found a mistake in my run.sh, so I would like to kill it.
I used the top command, but it does not show run.sh; it only shows average.f. When I killed it with kill PID, average.f started again with another PID and kept producing output.
ps -u does not show either run.sh or average.f.
Can anybody please help me kill this job?
Find your job's PID by process or application name. An example is given below; I am killing a Java process here:
ps -aef | grep java
# the command above gives you the PID; now run the command below to kill that job
kill -9 pid
# here pid is the number you got from the first command
ps -ef | grep run.sh | grep -v grep | awk '{print $2}' | xargs kill -9
Use pstree(1) (probably as pstree -p) to list the process tree hierarchy, then kill(1) your topmost shell process running run.sh (or else the few "higher" processes): first with -TERM, then with -QUIT, and only if that does not work, as a last resort, with -KILL. Perhaps use killall(1) or pidof(1) (or pgrep(1) or pkill(1)).
You might want to kill the process group, with a negative pid.
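Here is a self-contained sketch of the process-group idea, assuming bash. `set -m` turns on job control while the background loop is launched, so the loop (standing in for run.sh) gets its own process group. `kill -- -PGID` then signals the loop and whatever child it is currently running in one go. The loop and its `sleep` child are placeholders for run.sh and average.f, not the real scripts.

```shell
# Turn on job control only while launching, so the background job is placed
# in its own process group (its PGID equals its PID).
set -m
( while :; do sleep 1; done ) &   # stand-in for run.sh relaunching average.f
leader=$!
set +m

sleep 1   # give the loop time to start a child

# A negative PID signals the whole group: the loop and its current child.
# If job control was unavailable (no separate group), fall back to the loop itself.
kill -TERM -- "-$leader" 2>/dev/null || kill -TERM "$leader"
wait "$leader" 2>/dev/null || true
```

On a real server, the group ID can be read with ps -o pgid= -p PID before passing it to kill with a leading minus.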
You should never start with kill -KILL; use it only as a last resort. Some programs (e.g. database servers, sophisticated numerical computations with application checkpointing, ...) install a SIGTERM or SIGQUIT signal handler to clean up their mess and, e.g., save some state to disk in a sane way. If you kill -KILL them, they could leave the mess uncleaned, since SIGKILL cannot be caught (see signal(7)).
BTW, you should use ps auxw to list all processes; read ps(1).
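That escalation order can be captured in a small helper. This is a sketch assuming bash. The `stop_gracefully` name and the 10-second default grace period are arbitrary choices, and it goes straight from TERM to KILL, skipping the intermediate -QUIT step mentioned above.

```shell
# stop_gracefully PID [TIMEOUT_SECONDS]   (hypothetical helper)
# Sends SIGTERM so the process can run its cleanup handlers, polls once per
# second for up to TIMEOUT_SECONDS (default 10), and only then sends SIGKILL.
# Returns 1 if the first signal cannot be delivered (no such process, or not yours).
stop_gracefully() {
  local pid=$1 remaining=${2:-10}
  kill -TERM "$pid" 2>/dev/null || return 1
  while [ "$remaining" -gt 0 ]; do
    # kill -0 sends no signal; it only checks that the PID still exists.
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    remaining=$((remaining - 1))
  done
  echo "PID $pid is still running after SIGTERM; sending SIGKILL" >&2
  kill -KILL "$pid" 2>/dev/null || true
  return 0
}

# Usage:
#   stop_gracefully 12345 30
```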

Kill a java process (in linux) by process name instead of PID

While configuring/installing Hadoop cluster we often need to kill a Java Process/Daemon.
We see Java Processes/Daemons running with jps command.
Usually we kill a Java process with its PID. E.g.
kill -9 112224
It is a little difficult to type the PID. Is there a way to kill the process by its name, in a single command?
Here is the command to kill a Java process by its process name instead of its process ID:
kill -9 `jps | grep "DataNode" | cut -d " " -f 1`
Let me explain the benefit of this command in more detail. Let's say you are working with a Hadoop cluster. You often need to check the Java daemons that are running with the jps command. Let's say that when you run this command on worker nodes, you see the following output:
1915 NodeManager
18119 DataNode
17680 Jps
Usually, if we want to kill DataNode process, we would use following command
kill -9 18119
But it is a little difficult to type the PID for the kill command. With the command given in this answer, you can simply write the name of the process instead. We can also prepare shell scripts to kill commonly used daemons in a Hadoop cluster,
or prepare one shell script that takes the process name as a parameter.
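Here is a sketch of such a parameterized helper, assuming bash and jps's default `PID ClassName` output. It compares the class name exactly, because `grep "NameNode"` would also match `SecondaryNameNode`. The function names `daemon_pids` and `kill_daemon` are hypothetical, and the helper sends SIGTERM unless a signal is given.

```shell
# daemon_pids NAME   (hypothetical helper)
# Reads jps output ("PID ClassName" per line) on stdin and prints the PIDs whose
# class name is exactly NAME, so "NameNode" does not also match "SecondaryNameNode".
daemon_pids() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# kill_daemon NAME [SIGNAL]   (hypothetical helper)
# Signals every JVM that jps reports under NAME (SIGTERM unless SIGNAL is given).
kill_daemon() {
  local name=$1 sig=${2:-TERM} pids
  pids=$(jps | daemon_pids "$name")
  if [ -z "$pids" ]; then
    echo "no running JVM named $name" >&2
    return 1
  fi
  # $pids is left unquoted on purpose so that each PID becomes a separate argument.
  kill "-$sig" $pids
}

# Usage:
#   kill_daemon DataNode
#   kill_daemon NameNode KILL   # last resort
```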
How about using
killall firefox
jps -l helped me kill the process:
kill `jps -l | grep "myjarname.jar" | cut -d " " -f 1`
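With `jps -l`, the second column is the fully qualified main class or the path of the jar. That means a jar can be matched by its file name regardless of its directory, without `grep "myjarname.jar"` also catching something like `notmyjarname.jar`. This is a sketch assuming bash and awk; `pids_for_jar` is a hypothetical name.

```shell
# pids_for_jar JAR_NAME   (hypothetical helper)
# Reads `jps -l` output ("PID class-or-jar-path") on stdin and prints the PIDs
# whose jar path ends in /JAR_NAME, or equals JAR_NAME when started from its directory.
pids_for_jar() {
  awk -v jar="$1" '{
    n = split($2, parts, "/")   # last element is the file name
    if (n > 0 && parts[n] == jar) print $1
  }'
}

# Usage:
#   jps -l | pids_for_jar myjarname.jar | xargs -r kill
```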
To get the process ID of that Java process, run
netstat -tuplen
Find the process ID (PID) of the process you want to kill and run
kill -9 PID
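Reading the PID out of the netstat listing by eye can be automated. The sketch assumes bash, awk, and netstat's Linux layout, in which the local address is the fourth column and the last column is `PID/Program name`; root is usually needed to see other users' PIDs. `pids_on_port` is a hypothetical name.

```shell
# pids_on_port PORT   (hypothetical helper)
# Reads `netstat -tuplen` output on stdin and prints each PID (once) whose
# local address ends in :PORT, so 8080 does not match 18080.
pids_on_port() {
  awk -v port="$1" '
    $1 ~ /^(tcp|udp)/ && $4 ~ (":" port "$") {
      split($NF, f, "/")        # "4321/java" -> f[1] = "4321"
      if (f[1] ~ /^[0-9]+$/ && !seen[f[1]]++) print f[1]
    }'
}

# Usage:
#   netstat -tuplen | pids_on_port 8080 | xargs -r kill
```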

How to get the process ID to kill a nohup process?

I'm running a nohup process on the server. When I try to kill it my putty console closes instead.
This is how I try to find the process ID:
ps -ef | grep nohup
This is the command I use to kill it:
kill -9 1787 787
When you use nohup and put the task in the background, the background operator (&) will give you the PID at the command prompt. If you plan to manage the process manually, you can save that PID and use it later to kill the process if needed, via kill PID or kill -9 PID (if you need to force kill). Alternatively, you can find the PID later with ps -ef | grep "command name" and locate the PID from there. Note that the nohup command itself does not appear in the ps output for the command in question.
If you use a script, you could do something like this in the script:
nohup my_command > my.log 2>&1 &
echo $! > save_pid.txt
This will run my_command, saving all output into my.log ($! holds the PID of the most recently started background process). The 2 is the file descriptor for standard error (stderr), and 2>&1 tells the shell to route standard error output to standard output (file descriptor 1). It requires &1 so that the shell knows it's a file descriptor in that context instead of just a file named 1. The 2>&1 is needed to capture any error messages that are normally written to standard error into our my.log file (which is receiving standard output). See I/O Redirection for more details on handling I/O redirection with the shell.
If the command sends output on a regular basis, you can check the output occasionally with tail my.log, or if you want to follow it "live" you can use tail -f my.log. Finally, if you need to kill the process, you can do it via:
kill -9 `cat save_pid.txt`
rm save_pid.txt
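The same PID-file pattern can be packaged as a start/stop pair. This is a sketch assuming bash and coreutils' nohup. The names `start_bg` and `stop_bg` and the `.log` naming are hypothetical. It sends SIGTERM instead of -9 first, and it checks that the recorded PID still exists, which avoids an error when the job has already finished. That check does not protect against the PID having been reused by another process.

```shell
# start_bg PIDFILE COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND under nohup in the background, logs to PIDFILE.log, and records
# its PID the same way as `echo $! > save_pid.txt`.
start_bg() {
  local pidfile=$1
  shift
  nohup "$@" > "$pidfile.log" 2>&1 &
  echo $! > "$pidfile"
}

# stop_bg PIDFILE   (hypothetical helper)
# Sends SIGTERM to the recorded PID if it still exists, then removes the file.
stop_bg() {
  local pidfile=$1 pid
  if [ ! -f "$pidfile" ]; then
    echo "no pid file: $pidfile" >&2
    return 1
  fi
  pid=$(cat "$pidfile")
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
  fi
  rm -f "$pidfile"
}

# Usage:
#   start_bg ./my_command.pid my_command
#   stop_bg ./my_command.pid
```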
I am using Red Hat Linux on a VPS server (via SSH with PuTTY); for me, the following worked:
First, you list all the running processes:
ps -ef
Then in the first column you find your user name; I found it the following three times:
One was the SSH connection
The second was an FTP connection
The last one was the nohup process
Then in the second column you can find the PID of the nohup process and you only type:
kill PID
(replacing the PID with the nohup process's PID of course)
And that is it!
I hope this answer will be useful for someone. I'm also very new to bash and SSH, but found 95% of the knowledge I need here :)
Suppose I am running a Ruby script in the background with the command below:
nohup ruby script.rb &
Then I can get the PID of this background process by specifying the command name. In my case, the command is ruby.
ps -ef | grep ruby
output
ubuntu 25938 25742 0 05:16 pts/0 00:00:00 ruby test.rb
Now you can easily kill the process by using kill command
kill 25938
jobs -l should give you the PIDs of the nohup processes, as long as you are in the same shell that started them.
kill (-9) them gently.
;)
You could try
kill -9 `pgrep [command name]`
Suppose you are running a Java program with nohup; you can get the Java process ID with
ps aux | grep java
output
xxxxx 9643 0.0 0.0 14232 968 pts/2
then you can kill the process by typing
sudo kill 9643
Or let's say that you need to kill all Java processes; then just use
sudo killall java
This command kills all Java processes. You can use it with other processes too; just give the process name at the end of the command:
sudo killall {processName}
If your application always uses the same port, you can kill all the processes using that port like this:
kill -9 $(lsof -t -i:8080)
This works in Ubuntu
Type this to find out the PID
ps aux | grep java
All running Java processes will be shown.
In my case:
johnjoe 3315 9.1 4.0 1465240 335728 ? Sl 09:42 3:19 java -jar batch.jar
Now kill it: kill -9 3315
The zombie process finally stopped.
When you start a job with nohup in the background (with &), the shell tells you its process ID!
nohup sh test.sh &
The output will show you the process ID, like
25013
You can then kill it:
kill 25013
I started a Django server with the following command:
nohup manage.py runserver <localhost:port>
This works on CentOS:
netstat -ntlp
kill -9 PID
This works fine for me on a Mac:
kill -9 `ps -ef | awk '/nohup/{ print \$2 }'`
I often do it this way. Try this:
ps aux | grep script_Name
Here, script_Name could be any script/file run by nohup.
This command gets you a process ID. Then use the command below to kill the script running under nohup:
kill -9 1787 787
Here, 1787 and 787 are the process IDs from the question, used as an example.
This should do what was intended in the question.
If you don't know the PID, first find it using the top command:
top -U userid
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
You will get the PID using top, then perform the kill operation.
$ kill -9 <PID>
Today I had the same problem. Since it was a long time ago, I had completely forgotten which command I used and when. I tried three methods:
First, the STIME shown by ps -ef. This shows the time your process started, and it's very likely that you ran your command with nohup just before closing ssh (depending on your habits). Unfortunately, the latest command was not the one I ran with nohup, so this didn't work for me.
Second, the PPID, also shown by ps -ef. It stands for Parent Process ID, the ID of the process that created the process. On Ubuntu, the PPID is 1 for a process run with nohup once the shell that started it has exited. Then you can use ps --ppid "1" to get the list, and check TIME (the total CPU time your process used) or CMD to find the process's PID.
Third, use lsof -i:port if the process occupies some ports, and you will get the command. Then, just like the answer above, use ps -ef | grep command and you will get the PID.
Once you find the PID of the process, you can use kill pid to terminate it.
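The PPID method can be narrowed to a particular command in one step. The sketch assumes bash, awk, and procps-style `ps -eo pid=,ppid=,etime=,args=` output, and that orphaned nohup processes have been re-parented to PID 1 as described above. `orphans_matching` is a hypothetical name, and the match is a plain substring test on the command line.

```shell
# orphans_matching TEXT   (hypothetical helper)
# Reads `ps -eo pid=,ppid=,etime=,args=` output on stdin and prints the lines
# whose parent PID is 1 and whose command line (fields 4 onward) contains TEXT.
orphans_matching() {
  awk -v text="$1" '
    $2 == 1 {
      args = $4
      for (i = 5; i <= NF; i++) args = args " " $i
      if (index(args, text)) print
    }'
}

# Usage (the elapsed-time column helps spot the long-running one):
#   ps -eo pid=,ppid=,etime=,args= | orphans_matching script.rb
```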
About losing your PuTTY session: often the ps ... | awk/grep/perl/... process itself gets matched, too! So the old-school trick is this:
ps -ef | grep -i '[n]ohup'
That way, the regex doesn't match the grep process's own command line!
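The bracket trick can be applied automatically to any search word. The sketch below assumes bash and GNU or POSIX grep, and `psgrep` is a hypothetical name. It wraps the first character of the word in brackets, so "ruby" becomes "[r]uby". The regex still matches "ruby", but the grep process's own command line contains the literal text "[r]uby", which the regex does not match. It is intended for plain words, since regex metacharacters in the argument keep their meaning.

```shell
# psgrep TEXT   (hypothetical helper)
# Case-insensitive grep over ps output that wraps the first character of TEXT
# in brackets, so the grep process's own command line is not matched.
psgrep() {
  local first rest
  rest=${1#?}               # everything after the first character
  first=${1%"$rest"}        # just the first character
  grep -i -- "[$first]$rest"
}

# Usage:
#   ps -ef | psgrep ruby
```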
If you are on a remote server, check memory usage with top, find your process and its ID, and then just execute kill [your process ID].
