Finding pid of the mpirun command - linux

After I issue a mpirun command, I want to get the pid of this process, so that I can kill this process later. How to do this without having to add '&' at the end of the mpirun command to send it to background?
Other condition is that there could be more than one mpirun processes running on the machine.

With Open MPI one could instruct mpirun to output its own PID by giving it the --report-pid option:
--report-pid - outputs the PID to the standard output;
--report-pid + outputs the PID to the standard error;
--report-pid /path/to/filename writes the PID into filename.
To get the PIDs of all your running mpiruns, use:
$ pgrep -u `whoami` mpirun

You can use 'ps' to get pid
ps -A | grep process-name

Besides putting the & and using $! and then % to put the mpirun process bash to foreground:
$ mpirun -np 4 ./a.out &
[1] 12345
$ PID=$!
$ %
...
You can use the pkill commands to terminate it e.g. pkill mpirun but then you need to make sure only to have one mpirun process running, or you can write your MPI program so that it writes it PID to a file and then read this file, like many UNIX daemon do.

ps aux | grep application_name
This syntax works great on Ubuntu, but I don't know if it works well for the other distributions. It returns (at least) two different rows: one containing the pid of your current search process and the other containing the pid of the process you are searching for. The program PID is just the second column of the list.
You can run man ps or ps --help to see the manual of the ps command.
To kill a process check the options of the commands pkill kill and killall

Related

Monitoring all running process using strace in shell script

I want to monitor all the running processes using strace and when a process ends the output of the strace should be sent to a file.
And how to find every running proc PID. I also want to include process name in the output file.
$ sudo strace -p 1725 -o firefox_trace.txt
$ tail -f firefox_trace.txt
1725 would be the PID of the proccess you want to monitor (you can find the PID with "ps -C firefox-bin", for firefox in the example)
And firefox_trace.txt would be the output file !
The way to got would be to find every running proc PID, and use the command to write them in the output file !
Considering the doc,
-p pid
Attach to the process with the process ID pid and begin tracing. The
trace may be terminated at any time by a keyboard interrupt signal (
CTRL -C). strace will respond by detaching itself from the traced
process(es) leaving it (them) to continue running. Multiple -p options
can be used to attach to up to 32 processes in addition to command
(which is optional if at least one -p option is given).
Use -o to store the output to the file, or 2>&1 to redirect standard error to output, so you can filter it (grep) or redirect it into file (> file).
To monitor process without knowing its PID, but name, you can use pgrep command, e.g.
strace -p $(pgrep command) -o file.out
where command is your name of process (e.g. php, Chrome, etc.).
To learn more about parameters, check man strace.

How to identify a job given from your user account and kill it

I had given a job in a remote server yesterday from my home. The command was
sh run.sh >& out &
The run.sh will excute a program (average.f) more than 1000 times recurssively.
Today, in my office, I found some mistake in my run.sh. So I would like to kill it.
I used top command, but it is not showing the run.sh. It is only showing average.f. So, once, I killed it with kill PID, it is again starting average.f with another PID and producing outputs.
ps -u is not showing either run.sh or average.f.
Can anybody please help me how to kill this job.
find your job id with the process or application name . example is given below - I am killing java process here
ps -aef|grep java
// the above command will give you pid, now fire below command to kill that job
kill -9 pid
// here pid is a number which you get from the first command
ps -ef | grep run.sh | grep -v grep | awk '{print $2}' | xargs kill -9
Use pstree(1) (probably as pstree -p) to list the process tree hierarchy, then kill(1) (first with -TERM, then with -QUIT, and if that does not work, at last with -KILL) your topmost shell process running run.sh (or else the few "higher" processes). Perhaps use killall(1) or pidof(1) (or pgrep(1) or pkill)
You might want to kill the process group, with a negative pid.
You should never at first kill -KILL a process (but only at last resort); some programs (e.g. database servers, sophisticated numerical computations with application checkpointing, ...) have installed a SIGTERM or SIGQUIT signal handler to clean up their mess and e.g. save some state (on the disk) in a sane way. If you kill -KILL them, they could leave the mess uncleaned (since SIGKILL cannot be caught, see signal(7)....)
BTW, you should use ps auxw to list all processes, read ps(1)

How to know if the process is set NOHUP?

Using jobs I know the process is running.
bash-4.2$ jobs
[1]+ Running test.sh &
I wanted it to be set NOHUP so that it won't be killed when I exit. I used
disown
and
bash-4.2$ jobs
shows nothing. I'm not sure if the process is set NOHUP or not. I'm curious about this because after I read the manual it says
disown -h
should be used to set NOHUP.
Edit
I don't think the link Find the Process run by nohup command helps. The question is different than that one.
I'm gonna restate my problem. I run a program without nohup, and later I wanted it to be set NOHUP so that it won't be killed when I exit the system. So I used disown, but later I found the manual says I should have used disown -h to set NOHUP. I want to check if my process is set NOHUP or not successfully. If not, what can I do to set it to be NOHUP?
UPDATE
I know two ways my be helpful:
1) Whenever a process is running over nohup It writes output on ~/nohup.out . So you can check this file by running command find -cmin 2. It shows you if nohup.out is changing each 2 seconds or not.
If it is changing you would understand that sth is running by nohup command, after that you can check it with lsof and continue your checking...
2) If you logout from specific user andgo to tty then do ps aux | grep <user> or ps aux | grep ? then you can understand that is running with nohup command... because there is no pts then it shows you ? instead...
useful command:
ps aux | grep <program> | awk -F" " '{print $7}'
Hope to be helpful

Run a script in the same shell(bash)

My problem is specific to the running of SPECCPU2006(a benchmark suite).
After I installed the benchmark, I can invoke a command called "specinvoke" in terminal to run a specific benchmark. I have another script, where part of the codes are like following:
cd (specific benchmark directory)
specinvoke &
pid=$!
My goal is to get the PID of the running task. However, by doing what is shown above, what I got is the PID for the "specinvoke" shell command and the real running task will have another PID.
However, by running specinvoke -n ,the real code running in the specinvoke shell will be output to the stdout. For example, for one benchmark,it's like this:
# specinvoke r6392
# Invoked as: specinvoke -n
# timer ticks over every 1000 ns
# Use another -n on the command line to see chdir commands and env dump
# Starting run for copy #0
../run_base_ref_gcc43-64bit.0000/milc_base.gcc43-64bit < su3imp.in > su3imp.out 2>> su3imp.err
Inside it it's running a binary.The code will be different from benchmark to benchmark(by invoking under different benchmark directory). And because "specinvoke" is installed and not just a script, I can not use "source specinvoke".
So is there any clue? Is there any way to directly invoke the shell command in the same shell(have same PID) or maybe I should dump the specinvoke -n and run the dumped materials?
You can still do something like:
cd (specific benchmark directory)
specinvoke &
pid=$(pgrep milc_base.gcc43-64bit)
If there are several invocation of the milc_base.gcc43-64bit binary, you can still use
pid=$(pgrep -n milc_base.gcc43-64bit)
Which according to the man page:
-n
Select only the newest (most recently started) of the matching
processes
when the process is a direct child of the subshell:
ps -o pid= -C=milc_base.gcc43-64bit --ppid $!
when not a direct child, you could get the info from pstree:
pstree -p $! | grep -o 'milc_base.gcc43-64bit(.*)'
output from above (PID is in brackets): milc_base.gcc43-64bit(9837)

How to get the process ID to kill a nohup process?

I'm running a nohup process on the server. When I try to kill it my putty console closes instead.
this is how I try to find the process ID:
ps -ef |grep nohup
this is the command to kill
kill -9 1787 787
When using nohup and you put the task in the background, the background operator (&) will give you the PID at the command prompt. If your plan is to manually manage the process, you can save that PID and use it later to kill the process if needed, via kill PID or kill -9 PID (if you need to force kill). Alternatively, you can find the PID later on by ps -ef | grep "command name" and locate the PID from there. Note that nohup keyword/command itself does not appear in the ps output for the command in question.
If you use a script, you could do something like this in the script:
nohup my_command > my.log 2>&1 &
echo $! > save_pid.txt
This will run my_command saving all output into my.log (in a script, $! represents the PID of the last process executed). The 2 is the file descriptor for standard error (stderr) and 2>&1 tells the shell to route standard error output to the standard output (file descriptor 1). It requires &1 so that the shell knows it's a file descriptor in that context instead of just a file named 1. The 2>&1 is needed to capture any error messages that normally are written to standard error into our my.log file (which is coming from standard output). See I/O Redirection for more details on handling I/O redirection with the shell.
If the command sends output on a regular basis, you can check the output occasionally with tail my.log, or if you want to follow it "live" you can use tail -f my.log. Finally, if you need to kill the process, you can do it via:
kill -9 `cat save_pid.txt`
rm save_pid.txt
I am using red hat linux on a VPS server (and via SSH - putty), for me the following worked:
First, you list all the running processes:
ps -ef
Then in the first column you find your user name; I found it the following three times:
One was the SSH connection
The second was an FTP connection
The last one was the nohup process
Then in the second column you can find the PID of the nohup process and you only type:
kill PID
(replacing the PID with the nohup process's PID of course)
And that is it!
I hope this answer will be useful for someone I'm also very new to bash and SSH, but found 95% of the knowledge I need here :)
suppose i am running ruby script in the background with below command
nohup ruby script.rb &
then i can get the pid of above background process by specifying command name. In my case command is ruby.
ps -ef | grep ruby
output
ubuntu 25938 25742 0 05:16 pts/0 00:00:00 ruby test.rb
Now you can easily kill the process by using kill command
kill 25938
jobs -l should give you the pid for the list of nohup processes.
kill (-9) them gently.
;)
You could try
kill -9 `pgrep [command name]`
Suppose you are executing a java program with nohup you can get java process id by
`ps aux | grep java`
output
xxxxx 9643 0.0 0.0 14232 968 pts/2
then you can kill the process by typing
sudo kill 9643
or lets say that you need to kill all the java processes then just use
sudo killall java
this command kills all the java processes. you can use this with process. just give the process name at the end of the command
sudo killall {processName}
If your application always uses the same port, you can kill all the processes in that port like this.
kill -9 $(lsof -t -i:8080)
This works in Ubuntu
Type this to find out the PID
ps aux | grep java
All the running process regarding to java will be shown
In my case is
johnjoe 3315 9.1 4.0 1465240 335728 ? Sl 09:42 3:19 java -jar batch.jar
Now kill it kill -9 3315
The zombie process finally stopped.
when you create a job in nohup it will tell you the process ID !
nohup sh test.sh &
the output will show you the process ID like
25013
you can kill it then :
kill 25013
I started django server with the following command.
nohup manage.py runserver <localhost:port>
This works on CentOS:
:~ ns$netstat -ntlp
:~ ns$kill -9 PID
This works for mi fine on mac
kill -9 `ps -ef | awk '/nohup/{ print \$2 }'`
I often do this way. Try this way :
ps aux | grep script_Name
Here, script_Name could be any script/file run by nohup.
This command gets you a process ID. Then use this command below to kill the script running on nohup.
kill -9 1787 787
Here, 1787 and 787 are Process ID as mentioned in the question as an example.
This should do what was intended in the question.
If you are unaware of the PID, then first find it using TOP command
top -U userid
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
You will get the PID using top, then perform the kill operation.
$ kill -9 <PID>
Today I met the same problem. And since it was a long time ago, I totally forgot which command I used and when. I tried three methods:
Using the STIME shown in ps -ef command. This shows the time you start your process, and it's very likely that you nohup you command just before you close ssh(depends on you) . Unfortunately I don't think the latest command is the command I run using nohup, so this doesn't work for me.
Second is the PPID, also shown in ps -ef command. It means Parent Process ID, the ID of process that creates the process. The ppid is 1 in ubuntu for process that using nohup to run. Then you can use ps --ppid "1" to get the list, and check TIME(the total CPU time your process use) or CMD to find the process's PID.
Use lsof -i:port if the process occupy some ports, and you will get the command. Then just like the answer above, use ps -ef | grep command and you will get the PID.
Once you find the PID of the process, then can use kill pid to terminal the process.
About losing your putty: often the ps ... | awk/grep/perl/... process gets matched, too! So the old school trick is like this
ps -ef | grep -i [n]ohup
That way the regex search doesn't match the regex search process!
if you are on a remote server, check memory usage with top , and find your process and its ID. After that, just execute kill [your process ID] .

Resources