Bash script to list all processes in the foreground process group of a terminal - linux

How can I write a bash script to print out the PIDs of all processes in the foreground process group of a given terminal (which is different from the one in which I run the script)? I know that the C function tcgetpgrp can do the job, but I am wondering if there exist any command line utilities that can do this more easily.

To find the PIDs of all processes in the foreground process group of pts/29, you can do (on Linux):
ps ao stat=,pid=,tty= | awk '$1 ~ /\+/ && $3 == "pts/29" { print $2 }'
A + in the STAT column marks a process that belongs to the foreground process group. ps differs a lot between systems, so I am not sure how portable this solution is.
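A slightly hardened version of the same ps/awk idea, written as reusable functions. It assumes Linux with procps-ng ps, where a + in STAT marks membership in the terminal's foreground process group. The names fg_filter and fg_pids_by_stat are hypothetical; the exact comparison on the TTY column keeps pts/29 from also matching pts/290.

```bash
#!/usr/bin/env bash
# Sketch only: assumes Linux with procps-ng ps; function names are hypothetical.

# Read "STAT PID TTY" lines on stdin and print the PIDs whose STAT contains
# "+" (foreground process group) and whose TTY is exactly $1.
fg_filter() {
  awk -v t="$1" '$1 ~ /\+/ && $3 == t { print $2 }'
}

# List the foreground-group PIDs of terminal $1, e.g. fg_pids_by_stat pts/29
fg_pids_by_stat() {
  ps -e -o stat=,pid=,tty= | fg_filter "$1"
}
```

Usage: fg_pids_by_stat pts/29 prints one PID per line, or nothing if no process on that terminal is in the foreground.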

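You can use pgrep's -t flag, which lists the processes whose controlling terminal is the given tty. Note that this includes every process on that terminal (the shell and any background jobs as well), not only the foreground process group.
For example: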
# on a first ssh session, which gets pts/0 :
sleep 10
# on a second ssh session :
pgrep -t "pts/0"
1234 # the first session's bash process
5678 # the first session's sleep process
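Because pgrep -t also reports the shell and background jobs, one way to narrow its output to the foreground group is to ask ps for the terminal's foreground process group ID (the tpgid field) and pass it to pgrep -g, which selects by process group ID. This is a sketch assuming procps-ng ps and pgrep; fg_pids_pgrep is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: assumes Linux with procps-ng ps and pgrep; hypothetical name.

# Print the PIDs in the foreground process group of terminal $1 (e.g. pts/0).
fg_pids_pgrep() {
  local pg
  # tpgid = foreground process group ID of the terminal a process is on;
  # every process attached to the same terminal reports the same value.
  pg=$(ps -t "$1" -o tpgid= 2>/dev/null | awk 'NR == 1 && $1 > 0 { print $1 }')
  [ -n "$pg" ] || return 1   # unknown terminal, or nothing attached to it
  pgrep -g "$pg"             # -g selects processes by process group ID
}
```

In the example above, while sleep runs in the foreground, fg_pids_pgrep pts/0 should print the sleep PID but not the bash PID, since an interactive bash puts each job in its own process group.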

Related

How to get the actual program name using the PID of that running program?

I am working on linux.
Is there any way to get the user defined program name, given the PID of that running program?
I want to output the program name, not the process name.
For example: I have a Java application named stackoverflow.java. The process name is decided by the system and can be different, but the program name is stackoverflow.java. So the output should be the program name, given only the PID of the running program.
Some commands partially meet this need:
cat /proc/"PID"/cmdline ->
This gives the command line (the arguments, separated by NUL bytes) that created the process with the given "PID". But programs in different languages are started with differently formatted commands, so how can I extract the exact program name from it?
readlink -f /proc/"PID"/exe -> This gives the executable file of the process with the given "PID". But some processes have no executable file, and then it returns nothing.
The ps utility does this. For example,
$ ps 12345
PID TTY STAT TIME COMMAND
12345 pts/1 S 0:00 sleep 20
Here's how to ask for just the command:
$ ps -o command 12345
COMMAND
sleep 20
So you merely need to remove that first line:
$ ps -o command 12345 |awk 'NR>1'
sleep 20
If you want just the command without arguments:
$ ps -o command 12345 |awk 'NR>1 { print $1 }'
sleep
(Note: this won't work for commands with spaces in their names.)
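On Linux the spaces problem can be avoided by reading /proc/<pid>/cmdline directly: the kernel stores the arguments there separated by NUL bytes, so each argument comes back intact. A minimal sketch; proc_arg is a hypothetical helper that prints the Nth argument (0 is the program as invoked).

```bash
#!/usr/bin/env bash
# Sketch only: Linux-specific (reads /proc/<pid>/cmdline); hypothetical name.

# Print argument number $2 (0-based; 0 is argv[0]) of process $1.
# Arguments in cmdline are NUL-separated, so spaces inside them survive.
proc_arg() {
  [ -r "/proc/$1/cmdline" ] || return 1
  tr '\0' '\n' < "/proc/$1/cmdline" | sed -n "$(( ${2:-0} + 1 ))p"
}
```

For the sleep example, proc_arg 12345 0 prints sleep. For a script run through an interpreter, such as python runsomethings.py, argument 1 is the script name as long as no interpreter options come before it.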

Run command as another user in Linux with `su - user -c` creates a duplicate process

I want a service to run a process as the root user, because the daemon may run as its own user.
But when I run it with system("su - root -c ./testbin"), the system shows two processes (I check this via ps aux | grep testbin):
su - root -c ./testbin
and
./testbin
How can I end up with a single process?
The "su" process cannot be avoided, but you can make it end before your testbin does.
Reproducing your question with "sleep" looks like this:
(su - root -c "sleep 120" &) ; ps aux | grep sleep
If you execute that line multiple times you will see multiple "su" processes as a result of the grep.
Backgrounding the subprocess allows the su process to end, like this:
(su - root -c "sleep 120 &" &) ; ps aux | grep sleep
When you execute that line multiple times you can see that the "su" processes disappear from the list but that the sleep commands continue.
Note that the ampersand inside the double quotes backgrounds the subprocess, while the ampersand just before the closing parenthesis backgrounds the 'su' command itself; the latter is only needed to run the test in a single line and speed it up.
I checked whether a command-line equivalent of 'execv' would help. The shell's exec builtin does replace the shell that su starts, but not su itself, because su runs the command as a child process and waits for it. It seems logical to me that, for security reasons, the 'su' process cannot be replaced by its child the way 'execv' works in C.
ps aux | grep testbin | grep -v grep
I think you are just seeing your own "grep" process. Use the command above.
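Two common ways to search for a process without matching the grep doing the search, shown as a sketch: pgrep -f matches against the full command line and never reports itself, and the bracket trick ps aux | grep '[t]estbin' works because grep's own argument "[t]estbin" does not match the regex [t]estbin. find_procs is a hypothetical name; pgrep -a assumes a reasonably recent procps-ng.

```bash
#!/usr/bin/env bash
# Sketch only: assumes procps-ng pgrep with -a support; hypothetical name.

# List PID and full command line of processes whose command line matches $1.
# pgrep excludes itself, so no "grep -v grep" is needed.
find_procs() {
  pgrep -af -- "$1"
}

# Equivalent with ps and grep, using the bracket trick for a fixed name:
#   ps aux | grep '[t]estbin'
```

With the su setup above, find_procs testbin should list both the su process and testbin itself while su is still waiting.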

How do I find the current process running on a particular PBS job

I am trying to write a script to provide diagnostics on processes. I have submitted a script to a job scheduling server using qsub. I can easily find the node that the job gets sent to. But I would like to be able to find what process is currently being run.
i.e., I have a list of different commands in the submitted script; how can I find the one that is currently running, and the arguments passed to it?
Example of commands in the script:
matlab -nodesktop -nosplash -r "display('here'),quit"
python runsomethings.py
I would like to see whether the node is currently executing the first or the second line.
When you submit a job, pbs_server passes your task to pbs_mom. The pbs_mom process (daemon) actually executes your script on the execution node. It
"creates a new session as identical user."
This means invoking a shell; you choose which one at the top of the script with a shebang line (e.g. #!/bin/bash).
pbs_mom must therefore store the PID of that process (the shell) somewhere, so that it can kill the job and monitor whether the job (the shell process) has finished.
Update, based on @Dmitri Chubarov's comment: pbs_mom stores the subshell PID in memory after calling fork(), and also in the .TK file under the Torque installation directory (/var/spool/torque/mom_priv/jobs on my system).
Dumping the file contents in decimal mode (replace <job_number> and <queue_name> with your own values):
$ hexdump -d /var/spool/torque/mom_priv/jobs/<job_number>.<queue_name>.TK
shows that in my Torque implementation the PID starts at position 00000890 + offset 4*2 = 00000898 (hex, i.e. byte 2200 in decimal) of the .TK file and is 2 bytes long.
For example, for shell PID=27110 I have:
0000890 00001 00000 00001 00000 27110 00000 00000 00000
Let's recover the PID from the .TK file:
$ hexdump -s 2200 -n 2 -d /var/spool/torque/mom_priv/jobs/<job_number>.<queue_name>.TK | tr -s ' ' | cut -s -d' ' -f 2
27110
This gives you the subshell's PID.
Now monitor the process list on the execution node and find the names of the child processes (the getcpid function is a slightly modified version of one posted earlier on SO):
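If hexdump is not available, od from coreutils can read the same two bytes. This sketch assumes the offset found above (2200, i.e. 0x898), the 2-byte width, and a little-endian host such as x86; all of these are specific to that Torque installation, so check them with hexdump -d on your own .TK files first. tk_pid is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: the 2200-byte offset and 2-byte width come from one Torque
# build; verify them on your system. tk_pid is a hypothetical name.

# Print the unsigned 2-byte value stored at byte offset $2 (default 2200)
# of the .TK file $1, in host byte order (like hexdump -d).
tk_pid() {
  od -An -t u2 -j "${2:-2200}" -N 2 "$1" | awk 'NF { print $1 }'
}
```

Usage: tk_pid /var/spool/torque/mom_priv/jobs/<job_number>.<queue_name>.TK should print the same value as the hexdump pipeline, e.g. 27110.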
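function getcpid() {
    local cpids cpid
    cpids=$(pgrep -P "$1")        # direct children of PID $1
    for cpid in $cpids; do
        ps -p "$cpid" -o comm=    # print the child's command name
        getcpid "$cpid"           # recurse into grandchildren
    done
}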
Finally,
getcpid <your_PID>
gives you the names of the child processes (note that there will be some garbage lines, such as task numbers). This way you finally know which command is currently running on the execution node.
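Since the question also asks for the arguments, a variant of getcpid can print each descendant's PID together with its full command line (ps's args field) instead of only comm. list_descendants is a hypothetical name, and the sketch assumes procps-ng pgrep and ps on the execution node.

```bash
#!/usr/bin/env bash
# Sketch only: assumes procps-ng pgrep and ps; hypothetical name.

# Recursively print "PID command-with-arguments" for every descendant of $1.
list_descendants() {
  local child
  for child in $(pgrep -P "$1"); do
    # args= prints the full command line without a header;
    # ps prints nothing if the child has already exited.
    ps -o pid=,args= -p "$child"
    list_descendants "$child"
  done
}
```

Run on the execution node as list_descendants <your_PID>; the output should then include the matlab or python line the job script is currently executing, with its arguments.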
Of course, for each monitored task you should obtain the PID and the process names on the execution node, after logging in with
ssh <your node>
You can automatically retrieve node name(s) in <node/proc+node/proc+...> format (process it further to obtain bare node names):
qstat -n <job number> | awk '{print $NF}' | grep <pattern_for_your_node_names>
Note:
The PID method is reliable and, I believe, optimal.
Searching by name is worse: it gives an unambiguous result only if you invoke different commands in your scripts and no other user runs the same software on the node.
ssh <your node>
ps aux | grep matlab
This tells you whether matlab is running.
A simple and elegant way is to write progress messages to a log file from the job script:
ARGS="$A $B $test"
echo "running MATLAB now with args: $ARGS" >> "$LOGFILE"
matlab -nodesktop -nosplash -r "display('here'),quit"
PYARGS="$X $Y"
echo "running Python now with args: $PYARGS" >> "$LOGFILE"
python runsomethings.py
Then follow the log with tail -f "$LOGFILE".
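The logging idea can be factored into a small wrapper so that the logged command line always matches the one actually executed, instead of hand-written echo lines that can drift out of sync with the real commands. log_run and the default log path are hypothetical; adapt them to your job script.

```bash
#!/usr/bin/env bash
# Sketch only: log_run and the default log path are hypothetical names.
LOGFILE=${LOGFILE:-./job_progress.log}

# Append a timestamped line describing the command, then run it unchanged.
log_run() {
  printf '%s running: %s\n' "$(date '+%F %T')" "$*" >> "$LOGFILE"
  "$@"
}

# Example use inside the job script:
#   log_run matlab -nodesktop -nosplash -r "display('here'),quit"
#   log_run python runsomethings.py
```

Because "$@" is passed through untouched, arguments containing spaces reach the command exactly as written; only the log line flattens them with "$*".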

Linux bash script that kills a process (not started by me) after x amount of time

I'm pretty inexperienced with Linux bash. That being said, I have a CentOS7 machine that runs a COTS application server. This application server runs other processes that sometimes hang. Since I have no control over the start of these processes, I'm looking for a script that runs every 2 minutes that kills processes of the name "spicer" that have been running for longer than 10 minutes. I've looked around and have only been able to find answers for processes that are run and owned by me.
I use the command ps -eo pid,command,etime | grep spicer to get all the spicer processes. The output of this command looks like:
18216 spicer -l/opt/otmm-10.5/Spi 14:20
18415 spicer -l/opt/otmm-10.5/Spi 11:49
etc...
18588 grep --color=auto spicer
I don't know if there's a way to parse this directly in bash. I'm also not well-versed at all in other Linux tools. I know that awk (or gawk) could possibly help.
EDIT
I have no control over the data that the process is working on.
What about wrapping the spicer executable and starting it with the timeout command? Let's say it is installed in /usr/bin/spicer. Then issue:
cp /usr/bin/spicer{,.orig}
echo '#!/bin/bash' > /usr/bin/spicer
echo 'timeout 10m /usr/bin/spicer.orig "$@"' >> /usr/bin/spicer
Another approach would be to create a cron job definition in /etc/cron.d/kill_spicer, like this:
* * * * * root kill $(ps --no-headers -C spicer -o pid,etimes | awk '$2>=600{print $1}')
The cron job runs every minute, uses ps to list the spicer processes whose elapsed time (etimes, in seconds) is at least 600, i.e. 10 minutes, and passes their PIDs to kill.
If the process is really hanging, you may even need kill -9.
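The same etimes filter can be wrapped in a script that cron calls, skipping kill entirely when nothing matches (a bare kill with no PIDs only prints a usage error). kill_older_than is a hypothetical name; the sketch assumes procps-ng ps, where -C selects by command name and etimes is the elapsed time in seconds.

```bash
#!/usr/bin/env bash
# Sketch only: assumes procps-ng ps (etimes field, -C option); hypothetical name.

# Send a signal (default TERM, or $3 such as KILL) to every process named $1
# that has been running for at least $2 seconds.
kill_older_than() {
  local pids
  pids=$(ps -C "$1" -o pid=,etimes= | awk -v max="$2" '$2 >= max { print $1 }')
  [ -n "$pids" ] || return 0          # nothing old enough, nothing to do
  # $pids is deliberately unquoted so each PID becomes its own argument.
  kill -s "${3:-TERM}" $pids
}
```

Usage from cron: kill_older_than spicer 600, or kill_older_than spicer 600 KILL for processes that ignore SIGTERM.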
You can use the -C option of ps to select processes by name.
ps --no-headers -C spicer -o pid,etime
Then you can use cut to filter the results, if the spacing is consistent. On my system the pid field takes up 8 characters, so I'd use
kill $(ps --no-headers -C spicer -o pid,etime | cut -c-8)
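If the spacing is inconsistent (but if so, what kind of messed-up ps are you using? :-P), you can use awk '{ print $1 }' instead of cut. Note that this kills every spicer process regardless of how long it has been running; to apply the 10-minute limit, also filter on elapsed time (for example on etimes, as in the cron job above).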

How to get a list of programs running with nohup

I am accessing a server running CentOS (linux distribution) with an SSH connection.
Since I can't always stay logged in, I use "nohup [command] &" to run my programs.
I couldn't find how to get a list of all the programs I started using nohup.
"jobs" only works until I log out. After that, if I log back in, the jobs command shows nothing, but I can see in my log files that my programs are still running.
Is there a way to get a list of all the programs that I started using "nohup" ?
For example, after starting a program with $ nohup storm dev-zookeeper &,
METHOD1: using jobs
prayagupd@prayagupd:/home/vmfest# jobs -l
[1]+ 11129 Running nohup ~/bin/storm/bin/storm dev-zookeeper &
NOTE: jobs shows nohup processes only in the same terminal session where nohup was started. If you close that session, or try from a new one, it won't show them. Prefer METHOD2.
METHOD2: using the ps command.
$ ps xw
PID TTY STAT TIME COMMAND
1031 tty1 Ss+ 0:00 /sbin/getty -8 38400 tty1
10582 ? S 0:01 [kworker/0:0]
10826 ? Sl 0:18 java -server -Dstorm.options= -Dstorm.home=/root/bin/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dsto
10853 ? Ss 0:00 sshd: vmfest [priv]
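A ? in the TTY column means the process has no controlling terminal. Programs started with nohup show ? once you have logged out, but so do daemons and kernel threads (such as kworker above).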
Description
TTY column = the terminal associated with the process
STAT column = state of a process
S = interruptible sleep (waiting for an event to complete)
l = is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
Reference
$ man ps # then search /PROCESS STATE CODES
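METHOD2 can be narrowed to just the processes without a controlling terminal with a small awk filter. As the kworker line in the listing shows, this also catches daemons and kernel threads, so treat the result as a starting point rather than an exact list of nohup jobs. list_no_tty is a hypothetical name; ps x -o is procps-ng syntax.

```bash
#!/usr/bin/env bash
# Sketch only: assumes procps-ng ps; hypothetical name.

# Print PID, TTY and command line of your processes whose TTY is "?",
# i.e. that have no controlling terminal.
list_no_tty() {
  ps x -o pid=,tty=,args= | awk '$2 == "?"'
}
```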
Instead of nohup, you should use screen. It achieves the same result - your commands are running "detached". However, you can resume screen sessions and get back into their "hidden" terminal and see recent progress inside that terminal.
screen has a lot of options. Most often I use these:
To start a first screen session, or to take over the most recent detached one:
screen -Rd
To detach from the current session: Ctrl+A, then Ctrl+D
You can also start multiple screens - read the docs.
If standard output is redirected to "nohup.out", just check which processes use this file:
lsof | grep nohup.out
You cannot exactly get a list of commands started with nohup but you can see them along with your other processes by using the command ps x. Commands started with nohup will have a question mark in the TTY column.
You can also just use the top command: filtered by your user ID, it shows your running jobs and their times.
$ top
(this will show all running jobs)
$ top -U [user ID]
(This will show jobs that are specific for the user ID)
sudo lsof | grep nohup.out | awk '{print $2}' | sort -u | while read -r i; do ps -o args= -p "$i"; done
prints the command line of every process that has the nohup.out file open.
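Without lsof or root, a similar check can be done for your own processes through /proc: each /proc/<pid>/fd/1 is a symlink to whatever the process's standard output points at. This sketch assumes Linux and that the program's output went to a file named nohup.out (nohup's default); list_nohup_out is a hypothetical name, and processes whose output was redirected elsewhere will not show up.

```bash
#!/usr/bin/env bash
# Sketch only: Linux /proc; hypothetical name. Other users' processes are
# skipped because their fd directories are not readable.

# Print the PIDs of processes whose stdout (fd 1) is a file named nohup.out.
list_nohup_out() {
  local fd pid target
  for fd in /proc/[0-9]*/fd/1; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      */nohup.out)
        pid=${fd#/proc/}         # "1234/fd/1"
        echo "${pid%%/*}" ;;     # "1234"
    esac
  done
}
```

Each PID can then be fed to ps -o args= -p "$pid" to see the command, as in the lsof pipeline above.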
