How to start and monitor a set of programs in bash? - linux

I use a system that is started by a script similar to this:
#!/bin/bash
prog_a & # run continuously
prog_b & # run continuously
prog_c & # run continuously
sleep 2 # wait for some stuff to be done
prog_d # start 'main' program ...
killall -9 prog_a
killall -9 prog_b
killall -9 prog_c
It works well. If I press Ctrl-C in the terminal (or if prog_d crashes), then prog_d dies and the first processes prog_a, prog_b and prog_c are killed.
The problem I have is that sometimes prog_a, prog_b or prog_c crashes while prog_d is still alive. What I would actually like is this: if one program dies, then the other ones are killed.
Is it possible to do that simply in bash? I have tried to create something like:
wait pid1 pid2 pid3 ... # wait until pid1, pid2 or pid3 dies
But without success (I need to be able to press Ctrl-C to kill prog_d).
Thanks !

I would do that with GNU Parallel, which has nice handling for what to do when any job fails... whether one or more or a percentage fail, whether other jobs should be terminated immediately or only no new jobs should be started.
In your specific case:
parallel -j 4 --halt now,fail=1 --line-buffer ::: prog_a prog_b prog_c 'sleep 2; prog_d'
That says: "run all four jobs in parallel, and halt immediately, killing all the others, if any job fails. Buffer the output by lines. The jobs to be run are specified after the ::: and they are just your jobs, but with a delay before the final one."
You may like the output tagged by the job name, so you can see which output came from which process. If so, use parallel --tag ...
You may like to delay/stagger the starts of each job, in which case use parallel --delay 1 to start jobs at 1 second intervals and remove the sleep 2.
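For comparison, the "wait until any one of them dies" behaviour the question asks for can also be sketched in plain bash with the wait -n builtin, which needs bash 4.3 or newer. This is only a sketch under assumptions: the sleep commands are stand-ins for prog_a to prog_d, and prog_d is moved to the background as well, so it cannot read from the terminal.

```shell
#!/bin/bash
# Sketch only: the sleep commands are stand-ins for prog_a..prog_d.
pids=()
cleanup() { kill "${pids[@]}" 2>/dev/null || true; }
trap cleanup EXIT            # runs on normal exit and after the signals below
trap 'exit 1' INT TERM       # Ctrl-C or kill ends the script, which triggers cleanup

sleep 30 & pids+=("$!")      # stand-in for prog_a
sleep 30 & pids+=("$!")      # stand-in for prog_b
sleep 1  & pids+=("$!")      # stand-in for prog_c, exits early to simulate a crash
sleep 30 & pids+=("$!")      # stand-in for prog_d, now also in the background

first_status=0
wait -n || first_status=$?   # returns as soon as any one background job exits
cleanup                      # stop the survivors
wait                         # reap them so no zombies are left
echo "a job exited with status $first_status; the others were stopped"
```

Unlike --halt now,fail=1, this version stops everything when any job ends, whether it failed or finished normally.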

Related

How do I stop a script running in the background in linux?

Let's say I have a silly script:
while true;do
touch ~/test_file
sleep 3
done
And I start the script in the background and leave the terminal:
chmod u+x silly_script.sh
./silly_script.sh &
exit
Is there a way for me to identify and stop that script now? The way I see it, every command is started in its own process, and I might be able to catch and kill one command like the 'sleep 3' but not the execution of the entire script. Am I mistaken? I expected a process to appear with the script's name, but it does not. If I start the script with 'source silly_script.sh', I can't find a process by the name of 'source'. Do I need to identify the instance of bash that is executing the script? How would I do that?
EDIT: There have been a few creative solutions, but so far they require the PID of the script execution to be stored right away, or the bash session not to be left with ^D or exit. I understand that this way of running scripts should maybe be avoided, but I find it hard to believe that any low-privilege user could, even by accident, start an annoying script in the background that is, for instance, filling the drive with garbage files or repeatedly starting new instances of some software, and that even the admin has no other option than to restart the server, because a simple script can hide its identifier without even trying.
With the help of the fine people here I was able to derive the answer I needed:
It is true that the script runs every command in its own process, so killing the sleep 3 command, for instance, won't do anything to the script being run. But through a command like sleep 3 you can find the bash instance running the script by looking for its parent process:
So after doing the above, you can run ps axf to show all processes in a tree form. You will then find this section:
18660 ? S 0:00 /bin/bash
18696 ? S 0:00 \_ sleep 3
Now you have found the bash instance that is running the script and can stop it: kill 18660
(Of course, your PID will be different from mine.)
The jobs command will show you all running background jobs.
You can kill background jobs by id using kill, e.g.:
$ sleep 9999 &
[1] 58730
$ jobs
[1]+ Running sleep 9999 &
$ kill %1
[1]+ Terminated sleep 9999
$ jobs
$
58730 is the PID of the backgrounded task, and 1 is its job ID. In this case kill 58730 and kill %1 would have the same effect.
See the JOB CONTROL section of man bash for more info.
When you exit, the backgrounded job may get a hangup signal (SIGHUP) and die (assuming that is how it handles the signal, as it is in your simple example), unless you disown it first.
That signal will propagate to the sleep process, which may well ignore it and continue sleeping. If this is the case you'll still see it in ps -e output, but with a parent PID of 1, indicating that its original parent no longer exists.
You can use ps -o ppid= <pid> to find the parent of a process, or pstree -ap to visualise the job hierarchy and find the parent visually.
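The parent lookup described in the answers above can be sketched as follows. The looping bash -c is a stand-in for the backgrounded silly_script.sh, and the variable names are made up for this illustration. In the real situation you would not know the script's PID and would take the PID of the sleep from the ps axf output instead.

```shell
#!/bin/bash
# Sketch: a looping "bash -c" stands in for the backgrounded silly_script.sh.
bash -c 'while true; do sleep 3; done' &
script_pid=$!            # kept only so the demo can find its own sleep
sleep 1                  # give the loop time to start its sleep

# In the real situation you would read the sleep PID off "ps axf";
# here it is looked up among the children of the demo loop.
sleep_pid=$(pgrep -P "$script_pid" -x sleep | head -n 1)

# Ask ps for the parent PID of that sleep (the "=" suppresses the header).
parent_pid=$(ps -o ppid= -p "$sleep_pid" | tr -d ' ')
echo "sleep $sleep_pid is run by bash instance $parent_pid"

kill "$parent_pid"               # stop the loop itself first
kill "$sleep_pid" 2>/dev/null    # then the sleep it leaves behind
```

Killing the parent first matters: if only the sleep is killed, the loop simply starts a new one.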

Run process in background of shell for 2 minutes, then terminate it before 2 minutes is up BASH

I am using the Peppermint distro. I'm new to Linux, but I need to display the system processes, then create a new process that runs in the background for 2 minutes, prove that it is running, and then terminate it before the 2 minutes are up.
So far I'm using xlogo to test that my process is working. I have
ps
xlogo &
TASK_PID=$!
if pgrep -x xlogo>/dev/null 2>&1
then
ps
sleep 15
kill $TASK_PID
ps
fi
I can't seem to figure out a way to give it an initial time limit of 2 minutes but then kill it after 15 seconds anyway.
Any help appreciated!
If you want the command to have a time limit of 2 minutes to begin with, you could do:
timeout 2m xlogo &
Of course, $! will then be the PID of the timeout command. If you're using pgrep and are satisfied that it only finds the process you care about, you could use pkill instead of the PID to kill xlogo.
Killing the timeout PID will also kill xlogo, so you might be able to keep the rest as-is if you're happy with how that works.
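Put together, the approach might look like the sketch below. To keep it runnable without an X display, sleep 300 is used as a stand-in for xlogo, and the wait before the kill is shortened from 15 seconds to 1 second. Both are assumptions of this example, not part of the original task.

```shell
#!/bin/bash
# Sketch: "sleep 300" stands in for xlogo so the example runs without X.
timeout 2m sleep 300 &     # the job is stopped after 2 minutes at the latest
TASK_PID=$!                # this is the PID of timeout, not of sleep

sleep 1                    # the original task would wait 15 seconds here
if kill -0 "$TASK_PID" 2>/dev/null; then   # signal 0 only checks that the process exists
    echo "still running, stopping it early"
    kill "$TASK_PID"       # timeout passes the TERM signal on to its child
fi

status=0
wait "$TASK_PID" || status=$?   # non-zero because the job was terminated
echo "exit status: $status"
```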

Bash subprocess is not killed when the main process is killed

Hi everyone.
I have written a bash script to monitor CPU, memory and network information. Everything works fine for the CPU and memory parts, but the network part is where things get interesting.
I use "ifstat" to monitor the network. "ifstat" is a blocking program that continuously prints network I/O to the screen. My bash script looks like this:
#!/bin/bash
#ignore other less important codes
......
ifstat > network.info &
while true
do
...
done
I use
bash xx.sh
to run it and Ctrl+C to kill it. Here is the odd thing: although the bash process has been killed, the ifstat process is still running in the background. I use
ps -e | grep ifstat
to check. It is always there until I kill it manually.
As I understand it, the ifstat process is a subprocess of xx.sh, so I expected it to be killed when I kill xx.sh. But obviously it is not!
Can somebody tell me why?
And how can I kill it automatically when I kill the xx.sh process?
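A script run this way starts its background jobs with SIGINT ignored, so Ctrl+C stops only the script itself. Trap termination and propagate the kill.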
#ignore other less important codes
ifstat > network.info &
IFSTAT_PID=$!
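trap 'kill "$IFSTAT_PID" 2>/dev/null' EXIT # stop ifstat whenever the script exits
trap 'exit 1' TERM INT HUP # turn these signals into a normal exit so the EXIT trap runs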
while true
do
...
done
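To check that a trap really takes the background child down with the script, a small self-contained experiment can help. In this sketch the inner bash -c plays the role of xx.sh and sleep 300 the role of ifstat. The temporary file and the names cleanup, child and result are made up for the illustration.

```shell
#!/bin/bash
# Sketch: the inner script plays the role of xx.sh, "sleep 300" the role of ifstat.
pidfile=$(mktemp)

bash -c '
    sleep 300 &                      # stand-in for: ifstat > network.info &
    child=$!
    echo "$child" > "$1"             # record the child PID for the check below
    cleanup() { kill "$child" 2>/dev/null; wait "$child" 2>/dev/null; }
    trap cleanup EXIT                # runs whenever the script exits
    trap "exit 1" TERM INT HUP       # turn these signals into a normal exit
    while true; do sleep 1; done
' _ "$pidfile" &
script_pid=$!

sleep 1
kill "$script_pid"                   # simulate killing xx.sh with TERM
wait "$script_pid" 2>/dev/null || true
child_pid=$(cat "$pidfile")
rm -f "$pidfile"

if kill -0 "$child_pid" 2>/dev/null; then result=alive; else result=gone; fi
echo "background child after the script was killed: $result"
```

Without the two trap lines, the same experiment would leave the sleep 300 running after the script is gone.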

How to get right PID of a group of background command and kill it?

Ok, just like in this thread, How to get PID of background process?, I know how to get the PID of a background process. However, what I need to do contains more than one operation.
{
sleep 300;
echo "Still running after 5 min, killing process manually.";
COMMAND COMMAND COMMAND
echo "Shutdown complete"
}&
PID_CHECK_STOP=$!
some stuff...
kill -9 $PID_CHECK_STOP
But it doesn't work. It seems I either get a bad PID or I just can't kill it. I tried to run ps | grep sleep, and the PID it gives is always right next to the one I get in PID_CHECK_STOP. Is there a way to make it work? Can I wrap those commands another way so I can kill them all when I need to?
Thanks guys!
kill -9 kills the process before it can do anything else, including signalling its children to exit. Use a gentler signal (kill by itself, which sends a TERM, should be sufficient). You do need to have the process signal its children to exit (if any) explicitly, though, via a trap command.
I'm assuming sleep is a placeholder for the real command. sleep is tricky here, however: while it runs in the foreground, bash does not run a trap handler until the command returns, and a signal sent to the outer shell is not delivered to sleep itself. To make your example work, put sleep itself in the background and immediately wait on it. When you signal the "outer" background process, the signal interrupts the wait call, so the trap runs right away and can kill sleep as well.
{
trap 'kill $(jobs -p)' EXIT
sleep 300 & wait
echo "Still running after 5 min, killing process manually.";
COMMAND COMMAND COMMAND
echo "Shutdown complete"
}&
PID_CHECK_STOP=$!
some stuff...
kill $PID_CHECK_STOP
UPDATE: COMMAND COMMAND COMMAND includes a command that runs via sudo. To kill that process, kill must also be run via sudo. Keep in mind that doing so will run the external kill program, not the shell built-in (there is little difference between the two; the built-in exists to allow you to kill a process when your process quota has been reached).
You can have another script containing those commands and kill that script. If you are dynamically generating code for the block, just write out a script, execute it and kill when you are done.
Backgrounding the { ... } group with & runs it in a new subshell, and $! gives you that subshell's PID. sleep and the other commands within the block get separate PIDs.
To illustrate, look for your process in ps afux | less - the parent shell process (above the sleep) has the PID you were just given.

Sleep in a while loop gets its own pid

I have a bash script that does some parallel processing in a loop. I don't want the parallel process to spike the CPU, so I use a sleep command. Here's a simplified version.
(while true;do sleep 99999;done)&
So I execute the above line from a bash prompt and get something like:
[1] 12345
Where [1] is the job number and 12345 is the process ID (pid) of the while loop. I do a kill 12345 and get:
[1]+ Terminated ( while true; do
sleep 99999;
done )
It looks like the entire script was terminated. However, I do a ps aux|grep sleep and find the sleep command is still going strong but with its own pid! I can kill the sleep and everything seems fine. However, if I were to kill the sleep first, the while loop starts a new sleep pid. This is such a surprise to me since the sleep is not parallel to the while loop. The loop itself is a single path of execution.
So I have two questions:
Why did the sleep command get its own process ID?
How do I easily kill the while loop and the sleep?
Sleep gets its own PID because it is a process running and just waiting. Try which sleep to see where it is.
You can use ps -uf to see the process tree on your system. From there you can determine the PPID (parent PID) of the sleep, which is the PID of the shell running the loop.
Because "sleep" is a process, not a built-in function or similar.
You could do the following:
(while true;do sleep 99999;done)&
whilepid=$!
kill -- -$whilepid
The above code kills the whole process group, because the PID is given as a negative number (e.g. -123 instead of 123). It also uses the variable $!, which holds the PID of the most recently started background job.
Note:
When you run a process in the background in interactive mode (i.e. from the command-line prompt), the shell creates a new process group for it, which is what is happening to you. That makes it relatively easy to "kill 'em all", because you just have to kill the whole process group. However, when the same is done within a script, no new group is created: all new processes stay in the script's process group, even if they are run in the background (job control is disabled by default in scripts). To enable job control in a script, put the following at the beginning of the script:
#!/bin/bash
set -m
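As a rough illustration of the note above, the following sketch enables job control in a script, starts the loop from the question, and confirms that the loop leads its own process group before killing the group. The variable pgid and the one-second pause are additions made for this example.

```shell
#!/bin/bash
set -m                      # enable job control: each background job gets its own process group

(while true; do sleep 99999; done) &
whilepid=$!
sleep 1                     # let the loop start its sleep

# With job control on, the job's process group ID equals the PID of its leader.
pgid=$(ps -o pgid= -p "$whilepid" | tr -d ' ')
echo "loop PID $whilepid is in process group $pgid"

kill -- "-$whilepid"        # negative PID: signal every process in that group
wait "$whilepid" 2>/dev/null || true   # reap the job; its status reflects the signal
```

Without set -m, the loop would share the script's process group, and kill -- -$whilepid would fail because no group with that ID exists.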
Have you tried doing kill %1, where 1 is the number you get after launching the command in background?
I did it right now after launching (while true;do sleep 99999;done)& and it correctly terminated it.
"ps --ppid" selects all processes with the specified parent pid, eg:
$ (while true;do sleep 99999;done)&
[1] 12345
$ ppid=12345 ; kill -9 $ppid $(ps --ppid $ppid -o pid --no-heading)
You can kill the process group.
To find the process group of your process run:
ps --no-headers -o "%r" -p 15864
Then kill the process group using:
kill -- -[PGID]
You can do it all in one command. Let's try it out:
$ (while true;do sleep 99999;done)&
[1] 16151
$ kill -- -$(ps --no-headers -o "%r" -p 16151)
[1]+ Terminated ( while true; do
sleep 99999;
done )
To kill the while loop and the sleep using $! you can also use a trap signal handler inside the subshell.
(trap 'kill ${!}; exit' TERM; while true; do sleep 99999 & wait ${!}; done)&
kill -TERM ${!}
