Hi all/ I'm trying to make something like parallel tool for shell simply because the functionality of parallel is not enough for my task. The reason is that I need to run different versions of compiler.
Imagine that I need to compile 12 programs with different compilers, but I can run only 4 of them simultaneously (otherwise PC runs out of memory and crashes :). I also want to be able to observe what's going on with each compile, therefore I execute every compile in new window.
Just to make it easier here I'll replace compiler that I run with small script that waits and returns it's process id sleep.sh:
#!/bin/bash
sleep 30
echo $$
So the main script should look like parallel_run.sh :
#!/bin/bash
for i in {0..11}; do
xfce4-terminal -H -e "./sleep.sh" &
pids[$i]=$!
pstree -p $pids
if (( $i % 4 == 0 ))
then
for pid in ${pids[*]}; do
wait $pid
done
fi
done
The problem is that with $! I get pid of xfce4-terminal and not the program it executes. So if I look at ptree of 1st iteration I can see output from main script:
xfce4-terminal(31666)----{xfce4-terminal}(31668)
|--{xfce4-terminal}(31669)
and sleep.sh says that it had pid = 30876 at that time. Thus wait doesn't work at all in this case.
Q: How to get right PID of compiler that runs in subshell?
Maybe there is the other way to solve task like this?
It seems like there is no way to trace PID from parent to child if you invoke process in new xfce4-terminal as terminal process dies right after it executed given command. So I came to the solution which is not perfect, but acceptable in my situation. I run and put compiler's processes in background and redirect output to .log file. Then I run tail on these logfiles and I kill all tails which belongs to current $USER when compilers from current batch are done, then I run the other batch.
#!/bin/bash
for i in {1..8}; do
./sleep.sh > ./process_$i.log &
prcid=$!
xfce4-terminal -e "tail -f ./process_$i.log" &
pids[$i]=$prcid
if (( $i % 4 == 0 ))
then
for pid in ${pids[*]}; do
wait $pid
done
killall -u $USER tail
fi
done
Hopefully there will be no other tails running at that time :)
This question already has answers here:
How to kill all subprocesses of shell?
(9 answers)
Closed 3 years ago.
I am trying to do an operation in linux trying to burn cpu using openssl speed
this is my code from netflix simian army
#!/bin/bash
# Script for BurnCpu Chaos Monkey
cat << EOF > /tmp/infiniteburn.sh
#!/bin/bash
while true;
do openssl speed;
done
EOF
# 32 parallel 100% CPU tasks should hit even the biggest EC2 instances
for i in {1..32}
do
nohup /bin/bash /tmp/infiniteburn.sh &
done
so this is Netflix simian army code to do burn cpu, this executes properly but the issue is I cannot kill all 32 processes, I tried everything
pkill -f pid/process name
killall -9 pid/process name
etc.,
the only successful way I killed the process is through killing it via user
pkill -u username
How can I kill these process without using username?
any help is greatly appreciated
finally, I found a solution to my own question,
kill -- -$(ps -o pgid= $PID | grep -o [0-9]*)
where PID is the process ID of any of the one processes running, this works fine but I am open to hear any other options available
source: http://fibrevillage.com/sysadmin/237-ways-to-kill-parent-and-child-processes-in-one-command
Killing a process doesn't automatically kill its children. Killing your bash script won't kill the openssl speed processes.
You can either cast a wider net with your kill call, which is what you're doing with pkill -u. Or you could use trap in your script and add an error handler.
cleanup() {
# kill children
}
trap cleanup EXIT
I had a similar problem and solution where I needed to kill a NodeJS server after some amount of time.
To do this, I enabled Job control, and killed async processes by group id with jobs:
set -m
./node_modules/.bin/node src/index.js &
sleep 2
kill -- -$(jobs -p)
I have a python script that'll be checking a queue and performing an action on each item:
# checkqueue.py
while True:
check_queue()
do_something()
How do I write a bash script that will check if it's running, and if not, start it. Roughly the following pseudo code (or maybe it should do something like ps | grep?):
# keepalivescript.sh
if processidfile exists:
if processid is running:
exit, all ok
run checkqueue.py
write processid to processidfile
I'll call that from a crontab:
# crontab
*/5 * * * * /path/to/keepalivescript.sh
Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.
There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.
until myserver; do
echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an #reboot rule. Open your cron rules with crontab:
crontab -e
Then add a rule to start your monitor script:
#reboot /usr/local/bin/myservermonitor
Alternatively; look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
Edit.
Let me add some information on why not to use PID files. While they are very popular; they are also very flawed and there's no reason why you wouldn't just do it the correct way.
Consider this:
PID recycling (killing the wrong process):
/etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
A while later: foo dies somehow.
A while later: any random process that starts (call it bar) takes a random PID, imagine it taking foo's old PID.
You notice foo's gone: /etc/init.d/foo/restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1..
What if you don't even have write access or are in a read-only environment?
It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
See also: Are PID-files still flawed when doing it 'right'?
By the way; even worse than PID files is parsing ps! Don't ever do this.
ps is very unportable. While you find it on almost every UNIX system; its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as argument that happens to be the same as the PID you stared your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
If you don't want to manage the process yourself; there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.
Have a look at monit (http://mmonit.com/monit/). It handles start, stop and restart of your script and can do health checks plus restarts if necessary.
Or do a simple script:
while true
do
/your/script
sleep 1
done
In-line:
while true; do <your-bash-snippet> && break; done
This will restart continuously <your-bash-snippet> if it fails: && break will stop the loop if <your-bash-snippet> stop gracefully (return code 0).
To restart <your-bash-snippet> in all cases:
while true; do <your-bash-snippet>; done
e.g. #1
while true; do openconnect x.x.x.x:xxxx && break; done
e.g. #2
while true; do docker logs -f container-name; sleep 2; done
The easiest way to do it is using flock on file. In Python script you'd do
lf = open('/tmp/script.lock','w')
if(fcntl.flock(lf, fcntl.LOCK_EX|fcntl.LOCK_NB) != 0):
sys.exit('other instance already running')
lf.write('%d\n'%os.getpid())
lf.flush()
In shell you can actually test if it's running:
if [ `flock -xn /tmp/script.lock -c 'echo 1'` ]; then
echo 'it's not running'
restart.
else
echo -n 'it's already running with PID '
cat /tmp/script.lock
fi
But of course you don't have to test, because if it's already running and you restart it, it'll exit with 'other instance already running'
When process dies, all it's file descriptors are closed and all locks are automatically removed.
You should use monit, a standard unix tool that can monitor different things on the system and react accordingly.
From the docs: http://mmonit.com/monit/documentation/monit.html#pid_testing
check process checkqueue.py with pidfile /var/run/checkqueue.pid
if changed pid then exec "checkqueue_restart.sh"
You can also configure monit to email you when it does do a restart.
if ! test -f $PIDFILE || ! psgrep `cat $PIDFILE`; then
restart_process
# Write PIDFILE
echo $! >$PIDFILE
fi
watch "yourcommand"
It will restart the process if/when it stops (after a 2s delay).
watch -n 0.1 "yourcommand"
To restart it after 0.1s instead of the default 2 seconds
watch -e "yourcommand"
To stop restarts if the program exits with an error.
Advantages:
built-in command
one line
easy to use and remember.
Drawbacks:
Only display the result of the command on the screen once it's finished
I'm not sure how portable it is across operating systems, but you might check if your system contains the 'run-one' command, i.e. "man run-one".
Specifically, this set of commands includes 'run-one-constantly', which seems to be exactly what is needed.
From man page:
run-one-constantly COMMAND [ARGS]
Note: obviously this could be called from within your script, but also it removes the need for having a script at all.
I've used the following script with great success on numerous servers:
pid=`jps -v | grep $INSTALLATION | awk '{print $1}'`
echo $INSTALLATION found at PID $pid
while [ -e /proc/$pid ]; do sleep 0.1; done
notes:
It's looking for a java process, so I
can use jps, this is much more
consistent across distributions than
ps
$INSTALLATION contains enough of the process path that's it's totally unambiguous
Use sleep while waiting for the process to die, avoid hogging resources :)
This script is actually used to shut down a running instance of tomcat, which I want to shut down (and wait for) at the command line, so launching it as a child process simply isn't an option for me.
I use this for my npm Process
#!/bin/bash
for (( ; ; ))
do
date +"%T"
echo Start Process
cd /toFolder
sudo process
date +"%T"
echo Crash
sleep 1
done
I'm using a simple loop to restart a process if it dies. Occasionally I've seen the loop stop, which of course it shouldn't do. What could be some causes of this? I'm using a low end node/vps running ubuntu 14. Thanks. :)
This is the loop I use.
#!/bin/bash/
period=${1:-60}
while :
do
sleep 20 &
sh restart.sh
wait
done
This is restart.sh.. it greps the current PID of 'ffmpeg' and if its not found, it re-runs the ffmpeg command which broadcasts my own internet radio station, which I can then listen to anywhere :)
#!/bin/bash/
pgrep ffmpeg
if [ $? -ne 0 ]
then killall ffmpeg
killall rtmpdump
sleep 1
nohup ffmpeg (a bunch of ffmpeg stuff) &
fi
So your saying this could be an issue with ffmpeg hanging/freezing, rather then the loop dieing out? Is there ways I could improve what I am doing here? TBH I'm only about a month into linux and a week into bash, so this is the best I could do.
Thanks for your help so far, it was very useful!
Depending on what restart.sh does, it appears the script could wait forever if sh restart.sh hangs. By specifying wait with no PID, it will wait until all process IDs known to the invoking shell have terminated (see man 1P wait). If sh restart.sh hangs, wait is never satisfied.
If there is no specific requirement to background sleep, why not run sleep in the foreground and omit wait altogether:
#!/bin/bash/
period=${1:-60}
while :
do
sleep 20
sh restart.sh
done
I have two scripts, in which one is calling the other, and needs to kill it after some time. A very basic, working example is given below.
main_script.sh:
#!/bin/bash
cd "${0%/*}" #make current working directory the folder of this script
./record.sh &
PID=$!
# perform some other commands
sleep 5
kill -s SIGINT $PID
#wait $PID
echo "Finished"
record.sh:
#!/bin/bash
cd "${0%/*}" #make current working directory the folder of this script
RECORD_PIDS=1
printf "WallTimeStart: %f\n\n" $(date +%s.%N) >> test.txt
top -b -p $RECORD_PIDS -d 1.00 >> test.txt
printf "WallTimeEnd: %f\n\n" $(date +%s.%N) >> test.txt
Now, if I run main_script.sh, it will not nicely close record.sh on finish: the top command will keep on running in the background (test.txt will grow until you manually kill the top process), even though the main_script is finished and the record script is killed using SIGINT.
If I ctrl+c the main_script.sh, everything shuts down properly. If I run record.sh on its own and ctrl+c it, everything shuts down properly as well.
If I uncomment wait, the script will hang and I will need to ctrl+z it.
I have already tried all kinds of things, including using 'trap' to launch some cleanup script when receiving a SIGINT, EXIT, and/or SIGTERM, but nothing worked. I also tried bring record.sh back to the foreground using fg, but that did not help too. I have been searching for nearly a day now already, with now luck unfortunately. I have made an ugly workaround which uses pidof to find the top process and kill it manually (from main_script.sh), and then I have to write the "WallTimeEnd" statement manually to it as well from the main_script.sh. Not very satisfactory to me...
Looking forward to any tips!
Cheers,
Koen
Your issue is that the SIGINT is delivered to bash rather than to top. One option would be to use a new session and send the signal to the process group instead, like:
#!/bin/bash
cd "${0%/*}" #make current working directory the folder of this script
setsid ./record.sh &
PID=$!
# perform some other commands
sleep 5
kill -s SIGINT -$PID
wait $PID
echo "Finished"
This starts the sub-script in a new process group and the -pid tells kill to signal every process in that group, which will include top.