Parallel run and wait for pocesses from subshell - linux

Hi all/ I'm trying to make something like parallel tool for shell simply because the functionality of parallel is not enough for my task. The reason is that I need to run different versions of compiler.
Imagine that I need to compile 12 programs with different compilers, but I can run only 4 of them simultaneously (otherwise PC runs out of memory and crashes :). I also want to be able to observe what's going on with each compile, therefore I execute every compile in new window.
Just to make it easier here I'll replace compiler that I run with small script that waits and returns it's process id sleep.sh:
#!/bin/bash
sleep 30
echo $$
So the main script should look like parallel_run.sh :
#!/bin/bash
for i in {0..11}; do
xfce4-terminal -H -e "./sleep.sh" &
pids[$i]=$!
pstree -p $pids
if (( $i % 4 == 0 ))
then
for pid in ${pids[*]}; do
wait $pid
done
fi
done
The problem is that with $! I get pid of xfce4-terminal and not the program it executes. So if I look at ptree of 1st iteration I can see output from main script:
xfce4-terminal(31666)----{xfce4-terminal}(31668)
|--{xfce4-terminal}(31669)
and sleep.sh says that it had pid = 30876 at that time. Thus wait doesn't work at all in this case.
Q: How to get right PID of compiler that runs in subshell?
Maybe there is the other way to solve task like this?

It seems like there is no way to trace PID from parent to child if you invoke process in new xfce4-terminal as terminal process dies right after it executed given command. So I came to the solution which is not perfect, but acceptable in my situation. I run and put compiler's processes in background and redirect output to .log file. Then I run tail on these logfiles and I kill all tails which belongs to current $USER when compilers from current batch are done, then I run the other batch.
#!/bin/bash
for i in {1..8}; do
./sleep.sh > ./process_$i.log &
prcid=$!
xfce4-terminal -e "tail -f ./process_$i.log" &
pids[$i]=$prcid
if (( $i % 4 == 0 ))
then
for pid in ${pids[*]}; do
wait $pid
done
killall -u $USER tail
fi
done
Hopefully there will be no other tails running at that time :)

Related

How to run a second process with dependency to a running process using shell script?

I want to do an automation for 2 processes using shell script.
I have 2 programs, a is a workload, b is a CPU profiler to profile the cpu when a is running.
Previously, I run these programs manually, by opening 2 terminals. First, run a in the first terminal, then in another terminal, I get the process ID of a, and finally run ./b [pid-of-a]. This has caused me to miss the profiling of the first few seconds of process a.
I tried:
./a &
pid=$! &
./b pid
But it does not work the way I wanted. It runs b first and returns an error because the PID of a does not exist. I can't use && as well because it will wait a to finish first before b starts which is not the way I want.
What modification should I do to my code regarding such dependency?
Don't set pid in the background, and remember to put a $ when you want to expand it:
./a &
pid=$!
./b "$pid"
Or just
./a &
./b $!
Write a bash script which will check for ProcessA using pgrep and gets its pid
#!/bin/sh
while true
do
pid=`pgrep -f processA`
if [ ! -z $pid ]
then
./processB $pid
break
fi
done

Ending an mpirun process terminates a bash loop

I'm trying to schedule a series of mpi jobs on an Ubuntu 14.04 LTS machine using a bash script. Basically, I want a simulation to run on every core for a certain amount of time, then terminate and move on to the next case once that time has elapsed.
My issue arises when mpi exits at the end of the first job - it breaks the loop and returns the terminal to my control instead of heading onto the next iteration of the loop.
My script is included below. The file "case_names" is just a text file of directory names. I've tested the script with other commands and it works fine until I uncomment the mpirun call.
#!/bin/bash
while read line;
do
# Access case dierctory
cd $line
echo "Case $line accessed"
# Start simulation
echo "Case $line starting: $(date)"
mpirun -q -np 8 dsmcFoamPlus -parallel > log.dsmcFoamPlus &
# Wait for 10 hour runtime
sleep 36000
# Kill job
pkill mpirun > /dev/null
echo "Case $line terminated: $(date)"
# Return to parent directory
cd ..
done < case_names
Does anyone know of a way to stop mpirun from breaking the loop like this?
So far I've tried GNOME task scheduler and task-spooler, but neither have worked (likely due to aliases that have to be invoked before the commands I use become available). I'd really rather not have to resort to setting up slurm. I've also tried using the disown command to separate the mpi process from the shell I'm running the scheduling script in, and have even written a separate script just to kill processes which the scheduling script runs remotely.
Many thanks in advance!
I've managed to find a workaround that allows me to schedule tasks with a bash script like I wanted. Since this solves my issue, I'm posting it as an answer (although I would still welcome an explanation as to why mpi behaves in this way in loops).
The solution lay in writing a separate script for both calling and then killing mpi, which would itself be called by the scheduling script. Since this child bash process has no loops in it, there are no issues with mpi breaking them after being killed. Also, once this script has exited, the scheduling loop can continue unimpeded.
My (now working) code is included below.
Scheduling script:
while read line;
do
cd $line
echo "CWD: $(pwd)"
echo "Case $line accessed"
bash ../run_job
echo "Case $line terminated: $(date)"
cd ..
done < case_names
Execution script (run_job):
mpirun -q -np 8 dsmcFoamPlus -parallel > log.dsmcFoamPlus &
echo "Case $line starting: $(date)"
sleep 600
pkill mpirun
I hope someone will find this useful.

How to properly sigint a bash script that is run from another bash script?

I have two scripts, in which one is calling the other, and needs to kill it after some time. A very basic, working example is given below.
main_script.sh:
#!/bin/bash
cd "${0%/*}" #make current working directory the folder of this script
./record.sh &
PID=$!
# perform some other commands
sleep 5
kill -s SIGINT $PID
#wait $PID
echo "Finished"
record.sh:
#!/bin/bash
cd "${0%/*}" #make current working directory the folder of this script
RECORD_PIDS=1
printf "WallTimeStart: %f\n\n" $(date +%s.%N) >> test.txt
top -b -p $RECORD_PIDS -d 1.00 >> test.txt
printf "WallTimeEnd: %f\n\n" $(date +%s.%N) >> test.txt
Now, if I run main_script.sh, it will not nicely close record.sh on finish: the top command will keep on running in the background (test.txt will grow until you manually kill the top process), even though the main_script is finished and the record script is killed using SIGINT.
If I ctrl+c the main_script.sh, everything shuts down properly. If I run record.sh on its own and ctrl+c it, everything shuts down properly as well.
If I uncomment wait, the script will hang and I will need to ctrl+z it.
I have already tried all kinds of things, including using 'trap' to launch some cleanup script when receiving a SIGINT, EXIT, and/or SIGTERM, but nothing worked. I also tried bring record.sh back to the foreground using fg, but that did not help too. I have been searching for nearly a day now already, with now luck unfortunately. I have made an ugly workaround which uses pidof to find the top process and kill it manually (from main_script.sh), and then I have to write the "WallTimeEnd" statement manually to it as well from the main_script.sh. Not very satisfactory to me...
Looking forward to any tips!
Cheers,
Koen
Your issue is that the SIGINT is delivered to bash rather than to top. One option would be to use a new session and send the signal to the process group instead, like:
#!/bin/bash
cd "${0%/*}" #make current working directory the folder of this script
setsid ./record.sh &
PID=$!
# perform some other commands
sleep 5
kill -s SIGINT -$PID
wait $PID
echo "Finished"
This starts the sub-script in a new process group and the -pid tells kill to signal every process in that group, which will include top.

Kill bash script foreground children when a signal comes

I am wrapping a fastcgi app in a bash script like this:
#!/bin/bash
# stuff
./fastcgi_bin
# stuff
As bash only executes traps for signals when the foreground script ends I can't just kill -TERM scriptpid because the fastcgi app will be kept alive.
I've tried sending the binary to the background:
#!/bin/bash
# stuff
./fastcgi_bin &
PID=$!
trap "kill $PID" TERM
# stuff
But if I do it like this, apparently the stdin and stdout aren't properly redirected because it does not connect with lighttpds mod_fastgi, the foreground version does work.
EDIT: I've been looking at the problem and this happens because bash redirects /dev/null to stdin when a program is launched in the background, so any way of avoiding this should solve my problem as well.
Any hint on how to solve this?
There are some options that come to my mind:
When a process is launched from a shell script, both belong to the same process group. Killing the parent process leaves the children alive, so the whole process group should be killed. This can be achieved by passing the negated PGID (Process Group ID) to kill, which is the same as the parent's PID. ej: kill -TERM -$PARENT_PID
Do not execute the binary as
a child, but replacing the script
process with exec. You lose the
ability to execute stuff afterwards
though, because exec completely
replaces the parent process.
Do not kill the shell script process, but the FastCGI binary. Then, in the script, examine the return code and act accordingly. e.g: ./fastcgi_bin || exit -1
Depending on how mod_fastcgi handles worker processes, only the second option might be viable.
I have no idea if this is an option for you or not, but since you have a bounty I am assuming you might go for ideas that are outside the box.
Could you rewrite the bash script in Perl? Perl has several methods of managing child processes. You can read perldoc perlipc and more specifics in the core modules IPC::Open2 and IPC::Open3.
I don't know how this will interface with lighttpd etc or if there is more functionality in this approach, but at least it gives you some more flexibility and some more to read in your hunt.
I'm not sure I fully get your point, but here's what I tried and the process seems to be able to manage the trap (call it trap.sh):
#!/bin/bash
trap "echo trap activated" TERM INT
echo begin
time sleep 60
echo end
Start it:
./trap.sh &
And play with it (only one of those commands at once):
kill -9 %1
kill -15 %1
Or start in foreground:
./trap.sh
And interrupt with control-C.
Seems to work for me.
What exactly does not work for you?
I wrote this script just minutes ago to kill a bash script and all of its children...
#!/bin/bash
# This script will kill all the child process id for a given pid
# based on http://www.unix.com/unix-dummies-questions-answers/5245-script-kill-all-child-process-given-pid.html
ppid=$1
if [ -z $ppid ] ; then
echo "This script kills the process identified by pid, and all of its kids";
echo "Usage: $0 pid";
exit;
fi
for i in `ps j | awk '$3 == '$ppid' { print $2 }'`
do
$0 $i
kill -9 $i
done
Make sure the script is executable, or you will get an error on the $0 $i
You can override the implicit </dev/null for a background process by redirecting stdin yourself, for example:
sh -c 'exec 3<&0; { read x; echo "[$x]"; } <&3 3<&- & exec 3<&-; wait'
Try keeping the original stdin using ./fastcgi_bin 0<&0 &:
#!/bin/bash
# stuff
./fastcgi_bin 0<&0 &
PID=$!./fastcgi_bin 0<&0 &
trap "kill $PID" TERM
# stuff
# test
#sh -c 'sleep 10 & lsof -p ${!}'
#sh -c 'sleep 10 0<&0 & lsof -p ${!}'
You can do that with a coprocess.
Edit: well, coprocesses are background processes that can have stdin and stdout open (because bash prepares fifos for them). But you still need to read/write to those fifos, and the only useful primitive for that is bash's read (possibly with a timeout or a file descriptor); nothing robust enough for a cgi. So on second thought, my advice would be not to do this thing in bash. Doing the extra work in the fastcgi, or in an http wrapper like WSGI, would be more convenient.

Limiting the time a program runs in Linux

In Linux I would like to run a program but only for a limited time, like 1 second. If the program exceeds this running time I would like to kill the process and show an error message.
Ah well. timeout(1).
DESCRIPTION
Start COMMAND, and kill it if still running after DURATION.
StackOverflow won't allow me to delete my answer since it's the accepted one. It's garnering down-votes since it's at the top of the list with a better solution below it. If you're on a GNU system, please use timeout instead as suggested by #wRAR. So in the hopes that you'll stop down-voting, here's how it works:
timeout 1s ./myProgram
You can use s, m, h or d for seconds (the default if omitted), minutes, hours or days. A nifty feature here is that you may specify another option -k 30s (before the 1s above) in order to kill it with a SIGKILL after another 30 seconds, should it not respond to the original SIGTERM.
A very useful tool. Now scroll down and up-vote #wRAR's answer.
For posterity, this was my original - inferior - suggestion, it might still be if some use for someone.
A simple bash-script should be able to do that for you
./myProgram &
sleep 1
kill $! 2>/dev/null && echo "myProgram didn't finish"
That ought to do it.
$! expands to the last backgrounded process (through the use of &), and kill returns false if it didn't kill any process, so the echo is only executed if it actually killed something.
2>/dev/null redirects kill's stderr, otherwise it would print something telling you it was unable to kill the process.
You might want to add a -KILL or whichever signal you want to use to get rid of your process too.
EDIT
As ephemient pointed out, there's a race here if your program finishes and the some other process snatches the pid, it'll get killed instead. To reduce the probability of it happening, you could react to the SIGCHLD and not try to kill it if that happens. There's still chance to kill the wrong process, but it's very remote.
trapped=""
trap 'trapped=yes' SIGCHLD
./myProgram &
sleep 1
[ -z "$trapped" ] && kill $! 2>/dev/null && echo '...'
Maybe CPU time limit (ulimit -t/setrlimit(RLIMIT_CPU)) will help?
you could launch it in a shell script using &
your_program &
pid=$!
sleep 1
if [ `pgrep $pid` ]
then
kill $pid
echo "killed $pid because it took too long."
fi
hope you get the idea, I'm not sure this is correct my shell skills need some refresh :)
tail -f file & pid=$!
sleep 10
kill $pid 2>/dev/null && echo '...'
If you have the sources, you can fork() early in main() and then have the parent process measure the time and possibly kill the child process. Just use standard system calls fork(), waitpid(), kill(), ... maybe some standard Unix signal handling. Not too complicated but takes some effort.
You can also script something on the shell although I doubt it will be as accurate with respect to the time of 1 second.
If you just want to measure the time, type time <cmd ...> on the shell.
Ok, so just write a short C program that forks, calls execlp or something similar in the child, measures the time in the parent and kills the child. Should be easy ...

Resources