Ending an mpirun process terminates a bash loop - linux

I'm trying to schedule a series of mpi jobs on an Ubuntu 14.04 LTS machine using a bash script. Basically, I want a simulation to run on every core for a certain amount of time, then terminate and move on to the next case once that time has elapsed.
My issue arises when mpi exits at the end of the first job - it breaks the loop and returns the terminal to my control instead of heading onto the next iteration of the loop.
My script is included below. The file "case_names" is just a text file of directory names. I've tested the script with other commands and it works fine until I uncomment the mpirun call.
#!/bin/bash
while read line;
do
# Access case dierctory
cd $line
echo "Case $line accessed"
# Start simulation
echo "Case $line starting: $(date)"
mpirun -q -np 8 dsmcFoamPlus -parallel > log.dsmcFoamPlus &
# Wait for 10 hour runtime
sleep 36000
# Kill job
pkill mpirun > /dev/null
echo "Case $line terminated: $(date)"
# Return to parent directory
cd ..
done < case_names
Does anyone know of a way to stop mpirun from breaking the loop like this?
So far I've tried GNOME task scheduler and task-spooler, but neither have worked (likely due to aliases that have to be invoked before the commands I use become available). I'd really rather not have to resort to setting up slurm. I've also tried using the disown command to separate the mpi process from the shell I'm running the scheduling script in, and have even written a separate script just to kill processes which the scheduling script runs remotely.
Many thanks in advance!

I've managed to find a workaround that allows me to schedule tasks with a bash script like I wanted. Since this solves my issue, I'm posting it as an answer (although I would still welcome an explanation as to why mpi behaves in this way in loops).
The solution lay in writing a separate script for both calling and then killing mpi, which would itself be called by the scheduling script. Since this child bash process has no loops in it, there are no issues with mpi breaking them after being killed. Also, once this script has exited, the scheduling loop can continue unimpeded.
My (now working) code is included below.
Scheduling script:
while read line;
do
cd $line
echo "CWD: $(pwd)"
echo "Case $line accessed"
bash ../run_job
echo "Case $line terminated: $(date)"
cd ..
done < case_names
Execution script (run_job):
mpirun -q -np 8 dsmcFoamPlus -parallel > log.dsmcFoamPlus &
echo "Case $line starting: $(date)"
sleep 600
pkill mpirun
I hope someone will find this useful.

Related

How do I setup two curl commands to execute at different times forever?

For example, I want to run one command every 10 seconds and the other command every 5 minutes. I can only get the first one to log properly to a text file. Below is the shell script I am working on:
echo "script Running. Press CTRL-C to stop the process..."
while sleep 10;
do
curl -s -I --http2 https://www.ubuntu.com/ >> new.txt
echo "------------1st command--------------------" >> logs.txt;
done
||
while sleep 300;
do
curl -s -I --http2 https://www.google.com/
echo "-----------------------2nd command---------------------------" >> logs.txt;
done
I would advise you to go with #Marvin Crone's answer, but researching cronjobs and back-ground processes doesn't seem like the kind of hassle I would go through for this little script. Instead, try putting both loops into separate scripts; like so:
script1.sh
echo "job 1 Running. Type fg 1 and press CTRL-C to stop the process..."
while sleep 10;
do
echo $(curl -s -I --http2 https://www.ubuntu.com/) >> logs.txt;
done
script2.sh
echo "job 2 Running. Type fg 2 and press CTRL-C to stop the process..."
while sleep 300;
do
echo $(curl -s -I --http2 https://www.google.com/) >> logs.txt;
done
adding executable permissions
chmod +x script1.sh
chmod +x script2.sh
and last but not least running them:
./script1.sh & ./script2.sh &
this creates two separate jobs in the background that you can call by typing:
fg (1 or 2)
and stop them with CTRL-C or send them to background again by typing CTRL-Z
I think what is happening is that you start the first loop. Your first loop needs to complete before the second loop will start. But, the first loop is designed to be infinite.
I suggest you put each curl loop in a separate batch file.
Then, you can run each batch file separately, in the background.
I offer two suggestions for you to investigate for your solution.
One, research the use of crontab and set up a cron job to run the batch files.
Two, research the use of nohup as a means of running the batch files.
I strongly suggest you also research the means of monitoring the jobs and knowing how to terminate the jobs if anything goes wrong. You are setting up infinite loops. A simple Control C will not terminate jobs running in the background. You are treading in areas that can get out of control. You need to know what you are doing.

Parallel run and wait for pocesses from subshell

Hi all/ I'm trying to make something like parallel tool for shell simply because the functionality of parallel is not enough for my task. The reason is that I need to run different versions of compiler.
Imagine that I need to compile 12 programs with different compilers, but I can run only 4 of them simultaneously (otherwise PC runs out of memory and crashes :). I also want to be able to observe what's going on with each compile, therefore I execute every compile in new window.
Just to make it easier here I'll replace compiler that I run with small script that waits and returns it's process id sleep.sh:
#!/bin/bash
sleep 30
echo $$
So the main script should look like parallel_run.sh :
#!/bin/bash
for i in {0..11}; do
xfce4-terminal -H -e "./sleep.sh" &
pids[$i]=$!
pstree -p $pids
if (( $i % 4 == 0 ))
then
for pid in ${pids[*]}; do
wait $pid
done
fi
done
The problem is that with $! I get pid of xfce4-terminal and not the program it executes. So if I look at ptree of 1st iteration I can see output from main script:
xfce4-terminal(31666)----{xfce4-terminal}(31668)
|--{xfce4-terminal}(31669)
and sleep.sh says that it had pid = 30876 at that time. Thus wait doesn't work at all in this case.
Q: How to get right PID of compiler that runs in subshell?
Maybe there is the other way to solve task like this?
It seems like there is no way to trace PID from parent to child if you invoke process in new xfce4-terminal as terminal process dies right after it executed given command. So I came to the solution which is not perfect, but acceptable in my situation. I run and put compiler's processes in background and redirect output to .log file. Then I run tail on these logfiles and I kill all tails which belongs to current $USER when compilers from current batch are done, then I run the other batch.
#!/bin/bash
for i in {1..8}; do
./sleep.sh > ./process_$i.log &
prcid=$!
xfce4-terminal -e "tail -f ./process_$i.log" &
pids[$i]=$prcid
if (( $i % 4 == 0 ))
then
for pid in ${pids[*]}; do
wait $pid
done
killall -u $USER tail
fi
done
Hopefully there will be no other tails running at that time :)

How to schedule a jar file on Linux for fixed time in future?

I need to run a jar file (datacollector.jar), say having path (/home/vagrant/datatool) on a Linux machine for a fixed interval of time. For example, from 18:00 10-06-2017 to 03:00 11-06-2017 in some future time. After that, the process should be killed.
I want to write a shell script for this which takes two arguments starting and ending time.
The script should also inform if its already running or not, and I should be able to manually stop it before ending time.
I am unable to figure this out after spending some researching online. How can I achieve this?
I come up with a solution in which myshell.sh take 4 arguments
[start_date] [start_time] [end_date] [end_time].
and script code :
#!/bin/bash
JAR_PATH=/home/vagrant/datatool/datacollector.jar
PID_PATH=/tmp/datacollector-id
#calculate wait time to run the jar
wait=$(($(date -d "$1 $2" "+%s")-$(date "+%s")))
wait_mins=$((wait/60))
#calculate time for which jar file should be executed
run_interval=$(($(date -d "$3 $4" "+%s")-$(date -d "$1 $2" "+%s")))
run_interval_mins=$((run_interval/60))
echo "tool will satrt after $wait_mins"
#wait before running jar file
sleep "$wait_mins"m
#run the jar file
nohup java -jar $JAR_PATH /tmp 2>> /dev/null >> /dev/null &
echo $! > $PID_PATH
#wait for jar execution time
sleep "$run_interval_mins"m
#kill the jar process
PID=$(cat $PID_PATH);
echo "tool process killed"
kill $PID;
echo "program terminated"
and I am running the code with command:
$ nohup ./myshell.sh 2017-06-08 20:07:00 2017-06-08 20:10:00 >> scriptoutput.txt 2>> /dev/null &
and scriptoutput.txt contains:
tool will satrt after 3
tool process killed
program terminated
Where I need to improve my code?
Is there any better way to do it?
There are no wonders: considering your post, you are probably not allowed to modify its source code and recompile it. Thus, you need an external thing, some other process what
starts the jar with a JVM
stops it
informs the user about its start and stop.
The simplest way to do that are the
at command (you can start something in the future with it)
cron command (you can periodically can execute commands with it in the background)
and, the shell scripts.
If it is only a single-time execution, the "at" command would be the best.
Learn about the linux shellscripting by googling for "linux shell scripting tutorial". You can have more specific answers about your problem on the http://unix.stackexchange.com .

How can I make a ksh script terminate itself if any issues?

I have written a few ksh scripts, about 6 scripts.
These are written to handle huge data files, something like 207 MB big. while running the script, sometimes it gets stuck and does not end.
Human interruption is required.
In production environment, I want it to run automatically, and should be able to end automatically if any issues without the need of any human interruption.
If there are some issues with a file, the script should end and start executing the next file.
How can make it terminate itself, if it gets stuck?
I assume, that the only way you see the issues is that the script takes too long. In that case a simple script that kills the process after a time-out should be sufficient:
#!/bin/bash
# Killersrcipt
PID=$1
TIME=$2
typeset -i i
i=0
while [ $i -lt $TIME ] ; do
if ps $PID > /dev/null ; then
i=$i+1
sleep 1
else
exit 0
fi
done
kill $PID
Your workflow would then be something like:
#!/bin/bash
process_1 &
killerscript $! 60
process_2 &
killerscript $! 30
...
If you have other ways to detect issues in your processes, you can easily add them to the loop in your killerscript.

Check if process runs if not execute script.sh

I am trying to find a way to monitor a process. If the process is not running it should be checked again to make sure it has really crashed. If it has really crashed run a script (start.sh)
I have tried monit with no succes, I have also tried adding this script in crontab: I made it executable with chmod +x monitor.sh
the actual program is called program1
case "$(pidof program | wc -w)" in
0) echo "Restarting program1: $(date)" >> /var/log/program1_log.txt
/home/user/files/start.sh &
;;
1) # all ok
;;
*) echo "Removed double program1: $(date)" >> /var/log/program1_log.txt
kill $(pidof program1 | awk '{print $1}')
;;
esac
The problem is this script does not work, I added it to crontab and set it to run every 2 minutes. If I close the program it won't restart.
Is there any other way to check a process, and run start.sh when it has crashed?
Not to be rude, but have you considered a more obvious solution?
When a shell (e.g. bash or tcsh) starts a subprocess, by default it waits for that subprocess to complete.
So why not have a shell that runs your process in a while(1) loop? Whenever the process terminates, for any reason, legitimate or not, it will automatically restart your process.
I ran into this same problem with mythtv. The backend keeps crashing on me. It's a Heisenbug. Happens like once a month (on average). Very hard to track down. So I just wrote a little script that I run in an xterm.
The, ahh, oninter business means that control-c will terminate the subprocess and not my (parent-process) script. Similarly, the sleep is in there so I can control-c several times to kill the subprocess and then kill the parent-process script while it's sleeping...
Coredumpsize is limited just because I don't want to fill up my disk with corefiles that I cannot use.
#!/bin/tcsh -f
limit coredumpsize 0
while( 1 )
echo "`date`: Running mythtv-backend"
# Now we cannot control-c this (tcsh) process...
onintr -
# This will let /bin/ls directory-sort my logfiles based on day & time.
# It also keeps the logfile names pretty unique.
mythbackend |& tee /....../mythbackend.log.`date "+%Y.%m.%d.%H.%M.%S"`
# Now we can control-c this (tcsh) process.
onintr
echo "`date`: mythtv-backend exited. Sleeping for 30 seconds, then restarting..."
sleep 30
end
p.s. That sleep will also save you in the event your subprocess dies immediately. Otherwise the constant respawning without delay will drive your IO and CPU through the roof, making it difficult to correct the problem.

Resources