bash - close script by error or by timeout [duplicate] - linux

This question already has answers here:
Timeout a command in bash without unnecessary delay
(24 answers)
Closed 2 years ago.
On Stack Overflow there are many solutions for closing a script on timeout, and many for closing a script on error.
But how do I combine both approaches?
If there is an error during execution of the script, close the script.
If the timeout expires, close the script.
I have the following code:
#!/usr/bin/env bash
set -e
finish_time=$1
echo "finish_time=" ${finish_time}
(./execute_something.sh) & pid=$!
sleep ${finish_time}
kill $pid
But if there is an error during execution, the script still waits until the timeout expires.

First, I wouldn't use set -e.
You'll explicitly wait on the job you want; the exit status of wait will be the exit status of the job itself.
echo "finish_time = $1"
./execute_something.sh & pid=$!
sleep "$1" & sleep_pid=$!
wait -n # Waits for either the sleep or the script to finish
rv=$?
if kill -0 $pid 2>/dev/null; then
# Script still running, kill it
# and exit
kill -s ALRM $pid
wait $pid # exit status will indicate it was killed by SIGALRM
exit
else
# Script exited before sleep
kill $sleep_pid
exit $rv
fi
There is a slight race condition here; it goes as follows:
wait -n returns after sleep exits, indicating the script will exit on its own
The script exits before we can check if it is still running
As a result, we assume it actually exited before sleep.
But that just means we'll treat a script that ran slightly over the threshold as finishing on time. That's probably not a distinction you care about.
Ideally, wait would set some shell parameter that indicates which process caused it to return.
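As an aside (my addition, not part of the answer above): bash 5.1 added exactly that parameter, wait -n -p varname, which stores the ID of the job whose status wait reports. With it the race disappears entirely; a minimal sketch, assuming bash >= 5.1:
#!/usr/bin/env bash
./execute_something.sh & pid=$!
sleep "$1" & sleep_pid=$!
wait -n -p finished_pid   # finished_pid names whichever job returned first
rv=$?
if [ "$finished_pid" = "$pid" ]; then
    kill "$sleep_pid"     # the script won the race; cancel the timer
    exit "$rv"
else
    kill -s ALRM "$pid"   # the timer won; kill the script
    wait "$pid"
    exit
fi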

Related

How to stop all background processes (running functions) by using trap?

I have two long-running functions which need to be executed asynchronously. For simplicity, assume one function sends mail to a client every 10 seconds and the other logs to a text file every two seconds.
I cannot use cron jobs to do this. Everything has to be in one script. Thus I have used infinite loops and sleep with & to achieve the asynchronous behavior.
I used trap 'pkill -P $$' SIGINT SIGTERM to end all child processes (to end the program) when the user hits CTRL+Z (SIGINT), but this doesn't work. Script execution just continues even after pressing CTRL+Z.
How can I give the user the ability to end the program with a keystroke from the same terminal?
Note: Those two functions are never ending until user manually stops the program.
echo "Press: CTRL+Z to Close program"
trap 'pkill -P $$' SIGINT SIGTERM
first_fun()
{
while :; do
echo "send Mail every 10 seconds"
sleep 10
done
}
second_fun()
{
while :; do
echo "log text file every 2 seconds"
sleep 2
done
}
first_fun &
second_fun &
One suggestion is to use double quotes so that the shell expands $$ when the trap is defined. Like this:
trap "pkill -9 -P $$" SIGINT SIGTERM
Also consider killing all processes started from the current directory, because process ancestry is not always preserved (e.g. when using the nohup command):
trap "pkill -9 -f $PWD" SIGINT SIGTERM
Also, prefer killing/stopping the process with CTRL-C (the standard SIGINT) and avoid CTRL-Z, which is used for suspending processes.
One problem with your script is that it exits right after starting those two functions, so $$ no longer refers to a running script. An easy fix is to put a wait at the end of the script. Changing the trap to this might also help:
trap "pkill -P $$" INT TERM
But, what I would do is to kill those functions rather than killing the script:
echo "Press: CTRL+Z to Close program"
first_fun()
{
while :; do
echo "send Mail every 10 seconds"
sleep 10
done
}
second_fun()
{
while :; do
echo "log text file every 2 seconds"
sleep 2
done
}
_INTERRUPTED=''
_PID1=''
_PID2=''
interrupt()
{
# Do this once.
if [[ -z "$_INTERRUPTED" ]]; then
_INTERRUPTED='true'
kill -KILL "$_PID1"
kill -KILL "$_PID2"
fi
}
trap interrupt INT TERM EXIT
first_fun &
_PID1="$!"
second_fun &
_PID2="$!"
wait
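(A note on the flow above: pressing CTRL+C delivers SIGINT, the interrupt handler kills both loops, the wait then returns, and the EXIT trap invokes interrupt a second time on the way out; the _INTERRUPTED guard is what makes that second invocation a no-op.)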

shell script with trapped signal does not ignore signal

I'm working on a test automation system and I'm coming up with misbehaving programs. With the first one I'm already encountering some unexpected behavior.
trap "echo No thanks" INT
echo Let me just chill for $1 sec
sleep $1
echo All finished
Observed behavior:
sending SIGINT causes "No thanks" to be printed; the sleep is apparently interrupted immediately, and "All finished" is also printed immediately after that.
the behavior is the same whether the signal is sent from another process or via keyboard CTRL+C.
the same behavior is observed if the sleep is backgrounded and we wait for it.
Expected behavior:
sending SIGINT to the process should result in "No thanks" being printed for as long as the sleep runs, and then "All finished" should be printed before exiting, after the sleep finishes.
If the sleep is backgrounded, issuing keyboard CTRL+C should send SIGINT to the process group, which would include the sleep, so that should stop it prematurely. I am not sure what to expect in that case.
Questions:
How can I obtain desired behavior?
Why exactly does it behave like this (different from my expectation)?
The question is essentially a dupe of this but there are no satisfactory explanations in that answer.
Currently:
bash waits for sleep to exit
bash and sleep receive SIGINT
sleep dies
bash finishes waiting and runs the trap
This prevents your desired behavior because:
You didn't want sleep to die
You didn't want bash to wait for the command to complete before running the trap
To fix this, you can have sleep ignore the SIGINT (backgrounding it from a non-interactive script does exactly that: asynchronous jobs are started with SIGINT ignored), and have bash run wait in a loop so that the main script gets control back after each CTRL-C but still waits for the sleep to complete:
trap 'echo "No thanks"' INT
echo "Let me just chill for $1 sec"
# Run sleep in the background
sleep "$1" &
# Loop until we've successfully waited for all processes
until wait; do true; done
echo "All finished"

Is it possible to set time out from bash script? [duplicate]

This question already has answers here:
How do I limit the running time of a BASH script
(5 answers)
Closed 7 years ago.
Sometimes my bash scripts hang and stall for no clear reason,
so they can actually hang forever (the script process will run until I kill it).
Is it possible to build a timeout mechanism into the bash script so that it exits after, for example, half an hour?
This Bash-only approach encapsulates all the timeout code inside your script by running a function as a background job to enforce the timeout:
#!/bin/bash
Timeout=1800 # 30 minutes
function timeout_monitor() {
sleep "$Timeout"
kill "$1"
}
# start the timeout monitor in
# background and pass the PID:
timeout_monitor "$$" &
Timeout_monitor_pid=$!
# <your script here>
# kill timeout monitor when terminating:
kill "$Timeout_monitor_pid"
Note that the function will be executed in a separate process. Therefore the PID of the monitored process ($$) must be passed. I left out the usual parameter checking for the sake of brevity.
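As a small hardening sketch (my addition, not part of the answer above): silence both kills in case the other side has already exited, and let the monitor escalate from the default TERM to KILL after a grace period:
#!/bin/bash
Timeout=1800 # 30 minutes

timeout_monitor() {
    sleep "$Timeout"
    kill "$1" 2>/dev/null      # polite TERM first
    sleep 5                    # grace period
    kill -9 "$1" 2>/dev/null   # force KILL if TERM was ignored
}

timeout_monitor "$$" &
Timeout_monitor_pid=$!

# <your script here>

# kill timeout monitor when terminating (silenced in case it already fired):
kill "$Timeout_monitor_pid" 2>/dev/null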
If you have GNU coreutils, you can use the timeout command:
timeout 1800s ./myscript
To check whether a timeout occurred, check the exit status:
timeout 1800s ./myscript
if (($? == 124)); then
echo "./myscript timed out after 30 minutes" >>/path/to/logfile
exit 124
fi
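If the script might ignore the default SIGTERM, GNU timeout can also escalate to SIGKILL after a grace period (the 30s below is an arbitrary choice); a process that had to be KILLed is reported with status 137:
timeout -k 30s 1800s ./myscript   # TERM after 1800s, KILL 30s later if needed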

Starting a process from bash script failed

I have a central server where I periodically start a script (from cron) which checks remote servers. The check is performed serially: first one server, then another, and so on.
This script (on the central server) starts another script (let's call it update.sh) on the remote machine, and that script (on the remote machine) does something like this:
processID=`pgrep "processName"`
kill $processID
startProcess.sh
The process is killed, and then startProcess.sh starts it again like this:
pidof "processName"
if [ ! $? -eq 0 ]; then
nohup "processName" "processArgs" >> "processLog" &
pidof "processName"
if [ ! $? -eq 0 ]; then
echo "Error: failed to start process"
...
update.sh, startProcess.sh and the actual binary of the process that they start are on an NFS share mounted from the central server.
Now what sometimes happens is that the process I try to start within startProcess.sh is not started and I get the error. The strange part is that it is random: sometimes the process on one machine starts, and another time on that same machine it doesn't. I'm checking about 300 servers and the errors are always random.
There is another thing: the remote servers are at 3 different geographic locations (2 in America and 1 in Europe), while the central server is in Europe. From what I have discovered so far, the servers in America produce many more errors than those in Europe.
First I thought that the error had something to do with kill, so I added a sleep between the kill and startProcess.sh, but that didn't make any difference.
Also, it seems that the process from startProcess.sh is either not started at all, or something happens to it right as it is being started, because there is no output in the logfile, and there should be.
So, here I'm asking for help.
Has anybody had this kind of problem, or does anyone know what might be wrong?
Thanks for any help.
(Sorry, but my original answer was fairly wrong... Here is the correction)
Using $? to get the exit status of the background process in startProcess.sh leads to the wrong result. man bash states:
Special Parameters
? Expands to the status of the most recently executed foreground
pipeline.
As you mentioned in your comment, the proper way of getting a background process's exit status is to use the wait built-in. But for this, bash has to process the SIGCHLD signal.
I made a small test environment for this to show how it can work:
Here is a script loop.sh to run as a background process:
#!/bin/bash
[ "$1" == -x ] && exit 1;
cnt=${1:-500}
while ((++c<=cnt)); do echo "SLEEPING [$$]: $c/$cnt"; sleep 5; done
If the arg is -x then it exits with status 1 to simulate an error. If the arg is a number, it waits num*5 seconds, printing SLEEPING [<PID>]: <counter>/<max_counter> to stdout.
The second is the launcher script. It starts 3 loop.sh scripts in the background and prints their exit status:
#!/bin/bash
handle_chld() {
local tmp=()
for i in ${!pids[@]}; do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
unset pids[i]
fi
done
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
The handle_chld function will handle the SIGCHLD signals. Setting the monitor option (set -o monitor) enables a non-interactive script to receive SIGCHLD. Then the trap is set for the SIGCHLD signal.
Then the background processes are started. All of their PIDs are remembered in the pids array. When SIGCHLD is received, the /proc/ directories are checked to see which child process stopped (the missing one); it could also be checked using the kill -0 <PID> bash built-in, as sketched below. After wait, the exit status of the background process is stored in the famous $? pseudo-variable.
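For reference, that kill -0 variant of the liveness check might look like this (my sketch; the /proc test above is equivalent on Linux):
for i in "${!pids[@]}"; do
    if ! kill -0 "${pids[i]}" 2>/dev/null; then   # child already reaped?
        wait "${pids[i]}"
        echo "Stopped ${pids[i]}; exit code: $?"
        unset 'pids[i]'
    fi
done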
The main script waits for all the pids to stop (otherwise it could not collect the exit statuses of its children) and then it stops itself.
An example output:
WAITING FOR: 13102 13103 13104
SLEEPING [13103]: 1/2
SLEEPING [13102]: 1/3
Stopped 13104; exit code: 1
WAITING FOR: 13102 13103
WAITING FOR: 13102 13103
SLEEPING [13103]: 2/2
SLEEPING [13102]: 2/3
WAITING FOR: 13102 13103
WAITING FOR: 13102 13103
SLEEPING [13102]: 3/3
Stopped 13103; exit code: 0
WAITING FOR: 13102
WAITING FOR: 13102
WAITING FOR: 13102
Stopped 13102; exit code: 0
STOPPED
It can be seen that the exit codes are reported correctly.
I hope this can help a bit!
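As an aside, on bash 4.3 or newer the polling loop can be avoided with wait -n, which blocks until any one child exits (a minimal sketch, not the test environment above):
./loop.sh 3 &
./loop.sh 2 &
./loop.sh -x &
# wait -n returns as soon as any single child terminates
for _ in 1 2 3; do
    wait -n
    echo "a child stopped; exit code: $?"
done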

How to kill a child process after a given timeout in Bash?

I have a bash script that launches a child process that crashes (actually, hangs) from time to time and with no apparent reason (closed source, so there isn't much I can do about it). As a result, I would like to be able to launch this process for a given amount of time, and kill it if it did not return successfully after a given amount of time.
Is there a simple and robust way to achieve that using bash?
P.S.: tell me if this question is better suited to serverfault or superuser.
(As seen in:
BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
If you don't mind installing something, use timeout. Most systems have it installed already, since it is part of GNU coreutils; otherwise use sudo apt-get install coreutils. Then use it like:
timeout 10 ping www.goooooogle.com
If you don't want to install anything, do what timeout does internally (the exec replaces the subshell with the command, so $cmdpid, captured from $BASHPID, is the PID the command itself runs under, and the kill targets it directly):
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )
If you want to apply a timeout to a longer stretch of bash code, use the second option like this:
( cmdpid=$BASHPID;
(sleep 10; kill $cmdpid) \
& while ! ping -w 1 www.goooooogle.com
do
echo crap;
done )
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &
or to get the exit codes as well:
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and return the exitcode
wait $pid; exitcode=$?   # wait must run in this shell; inside $(...) the pid would not be a child
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, cause that means the process finished before the waiter
finished_gracefully=$?
sleep 999 &   # the long-running process
t=$!
sleep 10      # give it 10 seconds
kill $t       # then kill it
I also had this question and found two more things very useful:
The SECONDS variable in bash.
The command "pgrep".
So I use something like this on the command line (OSX 10.9):
ping www.goooooogle.com & PING_PID=$(pgrep 'ping'); SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ $SECONDS = 10 ]; then kill $PING_PID; fi; done
As this is a busy loop, I included a "sleep 0.2" to keep the CPU cool. ;-)
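Spelled out over several lines, and using $! instead of pgrep so that only our own ping is matched (my variation, not the exact one-liner above):
ping www.goooooogle.com &
PING_PID=$!
SECONDS=0                          # bash resets the counter on assignment
while kill -0 "$PING_PID" 2>/dev/null; do
    sleep 0.2
    if [ "$SECONDS" -ge 10 ]; then
        kill "$PING_PID"           # timeout reached
        break
    fi
done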
(BTW: ping is a bad example anyway; you would just use its built-in "-t" (timeout) option.)
Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.
Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
One way is to run the program in a subshell, and communicate with the subshell through a named pipe with the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.
Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (which possibly only works on Linux). There is also a complication in using a pipe: a process opening a pipe for read will hang until it is also opened for write, and vice versa. So to prevent the read command hanging, I've "wedged" the pipe open with a background subshell that holds a write descriptor. (Another way to prevent a freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe; however, that also may not work except on Linux.)
rm -f finished.pipe
mkfifo finished.pipe
{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!
# Get command PID
while : ; do
PID=$( pgrep -P $SUBSHELL yes )
test "$PID" = "" || break
sleep 1
done
# Open pipe for writing
{ exec 4>finished.pipe ; while : ; do sleep 1000; done; } &
read -t 3 FINISHED <finished.pipe
if [ "$FINISHED" = finished ] ; then
echo 'Subprocess finished'
else
echo 'Subprocess timed out'
kill $PID
fi
rm finished.pipe
Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).
run_with_timeout ()
{
t=$1
shift
echo "running \"$*\" with timeout $t"
(
# first, run process in background
(exec sh -c "$*") &
pid=$!
echo $pid
# the timeout shell
(sleep $t ; echo timeout) &
waiter=$!
echo $waiter
# finally, allow process to end naturally
wait $pid
echo $?
) \
| (read pid
read waiter
if test $waiter != timeout ; then
read status
else
status=timeout
fi
# if we timed out, kill the process
if test $status = timeout ; then
kill $pid
exit 99
else
# if the program exited normally, kill the waiting shell
kill $waiter
exit $status
fi
)
}
Use like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.
This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.
After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.
This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.
Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick used in the top answer to get the PID of a process (in this case $$ in a sh invocation). It uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)
run_with_timeout ()
{
t=$1 ; shift
trap cleanup 2
F=$$.fifo ; rm -f $F ; mkfifo $F
# first, run main process in background
"$#" & pid=$!
# sleeper process to time out
( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
read sleeper <$F
# control shell. read from fifo.
# final input is "finished". after that
# we clean up. we can get a timeout or a
# signal first.
( exec 0<$F
while : ; do
read input
case $input in
finished)
test $sleeper != 0 && kill $sleeper
rm -f $F
exit 0
;;
timeout)
test $pid != 0 && kill $pid
sleeper=0
;;
signal)
test $pid != 0 && kill $pid
;;
esac
done
) &
# wait for process to end
wait $pid
status=$?
echo finished >$F
return $status
}
cleanup ()
{
echo signal >$$.fifo
}
I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:
timeout.sh: line 250: kill: (23248) - No such process
as it is trying to kill a process that has already exited by itself.
#Kill command after 10 seconds
timeout 10 command
#If you don't have timeout installed, this is almost the same:
sh -c '(sleep 10; kill "$$") & command'
#The same as above, with muted duplicate messages:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & command'
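(A note on the last two: the single quotes matter, so that $$ is expanded by the inner sh rather than by your current shell; and the 2>/dev/null in the final variant just silences kill's "No such process" complaint when command finishes before the 10 seconds are up.)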
