Set timeout for shell script, to make it exit(0) when time is over - linux

While setting up a Jenkins job, I ran into a problem with timeouts for shell scripts.
It works like this:
Start Jenkins → control.sh is launched → test1.sh is launched in control.sh
Part of control.sh looks like this:
#!/bin/sh
source func.sh
export TIMEOUT=30   # set timeout as 30s for test1.sh

function_Timeout() {
    # test1_result_file is only created when test1.sh finishes
    # executing, so its absence means test1.sh is still running
    # and should be killed.
    if [ ! -f test1_result_file ]; then
        killall test1.sh
    fi
}

# This line is actually inside a loop: it launches test2.sh,
# test3.sh, ... one by one, and later I want a 30s timeout
# for each of them as well.
( ( sleep $TIMEOUT && function_Timeout ) & ./test1.sh )
Part of func.sh is shown below:
#!/bin/sh
trap_fun() {
    TRAP_CODE=$?
    {
        if [ $TRAP_CODE -ne 0 ]; then
            echo "test aborted"
        else
            echo "test completed"
        fi
    } 2>/dev/null
}
trap "trap_fun" EXIT
After control.sh is launched by the Jenkins job, the whole of control.sh is terminated when the timeout expires and the killall test1.sh line is reached, and the Jenkins job stops and fails.
I guess this is because test1.sh is killed and its exit code is non-zero.
So my question is: is there some way to make the sub-script (launched by the main one, control.sh in my case) exit with code 0 when it is terminated?
Updated on July 1:
Thanks for the answers so far. I tried @Leon's suggestion, but I found that the exit code 124 resulting from timeout's kill action is still caught by the trap code - trap "trap_fun" EXIT - which is in func.sh.
I have added more details above. I did a lot of googling but still haven't found a proper way to resolve this problem :(
Thanks for your kind help!

Use the timeout utility from coreutils:
#!/bin/sh
timeout 30 ./test1.sh
status=$?
if [ $status -eq 124 ]   # 124 means test1.sh timed out
then
    exit 0
fi
exit $status
Note that this is slightly different from your version of timeout handling, where all running instances of test1.sh are terminated if any one of them times out.
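Regarding the July 1 update: if the EXIT trap in func.sh still sees status 124 at the moment it fires, one option - just a sketch, assuming you can edit func.sh - is to special-case that status in trap_fun, so that a timeout is not reported as an abort:

#!/bin/sh
# Sketch: treat timeout's exit status 124 as a timed-out test
# rather than a generic failure.
trap_fun() {
    TRAP_CODE=$?
    {
        if [ $TRAP_CODE -eq 124 ]; then
            echo "test timed out"
        elif [ $TRAP_CODE -ne 0 ]; then
            echo "test aborted"
        else
            echo "test completed"
        fi
    } 2>/dev/null
}
trap "trap_fun" EXIT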

I finally resolved this problem by adding the line below to each testX.sh:
trap 'exit 0' SIGTERM SIGHUP
It makes testX.sh exit normally (with code 0) when it receives the SIGTERM sent by killall.
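For illustration, a minimal testX.sh using this fix might look like the sketch below; the sleep and the result-file name are placeholders for the real test work:

#!/bin/sh
# Exit with code 0 instead of a non-zero status when killed.
# (TERM/HUP is the portable spelling; bash also accepts SIGTERM/SIGHUP.)
trap 'exit 0' TERM HUP

# ... placeholder for the actual test steps ...
sleep 60

# Creating the result file tells control.sh the test finished in time.
touch test1_result_file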
Thanks to all the help!

Related

how to make sure first command finishes and then only execute second command in shell script
#!/bin/sh
echo "Stopping application"
#command to stop application
echo "Starting application"
#command to start application
In the above code, I want to make sure that the command to stop the application has finished properly, and only then start the application.
How can I handle this?
Please note that in my case, if the application is already stopped, the command to stop it takes a variable amount of time to complete, e.g. 20 or 30 seconds.
So adding a sleep is not the proper way.
The main motive behind the script is to restart the application. The catch is that if the application is already stopped, the script doesn't work properly; if the application is running, the script works perfectly.
You can use the command's return code and a condition to do this:
#!/bin/sh
echo "Stopping application"
#command to stop application
rc=$?
# if the stop command executed successfully
if [ $rc -eq 0 ]; then
    echo "Starting application"
    #command to start application
else
    echo "ERROR - return code: $rc"
fi
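If the stop command returns before the application has actually exited, another option is to poll until the process disappears. This is only a sketch: myapp stands in for the real process name and the 60-second cap is an arbitrary choice.

#!/bin/sh
echo "Stopping application"
#command to stop application

# Poll until the process is gone, for at most 60 seconds.
i=0
while pidof myapp >/dev/null && [ $i -lt 60 ]; do
    sleep 1
    i=$((i+1))
done

if pidof myapp >/dev/null; then
    echo "ERROR - application did not stop in time"
else
    echo "Starting application"
    #command to start application
fi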
There are 'exit codes'; try this:
ls
...
echo $?
0
then:
ls non_existing_file
ls: cannot access 'non_existing_file': No such file or directory
echo $?
2
The command echo $? prints the exit code of the previous command: 0 means everything is OK, and any non-zero code means some kind of error.
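Putting this together for the restart case, the sketch below starts the application only when the stop step exits with code 0; stop_app.sh and start_app.sh are hypothetical stand-ins for the real stop and start commands:

#!/bin/sh
# $? still holds stop_app.sh's status when the else branch runs.
if ./stop_app.sh; then
    ./start_app.sh
else
    echo "ERROR - stop failed with exit code $?"
fi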

bash - close script by error or by timeout [duplicate]

This question already has answers here: Timeout a command in bash without unnecessary delay (24 answers)
On Stack Overflow there are many solutions for closing a script on a timeout, and many for closing a script when there is an error.
But how can both approaches be combined?
If there is an error during execution of the script, close the script.
If the timeout runs out, close the script.
I have the following code:
#!/usr/bin/env bash
set -e
finish_time=$1
echo "finish_time=" ${finish_time}
(./execute_something.sh) & pid=$!
sleep ${finish_time}
kill $pid
But if there is an error during execution, the script still waits for the timeout to expire.
First, I won't use set -e.
You'll explicitly wait on the job you want; the exit status of wait will be the exit status of the job itself.
echo "finish_time = $1"
./execute_something.sh & pid=$!
sleep "$1" & sleep_pid=$!
wait -n # Waits for either the sleep or the script to finish
rv=$?
if kill -0 $pid; then
# Script still running, kill it
# and exit
kill -s ALRM $pid
wait $pid # exit status will indicte it was killed by SIGALRM
exit
else
# Script exited before sleep
kill $sleep_pid
exit $rv
fi
There is a slight race condition here; it goes as follows:
wait -n returns after sleep exits, indicating a timeout
The script exits before we can check whether it is still running
As a result, we assume it actually exited before the sleep.
But that just means we'll treat a script that ran slightly over the threshold as having finished on time. That's probably not a distinction you care about.
Ideally, wait would set some shell parameter that indicates which process caused it to return.
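Newer bash versions provide exactly that: since bash 5.1, wait -n accepts -p varname, which stores the PID of the job whose status it reports. A sketch along the same lines as the code above:

#!/usr/bin/env bash
# Requires bash 5.1+ for wait -n -p.
./execute_something.sh & pid=$!
sleep "$1" & sleep_pid=$!

wait -n -p finished_pid   # PID of whichever job ended first
rv=$?

if [ "$finished_pid" = "$pid" ]; then
    # Script finished first: cancel the timer and pass on its status
    kill $sleep_pid
    exit $rv
else
    # Timer fired first: kill the script
    kill -s ALRM $pid
    wait $pid
    exit
fi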

Starting a process from bash script failed

I have a central server from which I periodically start a script (from cron) that checks remote servers. The check is performed serially: first one server, then another, and so on.
This script (on the central server) starts another script (let's call it update.sh) on the remote machine, and that script (on the remote machine) does something like this:
processID=`pgrep "processName"`
kill $processID
startProcess.sh
The process is killed, and then startProcess.sh starts it again like this:
pidof "processName"
if [ ! $? -eq 0 ]; then
nohup "processName" "processArgs" >> "processLog" &
pidof "processName"
if [! $? -eq 0]; then
echo "Error: failed to start process"
...
update.sh, startProcess.sh and the actual binary of the process they start are all on an NFS share mounted from the central server.
Now what happens sometimes is that the process I try to start within startProcess.sh is not started, and I get the error. The strange part is that it is random: sometimes the process on a machine starts, and another time on the same machine it doesn't. I'm checking about 300 servers and the errors are always random.
There is another thing: the remote servers are at 3 different geographic locations (2 in America and 1 in Europe), while the central server is in Europe. From what I have discovered so far, the servers in America have many more errors than those in Europe.
First I thought the error had something to do with kill, so I added a sleep between the kill and startProcess.sh, but that didn't make any difference.
Also, it seems that the process from startProcess.sh is not started at all, or something happens to it right when it is being started, because there is no output in the logfile, and there should be.
So here I'm asking for help. Has anybody had this kind of problem, or does anyone know what might be wrong?
Thanks for any help
(Sorry, but my original answer was fairly wrong... Here is the correction)
Using $? to get the exit status of a background process in startProcess.sh gives the wrong result. man bash states:

Special Parameters
?      Expands to the status of the most recently executed foreground
       pipeline.
As you mentioned in your comment, the proper way to get a background process's exit status is the wait built-in. But for that, bash has to process the SIGCHLD signal.
I made a small test environment for this to show how it can work:
Here is a script, loop.sh, to run as a background process:
#!/bin/bash
[ "$1" == -x ] && exit 1;
cnt=${1:-500}
while ((++c<=cnt)); do echo "SLEEPING [$$]: $c/$cnt"; sleep 5; done
If the argument is -x, it exits with status 1 to simulate an error. If the argument is a number num, it waits num*5 seconds, printing SLEEPING [<PID>]: <counter>/<max_counter> to stdout.
The second script is the launcher. It starts 3 loop.sh scripts in the background and prints their exit statuses:
#!/bin/bash

handle_chld() {
    local tmp=()
    for i in "${!pids[@]}"; do
        if [ ! -d /proc/${pids[i]} ]; then
            wait ${pids[i]}
            echo "Stopped ${pids[i]}; exit code: $?"
            unset pids[i]
        fi
    done
}

set -o monitor
trap "handle_chld" CHLD

# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)

# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
The handle_chld function handles the SIGCHLD signals. Setting the monitor option enables a non-interactive script to receive SIGCHLD. Then the trap is set for the SIGCHLD signal.
Background processes are then started, and all of their PIDs are remembered in the pids array. When SIGCHLD is received, the /proc/ directories are scanned to see which child process was stopped (the missing one); this could also be checked with the kill -0 <PID> built-in. After wait, the exit status of the background process is available in the famous $? pseudo-variable.
The main script waits for all pids to stop (otherwise it could not get the exit status of its children), and then it stops itself.
An example output:
WAITING FOR: 13102 13103 13104
SLEEPING [13103]: 1/2
SLEEPING [13102]: 1/3
Stopped 13104; exit code: 1
WAITING FOR: 13102 13103
WAITING FOR: 13102 13103
SLEEPING [13103]: 2/2
SLEEPING [13102]: 2/3
WAITING FOR: 13102 13103
WAITING FOR: 13102 13103
SLEEPING [13102]: 3/3
Stopped 13103; exit code: 0
WAITING FOR: 13102
WAITING FOR: 13102
WAITING FOR: 13102
Stopped 13102; exit code: 0
STOPPED
It can be seen that the exit codes are reported correctly.
I hope this can help a bit!

How can I check the exit status of multiple processes in background in loop?

I have a loop that runs a script in the background:
while read host
do
    ./script &
done
wait   # waits until all the background processes are finished
but I want to check the exit status of each process. How would I do that?
while read host
do
    ./script &
    wait $! || let "FAIL+=1"
done
wait
echo $FAIL
But will the above code execute in parallel? Time is an important factor for me, so I want parallel execution for all hosts.
Is it possible to know which processes failed, so that I can print something like:
echo "these are the process ids in background that failed"
12346
43561
.....
And is there any limit to the number of parallel processes that can be run in the background? Is it safe to run about 20 parallel processes in the above loop?
You can add this to the end of your script:
RC=$?
test $RC -eq 0 || echo "$$ failed"
exit $RC
$$ expands to the PID of the shell. Your backgrounded scripts will each run in their own separate shell.
The exit $RC line is obviously optional.
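To keep the launches parallel and still see which ones failed, you can record each PID as you start it and wait on them individually after the loop; bash remembers a child's exit status until you wait for it. A sketch (hosts.txt and ./script are assumptions based on the question; the associative array needs bash 4+):

#!/usr/bin/env bash
# Launch all hosts in parallel, remembering pid -> host.
declare -A pids
while read -r host; do
    ./script "$host" &
    pids[$!]=$host
done < hosts.txt

# Collect each exit status; wait works even if the child already exited.
FAIL=0
for pid in "${!pids[@]}"; do
    if ! wait "$pid"; then
        echo "process $pid (host ${pids[$pid]}) failed"
        FAIL=$((FAIL+1))
    fi
done
echo "$FAIL failed"

As for the limit: 20 background processes is well within what a typical system can handle; the practical ceiling is set by ulimit -u (max user processes) and by memory.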

How to kill a child process after a given timeout in Bash?

I have a bash script that launches a child process that crashes (actually, hangs) from time to time and for no apparent reason (it's closed source, so there isn't much I can do about it). As a result, I would like to be able to launch this process for a given amount of time, and kill it if it did not return successfully after that amount of time.
Is there a simple and robust way to achieve that using bash?
P.S.: tell me if this question is better suited to serverfault or superuser.
(As seen in:
BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
Use timeout - most systems already have it installed, as it is part of coreutils (otherwise, use sudo apt-get install coreutils):
timeout 10 ping www.goooooogle.com
If you don't want to download something, do what timeout does internally:
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )
If you want a timeout for a longer stretch of bash code, use the second option like this:
( cmdpid=$BASHPID;
  (sleep 10; kill $cmdpid) &
  while ! ping -w 1 www.goooooogle.com
  do
      echo crap;
  done )
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &
or to get the exit codes as well:
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and return the exitcode
exitcode=$(wait $pid && echo $?)
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, cause that means the process finished before the waiter
finished_gracefully=$?
sleep 999&
t=$!
sleep 10
kill $t
I also had this question and found two more things very useful:
The SECONDS variable in bash.
The command "pgrep".
So I use something like this on the command line (OSX 10.9):
ping www.goooooogle.com & PING_PID=$(pgrep 'ping'); SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ $SECONDS = 10 ]; then kill $PING_PID; fi; done
As this is a loop I included a "sleep 0.2" to keep the CPU cool. ;-)
(BTW: ping is a bad example anyway; you would just use its built-in -t (timeout) option.)
Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.
Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
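A rough sketch of such a watchdog, with everything hypothetical: the pid-file path, the 60-second staleness threshold, and the assumption that the child touches its pid file periodically (stat -c %Y is GNU stat):

#!/bin/sh
# Hypothetical watchdog: respawn the child if its pid file looks stale.
PIDFILE=/var/run/child.pid
MAX_AGE=60   # seconds without a pid-file update before we assume a hang

now=$(date +%s)
if [ -f "$PIDFILE" ]; then
    mtime=$(stat -c %Y "$PIDFILE")
    if [ $((now - mtime)) -gt "$MAX_AGE" ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        ./child.sh &              # placeholder for the real command
        echo $! > "$PIDFILE"
    fi
fi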
One way is to run the program in a subshell and communicate with the subshell through a named pipe, using the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.
Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (which possibly only works on Linux). There is also a problem with using a pipe, in that a process opening a pipe for reading will hang until it is also opened for writing, and vice versa. So to prevent the read command hanging, I've "wedged" the pipe open with a background subshell that holds it open for writing. (Another way to prevent a freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe - however, that also may not work except on Linux.)
rm -f finished.pipe
mkfifo finished.pipe

{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!

# Get command PID
while : ; do
    PID=$( pgrep -P $SUBSHELL yes )
    test "$PID" = "" || break
    sleep 1
done

# Wedge the pipe open for writing so the read below cannot block forever
{ exec 4>finished.pipe ; while : ; do sleep 1000; done ; } &

read -t 3 FINISHED <finished.pipe
if [ "$FINISHED" = finished ] ; then
    echo 'Subprocess finished'
else
    echo 'Subprocess timed out'
    kill $PID
fi

rm finished.pipe
Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).
run_with_timeout ()
{
    t=$1
    shift
    echo "running \"$*\" with timeout $t"
    (
        # first, run process in background
        (exec sh -c "$*") &
        pid=$!
        echo $pid
        # the timeout shell
        (sleep $t ; echo timeout) &
        waiter=$!
        echo $waiter
        # finally, allow process to end naturally
        wait $pid
        echo $?
    ) \
    | (
        read pid
        read waiter
        if test $waiter != timeout ; then
            read status
        else
            status=timeout
        fi
        # if we timed out, kill the process
        if test $status = timeout ; then
            kill $pid
            exit 99
        else
            # if the program exited normally, kill the waiting shell
            kill $waiter
            exit $status
        fi
    )
}
Use like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.
This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.
After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.
This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.
Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick from the top answer to get the PID of a process (in this case, $$ in an sh invocation), and it uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)
run_with_timeout ()
{
    t=$1 ; shift
    trap cleanup 2
    F=$$.fifo ; rm -f $F ; mkfifo $F
    # first, run main process in background
    "$@" & pid=$!
    # sleeper process to time out
    ( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
    read sleeper <$F
    # control shell. read from fifo.
    # final input is "finished". after that
    # we clean up. we can get a timeout or a
    # signal first.
    ( exec 0<$F
      while : ; do
          read input
          case $input in
              finished)
                  test $sleeper != 0 && kill $sleeper
                  rm -f $F
                  exit 0
                  ;;
              timeout)
                  test $pid != 0 && kill $pid
                  sleeper=0
                  ;;
              signal)
                  test $pid != 0 && kill $pid
                  ;;
          esac
      done
    ) &
    # wait for process to end
    wait $pid
    status=$?
    echo finished >$F
    return $status
}

cleanup ()
{
    echo signal >$$.fifo
}
I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:
timeout.sh: line 250: kill: (23248) - No such process
as it is trying to kill a process that has already exited by itself.
#Kill command after 10 seconds
timeout 10 command
#If you don't have timeout installed, this is almost the same:
sh -c '(sleep 10; kill "$$") & command'
#The same as above, with muted duplicate messages:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & command'
