Write a bash script to check if a process is responding within x seconds - linux

How can I write a script to check whether a process takes more than a given number of seconds to respond, and kill it if it does?
I've tried the timeout command, but the problem is that this is a Source dedicated server, and when I edit its bash script:
HL=./srcds_linux
echo "Using default binary: $HL"
and change it to timeout 25 ./srcds_linux and run it as root, it won't run the server:
ERROR: Source Engine binary '' not found, exiting
So assuming that I can't edit the server's bash script, is there a way to create a script that can check whether any program, not started by the script itself, is timing out in x seconds?

It sounds like the problem is that you're editing the script incorrectly.
If you're looking at this script, the logic basically goes:
HL=./srcds_linux
if ! test -f "$HL"
then
echo "Command not found"
fi
$HL
It sounds like you're trying to set HL="timeout 25 ./srcds_linux". This will cause the file check to fail.
The somewhat more correct way is to change the invocation, not the file to invoke:
HL=./srcds_linux
if ! test -f "$HL
then
echo "Command not found"
fi
timeout 25 "$HL"
timeout kills the program if it takes too long, though. It doesn't care whether the program is responding to anything, just that it takes longer than 25 seconds doing it.
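If the server ignores the default SIGTERM, GNU timeout can also follow up with SIGKILL a few seconds later:
timeout -k 5 25 "$HL"
(-k 5 sends SIGKILL 5 seconds after the initial signal.)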
If the program appears to hang, you could e.g. check whether it stops outputting data for 25 seconds:
your_command_to_start_your_server | while read -t 25 -r foo; do echo "$foo"; done
echo "The command hasn't said anything for 25 seconds, killing it!"
pkill -f srcds_linux
(pkill -f matches against the full command line; a plain pkill ./srcds_linux would not match the process name.)
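A fuller sketch along those lines, as one watchdog script (untested; it assumes the server normally writes to stdout regularly while healthy, and that the pkill -f pattern matches only the server's command line):
#!/bin/bash
# Watchdog sketch: kill the server if it goes quiet for 25 seconds.
./srcds_linux | {
    while true; do
        read -t 25 -r line
        rc=$?
        [ $rc -ne 0 ] && break
        printf '%s\n' "$line"
    done
    # read exits with a status > 128 on timeout and 1 on plain EOF,
    # so only kill if the server went quiet rather than exited.
    if [ $rc -gt 128 ]; then
        echo "No output for 25 seconds, killing the server" >&2
        pkill -f srcds_linux
    fi
}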

Related

Different behaviour of bash script on supervisor start and restart

I have a bash script which does something, for example:
[program:long_script]
command=/usr/local/bin/long.sh
autostart=true
autorestart=true
stderr_logfile=/var/log/long.err.log
stdout_logfile=/var/log/long.out.log
and it is bound to supervisor.
I want to add an if check in this script to determine whether it was executed by:
supervisor> start long_script
or
supervisor> restart long_script
I want something like that:
if [ executed by start command ]
then
echo "start"
else
echo "restart"
fi
but I don't know what should be in the if clause.
Is it possible to determine this?
If not, how can I achieve different behaviour of the script for the start and restart commands?
Please help.
Within the supervisor code there is currently no difference between a restart and a stop/start. restart within supervisorctl simply calls:
self.do_stop(arg)
self.do_start(arg)
There is no status within the app for "restart", though there has been some discussion of allowing different signals; the supervisor is already able to send different signals to the process. (Allowing more control over reload/restart has been a long-standing gap.)
This means you have at least two options, but the key to making this work is that the process needs to record some state at shutdown.
Option 1. The easiest option would be to use supervisorctl signal <signal> <process> instead of calling supervisorctl restart <process>, and record somewhere which signal was sent, so that on startup you can read back the last signal.
Option 2. A more interesting solution is to not expect any upstream changes, i.e. continue to allow restart to be used, and distinguish between stop, crash and restart.
In this case, the only information that differs between a start and a restart is that a restart should have a much shorter gap between the shutdown of the old process and the start of the new process. So if a timestamp is recorded on shutdown, then on startup the difference between now and the last shutdown will distinguish between a start and a restart.
To do this, I've got a definition like yours but with stopsignal defined:
[program:long_script]
command=/usr/local/bin/long.sh
autostart=true
autorestart=true
stderr_logfile=/var/log/long.err.log
stdout_logfile=/var/log/long.out.log
stopsignal=SIGUSR1
By making the stop from supervisord a specific signal, you can tell the difference between a crash and a normal stop event, and not interfere with normal kill or interrupt signals.
Then, as the very first line in the bash script, I set a trap for this signal:
trap "mkdir -p /var/run/long/; date +%s > /var/run/long/last.stop; exit 0" SIGUSR1
This means the date as an epoch value will be recorded in the file /var/run/long/last.stop every time we are sent a stop by supervisord.
Then, as the immediate next lines in the script, calculate the difference between the last stop and now:
stopdiff=0
if [ -e /var/run/long/last.stop ]; then
    curtime=$(date +%s)
    stoptime=$(grep -o '[0-9]*' /var/run/long/last.stop)
    if [ -n "${stoptime}" ]; then
        stopdiff=$(( curtime - stoptime ))
    fi
else
    stopdiff=9999
fi
stopdiff will now contain the difference in seconds between the stop and the start, or 9999 if the stop file didn't exist.
This can then be used to decide what to do:
if [ ${stopdiff} -gt 2 ]; then
    echo "Start detected (${stopdiff} sec difference)"
elif [ ${stopdiff} -ge 0 ]; then
    echo "Restart detected (${stopdiff} sec difference)"
else
    echo "Error detected (${stopdiff} sec difference)"
fi
You'll have to make some choices about how long it actually takes to get from sending a stop to the script actually starting: here I've allowed only 2 seconds, and anything greater is considered a "start". If the shutdown of the script needs to happen in a specific way, you'll need a bit more complexity in the trap statement (rather than just exit 0).
Since a crash shouldn't record any timestamp to the stop file, you should also be able to tell that a startup is occurring because of a crash if you regularly record a running timestamp somewhere as well.
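A minimal sketch of that heartbeat idea (my own addition, not something supervisord provides; the path /var/run/long/last.alive is arbitrary):
# At startup, before anything else: if last.alive is newer than
# last.stop, the previous instance died without running its SIGUSR1
# trap, i.e. it crashed.
if [ /var/run/long/last.alive -nt /var/run/long/last.stop ]; then
    echo "Crash detected on previous run"
fi

# Then refresh a heartbeat file every few seconds in the background:
mkdir -p /var/run/long
( while true; do date +%s > /var/run/long/last.alive; sleep 5; done ) &
heartbeat_pid=$!   # remember to kill this in the shutdown trap too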
I understand your problem, but I don't know much about supervisor, so please check whether this idea works.
Set a global string variable and assign values to it before you enter the supervisor commands. Here I am wrapping your start and restart commands in two bash programs.
program : supervisor_start.sh
#!/bin/bash
echo "Starting.."
supervisorctl start long_script
export supervisor_started_command="start" # This is the one
echo "Started.."
program : supervisor_restart.sh
#!/bin/bash
echo "ReStarting.."
supervisorctl restart long_script
export supervisor_started_command="restart" # This is the one
echo "ReStarted.."
Now you can check what is in the "supervisor_started_command" variable :)
#!/bin/bash
if [ "$supervisor_started_command" = "start" ]
then
echo "start"
elif [ "$supervisor_started_command" = "restart" ]
then
echo "restart"
fi
Well, I don't know whether this idea will work for you or not.

How to kill a process on no output for some period of time

I've written a program that is supposed to run for a long time and outputs its progress to stdout; however, under some circumstances it begins to hang, and the easiest thing to do is to restart it.
My question is: Is there a way to do something that would kill the process only if it had no output for a specific number of seconds?
I have started thinking about it, and the only thing that comes to mind is something like this:
./application > output.log &
tail -f output.log
then create a script which would look at the date and time of the last modification of output.log and restart the whole thing.
But that looks very tedious, and I would hate to go through all of it if there were an existing command for that.
As far as I know, there isn't a standard utility to do it, but a good start for a one-liner would be:
timeout=10; if [ -z "$(find output.log -newermt "@$(( $(date +%s) - timeout ))")" ]; then killall -TERM application; fi
At least, this will avoid the tedious part of coding a more complex script.
Some hints:
The find utility compares the last modification date of the output.log file against a time reference.
The time reference is returned by the date utility as the current time in seconds (+%s) since the epoch (1970-01-01 UTC).
Bash arithmetic expansion $(( )) subtracts the $timeout value (10 seconds in the example); note the @ prefix, which -newermt accepts as "seconds since the epoch".
If the above find returns no output, the file wasn't changed for more than 10 seconds. This makes the if condition true, and the killall command is executed.
You can also set an alias for that, using:
alias kill_application='timeout=10; if [ -z "$(find output.log -newermt "@$(( $(date +%s) - timeout ))")" ]; then killall -TERM application; fi'
And then use it whenever you want by just issuing the command kill_application
If you want to automatically restart the application without human intervention, you can install a crontab entry to run every minute or so, and also issue the application restart command after the killall. (You may also want to change -TERM to -KILL, in case the application becomes unresponsive to catchable signals.)
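As a sketch, the crontab entry could be:
* * * * * /usr/local/bin/check_application.sh
with /usr/local/bin/check_application.sh containing something like this (the paths are placeholders for your setup):
#!/bin/bash
# Kill and restart the application if output.log has been silent
# for more than $timeout seconds.
timeout=10
cd /path/to/app || exit 1   # placeholder path
if [ -z "$(find output.log -newermt "@$(( $(date +%s) - timeout ))")" ]; then
    killall -KILL application 2>/dev/null
    ./application > output.log &   # restart, as in the question
fi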
inotifywait could also help here; it efficiently waits for changes to files. Its exit status can be checked to identify whether the event (modify) occurred within the specified interval of time.
$ inotifywait -e modify -t 10 output.log
Setting up watches.
Watches established.
$ echo $?
2
Some related info from the man page:
OPTIONS
-e <event>, --event <event>
Listen for specific event(s) only.
-t <seconds>, --timeout <seconds>
Exit if an appropriate event has not occurred within <seconds> seconds.
EXIT STATUS
2 The -t option was used and an event did not occur in the specified interval of time.
EVENTS
modify A watched file or a file within a watched directory was written to.
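Putting that together, a watchdog loop might look like this (a sketch; application and output.log are the names from the question):
#!/bin/bash
# Wait for writes to output.log; inotifywait exits with status 2
# if nothing is written within 10 seconds.
while true; do
    inotifywait -qq -e modify -t 10 output.log
    status=$?
    if [ $status -eq 2 ]; then
        echo "No output for 10 seconds, killing application" >&2
        killall -TERM application
        break
    elif [ $status -ne 0 ]; then
        break   # some other error, e.g. the file was removed
    fi
done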

Is it possible to set time out from bash script? [duplicate]

Sometimes my bash scripts hang without any clear reason, so they can actually hang forever (the script process will run until I kill it).
Is it possible to build a timeout mechanism into the bash script so that the program exits after, for example, half an hour?
This Bash-only approach encapsulates all the timeout code inside your script by running a function as a background job to enforce the timeout:
#!/bin/bash
Timeout=1800 # 30 minutes

function timeout_monitor() {
    sleep "$Timeout"
    kill "$1"
}

# start the timeout monitor in the
# background and pass the PID:
timeout_monitor "$$" &
Timeout_monitor_pid=$!

# <your script here>

# kill the timeout monitor when terminating:
kill "$Timeout_monitor_pid"
Note that the function will be executed in a separate process. Therefore the PID of the monitored process ($$) must be passed. I left out the usual parameter checking for the sake of brevity.
If you have GNU coreutils, you can use the timeout command:
timeout 1800s ./myscript
To check whether the timeout occurred, check the exit status:
timeout 1800s ./myscript
if (($? == 124)); then
echo "./myscript timed out after 30 minutes" >>/path/to/logfile
exit 124
fi
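If you want the limit to live inside the script itself rather than at the call site, a known trick is to have the script re-execute itself under timeout (a sketch; the TIMEOUT_WRAPPED variable name is my own):
#!/bin/bash
# On first entry, re-exec this script under timeout so callers
# don't have to remember to wrap it themselves.
if [ -z "$TIMEOUT_WRAPPED" ]; then
    TIMEOUT_WRAPPED=1 exec timeout 1800s "$0" "$@"
fi

# <the rest of the script now runs with a 30-minute limit>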

defer pipe process to background after text match

So I have a bash command to start a server, and it outputs some lines before getting to the point where it outputs something like "Server started, Press Control+C to exit". How do I pipe this output so that when this line occurs, I can put the server process in the background and continue with another script/function (i.e. do stuff that needs to wait until the server starts, such as running tests)?
I want to end up with 3 functions
start_server
run_tests
stop_server
I've got something along the lines of:
function read_server_output {
    while read -r data; do
        printf '%s\n' "$data"
        if [[ $data == "Server started, Press Control+C to exit" ]]; then
            : # do something here to put the server process in the
              # background so I can run another function
        fi
    done
}
function start_server {
    # start the server and pipe its output to another function to check it's running
    start-server-command | read_server_output
}
function run_tests {
    # do some stuff
    :
}
function stop_server {
    # stop the server
    :
}
# run the bash script code
start_server
run_tests
stop_server
A possibly related question: SH/BASH - Scan a log file until some text occurs, then exit. How?
Thanks in advance, I'm pretty new to this.
First, a note on terminology...
"Background" and "foreground" are controlling-terminal concepts, i.e., they have to do with what happens when you type ctrl+C, ctrl+Z, etc. (which process gets the signal), whether a process can read from the terminal device (a "background" process gets a SIGTTIN that by default causes it to stop), and so on.
It seems clear that this has little to do with what you want to achieve. Instead, you have an ill-behaved program (or suite of programs) that needs some special coddling: when the server is first started, it needs some hand-holding up to some point, after which it's OK. The hand-holding can stop once it outputs some text string (see your related question for that, or the technique below).
There's a big potential problem here: a lot of programs, when their output is redirected to a pipe or file, produce no output until they have printed a "block" worth of output, or are exiting. If this is the case, a simple:
start-server-command | cat
won't print the line you're looking for (so that's a quick way to tell whether you will have to work around this issue as well). If so, you'll need something like expect, which is an entirely different way to achieve what you want.
Assuming that's not a problem, though, let's try an entirely-in-shell approach.
What you need is to run the start-server-command and save the process-ID so that you can (eventually) send it a SIGINT signal (as ctrl+C would if the process were "in the foreground", but you're doing this from a script, not from a controlling terminal, so there's no key the script can press). Fortunately sh has a syntax just for this.
First let's make a temporary file:
#! /bin/sh
# myscript - script to run server, check for startup, then run tests
TMPFILE=$(mktemp /tmp/myscript.XXXXXX) || exit 1 # create /tmp/myscript.<unique>
trap "rm -f $TMPFILE" 0 1 2 3 15 # arrange to clean up when done
Now start the server and save its PID:
start-server-command > $TMPFILE & # start server, save output in file
SERVER_PID=$! # and save its PID so we can end it
trap "kill -INT $SERVER_PID; rm -f $TMPFILE" 0 1 2 3 15 # adjust cleanup
Now you'll want to scan through $TMPFILE until the desired output appears, as in the other question. Because this requires a certain amount of polling you should insert a delay. It's also probably wise to check whether the server has failed and terminated without ever getting to the "started" point.
while ! grep -q '^Server started, Press Control+C to exit$' "$TMPFILE"; do
# message has not yet appeared, is server still starting?
if kill -0 $SERVER_PID 2>/dev/null; then
# server is running; let's wait a bit and try grepping again
sleep 1 # or other delay interval
else
echo "ERROR: server terminated without starting properly" 1>&2
exit 1
fi
done
(Here kill -0 is used to test whether the process still exists; if not, it has exited. The "cleanup" kill -INT will then produce an error message, but that's probably OK. If not, either redirect that kill command's error output, or adjust the cleanup, or do it manually, as seen below.)
At this point, the server is running and you can do your tests. When you want it to exit as if the user hit ctrl+C, send it a SIGINT with kill -INT.
Since there's a kill -INT in the trap set for when the script exits (0) as well as when it's terminated by SIGHUP (1), SIGINT (2), SIGQUIT (3), and SIGTERM (15)—that's the:
trap "do some stuff" 0 1 2 3 15
part—you can simply let your script exit at this point, unless you want to specifically wait for the server to exit too. If you want that, perhaps:
kill -INT $SERVER_PID; rm -f $TMPFILE # do the pre-arranged cleanup now
trap - 0 1 2 3 15 # don't need it arranged anymore
wait $SERVER_PID # wait for the server to finish exiting
would be appropriate.
(Obviously none of the above is tested, but that's the general framework.)
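Assembled into a single skeleton (still untested, same caveats as above; run_tests stands in for whatever your tests actually are):
#!/bin/sh
TMPFILE=$(mktemp /tmp/myscript.XXXXXX) || exit 1
trap "rm -f $TMPFILE" 0 1 2 3 15

start-server-command > $TMPFILE &
SERVER_PID=$!
trap "kill -INT $SERVER_PID 2>/dev/null; rm -f $TMPFILE" 0 1 2 3 15

until grep -q '^Server started, Press Control+C to exit$' "$TMPFILE"; do
    if ! kill -0 $SERVER_PID 2>/dev/null; then
        echo "ERROR: server terminated without starting properly" 1>&2
        exit 1
    fi
    sleep 1
done

run_tests    # placeholder for your actual tests

kill -INT $SERVER_PID; rm -f $TMPFILE
trap - 0 1 2 3 15
wait $SERVER_PID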
Probably the easiest thing to do is to start it in the background and block on reading its output. Something like:
{ start-server-command & } | {
while read -r line; do
echo "$line"
echo "$line" | grep -q 'Server started' && break
done
cat &
}
echo script continues here after server outputs 'Server started' message
But this is a pretty ugly hack. It would be better if the server could be modified to perform a more specific action which the script could wait for.

Instance limited cron job

I want to run a cron job every minute that will launch a script. Simple enough there. However, I need to make sure that not more than X number (defined in the script) of instances are ever running. These are queue workers, so if at any minute interval 6 workers are still active, then I would not launch another instance. The script simply launches a PHP script which exits if no job available. Right now I have a shell script that perpetually launches itself every 10 seconds after exit... but there are long periods of time where there are no jobs, and a minute delay is fine. Eventually I would like to have two cron jobs for peak and off-peak, with different intervals.
Make sure your script has a unique name.
Then check whether 6 instances are already running:
if [ "$(pgrep -c '^UNIQUE_SCRIPT_NAME$')" -lt 6 ]
then
# start my script
else
# do not start my script
fi
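From cron, that check just becomes the top of a launcher script (a sketch; worker.sh, the path, and the limit of 6 are placeholders):
* * * * * /usr/local/bin/launch_worker.sh
with /usr/local/bin/launch_worker.sh containing:
#!/bin/bash
# Start another worker only if fewer than 6 are already running.
# pgrep matches the process name, so the script name must be unique.
if [ "$(pgrep -c '^worker\.sh$')" -lt 6 ]; then
    /path/to/worker.sh &
fi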
I'd say that if you want to iterate as often as every minute, then a process like your current shell script that relaunches itself is what you actually want to do. Just increase the delay from 10 seconds to a minute.
That way, you can also more easily control your delay for peak and off-peak, as you wanted. It would be rather elegant to simply use a shorter delay if the script found something to do the last time it was launched, or a longer delay if it did not find anything.
You could use a script like OneAtATime to guard against multiple simultaneous executions.
This is what I am using in my shell scripts:
echo -n "Checking if job is already running... "
me=`basename $0`
running=$(ps aux | grep ${me} | grep -v .log | grep -v grep | wc -l)
if [ $running -gt 1 ];
then
echo "already running, stopping job"
exit 1
else
echo "OK."
fi;
The command you're looking for is in line 3; just replace ${me} with your PHP script name. In case you're wondering about the grep -v '\.log' part: I'm piping the output into a log file whose name partially contains the script name, so this way I avoid counting it twice.
