Monitoring bash script won't terminate - linux

I have this bash script whose job is to monitor a log file for the occurrence of a certain line. When located, the script will send out an email warning and then terminate itself. For some reason, it keeps on running. How can I be sure to terminate bash script below:
#!/bin/sh
tail -n 0 -f output.err | grep --line-buffered "Exception" | while read line
do
echo "An exception has been detected!" | mail -s "ALERT" monitor#company.com
exit 0
done

You are opening a subshell in the while read and that subshell is who is exiting, not the proper one.
Try before entering the while loop:
SHELLPID=$$
And then in the loop:
kill $SHELLPID
exit 0
Or change your loop to not use a subshell.
Since the parent script is always going to be in the tail -f which never ends I think you have no other choice than killing it from the inner subshell.

Try something like this:
tail -n 0 -f output.err | grep --line-buffered "Exception" | while read line
do
echo "An exception has been detected!" | mail -s "ALERT" monitor#company.com
kill -term `ps ax | grep tail | grep output.err | awk '{print $1}'`
done
This should work, provided you have only one tail keeping an eye on this particular file.

Related

Multiple PIDs being stored in PID file

I have a System V init script I've developed that starts a Java program. For some reason whenever the PID file gets created, it contains multiple PIDs instead of one.
Here's the relevant code that starts the service and writes to the PID file:
daemon --pidfile=$pidfile "$JAVA_CMD &" >> $logfile 2>&1
RETVAL=$?
usleep 500000
if [ $RETVAL -eq 0 ]; then
touch "$lock"
PID=$(ps aux | grep -vE 'grep|runuser|bash' | grep <myservice> | awk '{print $2}')
echo $PID > $pidfile
When I test the ps aux... command manually, a single line returns. When running as a script, it appears that this call is returning multiple PIDs.
Example contents in the PID file: 16601 16602 16609 16619 16690. 16619 is the actual process ID found when manually running the ps aux... command mentioned above.
Try reversing your greps. The first one (-vE) may run BEFORE the myservice one starts up. Grep for your service FIRST, then filter out the unwanted lines:
PID=$(ps aux | grep <myservice> | grep -vE 'grep|runuser|bash' | awk '{print $2}')
I encounted the same issue but not the same statement, it was like this:
PID="$(ps -ef|grep command|grep options|grep -v grep|awk '{print $2}')"
in which I used the same grep order as #Marc said in first answer, but did not filter all the unwanted lines.
So I tried the below one and it worked:
PID="$(ps -ef|grep command|grep options|grep -vE 'grep|runuser|bash'|awk '{print $2}')"

Awk not working inside bash script

Im trying to write a bash script and trying to take input from user and executing a kill command to stop a specific tomcat.
...
read user_input
if [ "$user_input" = "2" ]
then
ps -ef | grep "search-tomcat" |awk {'"'"'print $2'"'"'}| xargs kill -9
echo "Search Tomcat Shut Down"
fi
...
I have confirmed that the line
ps -ef | grep "search-tomcat"
works fine in script but:
ps -ef | grep "search-tomcat" |awk {'"'"'print $2'"'"'}
doesnt yield any results in script, but gives desired output in terminal, so there has to be some problem with awk command
xargs can be tricky - Try:
kill -9 $(ps -ef | awk '/search-tomcat/ {print $2}')
If you prefer using xargs then check man page for options for your target OS (i.e. xargs -n.)
Also noting that 'kill -9' is a non-graceful process exit mechanism (i.e. possible file corruption, other strangeness) so I suggest only using as a last resort...
:)

Bash script optimization for waiting for a particular string in log files

I am using a bash script that calls multiple processes which have to start up in a particular order, and certain actions have to be completed (they then print out certain messages to the logs) before the next one can be started. The bash script has the following code which works really well for most cases:
tail -Fn +1 "$log_file" | while read line; do
if echo "$line" | grep -qEi "$search_text"; then
echo "[INFO] $process_name process started up successfully"
pkill -9 -P $$ tail
return 0
elif echo "$line" | grep -qEi '^error\b'; then
echo "[INFO] ERROR or Exception is thrown listed below. $process_name process startup aborted"
echo " ($line) "
echo "[INFO] Please check $process_name process log file=$log_file for problems"
pkill -9 -P $$ tail
return 1
fi
done
However, when we set the processes to print logging in DEBUG mode, they print so much logging that this script cannot keep up, and it takes about 15 minutes after the process is complete for the bash script to catch up. Is there a way of optimizing this, like changing 'while read line' to 'while read 100 lines', or something like that?
How about not forking up to two grep processes per log line?
tail -Fn +1 "$log_file" | grep -Ei "$search_text|^error\b" | while read line; do
So one long running grep process shall do preprocessing if you will.
Edit: As noted in the comments, it is safer to add --line-buffered to the grep invocation.
Some tips relevant for this script:
Checking that the service is doing its job is a much better check for daemon startup than looking at the log output
You can use grep ... <<<"$line" to execute fewer echos.
You can use tail -f | grep -q ... to avoid the while loop by stopping as soon as there's a matching line.
If you can avoid -i on grep it might be significantly faster to process the input.
Thou shalt not kill -9.

xargs' $1 conflicts with $1 in shell script

I have these lines in one shell script file foo.sh:
ps ax | grep -E "bar" | grep -v "grep" | awk '{print $1}' | xargs kill -9 $1
when I execute the shell script with an arguments like this:
sh foo.sh arg_one
the xargs can't work now. It takes the $1 from the shell script but not the output of awk.
I do know I can store the output of awk into one file and use it in xargs later.
But, is there any better solution?
== edited ==
thanks the answer from #peterph.
But, is there any way that I can use $1 in xargs?
== edited 2 ==
thanks #Brian Campbell
Despite weather there should be a useless $1 in the example, if a argument of "the shell script file" is given, then the $1 in xargs will not work as my wish, in my computer(In your computer too, I think).
Why? And, how to get avoid it?
xargs reads list from stdin so just discard the last $1 on the line if what you want is to kill processes by their PIDs.
As a side note, ps can also print processes according to their command name (with procps on linux see the -C option).
Instead of that complicated pipeline, you can always use killall -9 name to kill a process, or pkill -9 pattern if you don't know the exact name of the process but know a substring (be careful that you don't kill any unintended processes, though).
For your command to work, just remove the $1; xargs takes its arguments from standard in, and runs the command line passing in the values it gets from standard in at the end of the command.
edit (in response to your edit): What do you expect xargs to do with the $1 argument? What are you expecting to be in it? The only interpretation of $1 that has any meaning here is the first argument that was passed to your script.
The $1 from your awk script is what awk finds in the first column of its input; it then prints that out, and xargs takes those values from standard input, and will call the command you pass it with those values at the end of the command line. So if the awk command returns:
100
120
130
Then piping that result to xargs kill -9 will result in the following being called:
kill -9 100 120 130
You do not need a variable like $1 to make this work
This should work:
ps ax | grep -E "bar" | grep -v "grep" | awk '{print $1}' | xargs kill -9
You can also try:
result=$(ps -ef | grep -E "bar" | grep -v "grep" | awk '{print $2}')
kill -9 $result
In my case piping xargs sometimes returned below error even if matched processes existed:
usage: kill [ -s signal | -p ] [ -a ] pid ...
kill -l [ signal ]
usage: kill [ -s signal | -p ] [ -a ] pid ...
kill -l [ signal ]

Suppress Notice of Forked Command Being Killed

Let's suppose I have a bash script (foo.sh) that in a very simplified form, looks like the following:
echo "hello"
sleep 100 &
ps ax | grep sleep | grep -v grep | awk '{ print $1 } ' | xargs kill -9
echo "bye"
The third line imitates pkill, which I don't have by default on Mac OS X, but you can think of it as the same as pkill. However, when I run this script, I get the following output:
hello
foo: line 4: 54851 Killed sleep 100
bye
How do I suppress the line in the middle so that all I see is hello and bye?
While disown may have the side effect of silencing the message; this is how you start the process in a way that the message is truly silenced without having to give up job control of the process.
{ command & } 2>/dev/null
If you still want the command's own stderr (just silencing the shell's message on stderr) you'll need to send the process' stderr to the real stderr:
{ command 2>&3 & } 3>&2 2>/dev/null
To learn about how redirection works:
From the BashGuide: http://mywiki.wooledge.org/BashGuide/TheBasics/InputAndOutput#Redirection
An illustrated tutorial: http://bash-hackers.org/wiki/doku.php/howto/redirection_tutorial
And some more info: http://bash-hackers.org/wiki/doku.php/syntax/redirection
And by the way; don't use kill -9.
I also feel obligated to comment on your:
ps ax | grep sleep | grep -v grep | awk '{ print $1 } ' | xargs kill -9
This will scortch the eyes of any UNIX/Linux user with a clue. Moreover, every time you parse ps, a fairy dies. Do this:
kill $!
Even tools such as pgrep are essentially broken by design. While they do a better job of matching processes, the fundamental flaws are still there:
Race: By the time you get a PID output and parse it back in and use it for something else, the PID might already have disappeared or even replaced by a completely unrelated process.
Responsibility: In the UNIX process model, it is the responsibility of a parent to manage its child, nobody else should. A parent should keep its child's PID if it wants to be able to signal it and only the parent can reliably do so. UNIX kernels have been designed with the assumption that user programs will adhere to this pattern, not violate it.
How about disown? This mostly works for me on Bash on Linux.
echo "hello"
sleep 100 &
disown
ps ax | grep sleep | grep -v grep | awk '{ print $1 } ' | xargs kill -9
echo "bye"
Edit: Matched the poster's code better.
The message is real. The code killed the grep process as well.
Run ps ax | grep sleep and you should see your grep process on the list.
What I usually do in this case is ps ax | grep sleep | grep -v grep
EDIT: This is an answer to older form of question where author omitted the exclusion of grep for the kill sequence. I hope I still get some rep for answering the first half.
Yet another way to disable job termination messages is to put your command to be backgrounded in a sh -c 'cmd &' construct.
And as already pointed out, there is no need to imitate pkill; you may store the value of $! in another variable instead.
echo "hello"
sleep_pid=`sh -c 'sleep 30 & echo ${!}' | head -1`
#sleep_pid=`sh -c '(exec 1>&-; exec sleep 30) & echo ${!}'`
echo kill $sleep_pid
kill $sleep_pid
echo "bye"
Have you tried to deactivate job control? It's a non-interactive shell, so I would guess it's off by default, but it does not hurt to try... It's regulated by the -m (monitor) shell variable.

Resources