Bash script optimization for waiting for a particular string in log files - linux

I am using a bash script that calls multiple processes which have to start up in a particular order, and certain actions have to be completed (they then print out certain messages to the logs) before the next one can be started. The bash script has the following code which works really well for most cases:
tail -Fn +1 "$log_file" | while read line; do
if echo "$line" | grep -qEi "$search_text"; then
echo "[INFO] $process_name process started up successfully"
pkill -9 -P $$ tail
return 0
elif echo "$line" | grep -qEi '^error\b'; then
echo "[INFO] ERROR or Exception is thrown listed below. $process_name process startup aborted"
echo " ($line) "
echo "[INFO] Please check $process_name process log file=$log_file for problems"
pkill -9 -P $$ tail
return 1
fi
done
However, when we set the processes to print logging in DEBUG mode, they print so much logging that this script cannot keep up, and it takes about 15 minutes after the process is complete for the bash script to catch up. Is there a way of optimizing this, like changing 'while read line' to 'while read 100 lines', or something like that?

How about not forking up to two grep processes per log line?
tail -Fn +1 "$log_file" | grep -Ei "$search_text|^error\b" | while read line; do
So one long running grep process shall do preprocessing if you will.
Edit: As noted in the comments, it is safer to add --line-buffered to the grep invocation.

Some tips relevant for this script:
Checking that the service is doing its job is a much better check for daemon startup than looking at the log output
You can use grep ... <<<"$line" to execute fewer echos.
You can use tail -f | grep -q ... to avoid the while loop by stopping as soon as there's a matching line.
If you can avoid -i on grep it might be significantly faster to process the input.
Thou shalt not kill -9.
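Several of the tips above can be combined into one sketch. It assumes GNU tail and grep; wait_for_line and the temporary FIFO are illustrative names, not part of the original script. A single grep -m1 reads the whole stream and exits at the first relevant line, so there is no per-line fork and no while loop, and tail is stopped with a plain SIGTERM instead of kill -9.

```bash
#!/usr/bin/env bash
# wait_for_line LOG_FILE SEARCH_TEXT   (hypothetical helper name)
# Returns 0 if SEARCH_TEXT shows up first, 1 if a line starting with
# "error" shows up first, 2 if nothing could be read.
wait_for_line() {
    local log_file=$1 search_text=$2
    local dir fifo tail_pid line

    dir=$(mktemp -d) || return 2
    fifo=$dir/stream
    mkfifo "$fifo" || { rm -rf "$dir"; return 2; }

    # tail runs in the background so that its PID is known.
    tail -Fn +1 "$log_file" >"$fifo" 2>/dev/null &
    tail_pid=$!

    # One grep for the whole stream; -m1 makes it exit at the first hit.
    line=$(grep -m1 -Ei "$search_text|^error\b" <"$fifo") || true

    # Stop tail with the default SIGTERM and reap it quietly.
    kill "$tail_pid" 2>/dev/null || true
    wait "$tail_pid" 2>/dev/null || true
    rm -rf "$dir"

    [ -n "$line" ] || return 2
    if grep -qEi '^error\b' <<<"$line"; then
        echo "[ERROR] startup aborted: $line"
        return 1
    fi
    echo "[INFO] matched: $line"
    return 0
}
```

It could be called as wait_for_line "$log_file" "$search_text" before starting the next process. The FIFO is only there so that tail's PID is known and exactly that tail can be stopped.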

Related

How to wait on a backgrounded sub-process with `wait` command [duplicate]

Is there any builtin feature in Bash to wait for a process to finish?
The wait command only allows one to wait for child processes to finish.
I would like to know if there is any way to wait for any process to finish before proceeding in any script.
A mechanical way to do this is as follows but I would like to know if there is any builtin feature in Bash.
while ps -p `cat $PID_FILE` > /dev/null; do sleep 1; done
To wait for any process to finish
Linux (doesn't work on Alpine, where ash doesn't support tail --pid):
tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1 &>/dev/null
With timeout (seconds)
Linux:
timeout $timeout tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
There's no builtin. Use kill -0 in a loop for a workable solution:
anywait(){
for pid in "$@"; do
while kill -0 "$pid" 2>/dev/null; do
sleep 0.5
done
done
}
Or as a simpler oneliner for easy one time usage:
while kill -0 PIDS 2> /dev/null; do sleep 1; done;
As noted by several commentators, if you want to wait for processes that you do not have the privilege to send signals to, you have to find some other way to detect whether the process is running to replace the kill -0 $pid call. On Linux, test -d "/proc/$pid" works; on other systems you might have to use pgrep (if available) or something like ps | grep "^$pid ".
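As a minimal sketch of that fallback, the helpers below try kill -0 first and then double-check through /proc or ps. The names pid_exists and wait_for_pid, and the optional timeout argument, are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# pid_exists and wait_for_pid are hypothetical helper names.

# Succeeds while PID exists, even if we are not allowed to signal it.
pid_exists() {
    kill -0 "$1" 2>/dev/null && return 0
    # kill -0 also fails with "permission denied", so double-check.
    if [ -d /proc/self ]; then
        [ -d "/proc/$1" ]
    else
        ps -p "$1" >/dev/null 2>&1
    fi
}

# wait_for_pid PID [MAX_SECONDS]
# Returns 0 once PID is gone, 1 if MAX_SECONDS (0 = no limit) pass first.
wait_for_pid() {
    local pid=$1 max=${2:-0} waited=0
    while pid_exists "$pid"; do
        if (( max > 0 && waited >= max )); then
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}
```

For example, wait_for_pid "$(cat "$PID_FILE")" 60 would poll once per second for at most a minute. Like every polling approach here, it cannot rule out that the PID has been reused by an unrelated process in the meantime.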
I found "kill -0" does not work if the process is owned by root (or other), so I used pgrep and came up with:
while pgrep -u root process_name > /dev/null; do sleep 1; done
This would have the disadvantage of probably matching zombie processes.
This bash script loop ends if the process does not exist, or it's a zombie.
PID=<pid to watch>
while s=`ps -p $PID -o s=` && [[ "$s" && "$s" != 'Z' ]]; do
sleep 1
done
EDIT: The above script was given below by Rockallite. Thanks!
My original answer below works for Linux, relying on procfs, i.e. /proc/. I don't know how portable it is:
while [[ ( -d /proc/$PID ) && ( -z `grep zombie /proc/$PID/status` ) ]]; do
sleep 1
done
It's not limited to shell, but OS's themselves do not have system calls to watch non-child process termination.
FreeBSD and Solaris have the handy pwait(1) utility, which does exactly what you want.
I believe other modern OSes also have the necessary system calls (macOS, for example, implements BSD's kqueue), but not all of them make this available from the command line.
From the bash manpage
wait [n ...]
Wait for each specified process and return its termination status
Each n may be a process ID or a job specification; if a
job spec is given, all processes in that job's pipeline are
waited for. If n is not given, all currently active child processes
are waited for, and the return status is zero. If n
specifies a non-existent process or job, the return status is
127. Otherwise, the return status is the exit status of the
last process or job waited for.
Okay, so it seems the answer is -- no, there is no built in tool.
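The two return-status cases from the manpage excerpt can be seen in a short sketch. The variable names are illustrative, and PID 1 is used only because it can never be a child of the script's shell.

```bash
#!/usr/bin/env bash
# A child of this shell: wait returns the child's exit status.
( exit 3 ) &
child=$!
status=0
wait "$child" || status=$?
echo "child exit status: $status"          # prints "child exit status: 3"

# Not a child of this shell: wait refuses and returns 127.
not_child_status=0
wait 1 2>/dev/null || not_child_status=$?
echo "non-child status: $not_child_status" # prints "non-child status: 127"
```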
After setting /proc/sys/kernel/yama/ptrace_scope to 0, it is possible to use the strace program. Further switches can be used to make it silent, so that it really waits passively:
strace -qqe '' -p <PID>
All these solutions are tested in Ubuntu 14.04:
Solution 1 (by using ps command):
Just to add up to Pierz answer, I would suggest:
while ps axg | grep -vw grep | grep -w process_name > /dev/null; do sleep 1; done
In this case, grep -vw grep ensures that grep matches only process_name and not grep itself. It has the advantage of supporting the cases where the process_name is not at the end of a line at ps axg.
Solution 2 (by using top command and process name):
while [[ $(awk '$12=="process_name" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_name with the process name that appears in top -n 1 -b. Please keep the quotation marks.
To see the list of processes that you are waiting for, you can run:
while : ; do p=$(awk '$12=="process_name" {print $0}' <(top -n 1 -b)); [[ $p ]] || break; echo "$p"; sleep 1; done
Solution 3 (by using top command and process ID):
while [[ $(awk '$1=="process_id" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_id with the process ID of your program.
Blocking solution
Use wait in a loop to wait for all of the processes to terminate:
function anywait()
{
for pid in "$@"
do
wait $pid
echo "Process $pid terminated"
done
echo 'All processes terminated'
}
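This function returns as soon as all of the processes have terminated, without polling, which makes it the most efficient solution. It only works for child processes of the current shell, because wait cannot wait for anything else.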
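Because wait also hands back each child's exit status, the blocking variant can report failures as well. The following is a small sketch under the assumption that all PIDs are children of the calling shell; wait_all is a made-up name.

```bash
#!/usr/bin/env bash
# wait_all PID...   (hypothetical helper name)
# Waits for every PID and returns 1 if any of them exited non-zero.
wait_all() {
    local pid failed=0
    for pid in "$@"; do
        if wait "$pid"; then
            echo "Process $pid finished successfully"
        else
            echo "Process $pid exited with status $?"
            failed=1
        fi
    done
    return "$failed"
}

# Demo with two short-lived background jobs; the second one fails.
sleep 0.2 &
first=$!
( sleep 0.1; exit 2 ) &
second=$!
wait_all "$first" "$second" || echo "At least one job failed"
```

The function has to run in the shell that started the jobs. Calling it inside $(...) or a pipeline would put it in a subshell, where the PIDs are no longer children.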
Non-blocking solution
Use kill -0 in a loop to wait for all of the processes to terminate, doing something else between checks:
function anywait_w_status()
{
for pid in "$@"
do
while kill -0 "$pid" 2>/dev/null
do
echo "Process $pid still running..."
sleep 1
done
done
echo 'All processes terminated'
}
The reaction time is limited by the sleep interval, which is needed to avoid high CPU usage.
A realistic usage:
Wait for all of the processes to terminate, and keep the user informed about the PIDs that are still running.
function anywait_w_status2()
{
while true
do
alive_pids=()
for pid in "$@"
do
kill -0 "$pid" 2>/dev/null \
&& alive_pids+=("$pid")
done
if [ ${#alive_pids[@]} -eq 0 ]
then
break
fi
echo "Process(es) still running... ${alive_pids[@]}"
sleep 1
done
echo 'All processes terminated'
}
Notes
These functions take the PIDs as arguments and read them from $@, the list of positional parameters.
Had the same issue, I solved the issue killing the process and then waiting for each process to finish using the PROC filesystem:
while [ -e /proc/${pid} ]; do sleep 0.1; done
There is no builtin feature to wait for any process to finish.
You could send kill -0 to any PID found, so you don't get puzzled by zombies and stuff that will still be visible in ps (while still retrieving the PID list using ps).
If you need to both kill a process and wait for it to finish, this can be achieved with killall(1) (based on process names), and start-stop-daemon(8) (based on a pidfile).
To kill all processes matching someproc and wait for them to die:
killall someproc --wait # wait forever until matching processes die
timeout 10s killall someproc --wait # timeout after 10 seconds
(Unfortunately, there's no direct equivalent of --wait with kill for a specific pid).
To kill a process based on a pidfile /var/run/someproc.pid using signal SIGINT, while waiting for it to finish, with SIGKILL being sent after 20 seconds of timeout, use:
start-stop-daemon --stop --signal INT --retry 20 --pidfile /var/run/someproc.pid
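For a specific PID, the same "signal, wait, then escalate" behaviour can be approximated by hand. This is only a sketch: stop_pid and its grace-period argument are hypothetical, and it polls with kill -0, so it needs permission to signal the process.

```bash
#!/usr/bin/env bash
# stop_pid PID [GRACE_SECONDS]   (hypothetical helper name)
# Sends SIGTERM, polls for up to GRACE_SECONDS, then sends SIGKILL.
# Returns 0 if the process went away on its own, 1 if SIGKILL was needed.
stop_pid() {
    local pid=$1 grace=${2:-10} i
    # Nothing to do if the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    for (( i = 0; i < grace * 10; i++ )); do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 0.1
    done
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

For example, stop_pid "$(cat /var/run/someproc.pid)" 20 mirrors the 20-second retry above, except that it starts with SIGTERM rather than SIGINT.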
Use inotifywait to monitor some file that gets closed, when your process terminates. Example (on Linux):
yourproc >logfile.log & disown
inotifywait -q -e close logfile.log
-e specifies the event to wait for, -q means minimal output only on termination. In this case it will be:
logfile.log CLOSE_WRITE,CLOSE
A single inotifywait command can be used to wait for multiple processes:
yourproc1 >logfile1.log & disown
yourproc2 >logfile2.log & disown
yourproc3 >logfile3.log & disown
inotifywait -q -e close logfile1.log logfile2.log logfile3.log
The output string of inotifywait will tell you, which process terminated. This only works with 'real' files, not with something in /proc/
Rauno Palosaari's solution for Timeout in Seconds Darwin, is an excellent workaround for a UNIX-like OS that does not have GNU tail (it is not specific to Darwin). But, depending on the age of the UNIX-like operating system, the command-line offered is more complex than necessary, and can fail:
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
On at least one old UNIX, the lsof argument +r 1m%s fails (even for a superuser):
lsof: can't read kernel name list.
The m%s is an output format specification. A simpler post-processor does not require it. For example, the following command waits on PID 5959 for up to five seconds:
lsof -p 5959 +r 1 | awk '/^=/ { if (T++ >= 5) { exit 1 } }'
In this example, if PID 5959 exits of its own accord before the five seconds elapse, ${?} is 0. If not, ${?} is 1 after five seconds.
It may be worth expressly noting that in +r 1, the 1 is the poll interval (in seconds), so it may be changed to suit the situation.
On a system like OS X you might not have pgrep, so you can try this approach when looking for processes by name:
while ps axg | grep process_name$ > /dev/null; do sleep 1; done
The $ symbol at the end of the process name ensures that grep matches only process_name to the end of line in the ps output and not itself.

Shell script to run a process in background parse its output and start service if the previous process contains a string

I need to write a shell script that starts a process in the background and parses its output until it can tell that the output doesn't contain any error. The process will keep running in the background, as it needs to listen on ports. If the process output contains an error, the script should exit.
Based on the output of the previous process (it didn't contain any errors, the process was able to establish connection to DB) run the next command.
I have tried many approaches suggested on Stack Overflow, which include:
https://unix.stackexchange.com/questions/12075/best-way-to-follow-a-log-and-execute-a-command-when-some-text-appears-in-the-log
https://unix.stackexchange.com/questions/45941/tail-f-until-text-is-seen
https://unix.stackexchange.com/questions/137030/how-do-i-extract-the-content-of-quoted-strings-from-the-output-of-a-command
/home/build/a_process 2>&1 | tee "output_$(date +"%Y_%m_%d").log"
tail -fn0 "output_$(date +"%Y_%m_%d").log" | \
while read line ; do
if [ echo "$line" | grep "Listening" ]
then
/home/build/b_process 2>&1 | tee "output_script_$(date +"%Y_%m_%d").log"
elif [ echo "$line" | grep "error occurred in load configuration" ] || [ echo "$line" | grep "Binding Failure" ]
then
sl -e
fi
done
The problem is that the process keeps running even after it has printed the text I was searching for, so the script gets stuck parsing the string and never stops watching or tailing the output. As a result, it never executes the next command.
On the surface, the issue is with the tee command (a_process ... | tee).
Recall that a pipeline will result in the shell
Creating the pipeline between the commands
Waiting for the LAST command to finish.
Since tee will not finish until a_process is done, and since a_process is a daemon, your script may wait forever (at least, until a_process exits).
In this case, consider sending the whole pipeline to the background.
log_file=output_$(date +"%Y_%m_%d").log
( /home/build/a_process 2>&1 | tee "$log_file" ) &
tail -fn0 "$log_file" |
...
Side note: consider setting the log file into a variable. This will make it easier to maintain (and understand) the script.
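Put together, the whole flow might look like the sketch below. It deliberately swaps tail -f for polling the log with grep, so that nothing is left blocked on a pipe once the text is found. fake_daemon is a stand-in for /home/build/a_process, and the temporary log file and the 10-second limit are assumptions.

```bash
#!/usr/bin/env bash
log_file=$(mktemp)

# Stand-in for /home/build/a_process: logs, then keeps running for a while.
fake_daemon() {
    echo "Loading configuration"
    sleep 0.5
    echo "Listening on port 8080"
    sleep 1
}

# Background the whole pipeline so the script does not wait for tee.
( fake_daemon 2>&1 | tee "$log_file" ) &
daemon_job=$!

# Poll the log for up to ~10 seconds instead of blocking on tail -f.
result=timeout
for (( i = 0; i < 50; i++ )); do
    if grep -qE 'error occurred in load configuration|Binding Failure' "$log_file"; then
        result=failed
        break
    elif grep -q 'Listening' "$log_file"; then
        result=ready
        break
    fi
    sleep 0.2
done

echo "a_process: $result"
if [ "$result" = ready ]; then
    echo "starting b_process"   # placeholder for /home/build/b_process
fi

# Only the short-lived stand-in is waited for; a real daemon keeps running.
wait "$daemon_job"
rm -f "$log_file"
```

Polling re-reads the log on every pass, which is fine for a startup check but would be wasteful on a very large log. In that case a tail-based reader that is stopped explicitly may be the better trade-off.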

Can i wait for a process termination that is not a child of current shell terminal?

I have a script that has to kill, a certain number of times, a resource managed by a high availability middleware. It basically checks whether the resource is running and kills it afterwards. I need the timestamp of when the process is really killed, so I wrote this code:
#!/bin/bash
echo "$(date +"%T,%N") :New measures Run" > /home/hassan/logs/measures.log
for i in {1..50}
do
echo "Iteration: $i"
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
if [ -n "$PID" ]; then
echo "$(date +"%T,%N") :Killing $PID" >> /home/hassan/logs/measures.log
ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print "kill -9 " $2'} | sh
wait $PID
else
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
until [ -n "$PID" ]; do
sleep 2
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
done
fi
done
But with my wait command i get the following error message: wait: pid xxxx is not a child of this shell
I assume that you started the child processes from bash and then started this script to wait for them. The problem is that the child processes are not children of the bash running the script, but children of its parent!
If you want to run a script inside the current bash, start it with . (the dot command).
An example: you start vim and then stop it by pressing ^Z (later you can use fg to get back to vim). Then you can get the list of jobs by using the jobs command.
$ jobs
[1]+ Stopped vim myfile
Then You can create a script called test.sh containing just one command, called jobs. Add execute right (e.g. chmod 700 test.sh), then start it:
$ cat test.sh
jobs
~/dev/fi [3:1]$ ./test.sh
~/dev/fi [3:1]$ . ./test.sh
[1]+ Stopped vim myfile
Because the first version creates a new bash process, no jobs are listed. With ., however, the script runs in the current bash session, which has exactly one child process (namely vim). So launch the script above using . so that no child bash is created.
Be aware that defining variables or changing directory (and a lot more) will affect your environment! E.g. PID will be visible in the calling bash!
Comments:
Do not use ...|grep ...|grep -v ... |awk --- pipe snakes! Use ...|awk... instead!
In most Linux-es you can use something like this ps -o pid= -C pcmAppBin to get just the pid, so the complete pipe can be avoided.
To call an external program from awk you could try system("mycmd"); built-in
I hope this helps a bit!

How to get watch to run a bash script with quotes

I'm trying to have a lightweight memory profiler for the matlab jobs that are run on my machine. There is either one or zero matlab job instance, but its process id changes frequently (since it is actually called by another script).
So here is the bash script that I put together to log memory usage:
#!/bin/bash
pid=`ps aux | grep '[M]ATLAB' | awk '{print $2}'`
if [[ -n $pid ]]
then
\grep VmSize /proc/$pid/status
else
echo "no pid"
fi
when I run this script in bash like this:
./script.sh
it works fine, giving me the following result:
VmSize: 1289004 kB
which is exactly what I want.
Now, I want to run this periodically. So I run it with watch, like this:
watch ./script.sh
But in this case I only receive:
no pid
Please note that I know the matlab job is still running, because I can see it with the same pid on top, and besides, I know each matlab job take several hours to finish.
I'm pretty sure that something is wrong with the quotes I have when setting pid. I just can't figure out how to fix it. Anyone knows what I'm doing wrong?
PS.
In the man page of watch, it says that commands are executed by sh -c. I did run my script like sh -c ./script and it works just fine, but watch doesn't.
Why don't you use a loop with sleep command instead?
For example:
#!/bin/bash
while [ "1" ]
do
# Look up the PID on every pass, because it changes between MATLAB runs
pid=`ps aux | grep '[M]ATLAB' | awk '{print $2}'`
if [[ -n $pid ]]
then
\grep VmSize /proc/$pid/status
else
echo "no pid"
fi
sleep 10
done
Here the script sleeps(waits) for 10 seconds. You can set the interval you need changing the sleep command. For example to make the script sleep for an hour use sleep 1h.
To exit the script press Ctrl - C
This
pid=`ps aux | grep '[M]ATLAB' | awk '{print $2}'`
could be changed to:
pid=$(pidof MATLAB)
I have no idea why it's not working in watch but you could use a cron job and make the script log to a file like so:
#!/bin/bash
pid=$(pidof MATLAB) # Just to follow previously given advice :)
if [[ -n $pid ]]
then
echo "$(date): $(\grep VmSize /proc/$pid/status)" >> logfile
else
echo "$(date): no pid" >> logfile
fi
The >> redirection creates logfile if it does not exist yet, so there is no need to touch it first.
You might try just running ps command in watch. I have had issues in the past with watch chopping lines and such when they get too long.
It can be fixed by making the terminal you are running the command from wider or changing the column like this (may need to adjust the 160 to your liking):
export COLUMNS=160;

Suppress Notice of Forked Command Being Killed

Let's suppose I have a bash script (foo.sh) that in a very simplified form, looks like the following:
echo "hello"
sleep 100 &
ps ax | grep sleep | grep -v grep | awk '{ print $1 } ' | xargs kill -9
echo "bye"
The third line imitates pkill, which I don't have by default on Mac OS X, but you can think of it as the same as pkill. However, when I run this script, I get the following output:
hello
foo: line 4: 54851 Killed sleep 100
bye
How do I suppress the line in the middle so that all I see is hello and bye?
While disown may have the side effect of silencing the message; this is how you start the process in a way that the message is truly silenced without having to give up job control of the process.
{ command & } 2>/dev/null
If you still want the command's own stderr (just silencing the shell's message on stderr) you'll need to send the process' stderr to the real stderr:
{ command 2>&3 & } 3>&2 2>/dev/null
To learn about how redirection works:
From the BashGuide: http://mywiki.wooledge.org/BashGuide/TheBasics/InputAndOutput#Redirection
An illustrated tutorial: http://bash-hackers.org/wiki/doku.php/howto/redirection_tutorial
And some more info: http://bash-hackers.org/wiki/doku.php/syntax/redirection
And by the way; don't use kill -9.
I also feel obligated to comment on your:
ps ax | grep sleep | grep -v grep | awk '{ print $1 } ' | xargs kill -9
This will scorch the eyes of any UNIX/Linux user with a clue. Moreover, every time you parse ps, a fairy dies. Do this:
kill $!
Even tools such as pgrep are essentially broken by design. While they do a better job of matching processes, the fundamental flaws are still there:
Race: By the time you get a PID output and parse it back in and use it for something else, the PID might already have disappeared or even been replaced by a completely unrelated process.
Responsibility: In the UNIX process model, it is the responsibility of a parent to manage its child, nobody else should. A parent should keep its child's PID if it wants to be able to signal it and only the parent can reliably do so. UNIX kernels have been designed with the assumption that user programs will adhere to this pattern, not violate it.
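A minimal sketch of that parent-manages-child pattern, applied to the script from the question: the parent remembers $!, signals exactly that PID, and reaps it itself. Silencing stderr on wait is a separate, commonly used idiom that usually keeps the termination notice off the terminal in bash; it is not the redirection shown above.

```bash
#!/usr/bin/env bash
echo "hello"
sleep 100 &
sleep_pid=$!            # remember the child's PID instead of searching ps

kill "$sleep_pid"       # default signal is SIGTERM, not SIGKILL

# Reap the child here; its status is 128 + 15 = 143 after SIGTERM.
status=0
wait "$sleep_pid" 2>/dev/null || status=$?
echo "bye"
```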
How about disown? This mostly works for me on Bash on Linux.
echo "hello"
sleep 100 &
disown
ps ax | grep sleep | grep -v grep | awk '{ print $1 } ' | xargs kill -9
echo "bye"
Edit: Matched the poster's code better.
The message is real. The code killed the grep process as well.
Run ps ax | grep sleep and you should see your grep process on the list.
What I usually do in this case is ps ax | grep sleep | grep -v grep
EDIT: This is an answer to older form of question where author omitted the exclusion of grep for the kill sequence. I hope I still get some rep for answering the first half.
Yet another way to disable job termination messages is to put your command to be backgrounded in a sh -c 'cmd &' construct.
And as already pointed out, there is no need to imitate pkill; you may store the value of $! in another variable instead.
echo "hello"
sleep_pid=`sh -c 'sleep 30 & echo ${!}' | head -1`
#sleep_pid=`sh -c '(exec 1>&-; exec sleep 30) & echo ${!}'`
echo kill $sleep_pid
kill $sleep_pid
echo "bye"
Have you tried to deactivate job control? It's a non-interactive shell, so I would guess it's off by default, but it does not hurt to try... It's controlled by the -m (monitor) shell option.

Resources