continuous bash script - show custom error when process killed - linux

I am trying to write a script that keeps checking whether process "XXX" has been killed and, if it has, shows an error message with the text from /proc/XXX_ID/fd/1 and then starts checking again.
I am trying to make it work on a custom thin-client distro that has no more packages than needed (Linux 4.6.3TS_SMP i686).
I am new to Bash scripting and can't seem to get it working. I have been googling and trying different things for the last two days and have gotten nowhere. What am I doing wrong?
#!/bin/bash
while [ true ] ; do
process_ID="$(pgrep XXX)"
tail -f /proc/${process_ID}/fd/1 > /bin/test2.txt
Everything works up to this point. Now I need to somehow check whether test2.txt is empty and, if it is not, use its text as the error message and start checking again. I tried something like this:
if [ -s /bin/test2.txt ]
then
err="$(cat /bin/test2.txt)"
notify-send "${err}"
else
fi
done

How about this:
# $pid must already hold the PID of the watched process, e.g. pid=$(pgrep XXX)
output_filepath=$(readlink "/proc/$pid/fd/1")
while ps "$pid" > /dev/null
do
sleep 1
done
tail "$output_filepath"
The whole idea only works if the stdout (fd 1) of the process is redirected into a file which can be read by someone else. Such a redirect will result in /proc/<pid>/fd/1 being a symlink. We read and memorize the target of that symlink in the first line.
Then we wait until the ps of the given PID fails which typically means the process has terminated. Then we tail the file at the memorized path.
This approach has several weaknesses. The process (or its caller) could remove, modify, or rename the output file at termination time, or reading it can somehow be impossible (permissions, etc.). The output could be redirected to something not being a file (e.g. a pipe, a socket, etc.), then tailing it won't work.
ps not failing does not necessarily mean that the same process is still there. It's next to impossible that a PID is reused that quickly, but not completely impossible.
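A minimal, Linux-only sketch of the approach described above. It swaps ps for the shell built-in kill -0 (which assumes the watched process belongs to the same user) and gives up when fd 1 does not point to a regular file. The function name wait_then_tail and the throwaway demo process are made up for illustration.

```shell
#!/bin/sh
# wait_then_tail PID: remember where PID's stdout points, wait for PID to
# disappear, then print the last lines of that file.
wait_then_tail() {
    pid=$1
    # Resolve the symlink while the process is still alive.
    output_filepath=$(readlink "/proc/$pid/fd/1") || return 1
    # A pipe or socket resolves to something like "pipe:[12345]", not a file.
    [ -f "$output_filepath" ] || return 1
    # kill -0 sends no signal; it only reports whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
    tail -n 5 "$output_filepath"
}

# Demo with a throwaway process standing in for "XXX".
demo_log=$(mktemp)
sh -c 'echo "fatal: demo crash"; exec sleep 2' > "$demo_log" 2>&1 &
demo_pid=$!
sleep 1   # give the demo process time to start and write its message
last_output=$(wait_then_tail "$demo_pid")
echo "process $demo_pid ended, last output: $last_output"
rm -f "$demo_log"
```

In the original question, the final echo would be replaced by notify-send and the whole thing wrapped in a loop that looks up the new PID.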

Related

When piping a command to shell script, why does exiting piped command makes shell script exit?

First of all, sorry if the title is unclear or misleading; my question is not exactly easy to understand out of context.
So here it is: I am running a shell script (hello.sh) that needs to relocate itself from /root to /.
So I made a simple recursion: the script tests where it is running from, makes a temporary copy, launches it, and exits (this temporary copy then moves the original file and deletes itself while still running).
#!/bin/sh
IsTMP=$(echo $0 | grep "tmp")
if [ -z "$IsTMP" ]; then
cp /root/hello.sh /tmp/hello.sh
/bin/sh /tmp/hello.sh &
exit
else
unlink /hello.sh
rsync /root/hello.sh /hello.sh
rm /root/hello.sh
rm /tmp/hello.sh
fi
while true; do
sleep 5
echo "Still Alive"
done
This script works well and suits my needs (even though it is a horrendous hack): the script is moved and re-executed from a temporary place. However, when I pipe the shell script into tee, like this:
/hello.sh | tee -a /log&
The behaviour is not the same:
hello.sh is exiting but not tee
When I try to kill tee, the temporary copy is automatically killed after a few seconds, without entering the infinite loop
This behaviour is exactly the same if I replace tee with another binary (e.g. watch), so I am wondering whether it comes from the piping.
Sorry if I am not too clear about my problem.
Thanks in advance.
"When I try to kill tee, the temporary copy is automatically killed after a few seconds, without entering the infinite loop"
That's not the case. The script does enter the infinite loop; the few seconds are the pause caused by the sleep 5 in the loop. The script is then killed by the signal SIGPIPE (Broken pipe) because it tries to echo "Still Alive" into the pipe, whose read end was closed when tee was killed.
"There is no link between tee and the second instance"
That's not the case either. There is a link, namely the pipe: its write end is the standard output of the parent shell script and, by inheritance, of the child shell script, and its read end is the standard input of tee. You can see this by running ls -l /proc/pid/fd, once with pid set to the process ID of the script's shell and once with the process ID of tee.
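The effect can be reproduced without tee. In this sketch, head -n 1 stands in for the killed tee: it exits after one line, which closes the read end of the pipe, and the writer dies on its next echo. The status file is only a plain-sh way of capturing how the writer loop ended; all names are illustrative.

```shell
#!/bin/sh
status_file=$(mktemp)

# Writer on the left, reader on the right. head exits after the first line,
# which closes the read end of the pipe, much like killing tee does.
{
    ( while true; do echo "Still Alive" || exit 1; sleep 1; done )
    echo "$?" > "$status_file"   # record how the writer loop ended
} | head -n 1

writer_status=$(cat "$status_file")
rm -f "$status_file"
# With the default SIGPIPE disposition this is typically 141 (128 + 13).
echo "writer loop ended with status $writer_status"
```

If the script has to survive a dying reader, its output needs to go somewhere that does not disappear, for example a file instead of a pipe.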

best way to check logs after Makefile command

The Makefile of one of my projects runs a bunch of tests on a headless browser for the functional test step. Most of the tests are for the front-end code, but I also check for any error/warning on the backend.
Currently, we are cleaning the web server log, running all the (very slow) tests, and then grepping the server log for any error or warning.
I was wondering whether there is any way to have a listener parsing the log (e.g. tail -f | grep) that starts in the background and kills the make target if it detects any error/warning during the test run.
What I have so far is:
start long lived grep in the background and store PID.
run tests.
check output of long lived grep
kill PID.
in case of any error, fail.
The only advantage this bought me is that I no longer lose the server logs on my dev box, as I do not have to clean them every time. But I still have to wait a long time (minutes) for a failure that may have occurred in the very first test.
Is there any solution for this?
"I was wondering if there was any way to have a listener parsing the log (e.g. tail -f | grep) starting on the background, and kill the make target if it detects any error/warning during the test run."
Have you tried doing it the simple straightforward way? Like this, for example:
# make is started somehow, e.g.:
make target >make-target.log 2>&1 &
PIDOFMAKE=$!
# we know its PID and the log file name
# (--line-buffered stops grep from holding matches back in its pipe buffer)
tail -f make-target.log | grep --line-buffered 'ERROR!' |
while read -r DUMMY; do
# error was detected, kill the make:
kill "$PIDOFMAKE"
break
done
(Though it is not clear to me what OP is asking, I'm writing an answer since I can't put a lot of code into the comment.)
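A polling variant of the same idea, as a sketch only. Instead of a long-lived tail -f pipeline it rescans the log once per second and stops as soon as the job is gone. The function name watch_and_kill, the fake job, and the ERROR! marker are assumptions made for the demo.

```shell
#!/bin/sh
# watch_and_kill PID LOGFILE PATTERN
# Returns 1 if PATTERN showed up in LOGFILE (killing PID if it is still
# running), or 0 if PID finished on its own without a match.
watch_and_kill() {
    while kill -0 "$1" 2>/dev/null; do
        if grep -q -e "$3" "$2" 2>/dev/null; then
            kill "$1"
            return 1
        fi
        sleep 1
    done
    # The job ended by itself; still fail if its last lines matched.
    ! grep -q -e "$3" "$2" 2>/dev/null
}

# Demo: a fake "make" run that logs an error early and would then keep going.
demo_log=$(mktemp)
sh -c 'echo "test 1 ok"; echo "ERROR! backend failed"; exec sleep 30' \
    > "$demo_log" 2>&1 &
job_pid=$!

watch_and_kill "$job_pid" "$demo_log" 'ERROR!'
watch_status=$?
wait "$job_pid" 2>/dev/null   # reap the killed job
rm -f "$demo_log"
echo "watcher returned $watch_status"
```

The one-second rescan is wasteful for very large logs; it is used here only because it keeps the shutdown logic simple.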

Concurrency with shell scripts in failure-prone environments

Good morning all,
I am trying to implement concurrency in a very specific environment, and keep getting stuck. Maybe you can help me.
This is the situation:
- I have N nodes that can read and write in a shared folder.
- I want to execute an application on one of them. This can be anything: a shell script, an installed program, or whatever.
- To do so, I have to send the same command to all of them. The first one should start the execution, and the rest should see that somebody else is running the desired application and exit.
- The execution of the application can be killed at any time. This is important because it rules out relying on any cleanup step after the execution.
- If the application gets killed, the user may want to execute it again. He would then send the very same command.
My current approach is to create a shell script that wraps the command to be executed. This could also be implemented in C. Not python or other languages, to avoid library dependencies.
#!/bin/sh
# (folder structure simplified for legibility)
mutex(){
lockdir=".lock"
firstTask=1 #false
if mkdir "$lockdir" > /dev/null 2>&1
then
controlFile="controlFile"
#if this is the first node, start coordinator
if [ ! -f $controlFile ]; then
firstTask=0 #true
#tell the rest of nodes that I am in control
echo "some info" > $controlFile
fi
# remove control File when script finishes
trap 'rm $controlFile' EXIT
fi
return $firstTask
}
#The basic idea is that one task executes the desired command, given as arguments to this script. The rest do nothing
if ! mutex ;
then
exit 0
fi
#I am the first node and the only one reaching this, so I execute whatever
"$@"
If there are no failures, this wrapper works great. The problem is that, if the script is killed before the execution, the trap is not executed and the control file is not removed. Then, when we execute the wrapper again to restart the task, it won't work, as every node will think that somebody else is running the application.
A possible solution would be to remove the control file just before the "$@" call, but that would lead to a race condition.
Any suggestion or idea?
Thanks for your help.
edit: edited with correct solution as future reference
Your trap syntax looks wrong: According to POSIX, it should be:
trap [action condition ...]
e.g.:
trap 'rm $controlFile' HUP INT TERM
trap 'rm $controlFile' 1 2 15
Note that $controlFile will not be expanded until the trap is executed if you use single quotes.
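A small sketch of how traps on signals and on EXIT work together to release a lock when the holder is terminated. A child shell plays the role of the wrapper and sleep stands in for the real application; the lock path and the function names cleanup and on_signal are made up for the demo. Note that SIGKILL cannot be trapped, so a kill -9 would still leave a stale lock behind.

```shell
#!/bin/sh
lock_base=$(mktemp -d)
lock_dir="$lock_base/.lock"

# Child shell playing the role of the wrapper: take the lock, install the
# traps, then run a long command (sleep stands in for the real application).
sh -c '
    lock_dir=$1
    mkdir "$lock_dir" || exit 1
    cleanup() { rmdir "$lock_dir"; }
    on_signal() { kill "$child" 2>/dev/null; exit 143; }
    trap cleanup EXIT
    trap on_signal HUP INT TERM
    sleep 30 &
    child=$!
    wait "$child"
' sh "$lock_dir" &
holder_pid=$!

# Wait for the lock to appear, then give the child a moment to set its traps.
while [ ! -d "$lock_dir" ] && kill -0 "$holder_pid" 2>/dev/null; do
    sleep 1
done
sleep 1

kill -TERM "$holder_pid"
wait "$holder_pid"
holder_status=$?

if [ -d "$lock_dir" ]; then lock_state="stale"; else lock_state="released"; fi
echo "holder exited with $holder_status, lock $lock_state"
rmdir "$lock_base" 2>/dev/null
```

The signal trap only turns the signal into a normal exit; the actual cleanup lives in the EXIT trap, so it runs on both paths.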

bash -- kill command script [duplicate]

This question already has answers here:
Find and kill a process in one line using bash and regex
I'm looking into writing shell scripts as a prerequisite for a class and would like some help getting started. I'm currently doing a warm-up exercise that requires me to write a shell script that, when executed, will kill any currently running process of a command I have given. For this exercise, I'm using the 'less' command (so to test I would input 'man ps | less').
However, since this is the first REAL script I'm writing (besides the traditional "Hello World!" one I've done), I'm a little stuck on how to start. I've googled a lot and gotten some rather confusing results. I'm aware I need to start with a shebang, but I'm not sure why. I was thinking of using an 'if' statement, something along the lines of
if 'less' is running
kill 'less' process
fi
But I'm not sure of how to go about that. Since I'm incredibly new at this, I also want to make sure I'm going about writing a script correctly. I'm using notepad as a text editor, and once I've written my script there, I'll save it to a directory that I access in a terminal and then run from there, correct?
Thank you very much for any advice or resources you could give me. I'm certain I can figure out harder exercises once I get the basics of writing a script down.
Try:
pgrep less && killall less
pgrep less looks up the process IDs of any processes named less. If one is found, it returns true, in which case the && clause is triggered. killall less kills every process named less.
See man pgrep and man killall.
Simplification
This may miss the point of your exercise, but there is no real need to test for a less process running. Just run:
killall less
If no less process is running, then killall kills nothing (it only reports that no process was found).
Try this simple snippet:
#!/bin/bash
# if one or more processes matching "less" are running
# (ps will return 0 which corresponds to true in that case):
if ps -C less
then
# send all processes matching "less" the TERM signal:
killall -TERM less
fi
For more information on available signals, see the table in the man page available via man 7 signal.
You might try the following code in bash:
#!/bin/bash
#The shebang above tells which interpreter will process the code; it must be the first line
#Creating a variable to hold program name you want to search and kill
#mind no-space between variable name value and equals sign
program='less'
#use ps to list all process and grep to search for the specific program name
# redirect the visible text output to /dev/null(linux black hole) since we don't want to see it on screen
ps aux | grep "$program" | grep -v grep > /dev/null
#If the given program is found $? will hold 0, since grep returns 0 when it finds a match
if [ $? -eq 0 ]; then
#program is running kill it with killall
killall -9 "$program"
fi

Bash Command Substitution Giving Weird Inconsistent Output

For reasons not relevant to this question, I am running a Java server in a bash script not directly but via command substitution under a separate sub-shell, and in the background. The intent is for the subcommand to return the process id of the Java server as its standard output. The fragment in question is as follows:
launch_daemon()
{
/bin/bash <<EOF
$JAVA_HOME/bin/java $JAVA_OPTS -jar $JAR_FILE daemon $PWD/config/cl.yml <&- &
pid=\$!
echo \${pid} > $PID_FILE
echo \${pid}
EOF
}
daemon_pid=$(launch_daemon)
echo ${daemon_pid} > check.out
The Java daemon in question prints to standard error and quits if there is a problem in initialization, otherwise it closes standard out and standard err and continues on its way. Later in the script (not shown) I do a check to make sure the server process is running. Now on to the problem.
Whenever I check the $PID_FILE above, it contains the correct process id on one line.
But when I check the file check.out, it sometimes contains the correct id, and other times it contains the process id repeated twice on the same line, separated by a space character, as in:
34056 34056
I am using the variable $daemon_pid in the script above later on in the script to check if the server is running, so if it contains the pid repeated twice this totally throws off the test and it incorrectly thinks the server is not running. Fiddling with the script on my server box running CentOS Linux by putting in more echo statements etc. seems to flip the behavior back to the correct one of $daemon_pid containing the process id just once, but if I think that has fixed it and check in this script to my source code repo and do a build and deploy again, I start seeing the same bad behavior.
For now I have fixed this by assuming that $daemon_pid could be bad and passing it through awk as follows:
mypid=$(echo ${daemon_pid} | awk '{ gsub(" +.*",""); print $0 }')
Then $mypid always contains the correct process id and things are fine, but needless to say I'd like to understand why it behaves the way it does. And before you ask, I have looked and looked but the Java server in question does NOT print its process id to its standard out before closing standard out.
Would really appreciate expert input.
Following the hint by @WilliamPursell, I tracked this down in the bash source code. I honestly don't know whether it is a bug or not; all I can say is that it seems like an unfortunate interaction with a questionable use case.
TL;DR: You can fix the problem by removing <&- from the script.
Closing stdin is at best questionable, not just for the reason mentioned by @JonathanLeffler ("Programs are entitled to have a standard input that's open.") but more importantly because stdin is being used by the bash process itself and closing it in the background causes a race condition.
In order to see what's going on, consider the following rather odd script, which might be called Duff's Bash Device, except that I'm not sure that even Duff would approve: (also, as presented, it's not that useful. But someone somewhere has used it in some hack. Or, if not, they will now that they see it.)
/bin/bash <<EOF
if (($1<8)); then head -n-$1 > /dev/null; fi
echo eight
echo seven
echo six
echo five
echo four
echo three
echo two
echo one
EOF
For this to work, bash and head both have to be prepared to share stdin, including sharing the file position. That means that bash needs to make sure that it flushes its read buffer (or not buffer), and head needs to make sure that it seeks back to the end of the part of the input which it uses.
(The hack only works because bash handles here-documents by copying them into a temporary file. If it used a pipe, it wouldn't be possible for head to seek backwards.)
Now, what would have happened if head had run in the background? The answer is, "just about anything is possible", because bash and head are racing to read from the same file descriptor. Running head in the background would be a really bad idea, even worse than the original hack which is at least predictable.
Now, let's go back to the actual program at hand, simplified to its essentials:
/bin/bash <<EOF
cmd <&- &
echo \$!
EOF
Line 2 of this program (cmd <&- &) forks off a separate process (to run in the background). In that process, it closes stdin and then invokes cmd.
Meanwhile, the foreground process continues reading commands from stdin (its stdin fd hasn't been closed, so that's fine), which causes it to execute the echo command.
Now here's the rub: bash knows that it needs to share stdin, so it can't just close stdin. It needs to make sure that stdin's file position is pointing to the right place, even though it may have actually read ahead a buffer's worth of input. So just before it closes stdin, it seeks backwards to the end of the current command line. [1]
If that seek happens before the foreground bash executes echo, then there is no problem. And if it happens after the foreground bash is done with the here-document, also no problem. But what if it happens while the echo is working? In that case, after the echo is done, bash will reread the echo command because stdin has been rewound, and the echo will be executed again.
And that's precisely what is happening in the OP. Sometimes, the background seek completes at just the wrong time, and causes echo \${pid} to be executed twice. In fact, it also causes echo \${pid} > $PID_FILE to execute twice, but that line is idempotent; had it been echo \${pid} >> $PID_FILE, the double execution would have been visible.
So the solution is simple: remove <&- from the server start-up line, and optionally replace it with </dev/null if you want to make sure the server can't read from stdin.
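A sketch of the corrected launcher, with sleep standing in for the Java server (an assumption for the demo, as are the variable names). Unlike the real daemon, sleep never closes its own standard output, so the sketch redirects it explicitly.

```shell
#!/bin/bash
launch_daemon()
{
    # stdin comes from /dev/null instead of being closed with <&-.
    # stdout is redirected as well: a background process that kept the
    # command substitution's pipe open would make the caller wait for it.
    /bin/bash <<EOF
sleep 30 </dev/null >/dev/null 2>&1 &
echo \$!
EOF
}

daemon_pid=$(launch_daemon)
if kill -0 "$daemon_pid" 2>/dev/null; then daemon_alive=yes; else daemon_alive=no; fi
echo "daemon pid: $daemon_pid (alive: $daemon_alive)"
kill "$daemon_pid"   # stop the stand-in daemon again
```

With stdin taken from /dev/null, the background process has no reason to touch the here-document's file descriptor, so the inner bash reads each line exactly once.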
Notes:
Note 1: For those more familiar with bash source code and its expected behaviour than I am, I believe that the seek and close takes place at the end of case r_close_this: in function do_redirection_internal in redir.c, at approximately line 1093:
check_bash_input (redirector);
close_buffered_fd (redirector);
The first call does the lseek and the second one does the close. I saw the behaviour using strace -f and then searched the code for a plausible looking lseek, but I didn't go to the trouble of verifying in a debugger.

Resources