Why does set -e cause my script to exit when it encounters the following?

I have a bash script that checks some log files created by a cron job that have time stamps in the filename (down to the second). It uses the following code:
CRON_LOG=$(ls -1 $LOGS_DIR/fetch_cron_{true,false}_$CRON_DATE*.log 2> /dev/null | sed 's/^[^0-9][^0-9]*\([0-9][0-9]*\).*/\1 &/' | sort -n | cut -d ' ' -f2- | tail -1 )
if [ -f "$CRON_LOG" ]; then
printf "Checking $CRON_LOG for errors\n"
else
printf "\n${txtred}Error: cron log for $CRON_NOW does not exist.${txtrst}\n"
printf "Either the specified date is too old for the log to still be around or there is a problem.\n"
exit 1
fi
CRIT_ERRS=$(cat $CRON_LOG | grep "ERROR" | grep -v "Duplicate tracking code")
if [ -z "$CRIT_ERRS" ]; then
printf "%74s[${txtgrn}PASS${txtrst}]\n"
else
printf "%74s[${txtred}FAIL${txtrst}]\n"
printf "Critical errors detected! Outputting to console...\n"
echo $CRIT_ERRS
fi
So this bit of code works fine, but I'm trying to clean up my scripts now and implement set -e at the top of all of them. When I do it to this script, it exits with error code 1. Note that I have errors from the first statement dumping to /dev/null; this is because some days the file has the word "true" in it and other days "false". Anyway, I don't think this is my problem, because the script outputs "Checking xxxxx.log for errors" before exiting when I add set -e to the top.
Note: the $CRON_DATE variable is derived from user input. I can run the exact same statement from the command line ($ ./checkcron.sh 01/06/2010) and it works fine without the set -e statement at the top of the script.
UPDATE: I added "set -x" to my script and narrowed the problem down. The last bit of output is:
Checking /map/etl/tektronix/logs/fetch_cron_false_010710054501.log for errors
++ cat /map/etl/tektronix/logs/fetch_cron_false_010710054501.log
++ grep ERROR
++ grep -v 'Duplicate tracking code'
+ CRIT_ERRS=
[1]+ Exit 1 ./checkLoad.sh...
So it looks like the problem is occurring on this line:
CRIT_ERRS=$(cat $CRON_LOG | grep "ERROR" | grep -v "Duplicate tracking code")
Any help is appreciated. :)
Thanks,
Ryan

Adding set -x, which prints a trace of the script's execution, may help you diagnose the source of the error.
Edit:
Your grep is returning an exit code of 1 since it's not finding the "ERROR" string.
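You can see this at the prompt; grep exits with status 1 whenever nothing matches:
$ echo "no problems here" | grep "ERROR"; echo $?
1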
Edit 2:
My apologies regarding the colon. I didn't test it.
However, the following works (I tested this one before spouting off) and avoids the extra cat at the end of the pipeline. Because you're setting a variable using the results of a command substitution, and set -e looks at the substitution as a whole, you can do this:
CRIT_ERRS=$(cat $CRON_LOG | grep "ERROR" | grep -v "Duplicate tracking code"; true)

$ bash -c 'f=`false`; echo $?'
1
$ bash -c 'f=`true`; echo $?'
0
$ bash -e -c 'f=`false`; echo $?'
$ bash -e -c 'f=`true`; echo $?'
0
Note that backticks (and $()) "return" the error code of the last command they run. Solution:
CRIT_ERRS=$(cat $CRON_LOG | grep "ERROR" | grep -v "Duplicate tracking code" | cat)

Redirecting error messages to /dev/null does nothing about the exit status returned by the script. The reason your ls command isn't causing the error is because it's part of a pipeline, and the exit status of the pipeline is the return value of the last command in it (unless pipefail is enabled).
Given your update, it looks like the command that's failing is the last grep in the pipeline. grep only returns 0 if it finds a match; otherwise it returns 1, and if it encounters an error it returns 2. This is a danger of set -e: things can fail even when you don't expect them to, because commands like grep return a non-zero status even when there hasn't been an actual error. set -e also fails to exit on errors earlier in a pipeline, and so may miss some errors.
The solutions given by geocar or ephemient (piping through cat or using || : to ensure that the last command in the pipe returns successfully) should help you get around this, if you really want to use set -e.
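For reference, both variants applied to the assignment from the question (either keeps the substitution's exit status at zero):
CRIT_ERRS=$(cat $CRON_LOG | grep "ERROR" | grep -v "Duplicate tracking code" | cat)
CRIT_ERRS=$(cat $CRON_LOG | grep "ERROR" | grep -v "Duplicate tracking code" || :)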

Asking for set -e makes the script exit as soon as a simple command exits with a non-zero exit status. Your ls command exits with a non-zero status when asked to list a non-existent file, which is always the case for you because the true and false variants never co-exist; as noted above, though, that status is masked here because ls is not the last command in its pipeline.

Related

Script exits with error when var=$(... | grep "value") is empty, but works when grep has results

I have the following bash code (running on Red Hat) that is exiting when I enable set -o errexit and the variable in the code is empty, BUT works fine when the variable is set; the code is designed to test if a screen session matching .monitor_* exists, and if so do something.
I have the following turned on:
set -o errexit
set -o xtrace; PS4='$LINENO: '
If there is a session matching the above pattern it works; however, if nothing matches it just exits with no information other than the following output from xtrace
someuser:~/scripts/tests> ./if_test.sh
+ ./if_test.sh
+ PS4='$LINENO: '
4: set -o errexit
5: set -o pipefail
6: set -o nounset
88: /usr/bin/ls -lR /var/run/uscreens/S-storage-rsync
88: grep '.monitor_*'
88: awk '{ print $9 }'
88: /usr/bin/grep -Ev 'total|uscreens'
8: ms=
I tested the command I am using to set the ms variable and it agrees with the xtrace output: the variable is empty.
someuser:~/scripts/tests> test -n "${ms}"
+ test -n ''
I have tried using a select statement and got the same results... I can't figure it out, anyone able to help? Thanks.
I read through all the possible solution recommendations, nothing seems to address my issue.
The code:
#!/usr/bin/env bash
set -o xtrace; PS4='$LINENO: '
set -o errexit
set -o pipefail
set -o nounset
ms="$(/usr/bin/ls -lR /var/run/uscreens/S-"${USER}" | /usr/bin/grep -Ev "total|uscreens" | grep ".monitor_*" | awk '{ print $9 }')"
if [[ -z "${ms}" ]]; then
echo "Handling empty result"
elif [[ -n "${ms}" ]]; then
echo "Handling non-empty result"
fi
The answer Test if a variable is set in bash when using "set -o nounset" was proposed; however, it doesn't address the issue. In my case the variable being tested is set; as stated in the detail above, it's set to "", i.e. empty.
It really seems to be the variable assignment that the shell isn't liking.
ms="$(/usr/bin/ls -lR /var/run/uscreens/S-"${USER}" | /usr/bin/grep -Ev "total|uscreens" | grep ".monitor_*" | awk '{ print $9 }')"
You're running set -o pipefail, so if any component in a pipeline has a nonzero exit status, the entire pipeline is treated as having a nonzero exit status.
Your pipeline runs grep. grep has a nonzero status whenever no matches are found.
You're running set -o errexit (aka set -e). With errexit enabled, the script terminates whenever any command fails (subject to a long and complicated set of exceptions; some of these are presented in the exercises section of BashFAQ #105, and others touched on in this excellent reference).
Thus, when you have no matches in your grep command, your script terminates on the command substitution running the pipeline in question.
If you want to exempt a specific command from set -e's behavior, the easiest way to do it is to simply append ||: (shorthand for || true), which marks the command as "checked".
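Applied to the assignment from the question, that might look like the following (the ||: marks the whole pipeline as checked, so neither errexit nor pipefail will abort the script on an empty match):
ms="$(/usr/bin/ls -lR /var/run/uscreens/S-"${USER}" | /usr/bin/grep -Ev "total|uscreens" | grep ".monitor_*" | awk '{ print $9 }' ||:)"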
"You're running set -o pipefail. When grep doesn't match anything, it has a nonzero exit status, and with pipefail, that fails the entire pipeline. This is all behaving exactly the way you're telling your shell it should behave." – Charles Duffy
Charles's comment above was exactly what was going on: my script was working as intended, and I need to adjust the logic to work differently if I wish to keep set -o pipefail.
Thank you for the help.

How to use return status value for grep?

Why isn't my command returning "0"?
grep 'Unable' check_error_output.txt && echo $? | tail -1
If I remove the 'echo $?' and use tail to get the last occurrence of 'Unable' in check_error_output.txt, it returns correctly. If I remove the tail -1, or replace the pipe with &&, it returns as expected.
What am I missing?
The following achieves what you want without the use of pipes or subshells:
grep -q 'Unable' check_error_output.txt && echo $?
The -q flag stands for quiet / silent
From the man pages:
Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the -s or --no-messages option. (-q is specified by POSIX.)
This is still not fail safe since a "No such file or directory" error will still come up both ways.
I would instead suggest the following approach, since it will output either type of return values:
grep -q 'Unable' check_error_output.txt 2> /dev/null; echo $?
The main difference is that regardless of whether it fails or succeeds, you will still get the return code and error messages will be directed to /dev/null. Notice how I use ";" rather than "&&", making it echo either type of return value.
Use process substitution:
cat <(grep 'Unable' check_error_output.txt) <(echo $?) | tail -1
The simplest way to check the return value of any command in an if statement is: if cmd; then. For example:
if grep -q 'Unable' check_error_output.txt; then ...
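Spelled out as a full block, that might look like this (the echoed messages are just illustrative):
if grep -q 'Unable' check_error_output.txt; then
echo "found an 'Unable' line"
else
echo "no 'Unable' lines found"
fi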
I resolved this by adding parentheses around the grep and the echo $?:
(grep 'Unable' check_error_output.txt && echo $?) | tail -1

Variable assignment exits shell script

I have a simple shell script that tries to find out if a specific docker container is running. In the shell script I have the following line:
RUNNING_CONTAINERS=$(docker ps -a | grep ${IMAGE_NAME})
If the grep returns no results, the shell script exits right there. How would I write my script to make sure the script continues to execute even if the result of the grep is empty?
The reason for this is the set -e somewhere in the code, which makes your script exit as soon as a command returns a non-zero status. In this case grep, because it did not find any match.
As stated in the Bash Reference Manual -> The Set Builtin:
-e
Exit immediately if a pipeline (see Pipelines), which may consist of a single simple command (see Simple Commands), a list (see Lists), or a compound command (see Compound Commands) returns a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test in an if statement, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return status is being inverted with !. If a compound command other than a subshell returns a non-zero status because a command failed while -e was being ignored, the shell does not exit. A trap on ERR, if set, is executed before the shell exits.
Also, from man grep:
EXIT STATUS
Normally the exit status is 0 if a line is selected, 1 if no lines were selected, and 2 if an error occurred. However, if the -q or --quiet or --silent option is used and a line is selected, the exit status is 0 even if an error occurred.
So grep doesn't find anything and returns a non-zero exit status. Then set -e catches it and sees that it doesn't fall under any of the exceptions above (it isn't part of an if or while test, nor in any position of the pipeline but the last), so it exits.
Test
Let's create a very basic script:
$ cat a.sh
#!/bin/bash
set -e
echo "hello"
grep "hello" a
echo "bye"
And generate an empty a file:
$ touch a
If we run it we see it exits when grep doesn't return any result:
$ ./a.sh
hello
However, if we remove the set -e line, it goes through to the end of the file:
$ ./a.sh
hello
bye
Note also that the script doesn't fail if grep is not the last command in the pipeline:
$ cat a.sh
#!/bin/bash
set -e
echo "hello"
grep "hello" a | echo "he"
echo "bye"
$ ./a.sh
hello
he
bye
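Back to the line from the question: if you want to keep set -e, the usual fix (as in the other answers here) is to mark the assignment as checked, so an empty grep result no longer kills the script:
RUNNING_CONTAINERS=$(docker ps -a | grep "${IMAGE_NAME}" || true)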

How to get success status of last Bash shell command inside a shell script?

Sometimes there are two commands which I often invoke in a row. However, the second command only makes sense if the first command was successful.
I wanted to do something like this:
#!/bin/bash
if [ $? -gt 0 ]
then
echo "WARNING: previous command entered at the shell prompt failed"
else
echo "Last command succeeded"
fi
But it doesn't work:
t#quad:~$ echo "abc" | grep def
t#quad:~$ ./warnme.sh
Last command succeeded
What I'd like is something a bit like this:
t#quad:~$ echo "abc" | grep def
t#quad:~$ echo ${PIPESTATUS[1]}
1
Where we can clearly see that the last command failed.
The result I'd like to have:
t#quad:~$ echo "abc" | grep def
t#quad:~$ ./warnme.sh
WARNING: previous command entered at the shell prompt failed
I can't find a way to do it.
command1 && command2
does exactly what you want: command2 is executed only if command1 succeeds. For example you could do:
ls a.out && ./a.out
Then a.out would only be executed if it could be listed. I wikiblogged about this at http://www.linuxintro.org/wiki/%26%26
One option is to put this just before the list of commands you want to execute only if the previous was successful:
set -e
This will exit the script if any of the commands following it return non-zero (usually a fail). You can switch it off again with:
set +e
Or if you'd prefer to switch it off for just one line you can just logical-OR the command with true:
mycommand || true
For a lot of my scripts I have set -e at the top of the script as a safety feature to prevent the script cascading and doing something dangerous.
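Putting these together, a sketch (the command names are placeholders):
#!/bin/bash
set -e
critical_step            # any failure here aborts the script
set +e
optional_step            # failures here are ignored
set -e
flaky_step || true       # tolerated even though set -e is back on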
How about:
echo "abc" | grep def || ./warnme.sh
Where warnme.sh is now only the warning without the test. It's only called if the first command fails.
In other words, it would be the same as:
echo "abc" | grep def || echo "WARNING: That didn't work..."

How can I use exit codes to run shell scripts sequentially?

Since CruiseControl is full of bugs that have wasted my entire week, I have decided my existing shell scripts are simpler and thus better.
Here is what I have so far
svn update /var/www/k12/
#svn log --revision "HEAD" /var/www/code/ | head -2 | tail -1 | awk '{print $1}' > /var/www/path/version.txt
# upload the files
rsync -ar --verbose --stats --progress --delete --exclude=*.svn /var/www/code/ example.com:/home/path
# bring database up to date
ssh example.com 'php /path/tasks/dbrefactor.php'
# notify me
ssh example.com 'php /path/tasks/build.php'
Only thing is the other day I changed the paths and forgot to update the rsync call. As a result the "notify me" step ran several times while I was figuring stuff out.
I know in Linux you can do command1 && command2, and if command1 fails, command2 will not run; but how can I observe the failure/success exit codes for debugging purposes? Some of the scripts I wrote myself, and I'm sure I will need to do something special.
The best option, especially for unattended scripts, is to set the -e shell option:
#!/bin/sh -e
or
set -e
This will cause the shell to stop executing if any (untested) command exits with a nonzero error code.
-e Exit immediately if a simple command (see SHELL GRAMMAR above) exits with a non-zero status. The shell does not exit if the command that fails is part of an until or while loop, part of an if statement, part of a && or || list, or if the command's return value is being inverted via !. A trap on ERR, if set, is executed before the shell exits.
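Applied to the deployment script above, a sketch would be (your own script, with -e added; with -e set, each step runs only if everything before it succeeded):
#!/bin/sh -e
svn update /var/www/k12/
# upload the files
rsync -ar --verbose --stats --progress --delete --exclude=*.svn /var/www/code/ example.com:/home/path
# bring database up to date; only reached if the rsync succeeded
ssh example.com 'php /path/tasks/dbrefactor.php'
# notify me
ssh example.com 'php /path/tasks/build.php'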
The exit code of the previous process is available in the $? variable right after its execution. Usually (it's not required, but it's the convention everyone follows) the exit code of a successful command is 0, and any other value means an error.
Remember the caveats! One of them is that after these commands:
svn log --revision "HEAD" /var/www/code/ | head -2 | tail -1 | awk '{print $1}'
echo "$?"
a zero result would most likely be returned, because $? contains the return code of awk, the last command in the pipeline. To avoid this, set the pipefail option somewhere above that code:
set -o pipefail
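A quick demonstration of what pipefail changes:
$ bash -c 'false | true; echo $?'
0
$ bash -c 'set -o pipefail; false | true; echo $?'
1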
The return value of the last-run command is stored in the variable $?. You can use that to determine which command to run next. Overview of special variables.
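For example, capture $? immediately after the command you care about, before any other command overwrites it (using the rsync step above as an illustration):
rsync -ar /var/www/code/ example.com:/home/path
status=$?
echo "rsync exited with ${status}"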
I think $? contains the last exit code:
if [[ $? -eq 0 ]]
then
# notify me
ssh example.com 'php /path/tasks/build.php'
fi
I would suggest exiting with a non-zero status at the points where failure is expected; before processing a step further you can then check:
if [ $? -ne 0 ]
then
# there was a failure
fi
$? will always be non-zero if the last process did not exit successfully.
