We have a wrapper script for the Teradata TPT utility. The wrapper script is pretty straightforward, but the problem is that its exit status is not always the same as that of the utility: in many cases the script returns 0 even though the utility failed. I save the exit status in a separate variable because some steps need to be done before exiting, but exiting with this variable's value doesn't seem to work reliably. Or is the utility itself returning status 0 for some failures, even when the logs clearly report a different status?
The worst part is that this behavior is quite random; sometimes the script does fail with the utility's exit status. I want to be sure whether there is some problem with the utility's exit status.
The script runs through KSH. The final part of the wrapper script is:
tbuild -f $sql.tmp -j ${id}_$JOB >$out 2>&1
ret_code=$?
cd ${TWB_ROOT}/logs
logpath=`ls -t ${TWB_ROOT}/logs/${id}_${JOB}*.out |head -1`
logpath1=${logpath##*/}
logname=${logpath1%-*}
tlogview -l ${logpath} > /edw/$GROUP/tnl/jobs/$JOB/logs/tpt_logs/${logname}.log
###Maintaining 3 tpt binary log files
if [ $ret_code -eq 0 ]
then
binout=$TPTLOGDIR/${logname}.dat
binout1=$TPTLOGDIR/${logname}.dat1
binout2=$TPTLOGDIR/${logname}.dat2
[ -f $binout1 ] && mv $binout1 $binout2
[ -f $binout ] && mv $binout $binout1
mv "$logpath" "/edw/${GROUP}/tnl/jobs/$JOB/logs/tpt_logs/${logname}.dat"
fi
rm -f $sql.tmp
echo ".exit"
exit $ret_code
Thanks in advance for the help and suggestions.
The script looks ok, and should indeed return the same exit code as the tbuild utility.
It comes down to knowledge of the specific product.
I've never worked with any of these products, but Teradata has an ample User Guide for the Parallel Transporter, with an explicit Post-Job Considerations section, warning:
Even if the job completed successfully, action may still be required based on error and warning information in the job logs and error tables.
So technically, a job can complete with exit status 0 while still leaving errors or warnings behind in the logs and error tables.
I guess you have to define your own policies and scan the logfiles for patterns of warnings and error messages, and then generate your own exit codes for semantic failures. Tools like logstash or splunk might come in handy.
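For example, a rough sketch of such a check could sit just before the wrapper's final exit. The grep patterns below are only examples; adjust them to whatever your TPT version actually writes to the log.
# Sketch only: scan the tlogview output for failure patterns and override the
# wrapper's exit code; meant to sit just before the final `exit $ret_code`.
tpt_log=/edw/$GROUP/tnl/jobs/$JOB/logs/tpt_logs/${logname}.log

if [ $ret_code -eq 0 ] && grep -Eq 'RDBMS error|Fatal error|terminated \(status [1-9]' "$tpt_log"
then
    echo "tbuild exited 0 but $tpt_log reports errors; overriding exit status" >&2
    ret_code=8
fi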
BTW, you might consider using logrotate for rotating the $TPTLOGDIR/${logname}.dat files.
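If you do, a sketch of an entry might look like the following; the path is a placeholder for your $TPTLOGDIR, and the directives simply mirror the three-file scheme in the wrapper.
# Hypothetical /etc/logrotate.d/tpt entry; adjust the path to your $TPTLOGDIR
/path/to/tptlogdir/*.dat {
    rotate 3        # keep three generations, like the .dat/.dat1/.dat2 scheme
    missingok
    notifempty
    nocompress      # binary TPT logs, so leave them uncompressed
}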
Turns out that the issue was in the utility itself, as suspected. The shell script worked fine.
Related
I understand that checking the error level in Linux can be done using $?. The thing is that the $? value is reset as soon as one of the commands succeeds, even if the previous command failed. The code I use for testing is below:
cd /vobs/test2/test3
if [ "$2" = "R" ]; then
    mv missing ~/missing2
    echo "Created"
fi
Assuming that mv missing ~/missing2 failed, $? should be equal to 1, but because the last command, echo "Created", is performed, $? will be equal to 0. How can I scan the code above so that the moment $? equals 1 it executes an exit 1 command? I could wrap every command in an if/else, but that is not the best way to do it, is it? I need some advice on this, please.
You can exit on the first failing command using set -e. This is part of an unofficial "strict bash mode":
set -e
To exit on all errors.
set -o pipefail
So a pipeline fails when any command in it fails, rather than only reporting the status of the last one.
set -u
To fail on access to undefined variables.
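Applied to the snippet from the question, a minimal sketch could look like this; with set -e in effect, a failed mv now terminates the script with mv's exit status, and set -u additionally aborts if $2 was never passed.
#!/bin/bash
set -eu
set -o pipefail

cd /vobs/test2/test3          # aborts here if the directory cannot be entered
if [ "$2" = "R" ]; then
    mv missing ~/missing2     # a failure here now stops the script immediately
    echo "Created"
fi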
I have a bash script which acts as a wrapper for an analysis pipeline. If the script errors out, I want to be able to resume from the point at which the error occurred simply by re-running the original command. I have set two different traps: one removes the last file being generated on a non-zero exit from my script; the other removes all the temporary files on exit signal 0 and essentially cleans up the file system at the end of the run.

I turned on noclobber in the bash environment, which allows my script to skip over lines where files have already been written, but it only does this if I do not set the non-zero exit trap. As soon as I set that trap, the script exits at the first line where noclobber identifies a file it will not overwrite.

Is there a way for me to skip over lines of code that have run successfully before, rather than having to re-run my code from the start? I know I could use conditional statements for each line, but I thought there might be a neater way of doing this.
set -o noclobber
# Function to clean up temporary folders when script exits at the end
rmfile() { rm -r "$1"; }
# Function to remove the file being currently generated
# Function executed if script errors out
rmlast() {
if [ ! -z "$CURRENTFILE" ]
then
rm -r "$1"
exit 1
fi }
# Trap to remove the currently generated file
trap 'rmlast "$CURRENTFILE"' ERR SIGINT
#Make temporary directory if it has not been created in a previous run
TEMPDIR=$(find . -name "tmp*")
if [ -z "$TEMPDIR" ]
then
TEMPDIR=$(mktemp -d /test/tmpXXX)
fi
# Set CURRENTFILE variable
CURRENTFILE="${TEMPDIR}/Variants.vcf"
# Run the analysis step, writing its output to CURRENTFILE
complexanalysis_tool input_file > $CURRENTFILE
# Set CURRENTFILE variable
CURRENTFILE="${TEMPDIR}/Filtered.vcf"
complexanalysis_tool2 input_file2 > $CURRENTFILE
CURRENTFILE="${TEMPDIR}/Filtered_2.vcf"
complexanalysis_tool3 input_file3 > $CURRENTFILE
# Move files to final destination folder
mv -nv $TEMPDIR/*.vcf /test/newdest/
# Trap to remove temporary folders when script finishes running
trap 'rmfile "$TEMPDIR"' 0
Update:
I have been offered answers suggesting the use of the make utility. I want to make use of its built-in ability to check whether a dependency has been fulfilled.
In my hands the makefile suggested by VK Kashyap does not seem to skip execution for previously accomplished tasks. For example, I ran the script above and interrupted it with Ctrl-C while it was generating filtered.vcf. When I rerun the script it runs from the beginning again, i.e. starts again at variants.vcf. Am I missing something needed to get the makefile to treat the sources as being fulfilled?
Answer to update:
OK, this is a rookie mistake, but since I am not familiar with writing makefiles I will post this explanation of my error. The reason my makefile was not resuming from the exit point was that I had named the targets differently from the output files being generated. So, as VK Kashyap quite correctly answered, if you name the targets, e.g.:
variants.vcf
filtered.vcf
filtered2.vcf
the same as the output files being generated, then make will skip the previously accomplished tasks.
The make utility might be an answer for what you want to achieve.
It has built-in dependency checking (the thing you are trying to achieve with tmp files):
# run the "all" target once all of the files are available
all: variants.vcf filtered.vcf filtered2.vcf
	mv -nv $(TEMPDIR)/*.vcf /test/newdest/

variants.vcf:
	complexanalysis_tool input_file > variants.vcf

filtered.vcf:
	complexanalysis_tool2 input_file2 > filtered.vcf

filtered2.vcf:
	complexanalysis_tool3 input_file3 > filtered2.vcf
You may use a bash script to invoke this makefile (note that the recipe lines above must be indented with a tab):
#!/bin/bash
export TEMPDIR=xyz
make -C $TEMPDIR all
The make utility will itself check for already accomplished tasks and skip the steps that are done; it will continue from where the error interrupted the run.
You can find more details about the exact makefile syntax online.
There is no built-in way to do that.
However, you could brew something like that by keeping track of the last successful line and building your own goto statement, as described here and in Is there a "goto" statement in bash? (just replace the 'labels' with actual line numbers).
However, the question is whether this is really a smart idea.
A better way is to run only the commands whose results are still needed, rather than tracking the commands not yet executed.
This could be done either with explicit conditionals in your bash script:
produce_if_missing() {
# check if first argument is existing
# if not, run the rest of the arguments and redirect their output into it
local curfile=$1
shift
if [ ! -e "${curfile}" ]; then
"$@" > "${curfile}"
fi
}
produce_if_missing Variants.vcf complexanalysis_tool input_file
produce_if_missing Filtered.vcf complexanalysis_tool2 input_file2
or using tools that are made for such things (see VK Kashyap's answer using make, though I prefer using automatic variables in the make rules to minimize typos):
Variants.vcf: input_file
	complexanalysis_tool $^ > $@

Filtered.vcf: input_file2
	complexanalysis_tool2 $^ > $@
I am having a problem getting bash to do exactly what I want; it's not a major issue, but it is annoying.
1.) I have a third-party program I run that produces some output on stderr. Some of it is useful, some of it is stuff I regularly don't care about, and I don't want all of it dumped to the screen; however, I do want the useful parts of stderr shown. I figured the best way to achieve this was to pass stderr to a function, then use conditions in that function to either show the stderr or not.
2.) This works fine. However, the solution I have implemented dumps out my errors at the right time but then returns a bash prompt. I want to summarise the status of the errors at the end of the function, but echoing there prints the text after the prompt, meaning that I have to press Enter to get back to a clean prompt. It will become clear with the example below.
My error stream generator:
./TestErrorStream.sh
#!/bin/bash
echo "test1" >&2
My function to process this:
./Function.sh
#!/bin/bash
function ProcessErrors()
{
while read data;
do
echo Line was:"$data"
done
sleep 5 # This is used simply to simulate the processing work I'm doing on the errors.
echo "Completed"
}
I source the Function.sh file to make ProcessErrors() available, then I run:
2> >(ProcessErrors) ./TestErrorStream.sh
I expect (and want) to get:
user@user-desktop:~/path$ 2> >(ProcessErrors) ./TestErrorStream.sh
Line was:test1
Completed
user@user-desktop:~/path$
However what I really get is:
user@user-desktop:~/path$ 2> >(ProcessErrors) ./TestErrorStream.sh
Line was:test1
user@user-desktop:~/path$ Completed
And no clean prompt. Of course the prompt is there, but "Completed" is printed after it; I want it printed before, followed by a clean prompt.
NOTE: This is a minimum working example, and it's contrived. While other solutions to my error stream problem are welcome I also want to understand how to make bash run this script the way I want it to.
Thanks for your help
Joey
Your problem is that the while loop stays attached to stdin until the program writing to it exits.
Stdin is only released at the end of TestErrorStream.sh, so your prompt comes back almost immediately, while the function still has work left to do.
I suggest you wrap the command inside a script so you can control how long to wait before your prompt comes back (I suggest 1 second more than the time you expect the function to need for processing the remaining lines).
I successfully managed to do this like this:
./Functions.sh
#!/bin/bash
function ProcessErrors()
{
while read data;
do
echo Line was:"$data"
done
sleep 5 # simulate required time to process end of function (after TestErrorStream.sh is over and stdin is released)
echo "Completed"
}
./TestErrorStream.sh
#!/bin/bash
echo "first"
echo "firsterr" >&2
sleep 20 # any number here
./WrapTestErrorStream.sh
#!/bin/bash
source ./Functions.sh
2> >(ProcessErrors) ./TestErrorStream.sh
sleep 6 # <= this one is important
With the above you'll get a nice "Completed" before your prompt after 26 seconds of processing. (Works fine with or without the additional "time" command)
user@host:~/path$ time ./WrapTestErrorStream.sh
first
Line was:firsterr
Completed
real 0m26.014s
user 0m0.000s
sys 0m0.000s
user@host:~/path$
Note: the process substitution ">(ProcessErrors)" is a subprocess of the script "./TestErrorStream.sh". So when the script ends, the subprocess is no longer tied to it, nor to the wrapper. That's why we need that final "sleep 6".
#!/bin/bash
function ProcessErrors {
while read data; do
echo Line was:"$data"
done
sleep 5
echo "Completed"
}
# Open subprocess
exec 60> >(ProcessErrors)
P=$!
# Do the work
2>&60 ./TestErrorStream.sh
# Close connection or else subprocess would keep on reading
exec 60>&-
# Wait for process to exit (wait "$P" doesn't work). There are many ways
# to do this too like checking `/proc`. I prefer the `kill` method as
# it's more explicit. We'd never know if /proc updates itself quickly
# among all systems. And using an external tool is also a big NO.
while kill -s 0 "$P" &>/dev/null; do
sleep 1s
done
Off topic side-note: I'd love to see how posturing bash veterans/authors try to own this. Or perhaps they already did way way back from seeing this.
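As a side note, on bash 4.4 or newer, wait can reportedly also wait for the most recent process substitution, whose PID shows up in $!. A sketch under that assumption (it will not work on older shells):
#!/bin/bash
# Assumes bash >= 4.4, where `wait` accepts the PID of the last process
# substitution created (available in $!).
source ./Functions.sh

./TestErrorStream.sh 2> >(ProcessErrors)
wait $!    # returns only once ProcessErrors has printed "Completed"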
I am writing a script to check whether Sybase is running on my server. If it is not running, I want to start the service. If it is running, I want to stop Sybase IQ.
Please help me do the same.
The logic I have written is:
if(sybaseiq = active)
then
stop_iq
else
start_iq ".cfg" ".db"
Below is the code I found on the internet, but I am not able to understand what it is doing. Please answer with an explanation.
isql -U${USERNAME} -P${PASSWORD} -S${SQL_SERVER} -w1000 << ! > ${LOG_FILE}
exit
!
if [[ $? != 0 ]]
then
    msg="`date` ${SQL_SERVER} problem. ${SQL_SERVER} on ${HOST} is down or cannot be accessed"
    cat ${LOG_FILE} | /usr/bin/mailx -s "${msg}" ${SUPPORT}
    exit 1
fi
Thanks a lot in advance
The script is fairly straightforward.
First, the script logs into the server via isql, redirecting the output to a log file. If it is able to connect, it issues all the commands between the exclamation marks (the here-document delimiters), which in this case is just an exit.
Next, the if statement checks the exit status of the last command run, $?. 0 indicates no error, anything else indicates an error. So if the status is not 0, it builds a message and sends that message, along with the log file, to someone.
You will have to set the values for $USERNAME, $PASSWORD, $SQL_SERVER, $LOG_FILE, $HOST and $SUPPORT somewhere in your script.
If you are not familiar with shell scripts, I would recommend you read up a bit. It's quite easy to get into, but they are quite powerful for managing *nix systems.
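Putting that together for the original question, one hedged sketch of the check-then-stop/start logic could look like this; the connection variables are assumed to be set as above, and the start_iq arguments are placeholders for your own .cfg and .db files.
#!/bin/bash
# Sketch: probe the server with isql; a zero exit status means it answered.
isql -U${USERNAME} -P${PASSWORD} -S${SQL_SERVER} -w1000 << ! > ${LOG_FILE}
exit
!

if [[ $? -eq 0 ]]
then
    # The server answered, so it is running: stop it
    stop_iq
else
    # Could not connect, so assume it is down and start it
    start_iq /path/to/your.cfg /path/to/your.db
fi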
I'm debugging someone else's code, and I've run into a situation I wouldn't know how to produce if I tried to code it deliberately. It's coming from a very large Bash script, being run by Bash 4.1.2 on a CentOS 6 box. While the overall program is gigantic, the error consistently occurs in the following function:
get_las() {
echo "Getting LAS..."
pushd ${ferret_workdir} >& /dev/null
#Download:
if [ ! -e ${las_dist_file} ] || ((force_install)) ; then
echo "Don't see LAS tar file ${las_dist_file}"
echo "Downloading LAS from ${las_dist_file} -to-> $(pwd)/${las_dist_file}"
echo "wget -O '${las_dist_file}' '${las_tar_url}'"
wget -O "${las_dist_file}" "${las_tar_url}"
[ $? != 0 ] && echo " ERROR: Could not download LAS:${las_dist_file}" && popd >/dev/null && checked_done 1
fi
popd >& /dev/null
return 0
}
If I allow the script to run from scratch in a pristine environment, when this section is reached it will spit out the following error and die:
Don't see LAS tar file las-esg-v7.3.9.tar.gz
Downloading LAS from las-esg-v7.3.9.tar.gz -to-> /usr/local/src/esgf/workbench/esg/ferret/7.3.9/las-esg-v7.3.9.tar.gz
wget -O 'las-esg-v7.3.9.tar.gz' 'ftp://ftp.pmel.noaa.gov/pub/las/las-esg-v7.3.9.tar.gz'
/usr/local/bin/esg-product-server: line 428: /usr/bin/wget: Argument list too long
ERROR: Could not download LAS:las-esg-v7.3.9.tar.gz
Note that I even have a debug echo in there to prove that the arguments are only two small strings.
If I let the program error out at the point above and then immediately re-run it from the same expect script, with the only change being that it has already completed all the stages prior to this one and is detecting that and skipping them, this section will execute normally with no error. This behavior is 100% reproducible on my test box -- if I purge all traces that running the code leaves, the first run thereafter bombs out at this point, and subsequent runs will be fine.
The only thing I can think is that I've run into some obscure bug in Bash itself that is somehow causing it to leak MAX_ARG_PAGES memory invisibly, but I can't think of even any theoretical ways to make this happen, so I'm asking here.
What the heck is going on and how do I make it stop (without extreme measures like recompiling the kernel to just throw more memory at it)?
Update: To answer a question in the comments, line 428 is
wget -O "${las_dist_file}" "${las_tar_url}"
The error E2BIG refers to the sum of the bytes in the environment and the argv list. Has the script exported a huge number (or huge size) of variables? Run printenv right before the wget to see what's going on.
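A quick way to do that, as a sketch, is to dump the environment size right before the failing call; the awk one-liner simply lists each exported variable with its byte length so an oversized one stands out.
# Diagnostic sketch: E2BIG counts environment + argv bytes, so inspect the
# environment immediately before the wget that fails.
printenv | wc -c                                                  # total environment size in bytes
printenv | awk -F= '{ print length($0), $1 }' | sort -rn | head   # largest variables first
wget -O "${las_dist_file}" "${las_tar_url}"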