Notify via email if something goes wrong in a shell script - linux

fileexist=0
mv /data/Finished-HADOOP_EXPORT_&Date#.done /data/clv/daily/archieve-wip/
fileexist=1
# ...some other script below
Above is the shell script I have; inside a for loop I am moving some files. I want to notify myself via email if something goes wrong in the moving process. Since I am running this script on a Hadoop cluster, it is possible that the cluster goes down while it is running, etc. How can I build a better error-handling mechanism into this shell script? Any thoughts?

Well, at least you need to know what you expect to go wrong. Based on that, you can do this:
mv ..... 2> err.log
if [ $? -ne 0 ]
then
    cat ./err.log | mailx -s "Error report" admin@abc.com
    rm ./err.log
fi
Or, as William Pursell suggested, use:
trap 'rm -f err.log' 0; mv ... 2> err.log || < err.log mailx ...
mv returns a non-zero exit code on error, and $? holds that code. If the entire server goes down then unfortunately this script doesn't run either, so that case is better left to more advanced monitoring tools, such as Foglight running on a different monitoring server. For more basic checks, you can use the method above.
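Putting the two suggestions together, a minimal sketch might look like the following. The source and destination paths are placeholders for the ones in the question, admin@abc.com is the hypothetical address from the answer above, and mailx is assumed to be configured on the host:
#!/bin/bash
# Sketch only: src and dest stand in for the real paths from the question.
src="/path/to/source/file"
dest="/data/clv/daily/archieve-wip/"
err_log=$(mktemp)
trap 'rm -f "$err_log"' EXIT            # clean up the temp log on exit

if ! mv "$src" "$dest" 2> "$err_log"; then
    # mv failed: mail the captured stderr and exit non-zero
    mailx -s "Error report: mv failed on $(hostname)" admin@abc.com < "$err_log"
    exit 1
fi
Run from cron or the existing job, any mv failure then produces one mail containing exactly what mv printed to stderr.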

Related

What does ps actually return? (Different value depending on how it is called)

I have a script containing this snippet:
#!/bin/bash
set +e
if [ -O "myprog.pid" ]; then
    PID=`/bin/cat myprog.pid`
    if /bin/ps -p ${PID}; then
        echo "Already running" >> myprog.log
        exit 0
    else
        echo "Old pidfile found" >> myprog.log
    fi
else
    echo "No pidfile found" >> myprog.log
fi
echo $$ > myprog.pid
This file is called by a watchdog script, callmyprog, which looks like this:
#!/bin/bash
myprog &
It seems to be a problem with if /bin/ps -p ${PID}. The problem manifests itself in this way. If I manually call myprog when it is running I get the message "Already running" as it should. Same thing happens when I manually run the script callmyprog. But when the watchdog runs it, I instead get "Old pidfile found".
I have checked the output from ps and in all cases it finds the process. When I call myprog manually, either directly or through callmyprog, I get the return code 0, but when the watchdog calls it I get the return code 1. I have added debug printouts to the above snippets to print basically everything, but I really cannot see what the problem is. In all cases it looks something like this in the log when the ps command is run from the script:
$ ps -p 1
PID TTY TIME CMD
1 ? 01:06:36 systemd
The only difference is that the return value is different. I checked the exit code with code like this:
/bin/ps -p ${PID}
echo $? >> myprog.log
What could possibly be the cause here? Why does the return code vary depending on how I call the script? I tried to download the source code for ps but it was too complicated for me to understand.
I was able to "solve" the problem with an ugly hack: I piped ps -p $PID | wc -l and checked that the number of lines was at least 2. But that feels like a workaround, and I really want to understand what the problem is here.
Answer to comment below:
The original script contains absolute paths so it's not a directory problem. There is no alias for ps. which ps yields /bin/ps. The scripts are run as root, so I cannot see how it can be a permission problem.
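Since no answer is recorded here, a small debugging sketch may help narrow it down: it logs who started the script, the exact ps output, and the exact exit status, using the same myprog.pid and myprog.log names as the question, so the manual run and the watchdog run can be compared line by line.
#!/bin/bash
# Debugging sketch only: compare this log between a manual start and a
# watchdog start. ps -o comm= prints just the parent's command name.
PID=$(/bin/cat myprog.pid)
{
    echo "--- $(date) started by: $(ps -o comm= -p $PPID)"
    /bin/ps -p "${PID}"
    echo "ps exit status: $?"
} >> myprog.log 2>&1
An alternative to parsing ps output at all is kill -0 "${PID}", which succeeds if the process exists and the caller may signal it; that sidesteps what ps prints, though not why its exit status differs.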

Keep a script running through ssh after logout

This is the first question that I post here. I tried to do a thorough search, but if I haven't (and the answer is obvious somewhere else), please just let me know.
I have a script that runs a program for me, here it is:
csv_file=../data/teste_nohup.csv
trace_file=../data/gnp.trace
declare -i n=100
declare -i p=1
declare -i counter=0

while [ $counter -lt 3 ];
do
    n=100
    while true
    do
        nice -19 sage gnptest.py ${n} ${p} | tee -a ${csv_file}
        notify-send "finished test gnp ${n} ${p}"
    done
done
So, what I'm trying to do is run the gnptest.py program a few times, and have the result be written to the csv_file.
The problem is, that depending on the input, the program may take a long time to complete. So I'd like to connect to the server over ssh, start the program, close the terminal, and check the output file from time to time.
I've tried nohup and disown. nohup creates a huge nohup.out file, full of errors that I don't get while running the script normally (it complains about the -lt operator, for example). But the biggest problem I'm facing is that neither command (nohup or disown -h) runs the program and sends the output to the file specified in the csv_file variable, which the script does with the tee command. Also, neither of them seems to keep running after I log out...
Any help will be much appreciated.
Thanks in advance!!
I have just joined, so I cannot add a comment.
Please try using redirection instead of tee in the script.
And to get rid of nohup.out, use the following to run the script:
nohup script.sh > /dev/null 2>&1 &
If the above produces an error, use:
nohup script.sh > /dev/null 2>&1 </dev/null &
Hope this will help.
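Applied to the loop in the question, that suggestion might look like the sketch below. It keeps the same variable names, appends to the CSV with >> instead of tee, and drops notify-send, which needs a desktop notification session and will not do anything useful once you have logged out; gnp_errors.log is a hypothetical file name for the stderr output.
#!/bin/bash
# Sketch only: same csv_file and parameters as the question's script.
csv_file=../data/teste_nohup.csv
declare -i n=100
declare -i p=1

while true
do
    # append stdout to the CSV and stderr to a separate log; nothing is
    # written to the terminal, so losing the terminal does not matter
    nice -19 sage gnptest.py ${n} ${p} >> "${csv_file}" 2>> gnp_errors.log
done
Started as nohup ./script.sh > /dev/null 2>&1 &, it keeps running after logout, and nohup.out stays empty because nothing is sent to stdout.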

Grab output from a script that was run within another

I know the title is a bit confusing but here's my situation
#!/bin/bash
for node in `cat path/to/node.list`
do
    echo "Checking node: $node"
    ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no me@$node "nohup scriptXYZ.sh"
done
Basically scriptXYZ has an output, something like Node bla is Up or Node bla is Down. I want to do something that amounts to the following pseudocode:
if (output_from_scriptXYZ == "Node bla is Up")
do this
else
do that
I've been trying to find a way to do this online but I couldn't find something that does this. Then again, I might not know what I'm searching for.
Also, as a bonus: is there any way to tell if scriptXYZ had an error while it ran? Something like a "file does not exist" error, not from the script itself but from something the script tried to do.
First, is it possible to have scriptXYZ.sh exit with 0 if the node is up, and non-zero otherwise? Then you can simply do the following, instead of capturing the output. The script's standard output and standard error will be connected to your local terminal, so you will see them just as if you had run it locally.
#!/bin/bash
while read -r node; do
    echo "Checking node: $node"
    if ssh -o UserKnownHostsFile=/dev/null \
           -o StrictHostKeyChecking=no me@$node scriptXYZ.sh; then
        do something
    else
        do something else
    fi
done < path/to/node.list
It doesn't really make sense to run your script with nohup, since you need to stay connected to have the results sent back to the local host.
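If changing scriptXYZ.sh's exit status is not an option, the output itself can be captured and matched. A sketch, assuming the remote script prints a single status line containing "is Up"; errors.log is a hypothetical local file collecting whatever the remote script writes to stderr, which also addresses the bonus question about spotting errors.
#!/bin/bash
while read -r node; do
    echo "Checking node: $node"
    # capture the remote script's stdout; its stderr goes to errors.log
    output=$(ssh -o UserKnownHostsFile=/dev/null \
                 -o StrictHostKeyChecking=no me@$node scriptXYZ.sh 2>> errors.log)
    case "$output" in
        *"is Up"*) echo "node is up: do this"   ;;
        *)         echo "node is down: do that" ;;
    esac
done < path/to/node.list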

Redirecting Output of Bash Child Scripts

I have a basic script that outputs various status messages. e.g.
~$ ./myscript.sh
0 of 100
1 of 100
2 of 100
...
I wanted to wrap this in a parent script, in order to run a sequence of child-scripts and send an email upon overall completion, e.g. topscript.sh
#!/bin/bash
START=$(date +%s)
/usr/local/bin/myscript.sh
/usr/local/bin/otherscript.sh
/usr/local/bin/anotherscript.sh
RET=$?
END=$(date +%s)
echo -e "Subject:Task Complete\nBegan on $START and finished at $END and exited with status $RET.\n" | sendmail -v group@mydomain.com
I'm running this like:
~$ topscript.sh >/var/log/topscript.log 2>&1
However, when I run tail -f /var/log/topscript.log to inspect the log I see nothing, even though running top shows myscript.sh is currently being executed, and therefore, presumably outputting status messages.
Why isn't the stdout/stderr from the child scripts being captured in the parent's log? How do I fix this?
EDIT: I'm also running these on a remote machine, connected via ssh using pseudo-tty allocation, e.g. ssh -t user#host. Could the pseudo-tty be interfering?
I just tried the following: I have three files t1.sh, t2.sh, and t3.sh, all with the same content:
#!/bin/bash
for((i=0;i<10;i++)) ; do
    echo $i of 9
    sleep 1
done
And a script called myscript.sh with the following content:
#!/bin/bash
./t1.sh
./t2.sh
./t3.sh
echo "All Done"
When I run ./myscript.sh > topscript.log 2>&1 and then in another terminal run tail -f topscript.log I see the lines being output just fine in the log file.
Perhaps the things being run in your subscripts use a large output buffer? I know that when I've run Python scripts before, they buffer output quite heavily, so you don't see anything for a while. Do you actually see the entire output in the email that gets sent out at the end of topscript.sh? Is it just that you're not seeing the output while the processes run?
Try:
unbuffer topscript.sh >/var/log/topscript.log 2>&1
Note that unbuffer is not always available as a standard binary on older Unix platforms and may require finding and installing a package that provides it.
I hope this helps.
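If unbuffer is not installed, GNU coreutils ships stdbuf, which can force line-buffered stdout instead. It only affects dynamically linked programs that use C stdio (a Python program inside one of the child scripts, for example), so this is a sketch of an alternative rather than a guaranteed fix:
# run the parent with line-buffered stdout; child processes inherit the setting
stdbuf -oL topscript.sh > /var/log/topscript.log 2>&1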

How can I check a file exists and execute a command if not?

I have a daemon I have written using Python. When it is running, it has a PID file located at /tmp/filename.pid. If the daemon isn't running, the PID file doesn't exist.
On Linux, how can I check to ensure that the PID file exists and if not, execute a command to restart it?
The command would be
python daemon.py restart
which has to be executed from a specific directory.
[ -f /tmp/filename.pid ] || python daemon.py restart
-f checks that the given path exists and is a regular file (plain -e only checks that the path exists)
the [ ] performs the test and returns 0 on success, 1 otherwise
the || is a C-like or, so if the command on the left fails, the command on the right is executed.
So the final statement says, if /tmp/filename.pid does NOT exist then start the daemon.
test -f filename && daemon.py restart || echo "File doesn't exist"
If it is bash scripting you are wondering about, something like this would work:
if [ ! -f "$FILENAME" ]; then
    python daemon.py restart
fi
A better option may be to look into lockfile
The other answers are fine for detecting the existence of the file. However for a complete solution you probably should check that the PID in the pidfile is still running, and that it's your program.
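A minimal sketch of that fuller check, using the pidfile path from the question; the daemon directory is a hypothetical placeholder, and kill -0 merely tests whether the process exists and can be signalled.
#!/bin/bash
PIDFILE=/tmp/filename.pid
if [ ! -f "$PIDFILE" ] || ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    # no pidfile, or the recorded PID is no longer alive: restart the daemon
    cd /path/to/daemon/dir && python daemon.py restart
fi
Checking that the PID really belongs to the daemon (and not a recycled PID) would additionally require comparing the process's command name, for example with ps -o comm= -p.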
Another approach to solving the problem is a script that ensures that your daemon "stays" alive...
Something like this (note: signal handling should be added for proper startup/shutdown):
PIDFILE="/path/to/pidfile"
if [ -f "$PIDFILE" ]; then
    echo "Pid file exists!"
    exit 1
fi

while true; do
    # the server writes its own pid file
    python your-server.py
    # force removal of the pid file in case of unexpected death
    rm -f $PIDFILE
    # sleep for 2 seconds
    sleep 2
done
In this way, the server will be restarted even if it dies unexpectedly.
You can also use a ready solution like Monit.
ls /tmp/filename.pid
It returns true (exit status 0) if the file exists and false (non-zero) if it does not.
