My cron job doesn't work as expected, why? - linux

I created a shell script to check the status of a Tomcat instance. If the instance is not started, start it:
if [ `ps -ef | grep 'travelco' | grep -v grep | wc -l` -eq 0 ]; then
    sudo /home/q/tools/bin/restart_tomcat.sh /home/www/travelco/
else
    echo 'travelco started'
fi
I tested the script and it worked well. But after I added it as a cron job, the script didn't work as expected.
I used crontab -e, and added
*/1 * * * * /home/yuliang.jin/travelcoCheck.sh
After that, even though I can see the script being executed in the cron log (sudo tail -f /var/log/cron), the Tomcat instance was not started. Why?

There's a sudo in your script, but are you sure your current user has permission to execute /home/q/tools/bin/restart_tomcat.sh without password authentication?
You should add a rule to /etc/sudoers that lets your current user run the script without a password, or you can simply use sudo crontab -e to run the job as root (and don't forget to remove the sudo from your script if you do so).
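For example, a rule like the following might do it (a sketch; edit /etc/sudoers with visudo, and note that the username yuliang.jin is only an assumption taken from the crontab path in the question):
yuliang.jin ALL=(root) NOPASSWD: /home/q/tools/bin/restart_tomcat.sh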

If you have any other option, don't use sudo in a cron job.
Your own script, travelcoCheck.sh, is matched by the grep 'travelco' and is not removed by the grep -v grep, so wc -l will always be at least 1. As a result, restart_tomcat.sh will never run.
(As a side note: whether or not your own ps-parsing pipeline gets caught in the ps output is something of a dark art, full of corner cases and race conditions, and generally difficult to get right. Stuff like this is why D-Bus was invented.)
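If you keep the ps pipeline, one minimal fix (a sketch, assuming the check script keeps its current name) is to filter out the check script itself as well as the grep:
if [ `ps -ef | grep 'travelco' | grep -v grep | grep -v travelcoCheck | wc -l` -eq 0 ]; then
    sudo /home/q/tools/bin/restart_tomcat.sh /home/www/travelco/
fi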

Related

Cron script to restart memcached not working

I have a script in cron to check memcached and restart it if it's not working. For some reason it's not functioning.
Script, with permissions:
-rwxr-xr-x 1 root root 151 Aug 28 22:43 check_memcached.sh
Crontab entry:
*/5 * * * * /home/mysite/www/check_memcached.sh 1> /dev/null 2> /dev/null
Script contents:
#!/bin/sh
ps -eaf | grep 11211 | grep memcached
if [ $? -ne 0 ]; then
    service memcached restart
else
    echo "eq 0 - memcache running - do nothing"
fi
It works fine if I run it from the command line but last night memcached crashed and it was not restarted from cron. I can see cron is running it every 5 minutes.
What am I doing wrong?
Do I need to use the following instead of service memcached restart?
/etc/init.d/memcached restart
I have another script that checks to make sure my lighttpd instance is running, and it works fine. It verifies that the instance is running a little differently, but it uses the init.d call to restart things.
Edit - Resolution: Using /etc/init.d/memcached restart solved this problem.
What usually causes crontab problems is command paths. On the command line, the paths to commands are already set up, but under cron they often are not. If this is your issue, you can solve it by adding the following line at the top of your crontab:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
This will give cron explicit paths to look through to find the commands your script runs.
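Put together, the crontab from the question would then start like this (a sketch reusing the entry from the question):
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
*/5 * * * * /home/mysite/www/check_memcached.sh 1> /dev/null 2> /dev/null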
Also, the shebang in your script is wrong. It needs to be:
#!/bin/bash
I suspect the problem is with the grep 11211: it's not clear what the number is meant to match, and that grep may not be matching the desired process.
I think you need to log the actions of this script; then you can see what's actually happening.
#!/bin/bash
exec >> /tmp/cronjob.log 2>&1
set -xv
cat2 () { tee -a /dev/stderr; }
ps -ef | cat2 | grep 11211 | grep memcached
if [ $? -ne 0 ]; then
    service memcached restart
else
    echo "eq 0 - memcache running - do nothing"
fi
exit 0
The set -xv output is captured in a log file under /tmp. The cat2 function copies its stdin into the log, so you can see exactly what grep is acting upon.
Save the code below as check_memcached.sh:
#!/bin/bash
MEMCACHED_STATUS=`systemctl is-active memcached.service`
if [[ ${MEMCACHED_STATUS} == 'active' ]]; then
    echo "Service running... so exiting"
    exit 1
else
    service memcached restart
fi
Then you can schedule it as a cron job.
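For example, reusing the five-minute schedule from the question (the path is wherever you saved the script):
*/5 * * * * /home/mysite/www/check_memcached.sh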

pkill with -f flag in crontab not running command after semicolon

I wanted to kill a process and remove a flag file indicating that the process is running. The cron entry:
00 22 * * 1-5 pkill -f script.sh >log 2>&1 ; rm lock >log 2>&1
This works perfectly when I run it in a terminal, but under cron the rm is not run. All I can think of is that the whole line after the -f flag is being taken as arguments to pkill.
Any reason why this is happening?
Keeping them as separate cron entries works. Also, pkill without the -f flag runs (though it doesn't kill the process, since I need the pattern to be matched against the whole command line).
Ran into this problem today and just wanted to post a working example for those who run into this:
pkill -f ^'python3 /Scripts/script.py' > /dev/null 2>&1 ; python3 /Scripts/script.py > /tmp/script.log 2>&1
This runs pkill and searches the full command line (-f) for one that starts with (the regex anchor ^) python3 /Scripts/script.py. As such, it will never kill itself, because its own command line does not start with that command (it starts with pkill).
The short answer: it simply killed itself!
My answer, explained:
If you let a command be started by crond, it will be executed in a subshell. Most probably the line you'll find in ps or htop will look like this:
/bin/sh -c pkill -f script.sh >log 2>&1 ; rm lock >log 2>&1
(details may vary; e.g. you might have bash instead of sh)
The point is that the whole line gets one PID (process id) and is one of the command lines that pgrep/pkill parse when the -f parameter is used. As stated in the man page:
-f, --full
The pattern is normally only matched against the process name. When -f is set, the full command line is used.
Now your pkill is looking for any command line in your running process list that somehow contains the expression 'script.sh', and eventually it will find that line. As a result, it takes that PID and terminates it. Unfortunately, the very same PID holds the rest of your command chain, which has just been killed by itself.
So you basically wrote a 'suicide line of commands' ;)
BTW: I did the same thing today, and that's how I found your question.
Hope this answer helps, even if it comes a little late.
Kind regards
3.141592's and nanananananananananananaBATMAN's answers are correct.
I worked around this problem like this.
00 22 * * 1-5 pkill -f script.[s][h] >log 2>&1 ; rm lock >log 2>&1
This works because the string script.[s][h] (as it appears in the cron command line) is not matched by the regex script.[s][h] (which matches script.sh).
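You can see the asymmetry in a shell (a quick sketch you can paste in):
echo 'pkill -f script.[s][h]' | grep 'script.[s][h]'    # no output: the pattern does not match its own text
echo 'bash script.sh' | grep 'script.[s][h]'            # prints the line: the real script.sh matches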

Linux bash script that kills a process (not started by me) after x amount of time

I'm pretty inexperienced with Linux bash. That being said, I have a CentOS 7 machine that runs a COTS application server. This application server runs other processes that sometimes hang. Since I have no control over how these processes are started, I'm looking for a script that runs every 2 minutes and kills any process named "spicer" that has been running for longer than 10 minutes. I've looked around and have only been able to find answers for processes that are run and owned by me.
I use the command ps -eo pid,command,etime | grep spicer to get all the spicer processes. The output of this command looks like:
18216 spicer -l/opt/otmm-10.5/Spi 14:20
18415 spicer -l/opt/otmm-10.5/Spi 11:49
etc...
18588 grep --color=auto spicer
I don't know if there's a way to parse this directly in bash. I'm also not well-versed at all in other Linux tools. I know that awk (or gawk) could possibly help.
EDIT
I have no control over the data that the process is working on.
What about wrapping the spicer executable and starting it via the timeout command? Let's say it is installed as /usr/bin/spicer. Then issue:
cp /usr/bin/spicer{,.orig}
echo '#!/bin/bash' > /usr/bin/spicer
echo 'timeout 10m spicer.orig "$@"' >> /usr/bin/spicer
(Redirecting into the existing file with > truncates it in place, so /usr/bin/spicer keeps its executable permissions.)
Another approach would be to create a cron job definition in /etc/cron.d/kill_spicer, like this:
* * * * * root kill $(ps --no-headers -C spicer -o pid,etimes | awk '$2>=600{print $1}')
The cron job runs every minute and uses ps to obtain a list of spicer processes that have been running longer than 10 minutes (600 seconds; etimes reports elapsed time in seconds), then passes their PIDs to kill.
You probably even want kill -9 if the process is hanging.
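If you would rather keep the two-minute schedule from the question, the same check works as a standalone script (a sketch; the script name and location are hypothetical):
#!/bin/bash
# Kill every "spicer" process that has been running for 10 minutes or more.
# etimes reports elapsed time in seconds, so 600 means 10 minutes.
for pid in $(ps --no-headers -C spicer -o pid,etimes | awk '$2 >= 600 {print $1}'); do
    kill -9 "$pid"
done
Saved as, say, /usr/local/bin/kill_spicer.sh, it could then be scheduled with */2 * * * * /usr/local/bin/kill_spicer.sh.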
You can use the -C option of ps to select processes by name.
ps --no-headers -C spicer -o pid,etime
Then you can use cut to filter the results, if the spacing is consistent. On my system the pid field takes up 8 characters, so I'd use
kill $(ps --no-headers -C spicer -o pid,etime | cut -c-8)
If the spacing is inconsistent (but if so, what kind of messed-up ps are you using? :-P), you can use awk '{ print $1 }' instead of cut.
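For what it's worth, you can also sidestep the column widths entirely by asking ps for just the pid column; a trailing = suppresses the header as well (a sketch):
kill $(ps -C spicer -o pid=)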

Bash script commands not running

I have seafile (http://www.seafile.com/en/home/) running on my NAS, and I set up a crontab entry that runs a script every few minutes to check whether the seafile server is up and, if it is not, start it.
The script looks like this:
#!/bin/bash
# exit if process is running
if ps aux | grep "[s]eafile" > /dev/null
then
    exit
else
    # restart process
    /home/simon/seafile/seafile-server-latest/seafile.sh start
    /home/simon/seafile/seafile-server-latest/seahub.sh start-fastcgi
fi
Running /home/simon/seafile/seafile-server-latest/seafile.sh start and /home/simon/seafile/seafile-server-latest/seahub.sh start-fastcgi individually/manually works without a problem, but when I try to run this script file manually, neither of those lines executes and seafile/seahub do not start.
Is there an error in my script that is preventing the execution of those two lines? I've made sure to chmod the script file to 755.
The problem is likely that when you pipe commands into one another, there is no guarantee that the second command won't start before the first (it can start, but do nothing while it waits for input). For example:
oj#ironhide:~$ ps -ef | grep foo
oj 8227 8207 0 13:54 pts/1 00:00:00 grep foo
There is no process containing the word "foo" running on my machine, but the grep that I'm piping ps to appears in the process list that ps produces.
You could try using pgrep instead, which is pretty much designed for this sort of thing:
if pgrep "[s]eafile"
Or you could add another pipe to filter out results that include grep:
ps aux | grep "[s]eafile" | grep -v grep
If the name of this script matches the regex [s]eafile, it will trivially always take the exit branch.
You should probably be using pidof in preference to reinventing the yak shed anyway.
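A minimal sketch with pidof (assuming the daemon's process name is literally seafile; the start commands are the ones from the question):
if ! pidof seafile > /dev/null; then
    /home/simon/seafile/seafile-server-latest/seafile.sh start
    /home/simon/seafile/seafile-server-latest/seahub.sh start-fastcgi
fi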
Turns out the script itself was working OK, although the change to pgrep is much nicer. The problem was actually in the crontab entry (I didn't include the sh in the command).

Script produces different result when executed by Bash than by cron

Please consider following crontab (root):
SHELL=/bin/bash
...
...
0 */3 * * * /var/maintenance/raid.sh
And the bash script /var/maintenance/raid.sh:
#!/bin/bash
echo -n "Checking /dev/md0... "
if ! [ $(mdadm --detail /dev/md0 | grep -c "active sync") -eq 2 ]; then
    mdadm --detail /dev/md0 | mail -s "Raid problem /dev/md0" "my@email.com"
    echo "ERROR"
else
    echo "ALL OK"
fi
#-------------------------------------------------------
echo -n "Checking /dev/md1... "
...
And this is what happens when it is...
...executed from shell prompt (bash):
Mail with mdadm --detail /dev/md0 output is sent to my email (proper behaviour)
...executed by cron:
Blank mail is sent to my email (subject is there, but there is no message)
Why the difference, and how can I fix it?
As indicated in the comments, do use full paths in crontab scripts, because cron runs with a different environment (different variables, notably PATH) than the normal user (root in this case).
In your case, use /sbin/mdadm instead of mdadm.
How do you get the full path of a command? Use command -v:
$ command -v rm
/bin/rm
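Applied to the script in the question, the check would then look something like this (a sketch using the /sbin/mdadm full path):
if ! [ $(/sbin/mdadm --detail /dev/md0 | grep -c "active sync") -eq 2 ]; then
    /sbin/mdadm --detail /dev/md0 | mail -s "Raid problem /dev/md0" "my@email.com"
    echo "ERROR"
else
    echo "ALL OK"
fi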
cron tasks run in a shell that is started without your login scripts being run, and it is those scripts that set up paths, environment variables, etc.
When building cron tasks, prefer absolute paths and explicit options.
Before running your script as a cron job, you can test it with an empty environment using env -i:
env -i /var/maintenance/raid.sh
