We have a mail server that is dying and in the process of having accounts migrated to a new server before decommissioning. With 800+ email accounts across 25+ domains, it is important for this machine to stay up until migration is finished.
Lately it has started to fill up with error logs, which freezes MySQL for lack of disk space, stops mail flow, and generally gives me a headache. Until the root cause of the errors can be found and fixed, I have come up with a script that checks whether Dovecot and amavisd-new are running and restarts them if they are not.
After reading:
https://stackoverflow.com/a/7096003/4820993
As well as a few other common examples, I came up with this.
# Dovecot: check whether anything is listening on the IMAPS port (993).
if netstat -an | grep -q ':993.*LISTEN'
then
    echo 'Dovecot is up'
else
    echo 'Dovecot is down, restarting...'
    /etc/init.d/dovecot restart
    logger -p mail.info "dovecot_keepalive: Dovecot is down, restarting..."
fi

# amavisd: rely on the init script's status output.
# Filter out "not running" first, because it also contains the word "running".
if /etc/init.d/amavis status | grep -v 'not running' | grep -q 'running'
then
    echo 'AmavisD is up'
else
    echo 'AmavisD is down, restarting...'
    logger -p mail.info "amavis_keepalive: AmavisD is down, restarting..."
    /etc/init.d/amavis restart
    sleep 2
    if /etc/init.d/amavis status | grep -v 'not running' | grep -q 'running'
    then
        echo 'AmavisD restarted successfully'
        logger -p mail.info "amavis_keepalive: AmavisD restarted successfully"
    else
        echo 'AmavisD had a problem restarting, trying to fix it now...'
        logger -p mail.info "amavis_keepalive: AmavisD had a problem restarting..."
        # Kill any leftover amavisd processes; the [m] keeps grep from matching itself.
        pids=$(ps aux | grep 'a[m]avisd' | awk '{print $2}')
        [ -n "$pids" ] && kill $pids
        rm -f /var/run/amavis/amavisd.pid
        /etc/init.d/amavis start
    fi
fi
Who knows, I'm probably making it harder than it is, and if so PLEASE LET ME KNOW!!!
I checked it against http://www.shellcheck.net and updated/corrected it according to its reports. I am piecing this together from examples elsewhere and would love someone to proofread it before I implement it.
The first part, which checks Dovecot, is already working just fine as a cron job every 6 hours (yes, the server is so messed up that we need to check it). It's the section about amavis I'm not sure about.
You can use Monit, which will monitor your services and restart them when they fail.
Amavisd:
# File: /etc/monit.d/amavisd
# amavis
check process amavisd with pidfile /var/run/amavis/amavisd.pid
group services
start program = "/etc/init.d/amavis start"
stop program = "/etc/init.d/amavis stop"
if failed port 10024 then restart
if 5 restarts within 5 cycles then timeout
Dovecot:
# File: /etc/monit.d/dovecot
check process dovecot with pidfile /var/run/dovecot/master.pid
start program = "/etc/init.d/dovecot start"
stop program = "/etc/init.d/dovecot stop"
group mail
if failed host localhost port 993 type tcpssl sslauto protocol imap then restart
if failed host localhost port 143 protocol imap then restart
if 5 restarts within 5 cycles then timeout
depends dovecot_init
depends dovecot_bin
check file dovecot_init with path /etc/init.d/dovecot
group mail
check file dovecot_bin with path /usr/sbin/dovecot
group mail
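If Monit is not an option, the stale-pidfile cleanup in the question's script could be made a little safer. The following is only a sketch: the helper names pid_alive and remove_stale_pidfile are made up for illustration, and the pidfile path and init script are the ones from the question. It removes the pidfile only when no process with the recorded PID exists, and it should be run as root because kill -0 fails on processes owned by another user.

```shell
#!/bin/sh
# Sketch only: pid_alive and remove_stale_pidfile are hypothetical helper names.
PIDFILE="${PIDFILE:-/var/run/amavis/amavisd.pid}"   # path taken from the question

# Succeed if a process with the given PID exists (signal 0 only tests for it).
pid_alive() {
    [ -n "$1" ] && kill -0 "$1" 2>/dev/null
}

# Remove the pidfile if it exists but its process is gone.
# Returns 0 if a stale file was removed, non-zero otherwise.
remove_stale_pidfile() {
    file=$1
    [ -f "$file" ] || return 1
    pid=$(cat "$file")
    if pid_alive "$pid"; then
        return 1
    fi
    rm -f "$file"
}

# Only act when called with --run, so the functions can be tried out safely.
if [ "${1:-}" = "--run" ]; then
    if remove_stale_pidfile "$PIDFILE"; then
        /etc/init.d/amavis start
    fi
fi
```

Compared with parsing ps output, this never kills anything: it only clears a pidfile that points at a dead process and then starts the service.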
Dear Community Members,
I'm facing an issue where the Redis service behind my website makes the site return a 503 error at regular intervals. Previously it happened every 2-3 days, so I made a cron job to delete the Redis dump file and restart the service every night. But I'm still facing the issue: sometimes it happens once a week and sometimes twice in a day.
So I was wondering whether there is a shell script that can check for a 503 error on my website and restart the services. I already have a script that checks whether the httpd service is active and starts it if it has gone down.
#!/bin/sh
SERVICE="httpd"
LOGFILE="/opt/httpd/autostart-apache2.log"
# Start the service if systemd does not report it as active
# (this also covers the "failed" state, which does not print "inactive").
if ! systemctl is-active --quiet "$SERVICE"
then
    echo "starting apache at $(date)" >> "$LOGFILE"
    systemctl start "$SERVICE" >> "$LOGFILE" 2>&1
else
    echo "apache is running at $(date)"
fi
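A check for the 503 itself could follow the same pattern. This is a minimal sketch, not a tested fix: the URL and the service name redis are assumptions (the unit may be called redis-server on some systems), and fetch_status, is_503 and check_and_restart are illustrative names. It asks curl for the HTTP status code only and restarts the services when that code is 503.

```shell
#!/bin/sh
# Sketch only: URL and service names are assumptions; adjust them to your setup.
URL="${URL:-http://localhost/}"

# Print the HTTP status code for a URL (curl prints 000 if the request fails).
fetch_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Succeed only when the given status code is 503.
is_503() {
    [ "$1" = "503" ]
}

check_and_restart() {
    status=$(fetch_status "$URL")
    if is_503 "$status"; then
        echo "got 503 at $(date), restarting redis and httpd"
        systemctl restart redis httpd
    else
        echo "status $status at $(date), nothing to do"
    fi
}

# Only act when called with --run, for example from cron every few minutes.
if [ "${1:-}" = "--run" ]; then
    check_and_restart
fi
```

Restarting on a 503 only hides the symptom, so it is worth logging each restart to see how often it fires while the Redis problem is investigated.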
I am trying to keep 3 large directories (9G, 400G, 800G) in sync between our home site and another in a land far, far away across a network link that is a bit dodgy (slow and drops occasionally). Data was copied onto disks prior to installation so the rsync only needs to send updates.
The problem I'm having is the rsync hangs for hours on the client side.
The smaller 9G job completed, the 400G job has been in limbo for 15 hours - no output to the log file in that time, but has not timed out.
What I've done to set this up (after reading many forum articles about rsync, the rsync server, and --partial, since I am not really a system admin):
I set up an rsync server (/etc/rsyncd.conf) on our home system, entered it into xinetd, and wrote a script to run rsync on the distant server. The script loops if rsync fails, in an attempt to deal with the dodgy network. The rsync command in the script looks like this:
rsync -avzAXP --append root@homesys01::tools /disk1/tools
Note the "-P" option is equivalent to "--progress --partial"
I can see in the log file that rsync did fail at one point and the loop restarted rsync, data was transferred after that based on entries in the log file, but the last update to the log file was 15 hours ago, and the rsync process on the client is still running.
CNT=0
while true
do
    rsync -avzAXP --append root@homesys01::tools /disk1/tools
    STATUS=$?
    if [ "$STATUS" -eq 0 ] ; then
        echo "Successful completion of tools rsync."
        exit 0
    else
        CNT=$((CNT + 1))
        echo " Rsync of tools failure. Status returned: ${STATUS}"
        echo " Backing off and retrying(${CNT})..."
        sleep 180
    fi
done
I expected these jobs to take a long time, I expected to see the occasional failure message in the log files (which I have), and I expected to see rsync restart (which it has). I was not expecting rsync to just hang for 15 hours or more with no progress and no timeout error.
Is there a way to tell whether rsync on the client is hung or just dealing with the dodgy network?
I set no timeout in the /etc/rsyncd.conf file. Should I, and how do I determine a reasonable timeout setting?
I set rsync up to be available through xinetd, but don't always see the "rsync --daemon" process running. It restarts if I run rsync from the remote system. But shouldn't it be always running?
Any guidance or suggestions would be appreciated.
To tell whether the rsync client is still working, keep the verbose option and write its output to a log file.
Change this line
rsync -avzAXP --append root@homesys01::tools /disk1/tools
to
rsync -avzAXP --append root@homesys01::tools /disk1/tools >>/tmp/rsync.log.`date +%F`
This produces one log file per day under the /tmp directory.
You can then use tail -f to follow the most recent log file.
If it keeps scrolling, rsync is working.
See also
rsync - what means the f+++++++++ on rsync logs?
to understand more about the log.
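Watching the log by hand can also be turned into a simple check. This is a minimal sketch under a few assumptions: log_is_stale is a made-up helper name, the 30-minute threshold is arbitrary, and find's -mmin test is a GNU/BSD extension rather than POSIX. It reports a log file that has not been written to for longer than the threshold.

```shell
#!/bin/sh
# Sketch only: log_is_stale is a hypothetical helper; -mmin is a GNU/BSD find extension.

# Succeed if FILE exists and was last modified more than MINUTES minutes ago.
# A missing file is not reported as stale.
log_is_stale() {
    [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}

# Only act when called with --run, for example from cron.
if [ "${1:-}" = "--run" ]; then
    LOG="/tmp/rsync.log.$(date +%F)"   # same naming as the redirect above
    if log_is_stale "$LOG" 30; then
        echo "no rsync output for more than 30 minutes; it may be hung"
    fi
fi
```

A stale log does not prove that rsync is hung, since a single large file can take a long time on a slow link, so the threshold should be chosen with the link speed in mind.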
I thought I would post my final solution, in case it can help anyone else. I added --timeout 300 and --append-verify. The timeout eliminates the case of rsync getting hung indefinitely, the loop will restart it after the timeout. The append-verify is necessary to have it check any partial file it updated.
Note the following code is in a shell script and the output is redirected to a log file.
CNT=0
while true
do
    rsync -avzAXP --append-verify --timeout 300 root@homesys01::tools /disk1/tools
    STATUS=$?
    if [ "$STATUS" -eq 0 ] ; then
        echo "Successful completion of tools rsync."
        exit 0
    else
        CNT=$((CNT + 1))
        echo " Rsync of tools failure. Status returned: ${STATUS}"
        echo " Backing off and retrying(${CNT})..."
        sleep 180
    fi
done
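The loop above retries forever. If an upper bound is wanted, the same idea can be wrapped in a small function. This is only a sketch: retry is an illustrative name, and the attempt limit of 20 in the usage comment is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch only: retry is a hypothetical helper name.

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while true; do
        status=0
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts (last status $status)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed with status $status, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example use with the command from this thread:
#   retry 20 180 rsync -avzAXP --append-verify --timeout 300 root@homesys01::tools /disk1/tools
```

Returning the last exit status lets the calling script or cron mail show which rsync error finally stopped the transfer.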
I have a script in cron to check memcached and restart it if it's not working. For some reason it's not functioning.
Script, with permissions:
-rwxr-xr-x 1 root root 151 Aug 28 22:43 check_memcached.sh
Crontab entry:
*/5 * * * * /home/mysite/www/check_memcached.sh 1> /dev/null 2> /dev/null
Script contents:
#!/bin/sh
ps -eaf | grep 11211 | grep memcached
if [ $? -ne 0 ]; then
service memcached restart
else
echo "eq 0 - memcache running - do nothing"
fi
It works fine if I run it from the command line but last night memcached crashed and it was not restarted from cron. I can see cron is running it every 5 minutes.
What am I doing wrong?
Do I need to use the following instead of service memcached restart?
/etc/init.d/memcached restart
I have another script that checks to make sure my lighttpd instance is running and it works fine. It works a little differently to verify it's running but is using the init.d call to restart things.
Edit - Resolution: Using /etc/init.d/memcached restart solved this problem.
What usually causes crontab problems is command paths. On the command line your PATH is already set up, but under cron it is often much shorter. If this is your issue, you can solve it by adding the following line at the top of your crontab:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
This will give cron explicit paths to look through to find the commands your script runs.
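To see whether a command would be found under a reduced PATH, the lookup can be reproduced by hand. This is a minimal sketch: found_with_path is a made-up helper name, and /usr/bin:/bin is only a common default for cron, so check what your cron actually uses.

```shell
#!/bin/sh
# Sketch only: found_with_path is a hypothetical helper name.

# Succeed if COMMAND can be found when PATH is set to PATHLIST and nothing
# else is inherited from the current environment (roughly what a cron job sees).
found_with_path() {
    env -i PATH="$2" /bin/sh -c 'command -v "$1" >/dev/null 2>&1' sh "$1"
}

# Only act when called with --run.
if [ "${1:-}" = "--run" ]; then
    # /usr/bin:/bin is an assumed cron default; adjust it for your system.
    if found_with_path service /usr/bin:/bin; then
        echo "service is on the assumed cron PATH"
    else
        echo "service is NOT on the assumed cron PATH; use its full path or set PATH in the crontab"
    fi
fi
```

If the command is not found this way, that would explain a script that works from an interactive shell but not from cron.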
Also, #!/bin/sh is fine for the script as posted. If you later add Bash-only syntax, change the shebang to:
#!/bin/bash
I suspect the problem is with the grep 11211. 11211 is memcached's default port, but it only shows up in the ps output if it was passed on the command line, so that grep may not be matching the intended process.
I think you need to log what this script does; then you can see what is actually happening.
#!/bin/bash
exec >> /tmp/cronjob.log 2>&1
set -xv
cat2 () { tee -a /dev/stderr; }
ps -ef | cat2 | grep 11211 | grep memcached
if [ $? -ne 0 ]; then
service memcached restart
else
echo "eq 0 - memcache running - do nothing"
fi
exit 0
The set -xv output is captured to a log file in /tmp. The cat2 will copy the stdin to the log file, so you can see what grep is acting upon.
Save the code below as check_memcached.sh:
#!/bin/bash
MEMCACHED_STATUS=$(systemctl is-active memcached.service)
if [[ ${MEMCACHED_STATUS} == 'active' ]]; then
    echo " Service running.... so exiting "
    exit 0
else
    systemctl restart memcached.service
fi
Then schedule it with cron.
I am writing a shell script, and here is the snippet:
sudo service httpd restart   # (1)
if [ $? -eq 0 ]; then
    # Now configure php.ini using sed
    sudo sed -i 's_;date.timezone =_date.timezone = "Asia/Kolkata"_' /etc/php.ini
    # Now restart the httpd server again
    sudo service httpd restart   # this statement throws the error
fi
When I run the script above, I get the error "Address already in use: make_sock: could not bind to address" at the second sudo service httpd restart statement.
I suspect this is because the second sudo service httpd restart runs before the first one has finished completely.
So how can I make sure that the rest of the code executes only after the first sudo service httpd restart has finished?
I hope that is understandable.
Thanks
Drop the [ test and use the command itself as the condition:
if sudo service httpd restart; then
    # stuff
    if sudo service httpd restart; then
        echo ok
    else
        # error handling for second service failure
    fi
else
    # error handling for first failure
fi
Commands are true if their return status is zero, false otherwise. Shell commands also must terminate before the next line is called.
It is possible, in principle, that service does stuff in the background and exits before completion. In practice, it doesn't because that would really screw things up.
I'm not sure whether the command goes into the background. If it does, you could check whether it is still in the process list:
# ... your code ...
c=1
while [[ $c -gt 0 ]]
do
    # The [s] keeps grep from counting its own process.
    c=$(ps l | grep -c "[s]ervice httpd restart")
    sleep 1
done
service httpd restart
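Another way to approach the "Address already in use" error is to wait for the old server process to disappear before starting a new one. This is a sketch under stated assumptions, not a confirmed fix: wait_for_pid_exit is a made-up helper, the pidfile location is a guess that varies by distribution, and the 30-second limit is arbitrary. It must run as root, because kill -0 fails on processes owned by another user.

```shell
#!/bin/sh
# Sketch only: wait_for_pid_exit is a hypothetical helper name.

# Wait until process PID no longer exists, checking once per second.
# Returns 0 once it is gone, 1 if it is still there after TIMEOUT seconds.
wait_for_pid_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Only act when called with --run (as root).
if [ "${1:-}" = "--run" ]; then
    PIDFILE=/var/run/httpd/httpd.pid   # assumed location; varies by distribution
    old_pid=$(cat "$PIDFILE" 2>/dev/null)
    service httpd stop
    if [ -z "$old_pid" ] || wait_for_pid_exit "$old_pid" 30; then
        service httpd start
    else
        echo "httpd (pid $old_pid) did not exit within 30 seconds" >&2
    fi
fi
```

Unlike the polling loop above, this gives up after a fixed time instead of spinning forever if the old process never exits.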
Sometimes TeamViewer disconnects itself (or gets disconnected) from its main servers on the internet.
I am writing a script that checks whether the connection is lost and, if so, kills and reopens the relevant process to get TeamViewer up and running again.
The problem is that I don't know how to detect that TeamViewer has lost its remote access capability (that is, the capability to be remotely accessed and controlled).
Tried so far:
Checking the TeamViewer process and/or daemon. Not valid: they keep running even after the disconnection.
Reviewing the NICs. Not valid: TeamViewer does not seem to add any.
Looking at TeamViewer's main window. Not valid programmatically, or not easy to implement.
How can I programmatically know if TeamViewer has disconnected?
I don't know if the method differs between platforms, but I would at least like a solution for some Linux shell, Bash if possible.
I'm probably late, but I ran into the same problem and found a possible solution. I'm using TeamViewer 12.
I noticed that, in my case, some GUI-related processes are sometimes not launched, so the machine does not show as online in my computer and contact list. If I ssh into it and list the TeamViewer processes using:
ps -ef | grep '[t]eamviewer'
I get just one process, the TeamViewer daemon:
root 1808 1 0 09:22 ? 00:00:53 /opt/teamviewer/tv_bin/teamviewerd -d
But, when everything is fine I have:
root 1808 1 0 09:22 ? 00:00:53 /opt/teamviewer/tv_bin/teamviewerd -d
rocco 10975 8713 0 09:31 ? 00:00:58 /opt/teamviewer/tv_bin/wine/bin/wineserver
rocco 11064 10859 0 09:31 ? 00:00:33 /opt/teamviewer//tv_bin/TVGuiSlave.64 31 1
rocco 11065 10859 0 09:31 ? 00:00:28 /opt/teamviewer//tv_bin/TVGuiDelegate 31 1
So simply counting the number of processes works for me:
#!/bin/bash
online() {
    ## Test the internet connection
    ping -c1 www.google.com > /dev/null
}
if online
then
    # Count TeamViewer processes; the [t] keeps grep from matching itself.
    count=$(ps -ef | grep -c '[t]eamviewer')
    if [ "$count" -gt 3 ]
    then
        echo "Machine online, teamviewer connected"
    else
        echo "Machine online, teamviewer not connected, trying to restart the daemon"
        sudo teamviewer --daemon restart
    fi
fi
Have you considered trapping the signal (if possible) and executing a function that restarts TeamViewer?
Start it from a script and trap the exit signal:
function restartTV {
    # restart TeamViewer
    sudo /etc/init.d/something start
}
trap restartTV EXIT # or the appropriate signal
sudo /etc/init.d/something stop
# Do the work...
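The trap idea can be tried out in isolation before wiring in real commands. This is a minimal sketch with made-up names (restart_tv, run_supervised), and the restart command is left as a comment because the right one depends on the installation. The command runs in a subshell whose EXIT trap calls the restart function, so the restart happens whether the command exits normally or with an error.

```shell
#!/bin/sh
# Sketch only: restart_tv and run_supervised are hypothetical names.

restart_tv() {
    echo "restarting TeamViewer"
    # Put the real restart command here, for example the one used earlier
    # in this thread: sudo teamviewer --daemon restart
}

# Run a command in a subshell whose EXIT trap calls restart_tv.
run_supervised() {
    (
        trap restart_tv EXIT
        "$@"
        exit "$?"   # pass the command's exit status through
    )
}

# Example: run_supervised some_long_running_command
```

Note that this only helps when the monitored program actually exits; as the question points out, the TeamViewer processes may keep running after the connection is lost.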