Dear Community Members,
I'm facing an issue where my website's Redis service causes the site to return 503 errors at regular intervals. Previously this happened every 2-3 days, so I set up a cron job that deletes the Redis dump file and restarts the service every night. But I'm still seeing the issue: sometimes it comes after a week, and sometimes twice in one day.
So I was wondering whether there is a shell script that can check my website for a 503 error and restart the services. I already have a script that checks whether the httpd service is active and restarts it if it goes down:
#!/bin/sh
SERVICE="httpd"
RESTART="systemctl start $SERVICE"
LOGFILE="/opt/httpd/autostart-apache2.log"
# "systemctl is-active --quiet" exits non-zero unless the unit is active,
# so this also catches the "failed" state, which grepping for "inactive" misses
if ! systemctl is-active --quiet "$SERVICE"
then
    echo "starting apache at $(date)" >> "$LOGFILE"
    $RESTART >> "$LOGFILE" 2>&1
else
    echo "apache is running at $(date)"
fi
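A minimal sketch of the 503 check, assuming the site URL is passed as the first argument and that the services are systemd units named redis and httpd (the Redis unit is often called redis-server instead, so adjust it to your system). The log path /var/log/site-watchdog.log is hypothetical. The sketch also treats 000, which curl reports when no HTTP response arrived at all, as a reason to restart; drop that case if you only want to react to 503.

```shell
#!/bin/sh
# Sketch: restart Redis and httpd when the site returns 503.
# Assumptions: unit names "redis" and "httpd" and LOGFILE are placeholders.
LOGFILE="/var/log/site-watchdog.log"

# Print the HTTP status code for a URL; curl prints 000 if nothing answered.
http_status() {
    curl --silent --output /dev/null --max-time 20 --write-out '%{http_code}' "$1"
}

# Succeed when the status code means the services should be restarted.
should_restart() {
    case "$1" in
        503|000) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the URL once and restart both services if needed.
check_and_restart() {
    code=$(http_status "$1")
    if should_restart "$code"; then
        echo "$(date): got $code from $1, restarting redis and httpd" >> "$LOGFILE"
        systemctl restart redis >> "$LOGFILE" 2>&1
        systemctl restart httpd >> "$LOGFILE" 2>&1
    fi
}

# Only act when a URL is passed, e.g. site-watchdog.sh https://example.com/
if [ "$#" -gt 0 ]; then
    check_and_restart "$1"
fi
```

From cron it could run every few minutes, e.g. `*/5 * * * * /usr/local/bin/site-watchdog.sh https://example.com/` (the script path and URL are placeholders).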
Related
I have a Jenkins pipeline job that goes through all our Jenkins servers and checks connectivity (it runs every few minutes).
ksh file:
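#!/bin/ksh
JENKINS_URL=$1
curl --silent --connect-timeout 10 "$JENKINS_URL" >/dev/null
status=$?
# curl exit codes: 7 = failed to connect, 28 = operation timed out
if [ "$status" -eq 7 ] || [ "$status" -eq 28 ]; then
    SUBJECT="Connection refused or can not connect to URL $JENKINS_URL"
    # sendmail expects a "Subject:" header, then a blank line, then the body
    printf 'Subject: %s\n\n%s\n' "$SUBJECT" "$SUBJECT" | /usr/sbin/sendmail XXXX@gmail.com
else
    echo "successfully connected $JENKINS_URL"
fi
exit 0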
I would like to add another piece of code that records every time a server goes down (including the server name and a timestamp) in a file and, when the server is up again, sends an email notifying about it and records that in the file as well.
I don't want extra alerts: only one alert (to the file and by mail) when it goes down, and one when it's up again. Any idea how to implement it?
A detailed answer was given by the Unix & Linux Stack Exchange community:
https://unix.stackexchange.com/questions/562594/how-to-set-and-record-alerts-for-jenkin-server-down-and-up
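A minimal sketch of the one-alert-per-transition idea (not the code from the linked answer): keep the last known state of each server in a small file, and only log and email when the new state differs from it. STATE_DIR, EVENT_LOG and ALERT_TO are hypothetical names with placeholder defaults, and a server with no recorded history is assumed to have been up, so the first run only alerts if the server is already down.

```shell
#!/bin/sh
# Sketch: one alert when a server goes down, one when it comes back.
# Assumptions: STATE_DIR, EVENT_LOG and ALERT_TO are hypothetical placeholders.
STATE_DIR=${STATE_DIR:-/var/tmp/jenkins-monitor}
EVENT_LOG=${EVENT_LOG:-$STATE_DIR/events.log}
ALERT_TO=${ALERT_TO:-XXXX@gmail.com}

# record_state SERVER STATE   (STATE is "up" or "down")
# Prints "changed" and appends "<timestamp> <server> <state>" to EVENT_LOG
# only when STATE differs from the previous run; no history counts as "up".
record_state() {
    server=$1
    new=$2
    mkdir -p "$STATE_DIR"
    # Turn the URL into a safe file name (slashes, colons -> underscores).
    state_file="$STATE_DIR/$(printf '%s' "$server" | tr -c 'A-Za-z0-9._-' '_').state"
    old=$(cat "$state_file" 2>/dev/null || echo up)
    if [ "$old" != "$new" ]; then
        echo "$new" > "$state_file"
        echo "$(date '+%Y-%m-%d %H:%M:%S') $server $new" >> "$EVENT_LOG"
        echo changed
    fi
}

# check_server URL: test connectivity and mail only on a state change.
check_server() {
    url=$1
    if curl --silent --connect-timeout 10 --output /dev/null "$url"; then
        state=up
    else
        state=down
    fi
    if [ "$(record_state "$url" "$state")" = changed ]; then
        printf 'Subject: Jenkins %s is %s\n\n%s is %s as of %s\n' \
            "$url" "$state" "$url" "$state" "$(date)" | /usr/sbin/sendmail "$ALERT_TO"
    fi
}

if [ "$#" -gt 0 ]; then
    check_server "$1"
fi
```

Like the original ksh script, this only tests connectivity: without `--fail`, curl still exits 0 when the server answers with an HTTP error code.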
I am trying to keep 3 large directories (9G, 400G, 800G) in sync between our home site and another in a land far, far away across a network link that is a bit dodgy (slow and drops occasionally). Data was copied onto disks prior to installation so the rsync only needs to send updates.
The problem I'm having is the rsync hangs for hours on the client side.
The smaller 9G job completed; the 400G job has been in limbo for 15 hours, with no output to the log file in that time, but it has not timed out.
Here is what I've done to set this up (after reading many forum articles about rsync, the rsync server, and --partial, since I am not really a system admin):
I set up an rsync server (/etc/rsyncd.conf) on our home system, added it to xinetd, and wrote a script to run rsync on the distant server. The script loops if rsync fails, in an attempt to deal with the dodgy network. The rsync command in the script looks like this:
rsync -avzAXP --append root@homesys01::tools /disk1/tools
Note the "-P" option is equivalent to "--progress --partial"
I can see in the log file that rsync failed at one point and the loop restarted it, and entries in the log show that data was transferred after that. But the last update to the log file was 15 hours ago, and the rsync process on the client is still running.
CNT=0
while true
do
    rsync -avzAXP --append root@homesys01::tools /disk1/tools
    STATUS=$?
    if [ "$STATUS" -eq 0 ]; then
        echo "Successful completion of tools rsync."
        exit 0
    else
        CNT=$((CNT + 1))
        echo " Rsync of tools failure. Status returned: ${STATUS}"
        echo " Backing off and retrying(${CNT})..."
        sleep 180
    fi
done
I expected these jobs to take a long time, I expected to see the occasional failure message in the log files (which I have), and I expected to see rsync restart (which it has). I was not expecting rsync to just hang for 15 hours or more with no progress and no timeout error.
Is there a way to tell if rsync on the client is hung versus dealing with the dodgy network?
I set no timeout in the /etc/rsyncd.conf file. Should I, and how do I determine a reasonable timeout setting?
I set rsync up to be available through xinetd, but I don't always see the "rsync --daemon" process running. It starts again when I run rsync from the remote system. But shouldn't it always be running?
Any guidance or suggestions would be appreciated.
To tell whether the rsync client is still working, use the verbose option and keep a log file.
Change this line:
rsync -avzAXP --append root@homesys01::tools /disk1/tools
to:
rsync -avzAXP --append root@homesys01::tools /disk1/tools >>/tmp/rsync.log.$(date +%F) 2>&1
This produces one log file per day under the /tmp directory (the 2>&1 also captures rsync's error messages).
Then you can use tail -f to follow the most recent log file: if it keeps scrolling, rsync is working.
See also "rsync - what means the f+++++++++ on rsync logs?" to understand more about the log.
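Watching tail -f works interactively; for an unattended check, a sketch like the one below reports whether the log has gone quiet. The 30-minute threshold and the log name are assumptions, and find's -mmin is a GNU/BSD extension rather than strict POSIX. A quiet log is only a hint, not proof of a hang: with -v, rsync prints nothing while it works through files that have not changed.

```shell
#!/bin/sh
# Sketch: report whether a log file has gone quiet.
# Assumptions: threshold and log name are examples; find -mmin is GNU/BSD.

# log_is_stale FILE MINUTES: succeed if FILE is missing or was last
# modified more than MINUTES minutes ago.
log_is_stale() {
    if [ ! -e "$1" ]; then
        return 0
    fi
    # find prints the path only if its mtime is older than MINUTES.
    [ -n "$(find "$1" -prune -mmin +"$2")" ]
}

# Example (commented out so the sketch has no side effects):
# LOG=/tmp/rsync.log.$(date +%F)
# if log_is_stale "$LOG" 30; then
#     echo "no rsync output in $LOG for 30 minutes; it may be hung"
# fi
```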
I thought I would post my final solution, in case it can help anyone else. I added --timeout 300 and --append-verify. The timeout (an I/O timeout in seconds) prevents rsync from hanging indefinitely; once it expires, rsync exits with an error and the loop restarts it. --append-verify is necessary so that rsync verifies any partial file it appended to.
Note the following code is in a shell script and the output is redirected to a log file.
CNT=0
while true
do
    rsync -avzAXP --append-verify --timeout 300 root@homesys01::tools /disk1/tools
    STATUS=$?
    if [ "$STATUS" -eq 0 ]; then
        echo "Successful completion of tools rsync."
        exit 0
    else
        CNT=$((CNT + 1))
        echo " Rsync of tools failure. Status returned: ${STATUS}"
        echo " Backing off and retrying(${CNT})..."
        sleep 180
    fi
done
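The loop above retries forever. If you would rather give up eventually, so that a wrapper or monitoring job can notice the failure, here is a sketch of a capped variant. retry, MAX_TRIES and RETRY_DELAY are hypothetical names and the defaults are arbitrary; when the last attempt fails, rsync's own exit status is passed through.

```shell
#!/bin/sh
# Sketch: the same back-off loop with a cap on the number of attempts.
# Assumptions: retry, MAX_TRIES and RETRY_DELAY are hypothetical names.

# retry MAX DELAY CMD [ARGS...]: run CMD until it succeeds or MAX attempts
# have failed; returns 0 on success, else the last exit status of CMD.
retry() {
    max=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@"
        status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        echo " Attempt ${n}/${max} failed with status ${status}" >&2
        if [ "$n" -ge "$max" ]; then
            return "$status"
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage (commented out so the sketch has no side effects):
# retry "${MAX_TRIES:-50}" "${RETRY_DELAY:-180}" \
#     rsync -avzAXP --append-verify --timeout 300 root@homesys01::tools /disk1/tools
```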
I have a Node.js server running on PM2 that crashes every once in a while because of a database limit, which I'm working on.
In the meantime, I thought I'd set up a cron job in cPanel to restart the server every hour if it's down.
So I wrote a bash script like the following:
#!/bin/bash
status_code=$(curl --write-out %{http_code} --silent --output /dev/null https://website.com/)
date >> cronlog.txt
if [[ "$status_code" -ne 200 ]] ; then
pkill node
nohup pm2 start bin/www &
echo "Site status $status_code" >> cronlog.txt
echo "Restarting Server" >> cronlog.txt
exit
else
echo "Site fine" >> cronlog.txt
exit 0
fi
Running this from an SSH terminal works perfectly; if the site is down, it'll restart it.
However, once I set up the cron job in cPanel (0 * * * * /home/acc123/fix.sh), the output in cronlog.txt shows that the script is definitely running every hour and trying to restart the server; it's just that the server doesn't restart.
A preliminary Google search suggested that pm2 might not be on the PATH that the cron job runs with, so I modified the script to look like this:
#!/bin/bash
status_code=$(curl --write-out %{http_code} --silent --output /dev/null https://website.com/)
date >> cronlog.txt
if [[ "$status_code" -ne 200 ]] ; then
pkill node
nohup /home/acc123/bin/pm2 start /home/acc123/bin/www &
echo "Site status $status_code" >> cronlog.txt
echo "Restarting Server" >> cronlog.txt
exit
else
echo "Site fine" >> cronlog.txt
exit 0
fi
But nothing changes. Looking at the text file I write to, the script is definitely running every hour, and it's definitely picking up that the site is down, but while the words "Restarting Server" get written to the text file, the server doesn't actually start.
Checking nohup.out confirms that nothing has been written to it, suggesting that somehow the command nohup /home/acc123/bin/pm2 start /home/acc123/bin/www & isn't running correctly.
I'm stumped. Has anyone seen something similar before?
Found it. It looks like node itself also wasn't on the PATH for the cron job. Explicitly specifying where node is fixed the problem.
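cron usually runs jobs with a minimal PATH (often just /usr/bin:/bin), so pm2, which needs to find node as well, can fail silently. A sketch of a guard that sets PATH once at the top of the script and fails loudly if a tool is still missing; /home/acc123/bin as the location of node and pm2 is an assumption (run command -v node in an interactive SSH session to find the real path), and require_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: give a cron script an explicit PATH and check its tools up front.
# Assumption: node and pm2 live in /home/acc123/bin (hypothetical).
export PATH="/home/acc123/bin:/usr/local/bin:/usr/bin:/bin:$PATH"

# require_cmds CMD...: print each command not found on PATH and
# return 1 if any is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing on PATH: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example near the top of fix.sh (commented out: no side effects here):
# require_cmds curl node pm2 >> cronlog.txt || exit 1
```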
I have an issue where I need to compare the date in a log file with the system date. I am trying to create a script that will send an email when the server restarts. However, when the server restarts it creates a new log. The script I have for emailing so far is as follows:
PATTERN="Server failed so attempting to restart"
# $(...) stores grep's output; quoting "$LOG1" keeps its line breaks
LOG1=$(grep -B 15 "$PATTERN" /home/testing/Server.out)
echo "$LOG1" > attachment.txt
grep -q "$PATTERN" /home/testing/Server.out &&
mailx -s "Alert - Server has shutdown and is attempting a restart" email@domain.com < attachment.txt
LOG2=$(grep -B 15 "$PATTERN" /home/testing/Server.log)
echo "$LOG2" > attachment2.txt
grep -q "$PATTERN" /home/testing/Server.log &&
mailx -s "Alert - Server has shutdown and is attempting a restart" email@domain.com < attachment2.txt
This works and sends the email. However, if I run it as a cron job, it will send an email every single time until the log file is deleted. I need a way to say: only run the script if "Server failed so attempting to restart" was logged in the last 15 minutes.
Could anyone advise on a way to do this?
Thanks for your help
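Rather than comparing timestamps (which depends on the log's date format), one sketch is to remember how many lines of the log were already checked and only grep the new ones, so each restart message is mailed once. The state-file path and the helper name new_lines are hypothetical, and detecting a replaced log is a heuristic: it only notices a new log that is shorter than the old one was at the previous run.

```shell
#!/bin/sh
# Sketch: only look at log lines written since the previous cron run.
# Assumptions: new_lines and the state-file path are hypothetical.

# new_lines FILE STATE_FILE: print lines appended to FILE since the last
# call and remember how many lines have been seen.
new_lines() {
    file=$1
    state=$2
    [ -f "$file" ] || return 0
    total=$(wc -l < "$file" | tr -d ' ')
    seen=$(cat "$state" 2>/dev/null || echo 0)
    # Fewer lines than last time: the log was replaced, start from the top.
    if [ "$total" -lt "$seen" ]; then
        seen=0
    fi
    if [ "$total" -gt "$seen" ]; then
        sed -n "$((seen + 1)),${total}p" "$file"
    fi
    echo "$total" > "$state"
}

# Example, run every 15 minutes from cron (commented out: no side effects):
# PATTERN="Server failed so attempting to restart"
# if new_lines /home/testing/Server.out /var/tmp/Server.out.seen |
#     grep -B 15 "$PATTERN" > attachment.txt; then
#     mailx -s "Alert - Server has shutdown and is attempting a restart" email@domain.com < attachment.txt
# fi
```

Note that grep -B can only show context from the new lines, not from lines already checked on an earlier run.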
We have a mail server that is dying and in the process of having accounts migrated to a new server before decommissioning. With 800+ email accounts across 25+ domains, it is important for this machine to stay up until migration is finished.
Lately it has started to fill up with error logs, which use up all the disk space, freeze MySQL, stop mail flow, and generally give me a headache. Until the root cause of the errors can be found and fixed, I have come up with a script that checks whether Dovecot and Amavis-new are running and restarts them if not.
After reading
https://stackoverflow.com/a/7096003/4820993
as well as a few other common examples, I came up with this:
# Dovecot: is anything listening on the IMAPS port (993)?
if netstat -an | grep -q ':993.*LISTEN'
then
    echo 'Dovecot is up'
else
    echo 'Dovecot is down, restarting...'
    /etc/init.d/dovecot restart
    logger -p mail.info "dovecot_keepalive: Dovecot is down, restarting..."
fi

# "not running" also contains the word "running", so filter it out first
amavis_up() {
    /etc/init.d/amavis status 2>&1 | grep -v 'not running' | grep -q 'running'
}

if amavis_up
then
    echo 'AmavisD is up'
else
    echo 'AmavisD is down, restarting...'
    /etc/init.d/amavis restart
    sleep 2
    if ! amavis_up
    then
        echo 'AmavisD had a problem restarting, trying to fix it now...'
        logger -p mail.info "amavis_keepalive: AmavisD had a problem restarting..."
        # the [m] keeps grep from matching its own command line
        output=$(ps aux | grep 'a[m]avisd')
        # split the first matching ps line into fields; field 2 is the PID
        set -- $output
        pid=$2
        if [ -n "$pid" ]; then
            kill "$pid"
        fi
        rm -f /var/run/amavis/amavisd.pid
        /etc/init.d/amavis start
    else
        echo 'AmavisD restarted successfully'
        logger -p mail.info "amavis_keepalive: AmavisD is down, restarting..."
    fi
fi
Who knows, I'm probably making it harder than it is, and if so, please let me know!
I checked it against http://www.shellcheck.net and updated/corrected it according to its reports. I am piecing this together from examples elsewhere and would love someone to proofread it before I implement it.
The first part, checking Dovecot, is already working fine as a cron job every 6 hours (yes, the server is messed up enough that we need to check it); it's the section about Amavis I'm not sure about.
You can use Monit, which will monitor your services and restart them automatically.
Amavisd:
# File: /etc/monit.d/amavisd
# amavis
check process amavisd with pidfile /var/amavis/amavisd.pid
group services
start program = "/etc/init.d/amavisd start"
stop program = "/etc/init.d/amavisd stop"
if failed port 10024 then restart
if 5 restarts within 5 cycles then timeout
Dovecot:
# File: /etc/monit.d/dovecot
check process dovecot with pidfile /var/run/dovecot/master.pid
start program = "/etc/init.d/dovecot start"
stop program = "/etc/init.d/dovecot stop"
group mail
if failed host localhost port 993 type tcpssl sslauto protocol imap then restart
if failed host localhost port 143 protocol imap then restart
if 5 restarts within 5 cycles then timeout
depends dovecot_init
depends dovecot_bin
check file dovecot_init with path /etc/init.d/dovecot
group mail
check file dovecot_bin with path /usr/sbin/dovecot
group mail