Linux Script to check if process is running and act on the result - linux

I have a process that fails regularly & sometimes starts duplicate instances..
When I run:
ps x |grep -v grep |grep -c "processname"
I will get:
2
This is normal as the process runs with a recovery process..
If I get
0
I will want to start the process
if I get:
4
I will want to stop & restart the process
What I need is a way of taking the result of ps x |grep -v grep |grep -c "processname"
Then setup a simple 3 option function
ps x |grep -v grep |grep -c "processname"
if answer = 0 (start process & write NOK & Time to log /var/processlog/check)
if answer = 2 (Do nothing & write OK & time to log /var/processlog/check)
if answer = 4 (stop & restart the process & write NOK & Time to log /var/processlog/check)
The process is stopped with
killall -9 process
The process is started with
process -b -c /usr/local/etc
My main problem is finding a way to act on the result of ps x |grep -v grep |grep -c "processname".
Ideally, I would like to make the result of that grep a variable within the script with something like this:
process=$(ps x |grep -v grep |grep -c "processname")
If possible.
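The idea in the question can be sketched as follows: command substitution captures the count, and a case statement chooses between the three options. The helper names count_instances and decide_action are made up for illustration, and treating every count other than 0 and 2 as "restart" is an assumption (the question only mentions 4).

```bash
#!/bin/bash
# "processname" and the log path come from the question;
# count_instances and decide_action are hypothetical helper names.

count_instances() {
    # Same pipeline as in the question, captured by the caller with $( ).
    ps x | grep -v grep | grep -c "$1"
}

decide_action() {
    # Map an instance count to one of three actions.
    case "$1" in
        0) echo "start" ;;
        2) echo "ok" ;;
        *) echo "restart" ;;   # assumption: any other count means restart
    esac
}

# Usage (commented out so that nothing is started or killed by accident):
# count=$(count_instances "processname")
# action=$(decide_action "$count")
# echo "$action $(date)" >> /var/processlog/check
```

Keeping the decision in its own function makes it easy to check the three branches without touching any real process.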

A script to monitor whether a process on a system is running.
The script is run from crontab once every minute.
It handles both the case where the process is not running and the case where it is running multiple times:
#! /bin/bash
case "$(pidof amadeus.x86 | wc -w)" in
0) echo "Restarting Amadeus: $(date)" >> /var/log/amadeus.txt
/etc/amadeus/amadeus.x86 &
;;
1) # all ok
;;
*) echo "Removed double Amadeus: $(date)" >> /var/log/amadeus.txt
kill $(pidof amadeus.x86 | awk '{print $1}')
;;
esac
0 If the process is not found, restart it.
1 If the process is found, all ok.
* If 2 or more instances are running, kill one of them (the first PID that pidof prints).
A simpler version: this just tests whether the process is running and, if not, restarts it.
It tests the exit status $? of pidof, which is 0 if the process is running and 1 if not.
#!/bin/bash
pidof amadeus.x86 >/dev/null
if [[ $? -ne 0 ]] ; then
echo "Restarting Amadeus: $(date)" >> /var/log/amadeus.txt
/etc/amadeus/amadeus.x86 &
fi
And finally, a one-liner:
pidof amadeus.x86 >/dev/null ; [[ $? -ne 0 ]] && echo "Restarting Amadeus: $(date)" >> /var/log/amadeus.txt && /etc/amadeus/amadeus.x86 &
This can then be used in crontab to run every minute. cron normally runs commands with /bin/sh, which may not support [[ ]], so the crontab line uses [ ] instead:
* * * * * pidof amadeus.x86 >/dev/null ; [ $? -ne 0 ] && echo "Restarting Amadeus: $(date)" >> /var/log/amadeus.txt && /etc/amadeus/amadeus.x86 &

I adopted @Jotne's solution and it works perfectly! For example, for the mongodb server on my NAS:
#! /bin/bash
case "$(pidof mongod | wc -w)" in
0) echo "Restarting mongod:"
mongod --config mongodb.conf
;;
1) echo "mongod already running"
;;
esac

I have adopted your script for my situation Jotne.
#! /bin/bash
logfile="/var/oscamlog/oscam1check.log"
case "$(pidof oscam1 | wc -w)" in
0) echo "oscam1 not running, restarting oscam1: $(date)" >> $logfile
/usr/local/bin/oscam1 -b -c /usr/local/etc/oscam1 -t /usr/local/tmp.oscam1 &
;;
2) echo "oscam1 running, all OK: $(date)" >> $logfile
;;
*) echo "multiple instances of oscam1 running. Stopping & restarting oscam1: $(date)" >> $logfile
kill $(pidof oscam1 | awk '{print $1}')
;;
esac
While I was testing, I ran into a problem.
I started 3 extra instances of oscam1 with this line:
/usr/local/bin/oscam1 -b -c /usr/local/etc/oscam1 -t /usr/local/tmp.oscam1
which left me with 8 oscam1 processes. The problem is this:
When I run the script, it only kills 2 processes at a time, so I would have to run it 3 times to get it down to 2 processes.
Other than killall -9 oscam1 followed by /usr/local/bin/oscam1 -b -c /usr/local/etc/oscam1 -t /usr/local/tmp.oscam1 in the *) branch, is there a better way to kill everything apart from the original process, so that there would be zero downtime?

If you change awk '{print $1}' to '{ $1=""; print $0}' you will get all processes except for the first as a result. The output will start with the field separator (generally a space), but kill does not care, because the unquoted command substitution is split on whitespace. So:
#! /bin/bash
logfile="/var/oscamlog/oscam1check.log"
case "$(pidof oscam1 | wc -w)" in
0) echo "oscam1 not running, restarting oscam1: $(date)" >> $logfile
/usr/local/bin/oscam1 -b -c /usr/local/etc/oscam1 -t /usr/local/tmp.oscam1 &
;;
2) echo "oscam1 running, all OK: $(date)" >> $logfile
;;
*) echo "multiple instances of oscam1 running. Stopping & restarting oscam1: $(date)" >> $logfile
kill $(pidof oscam1 | awk '{ $1=""; print $0}')
;;
esac
It is worth noting that the pidof route seems to work fine for commands that have no spaces, but you would probably want to go back to a ps-based string if you were looking for, say, a python script named myscript that showed up under ps like
root 22415 54.0 0.4 89116 79076 pts/1 S 16:40 0:00 /usr/bin/python /usr/bin/myscript
Just an FYI
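The effect of the awk change described above can be checked without killing anything. This is a small sketch that uses made-up PIDs in place of real pidof output:

```bash
#!/bin/bash
# Made-up PIDs standing in for the output of pidof.
pids="101 202 303"

first=$(echo "$pids" | awk '{ print $1 }')
rest=$(echo "$pids" | awk '{ $1=""; print $0 }')

echo "first: [$first]"
echo "rest:  [$rest]"

# Unquoted, the shell splits $rest on whitespace and drops the leading blank.
set -- $rest
count=$#
echo "arguments that kill would receive: $count"
```

The brackets in the output make the leading space in the second result visible; the word splitting shows that it disappears before kill sees its arguments.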

The 'pidof' command will not display pids of shell/perl/python scripts. So to find the process id’s of my Perl script I had to use the -x option i.e. 'pidof -x perlscriptname'

I cannot get case to work at all.
Here's what I have:
#! /bin/bash
logfile="/home/name/public_html/cgi-bin/check.log"
case "$(pidof -x script.pl | wc -w)" in
0) echo "script not running, Restarting script: $(date)" >> $logfile
# ./restart-script.sh
;;
1) echo "script Running: $(date)" >> $logfile
;;
*) echo "Removed duplicate instances of script: $(date)" >> $logfile
# kill $(pidof -x ./script.pl | awk '{ $1=""; print $0}')
;;
esac
I have commented out the case action commands for now, just to test the script. The pidof -x command above returns '1', but the case statement takes the branch for '0'.
Does anyone have any idea where I'm going wrong?
Solved it by adding the following to my bash script:
PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

In case you're looking for a more modern way to check to see if a service is running (this will not work for just any old process), then systemctl might be what you're looking for.
Here's the basic command:
systemctl show --property=ActiveState your_service_here
Which will yield very simple output (one of the following two lines will appear depending on whether the service is running or not running):
ActiveState=active
ActiveState=inactive
And if you'd like to know all of the properties you can get:
systemctl show --all your_service_here
If you prefer that alphabetized:
systemctl show --all your_service_here | sort
And the full code to act on it:
service=$1
result=$(systemctl show --property=ActiveState "$service")
if [[ "$result" == 'ActiveState=active' ]]; then
echo "$service is running" # Do something here
else
echo "$service is not running" # Do something else here
fi
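As an alternative sketch, systemctl is-active --quiet reports the same state through its exit status, which avoids comparing strings. The function name service_is_running is a made-up helper, not part of systemd.

```bash
#!/bin/bash
# service_is_running is a hypothetical helper name.
service_is_running() {
    # is-active exits with 0 when the unit is active and non-zero otherwise;
    # --quiet suppresses the state it would normally print.
    systemctl is-active --quiet "$1"
}

# Usage (commented out; your_service_here is the placeholder used above):
# if service_is_running your_service_here; then
#     echo "running"
# else
#     echo "not running"
# fi
```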

If you are using CentOS, there is no need to write a script and set up a cron job; systemd can restart a service automatically when it fails.
Edit /usr/lib/systemd/system/mariadb.service.
Under the [Service] section in the file, add the following 2 lines:
Restart=always
RestartSec=3
After saving the file, reload the daemon configuration so that systemd picks up the change:
systemctl daemon-reload
Read the following link for the complete steps -
https://jonarcher.info/2015/08/ensure-systemd-services-restart-on-failure/

Related

Getting last process ID in bash script with $! doesn't always work

I am a little down the rabbit hole here..
I had a little script that should just start a dotnet process into the background and save the process id.
#!/bin/bash
echo "killing $(cat ~/currentPID)"
kill `cat ~/currentPID`
export ASPNETCORE_ENVIRONMENT=Staging && cd ~/Project/bin/Debug/net6.0/ && dotnet Project.dll --urls=http://localhost:42666 &
echo "$!" > ~/currentPID
This works from time to time but most of my tries, the PID would be $! + 1
Even if I try something like this:
#!/bin/bash
echo "killing $(cat ~/currentPID)"
kill `cat ~/currentPID`
echo "killing $(cat ~/currentPID2)"
kill `cat ~/currentPID2`
echo "killing $(cat ~/currentPID3)"
kill `cat ~/currentPID3`
export ASPNETCORE_ENVIRONMENT=Staging && cd ~/Project/bin/Debug/net6.0/ && dotnet Project.dll --urls=http://localhost:42666 &
CURRPID="$!"
sleep 3
echo "$(($CURRPID + 1))" > ~/currentPID2
echo "$(($CURRPID + 2))" > ~/currentPID3
echo "$CURRPID" > ~/currentPID
it will be currentPID3 + 1
I guess I will use pkill -P but anyone got any ideas what's going on here?! :)
PS: ps aux says that currentPID is the process of the shell script itself. Which makes sense. The other PIDs will be unassigned though.
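One plausible explanation, offered as a sketch rather than a confirmed diagnosis: when a whole && list is put in the background, bash runs the list in a subshell, and $! is the PID of that subshell, not of the last program in the list. Replacing the subshell with the program via exec makes the saved PID belong to the program itself. In the example below, sleep 30 is only a stand-in for the real long-running command.

```bash
#!/bin/bash
# Illustrative only: "sleep 30" stands in for the real long-running program.

# The parentheses make the subshell explicit. exec replaces that subshell
# with the program, so the PID stored in $! is the program's own PID.
( cd /tmp && exec sleep 30 ) &
saved_pid=$!

kill "$saved_pid"              # SIGTERM goes to the process recorded in $!
status=0
wait "$saved_pid" || status=$? # collect the exit status of the killed job
echo "exit status after kill: $status"
```

A side effect of this form is that cd and any export inside the parentheses only affect the background job, which is also what happens with the original one-line version.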

Bash - use 1 stop script for multiple similar services, and kill the correct process only

I have multiple processes running as services on a machine
Before moving from 1 process/service to multiple ones, I used the following script to stop my service
#!/bin/sh
SIGNAL=${SIGNAL:-TERM}
PIDS=$(ps ax | grep -i 'datastream' | grep java | grep -v grep | awk '{print $1}')
if [ -z "$PIDS" ]; then
echo "No Brooklin server to stop"
exit 1
else
kill -s $SIGNAL $PIDS
fi
The issue now is that this script kills all processes of this type if invoked as a service stop command
My services are called for example service-A, service-B, service-C. If I send a service service-C stop command, the current script will stop all 3 processes.
I would like to make the script use the provided service name to determine which process to stop. I can grep A/B/C from the ps output, but I haven't managed to tell it how to stop only the process given in the service stop command.
Does anyone have experience handling something similar?
You can try something like the following: store the PID in a file when starting your application, then use the same file to kill the process.
Below is a start/stop script pair that I have used in the past for running multiple processes.
Start Script :-
#!/bin/bash
export PORT=$1
. /application/setEnv.sh
/java/jdk1.8.0_152/bin/java -Xms512m -Xmx2G -XX:+DisableExplicitGC -jar /application/api-1.0-0-all.jar </dev/null >>$LOGDIR/service$PORT.log 2>&1 &
echo $! > /application/service$PORT.pid
disown $!
Stop Script :-
#!/bin/bash
PORT=$1
PID=`cat /application/service$PORT.pid`
if [ ! -z "$PID" ]; then
rm /application/service$PORT.pid
kill -9 $PID >/dev/null 2>&1
if [ $? -gt 0 ]; then
echo "PID file found but no matching process was found. Stop aborted."
exit 1
fi
else
echo "PID file is empty and has been ignored."
fi
mv /application/logs/service$PORT.log /application/logs/service$PORT.log`date +%d%m%Y%H%M%S`
The only change I can think of is to replace my port logic ($PORT) with your service names (A/B/C).
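Keyed by service name instead of port, the same PID-file idea might look like the sketch below. The directory in PIDDIR and the function names start_service and stop_service are assumptions made for illustration; they are not part of the original scripts.

```bash
#!/bin/bash
# Hypothetical layout: one PID file per service name under $PIDDIR.
PIDDIR=${PIDDIR:-/tmp/pidfile-demo}
mkdir -p "$PIDDIR"

start_service() {
    # $1 = service name; remaining arguments = command to run.
    local name=$1
    shift
    "$@" </dev/null >/dev/null 2>&1 &
    echo $! > "$PIDDIR/$name.pid"
}

stop_service() {
    # Stop only the process recorded for this service name.
    local name=$1 pidfile="$PIDDIR/$1.pid" pid
    [ -s "$pidfile" ] || { echo "No PID file for $name"; return 1; }
    pid=$(cat "$pidfile")
    rm -f "$pidfile"
    kill -s "${SIGNAL:-TERM}" "$pid" 2>/dev/null ||
        { echo "No process $pid for $name"; return 1; }
}

# Usage (commented out):
# start_service service-C java -jar /path/to/app.jar
# stop_service service-C
```

Because each service writes its own file, stopping service-C never touches the PIDs recorded for service-A or service-B.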

Background rsync and pid from a shell script

I have a shell script that does a backup. I run this script from cron, but the backup is heavy, so a second rsync may start before the first one finishes.
I thought of launching rsync from a script, getting its PID, and writing a file so that the script can check whether the process exists (whether this file exists or not).
If I put rsync in the background I get the PID, but I don't know how to tell when rsync finishes; if I run rsync in the foreground, I can't get the PID before the process finishes, so I can't write a file with the PID.
I don't know the best way to keep control of rsync and know when it finishes.
My script
#!/bin/bash
pidfile="/home/${USER}/.rsync_repository"
if [ -f $pidfile ];
then
echo "PID file exists " $(date +"%Y-%m-%d %H:%M:%S")
else
rsync -zrt --delete-before /repository/ /mnt/backup/repositorio/ < /dev/null &
echo $$ > $pidfile
# If I uncomment this 'rm' and rsync is running in background, the file is deleted so I can't "control" when rsync finish
# rm $pidfile
fi
Can anybody help me?!
Thanks in advance !! :)
# check to make sure script isn't still running
# if it's still running then exit this script
sScriptName="$(basename $0)"
if [ $(pidof -x ${sScriptName}| wc -w) -gt 2 ]; then
exit
fi
pidof finds the PID of a process
-x tells it to look for scripts too
${sScriptName} is just the name of the script; you can hardcode this
wc -w counts the words, i.e. the number of PIDs found
-gt 2 is true when more than one instance is running (this instance plus 1 for the pidof check)
if more than one instance is running, the script exits
Let me know if this works for you.
Test both for the presence of the PID file and for the status of the recorded process, like this:
#!/bin/bash
pidfile="/home/${USER}/.rsync_repository"
is_running=0
if [ -f "$pidfile" ];
then
echo "PID file exists " $(date +"%Y-%m-%d %H:%M:%S")
previous_pid=$(cat "$pidfile")
# kill -0 sends no signal; it only tests whether the process exists
if kill -0 "$previous_pid" 2>/dev/null; then is_running=1; fi
fi
if [ $is_running -gt 0 ];
then
echo "Previous process didn't quit yet"
else
rsync -zrt --delete-before /repository/ /mnt/backup/repositorio/ < /dev/null &
# $! is the PID of the background rsync ($$ would be this script's own PID)
echo $! > "$pidfile"
fi
Hope this helps!!!
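To address the "how do I know when rsync ends" part of the question, one option is to start the command in the background, record $!, and then wait for it before removing the PID file. The sketch below assumes a hypothetical helper named run_with_pidfile; it is not taken from the answers above.

```bash
#!/bin/bash
# run_with_pidfile is a hypothetical helper: it runs a command while a
# PID file marks the run as in progress.
run_with_pidfile() {
    local pidfile=$1 status=0
    shift
    # Refuse to start if the recorded process is still alive.
    if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "Previous run still in progress"
        return 1
    fi
    "$@" < /dev/null &
    echo $! > "$pidfile"      # $! is the PID of the background command
    wait $! || status=$?      # block until the command finishes
    rm -f "$pidfile"          # safe to remove: the command has ended
    return "$status"
}

# Usage (commented out; paths are the ones from the question):
# run_with_pidfile "$HOME/.rsync_repository" \
#     rsync -zrt --delete-before /repository/ /mnt/backup/repositorio/
```

The PID file exists only while the command runs, so a second cron invocation can tell a live run from a stale file.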

Check whether a process is running or not Linux

Here is my code:
#!/bin/bash
ps cax | grep testing > /dev/null
while [ 1 ]
do
if [ $? -eq 0 ]; then
echo "Process is running."
sleep 10
else
nohup ./testing.sh &
sleep 10
fi
done
I run it as nohup ./script.sh &
and it said nohup: failed to run command './script.sh': No such file or directory
What is wrong?
The file script.sh simply does not exist in the directory that you are issuing the command from.
If it did exist and was not executable you would get:
nohup: failed to run command ‘./script.sh’: Permission denied
A newly created script on Linux is not executable by default, so change its permissions first. You can inspect the current permissions with
ls -lah
The following content may help you:
#!/bin/bash
while [ 1 ];
do
date=`date`
pid=`ps -ef | grep "your process" | grep -v grep | awk -F' ' '{print $2}'`
if [[ -n $pid ]]; then
echo "$date - processID $pid is running."
else
echo "$date - the process is not running"
# restart your process here, for example:
# nohup ./testing.sh &
fi
sleep 5m
done
Make sure your script is saved as script.sh and that you are executing nohup ./script.sh & from the directory that contains script.sh.
You can also give script.sh executable permission with
chmod u+x script.sh
and then run
nohup ./script.sh &
or, without the executable bit, run it as
nohup sh ./script.sh &

need a restart server script in 1 hour if not stopped

I am working on a remote servers network setup.
What I need is a script that will rename the "/etc/network/interfaces" file and then restart the computer. The renaming I got, but what I don't get is how I can terminate this script in case I don't need it.
If everything works out fine, I would like to issue a stop command that terminates this script, so that the server doesn't restart.
So here is what I got so far. the issues are:
It doesn't return the prompt
The stop command doesn't work. It doesn't get the pid file for some reason. It returns "rm: missing operand" although the echo tells me that the pid file is called "start.pid" and it is present in the /tmp folder
Any ideas?
#! /bin/sh
PATH=/sbin:/usr/sbin:/bin:/usr/bin
. /lib/lsb/init-functions
case "$1" in
start)
;;
export PIDFILE=/var/run/${1}.pid
ps -fe | grep ${1} | head -n1 | cut -d" " -f 6 > ${PIDFILE}
sleep 30 #3600
log_action_msg "WARNING: Will in 60 sec rename /etc/network/interfaces and then restart"
sleep 30# 60
SUFFIX=$(date +%s)
#cp /etc/network/interfaces /etc/network/interfaces.$SUFFIX
cp /tmp/interfaces /etc/network/interfaces.$SUFFIX
sleep 1
#cp /etc/network/interfaces.org /tmp/interfaces
cp /tmp/interfaces.org /tmp/interfaces
sleep 1
#reboot -d -f -i
;;
stop)
if [ -f ${PIDFILE} ]; then
rm ${PIDFILE}
fi
exit 0
;;
*)
echo "Usage: $0 start|stop" >&2
exit 3
;;
esac
Usually this is done using a 'pid-file' - a predetermined file that holds the process identifier of the currently running process. That way if it is called and told to stop, it looks up the pid-file and uses the kill command to send a signal to the currently running process.
There is another benefit of this as well - if you check for the existence of a pid-file (and the existence of that process) when the script is told to start, you can prevent accidentally starting the script twice, which would make stopping both instances problematic.
The stop action can create a file do.not.restart.server in an appropriate location.
The start action can be modified to check whether the do.not.restart.server file exists, and avoid restarting the server if it does. It can/should probably remove the file for future restarts, or maybe it should remove it before it goes to sleep.
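The flag-file suggestion above can be sketched as two small functions. The location in FLAG and the names cancel_restart and restart_allowed are assumptions chosen for illustration.

```bash
#!/bin/bash
# Hypothetical sketch of the flag-file idea; FLAG's location is an assumption.
FLAG=${FLAG:-/tmp/do.not.restart.server}

cancel_restart() {
    # What the stop action would do: leave a marker behind.
    touch "$FLAG"
}

restart_allowed() {
    # What the start action would check after sleeping.
    # The marker is removed so that it does not block future runs.
    if [ -e "$FLAG" ]; then
        rm -f "$FLAG"
        return 1
    fi
    return 0
}

# Usage inside the start action (commented out so nothing reboots):
# ( sleep 3600; restart_allowed && reboot ) &
```

Running the wait in a background subshell, as in the commented line, would also hand the prompt back immediately instead of blocking for the whole sleep.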
Okay, here is a working script that does what I need. The only improvement I could still wish for is a way to get the prompt back while the script sleeps.
The functionality is there, so I am posting it in case others need it as well.
Thanks Dan and Jonathan Leffler for your help and ideas.
#! /bin/sh
PATH=/sbin:/usr/sbin:/bin:/usr/bin
. /lib/lsb/init-functions
export PIDFILESTART=/tmp/network-safty-restart-start.pid
export PIDFILESTOP=/tmp/network-safty-restart-stop.pid
#export FILE=/etc/network/interfaces
export FILE=/tmp/interfaces
case "$1" in
start)
if [ -f ${PIDFILESTART} ]; then
rm ${PIDFILESTART}
fi
if [ -f ${PIDFILESTOP} ]; then
rm ${PIDFILESTOP}
fi
ps -fe | grep ${1} | head -n1 | cut -d" " -f 6 > ${PIDFILESTART}
sleep 3600
log_action_msg "WARNING: Will in 60 sec rename ${FILE} and then restart"
sleep 60
if ! [ -f ${PIDFILESTOP} ]; then
log_action_msg "Restarting NOW"
SUFFIX=$(date +%s)
cp ${FILE} ${FILE}.${SUFFIX}
sleep 1
cp ${FILE}.org ${FILE}
sleep 1
reboot -d -f -i
else
rm ${PIDFILESTOP}
log_action_msg "NOT Restarting as you wish"
fi
;;
stop)
if [ -f ${PIDFILESTART} ]; then
rm ${PIDFILESTART}
ps -fe | grep ${1} | head -n1 | cut -d" " -f 6 > ${PIDFILESTOP}
log_action_msg "Terminating restart script"
fi
log_action_msg "Terminated restart script"
exit 0
;;
*)
echo "Usage: $0 start|stop" >&2
exit 3
;;
esac

Resources