Shell scripts and how to avoid running the same script at the same time on a Linux machine

I have a centralized Linux server (Linux 5.X).
In some cases the get_hosts.ksh script on this server can be started from several different hosts.
For example, get_hosts.ksh could end up running on my Linux machine three or more times at the same time.
My question:
How can I avoid running multiple instances of the process/script?

A common solution for this problem on *nix systems is to check for the existence of a lock file.
Usually the lock file contains the current process's PID.
This is an example ksh script:
#!/bin/ksh
pid="/var/run/get_hosts.pid"
trap "rm -f $pid" SIGSEGV
trap "rm -f $pid" SIGINT

if [ -e "$pid" ]; then
    exit  # PID file exists, another instance is running, so now we politely exit
else
    echo $$ > "$pid"  # PID file doesn't exist, create one and go on
fi

# your normal workflow here...

rm -f "$pid"  # remove the PID file just before exiting
exit
UPDATE: answering the OP's comment, I added handling of program interruptions and segfaults with the trap command.
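A variation on the same idea (not part of the original answer): a single trap on EXIT also removes the PID file on a normal exit, on Ctrl-C and on kill, so the explicit rm at the end is no longer needed. A minimal sketch, assuming the same /var/run/get_hosts.pid path:

#!/bin/ksh
# Sketch only: same lock-file approach, but one EXIT/INT/TERM trap cleans up
# the PID file on any of those exit paths.
pid="/var/run/get_hosts.pid"

if [ -e "$pid" ]; then
    exit 1                          # another instance is (probably) running
fi

echo $$ > "$pid"
trap 'rm -f "$pid"' EXIT INT TERM   # remove the PID file whenever we leave

# your normal workflow here...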

The normal way of doing this is to write the process id into a file. The first thing the script does is check for the existence of the file, read the pid, check if a process with that pid exists, and for extra paranoia points, if that process actually runs the script. If yes, the script exits.
Here's a simple example. The process in question is a binary, and this script makes sure the binary runs only once. This is not exactly what you need, but you should be able to adapt this:
RUNNING=0
PIDFILE=$PATH_TO/var/run/example.pid

if [ -f "$PIDFILE" ]
then
    PID=`cat "$PIDFILE"`
    ps -eo pid | grep -w "$PID" >/dev/null 2>&1
    if [ $? -eq 0 ]
    then
        RUNNING=1
    fi
fi

if [ $RUNNING -ne 1 ]
then
    run_binary &            # backgrounded so that $! holds its PID
    PID=$!
    echo $PID > "$PIDFILE"
fi
This is not very elaborate but should get you on the right track.
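The "extra paranoia" check mentioned above (verifying that the recorded PID really belongs to an instance of the script, not to an unrelated process that reused the number) could look roughly like this. This is a sketch only; the pid-file path and the script-name match are assumptions:

#!/bin/bash
# Sketch: guard at the top of a script, checking both that the recorded PID
# exists and that it is actually an instance of this script.
PIDFILE=/var/run/myscript.pid              # assumed path
SCRIPT_NAME=$(basename "$0")

if [ -f "$PIDFILE" ]; then
    OLDPID=$(cat "$PIDFILE")
    # ps -o args= prints that PID's command line (empty if the PID is gone)
    if ps -p "$OLDPID" -o args= 2>/dev/null | grep -q "$SCRIPT_NAME"; then
        echo "Another instance (PID $OLDPID) is already running." >&2
        exit 1
    fi
fi

echo $$ > "$PIDFILE"
trap 'rm -f "$PIDFILE"' EXIT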

You can use a pid file to keep track of when the process is running. At the top of the script, check for the existence of the pid file and if it doesn't exist, create it and run the script, otherwise return.
Some sample code can be seen in this answer to a similar question.

You might consider using the (optional) lockfile(1) command (provided by procmail package on Debian).
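For reference, a minimal sketch of how lockfile(1) is typically used; the lock path is an assumption, and the retry/timeout options are described in its man page:

#!/bin/sh
LOCK=/var/tmp/get_hosts.lock       # assumed lock path

# -r 0: give up immediately instead of retrying if the lock already exists
if ! lockfile -r 0 "$LOCK" 2>/dev/null; then
    echo "Another instance is running." >&2
    exit 1
fi
trap 'rm -f "$LOCK"' EXIT

# your normal workflow here...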

I have a lot of scripts, and I use the code below to prevent multiple/simultaneous runs:
PID="/var/scripts/PID.txt" # Temp file
if [ ! -f "$PID" ]; then
echo $$ > "$PID" # Print actual PID into a file
else
ps -p $(cat "$PID") > /dev/null && exit || echo $$ > "$PID"
fi

Building on wallenborn's answer, I also added a "staleness" check, just in case the PID lock file is beyond a certain expected age in seconds.
# prevent simultaneous executions within an hourish
pid_file="$HOME/.harness.pid"
max_stale_seconds=3600

if [ -f $pid_file ]; then
    pid="$(cat "$pid_file")"
    let age_in_seconds="$(date +%s) - $(date -r "$pid_file" +%s)"
    if ps $pid >/dev/null && [ $age_in_seconds -lt $max_stale_seconds ]; then
        exit 1
    fi
fi

echo $$ > "$pid_file"
trap "rm -f \"$pid_file\"" SIGSEGV
trap "rm -f \"$pid_file\"" SIGINT
This could be made "smarter" to kill off the other execution if the stale PID is still valid, but that would be dangerous: consider a sudden power failure and reset, where the PID file contains a number that may now refer to a completely different process.
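One way to make such a kill safer (a sketch, not part of the original answer; matching on the script's own name is an assumption) is to compare the command line of the recorded PID against the script's name before touching it:

# Sketch: only treat the recorded PID as "ours" if its command line matches
# this script's name; otherwise assume the PID number was recycled.
old_pid="$(cat "$pid_file")"
if [ -n "$old_pid" ] && ps -p "$old_pid" -o args= 2>/dev/null | grep -q "$(basename "$0")"; then
    echo "Killing stale instance $old_pid" >&2
    kill "$old_pid"
else
    echo "PID $old_pid is not this script (or is gone); not killing." >&2
fi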

Related

Bash - use 1 stop script for multiple similar services, and kill the correct process only

I have multiple processes running as services on a machine.
Before moving from one process/service to multiple ones, I used the following script to stop my service:
#!/bin/sh
SIGNAL=${SIGNAL:-TERM}
PIDS=$(ps ax | grep -i 'datastream' | grep java | grep -v grep | awk '{print $1}')
if [ -z "$PIDS" ]; then
    echo "No Brooklin server to stop"
    exit 1
else
    kill -s $SIGNAL $PIDS
fi
The issue now is that this script kills all processes of this type when it is invoked as a service stop command.
My services are called, for example, service-A, service-B, and service-C. If I run a service service-C stop command, the current script will stop all three processes.
I would like the script to use the provided service name to determine which process to stop (I can grep A/B/C from the ps output, but I haven't managed to make it stop only the process named in the service stop command).
Does anyone have experience handling something similar?
You can try something like the following when starting your application: store its PID in a well-known file, and then use the same file to kill the process.
Below is one of my start/stop script pairs, which I have used in the past for spinning up multiple processes.
Start script:
#!/bin/bash
export PORT=$1
. /application/setEnv.sh
/java/jdk1.8.0_152/bin/java -Xms512m -Xmx2G -XX:+DisableExplicitGC -jar /application/api-1.0-0-all.jar </dev/null >>$LOGDIR/service$PORT.log 2>&1 &
echo $! > /application/service$PORT.pid
disown $!
Stop script:
#!/bin/bash
PORT=$1
PID=`cat /application/service$PORT.pid`
if [ ! -z "$PID" ]; then
    rm /application/service$PORT.pid
    kill -9 $PID >/dev/null 2>&1
    if [ $? -gt 0 ]; then
        echo "PID file found but no matching process was found. Stop aborted."
        exit 1
    fi
else
    echo "PID file is empty and has been ignored."
fi
mv /application/logs/service$PORT.log /application/logs/service$PORT.log`date +%d%m%Y%H%M%S`
The only change I can think of is to replace my port-based naming (the $PORT variable) with your service names (A/B/C), for example as sketched below.
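A rough adaptation of the stop script to service names instead of ports (a sketch only; the /application paths and the pid-file naming scheme are assumptions carried over from the scripts above, and the start script would have to write its PID to the matching file):

#!/bin/bash
# Sketch: stop only the named service, e.g. "./stop.sh service-C"
SERVICE=$1                              # e.g. service-A, service-B, service-C
PIDFILE=/application/$SERVICE.pid       # assumed: the start script wrote $! here

if [ ! -f "$PIDFILE" ]; then
    echo "No PID file for $SERVICE; nothing to stop."
    exit 1
fi

PID=$(cat "$PIDFILE")
if [ -n "$PID" ] && kill -0 "$PID" 2>/dev/null; then
    kill "$PID"                         # send TERM to this service only
else
    echo "PID file for $SERVICE is stale."
fi
rm -f "$PIDFILE"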

Wait for a program (non-child) to finish and execute a command

I have a program running on a remote computer which shouldn't be stopped. I need to track when this program is stopped and immediately execute a command. PID is known. How can I do that?
You cannot wait for non-child processes.
Probably the most efficient way in a shell would be to poll using the exit code of kill -0 <pid> to check if the process still exists:
while kill -0 $PID 2>/dev/null; do sleep 1; done
This is both simpler and more efficient than any approaches involving ps and grep. However, it only works if your user has permission to send signals to that process.
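Since the original question wants to execute a command as soon as the process stops, a minimal sketch built on the same kill -0 loop (the PID value and the follow-up command are placeholders):

#!/bin/sh
PID=12345                          # the known PID (placeholder)

# Poll until the PID no longer accepts signal 0, i.e. the process is gone.
while kill -0 "$PID" 2>/dev/null; do
    sleep 1
done

/path/to/command_to_run            # placeholder for the command to execute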
Code like this can do the job (to be run on the remote computer):
while true
do
    if [ "$(ps -efl | grep $PIDN | grep -v grep | wc -l)" -lt 1 ]
    then
        <exec code>; break
    fi
    sleep 5
done
It expects the variable PIDN to contain the PID.
P.S. I know the code is ugly and power-hungry.
EDIT: it is possible to use -p with ps to avoid one grep:
while true
do
    if [ "$(ps -p $PIDN | wc -l)" -lt 2 ]
    then
        <exec code>; break
    fi
    sleep 5
done
Here's a fairly simple way to wait for a process to terminate using the ps -p PID strategy:
if ps -p "$PID" >/dev/null 2>&1; then
    echo "Process $PID is running ..."
    while ps -p "$PID" >/dev/null 2>&1; do
        sleep 5
    done
    echo "Process $PID is not running anymore."
fi
Checking for a process by PID
In general, to check for process ownership or permission to kill (send signals to) a process, you can use a combination of ps -p PID and kill -0:
if ps -p "$PID" >/dev/null 2>&1; then
    echo "Process $PID exists!"
    if kill -0 "$PID" >/dev/null 2>&1; then
        echo "You can send signals to process $PID, e.g. with 'kill $PID'"
    else
        echo "You do not have permission to send signals to process $PID"
    fi
else
    echo "Process $PID does not exist."
fi
You can use exitsnoop to achieve this.
The bcc toolkit implements many excellent monitoring capabilities based on eBPF. Among them, exitsnoop traces process termination, showing the command name and reason for termination,
either an exit or a fatal signal.
It catches processes of all users, processes in containers, as well as processes that become zombies.
This works by tracing the kernel sched_process_exit() function using dynamic tracing, and
will need updating to match any changes to this function.
Since this uses BPF, only the root user can use this tool.
exitsnoop examples:
Trace all process termination:
# exitsnoop
Trace all process termination, and include timestamps:
# exitsnoop -t
Exclude successful exits, only include non-zero exit codes and fatal signals:
# exitsnoop -x
Trace PID 181 only:
# exitsnoop -p 181
Label each output line with 'EXIT':
# exitsnoop --label EXIT
You can get more information about this tool from the links below:
Github repo: tools/exitsnoop: Trace process termination (exit and fatal signals). Examples.
Linux Extended BPF (eBPF) Tracing Tools
ubuntu manpages: exitsnoop-bpfcc
Another option is to use this project:
https://github.com/stormc/waitforpid

Background rsync and pid from a shell script

I have a shell script that does a backup. I run this script from cron, but the problem is that the backup is heavy, so it is possible for a second rsync to start before the first one has finished.
I thought about launching rsync from the script, getting its PID, writing it to a file, and having the script check whether that process (or the file) still exists.
If I put rsync in the background I get the PID, but I don't know how to tell when rsync has finished; if I run rsync in the foreground I can't get the PID before the process finishes, so I can't write the PID file.
I don't know what the best way is to keep control of rsync and to know when it finishes.
My script:
#!/bin/bash
pidfile="/home/${USER}/.rsync_repository"

if [ -f $pidfile ];
then
    echo "PID file exists " $(date +"%Y-%m-%d %H:%M:%S")
else
    rsync -zrt --delete-before /repository/ /mnt/backup/repositorio/ < /dev/null &
    echo $$ > $pidfile
    # If I uncomment this 'rm' and rsync is running in the background, the file is deleted, so I can't "control" when rsync finishes
    # rm $pidfile
fi
Can anybody help me?!
Thanks in advance !! :)
# check to make sure script isn't still running
# if it's still running then exit this script
sScriptName="$(basename $0)"
if [ $(pidof -x ${sScriptName} | wc -w) -gt 2 ]; then
    exit
fi
pidof finds the pid of a process
-x tells it to look for scripts too
${sScriptName} is just the name of the script... you can hardcode this
wc -w returns the number of words
-gt 2 means exit if more than two matching PIDs are found (the script itself, plus one for the subshell that runs the pidof check)
if more than one instance is running, then exit the script
Let me know if this works for you.
Test both for the presence of the pid file and for the status of the running process, like this:
#!/bin/bash
pidfile="/home/${USER}/.rsync_repository"

is_running=0

if [ -f $pidfile ];
then
    echo "PID file exists " $(date +"%Y-%m-%d %H:%M:%S")
    previous_pid=`cat $pidfile`
    is_running=`ps -ef | grep $previous_pid | grep -v grep | wc -l`
fi

if [ $is_running -gt 0 ];
then
    echo "Previous process didn't quit yet"
else
    rsync -zrt --delete-before /repository/ /mnt/backup/repositorio/ < /dev/null &
    echo $$ > $pidfile
fi
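A different way to address the original worry (knowing when the backgrounded rsync finishes) is to record rsync's own PID rather than the script's and then wait for it; this is a sketch using the same paths as above:

#!/bin/bash
# Sketch: refuse to start if a previous rsync is still running; otherwise
# background rsync, record *its* PID, and clean up once it finishes.
pidfile="/home/${USER}/.rsync_repository"

if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo "Previous rsync (PID $(cat "$pidfile")) is still running; skipping."
    exit 0
fi

rsync -zrt --delete-before /repository/ /mnt/backup/repositorio/ < /dev/null &
rsync_pid=$!
echo "$rsync_pid" > "$pidfile"     # rsync's PID, not the script's ($$)

wait "$rsync_pid"                  # block until rsync finishes
rm -f "$pidfile"                   # now it is safe to remove the marker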
Hope this helps!!!

Linux Single Instance Kill if running too long

I am using the following (taken from "How do I daemonize an arbitrary script in unix?") to keep a single instance of a script running on my server. I have a cron job that runs it every minute.
#!/bin/bash

if [[ $# < 1 ]]; then
    echo "Name of pid file not given."
    exit
fi

# Get the pid file's name.
PIDFILE=$1
shift

if [[ $# < 1 ]]; then
    echo "No command given."
    exit
fi

echo "Checking pid in file $PIDFILE."

# Check to see if process running.
PID=$(cat $PIDFILE 2>/dev/null)
if [[ $? = 0 ]]; then
    ps -p $PID >/dev/null 2>&1
    if [[ $? = 0 ]]; then
        echo "Command $1 already running."
        exit
    fi
fi

# Write our pid to file.
echo $$ >$PIDFILE

# Get command.
COMMAND=$1
shift

# Run command
$COMMAND "$*"
Now I found out that my script had hung for some reason and therefore it was stuck. I'd like a way to check if the $PIDFILE is "old" and if so, kill the process. I know that's possible (check the timestamp on the file) but I don't know the syntax or if this is even a good idea. Also, when this script is running, the CPU should be pretty heavily used. If it hangs (rare but it happened at least once so far), the CPU usage drops to 0%. It would be nice if I could check that the process is really hung/not active, but I don't know if there's an easy way to do that (and I don't want to have many false positives where it gets killed but it's running fine).
To answer the question in your title, which seems quite different from your problem, use timeout.
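For example (a sketch; the one-hour limit and the script path are placeholders):

# Kill the command with TERM if it is still running after one hour.
# GNU timeout exits with status 124 when the time limit was hit.
timeout 1h /path/to/heavy_job.sh
if [ $? -eq 124 ]; then
    echo "heavy_job.sh was killed after running for more than an hour" >&2
fi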
Now, for your problem, I don't see where it could hang, unless you gave it a FIFO for the pid file. To run and respawn, you can just run this script once, on startup:
#!/bin/bash
while /bin/true; do
    "$@"
    wait
done
Which brings up another bug in the code you got from the other question: "$*" will pass all the arguments to the command as a single argument; without the quotes it will split arguments on whitespace. "$@" will pass them individually and handle whitespace properly.
Call it with /path/to/script command [argument]....
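To illustrate the difference between "$*" and "$@" (a small standalone sketch, not part of the original answer):

#!/bin/bash
# Prints each argument it receives on its own line.
show_args() {
    for arg in "$@"; do
        printf 'arg: [%s]\n' "$arg"
    done
}

set -- "hello world" foo    # two arguments, the first containing a space

show_args "$*"   # one argument:    [hello world foo]
show_args "$@"   # two arguments:   [hello world] and [foo]
show_args $*     # three arguments: [hello], [world], [foo]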

wait child process but get error: 'pid is not a child of this shell'

I wrote a script to get data from HDFS in parallel, and then I wait for these child processes in a for loop, but sometimes wait returns "pid is not a child of this shell". Sometimes it works fine; it's puzzling. I use "jobs -l" to show all the jobs running in the background. I am sure these PIDs are child processes of the shell process, and I use "ps aux" to make sure these PIDs are not assigned to other processes. Here is my script.
PID=()
FILE=()
let serial=0

while read index_tar
do
    echo $index_tar | grep index > /dev/null 2>&1
    if [[ $? -ne 0 ]]
    then
        continue
    fi
    suffix=`printf '%03d' $serial`
    mkdir input/output_$suffix
    $HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
        && mv input/output_$suffix/index_* input/output_$suffix/index &
    PID[$serial]=$!
    FILE[$serial]=$index_tar
    let serial++
done < file.list

for ((i = 0; i < $serial; i++))
do
    wait ${PID[$i]}
    if [[ $? -ne 0 ]]
    then
        LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
        exit -1
    else
        LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
    fi
done
Just find the process ID of the process you want to wait for and replace 12345 with it in the script below. Further changes can be made as required.
#!/bin/sh
PID=12345
while [ -e /proc/$PID ]
do
    echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
    sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log
/usr/bin/waitingScript.sh
http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
Either your while loop or the for loop runs in a subshell, which is why you cannot wait for a child of the (parent, outer) shell.
Edit: this might happen if the while loop or for loop is actually
(a) in a (...) subshell block
(b) participating in a pipe (e.g. for ... done | somepipe), as in the sketch below
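A minimal sketch of case (b) and a common fix (feeding the loop by redirection instead of a pipe); some_command is a placeholder:

#!/bin/bash
# Broken: the while loop is the last stage of a pipeline, so it runs in a
# subshell. The background jobs are children of that subshell, and a later
# `wait $pid` in the outer shell fails with "pid is not a child of this shell"
# (the pid variable is not even visible out here).
cat file.list | while read line; do
    some_command "$line" &
    pid=$!
done

# Working: redirect the file into the loop, so it runs in the current shell
# and the background jobs really are children of this shell.
while read line; do
    some_command "$line" &
    pid=$!
done < file.list
wait "$pid"     # here: wait for the last job; the real script waits on each ${PID[$i]}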
If you're running this in a container of some sort, the condition apparently can be caused by a bug in bash that is easier to encounter in a containerized environment.
From my reading of the bash source (specifically see comments around RECYCLES_PIDS and CHILD_MAX in bash-4.2/jobs.c), it looks like in their effort to optimize their tracking of background jobs, they leave themselves vulnerable to PID aliasing (where a new process might obscure the status of an old one); to mitigate that, they prune their background process history (apparently as mandated by POSIX?). If you should happen to want to wait on a pruned process, the shell can't find it in the history and assumes this to mean that it never knew about it (i.e., that it "is not a child of this shell").
