Multi-threading with bash - multithreading

It may look menacing, but the task is really simple:
server has the following directory structure:
/usr/multi
/1
job
file.a
file.b
/2
job
file.a
file.b
/3
job
file.a
file.b
And the following code:
#this is thread.sh
cd /usr/multi
#find the first directory that has a job file
id=$(ls */job)
#strip everything after "/" ("1/job" becomes "1")
id=${id%%/*}
#read job
read job <$id/job
if [ "$id" == "" ] || [ "$job" == "" ]
then
false
else
#mark that id as busy
mv $id/job $id/_job
#execute the job
script.sh $1 $job
#mark that id as available
mv $id/_job $id/job
fi
script.sh performs some operations (described in job file and received argument) on file.a and file.b.
Clients, on the other hand, execute this code:
#loop infinitely on failure, break loop on success
false
while [ "$?" != "0" ]
do
result=$(ssh $server "thread.sh 'some instructions'" </dev/null)
done
echo $result
So every client gets a separate id and gives the server some instructions to perform the job specified for that id. If there are more clients than available jobs on the server, clients will keep trying to grab the first available id and when they do, give some instructions to the server to perform corresponding job.
The problem is that every once in a while two clients get the same id; and thread.sh messes up the file.a and file.b.
My theory is that this happens when two clients request an id from the server at almost the same time, so that server cannot rename the job file quick enough for one client to see it as available, and the other one to see it as busy.
Should I put random sleep interval just before the if [ "$id" == "" ] || [ "$job" == "" ] so I will get some more randomness in the timing?

As you have correctly determined, your script is quite racy.
A simple locking in bash can be implemented using
set -o noclobber
and writing to a lockfile. If the lock is already being held (the file exists), your write attempt will fail in an atomic manner.

Related

Check script double execution [duplicate]

This question already has answers here:
Quick-and-dirty way to ensure only one instance of a shell script is running at a time
(43 answers)
Closed 4 years ago.
I have a bash script and sometimes happened that, even my script was scheduled, it was executed two times. So I added few code lines to check if the script is already in execution. Initially I hadn't a problem, but in the last three days I had got it again
PID=`echo $$`
PROCESS=${SL_ROOT_FOLDER}/xxx/xxx/xxx_xxx_$PID.txt
ps auxww | grep $scriptToVerify | grep -v $$ | grep -v grep > $PROCESS
num_proc=`awk -v RS='\n' 'END{print NR}' $PROCESS`
if [ $num_proc -gt 1 ];
then
sl_log "---------------------------Warning---------------------------"
sl_log "$scriptToVerify already executed"
sl_log "num proc $num_proc"
sl_log "--------"
sl_log $PROCESS
sl_log "--------"
exit 0;
fi
In this way I check how many rows I've got into my log and if the result is more than one, then I have two process in execution and one will be stopped.
This method doesn't work correctly, though. How can I fix my code to check how many scripts in execution I have?
Anything that involves:
read some state information
check results
do action based on results
finish
must do all three steps at the same time otherwise there is a "race" condition. For example:
(A) reads state
(A) checks results (ok)
(A) does action (ok)
(A) finishes
(B) reads state
(B) checks results (bad)
(B) does action (bad)
(B) finishes
but if timing is slightly different:
(A) reads state
(A) checks results (ok)
(B) reads state
(B) checks results (ok)
(A) does action (ok)
(B) does action (ok)
(A) finishes
(B) finishes
The usual example people give is updating bank balances.
Using your method, you may be able to reduce the frequency of your code running when it shouldn't but you can never avoid it entirely.
A better solution is to use locking. This guarantees that only one process can run at a time. For example, using flock, you can wrap all calls to your script:
flock -x /var/lock/myscript-lockfile myscript
Or, inside your script, you can do something like:
exec 300>/var/lock/myscript-lockfile
flock -x 300
# rest of script
flock -u 300
or:
exec 300>/var/logk/myscript-lockfile
if ! flock -nx 300; then
# another process is running
exit 1
fi
# rest of script
flock -u 300
flock(1)
#!/bin/bash
# Makes sure we exit if flock fails.
set -e
(
# Wait for lock on /var/lock/.myscript.exclusivelock (fd 200) for 10 seconds
flock -x -w 10 200
# Do stuff
) 200>/var/lock/.myscript.exclusivelock
This ensures that code between "(" and ")" is run only by one process at a time and that the process does wait for a lock too long.
Credit goes to Alex B.

how to check if condition more than once in Shell Script?

I am creating a shell script which checks the status of a service running on another machine and if didn't get any response than performs some operation at the local system. I am using if clause in the script for this task.
Sometimes due to the network connection, it falsely assumes that the remote server is not responding and performs the tasks mentioned inside if clause. I want to set up a retry so that it checks if condition more than once when it didn't find any response in the first attempt.
is there any way to setup retry like thing in a shell script for this purpose?
Below is a sample code
RSI1_STATUS=$(psql -U username -h serverip -d postgres -t -c "select version();" )
if [ -z "$RSI1_STATUS" ] #Condition will be true if remote server is not active
then
touch /tmp/postgresql.trigger
fi
now I want to check if condition more than once if it is true in the first attempt.
You could add retry loop with a number of retries using a while loop:
retries=5
while ! check_network_connection && ((--retries)); do
sleep 1 # or probe the network, etc.
done
if [[ $retries -eq 0 ]]; then
echo "Error: Connection retries exhausted."
else
# connection succeeded.
fi
Whether you want to sleep or do something else depends on your usage and the application.
Note: The "network connection" might have succeeded after checking in the loop. So if retries is 0, it doesn't necessarily mean that the connection is still down.

Duplicate Execution Prevention shell script [duplicate]

What's a quick-and-dirty way to make sure that only one instance of a shell script is running at a given time?
Use flock(1) to make an exclusive scoped lock a on file descriptor. This way you can even synchronize different parts of the script.
#!/bin/bash
(
# Wait for lock on /var/lock/.myscript.exclusivelock (fd 200) for 10 seconds
flock -x -w 10 200 || exit 1
# Do stuff
) 200>/var/lock/.myscript.exclusivelock
This ensures that code between ( and ) is run only by one process at a time and that the process doesn’t wait too long for a lock.
Caveat: this particular command is a part of util-linux. If you run an operating system other than Linux, it may or may not be available.
Naive approaches that test the existence of "lock files" are flawed.
Why? Because they don't check whether the file exists and create it in a single atomic action. Because of this; there is a race condition that WILL make your attempts at mutual exclusion break.
Instead, you can use mkdir. mkdir creates a directory if it doesn't exist yet, and if it does, it sets an exit code. More importantly, it does all this in a single atomic action making it perfect for this scenario.
if ! mkdir /tmp/myscript.lock 2>/dev/null; then
echo "Myscript is already running." >&2
exit 1
fi
For all details, see the excellent BashFAQ: http://mywiki.wooledge.org/BashFAQ/045
If you want to take care of stale locks, fuser(1) comes in handy. The only downside here is that the operation takes about a second, so it isn't instant.
Here's a function I wrote once that solves the problem using fuser:
# mutex file
#
# Open a mutual exclusion lock on the file, unless another process already owns one.
#
# If the file is already locked by another process, the operation fails.
# This function defines a lock on a file as having a file descriptor open to the file.
# This function uses FD 9 to open a lock on the file. To release the lock, close FD 9:
# exec 9>&-
#
mutex() {
local file=$1 pid pids
exec 9>>"$file"
{ pids=$(fuser -f "$file"); } 2>&- 9>&-
for pid in $pids; do
[[ $pid = $$ ]] && continue
exec 9>&-
return 1 # Locked by a pid.
done
}
You can use it in a script like so:
mutex /var/run/myscript.lock || { echo "Already running." >&2; exit 1; }
If you don't care about portability (these solutions should work on pretty much any UNIX box), Linux' fuser(1) offers some additional options and there is also flock(1).
Here's an implementation that uses a lockfile and echoes a PID into it. This serves as a protection if the process is killed before removing the pidfile:
LOCKFILE=/tmp/lock.txt
if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}`; then
echo "already running"
exit
fi
# make sure the lockfile is removed when we exit and then claim it
trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT
echo $$ > ${LOCKFILE}
# do stuff
sleep 1000
rm -f ${LOCKFILE}
The trick here is the kill -0 which doesn't deliver any signal but just checks if a process with the given PID exists. Also the call to trap will ensure that the lockfile is removed even when your process is killed (except kill -9).
There's a wrapper around the flock(2) system call called, unimaginatively, flock(1). This makes it relatively easy to reliably obtain exclusive locks without worrying about cleanup etc. There are examples on the man page as to how to use it in a shell script.
To make locking reliable you need an atomic operation. Many of the above proposals
are not atomic. The proposed lockfile(1) utility looks promising as the man-page
mentioned, that its "NFS-resistant". If your OS does not support lockfile(1) and
your solution has to work on NFS, you have not many options....
NFSv2 has two atomic operations:
symlink
rename
With NFSv3 the create call is also atomic.
Directory operations are NOT atomic under NFSv2 and NFSv3 (please refer to the book 'NFS Illustrated' by Brent Callaghan, ISBN 0-201-32570-5; Brent is a NFS-veteran at Sun).
Knowing this, you can implement spin-locks for files and directories (in shell, not PHP):
lock current dir:
while ! ln -s . lock; do :; done
lock a file:
while ! ln -s ${f} ${f}.lock; do :; done
unlock current dir (assumption, the running process really acquired the lock):
mv lock deleteme && rm deleteme
unlock a file (assumption, the running process really acquired the lock):
mv ${f}.lock ${f}.deleteme && rm ${f}.deleteme
Remove is also not atomic, therefore first the rename (which is atomic) and then the remove.
For the symlink and rename calls, both filenames have to reside on the same filesystem. My proposal: use only simple filenames (no paths) and put file and lock into the same directory.
You need an atomic operation, like flock, else this will eventually fail.
But what to do if flock is not available. Well there is mkdir. That's an atomic operation too. Only one process will result in a successful mkdir, all others will fail.
So the code is:
if mkdir /var/lock/.myscript.exclusivelock
then
# do stuff
:
rmdir /var/lock/.myscript.exclusivelock
fi
You need to take care of stale locks else aftr a crash your script will never run again.
Another option is to use shell's noclobber option by running set -C. Then > will fail if the file already exists.
In brief:
set -C
lockfile="/tmp/locktest.lock"
if echo "$$" > "$lockfile"; then
echo "Successfully acquired lock"
# do work
rm "$lockfile" # XXX or via trap - see below
else
echo "Cannot acquire lock - already locked by $(cat "$lockfile")"
fi
This causes the shell to call:
open(pathname, O_CREAT|O_EXCL)
which atomically creates the file or fails if the file already exists.
According to a comment on BashFAQ 045, this may fail in ksh88, but it works in all my shells:
$ strace -e trace=creat,open -f /bin/bash /home/mikel/bin/testopen 2>&1 | grep -F testopen.lock
open("/tmp/testopen.lock", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0666) = 3
$ strace -e trace=creat,open -f /bin/zsh /home/mikel/bin/testopen 2>&1 | grep -F testopen.lock
open("/tmp/testopen.lock", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_LARGEFILE, 0666) = 3
$ strace -e trace=creat,open -f /bin/pdksh /home/mikel/bin/testopen 2>&1 | grep -F testopen.lock
open("/tmp/testopen.lock", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0666) = 3
$ strace -e trace=creat,open -f /bin/dash /home/mikel/bin/testopen 2>&1 | grep -F testopen.lock
open("/tmp/testopen.lock", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0666) = 3
Interesting that pdksh adds the O_TRUNC flag, but obviously it's redundant:
either you're creating an empty file, or you're not doing anything.
How you do the rm depends on how you want unclean exits to be handled.
Delete on clean exit
New runs fail until the issue that caused the unclean exit to be resolved and the lockfile is manually removed.
# acquire lock
# do work (code here may call exit, etc.)
rm "$lockfile"
Delete on any exit
New runs succeed provided the script is not already running.
trap 'rm "$lockfile"' EXIT
You can use GNU Parallel for this as it works as a mutex when called as sem. So, in concrete terms, you can use:
sem --id SCRIPTSINGLETON yourScript
If you want a timeout too, use:
sem --id SCRIPTSINGLETON --semaphoretimeout -10 yourScript
Timeout of <0 means exit without running script if semaphore is not released within the timeout, timeout of >0 mean run the script anyway.
Note that you should give it a name (with --id) else it defaults to the controlling terminal.
GNU Parallel is a very simple install on most Linux/OSX/Unix platforms - it is just a Perl script.
For shell scripts, I tend to go with the mkdir over flock as it makes the locks more portable.
Either way, using set -e isn't enough. That only exits the script if any command fails. Your locks will still be left behind.
For proper lock cleanup, you really should set your traps to something like this psuedo code (lifted, simplified and untested but from actively used scripts) :
#=======================================================================
# Predefined Global Variables
#=======================================================================
TMPDIR=/tmp/myapp
[[ ! -d $TMP_DIR ]] \
&& mkdir -p $TMP_DIR \
&& chmod 700 $TMPDIR
LOCK_DIR=$TMP_DIR/lock
#=======================================================================
# Functions
#=======================================================================
function mklock {
__lockdir="$LOCK_DIR/$(date +%s.%N).$$" # Private Global. Use Epoch.Nano.PID
# If it can create $LOCK_DIR then no other instance is running
if $(mkdir $LOCK_DIR)
then
mkdir $__lockdir # create this instance's specific lock in queue
LOCK_EXISTS=true # Global
else
echo "FATAL: Lock already exists. Another copy is running or manually lock clean up required."
exit 1001 # Or work out some sleep_while_execution_lock elsewhere
fi
}
function rmlock {
[[ ! -d $__lockdir ]] \
&& echo "WARNING: Lock is missing. $__lockdir does not exist" \
|| rmdir $__lockdir
}
#-----------------------------------------------------------------------
# Private Signal Traps Functions {{{2
#
# DANGER: SIGKILL cannot be trapped. So, try not to `kill -9 PID` or
# there will be *NO CLEAN UP*. You'll have to manually remove
# any locks in place.
#-----------------------------------------------------------------------
function __sig_exit {
# Place your clean up logic here
# Remove the LOCK
[[ -n $LOCK_EXISTS ]] && rmlock
}
function __sig_int {
echo "WARNING: SIGINT caught"
exit 1002
}
function __sig_quit {
echo "SIGQUIT caught"
exit 1003
}
function __sig_term {
echo "WARNING: SIGTERM caught"
exit 1015
}
#=======================================================================
# Main
#=======================================================================
# Set TRAPs
trap __sig_exit EXIT # SIGEXIT
trap __sig_int INT # SIGINT
trap __sig_quit QUIT # SIGQUIT
trap __sig_term TERM # SIGTERM
mklock
# CODE
exit # No need for cleanup code here being in the __sig_exit trap function
Here's what will happen. All traps will produce an exit so the function __sig_exit will always happen (barring a SIGKILL) which cleans up your locks.
Note: my exit values are not low values. Why? Various batch processing systems make or have expectations of the numbers 0 through 31. Setting them to something else, I can have my scripts and batch streams react accordingly to the previous batch job or script.
Really quick and really dirty? This one-liner on the top of your script will work:
[[ $(pgrep -c "`basename \"$0\"`") -gt 1 ]] && exit
Of course, just make sure that your script name is unique. :)
Here's an approach that combines atomic directory locking with a check for stale lock via PID and restart if stale. Also, this does not rely on any bashisms.
#!/bin/dash
SCRIPTNAME=$(basename $0)
LOCKDIR="/var/lock/${SCRIPTNAME}"
PIDFILE="${LOCKDIR}/pid"
if ! mkdir $LOCKDIR 2>/dev/null
then
# lock failed, but check for stale one by checking if the PID is really existing
PID=$(cat $PIDFILE)
if ! kill -0 $PID 2>/dev/null
then
echo "Removing stale lock of nonexistent PID ${PID}" >&2
rm -rf $LOCKDIR
echo "Restarting myself (${SCRIPTNAME})" >&2
exec "$0" "$#"
fi
echo "$SCRIPTNAME is already running, bailing out" >&2
exit 1
else
# lock successfully acquired, save PID
echo $$ > $PIDFILE
fi
trap "rm -rf ${LOCKDIR}" QUIT INT TERM EXIT
echo hello
sleep 30s
echo bye
Create a lock file in a known location and check for existence on script start? Putting the PID in the file might be helpful if someone's attempting to track down an errant instance that's preventing execution of the script.
This example is explained in the man flock, but it needs some impovements, because we should manage bugs and exit codes:
#!/bin/bash
#set -e this is useful only for very stupid scripts because script fails when anything command exits with status more than 0 !! without possibility for capture exit codes. not all commands exits >0 are failed.
( #start subprocess
# Wait for lock on /var/lock/.myscript.exclusivelock (fd 200) for 10 seconds
flock -x -w 10 200
if [ "$?" != "0" ]; then echo Cannot lock!; exit 1; fi
echo $$>>/var/lock/.myscript.exclusivelock #for backward lockdir compatibility, notice this command is executed AFTER command bottom ) 200>/var/lock/.myscript.exclusivelock.
# Do stuff
# you can properly manage exit codes with multiple command and process algorithm.
# I suggest throw this all to external procedure than can properly handle exit X commands
) 200>/var/lock/.myscript.exclusivelock #exit subprocess
FLOCKEXIT=$? #save exitcode status
#do some finish commands
exit $FLOCKEXIT #return properly exitcode, may be usefull inside external scripts
You can use another method, list processes that I used in the past. But this is more complicated that method above. You should list processes by ps, filter by its name, additional filter grep -v grep for remove parasite nad finally count it by grep -c . and compare with number. Its complicated and uncertain
The existing answers posted either rely on the CLI utility flock or do not properly secure the lock file. The flock utility is not available on all non-Linux systems (i.e. FreeBSD), and does not work properly on NFS.
In my early days of system administration and system development, I was told that a safe and relatively portable method of creating a lock file was to create a temp file using mkemp(3) or mkemp(1), write identifying information to the temp file (i.e. PID), then hard link the temp file to the lock file. If the link was successful, then you have successfully obtained the lock.
When using locks in shell scripts, I typically place an obtain_lock() function in a shared profile and then source it from the scripts. Below is an example of my lock function:
obtain_lock()
{
LOCK="${1}"
LOCKDIR="$(dirname "${LOCK}")"
LOCKFILE="$(basename "${LOCK}")"
# create temp lock file
TMPLOCK=$(mktemp -p "${LOCKDIR}" "${LOCKFILE}XXXXXX" 2> /dev/null)
if test "x${TMPLOCK}" == "x";then
echo "unable to create temporary file with mktemp" 1>&2
return 1
fi
echo "$$" > "${TMPLOCK}"
# attempt to obtain lock file
ln "${TMPLOCK}" "${LOCK}" 2> /dev/null
if test $? -ne 0;then
rm -f "${TMPLOCK}"
echo "unable to obtain lockfile" 1>&2
if test -f "${LOCK}";then
echo "current lock information held by: $(cat "${LOCK}")" 1>&2
fi
return 2
fi
rm -f "${TMPLOCK}"
return 0;
};
The following is an example of how to use the lock function:
#!/bin/sh
. /path/to/locking/profile.sh
PROG_LOCKFILE="/tmp/myprog.lock"
clean_up()
{
rm -f "${PROG_LOCKFILE}"
}
obtain_lock "${PROG_LOCKFILE}"
if test $? -ne 0;then
exit 1
fi
trap clean_up SIGHUP SIGINT SIGTERM
# bulk of script
clean_up
exit 0
# end of script
Remember to call clean_up at any exit points in your script.
I've used the above in both Linux and FreeBSD environments.
Add this line at the beginning of your script
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$#" || :
It's a boilerplate code from man flock.
If you want more logging, use this one
[ "${FLOCKER}" != "$0" ] && { echo "Trying to start build from queue... "; exec bash -c "FLOCKER='$0' flock -E $E_LOCKED -en '$0' '$0' '$#' || if [ \"\$?\" -eq $E_LOCKED ]; then echo 'Locked.'; fi"; } || echo "Lock is free. Completing."
This sets and checks locks using flock utility.
This code detects if it was run first time by checking FLOCKER variable, if it is not set to script name, then it tries to start script again recursively using flock and with FLOCKER variable initialized, if FLOCKER is set correctly, then flock on previous iteration succeeded and it is OK to proceed. If lock is busy, it fails with configurable exit code.
It seems to not work on Debian 7, but seems to work back again with experimental util-linux 2.25 package. It writes "flock: ... Text file busy". It could be overridden by disabling write permission on your script.
When targeting a Debian machine I find the lockfile-progs package to be a good solution. procmail also comes with a lockfile tool. However sometimes I am stuck with neither of these.
Here's my solution which uses mkdir for atomic-ness and a PID file to detect stale locks. This code is currently in production on a Cygwin setup and works well.
To use it simply call exclusive_lock_require when you need get exclusive access to something. An optional lock name parameter lets you share locks between different scripts. There's also two lower level functions (exclusive_lock_try and exclusive_lock_retry) should you need something more complex.
function exclusive_lock_try() # [lockname]
{
local LOCK_NAME="${1:-`basename $0`}"
LOCK_DIR="/tmp/.${LOCK_NAME}.lock"
local LOCK_PID_FILE="${LOCK_DIR}/${LOCK_NAME}.pid"
if [ -e "$LOCK_DIR" ]
then
local LOCK_PID="`cat "$LOCK_PID_FILE" 2> /dev/null`"
if [ ! -z "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2> /dev/null
then
# locked by non-dead process
echo "\"$LOCK_NAME\" lock currently held by PID $LOCK_PID"
return 1
else
# orphaned lock, take it over
( echo $$ > "$LOCK_PID_FILE" ) 2> /dev/null && local LOCK_PID="$$"
fi
fi
if [ "`trap -p EXIT`" != "" ]
then
# already have an EXIT trap
echo "Cannot get lock, already have an EXIT trap"
return 1
fi
if [ "$LOCK_PID" != "$$" ] &&
! ( umask 077 && mkdir "$LOCK_DIR" && umask 177 && echo $$ > "$LOCK_PID_FILE" ) 2> /dev/null
then
local LOCK_PID="`cat "$LOCK_PID_FILE" 2> /dev/null`"
# unable to acquire lock, new process got in first
echo "\"$LOCK_NAME\" lock currently held by PID $LOCK_PID"
return 1
fi
trap "/bin/rm -rf \"$LOCK_DIR\"; exit;" EXIT
return 0 # got lock
}
function exclusive_lock_retry() # [lockname] [retries] [delay]
{
local LOCK_NAME="$1"
local MAX_TRIES="${2:-5}"
local DELAY="${3:-2}"
local TRIES=0
local LOCK_RETVAL
while [ "$TRIES" -lt "$MAX_TRIES" ]
do
if [ "$TRIES" -gt 0 ]
then
sleep "$DELAY"
fi
local TRIES=$(( $TRIES + 1 ))
if [ "$TRIES" -lt "$MAX_TRIES" ]
then
exclusive_lock_try "$LOCK_NAME" > /dev/null
else
exclusive_lock_try "$LOCK_NAME"
fi
LOCK_RETVAL="${PIPESTATUS[0]}"
if [ "$LOCK_RETVAL" -eq 0 ]
then
return 0
fi
done
return "$LOCK_RETVAL"
}
function exclusive_lock_require() # [lockname] [retries] [delay]
{
if ! exclusive_lock_retry "$#"
then
exit 1
fi
}
If flock's limitations, which have already been described elsewhere on this thread, aren't an issue for you, then this should work:
#!/bin/bash
{
# exit if we are unable to obtain a lock; this would happen if
# the script is already running elsewhere
# note: -x (exclusive) is the default
flock -n 100 || exit
# put commands to run here
sleep 100
} 100>/tmp/myjob.lock
Some unixes have lockfile which is very similar to the already mentioned flock.
From the manpage:
lockfile can be used to create one
or more semaphore files. If lock-
file can't create all the specified
files (in the specified order), it
waits sleeptime (defaults to 8)
seconds and retries the last file that
didn't succeed. You can specify the
number of retries to do until
failure is returned. If the number
of retries is -1 (default, i.e.,
-r-1) lockfile will retry forever.
I use a simple approach that handles stale lock files.
Note that some of the above solutions that store the pid, ignore the fact that the pid can wrap around. So - just checking if there is a valid process with the stored pid is not enough, especially for long running scripts.
I use noclobber to make sure only one script can open and write to the lock file at one time. Further, I store enough information to uniquely identify a process in the lockfile. I define the set of data to uniquely identify a process to be pid,ppid,lstart.
When a new script starts up, if it fails to create the lock file, it then verifies that the process that created the lock file is still around. If not, we assume the original process died an ungraceful death, and left a stale lock file. The new script then takes ownership of the lock file, and all is well the world, again.
Should work with multiple shells across multiple platforms. Fast, portable and simple.
#!/usr/bin/env sh
# Author: rouble
LOCKFILE=/var/tmp/lockfile #customize this line
trap release INT TERM EXIT
# Creates a lockfile. Sets global variable $ACQUIRED to true on success.
#
# Returns 0 if it is successfully able to create lockfile.
acquire () {
set -C #Shell noclobber option. If file exists, > will fail.
UUID=`ps -eo pid,ppid,lstart $$ | tail -1`
if (echo "$UUID" > "$LOCKFILE") 2>/dev/null; then
ACQUIRED="TRUE"
return 0
else
if [ -e $LOCKFILE ]; then
# We may be dealing with a stale lock file.
# Bring out the magnifying glass.
CURRENT_UUID_FROM_LOCKFILE=`cat $LOCKFILE`
CURRENT_PID_FROM_LOCKFILE=`cat $LOCKFILE | cut -f 1 -d " "`
CURRENT_UUID_FROM_PS=`ps -eo pid,ppid,lstart $CURRENT_PID_FROM_LOCKFILE | tail -1`
if [ "$CURRENT_UUID_FROM_LOCKFILE" == "$CURRENT_UUID_FROM_PS" ]; then
echo "Script already running with following identification: $CURRENT_UUID_FROM_LOCKFILE" >&2
return 1
else
# The process that created this lock file died an ungraceful death.
# Take ownership of the lock file.
echo "The process $CURRENT_UUID_FROM_LOCKFILE is no longer around. Taking ownership of $LOCKFILE"
release "FORCE"
if (echo "$UUID" > "$LOCKFILE") 2>/dev/null; then
ACQUIRED="TRUE"
return 0
else
echo "Cannot write to $LOCKFILE. Error." >&2
return 1
fi
fi
else
echo "Do you have write permissons to $LOCKFILE ?" >&2
return 1
fi
fi
}
# Removes the lock file only if this script created it ($ACQUIRED is set),
# OR, if we are removing a stale lock file (first parameter is "FORCE")
release () {
#Destroy lock file. Take no prisoners.
if [ "$ACQUIRED" ] || [ "$1" == "FORCE" ]; then
rm -f $LOCKFILE
fi
}
# Test code
# int main( int argc, const char* argv[] )
echo "Acquring lock."
acquire
if [ $? -eq 0 ]; then
echo "Acquired lock."
read -p "Press [Enter] key to release lock..."
release
echo "Released lock."
else
echo "Unable to acquire lock."
fi
I wanted to do away with lockfiles, lockdirs, special locking programs and even pidof since it isn't found on all Linux installations. Also wanted to have the simplest code possible (or at least as few lines as possible). Simplest if statement, in one line:
if [[ $(ps axf | awk -v pid=$$ '$1!=pid && $6~/'$(basename $0)'/{print $1}') ]]; then echo "Already running"; exit; fi
Actually although the answer of bmdhacks is almost good, there is a slight chance the second script to run after first checked the lockfile and before it wrote it. So they both will write the lock file and they will both be running. Here is how to make it work for sure:
lockfile=/var/lock/myscript.lock
if ( set -o noclobber; echo "$$" > "$lockfile") 2> /dev/null ; then
trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT
else
# or you can decide to skip the "else" part if you want
echo "Another instance is already running!"
fi
The noclobber option will make sure that redirect command will fail if file already exists. So the redirect command is actually atomic - you write and check the file with one command. You don't need to remove the lockfile at the end of file - it'll be removed by the trap. I hope this helps to people that will read it later.
P.S. I didn't see that Mikel already answered the question correctly, although he didn't include the trap command to reduce the chance the lock file will be left over after stopping the script with Ctrl-C for example. So this is the complete solution
An example with flock(1) but without subshell. flock()ed file /tmp/foo is never removed, but that doesn't matter as it gets flock() and un-flock()ed.
#!/bin/bash
exec 9<> /tmp/foo
flock -n 9
RET=$?
if [[ $RET -ne 0 ]] ; then
echo "lock failed, exiting"
exit
fi
#Now we are inside the "critical section"
echo "inside lock"
sleep 5
exec 9>&- #close fd 9, and release lock
#The part below is outside the critical section (the lock)
echo "lock released"
sleep 5
This one line answer comes from someone related Ask Ubuntu Q&A:
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$#" || :
# This is useful boilerplate code for shell scripts. Put it at the top of
# the shell script you want to lock and it'll automatically lock itself on
# the first run. If the env var $FLOCKER is not set to the shell script
# that is being run, then execute flock and grab an exclusive non-blocking
# lock (using the script itself as the lock file) before re-execing itself
# with the right arguments. It also sets the FLOCKER env var to the right
# value so it doesn't run again.
PID and lockfiles are definitely the most reliable. When you attempt to run the program, it can check for the lockfile which and if it exists, it can use ps to see if the process is still running. If it's not, the script can start, updating the PID in the lockfile to its own.
I find that bmdhack's solution is the most practical, at least for my use case. Using flock and lockfile rely on removing the lockfile using rm when the script terminates, which can't always be guaranteed (e.g., kill -9).
I would change one minor thing about bmdhack's solution: It makes a point of removing the lock file, without stating that this is unnecessary for the safe working of this semaphore. His use of kill -0 ensures that an old lockfile for a dead process will simply be ignored/over-written.
My simplified solution is therefore to simply add the following to the top of your singleton:
## Test the lock
LOCKFILE=/tmp/singleton.lock
if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}`; then
echo "Script already running. bye!"
exit
fi
## Set the lock
echo $$ > ${LOCKFILE}
Of course, this script still has the flaw that processes that are likely to start at the same time have a race hazard, as the lock test and set operations are not a single atomic action. But the proposed solution for this by lhunath to use mkdir has the flaw that a killed script may leave behind the directory, thus preventing other instances from running.
The semaphoric utility uses flock (as discussed above, e.g. by presto8) to implement a counting semaphore. It enables any specific number of concurrent processes you want. We use it to limit the level of concurrency of various queue worker processes.
It's like sem but much lighter-weight. (Full disclosure: I wrote it after finding the sem was way too heavy for our needs and there wasn't a simple counting semaphore utility available.)
Answered a million times already, but another way, without the need for external dependencies:
LOCK_FILE="/var/lock/$(basename "$0").pid"
trap "rm -f ${LOCK_FILE}; exit" INT TERM EXIT
if [[ -f $LOCK_FILE && -d /proc/`cat $LOCK_FILE` ]]; then
// Process already exists
exit 1
fi
echo $$ > $LOCK_FILE
Each time it writes the current PID ($$) into the lockfile and on script startup checks if a process is running with the latest PID.
Using the process's lock is much stronger and takes care of the ungraceful exits also.
lock_file is kept open as long as the process is running. It will be closed (by shell) once the process exists (even if it gets killed).
I found this to be very efficient:
lock_file=/tmp/`basename $0`.lock
if fuser $lock_file > /dev/null 2>&1; then
echo "WARNING: Other instance of $(basename $0) running."
exit 1
fi
exec 3> $lock_file
I use oneliner # the very beginning of script:
#!/bin/bash
if [[ $(pgrep -afc "$(basename "$0")") -gt "1" ]]; then echo "Another instance of "$0" has already been started!" && exit; fi
.
the_beginning_of_actual_script
It is good to see the presence of process in the memory (no matter what the status of process is); but it does the job for me.
If you do not want to or cannot use flock (e.g. you are not using a shared file system), consider using an external service like lockable.
It exposes advisory lock primitives, much like flock would. In particular, you can acquire a lock via:
https://lockable.dev/api/acquire/my-lock-name
and release it via
https://lockable.dev/api/release/my-lock-name
By wrapping script execution with lock acquisition and release, you can make sure only a single instance of the process is running at any given time.

Linux - Run script after time period expires

I have a small NodeJS script that does some processing. Depending on the amount of data needing to be processed, this can take a couple of seconds to hours.
What I want is to do is schedule this command to run every hour after the previous attempt has completed. I'm wary of using something like cron because I need to ensure that two instances of the script aren't running at the same.
If you really don't like cron (or at) you can just use a simple bash script:
#!/bin/bash
while true
do
#Do something
echo Invoke long-running node.js script
#Wait an hour
sleep 3600
done
The (obvious) drawback is that you will have to make it run in background somehow (i.e. via nohup or screen) and add a proper error handling (taking that you script might fail, and you still want it to run again in an hour).
A bit more elaborate "custom script" solution might be like that:
#!/bin/bash
#Settings
LAST_RUN_FILE=/var/run/lock/hourly.timestamp
FLOCK_LOCK_FILE=/var/run/lock/hourly.lock
FLOCK_FD=100
#Minimum time to wait between two job runs
MIN_DELAY=3600
#Welcome message, parameter check
if [ -z $1 ]
then
echo "Please specify the command (job) to run, as follows:"
echo "./hourly COMMAND"
exit 1
fi
echo "[$(date)] MIN_DELAY=$MIN_DELAY seconds, JOB=$#"
#Set an exclusive lock, or skip execution if it is already set
eval "exec $FLOCK_FD>$FLOCK_LOCK_FILE"
if ! flock -n $FLOCK_FD
then
echo "Lock is already set, skipping execution."
exit 0
fi
#Last run timestamp
if ! [ -e $LAST_RUN_FILE ]
then
echo "Timestamp file ($LAST_RUN_FILE) is missing, creating a new one."
echo 0 >$LAST_RUN_FILE
fi
#Compute delay, and wait
let DELAY="$MIN_DELAY-($(date +%s)-$(cat $LAST_RUN_FILE))"
if [ $DELAY -gt 0 ]
then
echo "Waiting for $DELAY seconds, before proceeding..."
sleep $DELAY
fi
#Proceed with an actual task
echo "[$(date)] Running the task..."
echo
"$#"
#Update the last run timestamp
echo
echo "Done, going to update the last run timestamp now."
date +%s >$LAST_RUN_FILE
This will do 2 things:
Set an exclusive execution lock (with flock), so that no two instances of the job will run at the same time, irregardless of how you start them (manually or via cron e.t.c.);
If the last job was completed less then MIN_DELAY seconds ago,
it will sleep for the remaining time, before running the job again;
Now, if you schedule this script to run, say every 15 minutes with cron, like that:
* * * * * /home/myuser/hourly my_periodic_task and it's arguments
It will be guaranteed to execute with the fixed delay of at least MIN_DELAY (one hour) since the last job completed, and any intermediate runs will be skipped.
In the worst case, it will execute in MIN_DELAY + 15 minutes,
(as the scheduling period is discrete), but never earlier than that.
Other non-cron scheduling methods should work too (i.e. just running this script in a loop, or re-scheduling and each run with at).
You can use a cron and add process.exit(0) to your node script

How to check if file size is not incrementing ,if not then kill the $$ of script

I am trying to figure out a way to monitor the files I am dumping from my script. If there is no increment seen in child files then kill my script.
I am doing this to free up the resources when not needed. Here is what I think of , but I think my apporch is going to add burden to CPU. Can anyone please suggest more efficent way of doing this?
Below script is suppose to poll in every 15 sec and collect two file size of same file, if the size of the two samples are same then exit.
checkUsage() {
while true; do
sleep 15
fileSize=$(stat -c%s $1)
sleep 10;
fileSizeNew=$(stat -c%s $1)
if [ "$fileSize" == "$fileSizeNew" ]; then
echo -e "[Error]: No activity noted on this window from 5 sec. Exiting..."
kill -9 $$
fi
done
}
And I am planning to call it as follow (in background):
checkUsage /var/log/messages &
I can also get solution if, someone suggest how to monitor tail command and if nothing printing on tail then exit. NOT SURE WHY PEOPLE ARE CONFUSED. End goal of this question is to ,check if the some file is edited in last 15 seconds. If not exit or throw some error.
I have achived this by above script,but I don't know if this is the smartest way of achiveing this. I have asked this question to know views from other if there is any alternative way or better way of doing it.
I would based the check on file modification time instead of size, so something like this (untested code):
checkUsage() {
while true; do
# Test if file mtime is 'second arg' seconds older than date, default to 10 seconds
if [ $(( $(date +"%s") - $(stat -c%Y /var/log/message) )) -gt ${2-10} ]; then
echo -e "[Error]: No activity noted on this window from ${2-10} sec. Exiting..."
return 1
fi
#Sleep 'first arg' second, 15 seconds by default
sleep ${1-15}
done
}
The idea is to compare the file mtime with current time, if it's greater than second argument seconds, print the message and return.
And then I would call it like this later (or with no args to use defaults):
[ checkusage 20 10 ] || exit 1
Which would exit the script with code 1 as when the function return from it's infinite loop (as long as the file is modified)
Edit: reading me again, the target file could be a parameter too, to allow a better reuse of the function, left as an exercise to the reader.
If on Linux, in a local file system (Ext4, BTRFS, ...) -not a network file system- then you could consider inotify(7) facilities: something could be triggered when some file or directory changes or is accessed.
In particular, you might have some incron job thru incrontab(5) file; maybe it could communicate with some other job ...
PS. I am not sure to understand what you really want to do...
I suppose an external programme is modifying /var/log/messages.
If this is the case, below is my script (with minor changes to yours)
#Bash script to monitor changes to file
#!/bin/bash
checkUsage() # Note that U is in caps
{
while true
do
sleep 15
fileSize=$(stat -c%s $1)
sleep 10;
fileSizeNew=$(stat -c%s $1)
if [ "$fileSize" == "$fileSizeNew" ]
then
echo -e "[Notice : ] no changes noted in $1 : gracefully exiting"
exit # previously this was kill -9 $$
# changing this to exit would end the program gracefully.
# use kill -9 to kill a process which is not under your control.
# kill -9 sends the SIGKILL signal.
fi
done
}
checkUsage $1 # I have added this to your script
#End of the script
Save the script as checkusage and run it like :
./checkusage /var/log/messages &
Edit :
Since you're looking for better solutions I would suggest inotifywait, thanks for the suggestion from the other answerer.
Below would be my code :
while inotifywait -t 10 -q -e modify $1 >/dev/null
do
sleep 15 # as you said the polling would happen in 15 seconds.
done
echo "Script exited gracefully : $1 has not been changed"
Below are the details from the inotifywait manpage
-t <seconds>, --timeout <seconds> Exit if an appropriate event has not occurred within <seconds> seconds. If <seconds> is zero (the default),
wait indefinitely for an event.
-e <event>, --event <event> Listen for specific event(s) only. The events which can be listened for are listed in the EVENTS section.
This option can be specified more than once. If omitted, all events
are listened for.
-q, --quiet If specified once, the program will be less verbose. Specifically, it will not state when it has completed establishing all
inotify watches.
modify(Event) A watched file or a file within a watched directory was
written to.
Notes
You might have to install the inotify-tools first to make use of the inotifywait command. Check the inotify-tools page at Github.

Resources