Prevent race condition when creating lock file - linux

My script must not be run more than once concurrently. So it creates a lock file, and deletes it before exiting. It checks that the lock file doesn't exist before starting its work.
A very common approach to locking is something like this:
function setupLockFile() {
if (set -o noclobber; echo "lock" > "$lockfile") 2>/dev/null; then
trap "rm -f $lockfile; exit $?" INT TERM EXIT
else
echo "Script running... exiting!"
exit 1
fi
}
However there is a race condition - the if creates the file if it doesn't exist, and the script could be terminated before the trap is defined. Then the lockfile will not be deleted.
So what is a safe way to do this?

That's not a race - it's resilience to failure. In situations where the script dies before it can remove the file, you need manual cleanup.
The usual way to try to automate this cleanup is to read the PID from any existing lock file, test whether that process still exists, and ignore the file if it doesn't. Unfortunately, without an atomic compare-and-set operation that's not trivial to do correctly, since it introduces a new race between reading the PID and some other process deciding that the stale file can be ignored.
Check out this question for more ideas around locking using just the file system.
My advice is to either store the lock file on a temporary filesystem (/var/run is usually tmpfs to permit pidfiles to disappear safely on reboot) so that things fix themselves after a reboot, or have the script throw up its hands and ask for manual intervention. Handling every failure case reliably increases complexity and thus probably introduces more probability of failure than asking a human for help.
And complexity isn't just today, it's for the lifetime of the code. It might be correct when you're done, but will the next person along break it?

Let's try another approach:
set up the trap before the lock file is created
store the PID in the lock file
make the trap check whether the PID of the current instance matches what is in the lockfile
For example:
trap "cleanUp" INT TERM EXIT
function cleanUp {
if [[ $$ -eq $(<$lockfile) ]]; then
rm -f $lockfile
exit $?
fi
}
function setupLockFile {
if ! (set -o noclobber; echo "$$" > "$lockfile") 2>/dev/null; then
echo "Script running... exiting!"
exit 1
fi
}
This way you keep the check for the lock file's existence and its creation as a single operation, while also preventing the trap from deleting a lockfile that belongs to another running instance.
Additionally, as I mentioned in the comments below, in case the lock file already exists I'd suggest checking whether a process with the given PID is actually running.
Because you never know whether, for whatever reason, an orphaned lock file has been left behind on disk.
So if you want to mitigate the need for manual removal of orphaned lock files, you can add extra logic to check whether the PID in the file is orphaned or not.
For example, if no running process with the PID from the lock file is found, you can assume it is an orphaned lock file from a previous instance, overwrite it with your current PID and continue.
If a process is found, you can compare its name to see whether it really is another instance of the same script - if not, you can overwrite the PID in the lock file and continue.
I did not include this in the code to keep it simple, you can try to create this logic by yourself if you want. :)
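If you'd rather not write that from scratch, here is a rough sketch of such a check (the function name and the use of kill -0 / ps to probe the PID are illustrative choices, not part of the answer above):

    # Sketch: decide whether an existing lock file is stale.
    isStaleLock() {
        [[ -r "$lockfile" ]] || return 1            # can't read it: leave it alone
        local oldpid
        oldpid=$(<"$lockfile")
        # No process with that PID at all? The lock is an orphan.
        kill -0 "$oldpid" 2>/dev/null || return 0
        # A process exists - but is it really another instance of this script?
        if ps -o args= -p "$oldpid" | grep -qF "$(basename "$0")"; then
            return 1                                # a live instance holds the lock
        fi
        return 0                                    # PID was recycled by something else: stale
    }

If isStaleLock succeeds you can overwrite the file with your own PID and continue; otherwise exit as before.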

First check for lockfile, then trap, then write to it:
function setupLockFile() {
    if [ -f "$lockfile" ]; then
        echo "Script running... exiting!"
        exit 1
    else
        trap "rm -f $lockfile; exit $?" INT TERM EXIT
        set -o noclobber; echo "lock" > "$lockfile" || exit 1
    fi
}
And there is an "official" way to check lockfiles with flock command which is part of util-linux.
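For completeness, a minimal sketch of the flock approach (the descriptor number and lock path are arbitrary choices, not required by flock):

    #!/bin/bash
    # Keep file descriptor 9 open on the lock file for the lifetime of the script.
    exec 9>/var/lock/myscript.lock
    # -n: fail immediately instead of blocking if another instance holds the lock.
    if ! flock -n 9; then
        echo "Script running... exiting!" >&2
        exit 1
    fi
    # ... do the real work; the kernel drops the lock when the script exits,
    # even if it is killed, so no cleanup trap is needed.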


How best to implement atomic update on a file inside a bash script [duplicate]

This question already has answers here:
Quick-and-dirty way to ensure only one instance of a shell script is running at a time
(43 answers)
Closed 3 years ago.
I have a script which has multiple functions, running in parallel, that check a file and update it frequently. I don't want two functions to update the file at the same time and create an issue. So what would be the best way to do an atomic update? I have the following so far.
counter(){
a=$1
while true;do
if [ ! -e /tmp/counter.lock ];then
touch /tmp/counter.lock
curr_count=`cat /tmp/count.txt`
n_count=`echo "${curr_count} + $a" | bc`
echo ${n_count} > /tmp/count.txt
rm -fv /tmp/counter.lock
break
fi
sleep 1
done
}
I am not sure how to convert my function to use flock, since it uses a file descriptor and that might create an issue if I call this function multiple times (or so I think).
flock works by letting anyone open the lock file, but blocking if someone else locks it first. In your code, a second process could test for the existence of the lock after you see it doesn't exist but before you actually create it.
counter () {
    a=$1
    {
        flock -x 200
        read current_count < /tmp/count.txt
        ...
        echo new_count > /tmp/count.txt
    } 200> /tmp/counter.lock
}
Here, two processes can open /tmp/counter.lock for writing. In one process, flock will get the lock and exit immediately. In the other, flock will block until the first process releases the lock by closing its file descriptor once the command block completes.
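So a full conversion of the counter from the question might look like the sketch below. It swaps bc for (integer-only) shell arithmetic; calling the function many times is fine, because descriptor 200 is reopened and closed on every call:

    counter () {
        a=$1
        {
            flock -x 200                     # block until we hold an exclusive lock
            curr_count=$(< /tmp/count.txt)
            n_count=$(( curr_count + a ))
            echo "$n_count" > /tmp/count.txt
        } 200> /tmp/counter.lock             # closing fd 200 releases the lock
    }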

Raise error in a Bash script

I want to raise an error in a Bash script with message "Test cases Failed !!!". How to do this in Bash?
For example:
if [ condition ]; then
raise error "Test cases failed !!!"
fi
This depends on where you want the error message to be stored.
You can do the following:
echo "Error!" > logfile.log
exit 125
Or the following:
echo "Error!" 1>&2
exit 64
When you raise an exception you stop the program's execution.
You can also use something like exit xxx where xxx is the error code you may want to return to the operating system (from 0 to 255). Here 125 and 64 are just random codes you can exit with. When you need to indicate to the OS that the program stopped abnormally (eg. an error occurred), you need to pass a non-zero exit code to exit.
As @chepner pointed out, you can do exit 1, which will mean an unspecified error.
Basic error handling
If your test case runner returns a non-zero code for failed tests, you can simply write:
test_handler test_case_x; test_result=$?
if ((test_result != 0)); then
printf '%s\n' "Test case x failed" >&2 # write error message to stderr
exit 1 # or exit $test_result
fi
Or even shorter:
if ! test_handler test_case_x; then
printf '%s\n' "Test case x failed" >&2
exit 1
fi
Or the shortest:
test_handler test_case_x || { printf '%s\n' "Test case x failed" >&2; exit 1; }
To exit with test_handler's exit code:
test_handler test_case_x || { ec=$?; printf '%s\n' "Test case x failed" >&2; exit $ec; }
Advanced error handling
If you want to take a more comprehensive approach, you can have an error handler:
exit_if_error() {
local exit_code=$1
shift
[[ $exit_code ]] && # do nothing if no error code passed
((exit_code != 0)) && { # do nothing if error code is 0
printf 'ERROR: %s\n' "$@" >&2 # we can use better logging here
exit "$exit_code" # we could also check to make sure
# error code is numeric when passed
}
}
then invoke it after running your test case:
run_test_case test_case_x
exit_if_error $? "Test case x failed"
or
run_test_case test_case_x || exit_if_error $? "Test case x failed"
The advantages of having an error handler like exit_if_error are:
we can standardize all the error handling logic such as logging, printing a stack trace, notification, doing cleanup etc., in one place
by making the error handler get the error code as an argument, we can spare the caller from the clutter of if blocks that test exit codes for errors
if we have a signal handler (using trap), we can invoke the error handler from there
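For instance, here is a minimal sketch of that last point, wiring exit_if_error into an ERR trap (the message format is just an illustration):

    # Any failing command that is not part of a condition now funnels through exit_if_error.
    trap 'exit_if_error $? "command failed: $BASH_COMMAND"' ERR

    run_test_case test_case_x   # on failure, the trap reports the error and exits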
Error handling and logging library
Here is a complete implementation of error handling and logging:
https://github.com/codeforester/base/blob/master/lib/stdlib.sh
Related posts
Error handling in Bash
The 'caller' builtin command on Bash Hackers Wiki
Are there any standard exit status codes in Linux?
BashFAQ/105 - Why doesn't set -e (or set -o errexit, or trap ERR) do what I expected?
Equivalent of __FILE__, __LINE__ in Bash
Is there a TRY CATCH command in Bash
To add a stack trace to the error handler, you may want to look at this post: Trace of executed programs called by a Bash script
Ignoring specific errors in a shell script
Catching error codes in a shell pipe
How do I manage log verbosity inside a shell script?
How to log function name and line number in Bash?
Is double square brackets [[ ]] preferable over single square brackets [ ] in Bash?
There are a couple more ways you can approach this problem. Assume one of your requirements is to run a shell script/function containing a few shell commands, check whether the script ran successfully, and throw errors in case of failure.
Shell commands generally rely on the exit codes they return to let the shell know whether they succeeded or failed due to some unexpected event.
So what you want to do falls into these two categories:
exit on error
exit and clean-up on error
Depending on which one you want to do, there are shell options available. For the first case, the shell provides set -e, and for the second you can set a trap on EXIT.
Should I use exit in my script/function?
Using exit generally enhances readability. In certain routines, once you know the answer, you want to exit to the calling routine immediately. If the routine is defined in such a way that it doesn't require any further cleanup once it detects an error, not exiting immediately means that you have to write more code.
So in cases where you need clean-up actions to make the termination of the script clean, it is preferable not to use exit.
Should I use set -e for error on exit?
No!
set -e was an attempt to add "automatic error detection" to the shell. Its goal was to make the shell abort any time an error occurred, but it comes with a lot of potential pitfalls. For example:
Commands that are part of an if test are immune. In the example below, if you expect the script to break on the test of the non-existent directory, it won't - execution goes straight through and "survived" is printed.
set -e
f() { test -d nosuchdir && echo no dir; }
f
echo survived
Commands in a pipeline other than the last one are immune. In the example below, only the exit code of the most recently executed (rightmost) command, cat, is considered, and it was successful. This can be avoided with the set -o pipefail option, but it's still a caveat.
set -e
somecommand that fails | cat -
echo survived
Recommended for use - trap on ERR
The verdict is: if you want to be able to handle an error instead of blindly exiting, use a trap on the ERR pseudo signal instead of set -e.
The ERR trap does not run code when the shell itself exits with a non-zero error code, but when any command run by that shell that is not part of a condition (like in if cmd, or cmd ||) exits with a non-zero exit status.
The general practice is to define a trap handler that provides additional debug information about which line and which command caused the exit. Remember that the exit code of the last command that triggered the ERR trap is still available at this point.
cleanup() {
exitcode=$?
printf 'error condition hit\n' 1>&2
printf 'exit code returned: %s\n' "$exitcode"
printf 'the command executing at the time of the error was: %s\n' "$BASH_COMMAND"
printf 'command present on line: %d\n' "${BASH_LINENO[0]}"
# Some more clean up code can be added here before exiting
exit $exitcode
}
and we just install this handler, as shown below, at the top of the failing script
trap cleanup ERR
Putting this together in a simple script that contains false on line 15, the information you get looks like this:
error condition hit
exit code returned: 1
the command executing at the time of the error was: false
command present on line: 15
Irrespective of errors, trap can also be used to run the cleanup when the shell completes (e.g. when your shell script exits), by trapping the EXIT pseudo signal. You can also trap multiple signals at the same time. The list of supported signals to trap on can be found in the trap.1p Linux manual page.
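A minimal sketch of such an EXIT trap (the temporary file is only an example of a resource to clean up):

    tmpfile=$(mktemp)
    # Runs on any termination of the script - normal exit, exit N, or a fatal signal
    # handled by the shell - so the temporary file never lingers.
    trap 'rm -f "$tmpfile"' EXIT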
Another thing to note is that none of the methods described above work if sub-shells are involved; in that case, you need to add your own error handling.
With a sub-shell, set -e wouldn't work: the false is restricted to the sub-shell and never gets propagated to the parent shell. To handle the error here, add your own logic, such as (false) || false.
set -e
(false)
echo survived
The same happens with trap also. The logic below wouldn't work for the reasons mentioned above.
trap 'echo error' ERR
(false)
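For completeness, the workaround mentioned above looks like this; the extra || false re-raises the sub-shell's failure in the parent:

    set -e
    (false) || false   # the parent now sees a failing command and aborts
    echo survived      # never reached

The same (false) || false pattern makes the ERR trap variant fire as well.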
Here's a simple trap that prints the last argument of whatever failed to STDERR, reports the line it failed on, and exits the script with the line number as the exit code. Note these are not always great ideas, but this demonstrates some creative application you could build on.
trap 'echo >&2 "$_ at $LINENO"; exit $LINENO;' ERR
I put that in a script with a loop to test it. I just check for a hit on some random numbers; you might use actual tests. If I need to bail, I call false (which triggers the trap) with the message I want to throw.
For elaborated functionality, have the trap call a processing function. You can always use a case statement on your arg ($_) if you need to do more cleanup, etc. Assign to a var for a little syntactic sugar -
trap 'echo >&2 "$_ at $LINENO"; exit $LINENO;' ERR
throw=false
raise=false
while :
do x=$(( $RANDOM % 10 ))
case "$x" in
0) $throw "DIVISION BY ZERO" ;;
3) $raise "MAGIC NUMBER" ;;
*) echo got $x ;;
esac
done
Sample output:
# bash tst
got 2
got 8
DIVISION BY ZERO at 6
# echo $?
6
Obviously, you could
runTest1 "Test1 fails" # message not used if it succeeds
Lots of room for design improvement.
The drawbacks include the fact that false isn't pretty (hence the sugar), and other things tripping the trap might look a little stupid. Still, I like this method.
You have two options: redirect the output of the script to a file, or introduce a log file in the script.
Redirecting output to a file:
Here you assume that the script outputs all necessary info, including warning and error messages. You can then redirect the output to a file of your choice.
./runTests &> output.log
The above command redirects both the standard output and the error output to your log file.
Using this approach you don't have to introduce a log file in the script, and so the logic is a tiny bit easier.
Introduce a log file to the script:
In your script add a log file either by hard coding it:
logFile='./path/to/log/file.log'
or passing it by a parameter:
logFile="${1}" # This assumes the first parameter to the script is the log file
It's a good idea to add the timestamp at the time of execution to the log file at the top of the script:
date '+%Y%m%d-%H%M%S' >> "${logFile}"
You can then redirect your error messages to the log file
if [ condition ]; then
echo "Test cases failed!!" >> "${logFile}";
fi
This will append the error to the log file and continue execution. If you want to stop execution when critical errors occur, you can exit the script:
if [ condition ]; then
echo "Test cases failed!!" >> "${logFile}";
# Clean up if needed
exit 1;
fi
Note that exit 1 indicates that the program stopped execution due to an unspecified error. You can customize this if you like.
Using this approach you can customize your logs and have a different log file for each component of your script.
If you have a relatively small script, or want to execute somebody else's script without modifying it, the first approach is more suitable.
If you always want the log file to be in the same location, the second is the better option of the two. Also, if you have created a big script with multiple components, you may want to log each part differently, and then the second approach is your only option.
I often find it useful to write a function to handle error messages so the code is cleaner overall.
# Usage: die [exit_code] [error message]
die() {
local code=$? now=$(date +%T.%N)
if [ "$1" -ge 0 ] 2>/dev/null; then # assume $1 is an error code if numeric
code="$1"
shift
fi
echo "$0: ERROR at ${now%???}${1:+: $*}" >&2
exit $code
}
This takes the error code from the previous command and uses it as the default error code when exiting the whole script. It also notes the time, with microseconds where supported (GNU date's %N is nanoseconds, which we truncate to microseconds later).
If the first argument is zero or a positive integer, it becomes the exit code and we remove it from the list of arguments. We then report the message to standard error, with the name of the script, the word "ERROR", and the time (we use parameter expansion to truncate nanoseconds to microseconds, or for non-GNU date, to truncate the literal 12:34:56.%N to 12:34:56). A colon and space are added after the word ERROR, but only when there is a provided error message. Finally, we exit the script using the previously determined exit code, triggering any traps as normal.
Some examples (assume the code lives in script.sh):
if [ condition ]; then die 123 "condition not met"; fi
# exit code 123, message "script.sh: ERROR at 14:58:01.234564: condition not met"
$command |grep -q condition || die 1 "'$command' lacked 'condition'"
# exit code 1, "script.sh: ERROR at 14:58:55.825626: 'foo' lacked 'condition'"
$command || die
# exit code comes from command's, message "script.sh: ERROR at 14:59:15.575089"

wanted to know pid for outcome of script

./abcd.sh # this script is responsible for running Java code that creates a zip file in /tmp/abcd/
Sometimes the abcd.sh script takes 30 seconds to create the zip file, and sometimes it takes 60 seconds.
Since I don't have permission to edit the abcd.sh file, I wrote this code to get the PID of ./abcd.sh, but I don't know how to get the PID of its child process.
./abcd.sh &
pid=$!
wait $pid
This code waits until ./abcd.sh finishes, but it does not wait until the zip file is complete.
Is there any way to wait until the zip file has been created? My idea is that if we can find out the PID of the zip file creation we can use wait $zipfilepid, but I'm not sure how to get that PID.
./abcd.sh
sleep 60
I know sleep is an alternative, but I don't want to keep waiting after the zip file has already been created.
If you know the name of the "zip" program, you can find it in the process list and then determine its process ID. Assuming the name of the process you're waiting on is simply "zip", you can use something like this:
PROCNAME="zip"
PROCRUN=`ps h -ef|egrep -e "$PROCNAME"|grep -v " $$ "`
PIDS=`echo "$PROCRUN"|tr -s " "|cut -d" " -f2`
wait $PIDS
This will work for multiple PIDs. You may need to alter the PROCNAME variable's value to better target a specific process, such as:
PROCNAME=" zip "
PROCNAME="zip name-of-file-to-zip"
PROCNAME="zipscriptname"
...etc. Hope this helps.
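Note that wait only works on direct children of the calling shell; if the zip process is started further down the process tree (as it is here, via the Java code), a polling loop is an alternative. A sketch, assuming the process command line contains "zip":

    # Poll until no process whose command line matches "zip" is left running.
    while pgrep -f "zip" > /dev/null; do
        sleep 1
    done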
After doing some research, I came to the conclusion that this approach may not be possible, because I don't have control over the Java code that creates the zip file or over the shell script (abcd.sh) that runs it.
The approach I took instead:
find out whether the zip file is still open for writing using /usr/sbin/lsof /path/to/zip/abcd.zip; with this command $? is 0 if the file is open/being written, and 1 if the file is complete/closed.
put $? in a while loop: while $? -eq 0, sleep; otherwise exit the loop.
/usr/sbin/lsof /path/to/zip/abcd.zip
rc=$?
while [ $rc -eq 0 ]
do
    echo "Zip file is still being created, sleeping for 3 seconds"
    sleep 3
    /usr/sbin/lsof /path/to/zip/abcd.zip
    rc=$?
done
Please let me know if you have better approach.

Re-run bash script if another instance was invoked

I have a bash script that may be invoked multiple times simultaneously. To protect the state information (saved in a /tmp file) that the script accesses, I am using file locking like this:
do_something()
{
...
}
# Check if there are any other instances of the script; if so, exit
exec 8>$LOCK
if ! flock -n -x 8; then
exit 1
fi
# script does something...
do_something
Now any other instance that was invoked when this script was running exits. I want the script to run only one extra time if there were n simultaneous invocations, not n-times, something like this:
do_something()
{
...
}
# Check if there are any other instances of the script; if so, exit
exec 8>$LOCK
if ! flock -n -x 8; then
exit 1
fi
# script does something...
do_something
# check if another instance was invoked; if so, re-run do_something again
if [ condition ]; then
do_something
fi
How can I go about doing this? Touching a file inside the flock before quitting and having that file as the condition for the second if doesn't seem to work.
Have one flag (a request file) to signal that something needs doing, and always set it. Have a separate lock flag that is unset by the execution part.
REQUEST_FILE=/tmp/please_do_something
LOCK_FILE=/tmp/doing_something
# request running
touch $REQUEST_FILE
# lock and run
if ln -s /proc/$$ $LOCK_FILE 2>/dev/null ; then
while [ -e $REQUEST_FILE ]; do
do_something
rm $REQUEST_FILE
done
rm $LOCK_FILE
fi
If you want to ensure that "do_something" is run exactly once for each time the whole script is run, then you need to create some kind of a queue. The overall structure is similar.
They're not everyone's favourite, but I've always been a fan of symbolic links for making lockfiles, since they're atomic. For example:
lockfile=/var/run/`basename $0`.lock
if ! ln -s "pid=$$ when=`date '+%s'` status=$something" "$lockfile"; then
echo "Can't set lock." >&2
exit 1
fi
By encoding useful information directly into the link target, you eliminate the race condition introduced by writing to files.
That said, the link that Dennis posted provides much more useful information that you should probably try to understand before writing much more of your script. My example above is sort of related to BashFAQ/045 which suggests doing a similar thing with mkdir.
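Since the link target is just a string and doesn't have to point at a real file, the information can be read back later for diagnostics:

    # Who holds the lock, and since when?
    readlink "$lockfile"
    # prints something like: pid=12345 when=1700000000 status=...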
If I understand your question correctly, then what you want might be achieved (slightly unreliably) by using two lock files. If setting the first lock fails, we try the second lock. If setting the second lock fails, we exit. The error window exists if the first lock is deleted after we check it but before we check the second, existing lock. If that level of risk is acceptable to you, that's great.
This is untested; but it looks reasonable to me.
#!/usr/local/bin/bash
lockbase="/tmp/test.lock"
setlock() {
if ln -s "pid=$$" "$lockbase".$1 2>/dev/null; then
trap "rm \"$lockbase\".$1" 0 1 2 5 15
else
return 1
fi
}
if setlock 1 || setlock 2; then
echo "I'm in!"
do_something_amazing
else
echo "No lock - aborting."
fi
Please see Process Management.

How can I change the current directory in a thread-safe manner in Perl?

I'm using Thread::Pool::Simple to create a few working threads. Each working thread does some stuff, including a call to chdir followed by an execution of an external Perl script (from the jbrowse genome browser, if it matters). I use capturex to call the external script and die on its failure.
I discovered that when I use more than one thread, things start to get messy. After some research, it seems that the current directory of some threads is not the correct one.
Perhaps chdir propagates between threads (i.e. isn't thread-safe)?
Or perhaps it's something with capturex?
So, how can I safely set the working directory for each thread?
** UPDATE **
Following the suggestions to change the directory while executing, I'd like to ask how exactly I should pass these two commands to capturex?
currently I have:
my @args = ( "bin/flatfile-to-json.pl", "--gff=$gff_file", "--tracklabel=$track_label", "--key=$key", @optional_args );
capturex( [0], @args );
How do I add another command to @args?
Will capturex continue die on errors of any of the commands?
I think that you can solve your "how do I chdir in the child before running the command" problem pretty easily by abandoning IPC::System::Simple as not the right tool for the job.
Instead of doing
my $output = capturex($cmd, @args);
do something like:
use autodie qw(open close);
my $pid = open my $fh, '-|';
unless ($pid) { # this is the child
chdir($wherever);
exec($cmd, @args) or exit 255;
}
my $output = do { local $/; <$fh> };
# If child exited with error or couldn't be run, the exception will
# be raised here (via autodie; feel free to replace it with
# your own handling)
close ($fh);
If you were getting a list of lines instead of scalar output from capturex, the only thing that needs to change is the second-to-last line (to my @output = <$fh>;).
More info on forking-open is in perldoc perlipc.
The good thing about this in preference to capture("chdir wherever ; $cmd @args") is that it doesn't give the shell a chance to do bad things to your @args.
Updated code (doesn't capture output)
my $pid = fork;
die "Couldn't fork: $!" unless defined $pid;
unless ($pid) { # this is the child
chdir($wherever);
open STDOUT, ">/dev/null"; # optional: silence subprocess output
open STDERR, ">/dev/null"; # even more optional
exec($cmd, @args) or exit 255;
}
wait;
die "Child error $?" if $?;
I don't think "current working directory" is a per-thread property. I'd expect it to be a property of the process.
It's not clear exactly why you need to use chdir at all though. Can you not launch the external script setting the new process's working directory appropriately instead? That sounds like a more feasible approach.
