I have implemented a file locking mechanism along the lines of the suggestion from the linux man page for "open", which states:
Portable programs that want to perform atomic file locking using a
lockfile, and need to avoid reliance on NFS support for O_EXCL, can
create a unique file on the same file system (e.g., incorporating
hostname and PID), and use link(2) to make a link to the lockfile. If
link(2) returns 0, the lock is successful. Otherwise, use stat(2) on
the unique file to check if its link count has increased to 2, in
which case the lock is also successful.
This seems to work perfectly, however to get 100% code coverage in my testing, I need to cover the case where the link count is increased to 2.
I've tried googling, but all I seem to be able to find is the same reference above regurgitated as "the way it's done".
Can anybody explain to me what set of circumstances would cause the link to fail (returns -1), but the link count is increased to 2?
The answer to your question is provided at the bottom of the link(2) page of the Linux Programmer's Manual:
On NFS file systems, the return code may be wrong in case the NFS
server performs the link creation and dies before it can say so. Use
stat(2) to find out if the link got created.
Creating another file is more trouble than anything. Create a directory instead and check the result of the creation. The Unix manual states that only one task can succeed in creating a directory, the other will get a failure if the directory already exists, including the case where 2 task tried it at the same time. The OS itself handles the issue so you don't have to.
If it wasn't for possible stale locks, that is all you would have to do. However, things happen, programs abort and do not always remove their lock. So the implementation can be a little bit more elaborate.
In a script I have often used the code below. It handles stale locks automatically. You can implement the same in C. Check man page:
man -s 2 mkdir
EXECUTION_CONTROL_FILE: is a name PATH and Dir name, something like /usr/tmp/myAppName
second_of_now: return the current time in seconds (included below)
LOCK_MAX_TIME: is how long in seconds a lock can exists before it is considered stale
sleep 5: It is always assumed that a lock will do something short and sweet. If not, maybe your sleep cycle should be longer.
LockFile() {
L_DIR=${EXECUTION_CONTROL_FILE}.lock
L_DIR2=${EXECUTION_CONTROL_FILE}.lock2
(
L_STATUS=1
L_FILE_COUNT=2
L_COUNT=10
while [ $L_STATUS != 0 ]; do
mkdir $L_DIR 2>/dev/null
L_STATUS=$?
if [ $L_STATUS = 0 ]; then
# Create the timetime stamp file
second_of_now >$L_DIR/timestamp
else
# The directory exists, check how long it has been there
L_NOW=`second_of_now`
L_THEN=`cat $L_DIR/timestamp 2>/dev/null`
# The file does not exist, how many times did this happen?
if [ "$L_THEN" = "" ]; then
if [ $L_FILE_COUNT != 0 ]; then
L_THEN=$L_NOW
L_FILE_COUNT=`expr $L_FILE_COUNT - 1`
else
L_THEN=0
fi
fi
if [ `expr $L_NOW - $L_THEN` -gt $LOCK_MAX_TIME ]; then
# We will try 10 times to unlock, but the 10th time
# we will force the unlock.
UnlockFile $L_COUNT
L_COUNT=`expr $L_COUNT - 1`
else
L_COUNT=10 # Reset this back in case it has gone down
sleep 5
fi
fi
done
)
L_STATUS=$?
return $L_STATUS
}
####
#### Remove access lock
####
UnlockFile() {
U_DIR=${EXECUTION_CONTROL_FILE}.lock
U_DIR2=${EXECUTION_CONTROL_FILE}.lock2
(
# This 'cd' fixes an issue with UNIX which sometimes report this error:
# rm: cannot determine if this is an ancestor of the current working directory
cd `dirname "${EXECUTION_CONTROL_FILE}"`
mkdir $U_DIR2 2>/dev/null
U_STATUS=$?
if [ $U_STATUS != 0 ]; then
if [ "$1" != "0" ]; then
return
fi
fi
trap "rm -rf $U_DIR2" 0
# The directory exists, check how long it has been there
# in case it has just been added again
U_NOW=`second_of_now`
U_THEN=`cat $U_DIR/timestamp 2>/dev/null`
# The file does not exist then we assume it is obsolete
if [ "$U_THEN" = "" ]; then
U_THEN=0
fi
if [ `expr $U_NOW - $U_THEN` -gt $LOCK_MAX_TIME -o "$1" = "mine" ]; then
# Remove lock directory as it is still too old
rm -rf $U_DIR
fi
# Remove this short lock directory
rm -rf $U_DIR2
)
U_STATUS=$?
return $U_STATUS
}
####
second_of_now() {
second_of_day `date "+%y%m%d%H%M%S"`
}
####
#### Return which second of the date/time this is. The parameters must
#### be in the form "yymmddHHMMSS", no centuries for the year and
#### years before 2000 are not supported.
second_of_day() {
year=`printf "$1\n"|cut -c1-2`
year=`expr $year + 0`
month=`printf "$1\n"|cut -c3-4`
day=`printf "$1\n"|cut -c5-6`
day=`expr $day - 1`
hour=`printf "$1\n"|cut -c7-8`
min=`printf "$1\n"|cut -c9-10`
sec=`printf "$1\n"|cut -c11-12`
sec=`expr $min \* 60 + $sec`
sec=`expr $hour \* 3600 + $sec`
sec=`expr $day \* 86400 + $sec`
if [ `expr 20$year % 4` = 0 ]; then
bisex=29
else
bisex=28
fi
mm=1
while [ $mm -lt $month ]; do
case $mm in
4|6|9|11) days=30 ;;
2) days=$bisex ;;
*) days=31 ;;
esac
sec=`expr $days \* 86400 + $sec`
mm=`expr $mm + 1`
done
year=`expr $year + 2000`
while [ $year -gt 2000 ]; do
year=`expr $year - 1`
if [ `expr $year % 4` = 0 ]; then
sec=`expr 31622400 + $sec`
else
sec=`expr 31536000 + $sec`
fi
done
printf "$sec\n"
}
Use like this:
# Make sure that 2 operations don't happen at the same time
LockFile
# Make sure we get rid of our lock if we exit unexpectedly
trap "UnlockFile mine" 0
.
. Do what you have to do
.
# We need to remove the lock
UnlockFile mine
Related
I have DB error log file, it will grow continuously.
Now i want to set some error monitoring on that file for every 5 minutes.
The problem is i don’t want to scan whole file for every 5 minutes(when monitoring cron executed), because it may grow very big in future. Scanning through whole(big) file for every 5 mins will consume bit more resources.
So i just want to scan only the lines which were inserted/written to the log during last 5 mins interval.
Each error recorded in log will have Timestamp prepend to it like below:
180418 23:45:00 [ERROR] mysql got signal 11.
So i want to search with pattern [ERROR] only on lines which were added from last 5 mins(not whole file) and place the output to another file.
Please help me here.
Feel free if u need more clarification on my question.
I’m using RHEL 7 and i’m trying to implement above monitoring through bash shell script
Serializing the Byte Offset
This picks up where the last instance left off. If you run it every 5 minutes, then, it'll scan 5 minutes of data.
Note that this implementation knowingly can scan data added during an invocation's run twice. This is a little sloppy, but it's much safer to scan overlapping data twice than to never read it at all, which is a risk that can be run if relying on cron to run your program on schedule (likewise, sleeps can run over the requested time if the system is busy).
#!/usr/bin/env bash
file=$1; shift # first input: filename
grep_opts=( "$#" ) # remaining inputs: grep options
dir=$(dirname -- "$file") # extract directory name to use for offset storage
basename=${file##*/} # pick up file name w/o directory
size_file="$dir/.$basename.size" # generate filename to use to store offset
if [[ -s $size_file ]]; then # ...if we already have a file with an offset...
old_size=$(<"$size_file") # ...read it from that file
else
old_size=0 # ...otherwise start at the front.
fi
new_size=$(stat --format=%s -- "$file") || exit # Figure out current size
if (( new_size < old_size )); then
old_size=0 # file was truncated, so we can't trust old_size
elif (( new_size == old_size )); then
exit 0 # no new contents, so no point in trying to search
fi
# read starting at old_size and grep only that content
dd iflag=skip_bytes skip="$old_size" if="$file" | grep "${grep_opts[#]}"; grep_retval=$?
# if the read failed, don't store an updated offset
(( ${PIPESTATUS[0]} != 0 )) && exit 1
# create a new tempfile to store offset in
tempfile=$(mktemp -- "${size_file}.XXXXXX") || exit
# write to that temporary file...
printf '%s\n' "$new_size" > "$tempfile" || { rm -f "$tempfile"; exit 1; }
# ...and if that write succeeded, overwrite the last place where we serialized output.
mv -- "$tempfile" "$new_size" || exit
exit "$grep_retval"
Alternate Mode: Bisect For The Timestamp
Note that this can miss content if you're relying on, say, cron to invoke your code every 5 minutes on-the-dot; storing byte offsets can thus be more accurate.
Using the bsearch tool by Ole Tange:
#!/usr/bin/env bash
file=$1; shift
start_date=$(date -d 'now - 5 minutes' '+%y%m%d %H:%M:%S')
byte_offset=$(bsearch --byte-offset "$file" "$start_date")
dd iflag=skip_bytes skip="$byte_offset" if="$file" | grep "$#"
Another approach could be something like this:
DB_FILE="FULL_PATH_TO_YOUR_DB_FILE"
current_db_size=$(du -b "$DB_FILE" | cut -f 1)
if [[ ! -a SOME_PATH_OF_YOUR_CHOICE/last_size_db_file ]] ; then
tail --bytes $current_db_size $DB_FILE > SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S)
else
if [[ $(cat last_size_db_file) -gt $current_db_size ]] ; then
previously_readed_bytes=0
else
previously_readed_bytes=$(cat last_size_db_file)
fi
new_bytes=$(($current_db_size - $previously_readed_bytes))
tail --bytes $new_bytes $DB_FILE > SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S)
fi
printf $current_db_size > SOME_PATH_OF_YOUR_CHOICE/last_size_db_file
this prints all bytes of DB_FILE not previously printed to SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S)
Note that $(date +%Y-%m-%d_%H-%M-%S) will be the current 'full' date at the time of creating the log file
you can make this an script, and use cron to execute that script every five minutes; something like this:
*/5 * * * * PATH_TO_YOUR_SCRIPT
Here is my approach:
First, read the whole log once so far.
If you reach the end, collect and read new lines for a timespan (in my example 9 seconds, for faster testing, while my dummy server appends to the logfile every 3 seconds).
After the timespan, echo the cache, clear the cache (an array arr), loop and sleep for some time, so that this process doesn't consume all CPU time.
First, my dummy logfile writer:
#!/bin/bash
#
# dummy logfile writer
#
while true
do
s=$(( $(date +%s) % 3600))
echo $s server msg
sleep 3
done >> seconds.log
Startet via ./seconds-out.sh &.
Now the more complicated part:
#!/bin/bash
#
# consume a logfile as written so far. Then, collect every new line
# and show it in an interval of $interval
#
interval=9 # 9 seconds
#
printf -v secnow '%(%s)T' -1
start=$(( secnow % (3600*24*365) ))
declare -a arr
init=false
while true
do
read line
printf -v secnow '%(%s)T' -1
now=$(( secnow % (3600*24*365) ))
# consume every line created in the past
if (( ! init ))
then
# assume reading a line might not take longer than a second (rounded to whole seconds)
while (( ${#line} > 0 && (now - start) < 2 ))
do
read line
start=$now
echo -n "." # for debugging purpose, remove
printf -v secnow '%(%s)T' -1
now=$(( secnow % (3600*24*365) ))
done
init=1
echo "init=$init" # for debugging purpose, remove
# collect new lines, display them every $interval seconds
else
if ((${#line} > 0 ))
then
echo -n "-" # for debugging purpose, remove
arr+=("read: $line \n")
fi
if (( (now - start) > interval ))
then
echo -e "${arr[#]]}"
arr=()
start=$now
fi
fi
sleep .1
done < seconds.log
Output with logfile generator in 3 seconds, running for some time, then starting the read-seconds.sh script, with debugging output activated:
./read-seconds.sh
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................init=1
---read: 1688 server msg
read: 1691 server msg
read: 1694 server msg
---read: 1697 server msg
read: 1700 server msg
read: 1703 server msg
----read: 1706 server msg
read: 1709 server msg
read: 1712 server msg
read: 1715 server msg
^C
Every dot represents a logfile line from the past and therefor skipped.
Every dash represents a logfile line collected.
This is a shortened-version of a script for reading 8mm tapes from a EXB-8500 with an autoloader (only 10 tapes at a time maximum) attached. It dd's in tape data (straight binary) and saves it to files that are named after the tape's 4-digit number (exmaple D1002.dat) in both our main storage and our backup. During this time it's logging info and displaying its status in the terminal so we can see how far along it is.
#!/bin/bash
echo "Please enter number of tapes: [int]"
read i
j=1
until [ $i -lt $j ]
do
echo "What is the number of tape $j ?"
read Tape_$j
(( j += 1 ))
done
echo "Load tapes into the tower and press return when the drive is ready"
read a
j=1
until [ $i -lt $j ]
do
k="Tape_$j"
echo "tower1 $j D$(($k)) `date` Begin"
BEG=$j" "D$(($k))" "`date`" ""Begin"
echo "tower1 $j D$(($k)) `date` End"
END=$j" "D$(($k))" "`date`" ""End"
echo "$BEG $END"
echo "$BEG $END"
sleep 2
(( j += 1 ))
done
echo "tower1 done"
Everything was hunky-dory until we got under 1000 (startig at 0999). Error code was ./tower1: 0999: Value too great for base (error token is "0999"). Now I already realize that this is because the script is forcing octal values when I type in the leading 0, and I know I should insert a 10# somewhere in the script, but the question is: Where?
Also is there a way for me to just define Tape_$j as a string? I feel like that would clear up a lot of these problems
To get the error, run the script, define however many tapes you want (at least one, lol), and insert a leading 0 into the name of the tape
EXAMPLE:
./test
Please enter number of tapes: [int]
1
What is the number of tape 1?
0999
./test: 0999: Value too great for base (error token is "0999")
You don't want to use $k as a number, but as a string. You used the numeric expression to evaluate a variable value as a variable name. That's very bad practice.
Fortunately, you can use variable indirection in bash to achieve your goal. No numbers involved, no error thrown.
echo "tower1 $j ${!k} `date` Begin"
BEG=$j" "D${!k}" "`date`" ""Begin"
And similarly in other places.
I have the following bash function that generate an epoch list in iso-8601 format on a machine that runs ubuntu and it works fine. (where isdate and isint bash functions to test the input)
gen_epoch()
{
## USAGE: gen_epoch [start_date_iso] [end_date_iso] [increment_in_seconds]
##
## TASK : generate an epoch list (epoch list in isodate format).
## result on STDOUT: [epoch_list]
## error_code : 2 0
## test argument
if [ "$#" -ne 3 ]; then echo "$FUNCNAME: input error [nb_of_input]"; return 2
elif [ $( isdate $1 &> /dev/null; echo $? ) -eq 2 ]; then echo "$FUNCNAME: argument error [$1]"; return 2
elif [ $( isdate $2 &> /dev/null; echo $? ) -eq 2 ]; then echo "$FUNCNAME: argument error [$2]"; return 2
elif [ $( isint $3 &> /dev/null; echo $? ) -eq 2 ]; then echo "$FUNCNAME: argument error [$3]"; return 2
else local beg=$( TZ=UTC date --date="$1" +%s ); local end=$( TZ=UTC date --date="$2" +%s ); local inc=$3; fi
## generate epoch
while [ $beg -le $end ]
do
local date_out=$( TZ=UTC date --date="UTC 1970-01-01 $beg secs" --iso-8601=seconds ); beg=$(( $beg + $inc ))
echo ${date_out%+*}
done
}
It generates the expected values for this command line example:
gen_epoch 2014-04-01T00:00:00 2014-04-01T07:00:00 3600
expected values:
2014-04-01T00:00:00
2014-04-01T01:00:00
2014-04-01T02:00:00
2014-04-01T03:00:00
2014-04-01T04:00:00
2014-04-01T05:00:00
2014-04-01T06:00:00
2014-04-01T07:00:00
However i have tried this function on a server where i have no root privileges and i have found the following results:
2014-03-31T17:00:00
2014-03-31T18:00:00
2014-03-31T19:00:00
2014-03-31T20:00:00
2014-03-31T21:00:00
2014-03-31T22:00:00
2014-03-31T23:00:00
2014-04-01T00:00:00
and i have seen that the server time origin is not at 1970-01-01T00:00:00.
typing TZ=UTC date --date="1970-01-01T00:00:00" +%s command gives the value of -25200 which corresponds to a 7 hours lag while it should give 0.
My question is how this problem could be corrected on the server?
Could You help me to find an equivalent solution for this function assuming that i don't know on which machine i am running it, so i have know apriori knowledge if the system time is correct or not?
Not a complete answer but too long for a comment.
I guess that this particular server is incorrectly configured upon setup. The problem is that BIOS clocks are set to localtime time, while the systems thinks it's in UTC (or vise versa) (use hwclock to query hardware clocks settings).
If the system is configured incorrectly and you can't fix it for any reason (don't have superuser account or whatever), I'd suggest to provide a "fixing timezone description file with your software and specify it in TZ variable like this: TZ=:/path/to/fixing/timezone date --date="1970-01-01T00:00:00" +%s. Obviously you have to pre-calculate which TZ description file fixes the problem and use a proper one. Usually available timezones are stored in /usr/share/zoneinfo
I know how to check the status of the previously executed command using $?, and we can make that status using exit command. But for the loops in bash are always returning a status 0 and is there any way I can break the loop with some status.
#!/bin/bash
while true
do
if [ -f "/test" ] ; then
break ### Here I would like to exit with some status
fi
done
echo $? ## Here I want to check the status.
The status of the loop is the status of the last command that executes. You can use break to break out of the loop, but if the break is successful, then the status of the loop will be 0. However, you can use a subshell and exit instead of breaking. In other words:
for i in foo bar; do echo $i; false; break; done; echo $? # The loop succeeds
( for i in foo bar; do echo $i; false; exit; done ); echo $? # The loop fails
You could also put the loop in a function and return a value from it. eg:
in() { local c="$1"; shift; for i; do test "$i" = "$c" && return 0; done; return 1; }
Something like this?
while true; do
case $RANDOM in *0) exit 27 ;; esac
done
Or like this?
rc=0
for file in *; do
grep fnord "$file" || rc=$?
done
exit $rc
The real question is to decide whether the exit code of the loop should be success or failure if one iteration fails. There are scenarios where one make more sense than the other, and other where it's not at all clear cut.
The bash manual says:
while list-1; do list-2; done
until list-1; do list-2; done
[..]The exit status of the while and until commands is the exit status
of the last command executed in list-2, or zero if none was executed.[..]
The last command that is executed inside the loop is break. And the exit value of break is 0 (see: help break).
This is why your program keeps exiting with 0.
The break builtin for bash does allow you to accomplish what you are doing, just break with a negative value and the status returned by $? will be 1:
while true
do
if [ -f "./test" ] ; then
break -1
fi
done
echo $? ## You'll get 1 here..
Note, this is documented in the help for the break builtin:
help break
break: break [n] Exit for, while, or until loops.
Exit a FOR, WHILE or UNTIL loop. If N is specified, break N enclosing
loops.
Exit Status: The exit status is 0 unless N is not greater than or
equal to 1.
You can break out of n number of loops or send a negative value for breaking with a non zero return, ie, 1
I agree with #hagello as one option doing a sleep and changing the loop:
#!/bin/bash
timeout=120
waittime=0
sleepinterval=3
until [[ -f "./test" || ($waittime -eq $timeout) ]]
do
$(sleep $sleepinterval)
waittime=$((waittime + sleepinterval))
echo "waittime is $waittime"
done
if [ $waittime -lt $sleepinterval ]; then
echo "file already exists"
elif [ $waittime -lt $timeout ]; then
echo "waited between $((waittime-3)) and $waittime seconds for this to finish..."
else
echo "operation timed out..."
fi
I think what you should be asking is: How can I wait until a file or a directory (/test) gets created by another process?
What you are doing up to now is polling with full power. Your loop will allocate up to 100% of the processing power of one core. The keyword is "polling", which is ethically wrong by the standards of computer scientists.
There are two remedies:
insert a sleep statement in your loop; advantage: very simple; disadvantage: the delay will be an arbitrary trade-off between CPU load and responsiveness. ("Arbitrary" is ethically wrong, too).
use a notification mechanism like inotify (see: man inotify); advantage: no CPU load, great responsiveness, no delays, no arbitrary constants in your code; disadvantage: inotify is a kernel API – you need some code to access it: inotify-tools or some C/Perl/Python code. Have a look at inotify and bash!
I would like to submit an alternative solution which is simpler and I think more elegant:
(while true
do
if [ -f "test" ] ; then
break
fi
done
Of course of this is part of a script then you could user return 1 instead of exit 1
exit 1
)
echo "Exit status is: $?"
Git 2.27 (Q2 2020), offers a good illustration of the exit status in a loop, here within the context of aborting a failing test early (e.g. by exiting a loop), which is to say "return 1".
See commit 7cc112d (27 Mar 2020) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit b07c721, 28 Apr 2020)
t/README: suggest how to leave test early with failure
Helped-by: Jeff King
Over time, we added the support to our test framework to make it easy to leave a test early with failure, but it was not clearly documented in t/README to help developers writing new tests.
The documentation now includes:
Be careful when you loop
You may need to verify multiple things in a loop, but the following does not work correctly:
test_expect_success 'test three things' '
for i in one two three
do
test_something "$i"
done &&
test_something_else
'
Because the status of the loop itself is the exit status of the test_something in the last round, the loop does not fail when "test_something" for "one" or "two" fails.
This is not what you want.
Instead, you can break out of the loop immediately when you see a failure.
Because all test_expect_* snippets are executed inside a function, "return 1" can be used to fail the test immediately upon a failure:
test_expect_success 'test three things' '
for i in one two three
do
test_something "$i" || return 1
done &&
test_something_else
'
Note that we still &&-chain the loop to propagate failures from earlier commands.
Use artificial exit code 🙂
Before breaking the loop set a variable then check the variable as the status code of the loop, like this:
while true; do
if [ -f "/test" ] ; then
{broken=1 && break; };
fi
done
echo $broken #check the status with [[ -n $broken ]]
I have one requirement in my automation.
I need to pass values like 1, 2, 3 to MY_IMAGE in command line in Linux.
I had defines activities for all these inputs in other make file.
The code similar to below i wrote for my requirement. Issue was whenever I passes values like MY_IMAGE=1, MY_IMAGE=2, MY_IMAGE=3
it's printing only echo ACT_DO=XYZ;
It's not displaying the other info whenever I selected 2 or 3. Can anyone check and correct my code.
export MY_IMAGE
MY_IMAGE=$img_value;
if [ $img_value :="1" ]
then
echo ACT_DO=XYZ;
else
if [ $img_value :="2 ]
then
echo ACT_DO=ABC;
else
if [ $img_value :=3 ]
then
echo ACT_DO=ETC;
else
echo ""$img_value" is unsupported";
exit 1;
fi
fi
fi
Your code has a quote in the wrong place, and uses := which doesn't mean anything, as far as I know. It's also implemented confusingly.
Try this:
export MY_IMAGE
MY_IMAGE=$img_value
case "$img_value" in
1 ) echo ACT_DO=XYZ ;;
2 ) echo ACT_DO=ABC ;;
3 ) echo ACT_DO=ETC ;;
* )
echo "\"$img_value\" is unsupported"
exit 1
;;
esac
The first two lines are not required for this code, but I presume you wanted that for something else.