How to restrict the number of copies of a script can run a certain block of code? - semaphore

I use lockfile to ensure a piece of code can only be run one at a time.
If I want to allow n copies of the same code run at a time, how to do it? Thanks.
if lockfile -1 lockfile.txt
then
# something
rm -f lockfile.txt
else
echo error
exit
fi

gnu sem looks like a viable option: https://www.gnu.org/software/parallel/sem.html

Related

Linux script variables to SCP and delete files

I am looking to set up a script to do the following:
1st: SCP a directory on the first day of month to another server
2nd: Delete the directory after successful transfer
The directory I need to move will always have a different name, and the lowest numbered one is always the one that needs to move:
2018/files/02/
2018/files/03/
So what im looking to write up is something like:
scp /2018/files/% user#host:/backups/2018/files/
{where % = lowest num} &&
rm -rf /2018/files/%
{where % = lowest num} &&
exit
Thanks for any advice
If you are open to using Ruby, you could accomplish it with something like this:
def file_number(filespec)
filespect.split('/').last.to_i
end
directories = Dir['/2018/files'].select { |f| File.directory?(f) }
sorted_dirs = directories.sort_by do |dir1, dir2|
file_number(dir1) <=> file_number(dir1)
end
dir_to_copy = sorted_dirs.first
destination_dir = File.join('/', 'backups', dir_to_copy)
`scp #{dir_to_copy} user#host:#{destination_dir}`
`rm -rf #{dir_to_copy}`
I have not tested this, but if you have any problems, let me know what they are and I can work through it with you.
While using shell scripting eliminates the need for the Ruby interpreter, to me the code is not nearly as straightforward.
In very large directory lists (maybe 10,000's?) the sort might be intolerably slow, and another method would be needed to optimize for speed.
I would caution you against doing an unconditional rm -rf after the backup -- that seems really risky to me.
The big challenge here is to actually find the right files to copy, and shudder, delete. So let us call that step 0.
Let's start with some boiler plate
sourceD=/2018/files/
targetD=/backups/2018/files/
And a little assertion, which bails out from the script if $1 does not equate to a directory.
assert_directory() { (cd ${1:?directory name}) || exit; }
step 0: Identify directory:
assert_directory $sourceD
to_be_archived=$(
# source must be two characters, hence "??"
# source must a directory, hence trailing "/"
# set -- sorts its arguments
# First match must be our source
set -- $sourceD/??/ &&
assert_directory "$1"
echo ${1:?nothing found}
) || exit
This is only a couple of lines of condensed code. Note that this may
cause trouble if you (accidentally) run this multiple times in a row.
Step 1, Copy files now appears to be the easy part.
scp -r ${to_be_archived:?} user#host:${targetD:?}
This is a simple method for copying files, but also slow and risky.
Lookup rsync over ssh for alternatives.
Step 2, Remove
The rm -fr line will do the job, but I won't include that here.
We are missing an essential step, as we need to make sure that our
files have arrived safely. Again, rsync has options for that.
In summary:
assert_directory() { (cd ${1:?directory name}) || exit; }
assert_directory $sourceD
to_be_archived=$(
set -- $sourceD/??/ &&
assert_directory "$1"
echo ${1:?nothing found}
) || exit
This will give you the first two-character name directory (if one exists) in sourceD or abort the running script. It will break if $sourceD contains spaces.

How do I rerun a bash script skipping over lines which have previously run sucesfully?

I have a bash script which acts as a wrapper for an analysis pipeline. If the script errors out I want to be able to run the script from the point at which the errors occurred by simply re-running the original command. I have set two different traps; one which will remove the last file being generated on a non-zero exit from my script, the other will remove all the temporary files on exit signal = 0 and essentially cleans up the file system at the end of the run. I turned on noclobber in the bash environment which allows my script to skip over lines of the script where files have already been written but this will only do this if I do not set the non-zero exit trap. As soon as I set this trap then it will exit at the first line where noclobber IDs a file it will not overwrite. Is there a way for me to skip over lines of code that have successfully run previously rather than having to re-run my code from the start? I know I could use conditional statements for each line but I thought there might be a neater way of doing this.
set -o noclobber
# Function to clean up temporary folders when script exits at the end
rmfile() { rm -r $1 }
# Function to remove the file being currently generated
# Function executed if script errors out
rmlast() {
if [ ! -z "$CURRENTFILE" ]
then
rm -r $1
exit 1
fi }
# Trap to remove the currently generated file
trap 'rmlast "$CURRENTFILE"' ERR SIGINT
#Make temporary directory if it has not been created in a previous run
TEMPDIR=$(find . -name "tmp*")
if [ -z "$TEMPDIR" ]
then
TEMPDIR=$(mktemp -d /test/tmpXXX)
fi
# Set CURRENTFILE variable
CURRENTFILE="${TEMPDIR}/Variants.vcf"
# Set CURRENTFILE variable
complexanalysis_tool input_file > $CURRENTFILE
# Set CURRENTFILE variable
CURRENTFILE="${TEMPDIR}/Filtered.vcf"
complexanalysis_tool2 input_file2 > $CURRENTFILE
CURRENTFILE="${TEMPDIR}/Filtered_2.vcf"
complexanalysis_tool3 input_file3 > $CURRENTFILE
# Move files to final destination folder
mv -nv $TEMPDIR/*.vcf /test/newdest/
# Trap to remove temporary folders when script finishes running
trap 'rmfile "$TEMPDIR"' 0
Update:
I have been offered answers suggesting the use of the make utility. I want to make use of its inbuilt utility to check if a dependency has been fulfilled.
In my hands the makefile suggested by VK Kashyap does not seem to skip execution for previously accomplished tasks. So for example I ran the script above and interrupted the script when it was running filtered.vcf with ctrl c. When I rerun the script again it runs from the beginning again i.e. starts again at varaints.vcf. Am I missing something in order to get the makefile to show sources as being fullfilled?
Answer to update:
OK this is a rookie mistake but since I am not familiar with generating makefiles I will post this explanation of my error. The reason my makefile was not rerunning from the exit point was that I had named the targets a different name to the output files being generated. So as VK Kashyap quite correctly answered if you name the targets eg.
variants.vcf
filtered.vcf
filtered2.vcf
the same as the output files being generated then the script will skip previously accomplished tasks.
make utility might be an answer for the thing you want to achive.
it has inbuilt dependecy checking (the stuff which you are trying to achive with tmp files)
#run all target when all of the files are available
all: variants.vcf filtered.vcf filtered2.vcf
mv -nv $(TEMPDIR)/*.vcf /test/newdest/
variants.vcf:
complexanalysis_tool input_file > variants.vcf
filtered.vcf:
complexanalysis_tool2 input_file2 > filtered.vcf
filtered2.vcf:
complexanalysis_tool3 input_file3 > filtered2.vcf
you may use bash script to invoke this make file as:
#/bin/bash
export TEMPDIR=xyz
make -C $TEMPDIR all
make utility will check itself for already accomplished task and skip execution for done stuffs. it will continue where you had the error finishing the task.
you can find more details on internet about exact syntax for makefile.
there is no built-in way to do that.
however, you could brew something like that by keeping track of the last successful line and building your own goto statement, as described here and in Is there a "goto" statement in bash? (just replace the 'labels' with actual line-numbers).
however, the question is whether this is really a smart idea.
a better way is to only run the commands needed, not the commands not-yet-executed.
this could be done either by explicit conditionals in your bash-script:
produce_if_missing() {
# check if first argument is existing
# if not run the rest of the arguments and pipe it into the first one
local curfile=$1
shift
if [ ! -e "${curfile}" ]; then
$# > "${curfile}"
fi
}
produce_if_missing Variants.vcf complexanalysis_tool input_file
produce_if_missing Filtered.vcf complexanalysis_tool2 input_file2
or using tools that are made for such things (see VK Kahyap's answer using make, though i prefer using variables in the make-rules to minimize typos):
Variants.vcf: input_file
complexanalysis_tool $^ > $#
Filtered.vcf: input_file
complexanalysis_tool2 $^ > $#

One liner to append a file into another file but only if it hasn't already been added

I have an automated process that has a number of lines like the following pattern:
sudo cat /some/path/to/a/file >> /some/other/file
I'd like to transform that into a one liner that will only append to /some/other/file if /some/path/to/a/file has not already been added.
Edit
It's clear I need some examples here.
example 1: Updating a .bashrc script for a specific login
example 2: Creating a .screenrc for different logins
example 3: Appending to the end of a /etc/ config file
Some other caveats. The text is going to be added in a block (>>). Consequently, it should be relatively straight forward to see if the entire code block is added or not near the end of a file. I am trying to come up with a simple method for determining whether or not the file has already been appended to the original.
Thanks!
Example python script...
def check_for_appended(new_file, original_file):
""" Checks original_file to see if it has the contents of new_file """
new_lines = reversed(new_file.split("\n"))
original_lines = reversed(original_file.split("\n"))
appended = None
for new_line, orig_line in zip(new_lines, original_lines):
if new_line != orig_line:
appended = False
break
else:
appended = True
return appended
Maybe this will get you started - this GNU awk script:
gawk -v RS='^$' 'NR==FNR{f1=$0;next} {print (index($0,f1) ? "present" : "absent")}' file1 file2
will tell you if the contents of "file1" are present in "file2". It cannot tell you why, e.g. because you previously concatenated file1 onto the end of file2.
Is that all you need? If not update your question to clarify/explain.
Here's a technique to see if a file contains another file
contains_file_in_file() {
local small=$1
local big=$2
awk -v RS="" '{small=$0; getline; exit !index($0, small)}' "$small" "$big"
}
if ! contains_file_in_file /some/path/to/a/file /some/other/file; then
sudo cat /some/path/to/a/file >> /some/other/file
fi
EDIT: Op just told me in the comments that the files he wants to concatenate are bash scripts -- this brings us back to the good ole C preprocessor include guard tactics:
prepend every file with
if [ -z "$__<filename>__" ]; then __<filename>__=1; else
(of course replacing <filename> with the name of the file) and at the end
fi
this way, you surround the script in each file with a test for something that's only true once.
Does this work for you?
sudo (set -o noclobber; date > /tmp/testfile)
noclobber prevents overwriting an existing file.
I think it doesn't, since you wrote you want to append something but this technique might help.
When the appending all occurs in one script, then use a flag:
if [ -z "${appended_the_file}" ]; then
cat /some/path/to/a/file >> /some/other/file
appended_the_file="Yes I have done it except for permission/right issues"
fi
I would continue into writing a function appendOnce { .. }, with the content above. If you really want an ugly oneliner (ugly: pain for the eye and colleague):
test -z "${ugly}" && cat /some/path/to/a/file >> /some/other/file && ugly="dirt"
Combining this with sudo:
test -z "${ugly}" && sudo "cat /some/path/to/a/file >> /some/other/file" && ugly="dirt"
It appears that what you want is a collection of script segments which can be run as a unit. Your approach -- making them into a single file -- is hard to maintain and subject to a variety of race conditions, making its implementation tricky.
A far simpler approach, similar to that used by most modern Linux distributions, is to create a directory of scripts, say ~/.bashrc.d and keep each chunk as an individual file in that directory.
The driver (which replaces the concatenation of all those files) just runs the scripts in the directory one at a time:
if [[ -d ~/.bashrc.d ]]; then
for f in ~/.bashrc.d/*; do
if [[ -f "$f" ]]; then
source "$f"
fi
done
fi
To add a file from a skeleton directory, just make a new symlink.
add_fragment() {
if [[ -f "$FRAGMENT_SKELETON/$1" ]]; then
# The following will silently fail if the symlink already
# exists. If you wanted to report that, you could add || echo...
ln -s "$FRAGMENT_SKELETON/$1" "~/.bashrc.d/$1" 2>>/dev/null
else
echo "Not a valid fragment name: '$1'"
exit 1
fi
}
Of course, it is possible to effectively index the files by contents rather than by name. But in most cases, indexing by name will work better, because it is robust against editing the script fragment. If you used content checks (md5sum, for example), you would run the risk of having an old and a new version of the same fragment, both active, and without an obvious way to remove the old one.
But it should be straight-forward to adapt the above structure to whatever requirements and constraints you might have.
For example, if symlinks are not possible (because the skeleton and the instance do not share a filesystem, for example), then you can copy the files instead. You might want to avoid the copy if the file is already present and has the same content, but that's just for efficiency and it might not be very important if the script fragments are small. Alternatively, you could use rsync to keep the skeleton and the instance(s) in sync with each other; that would be a very reliable and low-maintenance solution.

Incorrect exit status of wrapper script in KSH

We have a wrapper script for Teradata TPT utility. The wrapper script is pretty straightforward but the problem is that the exit status of the wrapper is not the same as that of the utility. In many cases, the script returns 0 even if the utility fails. I have saved the exit status in a separate variable because some steps need to be done before exiting but exiting with this variable's value doesn't seem to work. Or is the utility returning status 0 even in case of some failures even if the logs clearly specify some other status?
The worse part is, this behavior is quite random, sometimes the script does fail with the exit status of the utility. I want to be sure if there is some problem with utility's exit status.
The script runs through KSH. The final part of the wrapper script is:
tbuild -f $sql.tmp -j ${id}_$JOB >$out 2>&1
ret_code=$?
cd ${TWB_ROOT}/logs
logpath=`ls -t ${TWB_ROOT}/logs/${id}_${JOB}*.out |head -1`
logpath1=${logpath##*/}
logname=${logpath1%-*}
tlogview -l ${logpath} > /edw/$GROUP/tnl/jobs/$JOB/logs/tpt_logs/${logname}.log
###Mainting 3 tpt binary log files
if [ $ret_code -eq 0 ]
then
binout=$TPTLOGDIR/${logname}.dat
binout1=$TPTLOGDIR/${logname}.dat1
binout2=$TPTLOGDIR/${logname}.dat2
[ -f $binout1 ] && mv $binout1 $binout2
[ -f $binout ] && mv $binout $binout1
mv "$logpath" "/edw/${GROUP}/tnl/jobs/$JOB/logs/tpt_logs/${logname}.dat"
fi
rm -f $sql.tmp
echo ".exit"
exit $ret_code
Thanks in advance for the help and suggestions.
The script looks ok, and should indeed return the same exit code as the tbuild utility.
It comes down to knowledge of the specific product.
I've never worked with any of these products, but Teradata has an ample User Guide for the Parallel Transporter, with an explicit Post-Job Considerations section, warning:
Even if the job completed successfully, action may still be required based on error and warning information in the job logs and error tables.
So technically, a job might complete, but results may vary from time to time.
I guess you have to define your own policies and scan the logfiles for patterns of warnings and error messages, and then generate your own exit codes for semantic failures. Tools like logstash or splunk might come in handy.
BTW, you might consider using logrotate for rotating the $TPTLOGDIR/${logname}.dat files.
Turns out that the issue was in the utility itself, as suspected. Shell script worked fine.

Re-run bash script if another instance was invoked

I have a bash script that may be invoked multiple times simultaneously. To protect the state information (saved in a /tmp file) that the script accesses, I am using file locking like this:
do_something()
{
...
}
// Check if there are any other instances of the script; if so exit
exec 8>$LOCK
if ! flock -n -x 8; then
exit 1
fi
// script does something...
do_something
Now any other instance that was invoked when this script was running exits. I want the script to run only one extra time if there were n simultaneous invocations, not n-times, something like this:
do_something()
{
...
}
// Check if there are any other instances of the script; if so exit
exec 8>$LOCK
if ! flock -n -x 8; then
exit 1
fi
// script does something...
do_something
// check if another instance was invoked, if so re-run do_something again
if [ condition ]; then
do_something
fi
How can I go about doing this? Touching a file inside the flock before quitting and having that file as the condition for the second if doesn't seem to work.
Have one flag (lockfile) to signal that a things needs doing, and always set it. Have a separate flag that is unset by the execution part.
REQUEST_FILE=/tmp/please_do_something
LOCK_FILE=/tmp/doing_something
# request running
touch $REQUEST_FILE
# lock and run
if ln -s /proc/$$ $LOCK_FILE 2>/dev/null ; then
while [ -e $REQUEST_FILE ]; do
do_something
rm $REQUEST_FILE
done
rm $LOCK_FILE
fi
If you want to ensure that "do_something" is run exactly once for each time the whole script is run, then you need to create some kind of a queue. The overall structure is similar.
They're not everone's favourite, but I've always been a fan of symbolic links to make lockfiles, since they're atomic. For example:
lockfile=/var/run/`basename $0`.lock
if ! ln -s "pid=$$ when=`date '+%s'` status=$something" "$lockfile"; then
echo "Can't set lock." >&2
exit 1
fi
By encoding useful information directly into the link target, you eliminate the race condition introduced by writing to files.
That said, the link that Dennis posted provides much more useful information that you should probably try to understand before writing much more of your script. My example above is sort of related to BashFAQ/045 which suggests doing a similar thing with mkdir.
If I understand your question correctly, then what you want to do might be achieved (slightly unreliably) by using two lock files. If setting the first lock fails, we try the second lock. If setting the second lock fails, we exit. The error exists if the first lock is delete after we check it but before check the second existant lock. If this level of error is acceptable to you, that's great.
This is untested; but it looks reasonable to me.
#!/usr/local/bin/bash
lockbase="/tmp/test.lock"
setlock() {
if ln -s "pid=$$" "$lockbase".$1 2>/dev/null; then
trap "rm \"$lockbase\".$1" 0 1 2 5 15
else
return 1
fi
}
if setlock 1 || setlock 2; then
echo "I'm in!"
do_something_amazing
else
echo "No lock - aborting."
fi
Please see Process Management.

Resources