A (slightly modified) shell script converts audio files from FLAC to MP3 format. The computer has a quad-core CPU. The script is run using:
./flac2mp3.sh $(find flac -type f)
This converts the FLAC files in the flac directory (no spaces in file names) to MP3 files in the mp3 directory (at the same level as flac). If the destination MP3 file already exists, the script skips the file.
The problem is that sometimes two instances of the script check for the existence of the same MP3 file at nearly the same time, resulting in mangled MP3 files.
How would you run the script multiple times (i.e., once per core), without having to specify a different file set on each command-line, and without overwriting work?
Update - Minimal Race Condition
The script uses the following locking mechanism:
# Convert FLAC to MP3 using tags from flac file.
#
if [ ! -e $FLAC.lock ]; then
touch $FLAC.lock
flac -dc "$FLAC" | lame${lame_opts} \
--tt "$TITLE" \
--tn "$TRACKNUMBER" \
--tg "$GENRE" \
--ty "$DATE" \
--ta "$ARTIST" \
--tl "$ALBUM" \
--add-id3v2 \
- "$MP3"
rm $FLAC.lock
fi;
However, this still leaves a race condition.
The "lockfile" command provides what you're trying to do for shell scripts without the race condition. The command was written by the procmail folks specifically for this sort of purpose and is available on most BSD/Linux systems (as procmail is available for most environments).
Your test becomes something like this:
lockfile -r 3 $FLAC.lock
if test $? -eq 0 ; then
flac -dc "$FLAC" | lame${lame_opts} \
--tt "$TITLE" \
--tn "$TRACKNUMBER" \
--tg "$GENRE" \
--ty "$DATE" \
--ta "$ARTIST" \
--tl "$ALBUM" \
--add-id3v2 \
- "$MP3"
fi
rm -f $FLAC.lock
Alternatively, you could make lockfile keep retrying indefinitely so you don't need to test the return code; instead, you can test for the output file to decide whether to run flac.
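A rough sketch of that variant (tag options omitted for brevity; it assumes lockfile's default behaviour of retrying until it acquires the lock when no -r option is given):

# take the lock, waiting as long as necessary, then encode only if the
# MP3 does not already exist (another worker may have produced it)
lockfile "$FLAC.lock"
if [ ! -e "$MP3" ]; then
    flac -dc "$FLAC" | lame${lame_opts} --add-id3v2 - "$MP3"
fi
rm -f "$FLAC.lock"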
If you don't have lockfile and cannot install it (in any of its versions - there are several implementations) a robust and portable atomic mutex is mkdir.
If the directory you attempt to create already exists, mkdir will fail, so you can check for that; when creation succeeds, you have a guarantee that no other cooperating process is in the critical section at the same time as your code.
if mkdir "$FLAC.lockdir"; then
    # you now have the exclusive lock
    : critical section
    : code goes here
    rmdir "$FLAC.lockdir"
else
    : nothing, to skip this file
    # or maybe sleep 1 and loop back and try again
fi
For completeness, you might also look at flock if you are on platforms where it is reliably available and you need a more performant alternative to lockfile.
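A minimal sketch of the flock(1) approach (assuming the util-linux flock; -n makes it give up immediately instead of waiting when another worker already holds the lock):

(
    flock -n 9 || exit 1      # another worker owns this file; skip it
    flac -dc "$FLAC" | lame${lame_opts} --add-id3v2 - "$MP3"
) 9>"$FLAC.lock"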
You could implement locking of the FLAC file the script is currently working on. Something like:
if (not flac locked)
    lock flac
    do work
else
    continue to next flac
Send output to a temporary file with a unique name, then rename the file to the desired name.
flac -dc "$FLAC" | lame${lame_opts} \
--tt "$TITLE" \
--tn "$TRACKNUMBER" \
--tg "$GENRE" \
--ty "$DATE" \
--ta "$ARTIST" \
--tl "$ALBUM" \
--add-id3v2 \
- "$MP3.$$"
mv "$MP3.$$" "$MP3"
If a race condition leaks through your file locking system every once in a while, the final output will still be the result of one process.
To lock the processing of a file, you can create a companion file with the same name plus a .lock extension.
Before starting the encoding, check for the existence of the .lock file, and optionally make sure the lockfile's timestamp isn't too old (in case the owning process died). If it does not exist, create it before the encoding starts and remove it after the encoding completes; for example:
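A rough sketch of that idea (tag options omitted; a 60-minute staleness cutoff is assumed; note it still has the check-then-create race discussed earlier):

LOCK="$FLAC.lock"
# treat a lock older than 60 minutes as stale and remove it
if [ -e "$LOCK" ] && [ -n "$(find "$LOCK" -mmin +60)" ]; then
    rm -f "$LOCK"
fi
if [ ! -e "$LOCK" ]; then
    touch "$LOCK"
    flac -dc "$FLAC" | lame${lame_opts} --add-id3v2 - "$MP3"
    rm -f "$LOCK"
fi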
You can also flock the file, but that only really works in C, where you call flock(), write to the file, then close and unlock it. In a shell script you are probably calling another utility to do the actual writing of the file.
How about writing a Makefile?
ALL_FLAC=$(wildcard *.flac)
ALL_MP3=$(patsubst %.flac, %.mp3, $(ALL_FLAC))

all: $(ALL_MP3)

%.mp3: %.flac
	$(FLAC) ...
Then do
$ make -j4 all
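If your layout matches the question (flac/ and mp3/ directories side by side), a sketch might look like this (assuming a plain lame invocation with no tag copying; recipe lines must start with a tab):

FLAC_FILES := $(wildcard flac/*.flac)
MP3_FILES  := $(patsubst flac/%.flac,mp3/%.mp3,$(FLAC_FILES))

all: $(MP3_FILES)

mp3/%.mp3: flac/%.flac
	flac -dc "$<" | lame - "$@"

Then make -j4 all runs one encode per core, and make's own dependency checking replaces the manual existence test.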
In bash it's possible to set the noclobber option to avoid overwriting files.
help set | egrep 'noclobber|-C'
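For illustration, a quick sketch of how noclobber behaves (note that it only protects shell redirections, not files written directly by programs such as lame):

set -o noclobber         # equivalent to: set -C
echo hello > out.txt     # succeeds if out.txt does not exist
echo hello > out.txt     # fails: bash: out.txt: cannot overwrite existing file
echo hello >| out.txt    # >| explicitly overrides noclobber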
Use a tool like FLOM (Free LOck Manager) and simply serialize your command as below:
flom -- flac ....
Related
I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
For cases where it is possible to control the name of the temporary text file, and the file is not used by any other code, you can pass "/dev/stdout" as the name of the output file.
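A sketch under those assumptions (some_command is a stand-in for whatever currently produces output.json, taking the output file name as an argument):

# instead of: some_command output.json && cat output.json && rm output.json
some_command /dev/stdout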
Regarding portability: see the Stack Exchange question "how portable ... /dev/stdout".
POSIX 7 says they are extensions. From the Base Definitions, Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example, /dev/stdin, /dev/stdout, and /dev/stderr)
Using the mandatorily supported /dev/tty would force output to the “current” terminal, making it impossible to pipe the output of the whole command into a different program (or a log file), or to use the program when no terminal is connected (cron jobs or other automation tools).
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient, as it would require removing characters from the beginning of the file each time you read a line. Current filesystems are pretty good at truncating a file at the end, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
# collapse the first ${#line}+1 bytes (the line plus its newline) out of the file
fallocate -c -o 0 -l $((${#line}+1)) output.json
done
But this does not account for differing newline conventions (namely DOS-formatted line endings), and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for a pipe functionality. In my opinion you should look into how your output.json file is produced and hopefully you can pipe it to a script of your own.
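A sketch of that idea with a named pipe (produce_output is a hypothetical stand-in for whatever currently writes output.json):

mkfifo output.pipe
produce_output > output.pipe &   # the producer writes into the FIFO in the background
cat output.pipe                  # display the content as it arrives; nothing persists on disk
rm output.pipe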
I need to decompress a very large file (100GB+) and have it processed by two parallel threads. The problem is that I want to feed the uncompressed content to both threads at the same time using STDIN/STDOUT:
bzip2 -dc north-america-latest.osm.bz2 | \
osmosis --read-xml file=- \
        --tf accept-ways highway=motorway \
        outPipe.0=motorway \
        --fast-read-xml file=- \
        --tf accept-nodes place=\* \
        outPipe.0=places \
        --merge inPipe.0=motorway inPipe.1=places
# --read-xml was meant to be the first thread and --fast-read-xml the second,
# but both end up reading from the same standard input
The syntax might not be very transparent, but the idea is that both threads read from the same standard input and basically steal each other's data.
Somehow I need to give each thread its own STDIN (or another temporary in-memory stream) and split the output of bzip2 between them.
You can use tee to split output to multiple processes
bzip2 -dc north-america-latest.osm.bz2 | tee >(command1) | command2
You can have as many commands as you want.
bzip2 -dc north-america-latest.osm.bz2 | tee >(command1) >(command2) >(command3) | command4
The command after the final pipe is optional; if it is omitted, the remaining copy of the stream simply continues to stdout.
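For illustration, a runnable sketch with placeholder consumers (wc standing in for the two osmosis readers; the output file names are made up):

# each process substitution receives its own complete copy of the stream
bzip2 -dc north-america-latest.osm.bz2 \
    | tee >(wc -l > line_count.txt) >(wc -c > byte_count.txt) \
    > /dev/null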
So I have a script which unzips a file:
#!/bin/bash -e
# will unzip the data without removing the zipped version
gzip -dc $1 > RawData/unzipped/$(basename $1 .gz)
I then want to execute code on that unzipped file; I have:
# will run fast qc on the argument passed
fastqc RawData/unzipped/$(basename $1 .gz) --outdir=fastReports/
but the second command never seems to execute. (Note: both commands are in the same script, so I assumed the first would run to completion before the second one started.)
Zipped:
14624_1#10_1.fastq.gz 14624_1#12_2.fastq.gz 14624_1#4_1.fastq.gz 14624_1#7_1.fastq.gz
14624_1#10_2.fastq.gz 14624_1#1_2.fastq.gz 14624_1#4_2.fastq.gz 14624_1#7_2.fastq.gz
14624_1#11_1.fastq.gz 14624_1#2_1.fastq.gz 14624_1#5_1.fastq.gz 14624_1#8_1.fastq.gz
14624_1#11_2.fastq.gz 14624_1#2_2.fastq.gz 14624_1#5_2.fastq.gz 14624_1#8_2.fastq.gz
14624_1#1_1.fastq.gz 14624_1#3_1.fastq.gz 14624_1#6_1.fastq.gz 14624_1#9_1.fastq.gz
14624_1#12_1.fastq.gz 14624_1#3_2.fastq.gz 14624_1#6_2.fastq.gz 14624_1#9_2.fastq.gz
Extracted:
14624_1#10_1.fastq 14624_1#12_1.fastq 14624_1#3_1.fastq 14624_1#5_2.fastq 14624_1#8_1.fastq
14624_1#10_2.fastq 14624_1#12_2.fastq 14624_1#3_2.fastq 14624_1#6_1.fastq 14624_1#8_2.fastq
14624_1#11_1.fastq 14624_1#1_2.fastq 14624_1#4_1.fastq 14624_1#6_2.fastq 14624_1#9_1.fastq
14624_1#11_2.fastq 14624_1#2_1.fastq 14624_1#4_2.fastq 14624_1#7_1.fastq 14624_1#9_2.fastq
14624_1#1_1.fastq 14624_1#2_2.fastq 14624_1#5_1.fastq 14624_1#7_2.fastq
You might just use zcat and process the file on the fly:
fastqc <(zcat path/to/file.gz)
Btw, the <() syntax is a Process Substitution.
If you need both the unzipped file and the process result you may use tee:
fastqc <(zcat path/to/file.gz | tee file)
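Applied to one of the files from the question, the first form would look something like:

fastqc <(zcat RawData/14624_1#10_1.fastq.gz) --outdir=fastReports/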
I have an automated process that has a number of lines like the following pattern:
sudo cat /some/path/to/a/file >> /some/other/file
I'd like to transform that into a one liner that will only append to /some/other/file if /some/path/to/a/file has not already been added.
Edit
It's clear I need some examples here.
example 1: Updating a .bashrc script for a specific login
example 2: Creating a .screenrc for different logins
example 3: Appending to the end of a /etc/ config file
Some other caveats: the text is going to be added as a block (>>). Consequently, it should be relatively straightforward to check whether the entire block is present near the end of the file. I am trying to come up with a simple method for determining whether or not the file has already been appended to the original.
Thanks!
Example python script...
def check_for_appended(new_file, original_file):
    """ Checks original_file to see if it has the contents of new_file """
    new_lines = reversed(new_file.split("\n"))
    original_lines = reversed(original_file.split("\n"))
    appended = None
    for new_line, orig_line in zip(new_lines, original_lines):
        if new_line != orig_line:
            appended = False
            break
        else:
            appended = True
    return appended
Maybe this will get you started - this GNU awk script:
gawk -v RS='^$' 'NR==FNR{f1=$0;next} {print (index($0,f1) ? "present" : "absent")}' file1 file2
will tell you if the contents of "file1" are present in "file2". It cannot tell you why they are present, though, e.g. whether it is because you previously concatenated file1 onto the end of file2.
Is that all you need? If not update your question to clarify/explain.
Here's a technique to see if one file contains another:
contains_file_in_file() {
    local small=$1
    local big=$2
    # RS="" is awk's paragraph mode, so this assumes neither file contains blank lines
    awk -v RS="" '{small=$0; getline; exit !index($0, small)}' "$small" "$big"
}

if ! contains_file_in_file /some/path/to/a/file /some/other/file; then
    sudo cat /some/path/to/a/file >> /some/other/file
fi
EDIT: The OP just told me in the comments that the files he wants to concatenate are bash scripts -- this brings us back to the good old C preprocessor include-guard tactic:
prepend every file with
if [ -z "$__<filename>__" ]; then __<filename>__=1;
(of course replacing <filename> with the name of the file) and at the end
fi
This way, you surround the script in each file with a test that is true only the first time the block is encountered, so the content runs only once.
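As a concrete sketch, a hypothetical fragment named aliases.sh guarded this way would look like:

# aliases.sh -- the guard makes repeated copies of this block in ~/.bashrc
# harmless, because the body runs only the first time it is encountered
if [ -z "$__aliases_sh__" ]; then __aliases_sh__=1
    alias ll='ls -l'
    alias la='ls -A'
fi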
Does this work for you?
sudo bash -c 'set -o noclobber; date > /tmp/testfile'
noclobber prevents overwriting an existing file.
I think it doesn't quite fit, since you wrote that you want to append something, but this technique might still help.
When all the appending occurs in one script, use a flag:
if [ -z "${appended_the_file}" ]; then
cat /some/path/to/a/file >> /some/other/file
appended_the_file="Yes I have done it except for permission/right issues"
fi
I would go on and write a function appendOnce { .. } with the content above. If you really want an ugly one-liner (ugly: a pain for the eye and for colleagues):
test -z "${ugly}" && cat /some/path/to/a/file >> /some/other/file && ugly="dirt"
Combining this with sudo (note that the append must run inside a shell started by sudo for the redirection to happen with root privileges):
test -z "${ugly}" && sudo sh -c 'cat /some/path/to/a/file >> /some/other/file' && ugly="dirt"
It appears that what you want is a collection of script segments which can be run as a unit. Your approach -- making them into a single file -- is hard to maintain and subject to a variety of race conditions, making its implementation tricky.
A far simpler approach, similar to that used by most modern Linux distributions, is to create a directory of scripts, say ~/.bashrc.d and keep each chunk as an individual file in that directory.
The driver (which replaces the concatenation of all those files) just runs the scripts in the directory one at a time:
if [[ -d ~/.bashrc.d ]]; then
    for f in ~/.bashrc.d/*; do
        if [[ -f "$f" ]]; then
            source "$f"
        fi
    done
fi
To add a file from a skeleton directory, just make a new symlink.
add_fragment() {
    if [[ -f "$FRAGMENT_SKELETON/$1" ]]; then
        # The following will silently fail if the symlink already
        # exists. If you wanted to report that, you could add || echo...
        ln -s "$FRAGMENT_SKELETON/$1" "$HOME/.bashrc.d/$1" 2>/dev/null
    else
        echo "Not a valid fragment name: '$1'"
        exit 1
    fi
}
Of course, it is possible to effectively index the files by contents rather than by name. But in most cases, indexing by name will work better, because it is robust against editing the script fragment. If you used content checks (md5sum, for example), you would run the risk of having an old and a new version of the same fragment, both active, and without an obvious way to remove the old one.
But it should be straight-forward to adapt the above structure to whatever requirements and constraints you might have.
For example, if symlinks are not possible (because the skeleton and the instance do not share a filesystem, for example), then you can copy the files instead. You might want to avoid the copy if the file is already present and has the same content, but that's just for efficiency and it might not be very important if the script fragments are small. Alternatively, you could use rsync to keep the skeleton and the instance(s) in sync with each other; that would be a very reliable and low-maintenance solution.
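A sketch of the rsync variant (using the $FRAGMENT_SKELETON directory from the function above; --delete also removes fragments that were dropped from the skeleton):

rsync -a --delete "$FRAGMENT_SKELETON/" ~/.bashrc.d/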
I have an issue with "make" (Oh, the horror!).
We're trying to migrate some COBOL code from Windows to Linux. The compiler and such are from Micro Focus. Under Windows the code is developed with Micro Focus Net Express. Linux has Micro Focus Server Express as the equivalent. The programs are compiled and linked using "make" scripts.
So much for the background.
The problem is a "make" script that doesn't want to compile and link an executable under Linux. The targets look like this:
# HP INIT-Daten laden
#
datLoad$O: \
$(UI)/defretrn.cpy \
$(UI)/e12sy00s.cpy \
$(UI)/e12sy005.cpy \
$(UI)/e12sy006.cpy \
$(UI)/e12sy010.cpy \
$(UI)/e12sy013.cpy \
$(UI)/e12sy050.cpy \
$(UI)/e12db001.cpy \
$(UI)/e12db050.cpy \
$(UI)/evlg.cpy \
$(UI)/deffehl.cpy \
datLoad.xcbl $(FRC)
# #echo "dollar-O is \"$O\" in $@"
datLoad$X: $(LIBDSQL) datLoad$O \
$(LP)/evlg$O $(LP)/alock$O
	$(LCOB) -o $(@:$X=) -e $(@:$X=) $(LCOBFLAGS) \
	-d e12db001 -d e12db003 -d e12db012 \
	-d e12sy005 -d e12sy006 -d e12sy009 \
	-d e12sy010 -d e12sy012 -d e12sy013 \
	-d e12sy050 \
	-I EvLgSetCategory $(LP)/evlg$O \
	-I ALckSetDebug $(LP)/alock$O \
	$(LIBEXEEXT) "$(LIBXSQL)"
	if [ -f $B/$@ -a ! -w $B/$@ ] ; then rm -f $B/$@ ; fi
	cp $@ $B
To put this into context, $O=".o" (i.e. an object file extension). $(LCOB) is the link command. $X=".exe" (an executable ... just forget about the extension, we'll fix that in due course). All the other stuff relates to paths ==> not relevant to the issue at hand and, yes, they've all been checked and verified.
Ultimately, I am trying to get "make" to resolve a target called "datLoad.o".
Included is a second "make" script containing the following:
COBFLAGS = -cx # create object file
GNTFLAGS = -ug # create .gnt file
SOFLAGS = -z # create
LCOB = cob
...
.cbl$O:
	$(CCOB) $(COBFLAGS) $*.cbl, $*$O, NUL, NUL
	if [ -f $(LP)/$*$O -a ! -w $(LP)/$*$O ] ; then rm -f $(LP)/$*$O ; fi
	cp $*$O $(LP)
The relevant part is the suffix rule, which resolves to ".cbl.o:". Yes, that's the shorthand version and I don't really like it, but I did not write this script. I'm assured that it really means "*.o: *.cbl", and other similar constructs in the script do work correctly.
With a simple "make" I get a link error:
In function `cbi_entry_point': (.data+0x384): undefined reference to `datLoad'
/tmp/cobwZipkt/%cob0.o: In function `main': (.text+0x28): undefined reference to `datLoad'
make: *** [datLoad.exe] Error 1
That means datLoad.o was not created. If I do create it explicitly with:
cob -cx datload
Then "make" still gives the same error as above. Weird! However, what I really cannot understand is the response I get from "make datLoad.o" when the target does not exist:
make: Nothing to be done for `datLoad.o'.
I assumed (heaven help me) that the target "datLoad.o" would try to create the required target file if that file does not already exist. Am I going mad?
Sorry if this seems a bit obscure, I'm not sure how to phrase it better. If anybody has an idea what might be going on, I'd be really grateful...
Thank you Mad Scientist. Your tip was correct.
The included .mk contained a .SUFFIXES rule. The problem was that the $O was not being used consistently. $O was originally set to ".obj" for Windows. Under Linux it's ".o". However, the .SUFFIXES rule had the ".obj" hard coded into it, so of course the ".o" targets were not being recognised. I replaced the hard coded suffix with the $O variable and it now works.
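For illustration, a sketch of the shape of that fix (variable and rule names follow the build scripts above; the real files are more involved):

O = .o                  # was hard-coded as .obj for Windows

.SUFFIXES: .cbl $O      # previously: .SUFFIXES: .cbl .obj

.cbl$O:
	$(CCOB) $(COBFLAGS) $*.cbl, $*$O, NUL, NUL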
Achim