Bash script to change file names of different extension [duplicate] - linux

We are working on an AngularJS project, where the compiled output contains files of several extensions (js, css, woff, etc.), each with a dynamic hash embedded in its file name.
I am writing a simple bash script to find the files with those extensions and copy them to another folder with the hash removed, by searching for the first instance of '.'.
Please note the .woff and .css file extensions should be retained.
/src/main.1cc794c25c00388d81bb.js ==> /dst/main.js
/src/polyfills.eda7b2736c9951cdce19.js ==> /dst/polyfills.js
/src/runtime.a2aefc53e5f0bce023ee.js ==> /dst/runtime.js
/src/styles.8f19c7d2fbe05fc53dc4.css ==> /dst/styles.css
/src/1.620807da7415abaeeb47.js ==> /dst/1.js
/src/2.93e8bd3b179a0199a6a3.js ==> /dst/2.js
/src/some-webfont.fee66e712a8a08eef580.woff ==> /dst/some-webfont.woff
/src/Web_Bd.d2138591460eab575216.woff ==> /dst/Web_Bd.woff
Bash code:
#!/bin/bash
echo Process web binary files!
echo Processing the name change for js files!!!!!!!!!!!!!
sfidx=0;
SFILES=./src/*.js #{js,css,voff}
DST=./dst/
for files in $SFILES
do
    echo $(basename $files)
    cp $files ${DST}"${files//.*}".js
    sfidx=$((sfidx+1))
done
echo Number of target files detected in srcdir $sfidx!!!!!!!!!!
The above code has two problems:
First, I need to list the file extensions in one place in the for loop, instead of running it once per extension. However, this glob fails, and I am not sure what needs to change:
SFILES=./src/*.{js,css,voff}
cp: cannot stat `./src/*.{js,css,voff}': No such file or directory
Second, the cp command fails for the reason below; I need some help figuring out the correct syntax.
cp $files ${DST}"${files//.*}".js
1.620807da7415abaeeb47.js
cp: cannot create regular file `./dst/./src/1.620807da7415abaeeb47.js.js': No such file or directory

Here is a relatively simple command to do it:
find ./src -type f \( -name \*.js -o -name \*.css -o -name \*.woff \) -print0 |
while IFS= read -r -d $'\0' line; do
    dest="./dst/$(basename "$line" | sed -E 's/(\..{20}\.)(js|css|woff)/.\2/g')"
    echo "Copying $line to $dest"
    cp "$line" "$dest"
done

This is based on the original code and is Shellcheck-clean:
#!/bin/bash
shopt -s nullglob # Make globs that match nothing expand to nothing
echo 'Process web binary files!'
echo 'Processing the name change for js, css, and woff files!!!!!!!!!!!!!'
srcfiles=( src/*.{js,css,woff} )
destdir=dst
for srcpath in "${srcfiles[@]}" ; do
    filename=${srcpath##*/}
    printf '%s\n' "$filename"
    nohash_base=${filename%.*.*}    # Remove the hash and suffix from the end
    suffix=${filename##*.}          # Remove everything up to the final '.'
    newfilename=$nohash_base.$suffix
    cp -- "$srcpath" "$destdir/$newfilename"
done
echo "Number of target files detected in srcdir ${#srcfiles[*]}!!!!!!!!!"
The code uses an array instead of a string to hold the list of files because it is easier (and generally safer, because it can handle files whose names contain spaces and other special characters). See Arrays [Bash Hackers Wiki] for information about using arrays in Bash.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for information about using ${var##pattern} etc. for extracting parts of strings.
See Correct Bash and shell script variable capitalization for an explanation of why it is best to avoid uppercase variable names (such as SFILES).
shopt -s nullglob prevents strange things happening if the glob pattern(s) fail to match. See Why is nullglob not default? for more information.
See Bash Pitfalls #2 (cp $file $target) for why it's generally better to use cp -- instead of plain cp (though it's not necessary in this case, since neither argument can begin with '-').
It's best to keep Bash code Shellcheck-clean. When run on the code in the question it identifies the key problem, and recommends the use of arrays as a way to fix it. It also identifies several other potential problems.

Your problem is precedence of the expansions. Here is my solution:
#!/bin/bash
echo Process web binary files!
echo Processing the name change for js files!!!!!!!!!!!!!
sfidx=0;
SFILES=$(echo ./src/*.{js,css,voff})
DST=./dst/
for file in $SFILES
do
    new=${file##*/}              # basename
    new="${DST}${new%\.*}.js"    # strip the old extension (the . is escaped), append .js
    echo "copying $file to $new" # sanity check
    cp "$file" "$new"
    sfidx=$((sfidx+1))
done
echo Number of target files detected in srcdir $sfidx!!!!!!!!!!
With three files in ./src all named "gash" I get:
Process web binary files!
Processing the name change for js files!!!!!!!!!!!!!
copying ./src/gash.js to ./dst/gash.js
copying ./src/gash.css to ./dst/gash.js
copying ./src/gash.voff to ./dst/gash.js
Number of target files detected in srcdir 3!!!!!!!!!!
(You might be able to get around the precedence problem using eval, but that can be a security issue.)
new=${file##*/} - remove the longest string on the left ending in / (remove leading directory names, like basename). If you wanted to use the external non-shell basename program, it would be new=$(basename "$file").
${new%\.*} - remove the shortest string on the right starting with . (remove the old file extension)

A possible approach is to have the find command generate a shell script, then execute it.
src=./src
dst=./dst
find "$src" \( -name \*.js -o -name \*.woff -o -name \*.css \) \
-printf 'p="%p"; f="%f"; cp "$p" "'"${dst}"'/${f%%%%.*}.${f##*.}"\n'
This will print the shell commands you want to execute. If they are what you want,
just pipe the output to a shell:
find "$src" \( -name \*.js -o -name \*.woff -o -name \*.css \) \
-printf 'p="%p"; f="%f"; cp "$p" "'"${dst}"'/${f%%%%.*}.${f##*.}"\n'|bash
(or |bash -x if you want to see what is going on.)
If you have files named, e.g., ./src/dir1/a.xyz.js and ./src/dir2/a.uvw.js they will both end up as ./dst/a.js, the second overwriting the first. To avoid this, you might want to use cp -i instead of cp.
If you are absolutely sure that there will never be spaces or other strange characters in your pathnames, you can use fewer quotes (to the horror of some shell purists):
find $src \( -name \*.js -o -name \*.woff -o -name \*.css \) \
    -printf "p=%p; f=%f; cp \$p ${dst}/\${f%%%%.*}.\${f##*.}\\n" | bash
Some final remarks:
%p and %f are expanded by -printf as the full pathname and the basename of the processed file. They let us avoid the basename command. Unfortunately, there is no such directive for the file extension, so we must use parameter expansion in the shell to compose the final name.
In the -printf argument, we must use %% to write a single percent character. Since we need two of them, there have to be four...
${f%%.*} expands in the shell as the value of $f with everything removed from the first dot onwards
${f##*.} expands in the shell as the value of $f with everything removed up to the last dot (i.e., it expands to the file extension)

Related

Passing filename as variable from find's exec into a second exec command

From reading this stackoverflow answer I was able to remove the file extension from the files using find:
find . -name "S4*" -execdir basename {} .fastq.gz ';'
returned:
S9_S34_R1_001
S9_S34_R2_001
I'm making a batch script where I want to extract the filename with the above prefix to pass as arguments into a program. At the moment I'm doing this with a loop, but am wondering if it can be achieved using find.
for i in $(ls | grep 'S9_S34*' | cut -d '.' -f 1); do echo "$i"_trim.log "$i"_R1_001.fastq.gz "$i"_R2_001.fastq.gz; done; >> trim_script.sh
Is it possible to do something as follows:
find . -name "S4*" -execdir basename {} .fastq.gz ';' | echo {}_trim.log {}_R1_001.fastq.gz {}_R2_001.fastq.gz {}\ ; >> trim_script.sh
You don't need basename at all, or -exec, if all you're doing is generating a series of strings that contain your file's basenames within them; the -printf action included in GNU find can do all that for you, as it provides a %P built-in to insert the basename of your file:
find . -name "S4*" \
    -printf '%P_trim.log %P_R1_001.fastq.gz %P_R2_001.fastq.gz %P\n' \
    >trim_script.sh
That said, be sure you only do this if you trust your filenames. If you're truly running the result as a script, there are serious security concerns if someone could create a S4$(rm -rf ~).txt file, or something with a similarly malicious name.
What if you don't trust your filenames, or don't have the GNU version of find? Then consider making find pass them into a shell (like bash or ksh) that supports the %q extension, to generate a safely-escaped version of those names (note that you should run the script with the same interpreter you used for this escaping):
find . -name "S4*" -exec bash -c '
for file do # iterates over "$#", so processes each file in turn
file=${file##*/} # get the basename
printf "%q_trim.log %q_R1_001.fastq.gz %q_R2_001.fastq.gz %q\n" \
"$file" "$file" "$file" "$file"
done
' _ {} + >trim_script.sh
Using -exec ... {} + invokes the smallest possible number of subprocesses -- not one per file found, but instead one per batch of filenames (using the largest possible batch that can fit on a command line).

Is there a utility for creating hard link backup?

I need to create a clone of a directory tree so I can clean up duplicate files.
I don't need copies of the files, I just need the files, so I want to create a matching tree with hard links.
I threw this together in a couple of minutes when I realized my backup was going to take hours.
It just echoes the commands, which I redirect to a file to examine before I run it.
Of course the usual problems, like files and directories whose names contain quotes or commas, have not been addressed (bash scripting is painful for this, isn't it, as are files whose names contain leading dashes).
Isn't there some utility that already does this in a robust fashion?
BASEDIR=$1
DESTDIR=$2
for DIR in `find "$BASEDIR" -type d`
do
    RELPATH=`echo $DIR | sed "s,$BASEDIR,,"`
    DESTPATH=${DESTDIR}/$RELPATH
    echo mkdir -p \"$DESTPATH\"
done
for FILE in `find "$BASEDIR" -type f`
do
    RELPATH=`echo $FILE | sed "s,$BASEDIR,,"`
    DESTPATH=${DESTDIR}/$RELPATH
    echo ln \"$FILE\" \"$DESTPATH\"
done
Generally, using find like that is a bad idea - you are relying on separating filenames on whitespace, when in fact all forms of whitespace are valid in filenames on most UNIX systems. find itself has the ability to run a command on each file found, which is generally a better thing to use. I would suggest doing something like this (I'd use a couple of scripts for simplicity; I'm not sure how easy it would be to do it all in one):
main.sh:
BASEDIR="$1" #I tend to quote all variables - good habit to avoid problems with spaces, etc.
DESTDIR="$2"
find "$BASEDIR" -type d -exec ./handle_file.sh \{\} "$BASEDIR" "$DESTDIR" \; # \{\} is replaced with the filename, \; tells find the command is over
find "$BASEDIR" -type f -exec ./handle_file.sh \{\} "$BASEDIR" "$DESTDIR" \;
handle_file.sh:
FILENAME="$1"
BASEDIR="$2"
DESTDIR="$3"
RELPATH="${FILENAME#"$BASEDIR"}" # bash string substitution double quoting, to stop BASEDIR being interpreted as a pattern
DESTPATH="${DESTDIR}/$RELPATH"
if [ -f "$FILENAME" ]; then
echo ln \""$FILENAME"\" \""$DESTPATH"\"
elif [ -d "$FILENAME" ]; then
echo mkdir -p \""$DESTPATH"\"
fi
I've tested this with a simple tree with spaces, asterisks, apostrophes and even a carriage return in filenames and it seems to work.
Obviously remove the escaped quotes and the "echo" (but leave the real quotes) to make it work for real.

Calling commands in bash script with parameters which have embedded spaces (eg filenames)

I am trying to write a bash script which does some processing on music files. Here is the script so far:
#!/bin/bash
SAVEIFS=$IFS
IFS=printf"\n\0"
find `pwd` -iname "*.mp3" -o -iname "*.flac" | while read f
do
echo "$f"
$arr=($(f))
exiftool "${arr[#]}"
done
IFS=$SAVEIFS
This fails with:
[johnd:/tmp/tunes] 2 $ ./test.sh
./test.sh: line 9: syntax error near unexpected token `$(f)'
./test.sh: line 9: ` $arr=($(f))'
[johnd:/tmp/tunes] 2 $
I have tried many different incantations, none of which have worked. The bottom line is I'm trying to call a command exiftool, and one of the parameters of that command is a filename which may contain spaces. Above I'm trying to assign the filename $f to an array and pass that array to exiftool, but I'm having trouble with the construction of the array.
Immediate question is, how do I construct this array? But the deeper question is how, from within a bash script, do I call an external command with parameters which may contain spaces?
You actually did have the call-with-possibly-space-containing-arguments syntax right (program "${args[@]}"). There were several problems, though.
Firstly, $(foo) executes a command. If you want a variable's value, use $foo or ${foo}.
Secondly, if you want to append something onto an array, the syntax is array+=(value) (or, if that doesn't work, array=("${array[#]}" value)).
Thirdly, please separate filenames with \0 whenever possible. Newlines are all well and good, but filenames can contain newlines.
Fourthly, read takes the switch -d, which can be used with an empty string '' to specify \0 as the delimiter. This eliminates the need to mess around with IFS.
Fifthly, be careful when piping into while loops - this causes the loop to be executed in a subshell, preventing variable assignments inside it from taking effect outside. There is a way to get around this, however - instead of piping (command | while ... done), use process substitution (while ... done < <(command)). A short demonstration of this pitfall follows this list.
Sixthly, watch your command substitutions - there's no need to use $(pwd) as an argument to a command when . will do. (Or if you really must have full paths, try quoting the pwd call.)
tl;dr
The script, revised:
while read -r -d '' f; do
    echo "$f" # For debugging?
    arr+=("$f")
done < <(find . \( -iname "*.mp3" -o -iname "*.flac" \) -print0)
exiftool "${arr[@]}"
Another way
Leveraging find's full capabilities:
find . -iname "*.mp3" -o -iname "*.flac" -exec exiftool {} +
# Much shorter!
Edit 1
So you need to save the output of exiftool, manipulate it, then copy stuff? Try this:
while read -r -d '' f; do
    echo "$f" # For debugging?
    arr+=("$f")
done < <(find . \( -iname "*.mp3" -o -iname "*.flac" \) -print0)
# Warning: somewhat misleading syntax highlighting ahead
newfilename="$(exiftool "${arr[@]}")"
newfilename="$(manipulate "$newfilename")"
cp -- "$some_old_filename" "$newfilename"
You probably will need to change that last bit - I've never used exiftool, so I don't know precisely what you're after (or how to do it), but that should be a start.
You can do this just with bash:
shopt -s globstar nullglob
a=( **/*.{mp3,flac} )
exiftool "${a[#]}"
This probably works too: exiftool **/*.{mp3,flac}

bash script collecting filenames seems to get confused by spaces

I'm trying to build a script that lists all the zip files in a set of directories, with some filters, and spits them out to a file; but when a filename has a space in it, it appears split across lines.
This list will eventually be used as an input to tar to gzip all the zip files, script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets ;);
do
    echo $line >> set1.txt
    for line in $(find $line -type f -name \*.zip -mtime +2 ;);
    do
        echo \"$line\" >> set2.txt
    done;
done;
This works as expected until you get a space in a filename then set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can get it to keep these filenames with spaces in in a single line with the whole lot wrapped in one set of quotes?
Thanks!
The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
    echo "$line" >> set1.txt
    while IFS= read -r -d '' file ; do
        printf '"%s"\n' "$file" >> set2.txt
    done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
Note that each expansion of each variable is double-quoted. This is necessary.
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as @sarnold notes, this is not safe.
You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
So you should also look into adding the --concatenate command line option to tar(1), too, so that it will add to the archive. It might make sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)

Add prefix to all images (recursive)

I have a folder with more than 5000 images, all with the JPG extension.
What I want to do is recursively add the "thumb_" prefix to all images.
I found a similar question: Rename Files and Directories (Add Prefix), but I only want to add the prefix to files with the JPG extension.
One possible solution:
find . -name '*.jpg' -printf "'%p' '%h/thumb_%f'\n" | xargs -n2 echo mv
Principe: find all needed files, and prepare arguments for the standard mv command.
Notes:
The arguments for mv are surrounded by ' to allow spaces in filenames.
The drawback is that this will not work with filenames that contain the ' apostrophe itself, like many mp3 files. If you need to move stranger filenames, check below.
The above command is a dry run (it only shows the mv commands with their arguments). For real work, remove the echo preceding mv.
Renaming ANY filename: in the shell you need a delimiter, and the problem is that the filename (stored in a shell variable) can itself contain the delimiter, so:
mv $file $newfile # will fail if the filename contains a space, TAB or newline
mv "$file" "$newfile" # will fail if any of the filenames contains "
The correct solutions are either:
prepare the filename with proper escaping, or
use a scripting language that easily understands ANY filename.
Preparing the correct escaping in bash is possible with its internal printf and the %q formatting directive (= print quoted), but this solution is long and boring; a sketch follows the perl command below.
IMHO, the easiest way is using perl and the NUL-delimited -print0, like the next command.
find . -name \*.jpg -print0 | perl -MFile::Basename -0nle 'rename $_, dirname($_)."/thumb_".basename($_)'
The above uses perl's power to mangle the filenames and finally renames the files.
Beware of filenames with spaces in (the for ... in ... expression trips over those), and be aware that the result of a find . ... will always start with ./ (and hence try to give you names like thumb_./file.JPG which isn't quite correct).
This is therefore not a trivial thing to get right under all circumstances. The expression I've found to work correctly (with spaces, subdirs and all that) is:
find . -iname \*.JPG -exec bash -c 'mv "$1" "`echo $1 | sed \"s/\(.*\)\//\1\/thumb_/\"`"' -- '{}' \;
Even that can fall foul of certain names (with quotes in) ...
In OS X 10.8.5, find does not have the -printf option. The port that contained rename seemed to depend upon a WebkitGTK development package that was taking hours to install.
This one line, recursive file rename script worked for me:
find . -iname "*.jpg" -print | while read name; do cur_dir=$(dirname "$name"); cur_file=$(basename "$name"); mv "$name" "$cur_dir/thumb_$cur_file"; done
I was actually renaming CakePHP view files with an 'admin_' prefix, to move them all to an admin section.
You can use that same answer; just use *.jpg instead of just *.
for file in *.JPG; do mv $file thumb_$file; done
if it's multiple directory levels under the current one:
for file in $(find . -name '*.JPG'); do mv $file $(dirname $file)/thumb_$(basename $file); done
proof:
jcomeau@intrepid:/tmp$ mkdir test test/a test/a/b test/a/b/c
jcomeau@intrepid:/tmp$ touch test/a/A.JPG test/a/b/B.JPG test/a/b/c/C.JPG
jcomeau@intrepid:/tmp$ cd test
jcomeau@intrepid:/tmp/test$ for file in $(find . -name '*.JPG'); do mv $file $(dirname $file)/thumb_$(basename $file); done
jcomeau@intrepid:/tmp/test$ find .
.
./a
./a/b
./a/b/thumb_B.JPG
./a/b/c
./a/b/c/thumb_C.JPG
./a/thumb_A.JPG
jcomeau@intrepid:/tmp/test$
Use rename for this:
rename 's/(\w{1})\.JPG$/thumb_$1\.JPG/' `find . -type f -name '*.JPG'`
For only the jpg files in the current folder:
for f in *.jpg ; do mv "$f" "PRE_$f" ; done
