Calling commands in bash script with parameters which have embedded spaces (eg filenames)


I am trying to write a bash script which does some processing on music files. Here is the script so far:
#!/bin/bash
SAVEIFS=$IFS
IFS=printf"\n\0"
find `pwd` -iname "*.mp3" -o -iname "*.flac" | while read f
do
echo "$f"
$arr=($(f))
exiftool "${arr[@]}"
done
IFS=$SAVEIFS
This fails with:
[johnd:/tmp/tunes] 2 $ ./test.sh
./test.sh: line 9: syntax error near unexpected token `$(f)'
./test.sh: line 9: ` $arr=($(f))'
[johnd:/tmp/tunes] 2 $
I have tried many different incantations, none of which have worked. The bottom line is I'm trying to call a command exiftool, and one of the parameters of that command is a filename which may contain spaces. Above I'm trying to assign the filename $f to an array and pass that array to exiftool, but I'm having trouble with the construction of the array.
Immediate question is, how do I construct this array? But the deeper question is how, from within a bash script, do I call an external command with parameters which may contain spaces?

You actually did have the call-with-possibly-space-containing-arguments syntax right (program "${args[@]}"). There were several problems, though.
Firstly, $(foo) executes a command. If you want a variable's value, use $foo or ${foo}.
Secondly, if you want to append something onto an array, the syntax is array+=(value) (or, if that doesn't work, array=("${array[@]}" value)).
Thirdly, please separate filenames with \0 whenever possible. Newlines are all well and good, but filenames can contain newlines.
Fourthly, read takes the switch -d, which can be used with an empty string '' to specify \0 as the delimiter. This eliminates the need to mess around with IFS.
Fifthly, be careful when piping into while loops - this causes the loop to be executed in a subshell, preventing variable assignments inside it from taking effect outside. There is a way to get around this, however - instead of piping (command | while ... done), use process substitution (while ... done < <(command)).
Sixthly, watch your command substitutions - there's no need to use $(pwd) as an argument to a command when . will do. (Or if you really must have full paths, try quoting the pwd call.)
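The fifth point is easy to check in isolation. The following is a minimal sketch (the variable names are made up for illustration, and it assumes bash with the default settings, i.e. lastpipe not enabled) that counts the same three lines once through a pipe and once through process substitution:

```shell
#!/usr/bin/env bash
# Count the same input two ways to see where variable assignments survive.

count_piped=0
printf 'one\ntwo\nthree\n' | while IFS= read -r line; do
    count_piped=$((count_piped + 1))      # runs in a subshell; the change is lost
done

count_procsub=0
while IFS= read -r line; do
    count_procsub=$((count_procsub + 1))  # runs in the current shell; the change is kept
done < <(printf 'one\ntwo\nthree\n')

echo "piped: $count_piped"
echo "process substitution: $count_procsub"
```

Only the second counter reflects the three lines, which is why an array filled inside a piped loop looks empty afterwards.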
tl;dr
The script, revised:
arr=()
while IFS= read -r -d '' f; do
    echo "$f" # For debugging?
    arr+=("$f")
done < <(find . \( -iname "*.mp3" -o -iname "*.flac" \) -print0)
exiftool "${arr[@]}"
Another way
Leveraging find's full capabilities:
find . \( -iname "*.mp3" -o -iname "*.flac" \) -exec exiftool {} +
# Much shorter!
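One detail worth checking when combining -o with an action such as -print0 or -exec: the implicit "and" binds more tightly than -o, so an ungrouped action attaches only to the last test. The sketch below (a throwaway temp directory and made-up file names, not the asker's music collection) compares the two forms:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/a song.mp3" "$tmp/b.flac" "$tmp/notes.txt"

# Ungrouped: parsed as  -iname '*.mp3'  -o  ( -iname '*.flac' -print0 ),
# so only the .flac file is printed.
ungrouped=()
while IFS= read -r -d '' f; do
    ungrouped+=("$f")
done < <(find "$tmp" -iname '*.mp3' -o -iname '*.flac' -print0)

# Grouped: the action applies to files matching either test.
grouped=()
while IFS= read -r -d '' f; do
    grouped+=("$f")
done < <(find "$tmp" \( -iname '*.mp3' -o -iname '*.flac' \) -print0)

echo "ungrouped: ${#ungrouped[@]}, grouped: ${#grouped[@]}"
rm -rf "$tmp"
```

The escaped parentheses are what make the action cover both extensions.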
Edit 1
So you need to save the output of exiftool, manipulate it, then copy stuff? Try this:
arr=()
while IFS= read -r -d '' f; do
    echo "$f" # For debugging?
    arr+=("$f")
done < <(find . \( -iname "*.mp3" -o -iname "*.flac" \) -print0)
# Warning: somewhat misleading syntax highlighting ahead
newfilename="$(exiftool "${arr[@]}")"
newfilename="$(manipulate "$newfilename")"
cp -- "$some_old_filename" "$newfilename"
You probably will need to change that last bit - I've never used exiftool, so I don't know precisely what you're after (or how to do it), but that should be a start.

You can do this just with bash:
shopt -s globstar nullglob
a=( **/*.{mp3,flac} )
exiftool "${a[@]}"
This probably works too: exiftool **/*.{mp3,flac}
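To see that the glob result keeps names with spaces intact, here is a small sketch, assuming bash 4 or later (for globstar) and using a temp directory with invented file names in place of real music files:

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob

tmp=$(mktemp -d)
mkdir -p "$tmp/album one/disc 2"
touch "$tmp/top.mp3" \
      "$tmp/album one/track 01.flac" \
      "$tmp/album one/disc 2/track 02.mp3"

# Each match becomes exactly one array element, spaces included.
files=( "$tmp"/**/*.{mp3,flac} )

printf '%s\n' "${files[@]}"
echo "matched ${#files[@]} files"
rm -rf "$tmp"
```

Because pathname expansion results are never word-split, no IFS juggling is needed with this approach.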

Related

Passing linux command as a command line argument to shell script

I am passing the following command as an argument to a shell script:
"find . -type f -regextype posix-extended -regex './ctrf.|./rbc.' -exec basename {} ;"
and then executing it.
I am storing the command in a variable in the shell script like this:
Find_Command=$1
For execution:
Files="$(${Find_Command})"
This is not working.

Best Practice: Accept An Array, Not A String
First, your shell script should take the command to run as a series of separate arguments, not a single argument.
#!/usr/bin/env bash
readarray -d '' Files < <("$@")
echo "Found ${#Files[@]} files" >&2
printf ' - %q\n' "${Files[@]}"
called as:
./yourscript find . -type f -regextype posix-extended -regex './ctrf.*|./rbc.*' -printf '%f\0'
Note that there's no reason to use the external basename command: find -printf can directly print you only the filename.
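As a self-contained illustration of the same idea, the sketch below wraps the pattern in a function so it can be tried without a separate script file. The function name collect_output, the array name results, and the file names are all made up for the example; it assumes bash 4.4 or later (for readarray -d '') and GNU find (for -printf):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the command given as separate arguments and
# store its NUL-delimited output in the global array "results".
collect_output() {
    readarray -d '' results < <("$@")
}

tmp=$(mktemp -d)
touch "$tmp/ctrf one.log" "$tmp/rbc.dat" "$tmp/other.txt"

# The command is passed as individual words, so quoting survives intact.
collect_output find "$tmp" -type f \( -name 'ctrf*' -o -name 'rbc*' \) -printf '%f\0'

echo "Found ${#results[@]} files"
printf ' - %q\n' "${results[@]}"
rm -rf "$tmp"
```

Since every argument reaches the function already separated, nothing has to be re-parsed from a string.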
Fallback: Parsing A String To An Array Correctly
If you must accept a string, you can use the answers in Reading quoted/escaped arguments correctly from a string to convert that string to an array safely.
Compromising complete shell compatibility to avoid needing nonstandard tools, we can use xargs:
#!/usr/bin/env bash
readarray -d '' Command_Arr < <(xargs printf '%s\0' <<<"$1")
readarray -d '' Files < <("${Command_Arr[@]}")
echo "Found ${#Files[@]} files" >&2
printf ' - %q\n' "${Files[@]}"
...with your script called as:
./yourscript $'find . -type f -regextype posix-extended -regex \'./ctrf.*|./rbc.*\' -printf \'%f\\0\''

If you want to run a command specified in a variable and save the output in another variable, you can use the following commands.
command="find something"
output=$($command)
Or if you want to store the output in an array (split on whitespace):
typeset -a output=($($command))
However, storing filenames in variables and then attempting to access files with those filenames is a bad idea because it is impossible to set the proper delimiter to separate filenames because filenames can contain any character except NUL (see https://mywiki.wooledge.org/BashPitfalls).
I'm not sure what you're trying to accomplish, but your find command contains an error. The -exec option must end with ; to indicate the end of the -exec parameters. Aside from that, it appears to be 'The xy problem' see https://xyproblem.info/
If you want to get the basenames of regular files with the extension .ctrf or .rbc, use the bash script below.
for x in **/*.+(ctrf|rbc); do basename -- "$x"; done
Or the zsh script:
basename -a -- **/*.(ctrf|rbc)(#q.)
Make sure you have enabled the 'extended glob' option in your shell. In bash, the ** pattern additionally needs globstar.
To enable both in bash, run the following command.
shopt -s extglob globstar
And for zsh
setopt extendedglob

You should use an array instead of a string for Find_Command:
#!/usr/bin/env bash
Find_Command=(find . -type f -regextype posix-extended -regex '(./ctrf.|./rbc.)' -exec basename {} \;)
Files=($("${Find_Command[@]}"))
The second statement assumes you don't have special characters (like spaces) in your file names.

Use eval:
Files=$(eval "${Find_Command}")
Be mindful of keeping the parameter sanitized and secure.
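Why the plain string expansion in the question fails can be shown with a tiny experiment. In this sketch, count_args is a made-up helper that just reports how many arguments it received; the point is that quotes stored inside a string are ordinary characters when the string is expanded, whereas an array (or eval) preserves the intended grouping:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

cmd_string="count_args 'a b' c"
cmd_array=(count_args 'a b' c)

# Unquoted expansion only word-splits; the single quotes stay literal,
# so the words are: 'a  b'  c
from_string=$($cmd_string)

# The array keeps 'a b' as one argument.
from_array=$("${cmd_array[@]}")

# eval re-parses the string as shell code, so quoting works, but so would
# anything else embedded in the string.
from_eval=$(eval "$cmd_string")

echo "string: $from_string, array: $from_array, eval: $from_eval"
```

This is the same reason the -regex pattern in the question reaches find with its quote characters still attached.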

Bash script to change file names of different extension [duplicate]

We are working on an AngularJS project whose compiled output contains many file types (js, css, woff, etc.), each with an individual dynamic hash as part of the file name.
I am working on a simple bash script that searches for the files with those extensions and moves them to another folder with the hash removed, by searching for the first instance of '.'.
Please note that the file extensions (.js, .woff and .css) should be retained.
/src/main.1cc794c25c00388d81bb.js ==> /dst/main.js
/src/polyfills.eda7b2736c9951cdce19.js ==> /dst/polyfills.js
/src/runtime.a2aefc53e5f0bce023ee.js ==> /dst/runtime.js
/src/styles.8f19c7d2fbe05fc53dc4.css ==> /dst/styles.css
/src/1.620807da7415abaeeb47.js ==> /dst/1.js
/src/2.93e8bd3b179a0199a6a3.js ==> /dst/2.js
/src/some-webfont.fee66e712a8a08eef580.woff ==> /dst/some-webfont.woff
/src/Web_Bd.d2138591460eab575216.woff ==> /dst/Web_Bd.woff
Bash code:
#!/bin/bash
echo Process web binary files!
echo Processing the name change for js files!!!!!!!!!!!!!
sfidx=0;
SFILES=./src/*.js #{js,css,voff}
DST=./dst/
for files in $SFILES
do
echo $(basename $files)
cp $files ${DST}"${files//.*}".js
sfidx=$((sfidx+1))
done
echo Number of target files detected in srcdir $sfidx!!!!!!!!!!
The above code has two problems.
First, I want to list all the file extensions in one place for the for loop, instead of running the loop once per extension. However, this approach fails, and I am not sure what needs to be changed:
SFILES=./src/*.{js,css,voff}
cp: cannot stat `./src/*.{js,css,voff}': No such file or directory
Second, the cp command fails for the reason below, and I need some help figuring out the correct syntax.
cp $files ${DST}"${files//.*}".js
1.620807da7415abaeeb47.js
cp: cannot create regular file `./dst/./src/1.620807da7415abaeeb47.js.js': No such file or directory

Here is a relatively simple command to do it:
find ./src -type f \( -name \*.js -o -name \*.css -o -name \*.woff \) -print0 |
while IFS= read -r -d '' line; do
    # Drop the 20-character hash between the name and the extension
    dest="./dst/$(basename "$line" | sed -E 's/(\..{20}\.)(js|css|woff)/.\2/')"
    echo "Copying $line to $dest"
    cp -- "$line" "$dest"
done

This is based on the original code and is Shellcheck-clean:
#!/bin/bash
shopt -s nullglob # Make globs that match nothing expand to nothing
echo 'Process web binary files!'
echo 'Processing the name change for js, css, and woff files!!!!!!!!!!!!!'
srcfiles=( src/*.{js,css,woff} )
destdir=dst
for srcpath in "${srcfiles[@]}" ; do
filename=${srcpath##*/}
printf '%s\n' "$filename"
nohash_base=${filename%.*.*} # Remove the hash and suffix from the end
suffix=${filename##*.} # Remove everything up to the final '.'
newfilename=$nohash_base.$suffix
cp -- "$srcpath" "$destdir/$newfilename"
done
echo "Number of target files detected in srcdir ${#srcfiles[*]}!!!!!!!!!"
The code uses an array instead of a string to hold the list of files because it is easier (and generally safer, because it can handle files whose names contain spaces and other special characters). See Arrays [Bash Hackers Wiki] for information about using arrays in Bash.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for information about using ${var##pattern} etc. for extracting parts of strings.
See Correct Bash and shell script variable capitalization for an explanation of why it is best to avoid uppercase variable names (such as SFILES).
shopt -s nullglob prevents strange things happening if the glob pattern(s) fail to match. See Why is nullglob not default? for more information.
See Bash Pitfalls #2 (cp $file $target) for why it's generally better to use cp -- instead of plain cp (though it's not necessary in this case (since neither argument can begin with '-')).
It's best to keep Bash code Shellcheck-clean. When run on the code in the question it identifies the key problem, and recommends the use of arrays as a way to fix it. It also identifies several other potential problems.
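The parameter expansions used in the script can be tried on their own. The sketch below packs them into a function (strip_hash is a name invented here, not part of the answer's script) and runs it on a few of the paths from the question, plus one made-up path containing a space:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the file name with the hash component removed.
strip_hash() {
    local filename=${1##*/}            # drop leading directories
    local nohash_base=${filename%.*.*} # drop ".<hash>.<suffix>" from the end
    local suffix=${filename##*.}       # keep only the text after the last '.'
    printf '%s.%s\n' "$nohash_base" "$suffix"
}

strip_hash /src/main.1cc794c25c00388d81bb.js
strip_hash /src/Web_Bd.d2138591460eab575216.woff
strip_hash "/src/my styles.8f19c7d2fbe05fc53dc4.css"
```

The three calls print main.js, Web_Bd.woff and "my styles.css" respectively, with no external commands involved.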

Your problem is precedence of the expansions. Here is my solution:
#!/bin/bash
echo Process web binary files!
echo Processing the name change for js files!!!!!!!!!!!!!
sfidx=0;
SFILES=$(echo ./src/*.{js,css,voff})
DST=./dst/
for file in $SFILES
do
new=${file##*/} # basename
new="${DST}${new%\.*}.js" # escape \ the .
echo "copying $file to $new" # sanity check
cp $file "$new"
sfidx=$((sfidx+1))
done
echo Number of target files detected in srcdir $sfidx!!!!!!!!!!
With three files in ./src all named "gash" I get:
Process web binary files!
Processing the name change for js files!!!!!!!!!!!!!
copying ./src/gash.js to ./dst/gash.js
copying ./src/gash.css to ./dst/gash.js
copying ./src/gash.voff to ./dst/gash.js
Number of target files detected in srcdir 3!!!!!!!!!!
(You might be able to get around using eval, but that can be a security issue)
new=${file##*/} - remove the longest string on the left ending in / (remove leading directory names, as basename). If you wanted to use the external non-shell basename program then it would be new=$(basename $file).
${new%\.*} - remove the shortest string on the right starting . (remove the old file extension)

A possible approach is to have the find command generate a shell script, then execute it.
src=./src
dst=./dst
find "$src" \( -name \*.js -o -name \*.woff -o -name \*.css \) \
-printf 'p="%p"; f="%f"; cp "$p" "'"${dst}"'/${f%%%%.*}.${f##*.}"\n'
This will print the shell commands you want to execute. If they are what you want,
just pipe the output to a shell:
find "$src" \( -name \*.js -o -name \*.woff -o -name \*.css \) \
-printf 'p="%p"; f="%f"; cp "$p" "'"${dst}"'/${f%%%%.*}.${f##*.}"\n'|bash
(or |bash -x if you want to see what is going on.)
If you have files named, e.g., ./src/dir1/a.xyz.js and ./src/dir2/a.uvw.js they will both end up as ./dst/a.js, the second overwriting the first. To avoid this, you might want to use cp -i instead of cp.
If you are absolutely sure that there will never be spaces or other strange characters in your pathnames, you can use fewer quotes (to the horror of some shell purists):
find $src \( -name \*.js -o -name \*.woff -o -name \*.css \) \
-printf "p=%p; f=%f; cp \$p ${dst}/\${f%%%%.*}.\${f##*.}\\n"|bash
Some final remarks:
%p and %f are expanded by -printf as the full pathname and the basename of the processed file. They enable us to avoid the basename command. Unfortunately, there is no such directive for the file extension, so we must use parameter expansion in the shell to compose the final name.
In the -printf argument, we must use %% to write a single percent character. Since we need two of them, there have to be four...
${f%%.*} expands in the shell as the value of $f with everything removed from the first dot onwards
${f##*.} expands in the shell as the value of $f with everything removed up to the last dot (i.e., it expands to the file extension)
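One consequence of cutting at the first dot is worth seeing on a name whose base itself contains a dot. The file name below is invented for illustration; the sketch compares ${f%%.*} (longest match, cut at the first dot) with ${f%.*.*} (shortest match, cut only the last two dot-separated parts):

```shell
#!/usr/bin/env bash
# Hypothetical file name with a dot inside the base name.
f='jquery.min.0a1b2c3d4e5f6a7b8c9d.js'

# Everything from the FIRST dot onwards is removed before re-adding the suffix.
cut_at_first_dot="${f%%.*}.${f##*.}"

# Only the trailing ".<hash>.<suffix>" is removed before re-adding the suffix.
cut_last_two_parts="${f%.*.*}.${f##*.}"

echo "first dot:      $cut_at_first_dot"
echo "last two parts: $cut_last_two_parts"
```

For the names listed in the question both forms give the same result; they differ only when the part before the hash contains a dot.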

No such file or directory when piping. Each command works separately, but not when piping

I have 2 folders: folder_a & folder_b. In each of these folders there are a bunch of files. I am trying to use sed to move all of these files out of these folders and into my current working directory.
My folder structure looks like this:
mytest:
a:
1.txt
2.txt
3.txt
b:
4.txt
5.txt
The command I am trying to use is:
find . -type d ! -iname '*.*' # find all folders other than root
| sed -r 's/.*/&\/*/' # add '/*' to each of the arguments
| sed -r 'p;s/.*/./' # output: a/* . b/* .
| xargs -n 2 mv # should be creating two commands: 'mv a/* .' and 'mv b/* .'
Unfortunately I get an error:
mv: cannot stat './aaa/*': No such file or directory
I also get the same error when I try this other strategy (using ls instead of mv):
for dir in */; do
ls $dir;
done;
Even if I use sed to replace the spaces in each directory name with '\ ', or surround the directory names with quotes I get the same error.
I'm not sure if these 2 examples are related in my misunderstanding of bash but they both seem to demonstrate my ignorance of how bash translates the output from one command into the input of another command.
Can anyone shed some light on this?

Update: Completely rewritten.
As @EtanReisner and @melpomene have noted, mv */* . or, more specifically, mv a/* b/* . is the most straightforward solution, but you state that this is in part a learning exercise, so the remainder of the answer shows an efficient find-based solution and explains the problem with the original command.
An efficient find-based solution
Generally, if feasible, it's best and most efficient to let find itself do the work, without involving additional tools; find's -exec action is like a built-in xargs, with {} representing the path at hand (with terminator \;) / all paths (with +):
find . -type f -exec echo mv -t . {} +
To be safe, this will just print the mv commands that would be executed; remove the echo to actually execute them.
This will execute a single[1] mv command to which all matching files are passed, and -t . moves them all to the current dir.
[1] If the resulting command line is too long (which is unlikely), it is split up into multiple commands, just as with xargs.
Operating on files (-type f) bypasses the need for globbing, as find will then enumerate all files for you (it also bypasses the need to exclude . explicitly).
Note that this solution works on entire subtrees, not just (immediate) subdirectories.
It's tempting to consider turning on Bash 4's globstar option and using mv */** ., but that won't work, because it will attempt to move directories as well, not just the files in them.
A caveat re -exec with +: it only works if {} - the placeholder for all paths - is the token immediately before the +.
Since you're on Linux, we can satisfy this condition by specifying the target folder for mv with option -t before the {}; on BSD-based systems such as OSX, you could not do that, because mv doesn't support -t there, so you'd have to use terminator \;, which means that mv is called once for every path, which is obviously much slower.
Why your command didn't work:
As @EtanReisner points out in a comment, xargs invokes the command specified without (implicitly) involving a shell, so globbing won't work; you can verify this with the following command:
echo '*' | xargs echo # -> '*' - NO globbing
If we leave the globbing issue aside, additional work would have been necessary to make your xargs command work correctly with folder names with embedded spaces (or other shell metacharacters):
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -n 2 echo mv # NOTE: still won't work due to lack of globbing
Note how the (combined) sed command now produces a single output line '<input-path>'/* ., with the input path enclosed in embedded single-quotes, which is required for xargs to recognize <input-path> as a single argument, even if it contains embedded spaces.
(If your filenames contain single-quotes, you'd have to do more work; also note that since now all arguments for a given dir. are on a single line, you could use xargs -L 1 ....)
Also note how -mindepth 1 (only process paths at the subdirectory level or below) is used to skip processing of . itself.
The only way to make globbing happen is to get the shell involved:
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -I {} sh -c 'echo mv {}' # works, but is inefficient
Note the use of xargs' -I option to treat each input line as its own argument ({} is a self-chosen placeholder for the input).
sh -c invokes the (default) shell to execute the resulting command, at which point globbing does happen.
However, overall, this is quite inefficient:
A pipeline with 3 segments is used.
A shell instance is invoked for every input path, which in turn calls the mv utility.
Compare this to the efficient find-only solution above, which (typically) creates only 2 processes in total.
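If a per-directory shell is wanted anyway, a variant that avoids splicing the path into the command text is to hand it to sh as a positional parameter. This is only a sketch under assumptions: it runs in a temp directory with made-up names, handles immediate subdirectories only, and expects every subdirectory to be non-empty:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b dir"
touch "$tmp/a/1.txt" "$tmp/a/2.txt" "$tmp/b dir/4 x.txt"

(
    cd "$tmp" || exit 1
    # One sh per directory. The directory arrives as "$1", so it is never
    # re-parsed as shell code, while the * is expanded by that sh.
    find . -mindepth 1 -maxdepth 1 -type d \
        -exec sh -c 'mv -- "$1"/* .' sh {} \;
)

moved_files=( "$tmp"/*.txt )
echo "files now at top level: ${#moved_files[@]}"
rm -rf "$tmp"
```

It still starts one shell per directory, so the single find -exec ... + invocation remains the cheaper option.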

problems with source command in shell

Good afternoon. I have the following command, which runs my 'code.sh' file and passes it a parameter ('$1'). The problem is that I want to run 'code.sh' with 'source'. This is my command:
find . -name "*.txt" -type f -exec ./code.sh {} \;
And it works fine when I use:
source ./code.sh

This is tricky. When you source a script you need to do it in the current shell, not in a sub-shell or child process. Executing source from find won't work because find is a child process, and so changes to environment variables will be lost.
It's rather roundabout, but you can use a loop to parse find's output and run the source commands directly in the top-level shell (using process substitution).
while IFS= read -r -d $'\0' fileName; do
source code.sh "$fileName"
done < <(find . -name "*.txt" -type f -print0)
Now what's with -print0 and -d $'\0', you ask? Using these two flags together is a way of making the script extra safe.† File names in UNIX are allowed to contain lots of oddball characters including spaces, tabs, and even newlines. While newlines are rare, they are indeed legal.
-print0 tells find to use NUL characters (\0) to separate the file names rather than the default newlines (\n). Doing this means file names containing \n won't mess up the loop. Using \0 as a separator works well because \0 is not a legal character in file names.
-d $'\0'‡ does the same thing with read on the other side. It tells read that lines are delimited with \0 instead of \n.
† You may have seen this trick before. It's common to write find ... -print0 | xargs -0 ... to get the same sort of safety when pairing find with xargs.
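‡ If you're wondering about $'...': that's Bash ANSI-C quoting syntax for writing string literals containing escape codes. Dollar sign plus single quotes. You can write $'\n' for a newline or $'\t' for a tab. Strictly speaking, Bash strings cannot hold a NUL, so $'\0' comes out as an empty string; it still works here because read treats an empty -d argument as "delimit on NUL".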
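The effect of sourcing inside the loop can be verified with a throwaway stand-in for code.sh. Everything here is hypothetical (the temp directory, the one-line script body, the seen_files array); the sketch only shows that state set by the sourced file is still visible after the loop ends:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# Hypothetical stand-in for code.sh: it records the file name it was given.
cat > "$tmp/code.sh" <<'EOF'
seen_files+=("$1")
EOF

touch "$tmp/one.txt" "$tmp/two words.txt"

seen_files=()
while IFS= read -r -d '' fileName; do
    source "$tmp/code.sh" "$fileName"   # runs in the current shell
done < <(find "$tmp" -name '*.txt' -type f -print0)

echo "sourced for ${#seen_files[@]} files"
rm -rf "$tmp"
```

Had the loop been fed through a pipe, or the script run via find -exec, seen_files would still be empty at this point.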

You won't be able to use find in this way; it will always execute a command in a separate process, not the current shell. If you are using bash 4, there's a simple alternative to using find:
shopt -s globstar
for f in **/*.txt; do
[[ -f $f ]] && source code.sh "$f"
done

bash script collecting filenames seems to get confused by spaces

I'm trying to build a script that lists all the zip files in a set of directories, with some filters and get it to spit them out to file but when a filename has a space in it it seems to appear on a new line.
This list will eventually be used as an input to tar to gzip all the zip files, script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets ;);
do
echo $line >> set1.txt
for line in $(find $line -type f -name \*.zip -mtime +2 ;);
do
echo \"$line\" >> set2.txt
done;
This works as expected until you get a space in a filename then set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can get it to keep these filenames with spaces in in a single line with the whole lot wrapped in one set of quotes?
Thanks!

The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
echo "$line" >> set1.txt
while IFS= read -r -d '' file ; do
printf '"%s"\n' "$file" >> set2.txt
done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
Note that each expansion of each variable is double-quoted. This is necessary.
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as @sarnold notes, this is not safe.

You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
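So you should also look into adding the --append (-r) command line option to tar(1), so that each invocation adds files to the archive rather than recreating it. (The similarly named --concatenate option appends other tar archives, not plain files.) Because a compressed archive cannot be appended to, it makes sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)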
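An alternative that sidesteps xargs entirely is to let tar read the NUL-separated list itself, so there is exactly one tar invocation. The sketch below assumes GNU tar (for --null and --files-from) and uses a temp directory with made-up names; the -mtime +2 filter from the question is left out because the test files are brand new:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
mkdir -p "$tmp/home/user/assets/job bags"
touch "$tmp/home/user/assets/job bags/cost in use sept 2010.zip" \
      "$tmp/home/user/assets/plain.zip"

# find emits NUL-terminated paths; tar reads them from stdin in one run.
# stderr is silenced only to hide tar's note about stripping the leading "/".
find "$tmp/home" -type f -path '*/assets/*.zip' -print0 |
    tar --null --files-from=- -czf "$tmp/zips.tar.gz" 2>/dev/null

# Count the archived members that end in .zip.
archived=$(tar -tzf "$tmp/zips.tar.gz" | grep -c '\.zip$')
echo "archived $archived zip files"
rm -rf "$tmp"
```

With a single invocation there is no risk of a later run overwriting the archive written by an earlier one.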
