bash script collecting filenames seems to get confused by spaces - linux

I'm trying to build a script that lists all the zip files in a set of directories, with some filters, and writes them out to a file, but when a filename has a space in it, each word appears on its own line.
This list will eventually be used as input to tar to gzip all the zip files. The script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets); do
    echo $line >> set1.txt
    for line in $(find $line -type f -name \*.zip -mtime +2); do
        echo \"$line\" >> set2.txt
    done
done
This works as expected until a filename contains a space, at which point set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can keep these filenames with spaces on a single line, with the whole name wrapped in one set of quotes?
Thanks!

The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
    echo "$line" >> set1.txt
    while IFS= read -r -d '' file ; do
        printf '"%s"\n' "$file" >> set2.txt
    done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
Note that each expansion of each variable is double-quoted. This is necessary.
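To see the difference, compare the two expansions on a name like the one above (the variable assignment is just for illustration):
f='rbjbCost in use sept 2010.zip'
printf '%s\n' $f      # unquoted: word-splits into five separate lines
printf '%s\n' "$f"    # quoted: prints the name on one line, as intended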
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as @sarnold notes, this is not safe.

You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
So you should also look into adding the --append (-r) command line option to tar(1), so that repeated invocations add to the archive instead of replacing it; note that --concatenate is for joining existing tar archives rather than adding files. It might make sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)
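Putting those pieces together, a minimal sketch might look like the following; backup.tar is a placeholder name, the paths come from the question, and GNU find/xargs/tar are assumed:
rm -f backup.tar                        # fresh run: remove any previous archive
find /home -path '*/assets/*.zip' -type f -mtime +2 -print0 |
    xargs -0 tar --append -f backup.tar # -r/--append keeps adding even if xargs runs tar several times
gzip backup.tar                         # compress once, after everything has been added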

Related

Save output command in a variable and write for loop

I want to write a shell script. I list my jpg files inside nested subdirectories with the following command line:
find . -type f -name "*.jpg"
How can I save the output of this command inside a variable and write a for loop for that? (I want to do some processing steps for each jpg file)
You don't want to store output containing multiple files in a variable/array and then post-process it later. You can just act on the files as they are found.
Assuming you have bash shell available, you could write a small script as
#!/usr/bin/env bash
# ^^^^ bash needed over any POSIX shell because
# of the need to use process substitution <()
while IFS= read -r -d '' image; do
    printf '%s\n' "$image"
    # Your other actions can be done here
done < <(find . -type f -name "*.jpg" -print0)
The -print0 option writes filenames with a null byte terminator, which is then read back by the read command. This ensures that file names containing special characters are handled without choking on them.
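A quick way to convince yourself (the demo directory and file name are hypothetical):
mkdir -p demo && touch 'demo/name with  spaces.jpg'
while IFS= read -r -d '' image; do
    printf '<%s>\n' "$image"    # prints <demo/name with  spaces.jpg> on one line
done < <(find demo -type f -name "*.jpg" -print0)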
Better than storing in a variable, use this:
find . -type f -name "*.jpg" -exec command {} \;
If you want, command can even be a full-blown shell script.
A demo is better than an explanation, no? Copy and paste the following lines into a terminal:
cat<<'EOF' >/tmp/test
#!/bin/bash
echo "I play with $1 and I can replay with $1, even 3 times: $1"
EOF
chmod +x /tmp/test
find . -type f -name "*.jpg" -exec /tmp/test {} \;
Edit: a new demo (prompted by new questions in the comments)
find . -type f -name "*.jpg" | head -n 10 | xargs -n1 command
(this alternative doesn't take care of filenames with newlines or spaces)
This one takes care:
#!/bin/bash
shopt -s globstar
count=0
for file in **/*.jpg; do
    if (( ++count <= 10 )); then
        echo "process file $file number $count"
    else
        break
    fi
done
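For completeness, here is a NUL-safe variant of the head-style pipeline shown earlier, assuming GNU head (its -z option counts NUL-delimited records) and GNU xargs; command is the same placeholder as before:
find . -type f -name "*.jpg" -print0 | head -zn 10 | xargs -0 -n1 command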

How can I search for files in directories that contain spaces in names, using "find"?

How can I search for files in directories that contain spaces in names, using find?
I use this script:
#!/bin/bash
for i in `find "/tmp/1/" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
do
for j in `ls "$i" | grep sh | sed 's/\.txt//g'`
do
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
but files and directories with spaces in their names are not processed.
This will grab all the files that have spaces in them
$ ls
more space  nospace  stillnospace  this is space
$ find -type f -name "* *"
./this is space
./more space
I don't know how to achieve your goal directly. But given your current solution, the problem is not really with find but with the for loops, since spaces are taken as delimiters between items.
find has a useful option for those cases:
from man find:
-print0
True; print the full file name on the standard output, followed by a null character
(instead of the newline character that -print uses). This allows file names
that contain newlines or other types of white space to be correctly interpreted
by programs that process the find output. This option corresponds to the -0
option of xargs.
As the man page says, this pairs with the -0 option of xargs. Several other standard tools have an equivalent option (for instance GNU sort -z, grep -Z and sed -z). You probably have to rewrite your complex pipeline around those tools in order to cleanly process file names containing spaces.
In addition, see bash "for in" looping on null delimited string variable to learn how to use a for loop with NUL-terminated arguments.
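For example, a minimal sketch (the paths follow the question) that collects the NUL-delimited find output into a bash array and then loops over it:
files=()
while IFS= read -r -d '' f; do
    files+=("$f")
done < <(find /tmp/1 -iname '*.txt' -print0)
for f in "${files[@]}"; do
    printf 'found: %s\n' "$f"
done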
Do it like this
find . -type f -name "* *"
Instead of . you can specify the path where you want to find files matching your criteria.
Your first for loop is:
for i in `find "/tmp/1" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
If I understand it correctly, it is looking for all text files in the /tmp/1 directory, and then attempting to remove the file name with the sed command right? This would cause a single directory with multiple .txt files to be processed by the inner for loop more than once. Is that what you want?
Instead of using sed to get rid of the filename, you can use dirname instead. Also, later on, you use sed to get rid of the extension. You can use basename for that.
for i in `find "/tmp/1" -iname "*.txt"` ; do
path=$(dirname "$i")
for j in `ls $path | grep POD` ; do
file=$(basename "$j" .txt)
# Do what ever you want with the file
This doesn't solve the problem of having a single directory processed multiple times, but if it is an issue for you, you can use the for loop above to store the file name in an array instead and then remove duplicates with sort and uniq.
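One hypothetical way to process each directory exactly once, without an intermediate array, assuming GNU find and sort:
find /tmp/1 -iname '*.txt' -printf '%h\0' |  # %h prints each file's directory
    sort -zu |                               # -z: NUL-delimited records, -u: drop duplicates
    while IFS= read -r -d '' path; do
        echo "processing directory: $path"
    done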
Use a while read loop with null-delimited pathname output from find:
#!/bin/bash
while IFS= read -rd '' i; do
    while IFS= read -rd '' j; do
        find "/tmp/2/" -iname "$j.sh" -exec echo cp '{}' "$i" \;
    done < <(exec find "$i" -maxdepth 1 -mindepth 1 -name '*POD*' -not -name '*.txt' -printf '%f\0')
done < <(exec find /tmp/1 -iname '*.txt' -not -iname '[0-9A-Za-z]*.txt' -print0)
Never use for i in $(find ...) or similar, as it'll fail for file names containing white space, as you saw.
Use find ... | while IFS= read -r i instead.
It's hard to say without sample input and expected output but something like this might be what you need:
find "/tmp/1/" -iname "*.txt" |
while IFS= read -r i
do
i="${i%%[0-9A-Za-z]*\.txt}"
for j in "$i"/*sh*
do
j="${j%%\.txt}"
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
The above will still fail for file names that contains newlines. If you have that situation and can't fix the file names then look into the -print0 option for find, and piping it to xargs -0.
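A minimal sketch of that -print0 / xargs -0 pairing, with echo standing in for the real command:
find "/tmp/1/" -iname '*.txt' -print0 | xargs -0 -I{} echo "would process: {}"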

Calling commands in bash script with parameters which have embedded spaces (eg filenames)

I am trying to write a bash script which does some processing on music files. Here is the script so far:
#!/bin/bash
SAVEIFS=$IFS
IFS=printf"\n\0"
find `pwd` -iname "*.mp3" -o -iname "*.flac" | while read f
do
echo "$f"
$arr=($(f))
exiftool "${arr[#]}"
done
IFS=$SAVEIFS
This fails with:
[johnd:/tmp/tunes] 2 $ ./test.sh
./test.sh: line 9: syntax error near unexpected token `$(f)'
./test.sh: line 9: ` $arr=($(f))'
[johnd:/tmp/tunes] 2 $
I have tried many different incantations, none of which have worked. The bottom line is I'm trying to call a command exiftool, and one of the parameters of that command is a filename which may contain spaces. Above I'm trying to assign the filename $f to an array and pass that array to exiftool, but I'm having trouble with the construction of the array.
Immediate question is, how do I construct this array? But the deeper question is how, from within a bash script, do I call an external command with parameters which may contain spaces?
You actually did have the call-with-possibly-space-containing-arguments syntax right (program "${args[@]}"). There were several problems, though.
Firstly, $(foo) executes a command. If you want a variable's value, use $foo or ${foo}.
Secondly, if you want to append something onto an array, the syntax is array+=(value) (or, if that doesn't work, array=("${array[@]}" value)).
Thirdly, please separate filenames with \0 whenever possible. Newlines are all well and good, but filenames can contain newlines.
Fourthly, read takes the switch -d, which can be used with an empty string '' to specify \0 as the delimiter. This eliminates the need to mess around with IFS.
Fifthly, be careful when piping into while loops - this causes the loop to be executed in a subshell, preventing variable assignments inside it from taking effect outside. There is a way to get around this, however - instead of piping (command | while ... done), use process substitution (while ... done < <(command)).
Sixthly, watch your command substitutions - there's no need to use $(pwd) as an argument to a command when . will do. (Or if you really must have full paths, try quoting the pwd call.)
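To illustrate the fifth point, here is a small demonstration of the subshell pitfall and its process-substitution fix:
count=0
printf 'a\nb\n' | while read -r _; do ((count++)); done
echo "$count"   # prints 0: the loop ran in a subshell, so the assignment was lost
count=0
while read -r _; do ((count++)); done < <(printf 'a\nb\n')
echo "$count"   # prints 2: the loop ran in the current shell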
tl;dr
The script, revised:
while IFS= read -r -d '' f; do
    echo "$f" # For debugging?
    arr+=("$f")
done < <(find . \( -iname "*.mp3" -o -iname "*.flac" \) -print0)
exiftool "${arr[@]}"
Another way
Leveraging find's full capabilities:
find . -iname "*.mp3" -o -iname "*.flac" -exec exiftool {} +
# Much shorter!
Edit 1
So you need to save the output of exiftool, manipulate it, then copy stuff? Try this:
while IFS= read -r -d '' f; do
    echo "$f" # For debugging?
    arr+=("$f")
done < <(find . \( -iname "*.mp3" -o -iname "*.flac" \) -print0)
# Warning: somewhat misleading syntax highlighting ahead
newfilename="$(exiftool "${arr[@]}")"
newfilename="$(manipulate "$newfilename")"
cp -- "$some_old_filename" "$newfilename"
You probably will need to change that last bit - I've never used exiftool, so I don't know precisely what you're after (or how to do it), but that should be a start.
You can do this just with bash:
shopt -s globstar nullglob
a=( **/*.{mp3,flac} )
exiftool "${a[#]}"
This probably works too: exiftool **/*.{mp3,flac}

How do I rename lots of files changing the same filename element for each in Linux?

I'm trying to rename a load of files (I count over 200) that either have the company name in the filename, or in the text contents. I basically need to change any references to "company" to "newcompany", maintaining capitalisation where applicable (i.e. "Company" becomes "Newcompany", "company" becomes "newcompany"). I need to do this recursively.
Because the name could occur pretty much anywhere I've not been able to find example code anywhere that meets my requirements. It could be any of these examples, or more:
company.jpg
company.php
company.Class.php
company.Company.php
companysomething.jpg
Hopefully you get the idea. I not only need to do this with filenames, but also the contents of text files, such as HTML and PHP scripts. I'm presuming this would be a second command, but I'm not entirely sure what.
I've searched the codebase and found nearly 2000 mentions of the company name in nearly 300 files, so I don't fancy doing it manually.
Please help! :)
bash has powerful looping and substitution capabilities:
for filename in `find /root/of/where/files/are -name '*company*'`; do
    mv "$filename" "${filename/company/newcompany}"
done
for filename in `find /root/of/where/files/are -name '*Company*'`; do
    mv "$filename" "${filename/Company/Newcompany}"
done
For the file and directory names, use for, find, mv and sed.
For each path (f) that has company in the name, rename it (mv) from f to the new name where company is replaced by newcompany.
for f in `find -name '*company*'` ; do mv "$f" "`echo $f | sed s/company/newcompany/`" ; done
For the file contents, use find, xargs and sed.
For every file, change company by newcompany in its content, keeping original file with extension .backup.
find -type f -print0 | xargs -0 sed -i.backup 's/company/newcompany/g'
I'd suggest you take a look at man rename, an extremely powerful Perl utility for, well, renaming files.
Standard syntax is
rename 's/\.htm$/\.html/' *.htm
The clever part is that the tool accepts any Perl regexp as a pattern for the filenames to be changed.
You might want to run it with the -n switch, which makes the tool only report what it would have changed.
I can't figure out a nice way to keep the capitalisation right now, but since you can already search through the file structure, issue several rename runs with different capitalisation until all files are changed.
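A hypothetical sketch of that approach, assuming the Perl rename utility and GNU find; -execdir runs rename on the bare basename so path components are left alone, and -depth renames files before their parent directories:
find . -depth -name '*company*' -execdir rename -n 's/company/newcompany/' {} +   # dry run: preview the renames
find . -depth -name '*company*' -execdir rename 's/company/newcompany/' {} +
find . -depth -name '*Company*' -execdir rename 's/Company/Newcompany/' {} +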
To loop through all files below the current folder and search for a particular string, you can use
find . -type f -exec grep -n -i STRING_TO_SEARCH_FOR /dev/null {} \;
The output from that command can be directed to a file (after some filtering to just extract the file names of the files that need to be changed).
find . -type ... > files_to_operate_on
Then wrap that in a while read loop and do some Perl magic for in-place replacement:
while IFS= read -r file
do
    perl -pi -e 's/stringtoreplace/replacementstring/g' "$file"
done < files_to_operate_on
There are only a few right ways to recursively process files. Here's one:
while IFS= read -d $'\0' -r file ; do
    newfile="${file//Company/Newcompany}"
    newfile="${newfile//company/newcompany}"
    mv -f "$file" "$newfile"
done < <(find /basedir/ -iname '*company*' -print0)
This will work with all possible file names, not just ones without whitespace in them.
Presumes bash.
For changing the contents of files I would advise caution because a blind replacement within a file could break things if the file is not plain text. That said, sed was made for this sort of thing.
while IFS= read -d $'\0' -r file ; do
    sed -i '' -e 's/Company/Newcompany/g;s/company/newcompany/g' "$file"
done < <(find /basedir/ -iname '*company*' -print0)
For this run I recommend adding some additional switches to find to limit the files it will process, perhaps
find /basedir/ \( -iname '*company*' -and \( -iname '*.txt' -or -iname '*.html' \) \) -print0

Add prefix to all images (recursive)

I have a folder with more than 5000 images, all with the JPG extension.
What I want to do is recursively add the "thumb_" prefix to all images.
I found a similar question: Rename Files and Directories (Add Prefix), but I only want to add the prefix to files with the JPG extension.
One possible solution:
find . -name '*.jpg' -printf "'%p' '%h/thumb_%f'\n" | xargs -n2 echo mv
Principle: find all the needed files, and prepare arguments for the standard mv command.
Notes:
Arguments for mv are surrounded by ' to allow spaces in filenames.
The drawback: this will not work with filenames that contain the ' apostrophe itself, as many mp3 files do. If you need to move stranger filenames, check below.
The above command is a dry run (it only shows the mv commands with their arguments). For real work, remove the echo preceding mv.
Renaming ANY filename: in the shell you need a delimiter. The problem is that the filename (stored in a shell variable) can itself contain the delimiter, so:
mv $file $newfile     # will fail if the filename contains space, TAB or newline
mv "$file" "$newfile" # will fail if any of the filenames contains "
The correct solutions are either:
prepare the filename with proper escaping
use a scripting language that easily understands ANY filename
Preparing the correct escaping in bash is possible with its built-in printf and the %q formatting directive (print quoted). But this solution is long and boring.
IMHO, the easiest way is using perl and zero-delimited -print0, like the following.
find . -name \*.jpg -print0 | perl -MFile::Basename -0nle 'rename $_, dirname($_)."/thumb_".basename($_)'
The above uses perl's power to munge the filenames and finally renames the files.
Beware of filenames with spaces in (the for ... in ... expression trips over those), and be aware that the result of a find . ... will always start with ./ (and hence try to give you names like thumb_./file.JPG which isn't quite correct).
This is therefore not a trivial thing to get right under all circumstances. The expression I've found to work correctly (with spaces, subdirs and all that) is:
find . -iname \*.JPG -exec bash -c 'mv "$1" "`echo $1 | sed \"s/\(.*\)\//\1\/thumb_/\"`"' -- '{}' \;
Even that can fall foul of certain names (with quotes in) ...
In OS X 10.8.5, find does not have the -printf option. The port that contained rename seemed to depend upon a WebkitGTK development package that was taking hours to install.
This one line, recursive file rename script worked for me:
find . -iname "*.jpg" -print | while read name; do cur_dir=$(dirname "$name"); cur_file=$(basename "$name"); mv "$name" "$cur_dir/thumb_$cur_file"; done
I was actually renaming CakePHP view files with an 'admin_' prefix, to move them all to an admin section.
You can use that same answer, just use *.jpg, instead of just *.
for file in *.JPG; do mv "$file" "thumb_$file"; done
if it's multiple directory levels under the current one:
for file in $(find . -name '*.JPG'); do mv $file $(dirname $file)/thumb_$(basename $file); done
proof:
jcomeau@intrepid:/tmp$ mkdir test test/a test/a/b test/a/b/c
jcomeau@intrepid:/tmp$ touch test/a/A.JPG test/a/b/B.JPG test/a/b/c/C.JPG
jcomeau@intrepid:/tmp$ cd test
jcomeau@intrepid:/tmp/test$ for file in $(find . -name '*.JPG'); do mv $file $(dirname $file)/thumb_$(basename $file); done
jcomeau@intrepid:/tmp/test$ find .
.
./a
./a/b
./a/b/thumb_B.JPG
./a/b/c
./a/b/c/thumb_C.JPG
./a/thumb_A.JPG
jcomeau@intrepid:/tmp/test$
Use rename for this:
rename 's/([^\/]+\.JPG)$/thumb_$1/' `find . -type f -name '*.JPG'`
For only jpg files in current folder
for f in *.jpg ; do mv "$f" "PRE_$f" ; done
