how to iterate over files using find in bash/ksh shell - linux

I am using find in a loop to search recursively for files of a specific extension, and then do something with each file found.
cd $DSJobs
jobs=$(find $DSJobs -name "*.dsx")
for j in jobs; do
echo "$j"
done
Assuming $DSJobs is a relevant folder, the output of $j is "jobs", printed exactly once; the loop doesn't even repeat.
I want to list all *.dsx files in a folder recursively through subfolders as well.
How do I make this work?
Thanks

The idiomatic way to do this is:
cd "$DSJobs"
find . -name "*.dsx" -print0 | while IFS= read -r -d "" job; do
echo "$job"
done
The complication derives from the fact that space and newline are perfectly valid filename characters, so you get find to output the filenames separated by the null character (which is not allowed to appear in a filename). Then you tell read to use the null character (with -d "") as the delimiter while reading the names.
IFS= read -r var is the way to get bash to read the characters verbatim, without dropping any leading/trailing whitespace or any backslashes.
There are further complications regarding the use of the pipe, which may or may not matter to you depending on what you do inside the loop.
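For example, here is a minimal sketch of the most common pitfall: each side of a pipeline runs in a subshell in bash, so variables set inside the loop are lost when the loop ends.
count=0
find . -name "*.dsx" -print0 | while IFS= read -r -d "" job; do
count=$((count + 1))   # this increment happens in a subshell
done
echo "$count"   # still prints 0
One workaround in bash is process substitution, which keeps the loop in the current shell:
count=0
while IFS= read -r -d "" job; do
count=$((count + 1))
done < <(find . -name "*.dsx" -print0)
echo "$count"   # now prints the real count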
Note: take care to quote your variables, unless you know exactly when to leave the quotes off.
Having said that, bash can do this without find:
shopt -s globstar
cd "$DSJobs"
for job in **/*.dsx; do
echo "$job"
done
This approach removes all the complications of find | while read.
Incorporating @Gordon's comment:
shopt -s globstar nullglob
for job in "$DSJobs"/**/*.dsx; do
do_stuff_with "$job"
done
The "nullglob" setting is useful when no files match the pattern. Without it, the for loop will have a single iteration where job will have the value job='/path/to/DSJobs/**/*.dsx' (or whatever the contents of the variable) -- including the literal asterisks.

Since all you want is to find files with a specific extension...
find "${DSJobs}" -name "*.dsx"
Want to do this for several directories?
for d in <some list of directories>; do
find "${d}" -name "*.dsx"
done
Want to do something interesting with the files?
find ${DSJobs} -name "*.dsx" -exec dostuffwith.sh "{}" \;

Related

How to rename string in multiple filename in a folder using shell script without mv command since it will move the files to different folder? [duplicate]

Write a simple script that will automatically rename a number of files. As an example, we want the file *001.jpg renamed to a user-defined string + 001.jpg (e.g. MyVacation20110725_001.jpg). The purpose of this script is to give digital camera photos file names that make some sense.
I need to write a shell script for this. Can someone suggest how to begin?
An example to help you get off the ground.
for f in *.jpg; do mv "$f" "$(echo "$f" | sed s/IMG/VACATION/)"; done
In this example, I am assuming that all your image files contain the string IMG and you want to replace IMG with VACATION.
The shell automatically evaluates *.jpg to all the matching files.
The second argument of mv (the new name of the file) is the output of the sed command that replaces IMG with VACATION.
If your filenames include whitespace, pay careful attention to the "$f" notation. You need the double quotes to preserve the whitespace.
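When experimenting with a loop like this, it can help to prefix mv with echo first, so the commands are printed rather than executed:
for f in *.jpg; do echo mv "$f" "$(echo "$f" | sed 's/IMG/VACATION/')"; done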
You can use the rename utility to rename multiple files by a pattern. For example, the following command will prepend the string MyVacation2011_ to all files with the jpg extension.
rename 's/^/MyVacation2011_/g' *.jpg
or
rename <pattern> <replacement> <file-list>
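Note that two different tools ship under the name rename: the Perl-based one (common on Debian/Ubuntu) takes an s/// expression as in the first example, while the util-linux one takes the plain <pattern> <replacement> <file-list> form. The Perl version also supports -n to preview the renames without performing them:
rename -n 's/^/MyVacation2011_/' *.jpg   # prints what would be renamed, changes nothing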
In this example, I am assuming that all your image files begin with "IMG" and you want to replace "IMG" with "VACATION".
Solution: first identify all jpg files, then replace the keyword:
find . -name '*.jpg' -exec bash -c 'echo mv "$0" "${0/IMG/VACATION}"' {} \;
The echo makes this a dry run; remove it to actually rename the files.
for file in *.jpg ; do mv "$file" "${file//IMG/myVacation}" ; done
Again, this assumes that all your image files contain the string "IMG" and that you want to replace "IMG" with "myVacation".
With bash you can directly convert the string with parameter expansion.
Example: if the file is IMG_327.jpg, the mv command will be executed as if you do mv IMG_327.jpg myVacation_327.jpg. And this will be done for each file found in the directory matching *.jpg.
IMG_001.jpg -> myVacation_001.jpg
IMG_002.jpg -> myVacation_002.jpg
IMG_1023.jpg -> myVacation_1023.jpg
etcetera...
find . -type f |
sed -n "s/\(.*\)factory\.py$/& \1service\.py/p" |
xargs -p -n 2 mv
e.g. this will rename all files in the cwd whose names end in "factory.py", replacing that ending with "service.py".
explanation:
In the sed command, the -n flag suppresses the normal behavior of echoing input to output after the s/// command is applied, and the p flag on s/// forces a write to output when a substitution is made. Since a substitution is only made on a match, sed will only produce output for files ending in "factory.py".
In the s/// replacement string, we use "& " to interpolate the entire matched string, followed by a space character, into the replacement. Because of this, it's vital that our RE matches the entire filename. After the space character, we use "\1service.py" to interpolate the string we captured before "factory.py", followed by "service.py". So for more complex transformations you'll have to change the arguments to s/// (with an RE that still matches the entire filename).
Example output:
foo_factory.py foo_service.py
bar_factory.py bar_service.py
We use xargs with -n 2 to consume the output of sed two delimited strings at a time, passing these to mv (I also put the -p option in there so you can confirm each rename before it runs). Voilà.
NOTE: If you are facing more complicated file and folder scenarios, it is worth reading up on find (and some of its alternatives) in greater detail.
Another option is:
for i in *_*.jpg
do
# assumes names like IMG_001.jpg: ${i#*_} strips everything up to and
# including the first underscore, keeping only the numbered part (001.jpg)
echo "mv $i yourstring_${i#*_}"
done
Remove the echo after you have it right.
Parameter substitution with # strips a matching prefix, keeping only the last part of the name, so you can prepend whatever string you like.
Can't comment on Susam Pal's answer, but if you're dealing with spaces, I'd surround with quotes:
for f in *.jpg; do mv "$f" "$(echo "$f" | sed 's/ /-/g')"; done
(this variant replaces each space in the name with a hyphen)
You can try this:
somestring="myVacation"
for file in *.jpg
do
# braces around ${somestring} keep the shell from parsing somestring_ as one variable name;
# ${file: -7} keeps the last 7 characters of the name (e.g. 327.jpg)
mv "$file" "${somestring}_${file: -7}"
done
You can see "parameter expansion" in man bash to understand the above better.

listing file in unix and saving the output in a variable(Oldest File fetching for a particular extension)

This might be a very simple thing for a shell scripting programmer, but I'm pretty new to it. I was trying to execute the below command in a shell script and save the output into a variable:
inputfile=$(ls -ltr *.{PDF,pdf} | head -1 | awk '{print $9}')
The command works fine when I fire it from the terminal but fails when executed through a shell script (sh). Why does the command fail? Does it mean the shell script doesn't support the command, or am I doing it wrong? Also, how do I know whether a command will work in a shell script or not?
Just to give you a glimpse of my requirement: I was trying to get the oldest file from a particular directory (I also want to make sure upper case and lower case extensions are handled). Is there any other way to do this?
The above command will work correctly only if BOTH *.pdf and *.PDF files are present in your current directory: bash expands *.{PDF,pdf} into two globs, and a glob that matches nothing is passed to ls literally, producing an error. (It also relies on brace expansion, which plain POSIX sh does not perform, which is why the command breaks when the script is run with sh.)
If you would like to execute it in a directory with only one of those, consider using e.g.:
inputfiles=$(find . -maxdepth 1 -type f \( -name "*.pdf" -or -name "*.PDF" \) | xargs ls -1tr | head -1 )
NOTE: The above command does not work with file names containing newlines, or with very long lists of found files.
Parsing ls is always a bad idea. You need another strategy.
How about a function that gives you the oldest file among the ones given as arguments? The following works in Bash (adapt it to your needs):
get_oldest_file() {
# get oldest file among files given as parameters
# return is in variable get_oldest_file_ret
local oldest f
for f do   # "for f do" with no "in" list iterates over the function's arguments
[[ -e $f ]] && [[ ! $oldest || $f -ot $oldest ]] && oldest=$f
done
get_oldest_file_ret=$oldest
}
Then just call as:
get_oldest_file *.{PDF,pdf}
echo "oldest file is: $get_oldest_file_ret"
Now, you probably don't want to use brace expansions like this at all. In fact, you very likely want to use the shell options nocaseglob and nullglob:
shopt -s nocaseglob nullglob
get_oldest_file *.pdf
echo "oldest file is: $get_oldest_file_ret"
If you're using a POSIX shell, it's going to be a bit trickier to have the equivalent of nullglob and nocaseglob.
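Here is a rough sketch of a POSIX-leaning equivalent. It assumes your test supports the common -ot extension, which is not required by POSIX but is provided by dash and most other sh implementations:
oldest=
for f in *.pdf *.PDF; do
[ -e "$f" ] || continue   # skip patterns that matched nothing (plain sh has no nullglob)
if [ -z "$oldest" ] || [ "$f" -ot "$oldest" ]; then
oldest=$f
fi
done
echo "oldest file is: $oldest"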
Is perl an option? It's ubiquitous on Unix.
I would suggest:
perl -e 'print ((sort { -M $b <=> -M $a } glob ( "*.{pdf,PDF}" ))[0]);';
Which:
uses glob to fetch all files matching the pattern.
sorts them, using -M, which is the relative modification time in days (the descending sort puts the oldest file first).
fetches the first element ([0]) off the sort.
Prints that.
As @gniourf_gniourf says, parsing ls is a bad idea: it tends to involve unquoted globs, and it generally fails to account for funny characters in file names.
find is your friend:
#!/bin/sh
get_oldest_pdf() {
#
# echo path of oldest *.pdf (case-insensitive) file in current directory
#
find . -maxdepth 1 -mindepth 1 -iname "*.pdf" -printf '%T@ %p\n' \
| sort -n \
| head -1 \
| cut -d' ' -f2-
}
whatever=$(get_oldest_pdf)
Notes:
find has numerous ways of formatting the output, including
things like access time and/or write time. I used '%T@ %p\n',
where %T@ is the last write time in UNIX epoch format, including the fractional part.
This field never contains a space, so it is safe to use as a separator.
The numeric sort puts the oldest time first and head -1 takes that item;
cut then removes the time from the output.
I used pipe notation continued across lines with \, which is IMO much easier to read and maintain.
The shell code should run on any POSIX shell (note, though, that -printf is a GNU find extension).
You could easily adjust the function to parametrize the pattern,
the time used (access/write), the search depth, or the starting directory.
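For instance, a hypothetical parametrized variant, taking the name pattern as the first argument and the starting directory as an optional second:
get_oldest_match() {
# $1: name pattern (e.g. "*.pdf"); $2: directory to search (defaults to .)
find "${2:-.}" -maxdepth 1 -mindepth 1 -iname "$1" -printf '%T@ %p\n' \
| sort -n \
| head -1 \
| cut -d' ' -f2-
}
oldest_pdf=$(get_oldest_match "*.pdf" ~/Documents)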

find returning inverted results

In a few words, I wrote this little script to clean up some directories where I had consolidated directories/files from multiple sources. I had used the cp command with the --backup=numbered feature, so that files with identical names would get a suffix like .~1~ appended instead of being overwritten. I then ran fdupes to remove duplicate files. In some cases fdupes removed the file which did not have the suffix appended by cp (the original file), so I wanted to scan the directories for files with the appended suffix and, if no file exists with the suffix removed, mv the file to that name; otherwise I would leave it alone, to avoid deleting anything fdupes did not consider a duplicate.
The issue is that the test condition, the if [ -f ... ] part of the code below, returns inverted results from what it should, and I cannot understand why. For example, when the file exists it returns false, and when the file does not exist it returns true. I fixed it by swapping the actions in the two branches, verified that it then worked as intended, and ran it that way; but I would like to know why it behaved the way it did. I am not a bash scripting expert by any means, so it's possible I missed something simple.
#!/bin/bash
logfile=$$.log
exec > $logfile 2>&1
IFS='
'
#set -f
for FILE in $(find . -type f -regextype posix-extended -regex '^.*(\.~[0-9]+~)+$')
do
FILE2=${FILE%%.~[0-9]*} # remove the suffix
if [ -f "${FILE2}" ]
then
echo ERROR: "${FILE2}" already exists!
else
echo "${FILE}" renamed "${FILE2}"
mv "${FILE}" "${FILE2}"
fi
done
You might be able to see the problem by modifying your script to show both FILE and FILE2 in the error message. There are a few minor problems with the script which could cause some confusion (but not the "inverted" logic):
find output is not sorted. If you had more than one backup file, a randomly chosen one would replace the original file;
you could sort the output by appending an expression like | sort -t~ -n -k2 to the find command.
the regular expression allows multiple matches of the ~[0-9]~ pattern. Conceivably you could have some odd file which ends with ~1~~2~.
the part where the suffix is removed assumes a single ~[0-9]~ is on the end of the filename. An embedded ~0, e.g., foo.~0bar.~1~, would reduce FILE2 to foo. The workaround for that is more cumbersome (since the suffix-stripping uses globbing), but could be done with a case statement which matches an explicit number of digits (likely three digits would be enough); see the sketch below.
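A rough sketch of that case approach, assuming at most three-digit backup numbers; using % (shortest suffix match) instead of %% avoids the embedded-~0 problem:
case $FILE in
*.~[0-9]~) FILE2=${FILE%.~[0-9]~} ;;
*.~[0-9][0-9]~) FILE2=${FILE%.~[0-9][0-9]~} ;;
*.~[0-9][0-9][0-9]~) FILE2=${FILE%.~[0-9][0-9][0-9]~} ;;
*) FILE2=$FILE ;;   # no recognizable backup suffix
esac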

How to make this (l)unix script dynamically accept directory name in for-loop?

I am teaching myself more (l)unix skills and wanted to see if I could begin to write a program that will eventually read all .gz files and expand them. However, I want it to be super dynamic.
#!/bin/bash
dir=~/derp/herp/path/goes/here
for file in $(find $dir -name '*.gz')
do
echo $file
done
So when I execute this file, I simply run:
bash derp.sh
I don't like this. I feel the script is too brittle.
How can I rework my for loop so that I can say
bash derp.sh ~/derp/herp/path/goes/here (1)
I tried re-coding it as follows:
for file in $*
However, I don't want to have to type:
bash derp.sh ~/derp/herp/path/goes/here/*.gz
How could I rewrite this so I could simply type what is in (1)? I feel I must be missing something simple?
Note
I tried
for file in $*/*.gz and that obviously did not work. I appreciate your assistance, my sources have been a wrox unix text, carpentry v5, and man files. Unfortunately, I haven't found anything that will what I want.
Thanks,
GeekyOmega
for dir in "$#"
do
for file in "$dir"/*.gz
do
echo "$file"
done
done
Notes:
In the outer loop, dir is assigned successively to each argument given on the command line. The special form "$@" is used so that the directory names that contain spaces will be processed correctly.
The inner loop runs over each .gz file in the given directory. By placing $dir in double-quotes, the loop will work correctly even if the directory name contains spaces. This form will also work correctly if the gz file names have spaces.
#!/bin/bash
for file in $(find "$#" -name '*.gz')
do
echo "$file"
done
You'll probably prefer "$@" instead of $*; if you were to have spaces in the names, like with a directory named My Documents and a directory named Music, $* would effectively expand into:
find My Documents Music -name '*.gz'
where "$#" would expand into:
find "My Documents" "Music" -name '*.gz'
Requisite note: Using for file in $(find ...) is generally regarded as bad practice, because it tends to break if you have spaces or newlines in your directory structure. Using nested for loops (as in John's answer) is often a better idea, or using find -print0 with read, as sketched below.
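For reference, here is the same script written with the find -print0 / read combination shown at the top of this page (bash rather than plain sh, because of read -d ''):
#!/bin/bash
find "$@" -name '*.gz' -print0 | while IFS= read -r -d '' file; do
echo "$file"
done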

Recursively look for files with a specific extension

I'm trying to find all files with a specific extension in a directory and its subdirectories with bash (latest Ubuntu LTS release).
This is what's written in a script file:
#!/bin/bash
directory="/home/flip/Desktop"
suffix="in"
browsefolders ()
for i in "$1"/*;
do
echo "dir :$directory"
echo "filename: $i"
# echo ${i#*.}
extension=`echo "$i" | cut -d'.' -f2`
echo "Erweiterung $extension"
if [ -f "$i" ]; then
if [ $extension == $suffix ]; then
echo "$i ends with $in"
else
echo "$i does NOT end with $in"
fi
elif [ -d "$i" ]; then
browsefolders "$i"
fi
done
}
browsefolders "$directory"
Unfortunately, when I start this script in terminal, it says:
[: 29: in: unexpected operator
(with $extension instead of 'in')
What's going on here, where's the error?
Missing curly brace aside, this:
find "$directory" -type f -name "*.in"
is a bit shorter than that whole thing (and safer: it deals with whitespace in filenames and directory names).
Your script is probably failing for entries that don't have a . in their name, making $extension empty.
find {directory} -type f -name '*.extension'
Example: To find all csv files in the current directory and its sub-directories, use:
find . -type f -name '*.csv'
The syntax I use is a bit different from what @Matt suggested:
find $directory -type f -name \*.in
(it's one less keystroke).
Without using find:
du -a "$directory" | awk '{print $2}' | grep '\.in$'
(note that this breaks on file names containing whitespace, since awk prints only the second field)
Though the find command is useful here, the shell itself provides options to achieve this requirement without any third-party tools. The bash shell provides an extended glob support option with which you can get the file names under recursive paths that match the extensions you want.
The extended option is extglob, which needs to be set using shopt as below. The options are enabled with the -s flag and disabled with the -u flag. Additionally you can use a couple more options, namely nullglob, with which an unmatched glob is swept away entirely, replaced with a set of zero words, and globstar, which allows recursing through all the directories.
shopt -s extglob nullglob globstar
Now all you need to do is form the glob expression to include the files of a certain extension, which you can do as below. We use an array to populate the glob results because, when quoted properly and expanded, filenames with special characters remain intact and do not get broken by the shell's word-splitting.
For example to list all the *.csv files in the recursive paths
fileList=(**/*.csv)
The ** pattern recurses through the sub-folders and *.csv is the glob expansion to include any file with the mentioned extension. Now, for printing the actual files, just do
printf '%s\n' "${fileList[#]}"
Using an array and doing a proper quoted expansion is the right way when used in shell scripts, but for interactive use, you could simply use ls with the glob expression as
ls -1 -- **/*.csv
This could very well be expanded to match multiple extensions (similar to adding multiple conditions in a find command). For example, to get all recursive image files, i.e. with extensions *.gif, *.png, and *.jpg, all you need to do is
ls -1 -- **/+(*.jpg|*.gif|*.png)
This could also be extended to negate results. With the same syntax, one can use the glob to exclude files of certain types. Assume you want to exclude file names with the extensions above; you could do
excludeResults=()
excludeResults=(**/!(*.jpg|*.gif|*.png))
printf '%s\n' "${excludeResults[#]}"
The construct !() is a negate operation to not include any of the file extensions listed inside and | is an alternation operator just as used in the Extended Regular Expressions library to do an OR match of the globs.
Note that this extended glob support is not available in the POSIX Bourne shell; it is specific to recent versions of bash. So if you are considering portability of your scripts across POSIX shells and bash, this option wouldn't be right.
find "$PWD" -type f -name "*.in"
There's a { missing after browsefolders ()
All $in should be $suffix
The line with cut gets you only the middle part of front.middle.extension. You should read up on ${varname%%pattern} and friends in your shell manual (see the sketch after these notes).
I assume you do this as an exercise in shell scripting, otherwise the find solution already proposed is the way to go.
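A minimal sketch of those expansions, using a hypothetical name:
i=front.middle.extension
echo "${i##*.}"   # extension (strip the longest prefix ending in a dot)
echo "${i%.*}"    # front.middle (strip the shortest suffix starting at a dot)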
To check for proper shell syntax, without running a script, use sh -n scriptname.
To find all the pom.xml files in your current directory and print them, you can use:
find . -name 'pom.xml' -print
find "$directory" -type f -name "*.in" | grep "$substring"
for file in "${LOCATION_VAR}"/*.zip
do
echo "$file"
done
