Recursively look for files with a specific extension - linux

I'm trying to find all files with a specific extension in a directory and its subdirectories with bash (latest Ubuntu LTS release).
This is what's written in a script file:
#!/bin/bash
directory="/home/flip/Desktop"
suffix="in"
browsefolders ()
for i in "$1"/*;
do
    echo "dir :$directory"
    echo "filename: $i"
    # echo ${i#*.}
    extension=`echo "$i" | cut -d'.' -f2`
    echo "Erweiterung $extension"
    if [ -f "$i" ]; then
        if [ $extension == $suffix ]; then
            echo "$i ends with $in"
        else
            echo "$i does NOT end with $in"
        fi
    elif [ -d "$i" ]; then
        browsefolders "$i"
    fi
done
}
browsefolders "$directory"
Unfortunately, when I start this script in terminal, it says:
[: 29: in: unexpected operator
(with $extension instead of 'in')
What's going on here, where's the error?

find "$directory" -type f -name "*.in"
is a bit shorter than that whole thing (and safer - deals with whitespace in filenames and directory names).
Your script is probably failing for entries that don't have a . in their name, making $extension empty.
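If you also need to act on each match, here is a hedged sketch (the second line assumes GNU find and xargs) that keeps the whitespace-safe handling:
find "$directory" -type f -name "*.in" -exec printf 'found: %s\n' {} \;
find "$directory" -type f -name "*.in" -print0 | xargs -0 ls -l
The first form lets find run the command itself, so filenames are never word-split by the shell; the second hands a NUL-separated list to another tool.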

find {directory} -type f -name '*.extension'
Example: To find all csv files in the current directory and its sub-directories, use:
find . -type f -name '*.csv'

The syntax I use is a bit different from what @Matt suggested:
find $directory -type f -name \*.in
(it's one less keystroke).

Without using find:
du -a $directory | awk '{print $2}' | grep '\.in$'
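One caveat worth noting: awk '{print $2}' drops everything after the first space in a filename. Since du separates the size from the path with a tab, a variant that tolerates spaces (a sketch, relying on cut's default tab delimiter) would be:
du -a "$directory" | cut -f2- | grep '\.in$'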

Though the find command is useful here, the shell itself provides options to achieve this without any third-party tools. The bash shell has extended glob support with which you can get the file names under recursive paths that match the extensions you want.
The extended option is extglob, which needs to be set using the shopt builtin as below. Options are enabled with the -s flag and disabled with the -u flag. Additionally, you can use a couple more options: nullglob, with which an unmatched glob is swept away entirely, replaced with a set of zero words, and globstar, which allows recursing through all the directories.
shopt -s extglob nullglob globstar
Now all you need to do is form the glob expression to include the files of a certain extension, which you can do as below. We use an array to hold the glob results because, when quoted properly and expanded, filenames with special characters remain intact and don't get broken by the shell's word-splitting.
For example to list all the *.csv files in the recursive paths
fileList=(**/*.csv)
The ** recurses through the sub-folders and *.csv is the glob expansion to include any file with the extension mentioned. Now, to print the actual files, just do
printf '%s\n' "${fileList[#]}"
Using an array with a properly quoted expansion is the right way in shell scripts, but for interactive use you can simply use ls with the glob expression:
ls -1 -- **/*.csv
This can easily be extended to match multiple extensions (similar to adding multiple -name flags in a find command). For example, to get all recursive image files, i.e. with extensions *.gif, *.png and *.jpg, all you need to do is
ls -1 -- **/+(*.jpg|*.gif|*.png)
The same syntax can also be used to negate results, i.e. to exclude files of a certain type. Assuming you want to exclude file names with the extensions above, you could do
excludeResults=()
excludeResults=(**/!(*.jpg|*.gif|*.png))
printf '%s\n' "${excludeResults[#]}"
The construct !() is a negation operator that excludes any of the file extensions listed inside it, and | is an alternation operator, just as in extended regular expressions, OR-ing the globs together.
Note that this extended glob support is not available in the POSIX Bourne shell; it is specific to recent versions of bash. So if you are considering portability of your scripts across POSIX shells and bash, this option wouldn't be right.
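Pulling the snippets above together, a minimal end-to-end sketch (assuming a recent bash) might look like:
#!/usr/bin/env bash
# enable extended globs, recursive **, and empty expansion on no match
shopt -s extglob nullglob globstar
# collect all recursive image files into an array, names kept intact
fileList=(**/+(*.jpg|*.gif|*.png))
# print one entry per line; quoting preserves spaces in names
printf '%s\n' "${fileList[@]}"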

find "$PWD" -type f -name "*.in"

There's a { missing after browsefolders ()
All $in should be $suffix
The line with cut gets you only the middle part of front.middle.extension. You should read up in your shell manual on ${varname%%pattern} and friends (see the sketch below).
I assume you do this as an exercise in shell scripting, otherwise the find solution already proposed is the way to go.
To check for proper shell syntax, without running a script, use sh -n scriptname.
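For instance, a quick sketch of the parameter-expansion approach (the names here are just for illustration):
i="front.middle.extension"
extension="${i##*.}"   # strip the longest prefix matching '*.' -> "extension"
stem="${i%.*}"         # strip the shortest suffix matching '.*' -> "front.middle"
echo "$extension $stem"
Unlike the cut pipeline, this handles any number of dots and costs no extra process.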

To find all the pom.xml files in your current directory and print them, you can use:
find . -name 'pom.xml' -print

find $directory -type f -name "*.in"|grep $substring

for file in "${LOCATION_VAR}"/*.zip
do
echo "$file"
done

Related

how to iterate over files using find in bash/ksh shell

I am using find in a loop to search recursively for files of a specific extension, and then do something with that loop.
cd $DSJobs
jobs=$(find $DSJobs -name "*.dsx")
for j in jobs; do
echo "$j"
done
assuming $DSJobs is a relevant folder, the output of $j is "jobs", printed once; it doesn't even repeat.
I want to list all *.dsx files in a folder, recursing through subfolders as well.
How do I make this work?
Thanks
The idiomatic way to do this is:
cd "$DSJobs"
find . -name "*.dsx" -print0 | while IFS= read -r -d "" job; do
echo "$job"
done
The complication derives from the fact that space and newline are perfectly valid filename characters, so you get find to output the filenames separated by the null character (which is not allowed to appear in a filename). Then you tell read to use the null character (with -d "") as the delimiter while reading the names.
IFS= read -r var is the way to get bash to read the characters verbatim, without dropping any leading/trailing whitespace or any backslashes.
There are further complications regarding the use of the pipe, which may or may not matter to you depending on what you do inside the loop.
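The main one: the pipe runs the while body in a subshell, so variable assignments made inside the loop don't survive it. A hedged sketch of the usual workaround, bash's process substitution:
count=0
while IFS= read -r -d "" job; do
    echo "$job"
    count=$((count + 1))
done < <(find . -name "*.dsx" -print0)
echo "found $count files"   # count is still visible here, unlike with a pipe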
Note: take care to quote your variables, unless you know exactly when to leave the quotes off. Very detailed discussion here.
Having said that, bash can do this without find:
shopt -s globstar
cd "$DSJobs"
for job in **/*.dsx; do
echo "$job"
done
This approach removes all the complications of find | while read.
Incorporating @Gordon's comment:
shopt -s globstar nullglob
for job in "$DSJobs"/**/*.dsx; do
do_stuff_with "$job"
done
The "nullglob" setting is useful when no files match the pattern. Without it, the for loop will have a single iteration where job will have the value job='/path/to/DSJobs/**/*.dsx' (or whatever the contents of the variable) -- including the literal asterisks.
Since all you want is to find files with a specific extension...
find ${DSJobs} -name "*.dsx"
Want to do this for several directories?
for d in <some list of directories>; do
find ${d} -name ""*.dsx"
done
Want to do something interesting with the files?
find ${DSJobs} -name "*.dsx" -exec dostuffwith.sh "{}" \;

How to delete numbers, dashes and underscores in the beginning of a file name

I have thousands of mp3 files but all with unusual file names such as 1-2songone.mp3, 2songtwo.mp3, 2_2_3_songthree.mp3. I want to remove all the numbers, dashes and underscores in the beginning of these files and get the result:
songone.mp3
songtwo.mp3
songthree.mp3
This can be done using extended globbing:
$ ls
1-2songone.mp3 2_2_3_songthree.mp3 2songtwo.mp3
$ shopt -s extglob
$ for fname in *.mp3; do mv -- "$fname" "${fname##*([-_[:digit:]])}"; done
$ ls
songone.mp3 songthree.mp3 songtwo.mp3
This uses parameter expansion: ${fname##pattern} removes the longest possible match from the beginning of fname. As the pattern, we use *([-_[:digit:]]), where *(pattern) stands for "zero or more matches of pattern", and the actual pattern is a bracket expression for hyphens, underscores and digits.
Remarks:
The -- after mv indicates the end of options for mv and makes sure that filenames starting with - aren't interpreted as options.
The *() expression requires the extglob shell option. As pointed out, if you don't want extended globs later, you have to unset it again with shopt -u extglob.
As per Gordon Davisson's comment: this will clobber files if you have, for example, something like 1file.mp3 and 2file.mp3. To avoid that, you can either use mv -i (or --interactive), which will prompt you before overwriting a file, or mv -n (or --noclobber), which will just not overwrite any files.
triplee points out that this needlessly moves files onto themselves if they don't start with a hyphen, underscore or digit. To avoid that, we can iterate only over matching files with
for fname in [-_[:digit:]]*.mp3; do mv -- "$fname" "${fname##*([-_[:digit:]])}"; done
which makes sure that there is something to rename.
Benjamin W.'s answer is helpful and efficient, but has two drawbacks:
It requires setting global shell option extglob, which should be restored to its previous value afterward (the alternative, at the cost of creating an extra process, is to use a subshell: (shopt -s extglob; for fname ...)).
The extglob syntax, an extension to regular glob syntax, is familiar to few people and still less powerful than true regular expressions.
Using Bash's regex-matching operator, =~:
for f in *.mp3; do [[ $f =~ ^[0-9_-]+(.+)$ ]] && echo mv "$f" "${BASH_REMATCH[1]}"; done
Remove the echo to perform actual renaming.
$f =~ ^[0-9_-]+(.+)$ matches the longest nonempty sequence of digits, hyphens, and underscores at the start of the filename, followed by any nonempty sequence of characters captured in a parenthesized subexpression (capture group).
If the match succeeds (&&), the mv command is invoked, with the captured subexpression - accessible via element 1 of the special Bash array variable ${BASH_REMATCH[@]} - forming the target filename.
You may do it this way too :
find . -type f -name "*.mp3" -print0 | while read -r -d '' line
do
mv "$line" "$( sed -E 's!(.*)/[^[:alpha:]]*([[:alpha:]].*mp3)$!\1/\2!' <<<"$line")" 2>/dev/null
done
Using sed gives you more control over the regex, I guess. Also, the 2>/dev/null ignores the mv error for already converted/correct filenames.
Note:
This will recursively change the filenames across subfolders too.

listing files in unix and saving the output in a variable (fetching the oldest file for a particular extension)

This might be a very simple thing for a shell scripting programmer but am pretty new to it. I was trying to execute the below command in a shell script and save the output into a variable
inputfile=$(ls -ltr *.{PDF,pdf} | head -1 | awk '{print $9}')
The command works fine when I run it from the terminal but fails when executed through a shell script (sh). Why does the command fail? Does it mean that shell scripts don't support the command, or am I doing it wrong? Also, how do I know whether a command will work in a shell script or not?
Just to give you a glimpse of my requirement, I was trying to get the oldest file from a particular directory (I also want to make sure upper case and lower case extensions are handled). Is there any other way to do this ?
The above command will work correctly only if BOTH *.pdf and *.PDF files exist in your current directory.
If you would like to run it in a directory containing only one of those, consider using e.g.:
inputfiles=$(find . -maxdepth 1 -type f \( -name "*.pdf" -or -name "*.PDF" \) | xargs ls -1tr | head -1 )
NOTE: The above command doesn't work with filenames containing newlines, or with a long list of found files.
Parsing ls is always a bad idea. You need another strategy.
How about making a function that gives you the oldest file among those given as arguments? The following works in Bash (adapt to your needs):
get_oldest_file() {
# get oldest file among files given as parameters
# return is in variable get_oldest_file_ret
local oldest f
for f do
[[ -e $f ]] && [[ ! $oldest || $f -ot $oldest ]] && oldest=$f
done
get_oldest_file_ret=$oldest
}
Then just call as:
get_oldest_file *.{PDF,pdf}
echo "oldest file is: $get_oldest_file_ret"
Now, you probably don't want to use brace expansions like this at all. In fact, you very likely want to use the shell options nocaseglob and nullglob:
shopt -s nocaseglob nullglob
get_oldest_file *.pdf
echo "oldest file is: $get_oldest_file_ret"
If you're using a POSIX shell, it's going to be a bit trickier to have the equivalent of nullglob and nocaseglob.
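For the record, a rough sketch of that POSIX workaround: an unmatched glob stays literal, so test that each candidate exists, and spell out both cases by hand since there is no nocaseglob (note that -ot is not mandated by POSIX test, but most shells support it):
oldest=
for f in *.pdf *.PDF; do
    [ -e "$f" ] || continue    # skip the literal pattern when nothing matched
    if [ -z "$oldest" ] || [ "$f" -ot "$oldest" ]; then
        oldest=$f
    fi
done
echo "oldest file is: $oldest"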
Is perl an option? It's ubiquitous on Unix.
I would suggest:
perl -e 'print ((sort { -M $b <=> -M $a } glob ( "*.{pdf,PDF}" ))[0]);';
Which:
uses glob to fetch all files matching the pattern.
sorts, using -M, which is the file's modification age in days, so the oldest file sorts first.
fetches the first element ([0]) off the sort.
Prints that.
As @gniourf_gniourf says, parsing ls is a bad idea: it leaves globs unquoted and generally doesn't account for funny characters in file names.
find is your friend:
#!/bin/sh
get_oldest_pdf() {
#
# echo path of oldest *.pdf (case-insensitive) file in current directory
#
find . -maxdepth 1 -mindepth 1 -iname "*.pdf" -printf '%T@ %p\n' \
| sort -n \
| tail -1 \
| cut -d' ' -f2-
}
whatever=$(get_oldest_pdf)
Notes:
find has numerous ways of formatting the output, including
things like access time and/or write time. I used '%T@ %p\n',
where %T@ is the last write time in UNIX time format, including the
fractional part. This never contains a space, so it's safe to use as a separator.
The numeric sort and tail get the last item, sorting by the time;
cut then removes the time from the output.
I used pipe notation (with the help of \), which is IMO much easier to read and maintain.
The shell code should run on any POSIX shell (though -printf is a GNU find extension).
You could easily adjust the function to parametrize the pattern,
the timestamp used (access/write), the search depth, or the starting dir.

Recursion in a linux file search - Bash script

I need to make a linux file search which involves recursion for a project. I got a bit of help making this, so I only understand parts of the code. Could someone explain what it means, and also help me get a user to input a keyword and have this function search for that keyword in the directories? Thank you
#!/bin/bash
lookIn() {
echo $2
for d in $(find $1 -type d)
do
if [ "$d" != "$1" ]
echo "looking in $d"
lookIn $d
fi
done
}
lookIn
You only need find. find will traverse the entire directory tree. Assuming $1 points to the folder you want to search:
read -p "Enter file name to find: " KEYWORD
find $1 -type f -name "$KEYWORD"
If you want to find names that contain the keyword, then use:
find $1 -type f -name "*${KEYWORD}*"
Try this, then you can work it into your bigger script (whatever it does).
TL;DR
Don't use recursion. It may work, but it's more work than necessary; Bash doesn't have tail-call optimization, and it's not a functional programming language. Just use find with the right set of arguments.
Parameterized Bash Function to Call Find
find_name() {
    starting_path="$1"
    filename="$2"
    find "$starting_path" -name "$filename" 2>&-
}
Example Output
Make sure you quote properly, especially if using globbing characters like * or ?. For example:
$ find_name /etc 'pass?d'
/etc/passwd
/etc/pam.d/passwd
You don't really need find for recursive file search. grep -r (recursive) will work fine.
See below script:
#!/bin/bash
# change dir to base dir where files are stored for search
cd /base/search/dir
# accept input from user
read -p "Enter Search Keyword: " kw
# perform a case-insensitive recursive search and list the files whose contents match
grep -irl "$kw" *

ambiguous redirection

I'm trying to go through the current directory and all subdirectories, and add some annotations into each file that ends in .sql.
Here's a snippet of the code:
HEADER="--SQL HEADER"
for f in 'find . -name *.sql';
do
echo $f
echo -e $HEADER > $f.tmp;
FNAME=${f//\//_/};
echo -e "\n\n--MORE ANNOTATIONS ${FNAME%.*}:1" >> $f.tmp;
cat $f >> $f.tmp;
mv $f.tmp $f;
rm $f.tmp
done;
I'm a beginner at bash, so I think some of the errors I'm getting might be due to the find statement within the loop.
This is the error I get:
find . -name X.sql A.sql W.sql E.sql S.sql
./annotate.sh: line 6: $f.tmp: ambiguous redirect
./annotate.sh: line 8: $f.tmp: ambiguous redirect
./annotate.sh: line 9: $f.tmp: ambiguous redirect
mv: invalid option -- n
Try `mv --help' for more information.
rm: invalid option -- n
Try `rm --help' for more information.
any help would be greatly appreciated =)
Here's the problem. Your "echo" gives it away:
echo $f
outputs
find . -name X.sql A.sql W.sql E.sql S.sql
I think the problem is you have straight single quotes ('') in the find command, instead of backquotes (``). So it's not really running find, but simply expanding the wildcards.
You may have to quote the wildcard so it gets passed to find instead of evaluated by the shell:
for f in `find . -name \*.sql`;
However, there are several problems in your script, which you should address if you want to use it more than once. See ormaaj's answer.
The problem, as already pointed out, is that find isn't actually being executed. However, this pattern is very wrong anyway. Iterating with a for loop over the output of a command substitution doesn't work: splitting the output into words requires word-splitting, which requires leaving the expansion unquoted, and that is a problem (even if pathname expansion is disabled) because filenames can contain newlines.
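A quick demonstration of that failure mode (a hypothetical file name with a space):
touch 'a b.sql'
for f in $(find . -name '*.sql'); do
    echo "got: $f"    # prints "got: ./a" and then "got: b.sql" -- two words
done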
Preferably, use -exec. First write this script to a file and chmod u+x scriptname:
#!/usr/bin/env bash
header="--SQL HEADER"
for f in "$@"; do
    echo "$f" >&2
    fname=${f//\//_/}
    cat - "$f" <<EOF >"$f.tmp"
${header}$'\n\n'
--MORE ANNOTATIONS ${fname%.*}:1
EOF
    mv "$f.tmp" "$f"
done
Then run find like this:
find . -name '*.sql' -exec scriptname {} +
Alternatively, (and assuming this is a recent version of Bash), use globstar and no find (ksh has a similar feature if you prefer). This may be slower depending upon the job - the shell must pre-generate the list of files.
#!/usr/bin/env bash
shopt -s globstar
for f in ./**/*.sql; do
...
Alternatively, if you have Bash 4 and a system with the necessary GNU utilities, use -print0.
find . -name '*.sql' -print0 | while IFS= read -rd '' f; do
# <body of the above for loop here>
done
See: http://mywiki.wooledge.org/UsingFind
