Recursion in a Linux file search - Bash script

I need to make a Linux file search which involves recursion for a project. I got a bit of help making this, so I only understand parts of this code, not all of it. Could someone explain what it means, and also give a bit of help on how I would get a user to input a keyword and have this function search for that keyword in the directories? Thank you.
#!/bin/bash
lookIn() {
  # $1 is the directory to start searching from
  for d in $(find "$1" -type d)
  do
    if [ "$d" != "$1" ]; then
      echo "looking in $d"
      lookIn "$d"
    fi
  done
}
lookIn "$1"

You only need find. find will traverse the entire directory tree on its own. Assuming $1 points to the folder you want to search:
read -p "Enter file name to find: " KEYWORD
find $1 -type f -name "$KEYWORD"
If you want to find names that contain the keyword, then use:
find $1 -type f -name "*${KEYWORD}*"
Try this then you can work this into your bigger script (whatever it does).
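If it helps to see the pieces together, here is a minimal sketch of a complete script (the name search.sh and the default of the current directory are just assumptions for the example):
#!/bin/bash
# search.sh -- pass the directory to search as the first argument
dir="${1:-.}"                                # default to the current directory
read -p "Enter file name to find: " KEYWORD
# quoted so the shell doesn't expand the glob before find sees it
find "$dir" -type f -name "*${KEYWORD}*"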

TL;DR
Don't use recursion. It may work, but it's more work than necessary; Bash doesn't have tail-call optimization, and it's not a functional programming language. Just use find with the right set of arguments.
Parameterized Bash Function to Call Find
find_name() {
    starting_path="$1"
    filename="$2"
    # close stderr to suppress "permission denied" noise
    find "$starting_path" -name "$filename" 2>&-
}
Example Output
Make sure you quote properly, especially if using globbing characters like * or ?. For example:
$ find_name /etc 'pass?d'
/etc/passwd
/etc/pam.d/passwd

You don't really need find for recursive file search. grep -r (recursive) will work fine.
See below script:
#!/bin/bash
# change dir to base dir where files are stored for search
cd /base/search/dir
# accept input from user
read -p "Enter Search Keyword: " kw
# perform a case-insensitive recursive search and list the matching files
grep -irl "$kw" *
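Note that the * in the last line skips hidden files; if you want those included as well, a small variation is to start grep from the current directory instead:
grep -irl "$kw" .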

Related

Show where directory exists

I have a bunch of code that's relatively new (i.e. lots of bugs hiding), and I have code such as:
if [ -d $DATA_ROOT/$name ]
I've done research and understand that this means "if the directory exists", but now I'm trying to print out those directories that exist in order to fix a problem.
Tried using
echo `First: $DATA_ROOT`
echo `Second: $name`
echo `Last: $DATA_ROOT/$name`
exit 1;
I got command not found for all of them. The code is meant to fix the bug by extracting all files, but it does not end up extracting them all, ending with the data extraction failed error below:
num_files=`find $DATA_ROOT/$name -name '*' | wc -l`
if [ ! $num_files -eq $extract_file ] ; then
    echo "Data extraction failed! Extracted $num_files instead of $extract_file"
    exit 1
fi
I just want to extract all files correctly. How can I do this, please?
The backticks you are using mean "execute this as a command":
echo `First: $DATA_ROOT`
echo `Second: $name`
echo `Last: $DATA_ROOT/$name`
would try to execute a command called First:, which does not exist.
Instead, use double quotes; they allow variable substitution but do not try to execute the contents as a command:
echo "First: $DATA_ROOT"
echo "Second: $name"
echo "Last: $DATA_ROOT/$name"
Also
find $DATA_ROOT/$name -name '*'
is probably not what you want; the -name '*' is the default, so you don't need it. As others point out, find will return everything, including directories and special files if you have any of those. find "$DATA_ROOT/$name" -type f is what you want if you only want to list files, or find "$DATA_ROOT/$name" -type d if you only want to list directories. Also, always put double quotes around "$DATA_ROOT/$name", as that handles file names with spaces; if $name contains a space, the unquoted version will fail.
find reports not only ordinary files, but also directories (including .).
Use find "$DATA_ROOT/$name" -type f.
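For the extraction check in the question, a minimal sketch with those fixes applied (double quotes, and -type f so directories are not counted; $extract_file is assumed to hold the expected count, as in the original script):
num_files=$(find "$DATA_ROOT/$name" -type f | wc -l)
if [ "$num_files" -ne "$extract_file" ]; then
    echo "Data extraction failed! Extracted $num_files instead of $extract_file"
    exit 1
fi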
You are using backticks, so anything inside them is treated as a command to execute, and as a result you get the command not found error. Use double quotes instead to avoid the error, like below:
echo "First: $DATA_ROOT"
echo "Second: $name"
echo "Last: $DATA_ROOT/$name"
You could use the find command to list all directories, like:
find $DATA_ROOT/$name -type d
The above command lists all the directories within $DATA_ROOT/$name (that is what the -type d option does; use -type f to list all the files instead), and then you can perform operations on those directories.

Listing files in Unix and saving the output in a variable (fetching the oldest file with a particular extension)

This might be a very simple thing for a shell scripting programmer, but I am pretty new to it. I was trying to execute the command below in a shell script and save the output into a variable:
inputfile=$(ls -ltr *.{PDF,pdf} | head -1 | awk '{print $9}')
The command works fine when I run it from the terminal but fails when executed through a shell script (sh). Why does the command fail? Does it mean the shell script doesn't support the command, or am I doing it wrong? Also, how do I know whether a command will work in a shell or not?
Just to give you a glimpse of my requirement: I am trying to get the oldest file from a particular directory (and I also want to make sure upper-case and lower-case extensions are handled). Is there any other way to do this?
The above command will work correctly only if BOTH *.pdf and *.PDF files are present in the directory you are currently in.
If you would like to execute it in a directory containing only one of those, you should consider using something like:
inputfiles=$(find . -maxdepth 1 -type f \( -name "*.pdf" -or -name "*.PDF" \) | xargs ls -1tr | head -1 )
NOTE: The above command doesn't work with filenames containing newlines, or with a very long list of found files.
Parsing ls is always a bad idea. You need another strategy.
How about a function that gives you the oldest file among the ones given as arguments? The following works in Bash (adapt it to your needs):
get_oldest_file() {
# get oldest file among files given as parameters
# return is in variable get_oldest_file_ret
local oldest f
for f do
[[ -e $f ]] && [[ ! $oldest || $f -ot $oldest ]] && oldest=$f
done
get_oldest_file_ret=$oldest
}
Then just call as:
get_oldest_file *.{PDF,pdf}
echo "oldest file is: $get_oldest_file_ret"
Now, you probably don't want to use brace expansions like this at all. In fact, you very likely want to use the shell options nocaseglob and nullglob:
shopt -s nocaseglob nullglob
get_oldest_file *.pdf
echo "oldest file is: $get_oldest_file_ret"
If you're using a POSIX shell, it's going to be a bit trickier to have the equivalent of nullglob and nocaseglob.
Is perl an option? It's ubiquitous on Unix.
I would suggest:
perl -e 'print ((sort { -M $b <=> -M $a } glob ( "*.{pdf,PDF}" ))[0]);';
Which:
uses glob to fetch all files matching the pattern.
sorts, using -M, which is the relative modification time in days (so the oldest file ends up first).
fetches the first element ([0]) of the sorted list.
Prints that.
As @gniourf_gniourf says, parsing ls is a bad idea, as is leaving globs unquoted and generally not accounting for odd characters in file names.
find is your friend:
#!/bin/sh
get_oldest_pdf() {
#
# echo path of oldest *.pdf (case-insensitive) file in current directory
#
find . -maxdepth 1 -mindepth 1 -iname "*.pdf" -printf '%T@ %p\n' \
    | sort -n \
    | tail -1 \
    | cut -d' ' -f2-
}
whatever=$(get_oldest_pdf)
Notes:
find has numerous ways of formatting the output, including things like access time and/or write time. I used '%T@ %p\n', where %T@ is the last write time in Unix time format, including the fractional part. This never contains a space, so it is safe to use as a separator.
The numeric sort and tail get the last item, sorting by the time, and cut removes the time from the output.
I used pipe notation (with the help of \ line continuations) because, in my opinion, it is much easier to read and maintain.
The code should run on any POSIX shell.
You could easily adjust the function to parameterize the pattern, the time used (access/write), the search depth, or the starting directory.

Bash shell script function gives "find: missing argument to `-exec'" error

I wrote a function in a Bash shell script to search a Linux tree for filenames matching a pattern containing a regular expression, with colour highlighting:
function ggrep {
LS_="ls --color {}|sed s~./~~"
[ -n "$1" -a "$1" != "*" ] && NAME_="-iname $1" || NAME_=
[ -n "$2" ] && EXEC_="egrep -q \"$2\" \"{}\" && $LS_ && egrep -n \"$2\" --color=always \"{}\"|sed s~^B~\ B~" || EXEC_=$LS_
FIND_="find . -type f $NAME_ -exec sh -c \"$EXEC_\" \\;"
echo -e \\e[7m $FIND_ \\e[0m
$FIND_
}
e.g. ggrep a* lists all files starting with a under the current directory tree,
and ggrep a* x lists all files starting with a and containing x.
When I run it, I get:
find: missing argument to `-exec'
even though I get the correct output when I copy and paste the line output by "echo" into the terminal. Can anyone please tell me what I've done wrong?
Secondly, it would be great if ggrep * x listed all files containing x, but * expands to a list of filenames and I need to use \* or '*' instead. Is there a way around this? Thanks!
Terminate the find command with \; instead of \\; .
find . -type f $NAME_ -exec sh -c \"$EXEC_\" \;
eval $FIND_
in the last line of the function body works fine for me.
Expansions in Bash are generally not recursive, so if you load a command from a variable, you should always use eval to force the expanded variable to be reprocessed as if it were fresh input. Normally quotes are not handled properly within a string that has already been expanded.
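A quick way to see this for yourself (a minimal sketch you can paste into a Bash prompt):
cmd='echo "hello world"'
$cmd          # prints: "hello world"  -- the quotes are passed along literally
eval "$cmd"   # prints: hello world    -- the string is re-parsed as shell input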
As for your second problem, I think there is no satisfactory solution. The shell will always expand * before passing it to anything you control. You can disable this expansion, but that is a global setting. Anyway, I think this expansion could actually work in your function's favor; consider rewriting it in a way that takes advantage of it. (I did not analyze whether the current version is close to that or not.)
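If you do want to try the global setting mentioned above, here is a hedged sketch (set -f turns off filename expansion for the current shell until you turn it back on):
set -f        # globbing off: * now reaches ggrep literally
ggrep * x
set +f        # globbing back on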

Fuzzy file search in Linux console

Does anybody know a way to perform a quick fuzzy search on the Linux console?
Quite often I come across situations where I need to find a file in a project but I don't remember the exact filename.
In the Sublime Text editor I would press Ctrl+P and type a part of the name, which produces a list of files to select from. That's an amazing feature I'm quite happy with. The problem is that in most cases I have to browse code in a console on remote machines via ssh. I'm wondering if there is a tool similar to the "Go Anywhere" feature for the Linux console?
You may find fzf useful. It's a general purpose fuzzy finder written in Go that can be used with any list of things: files, processes, command history, Git branches, etc.
Its install script will set up a Ctrl+T keybinding for your shell. Pressing Ctrl+T lets you fuzzy-search for a file or directory and put its path on your console.
(The project page has a GIF showing example usage of fzf, including its Vim integration.)
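Outside of the keybinding, fzf also works in plain command substitution; a minimal sketch:
vim "$(fzf)"                    # fuzzy-pick a file under the current directory and open it
cd "$(find . -type d | fzf)"    # fuzzy-pick a directory and change into it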
Most of these answers won't do fuzzy searching the way Sublime Text does it -- they may match part of the name, but they don't have the nice 'just find all the letters in this order' behavior.
I think this is a bit closer to what you want. I put together a special version of cd ('fcd') that uses fuzzy searching to find the target directory. Super simple -- just add this to your bashrc:
function joinstr { local IFS="$1"; shift; echo "$*"; }
function fcd { cd $(joinstr \* $(echo "$*" | fold -w1))* ; }
This will add an * between each letter in the input, so if I want to go to, for instance,
/home/dave/results/sample/today
I can just type any of the following:
fcd /h/d/r/spl/t
fcd /h/d/r/s/t
fcd /h/d/r/sam/t
fcd /h/d/r/s/ty
Using the first as an example, this will execute cd /*h*/*d*/*r*/*s*p*l*/*t* and let the shell sort out what actually matches.
As long as the first character is correct, and one letter from each directory in the path is written, it will find what you're looking for. Perhaps you can adapt this for your needs? The important bit is:
$(joinstr \* $(echo "$*" | fold -w1))*
which creates the fuzzy search string.
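To see what that produces in isolation (a quick check, assuming the joinstr function above is already defined in your shell):
$ echo "spl" | fold -w1
s
p
l
$ joinstr \* $(echo "spl" | fold -w1)
s*p*l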
I usually use:
ls -R | grep -i [whatever I can remember of the file name]
From a directory above where I expect the file to be - the higher up you go in the directory tree, the slower this is going to go.
When I find the exact file name, I use it in find:
find . -name '[discovered file name]'
This could be collapsed into one line:
for f in $(ls --color=never -R | grep --color=never -i partialName); do find -name $f; done
(I found a problem with ls and grep being aliased to "--color=auto")
The fasd shell script is probably worth taking a look at too.
fasd offers quick access to files and directories for POSIX shells. It is inspired by tools like autojump, z and v. Fasd keeps track of files and directories you have accessed, so that you can quickly reference them in the command line.
It differs a little from a complete find of all files, as it only searches recently opened files. However it is still very useful.
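A couple of hypothetical invocations, assuming fasd's commonly suggested aliases (z for directories, f for files) are installed:
z proj           # cd to the highest-ranked directory matching "proj"
f -e vim notes   # open the highest-ranked file matching "notes" in vim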
find . -iname '*foo*'
Case insensitive find of filenames containing foo.
I don't know how familiar you are with the terminal, but this could help you:
find | grep 'report'
find | grep 'report.*2008'
Sorry if you already know grep and were looking for something more advanced.
fd is a simple, fast and user-friendly alternative to find.
You can do the following
grep -iR "text to search for" .
where "." being the starting point, so you could do something like
grep -iR "text to search" /home/
This will make grep search for the given text inside every file under /home/ and list files which contain that text.
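If you only want the names of the matching files rather than every matching line, add -l:
grep -irl "text to search" /home/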
You can try c- (Cminus), a fuzzy directory-changing tool written as a bash script that uses bash completion. It is somewhat limited in that it only matches previously visited paths, but it is really convenient and quite fast.
GitHub project: whitebob/cminus
Introduction on YouTube: https://youtu.be/b8Bem53Cz9A
You might want to try
AGREP or something else that uses the TRE Regular Expression library.
(From their site:)
TRE is a lightweight, robust, and efficient POSIX compliant regexp matching library with some exciting features such as approximate (fuzzy) matching.
At the core of TRE is a new algorithm for regular expression matching with submatch addressing. The algorithm uses linear worst-case time in the length of the text being searched, and quadratic worst-case time in the length of the regular expression used. In other words, the time complexity of the algorithm is O(M²N), where M is the length of the regular expression and N is the length of the text. The space used is also quadratic in the length of the regex, but does not depend on the searched string. This quadratic behaviour occurs only in pathological cases which are probably very rare in practice.
TRE is not just yet another regexp matcher. TRE has some features which are not there in most free POSIX compatible implementations. Most of these features are not present in non-free implementations either, for that matter.
Approximate pattern matching allows matches to be approximate, that is, allows the matches to be close to the searched pattern under some measure of closeness. TRE uses the edit-distance measure (also known as the Levenshtein distance) where characters can be inserted, deleted, or substituted in the searched text in order to get an exact match. Each insertion, deletion, or substitution adds the distance, or cost, of the match. TRE can report the matches which have a cost lower than some given threshold value. TRE can also be used to search for matches with the lowest cost.
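As a hedged sketch of what approximate matching looks like with agrep (flag syntax can vary between the classic agrep and TRE's agrep, so treat this as an illustration and check agrep --help on your system):
# allow up to 2 errors (insertions, deletions or substitutions),
# so the misspelled pattern still matches lines containing "filename"
agrep -2 'filenmae' notes.txt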
You could use find like this for complex regex:
find . -type f -regextype posix-extended -iregex ".*YOUR_PARTIAL_NAME.*" -print
Or this for simpler glob-like matches:
find . -type f -name "*YOUR_PARTIAL_NAME*" -print
Or you could also use find2perl (which can be quite a bit faster and more optimized than find), like this:
find2perl . -type f -name "*YOUR_PARTIAL_NAME*" -print | perl
If you just want to see how Perl does it, remove the | perl part and you'll see the code it generates. It's a very good way to learn by the way.
Alternatively, write a quick bash wrapper like this, and call it whenever you want:
#! /bin/bash
FIND_BASE="$1"
GLOB_PATTERN="$2"
if [ $# -ne 2 ]; then
echo "Syntax: $(basename $0) <FIND_BASE> <GLOB_PATTERN>"
else
find2perl "$FIND_BASE" -type f -name "*$GLOB_PATTERN*" -print | perl
fi
Name this something like qsearch and then call it like this: qsearch . something
Search for a file or folder from the zsh terminal and open it or navigate to it with a combination of find, fzf, vim and cd.
Install fzf in zsh, add the script below to ~/.zshrc, then reload the shell with source ~/.zshrc.
fzf-file-search() {
item="$(find '/' -type d \( -path '/proc/*' -o -path '/dev/*' \) -prune -false -o -iname '*' 2>/dev/null | FZF_DEFAULT_OPTS="--height ${FZF_TMUX_HEIGHT:-40%} --rev erse --bind=ctrl-z:ignore $FZF_DEFAULT_OPTS $FZF_CTRL_T_OPTS" $(__fzfcmd) -m "$#")"
if [[ -d ${item} ]]; then
cd "${item}" || return 1
elif [[ -f ${item} ]]; then
(vi "${item}" < /dev/tty) || return 1
else
return 1
fi
zle accept-line
}
zle -N fzf-file-search
bindkey '^f' fzf-file-search
Press the keyboard shortcut Ctrl+F to run it (this can be changed in the bindkey '^f' line). It searches through all files/folders (find, fzf) and, depending on the file type, navigates to the directory (cd) or opens the file in a text editor (vim).
Also quickly open recent files/folders with fasd:
fasd-fzf-cd-vi() {
item="$(fasd -Rl "$1" | fzf -1 -0 --no-sort +m)"
if [[ -d ${item} ]]; then
cd "${item}" || return 1
elif [[ -f ${item} ]]; then
(vi "${item}" < /dev/tty) || return 1
else
return 1
fi
zle accept-line
}
zle -N fasd-fzf-cd-vi
bindkey '^e' fasd-fzf-cd-vi
Keyboard shortcut 'Ctrl+E'
Check out other useful tips and tricks for fast navigation inside the terminal: https://github.com/webdev4422/.dotfiles

Recursively look for files with a specific extension

I'm trying to find all files with a specific extension in a directory and its subdirectories with my bash (Latest Ubuntu LTS Release).
This is what's written in a script file:
#!/bin/bash
directory="/home/flip/Desktop"
suffix="in"
browsefolders ()
for i in "$1"/*;
do
echo "dir :$directory"
echo "filename: $i"
# echo ${i#*.}
extension=`echo "$i" | cut -d'.' -f2`
echo "Erweiterung $extension"
if [ -f "$i" ]; then
if [ $extension == $suffix ]; then
echo "$i ends with $in"
else
echo "$i does NOT end with $in"
fi
elif [ -d "$i" ]; then
browsefolders "$i"
fi
done
}
browsefolders "$directory"
Unfortunately, when I start this script in terminal, it says:
[: 29: in: unexpected operator
(with $extension instead of 'in')
What's going on here, where's the error?
Setting the missing curly brace aside,
find "$directory" -type f -name "*.in"
is a bit shorter than that whole thing (and safer - deals with whitespace in filenames and directory names).
Your script is probably failing for entries that don't have a . in their name, making $extension empty.
find {directory} -type f -name '*.extension'
Example: To find all csv files in the current directory and its sub-directories, use:
find . -type f -name '*.csv'
The syntax I use is a bit different from what @Matt suggested:
find $directory -type f -name \*.in
(it's one less keystroke).
Without using find:
du -a $directory | awk '{print $2}' | grep '\.in$'
Though using find command can be useful here, the shell itself provides options to achieve this requirement without any third party tools. The bash shell provides an extended glob support option using which you can get the file names under recursive paths that match with the extensions you want.
The extended option is extglob, which needs to be set using shopt as below. Options are enabled with the -s flag and disabled with the -u flag. Additionally, you can use a couple more options: nullglob, with which an unmatched glob is swept away entirely, replaced with a set of zero words, and globstar, which allows recursing through all the directories.
shopt -s extglob nullglob globstar
Now all you need to do is form the glob expression to include the files of a certain extension, which you can do as below. We use an array to hold the glob results because, when quoted properly and expanded, filenames with special characters remain intact and don't get broken by the shell's word-splitting.
For example, to list all the *.csv files in the recursive paths:
fileList=(**/*.csv)
The ** recurses through the sub-folders, and *.csv is the glob expansion that includes any file with the extension mentioned. Now, to print the actual files, just do
printf '%s\n' "${fileList[@]}"
Using an array and doing a proper quoted expansion is the right way when used in shell scripts, but for interactive use, you could simply use ls with the glob expression as
ls -1 -- **/*.csv
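Back in a script, a minimal sketch of acting on each matched file (wc -l here is just a stand-in for whatever processing you actually need):
shopt -s extglob nullglob globstar
fileList=(**/*.csv)
for f in "${fileList[@]}"; do
    wc -l -- "$f"       # quoted expansion keeps names with spaces intact
done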
This can easily be expanded to match multiple files, i.e. files ending with any of several extensions (similar to combining multiple tests in a find command). For example, to get all recursive image files, i.e. with extensions *.gif, *.png and *.jpg, all you need to do is
ls -1 -- **/+(*.jpg|*.gif|*.png)
This can also be expanded to negate results. With the same syntax, one can use the glob to exclude files of a certain type. Assuming you want to exclude file names with the extensions above, you could do
excludeResults=()
excludeResults=(**/!(*.jpg|*.gif|*.png))
printf '%s\n' "${excludeResults[@]}"
The !() construct is a negation that excludes any of the file extensions listed inside it, and | is an alternation operator, just as in the extended regular expression library, to do an OR match of the globs.
Note that this extended glob support is not available in the POSIX Bourne shell; it is specific to recent versions of bash. So if you are considering portability of your scripts across POSIX and bash shells, this option wouldn't be right.
find "$PWD" -type f -name "*.in"
There's a { missing after browsefolders ()
All $in should be $suffix
The line with cut gets you only the middle part of front.middle.extension. You should read up on ${varname%%pattern} and friends in your shell manual.
I assume you do this as an exercise in shell scripting, otherwise the find solution already proposed is the way to go.
To check for proper shell syntax, without running a script, use sh -n scriptname.
To find all the pom.xml files in your current directory and print them, you can use:
find . -name 'pom.xml' -print
find "$directory" -type f -name "*.in" | grep "$substring"
for file in "${LOCATION_VAR}"/*.zip
do
echo "$file"
done
