Fuzzy file search in the Linux console

Does anybody know a way to perform a quick fuzzy search on the Linux console?
Quite often I come across situations where I need to find a file in a project but I don't remember the exact filename.
In the Sublime Text editor I would press Ctrl+P and type part of the name, which produces a list of files to select from. That's an amazing feature I'm quite happy with. The problem is that in most cases I have to browse code in a console on remote machines via ssh. I'm wondering if there is a tool similar to Sublime's "Goto Anything" feature for the Linux console?

You may find fzf useful. It's a general purpose fuzzy finder written in Go that can be used with any list of things: files, processes, command history, Git branches, etc.
Its install script will set up a Ctrl+T key binding for your shell. Pressing Ctrl+T lets you fuzzy-search for a file or directory and put its path on your command line.
An animated GIF in the original answer shows example usage of fzf, including its Vim integration.
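If you just want to try it before wiring up the key bindings, a minimal sketch (assuming fzf is already on your PATH) is to pipe a file list into it and hand the selection to your editor:
# interactively fuzzy-filter any list, e.g. all files under the current directory
find . -type f | fzf
# open whatever you pick in vim (vim is just an example editor here)
vim "$(fzf)"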

Most of these answers won't do fuzzy searching the way Sublime Text does it -- they may match part of the name, but they don't do the nice "just find all the letters in this order" behavior.
I think this is a bit closer to what you want. I put together a special version of cd ('fcd') that uses fuzzy searching to find the target directory. Super simple -- just add this to your bashrc:
function joinstr { local IFS="$1"; shift; echo "$*"; }
function fcd { cd $(joinstr \* $(echo "$*" | fold -w1))*; }
This will add an * between each letter in the input, so if I want to go to, for instance,
/home/dave/results/sample/today
I can just type any of the following:
fcd /h/d/r/spl/t
fcd /h/d/r/s/t
fcd /h/d/r/sam/t
fcd /h/d/r/s/ty
Using the first as an example, this will execute cd /*h*/*d*/*r*/*s*p*l*/*t* and let the shell sort out what actually matches.
As long as the first character is correct, and one letter from each directory in the path is written, it will find what you're looking for. Perhaps you can adapt this for your needs? The important bit is:
$(joinstr \* $(echo "$*" | fold -w1))*
which creates the fuzzy search string.
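If you want something slightly more defensive, here is a sketch of my own adaptation (not part of the original answer) that refuses to cd when the generated glob matches nothing or matches more than one entry:
function fcd {
    local pattern dirs
    pattern="$(joinstr \* $(echo "$*" | fold -w1))*"
    dirs=( $pattern )   # let the shell expand the fuzzy glob
    if [ ${#dirs[@]} -eq 1 ] && [ -d "${dirs[0]}" ]; then
        cd "${dirs[0]}"
    else
        echo "fcd: no unique directory match for $pattern" >&2
        return 1
    fi
}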

I usually use:
ls -R | grep -i [whatever I can remember of the file name]
From a directory above where I expect the file to be - the higher up you go in the directory tree, the slower this is going to be.
When I find the exact file name, I use it in find:
find . -name [discovered file name]
This could be collapsed into one line:
for f in $(ls --color=never -R | grep --color=never -i partialName); do find -name $f; done
(I hit a problem with ls and grep being aliased to use --color=auto, hence the --color=never flags.)

The fasd shell script is probably worth taking a look at too.
fasd offers quick access to files and directories for POSIX shells. It is inspired by tools like autojump, z and v. Fasd keeps track of files and directories you have accessed, so that you can quickly reference them in the command line.
It differs a little from a complete find of all files, as it only searches recently opened files. However it is still very useful.
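For example, with the aliases suggested in the fasd README (z for directories, f for files; whether they exist depends on your setup), access might look like this:
z proj            # cd to the highest-ranked directory matching "proj"
f conf            # list frequent/recent files matching "conf"
f -e vim conf     # open the best "conf" match in vim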

find . -iname '*foo*'
Case insensitive find of filenames containing foo.

I don't know how familiar you are with the terminal, but this could help you:
find | grep 'report'
find | grep 'report.*2008'
Sorry if you already know grep and were looking for something more advanced.

fd is a simple, fast and user-friendly alternative to find.
An animated demo is available on the fd GitHub project page.
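A few typical invocations (a sketch; some distributions ship the binary as fdfind, so adjust the name if needed):
fd passwd /etc    # search for entries matching "passwd" under /etc
fd -e md          # find files by extension (*.md) under the current directory
fd -H config      # include hidden files and directories in the search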

You can do the following
grep -iR "text to search for" .
where "." being the starting point, so you could do something like
grep -iR "text to search" /home/
This will make grep search for the given text inside every file under /home/ and list files which contain that text.

You can try c- (Cminus), a fuzzy directory-changing tool written as a bash script that uses bash completion. It is somewhat limited in that it only matches previously visited paths, but it is really convenient and quite fast.
GitHub project: whitebob/cminus
Introduction on YouTube: https://youtu.be/b8Bem53Cz9A

You might want to try
AGREP or something else that uses the TRE Regular Expression library.
(From their site:)
TRE is a lightweight, robust, and efficient POSIX compliant regexp matching library with some exciting features such as approximate (fuzzy) matching.
At the core of TRE is a new algorithm for regular expression matching with submatch addressing. The algorithm uses linear worst-case time in the length of the text being searched, and quadratic worst-case time in the length of the used regular expression. In other words, the time complexity of the algorithm is O(M²N), where M is the length of the regular expression and N is the length of the text. The used space is also quadratic on the length of the regex, but does not depend on the searched string. This quadratic behaviour occurs only on pathological cases which are probably very rare in practice.
TRE is not just yet another regexp matcher. TRE has some features which are not there in most free POSIX compatible implementations. Most of these features are not present in non-free implementations either, for that matter.
Approximate pattern matching allows matches to be approximate, that is, allows the matches to be close to the searched pattern under some measure of closeness. TRE uses the edit-distance measure (also known as the Levenshtein distance) where characters can be inserted, deleted, or substituted in the searched text in order to get an exact match. Each insertion, deletion, or substitution adds the distance, or cost, of the match. TRE can report the matches which have a cost lower than some given threshold value. TRE can also be used to search for matches with the lowest cost.
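As a rough illustration of approximate matching (a sketch only; TRE's command-line tool is often installed as tre-agrep, and flag names vary between agrep implementations, so check your man page):
# find lines within an edit distance of 2 of "resultts"
tre-agrep -E 2 'resultts' notes.txt
# the same, ignoring case
tre-agrep -i -E 2 'resultts' notes.txt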

You could use find like this for complex regex:
find . -type f -regextype posix-extended -iregex ".*YOUR_PARTIAL_NAME.*" -print
Or this for simpler glob-like matches:
find . -type f -name "*YOUR_PARTIAL_NAME*" -print
Or you could also use find2perl (which can be quite a bit faster and more optimized than find), like this:
find2perl . -type f -name "*YOUR_PARTIAL_NAME*" -print | perl
If you just want to see how Perl does it, remove the | perl part and you'll see the code it generates. It's a very good way to learn by the way.
Alternatively, write a quick bash wrapper like this, and call it whenever you want:
#!/bin/bash
FIND_BASE="$1"
GLOB_PATTERN="$2"
if [ $# -ne 2 ]; then
    echo "Syntax: $(basename "$0") <FIND_BASE> <GLOB_PATTERN>"
else
    find2perl "$FIND_BASE" -type f -name "*$GLOB_PATTERN*" -print | perl
fi
Name this something like qsearch and then call it like this: qsearch . something

In zsh you can search for a file or folder in the terminal and open it or navigate to it with a combination of find, fzf, vim and cd.
Install fzf, add the script below to ~/.zshrc, then reload the shell with source ~/.zshrc:
fzf-file-search() {
item="$(find '/' -type d \( -path '/proc/*' -o -path '/dev/*' \) -prune -false -o -iname '*' 2>/dev/null | FZF_DEFAULT_OPTS="--height ${FZF_TMUX_HEIGHT:-40%} --rev erse --bind=ctrl-z:ignore $FZF_DEFAULT_OPTS $FZF_CTRL_T_OPTS" $(__fzfcmd) -m "$#")"
if [[ -d ${item} ]]; then
cd "${item}" || return 1
elif [[ -f ${item} ]]; then
(vi "${item}" < /dev/tty) || return 1
else
return 1
fi
zle accept-line
}
zle -N fzf-file-search
bindkey '^f' fzf-file-search
Press the keyboard shortcut Ctrl+F to run it; this can be changed in the bindkey '^f' line. It searches (find) through all files/folders (fzf) and, depending on the file type, navigates to the directory (cd) or opens the file in a text editor (vim).
Also quickly open recent files/folders with fasd:
fasd-fzf-cd-vi() {
item="$(fasd -Rl "$1" | fzf -1 -0 --no-sort +m)"
if [[ -d ${item} ]]; then
cd "${item}" || return 1
elif [[ -f ${item} ]]; then
(vi "${item}" < /dev/tty) || return 1
else
return 1
fi
zle accept-line
}
zle -N fasd-fzf-cd-vi
bindkey '^e' fasd-fzf-cd-vi
Keyboard shortcut: Ctrl+E
Check out other useful tips and tricks for fast navigation inside the terminal: https://github.com/webdev4422/.dotfiles

Related

linux grep match a subset of files from a previous match

I need to pipe this result:
grep -R "extends Some_Critical_Class" *
to another grep:
grep "function init("
ie. "files that extend Some_Critical_Class that also have function init()"
If there is a way to do it in one operation in grep, that would be great, but I'd also like to see the how the piping is done to improve my programming in *nix (which is rudimentary right now). Thanks.
To be clear, you want the list of files that contain both strings. Not only do you need two greps for this, but you also need the -l (a.k.a. --files-with-matches) option.
Here is one way of doing this:
grep -F -R -l -Z "extends Some_Critical_Class" . \
| xargs -0 grep -F -l "function init("
We first obtain a (NUL-delimited) list of files that contain your first string, and then we use xargs to pass this list of files to the second grep.
Don't use grep (g/re/p) to find files; adding that functionality to GNU grep was just a bad idea, since there's already a perfectly good tool for finding files, with an extremely obvious name.
You didn't say what your expected output was but maybe this does what you want:
find . -type f -exec \
awk '
/extends Some_Critical_Class/ { x=1 }
/function init\(/ { y=1 }
END { if (x && y) print FILENAME }
' {} \;
The above will work on any Unix box, not just one with GNU tools, and can be trivially modified to add more regexps or strings to search for, various "and" and "or" combinations, etc.

listing file in unix and saving the output in a variable (Oldest File fetching for a particular extension)

This might be a very simple thing for a shell scripting programmer but am pretty new to it. I was trying to execute the below command in a shell script and save the output into a variable
inputfile=$(ls -ltr *.{PDF,pdf} | head -1 | awk '{print $9}')
The command works fine when I run it from the terminal but fails when executed through a shell script (sh). Why does the command fail - does it mean that shell scripts don't support the command, or am I doing it wrong? Also, how do I know whether a command will work in a shell script or not?
Just to give you a glimpse of my requirement: I was trying to get the oldest file from a particular directory (I also want to make sure upper-case and lower-case extensions are handled). Is there any other way to do this?
The above command will work correctly only if BOTH *.pdf and *.PDF files are present in the directory you are currently in.
If you would like to execute it in a directory containing only one of those, consider using e.g.:
inputfiles=$(find . -maxdepth 1 -type f \( -name "*.pdf" -or -name "*.PDF" \) | xargs ls -1tr | head -1 )
NOTE: The above command doesn't work with file names containing newlines, or with a very long list of found files.
Parsing ls is always a bad idea. You need another strategy.
How about making a function that gives you the oldest file among the ones given as arguments? The following works in Bash (adapt to your needs):
get_oldest_file() {
# get oldest file among files given as parameters
# return is in variable get_oldest_file_ret
local oldest f
for f do
[[ -e $f ]] && [[ ! $oldest || $f -ot $oldest ]] && oldest=$f
done
get_oldest_file_ret=$oldest
}
Then just call as:
get_oldest_file *.{PDF,pdf}
echo "oldest file is: $get_oldest_file_ret"
Now, you probably don't want to use brace expansions like this at all. In fact, you very likely want to use the shell options nocaseglob and nullglob:
shopt -s nocaseglob nullglob
get_oldest_file *.pdf
echo "oldest file is: $get_oldest_file_ret"
If you're using a POSIX shell, it's going to be a bit trickier to have the equivalent of nullglob and nocaseglob.
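One workaround in plain POSIX sh (just a sketch; note that the -ot test operator is a widely supported extension rather than strict POSIX) is to spell out the case variations in the glob and skip the unmatched-pattern case by hand:
oldest=
for f in *.[Pp][Dd][Ff]; do
    [ -e "$f" ] || continue   # the literal pattern is returned when nothing matches
    if [ -z "$oldest" ] || [ "$f" -ot "$oldest" ]; then
        oldest=$f
    fi
done
echo "oldest file is: $oldest"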
Is perl an option? It's ubiquitous on Unix.
I would suggest:
perl -e 'print ((sort { -M $b <=> -M $a } glob ( "*.{pdf,PDF}" ))[0]);';
Which:
uses glob to fetch all files matching the pattern.
sorts, using -M, which is the relative modification time (in days).
fetches the first element ([0]) off the sort.
Prints that.
As @gniourf_gniourf says, parsing ls is a bad idea: it leaves globs unquoted and generally doesn't account for unusual characters in file names.
find is your friend:
#!/bin/sh
get_oldest_pdf() {
#
# echo path of oldest *.pdf (case-insensitive) file in current directory
#
find . -maxdepth 1 -mindepth 1 -iname "*.pdf" -printf '%T@ %p\n' \
    | sort -n \
    | tail -1 \
    | cut -d ' ' -f2-
}
whatever=$(get_oldest_pdf)
Notes:
find has numerous ways of formatting the output, including things like access time and/or write time. I used '%T@ %p\n', where %T@ is the last write time in UNIX time format, including the fractional part. It will never contain a space, so it's safe to use as a separator.
The numeric sort and tail get the last item, sorting by the time; cut then removes the time from the output.
I used the (in my opinion) much easier to read and maintain pipe notation, with line continuations (\).
The code should run on any POSIX shell.
You could easily adjust the function to parametrize the pattern, the time used (access/write), the search depth or the starting directory.

Easy replace with/without regex in multiple files

A hundred times a day I need to search for patterns in files, and sometimes I have to replace those patterns with something else. Most of the time it is a simple pattern like a word or a short sentence, but sometimes I have to look for more complex regexps. I don't really like sed (at least the version I have, since it is not very PCRE-compliant), so I prefer using perl -pi -e.
However, Perl pie is not very attractive on Cygwin because of the mandatory -i.bak temp files. I need to find a way to automatically remove the .bak files after processing. Moreover, if I want to replace recursively in a project I have to list all the files first:
find . | xargs -n1 perl -pi -e 's/foo/bar/'
This command is quite long to type, especially if you use it a thousand times a month. So I decided to write a more useful tool working in the same way as the great silver searcher, ag.
ag 'foo\d{3}[^\w]' # Search for a pattern
# Oh yes this one should be renamed!
replace 's/(foo)\d{3}[^\w]/\U$1\E_bar/g'
I wrote this very primitive bash function
function replace
{
    EXTENSION=.perlpie_tmp
    perl -p -i$EXTENSION -e $1 ${*:2}
    for file in ${*:2}; do
        rm "$file$EXTENSION";
    done;
}
But I am not satisfied at all, because it doesn't automatically search all files recursively when there is no more than one argument. I could either modify this function and add find . when the number of arguments is 1, or I could write a more complex program in Perl that supports command-line options, pretty output, smart-case search or even plain-text search.
What is the most suitable option for this problem, and is there any advanced search/replace tool in the Linux world? If not, I may try to write my own rip tool, standing for replace-in-place, which can support all the options that I need.
Before that I need some advice...
EDIT
Actually I am thinking of forking https://github.com/petdance/ack2 to add a replacement feature... This may or may not be a good idea...
Here's an alternative to your function (edited to use the suggestion provided by gniourf_gniourf, thanks):
find . -type f -exec sh -c 'perl -pi.bak -e "s/foo/bar/" "$0" && rm -f "$0".bak' {} \;
Using this approach, you can remove the file as you go.
I think you can use
grep -Hrn -e "string" .
to find a pattern, and
find -type f -exec sed -i "s#string1#string2#g" {} \;
to replace a pattern
I would slightly modify your existing function:
function replace {
local perl_code=$1 EXTENSION=.perlpie_tmp file
shift
for file; do
perl -p -i$EXTENSION -e "$perl_code" "$file" && rm "$file$EXTENSION"
done;
}
This will slightly worsen the performance as you're now calling perl multiple times, but I suspect you won't notice.
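If the goal is to fall back to a recursive search when no file arguments are given, one possible sketch building on that function is a hypothetical replace_r (the recursive branch is my assumption, not part of the answer above):
function replace_r {
    local perl_code=$1 ext=.perlpie_tmp file
    shift
    if [ $# -eq 0 ]; then
        # no files given: recurse from the current directory
        find . -type f -exec sh -c '
            perl -p -i"$1" -e "$2" "$3" && rm -f "$3$1"
        ' _ "$ext" "$perl_code" {} \;
    else
        for file in "$@"; do
            perl -p -i"$ext" -e "$perl_code" "$file" && rm -f "$file$ext"
        done
    fi
}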

Recursion in a linux file search - Bash script

I need to make a Linux file search which involves recursion for a project. I got a bit of help making this, so I don't fully understand the code, only parts of it. Could someone explain what it means, and also give a bit of help as to how I would go about getting a user to input a keyword and having this function search for that keyword in the directories? Thank you.
#!/bin/bash
lookIn() {
    echo $2
    for d in $(find $1 -type d)
    do
        if [ "$d" != "$1" ]; then
            echo "looking in $d"
            lookIn $d
        fi
    done
}
lookIn
You only need find. find will traverse the entire directory tree. Assuming $1 points to the folder you want to search:
read -p "Enter file name to find: " KEYWORD
find $1 -type f -name "$KEYWORD"
If you want to find names that contain the keyword, then use:
find $1 -type f -name "*${KEYWORD}*"
Try this, then you can work it into your bigger script (whatever it does).
TL;DR
Don't use recursion. It may work, but it's more work than necessary; Bash doesn't have tail-call optimization, and it's not a functional programming language. Just use find with the right set of arguments.
Parameterized Bash Function to Call Find
find_name() {
    starting_path="$1"
    filename="$2"
    find "$starting_path" -name "$filename" 2>&-
}
Example Output
Make sure you quote properly, especially if using globbing characters like * or ?. For example:
$ find_name /etc 'pass?d'
/etc/passwd
/etc/pam.d/passwd
You don't really need find for a recursive file search; grep -r (recursive) will work fine.
See the script below:
#!/bin/bash
# change dir to base dir where files are stored for search
cd /base/search/dir
# accept input from user
read -p "Enter Search Keyword: " kw
# perform case insensitive recursive search and list matched file
grep -irl "$kw" *

Recursively look for files with a specific extension

I'm trying to find all files with a specific extension in a directory and its subdirectories with my bash (Latest Ubuntu LTS Release).
This is what's written in a script file:
#!/bin/bash
directory="/home/flip/Desktop"
suffix="in"
browsefolders ()
for i in "$1"/*;
do
    echo "dir :$directory"
    echo "filename: $i"
    # echo ${i#*.}
    extension=`echo "$i" | cut -d'.' -f2`
    echo "Erweiterung $extension"
    if [ -f "$i" ]; then
        if [ $extension == $suffix ]; then
            echo "$i ends with $in"
        else
            echo "$i does NOT end with $in"
        fi
    elif [ -d "$i" ]; then
        browsefolders "$i"
    fi
done
}
browsefolders "$directory"
Unfortunately, when I start this script in terminal, it says:
[: 29: in: unexpected operator
(with $extension instead of 'in')
What's going on here, where's the error?
For comparison, this:
find "$directory" -type f -name "*.in"
is a bit shorter than that whole thing (and safer - deals with whitespace in filenames and directory names).
Your script is probably failing for entries that don't have a . in their name, making $extension empty.
find {directory} -type f -name '*.extension'
Example: To find all csv files in the current directory and its sub-directories, use:
find . -type f -name '*.csv'
The syntax I use is a bit different than what @Matt suggested:
find $directory -type f -name \*.in
(it's one less keystroke).
Without using find:
du -a $directory | awk '{print $2}' | grep '\.in$'
Though the find command can be useful here, the shell itself provides options to achieve this requirement without any third-party tools. The bash shell provides an extended glob support option with which you can get the file names under recursive paths that match the extensions you want.
The extended option is extglob, which needs to be set using the shopt builtin as below. Options are enabled with the -s flag and disabled with the -u flag. Additionally you could use a couple more options, i.e. nullglob, with which an unmatched glob is swept away entirely, replaced with a set of zero words, and globstar, which allows recursing through all the directories.
shopt -s extglob nullglob globstar
Now all you need to do is form the glob expression to include the files of a certain extension which you can do as below. We use an array to populate the glob results because when quoted properly and expanded, the filenames with special characters would remain intact and not get broken due to word-splitting by the shell.
For example to list all the *.csv files in the recursive paths
fileList=(**/*.csv)
The option ** is to recurse through the sub-folders and *.csv is glob expansion to include any file of the extensions mentioned. Now for printing the actual files, just do
printf '%s\n' "${fileList[#]}"
Using an array and doing a proper quoted expansion is the right way when used in shell scripts, but for interactive use, you could simply use ls with the glob expression as
ls -1 -- **/*.csv
This could very well be expanded to match multiple files, i.e. files ending with any of several extensions (similar to adding multiple flags in a find command). For example, consider the case of needing to get all recursive image files, i.e. of extensions *.gif, *.png and *.jpg; all you need to do is
ls -1 -- **/+(*.jpg|*.gif|*.png)
This could very well be expanded to negate results as well. With the same syntax, one could use the results of the glob to exclude files of certain types. Assuming you want to exclude file names with the extensions above, you could do
excludeResults=()
excludeResults=(**/!(*.jpg|*.gif|*.png))
printf '%s\n' "${excludeResults[#]}"
The construct !() is a negate operation to not include any of the file extensions listed inside and | is an alternation operator just as used in the Extended Regular Expressions library to do an OR match of the globs.
Note that this extended glob support is not available in the POSIX Bourne shell and is purely specific to recent versions of bash. So if you are considering portability of your scripts across POSIX and bash shells, this option wouldn't be right.
find "$PWD" -type f -name "*.in"
There's a { missing after browsefolders ()
All $in should be $suffix
The line with cut gets you only the middle part of front.middle.extension. You should read up in your shell manual on ${varname%%pattern} and friends (see the sketch below).
I assume you do this as an exercise in shell scripting, otherwise the find solution already proposed is the way to go.
To check for proper shell syntax, without running a script, use sh -n scriptname.
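On the ${varname%%pattern} point above, a small sketch of what parameter expansion gives you compared with the cut line:
i="front.middle.extension"
echo "${i##*.}"   # strip the longest "*." prefix: prints "extension"
echo "${i%.*}"    # strip the shortest ".*" suffix: prints "front.middle"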
To find all the pom.xml files in your current directory and print them, you can use:
find . -name 'pom.xml' -print
find $directory -type f -name "*.in"|grep $substring
for file in "${LOCATION_VAR}"/*.zip
do
echo "$file"
done
