linux : listing files that contain several words

linux : listing files that contain several words - linux

I try to find a way to list all the files in the directory tree (recursively) that contain several words.
While searching I found example such as egrep -R -l 'toto|tata' . but | induce OR. I would like AND...
Thank you for your help

Using GNU grep with GNU xargs,
grep -ERl 'toto' | xargs -r grep 'tata'
The first grep lists those files containing the pattern toto which is then fed to xargs and with the second grep those files containing tata is retrieved. The -r flag is to ensure second grep doesn't run on an empty output.
The -r flag in xargs from the man page,
-r, --no-run-if-empty
If the standard input does not contain any nonblanks, do not run the command.
Normally, the command is run once even if there is no input. This option is a GNU
extension.

agrep tool is designed for providing AND to grep with usage:
agrep 'pattern1;pattern2' file
In your case you could run
find . -type f -exec agrep 'toto;tata' {} \; #apply -l to display the file names
PS1: For current directory you can just agrep 'pattern1;pattern2' *.*
PS2: Unfortunatelly agrep does not support -R option.

Related

How can I use grep to get all the lines that contains string1 and string2 separated by space?

Line1: .................
Line2: #hello1 #hello2 #hello3
Line3: .................
Line4: .................
Line5: #hello1 #hello4 #hello3
Line6: #hello1 #hello2 #hello3
Line7: .................
I have files that look similar in terms of lines on one of my project directories. I want to get the counts of all the lines that contain #hello1 and #hello2. In this case I would get 2 as a result only for this file. However, I want to do this recursively.

The canonical way to "do something recursively" is to use the find command. If you want to find lines that have two words on them, a simple regex will do:
grep -lr '#hello1.*#hello2' .
The option -l instructs grep to show us only filenames rather than file content, and the option -r tells grep to traverse the filesystem recursively. The start of the search is the path at the end of the line. Once you have the list of files, you can parse that list using commands run by xargs.
For example, this will count all the lines in files matching the pattern you specified.
grep -lr '#hello1.*#hello2' . | xargs -n 1 wc -l
This uses xargs to run the wc command on each of the files listed by grep. You could probably also run this without the -n 1, unless you're dealing with many many thousands of files that would exceed your maximum command line length.
Or, if I'm interpreting your question correctly, the following will count just the patterns in those files.
grep -lr '#hello1.*#hello2' . | xargs -n 1 grep -Hc '#hello1.*#hello2'
This runs a similar grep to the one used to generate your recursive list of files, and presents the output with filename (-H) and count (-c).
But if you want complex rules like finding two patterns possibly on different lines in the file, then grep probably is not the optimal tool, unless you use multiple greps launched by find:
find /path/to/base -type f \
-exec grep -q '#hello1' {} \; \
-exec grep -q '#hello2' {} \; \
-print
(Lines split for easier reading.)
This is somewhat costly, as find needs to launch up to two children for each file. So another approach would be to use awk instead:
find /path/to/base -type f \
-exec awk '/#hello1/{c++} /#hello2/{c++} c==2{r=1} END{exit 1-r}' {} \; \
-print
Alternately, if your shell is bash version 4 or above, you can avoid using find and use the bash option globstar:
$ shopt -s globstar
$ awk 'FNR=1{c=0} /#hello1/{c++} /#hello2/{c++} c==2{print FILENAME;nextfile}' **/*
Note: none of this is tested.

If you are not nterested in the number of files also,
then just something along:
find $BASEDIRECTORY -type f -print0 | xargs -0 grep -h PATTERN | wc -l

If you want to count lines containing #hello1 and #hello2 separated by space in a specific file you can:
$ grep -c '#hello1 #hello2' file
If you want to count in more than one file:
$ grep -c '#hello1 #hello2' file1 file2 ...
And if you want to get the gran total:
$ grep -c '#hello1 #hello2' file1 file2 ... | paste -s -d+ - | bc
of course you can let your shell expanding file names. So, for example:
$ grep -c '#hello1 #hello2' *.txt | paste -s -d+ - | bc
or so...

find . -type f | xargs -1 awk '/#hello1/ && /#hello2/{c++} END{print FILENAME, c+0}'

Grep into given filenames

I have a directory that contains many subdirectories that contain many files.
I list the contents of the current directory using ls *. I see that there are certain files that are relevant, in terms of their names. Therefore, the relevant files can be obtained as such ls * | grep "abc\|def\|ghi".
Now I want to search within the given filenames. So I try something like:
ls * | grep "abc\|def\|ghi" | zgrep -i "ERROR" *, however, this is not looking into the file contents, rather the names. Is there an easy way to do this with pipes?

To use grep to search the contents of files within a directory, try using the find command, using xargs to couple it with the grep command, like so:
find . -type f | xargs grep '...'

You can do it like this:
find -E . -type f -regex ".*/.*(abc|def).*" -exec grep -H ERROR {} \+
The -E allows use of extended regexes so you can use the pipe (|) for expressing alternations. The + at the end allows searching in as many files as possible for each invocation of -exec grep rather than needing a whole new process for every single file.

You should use xargs to grep each file contents:
ls * | grep "abc\|def\|ghi" | xargs zgrep -i "ERROR" *

I know you asked for a solution with pipes, but they are not necessary for this task. grep has many parameters, and can solve this problem alone:
grep . -rh --include "*abc*" --include "*def*" -e "ERROR"
Parameters:
--include : Search only files whose base name matches the give wildcard pattern (not regex!)
-h : Suppress the prefixing of file names on output.
-r : recursive
-e : regex filter pattern

grep -i "ERROR" `ls * | grep "abc\|def\|ghi"`

How to search and replace using grep

I need to recursively search for a specified string within all files and subdirectories within a directory and replace this string with another string.
I know that the command to find it might look like this:
grep 'string_to_find' -r ./*
But how can I replace every instance of string_to_find with another string?

Another option is to use find and then pass it through sed.
find /path/to/files -type f -exec sed -i 's/oldstring/new string/g' {} \;

I got the answer.
grep -rl matchstring somedir/ | xargs sed -i 's/string1/string2/g'

You could even do it like this:
Example
grep -rl 'windows' ./ | xargs sed -i 's/windows/linux/g'
This will search for the string 'windows' in all files relative to the current directory and replace 'windows' with 'linux' for each occurrence of the string in each file.

This works best for me on OS X:
grep -r -l 'searchtext' . | sort | uniq | xargs perl -e "s/matchtext/replacetext/" -pi
Source: http://www.praj.com.au/post/23691181208/grep-replace-text-string-in-files

Usually not with grep, but rather with sed -i 's/string_to_find/another_string/g' or perl -i.bak -pe 's/string_to_find/another_string/g'.

Other solutions mix regex syntaxes. To use perl/PCRE patterns for both search and replace, and process only matching files, this works quite well:
grep -rlIZPi 'match1' | xargs -0r perl -pi -e 's/match2/replace/gi;'
match1 and match2 are usually identical but match2 can contain more advanced features that are only relevant to the substitution, e.g. capturing groups.
Translation: grep recursively and list matching filenames, each separated by null to protect any special characters; pipe any filenames to xargs which is expecting a null-separated list; if any filenames are received, pass them to perl to perform the actual substitutions.
For case-sensitive matching, drop the i flag from grep and the i pattern modifier from the s/// expression, but not the i flag from perl itself. To include binary files, remove the I flag from grep.

Be very careful when using find and sed in a git repo! If you don't exclude the binary files you can end up with this error:
error: bad index file sha1 signature
fatal: index file corrupt
To solve this error you need to revert the sed by replacing your new_string with your old_string. This will revert your replaced strings, so you will be back to the beginning of the problem.
The correct way to search for a string and replace it is to skip find and use grep instead in order to ignore the binary files:
sed -ri -e "s/old_string/new_string/g" $(grep -Elr --binary-files=without-match "old_string" "/files_dir")
Credits for #hobs

Here is what I would do:
find /path/to/dir -type f -iname "*filename*" -print0 | xargs -0 sed -i '/searchstring/s/old/new/g'
this will look for all files containing filename in the file's name under the /path/to/dir, than for every file found, search for the line with searchstring and replace old with new.
Though if you want to omit looking for a specific file with a filename string in the file's name, than simply do:
find /path/to/dir -type f -print0 | xargs -0 sed -i '/searchstring/s/old/new/g'
This will do the same thing above, but to all files found under /path/to/dir.

Modern rust tools can be used to do this job.
For example to replace in all (non ignored) files "oldstring" and "oldString" with "newstring" and "newString" respectively you can :
Use fd and sd
fd -tf -x sd 'old([Ss]tring)' 'new$1' {}
Use ned
ned -R -p 'old([Ss]tring)' -r 'new$1' .
Use ruplacer
ruplacer --go 'old([Ss]tring)' 'new$1' .
Ignored files
To include ignored (by .gitignore) and hidden files you have to specify it :
use -IH for fd,
use --ignored --hiddenfor ruplacer.

Another option would be to just use perl with globstar.
Enabling shopt -s globstar in your .bashrc (or wherever) allows the ** glob pattern to match all sub-directories and files recursively.
Thus using perl -pXe 's/SEARCH/REPLACE/g' -i ** will recursively
replace SEARCH with REPLACE.
The -X flag tells perl to "disable all warnings" - which means that
it won't complain about directories.
The globstar also allows you to do things like sed -i 's/SEARCH/REPLACE/g' **/*.ext if you wanted to replace SEARCH with REPLACE in all child files with the extension .ext.

Use grep to search for a string in files, include subfolders

i have to search for a particular text in files and for that im using grep command but it searches only in current folder.What i want is that using a single grep command i can search a particular thing in the current folder as well as in all of its sub folders.How can i do that???

POSIX grep does not support recursive searching - the GNU version of grep does.
find . -type f -exec grep 'pattern' {} \;
would be runnable on any POSIX compliant UNIX.

man grep says
-R, -r, --recursive
Read all files under each directory, recursively; this is
equivalent to the -d recurse option.

And even more common is to use find with xargs, say
find <dir> -type f -name <shellglob> -print0 | xargs grep -0
where -print0 and -0, respectively, would use null char to separate entries in order to avoid issues with filenames having space characters.

How do I recursively grep all directories and subdirectories?

How do I recursively grep all directories and subdirectories?
find . | xargs grep "texthere" *

grep -r "texthere" .
The first parameter represents the regular expression to search for, while the second one represents the directory that should be searched. In this case, . means the current directory.
Note: This works for GNU grep, and on some platforms like Solaris you must specifically use GNU grep as opposed to legacy implementation. For Solaris this is the ggrep command.

If you know the extension or pattern of the file you would like, another method is to use --include option:
grep -r --include "*.txt" texthere .
You can also mention files to exclude with --exclude.
Ag
If you frequently search through code, Ag (The Silver Searcher) is a much faster alternative to grep, that's customized for searching code. For instance, it's recursive by default and automatically ignores files and directories listed in .gitignore, so you don't have to keep passing the same cumbersome exclude options to grep or find.

I now always use (even on Windows with GoW -- Gnu on Windows):
grep --include="*.xxx" -nRHI "my Text to grep" *
(As noted by kronen in the comments, you can add 2>/dev/null to void permission denied outputs)
That includes the following options:
--include=PATTERN
Recurse in directories only searching file matching PATTERN.
-n, --line-number
Prefix each line of output with the line number within its input file.
(Note: phuclv adds in the comments that -n decreases performance a lot so, so you might want to skip that option)
-R, -r, --recursive
Read all files under each directory, recursively; this is equivalent to the -d recurse option.
-H, --with-filename
Print the filename for each match.
-I
Process a binary file as if it did not contain matching data;
this is equivalent to the --binary-files=without-match option.
And I can add 'i' (-nRHIi), if I want case-insensitive results.
I can get:
/home/vonc/gitpoc/passenger/gitlist/github #grep --include="*.php" -nRHI "hidden" *
src/GitList/Application.php:43: 'git.hidden' => $config->get('git', 'hidden') ? $config->get('git', 'hidden') : array(),
src/GitList/Provider/GitServiceProvider.php:21: $options['hidden'] = $app['git.hidden'];
tests/InterfaceTest.php:32: $options['hidden'] = array(self::$tmpdir . '/hiddenrepo');
vendor/klaussilveira/gitter/lib/Gitter/Client.php:20: protected $hidden;
vendor/klaussilveira/gitter/lib/Gitter/Client.php:170: * Get hidden repository list
vendor/klaussilveira/gitter/lib/Gitter/Client.php:176: return $this->hidden;
...

Also:
find ./ -type f -print0 | xargs -0 grep "foo"
but grep -r is a better answer.

globbing **
Using grep -r works, but it may overkill, especially in large folders.
For more practical usage, here is the syntax which uses globbing syntax (**):
grep "texthere" **/*.txt
which greps only specific files with pattern selected pattern. It works for supported shells such as Bash +4 or zsh.
To activate this feature, run: shopt -s globstar.
See also: How do I find all files containing specific text on Linux?
git grep
For projects under Git version control, use:
git grep "pattern"
which is much quicker.
ripgrep
For larger projects, the quickest grepping tool is ripgrep which greps files recursively by default:
rg "pattern" .
It's built on top of Rust's regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast. Check the detailed analysis here.

In POSIX systems, you don't find -r parameter for grep and your grep -rn "stuff" . won't run, but if you use find command it will:
find . -type f -exec grep -n "stuff" {} \; -print
Agreed by Solaris and HP-UX.

If you only want to follow actual directories, and not symbolic links,
grep -r "thingToBeFound" directory
If you want to follow symbolic links as well as actual directories (be careful of infinite recursion),
grep -R "thing to be found" directory
Since you're trying to grep recursively, the following options may also be useful to you:
-H: outputs the filename with the line
-n: outputs the line number in the file
So if you want to find all files containing Darth Vader in the current directory or any subdirectories and capture the filename and line number, but do not want the recursion to follow symbolic links, the command would be
grep -rnH "Darth Vader" .
If you want to find all mentions of the word cat in the directory
/home/adam/Desktop/TomAndJerry
and you're currently in the directory
/home/adam/Desktop/WorldDominationPlot
and you want to capture the filename but not the line number of any instance of the string "cats", and you want the recursion to follow symbolic links if it finds them, you could run either of the following
grep -RH "cats" ../TomAndJerry #relative directory
grep -RH "cats" /home/adam/Desktop/TomAndJerry #absolute directory
Source:
running "grep --help"
A short introduction to symbolic links, for anyone reading this answer and confused by my reference to them:
https://www.nixtutor.com/freebsd/understanding-symbolic-links/

To find name of files with path recursively containing the particular string use below command
for UNIX:
find . | xargs grep "searched-string"
for Linux:
grep -r "searched-string" .
find a file on UNIX server
find . -type f -name file_name
find a file on LINUX server
find . -name file_name

just the filenames can be useful too
grep -r -l "foo" .

another syntax to grep a string in all files on a Linux system recursively
grep -irn "string"
the -r indicates a recursive search that searches for the specified string in the given directory and sub directory looking for the specified string in files, program, etc
-i ingnore case sensitive can be used to add inverted case string
-n prints the line number of the specified string
NB: this prints massive result to the console so you might need to filter the output by piping and remove less interesting bits of info it also searches binary programs so you might want to filter some of the results

ag is my favorite way to do this now github.com/ggreer/the_silver_searcher . It's basically the same thing as ack but with a few more optimizations.
Here's a short benchmark. I clear the cache before each test (cf https://askubuntu.com/questions/155768/how-do-i-clean-or-disable-the-memory-cache )
ryan#3G08$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
3
ryan#3G08$ time grep -r "hey ya" .
real 0m9.458s
user 0m0.368s
sys 0m3.788s
ryan#3G08:$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
3
ryan#3G08$ time ack-grep "hey ya" .
real 0m6.296s
user 0m0.716s
sys 0m1.056s
ryan#3G08$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
3
ryan#3G08$ time ag "hey ya" .
real 0m5.641s
user 0m0.356s
sys 0m3.444s
ryan#3G08$ time ag "hey ya" . #test without first clearing cache
real 0m0.154s
user 0m0.224s
sys 0m0.172s

This should work:
grep -R "texthere" *

If you are looking for a specific content in all files from a directory structure, you may use find since it is more clear what you are doing:
find -type f -exec grep -l "texthere" {} +
Note that -l (downcase of L) shows the name of the file that contains the text. Remove it if you instead want to print the match itself. Or use -H to get the file together with the match. All together, other alternatives are:
find -type f -exec grep -Hn "texthere" {} +
Where -n prints the line number.

This is the one that worked for my case on my current machine (git bash on windows 7):
find ./ -type f -iname "*.cs" -print0 | xargs -0 grep "content pattern"
I always forget the -print0 and -0 for paths with spaces.
EDIT: My preferred tool is now instead ripgrep: https://github.com/BurntSushi/ripgrep/releases . It's really fast and has better defaults (like recursive by default). Same example as my original answer but using ripgrep: rg -g "*.cs" "content pattern"

grep -r "texthere" . (notice period at the end)
(^credit: https://stackoverflow.com/a/1987928/1438029)
Clarification:
grep -r "texthere" / (recursively grep all directories and subdirectories)
grep -r "texthere" . (recursively grep these directories and subdirectories)
grep recursive
grep [options] PATTERN [FILE...]
[options]
-R, -r, --recursive
Read all files under each directory, recursively.
This is equivalent to the -d recurse or --directories=recurse option.
http://linuxcommand.org/man_pages/grep1.html
grep help
$ grep --help
$ grep --help |grep recursive
-r, --recursive like --directories=recurse
-R, --dereference-recursive
Alternatives
ack (http://beyondgrep.com/)
ag (http://github.com/ggreer/the_silver_searcher)

Throwing my two cents here. As others already mentioned grep -r doesn't work on every platform. This may sound silly but I always use git.
git grep "texthere"
Even if the directory is not staged, I just stage it and use git grep.

Below are the command for search a String recursively on Unix and Linux environment.
for UNIX command is:
find . -name "string to be searched" -exec grep "text" "{}" \;
for Linux command is:
grep -r "string to be searched" .

In 2018, you want to use ripgrep or the-silver-searcher because they are way faster than the alternatives.
Here is a directory with 336 first-level subdirectories:
% find . -maxdepth 1 -type d | wc -l
336
% time rg -w aggs -g '*.py'
...
rg -w aggs -g '*.py' 1.24s user 2.23s system 283% cpu 1.222 total
% time ag -w aggs -G '.*py$'
...
ag -w aggs -G '.*py$' 2.71s user 1.55s system 116% cpu 3.651 total
% time find ./ -type f -name '*.py' | xargs grep -w aggs
...
find ./ -type f -name '*.py' 1.34s user 5.68s system 32% cpu 21.329 total
xargs grep -w aggs 6.65s user 0.49s system 32% cpu 22.164 total
On OSX, this installs ripgrep: brew install ripgrep. This installs silver-searcher: brew install the_silver_searcher.

In my IBM AIX Server (OS version: AIX 5.2), use:
find ./ -type f -print -exec grep -n -i "stringYouWannaFind" {} \;
this will print out path/file name and relative line number in the file like:
./inc/xxxx_x.h
2865: /** Description : stringYouWannaFind */
anyway,it works for me : )

For a list of available flags:
grep --help
Returns all matches for the regexp texthere in the current directory, with the corresponding line number:
grep -rn "texthere" .
Returns all matches for texthere, starting at the root directory, with the corresponding line number and ignoring case:
grep -rni "texthere" /
flags used here:
-r recursive
-n print line number with output
-i ignore case

Note that find . -type f | xargs grep whatever sorts of solutions will run into "Argument list to long" errors when there are too many files matched by find.
The best bet is grep -r but if that isn't available, use find . -type f -exec grep -H whatever {} \; instead.

I guess this is what you're trying to write
grep myText $(find .)
and this may be something else helpful if you want to find the files grep hit
grep myText $(find .) | cut -d : -f 1 | sort | uniq

For .gz files, recursively scan all files and directories
Change file type or put *
find . -name \*.gz -print0 | xargs -0 zgrep "STRING"

Just for fun, a quick and dirty search of *.txt files if the #christangrant answer is too much to type :-)
grep -r texthere .|grep .txt

Here's a recursive (tested lightly with bash and sh) function that traverses all subfolders of a given folder ($1) and using grep searches for given string ($3) in given files ($2):
$ cat script.sh
#!/bin/sh
cd "$1"
loop () {
for i in *
do
if [ -d "$i" ]
then
# echo entering "$i"
cd "$i"
loop "$1" "$2"
fi
done
if [ -f "$1" ]
then
grep -l "$2" "$PWD/$1"
fi
cd ..
}
loop "$2" "$3"
Running it and an example output:
$ sh script start_folder filename search_string
/home/james/start_folder/dir2/filename

Get the first matched files from grep command and get all the files don't contain some word, but input files for second grep comes from result files of first grep command.
grep -l -r --include "*.js" "FIRSTWORD" * | xargs grep "SECONDwORD"
grep -l -r --include "*.js" "FIRSTWORD" * | xargs grep -L "SECONDwORD"
dc0fd654-37df-4420-8ba5-6046a9dbe406
grep -l -r --include "*.js" "SEARCHWORD" * | awk -F'/' '{print $NF}' | xargs -I{} sh -c 'echo {}; grep -l -r --include "*.html" -w --include=*.js -e {} *; echo '''
5319778a-cec2-444d-bcc4-53d33821fedb

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

linux : listing files that contain several words - linux

I try to find a way to list all the files in the directory tree (recursively) that contain several words. While searching I found example such as egrep -R -l 'toto|tata' . but | induce OR. I would like AND... Thank you for your help

Related

How can I use grep to get all the lines that contains string1 and string2 separated by space?

Grep into given filenames

How to search and replace using grep

Use grep to search for a string in files, include subfolders

How do I recursively grep all directories and subdirectories?

Categories

Resources