How can I search for keywords using logical AND conditions in files on Ubuntu? - linux

I've been trying to search for multiple keywords in my Ubuntu files. I know how to do it for one keyword:
find /[myRep] -type f | xargs grep -rl "myFunction"
I wanted to do it for two keywords, such as myFunction and myClass, to get all the files that can instantiate myFunction in myClass.
I tried to use:
find /[myRep] -type f | xargs grep -rl "myFunction" | xargs grep -rl "myClass"
I get results, but I'm not sure if they are accurate. Plus, I wonder if there is a simple way to add more logical conditions to the search, such as "OR" or "NOT".

Use Regex Alternation for Logical OR Conditions
If you're trying to find files that contain either "myFunction" or "myClass", you could use an extended regular expression with alternation. For example:
# Using GNU Find and GNU Grep
find . -type f -exec grep --extended-regexp --files-with-matches 'myFunction|myClass' {} +
When passed a list of files to grep, this will show you matching files that contain either word.
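If your grep supports recursive search on its own, a shorter equivalent is possible; this is just a sketch assuming GNU grep and the /myRep directory from the question:
grep -rlE 'myFunction|myClass' /myRep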
Logical AND is Trickier
A logical AND is trickier because you have to account for ordering. You can either:
Filter files on one set of requirements, then the other.
Use a more full-featured program where you can store state.
As a trivial example of the first case:
# Use nulls to separate filenames for safety.
find /etc/passwd -print0 |
xargs -0 egrep -Zl root |
xargs -0 egrep -Zl www
As a contrived example of the second case, you could use GNU awk:
# Print name of current file if it matches both alternates
# on different lines.
find /etc/passwd -print0 |
xargs -0 awk 'FNR == 1 {matches=0};
/root|www/ {matches+=1};
matches >= 2 {print FILENAME; nextfile}'
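A variant of the same idea that only reports files containing both words (rather than any two matching lines) could look like this; a sketch assuming GNU awk for nextfile:
find . -type f -print0 |
xargs -0 awk 'FNR == 1 {a=b=0};
/root/ {a=1}; /www/ {b=1};
a && b {print FILENAME; nextfile}'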

Your command looks fine to me. You first grep all files to find those which contain "myFunction" and then pass them through another grep for "myClass". As a result, you will end up with files containing both "myFunction" and "myClass".
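For the "NOT" case you asked about, grep -L (list files without a match) can be chained the same way; a sketch using the placeholder names from your question and GNU grep for -Z:
find /myRep -type f -print0 | xargs -0 grep -lZ "myFunction" | xargs -0 grep -L "myClass"
This lists files that contain myFunction but do not contain myClass.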

Related

complementary command to `grep token_1 | grep token_2` [duplicate]

How do I match all lines not matching a particular pattern using grep? I tried this:
grep '[^foo]'
grep -v is your friend:
grep --help | grep invert
-v, --invert-match select non-matching lines
Also check out the related -L (the complement of -l).
-L, --files-without-match only print FILE names containing no match
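For example, to list every file under the current directory that contains no occurrence of foo (a sketch; -r assumes GNU grep):
grep -rL 'foo' .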
You can also use awk for these purposes, since it allows you to perform more complex checks in a clearer way:
Lines not containing foo:
awk '!/foo/'
Lines containing neither foo nor bar:
awk '!/foo/ && !/bar/'
Lines containing neither foo nor bar which contain either foo2 or bar2:
awk '!/foo/ && !/bar/ && (/foo2/ || /bar2/)'
And so on.
In your case, you presumably don't want to use grep, but instead add a negative clause to the find command, e.g.
find /home/baumerf/public_html/ -mmin -60 -not -name error_log
If you want to include wildcards in the name, you'll have to escape them, e.g. to exclude files with suffix .log:
find /home/baumerf/public_html/ -mmin -60 -not -name \*.log

Run recursive grep using two patterns

How can I use this grep pattern to recursively search a directory? I need both of these strings to be on the same line in the file. I keep getting the message back that this is a directory. How can I make it recursively search all files with the extension .cfc?
"<cffunction" and "inject="
grep -insR "<cffunction" | grep "inject=" /c/mydirectory/
Use find and exec:
find your_dir -name "*.cfc" -type f -exec grep -insE 'inject=.*<cffunction|<cffunction.*inject=' /dev/null {} +
find finds your *.cfc files recursively and feeds into grep, picking only regular files (-type f)
inject=.*<cffunction|<cffunction.*inject= catches lines that have your patterns in either order
{} + ensures each invocation of grep gets as many files as will fit within the system's argument-length limit (ARG_MAX)
/dev/null argument to grep ensures that the output is prefixed with the name of file even when there is a single *.cfc file
You've got it backwards: the directory should go to the first (recursive) grep, whose output is then piped to the second grep, like:
grep -nisr "inject=" /c/mydirectory | grep "<cffunction"
edit: to exclude some directories and search only in *.cfc files, use:
grep -nisr --exclude-dir={/abs/dir/path,rel/dir/path} --include \*.cfc "inject=" /c/mydirectory | grep "<cffunction"

Grep into given filenames

I have a directory that contains many subdirectories that contain many files.
I list the contents of the current directory using ls *. I see that there are certain files that are relevant, in terms of their names. Therefore, the relevant files can be obtained like this: ls * | grep "abc\|def\|ghi".
Now I want to search within the given filenames. So I try something like:
ls * | grep "abc\|def\|ghi" | zgrep -i "ERROR" *, however, this is not looking into the file contents, rather the names. Is there an easy way to do this with pipes?
To use grep to search the contents of files within a directory, try using the find command, using xargs to couple it with the grep command, like so:
find . -type f | xargs grep '...'
You can do it like this:
find -E . -type f -regex ".*/.*(abc|def).*" -exec grep -H ERROR {} \+
The -E allows use of extended regexes so you can use the pipe (|) for expressing alternations. The + at the end allows searching in as many files as possible for each invocation of -exec grep rather than needing a whole new process for every single file.
You should use xargs to grep each file contents:
ls * | grep "abc\|def\|ghi" | xargs zgrep -i "ERROR"
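If any of the matching filenames contain spaces, a null-delimited pipeline is safer; a sketch assuming GNU find and xargs:
find . -type f \( -name '*abc*' -o -name '*def*' -o -name '*ghi*' \) -print0 | xargs -0 zgrep -i "ERROR"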
I know you asked for a solution with pipes, but they are not necessary for this task. grep has many parameters, and can solve this problem alone:
grep . -rh --include "*abc*" --include "*def*" -e "ERROR"
Parameters:
--include : Search only files whose base name matches the given wildcard pattern (not regex!)
-h : Suppress the prefixing of file names on output.
-r : recursive
-e : regex filter pattern
grep -i "ERROR" `ls * | grep "abc\|def\|ghi"`

How to search and replace using grep

I need to recursively search for a specified string within all files and subdirectories within a directory and replace this string with another string.
I know that the command to find it might look like this:
grep 'string_to_find' -r ./*
But how can I replace every instance of string_to_find with another string?
Another option is to use find and then pass it through sed.
find /path/to/files -type f -exec sed -i 's/oldstring/new string/g' {} \;
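Note that BSD/macOS sed requires an explicit (possibly empty) backup suffix after -i, so the same command there would be, as a sketch:
find /path/to/files -type f -exec sed -i '' 's/oldstring/new string/g' {} \;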
I got the answer.
grep -rl matchstring somedir/ | xargs sed -i 's/string1/string2/g'
You could even do it like this:
Example
grep -rl 'windows' ./ | xargs sed -i 's/windows/linux/g'
This will search for the string 'windows' in all files relative to the current directory and replace 'windows' with 'linux' for each occurrence of the string in each file.
This works best for me on OS X:
grep -r -l 'searchtext' . | sort | uniq | xargs perl -e "s/matchtext/replacetext/" -pi
Source: http://www.praj.com.au/post/23691181208/grep-replace-text-string-in-files
Usually not with grep, but rather with sed -i 's/string_to_find/another_string/g' or perl -i.bak -pe 's/string_to_find/another_string/g'.
Other solutions mix regex syntaxes. To use perl/PCRE patterns for both search and replace, and process only matching files, this works quite well:
grep -rlIZPi 'match1' | xargs -0r perl -pi -e 's/match2/replace/gi;'
match1 and match2 are usually identical but match2 can contain more advanced features that are only relevant to the substitution, e.g. capturing groups.
Translation: grep recursively and list matching filenames, each separated by null to protect any special characters; pipe any filenames to xargs which is expecting a null-separated list; if any filenames are received, pass them to perl to perform the actual substitutions.
For case-sensitive matching, drop the i flag from grep and the i pattern modifier from the s/// expression, but not the i flag from perl itself. To include binary files, remove the I flag from grep.
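As a contrived illustration of match2 doing more than match1 (hypothetical names, not from the question), this renames calls of oldFunc while a capturing group preserves whatever whitespace precedes the opening parenthesis:
grep -rlIZP 'oldFunc' | xargs -0r perl -pi -e 's/oldFunc(\s*\()/newFunc$1/g;'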
Be very careful when using find and sed in a git repo! If you don't exclude the binary files you can end up with this error:
error: bad index file sha1 signature
fatal: index file corrupt
To solve this error you need to revert the sed by replacing your new_string with your old_string. This will revert your replaced strings, so you will be back to the beginning of the problem.
The correct way to search for a string and replace it is to skip find and use grep instead in order to ignore the binary files:
sed -ri -e "s/old_string/new_string/g" $(grep -Elr --binary-files=without-match "old_string" "/files_dir")
Credit to @hobs
Here is what I would do:
find /path/to/dir -type f -iname "*filename*" -print0 | xargs -0 sed -i '/searchstring/s/old/new/g'
This will look for all files containing filename in the file's name under /path/to/dir; then, for every file found, it searches for lines containing searchstring and replaces old with new on those lines.
Though if you want to omit restricting the search to files with filename in their name, then simply do:
find /path/to/dir -type f -print0 | xargs -0 sed -i '/searchstring/s/old/new/g'
This will do the same thing above, but to all files found under /path/to/dir.
Modern rust tools can be used to do this job.
For example, to replace "oldstring" and "oldString" with "newstring" and "newString" respectively in all (non-ignored) files, you can:
Use fd and sd
fd -tf -x sd 'old([Ss]tring)' 'new$1' {}
Use ned
ned -R -p 'old([Ss]tring)' -r 'new$1' .
Use ruplacer
ruplacer --go 'old([Ss]tring)' 'new$1' .
Ignored files
To include files that are ignored (by .gitignore) or hidden, you have to say so explicitly:
use -IH for fd,
use --ignored --hidden for ruplacer.
Another option would be to just use perl with globstar.
Enabling shopt -s globstar in your .bashrc (or wherever) allows the ** glob pattern to match all sub-directories and files recursively.
Thus using perl -pXe 's/SEARCH/REPLACE/g' -i ** will recursively replace SEARCH with REPLACE.
The -X flag tells perl to "disable all warnings", which means that it won't complain about directories.
The globstar also allows you to do things like sed -i 's/SEARCH/REPLACE/g' **/*.ext if you wanted to replace SEARCH with REPLACE in all child files with the extension .ext.

Shell: find files in a list under a directory

I have a list containing about 1000 file names to search under a directory and its subdirectories. There are hundreds of subdirs with more than 1,000,000 files. The following command will run find for 1000 times:
cat filelist.txt | while read f; do find /dir -name $f; done
Is there a much faster way to do it?
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option means that grep searches for all the patterns in the given file.)
Explanation of <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called a process substitution, and is a little similar to $( ... ). It is equivalent to the following (but using the process substitution is neater and possibly a little faster):
sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
The call to sed runs the commands s#^#/#, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints them out. These commands convert the filenames into a format that will work better with grep.
s#^#/# means put a / before each filename. (The ^ means "start of line" in a regex)
s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line", the second is just a literal $ which is then interpreted by grep to mean "end of line").
The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of . [ ] or *. Grep uses regexes and those characters are considered special, but we want them to be plain so we need to escape them (if we didn't escape them, then a file name like a.txt would match files like abtxt).
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
Grep then uses each line of that output as a pattern when it is searching the output of find.
If filelist.txt is a plain list:
$ find /dir | grep -F -f filelist.txt
If filelist.txt is a pattern list:
$ find /dir | grep -f filelist.txt
Using xargs(1) instead of the while loop can be a bit faster than doing it in bash.
Like this
xargs -a filelist.txt -I filename find /dir -name filename
Be careful if the file names in filelist.txt contain whitespace; read the second paragraph in the DESCRIPTION section of the xargs(1) manpage about this problem.
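One way to make that safer for names containing spaces (but not newlines) is to make xargs split on newlines only; a sketch assuming GNU xargs:
xargs -a filelist.txt -d '\n' -I filename find /dir -name filename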
An improvement based on some assumptions. For example, a.txt is in filelist.txt, and you can make sure there is only one a.txt in /dir. Then you can tell find(1) to exit early when it finds the instance.
xargs -a filelist.txt -I filename find /dir -name filename -print -quit
Another solution: you can pre-process filelist.txt and turn it into a find(1) argument list like this, which will reduce the number of find(1) invocations:
find /dir -name 'a.txt' -or -name 'b.txt' -or -name 'c.txt'
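A sketch of building that argument list from filelist.txt itself, assuming GNU find (for -false) and names without whitespace or quote characters:
sed 's/^/-o -name /' filelist.txt | xargs find /dir -false
xargs appends the generated -o -name tests after -false, so the whole list is handled by a single find invocation (or a few, if the list is very long).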
I'm not entirely sure of the question here, but I came to this page after trying to find a way to discover which 4 of 13000 files had failed to copy.
None of the answers did it for me, so I did this:
cp file-list file-list2
find dir/ >> file-list2
sort file-list2 | uniq -u
Which resulted with a list of the 4 files I needed.
The idea is to combine the two file lists to determine the unique entries.
sort is used to make duplicate entries adjacent to each other which is the only way uniq will filter them out.
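An equivalent using comm, which compares two sorted lists directly (a sketch; -23 keeps only the lines unique to the first list, i.e. the names that were never found):
comm -23 <(sort file-list) <(find dir/ | sort)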
