I use a command to recursively find files containing a certain string1:
find . -type f -exec grep -H string1 {} \;
I need to find files containing multiple strings, so the command should return those containing all strings. Something like this:
find . -type f -exec grep -H string1 AND string2 {} \;
I couldn't find a way. The strings can be anywhere in the files. Even a solution for only two strings would be nice.
You can also try this:
find . -type f -exec grep -l 'string1' {} \; | xargs grep -l 'string2'
This shows the names of files that contain both string1 and string2.
You can chain your actions and use the exit status of the first one to only execute the second one if the first one was successful. (Omitting the operator between primaries defaults to -and/-a.)
find . -type f -exec grep -q 'string1' {} \; -exec grep -H 'string2' {} \;
The first grep command uses -q, "quiet", which returns a successful exit status if the string was found.
To collect all files containing string1 and then run the search for string2 with just a single invocation of grep, you could use -exec ... {} +:
find . -type f -exec grep -q 'string1' {} \; -exec grep 'string2' {} +
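A quick way to see the `\;` vs `+` difference in action (scratch directory and file names invented for the demo): with `\;` the command runs once per file, while with `+` find batches as many names as fit into a single invocation.

```shell
# Demonstrating -exec ... \; (one run per file) vs -exec ... + (batched).
d=$(mktemp -d)
touch "$d/a" "$d/b" "$d/c"
find "$d" -type f -exec echo {} \; | wc -l   # echo runs 3 times -> 3 lines
find "$d" -type f -exec echo {} +  | wc -l   # echo runs once    -> 1 line
```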
With GNU grep:
grep -rlZ 'string1' | xargs -0 grep -l 'string2'
From man grep:
-r, --recursive
Read all files under each directory, recursively, following symbolic
links only if they are on the command line. Note that if no file
operand is given, grep searches the working directory. This is
equivalent to the -d recurse option.
-Z, --null
Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file
name. For example, grep -lZ outputs a zero byte after each file name instead of the usual newline.
This option makes the output unambiguous, even in the presence of file names containing unusual
characters like newlines. This option can be used with commands like find -print0, perl -0, sort -z,
and xargs -0 to process arbitrary file names, even those that contain newline characters.
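A tiny end-to-end check of that pipeline in a scratch directory (file names invented for the demo):

```shell
# Only the file containing both strings survives the second grep.
cd "$(mktemp -d)"
printf 'string1\nstring2\n' > both.txt
printf 'string1\n'          > only1.txt
grep -rlZ 'string1' . | xargs -0 grep -l 'string2'   # prints ./both.txt
```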
Amazed that this old question lacks the obvious simple Awk solution:
find . -type f -exec awk '/string1/ && /string2/ { print; r=1 } END { exit 1-r }' {} \;
The trickery with the r variable is just to emulate the exit code from grep (zero means found, one means not; if you don't care, you can take that out).
For efficiency, maybe switch from -exec ... {} \; to -exec ... {} +, though then you might want to refactor the Awk script a bit: either drop the exit code, or change it so that it distinguishes "no files matched" from "only some files matched" from "all files matched".
The above code looks for files which contain both strings on the same line. The case of finding them on any lines is an easy change.
awk '/string1/ { s1=1 }
/string2/ { s2=1 }
s1 && s2 { print FILENAME; exit }
END { exit(1 - (s1 && s2)) }' file
This just prints the name of the file, and assumes that you have a single input file. For processing multiple files, refactor slightly, to reset the values of s1 and s2 when visiting a new file:
awk 'FNR == 1 { s1 = s2 = 0 }
/string1/ { s1 = 1 }
/string2/ { s2 = 1 }
s1 && s2 { r=1; print FILENAME; nextfile }
END { exit 1-r }' file1 file2 file3 ...
Some ancient Awk versions might not support nextfile, though it is now in POSIX.
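For those older Awks, a hedged sketch that emulates nextfile with a per-file "done" flag instead (demo files invented):

```shell
# Skip the rest of the current file via a flag rather than nextfile.
cd "$(mktemp -d)"
printf 'string1\nstring2\n' > f1
printf 'other\n'            > f2
awk 'FNR == 1  { s1 = s2 = done = 0 }
     done      { next }
     /string1/ { s1 = 1 }
     /string2/ { s2 = 1 }
     s1 && s2  { r = 1; print FILENAME; done = 1 }
     END { exit 1 - r }' f1 f2       # prints f1, exits 0
```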
As you can see from the other answers on this page, there are several command-line tools that can be used to perform conjunctive searching across files. A fast and flexible solution that has not yet been posted is to use ag (The Silver Searcher):
ag -l string1 | xargs ag -l string2
Useful variations
For case-insensitive searching, use the -i option of ag:
ag -il string1 | xargs ag -il string2
For additional search terms, extend the pipeline:
ag -l string1 | xargs ag -l string2 | xargs ag -l string3 | xargs ag -l string4
grep -rlZ string1 | xargs -0 grep -l string2
If your patterns are fixed strings, we can speed up the command by adding -F to grep:
grep -rlZF string1 | xargs -0 grep -lF string2
Related
I am searching for the longest filename from my root directory to the very bottom.
I have coded a C program that will calculate the longest file name's length and its name.
However, I cannot get the shell to redirect the long list of file names to standard input for my program to receive it.
Here is what I did:
ls -Rp | grep -v / | grep -v "Permission denied" | ./home/user/findlongest
findlongest has been compiled, and I checked it in one of my IDEs to make sure it's working correctly. No runtime errors were detected so far.
How do I get the list of file names into my 'findlongest' code by redirecting stdin?
Try this:
find / -type f -printf '%f\n' 2>/dev/null | /home/user/findlongest
The 2>/dev/null will discard all data written to stderr (which is where you're seeing the 'Permission denied' messages from).
Or the following to remove the dependency on your application:
find / -type f -printf '%f\n' 2>/dev/null | \
awk 'length > max_length {
max_length = length; longest_line = $0
}
END {
print length(longest_line) " " longest_line
}'
What about
find / -type f | /home/user/findlongest
It will list all files from root with absolute path and print only those files you have permissions to list.
Based on the command:
find -exec basename '{}' ';'
which recursively prints only the filenames (not the paths) of all files under the current directory.
This bash line will provide the file with the longest name and its number of characters:
Note that the loop involved will make the process slow.
for i in $(find -exec basename '{}' ';'); do printf $i" " && echo -e -n $i | wc -c; done | sort -nk 2 | tail -1
By parts:
Prints the name of the file followed by a single space:
printf $i" "
Prints the number of characters of such file:
echo -e -n $i | wc -c
Sorts the output by number of characters and takes the longest one (the very latest):
sort -nk 2 | tail -1
All this inside a for loop to handle line by line.
The for sentence can be also changed by:
for i in $(find -type f -printf '%f\n');
As stated in Attie's answer.
I'm trying to wc -l an entire directory and then display the filename in an echo with the number of lines.
To add to my frustration, the directory has to come from a passed argument. So without looking stupid, can someone first tell me why a simple wc -l $1 doesn't give me the line count for the directory I type in the argument? I know I'm not understanding it completely.
On top of that I need validation too, if the argument given is not a directory or there is more than one argument.
wc works on files rather than directories so, if you want the line count for all files in the directory, you would start with:
wc -l $1/*
With various gyrations to get rid of the total, sort it and extract only the largest, you could end up with something like (split across multiple lines for readability but should be entered on a single line):
pax> wc -l $1/* 2>/dev/null
| grep -v ' total$'
| sort -n -k1
| tail -1l
2892 target_dir/big_honkin_file.txt
As to the validation, you can check the number of parameters passed to your script with something like:
if [[ $# -ne 1 ]] ; then
echo 'Whoa! Wrong parameter count'
exit 1
fi
and you can check if it's a directory with:
if [[ ! -d $1 ]] ; then
echo 'Whoa!' "[$1]" 'is not a directory'
exit 1
fi
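Putting the validation and the pipeline together, a minimal sketch of the whole script as one function (the name largest_file is invented for the example; POSIX sh assumed):

```shell
# Validate the argument, then print the line count and name of the
# file with the most lines in the given directory.
largest_file() {
    if [ "$#" -ne 1 ]; then
        echo 'Whoa! Wrong parameter count' >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "Whoa! [$1] is not a directory" >&2
        return 1
    fi
    # Per-file line counts, minus the "total" line, largest one last.
    wc -l "$1"/* 2>/dev/null | grep -v ' total$' | sort -n -k1 | tail -1
}
```

For example, `largest_file /var/log` would print something like `2892 /var/log/syslog`, depending on the directory's contents.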
Is this what you want?
> find ./test1/ -type f|xargs wc -l
1 ./test1/firstSession_cnaiErrorFile.txt
77 ./test1/firstSession_cnaiReportFile.txt
14950 ./test1/exp.txt
1 ./test1/test1_cnaExitValue.txt
15029 total
so your directory which is the argument should go here:
find $your_complete_directory_path/ -type f|xargs wc -l
I'm trying to to wc -l an entire directory and then display the
filename in an echo with the number of lines.
You can do a find on the directory and use the -exec option to trigger wc -l. Something like this:
$ find ~/Temp/perl/temp/ -exec wc -l '{}' \;
wc: /Volumes/Data/jaypalsingh/Temp/perl/temp/: read: Is a directory
11 /Volumes/Data/jaypalsingh/Temp/perl/temp//accessor1.plx
25 /Volumes/Data/jaypalsingh/Temp/perl/temp//autoincrement.pm
12 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless1.plx
14 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless2.plx
22 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr1.plx
27 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr2.plx
7 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee1.pm
18 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee2.pm
26 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee3.pm
12 /Volumes/Data/jaypalsingh/Temp/perl/temp//ftp.plx
14 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit1.plx
16 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit2.plx
24 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit3.plx
33 /Volumes/Data/jaypalsingh/Temp/perl/temp//persisthash.pm
Nice question!
I saw the answers, and some are pretty good. The find ... | xargs one is my favourite, and it can be simplified using the find ... -exec wc -l {} + syntax. But there is a problem: each time the command-line buffer fills up, a separate wc -l ... is called, and every invocation prints a "<number> total" line. As wc has no option to disable this feature, wc has to be reimplemented; filtering those lines out with grep is not nice:
So my complete answer is
#!/usr/bin/bash
[ $# -ne 1 ] && echo "Bad number of args">&2 && exit 1
[ ! -d "$1" ] && echo "Not dir">&2 && exit 1
find "$1" -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +
Or, using less temporary space but slightly more awk code:
find "$1" -type f -exec awk 'function pr(){printf "%8d %s\n",n,f}FNR==1{f&&pr();n=0;f=FILENAME}{++n}END{pr()}' {} +
Misc
If it should not be called for subdirectories then add -maxdepth 1 before -type to find.
It is pretty fast. I was afraid that it would be much slower than the find ... wc + version, but for a directory containing 14770 files (in several subdirectories) the wc version ran in 3.8 s and the awk version in 5.2 s.
awk and wc treat a final line without a trailing \n differently: wc does not count it, while awk does, which I prefer.
It does not print empty files.
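A hedged tweak for that last point: pre-registering every argument in a BEGIN block makes empty files show up with a count of 0, since awk never reads a record from them otherwise (sketch, POSIX awk assumed; demo files invented):

```shell
# Register all file arguments up front so empty files are reported too.
d=$(mktemp -d)
printf 'a\nb\n' > "$d/two.txt"
: > "$d/empty.txt"
find "$d" -type f -exec awk '
    BEGIN { for (j = 1; j < ARGC; j++) n[ARGV[j]] += 0 }  # seed counts at 0
    { ++n[FILENAME] }
    END { for (i in n) printf "%8d %s\n", n[i], i }' {} +
```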
To find the file with most lines in the current directory and its subdirectories, with zsh:
lines() REPLY=$(wc -l < "$REPLY")
wc -l -- **/*(D.nO+lines[1])
That defines a lines function which is going to be used as a glob sorting function that returns in $REPLY the number of lines of the file whose path is given in $REPLY.
Then we use zsh's recursive globbing **/* to find regular files (.), numerically (n) reverse sorted (O) with the lines function (+lines), and select the first one [1]. (D to include dotfiles and traverse dotdirs).
Doing it with standard utilities is a bit tricky if you don't want to make assumptions on what characters file names may contain (like newline, space...). With GNU tools as found on most Linux distributions, it's a bit easier as they can deal with NUL terminated lines:
find . -type f -exec sh -c '
for file do
    lines=$(wc -l < "$file") &&
      printf "%s\0" "$lines:$file"
done' sh {} + |
tr '\n\0' '\0\n' |
sort -rn |
head -n1 |
tr '\0' '\n'
Or with zsh or GNU bash syntax:
biggest= max=-1
find . -type f -print0 |
{
while IFS= read -rd '' file; do
size=$(wc -l < "$file") &&
((size > max)) &&
max=$size biggest=$file
done
[[ -n $biggest ]] && printf '%s\n' "$max: $biggest"
}
Here's one that works for me with Git Bash (mingw32) under Windows:
find . -type f -print0| xargs -0 wc -l
This will list the files and line counts in the current directory and subdirectories. You can also redirect the output to a text file and import it into Excel if needed:
find . -type f -print0| xargs -0 wc -l > fileListingWithLineCount.txt
How can I count all lines of all files in all subdirectories with wc?
cd mydir
wc -l *
..
11723 total
man wc suggests wc -l --files0-from=-, but I do not know how to generate the list of all files as NUL-terminated names
find . -print | wc -l --files0-from=-
did not work.
You probably want this:
find . -type f -print0 | wc -l --files0-from=-
If you only want the total number of lines, you could use
find . -type f -exec cat {} + | wc -l
Perhaps you are looking for the -exec option of find.
find . -type f -exec wc -l {} \; | awk '{total += $1} END {print total}'
To count all lines for a specific file extension you can use:
find . -name '*.fileextension' | xargs wc -l
If you want it on two or more different types of files, you can add the -o option:
find . -name '*.fileextension1' -o -name '*.fileextension2' | xargs wc -l
Another option would be to use a recursive grep:
grep -hRc '' . | awk '{k+=$1}END{print k}'
The awk simply adds the numbers. The grep options used are:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
-h, --no-filename
Suppress the prefixing of file names on output. This is the
default when there is only one file (or only standard input) to
search.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
The grep, therefore, counts the number of lines matching anything (''), so essentially just counts the lines.
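A quick sanity check of that idea: the empty pattern '' matches every line, so grep -c '' behaves as a line counter.

```shell
# The empty pattern matches every line, so -c yields the line count.
printf 'one\ntwo\nthree\n' | grep -c ''   # prints 3
```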
I would suggest something like
find ./ -type f | xargs wc -l | cut -c 1-8 | awk '{total += $1} END {print total}'
Based on ДМИТРИЙ МАЛИКОВ's answer:
Example for counting lines of java code with formatting:
One-liner:
find . -name '*.java' -exec wc -l {} \; | awk '{printf ("%3d: %6d %s\n",NR,$1,$2); total += $1} END {printf ("     %6d\n",total)}'
The awk part:
{
printf ("%3d: %6d %s\n",NR,$1,$2);
total += $1
}
END {
printf (" %6d\n",total)
}
Example result:
1: 120 ./opencv/NativeLibrary.java
2: 65 ./opencv/OsCheck.java
3: 5 ./opencv/package-info.java
190
Bit late to the game here, but wouldn't this also work? find . -type f | wc -l
This counts all lines output by the 'find' command. You can fine-tune the 'find' to show whatever you want. I am using it to count the number of subdirectories, in one specific subdir, in deep tree: find ./*/*/*/*/*/*/TOC -type d | wc -l . Output: 76435. (Just doing a find without all the intervening asterisks yielded an error.)
Hello, bash beginner question. I want to look through multiple files, find the lines that contain a search term, count the number of unique lines in this list, and then print into a text file:
the input file name
the search term used
the count of unique lines
so an example output line for file 'Firstpredictoroutput.txt' using search term 'Stop_gained' where there are 10 unique lines in the file would be:
Firstpredictoroutput.txt Stop_gained 10
I can get the unique count for a single file using:
grep 'Search_term' inputfile.txt | uniq -c | wc -l >> output.txt
But I don't know enough yet about implementing loops in pipelines using bash.
All my inputfiles end with *predictoroutput.txt
Any help is greatly appreciated.
Thanks in advance,
Rubal
You can write a function called fun and call it with two arguments: filename and pattern.
$ fun() { echo "$1 $2 $(grep -c "$2" "$1")"; }
$ fun input.txt Stop_gained
input.txt Stop_gained 2
You can use find :
find . -type f -exec sh -c "grep 'Search_term' {} | uniq -c | wc -l >> output.txt" \;
Although you can have issues with weird filenames. You can add more options to find, for example to treat only '.txt' files:
find . -type f -name "*.txt" -exec sh -c "grep 'Search_term' {} | uniq -c | wc -l >> output.txt" \;
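A hedged, safer variant of the same idea: hand each filename to sh as a positional argument ("$1") instead of splicing {} into the command string, so names containing spaces or quotes cannot break the inner shell.

```shell
# Pass the filename as an argument to sh -c rather than embedding {}.
# (! -name output.txt skips the output file itself.)
find . -type f -name '*.txt' ! -name output.txt -exec sh -c '
    grep "Search_term" "$1" | uniq -c | wc -l >> output.txt
' sh {} \;
```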
q="search for this"
for f in *.txt; do echo "$f $q $(grep "$q" "$f" | uniq | wc -l)"; done > out.txt
I’m trying to count a particular word occurrence in a whole directory. Is this possible?
Say for example there is a directory with 100 files all of whose files may have the word “aaa” in them. How would I count the number of “aaa” in all the files under that directory?
I tried something like:
zegrep "xception" `find . -name '*auth*application*' | wc -l
But it’s not working.
grep -roh aaa . | wc -w
Recursively grep all files and directories in the current dir for aaa, outputting only the matches rather than entire lines. Then just use wc to count how many words there are.
Another solution based on find and grep.
find . -type f -exec grep -o aaa {} \; | wc -l
Should correctly handle filenames with spaces in them.
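One detail worth noting about -o: each match is printed on its own line, so multiple occurrences within a single line are all counted, not just one per line.

```shell
# Two matches on the first line plus one on the second: 3 in total.
printf 'aaa aaa\naaa\n' | grep -o 'aaa' | wc -l   # counts 3 matches
```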
Use grep in its simplest way. Try grep --help for more info.
To get count of a word in a particular file:
grep -c <word> <file_name>
Example:
grep -c 'aaa' abc_report.csv
Output:
445
To get count of a word in the whole directory:
grep -c -R <word>
Example:
grep -c -R 'aaa'
Output:
abc_report.csv:445
lmn_report.csv:129
pqr_report.csv:445
my_folder/xyz_report.csv:408
Let's use AWK!
$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency
This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:
$ cat your_file.txt | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (non-recursively), you can do this:
$ cat * | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (and its sub-directories), you can do this:
$ find . -type f | xargs cat | wordfrequency | grep yourword
Source: AWK-ward Ruby
find . -type f | xargs perl -pe 's/ /\n/g' | grep aaa | wc -l
cat the files together and grep the output: cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'
If you want 'exceptional' to match as well, don't use the '\<' and '\>' around the word.
How about starting with:
cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l
as in the following transcript:
pax$ cat file1
this is a file number 1
pax$ cat file2
And this file is file number 2,
a slightly larger file
pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4
The sed converts spaces to newlines (you may want to include other space characters as well such as tabs, with sed 's/[ \t]/\n/g'). The grep just gets those lines that have the desired word, then the wc counts those lines for you.
Now there may be edge cases where this script doesn't work but it should be okay for the vast majority of situations.
If you wanted a whole tree (not just a single directory level), you can use something like:
( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l
There's also a grep regex syntax for matching words only:
# based on Carlos Campderrós solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l
For a different word matching regex syntax see:
man re_format | less -p '\[\[:<:\]\]'
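As a hedged aside, grep's -w flag also restricts matches to whole words and may be easier to remember than the \< \> anchors:

```shell
# -w rejects the 'aaa' inside 'aaab', so only whole words are counted.
printf 'aaa aaab\naaa\n' | grep -ow 'aaa' | wc -l   # 2 whole-word matches
```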