complementary command to `grep token_1 | grep token_2` [duplicate] - linux

How do I match all lines not matching a particular pattern using grep? I tried this:
grep '[^foo]'

grep -v is your friend:
grep --help | grep invert
-v, --invert-match select non-matching lines
Also check out the related -L (the complement of -l).
-L, --files-without-match only print FILE names containing no match
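For example, to print every line of a file that does not contain foo (the file names here are just illustrative):
grep -v 'foo' somefile.txt
and to list which of several files contain no match at all:
grep -L 'foo' *.txt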

You can also use awk for these purposes, since it allows you to perform more complex checks in a clearer way:
Lines not containing foo:
awk '!/foo/'
Lines containing neither foo nor bar:
awk '!/foo/ && !/bar/'
Lines containing neither foo nor bar which contain either foo2 or bar2:
awk '!/foo/ && !/bar/ && (/foo2/ || /bar2/)'
And so on.
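Each of these awk one-liners reads from a file given as an argument or from standard input; for instance (the file name is just illustrative):
awk '!/foo/ && !/bar/' logfile.txt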

In your case, you presumably don't want to use grep, but add instead a negative clause to the find command, e.g.
find /home/baumerf/public_html/ -mmin -60 -not -name error_log
If you want to include wildcards in the name, you'll have to escape them, e.g. to exclude files with suffix .log:
find /home/baumerf/public_html/ -mmin -60 -not -name \*.log

Related

using grep in single-line files to find the number of occurrences of a word/pattern

I have json files in the current directory, and subdirectories. All the files have a single line of content.
I want a list of all files that contain the word XYZ, and the number of times it occurs in each file.
I want to print the list according to the following format:
file_name pattern_occurrence_times
It should look something like:
.\x1\x2\file1.json 3
.\x1\file3.json 2
The problem is that grep counts the NUMBER of lines containing XYZ, not the number of occurrences.
Since the whole content of the files is always contained in a single line, the count is always 1 (if the pattern occurs in the file).
I used this command for that:
find . -type f -name "*.json" -exec grep --files-with-match -i 'xyz' {} \; -exec grep -wci 'xyz' {} \;
I wrote a Python script that works, but I would like to know if there is any way of doing this using find and grep or any other command-line tools.
Thanks
The classical approach to this problem is the pipeline grep -o regex file | wc -l. However, to execute a pipeline in find's -exec you have to run a shell (e.g. sh -c ... ). But all these things together will only print the number of matches, not the file names. Also, files with no matches have to be filtered out.
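Spelled out, that shell-based workaround might look something like this (a rough, untested sketch; the printf adds the file name and the test drops files with no matches):
find . -type f -name '*.json' -exec sh -c '
    n=$(grep -io xyz "$1" | wc -l)
    [ "$n" -gt 0 ] && printf "%s %s\n" "$1" "$n"
' sh {} \;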
Because of all of this I think a single awk command would be preferable:
find ... -type f -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
END {if(c>0) print FILENAME " " c}' {} \;
Here the tolower($0) emulates grep's -i option. Make sure to write your search pattern xyz only in lowercase.
If you want to combine this with subsequent filters in find you can add else exit 1 at the end of the last awk block to continue (inside find) only with the printed files.
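A sketch of what that might look like, with the trailing -print standing in for whatever further find tests or actions you want to apply only to the matching files:
find . -type f -name '*.json' \
    -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
               END {if (c>0) print FILENAME " " c; else exit 1}' {} \; \
    -print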
Use the -o option of grep in conjunction with wc, e.g.
find . -name "*.json" | while read -r f ; do
    echo "$f" : $(grep -ow XYZ "$f" | wc -l)
done

How can I use grep to get all the lines that contains string1 and string2 separated by space?

Line1: .................
Line2: #hello1 #hello2 #hello3
Line3: .................
Line4: .................
Line5: #hello1 #hello4 #hello3
Line6: #hello1 #hello2 #hello3
Line7: .................
I have files that look similar to this in one of my project directories. I want to get the counts of all the lines that contain #hello1 and #hello2. In this case I would get 2 as the result for this file alone. However, I want to do this recursively.
The canonical way to "do something recursively" is to use the find command. If you want to find lines that have two words on them, a simple regex will do:
grep -lr '#hello1.*#hello2' .
The option -l instructs grep to show us only filenames rather than file content, and the option -r tells grep to traverse the filesystem recursively. The start of the search is the path at the end of the line. Once you have the list of files, you can parse that list using commands run by xargs.
For example, this will count all the lines in files matching the pattern you specified.
grep -lr '#hello1.*#hello2' . | xargs -n 1 wc -l
This uses xargs to run the wc command on each of the files listed by grep. You could probably also run this without the -n 1, unless you're dealing with many many thousands of files that would exceed your maximum command line length.
Or, if I'm interpreting your question correctly, the following will count just the patterns in those files.
grep -lr '#hello1.*#hello2' . | xargs -n 1 grep -Hc '#hello1.*#hello2'
This runs a similar grep to the one used to generate your recursive list of files, and presents the output with filename (-H) and count (-c).
But if you want complex rules like finding two patterns possibly on different lines in the file, then grep probably is not the optimal tool, unless you use multiple greps launched by find:
find /path/to/base -type f \
-exec grep -q '#hello1' {} \; \
-exec grep -q '#hello2' {} \; \
-print
(Lines split for easier reading.)
This is somewhat costly, as find needs to launch up to two children for each file. So another approach would be to use awk instead:
find /path/to/base -type f \
-exec awk '/#hello1/{a=1} /#hello2/{b=1} a&&b{r=1; exit} END{exit 1-r}' {} \; \
-print
Alternately, if your shell is bash version 4 or above, you can avoid using find and use the bash option globstar:
$ shopt -s globstar
$ awk 'FNR==1{a=b=0} /#hello1/{a=1} /#hello2/{b=1} a&&b{print FILENAME; nextfile}' **/*
Note: none of this is tested.
If you are only interested in the total number of matching lines, not in the individual files,
then just something along the lines of:
find "$BASEDIRECTORY" -type f -print0 | xargs -0 grep -h PATTERN | wc -l
If you want to count the lines containing #hello1 and #hello2 separated by a space in a specific file, you can use:
$ grep -c '#hello1 #hello2' file
If you want to count in more than one file:
$ grep -c '#hello1 #hello2' file1 file2 ...
And if you want to get the grand total:
$ grep -c '#hello1 #hello2' file1 file2 ... | paste -s -d+ - | bc
Of course you can let your shell expand the file names. So, for example:
$ grep -c '#hello1 #hello2' *.txt | paste -s -d+ - | bc
or so...
find . -type f | xargs -n 1 awk '/#hello1/ && /#hello2/{c++} END{print FILENAME, c+0}'

How do I count the number of instances a string/sub-string appears in the filenames of a certain directory in Unix?

Say my current working directory is called my_dir and I have a few different files in them:
file1
file_dog
filedog_1
dog_file
How do I count the number of instances "dog" appears (should be 3) in my directory?
Thank you!
You can use the -c option of grep.
grep -c:
Suppress normal output; instead print a count of matching lines for each input file.
ls -1 | grep -c dog
or:
ls -1 *[dD][oO][gG]* | wc -l
The following solution works even if filenames contain newline characters:
$ touch file1 file_dog filedog_1 dog_file
$ find . -maxdepth 1 -name '*dog*' -print0 | grep -zc .
3
How it works:
find . -maxdepth 1 -name '*dog*' -print0
This finds all files in the current directory with dog in their name and prints them out in a nul-separated list. Since the nul character is one of the few not allowed in a file name, this is safe. If you want a case-insensitive match (so that Dog is matched as well), replace -name with -iname.
grep -zc .
This reads the nul-separated list and counts the matching entries, one per file name.
-z tells grep that the input is nul-separated
-c tells grep to suppress normal output and count the number of matches
. tells grep to match anything.
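So a case-insensitive variant of the whole pipeline would be, for instance:
find . -maxdepth 1 -iname '*dog*' -print0 | grep -zc .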

How can I search for keywords using logical AND conditions in files on Ubuntu?

I've been trying to search for multiple keywords in files on my Ubuntu system. I know how to do it for one keyword:
find /[myRep] -type f | xargs grep -rl "myFunction"
I wanted to do it for two keywords, such as myFunction and myClass, to get all the files that can instantiate myFunction in myClass.
I tried to use:
find /[myRep] -type f | xargs grep -rl "myFunction" | xargs grep -rl "myClass"
I get results, but I'm not sure whether they are accurate. Plus, I wonder if there is a simple way to add more logical conditions to the search, such as "OR" or "NOT"...
Use Regex Alternation for Logical OR Conditions
If you're trying to find files that contain either "myFunction" or "myClass", you could use an extended regular expression with alternation. For example:
# Using GNU Find and GNU Grep
find . -exec grep --extended-regexp --files-with-matches 'myFunction|myClass' {} +
When passed a list of files to grep, this will show you matching files that contain either word.
Logical AND is Trickier
A logical AND is trickier because you have to account for ordering. You can either:
Filter files on one set of requirements, then the other.
Use a more full-featured program where you can store state.
As a trivial example of the first case:
# Use nulls to separate filenames for safety.
find /etc/passwd -print0 |
xargs -0 egrep -Zl root |
xargs -0 egrep -Zl www
As a contrived example of the second case, you could use GNU awk:
# Print name of current file if it matches both alternates
# on different lines.
find /etc/passwd -print0 |
xargs -0 awk 'FNR == 1 {root=0; www=0};
/root/ {root=1};
/www/ {www=1};
root && www {print FILENAME; nextfile}'
Your command looks fine to me. You first grep all files to find those which contain "myFunction" and then pass them through another grep for "myClass". As a result, you will end up with files containing both "myFunction" and "myClass".
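To cover the "NOT" part of the question as well: you can append one more grep with -L, which prints the names of files that do not contain the given pattern. For example (notWanted is just a placeholder keyword):
find /[myRep] -type f | xargs grep -l "myFunction" | xargs grep -l "myClass" | xargs grep -L "notWanted"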

Shell: find files in a list under a directory

I have a list containing about 1000 file names to search under a directory and its subdirectories. There are hundreds of subdirs with more than 1,000,000 files. The following command will run find for 1000 times:
cat filelist.txt | while read f; do find /dir -name $f; done
Is there a much faster way to do it?
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option means that grep searches for all the patterns in the given file.)
Explanation of <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called a process substitution, and is somewhat similar to $( ... ). It is equivalent to the following (but using the process substitution is neater and possibly a little faster):
sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
The call to sed runs the commands s#^#/#, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints them out. These commands convert the filenames into a format that will work better with grep.
s#^#/# means put a / at the start of each filename. (The ^ means "start of line" in a regex.)
s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line", the second is just a literal $ which is then interpreted by grep to mean "end of line").
The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of . [ ] or *. Grep uses regexes and those characters are considered special, but we want them to be plain so we need to escape them (if we didn't escape them, then a file name like a.txt would match files like abtxt).
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
Grep then uses each line of that output as a pattern when it is searching the output of find.
If filelist.txt is a plain list:
$ find /dir | grep -F -f filelist.txt
If filelist.txt is a pattern list:
$ find /dir | grep -f filelist.txt
Using xargs(1) instead of the while loop can be a bit faster than doing it in bash.
Like this
xargs -a filelist.txt -I filename find /dir -name filename
Be careful if the file names in filelist.txt contain whitespace; read the second paragraph in the DESCRIPTION section of the xargs(1) manpage about this problem.
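With GNU xargs you can sidestep the quote and backslash handling described there by making the newline the only delimiter, e.g.:
xargs -a filelist.txt -d '\n' -I filename find /dir -name filename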
An improvement based on some assumptions: for example, if a.txt is in filelist.txt and you can be sure there is only one a.txt in /dir, then you can tell find(1) to exit early once it finds that instance.
xargs -a filelist.txt -I filename find /dir -name filename -print -quit
Another solution: you can pre-process filelist.txt into a single find(1) argument list like this, which reduces everything to one find(1) invocation:
find /dir -name 'a.txt' -or -name 'b.txt' -or -name 'c.txt'
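One rough way to build such an argument list from filelist.txt (a sketch assuming one plain file name per line and a non-empty list):
args=()
while IFS= read -r name; do
    args+=( -o -name "$name" )
done < filelist.txt
# drop the leading -o and run a single find
find /dir \( "${args[@]:1}" \)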
I'm not entirely sure of the question here, but I came to this page after trying to find a way to discover which 4 of 13000 files had failed to copy.
Neither of the answers did it for me, so I did this:
cp file-list file-list2
find dir/ >> file-list2
sort file-list2 | uniq -u
Which resulted with a list of the 4 files I needed.
The idea is to combine the two file lists to determine the unique entries.
sort is used to make duplicate entries adjacent to each other, which is the only way uniq will filter them out.
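A similar idea, assuming the names in file-list are written in the same form that find prints them, is comm, which can show only the entries missing from the directory listing:
comm -23 <(sort file-list) <(find dir/ | sort)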
