How to search for multiple patterns in multiple files using find or sed - linux

grep -rlZ "$pattern1" /path/of/file | xargs -0 grep -lZ "$pattern2" | xargs -0 grep -lZ "$pattern3" | xargs -0 grep --color -C1 -E "$pattern1|$pattern2|$pattern3"
How can I write the above command using sed or find?
The above command basically searches for 3 patterns at the same time in multiple files.

sed is the Stream EDitor; it might not be the best utility to use for searching patterns.
I'm guessing you're trying to grep 3 patterns in one set of files, which you already did in your last pipe:
grep --color -C1 -E "$pattern1|$pattern2|$pattern3"
I'd use find and grep together when I know there are some patterns in the filenames, then grep based on the results like:
find -iname '*pattern_in_filename*' -exec grep -E "$pattern1|$pattern2|$pattern3" {} \;
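The all-three-patterns filter can also be expressed with find alone, chaining one grep -q per pattern so each acts as a test on the file. A minimal sandboxed sketch; the patterns "alpha", "beta", "gamma" and the file names are made up:

```shell
# Build a small sandbox with one file matching all patterns and one not
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/all.txt"
printf 'alpha\nbeta\n'        > "$tmp/two.txt"

# grep -q prints nothing but sets the exit status, so each -exec
# filters the file list; only files containing every pattern reach -print
find "$tmp" -type f \
    -exec grep -q 'alpha' {} \; \
    -exec grep -q 'beta'  {} \; \
    -exec grep -q 'gamma' {} \; \
    -print

rm -r "$tmp"
```

This avoids the -Z/-0 bookkeeping of the xargs chain at the cost of spawning one grep per pattern per file.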

Related

linux merge only subset of lines from multiple files

I have the following folder structure:
/drive1/180204_somerandomtext/file.csv
/drive1/180504_somerandomtext/file.csv
/drive1/190101_somerandomtext/file.csv
/drive1/190305_somerandomtext/file.csv
...
Each file.csv has the same structure, but contains different data. From a single file.csv I want to extract only a subset of lines using the following command:
grep -A5000 -m1 -e 'Sample_ID,' /drive1/180204_somerandomtext/file.csv | tail -n+2
This command works and prints the next 5000 lines following the line that starts with 'Sample_ID,'
I've extended this command
grep -A5000 -m1 -e 'Sample_ID,' /drive1/180204_somerandomtext/file.csv | tail -n+2 | sed 's/^/180204_somerandomtext,/'
Using sed, I now prepend the pattern '180204_somerandomtext' to each line, which is actually the name of the folder that contains the file.csv.
I'm now stuck at the following steps:
how to do this for all file.csv files in the subfolders of drive1
how to store this result in one large file called 'samples.csv'
I've tried something with xargs. It works for the grep command, but piping into the sed doesn't work then:
find /drive1/ -maxdepth 1 -name '1*' | cut -d '/' -f2 | xargs -I {} grep -A5000 -m1 -e 'Sample_ID,' /drive1/{}/file.csv | sed 's/^/{},/'
I'm also not a big fan of xargs; I find find -exec much clearer to use. Let me explain:
Imagine I would like to do something with a file file1.txt:
Command -sw1 param1 -sw2 param2.1 param2.2 file1.txt
Launch a command, and use switches sw1, sw2 with parameters param1, param2.1 and param2.2.
When I want to perform this for all file1.txt within a directory structure, I do the following:
find . -name "file1.txt" -exec Command -sw1 param1 -sw2 param2.1 param2.2 {} \;
So I just put the find command (with some information on where and what to find), and afterwards comes the -exec. After that -exec I put the exact command, where I replace my original filename by {}, and end the whole thing with \;.
In your case, it would be something like:
find /drive1 -name file.csv -exec grep -A5000 -m1 -e 'Sample_ID,' {} \;
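That find command covers the first stuck step (all file.csv files) but not the folder prefix or the collection into samples.csv. A sketch of the whole job using a plain shell loop instead of xargs, assuming the /drive1 layout from the question:

```shell
# For every file.csv under /drive1, take the lines after the
# 'Sample_ID,' header, prefix each line with its folder name,
# and collect everything in one samples.csv
: > samples.csv
for f in /drive1/*/file.csv; do
    dir=$(basename "$(dirname "$f")")        # e.g. 180204_somerandomtext
    grep -A5000 -m1 -e 'Sample_ID,' "$f" \
        | tail -n+2 \
        | sed "s/^/$dir,/" >> samples.csv
done
```

Because the loop body runs once per file, the folder name is available as a real shell variable, which sidesteps the problem of {} not being expanded inside the sed argument in the xargs attempt.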

Recursive grep for gz files search string from an output string

I'm trying to search for one string within the matches of another string, recursively, in a folder of .gz files.
The only command I've gotten to work is:
find . -name "*.gz" -exec zgrep -H 'PATTERN' \{\} \;
from find string inside a gzipped file in a folder
How can I make this happen just like using normal grep with pipe as follow?
cat <folder> | grep 'pattern1' | grep 'pattern2'
you can pipe the find results through a second grep:
find . -name "*.gz" -exec zgrep -H "PATTERN1" {} \; | grep "PATTERN2"
Regarding your specific question
How can I make this happen just like using normal grep with pipe: cat | grep 'pattern1' | grep 'pattern2'
You can use find to cat all the files and then grep, but this won't be good because you will only get result lines without the filename.
It's better to just use two grep commands:
zgrep 'pattern1' *.gz | grep 'pattern2'
If you want to include subdirectories you can use the globstar shell option (assuming you are running bash):
shopt -s globstar
zgrep 'pattern1' **/*.gz | grep 'pattern2'
By your "pseudo-code" (cat | grep 'pattern2' | grep 'pattern3'), do you mean the following?
If we have a file file.txt that contains:
pattern this text on the line
pattern2 this text on the line
this pattern3
pattern2pattern3 this line
then your "pseudo-code"
cat file.txt | grep 'pattern2' | grep 'pattern3'
would result in: pattern2pattern3 this line. If this is what you want, we could use
zcat file.gz | grep 'pattern1' | grep 'pattern2'
But if we look at find . -name "*.gz" -exec zgrep -H 'PATTERN' \{\} \;, it is not even close to the same thing, because that would be more like:
cat **/*.txt | grep 'pattern'
which itself is a bit special; the same result would be given by:
grep -R 'pattern'
So in the case of .gz files I would say (not sure whether /**/*.gz will work with zcat):
zcat /**/*.gz | grep 'pattern'
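If keeping the filename prefix matters (a plain zcat pipe loses it), the -exec zgrep -H form combines well with a second grep. A sandboxed sketch; the file contents are made up:

```shell
# Two compressed files, only one containing both patterns
tmp=$(mktemp -d)
printf 'pattern1 and pattern2 here\nonly pattern1\n' | gzip > "$tmp/a.gz"
printf 'nothing relevant\n' | gzip > "$tmp/b.gz"

# zgrep -H keeps the "file:" prefix, so the second grep narrows the
# matches without losing track of which archive they came from
find "$tmp" -name '*.gz' -exec zgrep -H 'pattern1' {} \; | grep 'pattern2'

rm -r "$tmp"
```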

Grep - How to concatenate filename to each returned line of file content?

I have a statement which
Finds a set of files
Cats their contents out
Then greps their contents
It is this pipeline:
find . | grep -i "Test_" | xargs cat | grep -i "start-node name="
produces an output such as:
<start-node name="Start" secure="false"/>
<start-node name="Run" secure="false"/>
What I was hoping to get is something like:
filename1-<start-node name="Start" secure="false"/>
filename2-<start-node name="Run" secure="false"/>
An easier way is to let find do the filename filtering and hand the files straight to grep, without xargs and cat:
grep "start-node name=" `find . -iname '*Test_*'`
Because you cat all the files into a single stream, grep doesn't have any filename information. You want to give all the filenames to grep as arguments:
find ... | xargs grep "<start-node name=" /dev/null
Note two additional changes - I've dropped the -i flag, as it appears you're inspecting XML, and that's not case-insensitive; I've added /dev/null to the list of files, so that grep always has at least two files of input, even if find only gives one result. That's the portable way to get grep to print filenames.
Now, let's look at the find command. Instead of finding all files, then filtering through grep, we can use the -iregex predicate of GNU find:
find . -iregex '.*Test_.*' \( -type 'f' -o -type 'l' \) | xargs grep ...
The mixed-case pattern suggests your filenames aren't really case-insensitive, and you might not want to grep symlinks (I'm sure you don't want directories and special files passed through), in which case you can simplify (and can use portable find again):
find . -name '*Test_*' -type 'f' | xargs grep ...
Now protect against the kind of filenames that trip up pipelines, and you have
find . -name '*Test_*' -type 'f' -print0 \
| xargs -0 grep -e "<start-node name=" -- /dev/null
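A quick sandbox run showing why the -print0/-0 pair matters; the filename containing a space is made up, and /dev/null forces grep to print the filename prefix even for a single input file:

```shell
# One XML file whose name contains a space
tmp=$(mktemp -d)
printf '<start-node name="Start" secure="false"/>\n' > "$tmp/My Test_file.xml"

# The NUL-separated hand-off keeps the space-containing name intact;
# a plain pipe would split it into "My" and "Test_file.xml"
find "$tmp" -name '*Test_*' -type f -print0 \
    | xargs -0 grep -e "<start-node name=" -- /dev/null

rm -r "$tmp"
```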
Alternatively, if you have GNU grep, you don't need find at all:
grep --recursive --include '*[Tt]est_*' -e "<start-node name=" .
If you just want the matches numbered rather than tagged with the real filename:
find . | grep -i "Test_" | xargs cat | grep -i "start-node name=" | awk '{print "filename" NR "-" $0}'
From man grep:
-H Always print filename headers with output lines.

How can I grep while avoiding 'Too many arguments' [duplicate]

This question already has answers here:
Argument list too long error for rm, cp, mv commands
I was trying to clean out some spam email and ran into an issue. The number of files in the queue was so large that my usual command was unable to process them. It would give me an error about too many arguments.
I usually do this
grep -i user#domain.com 1US* | awk -F: '{print $1}' | xargs rm
1US* can be anything in 1US[a-zA-Z]. The only thing I could make work was running this horrible contraption: one line per prefix, with 1USa, 1USA, 1USb, and so on through the entire alphabet. I know there has to be a way to run this more efficiently.
grep -s $SPAMMER /var/mailcleaner/spool/exim_stage1/input/1USa* | awk -F: '{print $1}' | xargs rm
grep -s $SPAMMER /var/mailcleaner/spool/exim_stage1/input/1USA* | awk -F: '{print $1}' | xargs rm
Run several instances of grep. Instead of
grep -i user#domain.com 1US* | awk '{...}' | xargs rm
do
(for i in 1US*; do grep -li user#domain "$i"; done) | xargs rm
Note the -l flag, since we only want the file name of the match. This will both speed up grep (it terminates on the first match) and make your awk script unnecessary. This could be improved further by checking the return status of grep and calling rm directly, instead of using xargs (xargs is very fragile, IMO). I'll give you the better version if you ask.
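Here is one shape that "better version" could take; it lets grep's exit status drive rm directly, so neither awk nor xargs is involved (the mailbox pattern is the one from the question):

```shell
# Delete each matching file as soon as grep confirms the match;
# no long argument list is ever handed to an external command,
# so ARG_MAX never comes into play
for i in 1US*; do
    grep -qi 'user#domain' "$i" && rm -- "$i"
done
```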
Hope it helps.
You can use find to find all files whose names start with the pattern '1US', then pipe the output to xargs, which makes sure the argument list does not grow too large, and let it handle the grep call. Note that I've used a null byte to separate filenames for xargs; this avoids problems with problematic file names. ;)
find -maxdepth 1 -name '1US*' -printf '%f\0' | xargs -0 grep -i user#domain | awk ...
The -exec argument to find is useful here, I've used this myself in similar situations.
E.g.
# List the files that match
find /path/to/input/ -type f -exec grep -qiF spammer#spammy.com \{\} \; -print
# Once you're sure you've got it right
find /path/to/input/ -type f -exec grep -qiF spammer#spammy.com \{\} \; -delete
Using xargs is more efficient than using "find ... -exec grep" because you have fewer process creations, etc.
One way to go about this would be:
ls 1US* | xargs grep -i user#domain.com | awk -F: '{print $1}' | xargs rm
But easier would be:
find . -iname '1US*' -type f -exec grep -qi 'user#domain.com' {} \; -delete
Use find and a loop instead of xargs.
find . -name '1US*' | \
while IFS= read -r x; do grep -iq user#domain "$x" && rm "$x"; done
This uses pipes and loops instead of arguments (both for grep and rm) and prevents issues related with limits on arguments.

Unix Command to List files containing string but *NOT* containing another string

How do I recursively view a list of files that has one string and specifically doesn't have another string? Also, I mean to evaluate the text of the files, not the filenames.
Conclusion:
As per comments, I ended up using:
find . -name "*.html" -exec grep -lR 'base\-maps' {} \; | xargs grep -L 'base\-maps\-bot'
This returned files with "base-maps" and not "base-maps-bot". Thank you!!
Try this:
grep -rl <string-to-match> | xargs grep -L <string-not-to-match>
Explanation: grep -lr makes grep recursively (r) output a list (l) of all files that contain <string-to-match>. xargs loops over these files, calling grep -L on each one of them. grep -L will only output the filename when the file does not contain <string-not-to-match>.
The use of xargs in the answers above is not necessary; you can achieve the same thing like this:
find . -type f -exec grep -q <string-to-match> {} \; -not -exec grep -q <string-not-to-match> {} \; -print
grep -q means run quietly but return an exit code indicating whether a match was found; find can then use that exit code to determine whether to keep evaluating the rest of its options. If -exec grep -q <string-to-match> {} \; returns 0, then it will go on to evaluate -not -exec grep -q <string-not-to-match> {} \;. If that also returns 0, it will go on to execute -print, which prints the name of the file.
As another answer has noted, using find in this way has major advantages over grep -Rl where you only want to search files of a certain type. If, on the other hand, you really want to search all files, grep -Rl is probably quicker, as it uses one grep process to perform the first filter for all files, instead of a separate grep process for each file.
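A sandbox check of the find version; "yes" and "no" stand in for the two strings:

```shell
# Three files: contains only "yes", contains both, contains only "no"
tmp=$(mktemp -d)
printf 'yes\n'     > "$tmp/only-yes.txt"
printf 'yes\nno\n' > "$tmp/both.txt"
printf 'no\n'      > "$tmp/only-no.txt"

# Prints files containing "yes" but not "no"
find "$tmp" -type f \
    -exec grep -q 'yes' {} \; \
    -not -exec grep -q 'no' {} \; \
    -print

rm -r "$tmp"
```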
These answers seem off, as they match BOTH strings. The following command should work better:
grep -l <string-to-match> * | xargs grep -c <string-not-to-match> | grep ':0$'
Here is a more generic construction:
find . -name <nameFilter> -print0 | xargs -0 grep -Z -l <patternYes> | xargs -0 grep -L <patternNo>
This command outputs files whose name matches <nameFilter> (adjust find predicates as you need) which contain <patternYes>, but do not contain <patternNo>.
The enhancements are:
It works with filenames containing whitespace.
It lets you filter files by name.
If you don't need to filter by name (one often wants to consider all the files in current directory), you can strip find and add -R to the first grep:
grep -R -Z -l <patternYes> | xargs -0 grep -L <patternNo>
find . -maxdepth 1 -name "*.py" -exec grep -L "string-not-to-match" {} \;
This command will get all ".py" files in the current directory that don't contain "string-not-to-match".
To match string A while excluding lines that also contain string B or string C, I use the following (with quotes so a search string may contain a space):
grep -r "<string A>" | grep -v -e "<string B>" -e "<string C>" | awk -F ':' '{print $1}'
Explanation: grep -r recursively prints the matching lines in the format
filename: line
grep -v then excludes from those lines the ones that also contain either string B or string C (one -e per pattern). Finally, awk prints only the first field (the filename), using the colon as field separator (-F ':').
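A sandbox version of the same pipeline; the letters A, B, C stand in for the real strings:

```shell
# One file with three candidate lines
tmp=$(mktemp -d)
printf 'A alone\nA with B\nA with C\n' > "$tmp/f.txt"

# Only the line containing A but neither B nor C survives the filters,
# and awk reduces the surviving "filename:line" records to the filename
( cd "$tmp" && grep -r 'A' . | grep -v -e 'B' -e 'C' | awk -F ':' '{print $1}' )

rm -r "$tmp"
```

Note that a file with many surviving lines would be printed once per line; piping through sort -u deduplicates if needed.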
