Recursive grep in gz files: search for a string within the output of another search - Linux

I'm trying to search for a second string within the output of a first string search, recursively across a folder of gz files.
The only command that has worked for me so far is:
find . -name "*.gz" -exec zgrep -H 'PATTERN' \{\} \;
(from "find string inside a gzipped file in a folder").
How can I make this work like a normal grep with a pipe, as follows?
cat <folder> | grep 'pattern1' | grep 'pattern2'

You can pipe the find results through a second grep:
find . -name "*.gz" -exec zgrep -H "PATTERN1" {} \; | grep "PATTERN2"
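As a rough illustration of that pipeline, here is a minimal self-contained sketch. The temporary directory, the file names a.gz and sub/b.gz, and the patterns alpha and beta are made up for the demo and are not from the question.

```shell
# Build a throwaway tree with two gzipped files (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'alpha beta\nalpha only\n' | gzip > "$demo/a.gz"
printf 'beta only\nalpha beta again\n' | gzip > "$demo/sub/b.gz"

# zgrep -H prefixes each hit with its file name; the second grep then
# keeps only the lines that also contain the second pattern.
result=$(cd "$demo" && find . -name '*.gz' -exec zgrep -H 'alpha' {} \; | grep 'beta' | sort)
printf '%s\n' "$result"
rm -rf "$demo"
```

Keep in mind that the second grep also sees the file name prefix, so it can match on the name as well as on the line content.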

Regarding your specific question
How can I make this happen just like using normal grep with pipe: cat | grep 'pattern1' | grep 'pattern2'
You can use find to cat all the files and then grep, but this won't be good because you will only get result lines without the filename.
It's better to just use two grep commands:
zgrep 'pattern1' *.gz | grep 'pattern2'
If you want to include subdirectories you can use the globstar (assuming you are running bash):
shopt -s globstar
zgrep 'pattern1' **/*.gz | grep 'pattern2'

What exactly do you mean by your "pseudo-code" (cat | grep 'pattern2' | grep 'pattern3')?
If we have a file file.txt that contains:
pattern this text on the line
pattern2 this text on the line
this pattern3
pattern2pattern3 this line
then your "pseudo-code"
cat file.txt | grep 'pattern2' | grep 'pattern3'
would result in: pattern2pattern3 this line. If this is what you want, we could use
zcat file.gz | grep 'pattern1' | grep 'pattern2'
But the command find . -name "*.gz" -exec zgrep -H 'PATTERN' \{\} \; is not even close to the same thing, because it is more like:
cat /**/*.txt | grep 'pattern'
which is itself a bit unusual, since much the same result would be given by
grep -R 'pattern'
So in the case of .gz files I would suggest the following (I'm not sure whether /**/*.gz will work with zcat):
zcat /**/*.gz | grep 'pattern'
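If file names are not needed, one way to sidestep the question of whether the shell expands the recursive glob is to let find collect the files and decompress them into a single stream. This is only a sketch: it uses gzip -dc (equivalent to zcat), and the demo directory, file names and patterns are invented for illustration.

```shell
# Throwaway tree with two gzipped files (hypothetical names and contents).
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'alpha beta\nalpha only\n' | gzip > "$demo/a.gz"
printf 'beta only\nalpha beta again\n' | gzip > "$demo/sub/b.gz"

# Decompress every .gz under the tree into one stream, then filter twice.
# The output carries no file names because grep only sees the combined stream.
lines=$(find "$demo" -name '*.gz' -exec gzip -dc {} + | grep 'alpha' | grep 'beta' | sort)
printf '%s\n' "$lines"
rm -rf "$demo"
```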

Related

How to search for multiple patterns in multiple files using find or sed

grep -Zrl '$pattern1' /path/of/file | xargs -0 grep -rlZ '$pattern2' | xargs -0 grep -l '$pattern3' | xargs grep --color -C1 -E "$pattern1|$pattern2|$pattern3"
How can I write the above command using sed or find?
The above command basically searches for 3 patterns at the same time in multiple files.
sed is a Stream EDitor, so it might not be the best utility for searching for patterns.
I'm guessing you're trying to grep 3 patterns in one set of files, which you already did in your last pipe:
grep --color -C1 -E "$pattern1|$pattern2|$pattern3"
I'd use find and grep together when I know there are some patterns in the filenames, then grep based on the results like:
find . -iname '*pattern_in_filename*' -exec grep -E "$pattern1|$pattern2|$pattern3" {} \;
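If the goal is to list only the files that contain all three patterns, find can probably do this by itself, because -exec acts as a test that is true when the command exits with status 0. A minimal sketch, with made-up file names and pattern values:

```shell
# Two sample files: only all.txt contains every pattern (hypothetical data).
demo=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$demo/all.txt"
printf 'one\ntwo\n' > "$demo/some.txt"
pattern1=one; pattern2=two; pattern3=three

# Each -exec grep -q is a test; -print is reached only if all three succeed.
matches=$(find "$demo" -type f \
  -exec grep -q -e "$pattern1" {} \; \
  -exec grep -q -e "$pattern2" {} \; \
  -exec grep -q -e "$pattern3" {} \; \
  -print)
printf '%s\n' "$matches"
rm -rf "$demo"
```

This runs up to three grep processes per file, so the chained grep -l pipeline from the question may well be faster on large trees.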

Grep - How to concatenate filename to each returned line of file content?

I have a statement which
Finds a set of files
Cats their contents out
Then greps their contents
It is this pipeline:
find . | grep -i "Test_" | xargs cat | grep -i "start-node name="
It produces output such as:
<start-node name="Start" secure="false"/>
<start-node name="Run" secure="false"/>
What I was hoping to get is something like:
filename1-<start-node name="Start" secure="false"/>
filename2-<start-node name="Run" secure="false"/>
An easier way may be to run grep directly on the files that find returns, without xargs and cat:
grep -i "start-node name=" $(find . | grep -i "Test_")
Because you cat all the files into a single stream, grep doesn't have any filename information. You want to give all the filenames to grep as arguments:
find ... | xargs grep "<start-node name=" /dev/null
Note two additional changes - I've dropped the -i flag, as it appears you're inspecting XML, and that's not case-insensitive; I've added /dev/null to the list of files, so that grep always has at least two files of input, even if find only gives one result. That's the portable way to get grep to print filenames.
Now, let's look at the find command. Instead of finding all files, then filtering through grep, we can use the -iregex predicate of GNU find:
find . -iregex '.*Test_.*' \( -type 'f' -o -type 'l' \) | xargs grep ...
The mixed-case pattern suggests your filenames aren't really case-insensitive, and you might not want to grep symlinks (I'm sure you don't want directories and special files passed through), in which case you can simplify (and can use portable find again):
find . -name '*Test_*' -type 'f' | xargs grep ...
Now protect against the kind of filenames that trip up pipelines, and you have
find . -name '*Test_*' -type 'f' -print0 \
| xargs -0 grep -e "<start-node name=" -- /dev/null
Alternatively, if you have GNU grep, you don't need find at all:
grep --recursive --include '*[Tt]est_*' -e "<start-node name=" .
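To see the file name prefix in action, here is a small self-contained sketch of the -print0 / xargs -0 form. The directory and the two XML file names are invented for the demo; one name contains a space on purpose.

```shell
# Sample files (hypothetical); only the first contains a start-node element.
demo=$(mktemp -d)
printf '<start-node name="Start" secure="false"/>\n' > "$demo/Test_one file.xml"
printf '<end-node name="End"/>\n' > "$demo/Test_two.xml"

# NUL-delimited names survive spaces; /dev/null makes grep print file names
# even if find happens to return a single file.
found=$(cd "$demo" && find . -name '*Test_*' -type f -print0 |
  xargs -0 grep -e '<start-node name=' -- /dev/null)
printf '%s\n' "$found"
rm -rf "$demo"
```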
If you just need to number the matching lines (this prints a literal "filename" prefix plus a counter, not the real file name):
find . | grep -i "Test_" | xargs cat | grep -i "start-node name=" | awk 'BEGIN{n=0}{n=n+1;print "filename" n "-" $0}'
From man grep:
-H Always print filename headers with output lines.

Read an input file one line at a time to pass as input to grep

My requirement is to read a file and then run a grep command on each line read, one at a time, for all the lines in the file.
First I filter the files that match a pattern:
find . -name '*.ini' -exec grep -w HTC {} \; -print | grep ini > input.files
cat input.files
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0019.ini
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0150.ini
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0616_1.ini
./PLATFORM/android/build/integration/android/suites/android_SI_Query_Suite/ini/android_0547_4.ini
./PLATFORM/android/build/integration/android/suites/android_SI_Query_Suite/ini/android_0578.ini
./PLATFORM/android/build/integration/android/suites/android_PDL_Suite/ini/android_5203_1.ini
./PLATFORM/android/build/integration/android/suites/android_System_Maintenance_Suite/ini/android_0579_2.ini
Any idea how to read one line at a time from input.files and execute a grep command on each?
cat input.files -exec grep -w HTC_One {} \;
There won't be any difference in the result if you say:
grep pattern file1
grep pattern file2
or
grep pattern file1 file2
Simply, use xargs:
cat input.files | xargs grep -w HTC_One
Use xargs instead:
cat input.files | xargs -I {} grep -w HTC_One {}
From the xargs man page:
xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input.
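If you really do want one grep invocation per line of input.files, a plain while read loop is one option. A minimal sketch, with two throwaway .ini files standing in for the real ones; /dev/null is added so grep prints the file name even though it is given a single real file:

```shell
# Stand-in data (hypothetical): two .ini files and a list naming them.
demo=$(mktemp -d)
printf 'device=HTC_One\n' > "$demo/a.ini"
printf 'device=Other\n' > "$demo/b.ini"
printf '%s\n' "$demo/a.ini" "$demo/b.ini" > "$demo/input.files"

# Read the list one line at a time; IFS= and -r keep each path intact.
hits=$(while IFS= read -r file; do
  # grep exits non-zero when a file has no match, which is fine here.
  grep -w 'HTC_One' "$file" /dev/null || true
done < "$demo/input.files")
printf '%s\n' "$hits"
rm -rf "$demo"
```

Unlike xargs, this loop does not split on blanks inside a line, so paths with spaces are passed to grep unchanged.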

How to find text files not containing text on Linux?

How do I find files not containing some text on Linux? Basically I'm looking for the inverse of the following
find . -print | xargs grep -iL "somestring"
The command you quote, ironically enough, does exactly what you describe.
Test it!
echo "hello" > a
echo "bye" > b
grep -iL BYE a b
This prints only a.
I think you may be confusing -L and -l.
find . -print | xargs grep -iL "somestring"
is the inverse of
find . -print | xargs grep -il "somestring"
By the way, consider
find . -print0 | xargs -0 grep -iL "somestring"
Or even
grep -IRiL "somestring" .
You can do it with grep alone (without find).
grep -riL "somestring" .
This is the explanation of the parameters used with grep:
-L, --files-without-match
Only the names of files not containing selected lines are written
-R, -r, --recursive
Recursively search subdirectories listed.
-i, --ignore-case
Perform case insensitive matching.
If you use a lowercase -l you will get the opposite (files with matches):
-l, --files-with-matches
Only the names of files containing selected lines are written
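The difference between the two flags is easy to check on a couple of scratch files. A minimal sketch, reusing the a and b files from the test earlier in this thread inside a temporary directory (the directory itself is an assumption of the demo):

```shell
# Two scratch files: only b contains the search string.
demo=$(mktemp -d)
echo "hello" > "$demo/a"
echo "bye" > "$demo/b"

# -L lists files WITHOUT a match, -l lists files WITH a match.
without=$(cd "$demo" && grep -iL 'BYE' a b)
with=$(cd "$demo" && grep -il 'BYE' a b)
printf 'without match: %s\nwith match: %s\n' "$without" "$with"
rm -rf "$demo"
```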
Use find and grep together to list the markdown files that do not contain the match:
$ find . -name '*.md' -print0 | xargs -0 grep -iL "title"
Or use grep's -L directly to search only markdown files and list those with no title:
$ grep -iL "title" -r ./* --include '*.md'
If you use find, grep is also run on the directory itself:
[root@vps test]# find | xargs grep -Li 1234
grep: .: Is a directory
.
./test.txt
./test2.txt
[root@vps test]#
Use grep directly:
# grep -Li 1234 /root/test/*
/root/test/test2.txt
/root/test/test.txt
[root@vps test]#
Or pass the -type f option to find, although using find will take more time (it first builds the list of files and then runs grep).

Delete files with string found in file - Linux cli

I am trying to delete erroneous emails based on finding the email address in the file via Linux CLI.
I can get the files with
find . | xargs grep -l email@example.com
But I cannot figure out how to delete them from there, as the following code doesn't work.
rm -f | xargs find . | xargs grep -l email@example.com
Solution for your command:
grep -l email@example.com * | xargs rm
Or
for file in $(grep -l email@example.com *); do
    # -i prompts before each delete
    rm -i "$file"
done
For safety I normally pipe the output from find to something like awk and create a batch file with each line being "rm filename".
That way you can check it before actually running it and manually fix any odd edge cases that are difficult to handle with a regex.
find . | xargs grep -l email@example.com | awk '{print "rm "$1}' > doit.sh
vi doit.sh # check for murphy and his law
source doit.sh
You can use find's -exec and -delete; it will only delete the file if the grep command succeeds. grep -q keeps it from printing anything; you can replace the -q with -l to see which files had the string in them.
find . -exec grep -q 'email@example.com' '{}' \; -delete
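Before deleting anything, it may be worth running the same find expression with -print in place of -delete as a dry run. A minimal sketch under stated assumptions: the temporary directory, the two .eml file names and the placeholder address are invented, and -F is added so the dot in the address is matched literally.

```shell
# Scratch mailbox (hypothetical): one file contains the address, one does not.
demo=$(mktemp -d)
printf 'to: email@example.com\n' > "$demo/bad mail.eml"
printf 'to: someone@example.org\n' > "$demo/good.eml"

# Dry run: -print instead of -delete shows what would be removed.
would_delete=$(find "$demo" -type f -exec grep -q -F 'email@example.com' {} \; -print)
printf '%s\n' "$would_delete"

# Same tests, now deleting the files for which grep succeeded.
find "$demo" -type f -exec grep -q -F 'email@example.com' {} \; -delete
remaining=$(ls "$demo")
printf '%s\n' "$remaining"
rm -rf "$demo"
```

Because find passes each name to grep as a single argument, file names with spaces need no special handling here.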
I liked Martin Beckett's solution but found that file names with spaces could trip it up (like who uses spaces in file names, pfft :D). Also I wanted to review what was matched so I move the matched files to a local folder instead of just deleting them with the 'rm' command:
# Make a folder in the current directory to put the matched files
$ mkdir -p './matched-files'
# Create a script to move files that match the grep
# NOTE: Remove "-name '*.txt'" to allow all file extensions to be searched.
# NOTE: Edit the grep argument 'something' to what you want to search for.
$ find . -name '*.txt' -print0 | xargs -0 grep -al 'something' | awk -F '\n' '{ print "mv \""$0"\" ./matched-files" }' > doit.sh
Or, because it's possible (in Linux, idk about other OS's) to have newlines in a file name, you can use this longer version, which I haven't tested (who puts newlines in filenames? pfft :D):
$ find . -name '*.txt' -print0 | xargs -0 grep -alZ 'something' | awk -F '\0' '{ for (x=1; x<NF; x++) print "mv \""$x"\" ./matched-files" }' > doit.sh
# Evaluate the file following the 'source' command as a list of commands executed in the current context:
$ source doit.sh
NOTE: I had issues where grep could not match inside files that had utf-16 encoding.
See here for a workaround. In case that website disappears what you do is use grep's -a flag which makes grep treat files as text and use a regex pattern that matches any first-byte in each extended character. For example to match Entité do this:
grep -a 'Entit.e'
and if that doesn't work then try this:
grep -a 'E.n.t.i.t.e'
Despite Martin's safe answer, if you're certain of what you want to delete, such as when writing a script, I've used this with greater success than any other one-liner suggested here:
$ find . | xargs grep -l email@example.com | xargs -I {} rm -rf {}
But I'd rather find by name:
$ find . -iname '*something*' | xargs -I {} echo {}
rm -f `find . | xargs grep -li email@example.com`
does the job better. Use `...` to run the command that supplies the names of files containing email@example.com (grep -l lists them, -i ignores case), and remove them with rm (-f forcibly / -i interactively).
find . | xargs grep -l email@example.com
How to remove:
rm -f `find . | xargs grep -l email@example.com`
Quick and efficient. Replace find_files_having_this_text with the text you want to search.
grep -Ril 'find_files_having_this_text' . | xargs rm
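One caveat with the plain xargs rm form is that xargs splits on whitespace, so file names containing spaces are mishandled. A hedged sketch of a NUL-delimited variant, assuming a grep that supports --null (GNU and BSD grep do) and an xargs that supports -0; the directory and file names are invented, and -F is used so the search text is taken literally:

```shell
# Scratch files (hypothetical): the one to delete has a space in its name.
demo=$(mktemp -d)
printf 'find_files_having_this_text\n' > "$demo/has space.txt"
printf 'nothing to see\n' > "$demo/keep.txt"

# --null ends each listed file name with a NUL byte; xargs -0 splits on it.
grep -rl --null -F 'find_files_having_this_text' "$demo" | xargs -0 rm --
left=$(ls "$demo")
printf '%s\n' "$left"
rm -rf "$demo"
```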

Resources