Read an input file one line at a time to pass as input to grep - linux

My requirement is to read a file and run a grep command on each line it contains, one line at a time.
First I filter the files that match a pattern:
find . -name '*.ini' -exec grep -w HTC {} \; -print | grep ini > input.files
cat input.files
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0019.ini
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0150.ini
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0616_1.ini
./PLATFORM/android/build/integration/android/suites/android_SI_Query_Suite/ini/android_0547_4.ini
./PLATFORM/android/build/integration/android/suites/android_SI_Query_Suite/ini/android_0578.ini
./PLATFORM/android/build/integration/android/suites/android_PDL_Suite/ini/android_5203_1.ini
./PLATFORM/android/build/integration/android/suites/android_System_Maintenance_Suite/ini/android_0579_2.ini
Any idea how to read one line at a time from input.files and run a grep command on it? Something like:
cat input.files -exec grep -w HTC_One {} \;

There won't be any difference in the result if you say:
grep pattern file1
grep pattern file2
or
grep pattern file1 file2
Simply, use xargs:
cat input.files | xargs grep -w HTC_One

Use xargs instead:
cat input.files | xargs grep -w HTC_One
From the xargs man page:
xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input.
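If grep really has to run once per listed file, a while read loop is the usual alternative to xargs. The following is a minimal sketch, not a definitive solution; the temporary directory and the two .ini files are made up for the demonstration, and -H is a GNU/BSD grep option rather than a POSIX one.

```shell
# Hypothetical demo data: two .ini files and a list of their paths.
tmp=$(mktemp -d)
printf 'model=HTC_One\n' > "$tmp/a.ini"
printf 'model=Nexus\n'   > "$tmp/b.ini"
printf '%s\n' "$tmp/a.ini" "$tmp/b.ini" > "$tmp/input.files"

# IFS= and -r keep leading blanks and backslashes in each path intact.
matches=$(
  while IFS= read -r file; do
    # -H prints the filename even though grep sees one file per call.
    grep -wH -e 'HTC_One' -- "$file" || :   # no match in a file is not an error
  done < "$tmp/input.files"
)
printf '%s\n' "$matches"
rm -rf "$tmp"
```

Unlike plain xargs, this loop copes with paths that contain spaces, at the cost of starting one grep process per file.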

Related

How to search for multiple patterns in multiple files using find or sed

grep -Zrl '$pattern1' /path/of/file | xargs -0 grep -rlZ '$pattern2' | xargs -0 grep -l '$pattern3' | xargs grep --color -C1 -E "$pattern1|$pattern2|$pattern3"
How can I write the above command using sed or find?
The above command basically searches for 3 patterns at the same time in multiple files.
sed is a stream editor, so it might not be the best utility for searching patterns.
I'm guessing you're trying to grep 3 patterns in one set of files, which you already did in your last pipe:
grep --color -C1 -E "$pattern1|$pattern2|$pattern3"
I'd use find and grep together when I know there are some patterns in the filenames, then grep based on the results like:
find . -iname '*pattern_in_filename*' -exec grep -E "$pattern1|$pattern2|$pattern3" {} \;
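As a rough illustration of the find plus grep -E combination, the sketch below runs it against throwaway files. The file names and the three pattern values are hypothetical; {} + is used instead of {} \; so that grep receives many files per call, and -H (GNU/BSD grep) forces the filename prefix.

```shell
# Hypothetical demo data and patterns.
tmp=$(mktemp -d)
printf 'foo\nbar\n' > "$tmp/notes_a.txt"
printf 'baz\n'      > "$tmp/notes_b.txt"
printf 'foo\n'      > "$tmp/skip.log"
pattern1=foo pattern2=bar pattern3=qux

# Only files whose names contain "notes" are searched; a line matching
# any of the three patterns is printed with its filename.
result=$(cd "$tmp" && find . -iname '*notes*' -type f \
  -exec grep -H -E "$pattern1|$pattern2|$pattern3" {} + | sort)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that this prints lines matching any one of the patterns; it does not require a file to contain all three, which is what the chained grep -l pipeline in the question checks.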

linux merge only subset of lines from multiple files

I have the following folder structure:
/drive1/180204_somerandomtext/file.csv
/drive1/180504_somerandomtext/file.csv
/drive1/190101_somerandomtext/file.csv
/drive1/190305_somerandomtext/file.csv
...
Each file.csv has the same structure, but contains different data. From a file.csv I want to extract only a subset of lines using the following command:
grep -A5000 -m1 -e 'Sample_ID,' /drive1/180204_somerandomtext/file.csv | tail -n+2
This command works and prints up to 5000 lines following the first line that contains 'Sample_ID,'.
I've extended this command
grep -A5000 -m1 -e 'Sample_ID,' /drive1/180204_somerandomtext/file.csv | tail -n+2 | sed 's/^/180204_somerandomtext,/'
Using the 'sed' I now add to the beginning of each line the pattern '180204_somerandomtext', which is actually the name of the folder that contains the file.csv
I'm now stuck at the following steps:
how to do this for all file.csv files in the subfolders of drive1
how to store this result in one large file called 'samples.csv'
I've tried something with xargs. It works with the grep command, but piping the result into sed then doesn't work.
find /drive1/ -maxdepth 1 -name '1*' | cut -d '/' -f2 | xargs -I {} grep -A5000 -m1 -e 'Sample_ID,' /drive1/{}/file.csv | sed 's/^/{},/'
I'm also not a big fan of xargs; I find find -exec much clearer to use. Let me explain:
Imagine I would like to do something with a file file1.txt:
Command -sw1 param1 -sw2 param2.1 param2.2 file1.txt
Launch a command, and use switches sw1, sw2 with parameters param1, param2.1 and param2.2.
When I want to perform this for all file1.txt within a directory structure, I do the following:
find . -name "file1.txt" -exec Command -sw1 param1 -sw2 param2.1 param2.2 {} \;
So I just put the find command (with some information on where and what to find), and afterwards comes the -exec. After that -exec I put the exact command, where I replace my original filename by {} and I end the whole thing by \;.
In your case, it would be something like:
find /drive1 -name file.csv -exec grep -A5000 -m1 -e 'Sample_ID,' {} \;
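One way to cover both open steps from the question (every subfolder, one combined samples.csv) is a plain shell loop, because the folder name is then available as a variable for sed. This is only a sketch under assumptions: the temporary directory stands in for /drive1, the sample rows are made up, and folder names are assumed not to contain characters that are special to sed (/, & or backslash).

```shell
# Hypothetical stand-in for /drive1 with two run folders.
root=$(mktemp -d)
for d in 180204_run 190101_run; do
  mkdir "$root/$d"
  printf 'junk\nSample_ID,Name\ns1,a\ns2,b\n' > "$root/$d/file.csv"
done

out="$root/samples.csv"
: > "$out"                                # start with an empty output file
for f in "$root"/*/file.csv; do
  dir=$(basename "$(dirname "$f")")       # name of the containing folder
  # Same extraction as in the question, with the folder name as prefix.
  grep -A5000 -m1 -e 'Sample_ID,' "$f" | tail -n +2 | sed "s/^/$dir,/" >> "$out"
done

result=$(cat "$out")
printf '%s\n' "$result"
rm -rf "$root"
```

With the real data, the glob would be /drive1/*/file.csv and the output file would live wherever samples.csv is wanted.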

Grep - How to concatenate filename to each returned line of file content?

I have a statement which
Finds a set of files
Cats their contents out
Then greps their contents
It is this pipeline:
find . | grep -i "Test_" | xargs cat | grep -i "start-node name="
produces an output such as:
<start-node name="Start" secure="false"/>
<start-node name="Run" secure="false"/>
What I was hoping to get is something like:
filename1-<start-node name="Start" secure="false"/>
filename2-<start-node name="Run" secure="false"/>
An easier way may be to run grep directly on the result of find, without xargs and cat:
grep -i "start-node name=" $(find . -iname '*Test_*' -type f)
Because you cat all the files into a single stream, grep doesn't have any filename information. You want to give all the filenames to grep as arguments:
find ... | xargs grep "<start-node name=" /dev/null
Note two additional changes - I've dropped the -i flag, as it appears you're inspecting XML, and that's not case-insensitive; I've added /dev/null to the list of files, so that grep always has at least two files of input, even if find only gives one result. That's the portable way to get grep to print filenames.
Now, let's look at the find command. Instead of finding all files, then filtering through grep, we can use the -iregex predicate of GNU find:
find . -iregex '.*Test_.*' \( -type 'f' -o -type 'l' \) | xargs grep ...
The mixed-case pattern suggests your filenames aren't really case-insensitive, and you might not want to grep symlinks (I'm sure you don't want directories and special files passed through), in which case you can simplify (and can use portable find again):
find . -name '*Test_*' -type 'f' | xargs grep ...
Now protect against the kind of filenames that trip up pipelines, and you have
find . -name '*Test_*' -type 'f' -print0 \
| xargs -0 grep -e "<start-node name=" -- /dev/null
Alternatively, if you have GNU grep, you don't need find at all:
grep --recursive --include '*[Tt]est_*' -e "<start-node name=" .
If you just need to number the matches (the prefix is the literal text "filename" plus a counter, not the real filename):
find . | grep -i "Test_" | xargs cat | grep -i "start-node name=" | awk 'BEGIN{n=0}{n=n+1;print "filename" n "-" $0}'
From man grep:
-H Always print filename headers with output lines.
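To see the filename prefix in action, here is a small sketch that combines the NUL-safe find pipeline with -H. The two XML files are hypothetical demo data, and -H is assumed to be available (GNU and BSD grep have it; strictly POSIX grep needs the /dev/null trick instead).

```shell
# Hypothetical demo data: only one file name contains "Test_".
tmp=$(mktemp -d)
printf '<start-node name="Start" secure="false"/>\n' > "$tmp/Test_one.xml"
printf '<start-node name="Run" secure="false"/>\n'   > "$tmp/other.xml"

# grep receives filenames as arguments, so it can label each match;
# -H keeps the label even when find returns a single file.
result=$(cd "$tmp" && find . -name '*Test_*' -type f -print0 |
  xargs -0 grep -H -e '<start-node name=')
printf '%s\n' "$result"
rm -rf "$tmp"
```

grep separates the filename from the line with a colon rather than the hyphen shown in the question; a trailing sed could swap it if the exact format matters.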

xargs inconsistent behavior and -n1 parameter

I have a shell script
find . -name "*.java" -print0 | xargs -0 grep -Lz 'regular_expression'
which outputs file names not matching the regexp in this way:
file1.java
file2.java
...
The way I understand it, it works as follows: find finds the needed files and joins their names with \0. Then xargs splits the output of find on \0 and feeds the names to grep one by one.
Then I wanted to add one more stage and get only basename of the files. I modified the script:
find . -name "*.java" -print0 | xargs -0 grep -LzZ 'regular_expression' | xargs -0 basename
but got an error. I started investigating and added a temporary debug output:
find . -name "*.java" -print0 | xargs -0 grep -LzZ 'regular_expression' | xargs -0 echo basename
and got this:
basename ./file1.java ./file2.java ./subdir/file1.java ./subdir/file2.java
So, the filenames were not split by \0. I can't get why they are split in case of xargs used with grep and not split in xargs with basename.
I got a workaround by using -n1 in the latter xargs. But I still don't understand why I needed it (given that I didn't use it in the xargs with grep) and what this parameter does.
Hope you can explain to me what -n1 does and why I needed it in the latter usage and didn't need it in the former with grep.
-n1 tells xargs to run the given command once per argument.
So if you have something like
echo file1 file2 file2 | xargs basename
That's equivalent to
basename file1 file2 file2
But if you do
echo file1 file2 file2 | xargs -n1 basename
That will cause xargs to run:
basename file1
basename file2
basename file2
As for xargs's -0 flag, that's an alias for the --null option, which tells xargs to split on \0 instead of the default whitespace. You needed it after find because find emits \0 with -print0; grep's output, by contrast, is newline-separated unless you also pass -Z.
The filenames were split by \0. The difference is in the commands you're using. xargs normally takes its standard input, breaks it into a list (here, by splitting on NUL), and then passes that list as extra arguments to your command. So when you do this:
find . -name "*.java" -print0 | xargs -0 grep -Lz 'regular_expression'
What actually runs is this:
grep -Lz 'regular_expression' file1.java file2.java file3.java...
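Here, -z has nothing to do with how the filenames are passed: it makes grep treat the contents it reads as NUL-terminated records instead of newline-terminated lines, so each text file is usually matched as one big record.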
So, when you add another xargs that runs basename, you get this:
basename file1.java file2.java file3.java...
But while grep will take any number of filename arguments, a plain basename call expects a single name (optionally followed by a suffix to strip), so it cannot process the whole list in one call.
That's where -n 1 comes in: it tells xargs to break its list of arguments into chunks (of 1), and run the command multiple times. So what runs now is:
basename file1.java
basename file2.java
basename file3.java
...
And all the output is concatenated together onto stdout.
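The chunking can be reproduced without grep at all. In this sketch, printf stands in for grep -Z by emitting three NUL-terminated paths (all hypothetical), and xargs -0 -n1 then calls basename once per path.

```shell
# printf repeats its format for each argument, so every path is
# followed by a NUL byte, just like the output of grep -Z.
names=$(printf '%s\0' ./file1.java ./subdir/file2.java './dir/my file.java' |
  xargs -0 -n1 basename)
printf '%s\n' "$names"
```

Because the input is split on NUL only, the path containing a space still reaches basename as a single argument.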

Unix Command to List files containing string but *NOT* containing another string

How do I recursively list files that contain one string and specifically do not contain another string? I mean the text of the files, not the filenames.
Conclusion:
As per comments, I ended up using:
find . -name "*.html" -exec grep -lR 'base\-maps' {} \; | xargs grep -L 'base\-maps\-bot'
This returned files with "base-maps" and not "base-maps-bot". Thank you!!
Try this:
grep -rl <string-to-match> | xargs grep -L <string-not-to-match>
Explanation: grep -lr makes grep recursively (r) output a list (l) of all files that contain <string-to-match>. xargs passes these files as arguments to grep -L, which only outputs a filename when that file does not contain <string-not-to-match>.
The use of xargs in the answers above is not necessary; you can achieve the same thing like this:
find . -type f -exec grep -q <string-to-match> {} \; -not -exec grep -q <string-not-to-match> {} \; -print
grep -q means run quietly but return an exit code indicating whether a match was found; find can then use that exit code to decide whether to keep evaluating the rest of its expression. If -exec grep -q <string-to-match> {} \; returns 0 (match found), find goes on to evaluate -not -exec grep -q <string-not-to-match> {} \;. If that second grep returns non-zero (no match), -not turns the result into true and find goes on to -print, which prints the name of the file.
As another answer has noted, using find in this way has major advantages over grep -Rl where you only want to search files of a certain type. If, on the other hand, you really want to search all files, grep -Rl is probably quicker, as it uses one grep process to perform the first filter for all files, instead of a separate grep process for each file.
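A small sketch of the find-only approach on throwaway files may make the exit-code logic easier to follow. The three HTML files are hypothetical; the strings are the ones from the question.

```shell
# Hypothetical demo data covering the three possible cases.
tmp=$(mktemp -d)
printf 'uses base-maps\n'                   > "$tmp/keep.html"
printf 'uses base-maps and base-maps-bot\n' > "$tmp/drop.html"
printf 'nothing relevant\n'                 > "$tmp/none.html"

# First -exec must succeed (string present), second must fail
# (string absent, inverted by -not) before -print is reached.
result=$(cd "$tmp" && find . -type f \
  -exec grep -q -e 'base-maps' {} \; \
  -not -exec grep -q -e 'base-maps-bot' {} \; \
  -print)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Only keep.html passes both tests: none.html fails the first grep, and drop.html is rejected by the negated second one.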
These answers seem off, as they match BOTH strings. The following command should work better:
grep -l <string-to-match> * | xargs grep -c <string-not-to-match> | grep ':0$'
Here is a more generic construction:
find . -name <nameFilter> -print0 | xargs -0 grep -Z -l <patternYes> | xargs -0 grep -L <patternNo>
This command outputs files whose name matches <nameFilter> (adjust find predicates as you need) which contain <patternYes>, but do not contain <patternNo>.
The enhancements are:
It works with filenames containing whitespace.
It lets you filter files by name.
If you don't need to filter by name (one often wants to consider all the files in current directory), you can strip find and add -R to the first grep:
grep -R -Z -l <patternYes> | xargs -0 grep -L <patternNo>
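The whitespace claim can be checked with a short sketch. The file names (two of them containing a space) and the patterns alpha and beta are made up; -Z and xargs -0 are GNU/BSD extensions, as in the commands above.

```shell
# Hypothetical demo data with spaces in two of the names.
tmp=$(mktemp -d)
printf 'alpha\n'       > "$tmp/yes only.txt"
printf 'alpha\nbeta\n' > "$tmp/both.txt"
printf 'beta\n'        > "$tmp/no only.txt"

# Stage 1 lists files containing alpha (NUL-terminated via -Z),
# stage 2 keeps those that do not contain beta; sort gives stable output.
result=$(cd "$tmp" && find . -name '*.txt' -print0 |
  xargs -0 grep -Z -l -e 'alpha' |
  xargs -0 grep -L -e 'beta' | sort)
printf '%s\n' "$result"
rm -rf "$tmp"
```

The final grep -L prints newline-terminated names, so a further NUL-safe stage would need -Z there as well.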
find . -maxdepth 1 -name "*.py" -exec grep -L "string-not-to-match" {} \;
This command lists all ".py" files in the current directory (not its subdirectories) that don't contain "string-not-to-match".
To match lines that contain string A but not string B or string C, I use the following. The quotes allow a search string to contain a space:
grep -r <string A> | grep -v -e <string B> -e "<string C>" | awk -F ':' '{print $1}'
Explanation: grep -r recursively finds all matching lines and prints them in the format
filename: line
grep -v then drops those lines that also contain either string B or string C (each given with -e). Finally, awk prints only the first field (the filename), using the colon as the field separator (-F). Note that this filters line by line, not file by file.

Resources