xargs inconsistent behavior and -n1 parameter - linux

I have a shell script
find . -name "*.java" -print0 | xargs -0 grep -Lz 'regular_expression'
which outputs file names not matching the regexp in this way:
file1.java
file2.java
...
The way I understand it, it works as follows: find finds the needed files and terminates each name with \0. Then xargs splits the output of find on \0 and feeds the names to grep one by one.
Then I wanted to add one more stage to get only the basename of each file. I modified the script:
find . -name "*.java" -print0 | xargs -0 grep -LzZ 'regular_expression' | xargs -0 basename
but got an error. I started investigating and added a temporary echo to see what was actually being run:
find . -name "*.java" -print0 | xargs -0 grep -LzZ 'regular_expression' | xargs -0 echo basename
and got this:
basename ./file1.java ./file2.java ./subdir/file1.java ./subdir/file2.java
So, the file names were not split by \0. I can't understand why they are split when xargs is used with grep but not when xargs is used with basename.
I found a workaround by using -n1 in the latter xargs. But I still don't understand why I needed it (given I didn't use it in the xargs with grep) or what this parameter does.
Hope you can explain to me what -n1 does and why I needed it in the latter usage and didn't need it in the former with grep.

-n1 tells xargs to run the given command once per argument.
So if you have something like
echo file1 file2 file2 | xargs basename
That's equivalent to
basename file1 file2 file2
But if you do
echo file1 file2 file2 | xargs -n1 basename
That will cause xargs to run:
basename file1
basename file2
basename file2
As for xargs's -0 flag, it's the same as the --null option, which tells xargs to split its input on \0 instead of on the default whitespace and newlines. You needed it after the find because -print0 makes find end each name with \0, and you need it after grep for the same reason: -Z makes grep end each file name it prints with \0.
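A quick way to see how xargs groups items, in the spirit of the echo basename check in the question, is to put echo in front of the command. The paths below are made up, and printf '%s\0' stands in for find -print0.

```shell
# Two made-up paths, NUL-terminated the way find -print0 would emit them.
items() { printf '%s\0' ./a/File1.java ./b/File2.java; }

# Default: xargs packs as many items as fit into a single command line.
batched=$(items | xargs -0 echo basename)

# -n1: at most one item per invocation, so the command runs once per item.
per_item=$(items | xargs -0 -n1 echo basename)

# With -n1, basename receives exactly one operand each time.
names=$(items | xargs -0 -n1 basename)

printf '%s\n' "$batched" "---" "$per_item" "---" "$names"
```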

The filenames were split by \0. The difference is in the commands you're using. xargs normally takes its standard input, breaks it into a list (here, by splitting on NUL), and then passes that list as extra arguments to your command. So when you do this:
find . -name "*.java" -print0 | xargs -0 grep -Lz 'regular_expression'
What actually runs is this:
grep -Lz 'regular_expression' file1.java file2.java file3.java...
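Here, -z doesn't change how the file names get to grep: they are ordinary command-line arguments. (-z makes grep treat the contents of each input as NUL-terminated records instead of newline-terminated lines; it's -Z that makes grep end each file name it prints with NUL.)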
So, when you add another xargs that runs basename, you get this:
basename file1.java file2.java file3.java...
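But while grep takes any number of file name arguments, basename takes a single name (plus an optional suffix to strip). With three or more names it fails with an "extra operand" error, and with exactly two it silently treats the second one as a suffix.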
That's where -n 1 comes in: it tells xargs to break its list of arguments into chunks (of 1), and run the command multiple times. So what runs now is:
basename file1.java
basename file2.java
basename file3.java
...
And all the output is concatenated together onto stdout.
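A self-contained sketch of the whole pipeline from the question, run in a scratch directory with hypothetical files and TODO as a stand-in for the regular expression. It also shows basename -a (--multiple), a GNU coreutils option that accepts several names at once, as an alternative to -n1; other basename implementations may lack it.

```shell
# Scratch tree with hypothetical .java files; only file1.java contains TODO.
dir=$(mktemp -d)
mkdir -p "$dir/subdir"
printf 'class A {} // TODO\n' > "$dir/file1.java"
printf 'class B {}\n'          > "$dir/file2.java"
printf 'class C {}\n'          > "$dir/subdir/file3.java"
cd "$dir" || exit 1

# grep -L lists files WITHOUT a match; -Z ends each printed name with NUL,
# so the second xargs needs -0 too. -n1 runs basename once per file.
one_by_one=$(find . -name '*.java' -print0 \
  | xargs -0 grep -LzZ 'TODO' \
  | xargs -0 -n1 basename | sort)

# GNU basename -a accepts several operands, so -n1 is not needed here.
multiple=$(find . -name '*.java' -print0 \
  | xargs -0 grep -LzZ 'TODO' \
  | xargs -0 basename -a | sort)

cd / && rm -rf "$dir"
printf '%s\n' "$one_by_one"
```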

Related

How do I change a string in all files with the same name (e.g. data.txt) in all subdirectories on Linux using the terminal?

find . -name "data.txt" -print0 | grep -rl "pa028" ./ |xargs -0 sed -i '' -e 's/pa028/pa014/g'
I tried to replace pa028 with pa014 in every file named "data.txt" in all subdirectories. Can you please correct me?
You can't put grep between find -print0 and xargs -0 because grep operates on lines, and this pipeline contains null-separated text instead of lines. Additionally, grep -rl "pa028" ./ searches the directory tree itself and ignores the standard input you so expensively set up find to produce.
find . -name "data.txt" -exec grep -q "pa028" {} \; -print0 |
xargs -r -0 sed -i -e 's/pa028/pa014/g'
The logic here is to use -exec grep -q as a predicate to find, so it produces a null-terminated list of only the matching files (those for which the -exec returns true) to pass to xargs -r -0. (The -r option, a GNU extension, is important too; without it, xargs runs sed once even when find produced no output, and you get confusing errors.) Note that sed -i '' is BSD/macOS syntax; GNU sed on Linux takes the optional backup suffix attached to -i, so there you write sed -i -e as above.
There is an extension to GNU grep to operate on null-terminated strings with -z and print null-terminated file names with -Z -l but that's a fairly recent development, so I'm not yet prepared to recommend that.
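A minimal sketch of the find/xargs answer, run in a throwaway directory with made-up data.txt contents. It assumes GNU sed and GNU xargs (for -r), i.e. a typical Linux system, so it uses sed -i -e with no '' argument; BSD/macOS sed would need sed -i '' -e instead.

```shell
# Scratch tree: two data.txt files, only one containing pa028 (made-up data).
dir=$(mktemp -d)
mkdir -p "$dir/a" "$dir/b"
printf 'id=pa028\n' > "$dir/a/data.txt"
printf 'id=pa999\n' > "$dir/b/data.txt"

# -exec grep -q acts as a filter: only files containing pa028 reach -print0.
# -r (GNU) keeps xargs from running sed at all when nothing matched.
find "$dir" -name 'data.txt' -exec grep -q 'pa028' {} \; -print0 \
  | xargs -r -0 sed -i -e 's/pa028/pa014/g'

matched=$(cat "$dir/a/data.txt")
untouched=$(cat "$dir/b/data.txt")
rm -rf "$dir"
printf '%s\n' "$matched" "$untouched"
```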

Grep - How to concatenate filename to each returned line of file content?

I have a pipeline that finds a set of files, cats their contents out, and then greps those contents:
find . | grep -i "Test_" | xargs cat | grep -i "start-node name="
produces an output such as:
<start-node name="Start" secure="false"/>
<start-node name="Run" secure="false"/>
What I was hoping to get is something like:
filename1-<start-node name="Start" secure="false"/>
filename2-<start-node name="Run" secure="false"/>
An easier way may be to pass the output of find to grep as file arguments, without xargs and cat (this breaks on file names containing whitespace):
grep -i "start-node name=" $(find . -type f | grep -i "Test_")
Because you cat all the files into a single stream, grep doesn't have any filename information. You want to give all the filenames to grep as arguments:
find ... | xargs grep "<start-node name=" /dev/null
Note two additional changes - I've dropped the -i flag, as it appears you're inspecting XML, and that's not case-insensitive; I've added /dev/null to the list of files, so that grep always has at least two files of input, even if find only gives one result. That's the portable way to get grep to print filenames.
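To see the /dev/null trick in isolation, here is a small sketch with one hypothetical file, Test_one.xml, in a scratch directory. With a single file operand, grep prints only the matching line; adding /dev/null as a second, empty operand makes it print file:line.

```shell
dir=$(mktemp -d)
printf '<start-node name="Start" secure="false"/>\n' > "$dir/Test_one.xml"

# Only one file operand: grep prints the matching line without a file name.
without=$(grep '<start-node name=' "$dir/Test_one.xml")

# /dev/null is a second (empty) operand, so grep switches to "name:line".
with=$(grep '<start-node name=' /dev/null "$dir/Test_one.xml")

rm -rf "$dir"
printf '%s\n' "$without" "$with"
```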
Now, let's look at the find command. Instead of finding all files and then filtering through grep, we can use the -iregex predicate of GNU find:
find . -iregex '.*Test_.*' \( -type 'f' -o -type 'l' \) | xargs grep ...
The mixed-case pattern suggests your filenames aren't really case-insensitive, and you might not want to grep symlinks (I'm sure you don't want directories and special files passed through), in which case you can simplify (and can use portable find again):
find . -name '*Test_*' -type 'f' | xargs grep ...
Now protect against the kind of filenames that trip up pipelines, and you have
find . -name '*Test_*' -type 'f' -print0 \
| xargs -0 grep -e "<start-node name=" -- /dev/null
Alternatively, if you have GNU grep, you don't need find at all:
grep --recursive --include '*[Tt]est_*' -e "<start-node name=" .
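The two final commands can be checked against each other in a scratch directory. This sketch uses made-up files (one in a directory whose name contains a space) and the '*[Tt]est_*' name pattern from the GNU grep variant for both commands, so their sorted output should be identical.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/sub dir"    # a space in the name, to exercise -print0/-0
printf '<start-node name="Start" secure="false"/>\n' > "$dir/Test_a.xml"
printf '<start-node name="Run" secure="false"/>\n'   > "$dir/sub dir/test_b.xml"
printf '<start-node name="Skip" secure="false"/>\n'  > "$dir/other.xml"
cd "$dir" || exit 1

# find plus NUL-safe xargs; /dev/null forces the file-name prefix.
via_find=$(find . -name '*[Tt]est_*' -type f -print0 \
  | xargs -0 grep -e '<start-node name=' -- /dev/null | sort)

# GNU grep only: recursion and name filtering without find.
via_grep=$(grep --recursive --include '*[Tt]est_*' -e '<start-node name=' . | sort)

cd / && rm -rf "$dir"
printf '%s\n' "$via_find"
```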
If you just want to number the matching lines (this prints the literal word "filename" plus a counter, not the real file names):
find . | grep -i "Test_" | xargs cat | grep -i "start-node name=" | awk 'BEGIN{n=0}{n=n+1;print "filename" n "-" $0}'
From man grep:
-H Always print filename headers with output lines.

Recursively locate all files that have string "a" AND string "b" using grep

I've been using the following command to recursively search directories for a string.
grep -Rn "myString" *
I was wondering if someone would be so kind as to teach me how to search for multiple strings in the same file recursively. That is, I want to locate all file names that contain both "String1" and "String2."
It would also be great if I could get the line number of each string within the files that contain both.
I've been trying several things without success. I want to start the search in a base directory and recursively search downward through all the subdirectories. If someone could help me with this, I would greatly appreciate it.
Pipe the results of your first search to grep again:
grep -RlZ "String1" . | xargs -0 grep -l "String2"
This would list the files containing both String1 and String2.
Getting the line numbers for the files containing both strings probably wouldn't be very efficient, since you first need to know which files those are. One way is to pipe the results to grep yet again:
grep -RlZ "String1" . | xargs -0 grep -lZ "String2" | xargs -0 grep -En 'String1|String2'
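Here is a sketch of that pipeline against a scratch directory with three hypothetical files, only one of which contains both strings. One change from the command above: the last grep gets -H, so the file name is printed even when only a single file reaches it (with one file operand, grep -En prints just line:text).

```shell
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'String1 here\nnothing\nString2 there\n' > "$dir/both.txt"
printf 'String1 only\n'                         > "$dir/sub/one.txt"
printf 'String2 only\n'                         > "$dir/sub/two.txt"
cd "$dir" || exit 1

# Stage 1: files containing String1 (NUL-terminated names).
# Stage 2: of those, the files that also contain String2.
files=$(grep -RlZ 'String1' . | xargs -0 grep -l 'String2')

# Stage 3: line numbers of either string, only in files that have both.
lines=$(grep -RlZ 'String1' . | xargs -0 grep -lZ 'String2' \
  | xargs -0 grep -Hn -E 'String1|String2')

cd / && rm -rf "$dir"
printf '%s\n' "$files" "$lines"
```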
You can have find cascade the checks for you:
find . -type f -exec fgrep -q 'myString1' {} \; \
-exec fgrep -q 'myString2' {} \; \
-exec fgrep -q 'myString3' {} \; \
-print
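A runnable sketch of the find cascade with made-up files. It uses grep -F -q, the standard equivalent of fgrep -q (fgrep is deprecated in recent GNU grep), and relies on find's implied AND: each -exec is only evaluated if the previous one succeeded.

```shell
dir=$(mktemp -d)
printf 'myString1\nmyString2\nmyString3\n' > "$dir/all.txt"
printf 'myString1\nmyString3\n'            > "$dir/missing2.txt"

# find stops evaluating a file at the first grep that fails,
# so -print runs only when all three strings were found.
found=$(find "$dir" -type f \
  -exec grep -F -q 'myString1' {} \; \
  -exec grep -F -q 'myString2' {} \; \
  -exec grep -F -q 'myString3' {} \; \
  -print)

rm -rf "$dir"
printf '%s\n' "$found"
```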
grep --null -rl String1 . | xargs -0 grep --null -l String2 | xargs -0 grep -n -e String1 -e String2
There are a few ways to do this, but since you need files with both matching strings, you can find filenames with one match, then rescan them for the second. The first grep finds filenames with the first pattern; the second re-scans those files for the second string. Finally, a third grep prints out line numbers with matches.

Read an input file one line at a time to pass as input to grep

My requirement is to read a file and then run a grep command on the read line one at a time for all the lines in the file.
Filtering the required files that match a pattern:
find . -name '*.ini' -exec grep -w HTC {} \; -print | grep ini > input.files
cat input.files
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0019.ini
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0150.ini
./PLATFORM/android/build/integration/android/suites/android_Prefs_Devices_Comms_Suite/ini/android_0616_1.ini
./PLATFORM/android/build/integration/android/suites/android_SI_Query_Suite/ini/android_0547_4.ini
./PLATFORM/android/build/integration/android/suites/android_SI_Query_Suite/ini/android_0578.ini
./PLATFORM/android/build/integration/android/suites/android_PDL_Suite/ini/android_5203_1.ini
./PLATFORM/android/build/integration/android/suites/android_System_Maintenance_Suite/ini/android_0579_2.ini
Any idea how to read one line at a time from input.files and execute a grep command on each line? I tried:
cat input.files -exec grep -w HTC_One {} \;
Apart from grep prefixing each match with its file name when it is given more than one file, there won't be any difference in the result if you say:
grep pattern file1
grep pattern file2
or
grep pattern file1 file2
Simply use xargs:
cat input.files | xargs grep -w HTC_One
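Since the question literally asks to read input.files one line at a time, this sketch shows a plain shell loop too. It builds its own input.files from two hypothetical .ini files in a scratch directory (one name contains a space, which the loop handles because IFS= read -r keeps each line intact). -H is added so the file name appears even though each grep sees only one file.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'HTC_One X\n' > 'a.ini'
printf 'Nokia\n'     > 'b c.ini'     # file name with a space
printf '%s\n' ./a.ini './b c.ini' > input.files

# One grep per line of input.files; -w matches HTC_One as a whole word.
hits=$(while IFS= read -r f; do
  grep -H -w 'HTC_One' "$f"
done < input.files)

cd / && rm -rf "$dir"
printf '%s\n' "$hits"
```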
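Use xargs instead. Unlike find -exec, xargs appends the file names to the command itself, so the {} \; placeholder is not needed (xargs would pass "{}" and ";" to grep as file names):
xargs grep -w HTC_One < input.files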
From the xargs man page:
xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input.
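The man page quote explains why the plain xargs answers break on file names containing blanks. This sketch, with one hypothetical file named 'b c.ini', compares the default splitting with GNU xargs -d '\n', which makes each input line exactly one argument (-d is a GNU extension and may not exist on other systems).

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'HTC_One\n' > 'b c.ini'         # hypothetical name containing a space
printf '%s\n' './b c.ini' > input.files

# Default splitting: "./b" and "c.ini" become two separate (missing) files.
default=$(xargs grep -l -w 'HTC_One' < input.files 2>/dev/null)

# GNU xargs -d '\n': each whole line is passed as one argument.
by_line=$(xargs -d '\n' grep -l -w 'HTC_One' < input.files)

cd / && rm -rf "$dir"
printf 'default: [%s]\nby line: [%s]\n' "$default" "$by_line"
```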

Unix Command to List files containing string but *NOT* containing another string

How do I recursively view a list of files that has one string and specifically doesn't have another string? Also, I mean to evaluate the text of the files, not the filenames.
Conclusion:
As per comments, I ended up using:
find . -name "*.html" -exec grep -lR 'base\-maps' {} \; | xargs grep -L 'base\-maps\-bot'
This returned files with "base-maps" and not "base-maps-bot". Thank you!!
Try this:
grep -rl <string-to-match> | xargs grep -L <string-not-to-match>
Explanation: grep -lr makes grep recursively (r) output a list (l) of all files that contain <string-to-match>. xargs passes these file names on to grep -L, batching as many as fit into each invocation. grep -L only outputs a file name when the file does not contain <string-not-to-match>.
The use of xargs in the answers above is not necessary; you can achieve the same thing like this:
find . -type f -exec grep -q <string-to-match> {} \; -not -exec grep -q <string-not-to-match> {} \; -print
grep -q means run quietly but return an exit code indicating whether a match was found; find can then use that exit code to decide whether to keep evaluating the rest of its expression. If -exec grep -q <string-to-match> {} \; returns 0 (a match), find goes on to evaluate -not -exec grep -q <string-not-to-match> {} \;. If that grep returns non-zero (no match), -not turns the result into true, and find goes on to execute -print, which prints the name of the file.
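A small sketch of the -exec / -not -exec filter with made-up .html files, using the base-maps strings from the question. grep -F is added so the hyphen is taken literally without the backslashes used in the question.

```shell
dir=$(mktemp -d)
printf 'base-maps\n'                > "$dir/keep.html"
printf 'base-maps\nbase-maps-bot\n' > "$dir/drop.html"
printf 'unrelated\n'                > "$dir/none.html"

# The first -exec must succeed (match); -not inverts the second one,
# so a file is printed only when the second grep finds nothing.
kept=$(find "$dir" -type f \
  -exec grep -q -F 'base-maps' {} \; \
  -not -exec grep -q -F 'base-maps-bot' {} \; \
  -print)

rm -rf "$dir"
printf '%s\n' "$kept"
```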
As another answer has noted, using find in this way has major advantages over grep -Rl where you only want to search files of a certain type. If, on the other hand, you really want to search all files, grep -Rl is probably quicker, as it uses one grep process to perform the first filter for all files, instead of a separate grep process for each file.
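These answers seem off, as they match BOTH strings. The following command should work better (-H keeps the file name even when xargs passes a single file, and ':0$' keeps only files with zero matching lines):
grep -l <string-to-match> * | xargs grep -Hc <string-not-to-match> | grep ':0$'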
Here is a more generic construction:
find . -name <nameFilter> -print0 | xargs -0 grep -Z -l <patternYes> | xargs -0 grep -L <patternNo>
This command outputs files whose name matches <nameFilter> (adjust find predicates as you need) which contain <patternYes>, but do not contain <patternNo>.
The enhancements are:
It works with filenames containing whitespace.
It lets you filter files by name.
If you don't need to filter by name (one often wants to consider all the files in current directory), you can strip find and add -R to the first grep:
grep -R -Z -l <patternYes> | xargs -0 grep -L <patternNo>
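Both variants can be tried on a scratch tree. In this sketch, <patternYes> and <patternNo> are replaced by the hypothetical strings yes and no, the name filter is '*.conf', and one directory name contains a space to show that the -print0 / -Z / -0 chain keeps names intact. The recursive variant gets an explicit . and is sorted for a stable result.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/my dir"                  # space to exercise -print0/-Z/-0
printf 'yes\n'     > "$dir/my dir/a.conf"
printf 'yes\nno\n' > "$dir/b.conf"
printf 'yes\n'     > "$dir/c.txt"      # filtered out by the name filter
cd "$dir" || exit 1

# find filters by name; grep -Z keeps names NUL-terminated for the next xargs.
generic=$(find . -name '*.conf' -print0 \
  | xargs -0 grep -Z -l 'yes' \
  | xargs -0 grep -L 'no')

# No name filter: recursive grep does the first pass itself.
recursive=$(grep -R -Z -l 'yes' . | xargs -0 grep -L 'no' | sort)

cd / && rm -rf "$dir"
printf '%s\n' "$generic" "---" "$recursive"
```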
find . -maxdepth 1 -name "*.py" -exec grep -L "string-not-to-match" {} \;
This command lists all .py files in the current directory only (-maxdepth 1) that don't contain "string-not-to-match".
To match string A while excluding lines where string B or string C is also present, I use the following (quote a search string if it contains a space):
grep -r <string A> | grep -v -e <string B> -e "<string C>" | awk -F ':' '{print $1}'
Explanation: grep -r recursively outputs every matching line in the format
filename:line
grep -v then excludes the lines that also contain either string B or string C (each given with -e). awk prints only the first field (the file name), using the colon as field separator (-F ':'). Note that the exclusion works per line: a file is still listed if any of its lines contains A without B or C.
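A sketch of the line-based approach on three made-up files, with A, B and C as the strings. sort -u is an addition not in the answer above: it collapses the repeated file names that appear when several lines in one file qualify. f3.txt is listed even though it contains B, because B is on a different line from A.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'A ok\nA ok again\n' > f1.txt    # A twice, never with B or C
printf 'A with B\n'         > f2.txt    # the only A line also has B
printf 'A x\nB y\n'         > f3.txt    # A and B, but on different lines

# grep -r prints "file:line"; grep -v drops lines containing B or C;
# awk keeps the text before the first colon; sort -u removes repeats.
names=$(grep -r 'A' . | grep -v -e 'B' -e 'C' | awk -F ':' '{print $1}' | sort -u)

cd / && rm -rf "$dir"
printf '%s\n' "$names"
```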
