How can I grep while avoiding 'Too many arguments' [duplicate] - linux

I was trying to clean out some spam email and ran into an issue. The number of files in the queue was so large that my usual command couldn't process them; it gave me an error about too many arguments.
I usually do this
grep -i user@domain.com 1US* | awk -F: '{print $1}' | xargs rm
1US* can be anything in 1US[a-zA-Z]. The only thing I could make work was running this horrible contraption: one command per file prefix, with 1USa, 1USA, 1USb, and so on through the entire alphabet. I know there has to be a way to run this more efficiently.
grep -s $SPAMMER /var/mailcleaner/spool/exim_stage1/input/1USa* | awk -F: '{print $1}' | xargs rm
grep -s $SPAMMER /var/mailcleaner/spool/exim_stage1/input/1USA* | awk -F: '{print $1}' | xargs rm

Run several instances of grep. Instead of
grep -i user@domain.com 1US* | awk '{...}' | xargs rm
do
(for i in 1US*; do grep -li user@domain "$i"; done) | xargs rm
Note the -l flag, since we only want the file name of the match. This both speeds up grep (it terminates on the first match) and makes your awk script unnecessary. It could be improved further by checking grep's return status and calling rm directly, rather than going through xargs (xargs is very fragile, IMO). I'll give you the better version if you ask.
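For example, a minimal sketch of that approach (same 1US* naming assumed), calling rm directly when grep reports a match via its exit status:
for i in 1US*; do
    # grep -q: exit 0 on first match, print nothing
    grep -qi 'user@domain.com' "$i" && rm -- "$i"
done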
Hope it helps.

You can use find to find all files whose names start with the pattern '1US', then pipe the output to xargs, which takes care that the argument list doesn't grow too large, and have it handle the grep call. Note that I've used a null byte to separate filenames for xargs. This avoids problems with problematic file names. ;)
find -maxdepth 1 -name '1US*' -printf '%f\0' | xargs -0 grep -i user@domain | awk ...

The -exec argument to find is useful here; I've used this myself in similar situations.
E.g.
# List the files that match
find /path/to/input/ -type f -exec grep -qiF spammer@spammy.com \{\} \; -print
# Once you're sure you've got it right
find /path/to/input/ -type f -exec grep -qiF spammer@spammy.com \{\} \; -delete

Using xargs is more efficient than using "find ... -exec grep" because there are fewer process creations, etc.
One way to go about this would be:
ls 1US* | xargs grep -i user@domain.com | awk -F: '{print $1}' | xargs rm
(Note, though, that ls 1US* expands the glob on the ls command line, so it can hit the same argument-list limit; printf '%s\n' 1US* avoids that, because printf is a shell builtin.)
But if you want to delete every 1US* file regardless of its content, easier would be:
find . -iname "1US*" -exec rm {} \;
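As an aside, find can batch arguments itself with the -exec ... {} + terminator, which gets xargs-like efficiency without the pipe. A sketch under the question's assumptions (GNU find for -maxdepth):
find . -maxdepth 1 -name '1US*' -type f -exec grep -li 'user@domain.com' {} + | xargs rm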

Use find and a loop instead of xargs.
find . -name '1US*' | \
while IFS= read -r x; do grep -iq user@domain "$x" && rm "$x"; done
This uses pipes and a loop instead of long argument lists (for both grep and rm), which avoids the limits on the number of arguments.
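If your tools support NUL separators (GNU find and bash assumed), a variant of the same loop that also survives whitespace and newlines in filenames:
find . -name '1US*' -type f -print0 | \
while IFS= read -r -d '' x; do grep -iq user@domain "$x" && rm "$x"; done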

Related

Grep - How to concatenate filename to each returned line of file content?

I have a statement which
Finds a set of files
Cats their contents out
Then greps their contents
It is this pipeline:
find . | grep -i "Test_" | xargs cat | grep -i "start-node name="
produces an output such as:
<start-node name="Start" secure="false"/>
<start-node name="Run" secure="false"/>
What I was hoping to get is something like:
filename1-<start-node name="Start" secure="false"/>
filename2-<start-node name="Run" secure="false"/>
An easier way is to execute grep on the result of find, without xargs and cat:
grep -i "start-node name=" `find . -iname "*Test_*"`
Because you cat all the files into a single stream, grep doesn't have any filename information. You want to give all the filenames to grep as arguments:
find ... | xargs grep "<start-node name=" /dev/null
Note two additional changes: I've dropped the -i flag, since you're inspecting XML and element names are case-sensitive; and I've added /dev/null to the list of files, so that grep always has at least two files of input, even if find only gives one result. That's the portable way to get grep to print filenames.
Now, let's look at the find command. Instead of finding all files and then filtering through grep, we can use the -iregex predicate of GNU find:
find . -iregex '.*Test_.*' \( -type 'f' -o -type 'l' \) | xargs grep ...
The mixed-case pattern suggests your filenames aren't really case-insensitive, and you might not want to grep symlinks (I'm sure you don't want directories and special files passed through), in which case you can simplify (and can use portable find again):
find . -name '*Test_*' -type 'f' | xargs grep ...
Now protect against the kind of filenames that trip up pipelines, and you have
find . -name '*Test_*' -type 'f' -print0 \
| xargs -0 grep -e "<start-node name=" -- /dev/null
Alternatively, if you have GNU grep, you don't need find at all:
grep --recursive --include '*[Tt]est_*' -e "<start-node name=" .
If you just want the matching lines numbered with a generated prefix (not the real filename):
find . | grep -i "Test_" | xargs cat | grep -i "start-node name=" | awk 'BEGIN{n=0}{n=n+1;print "filename" n "-" $0}'
From man grep:
-H Always print filename headers with output lines.
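Putting that together (assuming your grep supports -H), a single pipeline that keeps the filename on every matching line; the separator is a colon rather than the dash shown above, but the filename information is there:
find . -name '*Test_*' -type f -print0 | xargs -0 grep -H "start-node name="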

Search directory and sub-directories for pattern in a file

In linux, I want to search the given directoy and its sub-folders/files for certain include and exclude pattern.
find /apps -exec grep "performance" -v "warn" {} /dev/null \;
This echoes loads of lines from the files the search goes through. I don't want that; I'd like to find the files that contain performance but do not contain warn. How do I do that?
Very close to what you have already:
find /apps -exec grep "performance" {} /dev/null \; | grep -v "warn"
Just pipe the output through a second call to grep.
To find files containing performance but not warn, list the files containing performance, then filter out the ones that contain warn. You need separate calls to grep for each filter. Use the -l option to grep so that it only prints out file names and not matching lines. Use xargs to pass the file names from the first pass to the command line of the second-pass grep.
find /apps -type f -exec grep -l "performance" /dev/null {} + |
sed 's/[[:blank:]\"'\'']/\\&/g' |
xargs grep -L "warn"
(The sed call in the middle is there because xargs expects a weirdly quoted input format that doesn't correspond to what any other command produces.)
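With GNU grep, the -Z flag makes -l print NUL-terminated filenames, which sidesteps the quoting problem entirely; a sketch assuming GNU tools:
find /apps -type f -exec grep -lZ "performance" {} + | xargs -0 grep -L "warn"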
Using the -exec option of find is less efficient than piping to xargs, because of the extra process creations:
find /apps -print0 | xargs -0 grep -n -v "warn" | grep "performance"
This probably also solves your problem with printing unwanted output. You will probably also want to use the -name option to filter for specific files.
find /apps -name '*.ext' -print0 | xargs -0 grep -n -v "warn" | grep "performance"
If you want to find files that do not contain "warn" at all, grep -v is not what you want: it prints all lines not containing "warn", but it will not tell you whether the file as a whole contains "warn".
find /apps -type f -print0 | while IFS= read -r -d '' f; do
grep -q performance "$f" && ! grep -q warn "$f" && echo "$f"
done

Unix Command to List files containing string but *NOT* containing another string

How do I recursively view a list of files that has one string and specifically doesn't have another string? Also, I mean to evaluate the text of the files, not the filenames.
Conclusion:
As per comments, I ended up using:
find . -name "*.html" -exec grep -lR 'base\-maps' {} \; | xargs grep -L 'base\-maps\-bot'
This returned files with "base-maps" and not "base-maps-bot". Thank you!!
Try this:
grep -rl <string-to-match> | xargs grep -L <string-not-to-match>
Explanation: grep -lr makes grep recursively (r) output a list (l) of all files that contain <string-to-match>. xargs loops over these files, calling grep -L on each one of them. grep -L will only output the filename when the file does not contain <string-not-to-match>.
The use of xargs in the answers above is not necessary; you can achieve the same thing like this:
find . -type f -exec grep -q <string-to-match> {} \; -not -exec grep -q <string-not-to-match> {} \; -print
grep -q means run quietly but return an exit code indicating whether a match was found; find can then use that exit code to determine whether to keep executing the rest of its options. If -exec grep -q <string-to-match> {} \; returns 0, then it will go on to execute -not -exec grep -q <string-not-to-match> {} \;. If that also returns 0, it will go on to execute -print, which prints the name of the file.
As another answer has noted, using find in this way has major advantages over grep -Rl where you only want to search files of a certain type. If, on the other hand, you really want to search all files, grep -Rl is probably quicker, as it uses one grep process to perform the first filter for all files, instead of a separate grep process for each file.
These answers seem off, as they match BOTH strings. The following command should work better:
grep -l <string-to-match> * | xargs grep -c <string-not-to-match> | grep ':0$'
Here is a more generic construction:
find . -name <nameFilter> -print0 | xargs -0 grep -Z -l <patternYes> | xargs -0 grep -L <patternNo>
This command outputs files whose name matches <nameFilter> (adjust find predicates as you need) which contain <patternYes>, but do not contain <patternNo>.
The enhancements are:
It works with filenames containing whitespace.
It lets you filter files by name.
If you don't need to filter by name (one often wants to consider all the files in current directory), you can strip find and add -R to the first grep:
grep -R -Z -l <patternYes> | xargs -0 grep -L <patternNo>
find . -maxdepth 1 -name "*.py" -exec grep -L "string-not-to-match" {} \;
This command lists all ".py" files in the current directory that don't contain "string-not-to-match".
To match string A while excluding lines that also contain string B or C, I use the following (the quotes allow a search string to contain spaces):
grep -r "<string A>" | grep -v -e "<string B>" -e "<string C>" | awk -F ':' '{print $1}'
Explanation: grep -r recursively outputs every matching line in the format
filename: line
grep -v then excludes the lines that also contain string B or string C (each given with -e). Finally, awk prints only the first field (the filename), using the colon as field separator (-F ':').
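Since the filename is printed once per matching line, you may want to deduplicate the list; a sketch with the same placeholder strings:
grep -r "<string A>" | grep -v -e "<string B>" -e "<string C>" | awk -F ':' '{print $1}' | sort -u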

Delete files with string found in file - Linux cli

I am trying to delete erroneous emails based on finding the email address in the file via Linux CLI.
I can get the files with
find . | xargs grep -l email@example.com
But I cannot figure out how to delete them from there as the following code doesn't work.
rm -f | xargs find . | xargs grep -l email@example.com
Solution for your command:
grep -l email@example.com * | xargs rm
Or
for file in $(grep -l email@example.com *); do
rm -i "$file";
# ^ prompt for delete
done
For safety I normally pipe the output from find to something like awk and create a batch file with each line being "rm filename"
That way you can check it before actually running it and manually fix any odd edge cases that are difficult to do with a regex
find . | xargs grep -l email@example.com | awk '{print "rm "$1}' > doit.sh
vi doit.sh // check for murphy and his law
source doit.sh
You can use find's -exec and -delete; it will only delete a file if the grep command succeeds. grep -q keeps it from printing anything; replace the -q with -l to see which files contained the string.
find . -exec grep -q 'email@example.com' '{}' \; -delete
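To preview what would be removed before committing, you can run with -print in place of -delete first:
find . -exec grep -q 'email@example.com' '{}' \; -print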
I liked Martin Beckett's solution but found that file names with spaces could trip it up (like who uses spaces in file names, pfft :D). Also I wanted to review what was matched so I move the matched files to a local folder instead of just deleting them with the 'rm' command:
# Make a folder in the current directory to put the matched files
$ mkdir -p './matched-files'
# Create a script to move files that match the grep
# NOTE: Remove "-name '*.txt'" to allow all file extensions to be searched.
# NOTE: Edit the grep argument 'something' to what you want to search for.
$ find . -name '*.txt' -print0 | xargs -0 grep -al 'something' | awk -F '\n' '{ print "mv \""$0"\" ./matched-files" }' > doit.sh
Or, because it's possible (in Linux; idk about other OSes) to have newlines in a file name, you can use this longer, untested version (who puts newlines in filenames? pfft :D):
$ find . -name '*.txt' -print0 | xargs -0 grep -alZ 'something' | awk -F '\0' '{ for (x=1; x<NF; x++) print "mv \""$x"\" ./matched-files" }' > doit.sh
# Evaluate the file following the 'source' command as a list of commands executed in the current context:
$ source doit.sh
NOTE: I had issues where grep could not match inside files that had UTF-16 encoding.
See here for a workaround. In case that website disappears: use grep's -a flag, which makes grep treat the files as text, together with a regex pattern that matches the stray bytes inside each extended character. For example, to match Entité do this:
grep -a 'Entit.e'
and if that doesn't work then try this:
grep -a 'E.n.t.i.t.e'
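Another workaround for the encoding issue (assuming iconv is available; file.txt is a placeholder name) is to convert the file to UTF-8 before grepping:
iconv -f UTF-16 -t UTF-8 file.txt | grep 'Entité'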
Despite Martin's safe answer, if you're certain about what you want to delete, such as in a script, I've used this with greater success than any other one-liner suggested here (note the added xargs, so that grep searches the files rather than its standard input):
$ find . -type f | xargs grep -l email@example.com | xargs -I {} rm -rf {}
But I rather find by name:
$ find . -iname '*something*' | xargs -I {} echo {}
rm -f `find . | xargs grep -li email@example.com`
does the job better. Use `...` to run the command that produces the names of the files containing email@example.com (grep -l lists them, -i ignores case) and remove them with rm (-f forcibly / -i interactively).
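The same command with the more readable $( ) form of command substitution (the same caveats about unusual filenames apply):
rm -f $(find . | xargs grep -li email@example.com)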
find . | xargs grep -l email@example.com
how to remove:
rm -f `find . | xargs grep -l email@example.com`
Quick and efficient. Replace find_files_having_this_text with the text you want to search for.
grep -Ril 'find_files_having_this_text' . | xargs rm

Write output out of grep into a file on Linux?

find . -name "*.php" | xargs grep -i -n "searchstring" >output.txt
Here I am trying to write data into a file which is not happening...
How about appending results using >>?
find . -name "*.php" | xargs grep -i -n "searchstring" >> output.txt
I haven't got a Linux box with me right now, so I'll try to improvise.
The xargs grep -i -n "searchstring" bothers me a bit.
Perhaps you meant xargs -I {} grep -i "searchstring" {}, or just xargs grep -i "searchstring"?
Since grep's -n argument only adds line numbers to the output, I doubt it is what you needed.
This way, your final code would be
find . -name "*.php" | xargs grep -i "searchstring" >> output.txt
find . -name "*.php" -exec grep -i -n "function" {} \; >output.txt
But you won't know what file it came from. You might want:
find . -name "*.php" -exec grep -i -Hn "function" {} \; >output.txt
instead.
I guess that you have spaces in the php filenames. If you hand them to grep through xargs in the way that you do, the names get split into parts and grep interprets those parts as filenames which it then cannot find.
There is a solution for that. find has a -print0 option that instructs find to separate results by a NUL byte and xargs has a -0 option that instructs xargs to expect a NUL byte as separator. Using those you get:
find . -name "*.php" -print0 | xargs -0 grep -i -n "searchstring" > output.txt
Try using line-buffered
grep --line-buffered
[edit]
I ran your original command on my box and it seems to work fine, so I'm not sure anymore.
Looks fine to me. What happens if you remove >output.txt?
If you're searching trees of source code, please consider using ack. To do what you're doing in ack, regardless of there being spaces in filenames, you'd do:
ack --php -i searchstring > output.txt
I always use the following command. It displays the output on the console and also writes it to the file:
grep -r "string to be searched" . 2>&1 | tee /your/path/to/file/filename.txt
Check free disk space by
$ df -Th
There might not be enough free space on your disk.
