Why 'grep -l' is not matching all files containing a certain string? - linux

From time to time I run into a very weird behavior with find + grep, and I haven't found anything written about it.
At work I often have to search through a large number of log files, looking for a certain string.
Because of its excellent performance I rely heavily on grep -l for this.
I use commands like this:
find . -type f -name "*log*" -exec grep -l STRING {} \; 2>/dev/null
I also have a multi-threaded program that uses find + grep -l in parallel.
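The parallel variant is roughly equivalent to the sketch below; the exact -P and -n values are only illustrative assumptions, not the real program:
find . -type f -name "*log*" -print0 2>/dev/null |
  xargs -0 -P 4 -n 100 grep -l STRING 2>/dev/null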
The problem is that sometimes some files are not found during the search, even though they contain the string I am interested in. Then, when I execute the same command a second time, the search works and shows me all the files I am interested in.
This seems to be a very intermittent issue and I have no idea what I should check.
Any idea what could cause this? Could it be a problem with how find passes the file names to grep? Is it a grep problem? Could it be related to the large number of files we search at a given time?
Thanks.

Related

Find command - How to cut short after finding first file

I am using the find command to check whether a certain pattern of file exists anywhere within a directory tree.
Once the first file is found the checking can stop, because the answer is "yes". How can I stop find from continuing the unnecessary search for other files?
Limiting -maxdepth does not work, for the obvious reason that I am checking anywhere down the tree.
I tried -exec exit \; and -exec quit \;, hoping there was a Linux command to call via -exec that would stop processing. Should I write a script (to call via -exec above) that kills the find process but lets my own script keep running?
Additional detail: I am calling find from a Perl script. I don't necessarily have to use find if there are other tools. I may have to resort to walking the directory tree via a longer Perl script that I can stop when I want.
I also looked into the -prune option, but it seems to apply only up front (globally); I can't change it in the middle of processing.
This is one instance of my find command that worked and returned all occurrences of the file pattern:
find /media/pk/mm2020A1/00.staging /media/pk/mm2020A1/x.folders -name hevc -o -name 'HEVC' -o -name '265'
It sounds like you want something along the lines of
find . -name '*.csv' | wc -l
and then to ask whether that count is -gt 0, with the detail that we'd like to exit early if possible, to conserve compute resources.
Well, here's a start:
find . -name '*.csv' | head -1
It doesn't exactly bail after finding the first match, since there's a race condition, but it keeps you from spending two minutes recursing down a deep directory tree. In particular, after receiving the first result, head will close() its stdin, so find won't be able to write to stdout any more and will soon exit.
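A minimal sketch of wiring that into a yes/no test (the message is just illustrative):
if [ -n "$(find . -name '*.csv' 2>/dev/null | head -n 1)" ]; then
    echo "at least one .csv exists somewhere below ."
fi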
I don't know your business use case, but you may find it convenient and performant to record
find . -ls | sort > files.txt
every now and again, and have your script consult that file. It typically takes less time to access those stored results than to re-run find, that is, to once again recurse through the directory trees. Why? It's a random I/O versus sequential access story.
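Consulting the cached listing might then look something like this (files.txt and the hevc pattern come from the commands above; -q just turns the match into an exit status):
grep -q 'hevc' files.txt && echo "found a matching entry in the cached listing"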
You can exit earlier if you adopt this technique:
use Path::Class;
dir('.')->recurse( ...

My server not responding to grep

I am trying to use this command on my server
grep -lr --include=*.php "eval(base64_decode" /path/to/webroot
Absolutely nothing happens, no response from the server.
Can anyone help me out?
I am not an experienced Linux user.
The GNU folks messed up when they gave grep arguments to recursively search for files. Forget you ever heard of -r or --include and rewrite your script to use find to find the files and grep to Globally search for a Regular Expression and Print (g/re/p) the result from each file (see the huge clues in the tool names?). For example:
find /path/to/webroot -name '*.php' -print0 |
xargs -0 grep -l 'eval(base64_decode'
If that still gives you an issue then step 1 in debugging it is to run the find on its own and see if it produces a list of files. If so, then step 2 is to run the grep alone on one of the files output by find. If you can't figure it out from that, let us know.
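Those two debugging steps might look roughly like this (the file name in step 2 is a placeholder for one of the paths printed by step 1):
# step 1: does find produce a list of files at all?
find /path/to/webroot -name '*.php' | head
# step 2: does grep match inside one of those files?
grep -l 'eval(base64_decode' /path/to/webroot/some/file.php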

Multithreaded Bash in while loop

I have the following Bash one-liner which should iterate through all the *.xml files in the folder, check whether they contain the string below, and if not, rename them by appending .empty to the filename
find -name '*.xml' | xargs -I{} grep -LZ "state=\"open\"" {} | while IFS= read -rd '' x; do mv "$x" "$x".empty ; done
This process is very slow, and when running this script in folders with over 100k files, it takes well over 15 minutes to complete. I couldn't find a way to make this process run in parallel.
Note that with a for loop I'm hitting "too many arguments" errors, due to the large number of files.
Can anyone think of a solution?
Thanks!
Roy
The biggest bottleneck in your code is that you are running a separate mv process (which is just a wrapper around a system call) to rename each file. Let's say you have 100,000 files, and 20,000 of them need to be renamed. Your original code will need 120,000 processes, one grep per file and one mv per rename. (Ignoring the 2 calls to find and xargs.)
A better approach would be to use a language that can access the system call directly. Here is a simple Perl example:
find -name '*.xml' | xargs -I{} grep -LZ "state=\"open\"" {} |
perl -n0e 'chomp; rename("$_", "$_.empty")'
This replaces 20,000 calls to mv with a single call to perl.
The other bottleneck is running a single grep process for each file. Instead, you'd like to pass as many files as possible to grep each time. There is no need for xargs here; use the -exec primary of find instead.
find -name '*.xml' -exec grep -LZ "state=\"open\"" {} + |
perl -n0e 'chomp; rename("$_", "$_.empty")'
The too many arguments error you were receiving is based on total argument length. Suppose the limit is 4096, and your XML files have an average name length of 20 characters. This means you should be able to pass 200+ files to each call to grep. The -exec ... + primary takes care of passing as many files as possible to each call to grep, so this code at most will require 100,000 / 200 = 500 calls to grep, a vast improvement.
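If you want to know the real limit on your system rather than assuming 4096, getconf will report it (the value varies by platform and is usually much larger):
getconf ARG_MAX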
Depending on the size of the files, it might be faster to read each file in the Perl process to check for the string to match. However, grep is very well optimized, and the code to do so, while not terribly complicated, is still more than you can comfortably write in a one-liner. This should be a good balance between speed and simplicity.
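If you still want the parallelism the title asks for, one option (not part of the answer above, just a sketch) is to let GNU xargs fan the grep calls out across several processes; the -P and -n values are arbitrary assumptions:
find . -name '*.xml' -print0 |
  xargs -0 -P 4 -n 500 grep -LZ 'state="open"' |
  perl -n0e 'chomp; rename("$_", "$_.empty")'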

Unix/Bash: Redirect results of find command so files are used as input for other command

I've got a directory structure that contains many different files named foo.sql. I want to be able to cd into this directory & issue a command like the following:
find . -name "foo.sql" -exec mysql -uUserName -pUserPasswd < {} \;
where {} is the relative path to each foo.sql file. Basically, I want:
mysql -uUserName -pUserPasswd < path/to/foo.sql
to be run once for each foo.sql file under my subdirectory. I've tried Google & it's been not much help. Ideally this would be part of a UNIX shell script.
Thanks in advance, & sorry if it's been asked before.
The -exec option doesn't run a shell, so it can't process shell operators like redirection. Try this:
find . -name "foo.sql" -exec cat {} + | mysql -uUserName -pUserPasswd
cat {} will write the contents of all the files to the pipe, which will then be read by mysql.
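If you really do need a separate mysql invocation per file (nothing in the question requires it, and this is not part of the answer above), a common variant is to have find start a shell so the redirection happens once per file; the credentials are placeholders:
find . -name "foo.sql" -exec sh -c 'mysql -uUserName -pUserPasswd < "$1"' sh {} \;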
Or, just to point out another approach:
find . | xargs cat | mysql etcetera
xargs is a generic pipe operation roughly equivalent to find's -exec. It has some advantages and some disadvantages, depending on what you're doing. I tend to use it because I'm often filtering the list of found files in an earlier pipeline stage before operating on them.
There are also other ways of assembling such command lines. One nice thing about Unix's generic toolkits is that there are usually multiple solutions, each with its own tradeoffs.

Find in Linux combined with a search to return a particular line

I'm trying to return a particular line from files found from this search:
find . -name "database.php"
Each of these files contains a database name, next to a PHP variable like $dbname=
I've been trying to use -exec to execute a grep search on this file with no success
-exec "grep {\}\ dbname"
Can anyone provide me with some understanding of how to accomplish this task?
I'm running CentOS 5, and there are about 100 database.php files stored in subdirectories on my server.
Thanks
Jason
You have the arguments to grep inverted, and you need them as separate arguments:
find . -name "database.php" -exec grep '$dbname' /dev/null {} +
The presence of /dev/null ensures that the file name(s) that match are listed as well as the lines that match.
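If the grep on that CentOS box supports -H (GNU grep does), asking grep itself to always print the file name is an equivalent alternative to the /dev/null trick:
find . -name "database.php" -exec grep -H '$dbname' {} +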
I think this will do it. Not sure if you need to make any adjustments for CentOS.
find . -name "database.php" -exec grep dbname {} \;
I worked it out using xargs
find . -name "database.php" -print | xargs grep \'database\'\=\> > list_of_databases
Feel free to post a better way if you find one (or what some rep for a good answer)
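One small improvement, if any of those paths can contain spaces, is the -print0 / xargs -0 pairing used elsewhere on this page; the pattern and output file are kept from the command above:
find . -name "database.php" -print0 | xargs -0 grep "'database'=>" > list_of_databases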
I tend to habitually avoid find because I've never learned how to use it properly, so the way I'd accomplish your task would be:
grep dbname **/database.php
Edit: This command won't be viable in all cases because it can potentially generate a very long argument list, whereas find executes its command on found files one by one like xargs. And, as I noted in my comment, it's possibly not very portable. But it's damn short ;)
