Find command - How to cut short after finding first file - linux

I am using the find command to check if a certain pattern of
file exists within a directory tree. (Note, anywhere down the tree)
Once the first file is found the checking can stop because the answer is "yes".
How can I stop "find" from continuing the unnecessary search for other files?
Limiting -maxdepth does not work, for the obvious reason that I am checking
anywhere down the tree.
I tried -exec exit ; and -exec quit ;
I was hoping there was a Linux command to call via -exec that would stop processing.
Should I write a script (to call via -exec above) that kills the find process
but continues running my script?
Additional detail: I am calling find from a Perl script.
I don't necessarily have to use 'find' if there are other tools.
I may have to resort to walking the directory path via a longer Perl script
in which I can control when to stop.
I also looked into the -prune option, but it seems to apply only up front (globally);
I can't change it in the middle of processing.
This was one instance of my find command that worked and returned all occurrences of the file pattern.
find /media/pk/mm2020A1/00.staging /media/pk/mm2020A1/x.folders -name hevc -o -name 'HEVC' -o -name '265'

It sounds like you want something along the lines of
find . -name '*.csv' | wc -l
and then to ask whether that count is -gt 0, with the detail that we'd like to exit early if possible, to conserve compute resources.
Well, here's a start:
find . -name '*.csv' | head -1
It doesn't exactly bail after finding the first match, since there's a race condition, but it keeps you from spending two minutes recursing down a deep directory tree. In particular, after receiving the first result head exits and closes its end of the pipe, so find can no longer write to stdout, and it will soon exit.
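If your find is GNU find (the usual case on Linux), you can skip the pipe entirely and let find stop itself, as the -print -quit combination used later in this thread shows; a minimal sketch, reusing the '*.csv' pattern from above:
find . -name '*.csv' -print -quit    # GNU find: print the first match, then exit immediately
An empty result means no match exists anywhere down the tree.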
I don't know your business use case. But you may find it convenient and performant to record
find . -ls | sort > files.txt
every now and again, and have your script consult that file. It typically takes less time to access those stored results than to re-run find, that is, to once again recurse through the directory trees. Why? It's a random I/O versus sequential access story.
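As a rough sketch of consulting that cache (the pattern here is only illustrative; files.txt is the file recorded above):
grep -m 1 -i -E 'hevc|265' files.txt    # -m 1 makes grep stop at the first matching line
A zero exit status means at least one matching path is already in the cache.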
You can exit earlier if you adopt this technique (the callback below is only a sketch, filtering on the question's hevc/265 pattern):
use Path::Class;
dir('.')->recurse( callback => sub {
    print("$_[0]\n"), exit 0 if !$_[0]->is_dir && $_[0]->basename =~ /hevc|265/i;  # stop at the first match
} );

Related

how to efficiently find if a linux directory including subdirectories has at least 1 file

In my project, various jobs are created as files in directories inside subdirectories.
But usually I find that the jobs are mostly in some dirs and not in most of the others.
Currently I use
find $DIR -type f | head -n 1
to know if the directory has at least 1 file, but this is a waste.
Your code is already efficient, but perhaps the reason is not obvious. When you pipe the output of find to head -n 1 you probably assume that find lists all the files and then head discards everything after the first one. But that's not quite what head does.
When find lists the first file, head will print it and then exit. When find tries to write the second file name, the pipe between them is already closed, so find receives SIGPIPE. Then find will stop running, because the default action for SIGPIPE is to terminate the program that receives it.
So the cost of your pipelined commands is only the cost of finding two files, not the cost of finding all files. For most obvious use cases this should be good enough.
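If your find supports -quit (GNU find does), you can avoid even that second lookup and the pipe altogether; a sketch of the same check:
[ -n "$(find "$DIR" -type f -print -quit)" ] && echo "has at least 1 file"   # -print -quit: emit the first file, then stop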
Try this:
find -type f -printf '%h\n' | uniq
The find part finds all files but prints only each file's directory (%h). The uniq part removes consecutive duplicates.
Pitfall: like your example, it doesn't work for files with a newline in the directory path.
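If GNU tools are available, a NUL-separated variant sidesteps that pitfall (a sketch; adjust the starting directory as needed):
find . -type f -printf '%h\0' | sort -zu    # NUL-terminated directory names, fully de-duplicated
The NUL-separated output can then be fed straight to xargs -0.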
This command finds the first subdirectory containing at least one file and then stops:
find . -mindepth 1 -type d -exec bash -c 'c=$(find {} -maxdepth 1 -type f -print -quit);test "x$c" != x' \; -print -quit
The first find iterates through all subdirectories, and the second find finds the first file and then stops.

Why 'grep -l' is not matching all files containing a certain string?

From time to time I face a very weird behavior with find + grep commands. I am asking this because I haven't found anything related to this.
At my work I often have to perform a considerable search against a high amount of logs, looking for a certain string.
Due to its excellent performance I trust heavily on the command grep -l to execute this.
I use commands like this:
find . -type f -name "*log*" -exec grep -l STRING {} \; 2>/dev/null
I also have a multi-thread program that use find + grep -l in a parallel way.
The problem is that sometimes some files are not found during the search, even though they contain the string I am interested in. Then, when I execute the same command for a second time, the searching works and show me all the files I am interested in.
This seems to be a very intermittent issue and I have no idea what I should check.
Any idea of what could cause that? Could it be a problem with find that sets parameters for the grep command? Is it a grep problem? Might it be related to the high number of files we search at a certain time?
Thanks.

Multithreaded Bash in while loop

I have the following Bash one-liner which should iterate through all files named *.xml in the folder, check whether they contain the string below, and if not, rename them by appending .empty:
find -name '*.xml' | xargs -I{} grep -LZ "state=\"open\"" {} | while IFS= read -rd '' x; do mv "$x" "$x".empty ; done
This process is very slow, and when running this script in folders with over 100k files, it takes well over 15 minutes to complete.
I couldn't find a way to make this process run in parallel.
Note that with a for loop I'm hitting "too many arguments" errors, due to the large number of files.
Can anyone think of a solution ?
Thanks !
Roy
The biggest bottleneck in your code is that you are running a separate mv process (which is just a wrapper around a system call) to rename each file. Let's say you have 100,000 files, and 20,000 of them need to be renamed. Your original code will need 120,000 processes, one grep per file and one mv per rename. (Ignoring the 2 calls to find and xargs.)
A better approach would be to use a language that can access the system call directly. Here is a simple Perl example:
find -name '*.xml' | xargs -I{} grep -LZ "state=\"open\"" {} |
perl -n0le 'rename("$_", "$_.empty")'
This replaces 20,000 calls to mv with a single call to perl.
The other bottleneck is running a single grep process for each file. Instead, you'd like to pass as many files as possible to grep each time. There is no need for xargs here; use find's -exec ... + primary instead.
find -name '*.xml' -exec grep -LZ "state=\"open\"" {} + |
perl -n0le 'rename("$_", "$_.empty")'
The too many arguments error you were receiving is based on total argument length. Suppose the limit is 4096, and your XML files have an average name length of 20 characters. This means you should be able to pass 200+ files to each call to grep. The -exec ... + primary takes care of passing as many files as possible to each call to grep, so this code at most will require 100,000 / 200 = 500 calls to grep, a vast improvement.
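The 4096 figure above is only a supposition; on modern Linux the real limit is typically much larger, and you can query it yourself:
getconf ARG_MAX                  # maximum combined size of arguments plus environment, in bytes
xargs --show-limits </dev/null   # GNU xargs reports the limits it will actually apply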
Depending on the size of the files, it might be faster to read each file in the Perl process to check for the string to match. However, grep is very well optimized, and the code to do so, while not terribly complicated, is still more than you can comfortably write in a one-liner. This should be a good balance between speed and simplicity.

renaming with find

I managed to find several files with the find command.
The files are of the type file_sakfksanf.txt, file_afsjnanfs.pdf, file_afsnjnjans.cpp.
Now I want to rename them, with rename and the -exec command, to
mywish_sakfksanf.txt, mywish_afsjnanfs.pdf, mywish_afsnjnjans.cpp
so that only the prefix is changed. I have been trying for some time, so don't blame me for being stupid.
If you read through the -exec section of the man pages for find you will come across the {} string that allows you to use the matches as arguments within -exec. This will allow you to use rename on your find matches in the following way:
find . -name 'file_*' -exec rename 's/file_/mywish_/' {} \;
From the manual:
-exec command ;
    Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of `;' is encountered. The string `{}' is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find. Both of these constructions might need to be escaped (with a `\') or quoted to protect them from expansion by the shell. See the EXAMPLES section for examples of the use of the -exec option. The specified command is run once for each matched file. The command is executed in the starting directory. There are unavoidable security problems surrounding use of the -exec action; you should use the -execdir option instead.
Although you asked for a find/exec solution, as Mark Reed suggested, you might want to consider piping your results to xargs. If you do, make sure to use the -print0 option with find and either the -0 or --null option with xargs to avoid unexpected behaviour resulting from whitespace or shell metacharacters appearing in your file names. Also, consider using the + version of -exec (also in the manual), as it is in the POSIX spec for find and should therefore be more portable if you want to run your command elsewhere (not always true); it also builds its command line in a way similar to xargs, which should result in fewer invocations of rename. Both variants are sketched below.
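A sketch of those two variants, assuming the same Perl-based rename as in the answer above:
find . -name 'file_*' -print0 | xargs -0 rename 's/file_/mywish_/'    # NUL-safe pipe into xargs
find . -name 'file_*' -exec rename 's/file_/mywish_/' {} +            # -exec ... + batches files, no xargs needed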
I don't think there's a way you can do this with just find; you'll need to create a script:
#!/bin/bash
NEW=$(echo "$1" | sed -e 's/file_/mywish_/')
mv "$1" "${NEW}"
Then you can:
find ./ -name 'file_*' -exec my_script {} \;

Find in Linux combined with a search to return a particular line

I'm trying to return a particular line from files found from this search:
find . -name "database.php"
Each of these files contains a database name, next to a PHP variable like $dbname=
I've been trying to use -exec to execute a grep search on this file with no success
-exec "grep {\}\ dbname"
Can anyone provide me with some understanding of how to accomplish this task?
I'm running CentOS 5, and there are about 100 database.php files stored in subdirectories on my server.
Thanks
Jason
You have the arguments to grep inverted, and you need them as separate arguments:
find . -name "database.php" -exec grep '$dbname' /dev/null {} +
The presence of /dev/null ensures that the file name(s) that match are listed as well as the lines that match.
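If your grep is GNU grep (the usual case on CentOS), the -H option has the same effect as the /dev/null trick, forcing the file name to be printed even when a grep invocation is handed a single file:
find . -name "database.php" -exec grep -H '$dbname' {} +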
I think this will do it. Not sure if you need to make any adjustments for CentOS.
find . -name "database.php" -exec grep dbname {} \;
I worked it out using xargs
find . -name "database.php" -print | xargs grep \'database\'\=\> > list_of_databases
Feel free to post a better way if you find one (or want some rep for a good answer).
I tend to habitually avoid find because I've never learned how to use it properly, so the way I'd accomplish your task would be:
grep dbname **/database.php
Edit: This command won't be viable in all cases because it can potentially generate a very long argument list, whereas find executes its command on found files one by one like xargs. And, as I noted in my comment, it's possibly not very portable. But it's damn short ;)
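One more caveat to hedge that with: in bash the ** glob only recurses when the globstar option is enabled (bash 4 or newer, so it may not be available on CentOS 5's older bash), while zsh supports it out of the box:
shopt -s globstar            # bash 4+: let ** match across subdirectories
grep dbname **/database.php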
