Linux Shell Command: Find. How to Sort and Exec without using Pipes? - linux

Linux command find with argument exec does a GREAT job executing commands on files/folders regardless whether they contain spaces and special characters. For example:
find . -type f -exec md5sum {} \;
Works great to run md5sum on each file in a directory tree, but executes in a random order. Find does not sort the results, and requires piping to sort to get results in a more human-readable ordering. However, piping to sort eliminates the benefits of exec.
This does not work:
find . -type f | sort | md5sum
Because some filenames contain spaces and special characters.
Also does not work:
find . -type f | sort | sed 's/ /\\ /g' | md5sum
Still does not recognize spaces are part of the filename.
I suppose I can always sort the final result later, but wonder if someone knows an easy way to avoid that extra step by sorting within find?

With BSD find
A -s argument is available to request lexographic sort order.
find . -s -type f -exec md5sum -- '{}' +
With GNU find
Use NUL delimiters to allow filenames to be processed unambiguously. Assuming you have GNU tools:
find . -type f -print0 | sort -z | xargs -0 md5sum

Found a working solution
find . -type f -exec md5sum {} + | sort -k 1.33
Sorts the results by comparing the characters starting after the 32 character md5sum result, producing a readable/sorted list.

Related

How to pipe a list of files returned from find to cat and sort them

I'm trying to find all the files from a folder and then print them but sorted.
I have this so far
find . -type f -exec cat {} \;
and it print's all files but I need to sort them too but when I do
find . -type f -exec sort cat {};
I get the next error
sort:cannot read:cat:No such file or directory
and if I switch sort and cat like this
find . -type f -exec cat sort {} \;
I get the same error the it print's the file(I have only one file to print)
It's not clear to me if you want to display the contents of the files unchanged sorting the files by name, or if you want to sort the contents of each file. If the latter:
find . -type f -exec sort {} \;
If the former, use bsd find's -s option:
find -s . -type f -exec cat {} \;
If you don't have bsd find, use:
find . -type f -print0 | sort -z | xargs -0 cat
Composing commands using pipes is often the simplest solution.
find . -print0 -type f | sort | xargs -0 cat
Explanation: you can sort filenames after the fact using ... | sort, then pass the output (the list of files) to cat using xargs, i.e. ... | xargs cat.
As #arkaduisz points out, when using pipes, should carefully handle filenames containing whitespaces (thus using -print0 and -0).

How to pipe grep into a program that uses a directory with only read permissions

I have a remote directory that I do not have any permissions to other than to read files.
Typically I run a custom server-wide script, dostuff, as such:
dostuff /path/to/images/*img
However, this directory accidentally has two distinct sets of files (same number of files each) with very similar names:
13-08_1_0XXX.img
13-08_1_0XXX_16YYYY.img
where Xs increments from 1-900 together, and Ys have somewhat arbitrary numbers.
For example, I can regex select filenames for the first set with:
find . | grep -E -w -o ".{12}.img"
So I tried
find /path/to/images/ | grep -E -w -o ".{12}.img" | dostuff
but that does not work. As I can't move the files or copy them elsewhere, I think the only solution is to figure out how to pipe them into the script as two individual sets of images. Any suggestions would be appreciated!
Assuming you have GNU findutils, following should work:
$ find /path/to/imagedir -regextype posix-extended \
-regex '.*/13-08_1_0([1-9][0-9][0-8]|900)(.img|_16[0-9]+{4}.img)' -exec dosomestuff {} \;
-regextype posix-extended -regex option in find gives grep -E type functionality.
{} is the matched pattern found.
-exec dosomestuff {} would _do_some_stuff_ to the matched {}.
You can put the find...|grep... command in backquotes, and make that the argument to dostuff. Better is piping to find ... | grep ... | xargs dostuff. xargs will invoke dostuff with the (stdin stream of) filenames as an argument list. If the list of files would exceed max bash command line length (32KB), xargs breaks it into multiple invocations of dostuff.

Linux: Redirecting output of a command to "find"

I have a list of file names as output of certain command.
I need to find each of these files in a given directory.
I tried following command:
ls -R /home/ABC/testDir/ | grep "\.java" | xargs find /home/ABC/someAnotherDir -iname
But it is giving me following error:
find: paths must precede expression: XYZ.java
What would be the right way to do it?
ls -R /home/ABC/testDir/ | grep -F .java |
while read f; do find . -iname "$(basename $f)"; done
You can also use ${f##*/} instead of basename. Or;
find /home/ABC/testDir -iname '*.java*' |
while read f; do find . -iname "${f##*/}"; done
Note that, undoubtedly, many people will object to parsing the output of ls or find without using a null byte as filename separater, claiming that whitespace in filenames will cause problems. Those people usually ignore newlines in filenames, and their objections can be safely ignored. (As long as you don't allow whitespace in your filenames, that is!)
A better option is:
find /home/ABC/testDir -iname '*.java' -exec find . -iname {}
The reason xargs doesn't work is that is that you cannot pass 2 arguments to -iname within find.
find /home/ABC/testDir -name "\.java"

Linux command: How to 'find' only text files?

After a few searches from Google, what I come up with is:
find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text
which is very unhandy and outputs unneeded texts such as mime type information. Any better solutions? I have lots of images and other binary files in the same folder with a lot of text files that I need to search through.
I know this is an old thread, but I stumbled across it and thought I'd share my method which I have found to be a very fast way to use find to find only non-binary files:
find . -type f -exec grep -Iq . {} \; -print
The -I option to grep tells it to immediately ignore binary files and the . option along with the -q will make it immediately match text files so it goes very fast. You can change the -print to a -print0 for piping into an xargs -0 or something if you are concerned about spaces (thanks for the tip, #lucas.werkmeister!)
Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn't hurt anything just having it there all the time if you want to put this in an alias or something.
EDIT: As #ruslan correctly pointed out, the -and can be omitted since it is implied.
Based on this SO question :
grep -rIl "needle text" my_folder
Why is it unhandy? If you need to use it often, and don't want to type it every time just define a bash function for it:
function findTextInAsciiFiles {
# usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}
put it in your .bashrc and then just run:
findTextInAsciiFiles your_folder "needle text"
whenever you want.
EDIT to reflect OP's edit:
if you want to cut out mime informations you could just add a further stage to the pipeline that filters out mime informations. This should do the trick, by taking only what comes before :: cut -d':' -f1:
function findTextInAsciiFiles {
# usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}
find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"
This is unfortunately not space save. Putting this into bash script makes it a bit easier.
This is space safe:
#!/bin/bash
#if [ ! "$1" ] ; then
echo "Usage: $0 <search>";
exit
fi
find . -type f -print0 \
| xargs -0 file \
| grep -P text \
| cut -d: -f1 \
| xargs -i% grep -Pil "$1" "%"
Another way of doing this:
# find . |xargs file {} \; |grep "ASCII text"
If you want empty files too:
# find . |xargs file {} \; |egrep "ASCII text|empty"
How about this:
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'
If you want the filenames without the file types, just add a final sed filter.
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
You can filter-out unneeded file types by adding more -e 'type' options to the last grep command.
EDIT:
If your xargs version supports the -d option, the commands above become simpler:
$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
Here's how I've done it ...
1 . make a small script to test if a file is plain text
istext:
#!/bin/bash
[[ "$(file -bi $1)" == *"file"* ]]
2 . use find as before
find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;
Here's a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.
If you were to write out the problem in steps, it would look like this:
// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename
To achieve this, we can use three UNIX commands: find, file, and grep.
find will check every file in the directory.
file will give us the filetype. In our case, we're looking for a return of 'ASCII text'
grep will look for the keyword 'ASCII' in the output from file
So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).
find ./ -exec file {} ";" | grep 'ASCII'
Looks complicated, but not bad when we break it down:
find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the 'expression', or whatever comes after the path, which in our case is the current directory or ./
The most important thing to understand is that everything after that first bit is going to be evaluated as either True or False. If True, the file name will get printed out. If not, then the command moves on.
-exec = this flag is an option within the find command that allows us to use the result of some other command as the search expression. It's like calling a function within a function.
file {} = the command being called inside of find. The file command returns a string that tells you the filetype of a file. Regularly, it would look like this: file mytextfile.txt. In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we're just asking for the system to output a string for every file in the directory.
";" = this is required by find and is the punctuation mark at the end of our -exec command. See the manual for 'find' for more explanation if you need it by running man find.
| grep 'ASCII' = | is a pipe. Pipe take the output of whatever is on the left and uses it as input to whatever is on the right. It takes the output of the find command (a string that is the filetype of a single file) and tests it to see if it contains the string 'ASCII'. If it does, it returns true.
NOW, the expression to the right of find ./ will return true when the grep command returns true. Voila.
I have two issues with histumness' answer:
It only list text files. It does not actually search them as
requested. To actually search, use
find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
It spawns a grep process for every file, which is very slow. A better solution is then
find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
or simply
find . -type f -print0 | xargs -0 grep -I "needle text"
This only takes 0.2s compared to 4s for solution above (2.5GB data / 7700 files), i.e. 20x faster.
Also, nobody cited ag, the Silver Searcher or ack-grep¸as alternatives. If one of these are available, they are much better alternatives:
ag -t "needle text" # Much faster than ack
ack -t "needle text" # or ack-grep
As a last note, beware of false positives (binary files taken as text files). I already had false positive using either grep/ag/ack, so better list the matched files first before editing the files.
Although it is an old question, I think this info bellow will add to the quality of the answers here.
When ignoring files with the executable bit set, I just use this command:
find . ! -perm -111
To keep it from recursively enter into other directories:
find . -maxdepth 1 ! -perm -111
No need for pipes to mix lots of commands, just the powerful plain find command.
Disclaimer: it is not exactly what OP asked, because it doesn't check if the file is binary or not. It will, for example, filter out bash script files, that are text themselves but have the executable bit set.
That said, I hope this is useful to anyone.
I do it this way:
1) since there're too many files (~30k) to search thru, I generate the text file list daily for use via crontab using below command:
find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &
2) create a function in .bashrc:
findex() {
cat ~/.src_list | xargs grep "$*" 2>/dev/null
}
Then I can use below command to do the search:
findex "needle text"
HTH:)
I prefer xargs
find . -type f | xargs grep -I "needle text"
if your filenames are weird look up using the -0 options:
find . -type f -print0 | xargs -0 grep -I "needle text"
bash example to serach text "eth0" in /etc in all text/ascii files
grep eth0 $(find /etc/ -type f -exec file {} \; | egrep -i "text|ascii" | cut -d ':' -f1)
If you are interested in finding any file type by their magic bytes using the awesome file utility combined with power of find, this can come in handy:
$ # Let's make some test files
$ mkdir ASCII-finder
$ cd ASCII-finder
$ dd if=/dev/urandom of=binary.file bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.009023 s, 116 MB/s
$ file binary.file
binary.file: data
$ echo 123 > text.txt
$ # Let the magic begin
$ find -type f -print0 | \
xargs -0 -I ## bash -c 'file "$#" | grep ASCII &>/dev/null && echo "file is ASCII: $#"' -- ##
Output:
file is ASCII: ./text.txt
Legend: $ is the interactive shell prompt where we enter our commands
You can modify the part after && to call some other script or do some other stuff inline as well, i.e. if that file contains given string, cat the entire file or look for a secondary string in it.
Explanation:
find items that are files
Make xargs feed each item as a line into one liner bash
command/script
file checks type of file by magic byte, grep checks if ASCII
exists, if so, then after && your next command executes.
find prints results null separated, this is good to escape
filenames with spaces and meta-characters in it.
xargs , using -0 option, reads them null separated, -I ##
takes each record and uses as positional parameter/args to bash
script.
-- for bash ensures whatever comes after it is an argument even
if it starts with - like -c which could otherwise be interpreted
as bash option
If you need to find types other than ASCII, simply replace grep ASCII with other type, like grep "PDF document, version 1.4"
find . -type f | xargs file | grep "ASCII text" | awk -F: '{print $1}'
Use find command to list all files, use file command to verify they are text (not tar,key), finally use awk command to filter and print the result.
How about this
find . -type f|xargs grep "needle text"

How do I include a pipe | in my linux find -exec command?

This isn't working. Can this be done in find? Or do I need to xargs?
find -name 'file_*' -follow -type f -exec zcat {} \| agrep -dEOE 'grep' \;
the solution is easy: execute via sh
... -exec sh -c "zcat {} | agrep -dEOE 'grep' " \;
The job of interpreting the pipe symbol as an instruction to run multiple processes and pipe the output of one process into the input of another process is the responsibility of the shell (/bin/sh or equivalent).
In your example you can either choose to use your top level shell to perform the piping like so:
find -name 'file_*' -follow -type f -exec zcat {} \; | agrep -dEOE 'grep'
In terms of efficiency this results costs one invocation of find, numerous invocations of zcat, and one invocation of agrep.
This would result in only a single agrep process being spawned which would process all the output produced by numerous invocations of zcat.
If you for some reason would like to invoke agrep multiple times, you can do:
find . -name 'file_*' -follow -type f \
-printf "zcat %p | agrep -dEOE 'grep'\n" | sh
This constructs a list of commands using pipes to execute, then sends these to a new shell to actually be executed. (Omitting the final "| sh" is a nice way to debug or perform dry runs of command lines like this.)
In terms of efficiency this results costs one invocation of find, one invocation of sh, numerous invocations of zcat and numerous invocations of agrep.
The most efficient solution in terms of number of command invocations is the suggestion from Paul Tomblin:
find . -name "file_*" -follow -type f -print0 | xargs -0 zcat | agrep -dEOE 'grep'
... which costs one invocation of find, one invocation of xargs, a few invocations of zcat and one invocation of agrep.
find . -name "file_*" -follow -type f -print0 | xargs -0 zcat | agrep -dEOE 'grep'
You can also pipe to a while loop that can do multiple actions on the file which find locates. So here is one for looking in jar archives for a given java class file in folder with a large distro of jar files
find /usr/lib/eclipse/plugins -type f -name \*.jar | while read jar; do echo $jar; jar tf $jar | fgrep IObservableList ; done
the key point being that the while loop contains multiple commands referencing the passed in file name separated by semicolon and these commands can include pipes. So in that example I echo the name of the matching file then list what is in the archive filtering for a given class name. The output looks like:
/usr/lib/eclipse/plugins/org.eclipse.core.contenttype.source_3.4.1.R35x_v20090826-0451.jar
/usr/lib/eclipse/plugins/org.eclipse.core.databinding.observable_1.2.0.M20090902-0800.jar
org/eclipse/core/databinding/observable/list/IObservableList.class
/usr/lib/eclipse/plugins/org.eclipse.search.source_3.5.1.r351_v20090708-0800.jar
/usr/lib/eclipse/plugins/org.eclipse.jdt.apt.core.source_3.3.202.R35x_v20091130-2300.jar
/usr/lib/eclipse/plugins/org.eclipse.cvs.source_1.0.400.v201002111343.jar
/usr/lib/eclipse/plugins/org.eclipse.help.appserver_3.1.400.v20090429_1800.jar
in my bash shell (xubuntu10.04/xfce) it really does make the matched classname bold as the fgrep highlights the matched string; this makes it really easy to scan down the list of hundreds of jar files that were searched and easily see any matches.
on windows you can do the same thing with:
for /R %j in (*.jar) do #echo %j & #jar tf %j | findstr IObservableList
note that in that on windows the command separator is '&' not ';' and that the '#' suppresses the echo of the command to give a tidy output just like the linux find output above; although findstr is not make the matched string bold so you have to look a bit closer at the output to see the matched class name. It turns out that the windows 'for' command knows quite a few tricks such as looping through text files...
enjoy
I found that running a string shell command (sh -c) works best, for example:
find -name 'file_*' -follow -type f -exec bash -c "zcat \"{}\" | agrep -dEOE 'grep'" \;
If you are looking for a simple alternative, this can be done using a loop:
for i in $(find -name 'file_*' -follow -type f); do
zcat $i | agrep -dEOE 'grep'
done
or, more general and easy to understand form:
for i in $(YOUR_FIND_COMMAND); do
YOUR_EXEC_COMMAND_AND_PIPES
done
and replace any {} by $i in YOUR_EXEC_COMMAND_AND_PIPES
Here's what you should do:
find -name 'file_*' -follow -type f -exec sh -c 'zcat "$1" | agrep -dEOE "grep"' sh {} \;
I tried a couple of these answers and they didn't work for me. #flolo's answer doesn't work correctly if your filenames have special characters. According to this answer:
The find command executes the command directly. The command, including the filename argument, will not be processed by the shell or anything else that might modify the filename. It's very safe.
You lose that safety if you put the {} inside the sh command string.
There is a potential problem with #Rolf W. Rasmussen's answer. Yes, it handles special characters (as far as I know), but if the find output is too long, you won't be able to execute xargs -0 ...: there is a command line character limit set by the kernel and sometimes your shell. Coincidentally, every time I want to pipe commands from a find, I run into this limit.
But, they do bring up a valid point regarding the performance limitations. I'm not sure how to overcome that, though personally, I've never run into a situation where my suggestion is too slow.

Resources