How to pipe a list of files returned from find to cat and sort them - linux

I'm trying to find all the files in a folder and then print them, but sorted.
I have this so far:
find . -type f -exec cat {} \;
and it prints all the files, but I need them sorted too. When I do
find . -type f -exec sort cat {} \;
I get the following error:
sort: cannot read: cat: No such file or directory
and if I switch sort and cat like this
find . -type f -exec cat sort {} \;
I get the same error and then it prints the file (I have only one file to print).

It's not clear to me whether you want to display the contents of the files unchanged, ordering the files by name, or to sort the contents of each file. If the latter:
find . -type f -exec sort {} \;
If the former, use BSD find's -s option:
find -s . -type f -exec cat {} \;
If you don't have bsd find, use:
find . -type f -print0 | sort -z | xargs -0 cat
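To make the distinction concrete, a minimal sketch (the filenames here are made up):
printf 'b\na\n' > two.txt
printf 'd\nc\n' > one.txt
# sorts the contents of each file (file order is arbitrary)
find . -type f -exec sort {} \;
# concatenates files in name order, contents unchanged: d c b a
find . -type f -print0 | sort -z | xargs -0 cat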

Composing commands using pipes is often the simplest solution.
find . -type f -print0 | sort -z | xargs -0 cat
Explanation: you can sort filenames after the fact using ... | sort -z, then pass the output (the list of files) to cat using xargs, i.e. ... | xargs -0 cat.
As @arkaduisz points out, when using pipes you should carefully handle filenames containing whitespace (hence -print0, sort -z, and -0).
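To illustrate why this matters, a quick sketch with a hypothetical filename containing a space:
touch 'file one.txt'
# breaks: xargs splits the name into 'file' and 'one.txt'
find . -type f | xargs cat
# safe: NUL delimiters keep the whole name intact
find . -type f -print0 | sort -z | xargs -0 cat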

Related

delete all files except a pattern list file

I need to delete all the files in the current directory except those matching a list of patterns described in a whitelist file (delete_whitelist.txt) like this:
(.*)dir1(/)?
(.*)dir2(/)?
(.*)dir2/ser1(/)?(.*)
(.*)dir2/ser2(/)?(.*)
(.*)dir2/ser3(/)?(.*)
(.*)dir2/ser4(/)?(.*)
(.*)dir2/ser5(/)?(.*)
How can I perform this in one bash line?
Any bash script can fit on one line:
find . -type f -print0 | grep -EzZvf delete_whitelist.txt | xargs -0 printf '%s\n'
Check the output and then, if it's OK:
find . -type f -print0 | grep -EzZvf delete_whitelist.txt | xargs -0 rm
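Here -E enables extended regexes, -f reads the patterns from delete_whitelist.txt, -v inverts the match so only non-whitelisted paths pass through, and -z/-Z keep the stream NUL-delimited. As a purely hypothetical illustration, the preview step might print something like:
./dir3/old.log
./notes.txt
Only when that list contains nothing you want to keep should the rm variant be run.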

Linux Shell Command: Find. How to Sort and Exec without using Pipes?

The Linux command find with the exec argument does a GREAT job executing commands on files/folders regardless of whether they contain spaces and special characters. For example:
find . -type f -exec md5sum {} \;
Works great to run md5sum on each file in a directory tree, but it executes in an arbitrary, unsorted order. find does not sort the results, and piping to sort is required to get a more human-readable ordering. However, piping to sort eliminates the benefits of exec.
This does not work:
find . -type f | sort | md5sum
Because some filenames contain spaces and special characters.
Also does not work:
find . -type f | sort | sed 's/ /\\ /g' | md5sum
It still does not recognize the spaces as part of the filename.
I suppose I can always sort the final result later, but wonder if someone knows an easy way to avoid that extra step by sorting within find?
With BSD find
A -s argument is available to request lexicographic sort order.
find . -s -type f -exec md5sum -- '{}' +
With GNU find
Use NUL delimiters to allow filenames to be processed unambiguously. Assuming you have GNU tools:
find . -type f -print0 | sort -z | xargs -0 md5sum
Found a working solution
find . -type f -exec md5sum {} + | sort -k 1.33
Sorts the results by comparing the characters starting after the 32-character md5sum hash, producing a readable, sorted list.
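To see where that key lands, here is a sample md5sum output line (the path is hypothetical):
d41d8cd98f00b204e9800998ecf8427e  ./a/file1.txt
The hash occupies characters 1 through 32, so -k 1.33 tells sort to start comparing at character 33, just past the hash, which effectively orders the list by file path.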

How to search for a string in files whose name contains another string and were created in the last 7 days in Linux?

I'd like to search for a string in files whose name contains another string and were created in the last 7 days.
I tried:
find . -type f -name '*name_string*' -mtime -7 | grep -ir '*mytext*'
But it didn't work.
Please help.
You were really close but just missed the xargs; otherwise the output from find is just a bunch of text for grep.
find . -type f -name '*name_string*' -mtime -7 | xargs grep -i 'mytext'
By using xargs you pass the list of files as the set of files that grep should be searching for the string mytext.
BTW, you can just use mytext instead of *mytext*
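Note that this pipeline still splits file names on whitespace; a NUL-delimited sketch of the same idea:
find . -type f -name '*name_string*' -mtime -7 -print0 | xargs -0 grep -i 'mytext'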
If you want to search for multiple patterns say pattern1 and pattern2 in the list of file names containing name_string:
find . -type f -name '*name_string*' -mtime -7 -print0 | while read -d $'\0' f; do
grep -qi pattern1 "$f" && grep -li pattern2 "$f"
done
This should work even with file names containing spaces.
No need for xargs or other non-standard extensions to get good file name handling:
find . -type f -name '*name_string*' -mtime -7 -exec grep -i 'mytext' {} \;
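If spawning one grep per file is too slow, the standard + terminator batches many files into each invocation; a sketch:
find . -type f -name '*name_string*' -mtime -7 -exec grep -i 'mytext' {} +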

How to only get file name with Linux 'find'?

I'm using find to list all files in a directory, so I get a list of paths. However, I need only the file names. i.e. I get ./dir1/dir2/file.txt and I want to get file.txt
In GNU find you can use -printf parameter for that, e.g.:
find /dir1 -type f -printf "%f\n"
If your find doesn't have a -printf option you can also use basename:
find ./dir1 -type f -exec basename {} \;
Use -execdir, which automatically holds the current file in {}, for example:
find . -type f -execdir echo '{}' ';'
You can also use $PWD instead of . (on some systems it won't produce an extra dot in the front).
If you still got an extra dot, alternatively you can run:
find . -type f -execdir basename '{}' ';'
-execdir utility [argument ...] ;
The -execdir primary is identical to the -exec primary with the exception that utility will be executed from the directory that holds the current file.
When + is used instead of ;, {} is replaced with as many pathnames as possible for each invocation of utility. In other words, it'll print all the filenames on one line.
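For example, a sketch contrasting the two terminators:
find . -type f -execdir echo '{}' ';'   # one echo per file, each name on its own line
find . -type f -execdir echo '{}' +     # batches many names into each echo, printed on one line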
If you are using GNU find
find . -type f -printf "%f\n"
Or you can use a programming language such as Ruby (1.9+)
$ ruby -e 'Dir["**/*"].each{|x| puts File.basename(x)}'
If you fancy a bash (at least 4) solution
shopt -s globstar
for file in **; do echo "${file##*/}"; done
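The ${file##*/} parameter expansion strips everything up to and including the last slash; a one-line sketch:
file=./dir1/dir2/file.txt; echo "${file##*/}"   # prints file.txt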
If you want to run some action against the filename only, using basename can be tough.
For example this:
find ~/clang+llvm-3.3/bin/ -type f -exec echo basename {} \;
will just echo the literal text basename /my/found/path. Not what we want if we want to execute something on the filename.
But you can then xargs the output. For example, to delete the files in a dir based on names in another dir:
cd dirIwantToRMin;
find ~/clang+llvm-3.3/bin/ -type f -exec basename {} \; | xargs rm
On mac (BSD find) use:
find /dir1 -type f -exec basename {} \;
As others have pointed out, you can combine find and basename, but by default the basename program will only operate on one path at a time, so the executable will have to be launched once for each path (using either find ... -exec or find ... | xargs -n 1), which may potentially be slow.
If you use the -a option on basename, then it can accept multiple filenames in a single invocation, which means that you can then use xargs without the -n 1, to group the paths together into a far smaller number of invocations of basename, which should be more efficient.
Example:
find /dir1 -type f -print0 | xargs -0 basename -a
Here I've included the -print0 and -0 (which should be used together), in order to cope with any whitespace inside the names of files and directories.
Here is a timing comparison between the xargs basename -a and xargs -n1 basename versions. (For the sake of a like-with-like comparison, the timings reported here are from after an initial dummy run, so that both are done after the file metadata has already been copied to the I/O cache.) I have piped the output to cksum in both cases, just to demonstrate that the output is independent of the method used.
$ time sh -c 'find /usr/lib -type f -print0 | xargs -0 basename -a | cksum'
2532163462 546663
real 0m0.063s
user 0m0.058s
sys 0m0.040s
$ time sh -c 'find /usr/lib -type f -print0 | xargs -0 -n 1 basename | cksum'
2532163462 546663
real 0m14.504s
user 0m12.474s
sys 0m3.109s
As you can see, it really is substantially faster to avoid launching basename every time.
Honestly, the basename and dirname solutions are easier, but you can also check these out:
find . -type f | grep -oP "[^/]*$"
or
find . -type f | rev | cut -d '/' -f1 | rev
or
find . -type f | sed "s/.*\///"
-exec and -execdir are slow, xargs is king.
$ alias f='time find /Applications -name "*.app" -type d -maxdepth 5'; \
f -exec basename {} \; | wc -l; \
f -execdir echo {} \; | wc -l; \
f -print0 | xargs -0 -n1 basename | wc -l; \
f -print0 | xargs -0 -n1 -P 8 basename | wc -l; \
f -print0 | xargs -0 basename | wc -l
139
0m01.17s real 0m00.20s user 0m00.93s system
139
0m01.16s real 0m00.20s user 0m00.92s system
139
0m01.05s real 0m00.17s user 0m00.85s system
139
0m00.93s real 0m00.17s user 0m00.85s system
139
0m00.88s real 0m00.12s user 0m00.75s system
xargs's parallelism also helps.
Funnily enough, I cannot explain the last case of xargs without -n1.
It gives the correct result and it's the fastest ¯\_(ツ)_/¯
(basename takes only one path argument, but xargs will send them all (actually 5000) without -n1. This does not work on Linux and OpenBSD, only macOS...)
Some bigger numbers from a Linux system show how -execdir helps, but it is still much slower than a parallel xargs:
$ alias f='time find /usr/ -maxdepth 5 -type d'
$ f -exec basename {} \; | wc -l; \
f -execdir echo {} \; | wc -l; \
f -print0 | xargs -0 -n1 basename | wc -l; \
f -print0 | xargs -0 -n1 -P 8 basename | wc -l
2358
3.63s real 0.10s user 0.41s system
2358
1.53s real 0.05s user 0.31s system
2358
1.30s real 0.03s user 0.21s system
2358
0.41s real 0.03s user 0.25s system
I've found a solution (on a makandracards page) that gives just the newest file name:
ls -1tr * | tail -1
(thanks goes to Arne Hartherz)
I used it for cp:
cp $(ls -1tr * | tail -1) /tmp/
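Note that the unquoted command substitution is subject to word splitting; quoting it is safer for names containing spaces (a sketch, still fragile for exotic names):
cp "$(ls -1tr * | tail -1)" /tmp/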

how to find files containing a string using egrep

I would like to find the files containing a specific string under Linux.
I tried something like this but could not succeed:
find . -name *.txt | egrep mystring
Here you are sending the file names (output of the find command) as input to egrep; you actually want to run egrep on the contents of the files.
Here are a couple of alternatives:
find . -name "*.txt" -exec egrep mystring {} \;
or even better
find . -name "*.txt" -print0 | xargs -0 egrep mystring
Check the find command's help to see what the individual arguments do.
The first approach will spawn a new process for every file, while the second will pass more than one file as an argument to egrep; the -print0 and -0 flags are needed to deal with potentially nasty file names (they allow file names to be separated correctly even if a file name contains a space, for example).
try:
find . -name '*.txt' | xargs egrep mystring
There are two problems with your version:
Firstly, *.txt will first be expanded by the shell, giving you a listing of files in the current directory which end in .txt, so for instance, if you have the following:
[dsm@localhost:~]$ ls *.txt
test.txt
[dsm@localhost:~]$
your find command will turn into find . -name test.txt. Just try the following to illustrate:
[dsm@localhost:~]$ echo find . -name *.txt
find . -name test.txt
[dsm@localhost:~]$
Secondly, egrep does not take filenames from STDIN. To convert them to arguments, you need to use xargs.
find . -name *.txt | egrep mystring
That will not work, as egrep will be searching for mystring within the output generated by find . -name *.txt, which is just the paths of the *.txt files.
Instead, you can use xargs:
find . -name *.txt | xargs egrep mystring
You could use
find . -iname '*.txt' -exec egrep mystring \{\} \;
Here's an example that will return the file paths of all *.log files that have a line beginning with ERROR:
find . -name "*.log" -exec egrep -l '^ERROR' {} \;
There's a recursive option to egrep you can use:
egrep -R "pattern" *.log
If you only want the filenames:
find . -type f -name '*.txt' -exec egrep -l pattern {} \;
If you want filenames and matches:
find . -type f -name '*.txt' -exec egrep pattern {} /dev/null \;
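The extra /dev/null argument forces egrep to see at least two files, so it prints the filename in front of each match; with GNU or BSD grep the -H option does the same, a sketch:
find . -type f -name '*.txt' -exec egrep -H pattern {} \;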
