How to use GNU find command to find files by pattern and list files in order of most recent modification to least?

I want to use the GNU find command to find files based on a pattern, and then have them displayed in order of the most recently modified file to the least recently modified.
I understand this:
find / -type f -name '*.md'
but then what would be added to sort the files from the most recently modified to the least?

find can't sort files, so you can instead output the modification time plus filename, sort on modification time, then remove the modification time again:
find . -type f -name '*.md' -printf '%T@ %p\0' | # Print time+name
sort -rnz | # Sort numerically, descending
cut -z -d ' ' -f 2- | # Remove time
tr '\0' '\n' # Optional: make human readable
This uses \0-separated entries to avoid problems with any kind of filenames. You can pass this directly and safely to a number of tools, but here it instead pipes to tr to show the file list as individual lines.
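If you need this often, the same pipeline can be wrapped in a small function. This is only a sketch: the name newest_first and its two arguments are made up for illustration, and it assumes GNU find, sort and cut.

```shell
# Hypothetical helper: list files matching a name pattern, newest first.
# Assumes GNU find (-printf), GNU sort (-z) and GNU cut (-z).
newest_first() {
    dir=$1 pattern=$2
    find "$dir" -type f -name "$pattern" -printf '%T@ %p\0' |  # epoch time + path
        sort -rnz |              # largest timestamp (newest) first
        cut -z -d ' ' -f 2- |    # drop the timestamp, keep the path
        tr '\0' '\n'             # one path per line, for display only
}
```

Called as newest_first . '*.md', it prints one path per line. Drop the final tr if the list is going to another tool that accepts NUL-separated input.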

find <dir> -name "*.mz" -printf "%Ts - %h/%f\n" | sort -rn
Print the modified time in epoch format (%Ts) as well as the directories (%h) and file name (%f). Pipe this through to sort -rn to sort in reversed number order.

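Pipe the output of find to xargs and ls. Use -print0 with xargs -0 so that filenames containing whitespace survive, and note that the ordering is only correct as long as xargs can pass every file to a single ls invocation:
find / -type f -name '*.md' -print0 | xargs -0 ls -1t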

Related

find order directories first, files last

I'm trying to list all files using find, so that the directories are listed first (in order) and files at same depth are listed after:
test/test1/1.txt
test/test2/1.txt
test/xtest/1.txt
test/test.txt
I tried using this:
find -type f -printf "%d\t%p\n" | sort -nr
But it gives me this result:
test/xtest/1.txt
test/test2/1.txt
test/test1/1.txt
test/test.txt
Is there a way using find or should I look for something else?
Sort by filename first, then - by depth:
find . -type f -printf "%d %p\n" | sort -k2 | sort -k1,1nr
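The two sorts can also be expressed as two keys of a single sort, with the depth column stripped at the end. A minimal sketch, assuming GNU find and paths without newlines (the function name deepest_first is made up):

```shell
# Hypothetical helper: deepest files first, ties broken by path name.
# Assumes GNU find for -printf; paths must not contain newlines.
deepest_first() {
    find "$1" -type f -printf '%d %p\n' |
        LC_ALL=C sort -k1,1nr -k2 |   # depth descending, then path ascending
        cut -d ' ' -f 2-              # strip the depth column
}
```

Run against the test directory from the question (deepest_first test), this should produce the listing the question asks for.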

Linux Shell Command: Find. How to Sort and Exec without using Pipes?

The Linux find command with the -exec argument does a GREAT job of executing commands on files/folders regardless of whether their names contain spaces and special characters. For example:
find . -type f -exec md5sum {} \;
Works great to run md5sum on each file in a directory tree, but executes in a random order. Find does not sort the results, and requires piping to sort to get results in a more human-readable ordering. However, piping to sort eliminates the benefits of exec.
This does not work:
find . -type f | sort | md5sum
Because some filenames contain spaces and special characters.
Also does not work:
find . -type f | sort | sed 's/ /\\ /g' | md5sum
Still does not recognize spaces are part of the filename.
I suppose I can always sort the final result later, but wonder if someone knows an easy way to avoid that extra step by sorting within find?
With BSD find
A -s argument is available to request lexicographic sort order.
find . -s -type f -exec md5sum -- '{}' +
With GNU find
Use NUL delimiters to allow filenames to be processed unambiguously. Assuming you have GNU tools:
find . -type f -print0 | sort -z | xargs -0 md5sum
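To see that names with spaces survive the trip, the pipeline can be wrapped like this. It is a sketch under the same GNU-tools assumption; the function name md5_sorted is made up, and LC_ALL=C is added only to make the order reproducible.

```shell
# Hypothetical helper: checksum every regular file under a directory in
# path order. Assumes GNU find, sort and xargs (for -print0, -z and -0).
md5_sorted() {
    find "$1" -type f -print0 |
        LC_ALL=C sort -z |       # bytewise order, NUL-terminated records
        xargs -0 -r md5sum --    # -r: run nothing if no files were found
}
```

md5_sorted . then prints one checksum line per file, in sorted path order, even for a name such as "a file.txt".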
Found a working solution
find . -type f -exec md5sum {} + | sort -k 1.33
Sorts the results by comparing the characters starting after the 32 character md5sum result, producing a readable/sorted list.

Find and sort files by date modified

I know that there are many answers to this question online. However, I would like to know if this alternate solution would work:
ls -lt `find . -name "*.jpg" -print | head -10`
I'm aware of course that this will only give me the first 10 results. The reason I'm asking is because I'm not sure whether the ls is executing separately for each result of find or not. Thanks
In your solution:
the ls will be executed after the find is evaluated
it is likely that find will yield too many results for ls to process, in which case you might want to look at the xargs command
This should work better (the -f format syntax shown here is that of BSD/macOS stat):
find . -type f -print0 | xargs -0 stat -f"%m %Sm %N" | sort -rn
The three parts of the command do this:
find all files and print their path
use xargs to process the (long) list of files and print out the modification unixtime, human readable time, and filename for each file
sort the resulting list in reverse numerical order
The main trick is to add the numerical unixtime when the files were last modified to the beginning of the lines, and then sort them.
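On a system with GNU find, the same trick can be done without stat, because -printf can emit the timestamp itself. A rough sketch (the function name recent_files and its optional count argument are made up; paths are assumed not to contain newlines):

```shell
# Hypothetical helper: print the N most recently modified files under a
# directory as "epoch-seconds date time path". Assumes GNU find.
recent_files() {
    find "$1" -type f -printf '%T@ %TY-%Tm-%Td %TH:%TM %p\n' |
        sort -rn |            # numeric sort on the leading epoch time, newest first
        head -n "${2:-10}"    # keep the first N lines (default 10)
}
```

For example, recent_files . 10 lists the ten newest files, which is roughly what the head -10 in the question was after.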

Sorting directory contents (including hidden files) by name in the shell

Is there a nice way to sort directory contents (including hidden files) in the shell? Basically I'd like to be able to ls directories just as it's done in my GUI file manager. In a typical directory, the output is as such:
.a_hidden_dir
.b_hidden_dir
.c_hidden_dir
a_dir
b_dir
c_dir
.a_hidden_file
.b_hidden_file
.c_hidden_file
a_file
b_file
c_file
Of course ls has the --group-directories-first option, but this only gets us part of the way there: because the sort ignores the leading ., hidden files are not sorted to the top.
I'd like to be able to sort output from ls, find, or other list of paths in such a way. Does anyone know a good way to do this - maybe a sort -k KEYDEF?
Right now I'm doing something like this (it assumes directory names have a slash appended to them):
pathsort(){
input=$(cat)
(
awk '/^\..+\/$/' <<<"$input" | sort
awk '/^[^.].+\/$/' <<<"$input" | sort
awk '/^\..+[^/]$/' <<<"$input" | sort
awk '/^[^.].+[^/]$/' <<<"$input" | sort
) | sed 's/\/$//'
}
\ls -Ap | pathsort
The above code gets the job done, but it is far from ideal. Please tell me there is a better way...
Jonathan Leffler proposed a simple and functional solution in a comment: set the environment variable LANG=C. On my system, the default LANG=en_US.UTF-8 results in undesirable pathname sorting characteristics. The C locale compares strings byte by byte, which for ASCII names means plain ASCII order. The result of setting LANG=C is that 'dotfiles' (including directories) are sorted to the top. It may be useful to note that LC_ALL=C may be used as well, since LC_ALL overrides LANG and all other LC_* variables. All in all, setting the locale to C for sorting commands is strongly advised if you'd like a consistent sorting experience.
Here is the final solution to the desired pathname sorting hierarchy (dotfile dirs > normal dirs > dotfile files > normal files):
LC_ALL=C ls -A --group-directories-first
Note: this includes symlinks to files and directories as well
Similarly for sorting any other source of pathname output:
findtool | LC_ALL=C sort
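A quick way to see the effect, using a few made-up names (the wrapper name csort is also just for illustration):

```shell
# Hypothetical wrapper: sort in the C locale, i.e. by byte value.
csort() {
    LC_ALL=C sort "$@"
}

# "." (0x2E) has a lower byte value than any letter, so dotfiles come first.
printf '%s\n' a_file .b_hidden_file b_file .a_hidden_file | csort
# prints:
#   .a_hidden_file
#   .b_hidden_file
#   a_file
#   b_file
```

Keep in mind that bytewise order also puts all uppercase ASCII letters before lowercase ones.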
Funnily, I think sorting the directories is simplest:
ls -1d .*/; ls -1d */
Files are harder to separate from the directories, you need to use find:
(find . -maxdepth 1 -type f -name '.*' -printf '%P\n' | sort); (find . -maxdepth 1 -type f -not -name '.*' -printf '%P\n' | sort)
Put the whole thing together:
alias lss="
ls -1d .*/; # Hidden directories
ls -1d */; # Normal directories
find . -maxdepth 1 -type f -name '.*' -printf '%P\n' | sort;
# Hidden files
find . -maxdepth 1 -type f -not -name '.*' -printf '%P\n' | sort
# Normal files
"
One caveat: There are other items that this will miss, like links and devices.
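One possible way around that caveat is to let find report the type of every entry and build the sort key from it, so that anything that is not a directory (symlinks and devices included) lands in the file groups. This is a sketch only: the name lss_sketch is made up, GNU find is assumed, and names containing tabs or newlines are not handled.

```shell
# Hypothetical helper: hidden dirs, dirs, hidden non-dirs, other non-dirs.
# Assumes GNU find (-printf) and names without tabs or newlines.
lss_sketch() {
    tab=$(printf '\t')
    # %y is the entry type (d = directory), %P the name without the start point.
    find "${1:-.}" -mindepth 1 -maxdepth 1 -printf '%y\t%P\n' |
        awk -F '\t' '{
            hidden = (substr($2, 1, 1) == ".")
            # 1 = hidden dir, 2 = dir, 3 = hidden non-dir, 4 = other non-dir
            group = ($1 == "d" ? 1 : 3) + (hidden ? 0 : 1)
            print group "\t" $2
        }' |
        LC_ALL=C sort -t "$tab" -k1,1n -k2 |   # group first, then name
        cut -f 2-                              # strip the group column
}
```

Note that a symlink pointing to a directory is reported as a link, so it is grouped with the files here.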

Create a bash script to delete folders which do not contain a certain filetype

I have recently run into a problem.
I used a utility to move all my music files into directories based on tags. This left a LOT of almost empty folders. The folders, in general, contain a thumbs.db file or some sort of image for album art. The mp3s have the correct album art in their new directories, so the old ones are okay to delete.
Basically, I need to find any directories within D:/Music/ that:
-Do not have any subdirectories
-Do not contain any mp3 files
And then delete them.
I figured this would be easier to do in a shell script or bash script or whatever else linux/unix world than in Windows 8.1 (HAHA).
Any suggestions? I'm not very experienced writing scripts like this.
This should get you started
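find /music -mindepth 1 -type d |
while IFS= read -r dt
do
# Skip directories that contain at least one subdirectory
find "$dt" -mindepth 1 -type d | read -r _ && continue
# Skip directories that contain at least one mp3 file
find "$dt" -iname '*.mp3' -type f | read -r _ && continue
echo "DELETE $dt"
done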
Here's the short story...
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp
find . -type d -print | sort | uniq > all-dirs.tmp
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
less dirs-to-be-deleted.tmp
xargs -d '\n' rm -rf -- < dirs-to-be-deleted.tmp # GNU xargs; one directory per line, so names with spaces are safe
Note that you might have to run all the commands a few times (depending on your repository's directory depth) before you're done deleting all recursive empty directories...
And the long story goes...
You can approach this problem from two basic perspectives. In the first, you find all directories, then iterate over each of them, checking whether it contains any mp3 file or any subdirectory; if not, you mark that directory for deletion. It will work, but on very large repositories you can expect a significant run time.
Another approach, which is in my view much more interesting, is to build a list of directories NOT to be deleted, and subtract that list from the list of all directories. Let's work through the second strategy, one step at a time...
First of all, to find the paths of all directories that contain mp3 files, you can simply do:
find . -name '*.mp3' -printf '%h\n' | sort | uniq
This means "find any file ending with .mp3, then print the path to its parent directory".
Now, I could certainly name at least ten different approaches to find directories that contain at least one subdirectory, but keeping the same strategy as above, we can easily get...
find . -type d -printf '%h\n' | sort | uniq
What this means is: "Find any directory, then print the path to its parent."
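Both of these queries can be combined in a single invocation, producing a single list containing the paths of all directories NOT to be deleted. The parentheses are needed so that -printf applies to both tests; without them it would bind only to -type d and the mp3 files would print nothing. Let's redirect that list to a temporary file.
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp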
Let's similarly produce a file containing the paths of all directories, no matter if they are empty or not.
find . -type d -print | sort | uniq > all-dirs.tmp
So there, we have, on one side, the complete list of all directories, and on the other, the list of directories not to be deleted. What now? There are tons of strategies, but here's a very simple one:
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
Once you have that, well, review it, and if you are satisfied, then pipe it through xargs to rm to actually delete the directories.
xargs -d '\n' rm -rf -- < dirs-to-be-deleted.tmp # GNU xargs; one directory per line, so names with spaces are safe
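If you would rather have the whole thing as one dry-run step, the list-building part could be wrapped roughly like this. It is a sketch, not a tested tool: the function name list_deletable_dirs is made up, GNU find is assumed, directory names must not contain newlines, and nothing is deleted.

```shell
# Hypothetical dry-run helper: print directories under "$1" that contain
# neither an .mp3 file nor a subdirectory. Nothing is deleted.
# Assumes GNU find (-printf) and directory names without newlines.
list_deletable_dirs() (
    cd "$1" || exit 1
    tmp=$(mktemp -d) || exit 1
    # Parents of every .mp3 file and of every directory: these must be kept.
    # The parentheses make -printf apply to both tests.
    find . \( -name '*.mp3' -o -type d \) -printf '%h\n' |
        LC_ALL=C sort -u > "$tmp/keep"
    # Every directory in the tree.
    find . -type d | LC_ALL=C sort -u > "$tmp/all"
    # Lines that appear only in "all": candidates for deletion.
    LC_ALL=C comm -23 "$tmp/all" "$tmp/keep"
    rm -rf "$tmp"
)
```

Both lists are sorted in the same C locale because comm expects its inputs in matching order. Review the output of list_deletable_dirs /path/to/music before handing it to rm.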
