Sorting directory contents (including hidden files) by name in the shell - linux

Is there a nice way to sort directory contents (including hidden files) in the shell? Basically, I'd like to be able to ls directories just as it's done in my GUI file manager. In a typical directory, the desired output looks like this:
.a_hidden_dir
.b_hidden_dir
.c_hidden_dir
a_dir
b_dir
c_dir
.a_hidden_file
.b_hidden_file
.c_hidden_file
a_file
b_file
c_file
Of course ls has the --group-directories-first option, but that only gets us part of the way there: because sorting ignores the leading ., hidden files are not sorted to the top.
I'd like to be able to sort output from ls, find, or any other source of paths in this way. Does anyone know a good way to do this - maybe a sort -k KEYDEF?
Right now I'm doing something like this (it assumes directory names have a slash appended to them):
pathsort() {
    input=$(cat)
    (
        awk '/^\./ && /\/$/' <<<"$input" | sort     # hidden directories
        awk '/^[^.]/ && /\/$/' <<<"$input" | sort   # normal directories
        awk '/^\./ && !/\/$/' <<<"$input" | sort    # hidden files
        awk '/^[^.]/ && !/\/$/' <<<"$input" | sort  # normal files
    ) | sed 's/\/$//'                               # strip the trailing slashes again
}
\ls -Ap | pathsort
The above code gets the job done, but it is far from ideal. Please tell me there is a better way...

Jonathan Leffler proposed a simple and functional solution in a comment: set the environment variable LANG=C. On my system, the default LANG=en_US.UTF-8 results in undesirable pathname sorting; the C refers to bytewise sorting with the ASCII charset. With LANG=C, 'dotfiles' (including directories) are sorted to the top. It may be useful to note that LC_ALL=C works as well, since LC_ALL overrides LANG and all the other LC_* variables. All in all, setting the locale to C for sorting commands is strongly advised if you'd like consistent sorting behavior.
Here is the final solution to the desired pathname sorting hierarchy (dotfile dirs > normal dirs > dotfile files > normal files):
LC_ALL=C ls -A --group-directories-first
Note: this includes symlinks to files and directories as well
Similarly for sorting any other source of pathname output:
findtool | LC_ALL=C sort
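To illustrate why the locale matters, here's a quick comparison (the en_US.UTF-8 ordering is what my glibc system produces; yours may vary):
printf '%s\n' b .b a .a | sort           # en_US.UTF-8: a .a b .b (the dot is ignored)
printf '%s\n' b .b a .a | LC_ALL=C sort  # C: .a .b a b (bytewise; '.' is 0x2E, before letters)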

Funnily enough, I think sorting the directories is the simplest part:
ls -1d .*/; ls -1d */
(Note that the .*/ glob also matches ./ and ../ in most shells.)
Files are harder to separate from the directories; for those you need find:
(find . -maxdepth 1 -type f -name '.*' -printf '%P\n' | sort); (find . -maxdepth 1 -type f -not -name '.*' -printf '%P\n' | sort)
Put the whole thing together:
alias lss="
ls -1d .*/; # Hidden directories
ls -1d */; # Normal directories
find . -maxdepth 1 -type f -name '.*' -printf '%P\n' | sort;
# Hidden files
find . -maxdepth 1 -type f -not -name '.*' -printf '%P\n' | sort
# Normal files
"
One caveat: there are other kinds of entries this will miss, such as symlinks and device nodes.
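If you want symlinks included as well, one possible tweak (a sketch, untested) is to accept either type in the find calls:
find . -maxdepth 1 \( -type f -o -type l \) -name '.*' -printf '%P\n' | sort   # hidden files and symlinks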

Related

How to use GNU find command to find files by pattern and list files in order of most recent modification to least?

I want to use the GNU find command to find files based on a pattern, and then have them displayed in order of the most recently modified file to the least recently modified.
I understand this:
find / -type f -name '*.md'
but then what would be added to sort the files from the most recently modified to the least?
find can't sort files, so you can instead output the modification time plus filename, sort on modification time, then remove the modification time again:
find . -type f -name '*.md' -printf '%T@ %p\0' | # print epoch time + name
sort -rnz |                                      # sort numerically, descending
cut -z -d ' ' -f 2- |                            # remove the time field
tr '\0' '\n'                                     # optional: make human-readable
This uses \0-separated entries to avoid problems with any kind of file name. You can pass the result directly and safely to a number of other tools; here it is piped to tr to show the list as individual lines.
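For example, to act on the list with another tool instead of printing it, hand the NUL-separated output to xargs -0 (GNU ls's -U flag keeps xargs's argument order instead of re-sorting alphabetically):
find . -type f -name '*.md' -printf '%T@ %p\0' |
sort -rnz |
cut -z -d ' ' -f 2- |
xargs -0 ls -lU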
find <dir> -name "*.mz" -printf "%Ts - %h/%f\n" | sort -rn
Print the modified time in epoch format (%Ts) as well as the directories (%h) and file name (%f). Pipe this through to sort -rn to sort in reversed number order.
Pipe the output of find to xargs and ls:
find / -type f -name '*.md' | xargs ls -1t
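Two caveats with this one: whitespace in file names breaks the pipe, and when the list is long enough for xargs to split it across several ls invocations, each batch is sorted separately. A NUL-safe variant with GNU tools (the batching caveat still applies):
find / -type f -name '*.md' -print0 | xargs -0 ls -1t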

How to iterate on files in a directory robustly and portably, oldest first?

"Robustly" -- I'm trying to handle files with spaces and newlines and other special characters in the file names -- only because I've seen some special characters despite being told my target environment wouldn't have them.
"Portably" -- needs to run on a wide variety of Linux machines, including BusyBox.
"Oldest" -- by modification time, with names as a tiebreaker would suffice for my purpose.
Normally I'd use find -type f -printf '%T+ %p\n' 2>/dev/null | sort | cut -d' ' -f2- | while IFS= read -r filename; do mything "$filename"; done
But on this particular BusyBox v1.18.4 find does not support -printf so I'm not sure I can use find to expose the modification times for sorting.
Usage: find [PATH]... [EXPRESSION]
Search for files. The default PATH is the current directory,
default EXPRESSION is '-print'
EXPRESSION may consist of:
-follow Follow symlinks
-xdev Don't descend directories on other filesystems
-maxdepth N Descend at most N levels. -maxdepth 0 applies
tests/actions to command line arguments only
-mindepth N Don't act on first N levels
-name PATTERN File name (w/o directory name) matches PATTERN
-iname PATTERN Case insensitive -name
-path PATTERN Path matches PATTERN
-regex PATTERN Path matches regex PATTERN
-type X File type is X (X is one of: f,d,l,b,c,...)
-perm NNN Permissions match any of (+NNN), all of (-NNN),
or exactly NNN
-mtime DAYS Modified time is greater than (+N), less than (-N),
or exactly N days
-mmin MINS Modified time is greater than (+N), less than (-N),
or exactly N minutes
-newer FILE Modified time is more recent than FILE's
-inum N File has inode number N
-user NAME File is owned by user NAME (numeric user ID allowed)
-group NAME File belongs to group NAME (numeric group ID allowed)
-depth Process directory name after traversing it
-size N[bck] File size is N (c:bytes,k:kbytes,b:512 bytes(def.))
+/-N: file size is bigger/smaller than N
-links N Number of links is greater than (+N), less than (-N),
or exactly N
-print Print (default and assumed)
-print0 Delimit output with null characters rather than
newlines
-exec CMD ARG ; Run CMD with all instances of {} replaced by the
matching files
-prune Stop traversing current subtree
-delete Delete files, turns on -depth option
(EXPR) Group an expression
And I read that looping over the output of ls is a bad idea.
My environment also does not support stat -c format but I can use stat -t ...
So my current idea is to
cd "$1"
if [ $? -ne 0 ]; then
printf 'failed cd'
exit 1
fi
while read name
do
mything "$name"
done <<< "$(stat -t ./* \
| awk -F' ' '{print $13 " " $1}' \
| sort -r \
| awk -F' ' '{print $2}')"
Is there a better way? I'm relying on stat's 'terse' format being consistent.
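If your BusyBox build includes date -r (not guaranteed; check yours), here is a hedged sketch that builds the timestamp prefix itself. It tolerates spaces and tabs in names, though newlines will still break it:
cd "$1" || { printf 'failed cd\n' >&2; exit 1; }

for f in ./*; do
    [ -f "$f" ] || continue
    printf '%s %s\n' "$(date -r "$f" +%s)" "$f"   # epoch mtime, then name
done |
sort -n |          # oldest first
cut -d' ' -f2- |   # drop the timestamp
while IFS= read -r name; do
    mything "$name"
done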

Select files by extension using grep

I need to count all the .txt files in the current folder.
I tried ls | grep .txt, but if my folder contains a.txt btxt c.c it selects both a.txt and btxt, and I only want files that end with .txt. I tried various regexp combinations with no result.
find may be better suited than grep in this case, since it is designed for handling file names:
find . -maxdepth 1 -name '*.txt' | wc -l
But if you are very cautious about possibly strange file names (a newline in a name would throw the line count off):
find . -maxdepth 1 -name '*.txt' -exec echo 1 \; | wc -l
For grep, the character . means "any character", so you need to escape the dot and anchor it at the end of the line:
ls | grep -e "\.txt$"
Edit: in fact the -e option is not even necessary; this will do the trick:
ls | grep "\.txt$"
If all you need is the number of files with the .txt extension in the current directory only, this will also help:
ls -l *.txt | wc -l
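If you're in bash anyway, a glob-based sketch avoids parsing ls output entirely and handles any file name; with nullglob set, the count is 0 when nothing matches:
shopt -s nullglob   # make *.txt expand to nothing instead of itself
files=(*.txt)
echo "${#files[@]}"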

Create a bash script to delete folders which do not contain a certain filetype

I have recently run into a problem.
I used a utility to move all my music files into directories based on tags. This left a LOT of almost empty folders. The folders, in general, contain a thumbs.db file or some sort of image for album art. The mp3s have the correct album art in their new directories, so the old ones are okay to delete.
Basically, I need to find any directories within D:/Music/ that:
- do not have any subdirectories
- do not contain any mp3 files
and then delete them.
I figured this would be easier to do with a shell script in the Linux/Unix world than in Windows 8.1 (HAHA).
Any suggestions? I'm not very experienced writing scripts like this.
This should get you started:
find /music -mindepth 1 -type d |
while IFS= read -r dt
do
    find "$dt" -mindepth 1 -type d | read && continue     # has subdirectories: keep it
    find "$dt" -iname '*.mp3' -type f | read && continue  # has mp3 files: keep it
    echo "DELETE $dt"
done
Here's the short story...
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp
find . -type d -print | sort | uniq > all-dirs.tmp
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
less dirs-to-be-deleted.tmp
xargs -d '\n' rm -rf < dirs-to-be-deleted.tmp
Note that you might have to run the commands a few times (depending on your repository's directory depth) before you're done deleting all the recursively empty directories; a looped version is sketched at the end of this answer...
And the long story goes...
You can approach this problem from two basic perspectives: either you find all directories, iterate over each of them, and check whether it contains any mp3 file or any subdirectory, marking it for deletion if not. That works, but on very large repositories you can expect significant run time.
Another approach, which to my mind is much more interesting, is to build a list of directories NOT to be deleted, and subtract that list from the list of all directories. Let's work through the second strategy, one step at a time...
First of all, to find the paths of all directories that contain mp3 files, you can simply do:
find . -name '*.mp3' -printf '%h\n' | sort | uniq
This means "find any file ending with .mp3, then print the path to it's parent directory".
Now, I could certainly name at least ten different approaches to finding directories that contain at least one subdirectory, but keeping the same strategy as above, we can easily get...
find . -type d -printf '%h\n' | sort | uniq
What this means is: "Find any directory, then print the path to its parent."
Both of these queries can be combined in a single invocation (note the parentheses, which make -printf apply to both branches), producing a single list containing the paths of all directories NOT to be deleted. Let's redirect that list to a temporary file.
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp
Let's similarly produce a file containing the paths of all directories, no matter if they are empty or not.
find . -type d -print | sort | uniq > all-dirs.tmp
So there, we have, on one side, the complete list of all directories, and on the other, the list of directories not to be deleted. What now? There are tons of strategies, but here's a very simple one:
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
Once you have that, review it, and if you are satisfied, pipe it through xargs to rm to actually delete the directories (the -d '\n' keeps names containing spaces from being split).
xargs -d '\n' rm -rf < dirs-to-be-deleted.tmp
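And if you'd rather not re-run the commands by hand until the tree stops shrinking (see the note after the short story), here's a minimal looped sketch:
# repeat the subtract-and-delete pass until a pass finds nothing to delete
while :; do
    find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort -u > non-empty-dirs.tmp
    find . -mindepth 1 -type d -print | sort -u > all-dirs.tmp   # -mindepth 1: never list '.' itself
    comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
    [ -s dirs-to-be-deleted.tmp ] || break   # empty list: done
    xargs -d '\n' rm -rf < dirs-to-be-deleted.tmp
done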

How to count number of files in each directory?

I am able to list all the directories by
find ./ -type d
I attempted to list the contents of each directory and count the number of files in each directory by using the following command
find ./ -type d | xargs ls -l | wc -l
But this summed the total number of lines returned by
find ./ -type d | xargs ls -l
Is there a way I can count the number of files in each directory?
This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
Assuming you have GNU find, let it find the directories and let bash do the rest:
find . -type d -print0 | while IFS= read -r -d '' dir; do
    files=("$dir"/*)
    printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
find . -type f | cut -d/ -f2 | sort | uniq -c
find . -type f finds all items of type file, in the current folder and subfolders
cut -d/ -f2 extracts their immediate folder name
sort sorts the list of folder names
uniq -c returns the number of times each folder name occurs
You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f |
sed 's%/[^/]*$%%' |
sort |
uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don't contain newlines (and quietly persuade the guilty party of the error of their ways).
If you're interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I'd adapt the sed command to print only the top-level directory:
find . -type f |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' |
sort |
uniq -c
The first pattern captures the start of the name, the dot, the slash, the name up to the next slash and the slash, and replaces the line with just the first part, so:
./dir1/dir2/file1
is replaced by
./dir1/
The second replacement captures the files directly in the current directory; those names don't have a slash at the end, and they are replaced by ./. The sort and count then operate on just the directory names.
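To eyeball what the two sed expressions do, feed them a couple of sample paths (names made up):
printf '%s\n' ./dir1/dir2/file1 ./file0 |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%'
# prints:
# ./dir1/
# ./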
Here's one way to do it, but probably not the most efficient.
find -type d -print0 | xargs -0 -n1 bash -c 'echo -n "$1:"; ls -1 "$1" | wc -l' --
This gives output like the following, with the directory name followed by a count of the entries in that directory. Note that the count also includes directory entries, which may not be what you want.
./c/fa/l:0
./a:4
./a/c:0
./a/a:1
./a/a/b:0
Slightly modified version of Sebastian's answer using find instead of du (to exclude file-size-related overhead that du has to perform and that is never used):
find ./ -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sort -nr
The -mindepth 2 parameter excludes files in the current directory. If you remove it, you'll see a bunch of lines like the following:
234 dir1
123 dir2
1 file1
1 file2
1 file3
...
1 fileN
(much like the du-based variant does)
If you do need to count the files in current directory as well, use this enhanced version:
{ find ./ -mindepth 2 -type f | cut -d/ -f2 | sort && find ./ -maxdepth 1 -type f | cut -d/ -f1; } | uniq -c | sort -nr
The output will be like the following:
234 dir1
123 dir2
42 .
Everyone else's solution has one drawback or another.
find -type d -readable -exec sh -c 'printf "%s " "$1"; ls -1UA "$1" | wc -l' sh {} ';'
Explanation:
-type d: we're interested in directories.
-readable: We only want them if it's possible to list the files in them. Note that find will still emit an error when it tries to search for more directories in them, but this prevents calling -exec for them.
-exec sh -c BLAH sh {} ';': for each directory, run this script fragment, with $0 set to sh and $1 set to the filename.
printf "%s " "$1": portably and minimally print the directory name, followed by only a space, not a newline.
ls -1UA: list the files, one per line, in directory order (to avoid stalling the pipe), excluding only the special directories . and ..
wc -l: count the lines
This can also be done by looping over ls instead of find:
for f in */; do echo "$f -> $(ls "$f" | wc -l)"; done
Explanation:
for f in */; - loop over all directories
do echo "$f -> - print out each directory name
$(ls "$f" | wc -l) - call ls for this directory and count the lines
This should return each directory name followed by the number of files in the directory.
findfiles() {
echo "$1" $(find "$1" -maxdepth 1 -type f | wc -l)
}
export -f findfiles
find ./ -type d -exec bash -c 'findfiles "$0"' {} \;
Example output:
./ 6
./foo 1
./foo/bar 2
./foo/bar/bazzz 0
./foo/bar/baz 4
./src 4
The export -f is required because the -exec argument of find cannot execute a bash function unless you invoke bash explicitly, and the function defined in the current scope has to be exported to that new shell explicitly.
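If you'd rather not export a function, the same body can be inlined into bash -c (a sketch; here the directory name arrives as $0):
find ./ -type d -exec bash -c 'echo "$0" $(find "$0" -maxdepth 1 -type f | wc -l)' {} \;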
My answer is a little different: thanks to find's options, you can actually be much more flexible. Just try:
find . -type f -printf "%h\n" | sort | uniq -c
With the "%h" option to "-printf", find prints only the directory of the files it found. Then sort and count with "uniq -c". This prints the number of search result entries with the same directory, per directory.
Using further options on find, you can be much more flexible. For example, to get an overview how many files in which directory have been modified at a certain date, use:
find . -newermt "2022-01-01 00:00:00" -type f -printf "%TY-%Tm-%Td %h\n" | sort | uniq -c
This finds all files modified since 1 January 2022, prints (with -printf) the modification date and the directory, then sorts and counts them. In this example, each line of the result has the number of files, the modification date (without time), and the directory.
Note that -printf is a GNU find extension and may not be available in all versions of find.
I combined glenn jackman's answer and pcarvalho's answer (the latter, posted in the comments, has a problem caused by the special meaning of the backtick character).
My script accepts a path as an argument and sorts the directory list like ls -l does; it also handles the problem of spaces in file names.
#!/bin/bash
OLD_IFS="$IFS"
IFS=$'\n'
for dir in $(find "$1" -maxdepth 1 -type d | sort)
do
    files=("$dir"/*)
    printf "%5d,%s\n" "${#files[@]}" "$dir"
done
IFS="$OLD_IFS"
This is my first answer on Stack Overflow, and I hope it can help someone ^_^
This could be another way to browse through the directory structure and report per-directory counts. (Beware that it builds shell commands out of raw file names, so it will break on names containing spaces or shell metacharacters.)
find . -type d | awk '{print "echo -n \""$0" \";ls -l "$0" | grep -v total | wc -l" }' | sh
find . -type f -printf '%h\n' | sort | uniq -c
gives for example:
5 .
4 ./aln
5 ./aln/iq
4 ./bs
4 ./ft
6 ./hot
I tried with some of the others here but ended up with subfolders included in the file count when I only wanted the files. This prints ./folder/path<tab>nnn with the number of files, not including subfolders, for each subfolder in the current folder.
for d in $(find . -type d -print)
do
    echo -e "$d\t$(find "$d" -maxdepth 1 -type f -print | wc -l)"
done
This will give the overall count.
for file in */; do echo "$file -> $(ls "$file" | wc -l)"; done | cut -d ' ' -f 3 | py --ji -l 'numpy.sum(l)'
A super fast miracle command, which recursively traverses files to count the number of images in a directory and organizes the output by image extension:
find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
Credits: https://unix.stackexchange.com/a/386135/354980
I edited the script in order to exclude all node_modules directories inside the analyzed one.
This can be used to check if the project number of files is exceeding the maximum number that the file watcher can handle.
find . -type d ! -path "*node_modules*" -print0 | while IFS= read -r -d '' dir; do
    files=("$dir"/*)
    printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
To check the maximum files that your system can watch:
cat /proc/sys/fs/inotify/max_user_watches
On slow systems the node_modules folder should be added to your IDE/editor's excluded paths, and ideally the remaining file count shouldn't exceed the maximum (which can be changed, though; see below).
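For reference, the limit can be raised at runtime with sysctl; the value below is only an example, and the setting resets on reboot unless persisted (e.g. in /etc/sysctl.conf):
sudo sysctl fs.inotify.max_user_watches=524288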
Easy method:
find . | grep "Search_file.txt" | cut -d/ -f2 | sort | uniq -c
In my case I needed the count at subfolder level, so I did:
du -a | cut -d/ -f3 | sort | uniq -c | sort -nr
An easy way to recursively find files of a given type; in this case, .jpg files for all folders in the current directory (the pattern is quoted so the shell doesn't expand it first):
find . -name '*.jpg' -print | wc -l
OMG, why the complex commands? Just use something like:
find whatever_folder | wc -l
(Bear in mind this counts directories as well as files.)
