find command listing results in directory order - linux

I am trying to use the find command to find all files 'M*' from my working directory and display results in directory order.
Instead it keeps displaying results in sorted order which causes some deeper directories to be listed first because they are alphabetically in order.
$ find -name 'M*'
./MyFourth
./s/MyFirst
./s/v/b/MyThird
./s/v/MySecond
I would like it to be in this order:
./MyFourth
./s/MyFirst
./s/v/MySecond
./s/v/b/MyThird
Thanks for your help

If I understand correctly what you mean by "directory order", this should help:
find -name 'M*' -printf '%p\t%d\n' | sort -n -k2 | cut -f 1
It prints the files sorted by their depth in the directory tree.
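As a small sketch of the same idea, the pipeline can be wrapped in a function that puts the depth first and breaks ties between equal depths by path. The function name find_by_depth and its two parameters are made up for illustration, and -printf assumes GNU find.

```shell
# Sketch only: list matches shallowest-first, ties ordered by path.
# find_by_depth and its parameters are hypothetical names; -printf needs GNU find.
find_by_depth() {
    # $1 = start directory, $2 = name pattern
    find "$1" -name "$2" -printf '%d\t%p\n' |
        sort -t "$(printf '\t')" -k1,1n -k2 |   # numeric depth, then path
        cut -f 2-                               # drop the depth column
}
```

Called as find_by_depth . 'M*', it should list ./MyFourth before the deeper matches.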

$ find . -name 'M*' | awk -F/ '{print NF,$0}' | sort -k1,1n -k2 | cut -d' ' -f 2-
./MyFourth
./s/MyFirst
./s/v/MySecond
./s/v/b/MyThird

Related

Sorting files based on non empty lines Linux

I want to sort recursively found non-empty .py files (from current directory) in reverse order, based on the non-empty lines. If multiple files have equal number of non-blank lines, then the order should be alphabetical. All I have is:
find -P . -name '*.py' ! size 0 -print | xargs cat | sed '/^\s*$/d' | wc -l
But this is not working and I don't know how to sort. I would prefer a one-liner instead of a bash script.
Thank you in advance
Find py files, count non-empty lines with grep, swap the columns with awk, and sort in reverse numeric order:
find -name '*.py' -exec grep -v '^$' -c {} -H \; | \
awk -F: '{print $2, $1}' | \
sort -nr
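The pipeline above does not say how ties are ordered. A possible variation, assuming a grep that supports -H (GNU and BSD do), skips files with no non-blank lines and sorts equal counts alphabetically. The function name rank_py_by_lines is invented for this sketch.

```shell
# Sketch only: non-blank line count per .py file, highest first, ties by path.
# rank_py_by_lines is a hypothetical name; grep -H is assumed to be available.
rank_py_by_lines() {
    find "${1:-.}" -type f -name '*.py' \
        -exec grep -c -H -v '^[[:space:]]*$' {} + |
        awk -F: '$NF > 0 {
            n = $NF                   # the count is the last colon-separated field
            sub(/:[0-9]+$/, "")       # strip ":count", leaving the path in $0
            print n "\t" $0
        }' |
        sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Each output line is the count, a tab, and the path; lines containing only whitespace are treated as blank.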
wc -l *.py | sort -n will get you most of the way there.
wc -l *.py | grep -vF ' 0 ' | sort -n will eliminate the empty files.
Neither handles recursive lookups, but that's not specified in your question, and I don't know how you'd go about alphabetizing in that case.
I think this does what you are asking:
wc -l *.py | grep -vF ' 0 ' | sort -nr -k1,2 -t' '

Finding a filetype in current directory and subdirectory, including hidden files. (Homework)

With bash, I am needing to find all gifs in my current directory and subdirectories, and display them in a specific way. I have to include hidden files as well, and I am not allowed to use grep (or its subsidiaries - fgrep and such) or basename. I am running Ubuntu 14.04 through a virtual machine (VirtualBox via Vagrant if that matters) without a GUI. My current script looks like this:
#!/bin/bash
ls -a | find $directory -type f -name "*.gif" | rev | cut -d/ -f1 | rev | cut -d. -f1 | sort -f
This has mostly done what I need, especially with regard to formatting, but when I changed one of the gifs to a hidden file, it was no longer visible - there was an extra empty line, almost as if the file name was written in invisible ink. Does anyone know why it's doing this?
It's not very clear what you are trying to achieve.
As lurker said in the comments, cut -d. -f1 turns any line starting with a . into a blank line.
From your code, the closest I could think of is
find $directory -type f -iname '*.gif' | rev | cut -d/ -f1 | cut -d. -f2,3 | rev | sort -f
Giving you all gifs, hidden or not, without path or extension.
Example
user@host /tmp % ls -aR
.:
. .. subdirectory .test.gif test.gif
./subdirectory:
. .. .sub.gif sub.gif
user@host /tmp % find . -type f -iname '*.gif' | rev | cut -d/ -f1 | cut -d. -f2,3 | rev | sort -f
sub
.sub
test
.test

How to count number of files in each directory?

I am able to list all the directories by
find ./ -type d
I attempted to list the contents of each directory and count the number of files in each directory by using the following command
find ./ -type d | xargs ls -l | wc -l
But this summed the total number of lines returned by
find ./ -type d | xargs ls -l
Is there a way I can count the number of files in each directory?
This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
Assuming you have GNU find, let it find the directories and let bash do the rest:
find . -type d -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
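One caveat with the array-and-glob loop above: in bash an unmatched glob stays as the literal pattern unless nullglob is set, so an empty directory is reported as containing 1 file, and hidden entries are not matched at all. A minimal POSIX sh sketch of the same counting idea that handles the empty case follows; the function name count_glob is made up for illustration.

```shell
# Sketch only: count the non-hidden entries of one directory using a glob.
# count_glob is a hypothetical helper name.
count_glob() {
    set -- "$1"/*
    # If nothing matched, $1 is still the literal pattern and does not exist
    [ -e "$1" ] || [ -L "$1" ] || shift
    echo "$#"
}
```

For example, count_glob "$dir" could replace the array length inside the loop.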
find . -type f | cut -d/ -f2 | sort | uniq -c
find . -type f to find all items of the type file, in current folder and subfolders
cut -d/ -f2 to cut out their specific folder
sort to sort the list of foldernames
uniq -c to return the number of times each foldername has been counted
You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f |
sed 's%/[^/]*$%%' |
sort |
uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don't contain newlines (and quietly persuade the guilty party of the error of their ways).
If you're interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I'd adapt the sed command to print only the top-level directory:
find . -type f |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' |
sort |
uniq -c
The first pattern captures the start of the name, the dot, the slash, the name up to the next slash and the slash, and replaces the line with just the first part, so:
./dir1/dir2/file1
is replaced by
./dir1/
The second replace captures the files directly in the current directory; they don't have a slash at the end, and those are replaced by ./. The sort and count then work on just the number of names.
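The same grouping can also be sketched with awk doing the counting in a single pass, which avoids the two sed expressions. This is only an illustration: the function name count_by_top_level is invented, and it assumes it is run from the directory of interest so that every path starts with ./.

```shell
# Sketch only: total files under each top-level directory, plus "./" for
# files directly in the current directory. count_by_top_level is a hypothetical name.
count_by_top_level() {
    find . -type f |
        awk -F/ '
            NF > 2 { n[$1 "/" $2 "/"]++; next }   # file somewhere below ./dir/
                   { n["./"]++ }                  # file directly in the current directory
            END    { for (k in n) print n[k], k }
        ' |
        sort -k2
}
```

Like the sed version, it miscounts if a file or directory name contains a newline.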
Here's one way to do it, but probably not the most efficient.
find -type d -print0 | xargs -0 -n1 bash -c 'echo -n "$1:"; ls -1 "$1" | wc -l' --
Gives output like this, with directory name followed by count of entries in that directory. Note that the output count will also include directory entries which may not be what you want.
./c/fa/l:0
./a:4
./a/c:0
./a/a:1
./a/a/b:0
Slightly modified version of Sebastian's answer using find instead of du (to exclude file-size-related overhead that du has to perform and that is never used):
find ./ -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sort -nr
-mindepth 2 parameter is used to exclude files in current directory. If you remove it, you'll see a bunch of lines like the following:
234 dir1
123 dir2
1 file1
1 file2
1 file3
...
1 fileN
(much like the du-based variant does)
If you do need to count the files in current directory as well, use this enhanced version:
{ find ./ -mindepth 2 -type f | cut -d/ -f2 | sort && find ./ -maxdepth 1 -type f | cut -d/ -f1; } | uniq -c | sort -nr
The output will be like the following:
234 dir1
123 dir2
42 .
Everyone else's solution has one drawback or another.
find -type d -readable -exec sh -c 'printf "%s " "$1"; ls -1UA "$1" | wc -l' sh {} ';'
Explanation:
-type d: we're interested in directories.
-readable: We only want them if it's possible to list the files in them. Note that find will still emit an error when it tries to search for more directories in them, but this prevents calling -exec for them.
-exec sh -c BLAH sh {} ';': for each directory, run this script fragment, with $0 set to sh and $1 set to the filename.
printf "%s " "$1": portably and minimally print the directory name, followed by only a space, not a newline.
ls -1UA: list the files, one per line, in directory order (to avoid stalling the pipe), excluding only the special directories . and ..
wc -l: count the lines
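A variation on the steps above, for the case where only regular files should be counted rather than every directory entry: the inner find prints one character per file, so unusual file names cannot skew the count. The function name count_files_in_dirs is hypothetical, and -maxdepth assumes GNU or BSD find.

```shell
# Sketch only: "<directory> <number of regular files directly inside it>".
# count_files_in_dirs is a hypothetical name; -maxdepth is not in POSIX find.
count_files_in_dirs() {
    find "${1:-.}" -type d -exec sh -c '
        for dir in "$@"; do
            # One "x" per regular file, then count the characters
            n=$(find "$dir" -maxdepth 1 -type f -exec printf x \; | wc -c)
            printf "%s %s\n" "$dir" "$((n))"
        done
    ' sh {} +
}
```

It starts one printf process per file, so it trades speed for robustness.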
This can also be done with looping over ls instead of find
for f in */; do echo "$f -> $(ls $f | wc -l)"; done
Explanation:
for f in */; - loop over all directories
do echo "$f -> - print out each directory name
$(ls $f | wc -l) - call ls for this directory and count lines
This should return the directory name followed by the number of files in the directory.
findfiles() {
echo "$1" $(find "$1" -maxdepth 1 -type f | wc -l)
}
export -f findfiles
find ./ -type d -exec bash -c 'findfiles "$0"' {} \;
Example output:
./ 6
./foo 1
./foo/bar 2
./foo/bar/bazzz 0
./foo/bar/baz 4
./src 4
The export -f is required because the -exec argument of find does not allow executing a bash function unless you invoke bash explicitly, and you need to export the function defined in the current scope to the new shell explicitly.
My answer is a little different, due to the options of find, you can actually be much more flexible. Just try:
find . -type f -printf "%h\n" | sort | uniq -c
With the "%h" option to "-printf", find prints only the directory of the files it found. Then sort and count with "uniq -c". This prints the number of search result entries with the same directory, per directory.
Using further options on find, you can be much more flexible. For example, to get an overview how many files in which directory have been modified at a certain date, use:
find . -newermt "2022-01-01 00:00:00" -type f -printf "%TY-%Tm-%Td %h\n" | sort | uniq -c
This finds all files that have been modified since 1 January 2022, prints (with "-printf") the modification date and the directory, then sorts and counts them. In this example, each line in the result has the number of files, the date of modification (without time), and the directory.
Note that "-printf" is a GNU find extension, so it may not be available in other versions of find (the BSD find shipped with macOS, for example, lacks it).
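Where find has no -printf, a rough equivalent of the per-directory count is to run dirname on each file instead. This is a sketch under that assumption; the function name count_files_per_dir is made up, and it is slower because it starts one dirname process per file.

```shell
# Sketch only: files per directory without relying on find -printf.
# count_files_per_dir is a hypothetical name.
count_files_per_dir() {
    find "${1:-.}" -type f -exec dirname {} \; | sort | uniq -c
}
```

The output has the same shape as the "%h" version: a count followed by the directory.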
I combined @glenn jackman's answer and @pcarvalho's answer (in the comments; pcarvalho's version is broken there because the backtick character '`' is treated as comment formatting).
My script accepts a path as an argument, sorts the directory list the way ls -l does, and handles spaces in file names.
#!/bin/bash
OLD_IFS="$IFS"
IFS=$'\n'
for dir in $(find $1 -maxdepth 1 -type d | sort);
do
files=("$dir"/*)
printf "%5d,%s\n" "${#files[@]}" "$dir"
done
IFS="$OLD_IFS"
My first answer in stackoverflow, and I hope it can help someone ^_^
This could be another way to browse through the directory structures and provide depth results.
find . -type d | awk '{print "echo -n \""$0" \";ls -l "$0" | grep -v total | wc -l" }' | sh
find . -type f -printf '%h\n' | sort | uniq -c
gives for example:
5 .
4 ./aln
5 ./aln/iq
4 ./bs
4 ./ft
6 ./hot
I tried with some of the others here but ended up with subfolders included in the file count when I only wanted the files. This prints ./folder/path<tab>nnn with the number of files, not including subfolders, for each subfolder in the current folder.
for d in `find . -type d -print`
do
echo -e "$d\t$(find $d -maxdepth 1 -type f -print | wc -l)"
done
This will give the overall count.
for file in */; do echo "$file -> $(ls $file | wc -l)"; done | cut -d ' ' -f 3| py --ji -l 'numpy.sum(l)'
A super fast miracle command, which recursively traverses files to count the number of images in a directory and organize the output by image extension:
find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
Credits: https://unix.stackexchange.com/a/386135/354980
I edited the script in order to exclude all node_modules directories inside the analyzed one.
This can be used to check if the project number of files is exceeding the maximum number that the file watcher can handle.
find . -type d ! -path "*node_modules*" -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
To check the maximum files that your system can watch:
cat /proc/sys/fs/inotify/max_user_watches
node_modules folder should be added to your IDE/editor excluded paths in slow systems, and the other files count shouldn't ideally exceed the maximum (which can be changed though).
Easy Method:
find ./|grep "Search_file.txt" |cut -d"/" -f2|sort |uniq -c
In my case I needed the count at subfolder level, so I did:
du -a | cut -d/ -f3 | sort | uniq -c | sort -nr
Easy way to recursively find files of a given type. In this case, .jpg files for all folders in current directory:
find . -name '*.jpg' -print | wc -l
omg why the complex commands. just use something like
find whatever_folder | wc -l

Copy the three newest files under one directory (recursively) to another specified directory

I'm using bash.
Suppose I have a log file directory /var/myprogram/logs/.
Under this directory I have many sub-directories and sub-sub-directories that include different types of log files from my program.
I'd like to find the three newest files (modified most recently), whose name starts with 2010, under /var/myprogram/logs/, regardless of sub-directory and copy them to my home directory.
Here's what I would do manually
1. Go through each directory and do ls -lt 2010*
to see which files starting with 2010 are modified most recently.
2. Once I go through all directories, I'd know which three files are the newest. So I copy them manually to my home directory.
This is pretty tedious, so I wondered if maybe I could somehow pipe some commands together to do this in one step, preferably without using shell scripts?
I've been looking into find, ls, head, and awk that I might be able to use but haven't figured the right way to glue them together.
Let me know if I need to clarify. Thanks.
Here's how you can do it:
find -type f -name '2010*' -printf "%C@\t%P\n" |sort -r -k1,1 |head -3 |cut -f 2-
This outputs a list of files prefixed by their last change time, sorts them based on that value, takes the top 3 and removes the timestamp.
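The command above only lists the files, and the %C timestamp is the status-change time. A sketch that sorts on modification time (%T@) instead and also performs the copy might look like the following. The function name copy_newest and its parameters are invented for illustration, and -printf assumes GNU find.

```shell
# Sketch only: copy the N most recently modified matching files.
# copy_newest and its parameters are hypothetical; -printf needs GNU find.
copy_newest() {
    # $1 = search root, $2 = name pattern, $3 = how many, $4 = destination directory
    find "$1" -type f -name "$2" -printf '%T@\t%p\n' |
        sort -rn |          # newest (largest timestamp) first
        head -n "$3" |
        cut -f 2- |         # drop the timestamp column
        while IFS= read -r path; do
            cp -- "$path" "$4"
        done
}
```

For the question as asked this would be copy_newest /var/myprogram/logs '2010*' 3 ~. Files with the same base name in different sub-directories would overwrite each other in the destination, and names containing newlines are not handled.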
Your answers feel very complicated, how about
for FILE in `find . -type d`; do ls -t -1 -F $FILE | grep -v "/" | head -n3 | xargs -I{} mv {} ..; done;
or laid out nicely
for FILE in `find . -type d`;
do
ls -t -1 -F $FILE | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} mv {} ~;
done;
My "shortest" answer after quickly hacking it up.
for file in $(find . -iname '*.php' -mtime 1 | xargs ls -l | awk '{ print $6" "$7" "$8" "$9 }' | sort | sed -n '1,3p' | awk '{ print $4 }'); do cp $file ../; done
The main command stored in $() does the following:
Find all files recursively in current directory matching (case insensitive) the name *.php and having been modified in the last 24 hours.
Pipe to ls -l, required to be able to sort by modification date, so we can have the first three
Extract the modification date and file name/path with awk
Sort these files based on datetime
With sed print only the first 3 files
With awk print only their name/path
Used in a for loop and as action copy them to the desired location.
Or use @Hasturkun's variant, which popped up as a response while I was editing this post :)

How can I find all of the distinct file extensions in a folder hierarchy?

On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.
What would be the best way to achieve this from a shell?
Try this (not sure if it's the best way, but it works):
find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
It works as follows:
Find all files from current folder
Prints extension of files if any
Make a unique sorted list
No need for the pipe to sort, awk can do it all:
find . -type f | awk -F. '!a[$NF]++{print $NF}'
Recursive version:
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u
If you want totals (how many times the extension was seen):
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn
Non-recursive (single folder):
for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u
I've based this upon this forum post, credit should go there.
My awk-less, sed-less, Perl-less, Python-less POSIX-compliant alternative:
find . -type f | rev | cut -d. -f1 | rev | tr '[:upper:]' '[:lower:]' | sort | uniq --count | sort -rn
The trick is that it reverses the line and cuts the extension at the beginning.
It also converts the extensions to lower case.
Example output:
3689 jpg
1036 png
610 mp4
90 webm
90 mkv
57 mov
12 avi
10 txt
3 zip
2 ogv
1 xcf
1 trashinfo
1 sh
1 m4v
1 jpeg
1 ini
1 gqv
1 gcs
1 dv
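One thing to watch with the reverse-and-cut trick: it takes whatever follows the last dot in the whole path, so a file such as ./some.dir/README is reported as dir/readme. A sketch that looks only at the base name and skips names without an extension is shown below; the function name count_extensions is made up for illustration.

```shell
# Sketch only: count lower-cased extensions, looking at base names only.
# count_extensions is a hypothetical name.
count_extensions() {
    find "${1:-.}" -type f |
        awk -F/ '{
            name = $NF                       # base name (last path component)
            dot = match(name, /\.[^.]+$/)    # start of the final ".ext", 0 if none
            if (dot > 1)                     # skip plain names and dotfiles like .bashrc
                print tolower(substr(name, dot + 1))
        }' |
        sort | uniq -c | sort -rn
}
```

Treating dotfiles such as .bashrc as having no extension is a choice made for this sketch; drop the dot > 1 test in favour of dot > 0 to count them.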
Powershell:
dir -recurse | select-object extension -unique
Thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html
Adding my own variation to the mix. I think it's the simplest of the lot and can be useful when efficiency is not a big concern.
find . -type f | grep -oE '\.(\w+)$' | sort -u
Find everything with a dot and show only the suffix.
find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u
If you know all suffixes have 3 characters, then
find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u
or with sed shows all suffixes with one to four characters. Change {1,4} to the range of characters you are expecting in the suffix.
find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p'| sort -u
I tried a bunch of the answers here, even the "best" answer. They all came up short of what I specifically was after. So besides the past 12 hours of sitting in regex code for multiple programs and reading and testing these answers this is what I came up with which works EXACTLY like I want.
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort -u
Finds all files which may have an extension.
Greps only the extension
Greps for file extensions between 2 and 16 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail).
Awk to print the extensions in lower case.
Sort and bring in only unique values. Originally I had attempted to try the awk answer but it would double print items that varied in case sensitivity.
If you need a count of the file extensions then use the below code
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort | uniq -c | sort -rn
While these methods will take some time to complete and probably aren't the best ways to go about the problem, they work.
Update:
Per @alpha_989 long file extensions will cause an issue. That's due to the original regex "[[:alpha:]]{3,6}". I have updated the answer to include the regex "[[:alpha:]]{2,16}". However anyone using this code should be aware that those numbers are the min and max of how long the extension is allowed for the final output. Anything outside that range will be split into multiple lines in the output.
Note: Original post did read "- Greps for file extensions between 3 and 6 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail)."
Idea: Could be used to find file extensions over a specific length via:
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{4,}" | awk '{print tolower($0)}' | sort -u
Where 4 is the file extensions length to include and then find also any extensions beyond that length.
In Python using generators for very large directories, including blank extensions, and getting the number of times each extension shows up:
import json
import collections
import itertools
import os
root = '/home/andres'
files = itertools.chain.from_iterable((
files for _,_,files in os.walk(root)
))
counter = collections.Counter(
(os.path.splitext(file_)[1] for file_ in files)
)
print(json.dumps(counter, indent=2))
Since there's already another solution which uses Perl:
If you have Python installed you could also do (from the shell):
python -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn]for _,_,fn in os.walk('/home')];print('\n'.join(e))"
Another way:
find . -type f -name "*.*" -printf "%f\n" | while IFS= read -r; do echo "${REPLY##*.}"; done | sort -u
You can drop the -name "*.*" but this ensures we are dealing only with files that do have an extension of some sort.
The -printf is find's print, not bash. -printf "%f\n" prints only the filename, stripping the path (and adds a newline).
Then we use string substitution to remove up to the last dot using ${REPLY##*.}.
Note that $REPLY is simply read's inbuilt variable. We could just as well use our own in the form: while IFS= read -r file, and here $file would be the variable.
None of the replies so far deal with filenames with newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but works, and is reasonably fast.
import os, sys
def names(roots):
    # Yield every file base name found under the given root directories
    for root in roots:
        for a, b, basenames in os.walk(root):
            for basename in basenames:
                yield basename
sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:
        print(suf)
I think the most simple & straightforward way is
for f in *.*; do echo "${f##*.}"; done | sort -u
It's a modification of ChristopheD's third approach.
I don't think this one was mentioned yet:
find . -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c
The accepted answer uses REGEX and you cannot create an alias command with REGEX, you have to put it into a shell script, I'm using Amazon Linux 2 and did the following:
I put the accepted answer code into a file using :
sudo vim find.sh
add this code:
find ./ -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
save the file by typing: :wq!
sudo vim ~/.bash_profile
alias getext=". /path/to/your/find.sh"
:wq!
. ~/.bash_profile
you could also do this
find . -type f -name "*.php" -exec PATHTOAPP {} +
I've found it simple and fast...
# find . -type f -exec basename {} \; | awk -F"." '{print $NF}' > /tmp/outfile.txt
# cat /tmp/outfile.txt | sort | uniq -c| sort -n > /tmp/outfile_sorted.txt
