Unix - Count Number of file types recursively - linux

I am new to Stack Overflow and am somewhat of a newbie with Linux. I have been trying to filter specific files within a parent directory and its children using the following command as an example:
ls -R | grep '*.jpg' | wc -l
I have found this great when looking for individual files, but I will need to do this on a monthly basis and am looking for quicker ways to list several types in one command. I purposely want to exclude hidden files.
I have tried this but to no avail —
Count number of specific file type of a directory and its sub dir in mac
I've seen different methods across the web (list, find, tree, echo, etc.), so any help with this would be much appreciated, and if there is a better way of doing this than what I am currently doing, that's not a problem, as I am open to suggestions. I'm just not sure what's the best way to skin this cat at the moment!
Thank you very much

You can do this with the help of find, as mentioned in the link from your initial post.
Just something like this:
find . \( -name \*.jpg -or -name \*.png \) -not -path \*/\.\* | wc -l
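If you ever need more extensions or case-insensitive matching, the same pattern extends naturally; a minimal sketch (the extra extension here is just an example):
find . -type f \( -iname '*.jpg' -o -iname '*.png' -o -iname '*.gif' \) -not -path '*/.*' | wc -l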

If you arrive here looking for more of a summary, here's a way to count all file extensions recursively in a folder:
find . -type f -name '*.*' -not -name '.*' | sed -Ee 's,.*/.+\.([^/]+)$,\1,' | sort | uniq -ci | sort -n
This gives a summary like:
422 mov
1043 mp4
3266 png
6738 CR3
9417 RAF
29679 cr2
60949 jpg
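If you would rather fold upper- and lower-case extensions together (so cr2 and CR2 count as one), a small variation of the same pipeline with tr should do it; just a sketch:
find . -type f -name '*.*' -not -name '.*' | sed -Ee 's,.*/.+\.([^/]+)$,\1,' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -n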

You can have grep filter for more than one pattern. You should learn about man pages in Linux: just type man grep in a terminal and you will see what this program is capable of and how to use it.
For your issue, you could e.g. use this to filter for png and jpeg files (ignoring case, thus getting both PNG and png files):
ls -R | grep -i '\.jpg$\|\.png$' | wc -l
The -i ignores the case of the names; the \| is an or-concatenation (alternation).
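If your grep supports -E and -c, the counting can be folded into grep itself; a minimal sketch of the same idea:
ls -R | grep -icE '\.(jpg|png)$'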

Thank you all for contributing. In case this proves to be useful to someone out there: I've had some help from a developer friend, who kindly looked into it for me, and what I've found works best in my particular case is the following:
find . -type f \( -iname "*.jpg" ! -iname ".*.png" ! -path "*/.HSResource/*" \) | wc -l
This skips over the resource folders and hidden files and appears to return me the correct results.

Related

How to grep through many files of same file type

I wish to grep through many (20,000) text files, each with about 1,000,000 lines, so the faster the better.
I have tried the below code and it just doesn't seem to want to do anything, it doesn't find any matches even after an hour (it should have done by now).
for i in $(find . -name "*.txt"); do grep -Ff firstpart.txt $1; done
Ofir's answer is good. Another option:
find . -name "*.txt" -exec grep -fnFH firstpart.txt {} \;
I like to add the -n for line numbers and -H to get the filename. -H is particularly useful in this case as you could have a lot of matches.
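If your find supports terminating -exec with + (most modern versions do), batching many files into each grep invocation is usually faster; a sketch:
find . -name "*.txt" -exec grep -nFH -f firstpart.txt {} +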
Instead of iterating through the files in a loop, you can just give the file names to grep using xargs and let grep go over all the files.
find . -name "*.txt" | xargs grep $1
I'm not quite sure whether it will actually increase the performance, but it's probably worth a try.
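One caveat: plain xargs splits on whitespace, so file names containing spaces will break it. A null-delimited sketch of the same approach, reusing the pattern file from the question:
find . -name "*.txt" -print0 | xargs -0 grep -nFH -f firstpart.txt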
ripgrep is the most amazing tool. You should get that and use it.
To search *.txt files in all directories recursively, do this:
rg -t txt -f patterns.txt
Ripgrep uses one of the fastest regular expression engines out there. It uses multiple threads. It searches directories and files, and filters them to the interesting ones in the fastest way.
It is simply great.
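Since the patterns in the question are fixed strings (the original command used grep -F), adding -F to rg may speed things up further; a sketch with the same patterns.txt:
rg -t txt -F -f patterns.txt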
For anyone stuck using grep for whatever reason:
find -name '*.txt' -type f -print0 | xargs -0 -P 8 -n 8 grep -Ff patterns.txt
That tells xargs to use 8 arguments per command (-n 8) and to run 8 copies in parallel (-P 8). It has the downside that the output might become interleaved and corrupted.
Instead of xargs you could use parallel which does a fancier job and keeps output in order:
$ find -name '*.txt' -type f -print0 | parallel -0 grep --with-filename -Ff patterns.txt

Counting number of files in a directory with an OSX terminal command

I'm looking for a specific directory file count that returns a number. I would type it into the terminal and it can give me the specified directory's file count.
I've already tried echo find "'directory' | wc -l" but that didn't work, any ideas?
You seem to have the right idea. I'd use -type f to find only files:
$ find some_directory -type f | wc -l
If you only want files directly under this directory and not to search recursively through subdirectories, you could add the -maxdepth flag:
$ find some_directory -maxdepth 1 -type f | wc -l
Open the terminal and switch to the location of the directory.
Type in:
find . -type f | wc -l
This searches inside the current directory (that's what the . stands for) for all files, and counts them.
The fastest way to obtain the number of files within a directory is by obtaining the value of that directory's kMDItemFSNodeCount metadata attribute.
mdls -name kMDItemFSNodeCount directory_name -raw | xargs
The above command has a major advantage over find . -type f | wc -l in that it returns the count almost instantly, even for directories which contain millions of files.
Please note that the command obtains the number of all items in the directory, not just regular files.
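To compare several directories at once you can loop over them; a minimal sketch (the directory glob is just an example):
for d in */; do printf '%s: ' "$d"; mdls -name kMDItemFSNodeCount -raw "$d"; echo; done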
I don't understand why folks are using 'find' because for me it's a lot easier to just pipe in 'ls' like so:
ls *.png | wc -l
to find the number of png images in the current directory.
I'm using tree; this is the way:
tree ph

List files but exclude certain directories in Unix

I am attempting to list files in a folder with sub folders recursively, but trying to avoid going into one folder as they are duplicates.
This is the command I run but it doesn't do anything.
ls -lR /opt/elk/data/syslogs | grep -v .log. | grep --exclude-dir="cam" * > /tmp/logs.log
Are there any changes I can make to this?
Thanks.
Options to different versions of find vary greatly, but you may try:
find /opt/elk/data/syslogs -name cam -prune -o -print
On RHEL, you probably have GNU find, and if you want file size and modification time, you might try:
find /opt/elk/data/syslogs -name cam -prune -o -printf "%p %s %t\n"
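To end up with the listing in a file, as in the original command, just redirect the output (the paths here come from the question):
find /opt/elk/data/syslogs -name cam -prune -o -printf "%p %s %t\n" > /tmp/logs.log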

How do I find the number of all .txt files in a directory and all sub directories using specifically the find command and the wc command?

So far I have this:
find -name ".txt"
I'm not quite sure how to use wc to find out the exact number of files. When using the command above, all the .txt files show up, but I need the exact number of files with the .txt extension. Please don't suggest using other commands as I'd like to specifically use find and wc. Thanks
Try:
find . -name '*.txt' | wc -l
The -l option to wc tells it to return just the number of lines.
Improvement (requires GNU find)
The above will give the wrong number if any .txt file name contains a newline character. This will work correctly with any file names:
find . -iname '*.txt' -printf '1\n' | wc -l
-printf '1\n' tells find to print just a line containing 1 for each file name found. This avoids problems with file names containing difficult characters.
Example
Let's create two .txt files, one with a newline in its name:
$ mkdir -p dir1/dir2
$ touch dir1/dir2/a.txt $'dir1/dir2/b\nc.txt'
Now, let's run the find command:
$ find . -name '*.txt'
./dir1/dir2/b?c.txt
./dir1/dir2/a.txt
To count the files:
$ find . -name '*.txt' | wc -l
3
As you can see, the answer is off by one. The improved version, however, works correctly:
$ find . -iname '*.txt' -printf '1\n' | wc -l
2
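On a find without -printf (for example BSD/macOS find), a similar trick is to print one fixed line per match with -exec; a sketch (slower, since it runs printf once per file):
find . -name '*.txt' -exec printf '1\n' \; | wc -l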
find -type f -name "*.h" -mtime +10 -print | wc -l
This worked out.

Find a bunch of randomly sorted images on disk and copy to target dir

For testing purposes I need a bunch of random images from disk, copied to a specific directory. So, in pseudo code:
find [] -iname "*.jpg"
and then sort -R
and then head -n [number wanted]
and then copy to destination
Is it possible to combine the above commands in a single bash command? Like e.g.:
for i in `find ./images/ -iname "*.jpg" | sort -R | head -n243`; do cp "$i" ./target/; done;
But that doesn't quite work. I feel I'll need an 'xargs' somewhere in there, but I'm afraid I don't understand xargs very well... would I need to pass a 'print0' (or equivalent) to all separate commands?
[edit]
I left out the final step: I'd like to copy the images to a certain directory under a new (sequential) name. So the first image becomes 1.jpg, the second 2.jpg etc. For this, the command I posted does not work as intended.
The command that you specified should also work without any issues; it works well for me. Can you point out the exact error you are facing?
Meanwhile,
This will just do the trick for you:
find ./images/ -iname "*.jpg" | sort -R | head -n <no. of files> | xargs -I {} cp {} target/
Simply use shuf -n.
Example:
find ./images/ -iname "*.jpg" | shuf -n 10 | xargs cp -t ./target/
It would copy 10 random images to ./target/. If you need 243 just use shuf -n 243.
According to your edit, this should do:
for i in $(find ./images/ -iname "*.jpg" | sort -R | head -n2); do cp "$i" ./target/$((1 + compt++)).jpg; done
Here, you add a counter to keep track of the number of files you already copied.
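A null-safe variant of the same idea, using shuf from the earlier answer and also renaming the copies sequentially; a sketch assuming GNU shuf and the paths and count from the question:
n=1; find ./images/ -iname '*.jpg' -print0 | shuf -z -n 243 | while IFS= read -r -d '' f; do cp "$f" "./target/$((n++)).jpg"; done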
