Searching for an exact number of files in a Linux directory

I want to recursively count files in a Linux directory using this:
find DIR_NAME -type f | wc -l
My question is: how do I stop the find execution above if more than 1000 files are found in that folder? Is that possible? Or do I need to wait for find to finish?

You can use head to limit the number of lines returned by find:
find DIR_NAME -type f | head -n 1000 | wc -l
The head program will exit after the first thousand lines, and find will receive SIGPIPE and exit as well.
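One caveat: if you cap the pipeline at exactly 1000 lines, you cannot tell "exactly 1000 files" apart from "more than 1000". A minimal sketch that caps at 1001 instead (DIR_NAME is a placeholder):
# cap at 1001 so "exactly 1000" and "more than 1000" are distinguishable
count=$(find DIR_NAME -type f | head -n 1001 | wc -l)
if [ "$count" -gt 1000 ]; then
    echo "more than 1000 files"
else
    echo "$count files"
fi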

Related

How to efficiently find whether a Linux directory, including subdirectories, has at least 1 file

In my project, various jobs are created as files in directories inside subdirectories. But usually I find that the jobs are concentrated in a few dirs and absent from most of the others. Currently I use
find $DIR -type f | head -n 1
to know whether the directory has at least 1 file, but this seems wasteful.
How can I efficiently find whether a Linux directory, including subdirectories, has at least 1 file?
Your code is already efficient, but perhaps the reason is not obvious. When you pipe the output of find to head -n 1 you probably assume that find lists all the files and then head discards everything after the first one. But that's not quite what head does.
When find lists the first file, head will print it, but when find lists the second file, head will terminate itself, which sends SIGPIPE to find because the pipe between them is closed. Then find will stop running, because the default signal handler for SIGPIPE terminates the program which receives it.
So the cost of your pipelined commands is only the cost of finding two files, not the cost of finding all files. For most obvious use cases this should be good enough.
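If you would rather not rely on SIGPIPE at all, GNU find has a -quit action that stops the search after the first match; a minimal sketch, assuming GNU find:
# -print -quit prints the first file found and stops immediately
first=$(find "$DIR" -type f -print -quit)
if [ -n "$first" ]; then
    echo "directory contains at least one file"
fi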
Try this
find -type f -printf '%h\n' | uniq
The find part finds all files, but prints only the directory. The uniq part eliminates duplicates.
Pitfall: like your example, it doesn't work for files whose directory path contains a newline.
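A NUL-delimited variant avoids that pitfall, assuming GNU find and sort (the trailing tr is for display only):
# %h\0 emits each file's directory NUL-terminated; sort -zu deduplicates safely
find . -type f -printf '%h\0' | sort -zu | tr '\0' '\n'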
This command finds the first subdirectory containing at least one file and then stops:
find . -mindepth 1 -type d -exec bash -c 'c=$(find {} -maxdepth 1 -type f -print -quit);test "x$c" != x' \; -print -quit
The outer find iterates through all subdirectories, and the inner find finds the first file and then stops.

Counting number of files in a directory with an OSX terminal command

I'm looking for a command that returns a specific directory's file count as a number; I'd type it into the terminal and it would give me the count for that directory.
I've already tried echo find "'directory' | wc -l" but that didn't work. Any ideas?
You seem to have the right idea. I'd use -type f to find only files:
$ find some_directory -type f | wc -l
If you only want files directly under this directory and not to search recursively through subdirectories, you could add the -maxdepth flag:
$ find some_directory -maxdepth 1 -type f | wc -l
Open the terminal and switch to the location of the directory.
Type in:
find . -type f | wc -l
This searches inside the current directory (that's what the . stands for) for all files, and counts them.
The fastest way to obtain the number of files within a directory is by obtaining the value of that directory's kMDItemFSNodeCount metadata attribute.
mdls -name kMDItemFSNodeCount directory_name -raw | xargs
The above command has a major advantage over find . -type f | wc -l in that it returns the count almost instantly, even for directories which contain millions of files.
Please note that the command obtains the count of all directory entries, not just regular files.
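A small usage sketch (directory_name is a placeholder; the trailing xargs just trims whitespace from the raw value):
count=$(mdls -name kMDItemFSNodeCount -raw directory_name | xargs)
echo "node count: $count"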
I don't understand why folks are using find, because for me it's a lot easier to just pipe ls into wc, like so:
ls *.png | wc -l
to count the number of png images in the current directory.
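Be aware that ls miscounts filenames containing newlines and errors out when nothing matches; a bash-specific sketch that avoids both by counting glob matches directly:
shopt -s nullglob   # make the glob expand to nothing when there are no matches
files=(*.png)
echo "${#files[@]}"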
I'm using tree; this is the way:
tree ph
tree prints a summary line at the end with the total number of directories and files.

Shell script to find and count total number of characters in all the files

I'm struggling to write a script that finds every file in the home directory that is less than 3 days old and then counts the total number of characters in all of those files.
Any suggestions?
Thanks.
The command below should work from the current directory:
find ./ -ctime -3 | xargs wc -c
And this one should work for the home directory:
find ~ -ctime -3 | xargs wc -c
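Both pipelines break on paths containing spaces, and wc -c prints per-file counts (plus one subtotal per xargs batch) rather than a single number. A NUL-delimited sketch that yields one grand total, assuming GNU find and xargs:
# -print0/-0 keep paths with spaces intact; cat | wc -c gives a single byte count
find ~ -type f -ctime -3 -print0 | xargs -0 cat | wc -c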

Bash script that writes subdirectories that have more than 5 files

While trying to practice my Linux skills, I hit a question I could not solve. It basically says: "Write a bash script that takes the name of a directory as a command argument and prints the names of the subdirectories that have more than 5 files in them."
I thought we would use the find command, but I still could not figure it out. My code is:
find directory -type d -mindepth 5
but it's not working.
You can use find twice:
First you can use find and wc to count the number of files in a given directory:
nb=$(find directory -maxdepth 1 -type f -printf "x\n" | wc -l)
This asks find to output an x on a line for each file in the directory directory, proceeding non-recursively; wc -l then counts the lines, so nb is the number of files in directory.
If you want to know whether a directory contains more than 5 files, it's a good idea to stop find as soon as 6 files are found:
nb=$(find directory -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l)
Here nb has an upper threshold of 6.
Now if for each subdirectory of a directory directory you want to output the number of files (threshold at 6), you can do this:
find directory -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); echo "$nb"' {} \;
where $0 is the 0-th argument, namely the {} that find replaces with each subdirectory of directory.
Finally, you only want to display the subdirectory name if the number of files is more than 5:
find . -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); ((nb>5))' {} \; -print
The final test ((nb>5)) succeeds or fails depending on whether nb is greater than 5, and in case of success, find will -print the subdirectory name.
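Put together as the requested script, a minimal sketch (the file name count_subdirs.sh is hypothetical):
#!/bin/bash
# Usage: ./count_subdirs.sh directory
dir=${1:?usage: $0 directory}
find "$dir" -type d -exec bash -c \
  'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); ((nb>5))' {} \; -print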
This should do the trick:
find directory/ -type f | sed 's/\(.*\)\/.*/\1/g' | sort | uniq -c | sort -n | awk '{if($1>5) print($2)}'
Using -mindepth is useless here since it only lists directories at depth 5 or more. You said you need subdirectories with more than 5 files in them.
find directory -type f prints all files in the subdirectories
sed 's/\(.*\)\/.*/\1/g' strips the filenames, leaving only the list of subdirectories
sort sorts that list so we can use uniq
uniq -c merges duplicate lines and prepends how many times each occurred
sort -n sorts by the number of occurrences (so you end up with a list: (how many times, subdirectory))
awk '{if($1>5) print($2)}' prints only the lines whose first column is > 5 (and prints only the second column)
So you end up with a list of subdirectories with more than 5 files inside.
EDIT:
A fix for paths with spaces was proposed: instead of awk '{if($1>5) print($2)}', use awk '{if($1>5){ $1=""; print(substr($0,2)) }}', which sets the first field to "" and then prints the whole line without the leading space (the field delimiter). Put together, we get this:
find directory/ -type f | sed 's/\(.*\)\/.*/\1/g' | sort | uniq -c | sort -n | awk '{if($1>5){ $1=""; print(substr($0,2)) }}'

Count lines found with find command

I have configured glusterfs on two servers.
I want to implement a script which monitors the replication. My idea is to execute the following:
find "/replica_path/" -mmin +1 -exec ls -l {} \; | wc -l
This finds the files modified more than 1 minute ago, and it must return the same count on both servers.
I'll use spawn to execute this line remotely.
But when executing that line from the command line, the server takes a long time to return the paths; in fact, I have to break the execution.
How could I implement this?
ls -l might need quite some time to resolve owner names etc.
Perhaps you just need to count the number of matches:
find "/replica_path/" -mmin +1 | wc -l
It might help to avoid executing /bin/ls for each matched item if you just want to count them.
Try:
find "/replica_path/" -mmin +1 -print | wc -l
