How to find the count of and total sizes of multiple files in directory? - linux

I have a directory, and inside it multiple directories which contain many types of files.
I want to find the *.jpg files, then get the count and total size of all of them.
I know I have to use find, wc -l and du -ch, but I don't know how to combine them in a single script or a single command.
find . -type f -name "*.jpg" -exec - not sure how to connect all three

Supposing your starting folder is ., this will give you all files and the total size:
find . -type f -name '*.jpg' -exec du -ch {} +
The + at the end executes du -ch on all files at once rather than per file, allowing you to get the grand total.
If you want to know only the total, add | tail -n 1 at the end.
Fair warning: this in fact executes
du -ch file1 file2 file3 ...
With very many files, find splits the list across several du invocations to respect the argument-length limit, so you may end up with more than one total.
To check that limit:
$ getconf ARG_MAX
2097152
That's what is configured on my system.
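In practice, find's -exec ... + splits overly long lists into multiple du runs rather than failing, but each run then prints its own total. A sketch that sidesteps the issue entirely, assuming GNU find: print each file's size with -printf and sum with awk (note that %s is the apparent size in bytes, not the disk usage du reports):

```shell
# Sum apparent sizes (bytes) of all .jpg files; immune to argument-list
# limits because sizes stream through the pipe one per line.
find . -type f -name '*.jpg' -printf '%s\n' \
    | awk '{ total += $1 } END { print total+0 }'
```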
This doesn't give you the number of files though. You'll need to capture the output of find and use it twice.
The last line is the total, so we'll use all but the last line to get the number of files, and the last one for the total:
OUT=$(find . -type f -name '*.jpg' -exec du -ch {} +)
N=$(echo "$OUT" | head -n -1 | wc -l)
SIZE=$(echo "$OUT" | tail -n 1)
echo "Number of files: $N"
echo "$SIZE"
Which for me gives:
Number of files: 143
584K total
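Both numbers can also be computed in a single awk pass over du's per-file output; a sketch assuming GNU du's -k (1 KiB blocks). Since there is no -c here, any batching by -exec ... + does not skew the sum:

```shell
# Count files and sum their disk usage (KiB) in one pass over du's output.
find . -type f -name '*.jpg' -exec du -k {} + \
    | awk '{ n++; kb += $1 }
           END { printf "Number of files: %d\n%d KiB total\n", n, kb }'
```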

Related

How to get combined disc space of all files in a directory with help of du in linux [duplicate]

I've got a bunch of files scattered across folders in a layout, e.g.:
dir1/somefile.gif
dir1/another.mp4
dir2/video/filename.mp4
dir2/some.file
dir2/blahblah.mp4
And I need to find the total disk space used for the MP4 files only. This means it's gotta be recursive somehow.
I've looked at du and fiddling with piping things to grep but can't seem to figure out how to calculate just the MP4 files no matter where they are.
A human-readable total disk-space output is a must too, preferably in GB if possible.
Any ideas? Thanks
For individual file size:
find . -name "*.mp4" -print0 | du -sh --files0-from=-
For total disk space in GB:
find . -name "*.mp4" -print0 | du -sb --files0-from=- | awk '{ total += $1} END { print total/1024/1024/1024 }'
You can simply do:
find -name "*.mp4" -exec du -b {} \; | awk 'BEGIN{total=0}{total=total+$1}END{print total}'
The -exec option of find executes a command for each file found, with {} replaced by the file name.
du -b displays the size of the file in bytes.
The awk command initializes a variable at 0, adds the size of each file to it, and prints the total at the end.
This will sum all mp4 files size in bytes:
find ./ -name "*.mp4" -printf "%s\n" | paste -sd+ | bc
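If a human-readable figure is wanted from the byte total, GNU coreutils' numfmt can do the conversion instead of dividing in awk; a sketch assuming GNU find and numfmt are available:

```shell
# Total .mp4 size in bytes, rendered human-readable (IEC units: K, M, G).
find . -type f -name '*.mp4' -printf '%s\n' \
    | awk '{ total += $1 } END { print total+0 }' \
    | numfmt --to=iec
```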

Output file size for all files of certain type in directory recursively?

I am trying to get a sum of the total size of PDF files recursively within a directory. I have tried running the command below within the directory, but the recursive part does not work properly as it seems to only report on the files in the current directory and not include the directories within. I am expecting the result to be near 100 GB in size however the command is only reporting about 200 MB of files.
find . -name "*.pdf" | xargs du -sch
Please help!
Use stat -c %n,%s to get the file name and size of the individual files. Then use awk to sum the size and print.
$ find . -name '*.pdf' -exec stat -c %n,%s {} \; | awk -F, '{sum+=$2}END{print sum}'
In fact you don't need %n, since you want only the sum:
$ find . -name '*.pdf' -exec stat -c %s {} \; | awk '{sum+=$1}END{print sum}'
You can get the sum of sizes of all files using:
find . -name '*.pdf' -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$/\n/'| bc
First, find finds all the files as specified and runs stat for each one, which prints the file size. Then tr substitutes every newline with a '+', sed replaces the trailing '+' with a newline again, and the expression is passed to bc, which prints the sum.

Delete files 100 at a time and count total files

I have written a bash script to delete 100 files at a time from a directory because I was getting an "argument list too long" error, but now I want to count the total number of files deleted from the directory.
Here is the script
echo /example-dir/* | xargs -n 100 rm -rf
What I want is to write the total number of deleted files from each directory into a file, along with the path, for example: Deleted <count> files from <path>
How can I achieve this with my current setup?
You can simply do this by enabling verbose output from rm and then counting the output lines using wc -l.
If you have whitespaces or special characters in the file names, using echo to pass the list of files to xargs will not work.
Better use find with -print0 to use a NULL character as a delimiter for the individual files:
find /example-dir -type f -print0 | xargs --null -n 100 rm -vrf | wc -l
You can avoid xargs and do this in a simple while loop and use a counter:
destdir='/example-dir/'
count=0
while IFS= read -r -d '' file; do
rm -rf "$file"
((count++))
done < <(find "$destdir" -type f -print0)
echo "Deleted $count files from $destdir"
Note the use of -print0 to take care of file names with whitespace, newlines, glob characters, etc.
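To get the asker's Deleted <count> files from <path> line per directory, the same counting idea can be wrapped in a loop over subdirectories. A sketch, where deleted.log is a hypothetical log-file name, assuming GNU rm (-v prints one line per removed file, so names containing newlines would be miscounted) and GNU xargs (-r skips empty input):

```shell
# For each subdirectory, delete its files, count rm -v's output lines,
# and append a summary line to the log. "deleted.log" is hypothetical.
for dir in /example-dir/*/; do
    count=$(find "$dir" -type f -print0 | xargs -0r rm -vf | wc -l)
    echo "Deleted $count files from $dir" >> deleted.log
done
```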
By the way, if you really have lots of files and you do this often, it might be useful to look at some other options:
Use find's built-in -delete
time find . -name \*.txt -print -delete | wc -l
30000
real 0m1.244s
user 0m0.055s
sys 0m1.037s
Use find's ability to build up maximal length argument list
time find . -name \*.txt -exec rm -v {} + | wc -l
30000
real 0m0.979s
user 0m0.043s
sys 0m0.920s
Use GNU Parallel's ability to build long argument lists
time find . -name \*.txt -print0 | parallel -0 -X rm -v | wc -l
30000
real 0m1.076s
user 0m1.090s
sys 0m1.223s
Use a single Perl process to read filenames and delete whilst counting
time find . -name \*.txt -print0 | perl -0ne 'unlink;$i++;END{print $i}'
30000
real 0m1.049s
user 0m0.057s
sys 0m1.006s
For testing, you can create 30,000 files really fast with GNU Parallel, whose -X option also builds up long argument lists. For example, I can create 30,000 files in 8 seconds on my Mac with:
seq -w 0 29999 | parallel -X touch file{}.txt

Bash script that writes subdirectories who has more than 5 files

While trying to practice my Linux skills, I could not solve this question.
It basically says: "Write a bash script that takes the name of a
directory as a command argument and prints the names of the subdirectories
that have more than 5 files in them."
I thought we would use the find command but I still could not figure it out. My code is:
find directory -type d -mindepth5
but it's not working.
You can use find twice:
First you can use find and wc to count the number of files in a given directory:
nb=$(find directory -maxdepth 1 -type f -printf "x\n" | wc -l)
This just asks find to output an x on a line for each file in the directory directory, proceeding non-recursively, then wc -l counts the number of lines, so, really, nb is the number of files in directory.
If you want to know whether a directory contains more than 5 files, it's a good idea to stop find as soon as 6 files are found:
nb=$(find directory -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l)
Here nb has an upper threshold of 6.
Now if for each subdirectory of a directory directory you want to output the number of files (threshold at 6), you can do this:
find directory -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); echo "$nb"' {} \;
where the $0 that appears is the 0-th argument, namely {}, which find replaces with each subdirectory of directory.
Finally, you only want to display the subdirectory name if the number of files is more than 5:
find . -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); ((nb>5))' {} \; -print
The final test ((nb>5)) returns success or failure whether nb is greater than 5 or not, and in case of success, find will -print the subdirectory name.
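For immediate subdirectories only, the same test can be written in plain bash without nested find; a sketch assuming the target is named directory as in the question (nullglob makes empty directories yield zero matches):

```shell
# Print each immediate subdirectory of "directory" containing more than
# 5 regular files (non-recursive count).
shopt -s nullglob
for dir in directory/*/; do
    count=0
    for f in "$dir"*; do
        [ -f "$f" ] && count=$((count + 1))
    done
    if (( count > 5 )); then
        printf '%s\n' "$dir"
    fi
done
```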
This should do the trick:
find directory/ -type f | sed 's/\(.*\)\/.*/\1/g' | sort | uniq -c | sort -n | awk '{if($1>5) print($2)}'
Using -mindepth is useless here since it only lists directories at depth 5 or more, whereas you need subdirectories with more than 5 files in them.
find directory -type f prints all files in subdirectories
sed 's/\(.*\)\/.*/\1/g' strips the file names, leaving only the list of subdirectories
sort sorts that list so we can use uniq
uniq -c merges duplicate lines and prefixes each with the number of times it occurred
sort -n sorts it by the number of occurrences (so you end up with a list: (how many times, subdirectory))
awk '{if($1>5) print($2)}' prints only those whose first column is > 5 (and it only prints the second column)
So you end up with a list of subdirectories with more than 5 files inside.
EDIT:
A fix for paths with spaces was proposed:
Instead of awk '{if($1>5) print($2)}' there should be awk '{if($1>5){ $1=""; print(substr($0,2)) }}', which sets the first field to "" and then prints the whole line without the leading space (which was the delimiter). So put together we get this:
find directory/ -type f | sed 's/\(.*\)\/.*/\1/g' | sort | uniq -c | sort -n | awk '{if($1>5){ $1=""; print(substr($0,2)) }}'
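With GNU find, the sed step can be dropped: -printf '%h\n' prints each file's parent directory directly, and awk can filter and strip the count in one go (same caveat about newlines in file names):

```shell
# Directories containing more than 5 files, counted per immediate parent
# directory; %h is the directory part of each path (GNU find).
find directory/ -type f -printf '%h\n' \
    | sort | uniq -c \
    | awk '$1 > 5 { $1 = ""; print substr($0, 2) }'
```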

calculate total used disk space by files older than 180 days using find

I am trying to find the total disk space used by files older than 180 days in a particular directory. This is what I'm using:
find . -mtime +180 -exec du -sh {} \;
but the above is quite evidently giving me the disk space used by every file that is found. I want only the total disk space used by the files, added together. Can this be done using find and the -exec option?
Please note I simply don't want to use a script for this; it would be great if there were a one-liner. Any help is highly appreciated.
Why not this?
find /path/to/search/in -type f -mtime +180 -print0 | du -hc --files0-from - | tail -n 1
@PeterT is right. Almost all these answers invoke a command (du) for each file, which is very resource-intensive, slow, and unnecessary. The simplest and fastest way is this:
find . -type f -mtime +180 -printf '%s\n' | awk '{total=total+$1}END{print total/1024}'
du wouldn't summarize if you pass a list of files to it.
Instead, pipe the output to cut and let awk sum it up. So you can say:
find . -mtime +180 -exec du -ks {} \; | cut -f1 | awk '{total=total+$1}END{print total/1024}'
Note that the -h option, which displays the result in human-readable format, has been replaced by -k, which is equivalent to a block size of 1K. The result is presented in MB (see total/1024 above).
Be careful not to take into account the disk usage by the directories. For example, I have a lot of files in my ~/tmp directory:
$ du -sh ~/tmp
3,7G /home/rpet/tmp
Running the first part of example posted by devnull to find the files modified in the last 24 hours, we can see that awk will sum the whole disk usage of the ~/tmp directory:
$ find ~/tmp -mtime 0 -exec du -ks {} \; | cut -f1
3849848
84
80
But there is only one file modified in that period of time, with very little disk usage:
$ find ~/tmp -mtime 0
/home/rpet/tmp
/home/rpet/tmp/kk
/home/rpet/tmp/kk/test.png
$ du -sh ~/tmp/kk
84K /home/rpet/tmp/kk
So we need to take into account only the files and exclude the directories:
$ find ~/tmp -type f -mtime 0 -exec du -ks {} \; | cut -f1 | awk '{total=total+$1}END{print total/1024}'
0.078125
You can also specify date ranges using the -newermt parameter. For example:
$ find . -type f -newermt "2014-01-01" ! -newermt "2014-06-01"
See http://www.commandlinefu.com/commands/view/8721/find-files-in-a-date-range
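The -newermt range combines naturally with the -printf/awk summing shown above; a sketch, assuming GNU find, that totals the bytes of files last modified in the first five months of 2014:

```shell
# Sum apparent sizes (bytes) of files modified in [2014-01-01, 2014-06-01).
find . -type f -newermt "2014-01-01" ! -newermt "2014-06-01" -printf '%s\n' \
    | awk '{ total += $1 } END { print total+0 }'
```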
You can print file size with find using the -printf option, but you still need awk to sum.
For example, total size of all files older than 365 days:
find . -type f -mtime +365 -printf '%s\n' \
| awk '{a+=$1;} END {printf "%.1f GB\n", a/2**30;}'
