How to get the combined disk space of all files in a directory with the help of du in Linux [duplicate]

I've got a bunch of files scattered across folders in a layout, e.g.:
dir1/somefile.gif
dir1/another.mp4
dir2/video/filename.mp4
dir2/some.file
dir2/blahblah.mp4
And I need to find the total disk space used by the MP4 files only. This means it's gotta be recursive somehow.
I've looked at du and fiddled with piping things to grep, but I can't seem to figure out how to calculate the total for just the MP4 files, no matter where they are.
Human-readable output of the total disk space is a must too, preferably in GB if possible.
Any ideas? Thanks

For individual file sizes:
find . -name "*.mp4" -print0 | du -sh --files0-from=-
For total disk space in GB:
find . -name "*.mp4" -print0 | du -sb --files0-from=- | awk '{ total += $1} END { print total/1024/1024/1024 }'

You can simply do:
find -name "*.mp4" -exec du -b {} \; | awk 'BEGIN{total=0}{total=total+$1}END{print total}'
The -exec option of the find command executes a command with {} replaced by each file found.
du -b displays the size of the file in bytes.
The awk command initializes a variable to 0, adds each file's size to it, and prints the total at the end.
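A sketch of a faster variant, assuming GNU find and du: terminating -exec with + batches many files into each du invocation instead of spawning one process per file, and awk still sums every line:
find . -name "*.mp4" -exec du -b {} + | awk '{ total += $1 } END { print total }'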

This will sum all mp4 files size in bytes:
find ./ -name "*.mp4" -printf "%s\n" | paste -sd+ | bc
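If numfmt from GNU coreutils is available (an assumption), appending it converts that byte count to a human-readable figure:
find ./ -name "*.mp4" -printf "%s\n" | paste -sd+ | bc | numfmt --to=iec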

Related

How to find the count of and total sizes of multiple files in directory?

I have a directory, and inside it multiple directories which contain many types of files.
I want to find *.jpg files and then get both the count and the total size of all of them.
I know I have to use find, wc -l and du -ch, but I don't know how to combine them in a single script or a single command.
find . -type f -name "*.jpg" -exec - not sure how to connect all three
Supposing your starting folder is ., this will give you all files and the total size:
find . -type f -name '*.jpg' -exec du -ch {} +
The + at the end executes du -ch on all files at once - rather than per file, allowing you to get the grand total.
If you want to know only the total, add | tail -n 1 at the end.
Fair warning: this in fact executes
du -ch file1 file2 file3 ...
With very many files, find will split this into several du invocations to stay under the system's argument-length limit, and each invocation then prints its own total.
To check the limit:
$ getconf ARG_MAX
2097152
That's what is configured on my system.
This doesn't give you the number of files though. You'll need to catch the output of find and use it twice.
The last line is the total, so we'll use all but the last line to get the number of files, and the last one for the total:
OUT=$(find . -type f -name '*.jpg' -exec du -ch {} +)
N=$(echo "$OUT" | head -n -1 | wc -l)
SIZE=$(echo "$OUT" | tail -n 1)
echo "Number of files: $N"
echo "$SIZE"
Which for me gives:
Number of files: 143
584K total
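If per-file output isn't needed, a sketch (assuming GNU find, for -printf) that computes the count and the total in a single awk pass, sidestepping the ARG_MAX concern entirely:
find . -type f -name '*.jpg' -printf '%s\n' |
    awk '{ n++; s += $1 } END { printf "%d files, %.1f MiB total\n", n, s / 2^20 }'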

Unix find command display file size in gigabyte

I want to find files in a certain directory using find command and display the file size as well after finding. Here is what I have come up with so far.
find /my_search_directory -type f -name "abc*" -printf "%f %k KB\n"
%k displays the file size in KB; I want this to be in GB. Can anyone help me out with this?
du -BG filename
writes to stdout the size of filename in GB.
You can use -exec to run du for each file found, passing -B GiB to show the size in GiB rather than bytes:
find /my_search_directory -type f -name "abc*" -exec du -B GiB {} \;
You can also use -h instead of -B GiB to make du choose an appropriate unit; this is probably more useful to you because -B rounds upwards. You can also add --apparent-size to show the perceived file size instead of the size taken up on disk.
As an extra tip, sort -h can be used to process the output of du -h:
find /my_search_directory -type f -name "abc*" -exec du -h {} \; | sort -h
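If your find is GNU (an assumption, given the -printf in the question), you can also skip du and convert the byte count yourself; the sub() call keeps file names with spaces intact:
find /my_search_directory -type f -name "abc*" -printf '%s %f\n' |
    awk '{ size = $1; sub(/^[0-9]+ /, ""); printf "%.2f GB  %s\n", size / 2^30, $0 }'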

Output file size for all files of certain type in directory recursively?

I am trying to get the total size of all PDF files recursively within a directory. I have tried running the command below from within the directory, but the recursive part does not seem to work properly: it only reports on the files in the current directory and does not include the subdirectories. I am expecting the result to be near 100 GB, but the command reports only about 200 MB of files.
find . -name "*.pdf" | xargs du -sch
Please help!
Use stat -c %n,%s to get the file name and size of the individual files. Then use awk to sum the size and print.
$ find . -name '*.pdf' -exec stat -c %n,%s {} \; | awk -F, '{sum+=$2}END{print sum}'
In fact you don't need %n, since you want only the sum:
$ find . -name '*.pdf' -exec stat -c %s {} \; | awk '{sum+=$1}END{print sum}'
You can get the sum of sizes of all files using:
find . -name '*.pdf' -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$/\n/'| bc
First, find locates all the matching files and runs stat on each one, which prints the file size. Then I substitute every newline with a '+' using tr, replace the trailing '+' with a newline again using sed, and pass the result to bc, which prints the sum.
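A null-safe sketch, assuming GNU du: feeding du the file list NUL-delimited avoids both the word-splitting problems of the bare xargs pipeline and the per-file stat processes, and -cb prints a grand total in bytes as the last line:
find . -name '*.pdf' -print0 | du -cb --files0-from=- | tail -n 1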

calculate total used disk space by files older than 180 days using find

I am trying to find the total disk space used by files older than 180 days in a particular directory. This is what I'm using:
find . -mtime +180 -exec du -sh {} \;
but the above quite evidently gives me the disk space used by every file that is found. I want only the total disk space added up across all the files. Can this be done using the find and exec commands?
Please note I simply don't want to use a script for this; it would be great if there were a one-liner for it. Any help is highly appreciated.
Why not this?
find /path/to/search/in -type f -mtime +180 -print0 | du -hc --files0-from - | tail -n 1
@PeterT is right. Almost all these answers invoke a command (du) for each file, which is resource-intensive, slow, and unnecessary. The simplest and fastest way is this:
find . -type f -mtime +180 -printf '%s\n' | awk '{total=total+$1}END{print total/1024}'
du wouldn't summarize if you pass a list of files to it.
Instead, pipe the output to cut and let awk sum it up. So you can say:
find . -mtime +180 -exec du -ks {} \; | cut -f1 | awk '{total=total+$1}END{print total/1024}'
Note that the option -h to display the result in human-readable format has been replaced by -k, which is equivalent to a block size of 1K. The result is presented in MB (see the total/1024 above).
Be careful not to take into account the disk usage by the directories. For example, I have a lot of files in my ~/tmp directory:
$ du -sh ~/tmp
3,7G /home/rpet/tmp
Running the first part of example posted by devnull to find the files modified in the last 24 hours, we can see that awk will sum the whole disk usage of the ~/tmp directory:
$ find ~/tmp -mtime 0 -exec du -ks {} \; | cut -f1
3849848
84
80
But there is only one file modified in that period of time, with very little disk usage:
$ find ~/tmp -mtime 0
/home/rpet/tmp
/home/rpet/tmp/kk
/home/rpet/tmp/kk/test.png
$ du -sh ~/tmp/kk
84K /home/rpet/tmp/kk
So we need to take into account only the files and exclude the directories:
$ find ~/tmp -type f -mtime 0 -exec du -ks {} \; | cut -f1 | awk '{total=total+$1}END{print total/1024}'
0.078125
You can also specify date ranges using the -newermt parameter. For example:
$ find . -type f -newermt "2014-01-01" ! -newermt "2014-06-01"
See http://www.commandlinefu.com/commands/view/8721/find-files-in-a-date-range
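Combining that with the summing approach above (GNU find assumed), a sketch that totals only the files modified within the range:
find . -type f -newermt "2014-01-01" ! -newermt "2014-06-01" -printf '%s\n' |
    awk '{ s += $1 } END { printf "%.2f MiB\n", s / 2^20 }'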
You can print file size with find using the -printf option, but you still need awk to sum.
For example, total size of all files older than 365 days:
find . -type f -mtime +365 -printf '%s\n' \
| awk '{a+=$1;} END {printf "%.1f GB\n", a/2**30;}'

Measure disk space of certain file types in aggregate

I have some files across several folders:
/home/d/folder1/a.txt
/home/d/folder1/b.txt
/home/d/folder1/c.mov
/home/d/folder2/a.txt
/home/d/folder2/d.mov
/home/d/folder2/folder3/f.txt
How can I measure the grand total amount of disk space taken up by all the .txt files in /home/d/?
I know du will give me the total space of a given folder, and ls -l will give me the size of individual files, but what if I want one grand total for all the .txt files in /home/d/, including both folder1 and folder2 and their subfolders like folder3?
find folder1 folder2 -iname '*.txt' -print0 | du --files0-from - -c -s | tail -1
This will report disk space usage in bytes by extension:
find . -type f -printf "%f %s\n" |
awk '{
    PARTSCOUNT = split( $1, FILEPARTS, "." );
    EXTENSION = PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
    FILETYPE_MAP[EXTENSION] += $2
}
END {
    for( FILETYPE in FILETYPE_MAP ) {
        print FILETYPE_MAP[FILETYPE], FILETYPE;
    }
}' | sort -n
Output:
3250 png
30334451 mov
57725092729 m4a
69460813270 3gp
79456825676 mp3
131208301755 mp4
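To make that per-extension output human-readable, a sketch assuming numfmt from GNU coreutils is available, rescaling the first field of each line:
find . -type f -printf "%f %s\n" | awk '{ n = split($1, p, "."); e = (n == 1) ? "NULL" : p[n]; m[e] += $2 } END { for (e in m) print m[e], e }' | sort -n | numfmt --field=1 --to=iec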
Simple:
du -ch *.txt
If you just want the total space taken to show up, then:
du -ch *.txt | tail -1
Here's a way to do it (in Linux, using GNU coreutils du and Bash syntax), avoiding bad practice:
total=0
while read -r line
do
size=($line)    # word-split du's output; the first field is the size in bytes
(( total+=size ))
done < <( find . -iname "*.txt" -exec du -b {} + )
echo "$total"
If you want to exclude the current directory, use -mindepth 2 with find.
Another version that doesn't require Bash syntax:
find . -iname "*.txt" -exec du -b {} + | awk '{total += $1} END {print total}'
Note that these won't work properly with file names which include newlines (but those with spaces will work).
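A sketch that sidesteps the newline problem entirely, assuming GNU find: print only the sizes, so file names never enter the pipeline at all:
find . -iname '*.txt' -printf '%s\n' | awk '{ total += $1 } END { print total }'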
macOS
Use the tool du with its -I parameter to exclude all other files.
Linux
-X, --exclude-from=FILE
exclude files that match any pattern in FILE
--exclude=PATTERN
exclude files that match PATTERN
This will do it:
total=0
for file in *.txt
do
space=$(ls -l "$file" | awk '{print $5}')
let total+=space
done
echo $total
With GNU find:
find /home/d -type f -name "*.txt" -printf "%s\n" | awk '{s+=$0}END{print "total: "s" bytes"}'
Building on ennuikiller's, this will handle spaces in names. I needed to do this and get a little report:
find -type f -name "*.wav" | grep export | ./calc_space
#!/bin/bash
# calc_space
echo SPACE USED IN MEGABYTES
echo
total=0
while read FILE
do
du -m "$FILE"
space=$(du -m "$FILE"| awk '{print $1}')
let total+=space
done
echo $total
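A sketch of a minor variation on the same script (same assumptions) that invokes du only once per file, capturing its output and stripping everything after the first tab:
#!/bin/bash
# calc_space, single-du variant
echo SPACE USED IN MEGABYTES
echo
total=0
while read -r FILE
do
    line=$(du -m "$FILE")     # du prints: size<TAB>name
    echo "$line"
    space=${line%%$'\t'*}     # keep only the size field before the first tab
    let total+=space
done
echo $total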
A one-liner for those with GNU tools on bash:
for i in $(find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u); do echo "$i"": ""$(du -hac **/*."$i" | tail -n1 | awk '{print $1;}')"; done | sort -h -k 2 -r
You must have globstar enabled (it provides the recursive ** glob):
shopt -s globstar
If you want dot files to work, you must run
shopt -s dotglob
Sample output:
d: 3.0G
swp: 1.3G
mp4: 626M
txt: 263M
pdf: 238M
ogv: 115M
i: 76M
pkl: 65M
pptx: 56M
mat: 50M
png: 29M
eps: 25M
etc
My solution to get the total size of all text files in a given path and its subdirectories (using a Perl one-liner):
find /path -iname '*.txt' | perl -lane '$sum += -s $_; END {print $sum}'
I like to use find in combination with xargs:
find . -name "*.txt" -print0 |xargs -0 du -ch
Add tail if you only want to see the grand total
find . -name "*.txt" -print0 |xargs -0 du -ch | tail -n1
For anyone wanting to do this with macOS at the command line, you need a variation based on the -print0 argument instead of printf. Some of the above answers address that but this will do it comprehensively by extension:
find . -type f -print0 | xargs -0 stat -f "%N %z" |
awk '{
    PARTSCOUNT = split( $1, FILEPARTS, "." );
    EXTENSION = PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
    FILETYPE_MAP[EXTENSION] += $2
}
END {
    for( FILETYPE in FILETYPE_MAP ) {
        print FILETYPE_MAP[FILETYPE], FILETYPE;
    }
}' | sort -n
There are several potential problems with the accepted answer:
it does not descend into subdirectories (without relying on non-standard shell features like globstar)
in general, as pointed out by Dennis Williamson below, you should avoid parsing the output of ls
namely, if the user or group (columns 3 and 4) have spaces in them, column 5 will not be the file size
if you have a million such files, this will spawn two million subshells, and it'll be sloooow
As proposed by ghostdog74, you can use the GNU-specific -printf option to find to achieve a more robust solution, avoiding all the excessive pipes, subshells, Perl, and weird du options:
# the '%s' format string means "the file's size"
find . -name "*.txt" -printf "%s\n" \
| awk '{sum += $1} END{print sum " bytes"}'
Yes, yes, solutions using paste or bc are also possible, but not any more straightforward.
On macOS, you would need to use Homebrew or MacPorts to install findutils, and call gfind instead. (I see the "linux" tag on this question, but it's also tagged "unix".)
Without GNU find, you can still fall back to using du:
find . -name "*.txt" -exec du -k {} + \
| awk '{kbytes+=$1} END{print kbytes " Kbytes"}'
…but you have to be mindful of the fact that du's default output is in 512-byte blocks for historical reasons (see the "RATIONALE" section of the man page), and some versions of du (notably, macOS's) will not even have an option to print sizes in bytes.
Many other fine solutions here (see Barn's answer in particular), but most suffer the drawback of being unnecessarily complex or depending too heavily on GNU-only features—and maybe in your environment, that's OK!
