Output the total file size for all files of a certain type in a directory, recursively? - linux

I am trying to get the total size of all PDF files recursively within a directory. I have tried running the command below from inside the directory, but the recursive part does not seem to work properly: it only reports on the files in the current directory and does not include the subdirectories. I am expecting the result to be near 100 GB, however the command reports only about 200 MB of files.
find . -name "*.pdf" | xargs du -sch
Please help!

Use stat -c %n,%s to get the file name and size of each individual file. Then use awk to sum the sizes and print the total.
$ find . -name '*.pdf' -exec stat -c %n,%s {} \; | awk -F, '{sum+=$2}END{print sum}'
In fact you don't need %n, since you want only the sum:
$ find . -name '*.pdf' -exec stat -c %s {} \; | awk '{sum+=$1}END{print sum}'
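As an aside, this spawns stat once per file; -exec ... + batches many files per stat call. And if GNU coreutils' numfmt is available (an assumption about your system), the byte total can be made human-readable:
$ find . -name '*.pdf' -exec stat -c %s {} + | awk '{sum+=$1} END {print sum}' | numfmt --to=iec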

You can get the sum of sizes of all files using:
find . -name '*.pdf' -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$/\n/'| bc
First, find finds all the files as specified and runs stat for each one, which prints the file size. Then I substitute every newline with a '+' using tr, replace the trailing '+' with a newline again using sed, and pass the result to bc, which prints the sum.
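To see what each stage produces, here is the same pipeline fed three hypothetical sizes of 100, 200 and 300 bytes:
$ printf '100\n200\n300\n' | tr '\n' '+'
100+200+300+
$ printf '100\n200\n300\n' | tr '\n' '+' | sed 's/+$/\n/' | bc
600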

Related

How to find the count and total size of multiple files in a directory?

I have a directory, and inside it multiple directories which contain many types of files.
I want to find the *.jpg files and then get the count and the total size of all of them.
I know I have to use find, wc -l and du -ch, but I don't know how to combine them in a single script or a single command.
find . -type f -name "*.jpg" -exec - not sure how to connect all three
Supposing your starting folder is ., this will give you all files and the total size:
find . -type f -name '*.jpg' -exec du -ch {} +
The + at the end executes du -ch on all files at once rather than per file, allowing you to get the grand total.
If you want to know only the total, add | tail -n 1 at the end.
Fair warning: this in fact executes
du -ch file1 file2 file3 ...
With very many files, find splits the list across several du invocations to stay under the system's argument-length limit, so you can end up with more than one total line.
To check the limit on your system:
$ getconf ARG_MAX
2097152
That's what is configured on my system.
This doesn't give you the number of files, though. You'll need to capture the output of find and use it twice.
The last line is the total, so we'll use all but the last line to get the number of files, and the last one for the total:
OUT=$(find . -type f -name '*.jpg' -exec du -ch {} +)
N=$(echo "$OUT" | head -n -1 | wc -l)
SIZE=$(echo "$OUT" | tail -n 1)
echo "Number of files: $N"
echo "$SIZE"
Which for me gives:
Number of files: 143
584K total
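If find does split the argument list as warned above, du prints one total line per batch and tail -n 1 only reports the last batch. A sketch that tolerates splitting, assuming GNU du for the -b option (bytes, one line per file, no total lines):
OUT=$(find . -type f -name '*.jpg' -exec du -b {} +)
N=$(echo "$OUT" | wc -l)
SIZE=$(echo "$OUT" | awk '{sum+=$1} END {print sum}')
echo "Number of files: $N"
echo "Total: $SIZE bytes"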

How do I strip the full path, retaining only the filename?

I am using the following find command to list all the files recursively within a folder, sorted by size (largest on top):
find . -not -path '*/\.*' -not -name '*.nfo' -type f -exec du -h {} + | sort -r -h
The command is working well, but I need to strip the full path from each result, retaining only the filename.
E.g.
Dir/AnotherDir/file.mp4 should be listed as file.mp4
Generally when I have to do this in a find command, I simply use -printf '%f\n', but that can't be used in my current command as the files are being printed by the du command.
Just post process the data:
find ... | sort ... | sed -E 's#[[:space:]].*/# #'
or
... | awk '{printf "%s\t%s\n", $1, $NF}' FS='\t\|/'
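The sed version works because [[:space:]].*/ matches greedily from the first whitespace through the last slash, leaving only the size and the filename. A quick check on a hypothetical du output line:
$ printf '12M\t./Dir/AnotherDir/file.mp4\n' | sed -E 's#[[:space:]].*/# #'
12M file.mp4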

How to get combined disk space of all files in a directory with the help of du in Linux [duplicate]

I've got a bunch of files scattered across folders in a layout like this:
dir1/somefile.gif
dir1/another.mp4
dir2/video/filename.mp4
dir2/some.file
dir2/blahblah.mp4
And I need to find the total disk space used by the MP4 files only. This means it's got to be recursive somehow.
I've looked at du and fiddled with piping things to grep, but I can't seem to figure out how to total just the MP4 files no matter where they are.
A human-readable total disk space output is a must too, preferably in GB, if possible.
Any ideas? Thanks
For individual file sizes:
find . -name "*.mp4" -print0 | du -sh --files0-from=-
For total disk space in GB:
find . -name "*.mp4" -print0 | du -sb --files0-from=- | awk '{ total += $1} END { print total/1024/1024/1024 }'
You can simply do:
find -name "*.mp4" -exec du -b {} \; | awk 'BEGIN{total=0}{total=total+$1}END{print total}'
The -exec option of the find command executes a simple command with {} replaced by each file found by find.
du -b displays the size of the file in bytes.
The awk command initializes a variable at 0, adds the size of each file to it, and displays the total at the end of the command.
This will sum the sizes of all MP4 files, in bytes:
find ./ -name "*.mp4" -printf "%s\n" | paste -sd+ | bc

How to use "find" and "grep" to get the file size too?

I have this script:
find test -type f \( -iname \*.html -o -iname \*.htm -o -iname \*.xhtml \) -exec grep -il ".swf" {} \; -printf '%k KB - \t %p\n' > result-swf-files.csv
This will search the directory "test" (and its subdirectories) for all HTML files which contain the string ".swf", and will write a CSV file with the results.
But I want to get the file size too, on the same line (currently the script outputs the grep result - which doesn't have the file size - on one line, and the printf result - which includes the file size - on another).
How do I add an option to grep to get the file size?
A less verbose way is to use recursive grep (if your system supports it):
grep -rl --include="*.htm*" ".swf" test | xargs ls -l | awk '{ print $9 "," $5 }'
Explanation:
Grep recursively using the "rl" flag
include the file pattern "*.htm*" (note this will not match .xhtml files)
search for the string ".swf" in each htm* file
search only under the "test" directory
pipe the result to xargs, where each filename becomes an argument to the "ls -l" command
Then use awk to print only the filename and file size. The comma "," between the 9th and 5th columns in the awk print gives the CSV output.
Feel free to replace "ls -l" with human readable variants such as "ls -lk" or "ls -lh"
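As a side note, parsing ls output breaks on unusual filenames; if GNU stat is available (an assumption about your system), it can print each name and size in CSV form directly:
grep -rl --include="*.htm*" ".swf" test | xargs stat -c '%n,%s'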
Alternatively, in your script, you can keep just the second of the two lines printed for each file (the one that contains the size). Pipe the output through grep like this: grep "[0-9] [KB]"
Below is the complete command:
find . -type f \( -iname \*.html -o -iname \*.htm -o -iname \*.xhtml \) -exec grep -il ".swf" {} \; -printf '%k KB - \t %p\n'| grep "[0-9] [KB]"
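To illustrate on hypothetical output, the filter keeps the size lines and drops the bare filenames printed by grep -il:
$ printf 'test/index.html\n8 KB - \t test/index.html\n' | grep "[0-9] [KB]"
8 KB -   test/index.html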
find . -name '*PATTERN*.gz' -print0 | xargs -0 ls -lh
This gives you an ls listing for all the files you want. (The pattern is quoted so the shell doesn't expand it before find sees it.)

How to list files using the sort command instead of the ls -lrt command

I am writing a shell script to check for some patterns, like errors or exceptions, inside the log files generated in the last 2 hours under the directory /var/log. This is the command I am using:
find /var/log -mmin -120|xargs egrep -i "error|exception"
It displays the list of file names and the matching lines (errors and exceptions), but the files are not listed in the order they were generated. The output is something like this:
/var/log/123.log:RPM returned error
/var/log/361.log:There is error in line 1
/var/log/4w1.log:Error in configuration line
But the sequence in which these 3 log files were generated is different:
/var/log>ls -lrt
Dec24 1:19 361.log
Dec24 2:01 4w1.log
Dec24 2:15 123.log
So I want the output in that same sequence, i.e. like this:
/var/log/361.log:There is error in line 1
/var/log/4w1.log:Error in configuration line
/var/log/123.log:RPM returned error
I tried this:
find /var/log -mmin -120|ls -ltr|xargs egrep -i "error|exception"
but it is not working.
Any help on this is really appreciated.
If your filenames don't have any special characters (like newline characters), all you need is another call to xargs: ls -tr sorts the files by modification time, oldest first, so grep reads them in the order they were generated:
find . -type f -mmin -120 | xargs ls -tr | xargs egrep -i "error|exception"
Or if your filenames contain said special chars:
find . -type f -mmin -120 -print0 | xargs -0 ls -tr | xargs egrep -i "error|exception"
You can prepend the modification time using the -printf option of find, then sort, and then remove the modification time again with sed:
find /var/log -mmin -120 -printf '%T@:%p\n' | sort -V | sed -r 's/^[^:]+://' | xargs egrep -i "error|exception"
find ... -printf '%T@:%p\n' prints each found file (%p) prefixed by its modification time in seconds since the UNIX epoch (%T@; e.g., 1419433217.1835886710) and a colon separator (:), each on its own line (\n).
sort -V sorts the lines naturally by modification time, because the timestamp is at the beginning of each line (e.g., 1419433217.1835886710:path/to/the/file).
sed -r 's/^[^:]+://' takes each line in the format 123456789.1234:path/to/the/file and strips out the modification time, leaving just the file path path/to/the/file.
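For example, feeding the sed stage the sample line from above (a hypothetical path):
$ printf '1419433217.1835886710:path/to/the/file\n' | sed -r 's/^[^:]+://'
path/to/the/file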
