Find all large files above a specific size on Linux, to clear unwanted space

My Linux system is throwing a "disk space full" I/O error for any activity. I would like to clear out logs and other unwanted large files.
How can I search the file system for files greater than a specific size?

Use the command below to list files larger than 200 MB, sorted by size in descending order; run it with sudo if you also need to see files your user cannot read:
find / -type f -size +200M -exec ls -lh {} \; 2> /dev/null | awk '{ print $NF ": " $5 }' | sort -hrk 2,2
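If the culprit is a directory full of small files rather than a few giants, a directory-level view helps too. A sketch assuming GNU du and GNU sort (for the human-readable -h sort key); -x keeps du on the root filesystem:
sudo du -xh --max-depth=2 / 2>/dev/null | sort -h | tail -20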

Related

How to get combined disk space of all files in a directory with the help of du in Linux

I've got a bunch of files scattered across folders in a layout, e.g.:
dir1/somefile.gif
dir1/another.mp4
dir2/video/filename.mp4
dir2/some.file
dir2/blahblah.mp4
And I need to find the total disk space used by the MP4 files only. This means it's got to be recursive somehow.
I've looked at du and fiddled with piping things to grep, but can't seem to figure out how to total just the MP4 files no matter where they are.
A human-readable total is a must too, preferably in GB, if possible.
Any ideas? Thanks
For individual file size:
find . -name "*.mp4" -print0 | du -sh --files0-from=-
For total disk space in GB:
find . -name "*.mp4" -print0 | du -sb --files0-from=- | awk '{ total += $1} END { print total/1024/1024/1024 }'
You can simply do:
find -name "*.mp4" -exec du -b {} \; | awk 'BEGIN{total=0}{total=total+$1}END{print total}'
The -exec option of the find command runs a command for each file found, with {} replaced by the file name.
du -b displays the size of the file in bytes.
The awk command initializes a variable at 0, adds each file's size to it, and prints the total at the end.
This will sum the sizes of all MP4 files, in bytes:
find ./ -name "*.mp4" -printf "%s\n" | paste -sd+ | bc
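If a human-readable figure is wanted (the question asked for GB), GNU numfmt from coreutils can convert the byte total; a sketch assuming numfmt is available:
find ./ -name "*.mp4" -printf "%s\n" | paste -sd+ | bc | numfmt --to=iec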

Why "find -mmin -1 -exec du -cb {} + | grep total | head -1" and "find -mmin -1 -exec du -ch {} + | grep total | head -1" are different

When I run the command:
find / 2>/dev/null -user root -type f -mmin -1 -exec du -cb {} + | grep total | head -1
I get a rather large number in bytes which is expected.
However, when I run the same command but with human-readable instead of bytes, as in:
find / 2>/dev/null -user root -type f -mmin -1 -exec du -ch {} + | grep total | head -1
I get 0. I also tried removing the head -1, thinking I was grabbing the wrong data, but every printout is "0 total". Why is this? Is there an alternative way to get the total size of all files found by find, in both bytes and human-readable form?
Use the -xdev option of find to exclude other filesystems.
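For example, here is the byte-count command from the question with -xdev added; /proc is a separate filesystem, so the traversal never reaches it (a sketch based on the question's own command):
find / -xdev -user root -type f -mmin -1 -exec du -cb {} + 2>/dev/null | grep total | head -1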
I don't have a full explanation yet, but I think this is related to virtual filesystems such as /proc.
When I ran your scenarios I had the same results, because the -b option adds in the size of /proc/kcore.
procfs is a bit of dark magic; no files in it are real. It looks like a filesystem, acts like a filesystem, and is a filesystem. But not one that is stored on disk (or elsewhere).
/proc/kcore specifically is a file which maps directly to every available byte in your virtual memory ... I'm not absolutely clear on the details; the 128 TB comes from Linux allocating 47-ish bits of the 64 bits available for virtual memory.
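As a quick sanity check on that figure: 2^47 bytes = 140,737,488,355,328 bytes = 128 TiB, which is within a couple of megabytes of the kcore size shown below.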
When I use the -ch argument for du it shows /proc/kcore as 0:
0 /proc/kcore
But when I use the -cb it shows my /proc/kcore as:
140737486266368 /proc/kcore
This is because of the -b option:
-b, --bytes
equivalent to '--apparent-size --block-size=1'
and --apparent-size :
--apparent-size
print apparent sizes, rather than disk usage; although the apparent size is
usually smaller, it may be larger due to holes in ('sparse') files, internal
fragmentation, indirect blocks, and the like
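If you only want to keep /proc out of the totals rather than skipping every other filesystem (as -xdev does), pruning it explicitly is a common alternative; a sketch using standard find operators on the question's command:
find / -path /proc -prune -o -user root -type f -mmin -1 -exec du -cb {} + 2>/dev/null | grep total | head -1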
References:
/proc kcore file is huge
https://linux.die.net/man/1/du

Output file size for all files of a certain type in a directory, recursively?

I am trying to get the total size of all PDF files, recursively, within a directory. I have tried running the command below inside the directory, but the recursive part does not work properly: it seems to report only on the files in the current directory and not on the subdirectories within. I am expecting the result to be near 100 GB; however, the command reports only about 200 MB of files.
find . -name "*.pdf" | xargs du -sch
Please help!
(The command above under-reports, most likely because xargs splits a long file list across several du invocations, so each batch prints its own total; file names with spaces make it worse.) Use stat -c %n,%s to get the file name and size of the individual files, then use awk to sum the sizes and print the total.
$ find . -name '*.pdf' -exec stat -c %n,%s {} \; | awk -F, '{sum+=$2}END{print sum}'
In fact you don't need %n, since you want only the sum:
$ find . -name '*.pdf' -exec stat -c %s {} \; | awk '{sum+=$1}END{print sum}'
You can get the sum of sizes of all files using:
find . -name '*.pdf' -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$/\n/'| bc
First, find locates all the matching files and runs stat on each, which prints the file size. Then I substitute every newline with a '+' using tr, turn the trailing '+' back into a newline with sed, and pass the result to bc, which prints the sum.

Compare 2 Folders and Find Files with Differing Byte Counts

Using Gnome in Linux Mint 12, I copied a Folder of about 9.7 GB (containing a complex tree of subfolders) from one NTFS Flash Drive to another NTFS Flash Drive. According to Gnome the file counts match, but according to du (and other programs) the byte counts don't match. (I've had the same problem copying folders in other Linux distros and Windows XP.)
I only want to know which files don't have matching byte counts. (I don't want to compare the contents of each file, because that would take way too long.) What's the best, easiest and fastest way to find the byte-count-mismatched files?
I would adapt the answer by #user1464130 as it has trouble handling spaces in file names.
cd dir1
find . -type f -printf "%p %s\n" | sort > ~/dir1.txt
cd dir2
find . -type f -printf "%p %s\n" | sort > ~/dir2.txt
diff ~/dir1.txt ~/dir2.txt
If you want to launch a command on each file and use the result in the report, you can use the while Bash construct. This example uses md5sum to compute a checksum for each file.
find . -maxdepth 1 -type f -printf "%p %s\n" | while read path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size" ; done
Each $() is executed separately and lets us compute the checksum of each file. tr squeezes consecutive spaces into a single space, and cut extracts the word in the n-th position, here the first. Without that we would get the file name twice, because md5sum prints it on stdout along with the checksum.
Here is an example without the comparison (no diff). Note that I've used a dash (-) to separate the three pieces of data we output for each file, but that could be a problem if you want to feed the result to another program.
$ find . -maxdepth 1 -name "*.c" -type f -printf "%p %s\n" | while read path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size" ; done
./thread.c - 5f2b7b12c7cd12fcb9e9796078e5d15b - 584
./utils.c - d61bc1dbc72768e622a04f03e3b8f7a2 - 3413
EDIT: To handle spaces in file names and still get the checksum and the size, you can use the following code.
$ find . -maxdepth 1 -name "*.c" -type f -print0 | xargs -0 -n 1 md5sum | while read checksum path; do echo "$path" "$(stat --printf="%s" "$path")" "$checksum" ; done
./ini tia li za tion.c 84 31626123e9056bac2e96b472bd62f309
Did you check if both partitions have the same attributes? (block size, size, reserved space for deletions or bad blocks, etc.)
For your specific case, I would recommend rsync with the option -n (or --dry-run) plus -v, so it lists what it would copy without copying anything. That is:
$ rsync -rvn --size-only /source/ /target/
The option --size-only makes rsync compare files by size alone (ignoring timestamps), so exactly the byte-count mismatches are listed; -r recurses into subdirectories. Drop -n and the same command copies the mismatched files (use -a instead of -r if you also want timestamps and permissions synced).
Check the manual of rsync or try the option --help to get more options and examples on how to use it. It is very powerful.
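If you also want rsync to say why each file is flagged, its --itemize-changes (-i) flag prints a per-file change string in which an 's' marks a size difference; a sketch along the same lines as above:
$ rsync -rn -i --size-only /source/ /target/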
Assuming you need to compare dir1 and dir2, here are the console commands:
cd dir1
find . -type f|sort|xargs ls -l| awk '{print $5,$9}' > ~/dir1.txt
cd dir2
find . -type f|sort|xargs ls -l| awk '{print $5,$9}' > ~/dir2.txt
diff ~/dir1.txt ~/dir2.txt
You may need to edit awk parameters to make it print file length and path properly.

Measure disk space of certain file types in aggregate

I have some files across several folders:
/home/d/folder1/a.txt
/home/d/folder1/b.txt
/home/d/folder1/c.mov
/home/d/folder2/a.txt
/home/d/folder2/d.mov
/home/d/folder2/folder3/f.txt
How can I measure the grand total amount of disk space taken up by all the .txt files in /home/d/?
I know du will give me the total space of a given folder, and ls -l will give me the size of individual files, but what if I want one grand total for all the .txt files under /home/d/, including folder1, folder2, and their subfolders like folder3?
find folder1 folder2 -iname '*.txt' -print0 | du --files0-from - -c -s | tail -1
This will report disk space usage in bytes by extension:
find . -type f -printf "%f %s\n" |
awk '{
    PARTSCOUNT=split( $1, FILEPARTS, "." );
    EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
    FILETYPE_MAP[EXTENSION]+=$2
}
END {
    for( FILETYPE in FILETYPE_MAP ) {
        print FILETYPE_MAP[FILETYPE], FILETYPE;
    }
}' | sort -n
Output:
3250 png
30334451 mov
57725092729 m4a
69460813270 3gp
79456825676 mp3
131208301755 mp4
Simple:
du -ch *.txt
If you just want the total space taken to show up, then:
du -ch *.txt | tail -1
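As the longer answer near the end points out, the glob above only sees the current directory. With Bash's globstar option the same du call can reach subdirectories; a sketch, assuming bash 4+ (note that a very large tree may exceed the argument-list limit):
shopt -s globstar
du -ch **/*.txt | tail -1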
Here's a way to do it (in Linux, using GNU coreutils du and Bash syntax), avoiding bad practice:
total=0
while read -r line
do
    size=($line)
    (( total+=size ))
done < <( find . -iname "*.txt" -exec du -b {} + )
echo "$total"
If you want to exclude the current directory, use -mindepth 2 with find.
Another version that doesn't require Bash syntax:
find . -iname "*.txt" -exec du -b {} + | awk '{total += $1} END {print total}'
Note that these won't work properly with file names which include newlines (but those with spaces will work).
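If newlines in names are a real worry, keeping file names out of the pipeline entirely avoids the problem; the sizes can be printed by find itself (the same GNU -printf approach used in a later answer):
find . -iname "*.txt" -printf "%s\n" | awk '{total += $1} END {print total}'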
macOS
use the du tool with its -I option, which ignores files and directories matching a given mask
Linux
-X, --exclude-from=FILE
exclude files that match any pattern in FILE
--exclude=PATTERN
exclude files that match PATTERN
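These options exclude rather than include, so they cannot directly express "only .txt files"; still, as an illustration of the GNU form on the question's layout (a sketch):
du -ch --exclude='*.mov' /home/d | tail -1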
This will do it:
total=0
for file in *.txt
do
space=$(ls -l "$file" | awk '{print $5}')
let total+=space
done
echo $total
With GNU find:
find /home/d -type f -name "*.txt" -printf "%s\n" | awk '{s+=$0}END{print "total: "s" bytes"}'
Building on ennuikiller's answer, this will handle spaces in names. I needed to do this and get a little report:
find -type f -name "*.wav" | grep export | ./calc_space
#!/bin/bash
# calc_space - reads file names on stdin, prints each size and a grand total
echo SPACE USED IN MEGABYTES
echo
total=0
while IFS= read -r FILE
do
    space=$(du -m "$FILE" | awk '{print $1}')   # size in MB, computed once per file
    echo "$space $FILE"
    let total+=space
done
echo $total
A one-liner for those with GNU tools on bash:
for i in $(find . -type f | perl -lne 'print $1 if m/\.([^.\/]+)$/' | sort -u); do echo "$i"": ""$(du -hac **/*."$i" | tail -n1 | awk '{print $1;}')"; done | sort -h -k 2 -r
You must have globstar enabled (the ** glob is what makes this recursive):
shopt -s globstar
If you want dot files to work, you must run
shopt -s dotglob
Sample output:
d: 3.0G
swp: 1.3G
mp4: 626M
txt: 263M
pdf: 238M
ogv: 115M
i: 76M
pkl: 65M
pptx: 56M
mat: 50M
png: 29M
eps: 25M
etc
My solution to get the total size of all text files in a given path and its subdirectories (using a Perl one-liner):
find /path -iname '*.txt' | perl -lane '$sum += -s $_; END {print $sum}'
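A human-readable variant of the same one-liner; the printf formatting is plain Perl, and dividing by 2**30 gives GiB (a sketch):
find /path -iname '*.txt' | perl -lane '$sum += -s $_; END { printf "%.2f GiB\n", $sum / 2**30 }'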
I like to use find in combination with xargs:
find . -name "*.txt" -print0 |xargs -0 du -ch
Add tail -n1 if you only want to see the grand total:
find . -name "*.txt" -print0 |xargs -0 du -ch | tail -n1
For anyone wanting to do this on macOS at the command line, you need a variation based on -print0 and stat, since BSD find has no -printf. Some of the answers above address that, but this will do it comprehensively, by extension:
find . -type f -print0 | xargs -0 stat -f "%N %z" |
awk '{
    PARTSCOUNT=split( $1, FILEPARTS, "." );
    EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
    FILETYPE_MAP[EXTENSION]+=$2
}
END {
    for( FILETYPE in FILETYPE_MAP ) {
        print FILETYPE_MAP[FILETYPE], FILETYPE;
    }
}' | sort -n
There are several potential problems with the accepted answer:
it does not descend into subdirectories (without relying on non-standard shell features like globstar)
in general, as pointed out by Dennis Williamson below, you should avoid parsing the output of ls
namely, if the user or group (columns 3 and 4) have spaces in them, column 5 will not be the file size
if you have a million such files, this will spawn two million subshells, and it'll be sloooow
As proposed by ghostdog74, you can use the GNU-specific -printf option to find to achieve a more robust solution, avoiding all the excessive pipes, subshells, Perl, and weird du options:
# the '%s' format string means "the file's size"
find . -name "*.txt" -printf "%s\n" \
| awk '{sum += $1} END{print sum " bytes"}'
Yes, yes, solutions using paste or bc are also possible, but not any more straightforward.
On macOS, you would need to use Homebrew or MacPorts to install findutils, and call gfind instead. (I see the "linux" tag on this question, but it's also tagged "unix".)
Without GNU find, you can still fall back to using du:
find . -name "*.txt" -exec du -k {} + \
| awk '{kbytes+=$1} END{print kbytes " Kbytes"}'
…but you have to be mindful of the fact that du's default output is in 512-byte blocks for historical reasons (see the "RATIONALE" section of the man page), and some versions of du (notably, macOS's) will not even have an option to print sizes in bytes.
Many other fine solutions here (see Barn's answer in particular), but most suffer the drawback of being unnecessarily complex or depending too heavily on GNU-only features—and maybe in your environment, that's OK!
