Find the fattest directories by their own size - Linux

I would like to list the fattest directories by their own size, sorted descending.
'Directory own size' means the size of the directory excluding the size of all its subdirectories.
For example, given this directory structure:
/tmp/D1
|-- 5m.file
|-- D2
|   |-- 2m.file
|   `-- D4
|       `-- 4m.file
`-- D3
    `-- 3m.file
Executing the command with /tmp/D1 as the argument should produce a result like:
5m /tmp/D1
4m /tmp/D1/D2/D4
3m /tmp/D1/D3
2m /tmp/D1/D2
du -Sh . | sort -rh | head -n 10
Add -x to limit the scan to the current filesystem only:
du -Shx . | sort -rh | head -n 10
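Applied to the example tree from the question, the result would look like this (sizes illustrative, following the file sizes given above):
$ du -Shx /tmp/D1 | sort -rh | head -n 10
5.0M /tmp/D1
4.0M /tmp/D1/D2/D4
3.0M /tmp/D1/D3
2.0M /tmp/D1/D2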

You can use du with the -S option for this. From the man page:
-S, --separate-dirs
do not include size of subdirectories
$ du -Sh /foo/bar/temp2/ | sort -rh
84K /foo/bar/temp2/
40K /foo/bar/temp2/tempo
4.0K /foo/bar/temp2/opt/logs/merchantportal
4.0K /foo/bar/temp2/opt/logs
4.0K /foo/bar/temp2/opt
4.0K /foo/bar/temp2/folder
4.0K /foo/bar/temp2/bang
Now check in the normal way with the -s option, which includes all subdirectories:
$ du -sh /foo/bar/temp2/opt
12K /foo/bar/temp2/opt
which is the sum of the sizes of the subdirectories /foo/bar/temp2/opt/logs/merchantportal and /foo/bar/temp2/opt/logs plus the base folder itself (4.0K + 4.0K + 4.0K = 12K).
The -h flag formats the sizes in human-readable form, per the man page. If you want the output in 1M blocks instead, use du -Sm.
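For instance (a minimal sketch; du -Sm rounds each entry up to a whole number of megabytes, so the small directories above would all show as 1):
$ du -Sm /foo/bar/temp2/ | sort -rn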

Related

Displaying disk space usage of the current directory excluding the size of subdirectories

I want to write a command to display the disk space usage of the current directory, excluding the size of subdirectories. The following image describes the files and directories of the current directory:
du ./ --exclude='./file*'
The output should be:
4 ./dir1
4 .
I am getting the first line of output but not the second.
du -Sd 1
Output will be:
4 ./dir1
4 .
Suppose the current directory is /tmp/foo, which has no files except for a single directory /tmp/foo/bar, into which is put a copy of bash (1113504 bytes). Running the tree util:
tree --du "$(pwd)"
...reports:
/tmp/foo
└── [ 1117600]  bar
    └── [ 1113504]  bash

1121696 bytes used in 1 directory, 1 file
To get the size in bytes of /tmp/foo (but not /tmp/foo/bar), this works:
du -bSd 1 "$(pwd)" | grep -w "$(pwd)$"
Output:
4096 /tmp/foo
The same line of code can be reused, just cd to any directory:
cd foo/bar/
du -bSd 1 "$(pwd)" | grep -w "$(pwd)$"
Output:
1117600 /tmp/foo/bar
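Since the same line is reused verbatim, it can be wrapped in a function; a small sketch (the name dirown is made up here):
# Print the own size, in bytes, of the current directory only
dirown() { du -bSd 1 "$(pwd)" | grep -w "$(pwd)$"; }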
The command will be:
$ du -S
And the output will be as shown in this screenshot: https://i.stack.imgur.com/fxqkC.jpg
Try this:
du -S ./ --exclude='./file*'
Output:
4 ./dir1
4 ./

Why "find -mmin -1 -exec du -cb {} + | grep total | head -1" and "find -mmin -1 -exec du -ch {} + | grep total | head -1" are different

When I run the command:
find / 2>/dev/null -user root -type f -mmin -1 -exec du -cb {} + | grep total | head -1
I get a rather large number in bytes, which is expected.
However, when I run the same command but with human-readable instead of bytes, as in:
find / 2>/dev/null -user root -type f -mmin -1 -exec du -ch {} + | grep total | head -1
I get 0. I also tried removing the head -1, thinking I was grabbing the wrong line, but every total printed is 0. Why is this? Is there an alternative way to get the total size of all files found by find, in both bytes and human-readable form?
Use the -xdev option with find to exclude other filesystems.
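A sketch of the original command with -xdev added, so pseudo-filesystems like /proc are never descended into:
find / -xdev -user root -type f -mmin -1 -exec du -cb {} + 2>/dev/null | grep total | head -1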
I don't have a full explanation yet, but I think this is related to virtual filesystems such as procfs (mounted at /proc).
When I ran your scenarios I had the same results, because the -b option adds in the size of /proc/kcore.
procfs is a bit of dark magic; no files in it are real. It looks like a filesystem, acts like a filesystem, and is a filesystem. But not one that is stored on disk (or elsewhere).
/proc/kcore specifically is a file which maps directly to every available byte in your virtual memory ... I'm not absolutely clear on the details; the 128TB comes from Linux allocating 47-ish bits of the 64 bits available for virtual memory.
When I use the -ch argument for du it shows /proc/kcore as 0:
0 /proc/kcore
But when I use the -cb it shows my /proc/kcore as:
140737486266368 /proc/kcore
This is because of the -b option:
-b, --bytes
equivalent to '--apparent-size --block-size=1'
and --apparent-size:
--apparent-size
print apparent sizes, rather than disk usage; although the apparent size is
usually smaller, it may be larger due to holes in ('sparse') files, internal
fragmentation, indirect blocks, and the like
References:
/proc kcore file is huge
https://linux.die.net/man/1/du

Why does the du command show a different total in a folder versus its parent

I can't understand why I get a different total size in a folder versus its parent.
This is my folder tree:
bkp
|-- raid10
|   |-- folder_a
|   |-- folder_b
|   |-- folder_c
|   |-- folder_d
|   |-- folder_e
|   |-- folder_f
|   |-- folder_g
|   |-- folder_h
|   |-- folder_i
|   |-- folder_j
|   |-- folder_k
|   |-- folder_l
|   `-- script.sh
`-- vm
I previously deleted a large number of files in that folder, and I want to see my new disk usage.
sudo du -shc /bkp/*
756G raid10
4.0K vm
756G total
Now I execute that command to get more info about the folder raid10:
sudo du -shc /bkp/raid10/*
13G folder_a
178M folder_b
15G folder_c
2.3G folder_d
32M folder_e
31G folder_f
31G folder_g
49G folder_h
131M folder_i
4.7G folder_j
392M folder_k
4.0K folder_l
4.0K lost+found
4.0K script.sh~
144G total
Why is the total so different?
I checked man du and tried some options, like --apparent-size, but got the same result. I also tried without -s (sudo du -hc /bkp/raid10/*): same total, but it lists every directory...
I have some guesses:
Is there some cache in the du command?
Is there a trash or hidden file that du can't read?
Some information about my files:
The disk filesystem is ext4
Files were uploaded with rsync
The disk is not in RAID
To make du include hidden (dot) entries, just do:
# The first pattern matches hidden entries, the second everything else
du -shc /bkp/raid10/.[!.]* /bkp/raid10/*
Or, a cleaner version:
cd /bkp/raid10
du -sch .[!.]* *
Or enable the shell option that makes globbing match hidden files:
shopt -s dotglob
du -sch *
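Note that dotglob stays set for the rest of the shell session; switch it back off afterwards if you want the default globbing behaviour:
shopt -u dotglob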
To look for hidden (names starting with a dot) files/directories recursively:
find . -name ".*" -ls

Short command to find total size of files matching a wildcard

I could envision a simple shell script that would accomplish this by iterating through the files in a directory and summing their individual sizes, but I was wondering if there is already a more concise way to do that.
Something like
ls -lh *.jpg
that gives me the total size of just the jpg files in the directory.
Try du to summarize disk usage:
du -csh *.jpg
Output (for example):
8.0K sane-logo.jpg
16K sane-umax-advanced.jpg
28K sane-umax-histogram.jpg
24K sane-umax.jpg
16K sane-umax-standard.jpg
4.0K sane-umax-text2.jpg
4.0K sane-umax-text4.jpg
4.0K sane-umax-text.jpg
104K total
du does not sum the sizes of the files themselves but the sizes of the blocks they use in the filesystem. If a file has a size of 13K and the filesystem uses a block size of 4K, then 16K is shown for this file.
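You can see this difference on a single file by comparing its apparent size with its allocated size (a quick sketch, reusing a file name from the listing above):
du -h --apparent-size sane-logo.jpg   # the file's byte length
du -h sane-logo.jpg                   # the blocks actually allocated (8.0K in the listing above)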
You can use this function:
dir () { ls -FaGl "$@" | awk '{ last_size += $4; print }; END { print last_size }'; }
You can also use this command; it is shorter and gives a better result:
find YOUR_PATH -type f -name '*.jpg' -exec du -ch {} +
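One caveat: with very many matching files, -exec ... + may run du more than once, printing one total line per batch. A hedged sketch that sums the byte totals into a single number:
find YOUR_PATH -type f -name '*.jpg' -exec du -cb {} + | awk '$2 == "total" {s += $1} END {print s}'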
To skip the file list and show just the total size, use this:
du -ch *.php.* | tail -1
Output:
196M total

How to list the size of each file and directory and sort by descending size in Bash?

I found that there is no easy way to get the size of a directory in Bash.
I want it so that when I type ls -<some options>, it lists the summed file sizes of directories (recursively) and of files at the same time, sorted by size.
Is that possible?
Simply navigate to the directory and run the following command:
du -a --max-depth=1 | sort -n
Or add -h for human-readable sizes and -r to print bigger directories/files first:
du -a -h --max-depth=1 | sort -hr
Apparently the --max-depth option is not in Mac OS X's version of the du command. You can use the following instead:
du -h -d 1 | sort -h
du -s -- * | sort -n
(this will not show hidden files, i.e. dotfiles)
Use du -sm for MB units, etc. I always use
du -smc -- * | sort -n
because the total line (-c) will end up at the bottom for obvious reasons :)
PS:
See comments for handling dotfiles
I frequently use e.g. 'du -smc /home/*/ | sort -n | tail' to get a feel for where exactly the large bits are sitting
Command
du -h --max-depth=0 * | sort -hr
Output
3,5M asdf.6000.gz
3,4M asdf.4000.gz
3,2M asdf.2000.gz
2,5M xyz.PT.gz
136K xyz.6000.gz
116K xyz.6000p.gz
88K test.4000.gz
76K test.4000p.gz
44K test.2000.gz
8,0K desc.common.tcl
8,0K wer.2000p.gz
8,0K wer.2000.gz
4,0K ttree.3
Explanation
du displays "disk usage"
h is for "human readable" (both, in sort and in du)
max-depth=0 means du will not show sizes of subfolders (remove that if you want to show all sizes of every file in every sub-, subsub-, ..., folder)
r is for "reverse" (biggest file first)
ncdu
When I came to this question, I wanted to clean up my file system. The command line tool ncdu is way better suited to this task.
Installation on Ubuntu:
$ sudo apt-get install ncdu
Usage:
Just type ncdu [path] in the command line. After a few seconds for analyzing the path, you will see something like this:
$ ncdu 1.11 ~ Use the arrow keys to navigate, press ? for help
--- / ---------------------------------------------------------
. 96,1 GiB [##########] /home
. 17,7 GiB [# ] /usr
. 4,5 GiB [ ] /var
1,1 GiB [ ] /lib
732,1 MiB [ ] /opt
. 275,6 MiB [ ] /boot
198,0 MiB [ ] /storage
. 153,5 MiB [ ] /run
. 16,6 MiB [ ] /etc
13,5 MiB [ ] /bin
11,3 MiB [ ] /sbin
. 8,8 MiB [ ] /tmp
. 2,2 MiB [ ] /dev
! 16,0 KiB [ ] /lost+found
8,0 KiB [ ] /media
8,0 KiB [ ] /snap
4,0 KiB [ ] /lib64
e 4,0 KiB [ ] /srv
! 4,0 KiB [ ] /root
e 4,0 KiB [ ] /mnt
e 4,0 KiB [ ] /cdrom
. 0,0 B [ ] /proc
. 0,0 B [ ] /sys
# 0,0 B [ ] initrd.img.old
# 0,0 B [ ] initrd.img
# 0,0 B [ ] vmlinuz.old
# 0,0 B [ ] vmlinuz
Delete the currently highlighted element with d, exit with CTRL + c
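ncdu can also separate scanning from browsing, which is handy on servers (assuming your ncdu build has the export/import flags; scan.file is an arbitrary name):
ncdu -o scan.file /some/path   # scan once and save the result
ncdu -f scan.file              # browse the saved scan later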
ls -S sorts by size. Then, to show the size too, ls -lS gives a long (-l), sorted by size (-S) display. I usually add -h too, to make things easier to read, so, ls -lhS.
Simple and fast:
find . -mindepth 1 -maxdepth 1 -type d | parallel du -s | sort -n
(Requires GNU Parallel.)
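If the directory names may contain spaces, a hedged variant that passes NUL-delimited names:
find . -mindepth 1 -maxdepth 1 -type d -print0 | parallel -0 du -s -- | sort -n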
I think I might have figured out what you want to do. This will give a list of all the files and all the directories, sorted by file size and by the size of each directory's contents:
(find . -depth 1 -type f -exec ls -s {} \;; find . -depth 1 -type d -exec du -s {} \;) | sort -n
[enhanced version]
This is much faster and more precise than the initial version below, and it outputs the sum of all the file sizes in the current directory:
echo `find . -type f -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$//g'` | bc
The stat -c %s command returns a file's size in bytes. The tr command here is used to overcome an xargs limitation (piping straight to xargs splits the results over multiple lines, breaking the logic of the command): tr replaces each line feed with a + (plus) sign. sed's only goal is to remove the trailing + sign from the resulting string, to avoid complaints from the final bc (basic calculator) command that, as usual, does the math.
Performance: I tested it on several directories, with up to ~150,000 files (the current number of files on my Fedora 15 box), and got what I believe is an impressive result:
# time echo `find / -type f -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$//g'` | bc
12671767700
real 2m19.164s
user 0m2.039s
sys 0m14.850s
In case you want to compare with the du -sb / command, it outputs an estimated disk usage in bytes (-b option):
# du -sb /
12684646920 /
As I expected, it is a little larger than my command's calculation, because the du utility returns the allocated space of each file and not the actual consumed space.
[initial version]
You cannot use the du command if you need to know the exact total size of your folder, because (to quote the man page) du estimates file space usage. It can lead you to a wrong result, an approximation (maybe close to the sum size, but most likely greater than the actual size you are looking for).
I think there might be different ways to answer your question but this is mine:
ls -l $(find . -type f | xargs) | cut -d" " -f5 | xargs | sed 's/\ /+/g'| bc
It finds all files under the . directory (change . to whatever directory you like), including hidden files, and (using xargs) prints their names on a single line; ls -l then produces a detailed listing of them. This (sometimes huge) output is piped to the cut command, and only the fifth field (-f5), the file size in bytes, is kept; that is piped to xargs again, producing a single line of sizes separated by blanks. Now a bit of sed magic replaces each blank with a plus (+) sign, and finally bc (basic calculator) does the math.
It might need additional tuning, and ls may complain that the argument list is too long.
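Both versions can be condensed into a single pipe. A sketch assuming GNU findutils and coreutils: -print0/xargs -0 keeps filenames with blanks intact and sidesteps the argument-list limit, while awk replaces the tr/sed/bc arithmetic:
# Sum the byte sizes of all regular files under the current directory
find . -type f -print0 | xargs -0 stat -c %s | awk '{s += $1} END {print s}'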
Another simple solution:
$ for entry in *; do du -s -- "$entry"; done | sort -n
The result will look like:
2900 tmp
6781 boot
8428 bin
24932 lib64
34436 sbin
90084 var
106676 etc
125216 lib
3313136 usr
4828700 opt
changing "du -s" to "du -sh" will show human readable size, but we won't be able to sort in this method.
You can use the following to list files by size:
du -h | sort -hr | more
or
du -h --max-depth=0 * | sort -hr | more
I tend to use du in a simple way.
du -sh */ | sort -h
This provides me with an idea of what directories are consuming the most space. I can then run more precise searches later.
sudo du -hsx 2>/dev/null * | sort -hr | less
4.9G var
2.2G usr
61M root
9.0M etc
6.5M home
824K init
36K run
16K lost+found
4.0K tmp
4.0K srv
4.0K opt
4.0K mnt
4.0K media
4.0K boot
0 sys
0 sbin
0 proc
0 libx32
0 lib64
0 lib32
0 lib
0 dev
0 bin
