When running ls -l, why does the filesize on a directory not match the output of du? - linux

What does 4096 mean in output of ls -l?
[root@file nutch-0.9]# du -csh resume.new/
2.3G resume.new/
[root@file nutch-0.9]# ls -l
total 55132
drwxr-xr-x 7 root root 4096 Jun 18 03:19 resume.new

It means that the directory itself takes up 4096 bytes of disk space (not including its contents).

I have been wondering about it too. So, after searching I came across:
"It's the size necessary to store the
meta-data about files (including the
file names contained in that
directory). The number of files /
sub-directories at a given time might
not map directly to the size reported,
because once allocated, space is not
freed if the number of files changes.
This behaviour makes sense for most
use cases (where disk space is cheap,
and once a directory has a lot of
files in it, it will probably have
them again in future), and helps to
reduce fragmentation."
Reference: http://www.linuxquestions.org/questions/showthread.php?p=2978839#post2978839

Directories are just files containing <name, inode> tuples that are specially treated by the filesystem. The size reported by ls is the size of this "file". Check this answer on Server Fault for an overview of how directories work under the hood.
So the 4096 bytes most likely mean that the filesystem block size is 4096 and that the directory is currently using a single block to store this table of names and inodes.
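As a quick illustration (the path here is just an example), stat shows the directory's own size and how many blocks it occupies:
stat /etc
# typical fields include: Size: 4096  Blocks: 8  IO Block: 4096  directory
# i.e. one 4096-byte block (8 x 512-byte sectors) holding the name/inode table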

4096, in your example, is the number of bytes used by the directory itself. In other words, this is the space required to store the list of items contained in the directory. It is not, as the question title suggests, the sum of the space of all of the items stored in the directory.
You don't say what system you're using, but in many UNIX/Linux file systems, the minimum unit of storage allocation is 4K, which is why the size is showing as 4096. The directory entries for two items, plus "." and "..", should take considerably less space.
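A rough way to see this behaviour in practice (the directory name and file count are arbitrary):
mkdir /tmp/demo
ls -ld /tmp/demo               # size column reads 4096: one block for the entry table
touch /tmp/demo/file{1..5000}
ls -ld /tmp/demo               # size has grown to several blocks (exact value depends on the filesystem)
rm /tmp/demo/file*
ls -ld /tmp/demo               # on ext4 the larger size typically remains after the files are deleted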

Related

How to prove that directory is a file in Linux

"Everything is a file in Linux". How can i prove that directories are represented as files in linux. Also the physical hardware devices everything creates and is represented as files in Linux. But how can i prove this concept with supporting examples to someone.
Viewing the Directory and other physical hardwares as files in Liniux.( POC)
The "Everything is a file in Linux" statement is a bit of an oversimplification. There are many things in Linux that appear as files, but don't quite 'act' as you think they would in a conventional sense.
Block device files (e.g. /dev/loop0) are a great example of this, as they are used as a way of communicating with device drivers.
That said, directories are their own 'special' kind of file that contain inode ids pointing to each file's inode. I suppose a simple 'proof' of sorts would be to run ls -l on any directory: you will notice that most (if not all) of them have a listed file size of 4096 bytes rather than the collective size of their contents.
4096 bytes is the default block size on most filesystems and is usually more than enough to hold all the information (names and inode ids) a directory keeps. So rather than direct access to its files' data, a directory holds metadata about them.
Alternatively, running stat on any directory will display its own inode number (as well as the number of links it has).
EDIT: Directory files contain the inode id (a pointer to a file's inode) not the inode itself. I have edited the answer.
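Following up on the stat suggestion above, a quick check might look like this (any directory will do; /etc is just a convenient one):
ls -lid /etc                   # -d lists the directory itself, -i prepends its inode number
stat -c 'inode: %i  type: %F  size: %s bytes  links: %h' /etc
# the type prints "directory": it is an ordinary inode-backed object, much like a regular file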

Get directory size with xen image

I want to check the size of my directory.
directory is xen domU image.
directory name is xendisk
du -sh ./xendisk
returns 5.4G.
but the Xen domU image size is 10G.
ls -alh and du -sh report different sizes for the image.
What happened?
You have created a sparse file for your image. If you used a command like truncate -s 10G domU.img to create the image then this would be the result.
The wiki page I have linked has more information, but basically a sparse file is one where the empty parts of the file take no space. This is useful when dealing with VMs because in most cases your VM will only use a fraction of the space available to it, so using a sparse file means it takes far less space on your filesystem (as you have observed). The article states that this is achieved using the following mechanism:
When reading sparse files, the file system transparently converts metadata representing empty blocks into "real" blocks filled with zero bytes at runtime. The application is unaware of this conversion.
If you need to check the size with du you may be interested in the --apparent-size option, which will include all of the unallocated blocks in the calculation. Therefore you could use this command if you need the output to match what ls is telling you:
du -sh --apparent-size ./xendisk
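You can reproduce the effect with a throwaway file (the name and size are arbitrary):
truncate -s 10G sparse.img             # create a 10G sparse file: no data blocks allocated yet
ls -lh sparse.img                      # reports 10G, the apparent size
du -sh sparse.img                      # reports (almost) 0, the blocks actually allocated
du -sh --apparent-size sparse.img      # reports 10G again, matching ls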

Why is root directory always stored in inode two?

I'm learning about Linux filesystems, with these sources:
http://linuxgazette.net/issue21/ext2.html
http://homepage.smc.edu/morgan_david/cs40/analyze-ext2.htm
But I have one question about the root directory: why is its inode number always two? Why not one, or another number?
The first inode number is 1. 0 is used as a NULL value, to indicate that there is no inode. Inode 1 is used to keep track of any bad blocks on the disk; it is essentially a hidden file containing the bad blocks, so that they will not be used by another file. The bad blocks can be recorded using e2fsck -c. The filesystem root directory is inode 2.
The meaning of particular inode numbers differs by filesystem. For ext4 you can find more information on the Ext4 Wiki Ext4 Disk Layout page; in particular see the "Special inodes" table.
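You can confirm this on an ext2/3/4 filesystem (other filesystems, e.g. XFS or Btrfs, use different numbers for their root directory):
ls -id /                       # prints the root directory's inode number, e.g. "2 /"
stat -c '%i  %n' /             # the same check via stat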

Maximum number of files/directories on Linux?

I'm developing a LAMP online store, which will allow admins to upload multiple images for each item.
My concern is - right off the bat there will be 20000 items meaning roughly 60000 images.
Questions:
What is the maximum number of files and/or directories on Linux?
What is the usual way of handling this situation (best practice)?
My idea was to make a directory for each item, based on its unique ID, but then I'd still have 20000 directories in a main uploads directory, and it will grow indefinitely as old items won't be removed.
Thanks for any help.
ext[234] filesystems have a fixed maximum number of inodes; every file or directory requires one inode. You can see the current count and limits with df -i. For example, on a 15GB ext3 filesystem, created with the default settings:
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvda 1933312 134815 1798497 7% /
There's no limit on directories in particular beyond this; keep in mind that every file or directory requires at least one filesystem block (typically 4KB), though, even if it's a directory with only a single item in it.
As you can see, though, 80,000 inodes are unlikely to be a problem. And with the dir_index option (which can be enabled with tune2fs), lookups in large directories aren't much of a problem either. However, note that many administrative tools (such as ls or rm) can have a hard time dealing with directories that contain too many files. As such, it's recommended to split your files up so that you don't have more than a few hundred to a thousand items in any given directory. An easy way to do this is to hash whatever ID you're using and use the first few hex digits as intermediate directories.
For example, say you have item ID 12345, and it hashes to 'DEADBEEF02842.......'. You might store your files under /storage/root/d/e/12345. You've now cut the number of files in each directory by 1/256th.
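A minimal shell sketch of that scheme (the /storage/root base path and the item ID are just placeholders):
id=12345
hash=$(printf '%s' "$id" | md5sum | cut -c1-2)   # first two hex digits of the hash
dir="/storage/root/${hash:0:1}/${hash:1:1}"      # e.g. /storage/root/d/e
mkdir -p "$dir"
echo "store files for item $id under $dir/$id"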
If your server's filesystem has the dir_index feature turned on (see tune2fs(8) for details on checking and turning on the feature) then you can reasonably store upwards of 100,000 files in a directory before the performance degrades. (dir_index has been the default for new filesystems for most of the distributions for several years now, so it would only be an old filesystem that doesn't have the feature on by default.)
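To check whether dir_index is already on (the device name below is just an example), something like this should work:
tune2fs -l /dev/xvda1 | grep 'Filesystem features'   # look for dir_index in the list
tune2fs -O dir_index /dev/xvda1                      # enable it if missing
e2fsck -D -f /dev/xvda1                              # then rebuild/optimize existing directories (filesystem unmounted)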
That said, adding another directory level to reduce the number of files in a directory by a factor of 16 or 256 would drastically improve the chances of things like ls * working without over-running the kernel's maximum argv size.
Typically, this is done by something like:
/a/a1111
/a/a1112
...
/b/b1111
...
/c/c6565
...
i.e., prepending a letter or digit to the path, based on some feature you can compute from the name. (The first two characters of the md5sum or sha1sum of the file name is one common approach, but if you have unique object ids, then 'a' + id % 16 is an easy enough mechanism to determine which directory to use.)
60,000 files is nothing, and 20,000 as well. But you should group these 20,000 by some means in order to speed up access to them. Maybe in groups of 100 or 1,000, by taking the item's number and dividing it by 100, 500, 1,000, whatever (a shell sketch of this follows the listing below).
E.g., I have a project where the files have numbers. I group them in 1000s, so I have
id/1/1332
id/3/3256
id/12/12334
id/350/350934
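In shell, picking the bucket by integer division might look like this (the id/ base directory and file name follow the listing above and are hypothetical):
id=350934
dir="id/$(( id / 1000 ))"      # -> id/350
mkdir -p "$dir"
mv "$id" "$dir/"               # the file is assumed to be named after its number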
You actually might have a hard limit: some systems have 32-bit inode numbers, so you are limited to 2^32 inodes per file system.
In addition to the general answers (basically "don't bother that much", "tune your filesystem", and "organize your directory with subdirectories containing a few thousand files each"):
If the individual images are small (e.g. less than a few kilobytes), instead of putting them in a folder you could also put them in a database (e.g. in MySQL as a BLOB) or perhaps inside a GDBM indexed file. Then each small item won't consume an inode or a full filesystem block (on many filesystems, every file takes at least a few kilobytes of block space). You could also do this above some threshold (e.g. put images bigger than 4 kilobytes in individual files, and smaller ones in a database or GDBM file). Of course, don't forget to back up your data (and define a backup strategy).
The year is 2014. I come back in time to add this answer.
Lots of big/small files? You can use Amazon S3 and other alternatives based on Ceph like DreamObjects, where there are no directory limits to worry about.
I hope this helps someone decide from all the alternatives.
md5($id) ==> 0123456789ABCDEF
$file_path = items/012/345/678/9AB/CDE/F.jpg
1 directory level of 3 hex digits = 16^3 = 4096 subdirectories (fast)

What folder size does ls -la show? [duplicate]

Possible Duplicate:
When running ls -l, why does the filesize on a directory not match the output of du?
Hi.
I'm interested in what the output of ls -la in Linux shows me for a directory. The default size is 4K, but if a directory holds a lot of files, maybe even zero-size ones such as PHP sessions =), the size != 4K.
What is ls -la showing me?
And afterwards, when I clean out this folder, I still see the last maximum size.
ls -al will give you the space taken up by the directory itself, not the files within it.
As such, it has a minimum size. When a directory is created, it's given this much space to store file information, which is a set number of bytes per file (let's say 64 bytes, though the number could be different).
If the initial size was 4K, that would allow up to 64 files. Once you put more than 64 files into the directory, it would have to be expanded.
As for your comment:
The reason why it may not get smaller when you delete all the files in it is because there's usually no real advantage. It's just left at the same size so that it doesn't have to be expanded again next time you put a bucketload of files in there (it tends to assume that past behaviour is an indicator of future behaviour).
If you want to reduce the space taken, there's an old trick for doing that. To reduce the size of /tmp/qq, create a brand new /tmp/qq2, copy all the files across (after deleting those you don't need), then simply rename /tmp/qq to /tmp/qq3 and /tmp/qq2 to /tmp/qq. Voila! Oh yeah, eventually delete /tmp/qq3.
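As a concrete sketch of that trick (paths follow the /tmp/qq example above; do this while nothing else is using the directory):
mkdir /tmp/qq2
cp -a /tmp/qq/. /tmp/qq2/      # copy the files you are keeping, preserving attributes
mv /tmp/qq /tmp/qq3            # park the old, oversized directory
mv /tmp/qq2 /tmp/qq            # the freshly built directory takes its place
rm -rf /tmp/qq3                # finally drop the old one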
