Number of free inodes on the partition containing a directory - Linux

I have a Python script running under Linux that generates huge numbers of tiny files into a given directory. However, many Linux filesystems like ext4 have a fixed number of inodes set at creation time, so I want to make sure it's possible to save that many files into that directory before starting. From the command line, you can see this number using df -i /some/directory.
How do you find the number of free inodes on the filesystem that directory lives on, in Python?

This can be done using the statvfs system call. In Python (both 2 and 3), this can be accessed using os.statvfs. The call describes the filesystem containing the file/directory the path specifies.
So to get the number of free inodes, use
import os
os.statvfs('/some/directory').f_favail
Also, it's possible that some percentage of the inodes are reserved for the root user. If the script is running as root and you want to allow it to use the reserved inodes, use f_ffree instead of f_favail.
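For instance, a minimal pre-flight check along those lines might look like this (the path and the file count are placeholders, not values from the question):
import os

target_dir = '/some/directory'   # the directory the script writes into (placeholder)
files_needed = 500000            # how many tiny files the script will create (placeholder)

st = os.statvfs(target_dir)
# f_favail counts inodes available to unprivileged processes on this filesystem
if st.f_favail < files_needed:
    raise RuntimeError('only %d free inodes on the filesystem holding %s; %d needed'
                       % (st.f_favail, target_dir, files_needed))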

Related

How to obtain the maximum number of subdirectories in a directory from a C program on Linux?

I know that the maximum number of files or directories in a directory varies depending on the filesystem.
From within a C program on Linux, how do I obtain the maximum number of subdirectories (or files) allowed in a directory below the current working directory, or determine that there is no maximum other than the size of the universe/computer?
There is probably some #define constant somewhere, or perhaps an entry in some configuration file, but I can't find either. Do I have to find out which filesystem my current directory is on, and then use knowledge of that filesystem?
There is no specific limit on the number of files or subdirectories within a given directory. There are limits on the total number of inodes in a file system depending on how the file system was built and (mostly) how much space there is in total in the file system. Each named object requires an inode (but, thanks to hard links, a single inode can have multiple names). Thus, the limit is primarily controlled by the space available in the file system.
There are also limits on how deep a directory hierarchy can be, because the POSIX constant {PATH_MAX}, defined (or not) in <limits.h>, bounds the length of a path name; the related constants {_XOPEN_PATH_MAX} (1024) and {_POSIX_PATH_MAX} (256) give lower bounds on the minimum acceptable value for {PATH_MAX}.
You can use the functions fpathconf() and pathconf() to find properties of file systems at run-time. The related function sysconf() handles other configuration properties.
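The question asks about C, but the same pathconf() interface is also exposed in Python (which keeps the examples on this page in one language); a hedged sketch, querying the current directory:
import os

# NAME_MAX: the longest single file name this filesystem accepts
print(os.pathconf('.', 'PC_NAME_MAX'))
# PATH_MAX: the longest relative pathname starting from this directory
print(os.pathconf('.', 'PC_PATH_MAX'))
# LINK_MAX: the maximum link count of a file; since every subdirectory's '..'
# entry is a hard link to its parent, this also bounds how many subdirectories
# one directory can hold (on filesystems without the dir_nlink feature)
print(os.pathconf('.', 'PC_LINK_MAX'))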

How to prove that a directory is a file in Linux

"Everything is a file in Linux". How can i prove that directories are represented as files in linux. Also the physical hardware devices everything creates and is represented as files in Linux. But how can i prove this concept with supporting examples to someone.
Viewing the Directory and other physical hardwares as files in Liniux.( POC)
The "Everything is a file in Linux" statement is a bit of an oversimplification. There are many things in Linux that appear as files, but don't quite 'act' as you think they would in a conventional sense.
Block files (e.g. /dev/loop0) are a great example of this as they are used as a way of communicating with device drivers.
That said, directories are their own 'special' kind of file: they contain directory entries that map file names to inode numbers. I suppose a simple 'proof' of sorts would be to ls -l any directory; you will notice that most (if not all) of them have a listed file size of 4096 bytes rather than the collective size of their contents.
4096 bytes is the default block size on most filesystems and is usually more than enough to hold all of a directory's entries (names and inode numbers). So rather than holding its files' data directly, a directory holds metadata about them.
Alternatively, using stat on any directory will display its own inode number (as well as the number of links it has).
EDIT: Directory files contain the inode id (a pointer to a file's inode) not the inode itself. I have edited the answer.
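Along the same lines, a small Python sketch (the path is just a placeholder) showing that a directory has a mode, an inode number, a size, and a link count exactly like any other file:
import os
import stat

info = os.stat('/tmp')             # placeholder directory
print(stat.S_ISDIR(info.st_mode))  # True: the mode bits mark it as a directory
print(info.st_ino)                 # the directory's own inode number
print(info.st_size)                # commonly 4096: one block of name -> inode entries
print(info.st_nlink)               # 2 + number of subdirectories ('.' plus each child's '..')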

Automatically creating symlinks for files

I am in a rather unique predicament.
Let's say that I am on a Linux-based computer. It could be anything, really. The important part is that I have two partitions on my device: one that is around 1 GB and another that is around 15 GB.
The 1 GB partition (mounted on /) is reserved for system use, and the rest (mounted on /home) is for the user (me) to use.
Suppose I am running low on free space in my system partition. However, I want to install some command line utilities (which, of course, install to the system).
In the meantime, I create a folder in /home called stash. More on this later.
So, I download a tool, for example, bash. Bash comes as a .deb which I end up extracting to /home/stash. Let's assume bash is too big for me to install it to the system. That's okay, I can just create a symlink at /bin/bash that points to /home/stash/bin/bash.
However, I'd like to symlink not only /bin/bash, but also all of the other files and directories under /home/stash. Is there a way that I could automate this symlink process?
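One way to automate the linking described above, as a hedged Python sketch (the paths are taken from the question; treat it as illustrative only, since linking into system directories can break things if done carelessly):
import os

stash_root = '/home/stash'   # where the extracted package contents live
target_root = '/'            # where the symlinks should be created

for dirpath, dirnames, filenames in os.walk(stash_root):
    rel = os.path.relpath(dirpath, stash_root)
    dest_dir = os.path.normpath(os.path.join(target_root, rel))
    os.makedirs(dest_dir, exist_ok=True)        # mirror the directory tree itself
    for name in filenames:
        src = os.path.join(dirpath, name)       # real file under /home/stash
        dest = os.path.join(dest_dir, name)     # symlink location on the system partition
        if not os.path.lexists(dest):           # don't clobber anything already there
            os.symlink(src, dest)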

Optimal number of files per directory vs number of directories for EXT4

I have a program that produces large number of small files (say, 10,000 files). After they are created, another script accesses them and processes one by one.
Questions:
Does it matter, in terms of performance, how the files are organized (all in one directory or in multiple directories)?
If so, what is the optimal number of directories and files per directory?
I run Debian with the ext4 file system.
Related
Maximum number of files/folders on Linux?
https://serverfault.com/questions/104986/what-is-the-maximum-number-of-files-a-file-system-can-contain
10k files inside a single folder is not a problem on ext4. It should have the dir_index option enabled by default, which indexes directory contents using a btree-like structure to prevent performance issues.
To sum up, unless you create millions of files or use ext2/ext3, you shouldn't have to worry about system or FS performance issues.
That being said, shell tools and commands don't like to be called with a lot of files as parameters (rm *, for example) and may fail with an error message saying something like 'too many arguments'. Look at this answer for what happens then.
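To illustrate that last point, the processing script can sidestep shell globbing entirely by iterating over the directory itself; a hedged Python sketch (the directory name and process() are placeholders):
import os

work_dir = '/path/to/output'    # placeholder: the directory full of small files

def process(text):
    pass                        # stand-in for whatever the real per-file work is

# os.scandir streams entries one at a time, so 10k (or 10M) files never need
# to be expanded into one huge argument list the way 'rm *' or 'cat *' would.
with os.scandir(work_dir) as entries:
    for entry in entries:
        if entry.is_file():
            with open(entry.path) as f:
                process(f.read())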

Maximum number of files/directories on Linux?

I'm developing a LAMP online store, which will allow admins to upload multiple images for each item.
My concern is - right off the bat there will be 20000 items meaning roughly 60000 images.
Questions:
What is the maximum number of files and/or directories on Linux?
What is the usual way of handling this situation (best practice)?
My idea was to make a directory for each item, based on its unique ID, but then I'd still have 20000 directories in a main uploads directory, and it will grow indefinitely as old items won't be removed.
Thanks for any help.
ext[234] filesystems have a fixed maximum number of inodes; every file or directory requires one inode. You can see the current count and limits with df -i. For example, on a 15GB ext3 filesystem, created with the default settings:
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda      1933312 134815 1798497    7% /
There's no limit on directories in particular beyond this; keep in mind that every file or directory requires at least one filesystem block (typically 4KB), though, even if it's a directory with only a single item in it.
As you can see, though, 80,000 inodes is unlikely to be a problem. And with the dir_index option (which can be enabled with tune2fs), lookups in large directories aren't much of a big deal. However, note that many administrative tools (such as ls or rm) can have a hard time dealing with directories that contain too many files. As such, it's recommended to split your files up so that you don't have more than a few hundred to a thousand items in any given directory. An easy way to do this is to hash whatever ID you're using and use the first few hex digits as intermediate directories.
For example, say you have item ID 12345, and it hashes to 'DEADBEEF02842.......'. You might store your files under /storage/root/d/e/12345. You've now cut the number of files in each directory to 1/256th of what it would otherwise be.
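A hedged sketch of that layout in Python (the storage root and the use of MD5 mirror the example above; nothing here comes from a real codebase):
import hashlib
import os

STORAGE_ROOT = '/storage/root'   # as in the example above

def path_for(item_id, filename):
    digest = hashlib.md5(str(item_id).encode()).hexdigest()
    # first two hex digits of the hash -> two levels of 16 buckets each (256 total)
    return os.path.join(STORAGE_ROOT, digest[0], digest[1], str(item_id), filename)

# path_for(12345, 'photo1.jpg') -> '/storage/root/<h0>/<h1>/12345/photo1.jpg',
# where h0 and h1 are the first two hex digits of md5('12345')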
If your server's filesystem has the dir_index feature turned on (see tune2fs(8) for details on checking and turning on the feature) then you can reasonably store upwards of 100,000 files in a directory before the performance degrades. (dir_index has been the default for new filesystems for most of the distributions for several years now, so it would only be an old filesystem that doesn't have the feature on by default.)
That said, adding another directory level to reduce the number of files in a directory by a factor of 16 or 256 would drastically improve the chances of things like ls * working without over-running the kernel's maximum argv size.
Typically, this is done by something like:
/a/a1111
/a/a1112
...
/b/b1111
...
/c/c6565
...
i.e., prepending a letter or digit to the path, based on some feature you can compute off the name. (The first two characters of the md5sum or sha1sum of the file name is one common approach, but if you have unique object ids, then 'a' + id % 16 is an easy enough mechanism to determine which directory to use.)
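The 'a' + id % 16 idea from that parenthesis, spelled out as a tiny Python sketch (obj_id is a made-up example id):
obj_id = 1112                                 # made-up unique object id
bucket = chr(ord('a') + obj_id % 16)          # one of 16 directories, 'a' through 'p'
path = '/{0}/{0}{1}'.format(bucket, obj_id)   # -> '/i/i1112' for this id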
60000 is nothing, and 20000 as well. But you should group these 20000 by some means in order to speed up access to them, maybe in groups of 100 or 1000, by taking the file's number and dividing it by 100, 500, 1000, whatever (a short sketch follows the example paths below).
E.g., I have a project where the files have numbers. I group them in thousands, so I have
id/1/1332
id/3/3256
id/12/12334
id/350/350934
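In Python terms, the grouping shown above is just integer division by the group size:
def grouped_path(file_id, group_size=1000):
    # e.g. grouped_path(350934) -> 'id/350/350934', matching the layout above
    return 'id/{}/{}'.format(file_id // group_size, file_id)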
You actually might have a hard limit: some systems have 32-bit inode numbers, so you are limited to roughly 2^32 files per file system.
In addition to the general answers (basically "don't bother that much", "tune your filesystem", and "organize your directory with subdirectories containing a few thousand files each"):
If the individual images are small (e.g. less than a few kilobytes), instead of putting them in a folder you could also put them in a database (e.g. in MySQL as a BLOB) or perhaps inside a GDBM indexed file. Then each small item won't consume its own inode (and on many filesystems each file, however small, occupies at least one block of a few kilobytes). You could also do that up to some threshold (e.g. put images bigger than 4 kilobytes in individual files, and smaller ones in a database or GDBM file). Of course, don't forget to back up your data (and define a backup strategy).
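A hedged sketch of the GDBM variant using Python's dbm.gnu module (assuming it is available on the system; the file names and the 4 KB threshold are just the values suggested above):
import dbm.gnu

THRESHOLD = 4096   # images smaller than this go into the dbm file

def store_image(db, image_id, data, blob_dir='/storage/blobs'):
    if len(data) < THRESHOLD:
        db[str(image_id)] = data              # small image: one record, no extra inode
    else:
        path = '{}/{}.jpg'.format(blob_dir, image_id)
        with open(path, 'wb') as f:           # big image: a regular file of its own
            f.write(data)

# usage sketch:
# with dbm.gnu.open('images.gdbm', 'c') as db:
#     store_image(db, 12345, image_bytes)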
The year is 2014. I come back in time to add this answer.
Lots of big or small files? You can use Amazon S3, or alternatives built on Ceph such as DreamObjects, where there are no directory limits to worry about.
I hope this helps someone decide from all the alternatives.
Another option is to hash the item ID and split the digest into three-hex-character chunks, one chunk per directory level, so each directory holds at most 16^3 = 4096 subdirectories:
md5($id) ==> 0123456789ABCDEF
$file_path = items/012/345/678/9AB/CDE/F.jpg
1 node = 4096 subnodes (fast)
