How can I limit the maximum number of folders a user can create in Linux?

I have been told that if a user on my computer creates an "infinite" number of folders or files (even empty ones), it can make the machine much slower, or even unusable. I therefore want to limit the maximum number of files and directories a user can create.
I'm afraid that one user will create a huge number of files and cause a problem for all the other users, so this is also a security concern.
How do I limit the maximum number of files/directories each user can create?

You should first enable quota checking on your filesystem.
Modify /etc/fstab and add the keywords usrquota and grpquota to the filesystem you would like to monitor.
The following example enables both user and group quota checking on the /home filesystem:
# cat /etc/fstab
LABEL=/home /home ext2 defaults,usrquota,grpquota 1 2
Reboot after this is done so the new mount options take effect.
Once you've enabled disk quota checking on the filesystem, collect the initial quota information as shown below:
# quotacheck -avug
quotacheck: Scanning /dev/sda3 [/home] done
quotacheck: Checked 5182 directories and 31566 files
quotacheck: Old file not found.
quotacheck: Old file not found.
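Note that on most setups the quotas also have to be switched on before any limits are actually enforced; assuming the standard Linux quota tools, this is done with quotaon:
# quotaon -avug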
Now use the edquota command, as shown below, to edit the quota information for a specific user.
For example, to change the disk quota for user 'ramesh', run edquota, which opens the soft and hard limit values in an editor as shown below:
# edquota ramesh
Disk quotas for user ramesh (uid 500):
Filesystem    blocks   soft   hard   inodes   soft   hard
/dev/sda3    1419352      0      0     1686      0      0
Hard limit – if you specify 2 GB as the hard limit, the user will not be able to create new files once 2 GB is used.
Soft limit – if you specify 1 GB as the soft limit, the user will get a "disk quota exceeded" warning once they reach 1 GB, but they will still be able to create new files until they reach the hard limit.
The blocks columns control the amount of disk space a user may consume, while the inodes columns control the number of files and directories they may create, which is what you are asking about here.
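If you prefer to set the limits non-interactively, the setquota command from the same quota package can be used instead of edquota. As a sketch (the numbers are only examples), this would give user 'ramesh' a soft limit of 1000 inodes and a hard limit of 1100 inodes, with no block limits, on /home:
# setquota -u ramesh 0 0 1000 1100 /home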
Lastly, if you would like a daily report on users' quotas, you can do the following.
Add quotacheck to the daily cron jobs: create a quotacheck file under the /etc/cron.daily directory, as shown below, that runs the quotacheck command every day. The output of the command will be mailed to the root email address.
# cat /etc/cron.daily/quotacheck
quotacheck -avug
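If what you actually want each day is a usage summary rather than a re-scan, the repquota tool from the same package prints per-user usage and limits for all quota-enabled filesystems:
# repquota -a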

This is what quotas are designed for. You can use filesystem quotas to enforce limits, per user and/or per group, on:
the amount of disk space that can be used
the number of blocks that can be used
the number of inodes that can be created.
The number of inodes will essentially limit the number of files and directories a user can create.
There is extensive, high-quality documentation on configuring filesystem quotas; I suggest you read further, for example:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html
https://wiki.archlinux.org/index.php/disk_quota
http://www.ibm.com/developerworks/library/l-lpic1-v3-104-4/
http://www.firewall.cx/linux-knowledgebase-tutorials/linux-administration/838-linux-file-system-quotas.html

Related

Number of free inodes on a partition containing a directory

I have a Python script running under Linux that generates huge numbers of tiny files into a given directory. However, many Linux filesystems like ext4 have a fixed number of inodes set at creation time, so I want to make sure it's possible to save that many files into that directory before starting. From the command line, you can see this number using df -i /some/directory.
How do you find the number of free inodes on the filesystem that directory lives on, in Python?
This can be done using the statvfs system call. In Python (both 2 and 3), this can be accessed using os.statvfs. The call describes the filesystem containing the file/directory the path specifies.
So to get the number of free inodes, use:
import os
os.statvfs('/some/directory').f_favail
Also, it's possible that some percentage of the inodes are reserved for the root user. If the script is running as root and you want to allow it to use the reserved inodes, use f_ffree instead of f_favail.

What's the difference between the output of the "ulimit" command and the contents of the file "/etc/security/limits.conf"?

I am totally confused about the open file descriptor limits in Linux.
Which value is actually in effect?
ulimit -n ======> 65535
but
vim /etc/security/limits.conf
*    soft    nofile    50000
*    hard    nofile    90000
The limits in /etc/security/limits.conf are applied at login by the limits PAM module (pam_limits), if it is part of the PAM configuration. The shell that then gets invoked can apply its own limits on top of that.
If you're asking which one is in effect, then it's the result of the ulimit call. If ulimit is not invoked with the -H option, it displays the soft limit.
The idea behind the limits.conf settings is to have a global place to apply limits, for example for remote logins.
Limits for things like file descriptors can be set at the user level or on a system-wide level. /etc/security/limits.conf is where you can set user-level limits, which might be different limits for each user, or just defaults that apply to all users. The example you show has a soft (roughly, warning) limit of 50000 and a hard (absolute maximum) limit of 90000.
However, a system-wide limit of 65535 might be in place, which would take precedence over the user limit. I think system limits are set in /etc/sysctl.conf, if my memory serves correctly. You might check there to see if you're being limited by the system.
Also, the ulimit command can take switches to specifically show the soft (-Sn) and hard (-Hn) limits for file descriptors.
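For example, to see both values for file descriptors in the current shell:
ulimit -Sn    # soft limit
ulimit -Hn    # hard limit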
I think this configuration is used by all the applications in the system. If you want to change the limit for one particular application, you can try setrlimit() or getrlimit(). The man pages explain everything.

Maximum number of files/directories on Linux?

I'm developing a LAMP online store, which will allow admins to upload multiple images for each item.
My concern is that right off the bat there will be 20000 items, meaning roughly 60000 images.
Questions:
What is the maximum number of files and/or directories on Linux?
What is the usual way of handling this situation (best practice)?
My idea was to make a directory for each item, based on its unique ID, but then I'd still have 20000 directories in a main uploads directory, and it will grow indefinitely as old items won't be removed.
Thanks for any help.
ext[234] filesystems have a fixed maximum number of inodes; every file or directory requires one inode. You can see the current count and limits with df -i. For example, on a 15GB ext3 filesystem, created with the default settings:
Filesystem      Inodes    IUsed     IFree   IUse%   Mounted on
/dev/xvda      1933312   134815   1798497      7%   /
There's no limit on directories in particular beyond this; keep in mind that every file or directory requires at least one filesystem block (typically 4KB), though, even if it's a directory with only a single item in it.
As you can see, though, 80,000 inodes is unlikely to be a problem. And with the dir_index option (which can be enabled with tune2fs), lookups in large directories aren't much of a problem. However, note that many administrative tools (such as ls or rm) can have a hard time dealing with directories that contain too many files. As such, it's recommended to split your files up so that you don't have more than a few hundred to a thousand items in any given directory. An easy way to do this is to hash whatever ID you're using and use the first few hex digits as intermediate directories.
For example, say you have item ID 12345, and it hashes to 'DEADBEEF02842.......'. You might store your files under /storage/root/d/e/12345. You've now cut the number of files in each directory to roughly 1/256th of what it would be otherwise.
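As a rough shell sketch of that layout (the storage root, item ID, and file name here are only placeholders):
id=12345
hash=$(printf '%s' "$id" | md5sum | cut -c1-2)        # first two hex digits of the hash
itemdir="/storage/root/${hash:0:1}/${hash:1:1}/$id"   # e.g. /storage/root/d/e/12345
mkdir -p "$itemdir"
cp photo1.jpg "$itemdir/"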
If your server's filesystem has the dir_index feature turned on (see tune2fs(8) for details on checking and turning on the feature) then you can reasonably store upwards of 100,000 files in a directory before the performance degrades. (dir_index has been the default for new filesystems for most of the distributions for several years now, so it would only be an old filesystem that doesn't have the feature on by default.)
That said, adding another directory level to reduce the number of files in a directory by a factor of 16 or 256 would drastically improve the chances of things like ls * working without over-running the kernel's maximum argv size.
Typically, this is done by something like:
/a/a1111
/a/a1112
...
/b/b1111
...
/c/c6565
...
i.e., prepending a letter or digit to the path, based on some feature you can compute from the name. (The first two characters of the md5sum or sha1sum of the file name are one common approach, but if you have unique object IDs, then 'a' + id % 16 is an easy enough mechanism to determine which directory to use.)
60000 is nothing, and so is 20000. But you should group these 20000 somehow in order to speed up access to them, maybe in groups of 100 or 1000, by taking the item number and dividing it by 100, 500, 1000, whatever.
E.g., I have a project where the files have numbers. I group them in 1000s, so I have
id/1/1332
id/3/3256
id/12/12334
id/350/350934
You actually might have a hard limit: some systems have 32-bit inode numbers, so you are limited to roughly 2^32 files per filesystem.
In addition to the general answers (basically "don't worry about it that much", "tune your filesystem", and "organize your directories so each contains a few thousand files"):
If the individual images are small (e.g. less than a few kilobytes), instead of putting them in a folder you could also put them in a database (e.g. in MySQL as a BLOB) or perhaps inside a GDBM indexed file. Then each small item won't consume an inode (and on many filesystems each file also costs at least a few kilobytes of block space). You could also do that up to some threshold (e.g. put images bigger than 4 KB in individual files and smaller ones in a database or GDBM file). Of course, don't forget to back up your data (and define a backup strategy).
The year is 2014. I come back in time to add this answer.
Lots of big/small files? You can use Amazon S3 and other alternatives based on Ceph like DreamObjects, where there are no directory limits to worry about.
I hope this helps someone decide from all the alternatives.
Another common scheme is to hash the ID and split the hex digits into path components, so that each directory level holds at most 16^3 = 4096 entries:
md5($id) ==> 0123456789ABCDEF
$file_path = items/012/345/678/9AB/CDE/F.jpg

How to tell whether two NFS mounts are on the same remote filesystem?

My Linux-based system displays statistics for NFS-mounted filesystems, something like this:
Remote Path                Mounted-on   Stats
server1:/some/path/name    /path1       100 GB free
server2:/other/path/name   /path2       100 GB free
Total:                                  200 GB free
That works fine. The problem is when the same filesystem on the NFS server has been mounted twice on my client:
Remote Path                Mounted-on   Stats
server1:/some/path/name    /path1       100 GB free
server1:/some/path/name2   /path2       100 GB free
Total:                                  200 GB free
server1's /some/path/name and /some/path/name2 are actually on the same filesystem, which has 100 GB free, but I erroneously add them up and report 200 GB free.
Is there any way to detect that they're on the same partition?
Approaches that won't work:
"Use statfs()": statfs() returns a struct statfs, which has a "file system ID" field, f_fsid. Unfortunately it's undefined and gets zeroed out over NFS.
"Don't mount the same partion multiple times." This is outside of my control.
"Use a heuristic based on available space." The method has to definitively work. Also, statfs() caches its output so it would be difficult to get this right in the face of large data movement.
If there's no solution, I'll have to generate a config file in every potential mount point on the server side, but it would be a lot nicer if there were some clean way to avoid that.
Thanks!
I guess if "stat -c %d /mountpoint" do what you want (I cannot test it right now)?
You probably want to read the remote system's exported filesystems, using:
showmount -e server
That will give you the real paths that are being shared. When walking the mounts from the remote system, prune them to the common exported root and use that to determine whether the mount points are on the same underlying filesystem.
This doesn't help you in the case that the file systems are separately shared from the same underlying file system.
You could add in a heuristic of checking the overall filesystem size and space available, and assume that if they're the same and come from the same remote server, the mounts are on the same partition, mapped to the shortest common path of the mount devices.
None of these help if you share from a loopback-mounted filesystem that looks completely different in form from the others.
It also doesn't help in the case of a server that can be addressed by different names and addresses.
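A minimal sketch of the pruning step, assuming the usual showmount output (a header line followed by "export-path client-list" lines) and a hypothetical remote path to classify:
for export in $(showmount -e server1 | tail -n +2 | awk '{print $1}' | sort -r); do
    # reverse sort so deeper exports are tried before their parents
    case "/some/path/name" in
        "$export"|"$export"/*) echo "path is inside export $export"; break ;;
    esac
done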

Disadvantages to creating/removing many hard links?

I need to create hundreds to thousands of temporary hard or symbolic links that will be deleted shortly after creation. For my purposes both types of links will work (i.e. the target is not a directory and it always exists on the same filesystem).
As I understand it, a symbolic link is a small file that contains the path to the original file, whereas a hard link is another directory entry referring to the same inode. So if I am going to be creating and deleting thousands of these links, is it better to be creating and deleting thousands of tiny files (symlinks) or thousands of these references (hard links)? It seems like one taxes the hard drive (maybe fragmentation) while the other might tax the filesystem itself. Where are the inode references stored? Do I risk corrupting the filesystem by making so many hard links? What about speed?
Thanks for your expertise!
This is a workaround that lets me use ffmpeg to encode a movie out of an arbitrary subset of images from a directory. Since ffmpeg requires that the files be named sequentially (e.g. frame%04d.jpg), I realized I can just create hard or symbolic links to the subset of files and name the links appropriately. This avoids renaming the original files and having to actually copy the data. It works great, but it requires creating and deleting many thousands of links, repeatedly.
Sort of addresses this problem too I believe:
convert image sequence using ffmpeg
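For context, the linking workaround might look roughly like this (the source directory and output names are only placeholders; use ln -s instead of ln if you prefer symbolic links):
i=1
mkdir -p frames
for f in /photos/selected/*.jpg; do
    ln "$f" "$(printf 'frames/frame%04d.jpg' "$i")"   # hard links require the same filesystem
    i=$((i + 1))
done
ffmpeg -i frames/frame%04d.jpg out.mp4
rm -r frames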
If this activity breaks your file system, then your file system is at fault, not you. File systems are generally pretty reliable, so don't worry about that.
Both options require adding an entry in the directory. The symbolic link requires creating a file as well. When you access the file the hard link jumps directly to the content, while accessing a symlink requires finding the symlink file, reading it, finding the directory with the content, finding where the content is, and then accessing that. Therefore symlinks are more work for the filesystem all around.
But the difference is minute when compared to the work of actually reading the data in the files. Therefore I would not worry about it, and just go with whichever one best gives you the semantics you want.
Since you are not trying to create hundreds of thousands of links to the same file, hard links perform marginally better.
However, if /tmp is a tmpfs, symbolic links in /tmp perform better still.
Oh, and symlinks are too small to cause fragmentation issues.
Both options require adding an entry to the directory, and the directory structure may grow by allocating new blocks.
But a symbolic link also requires the allocation of an inode, and the filesystem has a limited number of inodes. Hundreds of thousands of symlinks may hit that limit, and you may get a "no space left on device" error even with gigabytes of free space.
By default, the filesystem creation tool chooses the maximum number of inodes according to the physical partition size. For instance, for Linux ext2/3/4, mkfs.ext3 uses a bytes-per-inode ratio that you can find in /etc/mke2fs.conf.
For an existing filesystem, here is a command to get information about inodes:
# dumpe2fs /dev/sda1 | grep -i inode | less
Inode count: 979200
Free inodes: 742304
Inodes per group: 16320
Inode blocks per group: 510
First inode: 11
Inode size: 128
Journal inode: 8
First orphan inode: 441066
Journal backup: inode blocks
In conclusion, you should prefer hard links, mainly because of lower resource consumption on disk and in memory (VFS structures in caches).
One more piece of advice: do not create too many files in the same directory; 2,000 files is a reasonable limit to avoid performance issues.
