Linux open-file limit

We are facing a situation where a process gets stuck because it runs out of its open-files limit. The global file-max setting was set extremely high (in sysctl.conf), and the per-user value was also set to a high value in /etc/security/limits.conf. Even ulimit -n reflects the per-user value when run as that headless user (the process owner). So the question is: does this change require a system reboot (my understanding is that it doesn't)? Has anyone faced a similar problem? I am running Ubuntu Lucid and the application is a Java process. The ephemeral port range is also wide enough, and when checked during the issue, the process had 1024 files open (note: the default limit), as reported by lsof.

One problem you might run into is that the fd_set used by select is limited to FD_SETSIZE, which is fixed at compile time (in this case, when the JRE was compiled), and that limit is 1024.
#define FD_SETSIZE __FD_SETSIZE
/usr/include/bits/typesizes.h:#define __FD_SETSIZE 1024
Luckily both the C library and the kernel can handle arbitrarily sized fd_sets, so for a compiled C program it is possible to raise that limit.
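If you want to confirm whether the higher limit has actually reached the running process, /proc is a quick check. A minimal sketch, assuming the headless user is called appuser (a placeholder) and the process is the Java one:

PID=$(pgrep -u appuser -f java | head -n1)   # appuser is a placeholder for the headless user
grep 'Max open files' /proc/$PID/limits      # the limit the running process actually has
ls /proc/$PID/fd | wc -l                     # file descriptors currently open

If "Max open files" still reads 1024, the process was started before the higher limits took effect; no reboot is needed, but the process has to be restarted from a fresh login session.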

Assuming you have edited the file-max value in sysctl.conf and /etc/security/limits.conf correctly, then:
edit /etc/pam.d/login, adding the line:
session required /lib/security/pam_limits.so
and then, as that user, raise the shell limit, for example:
ulimit -n 65535
(Note that nofile cannot be set to "unlimited" on Linux; the hard limit is capped by the fs.nr_open sysctl.)
Note that you may need to log out and back in again before the changes take effect.
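For reference, a minimal sketch of the corresponding limits.conf entries and a quick way to verify them without a full logout (the user name appuser is a placeholder, and the check assumes the su PAM stack applies pam_limits; see /etc/pam.d/su):

# /etc/security/limits.conf (or a file under /etc/security/limits.d/)
appuser  soft  nofile  65535
appuser  hard  nofile  65535

su - appuser -c 'ulimit -Hn; ulimit -Sn'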

Related

ionice 'idle' not having the expected effects

We're working with a reasonably busy web server. We wanted to use rsync to do some data-moving which was clearly going to hammer the magnetic disk, so we used ionice to put the rsync process in the idle class. The queues for both disks on the system (SSD+HDD) are set to use the CFQ scheduler.
The result... was that the disk was absolutely hammered and the website performance was appalling.
I've done some digging to see if any tuning might help with this.
The man page for ionice says:
Idle: A program running with idle I/O priority will only get disk time
when no other program has asked for disk I/O for a defined grace period.
The impact of an idle I/O process on normal system activity should be zero.
This "defined grace period" is not clearly explained anywhere I can find with the help of Google. One posting suggests that it's the value of fifo_expire_async, but I can't find any real support for this.
However, on our system, both fifo_expire_async and fifo_expire_sync are set sufficiently long (250ms, 125ms, which are the defaults) that the idle class should actually get NO disk bandwidth at all. Even if the person who believes that the grace period is set by fifo_expire_async is plain wrong, there's not a lot of wiggle-room in the statement "The impact of an idle I/O process on normal system activity should be zero".
Clearly this is not what's happening on our machine so I am wondering if CFQ+idle is simply broken.
Has anyone managed to get it to work? Tips greatly appreciated!
Update:
I've done some more testing today. I wrote a small Python app to read random sectors from all over the disk with short sleeps in between. I ran a copy of this without ionice and set it up to perform around 30 reads per second. I then ran a second copy of the app with various ionice classes to see if the idle class did what it said on the box. I saw no difference at all between the results when I used classes 1, 2, 3 (real-time, best-effort, idle). This, despite the fact that I'm now absolutely certain that the disk was busy.
Thus, I'm now certain that - at least for our setup - CFQ+idle does not work. [see Update 2 below - it's not so much "does not work" as "does not work as expected"...]
Comments still very welcome!
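For anyone wanting to reproduce this, here is a rough shell equivalent of the kind of load generator described above: it reads random 4 KiB blocks directly from the device with a short sleep between reads. The device name and the rate are placeholders, and it needs root:

DEV=/dev/sdb                                # placeholder: the HDD under test
BLOCKS=$(( $(blockdev --getsz $DEV) / 8 ))  # device size in 4 KiB blocks
while true; do
    dd if=$DEV of=/dev/null bs=4k count=1 iflag=direct \
        skip=$(( RANDOM * RANDOM % BLOCKS )) 2>/dev/null
    sleep 0.03                              # roughly 30 reads per second
done

Run one copy plain and a second copy under ionice -c3 (or -c1/-c2) and compare how quickly each makes progress.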
Update 2:
More poking about today. Discovered that when I push the I/O rate up dramatically, the idle-class processes DO in fact start to become starved. In my testing, this happened at I/O rates hugely higher than I had expected - basically hundreds of I/Os per second. I'm still trying to work out what the tuning parameters do...
I also discovered the rather important fact that async disk writes aren't included at all in the I/O prioritisation system! The ionice manpage I quoted above makes no reference to that fact, but the manpage for the syscall ioprio_set() helpfully states:
I/O priorities are supported for reads and for synchronous (O_DIRECT,
O_SYNC) writes. I/O priorities are not supported for asynchronous
writes because they are issued outside the context of the program
dirtying the memory, and thus program-specific priorities do not
apply.
This pretty significantly changes the way I was approaching the performance issues and I will be proposing an update for the ionice manpage.
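A quick way to see this behaviour for yourself is to compare buffered and direct writes under the idle class. This is only a sketch; the target path and the size are placeholders:

# buffered writes land in the page cache and are flushed later by kernel
# writeback threads, so the idle class has no effect on them
ionice -c3 dd if=/dev/zero of=/mnt/hdd/testfile bs=1M count=1024

# direct writes are issued in the context of the dd process itself,
# so the idle class is honoured
ionice -c3 dd if=/dev/zero of=/mnt/hdd/testfile bs=1M count=1024 oflag=direct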
Some more info on kernel and iosched settings (sdb is the HDD):
Linux 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux
/etc/debian_version = 9.3
(cd /sys/block/sdb/queue/iosched; grep . *)
back_seek_max:16384
back_seek_penalty:2
fifo_expire_async:250
fifo_expire_sync:125
group_idle:8
group_idle_us:8000
low_latency:1
quantum:8
slice_async:40
slice_async_rq:2
slice_async_us:40000
slice_idle:8
slice_idle_us:8000
slice_sync:100
slice_sync_us:100000
target_latency:300
target_latency_us:300000
AFAIK, the only way to solve your problem is to use cgroup v2 (kernel 4.5 or newer). Please see the following article:
https://andrestc.com/post/cgroups-io/
Also, please note that you can use systemd's wrappers to configure cgroup limits on a per-service basis:
http://0pointer.de/blog/projects/resources.html
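As a rough sketch of what the cgroup v2 approach from the first article looks like (the device numbers, the 10 MB/s figure and the group name are placeholders, and it assumes the unified cgroup v2 hierarchy is mounted at /sys/fs/cgroup):

echo +io > /sys/fs/cgroup/cgroup.subtree_control    # enable the io controller for child groups
mkdir /sys/fs/cgroup/rsync-slow
echo "8:16 rbps=10485760 wbps=10485760" > /sys/fs/cgroup/rsync-slow/io.max   # 8:16 = sdb
echo "$RSYNC_PID" > /sys/fs/cgroup/rsync-slow/cgroup.procs

With systemd (second link) the equivalent per-invocation form would be something like:

systemd-run --scope -p "IOReadBandwidthMax=/dev/sdb 10M" \
            -p "IOWriteBandwidthMax=/dev/sdb 10M" rsync -a /src/ /dst/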
Add nocache to that and you're set (you can combine it with ionice and nice):
https://github.com/Feh/nocache
On Ubuntu install with:
apt install nocache
It simply bypasses the cache for its I/O, and thanks to that, other processes won't starve when the cache is flushed.
It's like calling the commands with O_DIRECT, so you can now limit the I/O, for example with:
systemd-run --scope -q --nice=19 -p BlockIOAccounting=true -p BlockIOWeight=10 -p "BlockIOWriteBandwidth=/dev/sda 10M" nocache youroperation_here
I usually use it with:
nice -n 19 ionice -c 3 nocache youroperation_here

Limiting the memory usage of a program in Linux

I'm new to Linux and the terminal (or whatever kind of command prompt it uses), and I want to control the amount of RAM a process can use. I have already looked for hours to find an easy-to-use guide. I have a few requirements for limiting it:
Multiple instances of the program will be running, but I only want to limit some of the instances.
I do not want the process to crash once it exceeds the limit. I want it to use HDD page swap.
The program will run under WINE, and is a .exe.
So can somebody please help with the command to limit the RAM usage on a process in Linux?
The fact that you’re using Wine makes no difference in this particular context, which leaves requirements 1 and 2. Requirement 2 –
I do not want the process to crash once it exceeds the limit. I want it to use HDD page swap.
– is known as limiting the resident set size or rss of the process, and it’s actually rather nontrivial to do on Linux, as is demonstrated by a question asked in 2010. You’ll need to set up Linux control groups (cgroups). Fortunately, Justin L.’s answer gives a brief rundown on how to do so. Note that
instead of jlebar, you should use your own Unix user name, and
instead of your/program, you should use wine /path/to/Windows/program.exe.
Using cgroups will also satisfy your other requirements – you can start as many instances of the program as you wish, but only those which you start with cgexec -g memory:limited will be limited.
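A minimal sketch of that setup, assuming the cgroup-tools (libcgroup) package and a cgroup v1 memory controller; the group name "limited", the 2G figure and the program path are placeholders:

cgcreate -g memory:limited
cgset -r memory.limit_in_bytes=2G limited    # cap RAM only; swapping out is still allowed
cgexec -g memory:limited wine /path/to/Windows/program.exe

On a systemd machine you can get a similar effect without cgroup-tools, e.g. systemd-run --scope -p MemoryMax=2G wine /path/to/Windows/program.exe (MemoryLimit= on older, cgroup-v1-only systems).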

What's the difference between the output of the "ulimit" command and the content of the file "/etc/security/limits.conf"?

I am totally confused about how the limits on open file descriptors are obtained in Linux.
Which of these values is the correct one?
ulimit -n ======> 65535
but
vim /etc/security/limits.conf
*  soft  nofile  50000
*  hard  nofile  90000
The limits in /etc/security/limits.conf are applied by the pam_limits module at login, if that module is part of the PAM configuration. Then the shell gets invoked, which can apply its own limits to the shell.
If you're asking which one is in effect, then it's the result of the ulimit call. If ulimit is not invoked with the -H option, it displays the soft limit.
The idea behind the limits.conf settings is to have a global place to apply limits for, for example, remote logins.
Limits for things like file descriptors can be set at the user level, or on a system wide level. /etc/security/limits.conf is where you can set user level limits, which might be different limits for each user, or just defaults that apply to all users. The example you show has a soft (~warning) level limit of 50000, but a hard (absolute maximum) limit of 90000.
However, a system limit of 65535 might be in place, which would take precedence over the user limit. I think system limits are set in /etc/sysctl.conf, if my memory serves correctly. You might check there to see if you're being limited by the system.
Also, the ulimit command can take switches to specifically show the soft (-Sn) and hard (-Hn) limits for file descriptors.
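For example, from a shell on that system:

ulimit -Sn        # current soft limit, e.g. 50000
ulimit -Hn        # current hard limit, e.g. 90000
ulimit -n 90000   # an unprivileged shell may raise its soft limit up to, but not beyond, the hard limit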
I think this conf is used by all the apps on the system. If you want to change just one particular app, you can try setrlimit() or getrlimit(). The man pages explain everything.
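If you only want to adjust a single running process from the command line rather than calling setrlimit() from code, the prlimit tool from util-linux wraps the same syscall (the PID is a placeholder):

prlimit --pid 12345 --nofile              # show the soft and hard nofile limits of PID 12345
prlimit --pid 12345 --nofile=8192:90000   # set soft=8192, hard=90000 (raising the hard limit needs root)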

VFS: file-max limit 1231582 reached

I'm running a Linux 2.6.36 kernel, and I'm seeing some random errors. Things like
ls: error while loading shared libraries: libpthread.so.0: cannot open shared object file: Error 23
Yes, my system can't consistently run an 'ls' command. :(
I note several errors in my dmesg output:
# dmesg | tail
[2808967.543203] EXT4-fs (sda3): re-mounted. Opts: (null)
[2837776.220605] xv[14450] general protection ip:7f20c20c6ac6 sp:7fff3641b368 error:0 in libpng14.so.14.4.0[7f20c20a9000+29000]
[4931344.685302] EXT4-fs (md16): re-mounted. Opts: (null)
[4982666.631444] VFS: file-max limit 1231582 reached
[4982666.764240] VFS: file-max limit 1231582 reached
[4982767.360574] VFS: file-max limit 1231582 reached
[4982901.904628] VFS: file-max limit 1231582 reached
[4982964.930556] VFS: file-max limit 1231582 reached
[4982966.352170] VFS: file-max limit 1231582 reached
[4982966.649195] top[31095]: segfault at 14 ip 00007fd6ace42700 sp 00007fff20746530 error 6 in libproc-3.2.8.so[7fd6ace3b000+e000]
Obviously, the file-max errors look suspicious, being clustered together and recent.
# cat /proc/sys/fs/file-max
1231582
# cat /proc/sys/fs/file-nr
1231712 0 1231582
That also looks a bit odd to me, but the thing is, there's no way I have 1.2 million files open on this system. I'm the only one using it, and it's not visible to anyone outside the local network.
# lsof | wc
16046 148253 1882901
# ps -ef | wc
574 6104 44260
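One cross-check on the lsof number is a per-process descriptor count straight out of /proc. Note that this, like lsof, only sees ordinary per-process file descriptors, not handles held internally by the kernel:

for p in /proc/[0-9]*; do
    n=$(ls "$p/fd" 2>/dev/null | wc -l)
    [ "$n" -gt 0 ] && echo "$n $(cat $p/comm 2>/dev/null) (pid ${p#/proc/})"
done | sort -rn | head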
I saw some documentation saying:
file-max & file-nr:
The kernel allocates file handles dynamically, but as yet it doesn't free them again.
The value in file-max denotes the maximum number of file- handles that the Linux kernel will allocate. When you get lots of error messages about running out of file handles, you might want to increase this limit.
Historically, the three values in file-nr denoted the number of allocated file handles, the number of allocated but unused file handles, and the maximum number of file handles. Linux 2.6 always reports 0 as the number of free file handles -- this is not an error, it just means that the number of allocated file handles exactly matches the number of used file handles.
Attempts to allocate more file descriptors than file-max are reported with printk, look for "VFS: file-max limit reached".
My first reading of this is that the kernel basically has a built-in file descriptor leak, but I find that very hard to believe. It would imply that any system in active use needs to be rebooted every so often to free up the file descriptors. As I said, I can't believe this would be true, since it's normal to me to have Linux systems stay up for months (even years) at a time. On the other hand, I also can't believe that my nearly-idle system is holding over a million files open.
Does anyone have any ideas, either for fixes or further diagnosis? I could, of course, just reboot the system, but I don't want this to be a recurring problem every few weeks. As a stopgap measure, I've quit Firefox, which was accounting for almost 2000 lines of lsof output (!) even though I only had one window open, and now I can run 'ls' again, but I doubt that will fix the problem for long. (edit: Oops, spoke too soon. By the time I finished typing out this question, the symptom was/is back)
Thanks in advance for any help.
I hate to leave a question open, so here's a summary for anyone who finds this.
I ended up reposting the question on Server Fault instead (this article).
They weren't able to come up with anything, actually, but I did some more investigation and ultimately found that it's a genuine bug with NFSv4, specifically the server-side locking code. I had an NFS client which was running a monitoring script every 5 seconds, using rrdtool to log some data to an NFS-mounted file. Every time it ran, it locked the file for writing, and the server allocated (but erroneously never released) an open file descriptor. That script (plus another that ran less frequently) resulted in about 900 open files consumed per hour, and two months later, it hit the limit.
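A simple way to confirm a slow leak like this, and to estimate its rate, is to log the allocated-handle count from file-nr over time, for example:

# log a timestamped sample once an hour; a first field that grows steadily
# while lsof stays flat points at handles held inside the kernel
while sleep 3600; do
    echo "$(date '+%F %T') $(cat /proc/sys/fs/file-nr)"
done >> /var/log/file-nr.log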
Several solutions are possible:
1) Use NFSv3 instead.
2) Stop running the monitoring script.
3) Store the monitoring results locally instead of on NFS.
4) Wait for the patch to NFSv4 that fixes this (Bruce Fields actually sent me a patch to try, but I haven't had time)
I'm sure you can think of other possible solutions.
Thanks for trying.

Limit the memory and CPU available for a user in Linux

I am a little concerned about the amount of resources that I can use on a shared machine. Is there any way to test whether the administrator has placed a limit on the amount of resources that I can use? And if so, to make the question more complete, how can I set up such a limit?
For process-related limits, you can have a look at /etc/security/limits.conf (read the comments in the file, use Google, or use man limits.conf for more information). And as jpalecek points out, you may use ulimit -a to see (and possibly modify) all such limits currently in effect.
You can use the command quota to see if a disk quota is in effect.
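Purely as an illustration (the user name and the numbers are invented), per-user entries in /etc/security/limits.conf look like this and are applied by pam_limits at the next login:

# format: <domain> <type> <item> <value>
alice   soft   as      2097152   # address space, in KB (about 2 GB)
alice   hard   as      4194304
alice   hard   cpu     60        # CPU time, in minutes
alice   hard   nproc   200       # maximum number of processes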
You can try running
ulimit -a
to see what resource limits are in effect. Also, if you are allowed to change such limits, you can change them by the ulimit command, eg.
ulimit -c unlimited
lifts any limit on the size of the core file a process can create.
At the C level, the relevant functions (actually syscalls(2)...) would be setrlimit(2), setpriority(2) and sched_setattr(2). You would probably want to invoke them from your shell.
See also proc(5) -and try cat /proc/self/limits and sched(7).
You may want to use the renice(1) command.
If you run a long-lasting program (for several hours) not requiring user interaction, you could consider using some batch processing. Some Linux systems have a batch or at command.
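For example (the PID and the script name are placeholders):

renice -n 10 -p 12345                  # lower the priority of an already-running process
echo "./long_simulation.sh" | batch    # batch runs the job when the load average drops low enough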
