Increasing the number of file descriptors in Linux

I have a long running process which monitors the system and prints periodic logs. If I let it run for longer than 10-15 minutes, it exits with a message saying:
Too many open files.
The program is set up using a real-time timer, created with timer_create() and armed with timer_settime(), which raises SIGUSR1 every 2 seconds. In the handler there is one fork()/exec() in the child and a wait() in the parent, followed by mmap() and stream operations on /proc/acpi/battery/state and the /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq and scaling_setspeed files.
I have taken care to close the stream FILE * pointers in the periodic signal handler and at all other places. I have also ensured munmap() of all the mapped files.
How can I get around this?
Should I increase the maximum file descriptors allowed or should I increase the maximum open files shown by ulimit -aS?
Why is this happening if I am closing all the FILE * using fclose()?
Here are the values for my system as of now:
#cat /proc/sys/fs/file-max
152808
#ulimit -aS
...
open files (-n) 1024

Use lsof or a debugger to find what files your process has open. Increasing the limit will just delay the point at which you run out of descriptors.
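For instance, assuming you know the PID of the monitoring process (<pid> is a placeholder), you can see what it has open and watch the descriptor count grow:
lsof -p <pid>                              # list everything the process currently has open
watch -n 5 'ls /proc/<pid>/fd | wc -l'     # or just count the entries every 5 seconds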

Related

Is it possible to monitor all write access to the filesystem of all process under linux

Is it possible to monitor all write access to the filesystem of all process under linux?
I have several different mounted filesystems. A lot of them are tmpfs.
I'm interested in all writes to the root filesystem, except to tmpfs, devtmpfs, etc.
I'm looking for something that will output: <PID xy> writes n bytes to /target/filepath.
What monitoring tool can list all these write syscalls? Can they be filtered by mount point?
iotop (kernel version 2.6.20 or higher) or dstat could help you, e.g. iotop -o -b -d 10, as discussed in this similar thread.
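For example, a rough dstat invocation (assuming a dstat build that ships the top-io/top-bio plugins) that reports the most I/O-expensive processes every 10 seconds:
dstat --top-io --top-bio 10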
/proc/diskstats has data for all the block devices.
https://www.kernel.org/doc/Documentation/iostats.txt
The /proc/diskstats file displays the I/O statistics of block devices. Each line contains the following 14 fields:
1 - major number
2 - minor number
3 - device name
4 - reads completed successfully
5 - reads merged
6 - sectors read
7 - time spent reading (ms)
8 - writes completed
9 - writes merged
10 - sectors written
11 - time spent writing (ms)
12 - I/Os currently in progress
13 - time spent doing I/Os (ms)
14 - weighted time spent doing I/Os (ms)
For more details refer to Documentation/iostats.txt
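As a rough shell sketch, using the field numbers listed above, you can pull the write counters per device out of it (device names and counts will vary on your system):
awk '{ printf "%s: %s writes completed, %s sectors written\n", $3, $8, $10 }' /proc/diskstats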
You can write a SystemTap script to monitor filesystem operations. You might also visit Brendan D. Gregg's blog, where there are many monitoring tools.
fatrace (File Activity Trace)
fatrace reports file access events (Open, Read, Write, Close) from all running processes. Its main purpose is to find processes which keep waking up the disk unnecessarily and thus prevent some power saving.
When running it outputs one line per event in this format:
<timestamp> <processName(id)>: <accessType> </path/to/file>
For example:
23:10:21.375341 Plex Media Serv(2290): W /srv/dev-disk-by-uuid-UID/Plex/Library/Application Support/Plex Media Server/Logs/Plex Media Server.log
From this you easily get all the necessary information:
Timestamp from the --timestamp option
Process name (who is accessing)
File operation (O-pen R-read W-rite C-lose)
Filepath (where it is writing to).
You can limit the search scope with --current-mount to only record events on partition/mount of current directory.
So simply cd into the volume which corresponds to your spinning HDD first, and there run fatrace with the --current-mount option.
Without this option, all (real) partitions/mount points are being watched.
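For example, to watch only write events on the mount point of the current directory (using the options described above; the path is a placeholder):
cd /path/to/your/hdd/mountpoint
sudo fatrace --current-mount --timestamp | grep ': W '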
Very practical
With it I easily found out that the reason my NAS disk was spinning 24/7, even when nobody accessed the NAS and no maintenance tasks were about to run, was unnecessary logging by the Plex Media Server.

Explain Node's st module fd and stat configuration

Node's st module documentation mentions fd and stat configuration:
cache: { // specify cache:false to turn off caching entirely
fd: {
max: 1000, // number of fd's to hang on to
maxAge: 1000*60*60, // amount of ms before fd's expire
},
stat: {
max: 5000, // number of stat objects to hang on to
maxAge: 1000 * 60, // number of ms that stats are good for
},
...
}
But what are these, and how do they impact st's delivery of static files? Can you give examples?
Those are configurations for st's cache module, which is lru-cache.
fd
This stands for file descriptor. Every time the st module wants to serve a file and needs to read its content, it needs to open (or already have) a file descriptor. Caching file descriptors removes the time taken to open a file.
If the file is moved or deleted, reading with the file descriptor will still result in the old content.
Each system has a maximum number of open file descriptors, per process and globally, and once you run out you can't open any more files. So make sure you set the cache.fd.max option to less than the per-process limit.
stat
It represents the results of calls to fs.stat and friends, which are needed for setting the ETag header or responding with a 304.
The max option is the maximum number of items (or total size) to hold, and maxAge is the maximum amount of time an item can remain in memory.
Obviously, for all the cache types (fd, stat, content, ...), the higher the numbers (max and maxAge) are, the faster some requests are served, but the more memory is consumed.
Setting fd.max to an optimal value might be tricky, since each connection being served also needs a file descriptor. You want to leave room for the connections you intend to handle, because if you hit the limit your server won't accept any more connections. Set it according to the number of concurrent connections your server is expected to handle and the maximum number of open files per process on your system. Here's how you would check/change the maximum in Linux: http://cherry.world.edoors.com/CPZKoGkpxfbQ
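A minimal sketch of checking and raising those limits on Linux (the numbers are just examples):
ulimit -n                     # soft limit of open files for this shell and its children
ulimit -Hn                    # hard limit
ulimit -n 4096                # raise the soft limit (up to the hard limit) for this shell
cat /proc/sys/fs/file-max     # system-wide maximum; permanent per-user limits usually go in /etc/security/limits.conf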
As for stat.max, I suggest setting it according to available memory. Test/measure in your production system how much memory one stat object uses, so you can decide.
Setting maxAge depends on how frequently your files are updated.

SIPP: open file limit > FD_SETSIZE

I am trying to start SIPp 3.3 on openSUSE 11 from a bash console launched from Java.
When I start SIPp with
proc = Runtime.getRuntime().exec("/bin/bash", null, wd);
...
printWriter.println("./sipp -i "+Config.IP+" -sf uac.xml "+Config.IP+":5060");
the error stream gives the following output
Warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE = 1024
Resolving remote host '137.58.120.17'... Done.
What does the warning mean? And is it possible that the bash terminal freezes because of this warning?
How can I remove this warning?
I'm the maintainer of SIPp and I've been looking into the FD_SETSIZE issues recently.
As is mentioned at Increasing limit of FD_SETSIZE and select, FD_SETSIZE is the maximum file descriptor that can be passed to the select() call, as it uses a bit-field internally to keep track of file descriptors. SIPp had code in it to check its own maximum open file limit (i.e. the one shown by ulimit -n), and if it was larger than FD_SETSIZE, to reduce it to FD_SETSIZE in order to avoid issues with select().
However, this has actually been unnecessary for a while - SIPp has used poll() rather than select() (which doesn't have the FD_SETSIZE limit, and has been POSIX-standardised and portable since 2001) since before I became maintainer in 2012. SIPp now also uses epoll where available for even better performance, as of the v3.4 release.
I've now removed this FD_SETSIZE check in the development code at https://github.com/SIPp/sipp, and replaced it with a more sensible check - making sure that the maximum number of open sockets (plus the maximum number of open calls, each of which may open its own media socket) is below the maximum number of file descriptors.
This warning is supposedly related to multi-socket transport options in SIPp, e.g. -t un or -t tn (though I have observed it generate this warning even when not specifying one of these).
SIPp includes an option that controls this warning message:
-skip_rlimit : Do not perform rlimit tuning of file descriptor limits. Default: false.
Though it has the desired effect for me of suppressing the warning output, on its own it seems like a slightly dangerous option. I'm not certain what will happen if you include this option and SIPp attempts to open more sockets than FD_SETSIZE allows, but you may avoid possible problems on that front by also including the -max_socket argument:
-max_socket : Set the max number of sockets to open simultaneously. This option is
significant if you use one socket per call. Once this limit is reached,
traffic is distributed over the sockets already opened. Default value is
50000
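For example, a possible invocation combining both options (IP addresses, scenario file and the socket limit are placeholders; adjust to your setup):
./sipp -i "$LOCAL_IP" -sf uac.xml "$REMOTE_IP":5060 -skip_rlimit -max_socket 1024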
It means pretty much what it says... Your per-process open file limit (ulimit -n) is greater than the pre-defined constant FD_SETSIZE, which is 1024. So the program is adjusting your open file limit down to match FD_SETSIZE.
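So, assuming you don't actually need more than 1024 descriptors, another way to avoid the warning is simply to lower the shell's limit to FD_SETSIZE before launching SIPp:
ulimit -n 1024                # make the per-process open file limit match FD_SETSIZE
./sipp -i "$LOCAL_IP" -sf uac.xml "$REMOTE_IP":5060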

When executing "tar" on a directory with over a billion files, the process stayed in D status

I was doing some experiments to learn more about Linux process states.
So, there's a directory (named big_dir) with over a billion files in it (the directory has many sub-directories, recursively). I ran tar -cv big_dir | ssh anotherServer "tar -xv -C big_dir" and found, by running top, that the tar process stays in D status. Meanwhile, the tar command keeps outputting the paths of the files.
I know that the process was in D status because it was doing disk I/O, but why didn't its status keep switching between D and R? Printing the file names under the directory must have used some CPU computation, mustn't it? Otherwise how could the tar command know that it should print something?
If I run dd if=/dev/zero of=/dev/null, the dd process stays in R status in the top output. But why isn't it in D status? Isn't it doing I/O all the time?
/dev/zero and /dev/null are pseudo-devices. So there's no physical device behind them.
If I do
dd if=/dev/zero of=/tmp/zeroes
then top does show me dd in the D status. However, it does spend a lot of its time in R (in CPU time). top simply samples the process table, so you may need to watch it for some time in order to see transient states.
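If you want to catch those transient states yourself, you can sample the state field of /proc/<pid>/stat faster than top refreshes; a rough sketch (it assumes the command name contains no spaces):
pid=$(pgrep -xo dd)           # or the PID of tar, etc.
for i in $(seq 1 100); do
    awk '{print $3}' "/proc/$pid/stat"   # the third field is the state: R, S, D, ...
    sleep 0.1
done | sort | uniq -c         # rough distribution of the states seen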
I suspect that for your tar example above, the amount of time spent outputting to stdout is negligible compared to the disk time. Note also that outputting to stdout involves the windowing system doing the writing, and while that is happening the process will be sleeping. E.g. I'm running yes right now, and the majority of the work is being performed by my X server; the yes process is sleeping for most of the time I'm watching it (via top).
I'm sure your tar process SOMETIMES goes to R, but it's probably for a very short period of time, because it doesn't do that much - particularly since you are sending the data through a network. Unless that's a 10Gb/s network card [and everything else to "anotherServer" is really working at 1GB/s], this will be the slowest part of the chain. ssh itself will take a little bit of overhead as it encrypts the data.
It probably takes tar a few microseconds to ask for some data from the disk, and a few milliseconds for the disk to move its head and read the actual data. So you have about 0.1% of the time in "R", the rest is in "D".

Linux: uploading unfinished files - with file size check (scp/rsync) [closed]

I typically end up in the following situation: I have, say, a 650 MB MPEG-2 .avi video file from a camera. Then, I use ffmpeg2theora to convert it into Theora .ogv video file, say some 150 MB in size. Finally, I want to upload this .ogv file to an ssh server.
Let's say, the ffmpeg2theora encoding process takes some 15 minutes on my PC. On the other hand, the upload goes on with a speed of about 60 KB/s, which takes some 45 minutes (for the 150MB .ogv). So: if I first encode, and wait for the encoding process to finish - and then upload, it would take approximately
15 min + 45 min = 1 hr
to complete the operation.
So, I thought it would be better if I could somehow start the upload, in parallel with the encoding operation; then, in principle - as the uploading process is slower (in terms of transferred bytes/sec) than the encoding one (in terms of generated bytes/sec) - the uploading process would always "trail behind" the encoding one, and so the whole operation (enc+upl) would complete in just 45 minutes (that is, just the time of the upload process +/- some minutes depending on actual upload speed situation on wire).
My first idea was to pipe the output of ffmpeg2theora to tee (so as to keep a local copy of the .ogv) and then, pipe the output further to ssh - as in:
./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 -o /dev/stdout MVI.AVI | tee MVI.ogv | ssh user@ssh.server.com "cat > ~/myvids/MVI.ogv"
While this command does, indeed, function - one can easily see in the running log in the terminal from ffmpeg2theora, that in this case, ffmpeg2theora calculates a predicted time of completion to be 1 hour; that is, there seems to be no benefit in terms of smaller completion time for both enc+upl. (While it is possible that this is due to network congestion, and me getting less of a network speed at the time - it seems to me, that ffmpeg2theora has to wait for an acknowledgment for each little chunk of data it sends through the pipe, and that ACK finally has to come from ssh... Otherwise, ffmpeg2theora would not have been able to provide a completion time estimate. Then again, maybe the estimate is wrong, while the operation would indeed complete in 45 mins - dunno, never had patience to wait and time the process; I just get pissed at 1hr as estimate, and hit Ctrl-C ;) ...)
My second attempt was to run the encoding process in one terminal window, i.e.:
./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 MVI.AVI # MVI.ogv is auto name for output
..., and the uploading process, using scp, in another terminal window (thereby 'forcing' 'parallelization'):
scp MVI.ogv user@ssh.server.com:~/myvids/
The problem here is: let's say, at the time when scp starts, ffmpeg2theora has already encoded 5 MB of the output .ogv file. At this time, scp sees this 5 MB as the entire file size, and starts uploading - and it exits when it encounters the 5 MB mark; while in the meantime, ffmpeg2theora may have produced additional 15 MB, making the .ogv file 20 MB in total size at the time scp has exited (finishing the transfer of the first 5 MB).
Then I learned (joen.dk » Tip: scp Resume) that rsync supports 'resume' of partially completed uploads, as in:
rsync --partial --progress myFile remoteMachine:dirToPutIn/
..., so I tried using rsync instead of scp - but it seems to behave exactly the same as scp in terms of file size, that is: it will only transfer up to the file size read at the beginning of the process, and then it will exit.
So, my question to the community is: Is there a way to parallelize the encoding and uploading process, so as to gain the decrease in total processing time?
I'm guessing there could be several ways, as in:
A command line option (that I haven't seen) that forces scp/rsync to continuously check the file size - if the file is open for writing by another process (then I could simply run the upload in another terminal window)
A bash script; say, running rsync --partial in a while loop that runs as long as the .ogv file is open for writing by another process (see the sketch after this list). I don't actually like this solution, since I can hear the hard disk scanning for the resume point every time I run rsync --partial - which, I guess, cannot be good, given that I know the same file is being written to at the same time.
A different tool (other than scp/rsync) that does support uploading a "currently generated"/"unfinished" file (the assumption being that it can handle only growing files; it would exit if it found that the local file is suddenly smaller than the number of bytes already transferred)
... but it could also be, that I'm overlooking something - and 1hr is as good as it gets (in other words, it is maybe logically impossible to achieve 45 min total time - even if trying to parallelize) :)
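For what it's worth, here is a rough sketch of idea 2 above (file names, destination and the sleep interval are placeholders; the final run picks up whatever was written after the last pass inside the loop):
# keep re-running rsync while the encoder is still writing the file
while pgrep -f ffmpeg2theora > /dev/null; do
    rsync --partial --progress MVI.ogv user@ssh.server.com:~/myvids/
    sleep 30
done
# one final pass to transfer the tail written after the last run
rsync --partial --progress MVI.ogv user@ssh.server.com:~/myvids/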
Well, I look forward to comments that would, hopefully, clarify this for me ;)
Thanks in advance,
Cheers!
Maybe you can try sshfs (http://fuse.sourceforge.net/sshfs.html). Being a file system, it should have some optimizations, though I am not very sure.
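A minimal sketch of that idea, assuming sshfs is installed and ~/myvids exists on the server (note you lose the local copy unless you also write one):
mkdir -p ~/remote_myvids
sshfs user@ssh.server.com:myvids ~/remote_myvids
# let the encoder write its output directly onto the remote mount
./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 -o ~/remote_myvids/MVI.ogv MVI.AVI
fusermount -u ~/remote_myvids      # unmount when done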
