Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I tried installing Intel MPI Benchmark on my computer and I got this error:
fork: retry: Resource temporarily unavailable
Then I received this error again when I ran ls and top command.
What is causing this error?
Configuration of my machine:
Dell precision T7500
Scientific Linux release 6.2 (Carbon)
This is commonly caused by running out of file descriptors.
There is the systems total file descriptor limit, what do you get from the command:
sysctl fs.file-nr
This returns counts of file descriptors:
<in_use> <unused_but_allocated> <maximum>
To find out what a users file descriptor limit is run the commands:
sudo su - <username>
ulimit -Hn
To find out how many file descriptors are in use by a user run the command:
sudo lsof -u <username> 2>/dev/null | wc -l
So now if you are having a system file descriptor limit issue you will need to edit your /etc/sysctl.conf file and add, or modify it it already exists, a line with fs.file-max and set it to a value large enough to deal with the number of file descriptors you need and reboot.
fs.file-max = 204708
Another possibility is too many threads. We just ran into this error message when running a test harness against an app that uses a thread pool. We used
watch -n 5 -d "ps -eL <java_pid> | wc -l"
to watch the ongoing count of Linux native threads running within the given Java process ID. After this hit about 1,000 (for us--YMMV), we started getting the error message you mention.
Related
I have a Java program after 2 weeks of running in average will become stuck and produce the following error:
Caused by: java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
at sun.nio.ch.Net.socket(Net.java:415)
at sun.nio.ch.Net.socket(Net.java:408)
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
That hints to me that many sockets are opened but never closed.
Before diving into programmatic instrumentation i started to inspect what information i could draw from linux itself. I am using Redhat.
And then, a few questions came up as follows:
Why the following commands do not give the same output?
See
[ec2-user#ip-172-22-28-102 ~]$ sudo ls /proc/32085/fd | wc -l
592
[ec2-user#ip-172-22-28-102 ~]$ sudo lsof -a -p 32085 | wc -l
655
Is there a way to know from the proc stat info which thread created which file descriptor?
It seems like there is not because if i do the following, i am getting the same information:
[ec2-user#ip-172-22-28-102 ~]$ sudo ls /proc/32085/task/22386/fd | wc -l
592
[ec2-user#ip-172-22-28-102 ~]$ sudo ls /proc/32085/fd | wc -l
592
Same if i go to the thread directly from under /proc/ .
Thx
Is there a way to know from the proc stat info which thread created which file descriptor?
I am pretty sure the answer here is "no". File descriptors are opened by processes, not threads (and will be visible to all threads spawned by the same process).
Why the following commands do not give the same output?
First, the -a argument to lsof appears to be a no-op in this case. Specfically, the man says that it "causes list selection options to be ANDed, as described above". So you are really just running:
sudo lsof -p 32085
And that will print things other than open file descriptors (such as memory-mapped files, current working directory, etc), while /proc/<PID>/fd contains only open file descriptors. So you're getting different results because you're asking for different information.
The only reason you can receive that message is that you have opened files and you didn't close them after use. You have a file descriptor leak in your java application. Java programmers normally don't check memory as the garbage collector copes with unreferenced objects. If you save file descriptors without closing in some data structure or you don't close the files after using, you can reach the maximum limit allowed to a process (this is controlled per process and can be changed by the ulimit shell command)
But if your problem is a file descriptor leak, pushing up the ulimit will only delay the problem some time. File descriptors must be closed, or you'll run into trouble.
I've just ran across this difference today, the explanation is that lsof takes into account more types of files, like memory-mapped objects, run-time libraries etc
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 3 years ago.
Improve this question
I've seen this question asked before, but none of the solutions have worked for me.
I'm having problems using the supervisor on my rpi b+. Every time I try to run my start my process, I get an error saying:
pi#raspberrypi ~ $ sudo supervisorctl start server
server: ERROR (no such process)
I have my config file set up at /etc/supervisord.conf
[program:server]
directory=/home/pi/ledticker
command=/usr/bin/python NetworkServer.py
autostart=false
autorestart=true
stopsignal=QUIT
[supervisord]
logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
[unix_http_server]
file=/tmp/supervisor.sock ; (the path to the socket file)
I have tried doing the reread, update, reload commands but they haven't worked. Any ideas?
You should try to reload supervisord :
# supervisorctl reload
[y/N] ? y
In many cases, this error is resolved by that reload.
On my Fedora22, I modified below lines in /etc/supervisord.conf:
[include]
files = supervisord.d/*.ini
to
[include]
files = supervisord.d/*.conf
and then reload
i had faced same problem before. It was resolve by following solutions.
First edit your supervisord.conf file and add below lines :
[unix_http_server]
file=/tmp/supervisor.sock
chmod=0777
start SupervisorD service first using following command :
$ sudo /usr/bin/supervisord -c /etc/supervisord.conf
You can verify using : ps -ef | grep python
After supervisord starts, Try to start your program using following command :
$ sudo /usr/bin/supervisorctl -c /etc/supervisord.conf start all
In case of process multi-instances configuration full process name might look like server:server_0 (depends on your process_name template). Try:
sudo supervisorctl restart server:*
Otherwise you'll get same (no such process) error.
In some versions of supervisor the [include] section not work, you need to add the programs in the main supervisor configuration file in /etc/supervisord.conf
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I'm using a simple shell script on my Linux server which checks if an rsync job is running or if any client accesses some directories from the server via Samba. If this is the case then nothing happens, but if are there no jobs and Samba isn't used than the server goes into hibernation.
Is there any simple command which I can use to check if an SSH connection to the server exists? I want to add this to my shell script so that the server doesn't hibernate if such a connection exists.
Scan the process list for sshd: .
Established connections look something like this: sshd: <username>…
ps -A x | grep [s]shd
should work for you.
use who command
it gives output like
username pts/1 2013-06-19 19:51 (ip)
You could parse that to see how many non locals are added and get their usernames (or there are options see man who for more info
gives a count of how many non localhost users there are
who | grep -v localhost | wc -l
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I have two servers running a vendor application. On one server if the app crashes it creates a core dump but the second it does not.
The servers were supposed to be set up the same but I am trying to figure out why the application doesn't create a core dump. I've checked all the typical settings and have been doing research with no luck.
The strange part is that if I run a kill -s SIGSEGV $$ as my app user, it generates a core dump in the same directory the app is supposed to create the core dump. The vendor and Linux group are both unsure at the moment that is why I'm looking here for help.
$ cat /proc/sys/kernel/core_pattern
core
$ cat /proc/sys/kernal/core_uses_pid
1
$ ulimit -c
unlimited
$ cat /etc/security/limits.conf | grep core
* soft core unlimited
* hard core unlimited
$ cat /etc/profile | grep ulimit
ulimit -c unlimited > /dev/null 2>&1
$ cat /proc/sys/fs/suid_dumpable
0
$ cat /etc/sysconfig/init | grep CORE
DAEMON_COREFILE_LIMIT='unlimited'
There could be several other reasons why the coredump is not created. Check the list of possible reasons in core(5): http://linux.die.net/man/5/core
Check dmesg output.
Check the specific process corefile size limit in /proc/PID/limits.
Check if the process user can create a file of typical coredump size in /proc/PID/cwd directory.
Specify absolute file path in /proc/sys/kernel/core_pattern, pointing to a known writable location.
Create a short program adhering to the coredump-accepting protocol, saving it somewhere, and specify it in /proc/sys/kernel/core_pattern, according to core(5). Coredumps piped to programs are not subject to limits.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
When running my application I sometimes get an error about too many files open.
Running ulimit -a reports that the limit is 1024. How do I increase the limit above 1024?
Edit
ulimit -n 2048 results in a permission error.
You could always try doing a ulimit -n 2048. This will only reset the limit for your current shell and the number you specify must not exceed the hard limit
Each operating system has a different hard limit setup in a configuration file. For instance, the hard open file limit on Solaris can be set on boot from /etc/system.
set rlim_fd_max = 166384
set rlim_fd_cur = 8192
On OS X, this same data must be set in /etc/sysctl.conf.
kern.maxfilesperproc=166384
kern.maxfiles=8192
Under Linux, these settings are often in /etc/security/limits.conf.
There are two kinds of limits:
soft limits are simply the currently enforced limits
hard limits mark the maximum value which cannot be exceeded by setting a soft limit
Soft limits could be set by any user while hard limits are changeable only by root.
Limits are a property of a process. They are inherited when a child process is created so system-wide limits should be set during the system initialization in init scripts and user limits should be set during user login for example by using pam_limits.
There are often defaults set when the machine boots. So, even though you may reset your ulimit in an individual shell, you may find that it resets back to the previous value on reboot. You may want to grep your boot scripts for the existence ulimit commands if you want to change the default.
If you are using Linux and you got the permission error, you will need to raise the allowed limit in the /etc/limits.conf or /etc/security/limits.conf file (where the file is located depends on your specific Linux distribution).
For example to allow anyone on the machine to raise their number of open files up to 10000 add the line to the limits.conf file.
* hard nofile 10000
Then logout and relogin to your system and you should be able to do:
ulimit -n 10000
without a permission error.
1) Add the following line to /etc/security/limits.conf
webuser hard nofile 64000
then login as webuser
su - webuser
2) Edit following two files for webuser
append .bashrc and .bash_profile file by running
echo "ulimit -n 64000" >> .bashrc ; echo "ulimit -n 64000" >> .bash_profile
3) Log out, then log back in and verify that the changes have been made correctly:
$ ulimit -a | grep open
open files (-n) 64000
Thats it and them boom, boom boom.
If some of your services are balking into ulimits, it's sometimes easier to put appropriate commands into service's init-script. For example, when Apache is reporting
[alert] (11)Resource temporarily unavailable: apr_thread_create: unable to create worker thread
Try to put ulimit -s unlimited into /etc/init.d/httpd. This does not require a server reboot.