lsof "lies" when using options?

I have a problem where my Java application opens too many files. To debug this issue, I depend on lsof.
However, running lsof this way takes too much time (more than one minute):
lsof |grep "java"
I should be able to use the -p option instead, but then it "lies": it shows far too few lines.
lsof -p <PID of the java process>
Here is my proof:
lsof |grep java | wc -l
1510146
lsof -p 802 | wc -l
4735
The same happens if I use the -u option to limit the listing by user name (the process owner).
My system is:
Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 GNU/Linux
Am I missing something? Is there an alternative to using lsof?

lsof is not lying.
The output of the command:
lsof |grep java | wc -l
may also include files opened by other programs, since grep matches any line that contains "java".
The result you are searching for is the result of the command:
lsof -p <PID> | wc -l
You can increase the limit of open files for the user running your Java application by adding this line to /etc/security/limits.conf:
<USER> hard nofile 65536
You can check that user's limits by typing:
su - <USER>
ulimit -a
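One caveat: limits.conf is applied by pam_limits when a new session starts, so su - <USER> shows what a fresh login gets, not necessarily what the already-running JVM has (services started by systemd, for instance, take their limit from the unit's LimitNOFILE= setting instead). A minimal sketch for reading the limits of the running process itself, assuming Linux /proc; nofile_limits is a hypothetical helper name:

```shell
# Hypothetical helper; Linux-only (reads /proc).
# Print the soft and hard "Max open files" limits of a running process.
# /proc/<pid>/limits contains a line such as
#   Max open files            1024                 4096                 files
# where field 4 is the soft limit and field 5 the hard limit.
nofile_limits() {
    pid=$1
    awk '/^Max open files/ { print "soft=" $4 " hard=" $5 }' "/proc/$pid/limits"
}
```

For the process in the question that would be nofile_limits 802. If it still shows the old values, the application has to be restarted from a session (or unit) that carries the new limit.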

lsof without parameters lists all open files, including files that are not using file descriptors, such as current working directories, memory-mapped library files, and executable text files.
lsof -p <PID> restricts the listing to that one process, and the entries with a numeric FD column are its open file descriptors. A file descriptor is a data structure a program uses to get a handle on a file; the best known are 0, 1 and 2 for standard input, standard output and standard error.
See: https://www.netadmintools.com/art295.html
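To compare like with like, one can count only the rows whose FD is numeric. A sketch, assuming lsof's field output mode (-Ff prints one line starting with "f" per open file, carrying the FD value); lsof_fd_summary is a hypothetical name:

```shell
# Hypothetical helper. Split lsof's rows for one process into real file
# descriptors (numeric FD) and other entries such as cwd, rtd, txt and mem.
# -Ff selects field output: one "p<PID>" line, then one "f<FD>" line per file.
lsof_fd_summary() {
    pid=$1
    lsof -p "$pid" -Ff 2>/dev/null | awk '
        /^f[0-9]/ { fds++; next }   # numeric descriptor: f0, f1, f2, ...
        /^f/      { other++ }       # cwd, rtd, txt, mem, DEL, ...
        END { printf "descriptors=%d other=%d\n", fds, other }'
}
```

The descriptors= figure should roughly match ls /proc/<PID>/fd | wc -l; the two are taken at slightly different moments, so small differences are expected.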

Based on my observation, it seems that
lsof | grep <pid> | wc -l
will give an inflated count, because every thread in the specified process adds its own line: e.g. if your process has 8 threads, the result will be more than 8 times the actual file count.
On the other hand,
lsof -p <PID> | wc -l
produces a more accurate result, because each file is counted (printed) only once.
I have not found an official reference for this yet, though.
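One way to check the thread explanation is to look at how many threads the process has; threads created with pthreads share the process's descriptor table. A sketch reading the Threads: line of /proc/<PID>/status (Linux-only; thread_count is an illustrative name):

```shell
# Illustrative helper; Linux-only (reads /proc).
# Print the number of threads of a process.
# /proc/<pid>/status contains a line such as "Threads:	42".
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}
```

Purely as arithmetic on the numbers in the question: 1510146 / 4735 is about 319, so a JVM with a few hundred threads would go a long way towards explaining the gap, although grep java also matches unrelated lines.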

Related

Get files used by a binary

I am trying to locate a file used by a binary during its execution. Using strace helps, but it is far too convoluted; combined with grep it is good enough. Still, is there a utility that can dump only the files used by a binary?
you can try using:
lsof -p <PID of the running process>
lsof -c ssh shows all files opened by processes whose command name begins with "ssh".
Or try ltrace or maybe fuser
I've seen strace used with some complex grep piping, but it all depends on what exactly the end goal is.
You can also use the -e option of strace to filter (recent glibc versions open files with openat rather than open, so trace both), for example:
sudo strace -t -e trace=open,openat,close,read,getdents,write,connect,accept whoami >/dev/null
and grep from there.
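For a process that is already running, the files it is using at this moment can also be read straight from /proc, without lsof. A sketch (Linux-only; files_in_use is a made-up name) that combines the targets of the open descriptors with the paths mapped into memory, such as the executable and shared libraries. It only sees what is open when it runs; files opened and closed earlier still need strace, e.g. strace -f -e trace=file <command>.

```shell
# Made-up helper name; Linux-only (reads /proc).
# List the distinct file paths a running process is using right now.
files_in_use() {
    pid=$1
    {
        # Targets of open descriptors (sockets and pipes are filtered below).
        for fd in "/proc/$pid/fd"/*; do
            readlink "$fd" 2>/dev/null
        done
        # Field 6 of /proc/<pid>/maps is the mapped file's path, if any.
        awk '$6 ~ /^\// { print $6 }' "/proc/$pid/maps" 2>/dev/null
    } | grep '^/' | sort -u
}
```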

Bash, display processes in specific folder

I need to display the processes that are running in a specific folder.
For example, there are folders "TEST" and "RUN". Three SQL files are running from TEST and two from RUN. When I use the command ps xa, I see all processes run from TEST and RUN together. What I want is to see only the processes run from the TEST folder, so only three. Are there any commands or solutions to do this?
You can use lsof for this.
lsof | grep '/path/of/RUN'
If you want to include both RUN and TEST in the same command:
lsof | grep -E "/path/of/RUN|/path/of/TEST"
Hope it helps.
You can try fuser to see which processes have particular files open; or, on Linux, examine the /proc/12345/cwd symlink for each of the candidate processes (replace 12345 with the process id of each).
fuser TEST/*.sql
for proc in /proc/[1-9]*; do
readlink "$proc/cwd" 2>/dev/null | grep -q TEST && echo "$proc"
done
The latter is not portable to other U*xes, though some may offer similar facilities.
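Matching with grep -q TEST also hits any directory that merely contains TEST somewhere in its path. A slightly stricter sketch (Linux-only; procs_in_dir is a hypothetical name) compares each process's resolved working directory with the folder and its subfolders, and prints the matching PIDs, which can then be passed to ps:

```shell
# Hypothetical helper; Linux-only (reads /proc/<pid>/cwd).
# Print PIDs whose current working directory is DIR or below it.
procs_in_dir() {
    target=$(cd "$1" && pwd -P) || return 1   # resolve symlinks like the kernel does
    for proc in /proc/[0-9]*; do
        cwd=$(readlink "$proc/cwd" 2>/dev/null) || continue  # skip unreadable ones
        case $cwd in
            "$target"|"$target"/*) echo "${proc#/proc/}" ;;
        esac
    done
}
```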

Linux File descriptors

I have a Java program that, after about two weeks of running on average, gets stuck and produces the following error:
Caused by: java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
at sun.nio.ch.Net.socket(Net.java:415)
at sun.nio.ch.Net.socket(Net.java:408)
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
That hints to me that many sockets are opened but never closed.
Before diving into programmatic instrumentation, I started to inspect what information I could draw from Linux itself. I am using Red Hat.
And then, a few questions came up as follows:
Why the following commands do not give the same output?
See
[ec2-user@ip-172-22-28-102 ~]$ sudo ls /proc/32085/fd | wc -l
592
[ec2-user@ip-172-22-28-102 ~]$ sudo lsof -a -p 32085 | wc -l
655
Is there a way to know from the proc stat info which thread created which file descriptor?
It seems there is not, because if I do the following, I get the same information:
[ec2-user@ip-172-22-28-102 ~]$ sudo ls /proc/32085/task/22386/fd | wc -l
592
[ec2-user@ip-172-22-28-102 ~]$ sudo ls /proc/32085/fd | wc -l
592
The same happens if I go to the thread directly under /proc/.
Thx
Is there a way to know from the proc stat info which thread created which file descriptor?
I am pretty sure the answer here is "no". File descriptors are opened by processes, not threads (and will be visible to all threads spawned by the same process).
Why the following commands do not give the same output?
First, the -a argument to lsof appears to be a no-op in this case. Specifically, the man page says that it "causes list selection options to be ANDed, as described above", and with a single selection option (-p) there is nothing to AND. So you are really just running:
sudo lsof -p 32085
And that will print things other than open file descriptors (such as memory-mapped files, current working directory, etc), while /proc/<PID>/fd contains only open file descriptors. So you're getting different results because you're asking for different information.
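Since a "Too many open files" error raised from SocketChannelImpl points at sockets, it can help to see which kind of descriptor is piling up. A sketch (Linux-only; fd_types is a hypothetical name) that resolves each entry of /proc/<PID>/fd and counts them by kind; it needs to run as the process owner or root:

```shell
# Hypothetical helper; Linux-only (reads /proc).
# Entries in /proc/<pid>/fd are symlinks: regular files point to a path,
# others read like socket:[12345], pipe:[12345] or anon_inode:[eventpoll].
fd_types() {
    pid=$1
    for fd in "/proc/$pid/fd"/*; do
        readlink "$fd" 2>/dev/null
    done | awk '
        /^\// { print "file"; next }   # a path on some filesystem
        { sub(/:.*/, ""); print }      # keep the prefix: socket, pipe, ...
    ' | sort | uniq -c | sort -rn
}
```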
The only reason you can receive that message is that you have opened files and didn't close them after use: you have a file descriptor leak in your Java application. Java programmers normally don't check memory, as the garbage collector copes with unreferenced objects. But if you keep file descriptors in some data structure without closing them, or you don't close files after using them, you can reach the maximum number allowed per process (a per-process limit that can be changed with the ulimit shell command).
But if your problem is a file descriptor leak, raising the ulimit will only delay the problem for a while. File descriptors must be closed, or you'll run into trouble.
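To tell a leak apart from a limit that is simply too low, one can sample the descriptor count over time: a leak keeps climbing, while a healthy process levels off. A sketch using the same ls /proc/<PID>/fd | wc -l count as in the question; watch_fds and its arguments are illustrative:

```shell
# Illustrative helper; Linux-only (reads /proc).
# Print "<unix time> <descriptor count>" for PID, SAMPLES times,
# sleeping INTERVAL seconds between samples; returns 1 if the process exits.
watch_fds() {
    pid=$1 interval=$2 samples=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        [ -d "/proc/$pid" ] || return 1
        echo "$(date +%s) $(ls "/proc/$pid/fd" | wc -l)"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, watch_fds 32085 60 30 would take one sample per minute for half an hour.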
I just ran across this difference today; the explanation is that lsof takes more types of files into account, such as memory-mapped objects, run-time libraries, etc.

How find out which process is using a file in Linux?

I tried to remove a file in Linux using rm -rf file_name, but got the error:
rm: file_name not removed. Text file busy
How can I find out which process is using this file?
You can use the fuser command, which is part of the psmisc package, like:
fuser file_name
You will receive a list of processes using the file.
You can use different flags with it, in order to receive a more detailed output.
You can find more info in the fuser's Wikipedia article, or in the man pages.
@jim's answer is correct: fuser is what you want.
Additionally (or alternatively), you can use lsof to get more information, including the username, so you can see whose permission you need to kill the process without running an additional command. (Though of course, if killing the process is what you want, fuser can do that with its -k option. On Linux, fuser sends SIGKILL by default, and you can choose another signal with the -SIGNAL option, e.g. fuser -k -TERM file_name; check the man page for details.)
For example, with a tail -F /etc/passwd running in one window:
ghoti@pc:~$ lsof | grep passwd
tail 12470 ghoti 3r REG 251,0 2037 51515911 /etc/passwd
Note that you can also use lsof to find out what processes are using particular sockets. An excellent tool to have in your arsenal.
For users without fuser:
Although we can use lsof, there is another way: querying the /proc filesystem itself, which lists the open files of every process.
# ls -l /proc/*/fd/* | grep filename
Sample output below:
l-wx------. 1 root root 64 Aug 15 02:56 /proc/5026/fd/4 -> /var/log/filename.log
From the output, one can pass the process ID to a utility such as ps to find the program name.
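The grep above matches the file name anywhere in the line, so similarly named files show up as well. A stricter sketch (Linux-only; pids_with_file is a made-up name) compares each descriptor's target with the file's fully resolved path and prints the unique PIDs. Descriptors of other users' processes are only visible when it is run as root.

```shell
# Made-up helper name; Linux-only (reads /proc).
# Each /proc/<pid>/fd/<n> is a symlink to the opened file's absolute path,
# so compare it with FILE's fully resolved path.
pids_with_file() {
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] || continue
        pid=${fd#/proc/}     # strip the leading /proc/ ...
        echo "${pid%%/*}"    # ... and everything after the PID
    done | sort -u
}
```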
To list the open files anywhere under a directory, lsof can search it recursively:
$ lsof +D MyFold

What is the best way to identify which syslog daemon is running on Linux?

I'm writing Linux shell script (sh, bash or csh) to identify which syslog daemon is running.
What is the best way to do it?
Since I only consider RHEL and RPM-based distributions, Debian and its derivatives can be ignored.
To the best of my knowledge, syslog-ng and rsyslog (the default) are the only ones available on RHEL. You could either probe the process space, see which process currently holds /var/log/messages (or /var/log/syslog) open, or simply check which syslog daemon is installed (though it's possible to have both installed at the same time).
$ lsof /var/log/messages /var/log/syslog 2>&1 | grep syslog
$ rpm -q rsyslog syslog-ng
$ pgrep -u root syslog | xargs ps -p
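If neither lsof nor pgrep is at hand, the process space can be probed by reading /proc/<PID>/comm directly. A sketch; it assumes the daemons run under their usual names (rsyslogd, syslog-ng, syslogd), and detect_syslog is a hypothetical name:

```shell
# Hypothetical helper; Linux-only (reads /proc/<pid>/comm).
# Assumes the daemons run under their usual process names.
# Prints the first syslog daemon found and returns 0, or prints "none" and returns 1.
detect_syslog() {
    for comm in /proc/[0-9]*/comm; do
        read -r name 2>/dev/null < "$comm" || continue  # process may have exited
        case $name in
            rsyslogd|syslog-ng|syslogd) echo "$name"; return 0 ;;
        esac
    done
    echo none
    return 1
}
```

The exit status lets a script branch on the result, e.g. if detect_syslog >/dev/null; then ...; fi.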
One could parse the output of lsof to see which processes have the file /var/log/syslog open; a very crude example would be:
sudo lsof | grep /var/log/syslog | cut -f1 -d' '
If you are using a single distribution there may be more elegant ways of checking.
On a Debian-based system, run the following command to see what's installed:
dpkg-query -l '*syslog*' | grep ii
This will give you output similar to the following
ii rsyslog 7.4.4-1ubuntu2.3 i386 reliable system and kernel logging daemon
That way you don't have to grep files etc. Hope it helps you out.
