strace on Linux not logging all calls to open()

I am using strace to capture calls to open(), close() and read() on Linux. The target process is the Jetty web server. As far as I can tell, strace is not logging all calls to open(). Maybe the others too; I have not tried to correlate the file descriptors to open() calls.
For example, starting strace:
strace -f -e trace=open,close,read -o/tmp/strace.out -p62881
I then use wget to fetch 100 static files; all were retrieved successfully. In one run, only 56 open events were logged; on another run of 100 different files, I got 66 open events.
I believe that using "-f" results in strace attaching to all the LWP IDs for the threads ("Process 62881 attached with 25 threads - interrupt to quit"); when I try to attach to all of them explicitly using multiple "-p" options, I get a single "attach" success message but multiple "Operation not permitted" messages, one for each child PID.
I restarted Jetty to clear its cache before my tests.
Kernel version is 2.6.32-504.3.3.el6.x86_64 (Red Hat). Strace package version is strace-4.5.19-1.19.el6.x86_64.
What am I missing?
Thanks

On some systems you have to trace openat() instead of open(): modern glibc implements open() in terms of the openat(2) syscall, so tracing open alone misses those calls.
Try:
strace -f -e trace=openat,close,read -o/tmp/strace.out -p62881
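If you are not sure which variant your libc uses, tracing both is harmless:
strace -f -e trace=open,openat,close,read -o/tmp/strace.out -p62881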

Try -ff (in addition to -f):
-ff: If the -o filename option is in effect, each process's trace is written to filename.pid where pid is the numeric process ID of each process. This is incompatible with -c, since no per-process counts are kept.
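For example, combining it with the original options (the output prefix /tmp/strace is arbitrary) gives one trace file per thread, which also makes it easy to see which thread performed each open():
strace -f -ff -e trace=open,close,read -o/tmp/strace -p62881
Each thread's calls then land in /tmp/strace.<tid>.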

Related

Running perf-top in a script

I have some intermittent performance issues that I want to capture via perf top. The issue is intermittent, so I want to write a script that runs perf top when the issue is occurring so that I can save the data and view it at a later time.
I can't seem to figure out how to get perf top to put its output in a file; it seems to demand to be run interactively. Here is what I've tried so far:
# timeout 10 perf top --stdio -E 20 > 'perf-top'
This does not kill perf, just leaves it running in the background forever until I create another console session, find the PID, and kill it.
# timeout --signal=9 10 perf top --stdio -E 20 > 'perf-top'
This kills perf in the expected 10 seconds, but the output is not written to the file that I specified.
Is there some special way that this command needs to be run? It works if I run it from an interactive ssh session, but I'd really like to be able to run it from a script. I'm trying to put it into an ansible task with a few other metrics-gathering programs.
SIGKILL (9) isn't catchable, so it's impossible for perf to flush any buffered output.
Use any other signal so that perf can clean up after receiving it and write its output.
If the default SIGTERM doesn't work, then maybe try SIGHUP, SIGINT, or others:
timeout --signal=INT
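Putting it together, a minimal non-interactive capture (the 10-second duration is from the question) looks like:
timeout --signal=INT 10 perf top --stdio -E 20 > perf-top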
timeout 10 perf top --stdio -E 20 > perf-top (with the default SIGTERM) works for me (as non-root) on my Arch Linux desktop. perf version 5.0.g1c163f4

Linux File descriptors

I have a Java program that, after about 2 weeks of running on average, becomes stuck and produces the following error:
Caused by: java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
at sun.nio.ch.Net.socket(Net.java:415)
at sun.nio.ch.Net.socket(Net.java:408)
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
That hints to me that many sockets are opened but never closed.
Before diving into programmatic instrumentation, I started to inspect what information I could draw from Linux itself. I am using Red Hat.
A few questions came up:
Why do the following commands not give the same output?
[ec2-user@ip-172-22-28-102 ~]$ sudo ls /proc/32085/fd | wc -l
592
[ec2-user@ip-172-22-28-102 ~]$ sudo lsof -a -p 32085 | wc -l
655
Is there a way to know from the proc stat info which thread created which file descriptor?
It seems like there is not, because if I do the following, I get the same information:
[ec2-user@ip-172-22-28-102 ~]$ sudo ls /proc/32085/task/22386/fd | wc -l
592
[ec2-user@ip-172-22-28-102 ~]$ sudo ls /proc/32085/fd | wc -l
592
Same result if I go to the thread directly under /proc/.
Thanks
Is there a way to know from the proc stat info which thread created which file descriptor?
I am pretty sure the answer here is "no". File descriptors are opened by processes, not threads (and will be visible to all threads spawned by the same process).
Why the following commands do not give the same output?
First, the -a argument to lsof appears to be a no-op in this case. Specifically, the man page says that it "causes list selection options to be ANDed, as described above". So you are really just running:
sudo lsof -p 32085
And that will print things other than open file descriptors (such as memory-mapped files, current working directory, etc), while /proc/<PID>/fd contains only open file descriptors. So you're getting different results because you're asking for different information.
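A rough way to confirm this (same PID as above; lsof's fourth column in its default output is FD) is to count only the rows whose FD field is numeric, i.e. real file descriptors:
sudo lsof -p 32085 | awk '$4 ~ /^[0-9]+[rwu]/' | wc -l
The count should then be much closer to what /proc/32085/fd reports.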
The only reason you can receive that message is that you have opened files and didn't close them after use: you have a file descriptor leak in your Java application. Java programmers normally don't have to track memory, since the garbage collector copes with unreferenced objects, but file descriptors still have to be closed explicitly. If you keep file descriptors in some data structure without closing them, or don't close files after using them, you can reach the maximum number allowed to a process (this limit is per process and can be changed with the ulimit shell command).
But if your problem is a file descriptor leak, pushing up the ulimit will only delay the problem for a while. File descriptors must be closed, or you'll run into trouble.
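To see how close the process is to its limit (using the PID from the question), compare the current count against the per-process cap:
sudo grep 'Max open files' /proc/32085/limits
sudo ls /proc/32085/fd | wc -l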
I just ran across this difference today; the explanation is that lsof takes into account more types of files, like memory-mapped objects, run-time libraries, etc.

tail -f always displays "Killed"

When doing
tail -f /var/log/apache2/access.log
It shows logs and then
Killed
I have to re-execute tail -f to see new logs.
How do I make tail -f continually display logs without killing itself?
The first thing I'd do is try --follow=name instead of -f. Your problem could be happening because your log file is being rotated out. From the man page:
With --follow (-f), tail defaults to following the file descriptor, which means that even if a tail'ed file is renamed, tail will continue to track its end. This default behavior is not desirable when you really want to track the actual name of the file, not the file descriptor (e.g., log rotation). Use --follow=name in that case. That causes tail to track the named file in a way that accommodates renaming, removal and creation.
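In practice, -F is a convenient shorthand for --follow=name --retry:
tail -F /var/log/apache2/access.log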
tail -f should not get killed.
By the way, tail does not kill itself; it is killed by something else, for example because the system is out of memory or a resource limit is too restrictive.
Please figure out what kills your tail, using for example gdb or strace. Also check your environment, at least ulimit -a and dmesg, for any clues.
If your description is correct, and tail actually displays
Killed
then it is probably not happening as a result of log rotation. Log rotation will cause tail to stop displaying new lines, but even if the file is deleted, tail will not be killed.
Rather, some other process on the system, or perhaps the kernel, is sending it a signal 9 (SIGKILL). Possible causes for this include:
A user in another terminal issuing a command such as kill -9 1234 or pkill -9 tail
Some other tool or daemon (although I can't think of any that would do this)
The kernel itself can send SIGKILL to your process. One scenario under which it would do this is if the OOM (Out of memory) killer kicked in. This happens when all RAM and swap in the system is used. The kernel will select a process which is using a lot of memory and kill it. If this was happening it would be visible in syslog, but it is quite unlikely that tail would use that much memory.
The kernel can send you SIGKILL if RLIMIT_CPU (the limit on the amount of CPU time your process has used) is exceeded. If you leave tail running for long enough, and you have a ulimit set, then this can happen. To check for this (and other resource limitations) use ulimit -a
In my opinion, either the first or last of these explanations seems most likely.
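Both of those theories are quick to check (a sketch):
dmesg | grep -i -E 'out of memory|killed process'   # the OOM killer logs here
ulimit -t                                           # "unlimited" rules out RLIMIT_CPU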
Use tail -F logfile instead; it will not stop following when the log file is rotated.

Using strace to attach to a multi-threaded process

If I want to strace a multi-threaded process (all of its threads), how should I do it?
I know that one can use strace -f to follow forked processes, but what about attaching to a process which is already multi-threaded when I start stracing? Is there a way to tell strace to trace all system calls of all the threads which belong to this process?
2021 update
strace -fp PID just does the right thing on my system (Ubuntu 20.04.1 LTS). The strace manual page points this out:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system calls. Note that -p PID -f will attach all threads of process PID if it is multi-threaded, not only thread with thread_id = PID.
Looks like this text was added back in 2013. If -f had this behavior on my system at the time, I didn't realize it. It does now, though!
Original 2013 answer
I just did this in a kludgy way, by listing each tid to be traced.
You can find them through ps:
$ ps auxw -T | fgrep program_to_trace
me pid tid1 ...
me pid tid2 ...
me pid tid3 ...
me pid tid4 ...
and then, according to man strace, you can attach to multiple pids at once:
-p pid Attach to the process with the process ID pid and begin tracing. The trace may be terminated at any time by a keyboard interrupt signal (CTRL-C). strace will respond by detaching itself from the traced process(es) leaving it (them) to continue running. Multiple -p options can be used to attach to up to 32 processes in addition to command (which is optional if at least one -p option is given).
It says pid, but IIRC on Linux the pid and tid share the same namespace, and this appeared to work:
$ strace -f -p tid1 -p tid2 -p tid3 -p tid4
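A scriptable version of the same kludge (PID 62881 is just a placeholder) builds the -p list directly from /proc, where every thread appears under task/:
strace $(ls /proc/62881/task | sed 's/^/-p /')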
I think that might be the best you can do for now. But I suppose someone could extend strace with a flag for expanding tids. There would probably still be a race between finding the processes and attaching to them in which a freshly started one would be missed. It'd fit in with the existing caveat about strace -f:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2) system call.
On non-Linux platforms the new process is attached to as soon as its pid is known (through the return value of fork(2) in the parent process). This means that such children may run uncontrolled for a while (especially in the case of a vfork(2)), until the parent is scheduled again to complete its (v)fork(2) call. On Linux the child is traced from its first instruction with no delay. If the parent process decides to wait(2) for a child that is currently being traced, it is suspended until an appropriate child process either terminates or incurs a signal that would cause it to terminate (as determined from the child's current signal disposition).
On SunOS 4.x the tracing of vforks is accomplished with some dynamic linking trickery.
As answered in multiple comments, strace -fp <pid> will show the trace of all threads owned by that process, even ones the process spawned before strace began.

I don't get a core dump with all processes

To get a core dump, I use:
ulimit -c unlimited
I run my program in the background, and then I kill it:
kill -SEGV %1
But I just get:
[1]+ Exit 1 ./Test
and no core dump is created.
I did the same with other programs and it works, so why doesn't it work for this one? Can anybody help me?
Thanks. (GNU/Linux, Debian 2.6.26)
If your program traps the SEGV signal and does something else, it won't invoke the OS core dump routine. Check that it doesn't do that.
Under Linux, processes which change their user ID using setuid, seteuid or some other mechanism are excluded from dumping core for security reasons (think of /bin/passwd dumping core while it has /etc/shadow in memory).
You can re-enable core dumping in Linux programs which change their user ID by calling prctl(PR_SET_DUMPABLE, 1) after the change of UID.
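If you cannot change the code, the same restriction can be relaxed system-wide (as root; value 2 is the "suidsafe" mode, which makes such dumps readable by root only):
sysctl fs.suid_dumpable=2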
Also you might want to check that the program you're running is not changing its working directory (chdir()), because then it will create the core file in a different directory than the one you're running it from.
And you can try this too:
kill -ABRT pid
Try (as root):
sysctl kernel.core_pattern=core
and then repeat your experiment. On some systems that variable is set to /dev/null by default.
However, if you see exit status 1, perhaps the program indeed intercepts the signal.
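Before retrying, you can verify both knobs at once:
cat /proc/sys/kernel/core_pattern   # where cores are written
ulimit -c                           # must not be 0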
