Stracing to attach to a multi-threaded process - linux

If I want to strace a multi-threaded process (of all of its threads), how should I do it?
I know that one can do strace -f to follow forked process? But how about attaching to a process which is already multi-threaded when I start stracing? Is a way to tell strace to trace all of system calls of all the threads which belong to this process?

2021 update
strace -fp PID just does the right thing on my system (Ubuntu 20.04.1 LTS). The strace manual page points this out:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system
calls. Note that -p PID -f will attach all threads of process PID if it is multi-threaded, not only thread with thread_id = PID.
Looks like this text was added back in 2013. If -f had this behavior on my system at the time, I didn't realize it. It does now, though!
Original 2013 answer
I just did this in a kludgy way, by listing each tid to be traced.
You can find them through ps:
$ ps auxw -T | fgrep program_to_trace
me pid tid1 ...
me pid tid2 ...
me pid tid3 ...
me pid tid4 ...
and then, according to man strace, you can attach to multiple pids at once:
-p pid Attach to the process with the process ID pid and begin tracing. The trace may be terminated at any time by a keyboard interrupt
signal (CTRL-C). strace will respond by detaching itself from the traced process(es) leaving it (them) to continue running. Mul‐
tiple -p options can be used to attach to up to 32 processes in addition to command (which is optional if at least one -p option is
given).
It says pid, but iirc on Linux the pid and tid share the same namespace, and this appeared to work:
$ strace -f -p tid1 -p tid2 -p tid3 -p tid4
I think that might be the best you can do for now. But I suppose someone could extend strace with a flag for expanding tids. There would probably still be a race between finding the processes and attaching to them in which a freshly started one would be missed. It'd fit in with the existing caveat about strace -f:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2) system call.
On non-Linux platforms the new process is attached to as soon as its pid is known (through the return value of fork(2) in the par‐
ent process). This means that such children may run uncontrolled for a while (especially in the case of a vfork(2)), until the par‐
ent is scheduled again to complete its (v)fork(2) call. On Linux the child is traced from its first instruction with no delay. If
the parent process decides to wait(2) for a child that is currently being traced, it is suspended until an appropriate child
process either terminates or incurs a signal that would cause it to terminate (as determined from the child's current signal dispo‐
sition).
On SunOS 4.x the tracing of vforks is accomplished with some dynamic linking trickery.

As answered in multiple comments, strace -fp <pid> will show the trace of all threads owned by that process - even ones that process already has spawned before strace begins.

Related

What happens to other processes when a Docker container's PID1 exits?

Consider the following, which runs sleep 60 in the background and then exits:
$ cat run.sh
sleep 60&
ps
echo Goodbye!!!
$ docker run --rm -v $(pwd)/run.sh:/run.sh ubuntu:16.04 bash /run.sh
PID TTY TIME CMD
1 ? 00:00:00 bash
5 ? 00:00:00 sleep
6 ? 00:00:00 ps
Goodbye!!!
This will start a Docker container, with bash as PID1. It then fork/execs a sleep process, and then bash exits. When the Docker container dies, the sleep process somehow dies too.
My question is: what is the mechanism by which the sleep process is killed? I tried trapping SIGTERM in a child process, and that appears to not get tripped. My presumption is that something (either Docker or the Linux kernel) is sending SIGKILL when shutting down the cgroup the container is using, but I've found no documentation anywhere clarifying this.
EDIT The closest I've come to an explanation is the following quote from baseimage-docker:
If your init process is your app, then it'll probably only shut down itself, not all the other processes in the container. The kernel will then forcefully kill those other processes, not giving them a chance to gracefully shut down, potentially resulting in file corruption, stale temporary files, etc. You really want to shut down all your processes gracefully.
So at least according to this, the implication is that when the container exits, the kernel will sending a SIGKILL to all remaining processes. But I'd still like clarity on how it decides to do that (i.e., is it a feature of cgroups?), and ideally a more authoritative source would be nice.
OK, I seem to have come up with some more solid evidence that this is, in fact, the Linux kernel doing the terminating. In the clone(2) man page, there's this useful section:
CLONE_NEWPID (since Linux 2.6.24)
The first process created in a new namespace (i.e., the process
created using the CLONE_NEWPID flag) has the PID 1, and is the
"init" process for the namespace. Children that are orphaned
within the namespace will be reparented to this process rather than
init(8). Unlike the traditional init process, the "init" process of a
PID namespace can terminate, and if it does, all of the processes in
the namespace are terminated.
Unfortunately this is still vague on how exactly the processes in the namespace are terminated, but perhaps that's because, unlike a normal process exit, no entry is left in the process table. Whatever the case is, it seems clear that:
The kernel itself is killing the other processes
They are not killed in a way that allows them any chance to do cleanup, making it (almost?) identical to a SIGKILL

How to kill a process by its pid in linux

I'm new in linux and I'm building a program that receives the name of a process, gets its PID (i have no problem with that part) and then pass the PID to the kill command but its not working. It goes something like this:
read -p "Process to kill: " proceso
proid= pidof $proceso
echo "$proid"
kill $proid
Can someone tell me why it isn't killing it ? I know that there are some other ways to do it, even with the PID, but none of them seems to work for me. I believe it's some kind of problem with the Bash language (which I just started learning).
Instead of this:
proid= pidof $proceso
You probably meant this:
proid=$(pidof $proceso)
Even so,
the program might not get killed.
By default, kill PID sends the TERM signal to the specified process,
giving it a chance to shut down in an orderly manner,
for example clean up resources it's using.
The strongest signal to send a process to kill without graceful cleanup is KILL, using kill -KILL PID or kill -9 PID.
I believe it's some kind of problem with the bash language (which I just started learning).
The original line you posted, proid= pidof $proceso should raise an error,
and Bash would print an error message about it.
Debugging problems starts by reading and understanding the error messages the software is trying to tell you.
kill expects you to tell it **how to kill*, so there must be 64 different ways to kill your process :) They have names and numbers. The most lethal is -9. Some interesting ones include:
SIGKILL - The SIGKILL (also -9) signal forces the process to stop executing immediately. The program cannot ignore this signal. This process does not get to clean-up either.
SIGHUP - The SIGHUP signal disconnects a process from the parent process. This an also be used to restart processes. For example, "killall -SIGUP compiz" will restart Compiz. This is useful for daemons with memory leaks.
SIGINT - This signal is the same as pressing ctrl-c. On some systems, "delete" + "break" sends the same signal to the process. The process is interrupted and stopped. However, the process can ignore this signal.
SIGQUIT - This is like SIGINT with the ability to make the process produce a core dump.
use the following command to display the port and PID of the process:
sudo netstat -plten
AND THEN
kill -9 PID
Here is an example to kill a process running on port 8283 and has PID=25334
You have to send the SIGKILL flag with the kill statement.
kill -9 [pid]
If you don't the operating system will choose to kill the process at its convenience, SIGKILL (-9) will tell the os to kill the process NOW without ignoring the command until later.
Try this
kill -9
It will kill any process with PID given in brackets
Try "kill -9 $proid" or "kill -SIGKILL $proid" commands. If you want more information, click.
Based on what you have there, it looks like you aren't getting the actual PID in your proid variable. If you want to capture the output of pidof, you will need to enclose that command in backtics for the old form of command substitution ...
proid=`pidof $proceso`
... or like so for the new form of command substitution.
proid=$(pidof $proceso)
I had a similar problem, only wanting to run monitor (Video surveillance) for several hours a day.
Wrote two sh scripts;
cat startmotion.sh
#!/bin/sh
motion -c /home/username/.config/motion/motion.conf
And the second;
cat killmotion.sh
#!/bin/sh
OA=$(cat /var/run/motion/motion.pid)
kill -9 $OA
These were called from crontab at the scheduled time
ctontab -e
0 15 * * * /home/username/startmotion.sh
0 17 * * * /home/username/killmotion.sh
Very simple, but that's all I needed.

Simulating a process stuck in a blocking system call

I'm trying to test a behaviour which is hard to reproduce in a controlled environment.
Use case:
Linux system; usually Redhat EL 5 or 6 (we're just starting with RHEL 7 and systemd, so it's currently out of scope).
There're situations where I need to restart a service. The script we use for stopping the service usually works quite well; it sends a SIGTERM to the process, which is designed to handle it; if the process doesn't handle the SIGTERM within a timeout (usually a couple of minutes) the script sends a SIGKILL, then waits a couple minutes more.
The problem is: in some (rare) situations, the process doesn't exit after a SIGKILL; this usually happens when it's badly stuck on a system call, possibly because of a kernel-level issue (corrupt filesystem, or not-working NFS filesystem, or something equally bad requiring manual intervention).
A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old was still running; we're fixing this with a stronger locking system (so that at least the new process doesn't start if the old is running), but I find it difficult to test the whole thing because I haven't found a way to simulate an hard-stuck process.
So, the question is:
How can I manually simulate a process that doesn't exit when sending a SIGKILL to it, even as a privileged user?
If your process are stuck doing I/O, You can simulate your situation in this way:
lvcreate -n lvtest -L 2G vgtest
mkfs.ext3 -m0 /dev/vgtest/lvtest
mount /dev/vgtest/lvtest /mnt
dmsetup suspend /dev/vgtest/lvtest && dd if=/dev/zero of=/mnt/file.img bs=1M count=2048 &
In this way the dd process will stuck waiting for IO and will ignore every signal, I know the signals aren't ignore in the latest kernel when processes are waiting for IO on nfs filesystem.
Well... How about just not sending SIGKILL? So your env will behave like it was sent, but the process didn't quit.
Once a proces is in "D" state (or TASK_UNINTERRUPTIBLE) in a kernel code path where the execution can not be interrupted while a task is processed, which means sending any signals to the process would not be useful and would be ignored.
This can be caused due to device driver getting too many interrupts from the hardware, getting too many incoming network packets, data from NIC firmware or blocked on a HDD performing I/O. Normally if this happens very quickly and threads remain in this state for very short span of time.
Therefore what you need to be doing is look at the syslog and sar reports during the time when the process was stuck in D-state. If you find stack traces in the log, try to search kernel.bugzilla.org for similar issues or seek support from the Linux vendor.
I would code the opposite way. Have your server process write its pid in e.g. /var/run/yourserver.pid (this is common practice). Have the starting script read that file and test that the process does not exist e.g. with kill of signal 0, or with
yourserver_pid=$(cat /var/run/yourserver.pid)
if [ -f /proc/$yourserver_pid/exe ]; then
You could improve that by readlink /proc/$yourserver_pid/exe and comparing that to /usr/bin/yourserver
BTW, having a process still alive a few seconds after a SIGKILL is a serious situation (the common case when it could happen is if the process is stuck in a D state, waiting for some NFS server), and you probably should detect and syslog it (e.g. with logger in your script).
I also would try to first send SIGTERM, wait a few seconds, send SIGQUIT, wait a few seconds, and at last send SIGKILL and only a few seconds later test that the server process has gone
A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old was still running;
This is the bug in the OS/kernel level, not in your service script. The situation is rare and is hard to simulate because the OS is supposed to kill the process when SIGKILL signal happens. So I guess your goal is to let your script work well under a buggy kernel. Is that correct?
You can attach gdb to the process, SIGKILL won't remove such process from processlist but it will flag it as zombie, which might still be acceptable for your purpose.
void#tahr:~$ ping 8.8.8.8 > /tmp/ping.log &
[1] 3770
void#tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 S 0:00 ping 8.8.8.8
void#tahr:~$ sudo gdb -p 3770
...
(gdb)
Other terminal
void#tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 t 0:00 ping 8.8.8.8
sudo kill -9 3770
...
void#tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 Z 0:00 [ping] <defunct>
First terminal again
(gdb) quit

How to monitor process status during process lifetime

I need to track the process status ps axf during executable lifetime.
Let's say I have executable main.exec and want to store into a file all subprocess which are called during main.exec execution.
$ main.exec &
$ echo $! # and redirect every ps change for PID $! in a file.
strace - trace system calls and signals
$ main.exec &
$ strace -f -p $! -o child.txt
-f Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system calls. Note that -p PID -f will attach all threads of process PID if it is multi-threaded, not only thread with thread_id = PID.
If you can't recompile and instrument main.exec, ps in a loop is a simple option that may work for you:
while true; do ps --ppid=<pid> --pid=<pid> -o pid,ppid,%cpu,... >> mytrace.txt; sleep 0.2; done
Then parse the output accordingly.
top may also work, and can run in batch mode but not sure if you can get it to dynamically monitor child processes like ps. Don't think so.

Query on PID's in Strace command

I execute the following strace command with the intention of getting data about PID 13221
strace -fF -tT -all -o abc.txt -p 13221
However when the command executes and finishes I get output like below :
Process 13221 attached with 12 threads - interrupt to quit
Process 13252 attached
Process 13253 attached (waiting for parent)
Process 13253 resumed (parent 13252 ready)
Process 13252 suspended
Process 13252 resumed
Process 13253 detached
Process 13252 detached
Process 13232 detached
Process 13228 detached
Process 13225 detached
Process 13222 detached
Process 13221 detached
What are these extra PID's ? Are these the children of 13221 ? Who is creating them ?
Thanks.
What are these extra PID's ? Are these the children of 13221 ?
It must have been threads of your program. You have used "-f" in strace and this is why threads are also monitored.
How to know which of these are threads
If you run ls /proc/<PID>/task for your process you will get PIDs of all threads in your process (including a PID of the main thread).
It is simpler to do when you need to get thread PIDs comparing with running pstack for the same process. pstack is actually a gdb script, it stops a process when attaches. So it is simpler just to run ls /proc/<PID>/task

Resources