Simulating a process stuck in a blocking system call - linux

I'm trying to test a behaviour which is hard to reproduce in a controlled environment.
Use case:
Linux system; usually Redhat EL 5 or 6 (we're just starting with RHEL 7 and systemd, so it's currently out of scope).
There're situations where I need to restart a service. The script we use for stopping the service usually works quite well; it sends a SIGTERM to the process, which is designed to handle it; if the process doesn't handle the SIGTERM within a timeout (usually a couple of minutes) the script sends a SIGKILL, then waits a couple minutes more.
The problem is: in some (rare) situations, the process doesn't exit after a SIGKILL; this usually happens when it's badly stuck on a system call, possibly because of a kernel-level issue (corrupt filesystem, or not-working NFS filesystem, or something equally bad requiring manual intervention).
A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old was still running; we're fixing this with a stronger locking system (so that at least the new process doesn't start if the old is running), but I find it difficult to test the whole thing because I haven't found a way to simulate a hard-stuck process.
So, the question is:
How can I manually simulate a process that doesn't exit when sending a SIGKILL to it, even as a privileged user?

If your process is stuck doing I/O, you can simulate the situation this way:
lvcreate -n lvtest -L 2G vgtest
mkfs.ext3 -m0 /dev/vgtest/lvtest
mount /dev/vgtest/lvtest /mnt
dmsetup suspend /dev/vgtest/lvtest && dd if=/dev/zero of=/mnt/file.img bs=1M count=2048 &
This way the dd process will be stuck waiting for I/O and will ignore every signal. (Note that on recent kernels, signals are no longer ignored when a process is waiting for I/O on an NFS filesystem.)
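To verify that the simulation worked and to clean up afterwards, something along these lines should do (a sketch; dd is matched by pattern because it was started in the background behind &&):
ps -o pid,stat,wchan,cmd -C dd        # STAT should show 'D' (uninterruptible sleep)
pkill -9 -f 'of=/mnt/file.img'        # has no visible effect while the device is suspended
dmsetup resume /dev/vgtest/lvtest     # end of test: lets dd finish (or finally die)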

Well... How about just not sending SIGKILL? Then your environment will behave as if it had been sent, but the process didn't quit.

Once a process is in "D" state (TASK_UNINTERRUPTIBLE), it is executing a kernel code path that cannot be interrupted until the task in progress completes, which means sending any signal to the process is not useful; the signal is ignored until the process leaves that state.
This can be caused by a device driver receiving too many interrupts from the hardware, too many incoming network packets, data from NIC firmware, or being blocked on a HDD performing I/O. Normally this happens very quickly, and threads remain in this state for only a very short span of time.
Therefore, what you need to do is look at the syslog and sar reports from the time the process was stuck in D state. If you find stack traces in the log, try searching bugzilla.kernel.org for similar issues or seek support from your Linux vendor.

I would code it the other way around. Have your server process write its pid to e.g. /var/run/yourserver.pid (this is common practice). Have the starting script read that file and test whether the process still exists, e.g. with kill -0, or with:
yourserver_pid=$(cat /var/run/yourserver.pid)
if [ -f /proc/$yourserver_pid/exe ]; then
    echo "old process $yourserver_pid still running" >&2; exit 1
fi
You could improve that by calling readlink /proc/$yourserver_pid/exe and comparing the result to /usr/bin/yourserver.
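A minimal sketch of that improved check (assuming the daemon binary is installed at /usr/bin/yourserver):
yourserver_pid=$(cat /var/run/yourserver.pid)
if [ "$(readlink /proc/$yourserver_pid/exe 2>/dev/null)" = /usr/bin/yourserver ]; then
    echo "old yourserver process $yourserver_pid is still running" >&2
    exit 1
fi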
BTW, having a process still alive a few seconds after a SIGKILL is a serious situation (the common case when it could happen is if the process is stuck in a D state, waiting for some NFS server), and you probably should detect and syslog it (e.g. with logger in your script).
I also would try to first send SIGTERM, wait a few seconds, send SIGQUIT, wait a few seconds, and at last send SIGKILL, and only a few seconds later test that the server process has gone, roughly as sketched below.
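A rough sketch of such a stop sequence (the pid file path, the 5-second waits and the logger tag are placeholders, not part of the original script):
pid=$(cat /var/run/yourserver.pid)
for sig in TERM QUIT KILL; do
    kill -"$sig" "$pid" 2>/dev/null || break   # stop escalating if the process is already gone
    sleep 5
    kill -0 "$pid" 2>/dev/null || break        # process has exited, nothing more to do
done
if kill -0 "$pid" 2>/dev/null; then
    # surviving SIGKILL almost always means the process is stuck in D state
    logger -t yourserver-stop "pid $pid survived SIGKILL, manual intervention needed"
fi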

A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old was still running;
This is a bug at the OS/kernel level, not in your service script. The situation is rare and hard to simulate because the OS is supposed to kill the process when a SIGKILL signal is delivered. So I guess your goal is to make your script work well under a buggy kernel. Is that correct?

You can attach gdb to the process. SIGKILL won't remove such a process from the process list; instead it will be flagged as a zombie, which might still be acceptable for your purpose.
void#tahr:~$ ping 8.8.8.8 > /tmp/ping.log &
[1] 3770
void#tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 S 0:00 ping 8.8.8.8
void#tahr:~$ sudo gdb -p 3770
...
(gdb)
In the other terminal:
void#tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 t 0:00 ping 8.8.8.8
sudo kill -9 3770
...
void#tahr:~$ ps 3770
PID TTY STAT TIME COMMAND
3770 pts/13 Z 0:00 [ping] <defunct>
Back in the first terminal:
(gdb) quit

Related

What happens to other processes when a Docker container's PID1 exits?

Consider the following, which runs sleep 60 in the background and then exits:
$ cat run.sh
sleep 60&
ps
echo Goodbye!!!
$ docker run --rm -v $(pwd)/run.sh:/run.sh ubuntu:16.04 bash /run.sh
PID TTY TIME CMD
1 ? 00:00:00 bash
5 ? 00:00:00 sleep
6 ? 00:00:00 ps
Goodbye!!!
This will start a Docker container, with bash as PID1. It then fork/execs a sleep process, and then bash exits. When the Docker container dies, the sleep process somehow dies too.
My question is: what is the mechanism by which the sleep process is killed? I tried trapping SIGTERM in a child process, and that appears to not get tripped. My presumption is that something (either Docker or the Linux kernel) is sending SIGKILL when shutting down the cgroup the container is using, but I've found no documentation anywhere clarifying this.
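For reference, a minimal sketch of that trap experiment (trap-test.sh and the bind mounts are made up for illustration, following the run.sh example above):
$ cat trap-test.sh
# background child traps SIGTERM; the trap would create a marker file if it ever fired
bash -c 'trap "touch /host-tmp/got-term" TERM; sleep 60 & wait' &
echo Goodbye!!!
$ docker run --rm -v $(pwd)/trap-test.sh:/trap-test.sh -v /tmp:/host-tmp ubuntu:16.04 bash /trap-test.sh
# /tmp/got-term never appears on the host, consistent with the child getting SIGKILL rather than SIGTERM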
EDIT The closest I've come to an explanation is the following quote from baseimage-docker:
If your init process is your app, then it'll probably only shut down itself, not all the other processes in the container. The kernel will then forcefully kill those other processes, not giving them a chance to gracefully shut down, potentially resulting in file corruption, stale temporary files, etc. You really want to shut down all your processes gracefully.
So at least according to this, the implication is that when the container exits, the kernel will send a SIGKILL to all remaining processes. But I'd still like clarity on how it decides to do that (i.e., is it a feature of cgroups?), and ideally a more authoritative source would be nice.
OK, I seem to have come up with some more solid evidence that this is, in fact, the Linux kernel doing the terminating. In the clone(2) man page, there's this useful section:
CLONE_NEWPID (since Linux 2.6.24)
The first process created in a new namespace (i.e., the process
created using the CLONE_NEWPID flag) has the PID 1, and is the
"init" process for the namespace. Children that are orphaned
within the namespace will be reparented to this process rather than
init(8). Unlike the traditional init process, the "init" process of a
PID namespace can terminate, and if it does, all of the processes in
the namespace are terminated.
Unfortunately this is still vague on how exactly the processes in the namespace are terminated, but perhaps that's because, unlike a normal process exit, no entry is left in the process table. Whatever the case is, it seems clear that:
The kernel itself is killing the other processes
They are not killed in a way that allows them any chance to do cleanup, making it (almost?) identical to a SIGKILL
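You can reproduce this behaviour outside Docker with unshare(1), which uses the same CLONE_NEWPID mechanism (a sketch; needs root):
# inside the new PID namespace, bash runs as PID 1, backgrounds a sleep and exits
sudo unshare --pid --fork --mount-proc bash -c 'sleep 1000 & ps; exit 0'
# back on the host the sleep is already gone: the kernel killed it when PID 1 of the namespace exited
pgrep -f 'sleep 1000' || echo "sleep is gone"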

How to kill a process by its pid in linux

I'm new to Linux and I'm building a program that receives the name of a process, gets its PID (I have no problem with that part) and then passes the PID to the kill command, but it's not working. It goes something like this:
read -p "Process to kill: " proceso
proid= pidof $proceso
echo "$proid"
kill $proid
Can someone tell me why it isn't killing it? I know that there are some other ways to do it, even with the PID, but none of them seems to work for me. I believe it's some kind of problem with the Bash language (which I just started learning).
Instead of this:
proid= pidof $proceso
You probably meant this:
proid=$(pidof $proceso)
Even so,
the program might not get killed.
By default, kill PID sends the TERM signal to the specified process,
giving it a chance to shut down in an orderly manner,
for example clean up resources it's using.
The strongest signal to send a process to kill without graceful cleanup is KILL, using kill -KILL PID or kill -9 PID.
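A common pattern (a sketch using the asker's $proid variable; the 5-second wait is arbitrary) is to try TERM first and fall back to KILL only if the process is still around:
kill $proid                                   # polite request: SIGTERM
sleep 5
kill -0 $proid 2>/dev/null && kill -9 $proid  # still running? force it with SIGKILL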
I believe it's some kind of problem with the bash language (which I just started learning).
The original line you posted, proid= pidof $proceso, doesn't assign anything to proid: it runs pidof with an empty proid in its environment, so the later kill is called without a PID and Bash prints a usage error about it.
Debugging starts with reading and understanding the error messages the software is trying to show you.
kill expects you to tell it how to kill, so there must be 64 different ways to kill your process :) They have names and numbers. The most lethal is -9. Some interesting ones include:
SIGKILL - The SIGKILL (also -9) signal forces the process to stop executing immediately. The program cannot ignore this signal, and it does not get a chance to clean up either.
SIGHUP - The SIGHUP signal disconnects a process from the parent process. This can also be used to restart processes. For example, "killall -SIGHUP compiz" will restart Compiz. This is useful for daemons with memory leaks.
SIGINT - This signal is the same as pressing ctrl-c. On some systems, "delete" + "break" sends the same signal to the process. The process is interrupted and stopped. However, the process can ignore this signal.
SIGQUIT - This is like SIGINT with the ability to make the process produce a core dump.
Use the following command to display the ports and PIDs of listening processes:
sudo netstat -plten
and then
kill -9 PID
For example, to kill a process listening on port 8283 with PID 25334, find that line in the netstat output and run kill -9 25334.
You have to send the SIGKILL signal with the kill command.
kill -9 [pid]
If you don't, kill sends SIGTERM by default, which the process may handle at its convenience or ignore; SIGKILL (-9) tells the OS to kill the process NOW rather than deferring it.
Try this:
kill -9 [PID]
It will kill any process with the PID given in the brackets.
Try "kill -9 $proid" or "kill -SIGKILL $proid" commands. If you want more information, click.
Based on what you have there, it looks like you aren't getting the actual PID into your proid variable. If you want to capture the output of pidof, you will need to enclose that command in backticks for the old form of command substitution ...
proid=`pidof $proceso`
... or like so for the new form of command substitution.
proid=$(pidof $proceso)
I had a similar problem, only wanting to run motion (video surveillance) for several hours a day.
I wrote two sh scripts.
cat startmotion.sh
#!/bin/sh
motion -c /home/username/.config/motion/motion.conf
And the second;
cat killmotion.sh
#!/bin/sh
OA=$(cat /var/run/motion/motion.pid)
kill -9 $OA
These were called from crontab at the scheduled time
crontab -e
0 15 * * * /home/username/startmotion.sh
0 17 * * * /home/username/killmotion.sh
Very simple, but that's all I needed.

Does linux kill background processes if we close the terminal from which it has started?

I have an embedded system, on which I log in via telnet and then run an application in the background:
./app_name &
Now if I close my terminal, telnet in again from another terminal, and check, I can see that this process is still running.
To check this I have written a small program:
#include <stdio.h>
int main(void)
{
    while (1);
}
I ran this program on my local Linux PC in the background and closed the terminal.
When I checked for this process from another terminal, I found that it had been killed.
My question is:
Why this difference in behaviour for the same kind of process?
What does it depend on?
Is it dependent on the version of Linux?
Who should kill jobs?
Normally, foreground and background jobs are killed by SIGHUP sent by kernel or shell in different circumstances.
When does kernel send SIGHUP?
Kernel sends SIGHUP to controlling process:
for real (hardware) terminal: when disconnect is detected in a terminal driver, e.g. on hang-up on modem line;
for pseudoterminal (pty): when last descriptor referencing master side of pty is closed, e.g. when you close terminal window.
Kernel sends SIGHUP to other process groups:
to foreground process group, when controlling process terminates;
to orphaned process group, when it becomes orphaned and it has stopped members.
Controlling process is the session leader that established the connection to the controlling terminal.
Typically, the controlling process is your shell. So, to sum up:
kernel sends SIGHUP to the shell when real or pseudoterminal is disconnected/closed;
kernel sends SIGHUP to foreground process group when the shell terminates;
kernel sends SIGHUP to orphaned process group if it contains stopped processes.
Note that kernel does not send SIGHUP to background process group if it contains no stopped processes.
When does bash send SIGHUP?
Bash sends SIGHUP to all jobs (foreground and background):
when it receives SIGHUP, and it is an interactive shell (and job control support is enabled at compile-time);
when it exits, it is an interactive login shell, and huponexit option is set (and job control support is enabled at compile-time).
See more details here.
Notes:
bash does not send SIGHUP to jobs removed from job list using disown;
processes started using nohup ignore SIGHUP.
More details here.
What about other shells?
Usually, shells propagate SIGHUP. Generating SIGHUP at normal exit is less common.
Telnet or SSH
Under telnet or SSH, the following should happen when connection is closed (e.g. when you close telnet window on PC):
client is killed;
server detects that client connection is closed;
server closes master side of pty;
kernel detects that master pty is closed and sends SIGHUP to bash;
bash receives SIGHUP, sends SIGHUP to all jobs and terminates;
each job receives SIGHUP and terminates.
Problem
I can reproduce your issue using bash and telnetd from busybox or dropbear SSH server: sometimes, background job doesn't receive SIGHUP (and doesn't terminate) when client connection is closed.
It seems that a race condition occurs when server (telnetd or dropbear) closes master side of pty:
normally, bash receives SIGHUP and immediately kills background jobs (as expected) and terminates;
but sometimes, bash detects EOF on slave side of pty before handling SIGHUP.
When bash detects EOF, it by default terminates immediately without sending SIGHUP. And background job remains running!
Solution
It is possible to configure bash to send SIGHUP on normal exit (including EOF) too:
Ensure that bash is started as login shell. The huponexit works only for login shells, AFAIK.
Login shell is enabled by -l option or leading hyphen in argv[0]. You can configure telnetd to run /bin/bash -l or better /bin/login which invokes /bin/sh in login shell mode.
E.g.:
telnetd -l /bin/login
Enable huponexit option.
E.g.:
shopt -s huponexit
Type this in bash session every time or add it to .bashrc or /etc/profile.
Why does the race occur?
bash unblocks signals only when it's safe, and blocks them when some code section can't be safely interrupted by a signal handler.
Such critical sections invoke interruption points from time to time, and if a signal is received while a critical section is executing, its handler is delayed until the next interruption point happens or the critical section is exited.
You can start digging from quit.h in the source code.
Thus, it seems that in our case bash sometimes receives SIGHUP when it's in a critical section. SIGHUP handler execution is delayed, and bash reads EOF and terminates before exiting critical section or calling next interruption point.
Reference
"Job Control" section in official Glibc manual.
Chapter 34 "Process Groups, Sessions, and Job Control" of "The Linux Programming Interface" book.
When you close the terminal, the shell sends SIGHUP to all background processes – and that kills them. This can be suppressed in several ways, most notably:
nohup
When you run a program with nohup, it ignores SIGHUP and redirects the program's output.
$ nohup app &
disown
disown tells the shell not to send SIGHUP to that job
$ app &
$ disown
Is it dependent on version of linux?
It is dependent on your shell. Above applies at least for bash.
AFAIK in both cases the process should be killed. In order to avoid this you have to issue a nohup like the following:
> nohup ./my_app &
This way your process will continue executing. Probably the telnet part is due to a BUG similar to this one:
https://bugzilla.redhat.com/show_bug.cgi?id=89653
In order to completely understand what's happening, you need to get into Unix internals a little bit.
When you are running a command like this
./app_name &
app_name is placed into a background process group. You can read about Unix process groups here.
When bash exits normally, it sends the SIGHUP hangup signal to all of its jobs. Some information on Unix job control is here.
In order to keep your app running when you exit bash you need to make your app immune to hangup signal with nohup utility.
nohup - run a command immune to hangups, with output to a non-tty
And finally this is how you need to do it.
nohup app_name 2> /dev/null &
In modern Linux--that is, Linux with systemd--there is an additional reason this might happen which you should be aware of: "linger".
systemd kills processes left running from a login shell, even if the process is properly daemonized and protected from HUP. This is the default behavior in modern configurations of systemd.
If you run
loginctl enable-linger $USER
you can disable this behavior, allowing background processes to keep running. The mechanisms covered by the other answers still apply, however, and you should also protect your process against them.
enable-linger is permanent until it is re-disabled. You can check it with
ls /var/lib/systemd/linger
This may have files, one per username, for users who have enable-linger. Any user listed in the directory has the ability to leave background processes running at logout.
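If you later want to inspect the setting or turn lingering back off, loginctl can do both (a sketch):
loginctl show-user "$USER" -p Linger   # prints Linger=yes or Linger=no
loginctl disable-linger "$USER"        # revert to the default kill-processes-on-logout behaviour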

How to kill this immortal nginx worker?

I have started nginx, and when I stop it as root with
/etc/init.d/nginx stop
after that I type
ps aux | grep nginx
and get response like tcp LISTEN 2124 nginx WORKER
kill -9 2124 # tried with kill -QUIT 2124, kill -KILL 2124
and after I type again
ps aux | grep nginx
and get response like tcp LISTEN 2125 nginx WORKER
and so on.
How to kill this immortal Chuck Norris worker ?
After kill -9 there's nothing more to do to the process - it's dead (or doomed to die). The reason it sticks around is because either (a) its parent process hasn't waited for it yet, so the kernel keeps the process table entry to hold its status until the parent does so, or (b) the process is stuck on a system call into the kernel that is not finishing (which usually means a buggy driver and/or hardware).
In the first case, getting the parent to wait for the child, or terminating the parent, should work. Most programs don't have a clear way to make them "wait for a child", so that may not be an option.
In the second case, the most likely solution is to reboot. There may be tools that could clear such a condition, but that's not common. Depending on just what that kernel code is doing, it may be possible to get it to unblock by other means - but that requires knowledge of that processing. For example, if the process is blocked on a kernel lock that some other process is somehow holding indefinitely, terminating that other process could alleviate the problem.
Note that the ps command can distinguish these two cases as well: a zombie shows up in the 'Z' state (often with the text "defunct"), while a process stuck in the kernel shows up in the 'D' (uninterruptible sleep) state. See the ps man page for more info: http://linux.die.net/man/1/ps.
I had the same issue.
In my case GitLab was responsible for bringing up the nginx workers.
When I completely removed GitLab from my server, I was able to kill the nginx workers.
ps -aux | grep "nginx"
Search for the workers and check in the first column (the user) who is running them.
Kill or uninstall whatever is responsible, then kill the workers again; they will stop spawning ;D
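To find out what keeps respawning the workers, look at their parent process (a sketch):
ps -o pid,ppid,user,cmd -C nginx   # the PPID column points at whatever started each worker
pstree -sp "$(pgrep -o nginx)"     # or print the full ancestry of the oldest nginx process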
I was having a similar issue.
Check if you are using any auto-healer like Monit or Supervisor which restarts the worker whenever you try to stop it. If yes, disable it.
My workers were being respawned due to changes I forgot I had made with update-rc.d in Ubuntu.
So I installed sysv-rc-conf, which gives a clean interface for controlling which processes start on reboot; you can disable them from there, and I assure you: no Chuck Norris resurrection :D

Stracing to attach to a multi-threaded process

If I want to strace a multi-threaded process (all of its threads), how should I do it?
I know that one can do strace -f to follow forked processes. But what about attaching to a process which is already multi-threaded when I start stracing? Is there a way to tell strace to trace all of the system calls of all the threads which belong to this process?
2021 update
strace -fp PID just does the right thing on my system (Ubuntu 20.04.1 LTS). The strace manual page points this out:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system
calls. Note that -p PID -f will attach all threads of process PID if it is multi-threaded, not only thread with thread_id = PID.
Looks like this text was added back in 2013. If -f had this behavior on my system at the time, I didn't realize it. It does now, though!
Original 2013 answer
I just did this in a kludgy way, by listing each tid to be traced.
You can find them through ps:
$ ps auxw -T | fgrep program_to_trace
me pid tid1 ...
me pid tid2 ...
me pid tid3 ...
me pid tid4 ...
and then, according to man strace, you can attach to multiple pids at once:
-p pid Attach to the process with the process ID pid and begin tracing. The trace may be terminated at any time by a keyboard interrupt
signal (CTRL-C). strace will respond by detaching itself from the traced process(es) leaving it (them) to continue running. Mul‐
tiple -p options can be used to attach to up to 32 processes in addition to command (which is optional if at least one -p option is
given).
It says pid, but iirc on Linux the pid and tid share the same namespace, and this appeared to work:
$ strace -f -p tid1 -p tid2 -p tid3 -p tid4
I think that might be the best you can do for now. But I suppose someone could extend strace with a flag for expanding tids. There would probably still be a race between finding the processes and attaching to them in which a freshly started one would be missed. It'd fit in with the existing caveat about strace -f:
-f Trace child processes as they are created by currently traced processes as a result of the fork(2) system call.
On non-Linux platforms the new process is attached to as soon as its pid is known (through the return value of fork(2) in the par‐
ent process). This means that such children may run uncontrolled for a while (especially in the case of a vfork(2)), until the par‐
ent is scheduled again to complete its (v)fork(2) call. On Linux the child is traced from its first instruction with no delay. If
the parent process decides to wait(2) for a child that is currently being traced, it is suspended until an appropriate child
process either terminates or incurs a signal that would cause it to terminate (as determined from the child's current signal dispo‐
sition).
On SunOS 4.x the tracing of vforks is accomplished with some dynamic linking trickery.
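If you'd rather not copy the tids by hand, they can be pulled from /proc instead (a sketch; assumes the target process ID is in $pid):
# every thread of a process appears as a directory under /proc/<pid>/task
strace $(for t in /proc/"$pid"/task/*; do printf ' -p %s' "${t##*/}"; done)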
As answered in multiple comments, strace -fp <pid> will show the trace of all threads owned by that process - even ones that process already has spawned before strace begins.
