Process stuck in exit, shows as zombie but cannot be reaped - linux

I have a process that's monitored by its parent. The child encountered an error that caused it to call abort. The process does not tamper with the abort process, so it should proceed as expected (dump core, terminate). The parent is supposed to detect the child's termination and trigger a series of events to respond to the failure. The child is multi-threaded and complex.
Here's what I see from ps:
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000  4929  1272  20   0  85440  6792 wait   S+   pts/2      0:00 rxd
1  1000  4930  4929  20   0      0     0 exit   Zl+  pts/2     38:21 [rxd] <defunct>
So the child (4930) has terminated. It is a zombie. I cannot attach to it, as expected. However, the parent (4929) stays blocked in:
int i;
// ...
waitpid (-1, &i, 0);
So it seems like the child is a zombie but somehow has not completed everything necessary for its parent to reap it. The WCHAN field of exit is, I think, a valuable clue.
The platform is 64-bit Linux, Ubuntu 13.04, kernel 3.8.0-30. The child doesn't appear to be dumping core or doing anything. I've left the system for several minutes and nothing changed.
Does anyone have any ideas what might be causing this or what I can do about it?
Update: Another interesting bit of information -- if I kill -9 the parent process, the child goes away. This is kind of baffling, since the parent process is trivial, just blocking in waitpid. Also, I don't get any core dump (from the child) when this problem happens.
Update: It seems the child is stuck in schedule, called from exit_mm, called from do_exit. I wonder why exit_mm would call schedule. And I wonder why killing the parent would unstick it.

I finally figured it out! The process was actually doing useful work all this time. The process held the last reference to a large file on a slow filesystem. When the process terminates, the last reference to the file is released, forcing the OS to reclaim the space. The file was so large that this required tens of thousands of I/O operations, taking 10 minutes or more.
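To illustrate what "holding the last reference" means, here is a minimal sketch in C. It assumes a writable /tmp; the function name and the temporary file name are made up for the illustration and have nothing to do with the original program. The file is unlinked while still open, so the open descriptor is the only thing keeping it alive, and the space can only be reclaimed once that descriptor is closed (explicitly, or implicitly when the process exits).

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a file that has been unlinked is still
 * usable through an open descriptor, 0 if not, -1 on error. */
int unlinked_file_still_open(void)
{
    char path[] = "/tmp/lastref-XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;

    /* Remove the only name. The descriptor is now the last reference. */
    if (unlink(path) == -1) {
        close(fd);
        return -1;
    }

    struct stat st;
    int ok = 0;
    if (write(fd, "data", 4) == 4 && fstat(fd, &st) == 0)
        ok = (st.st_nlink == 0 && st.st_size == 4);

    /* Closing drops the last reference; only now can the filesystem free
     * the blocks. Process exit closes descriptors the same way. */
    close(fd);
    return ok;
}
```

For a tiny file the final close is instant. For a very large file on a slow filesystem, that same step is where the time goes.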

Related

Why do we need a wait() system call?

Hello I am new to learning about system calls. I am currently learning about fork() and wait() system calls. I know that fork() creates a new child process. What confuses me is the wait() call.
This is what I understand so far:
(1) When a process dies, it goes into a 'Zombie State' i.e. it does not release its PID but waits for its parent to acknowledge that the child process has died and then the PID is released
(2) So we need a way to figure out when the child process has ended so that we don't leave any processes in the zombie state
I am confused with the following things:
(1) When running a C program where I fork a new child process, if I don't call wait() explicitly, is it done internally when the child process ends? Because you could still write a block of code in C where you run fork() without wait() and it seems to work fine?
(2) What does wait() do? I know it returns the PID of the child process that was terminated, but how is this helpful/related to releasing the PID of the terminated process?
I am sorry for such naive questions but this is something I was really curious about and I couldn't find any good resources online! Your help is much appreciated!
wait isn't about preventing zombie states. Zombie states are your friend.
POSIX more or less lets you do two things with pids: signal them with kill or reap them (and synchronize with them) with wait/waitpid/waitid.
The wait syscalls are primarily for waiting on a process to exit or die from a signal (though they can also be used to wait on other process status changes such as the child becoming stopped or the child waking up from being stopped).
Secondarily, they're about reaping exit/died statuses, thereby releasing (zombified) pids.
Until you release a pid with wait/waitpid/waitid, you can continue flogging the pid with requests for it to die (kill(pid,SIGTERM);) or with some other signal (other than SIGKILL) and you can rest assured the pid represents the process you've forked off and that you're not accidentally killing someone else's process.
But once you reap a zombified pid by waiting on it, then the pid is no longer yours and another process might take it (which typically happens after some time, as pids in the system typically increment and then wrap around).
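A minimal sketch of that signal-then-reap pattern, assuming the child keeps the default SIGTERM disposition. The helper names spawn_sleeper and terminate_and_reap are invented for this example.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that just sleeps until signalled. */
pid_t spawn_sleeper(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;   /* child pid in the parent, -1 on error */
}

/* Hypothetical helper: ask our own, not yet reaped, child to terminate and
 * then reap it. Returns the terminating signal number, 0 if the child
 * exited normally instead, or -1 on error. */
int terminate_and_reap(pid_t pid)
{
    /* Safe: until we wait on it, this pid can only refer to our child,
     * even if the child is already a zombie. */
    if (kill(pid, SIGTERM) == -1)
        return -1;

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    /* From here on the pid may be reused, so it must not be signalled. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling terminate_and_reap(spawn_sleeper()) should report SIGTERM as the cause of death.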
That's why auto-wait would be a bad idea (in some cases it isn't and then you can achieve it globally with signal(SIGCHLD,SIG_IGN);) and why (short-lived) zombie states are your friend. They keep the child pid stable for you until you're ready to release it.
If you exit without releasing any of your children's pids, then you don't have to worry about zombie children anymore--your child processes will be reparented to the init process, which will wait on them for you when they die.
When you call fork(), a new process is created with you being its parent. When the child process finishes running with a call to exit(), its process descriptor is still kept in the kernel's memory. It is your responsibility as its parent to collect its exit code, which is done with a call to the wait() syscall. wait() blocks the parent process until one of its children has finished.
Zombie process is the name given to a process whose exit code was never collected by its parent.
Regarding your first question: wait() is not called automatically; zombie processes wouldn't exist if it were. It is your responsibility as a programmer. Omitting the call to wait() will still work, as you mentioned, but it is considered bad practice.
Both this link and this link explain it well.
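To make the fork()/wait() relationship concrete, here is a minimal sketch. The function name spawn_and_reap is made up for the example, and the child does nothing except exit with a chosen code.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then collect
 * that code in the parent. Returns the exit code, or -1 on error or if the
 * child did not exit normally. */
int spawn_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);               /* child: terminate immediately */

    /* Parent: block until this child has terminated. Between the child's
     * exit and this call returning, the child is a zombie. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }

    /* The child is now reaped and its pid is free for reuse. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This also answers what wait() returns beyond the pid: the status word is where the exit code (or the terminating signal) is delivered, and fetching it is what lets the kernel drop the process table entry.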

Why Linux process is defunct but its parent process is still alive?

The ps man page describes defunct as "Defunct ("zombie") process, terminated but not reaped by its parent".
The process uses pipes to communicate with its parent. In the logs I verified it has closed its pipes and exited normally.
Why does it become defunct instead of being properly destroyed?
Is there something the parent process can do to avoid such situations?
In top this process has no allocated memory, which confirms it has exited normally.
osysops 42884 42820 0 06:55 ? 00:00:03 [lecw]

What special precautions must I make for docker apps running as pid 1?

From what I gather, programs that run as pid 1 may need to take special precautions such as capturing certain signals.
It's not altogether clear how to correctly write a pid 1. I'd rather not use runit or supervisor in my case. For example, supervisor is written in python and if you install that, it'll result in a much larger container. I'm not a fan of runit.
Looking at the source code for runit is interesting, but as usual, comments are virtually non-existent and don't explain what's being done for what reason.
There is a good discussion here:
When the process with pid 1 die for any reason, all other processes are killed with KILL signal
When any process having children dies for any reason, its children are reparented to process with PID 1
Many signals which have default action of Term do not have one for PID 1.
The relevant part for your question:
you can’t stop process by sending SIGTERM or SIGINT, if process have not installed a signal handler

Node-forever restartall and child processes

We have a main node process that spawns a number of child processes with child_process.fork(), which themselves each spawn a helper child process. We are using node-forever to manage the lifetime of the main node processes, and often use forever restartall to restart this.
One problem we are seeing occasionally is that the grandchild processes will fail to terminate, and we end up with duplicated child processes running. That is, what should be this:
Main Process
  Child Process 1
    Grandchild Process 1
  Child Process 2
    Grandchild Process 2
Ends up like this after restartall:
Main Process
  Child Process 1
    Grandchild Process 1
  Child Process 2
    Grandchild Process 2
Grandchild Process 1
Grandchild Process 2
Unsurprisingly this causes lots of weird problems and we usually have to restart the whole server (or kill processes manually, if we can establish which are the old ones).
As I understand it, forever issues a SIGTERM message to the process when it does a restartall. I believe this message should cascade down to the child and grandchild processes (but please correct me if I've made a false assumption there). Since this problem only occurs maybe once in 100, perhaps it's something related to timing?
What circumstances could cause the grandchild processes to fail to terminate? How to mitigate this?
OS is Debian Squeeze.
EDIT: My initial description was a bit over simplified. I've updated it to include all the details.
EDIT2 : We don't use forever anymore. I recommend PM2

Kill process '[avconv] <defunct>'

I have more than 30 '[avconv] <defunct>' processes (there is a bug in my script). I can find them with this command:
ps aux | grep '\[avconv\] <defunct>'
but I don't know how to kill them. Does anyone have an idea how to kill these processes?
Thanks
A <defunct> process is a process that has already terminated, and hence cannot be killed, but for which the parent has not yet invoked one of the wait system calls (wait, wait3, wait4, waitpid, etc...) to read its exit status. As a result, the process information is retained by the system in case the parent eventually does try to obtain its status. Such processes disappear when the parent reads their exit status.
These <defunct> processes also disappear when the parent is killed, as the init process will take ownership of the process and obtain (and discard) its status.
You can avoid <defunct> processes by ensuring you issue as many wait system calls as you issue fork calls.
Alternatively, as J.F. Sebastian points out, you can also avoid <defunct> processes by either setting the SIGCHLD signal disposition to SIG_IGN (ignore the signal) or by using the SA_NOCLDWAIT flag when registering a SIGCHLD signal handler (or when resetting the default disposition with SIG_DFL) using sigaction. In this case, however, the child's exit status will not be made available to the parent - it is simply discarded.

Resources