PID re-use, edge case? - linux

I read that a PID becomes available for reuse only after the process has become a zombie and has been waited for.
What happens if I fork hundreds of processes and then kill the parent without waiting for them? I know they will become children of the init process, but who will call wait on them? (Otherwise we are in trouble, since PIDs are a limited resource.)

I know they will be children of init process but who will call wait on them?
init (the process with PID 1) reaps all of its child processes, including adopted zombie processes. As https://en.wikipedia.org/wiki/Zombie_process puts it:
init periodically executes the wait system call to reap any zombies
with init as parent.
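The adoption step can be observed from inside an orphan. The following is a minimal sketch, assuming a POSIX system; orphan_new_parent is a hypothetical helper written for this illustration. A middle process forks a grandchild and exits without waiting; the grandchild polls getppid() until its parent changes and reports the new parent through a pipe. On many systems the adopter is PID 1, but on Linux a process registered as a subreaper can adopt orphans instead, so the code only reports what it sees.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: orphan a grandchild and learn who adopts it.
 * Stores the PID of the short-lived middle process and the PID of the
 * grandchild's new parent. Returns 0 on success, -1 on error. */
int orphan_new_parent(pid_t *middle_pid, pid_t *new_parent)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t middle = fork();
    if (middle == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (middle == 0) {                      /* middle process */
        close(fds[0]);
        pid_t self = getpid();
        if (fork() == 0) {                  /* grandchild */
            const struct timespec tick = {0, 1000000};  /* 1 ms */
            /* Poll until our parent is no longer the middle process. */
            while (getppid() == self)
                nanosleep(&tick, NULL);
            pid_t adopter = getppid();
            ssize_t w = write(fds[1], &adopter, sizeof adopter);
            _exit(w == (ssize_t)sizeof adopter ? 0 : 1);
        }
        _exit(0);   /* exit without waiting: the grandchild is now an orphan */
    }

    close(fds[1]);
    waitpid(middle, NULL, 0);               /* reap the middle process */
    ssize_t n = read(fds[0], new_parent, sizeof *new_parent);
    close(fds[0]);
    *middle_pid = middle;
    return n == (ssize_t)sizeof *new_parent ? 0 : -1;
}
```

When the grandchild later exits, it is the adopter's job to wait on it; the original caller never sees it as a child.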

Related

Why do we need a wait() system call?

Hello I am new to learning about system calls. I am currently learning about fork() and wait() system calls. I know that fork() creates a new child process. What confuses me is the wait() call.
This is what I understand so far:
(1) When a process dies, it goes into a 'Zombie State' i.e. it does not release its PID but waits for its parent to acknowledge that the child process has died and then the PID is released
(2) So we need a way to figure out when the child process has ended so that we don't leave any processes in the zombie state
I am confused with the following things:
(1) When running a C program where I fork a new child process, if I don't call wait() explicitly, is it done internally when the child process ends? Because you could still write a block of code in C where you run fork() without wait() and it seems to work fine?
(2) What does wait() do? I know it returns the PID of the child process that was terminated, but how is this helpful/related to releasing the PID of the terminated process?
I am sorry for such naive questions but this is something I was really curious about and I couldn't find any good resources online! Your help is much appreciated!
wait isn't about preventing zombie states. Zombie states are your friend.
POSIX more or less lets you do two things with pids: signal them with kill, or reap them (and synchronize with them) with wait/waitpid/waitid.
The wait syscalls are primarily for waiting on a process to exit or die from a signal (though they can also be used to wait on other process status changes such as the child becoming stopped or the child waking up from being stopped).
Secondarily, they're about reaping exit/died statuses, thereby releasing (zombified) pids.
Until you release a pid with wait/waitpid/waitid, you can continue flogging the pid with requests for it to die (kill(pid,SIGTERM);) or with some other signal (other than SIGKILL) and you can rest assured the pid represents the process you've forked off and that you're not accidentally killing someone else's process.
But once you reap a zombified pid by waiting on it, then the pid is no longer yours and another process might take it (which typically happens after some time, as pids in the system typically increment and then wrap around).
That's why auto-wait would be a bad idea (in some cases it isn't, and then you can enable it globally with signal(SIGCHLD,SIG_IGN);) and why (short-lived) zombie states are your friend. They keep the child pid stable for you until you're ready to release it.
If you exit without releasing any of your children's pids, then you don't have to worry about zombie children anymore--your child processes will be reparented to the init process, which will wait on them for you when they die.
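The signal-then-reap sequence described in this answer can be sketched as follows. This is a minimal illustration assuming a POSIX system; terminate_and_reap is a hypothetical name, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that idles, terminate it with SIGTERM,
 * then reap it. Returns the number of the signal that killed the child,
 * or -1 on error. */
int terminate_and_reap(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        for (;;)
            pause();            /* child: sleep until a signal arrives */
    }

    /* We have not waited on pid yet, so it cannot have been recycled:
     * this kill() can only reach our own child, alive or zombie. */
    if (kill(pid, SIGTERM) == -1)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1)     /* reap: the pid is released */
        return -1;

    /* From here on, pid may be handed to an unrelated process, so it
     * must not be passed to kill() again. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The ordering is the point: every kill() happens before the waitpid(), while the pid is still guaranteed to name the forked child.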
When you call fork(), a new process is created with you as its parent. When the child process finishes running with a call to exit(), its process descriptor is still kept in the kernel's memory. It is your responsibility as its parent to collect its exit code, which is done with a call to the wait() syscall. wait() blocks the parent process until one of its children has finished.
Zombie process is the name given to a process that has terminated but whose exit code has not yet been collected by its parent.
Regarding your first question: wait() is not called automatically, as zombie processes wouldn't exist if it were. It is your responsibility as a programmer. Omitting the call to wait() will still work, as you mentioned, but it is considered bad practice.
Both this link and this link explain it well.
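As a minimal sketch of collecting a child's exit code (assuming a POSIX system; run_child_and_collect is a hypothetical helper), the parent forks, the child exits with a chosen code, and the parent decodes the status word. It uses waitpid(pid, ...) to target one specific child; wait(&status) is the same call with "any child" as the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with exit_code, wait for it,
 * and return the exit code as seen by the parent (or -1 on error). */
int run_child_and_collect(int exit_code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0)
        _exit(exit_code);       /* child: terminate immediately */

    int status;
    /* Blocks until the child has terminated, then releases its entry. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* The status word encodes how the child ended; WEXITSTATUS is only
     * meaningful when WIFEXITED reports a normal exit. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only the low 8 bits of the exit code reach the parent, so values outside 0 to 255 are truncated.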

What would cause a SIGTERM to not propagate to child processes?

I have a process on Linux that starts up 20 child processes via fork. When I kill the parent process, it will often kill all of the child processes, but sometimes it doesn't kill all of them, and I'm left with some orphaned processes. This isn't a race condition on startup, this is after the processes have been active for several minutes.
What sort of things could cause SIGTERM to not propagate to some child processes properly?
There is no automatic propagation of signals (SIGTERM or otherwise) to children in the process tree.
Inasmuch as killing a parent process can be observed to cause some children to exit, this is due to ancillary effects, such as a SIGPIPE raised when the child writes to a pipe whose read end was held by the dead parent (a read from a pipe whose write end has been closed returns end-of-file instead, and a child that treats that as an error may also exit).
If you want to ensure that children are cleaned up when your process receives a SIGTERM, install a signal handler and do it yourself.
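One way to "do it yourself" is a handler that forwards the signal to every recorded child. The sketch below assumes a POSIX system; forward_sigterm_demo, forward_to_children, and the fixed-size children array are hypothetical names for this illustration. To keep it self-contained, the parent raises SIGTERM on itself, which stands in for an external kill of the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16

static pid_t children[MAX_CHILDREN];
static volatile sig_atomic_t child_count = 0;

/* Handler: pass the signal on to every recorded child. kill() is
 * async-signal-safe, so calling it from a handler is allowed. */
static void forward_to_children(int signo)
{
    int saved_errno = errno;
    for (int i = 0; i < child_count; i++)
        kill(children[i], signo);
    errno = saved_errno;
}

/* Hypothetical demo: returns how many children died from the forwarded
 * SIGTERM, or -1 on error. */
int forward_sigterm_demo(int count)
{
    if (count < 1 || count > MAX_CHILDREN)
        return -1;

    /* Fork before installing the handler, so that the children keep the
     * default SIGTERM action and simply die when signalled. */
    child_count = 0;
    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            for (;;)
                pause();
        }
        children[child_count] = pid;
        child_count++;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_to_children;
    sigemptyset(&sa.sa_mask);
    int installed = (sigaction(SIGTERM, &sa, &old) == 0);

    if (installed)
        raise(SIGTERM);                 /* stands in for an external kill */
    else
        forward_to_children(SIGTERM);   /* fallback: leave no child behind */

    int terminated = 0;
    for (int i = 0; i < child_count; i++) {
        int status;
        if (waitpid(children[i], &status, 0) == children[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            terminated++;
    }

    if (installed)
        sigaction(SIGTERM, &old, NULL); /* restore the previous disposition */
    return installed ? terminated : -1;
}
```

A real parent would usually exit after forwarding and reaping. Note that SIGKILL cannot be caught, so no handler of this kind runs when the parent is killed with kill -9.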
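If you send the signal to a process group id (pgid) instead of a single pid (for example, kill -TERM -- -<pgid> from a shell), it is delivered to every process in that group. Children created via fork stay in the parent's group unless they change it, so this normally reaches the parent process and all its children.
To find the pgid, use ps a -o pgid,command.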
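From C, the same effect comes from passing a negative pid to kill(). This is a minimal sketch assuming a POSIX system; kill_group is a hypothetical helper. Unlike the default situation described above, it places the children in a fresh group of their own, so the calling process is not hit by the signal it sends.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

/* Hypothetical helper: fork `count` children into one new process group,
 * send SIGTERM to the whole group, and reap everyone. Returns the number
 * of children terminated by SIGTERM, or -1 on error. */
int kill_group(int count)
{
    pid_t pids[MAX_WORKERS];
    pid_t pgid = 0;
    int started = 0;

    if (count < 1 || count > MAX_WORKERS)
        return -1;

    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            /* pgid == 0 for the first child: it becomes the group leader.
             * Later children join the first child's group. */
            setpgid(0, pgid);
            for (;;)
                pause();
        }
        if (pgid == 0)
            pgid = pid;
        /* Set the group from the parent as well, so membership is settled
         * before we signal, whichever side runs first. */
        setpgid(pid, pgid);
        pids[started++] = pid;
    }
    if (started == 0)
        return -1;

    /* A negative pid addresses the process group with that id. */
    int sent = kill(-pgid, SIGTERM);

    int terminated = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (sent == -1)
            kill(pids[i], SIGKILL);     /* fallback so waitpid cannot hang */
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            terminated++;
    }
    return sent == -1 ? -1 : terminated;
}
```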

Why is a zombie process necessary?

Wikipedia gives basically all the information about zombie processes that I need to know, but only a single line on how they might be useful: a conflict in PIDs will not arise if the parent process creates another child process.
How is this then actually "useful"? Wouldn't the PID be then available if the named zombie process were to be removed instead of being kept there?
Or are there any other reasons as to why the zombie process should exist?
Zombie processes are actually really important and definitely need to exist. First it's important to understand how process creation works in Unix/Linux. The only way to create a new process is for an existing process to create a new child process via fork(). In this way, all of the processes on the system are arranged in a nice orderly tree hierarchy. Try running ps -Hu <your username> on a Linux system to see the hierarchy of processes that you own.
In many programs it is critically important for a parent process to be able to obtain basic information about its child processes that have exited. This basic information includes the exit status and resource usage of the child. When the parent is ready to get information about a dead child process it calls one of the wait() functions to wait for a child to exit and obtain exit status and resource usage info.
But what happens if a child process exits before the parent waits for it? This is where zombie processes become necessary. The operating system can't just discard the child process; the operation of the parent process may depend on knowing the exit status or resource usage of the child. For example, the parent process might need to know that the child exited abnormally, or it might be collecting CPU usage statistics for its children. So the only choice is to save that information and make it available to the parent when it finally does call wait(). This information is what a zombie process is, and it's a critical part of how process management works on Unix/Linux. Zombie processes guarantee that the parent can retrieve the exit status, accounting information, and process id of its child processes, regardless of whether the parent calls wait() before or after the child process exits.
This is why a zombie process is necessary.
Footnote: If the parent process never calls wait(), then the child process is reparented to the init process when the parent process dies, and init will wait() for the child.
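The "child exits first, parent asks later" case can be sketched as follows, assuming a POSIX.1-2008 system; collect_from_zombie is a hypothetical helper. It uses waitid() with WNOWAIT to block until the child has exited without reaping it, so the child is a zombie at that point, then checks that the pid is still valid and finally collects the saved exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: let a child exit, leave it as a zombie for a moment,
 * then collect its exit code. *pid_still_valid is set to 1 if the pid could
 * still be addressed while the child was a zombie. Returns the exit code,
 * or -1 on error. */
int collect_from_zombie(int exit_code, int *pid_still_valid)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0)
        _exit(exit_code);

    /* Wait until the child has exited but, because of WNOWAIT, leave it
     * in a waitable state: it stays a zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return -1;

    /* Signal 0 delivers nothing; it only checks that the pid exists.
     * The zombie still owns its pid, so this succeeds. */
    *pid_still_valid = (kill(pid, 0) == 0);

    /* Now really reap: the saved status is handed over and the pid freed. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The exit code survives the gap between the child's exit and the final waitpid() only because the kernel keeps the zombie entry around.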
The answer is on Wikipedia as well, which is:
This entry is still needed to allow the parent process to read its
child's exit status.
Zombie processes are useful.
Zombie processes allow the parent to be guaranteed to be able to retrieve exit status, accounting information, and process id of the child processes.
A process that doesn't clean up its child zombies isn't programmed properly.

Zombie Threads on POSIX systems

How do zombie threads get formed in C/C++, and what do you need to make sure to do in order to prevent them from being created? I know they're just normal threads that didn't terminate properly, but I'm a little hazy on the specifics.
A zombie thread is a joinable thread which has terminated, but which
hasn't been joined. Normally, either a thread should be joined at some
time, or it should be detached. Otherwise, the OS maintains its state
for some possible future join, which takes resources.
Do you mean pthreads or zombie processes? A zombie process (not thread) gets created when a parent doesn't reap its child. It exists because the OS keeps the return state of the process in case the parent needs it later. If the parent dies, the child is handed to the init process, which just sits and calls "wait" over and over again (reaping any children that die). So a zombie process can only persist while the parent is still alive and the child has terminated.
The same applies to pthreads. If you detach the thread, the OS will not keep its termination state around after it finishes (similar to processes).
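The two ways of avoiding a zombie thread, joining it or detaching it, can be sketched as below. This is a minimal illustration assuming POSIX threads (on some toolchains it must be built with -pthread); join_thread, start_detached, and worker are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

/* Trivial thread body: hand the argument back as the return value. */
static void *worker(void *arg)
{
    return arg;
}

/* Joinable thread: its return value is kept after it terminates until
 * someone calls pthread_join(), which also releases the thread's state.
 * Returns 0 on success, -1 on error. */
int join_thread(intptr_t value, intptr_t *result)
{
    pthread_t tid;
    void *ret;

    if (pthread_create(&tid, NULL, worker, (void *)value) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    *result = (intptr_t)ret;
    return 0;
}

/* Detached thread: nothing is kept for a later join, so its resources are
 * released as soon as it finishes. It must never be joined.
 * Returns 0 on success, -1 on error. */
int start_detached(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc == 0 ? 0 : -1;
}
```

An already running joinable thread can be switched to the second mode with pthread_detach(), which has the same effect as creating it detached.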

Difference between SIGKILL SIGTERM considering process tree

What is the difference between SIGTERM and SIGKILL when it comes to the process tree?
When a root thread receives SIGKILL does it get killed cleanly or does it leave its child threads as zombies?
Is there any signal which can be sent to a root thread to cleanly exit by not leaving any zombie threads ?
Thanks.
If you kill the root process (parent process), this should create orphan children, not zombie children. Orphans are created when a process's parent is killed, and the kernel makes init the parent of orphans. init is supposed to wait until an orphan dies, then use wait to clean it up.
Zombie children are created when a process (not its parent) ends and its parent does not collect its exit status from the process table.
It sounds to me like you are worried about leaving orphans, because when you kill a zombie's parent process, the zombie child is handed to init, which reaps it, so it disappears as well.
To kill your orphans, use kill -9, which sends SIGKILL.
Here is a more in-depth tutorial on killing subprocesses on Linux:
http://riccomini.name/posts/linux/2012-09-25-kill-subprocesses-linux-bash/
You can't control that by signal; only its parent process can control that, by calling waitpid() or setting signal handlers for SIGCHLD. See SIGCHLD and SA_NOCLDWAIT in the sigaction(2) manpage for details.
Also, what happens to child threads depends on the Linux kernel version. With 2.6's POSIX threads, killing the main thread should cause the other threads to exit cleanly. With 2.4 LinuxThreads, each thread is actually a separate process and SIGKILL doesn't give the root thread a chance to tell the others to shut down, whereas SIGTERM does.
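For child processes, the SA_NOCLDWAIT route mentioned in this answer can be sketched as follows, assuming a POSIX system; child_is_auto_reaped is a hypothetical helper. With the flag set on SIGCHLD, terminated children do not become zombies, and a later waitpid() has nothing to collect and fails with ECHILD.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a terminated child was discarded
 * automatically (waitpid fails with ECHILD), 0 if a status could still be
 * collected, and -1 on error. */
int child_is_auto_reaped(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDWAIT;     /* do not turn dead children into zombies */
    if (sigaction(SIGCHLD, &sa, &old) == -1)
        return -1;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);
    if (pid > 0) {
        int status;
        /* Blocks until the child is gone; with SA_NOCLDWAIT there is then
         * no status left to return. */
        pid_t r = waitpid(pid, &status, 0);
        if (r == -1 && errno == ECHILD)
            result = 1;
        else if (r == pid)
            result = 0;
    }

    sigaction(SIGCHLD, &old, NULL); /* restore the previous disposition */
    return result;
}
```

The price is the one discussed earlier in this thread: the exit status is lost, and the child's pid may be reused as soon as the child terminates.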

Resources