Impact of zombie processes on an embedded Linux [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
I'm developing a program (the grandparent process) that automatically relaunches a process (the parent process), which in turn starts two other processes (the child processes), in case of errors.
If one of the child processes misbehaves, the parent process tries to close the application gracefully and the grandparent process restarts everything. However, in case of a bug or unexpected behavior, the grandparent process:
kills the parent process (which kills all the children)
restarts the parent process
Probably due to a problem in my code, the parent processes survive as zombies, and sometimes I find my embedded Linux with 12 or 20 zombies. I know that zombies use very few resources (if I'm not mistaken, only their entry in the process table).
My question: is there a theoretical limit to the number of zombies?

My question: is there a theoretical limit to the number of zombies?
Yes. It is whatever the maximum size of the kernel's process table is. This will vary from kernel to kernel and according to adjustable kernel parameters, but it is likely to be at least in the thousands.
But as long as we're here, let's address a couple of other things:
the grandparent process [...] kills the parent process (which kills all the children)
Killing a process does not automatically kill its children. They will be inherited by process ID 1, which you can observe in the process list if the processes live long enough. Cleaning those up after they terminate is one of the responsibilities of process 1, which may be why you have the impression that you are killing the grandchildren -- you probably don't see them left behind as zombies.
If you want to forcibly kill the children along with their parent, then you should be able to do so by putting the parent process in its own process group, and killing the whole group. (You need a separate process group so that the grandparent does not kill itself.)
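The process-group approach can be sketched as follows. This is only an illustration in Python, whose os module wraps the underlying setpgid() and killpg() calls; the function names and the sleeping stand-in processes are hypothetical and are not taken from the asker's program.

```python
import os
import signal
import time


def spawn_parent_in_own_group(n_children=2):
    """Fork a stand-in 'parent' that leads a new process group and forks children."""
    pid = os.fork()
    if pid == 0:
        os.setpgid(0, 0)              # become leader of a new group (PGID == own PID)
        for _ in range(n_children):
            if os.fork() == 0:
                time.sleep(60)        # stand-in for a long-running child
                os._exit(0)
        time.sleep(60)                # stand-in for the parent's own work
        os._exit(0)
    try:
        os.setpgid(pid, pid)          # same call from this side closes the startup race
    except OSError:
        pass                          # the child already did it
    return pid


def kill_group_and_reap(pgid, sig=signal.SIGTERM):
    """Signal every process in the group, then reap the direct child."""
    os.killpg(pgid, sig)
    _, status = os.waitpid(pgid, 0)   # without this the parent would linger as a zombie
    return status


if __name__ == "__main__":
    parent = spawn_parent_in_own_group()
    time.sleep(0.5)                   # give the parent time to fork its children
    status = kill_group_and_reap(parent)
    print("parent terminated by signal", os.WTERMSIG(status))
```

Because the children are forked after the parent has moved into the new group, they inherit that group, so one killpg() reaches all of them while the grandparent, which stays in its original group, is unaffected.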
Probably due to a problem in my code, the parent processes survive as zombies
This happens when a process's parent continues to run but does not call wait(), waitpid(), or waitid() for the process after it terminates. In fact, that is what makes zombie processes so light: all they carry is the data that could be reported via one of those functions. Thus, "survive" is not a particularly apt description: a zombie process is no longer running; all that remains is some data about how it terminated.
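To see this in practice, here is a small Linux-only sketch (the helper names are made up for the illustration). It lets a child terminate, confirms through /proc that the child is in state Z while it has not been waited for, and then reaps it with waitpid().

```python
import os


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is parenthesised and may contain spaces,
    # so the state is located relative to the last ')'.
    return data[data.rindex(")") + 2]


def make_zombie_then_reap(exit_code=7):
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)           # the child terminates immediately
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)    # the child is now a zombie
    _, status = os.waitpid(pid, 0)    # reaping removes the process table entry
    gone = not os.path.exists(f"/proc/{pid}")
    return state_before, os.WEXITSTATUS(status), gone


if __name__ == "__main__":
    print(make_zombie_then_reap())
```

The only thing the zombie still holds between the two wait calls is what waitpid() eventually returns: its PID and its termination status.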

I believe the only negative effect of keeping zombie processes around is that they take up space in the kernel process table. The maximum number of zombies you can keep around is bounded by the maximum number of processes your kernel supports; the upper limit on PID values can be queried with cat /proc/sys/kernel/pid_max
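If you would rather check this from a program, the same value can be read directly. A minimal sketch, assuming a Linux /proc filesystem; the helper name is hypothetical, and threads-max is a second kernel setting that also caps the number of tasks:

```python
from pathlib import Path


def read_kernel_limit(name):
    """Return an integer setting from /proc/sys/kernel, or None if unavailable."""
    path = Path("/proc/sys/kernel") / name
    try:
        return int(path.read_text())
    except (OSError, ValueError):
        return None                   # not Linux, or the entry does not exist


if __name__ == "__main__":
    for name in ("pid_max", "threads-max"):
        print(name, "=", read_kernel_limit(name))
```

Whichever limit is hit first is the practical ceiling, and on a small embedded system available memory may become the constraint well before either number is reached.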

Related

Are zombie processes picked up by init eventually or not?

I am reading the book The Linux Programming Interface, and the following two paragraphs about zombie processes seem to conflict:
When the parent does perform a wait(), the kernel removes the zombie,
since the last remaining information about the child is no longer
required. On the other hand, if the parent terminates without doing a
wait(), then the init process adopts the child and automatically
performs a wait(), thus removing the zombie process from the system.
If a parent creates a child, but fails to perform a wait(), then an
entry for the zombie child will be maintained indefinitely in the
kernel’s process table. If a large number of such zombie children are
created, they will eventually fill the kernel process table,
preventing the creation of new processes.
The first paragraph says that even if the parent process terminates without calling wait(), init adopts its long-running child and removes it from the system once that child terminates.
The second paragraph says that if the parent process does not call wait(), its terminated children become zombies and occupy the kernel process table indefinitely.
I am confused: how can the situation in the second paragraph happen if init takes care of reaping zombie processes? In what situation does init fail to pick up a zombie?
[Update] Yes, I misread the text, thanks Barmar, these 2 paragraphs talk about different scenarios.

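The situation in the second paragraph happens while the parent is still alive. Init adopts a child only after that child's parent has terminated, so a zombie whose parent keeps running without calling wait() stays in the process table. Init does not sweep the system for zombies periodically; it reaps only the children it has adopted, by waiting on them.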
More can be found here
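The adoption step can be observed directly. In the sketch below (function name hypothetical), an intermediate parent forks a child and exits without waiting; the orphan then reports which process has become its new parent. That is usually PID 1, although a system may designate another reaper process, so the code does not assume a particular value.

```python
import os
import time


def orphan_new_parent():
    """Return (pid of the exited intermediate parent, new parent PID seen by the orphan)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                          # intermediate parent
        os.close(r)
        me = os.getpid()
        if os.fork() == 0:                # child that is about to be orphaned
            while os.getppid() == me:     # wait until the kernel has reparented us
                time.sleep(0.01)
            os.write(w, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)                       # terminate without calling wait()
    os.close(w)
    os.waitpid(pid, 0)                    # reap the intermediate parent ourselves
    with os.fdopen(r) as f:
        new_ppid = int(f.read())          # returns once the orphan has written and exited
    return pid, new_ppid


if __name__ == "__main__":
    old_parent, new_parent = orphan_new_parent()
    print("parent", old_parent, "exited; orphan was adopted by PID", new_parent)
```

Until the intermediate parent exits, the child's parent PID does not change, which is exactly the window in which an unreaped zombie can accumulate.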

why not reap child processes automatically?

I'm a beginner in operating systems and Linux, and I have a question about zombie processes.
I don't understand why parent processes need to reap child processes. Couldn't Linux be designed so that whenever a child process terminates, it is reaped automatically and immediately, without waiting for its parent? That would save programmers' time. Another question: why does a zombie process still consume system memory? It has already terminated, so shouldn't there be nothing left to maintain?
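For what it's worth, POSIX systems including Linux do offer opt-in automatic reaping: a parent that sets the disposition of SIGCHLD to SIG_IGN tells the kernel it will never ask for its children's exit status, and those children are then removed as soon as they terminate instead of becoming zombies. The price is that the status is discarded, which is the reason reaping is not automatic by default. A minimal sketch (the function name is made up for the illustration):

```python
import os
import signal


def child_is_reaped_automatically():
    """With SIGCHLD ignored, a terminated child leaves nothing behind to wait for."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)
        try:
            # Blocks until the child is gone, then fails because no status was kept.
            os.waitpid(pid, 0)
        except ChildProcessError:
            return True
        return False
    finally:
        # Restore the earlier disposition (fall back to the default if unknown).
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print("auto-reaped:", child_is_reaped_automatically())
```

A parent that does care how its children ended has to keep the default behaviour and call one of the wait functions itself.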

How does the Linux kernel decide the next thread ID

I have a question regarding Linux kernel scheduling.
We know that, usually, Linux keeps track of the current largest PID. When we start a new process, the kernel uses the next ID after it. So if we kill a process and start a new one, the new process does not get the old ID; Linux keeps using larger IDs until it hits a limit.
But my question is how Linux decides the thread ID.
Say processes A and B are running. Process A crashes while process B is spawning new threads. Will process B reuse the old TIDs that belonged to process A, or will it also take the next largest ID as the TID? Which case is more common? Is this documented anywhere?
Thanks.

The kernel sets a maximum number of process/thread IDs and simply recycles identifiers once the threads that held them have exited and been cleaned up. So if process B spawns enough threads, it will eventually reuse thread IDs that belonged to process A, assuming process A has been properly destroyed.
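On Linux, thread IDs and process IDs come out of the same number space, which is why a TID freed by one process can later turn up in another. The sketch below only shows what those IDs look like from inside a process; it uses Python's threading.get_native_id(), which returns the kernel-assigned thread ID, and the function name is hypothetical.

```python
import os
import threading


def native_ids_of_live_threads(n=3):
    """Start n threads that are all alive at the same time and collect their kernel IDs."""
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(n)    # no thread exits before all have recorded an ID

    def worker():
        with lock:
            ids.append(threading.get_native_id())
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    print("pid:", os.getpid(), "thread ids:", native_ids_of_live_threads())
```

On Linux the printed thread IDs are ordinary PID-like numbers, distinct from each other and from the process's own PID, which is the ID of its initial thread.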
Edit: Here are some links that can provide you with more specific answers
Difference between pid and tid
https://stackoverflow.com/a/8787888/5768168
"what is the value range of thread and process id?"
what is the value range of thread and process id?
"Linux PID recycling"
https://stackoverflow.com/a/11323428/5768168
"Process identifer"
https://en.wikipedia.org/wiki/Process_identifier#Unix-like
"The Linux kernel: Processes"
https://www.win.tue.nl/~aeb/linux/lk/lk-10.html

It sounds like you need to create your threads as joinable, that is, with an attribute object whose detach state is PTHREAD_CREATE_JOINABLE (the default) passed to pthread_create(), then have one reaper thread in your process dedicated to using pthread_join() or the GNU extension pthread_tryjoin_np() to wait for terminated threads. Rather than having an outside process try to sort it out, have your process record the PID/TID pair after pthread_create() succeeds and have the reaper thread remove the pair when it detects that the thread has terminated.
I typically combined that with a main thread that did nothing but spawn the thread-creation and reaper threads, then wait for a termination signal and terminate the thread-creator and reaper. The thread-creator stops immediately when signaled, the reaper stops when no more unterminated threads are running, and the main thread terminates when both the thread-creator and reaper threads can be pthread_join()'d. Since the main thread is so simple, it's unlikely to crash, which means most crashes in worker threads simply deliver them to the reaper. If you want absolute certainty, your outside process should be the one to start your main process; then it can use wait() or its siblings to monitor whether the main process has terminated (normally or by crashing).
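The shape of that design, transposed to Python, might look like the sketch below. It is only an analogue: Python threads are always joinable, the registry is keyed by a thread name rather than a PID/TID pair, and run_with_reaper and the other names are invented for the illustration.

```python
import queue
import threading


def run_with_reaper(jobs):
    """Run each job in its own thread; one reaper thread joins them as they finish."""
    registry = {}                     # name -> Thread, for threads not yet reaped
    lock = threading.Lock()           # guards the registry
    finished = queue.Queue()          # names of threads whose job has ended
    results = {}

    def worker(name, job):
        try:
            results[name] = job()
        except Exception as exc:      # a failing job is still handed to the reaper
            results[name] = exc
        finally:
            finished.put(name)

    def reaper(expected):
        for _ in range(expected):
            name = finished.get()     # blocks until some worker reports in
            with lock:
                thread = registry.pop(name)
            thread.join()

    reaper_thread = threading.Thread(target=reaper, args=(len(jobs),))
    reaper_thread.start()
    for i, job in enumerate(jobs):
        name = f"worker-{i}"
        thread = threading.Thread(target=worker, args=(name, job), name=name)
        with lock:
            registry[name] = thread   # record the thread before it can finish
        thread.start()
    reaper_thread.join()              # returns once every worker has been joined
    return results, registry


if __name__ == "__main__":
    print(run_with_reaper([lambda: 1, lambda: 1 // 0]))
```

After the call returns, an empty registry means every thread that was started has also been joined, which is the invariant the reaper thread exists to maintain.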

Proper way to use fork() and wait()

I have just started learning about fork and wait in Linux and came across this paragraph in the wait() manual page notes:
A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes. If a parent process terminates, then its "zombie" children (if any) are adopted by init(8), which automatically performs a wait to remove the zombies.
A question that came to mind after reading this:
Doesn't failing to call wait() waste resources until the parent terminates, a problem that gets worse when the parent process is meant to be a long-lived process in the system?
Does this mean I should always call wait() as soon as possible after calling fork()?

Doesn't failing to call wait() waste resources until the parent
terminates?
While a child process is running, no resources are wasted; it's still doing its task. The waste that your citation talks about arises only when a child has died but its parent hasn't reaped it yet, i.e. has not called wait() on the child process.
a problem that gets worse when the parent process is meant to be a
long-lived process in the system?
When your application runs for a very long time and keeps forking children, there's a chance that the system will run out of resources, either because many child processes are still running or because the parent did not reap the children that have exited. It's the job of the application to manage system resources sensibly and to reap child processes soon after they have finished.
Does this mean I should always call wait() as soon as possible after
calling fork()?
There's no simple "as early as possible" or "as late as possible" answer to this. For example, the parent process might want to carry on doing something useful while the child is still running rather than waiting. (It might even be unnecessary to check the children's status periodically with WNOHANG when the parent knows that the children have long tasks to finish.) In that case, waiting immediately after forking might not be what you want. In general, the parent should call wait() whenever it expects the child(ren) to have completed their tasks (or wants to know the status of the children). The responsibility lies with the programmer to code correctly and call wait() at the most appropriate time.
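For the case where the parent does want to check in from time to time, a non-blocking poll with WNOHANG might look like the sketch below. It is written in Python, whose os.waitpid() wraps the C call; the function name, the timings, and the exit code are arbitrary values chosen for the illustration.

```python
import os
import time


def run_child_and_poll(child_seconds=0.2, poll_interval=0.05):
    """Fork a child, keep working in the parent, and reap the child without blocking."""
    pid = os.fork()
    if pid == 0:
        time.sleep(child_seconds)     # stand-in for the child's task
        os._exit(3)
    polls = 0
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:           # the child has exited and is now reaped
            return polls, os.WEXITSTATUS(status)
        polls += 1                    # the parent is free to do useful work here
        time.sleep(poll_interval)


if __name__ == "__main__":
    print(run_child_and_poll())
```

With WNOHANG, waitpid() returns a PID of 0 while the child is still running, so the parent never stalls, and the child is reaped on the first poll after it exits.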

How to kill defunct process on Linux by shell script [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
On my server machines, some processes go into the defunct state every day. It affects my CPU usage. I need to write a shell script that kills the defunct processes and their parent processes.
For example, when I run the command:
ps -ef | grep defunct
it finds many entries. Of those, I need to kill only the "[chrome] <defunct>" processes.
Sample entry:
bitnami 12217 12111 0 Feb09 pts/3 00:00:00 [chrome] <defunct>
I need to kill this type of chrome entry. Can anyone suggest a sample script?

Defunct processes do not go away until the parent process collects the corpse or the parent dies. When the parent process dies, the defunct processes are inherited by PID 1 (classically it is PID 1; it is some system process designated with the job), and PID 1 is designed to wait for dead bodies and remove them from the process table. So, strictly, the defunct processes only go away when their parent collects the corpse; when the original parent dies, the new parent collects the corpse so the defunct process goes away at last.
So, either write the parent code so that it waits on its dead children, or kill the parent process.
Note that defunct processes occupy very few resources: basically, a slot in the process table and the resource (timing) information that the parent can ask for.
Having said that, last year I was working on a machine where there were 3 new defunct processes per minute, owned by a system process other than PID 1, that were not being harvested. Things like ps took a long, long, long time when the number of defunct processes climbed into the hundreds of thousands. (The solution was to install the correct fix pack for the o/s.) They are not completely harmless, but a few are not a major problem.
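Since a defunct process cannot itself be killed, the most a script can usefully do is find out which parent is failing to reap. Here is a Linux-only sketch that scans /proc for processes in state Z and reports each one's parent PID; it is in Python rather than shell, the function name is hypothetical, and deciding what to do with the parent is left to you.

```python
import os


def find_zombies(name=None):
    """Return [(pid, ppid, comm)] for processes in state Z, read from /proc (Linux only)."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue                  # the process vanished while we were scanning
        # Format: pid (comm) state ppid ...; comm may itself contain spaces or ')'.
        comm = data[data.index("(") + 1:data.rindex(")")]
        fields = data[data.rindex(")") + 2:].split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z" and (name is None or comm == name):
            zombies.append((int(entry), ppid, comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(f"zombie {pid} ({comm}) is waiting to be reaped by parent {ppid}")
```

Passing name="chrome" restricts the report to entries like the one in the question; the parent PIDs it prints are the processes that need to be fixed or terminated.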

It's already dead. The parent needs to reap it and then it will go away.
