I am trying to create a process using fork system call and then wait on the child process. I have used the following:
waitpid (pid, &status, 0);
1) The first problem is that the status is shifted 8 bits to the left. For example, if the child process returns 1, waitpid stores 256 in the status variable. Why does it do that?
2) According to the manual, waitpid waits for the child process to change state, but then it also says:
"The wait() system call suspends execution of the calling process until
one of its children terminates. The call wait(&status) is equivalent
to:
waitpid(-1, &status, 0);"
I am a bit confused about whether the waitpid and wait calls wait for a state change or for child process termination. Kindly clarify this point.
What does the zero in the third argument specify?
3) If I put the child process to sleep, doesn't its state change to a waiting state, e.g. while it sleeps for 5 seconds?
Following is my program:
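#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char ** argv)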
{
pid_t pid = fork();
pid_t ppp;
if (pid==0)
{
sleep(8);
printf ("\n I am the first child and my id is %d \n", getpid());
printf ("The first child process is now exiting\n\n");
exit (1);
}
else {
int status = 13;
printf ("\nI am now waiting for the child process %d\n", pid);
waitpid (pid, &status, 0);
printf ("\n the status returned by the exiting child is %d\n", status>>8);
}
printf("\nI am now exiting");
exit(0);
}
Thanks
The status parameter encodes more than just the exit code of the child. From man waitpid:
WIFEXITED(status)
returns true if the child terminated normally, that is, by calling exit(3) or _exit(2), or by returning from main().
WEXITSTATUS(status)
returns the exit status of the child. This consists of the least significant 8 bits of the status argument that the child specified in a call to exit(3) or _exit(2) or as the argument for a return statement in main(). This macro should only be employed if WIFEXITED returned true.
man waitpid also explains what the third parameter (options) does:
The value of options is an OR of zero or more of the following constants:
WNOHANG
return immediately if no child has exited.
WUNTRACED
also return if a child has stopped (but not traced via ptrace(2)). Status for traced children which have stopped is provided even if this option is not specified.
WCONTINUED (since Linux 2.6.10)
also return if a stopped child has been resumed by delivery of SIGCONT.
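To make the effect of the options argument concrete, here is a minimal sketch that polls with WNOHANG instead of blocking with 0. The helper name poll_child and the delays are assumptions made for illustration; they are not part of the question's program.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` after a short
 * delay, then poll for it with WNOHANG instead of blocking.
 * Returns the child's exit code, or -1 on error. *polls counts how often
 * waitpid() returned 0, meaning "no status available yet". */
int poll_child(int code, int *polls)
{
    struct timespec child_delay = { 0, 200L * 1000 * 1000 }; /* 200 ms */
    struct timespec poll_delay  = { 0, 10L * 1000 * 1000 };  /* 10 ms */
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {                      /* child */
        nanosleep(&child_delay, NULL);
        _exit(code);
    }

    *polls = 0;
    for (;;) {                           /* parent */
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == -1)
            return -1;                   /* error, see errno */
        if (r == pid)
            break;                       /* status is now available */
        ++*polls;                        /* r == 0: child still running */
        nanosleep(&poll_delay, NULL);
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With options set to 0, the same waitpid call would simply block until the child terminates, which is what the program in the question does.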
State change is very precisely and narrowly defined. From man waitpid:
A state change is considered to be: the child terminated; the child was stopped by a signal; or the child was resumed by a signal.
Going to sleep is not a state change. Being stopped by SIGSTOP/SIGTSTP is.
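The difference between "stopped" and "sleeping" can be sketched as follows. The helper name stop_is_reported is hypothetical; the sketch assumes a POSIX system where SIGSTOP and WUNTRACED behave as described in the man page excerpts above.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child that stops itself with SIGSTOP
 * is reported as stopped by waitpid(..., WUNTRACED) and later exits
 * normally, 0 if not, and -1 on error. */
int stop_is_reported(void)
{
    int status;
    int stopped;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);   /* the child stops here: a state change */
        _exit(0);         /* reached only after SIGCONT */
    }

    /* WUNTRACED makes waitpid() return for a stopped child as well. */
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return -1;
    stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;

    /* Resume the child and collect its termination status. */
    if (kill(pid, SIGCONT) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return stopped && WIFEXITED(status);
}
```

A child that merely calls sleep() would not make the first waitpid() return; with options 0, only termination does.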
These are from the documentation on waitpid()
https://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html
About the pid parameter
If pid is greater than 0, it specifies the process ID of a single child process for which status is requested.
About the return value
If wait() or waitpid() returns because the status of a child process is available, these functions shall return a value equal to the process ID of the child process for which status is reported.
From this, I expect that if I pass a positive int (pid) I should get the same number as the return value.
I am observing that in some cases when I call waitpid() with a positive integer representing a pid and no option flags (the third parameter is 0), the call returns another integer which is not equal to the pid passed in (although it is close in value).
So, when does a waitpid() call return a value different from the pid (>0) passed?
Edit 1:
The call looks like this: waitpid(pid, &status, 0)
pid is an int (>0)
status is an int
In some of the runs the value of PID passed and returned are different:
(In the below lines a!=b where a is the return value, b is the PID passed to waitpid() )
2019-11-21, 09:38:11.573007 [info] - WAITPID CALL: 16881!= 16890
2019-11-21, 09:38:11.573007 [info] - WAITPID CALL: 16881!= 16890
2019-11-21, 09:39:11.149065 [info] - WAITPID CALL: 17087!= 17096
2019-11-21, 09:41:11.570292 [info] - WAITPID CALL: 17433!= 17442
2019-11-21, 09:43:11.373236 [info] - WAITPID CALL: 17761!= 17770
This behavior is not consistent but happens once in a few runs.
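For reference, going by the specification quoted above, a call with pid > 0 and options 0 is documented to return either that pid or -1 on error. The sketch below only shows one way to check the return value explicitly; it does not reproduce or explain the mismatch in the log. The names wait_for and returned_pid_matches are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper: wait for exactly `pid`, retrying if a signal
 * interrupts the call. Returns what waitpid() finally returned:
 * `pid` on success, or -1 on error with errno set (e.g. ECHILD). */
pid_t wait_for(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);
    return r;
}

/* Hypothetical demo: fork one child and report whether the value returned
 * by wait_for() equals the pid that fork() gave us (1 = yes, 0 = no). */
int returned_pid_matches(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);
    return wait_for(pid, &status) == pid;
}
```

Logging both the return value and errno at the call site makes it easier to tell a -1 error return apart from a genuine pid.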
I'm a C programmer learning about fork(), exec(), and wait() for the first time. I'm also whiteboarding a Standard C program which will run on Linux and potentially need a lot of child processes. What I can't gauge is... how many child processes are too many for one parent to spawn and then wait upon?
Suppose my code looked like this:
pid_t status[ LARGENUMBER ];
status[0] = fork();
if( status[0] == 0 )
{
// I am the child
exec("./newCode01.c");
}
status[1] = fork();
if( status[1] == 0 )
{
// child
exec("./newCode02.c");
}
...etc...
wait(status[0]);
wait(status[1]);
...and so on....
Obviously, the larger LARGENUMBER is, the greater the chance that the parent is still fork() ing while children are segfaulting or becoming zombies or whatever.
So this implementation seems problematic to me. As I understand it, the parent can only wait() for one child at a time? What if LARGENUMBER is huge, and the time gap between running status[0] = fork(); and wait(status[0]); is substantial? What if the child has run, become a zombie, and been terminated by the OS somehow in that time? Will the parent then wait(status[0]) forever?
In the above example, there must be some standard or guideline to how big LARGENUMBER can be. Or is my approach all wrong?
#define LARGENUMBER 1
#define LARGENUMBER 10
#define LARGENUMBER 100
#define LARGENUMBER 1000
#define LARGENUMBER ???
I want to play with this, but my instinct is to ask for advice before I invest the development time into a program which may or may not turn out to be infeasible. Any advice/experience is appreciated.
If you read the documentation of wait, you would know that
If status information is available prior to the call to wait(), return will be immediate.
That means, if the child has already terminated, wait() will return immediately.
The OS will not remove the information from the process table until you have called wait¹ for the child process or your program exits:
If a parent process terminates without waiting for all of its child processes to terminate, the remaining child processes will be assigned a new parent process ID corresponding to an implementation-dependent system process.
Of course you still can't spawn an unlimited number of children; for more detail on that, see "Maximum number of children processes on Linux" (that covers Linux; other operating systems impose other limits).
¹: https://en.wikipedia.org/wiki/Zombie_process
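A minimal sketch of this behaviour, assuming the default SIGCHLD disposition (if SIGCHLD is ignored, children may be reaped automatically): the child exits at once, the parent deliberately waits before calling waitpid(), and the exit status is still there. The helper name reap_late is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child exits immediately with `code`; the parent
 * sleeps `delay_s` seconds before calling waitpid().
 * Returns the exit code collected late, or -1 on error. */
int reap_late(int code, unsigned delay_s)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);

    sleep(delay_s);   /* the child is a zombie for most of this time */

    /* The status is already available, so this returns immediately. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```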
I will try my best to explain.
First, a bad example: you fork() one child process, then wait for it to finish before forking another. This serializes the children, so there is no parallelism and CPU utilization is poor.
pid = fork();
if (pid == -1) { ... } // handle error
else if (pid == 0) { execv(...); } // child
else { // pid > 0: parent
wait(NULL); // blocks until the first child has finished
pid = fork();
if (pid == -1) { ... } // handle error
else if (pid == 0) { execv(...); } // child
else { wait(NULL); } // parent
}
How should it be done?
In this approach, you first create both child processes, then wait. This increases CPU utilization and the degree of multiprocessing.
pid1 = fork();
if (pid1 == -1) { ... } // handle error
if (pid1 == 0) {execv(...);}
pid2 = fork();
if (pid2 == -1) { ... } // handle error
if (pid2 == 0) {execv(...);}
if (pid1 > 0) {wait(NULL); }
if (pid2 > 0) {wait(NULL); }
NOTE:
Even though the parent blocks in the first wait before the second wait is executed, both children are already running; neither is waiting to be spawned or to call execv.
In your case, you are using the second approach: first fork all processes and save the return values of fork, then wait.
the parent can only wait() for one child at a time?
The parent can wait for all its children, one at a time, whether they have already finished and become zombie processes or are still running. For a more detailed explanation, look here.
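As a rough sketch of that pattern: fork every child first, then call wait() once per child that was started. The helper name spawn_and_reap is hypothetical, and the children here exit immediately instead of calling execv.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork up to `n` children that exit immediately, then
 * reap them with wait() in whatever order they finish.
 * Returns the number of children reaped (less than n if fork() failed). */
int spawn_and_reap(int n)
{
    int started = 0;
    int reaped = 0;

    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid == -1)
            break;              /* e.g. EAGAIN: a process limit was hit */
        if (pid == 0)
            _exit(0);           /* child: a real program would exec here */
        ++started;
    }

    while (reaped < started) {
        if (wait(NULL) == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            break;              /* ECHILD: no children left */
        }
        ++reaped;
    }
    return reaped;
}
```

Children that finished long before the parent reaches the second loop are simply collected without blocking.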
How many child processes can a parent spawn before becoming infeasible?
It might be OS dependent, but one acceptable approach is to split the time slice given to a process in two: half for the child process and half for the parent process.
That way processes cannot exhaust the system, or cheat by creating child processes that together run for longer than the OS intended to give the parent process in the first place.
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
int main(void)
{
pid_t pids[10];
int i;
for (i = 9; i >= 0; --i) {
pids[i] = fork();
if (pids[i] == 0) {
printf("Child%d\n",i);
sleep(i+1);
_exit(0);
}
}
for (i = 9; i >= 0; --i){
printf("parent%d\n",i);
waitpid(pids[i], NULL, 0);
}
return 0;
}
What is happening here? How is sleep() getting executed in the for loop? When is it getting called? Here is the output:
parent9
Child3
Child4
Child2
Child5
Child1
Child6
Child0
Child7
Child8
Child9 //there is a pause here
parent8
parent7
parent6
parent5
parent4
parent3
parent2
parent1
parent0
Please explain this output. I am not able to understand how it's working.
Step by step analysis would be great.
In the first loop, the original (parent) process forks 10 copies of itself. Each of these child processes (detected by the fact that fork() returned zero) prints a message, sleeps, and exits. All of the children are created at essentially the same time (since the parent is doing very little in the loop), so it's somewhat random when each of them gets scheduled for the first time - thus the scrambled order of their messages.
During the loop, an array of child process IDs is built. There is a copy of the pids[] array in all 11 processes, but only in the parent is it complete - the copy in each child will be missing the lower-numbered child PIDs, and have zero for its own PID. (Not that this really matters, as only the parent process actually uses this array.)
The second loop executes only in the parent process (each child calls _exit() inside the first loop, so it never reaches this point), and waits for each child to exit. It waits first for the child that slept 10 seconds; all the others have long since exited by then, so all of the messages (except the first) appear in quick succession. There is no possibility of random ordering here, since it's driven by a loop in a single process. Note that the first parent message actually appeared before any of the children's messages: the parent was able to continue into the second loop before any of the child processes were able to start. This again is just the random behavior of the process scheduler; the "parent9" message could have appeared anywhere in the sequence prior to "parent8".
I see that you've tagged this question with "zombie-processes". Child0 through Child8 spend one or more seconds in this state, between the time they exit and the time the parent does a waitpid() on them. The parent was already waiting on Child9 before it exited, so that one process spent essentially no time as a zombie.
This example code might be a bit more illuminating if there were two messages in each loop - one before and one after the sleep/waitpid. It would also be instructive to reverse the order of the second loop (or equivalently, to change the first loop to sleep(10-i)).
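One possible shape for the first suggestion is sketched below. The function name run_demo and its parameters are assumptions for illustration; with n = 10 and unit_ms = 1000 the timing matches the original program, while smaller values make the run shorter.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_CHILDREN 10

/* Hypothetical variant of the program above: child i sleeps
 * (i + 1) * unit_ms milliseconds, and both loops print one message before
 * and one after the sleep/waitpid.
 * Returns the number of children reaped, or -1 on error. */
int run_demo(int n, long unit_ms)
{
    pid_t pids[MAX_CHILDREN];
    int reaped = 0;

    if (n < 1 || n > MAX_CHILDREN)
        return -1;

    for (int i = n - 1; i >= 0; --i) {
        fflush(stdout);          /* do not copy buffered output into the child */
        pids[i] = fork();
        if (pids[i] == -1)
            return -1;
        if (pids[i] == 0) {
            long ms = (i + 1) * unit_ms;
            struct timespec d = { ms / 1000, (ms % 1000) * 1000000L };
            printf("Child%d: going to sleep\n", i);
            fflush(stdout);
            nanosleep(&d, NULL);
            printf("Child%d: exiting\n", i);
            fflush(stdout);      /* _exit() does not flush stdio buffers */
            _exit(0);
        }
    }

    for (int i = n - 1; i >= 0; --i) {
        printf("parent: waiting for Child%d\n", i);
        fflush(stdout);
        if (waitpid(pids[i], NULL, 0) == pids[i])
            ++reaped;
        printf("parent: reaped Child%d\n", i);
    }
    fflush(stdout);
    return reaped;
}
```

The "Child: exiting" and "parent: reaped" lines show how long each child sat as a zombie before the parent collected it.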
The value of status is not returned correctly from the child to the parent process.
#include<stdio.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/wait.h>
#include<stdlib.h>
#include<string.h>
#define BUF_SIZE 200
int main(void){
pid_t pid;
int status=6;
char buf[BUF_SIZE];
pid=fork();
if(pid){
sprintf(buf,"Value in parent process is %d\n",status);
write(1,buf,strlen(buf));
wait(&status);
sprintf(buf,"Value returned from child process is %d\n",status);
write(1,buf,strlen(buf));
}
else if(pid==0){
status++;
sprintf(buf,"Returning %d..\n",status);
write(1,buf,strlen(buf));
exit(status);
}
return 0;
}
The output of the code is:
Value in parent process is 6
Returning 7..
Value returned from child process is 1792
Where is 1792 coming from? Why is this value not 7?
Because the man page continues...
If status is not NULL, wait() and waitpid() store status information in
the int to which it points. This integer can be inspected with the
following macros (which take the integer itself as an argument, not a
pointer to it, as is done in wait() and waitpid()!):
WIFEXITED(status)
returns true if the child terminated normally, that is, by
calling exit(3) or _exit(2), or by returning from main().
WEXITSTATUS(status)
returns the exit status of the child. This consists of the
least significant 8 bits of the status argument that the child
specified in a call to exit(3) or _exit(2) or as the argument
for a return statement in main(). This macro should be employed
only if WIFEXITED returned true.
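Applied to the program in the question, this means the raw value must be passed through the macros before printing. A minimal sketch follows; the helper names raw_status_of_exit and exit_code_of are hypothetical. The value 1792 equals 7 << 8, which matches the output shown in the question, but the exact layout of the raw status is an implementation detail, so portable code should rely only on the macros.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls _exit(code) and return the
 * raw status word filled in by waitpid(), or -1 on error. */
int raw_status_of_exit(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}

/* Hypothetical helper: extract the exit code from a raw status word, or
 * return -1 if the child did not terminate normally. */
int exit_code_of(int status)
{
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the question's parent branch, printing WEXITSTATUS(status) after checking WIFEXITED(status) would show 7 instead of 1792.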
I'm on Linux and using Perl. I created a thread and forked a child process in this new thread. After the parent in the new thread has returned and joined the main thread, I would like to send a TERM signal to the child process spawned in the created thread, but the signal handler doesn't work, and the child process becomes a zombie. Here's my code:
use strict;
use warnings;
use Thread 'async';
use POSIX;
my $thrd = async {
my $pid = fork();
if ($pid == 0) {
$SIG{TERM} = \&child_exit;
`echo $$ > 1`;
for (1..5) {
print "in child process: cycle $_\n";
sleep 2;
}
exit(0);
}
else {
$SIG{CHLD} = \&reaper;
}
};
$thrd->detach();
sleep 4;
my $cpid = `cat 1`;
kill "TERM", $cpid;
while (1) {}
sub child_exit {
print "child $$ exits!\n";
exit(0);
}
sub reaper {
my $pid;
while (($pid = waitpid(-1, &WNOHANG)) > 0) {
print "reaping child process $pid\n";
}
}
Any suggestions about how to successfully and safely send signal in this situation?
Why are you saying that the SIGTERM handler does not work? Because the child becomes a zombie?
All child processes become zombies unless you wait for them. Put waitpid($pid, 0); after the kill(). Unless you see "in child process: cycle 5" in your printout, the kill and the signal handler are working just fine.
Note that it's super shaky to use a hardcoded file to communicate between your forked process and your main process. I'd recommend you use a pipe.
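The kill-then-wait sequence described above can be sketched in C (rather than Perl) as follows. The helper name term_and_reap is hypothetical, and unlike the Perl code the child here keeps the default SIGTERM action instead of installing a handler, which avoids a race between installing the handler and the parent's kill().

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child that only waits for signals, send it
 * SIGTERM, then reap it so that it does not stay a zombie.
 * Returns the number of the signal that terminated the child, 0 if the
 * child exited normally, or -1 on error. */
int term_and_reap(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();   /* the default SIGTERM action terminates the child */
    }

    if (kill(pid, SIGTERM) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)   /* without this, a zombie remains */
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```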
Edit:
With regard to your signal handler not being called, I think this is a Perl bug. Perl sends signals only to the main thread. When you fork(), your thread becomes the main thread, but I think Perl does not realize that. You can work around this, though, by re-forwarding the signal to yourself.
Before you create the threads, just add:
sub sigforwarder {
threads->self()->kill(shift);
}
$SIG{TERM} = \&sigforwarder;
That should fix your problem.
I think the problem is that child_exit does not perform any exit on the parent thread; it just exits the child. I don't know exactly what's happening in Perl (I just stumbled upon a fork/exec construction in C), but I would try to catch SIGCHLD in the parent process to detect the child's termination.
(the message "child $$ exits" is output in the child's 1, so it will not be visible on screen, right?)
EDIT: I just tried your example and I think I got it: You are performing a fork in the child process and check for the pid to be 0 (which is the parent async-Process). You might just check for != 0.