I am forking a child, and trying to kill it.
pid_t *child_pid;
int main(){
child_pid = mmap(NULL, sizeof(pid_t), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
int a = fork();
if (a != 0) {
printf("child # %d\n", a);
*child_pid = a;
system("sleep 100");
} else {
sleep(1);
printf("Trying to kill %d\n", *child_pid);
int ret = kill(*child_pid,SIGKILL);
printf("killled with %d\n", ret);
}
}
However, the kill command gets stuck at:
child # 4752
Trying to kill 4752
In the meantime, calling ps shows this:
4752 pts/4 00:00:00 simple <defunct>
You are killing yourself. fork() returns 0 if you are in the forked process, or the child process id (PID) in the 'master' process.
So, the upper branch of your if() clause is performed in the master process, where you copy the child's process ID (stored in a) to child_pid.
In the lower branch you are in the child process, where you read child_pid, which holds your own PID, and then happily kill() yourself. That's why the 'killled with ...' line is never printed.
As paxdiablo pointed out, since this is a child process it will remain a zombie until you fetch the exit status with wait() or the master process exits.
BTW, I'm not sure what you want to do with this code:
If you want to exit the child process gracefully you can just do an exit().
If you want to kill your child process, keep track of the child PID (returned by fork()) and then kill() it from your master process.
If you want to kill the master process from your child (odd as that may sound) be careful, as that may take the child process(es) with it. You have to detach the child process from the master process (see the manual page for daemon()).
The child process is dead, it's just that the process entry will hang around until someone collects the exit code.
If the parent doesn't do it, it will eventually be inherited by init, which will reap it at some point.
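For the second option above (killing the child from the master process), a minimal sketch could look like the following. The function name kill_and_reap_child is made up for illustration and the child simply idles in pause(). The waitpid() call is what removes the <defunct> entry.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, kill it from the parent, then reap it.
 * Returns the number of the signal that terminated the child, or -1 on error. */
int kill_and_reap_child(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: idle until a signal arrives. */
        for (;;)
            pause();
    }

    /* Parent: fork() already returned the child's PID, so no shared
     * memory is needed to pass it around. */
    if (kill(pid, SIGKILL) == -1)
        return -1;

    /* Collect the exit status; until this call the child stays a zombie. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Called from main(), this should return SIGKILL, and ps should no longer list the child afterwards.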
I'm thinking about some tool that can pause the program at start.
For example, my_bin starts running at once.
$ ./my_bin
With this tool
$ magic_tool ./my_bin
my_bin will start. I can get the PID. Then I can start the actual running later.
I've just tested my suggestion in the comments and it worked! This is the code in my magic_tool.c:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
int main (int argc, char *argv[])
{
pid_t pid;
printf("Executing %s to wrap %s.\n", argv[0], argv[1]);
pid = fork();
if (pid == -1)
return -1;
if (pid == 0) {
raise(SIGSTOP);
execl(argv[1], argv[1], (char *)NULL); /* pass the program path as its argv[0] */
} else {
printf("PID == %d\n", pid);
}
return 0;
}
I wrote another test program target.c:
#include <stdio.h>
int main ()
{
puts("It works!\n");
return 0;
}
Running ./magic_tool ./target printed a PID and returned to shell. Only after running kill -SIGCONT <printed_pid> was It works! printed. You'll probably want to have PID saved somewhere else and also perform some checks in the magic_tool, but I think this is nonetheless a good proof of concept.
EDIT:
I was playing around with this a bit more and for some reason it didn't always work (see why below). The solution is simple - just follow a proper fork off and die pattern a bit more closely in magic_tool.c:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
int main (int argc, char *argv[])
{
pid_t pid;
printf("Executing %s to wrap %s.\n", argv[0], argv[1]);
pid = fork();
if (pid == -1)
return -1;
if (pid == 0) {
setsid();
pid = fork();
if (pid == -1)
return -1;
if (pid == 0) {
raise(SIGSTOP);
if (execl(argv[1], argv[1], (char *)NULL)) /* pass the program path as its argv[0] */
return -1;
}
printf("PID == %d\n", pid);
}
return 0;
}
I found an explanation in this answer:
When you start the root process from your shell, it is a process group leader, and its descendants are members of that group. When that leader terminates, the process group is orphaned. When the system detects a newly-orphaned process group in which any member is stopped, then every member of the process group is sent a SIGHUP followed by a SIGCONT.
So, some of your descendant processes are still stopped when the leader terminates, and thus everyone receives a SIGHUP followed by a SIGCONT, which for practical purposes means they die of SIGHUP.
Exactly which descendants are still stopped (or even just merrily advancing toward exit()) is a timing race.
The answer also links to IEEE Std 1003.1-2017 _Exit entry which contains more details on the matter.
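One of the checks mentioned earlier could be confirming that the child has really stopped before anything sends it SIGCONT. Here is a minimal sketch of that, assuming the wrapper stays alive instead of exiting (so the orphaned process group case does not arise). The name stop_then_continue and the exit code 7 are made up for illustration, and _exit(7) stands in for the exec of the real program.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the child's exit code, or -1 on error. */
int stop_then_continue(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        raise(SIGSTOP);  /* park here until SIGCONT arrives */
        _exit(7);        /* stand-in for exec'ing the wrapped program */
    }

    int status;
    /* WUNTRACED makes waitpid() report a stopped child as well. */
    if (waitpid(pid, &status, WUNTRACED) == -1 || !WIFSTOPPED(status))
        return -1;

    /* The child is now known to be stopped, so SIGCONT cannot arrive early. */
    if (kill(pid, SIGCONT) == -1)
        return -1;

    /* Wait for the resumed child to finish. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent waits for the stop notification first, the ordering between SIGSTOP and SIGCONT no longer depends on timing.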
This is essentially the same idea as @gst's, but done entirely in the shell. You spawn a subshell (this forks and creates a new PID) and have the subshell send itself SIGSTOP. When the subshell receives SIGCONT and resumes, it execs the intended program (this replaces the subshell with the intended program without creating a new PID). So that the main shell can continue doing other things, the subshell should run in the background with &.
In a nutshell:
(kill -STOP $BASHPID && exec ./my_bin) &
subpid=$! # get the pid of above subshell
... do something else ...
kill -CONT $subpid # resume
Another idea, which doesn't suffer from the race condition between the main process sending SIGCONT and the subshell SIGSTOP-ing itself, is to use a file descriptor to implement the wait instead:
exec {PIPEFD}<> <(:) # set PIPEFD to the file descriptor of an anonymous pipe
(read -u $PIPEFD && exec ./my_bin) &
subpid=$! # get the pid of above subshell
... do something else ...
echo >&$PIPEFD # resume
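The same file descriptor idea can be sketched in C with pipe() and fork(). This is only an illustration under stated assumptions: gate_child_on_pipe is a made-up name, and _exit(42) stands in for the exec of ./my_bin.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the child's exit code, or -1 on error. */
int gate_child_on_pipe(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        char byte;
        close(fds[1]);                       /* the child only reads */
        ssize_t n = read(fds[0], &byte, 1);  /* blocks until the parent writes */
        _exit(n == 1 ? 42 : 1);              /* 42: stand-in for the exec */
    }

    close(fds[0]);                           /* the parent only writes */
    /* ... do something else ... */
    ssize_t written = write(fds[1], "x", 1); /* resume the child */
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) == -1 || written != 1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The byte stays in the pipe until the child reads it, so it does not matter whether the parent writes before or after the child reaches read().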
I have this simple test:
int main() {
int res = fork();
if (res == 0) { // child
printf("Son running now, pid = %d\n", getpid());
}
else { // parent
printf("Parent running now, pid = %d\n", getpid());
wait(NULL);
}
return 0;
}
When I run it a hundred times, i.e. run this command,
for ((i=0;i<100;i++)); do echo ${i}:; ./test; done
I get:
0:
Parent running now, pid = 1775
Son running now, pid = 1776
1:
Parent running now, pid = 1777
Son running now, pid = 1778
2:
Parent running now, pid = 1779
Son running now, pid = 1780
and so on; whereas when I first write to a file and then read the file, i.e. run this command,
for ((i=0;i<100;i++)); do echo ${i}:; ./test; done > forout
cat forout
I get it flipped! That is,
0:
Son running now, pid = 1776
Parent running now, pid = 1775
1:
Son running now, pid = 1778
Parent running now, pid = 1777
2:
Son running now, pid = 1780
Parent running now, pid = 1779
I know about the scheduler. What does this result not mean, in terms of who runs first after forking?
The forking function, do_fork() (at kernel/fork.c) ends with setting the need_resched flag to 1, with the comment by kernel developers saying, "let the child process run first."
I guessed that this has something to do with the buffers that the printf writes to.
Also, is it true to say that the output redirection (>) writes everything to a buffer first and only then copies to the file? And even so, why would this change the order of the prints?
Note: I am running the test on a single-core virtual machine with a Linux kernel v2.4.14.
Thank you for your time.
When you redirect, glibc detects that stdout is not a tty and switches it from line buffering to full buffering for efficiency. The buffer is therefore not written until it fills up or the process exits. You can see this with e.g.:
#include <stdio.h>
#include <unistd.h>
int main() {
printf("hello world\n");
sleep(60);
return 0;
}
When you run it interactively, it prints "hello world" and waits. When you redirect to a file, you will see that nothing is written for 60 seconds:
$ ./foo > file & tail -f file
(no output for 60 seconds)
Since your parent process waits for the child, it will necessarily always exit last, and therefore flush its output last.
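The flipped order can be reproduced inside a single program. This is a minimal sketch that assumes a fully buffered temporary file standing in for the redirected stdout. The name child_line_comes_first is made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child's line lands in the file
 * first, 0 if not, -1 on error. */
int child_line_comes_first(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    setvbuf(f, NULL, _IOFBF, BUFSIZ);  /* fully buffered, like a redirected stdout */

    pid_t pid = fork();
    if (pid == -1) {
        fclose(f);
        return -1;
    }

    if (pid == 0) {
        fprintf(f, "child\n");
        exit(0);                 /* exit() flushes the child's stdio buffer */
    }

    fprintf(f, "parent\n");      /* stays in the parent's buffer for now */
    waitpid(pid, NULL, 0);       /* the child has exited and flushed by here */
    fflush(f);                   /* the parent's line is written last */

    char first[16] = "";
    rewind(f);
    if (fgets(first, sizeof first, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return strcmp(first, "child\n") == 0;
}
```

Calling fflush(stdout) right after each printf in the original test (or switching stdout to line buffering with setvbuf) should make the file show the order in which the lines were actually printed.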
I created a program that does the following:
1. fork
2. From the son, call execvp with csh. The csh started by execvp runs a script a.sh that prints in an infinite loop.
My problem is that I can't stop or kill the process from the father (using kill(SIGKILL,pid) from the father process didn't work).
I think that the problem is in execvp.
When I print the pid from the script (echo $BASHPID) I get a different pid from the one that I get before the execvp. I know the pid after execvp is supposed to remain the same, but it seems like it doesn't.
Here is the problematic code:
int ExeExternal(char* args[MAX_ARG], char* cmdString, int* fg_pid, char** L_Fg_Cmd){
//int child_status;
int pID;
pID = fork();
switch(pID) {
case -1: //error
perror("fork");
return 1;
case 0 : // Child Process
setpgrp();
printf("son getpid() : %d",getpid());
fflush(stdout);
char* argument_for_cshs[5];
char* cmd="csh";
argument_for_cshs[0]="csh";
argument_for_cshs[1]="-f";
argument_for_cshs[2]="-c";
argument_for_cshs[3]=cmdString;
argument_for_cshs[4]=NULL;
execvp(argument_for_cshs[0],argument_for_cshs);
//if return => there is a problem
any solution?
In the terminal, I executed a main parent process which will fork a child process. In both the parent and child processes I implemented a SIGINT signal handler.
So when I press "ctrl+c", will both the handlers be called at the same time? Or do I need to call the child process's signal handler explicitly in the parent process's handler?
I looked up this post:
How does Ctrl-C terminate a child process?
which says that "The SIGINT signal is generated by the terminal line discipline, and broadcast to all processes in the terminal's foreground process group". I just didn't quite understand what "foreground process group" means.
Thanks,
In both the parent and child processes I implemented a SIGINT signal
handler. So when I press "ctrl+c", will both the handlers be called at
the same time?
Yes, they both will receive SIGINT.
Or do I need to call the child process's signal handler explicitly in
the parent process's handler?
"Calling" another process's signal handler doesn't make sense. If both processes have a handler installed, then each handler will be called once its process receives SIGINT.
I just didn't quite understand what does "foreground process group"
means.
Typically, the process group that currently owns the controlling terminal (the one allowed to read from it, and the one that receives keyboard-generated signals such as SIGINT) is called the foreground process group, and its members are foreground processes. When you start a process from the command line without &, it's a foreground process:
E.g.
$ ./script.sh # foreground process
$ ./script.sh & # background process
I suggest you read about tty and The TTY demystified for a detailed explanation.
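To see which side of that distinction a process is on, it can compare its own process group with the terminal's foreground group. This is a minimal sketch, assuming the controlling terminal is reachable through /dev/tty. The name in_foreground_group is made up for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the caller is in the foreground
 * process group of its controlling terminal, 0 if it is in a background
 * group, and -1 if there is no controlling terminal (or on error). */
int in_foreground_group(void)
{
    int fd = open("/dev/tty", O_RDONLY);  /* the controlling terminal, if any */
    if (fd == -1)
        return -1;

    pid_t fg = tcgetpgrp(fd);             /* the terminal's foreground group ID */
    close(fd);
    if (fg == (pid_t)-1)
        return -1;

    return fg == getpgrp();               /* compare with our own group ID */
}
```

Started from an interactive shell, a program calling this should typically get 1 when run normally and 0 when run with &.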
setpgid POSIX C process group minimal example
This illustrates how the signal does get sent to the child, if the child didn't change its process group with setpgid.
setpgid.c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
volatile sig_atomic_t is_child = 0;
void signal_handler(int sig) {
char parent_str[] = "sigint parent\n";
char child_str[] = "sigint child\n";
signal(sig, signal_handler);
if (sig == SIGINT) {
if (is_child) {
write(STDOUT_FILENO, child_str, sizeof(child_str) - 1);
} else {
write(STDOUT_FILENO, parent_str, sizeof(parent_str) - 1);
}
}
}
int main(int argc, char **argv) {
pid_t pid, pgid;
(void)argv;
signal(SIGINT, signal_handler);
signal(SIGUSR1, signal_handler);
pid = fork();
assert(pid != -1);
if (pid == 0) {
/* Change the pgid.
* The new one is guaranteed to be different than the previous, which was equal to the parent's,
* because `man setpgid` says:
* > the child has its own unique process ID, and this PID does not match
* > the ID of any existing process group (setpgid(2)) or session.
*/
is_child = 1;
if (argc > 1) {
setpgid(0, 0);
}
printf("child pid, pgid = %ju, %ju\n", (uintmax_t)getpid(), (uintmax_t)getpgid(0));
assert(kill(getppid(), SIGUSR1) == 0);
while (1);
exit(EXIT_SUCCESS);
}
/* Wait until the child sends a SIGUSR1. */
pause();
pgid = getpgid(0);
printf("parent pid, pgid = %ju, %ju\n", (uintmax_t)getpid(), (uintmax_t)pgid);
/* man kill explains that negative first argument means to send a signal to a process group. */
kill(-pgid, SIGINT);
while (1);
}
Compile with:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -Wpedantic -o setpgid setpgid.c
Run without setpgid
Without any CLI arguments, setpgid is not done:
./setpgid
Possible outcome:
child pid, pgid = 28250, 28249
parent pid, pgid = 28249, 28249
sigint parent
sigint child
and the program hangs.
As we can see, the pgid of both processes is the same, as it gets inherited across fork.
Then whenever you hit:
Ctrl + C
It outputs again:
sigint parent
sigint child
This shows how:
to send a signal to an entire process group with kill(-pgid, SIGINT)
Ctrl + C on the terminal sends SIGINT to the entire foreground process group by default
Quit the program by sending a different signal to both processes, e.g. SIGQUIT with Ctrl + \.
Run with setpgid
If you run with an argument, e.g.:
./setpgid 1
then the child changes its pgid, and now only a single sigint gets printed every time from the parent only:
child pid, pgid = 16470, 16470
parent pid, pgid = 16469, 16469
sigint parent
And now, whenever you hit:
Ctrl + C
again only the parent receives the signal:
sigint parent
You can still kill the parent as before with a SIGQUIT:
Ctrl + \
however the child now has a different PGID, and does not receive that signal! This can be seen from:
ps aux | grep setpgid
You will have to kill it explicitly with:
kill -9 16470
This makes it clear why process groups exist: otherwise we would get a bunch of processes left over to be cleaned up manually all the time.
Tested on Ubuntu 18.04.
This happens if parent crashes after cloning child process, but before sending the unblocking byte with SendContinueSignalToChild(). In this case pipe file handle remains opened and child stays infinitely blocked on read(...) within WaitForContinueSignal(). After the crash, child is adopted by init process.
Steps to reproduce:
1. Simulate parent crash in google_breakpad::ExceptionHandler::GenerateDump(CrashContext *context):
...
const pid_t child = sys_clone(
ThreadEntry, stack, CLONE_FILES | CLONE_FS | CLONE_UNTRACED, &thread_arg, NULL, NULL, NULL);
int r, status;
// Allow the child to ptrace us
sys_prctl(PR_SET_PTRACER, child, 0, 0, 0);
int *ptr = 0;
*ptr = 42; // <------- Crash here
SendContinueSignalToChild();
...
2. Send one of the handled signals to the parent (e.g. SIGSEGV), so that the above GenerateDump(...) method is invoked.
3. Observe that the parent exits but the child still exists, blocked in WaitForContinueSignal().
Output for the above steps:
dmytro@db:~$ ./test &
[1] 25050
dmytro@db:~$ Test: started
dmytro@db:~$ ps aflxw | grep test
0 1000 25050 18923 20 0 40712 2680 - R pts/37 0:13 | | \_ ./test
0 1000 25054 18923 20 0 6136 856 pipe_w S+ pts/37 0:00 | | \_ grep --color=auto test
dmytro@db:~$ kill -11 25050
[1]+ Segmentation fault (core dumped) ./test
dmytro@db:~$ ps aflxw | grep test
0 1000 25058 18923 20 0 6136 852 pipe_w S+ pts/37 0:00 | | \_ grep --color=auto test
1 1000 25055 1687 20 0 40732 356 pipe_w S pts/37 0:00 \_ ./test
1687 is the init pid.
In the real world the crash happens in a thread parallel to the one that handles signal.
NOTE: the issue can also happen because of normal program termination (i.e. exit(0) is called in a parallel thread).
Tested on Linux 3.3.8-2.2., mips and i686 platforms.
So, my 2 questions:
Is it the expected behavior for the breakpad library to keep the child alive? My expectation is that the child should exit immediately after the parent crashes/exits.
If it is not the expected behavior, what is the best way to terminate the child after the parent crashes/exits?
Thanks in advance!
Any clue on possible solution?
This can also happen during a crash at shutdown, if the crashed thread is not the main one and the parent process exits from main() in exactly this time slot, so it is apparently not as unlikely as it seems at first glance.
At the moment, I think this happens because of the CLONE_FILES flag passed to clone(): the child shares the parent's file descriptor table, so read() on the pipe in the child does not return EOF when the parent process quits.
I have not yet examined whether we can safely get rid of this flag in the clone() call.
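For comparison, here is how the pipe behaves when the child has its own copy of the descriptor table, as after a plain fork(). This is a minimal sketch, not breakpad code: child_sees_eof is a made-up name, and closing the parent's descriptors stands in for the parent crashing or exiting. It says nothing about whether breakpad can actually drop CLONE_FILES.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the child's read() saw EOF after the
 * last write end was closed, 1 if it saw anything else, -1 on error. */
int child_sees_eof(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        char byte;
        close(fds[1]);   /* drop the child's own copy of the write end */
        /* Blocks until a byte arrives or every write end is closed. */
        ssize_t n = read(fds[0], &byte, 1);
        _exit(n == 0 ? 0 : 1);
    }

    /* Stand-in for the parent dying before it sends the continue byte:
     * the kernel closes its descriptors, including the last write end. */
    close(fds[0]);
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a shared descriptor table the write end can never be "the parent's only", because the blocked child keeps it open itself, which would match the pipe_w state seen in the ps output.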