Issues with SIGCONT and waitpid() in Linux

I'll try to keep it simple. I'm currently implementing a shell for Linux. I use a linked-list structure, "job_list", to store all background processes. If a background process terminates, it is removed from the list. If a background process is suspended, its state in the list is changed from BACKGROUND to STOPPED. If the process is resumed (through a SIGCONT signal), its state in the list should change back to BACKGROUND.
My problem is the following: when I send a SIGSTOP signal to a process, section //1 is executed and the change of state is registered in the list. However, when I resume that same process with a SIGCONT signal, WIFCONTINUED(status) returns false and WIFEXITED(status) returns true. Consequently, section //3 is executed and the process is removed from the list.
What could be wrong? Thanks in advance.
void sigchld_handler (){
block_SIGCHLD();
job *item;
int l_size = list_size(job_list);
int i, new_pid, pid_wait, status, info;
enum status status_res;
for (i = 1; i <= l_size; i++){
item = get_item_bypos(job_list, i);
new_pid = item->pgid;
pid_wait = waitpid(new_pid, &status, WUNTRACED | WNOHANG);
if (WIFSTOPPED(status)){
//1
printf("****SUSPENDED\n");
item->state = STOPPED;
}else if (WIFCONTINUED(status)){
//2
printf("****CONTINUED\n");
item->state = BACKGROUND;
}else if (WIFEXITED(status)){
//3
printf("****EXITED\n");
l_size--;
i--;
delete_job(job_list, item);
}
}
print_job_list(job_list);
unblock_SIGCHLD();
}

You appear to be missing the WCONTINUED flag in your call to waitpid().
From the waitpid specification:
pid_t waitpid(pid_t pid, int *stat_loc, int options);
The options argument is constructed from the bitwise-inclusive OR of zero or more of the following flags, defined in the <sys/wait.h> header:
WCONTINUED
The waitpid() function shall report the status of any continued child process specified by pid whose status has not been reported since it continued from a job control stop.
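A minimal sketch of the corrected call follows. The helper wait_child_event() and the enum child_event are hypothetical names used only for illustration; they are not part of the original shell. Besides adding WCONTINUED, the sketch checks waitpid()'s return value: with WNOHANG, waitpid() returns 0 when the child has not changed state, and in that case status is not filled in, so the WIF* macros should not be consulted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical event codes for illustration; not from the original shell. */
enum child_event { EV_NONE, EV_STOPPED, EV_CONTINUED, EV_EXITED, EV_KILLED };

/*
 * Report one state change of the child `pid`.
 * `extra` is 0 for a blocking wait, or WNOHANG for a poll as in the
 * SIGCHLD handler above.
 */
enum child_event wait_child_event(pid_t pid, int extra)
{
    int status = 0;
    pid_t ret;

    assert(pid > 0);   /* pid <= 0 would select a group or any child */
    ret = waitpid(pid, &status, WUNTRACED | WCONTINUED | extra);

    if (ret <= 0)                 /* 0: no change yet (WNOHANG); -1: error */
        return EV_NONE;
    if (WIFSTOPPED(status))
        return EV_STOPPED;        /* section //1 */
    if (WIFCONTINUED(status))
        return EV_CONTINUED;      /* section //2, reported only with WCONTINUED */
    if (WIFEXITED(status))
        return EV_EXITED;         /* section //3 */
    if (WIFSIGNALED(status))
        return EV_KILLED;         /* terminated by a signal */
    return EV_NONE;
}
```

In the handler, the job would be removed only for EV_EXITED or EV_KILLED, and left untouched for EV_NONE.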

Related

How do I kill linux spawnProcess when the main process suddenly dies?

I have come across a problem with my application and the spawnProcess.
If the main application for some reason dies/is killed then the spawned processes seem to live on and I can't reach them unless I use terminal to kill them via their PIDs. My goal is if the main application dies then the spawned processes should be killed also, somehow.
My code is like this
auto appPid = spawnProcess("path/to/process");
scope(exit){ auto exitcode = wait(appPid);
stderr.writeln(...);}
And if I use the same approach when the main process dies, using wait(thisProcessID) I get an error. "No overload matches". Any ideas how to solve this problem?
Here's some code that will do it on Linux. It doesn't have all the features of the stdlib's spawnProcess, it just shows the bare basics, but expanding it from here isn't hard if you need more.
import core.sys.posix.unistd;
version(linux) {
// this function is Linux-specific
import core.stdc.config;
import core.sys.posix.signal;
// we can tell the kernel to send our child process a signal
// when the parent dies...
extern(C) int prctl(int, c_ulong, c_ulong, c_ulong, c_ulong);
// the constant I pulled out of the C headers
enum PR_SET_PDEATHSIG = 1;
}
pid_t mySpawnProcess(string process) {
if(auto pid = fork()) {
// this branch is the parent, it can return the child pid
// you can:
// import core.sys.posix.sys.wait;
// waitpid(this_ret_value, &status, 0);
// if you want the parent to wait for the child to die
return pid;
} else {
// child
// first, tell it to terminate when the parent dies
prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);
// then, exec our process
char*[2] args;
char[255] buffer;
// gotta copy the string into another buffer
// so we zero terminate it and have a C style char**...
buffer[0 .. process.length] = process[];
buffer[process.length] = 0;
args[0] = buffer.ptr;
// then call exec to run the new program
execve(args[0], args.ptr, null);
assert(0); // never reached
}
}
void main() {
mySpawnProcess("/usr/bin/cat");
// parent process sleeps for one second, then exits
usleep(1_000_000);
}
So the lower level functions need to be used, but Linux does have a function that does what you need.
Of course, since it sends a signal, your child might want to handle that to close more gracefully than the default termination, but try this program and run ps while it sleeps to see cat running, then notice the cat dies when the parent exits.
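For comparison, here is a rough C sketch of the same Linux-only technique. The function name spawn_with_pdeathsig() is made up for this example. It also re-checks getppid() after arming the signal, on the assumption that the parent could already have exited between fork() and prctl(), in which case no signal would ever arrive. Note that, per prctl(2), the "parent" is the thread that created the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, arm the parent-death signal in the child,
 * then exec argv[0]. Returns the child's PID, or -1 if fork() fails.
 */
pid_t spawn_with_pdeathsig(char *const argv[], int death_signal)
{
    pid_t parent, pid;

    assert(argv != NULL && argv[0] != NULL);
    parent = getpid();
    pid = fork();
    if (pid != 0)
        return pid;                       /* parent, or -1 on error */

    /* Child: ask the kernel to signal us when the parent dies. */
    if (prctl(PR_SET_PDEATHSIG, (unsigned long)death_signal) == -1)
        _exit(127);
    /* If the parent is already gone, the signal will never be sent. */
    if (getppid() != parent)
        _exit(127);

    execv(argv[0], argv);
    _exit(127);                           /* reached only if exec failed */
}
```

The parent can later call waitpid() on the returned PID if it wants the child's exit status.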

Ptrace parent process

I'm trying to monitor/redirect syscalls in my own process. LD_PRELOAD doesn't work when fwrite calls write inside libc, and got/plt hooks seem to have the same problem. I'm looking for a solution based on ptrace, but I can't fork() and run the main app as a child because the app communicates with its parent via signals.
There is a thread from 2006 that suggests the tracer can be on a thread group that's different from the tracee, but it doesn't seem to work in practice: http://yarchive.net/comp/linux/ptrace_self_attach.html
pid = fork();
if (pid == 0) {
prctl(PR_SET_PTRACER, getppid());
raise(SIGSTOP);
} else {
sleep(1);
ptrace(PTRACE_SEIZE, pid, NULL, NULL);
for (;;) {
int status;
int ret = waitpid(pid, &status, 0);
warn("wait=%d:", ret);
ret = ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
warn("ptrace=%d:", ret);
}
}
The problem I'm facing is that ptrace(PTRACE_SYSCALL) expects the tracee to be in ptrace-wait state, i.e. it must have raised SIGSTOP and the tracer needs to wait() for it. Since the relation is inverted in this case (the tracer is the child of the tracee), PTRACE_SYSCALL fails with ESRCH.
How does strace get away with tracing an existing pid?
I'm a bit unclear on what exactly you're asking here. It sounds like you have the attaching part resolved (which is the most difficult problem to resolve). If that is the case, then getting the process to stop is not a problem. Just send the process a signal. The process will stop and send you a TRAP so you can decide what to do with the signal. At this point you can call ptrace(PTRACE_SYSCALL, pid, 0, 0). This will both start it in SYSCALL trace mode and prevent your signal from arriving at the debuggee (thus not introducing unexpected signals into the process).

How to keep track of child processes

So my program spawns a number of child processes in response to certain events, and I'm doing something like this to keep track of them and kill them upon program exit (Perl syntax):
my %children = ();
# this will be called upon exit
sub kill_children {
kill 'INT' => keys %children;
exit;
}
# main code
while(1) {
...
my $child = fork();
if ($child > 0) {
$children{$child} = 1;
} elsif ($child == 0) {
# do child work ...
exit();
} else {
# handle the error
}
}
So the idea is as above. However, there's a blatant race condition there, in that a given child can start and terminate before the parent has a chance to run and record its pid in the %children hash. So the parent may end up thinking that a given pid belongs to an active child, even if this child has terminated.
Is there a way to do what I'm trying to accomplish in a safe way?
Edit: To better keep track of children, the code can be extended as follows (which, however, suffers from the exact same race condition, which is why I didn't write it out fully in the first place):
use POSIX ":sys_wait_h"; # for WNOHANG
my %children = ();
sub reap {
my $child;
while (($child = waitpid(-1, WNOHANG)) > 0) {
#print "collecting dead child $child\n";
delete $children{$child};
}
}
$SIG{CHLD} = \&reap;
# this will be called upon exit
sub kill_children {
local $SIG{CHLD} = 'IGNORE';
kill 'INT' => keys %children;
exit;
}
# main code
while(1) {
...
my $child = fork();
if ($child > 0) {
$children{$child} = 1;
} elsif ($child == 0) {
# do child work ...
exit();
} else {
# handle the error
}
}
Even in this case, the contents of %children may not reflect the actual active children.
Edit 2: I found this question, which is exactly about the same problem. I like the solution suggested in there.
On UNIX it's not a race condition. This is the standard way to handle fork(). When the child process exits, its status is changed to "terminated"; it becomes a zombie. It still has an entry in the process table until the parent process calls one of the wait functions. Only after that is the dead process really removed.
Even if the parent sets itself up to ignore SIGCHLD, it still wouldn't qualify as a race condition; the parent would just hold a PID that is no longer valid. In that case, wait() would fail with ECHILD. But ignoring SIGCHLD lets the system free a child's PID as soon as the child exits, possibly leading to the parent trying to kill a process that is not its child.
On Windows, which doesn't have a fork call, it is emulated by creating a thread in the perl process. See perlfork. I'm not knowledgeable enough about Windows to say whether that could cause a race condition, but I suspect not.
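The zombie behaviour described for UNIX can be checked with a small C sketch. It assumes SIGCHLD is not being ignored in the calling process, and the function name zombie_holds_pid() is invented for this example. waitid() with WNOWAIT waits for the child to exit without reaping it, and kill() with signal 0 only tests whether the PID still exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if an exited but unreaped child still owns its PID and the
 * PID disappears only after waitpid(); returns 0 otherwise.
 */
int zombie_holds_pid(void)
{
    siginfo_t info;
    int status, present_before, gone_after;
    pid_t pid = fork();

    if (pid < 0)
        return 0;
    if (pid == 0)
        _exit(0);                 /* child terminates immediately */

    /* Block until the child has exited, but leave it unreaped. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return 0;
    assert(info.si_pid == pid);

    /* Signal 0 sends nothing; it only checks that the PID exists. */
    present_before = (kill(pid, 0) == 0);

    /* Reaping the child is what finally releases the PID. */
    if (waitpid(pid, &status, 0) != pid)
        return 0;

    gone_after = (kill(pid, 0) == -1 && errno == ESRCH);
    return present_before && gone_after;
}
```

So a PID recorded after fork() stays reserved for that child until the parent waits for it, even if the child has already exited.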

SIGINT signal re-install in linux

I am writing a program dealing with Linux signals. To be more specific, I want to re-install the SIGINT handler in the child process, only to find that it doesn't work.
Here is a simpler version of my code:
void handler(int sig){
//do something
exit(0);
}
void handler2(int sig){
//do something
exit(0);
}
int main(){
signal(SIGINT, handler);
if ((pid = fork()) == 0) {
signal(SIGINT, handler2); // re-install signal SIGINT
// do something that takes some time
printf("In child process:\n");
execve("foo", argv, environ); // foo is an executable in the local dir
exit(0);
}else{
int status;
waitpid(pid, &status, 0); // block, waiting for the child process to exit
}
return 0;
}
When the shell is printing "In child process:", I press Ctrl+C. I find that the function handler is executed without problem, but handler2 is never executed.
Could you help me with this bug in my code?
Update:
I want the child process to receive the SIGINT signal while foo is running. Is that possible?
It is not a bug: calling execve replaces the running binary image. The function handler2() (like every other function of your binary) is no longer mapped into the process's memory once it has been replaced by the image of "foo", and therefore every signal that had a handler is reset to its default disposition (ignored signals stay ignored).
If you wish the signal handler to be active while "foo" runs, you have to:
make sure the handler function is mapped into the memory of "foo"
make sure the signal handler is registered after "foo" starts.
One way to do this is to create a shared library that contains the signal handler and an init function, defined as a constructor, that registers said signal handler. You then force the library into the "foo" process by manipulating the environment under which you execve foo (the environ variable) to include
LD_PRELOAD=/path/to/shared_library.so
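A minimal sketch of such a library source might look like this. The names preload_sigint_seen, preload_sigint_handler and preload_install_handler are placeholders, and __attribute__((constructor)) is a GCC/Clang extension rather than standard C. The handler here only sets a flag; the real "do something" would go in its place.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; volatile sig_atomic_t is safe to write from one. */
volatile sig_atomic_t preload_sigint_seen = 0;

static void preload_sigint_handler(int sig)
{
    (void)sig;
    preload_sigint_seen = 1;      /* placeholder for the real work */
}

/* Runs when the library is loaded, i.e. before main() of the host program. */
__attribute__((constructor))
static void preload_install_handler(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = preload_sigint_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGINT, &sa, NULL);
    assert(rc == 0);
    (void)rc;                     /* silence unused warning under NDEBUG */
}
```

It would typically be built with something like gcc -shared -fPIC -o shared_library.so on this file. If "foo" installs its own SIGINT handler later, that one replaces the preloaded handler.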
@gby's answer gives comprehensive background knowledge. Here is another solution that does not need a shared library.
Every time a child process stops or terminates, the parent process receives SIGCHLD. You can handle this SIGCHLD signal to find out whether the child process was terminated by SIGINT. In your handler:
int status;
pid_t pid = waitpid(-1, &status, WNOHANG);
You can get the status of the child process through the waitpid function.
if(WIFSIGNALED(status) && (pid == child_pid)){
if(WTERMSIG(status) == SIGINT){
// now you know your foo has received SIGINT.
// do whatever you like.
}
}

Explicitly invoke SIG_DFL/SIG_IGN handlers on Linux

I've blocked, and then waited for a signal via the following code:
sigset_t set;
sigfillset(&set); // all signals
sigprocmask(SIG_SETMASK, &set, NULL); // block all signals
siginfo_t info;
int signum = sigwaitinfo(&set, &info); // wait for next signal
struct sigaction act;
sigaction(signum, NULL, &act); // get the current handler for the signal
act.sa_handler(signum); // invoke it
The last line generates a segmentation fault, as the handler is set to SIG_DFL (defined as 0). How can I manually invoke the default handler if it's set to SIG_DFL or SIG_IGN? Also note that SIG_IGN is defined as 1.
As you discovered you cannot invoke SIG_DFL and SIG_IGN per se. However, you can more-or-less mimic their behavior.
Briefly, imitating normal signal disposition would be:
quite easy for user-defined sa_handlers
easy enough for SIG_IGN, with the caveat that you'd need to waitpid() in the case of CHLD
straightforward but unpleasant for SIG_DFL, re-raising to let the kernel do its magic.
Does this do what you want?
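#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h> /* waitpid(), WNOHANG, WUNTRACED */
#include <unistd.h> /* getpid() */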
/* Manually dispose of a signal, mimicking the behavior of current
* signal dispositions as best we can. We won't cause EINTR, for
* instance.
*
* FIXME: save and restore errno around the SIG_DFL logic and
* SIG_IGN/CHLD logic.
*/
void dispatch_signal(const int signo) {
int stop = 0;
sigset_t oset;
struct sigaction curact;
sigaction(signo, NULL, &curact);
/* SIG_IGN => noop or soak up child term/stop signals (for CHLD) */
if (SIG_IGN == curact.sa_handler) {
if (SIGCHLD == signo) {
int status;
while (waitpid(-1, &status, WNOHANG|WUNTRACED) > 0) {;}
}
return;
}
/* user defined => invoke it */
if (SIG_DFL != curact.sa_handler) {
curact.sa_handler(signo);
return;
}
/* SIG_DFL => let kernel handle it (mostly).
*
* We handle noop signals ourselves -- "Ign" and "Cont", which we
* can never intercept while stopped.
*/
if (SIGURG == signo || SIGWINCH == signo || SIGCONT == signo) return;
/* Unblock CONT if this is a "Stop" signal, so that we may later be
* woken up.
*/
stop = (SIGTSTP == signo || SIGTTIN == signo || SIGTTOU == signo);
if (stop) {
sigset_t sig_cont;
sigemptyset(&sig_cont);
sigaddset(&sig_cont, SIGCONT);
sigprocmask(SIG_UNBLOCK, &sig_cont, &oset);
}
/* Re-raise, letting the kernel do the work:
* - Set exit codes and corefiles for "Term" and "Core"
* - Halt us and signal WUNTRACED'ing parents for "Stop"
* - Do the right thing if we forgot to handle any special
* signals or signals yet to be introduced
*/
kill(getpid(), signo);
/* Re-block CONT, if needed */
if (stop) sigprocmask(SIG_SETMASK, &oset, NULL);
}
UPDATE
(in response to OP's excellent questions)
1: does this slot in after the sigwaitinfo?
Yes. Something like:
... block signals ...
signo = sigwaitinfo(&set, &info);
dispatch_signal(signo);
2: Why not raise those signals handled by SIG_IGN, they'll be ignored anyway
It's slightly more efficient to noop in userspace than by three syscalls (re-raise, unmask, re-mask). Moreover, CHLD has special semantics when SIG_IGNored.
3: Why treat SIGCHLD specially?
Originally (check answer edits) I didn't -- re-raised it in the SIG_IGN case,
because IGNored CHLD signals tell the kernel to automatically reap children.
However, I changed it because "natural" CHLD signals carry information about
the terminated process (at least PID, status, and real UID).
User-generated CHLD signals don't carry the same semantics, and, in my testing,
IGNoring them doesn't cause 2.6 to autoreap queued zombies whose SIGCHLD
was "missed." So, I do it myself.
4: Why are "stop" related signals unblocking CONT. Will not invoking the default handler for CONT unstop the process?
If we're stopped (not executing) and CONT is blocked, we will never receive the
signal to wake us up!
5: Why not call raise instead of the kill line you've given?
Personal preference; raise() would work, too.
I see 2 mistakes in your code :
1) You should reverse the last two lines like this :
act.sa_handler(signum);
sigaction(signum, NULL, &act);
2) You must assign a handler function to the sa_handler field instead of an int. The prototype of the function should look like this:
/**
*somewhere in your code
*/
void handler (int signal){ /*your code*/}
/**
*
*/
act.sa_handler = handler;
If you want the default handler to be invoked, you should set the field sa_handler to SIG_DFL and it should work.
I'm not aware of the way to do it.
The only suggestion I have is to look at man 7 signal and manually perform the action listed in the table there. Ign is nothing. Core is a call to abort(). Term is _exit().
Of course you can also set signal handler back to SIG_DFL and then kill(getpid(),THE_SIG) (or its equivalent raise(THE_SIG)). (I personally do not like raise because on some systems it might produce some messages on stderr.)
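A rough sketch of that last suggestion follows, with made-up function names run_default_action() and demo_default_action(). It assumes the signal is still blocked from the sigwaitinfo() setup in the question, so it also unblocks the signal after re-raising it; a blocked signal would otherwise just stay pending. The demo runs in a forked child so that the caller survives a terminating default action.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Restore SIG_DFL for signo, re-raise it, and unblock it so that the
 * kernel applies the default action to the calling process. */
void run_default_action(int signo)
{
    struct sigaction dfl;
    sigset_t one;

    assert(signo > 0);
    dfl.sa_handler = SIG_DFL;
    dfl.sa_flags = 0;
    sigemptyset(&dfl.sa_mask);
    sigaction(signo, &dfl, NULL);

    sigemptyset(&one);
    sigaddset(&one, signo);
    raise(signo);                           /* stays pending while blocked */
    sigprocmask(SIG_UNBLOCK, &one, NULL);   /* delivered before this returns */
}

/* Hypothetical demo: a child consumes signo with sigwaitinfo(), then hands
 * it back to the kernel. Returns the child's wait status, or -1 on error. */
int demo_default_action(int signo)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        sigset_t all, one;
        siginfo_t info;

        sigfillset(&all);
        sigprocmask(SIG_SETMASK, &all, NULL);   /* block everything */
        sigemptyset(&one);
        sigaddset(&one, signo);
        raise(signo);                           /* queue one for ourselves */
        if (sigwaitinfo(&one, &info) == signo)
            run_default_action(signo);
        _exit(0);   /* reached only if the default action did not kill us */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

With SIGTERM the child should be reported as killed by SIGTERM; with a default-ignored signal such as SIGURG it should simply exit with status 0.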
