Can I set the process group of an existing process? - linux

I have a bunch of mini-server processes running. They're in the same process group as a FastCGI server I need to stop. The FastCGI server will kill everything in its process group, but I need those mini-servers to keep running.
Can I change the process group of a running, non-child process (they're children of PID 1)? setpgid() fails with "No such process" though I'm positive it's there.
This is on Fedora Core 10.
NOTE: the processes are already running. New servers call setsid(); these are servers spawned by older code that did not.

One thing you could try is to do setsid() in the miniservers. That will make them session and process group leaders.
Also, keep in mind that you can't change the process group ID to one from another session, and that the call to change a process's group must be made either by that process itself or by its parent.
I've recently written some test code to periodically change the process group of a set of processes for a very similar task. You need not change the group ID periodically; I only did that because I thought I might evade a certain script that periodically checked for groups running longer than a certain amount of time. It may also help you track down the error that you get with setpgid():
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>
void err(const char *msg);
void prn(const char *msg);
void mydaemon(void);
int main(void) {
    mydaemon();
    if (setsid() < 0)
        err("setsid");
    int secs = 5*60;
    /* create a pipe for the group leader to send changed
       group ids to the child */
    int pidx[2];
    if (pipe(pidx))
        err("pipe");
    fcntl(pidx[0], F_SETFL, O_NONBLOCK);
    fcntl(pidx[1], F_SETFL, O_NONBLOCK);
    prn("begin");
    /* fork a child; it stands in for the set of
       processes that need to have their group ids changed */
    pid_t child = fork();
    switch (child) {
    case -1: err("fork3");   /* err() exits, so there is no fall-through */
    case 0:
        close(pidx[1]);
        while (1) {
            sleep(7);
            secs -= 7;
            if (secs <= 0) { prn("end child"); exit(0); }
            pid_t pid;
            /* read the new group id if one is available */
            if (read(pidx[0], &pid, sizeof pid) != sizeof pid) continue;
            /* join the new process group */
            if (setpgid(getpid(), pid)) err("setpgid2");
            prn("child group changed");
        }
    default: break;
    }
    close(pidx[0]);
    /* the group leader forks every 20 seconds so that
       a new process group can be sent to the child via the pipe */
    while (1) {
        sleep(20);
        secs -= 20;
        pid_t pid = fork();
        switch (pid) {
        case -1: err("fork2");
        case 0:
            pid = getpid();
            /* make this process the leader of a new process group */
            if (setpgid(pid, pid)) err("setpgid1");
            /* tell the child about the change */
            if (write(pidx[1], &pid, sizeof pid) != sizeof pid) err("write");
            prn("group leader changed");
            break;
        default:
            close(pidx[1]);
            _exit(0);
        }
        if (secs <= 0) { prn("end leader"); exit(0); }
    }
}
void prn(const char *msg) {
    char buf[256];
    /* snprintf() truncates instead of overflowing buf */
    int n = snprintf(buf, sizeof buf, "%s\n", msg);
    if (n < 0)
        return;
    if ((size_t)n >= sizeof buf)
        n = sizeof buf - 1;
    write(2, buf, (size_t)n);
}
void err(const char *msg) {
    char buf[256];
    snprintf(buf, sizeof buf, "%s: %s", msg, strerror(errno));
    prn(buf);
    exit(1);
}
void mydaemon(void) {
    pid_t pid = fork();
    switch (pid) {
    case -1: err("fork");
    case 0: break;
    default: _exit(0);
    }
    close(0);
    close(1);
    /* close(2); let's keep stderr */
}

After some research I figured it out. Inshalla identified the essential problem, "you can't change the process group id to one from another session", which explains why my setpgid() was failing (with a misleading message). However, it seems you can change it from any other process in the group (not necessarily the parent).
These processes were started by a FastCGI server that was still running in the same process group, hence the problem: I couldn't restart the FastCGI server without killing the servers it had spawned. I wrote a new CGI program that called setpgid() on the running servers, executed it through a web request, and the problem was solved!

It sounds like you actually want to daemonise the process rather than move it to another process group. (Note: one can move a process between process groups, but I believe you need to be in the same session and the target process group must already exist.)
But first, see if daemonising works for you:
#include <unistd.h>
#include <stdio.h>
int main() {
    if (fork() == 0) {
        setsid();
        if (fork() == 0) {
            printf("I'm still running! pid:%d\n", getpid());
            fflush(stdout);   /* _exit() below does not flush stdio buffers */
            sleep(10);
        }
        _exit(0);
    }
    return 0;
}
Obviously you should actually check for errors and such in real code, but the above should work.
The inner process will continue running even when the main process exits. Looking at the status of the inner process from /proc we find that it is, indeed, a child of init:
Name: a.out
State: S (sleeping)
Tgid: 21513
Pid: 21513
PPid: 1
TracerPid: 0

Related

Why does pthread_key_create fail with EAGAIN?

Sometimes when I try to create a key with pthread_key_create I get an EAGAIN error code. Is it possible to know exactly why?
Documentation says:
The system lacked the necessary resources to create another thread-specific data key, or the system-imposed limit on the total number of keys per process [PTHREAD_KEYS_MAX] would be exceeded.
How can I check whether it was the key limit? Maybe some kind of monitoring tool to check how many keys are already in use in the system and how many could still be created?
One important thing about our code: we use fork() and have multiple processes running. And each process could have multiple threads.
I found that we don't have an independent limit for thread keys when we use fork(). Here is a small example.
#include <stdio.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>
size_t create_keys(pthread_key_t *keys, size_t number_of_keys)
{
    size_t counter = 0;
    for (size_t i = 0; i < number_of_keys; i++)
    {
        int e = pthread_key_create(keys + i, NULL);
        if (e)
        {
            printf("ERROR (%d): index: %zu, pthread_key_create (%d)\n", getpid(), i, e);
            break;
        }
        counter++;
    }
    return counter;
}
int main(int argc, char const *argv[])
{
    printf("maximum number of thread keys: %ld\n", sysconf(_SC_THREAD_KEYS_MAX));
    printf("process id: %d\n", getpid());
    const size_t number_of_keys = 1024;
    pthread_key_t keys_1[number_of_keys];
    memset(keys_1, 0, number_of_keys * sizeof(pthread_key_t));
    printf("INFO (%d): number of active keys: %zu\n", getpid(), create_keys(keys_1, number_of_keys));
    fflush(stdout);   /* otherwise the child may print the parent's buffered output again */
    pid_t p = fork();
    if (p == 0)
    {
        printf("process id: %d\n", getpid());
        pthread_key_t keys_2[number_of_keys];
        memset(keys_2, 0, number_of_keys * sizeof(pthread_key_t));
        printf("INFO (%d): number of active keys: %zu\n", getpid(), create_keys(keys_2, number_of_keys));
    }
    return 0;
}
When I run this example on Ubuntu 16.04, I see that the child process cannot create any new thread keys if the parent uses as many keys as the limit (1024). But if I use 512 keys for the parent and child processes, it runs without errors.
As you know, fork() traditionally works by copying the process in memory and then continuing execution from the same point within each copy as parent and child. This is what the return code of fork() indicates.
In order to perform fork(), the internals of the process must be duplicated. Memory, stack, open files, and probably thread local storage keys. Each system is different in its implementation of fork(). Some systems allow you to customise the areas of the process that get copied (see Linux clone(2) interface). However, the concept remains the same.
So, on to your example code: if you allocate 1024 keys in the parent, every child process inherits a full key table and has no spare keys to work with, resulting in the errors. If you allocate only 512 keys in the parent, then every child inherits a half-empty keys table and has 512 spare keys to play with, hence no errors arise.
Maximum value:
#include <unistd.h>
#include <stdio.h>
int main ()
{
    printf ("%ld\n", sysconf(_SC_THREAD_KEYS_MAX));
    return 0;
}
Consider using pthread_key_delete.

What's the intention of waitpid

The sample code below is from the Linux man page for waitpid(). Can the last else if be replaced with else? When I write code, I write if, else if, ..., and end with else, so the sample code looks strange to me.
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
int
main(int argc, char *argv[])
{
    pid_t cpid, w;
    int status;
    cpid = fork();
    if (cpid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (cpid == 0) {            /* Code executed by child */
        printf("Child PID is %ld\n", (long) getpid());
        if (argc == 1)
            pause();            /* Wait for signals */
        _exit(atoi(argv[1]));
    } else {                    /* Code executed by parent */
        do {
            w = waitpid(cpid, &status, WUNTRACED | WCONTINUED);
            if (w == -1) {
                perror("waitpid");
                exit(EXIT_FAILURE);
            }
            if (WIFEXITED(status)) {
                printf("exited, status=%d\n", WEXITSTATUS(status));
            } else if (WIFSIGNALED(status)) {
                printf("killed by signal %d\n", WTERMSIG(status));
            } else if (WIFSTOPPED(status)) {
                printf("stopped by signal %d\n", WSTOPSIG(status));
            } else if (WIFCONTINUED(status)) { /* can this be
                                                * replaced with else ???
                                                */
                printf("continued\n");
            }
        } while (!WIFEXITED(status) && !WIFSIGNALED(status));
        exit(EXIT_SUCCESS);
    }
}
Here's the question I think you are asking:
Is it possible for waitpid to return a status such that none of
WIFEXITED(status), WIFSIGNALED(status), WIFSTOPPED(status), or
WIFCONTINUED(status) returns nonzero?
The answer is almost certainly "no", and the Linux man page implies as much without (unfortunately) explicitly saying it.
The UNIX standard says more here, and specifically guarantees that one of those macros will return nonzero in certain cases. But Linux is not (necessarily) compliant with this standard.
But I think the best solution in code would be to have an else clause after those four options, and (depending on your application) take this to mean that the status is unknown. This could happen even if it's not the case now - maybe in some future version of Linux there is another kind of exit condition not covered here, and you don't want your code to crash in such a case.
The standard does seem to have evolved over time. For example, earlier versions didn't have the WIFCONTINUED case, and I also found some references online to a WIFCORED in some other systems. So it would be good to make your code flexible if you're concerned about it.

Linux: combining sleep() with signals

Do you know where I can see a list of signals and functions that cannot be used alongside the sleep() function?
For example, consider this code:
// this program presents how to block signal SIGINT
// while running in critical region
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
static void sig_int(int);
int main(void)
{
    sigset_t newmask, oldmask, zeromask;
    if (signal(SIGINT, sig_int) == SIG_ERR)
        fprintf(stderr, "signal(SIGINT) error");
    sigemptyset(&zeromask);
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGINT);
    /* block SIGINT and save current signal mask */
    if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
        fprintf(stderr, "SIG_BLOCK error");
    /* critical region of code */
    printf("In critical region: SIGINT will be blocked for 3 sec.\n");
    printf("Type Ctrl-C in first 3 secs and see what happens.\n");
    printf("Then run this program again and type Ctrl-C when 3 secs elapsed.\n");
    fflush(stdout);
    sleep(3);
    printf("end sleep");
    /* allow all signals and pause */
    if (sigsuspend(&zeromask) != -1)
        fprintf(stderr, "sigsuspend error");
    printf("after return from sigsuspend: ");
    /* reset signal mask which unblocks SIGINT */
    if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
        fprintf(stderr, "SIG_SETMASK error");
    /* and continue processing ... */
    exit(0);
}
static void sig_int(int signo)
{
    printf("\nIn sig_int: SIGINT\n"); fflush(stdout);
    return;
}
The program doesn't wake up after the sleep(). Do you know why?
If you use strace, you can see what your program actually does:
strace ./my-sig-program
If sleep never returns, I guess the task has received a SIGSTOP (which cannot be intercepted) or a SIGTSTP (which you can intercept with a signal handler), causing the OS to halt the entire process until a SIGCONT is received.

How to safely `waitpid()` in a plugin with a `SIGCHLD` handler calling `wait()` set up in the main program

I am writing a module for a toolkit which needs to execute some subprocesses and read their output. However, the main program that uses the toolkit may also spawn some subprocesses and set up a signal handler for SIGCHLD which calls wait(NULL) to get rid of zombie processes. As a result, if the subprocess I create exits inside my waitpid(), the child process is reaped before the signal handler is called, and therefore the wait() in the signal handler will wait for the next process to end (which could take forever). This behavior is described in the man page of waitpid (see grantee 2), since the Linux implementation doesn't seem to allow the wait() family to handle SIGCHLD. I have tried popen() and posix_spawn(), and both of them have the same problem. I have also tried a double fork() so that the direct child exits immediately, but I still cannot guarantee that waitpid() is called after SIGCHLD is received.
My question is: if another part of the program sets up a signal handler which calls wait() (maybe it should rather call waitpid(), but that is not something I can control), is there a way to safely execute child processes without overwriting the SIGCHLD handler (since it might do something useful in some programs) and without leaving any zombie processes?
A small program which shows the problem is below (note that the main program only exits after the long-running child exits, instead of the short one, which is the one it is directly waiting for with waitpid()):
#include <signal.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
static void
signalHandler(int sig)
{
    printf("%s: %d\n", __func__, sig);
    int status;
    int ret = waitpid(-1, &status, 0);
    printf("%s, ret: %d, status: %d\n", __func__, ret, status);
}
int
main()
{
    struct sigaction sig_act;
    memset(&sig_act, 0, sizeof(sig_act));
    sig_act.sa_handler = signalHandler;
    sigaction(SIGCHLD, &sig_act, NULL);
    if (!fork()) {
        sleep(20);
        printf("%s: long run child %d exit.\n", __func__, getpid());
        _exit(0);
    }
    pid_t pid = fork();
    if (!pid) {
        sleep(4);
        printf("%s: %d exit.\n", __func__, getpid());
        _exit(0);
    }
    printf("%s: %d -> %d\n", __func__, getpid(), pid);
    sleep(1);
    printf("%s, start waiting for %d\n", __func__, pid);
    int status;
    int ret = waitpid(pid, &status, 0);
    printf("%s, ret: %d, pid: %d, status: %d\n", __func__, ret, pid, status);
    return 0;
}
If the process is single-threaded, you can block the CHLD signal temporarily (using sigprocmask), fork/waitpid, then unblock again.
Do not forget to unblock the signal in the forked child - although POSIX states the signal mask is undefined when a process starts, most existing programs expect it to be completely unset.

Closing a file descriptor that is being polled

If I have two threads (Linux, NPTL), and I have one thread that is polling on one or more of file descriptors, and another is closing one of them, is that a reasonable action? Am I doing something that I shouldn't be doing in MT environment?
The main reason I consider doing that, is that I don't necessarily want to communicate with the polling thread, interrupt it, etc., I instead would like to just close the descriptor for whatever reasons, and when the polling thread wakes up, I expect the revents to contain POLLNVAL, which would be the indication that the file descriptor should just be thrown away by the thread before the next poll.
I've put together a simple test, which does show that POLLNVAL is exactly what happens. However, in that case POLLNVAL is only set when the timeout expires; closing the socket doesn't seem to make poll() return. If that's the case, I can send the thread a signal with pthread_kill() to make poll() return.
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <poll.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
static pthread_t main_thread;
void * close_some(void*a) {
    printf("thread #2 (%d) is sleeping\n", getpid());
    sleep(2);
    close(0);
    printf("socket closed\n");
    // comment out the next line to not forcefully interrupt
    pthread_kill(main_thread, SIGUSR1);
    return 0;
}
void on_sig(int s) {
    printf("signal received\n");
}
int main(int argc, char ** argv) {
    pthread_t two;
    struct pollfd pfd;
    int rc;
    struct sigaction act;
    act.sa_handler = on_sig;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGUSR1, &act, 0);
    main_thread = pthread_self();
    pthread_create(&two, 0, close_some, 0);
    pfd.fd = 0;
    pfd.events = POLLIN | POLLRDHUP;
    printf("thread 0 (%d) polling\n", getpid());
    rc = poll(&pfd, 1, 7000);
    if (rc < 0) {
        printf("error : %s\n", strerror(errno));
    } else if (!rc) {
        printf("time out!\n");
    } else {
        printf("revents = %x\n", pfd.revents);
    }
    return 0;
}
For Linux at least, this seems risky. The manual page for close warns:
It is probably unwise to close file descriptors while they may be in
use by system calls in other threads in the same process. Since a
file descriptor may be reused, there are some obscure race conditions
that may cause unintended side effects.
Since you're on Linux, you could do the following:
Set up an eventfd and add it to the poll
Signal the eventfd (write to it) when you want to close a fd
In the poll, when you see activity on the eventfd you can immediately close a fd and remove it from poll
Alternatively you could simply establish a signal handler and check for errno == EINTR when poll returns. The signal handler would only need to set some global variable to the value of the fd you're closing.
Since you're on Linux you might want to consider epoll as a superior albeit non-standard alternative to poll.
