Does the CHILD_SUBREAPER bit persist across fork()? - linux

When a process sets the child subreaper bit with prctl(PR_SET_CHILD_SUBREAPER, 1) (documented here), does it need to use prctl(PR_SET_CHILD_SUBREAPER, 0) to clear it after a fork?

No, the child subreaper bit does not persist across forks.
The relevant Linux kernel code is in copy_signal() in kernel/fork.c: the signal struct is initialized to all zeros, and the is_child_subreaper bit is never set.
However, has_child_subreaper is set:
sig->has_child_subreaper = current->signal->has_child_subreaper ||
current->signal->is_child_subreaper;
This test program demonstrates the behavior:
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
int main(int argc, char** argv) {
int pid;
int i;
prctl(PR_SET_CHILD_SUBREAPER, 1);
prctl(PR_GET_CHILD_SUBREAPER, &i);
printf("Before fork: %d\n", i);
pid = fork();
if (pid < 0) {
return 1;
} else if (pid == 0) {
prctl(PR_GET_CHILD_SUBREAPER, &i);
printf("In child: %d\n", i);
return 0;
}
return 0;
}
Outputs:
Before fork: 1
In child: 0

Related

Alternate Without Semaphores

The program below creates two processes.
The parent process prints the numbers from 0 to 9, while the
child process prints the characters from A to J.
How can I make it alternate the printing using only signals? so no using semaphores or any IPCs.
#include <stdio.h>
int main()
{
int i = 0;
if (fork() == 0) /* Child process */
{
for (i = 0; i < 10; i++) {
printf("%c\n", i + 65);
}
} else /* Parent process */
{
for (i = 0; i < 10; i++) {
printf("%d\n", i);
}
}
}
I figured it out.
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <sys/wait.h>
#include <signal.h>
int main()
{
int i=0;
int status;
int childPID = fork();
if (childPID == 0) /* Child process */
{
for(i=0;i<10;i++){
printf("%c\n",i+65);
kill(getpid(), SIGSTOP);
}
} else /* Parent process */
{
for(i=0;i<10;i++) {
waitpid(childPID,&status,WUNTRACED);
printf("%d\n",i);
kill(childPID, SIGCONT);
}
}
}

Is this a bug in linux kernel concerning write to /proc/self/loginuid?

There is a possibility that i found a bug in linux kernel. Let's consider application that write to /proc/self/loginuid from main thread and one auxiliary thread. The code is below:
#include <stdio.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
void write_loginuid(char *str)
{
int fd;
printf("%s\n", str);
fd = open("/proc/self/loginuid", O_RDWR);
if (fd < 0) {
perror(str);
return;
}
if (write(fd, "0", 2) != 2) {
printf("write\n");
perror(str);
}
close(fd);
}
void *thread_function(void *arg)
{
fprintf(stderr, "Hello from thread! my pid = %u, tid = %u, parent pid = %u\n", getpid(), syscall(SYS_gettid), getppid());
write_loginuid("thread");
return NULL;
}
int main()
{
pthread_t thread;
pthread_create(&thread, NULL, thread_function, NULL);
write_loginuid("main process");
fprintf(stderr, "test my pid = %u, tid = %u, parent pid = %u\n", getpid(), syscall(SYS_gettid), getppid());
pthread_join(thread, NULL);
return 0;
}
After executing this application we get:
main process
test my pid = 3487, tid = 3487, parent pid = 3283
Hello from thread! my pid = 3487, tid = 3488, parent pid = 3283
thread
write
thread: Operation not permitted
That tells us the thread write failed by -EPERM.
Looking at the kernel file fs/proc/base.c and function proc_loginuid_write() we see at the beginning check:
static ssize_t proc_loginuid_write(struct file * file, const char __user * buf,
size_t count, loff_t *ppos)
{
struct inode * inode = file_inode(file);
uid_t loginuid;
kuid_t kloginuid;
int rv;
/* this is the probably buggy check */
rcu_read_lock();
if (current != pid_task(proc_pid(inode), PIDTYPE_PID)) {
rcu_read_unlock();
return -EPERM;
}
rcu_read_unlock();
So, looking at the code above we see that only for exact PID (checked by me with printks) we pass through.Thread doesn't satisfy the condition, because compared pids differs.
So my question is: is this a bug ? Why to not allow thread's of particular process to change the loginuid? I encountered this in login application that spawned another thread for PAM login.
Whether this is bug or not i written a fix that extends writing permission to this file by threads:
rcu_read_lock();
/*
* I changed the condition that it checks now the tgid as returned in sys_getpid()
* rather than task_struct pointers
*/
if (task_tgid_vnr(current) != task_tgid_vnr(pid_task(proc_pid(inode), PIDTYPE_PID))) {
rcu_read_unlock();
return -EPERM;
}
rcu_read_unlock();
What do you think about it? Does it affects security?

trying to use pipe(2) with the sort unix tool but not working

I have been struggling to find what I'm doing wrong and I can't seem to find the issue. When I compile the code below, I get an I/O error.
e.g: /usr/bin/sort: read failed: -: Input/output error
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char **argv, char **envp)
{
int fd[2];
pid_t pid;
pipe(fd);
pid = fork();
if (pid == -1) {
exit(EXIT_FAILURE);
}
if (pid == 0) { /* child */
char *exe[]= { "/usr/bin/sort", NULL };
close(fd[0]);
execve("/usr/bin/sort", exe, envp);
}
else {
char *a[] = { "zilda", "andrew", "bartholomeu", NULL };
int i;
close(fd[1]);
for (i = 0; a[i]; i++)
printf("%s\n", a[i]);
}
return 0;
}
dup2(fd[0], 0) in the child.
dup2(fd[1], 1) in the parent.
close the other fd.

Can not get proper response from select() using writefds

Parent receives SIGPIPE sending chars to aborted child process through FIFO pipe.
I am trying to avoid this, using select() function. In the attached sample code,
select() retruns OK even after the child at the other end of pipe having been terminated.
Tested in
RedHat EL5 (Linux 2.6.18-194.32.1.el5)
GNU C Library stable release version 2.5
Any help appreciated. Thnak you.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <unistd.h>
static void sigpipe_fct();
main()
{
struct stat st;
int i, fd_out, fd_in, child;
char buf[1024];
#define p_out "/tmp/pout"
signal(SIGPIPE, sigpipe_fct);
if (stat(p_out, &st) != 0) {
mknod(p_out, S_IFIFO, 0);
chmod(p_out, 0666);
}
/* start receiving process */
if ((child = fork()) == 0) {
if ((fd_in = open(p_out, O_RDONLY)) < 0) {
perror(p_out);
exit(1);
}
while(1) {
i = read(fd_in, buf, sizeof(buf));
fprintf(stderr, "child %d read %.*s\n", getpid(), i, buf);
lseek(fd_in, 0, 0);
}
}
else {
fprintf(stderr,
"reading from %s - exec \"kill -9 %d\" to test\n", p_out, child);
if ((fd_out = open(p_out, O_WRONLY + O_NDELAY)) < 0) { /* output */
perror(p_out);
exit(1);
}
while(1) {
if (SelectChkWrite(fd_out) == fd_out) {
fprintf(stderr, "SelectChkWrite() success write abc\n");
write(fd_out, "abc", 3);
}
else
fprintf(stderr, "SelectChkWrite() failed\n");
sleep(3);
}
}
}
static void sigpipe_fct()
{
fprintf(stderr, "SIGPIPE received\n");
exit(-1);
}
SelectChkWrite(ch)
int ch;
{
#include <sys/select.h>
fd_set writefds;
int i;
FD_ZERO(&writefds);
FD_SET (ch, &writefds);
i = select(ch + 1, NULL, &writefds, NULL, NULL);
if (i == -1)
return(-1);
else if (FD_ISSET(ch, &writefds))
return(ch);
else
return(-1);
}
From the Linux select(3) man page:
A descriptor shall be considered ready for writing when a call to an
output function with O_NONBLOCK clear would not block, whether or not
the function would transfer data successfully.
When the pipe is closed, it won't block, so it is considered "ready" by select.
BTW, having #include <sys/select.h> inside your SelectChkWrite() function is extremely bad form.
Although select() and poll() are both in the POSIX standard, select() is much older and more limited than poll(). In general, I recommend people use poll() by default and only use select() if they have a good reason. (See here for one example.)

Reading a child process's /proc/pid/mem file from the parent

In the program below, I am trying to cause the following to happen:
Process A assigns a value to a stack variable a.
Process A (parent) creates process B (child) with PID child_pid.
Process B calls function func1, passing a pointer to a.
Process B changes the value of variable a through the pointer.
Process B opens its /proc/self/mem file, seeks to the page containing a, and prints the new value of a.
Process A (at the same time) opens /proc/child_pid/mem, seeks to the right page, and prints the new value of a.
The problem is that, in step 6, the parent only sees the old value of a in /proc/child_pid/mem, while the child can indeed see the new value in its /proc/self/mem. Why is this the case? Is there any way that I can get the parent to to see the child's changes to its address space through the /proc filesystem?
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>
#define PAGE_SIZE 0x1000
#define LOG_PAGE_SIZE 0xc
#define PAGE_ROUND_DOWN(v) ((v) & (~(PAGE_SIZE - 1)))
#define PAGE_ROUND_UP(v) (((v) + PAGE_SIZE - 1) & (~(PAGE_SIZE - 1)))
#define OFFSET_IN_PAGE(v) ((v) & (PAGE_SIZE - 1))
# if defined ARCH && ARCH == 32
#define BP "ebp"
#define SP "esp"
#else
#define BP "rbp"
#define SP "rsp"
#endif
typedef struct arg_t {
int a;
} arg_t;
void func1(void * data) {
arg_t * arg_ptr = (arg_t *)data;
printf("func1: old value: %d\n", arg_ptr->a);
arg_ptr->a = 53;
printf("func1: address: %p\n", &arg_ptr->a);
printf("func1: new value: %d\n", arg_ptr->a);
}
void expore_proc_mem(void (*fn)(void *), void * data) {
off_t frame_pointer, stack_start;
char buffer[PAGE_SIZE];
const char * path = "/proc/self/mem";
int child_pid, status;
int parent_to_child[2];
int child_to_parent[2];
arg_t * arg_ptr;
off_t child_offset;
asm volatile ("mov %%"BP", %0" : "=m" (frame_pointer));
stack_start = PAGE_ROUND_DOWN(frame_pointer);
printf("Stack_start: %lx\n",
(unsigned long)stack_start);
arg_ptr = (arg_t *)data;
child_offset =
OFFSET_IN_PAGE((off_t)&arg_ptr->a);
printf("Address of arg_ptr->a: %p\n",
&arg_ptr->a);
pipe(parent_to_child);
pipe(child_to_parent);
bool msg;
int child_mem_fd;
char child_path[0x20];
child_pid = fork();
if (child_pid == -1) {
perror("fork");
exit(EXIT_FAILURE);
}
if (!child_pid) {
close(child_to_parent[0]);
close(parent_to_child[1]);
printf("CHILD (pid %d, parent pid %d).\n",
getpid(), getppid());
fn(data);
msg = true;
write(child_to_parent[1], &msg, 1);
child_mem_fd = open("/proc/self/mem", O_RDONLY);
if (child_mem_fd == -1) {
perror("open (child)");
exit(EXIT_FAILURE);
}
printf("CHILD: child_mem_fd: %d\n", child_mem_fd);
if (lseek(child_mem_fd, stack_start, SEEK_SET) == (off_t)-1) {
perror("lseek");
exit(EXIT_FAILURE);
}
if (read(child_mem_fd, buffer, sizeof(buffer))
!= sizeof(buffer)) {
perror("read");
exit(EXIT_FAILURE);
}
printf("CHILD: new value %d\n",
*(int *)(buffer + child_offset));
read(parent_to_child[0], &msg, 1);
exit(EXIT_SUCCESS);
}
else {
printf("PARENT (pid %d, child pid %d)\n",
getpid(), child_pid);
printf("PARENT: child_offset: %lx\n",
child_offset);
read(child_to_parent[0], &msg, 1);
printf("PARENT: message from child: %d\n", msg);
snprintf(child_path, 0x20, "/proc/%d/mem", child_pid);
printf("PARENT: child_path: %s\n", child_path);
child_mem_fd = open(path, O_RDONLY);
if (child_mem_fd == -1) {
perror("open (child)");
exit(EXIT_FAILURE);
}
printf("PARENT: child_mem_fd: %d\n", child_mem_fd);
if (lseek(child_mem_fd, stack_start, SEEK_SET) == (off_t)-1) {
perror("lseek");
exit(EXIT_FAILURE);
}
if (read(child_mem_fd, buffer, sizeof(buffer))
!= sizeof(buffer)) {
perror("read");
exit(EXIT_FAILURE);
}
printf("PARENT: new value %d\n",
*(int *)(buffer + child_offset));
close(child_mem_fd);
printf("ENDING CHILD PROCESS.\n");
write(parent_to_child[1], &msg, 1);
if (waitpid(child_pid, &status, 0) == -1) {
perror("waitpid");
exit(EXIT_FAILURE);
}
}
}
int main(void) {
arg_t arg;
arg.a = 42;
printf("In main: address of arg.a: %p\n", &arg.a);
explore_proc_mem(&func1, &arg.a);
return EXIT_SUCCESS;
}
This program produces the output below. Notice that the value of a (boldfaced) differs between parent's and child's reading of the /proc/child_pid/mem file.
In main: address of arg.a: 0x7ffffe1964f0
Stack_start: 7ffffe196000
Address of arg_ptr->a: 0x7ffffe1964f0
PARENT (pid 20376, child pid 20377)
PARENT: child_offset: 4f0
CHILD (pid 20377, parent pid 20376).
func1: old value: 42
func1: address: 0x7ffffe1964f0
func1: new value: 53
PARENT: message from child: 1
CHILD: child_mem_fd: 4
PARENT: child_path: /proc/20377/mem
CHILD: new value 53
PARENT: child_mem_fd: 7
PARENT: new value 42
ENDING CHILD PROCESS.
There's one silly mistake in this code:
const char * path = "/proc/self/mem";
...
snprintf(child_path, 0x20, "/proc/%d/mem", child_pid);
printf("PARENT: child_path: %s\n", child_path);
child_mem_fd = open(path, O_RDONLY);
So you always end up reading parent's memory here. However after changing this, I get:
CHILD: child_mem_fd: 4
CHILD: new value 53
read (parent): No such process
And I don't know why it could happen - maybe /proc is too slow in refreshing the entries? (it's from perror("read") in the parent - had to add a comment to see which one fails) But that seems weird, since the seek worked - as well as open itself.
That question doesn't seem to be new either: http://lkml.indiana.edu/hypermail/linux/kernel/0007.1/0939.html (ESRCH is "no such process")
Actually a better link is: http://www.webservertalk.com/archive242-2004-7-295131.html - there was an issue with marking processes pthread-attach-safe. You can find there Alan Cox sending someone to Solar Designer... for me that spells "here be dragons" and that it's not solvable if you don't hack kernels in your sleep :(
Maybe it's enough for you to check what is gdb doing in that case and replicating it? (Probably it just goes via ptrace(PTRACE_PEEKDATA,...))
The solution is to use ptrace to synchronize parent with child. Even though I am already communicating between parent and child (and the man page for ptrace says that it causes the two processes to behave as if they were parent and child), and even though the child is blocking on the read call, the child has apparently not "stopped" enough for Linux to allow the parent to read the child's /proc/child_pid/mem file. But if the parent first calls ptrace (after it receives the message over the pipe) with PTRACE_ATTACH, then it can open the file--and get the correct contents! Then the parent calls ptrace again, with PTRACE_DETACH, before sending the message back to the child to terminate.

Resources