Debugging a segmentation fault in a multi-threaded (using clone) program

I wrote a program that creates some threads; whenever one of the threads finishes, a new thread is created to replace it. As I was not able to create a very large number of threads (>450) using pthreads, I used the clone system call instead. (Please note that I am aware of the implications of having such a huge number of threads, but this program is meant only to stress the system.)
As clone() requires the stack space for the child thread to be specified as a parameter, I malloc the required chunk of stack space for each thread and free it when the thread finishes. When a thread finishes, it sends a signal to the parent to notify it.
The code is given below:
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <errno.h>
#define NUM_THREADS 5
unsigned long long total_count=0;
int num_threads = NUM_THREADS;
static int thread_pids[NUM_THREADS];
static void *thread_stacks[NUM_THREADS];
int ppid;
int worker() {
int i;
union sigval s={0};
for(i=0;i!=99999999;i++);
if(sigqueue(ppid, SIGUSR1, s)!=0)
fprintf(stderr, "ERROR sigqueue");
fprintf(stderr, "Child [%d] done\n", getpid());
return 0;
}
void sigint_handler(int signal) {
char fname[35]="";
FILE *fp;
int ch;
if(signal == SIGINT) {
fprintf(stderr, "Caught SIGINT\n");
sprintf(fname, "/proc/%d/status", getpid());
fp = fopen(fname,"r");
while((ch=fgetc(fp))!=EOF)
fprintf(stderr, "%c", (char)ch);
fclose(fp);
fprintf(stderr, "No. of threads created so far = %llu\n", total_count);
exit(0);
} else
fprintf(stderr, "Unhandled signal (%d) received\n", signal);
}
int main(int argc, char *argv[]) {
int rc, i; long t;
void *chld_stack, *chld_stack2;
siginfo_t siginfo;
sigset_t sigset, oldsigset;
if(argc>1) {
num_threads = atoi(argv[1]);
if(num_threads<1) {
fprintf(stderr, "Number of threads must be >0\n");
return -1;
}
}
signal(SIGINT, sigint_handler);
/* Block SIGUSR1 */
sigemptyset(&sigset);
sigaddset(&sigset, SIGUSR1);
if(sigprocmask(SIG_BLOCK, &sigset, &oldsigset)==-1)
fprintf(stderr, "ERROR: cannot block SIGUSR1 \"%s\"\n", strerror(errno));
printf("Number of threads = %d\n", num_threads);
ppid = getpid();
for(t=0,i=0;t<num_threads;t++,i++) {
chld_stack = (void *) malloc(148*512);
chld_stack2 = ((char *)chld_stack + 148*512 - 1);
if(chld_stack == NULL) {
fprintf(stderr, "ERROR[%ld]: malloc for stack-space failed\n", t);
break;
}
rc = clone(worker, chld_stack2, CLONE_VM|CLONE_FS|CLONE_FILES, NULL);
if(rc == -1) {
fprintf(stderr, "ERROR[%ld]: return code from clone() is %d\n", t, errno);
break;
}
thread_pids[i]=rc;
thread_stacks[i]=chld_stack;
fprintf(stderr, " [index:%d] = [pid:%d] ; [stack:0x%p]\n", i, thread_pids[i], thread_stacks[i]);
total_count++;
}
sigemptyset(&sigset);
sigaddset(&sigset, SIGUSR1);
while(1) {
fprintf(stderr, "Waiting for signal from childs\n");
if(sigwaitinfo(&sigset, &siginfo) == -1)
fprintf(stderr, "- ERROR returned by sigwaitinfo : \"%s\"\n", strerror(errno));
fprintf(stderr, "Got some signal from pid:%d\n", siginfo.si_pid);
/* A child finished, free the stack area allocated for it */
for(i=0;i<NUM_THREADS;i++) {
fprintf(stderr, " [index:%d] = [pid:%d] ; [stack:%p]\n", i, thread_pids[i], thread_stacks[i]);
if(thread_pids[i]==siginfo.si_pid) {
free(thread_stacks[i]);
thread_stacks[i]=NULL;
break;
}
}
fprintf(stderr, "Search for child ended with i=%d\n",i);
if(i==NUM_THREADS)
continue;
/* Create a new thread in its place */
chld_stack = (void *) malloc(148*512);
chld_stack2 = ((char *)chld_stack + 148*512 - 1);
if(chld_stack == NULL) {
fprintf(stderr, "ERROR[%ld]: malloc for stack-space failed\n", t);
break;
}
rc = clone(worker, chld_stack2, CLONE_VM|CLONE_FS|CLONE_FILES, NULL);
if(rc == -1) {
fprintf(stderr, "ERROR[%ld]: return code from clone() is %d\n", t, errno);
break;
}
thread_pids[i]=rc;
thread_stacks[i]=chld_stack;
total_count++;
}
fprintf(stderr, "Broke out of infinite loop. [total_count=%llu] [i=%d]\n",total_count, i);
return 0;
}
I have used a couple of arrays to keep track of the child processes' PIDs and the base addresses of their stack areas (for freeing them).
When I run this program it terminates after some time. Running it under gdb tells me that one of the threads gets a SIGSEGV (segmentation fault), but it doesn't give me any location; the output is similar to the following:
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 15864]
0x00000000 in ?? ()
I tried running it under valgrind with the following commandline:
valgrind --tool=memcheck --leak-check=yes --show-reachable=yes -v --num-callers=20 --track-fds=yes ./a.out
But it keeps running without any issues under valgrind.
I am puzzled as to how to debug this program. I suspected a stack overflow, but increasing the stack size (up to 74KB) didn't solve the problem.
My only question is: why and where does the segmentation fault occur, and how can I debug this program?

Found the actual issue.
When the worker thread signals the parent process using sigqueue(), the parent sometimes gets control immediately and frees the stack before the child executes its return statement. When that child then executes return, it causes a segmentation fault because its stack has already been freed.
To solve this, I replaced
return 0;
with
exit(0);

I think I found the answer.
Step 1
Replace this:
static int thread_pids[NUM_THREADS];
static void *thread_stacks[NUM_THREADS];
By this:
static int *thread_pids;
static void **thread_stacks;
Step 2
Add this in the main function (after checking arguments):
thread_pids = malloc(sizeof(int) * num_threads);
thread_stacks = malloc(sizeof(void *) * num_threads);
Step 3
Replace this:
chld_stack2 = ((char *)chld_stack + 148*512 - 1);
By this:
chld_stack2 = ((char *)chld_stack + 148*512);
Do this in both places where it is used.
I don't know if this is really your problem, but after testing it I didn't get any segmentation fault. By the way, I only got segmentation faults when using more than 5 threads.
Hope I helped!
edit: tested with 1000 threads and it runs perfectly
edit2: Explanation why the static allocation of thread_pids and thread_stacks causes an error.
The best way to do this is with an example.
Assume num_threads = 10;
The problem occurs in the following code:
for(t=0,i=0;t<num_threads;t++,i++) {
...
thread_pids[i]=rc;
thread_stacks[i]=chld_stack;
...
}
Here you try to access memory that does not belong to you (0 <= i <= 9, but both arrays have a size of 5). That can cause either a segmentation fault or data corruption. Data corruption may happen if both arrays are allocated one after the other, so that writing past the end of one writes into the other. A segmentation fault can happen if you write to memory you have not allocated (statically or dynamically).
You may be lucky and get no errors at all, but the code is certainly not safe.
About the non-aligned pointer: subtracting 1 from the end of the malloc'ed block gives clone() an odd, unaligned stack address, which is why Step 3 drops the "- 1".

Related

need to know how to interrupt all pthreads

In Linux, I am emulating an embedded system that has one thread that gets messages delivered to the outside world. If some thread detects an insurmountable problem, my goal is to stop all the other threads in their tracks (leaving useful stack traces) and allow only the message delivery thread to continue. So in my emulation environment, I want to "pthread_kill(tid, SIGnal)" each "tid". (I have a list. I'm using SIGTSTP.) Unfortunately, only one thread is getting the signal. "sigprocmask()" is not able to unmask the signal. Here is my current (non-working) handler:
void
wait_until_death(int sig)
{
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, sig);
sigprocmask(SIG_UNBLOCK, &mask, NULL);
for (;;)
pause();
}
I get verification that all the pthread_kill()'s get invoked, but only one thread has the handler in the stack trace. Can this be done?
This minimal example seems to function in the manner you want - all the threads except the main thread end up waiting in wait_until_death():
#include <stdio.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#define NTHREADS 10
pthread_barrier_t barrier;
void
wait_until_death(int sig)
{
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, sig);
sigprocmask(SIG_UNBLOCK, &mask, NULL);
for (;;)
pause();
}
void *thread_func(void *arg)
{
pthread_barrier_wait(&barrier);
for (;;)
pause();
}
int main(int argc, char *argv[])
{
const int thread_signal = SIGTSTP;
const struct sigaction sa = { .sa_handler = wait_until_death };
int i;
pthread_t thread[NTHREADS];
pthread_barrier_init(&barrier, NULL, NTHREADS + 1);
sigaction(thread_signal, &sa, NULL);
for (i = 0; i < NTHREADS; i++)
pthread_create(&thread[i], NULL, thread_func, NULL);
pthread_barrier_wait(&barrier);
for (i = 0; i < NTHREADS; i++)
pthread_kill(thread[i], thread_signal);
fprintf(stderr, "All threads signalled.\n");
for (;;)
pause();
return 0;
}
Note that unblocking the signal in wait_until_death() isn't required: the signal mask is per-thread, and the thread that is executing the signal handler isn't going to be signalled again.
Presumably the problem is in how you are installing the signal handler, or setting up thread signal masks.
This is impossible. The problem is that some of the threads you stop may hold locks that the thread you want to continue running requires in order to continue making forward progress. Just abandon this idea entirely. Trust me, this will only cause you great pain.
If you literally must do it, have all the other threads call a conditional yielding point at known safe places where they hold no lock that can prevent any other thread from reaching its next conditional yielding point. But this is very difficult to get right and is very prone to deadlock and I strongly advise not trying it.

Cygwin: interrupting blocking read

I've written a program that spawns a thread which reads from stdin in a loop in a blocking fashion. I want to make the thread return from the blocked read immediately. I've registered my signal handler (with sigaction and without the SA_RESTART flag) in the reading thread, sent it a signal, and expected read to fail with EINTR. But that doesn't happen. Is this an issue or limitation of Cygwin, or am I doing something wrong?
Here is the code:
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
pthread_t thread;
volatile int run = 0;
void root_handler(int signum)
{
printf("%s ENTER (thread is %x)\n", __func__, pthread_self());
run = 0;
}
void* thr_func(void*arg)
{ int res;
char buffer[256];
printf("%s ENTER (thread is %x)\n", __func__, pthread_self());
struct sigaction act;
memset (&act, 0, sizeof(act));
act.sa_sigaction = &root_handler;
//act.sa_flags = SA_RESTART;
if (sigaction(SIGUSR1, &act, NULL) < 0) {
perror ("sigaction error");
return 1;
}
while(run)
{
res = read(0,buffer, sizeof(buffer));
if(res == -1)
{
if(errno == EINTR)
{
puts("read was interrupted by signal");
}
}
else
{
printf("got: %s", buffer);
}
}
printf("%s LEAVE (thread is %x)\n", __func__, pthread_self());
}
int main() {
run = 1;
printf("root thread: %x\n", pthread_self());
pthread_create(&thread, NULL, &thr_func, NULL);
printf("thread %x started\n", thread);
sleep(4);
pthread_kill(thread, SIGUSR1 );
//raise(SIGUSR1);
pthread_join(thread, NULL);
return 0;
}
I'm using Cygwin (1.7.32(0.274/5/3)).
I've just tried the same on Ubuntu and it works (I needed to include signal.h, though, even though in Cygwin it compiled as is). It seems to be a peculiarity of Cygwin's implementation.

Differences between POSIX threads on OSX and LINUX?

Can anyone shed light on why, when the code below is compiled and run on OSX, the 'bartender' thread skips through sem_wait() in what seems like a random manner, whereas on a Linux machine sem_wait() holds the thread until the corresponding call to sem_post() is made, as would be expected?
I am currently learning not only POSIX threads but concurrency as a whole, so absolutely any comments, tips and insights are warmly welcomed...
Thanks in advance.
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
#include <errno.h>
//using namespace std;
#define NSTUDENTS 30
#define MAX_SERVINGS 100
void* student(void* ptr);
void get_serving(int id);
void drink_and_think();
void* bartender(void* ptr);
void refill_barrel();
// This shared variable gives the number of servings currently in the barrel
int servings = 10;
// Define here your semaphores and any other shared data
sem_t *mutex_stu;
sem_t *mutex_bar;
int main() {
static const char *semname1 = "Semaphore1";
static const char *semname2 = "Semaphore2";
pthread_t tid;
mutex_stu = sem_open(semname1, O_CREAT, 0777, 0);
if (mutex_stu == SEM_FAILED)
{
fprintf(stderr, "%s\n", "ERROR creating semaphore semname1");
exit(EXIT_FAILURE);
}
mutex_bar = sem_open(semname2, O_CREAT, 0777, 1);
if (mutex_bar == SEM_FAILED)
{
fprintf(stderr, "%s\n", "ERROR creating semaphore semname2");
exit(EXIT_FAILURE);
}
pthread_create(&tid, NULL, bartender, &tid);
for(int i=0; i < NSTUDENTS; ++i) {
pthread_create(&tid, NULL, student, &tid);
}
pthread_join(tid, NULL);
sem_unlink(semname1);
sem_unlink(semname2);
printf("Exiting the program...\n");
}
//Called by a student process. Do not modify this.
void drink_and_think() {
// Sleep time in milliseconds
int st = rand() % 10;
sleep(st);
}
// Called by a student process. Do not modify this.
void get_serving(int id) {
if (servings > 0) {
servings -= 1;
} else {
servings = 0;
}
printf("ID %d got a serving. %d left\n", id, servings);
}
// Called by the bartender process.
void refill_barrel()
{
servings = 1 + rand() % 10;
printf("Barrel refilled up to -> %d\n", servings);
}
//-- Implement a synchronized version of the student
void* student(void* ptr) {
int id = *(int*)ptr;
printf("Started student %d\n", id);
while(1) {
sem_wait(mutex_stu);
if(servings > 0) {
get_serving(id);
} else {
sem_post(mutex_bar);
continue;
}
sem_post(mutex_stu);
drink_and_think();
}
return NULL;
}
//-- Implement a synchronized version of the bartender
void* bartender(void* ptr) {
int id = *(int*)ptr;
printf("Started bartender %d\n", id);
//sleep(5);
while(1) {
sem_wait(mutex_bar);
if(servings <= 0) {
refill_barrel();
} else {
printf("Bar skipped sem_wait()!\n");
}
sem_post(mutex_stu);
}
return NULL;
}
The first time you run the program, you're creating named semaphores with initial values, but since your threads never exit (they're infinite loops), you never get to the sem_unlink calls to delete those semaphores. If you kill the program (with ctrl-C or any other way), the semaphores will still exist in whatever state they are in. So if you run the program again, the sem_open calls will succeed (because you don't use O_EXCL), but they won't reset the semaphore value or state, so they might be in some odd state.
So you should make sure to call sem_unlink when the program STARTS, before calling sem_open. Better yet, don't use named semaphores at all -- use sem_init to initialize a couple of unnamed semaphores instead.

Select() doesnt behave properly with an eventfd

I want to use eventfd as a way to signal simple events between kernelspace and userspace. eventfd will be used as a way to signal and the actual data will be transferred using ioctl.
Before going ahead with implementing this, I wrote a simple program to see how eventfd behaves with select(). It seems that if you use select() to wait on an eventfd, it won't return when you write to the eventfd from a separate thread. In the code I wrote, the writing thread waits for 5 seconds from program start before writing to the eventfd twice. I would expect select() to return in the reading thread immediately after this write, but this does not happen. select() returns only after the 10-second timeout, and it returns zero. Despite this zero return, when I try to read the eventfd after 10 seconds, I get the correct value.
I use Ubuntu 12.04.1 (3.2.0-29-generic-pae) i386
Any idea why this is so? It seems to me that select() is not working as it should.
PS: This question is similar to linux - Can't get eventfd to work with epoll together
Is anyone else facing similar issues?
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h> //Definition of uint64_t
#include <pthread.h> //One thread writes to fd, other waits on it and then reads it
#include <time.h> //Writing thread uses delay before writing
#include <sys/eventfd.h>
int efd; //Event file descriptor
void * writing_thread_func() {
uint64_t eftd_ctr = 34;
ssize_t s;
printf("\n%s: now running...",__func__);
printf("\n%s: now sleeping for 5 seconds...",__func__);
fflush(stdout); //must call fflush before sleeping to ensure previous printf() is executed
sleep(5);
printf("\n%s: Writing %lld to eventfd...",__func__,eftd_ctr);
s = write(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)) {
printf("\n%s: eventfd writing error. Exiting...",__func__);
exit(EXIT_FAILURE);
}
eftd_ctr = 99;
printf("\n%s: Writing %lld to eventfd...",__func__,eftd_ctr);
s = write(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)) {
printf("\n%s: eventfd writing error. Exiting...",__func__);
exit(EXIT_FAILURE);
}
printf("\n%s: thread exiting...",__func__);
pthread_exit(0);
}
void * reading_thread_func() {
ssize_t s;
uint64_t eftd_ctr;
int retval; //for select()
fd_set rfds; //for select()
struct timeval tv; //for select()
printf("\n%s: now running...",__func__);
printf("\n%s: now waiting on select()...",__func__);
//Watch efd
FD_ZERO(&rfds);
FD_SET(efd, &rfds);
//Wait up to 10 seconds
tv.tv_sec = 10;
tv.tv_usec = 0;
retval = select(1, &rfds, NULL, NULL, &tv);
if (retval == -1){
printf("\n%s: select() error. Exiting...",__func__);
exit(EXIT_FAILURE);
} else if (retval > 0) {
printf("\n%s: select() says data is available now. Exiting...",__func__);
printf("\n%s: returned from select(), now executing read()...",__func__);
s = read(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)){
printf("\n%s: eventfd read error. Exiting...",__func__);
exit(EXIT_FAILURE);
}
printf("\n%s: Returned from read(), value read = %lld",__func__, eftd_ctr);
} else if (retval == 0) {
printf("\n%s: select() says that no data was available even after 10 seconds...",__func__);
printf("\n%s: but lets try reading efd count anyway...",__func__);
s = read(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)){
printf("\n%s: eventfd read error. Exiting...",__func__);
exit(EXIT_FAILURE);
}
printf("\n%s: Returned from read(), value read = %lld",__func__, eftd_ctr);
exit(EXIT_FAILURE);
}
printf("\n%s: thread exiting...",__func__);
pthread_exit(0);
}
int main() {
pthread_t writing_thread_var, reading_thread_var;
//Create eventfd
efd = eventfd(0,0);
if (efd == -1){
printf("\n%s: Unable to create eventfd! Exiting...",__func__);
exit(EXIT_FAILURE);
}
printf("\n%s: eventfd created. value = %d. Spawning threads...",__func__,efd);
//Create threads
pthread_create(&writing_thread_var, NULL, writing_thread_func, NULL);
pthread_create(&reading_thread_var, NULL, reading_thread_func, NULL);
//Wait for threads to terminate
pthread_join(writing_thread_var, NULL);
pthread_join(reading_thread_var, NULL);
printf("\n%s: closing eventfd. Exiting...",__func__);
close(efd);
exit(EXIT_SUCCESS);
}
So it was a silly mistake:
I changed:
retval = select(1, &rfds, NULL, NULL, &tv);
to:
retval = select(efd+1, &rfds, NULL, NULL, &tv);
and it worked.
Thanks again @Steve-o

Reading a child process's /proc/pid/mem file from the parent

In the program below, I am trying to cause the following to happen:
1. Process A assigns a value to a stack variable a.
2. Process A (parent) creates process B (child) with PID child_pid.
3. Process B calls function func1, passing a pointer to a.
4. Process B changes the value of variable a through the pointer.
5. Process B opens its /proc/self/mem file, seeks to the page containing a, and prints the new value of a.
6. Process A (at the same time) opens /proc/child_pid/mem, seeks to the right page, and prints the new value of a.
The problem is that, in step 6, the parent only sees the old value of a in /proc/child_pid/mem, while the child can indeed see the new value in its /proc/self/mem. Why is this the case? Is there any way that I can get the parent to see the child's changes to its address space through the /proc filesystem?
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>
#define PAGE_SIZE 0x1000
#define LOG_PAGE_SIZE 0xc
#define PAGE_ROUND_DOWN(v) ((v) & (~(PAGE_SIZE - 1)))
#define PAGE_ROUND_UP(v) (((v) + PAGE_SIZE - 1) & (~(PAGE_SIZE - 1)))
#define OFFSET_IN_PAGE(v) ((v) & (PAGE_SIZE - 1))
# if defined ARCH && ARCH == 32
#define BP "ebp"
#define SP "esp"
#else
#define BP "rbp"
#define SP "rsp"
#endif
typedef struct arg_t {
int a;
} arg_t;
void func1(void * data) {
arg_t * arg_ptr = (arg_t *)data;
printf("func1: old value: %d\n", arg_ptr->a);
arg_ptr->a = 53;
printf("func1: address: %p\n", &arg_ptr->a);
printf("func1: new value: %d\n", arg_ptr->a);
}
void explore_proc_mem(void (*fn)(void *), void * data) {
off_t frame_pointer, stack_start;
char buffer[PAGE_SIZE];
const char * path = "/proc/self/mem";
int child_pid, status;
int parent_to_child[2];
int child_to_parent[2];
arg_t * arg_ptr;
off_t child_offset;
asm volatile ("mov %%"BP", %0" : "=m" (frame_pointer));
stack_start = PAGE_ROUND_DOWN(frame_pointer);
printf("Stack_start: %lx\n",
(unsigned long)stack_start);
arg_ptr = (arg_t *)data;
child_offset =
OFFSET_IN_PAGE((off_t)&arg_ptr->a);
printf("Address of arg_ptr->a: %p\n",
&arg_ptr->a);
pipe(parent_to_child);
pipe(child_to_parent);
bool msg;
int child_mem_fd;
char child_path[0x20];
child_pid = fork();
if (child_pid == -1) {
perror("fork");
exit(EXIT_FAILURE);
}
if (!child_pid) {
close(child_to_parent[0]);
close(parent_to_child[1]);
printf("CHILD (pid %d, parent pid %d).\n",
getpid(), getppid());
fn(data);
msg = true;
write(child_to_parent[1], &msg, 1);
child_mem_fd = open("/proc/self/mem", O_RDONLY);
if (child_mem_fd == -1) {
perror("open (child)");
exit(EXIT_FAILURE);
}
printf("CHILD: child_mem_fd: %d\n", child_mem_fd);
if (lseek(child_mem_fd, stack_start, SEEK_SET) == (off_t)-1) {
perror("lseek");
exit(EXIT_FAILURE);
}
if (read(child_mem_fd, buffer, sizeof(buffer))
!= sizeof(buffer)) {
perror("read");
exit(EXIT_FAILURE);
}
printf("CHILD: new value %d\n",
*(int *)(buffer + child_offset));
read(parent_to_child[0], &msg, 1);
exit(EXIT_SUCCESS);
}
else {
printf("PARENT (pid %d, child pid %d)\n",
getpid(), child_pid);
printf("PARENT: child_offset: %lx\n",
child_offset);
read(child_to_parent[0], &msg, 1);
printf("PARENT: message from child: %d\n", msg);
snprintf(child_path, 0x20, "/proc/%d/mem", child_pid);
printf("PARENT: child_path: %s\n", child_path);
child_mem_fd = open(path, O_RDONLY);
if (child_mem_fd == -1) {
perror("open (child)");
exit(EXIT_FAILURE);
}
printf("PARENT: child_mem_fd: %d\n", child_mem_fd);
if (lseek(child_mem_fd, stack_start, SEEK_SET) == (off_t)-1) {
perror("lseek");
exit(EXIT_FAILURE);
}
if (read(child_mem_fd, buffer, sizeof(buffer))
!= sizeof(buffer)) {
perror("read");
exit(EXIT_FAILURE);
}
printf("PARENT: new value %d\n",
*(int *)(buffer + child_offset));
close(child_mem_fd);
printf("ENDING CHILD PROCESS.\n");
write(parent_to_child[1], &msg, 1);
if (waitpid(child_pid, &status, 0) == -1) {
perror("waitpid");
exit(EXIT_FAILURE);
}
}
}
int main(void) {
arg_t arg;
arg.a = 42;
printf("In main: address of arg.a: %p\n", &arg.a);
explore_proc_mem(&func1, &arg.a);
return EXIT_SUCCESS;
}
This program produces the output below. Notice that the value of a differs between the child's reading (CHILD: new value 53) and the parent's reading (PARENT: new value 42) of the /proc/child_pid/mem file.
In main: address of arg.a: 0x7ffffe1964f0
Stack_start: 7ffffe196000
Address of arg_ptr->a: 0x7ffffe1964f0
PARENT (pid 20376, child pid 20377)
PARENT: child_offset: 4f0
CHILD (pid 20377, parent pid 20376).
func1: old value: 42
func1: address: 0x7ffffe1964f0
func1: new value: 53
PARENT: message from child: 1
CHILD: child_mem_fd: 4
PARENT: child_path: /proc/20377/mem
CHILD: new value 53
PARENT: child_mem_fd: 7
PARENT: new value 42
ENDING CHILD PROCESS.
There's one silly mistake in this code:
const char * path = "/proc/self/mem";
...
snprintf(child_path, 0x20, "/proc/%d/mem", child_pid);
printf("PARENT: child_path: %s\n", child_path);
child_mem_fd = open(path, O_RDONLY);
So you always end up reading parent's memory here. However after changing this, I get:
CHILD: child_mem_fd: 4
CHILD: new value 53
read (parent): No such process
I don't know why this happens; maybe /proc is too slow in refreshing its entries? (The message comes from perror("read") in the parent; I had to annotate the message to see which call fails.) But that seems weird, since both the open and the seek worked.
That question doesn't seem to be new either: http://lkml.indiana.edu/hypermail/linux/kernel/0007.1/0939.html (ESRCH is "no such process")
Actually a better link is: http://www.webservertalk.com/archive242-2004-7-295131.html - there was an issue with marking processes pthread-attach-safe. You can find there Alan Cox sending someone to Solar Designer... for me that spells "here be dragons" and that it's not solvable if you don't hack kernels in your sleep :(
Maybe it's enough for you to check what gdb does in that case and replicate it? (It probably just goes via ptrace(PTRACE_PEEKDATA,...).)
The solution is to use ptrace to synchronize parent with child. Even though I am already communicating between parent and child (and the man page for ptrace says that it causes the two processes to behave as if they were parent and child), and even though the child is blocking on the read call, the child has apparently not "stopped" enough for Linux to allow the parent to read the child's /proc/child_pid/mem file. But if the parent first calls ptrace (after it receives the message over the pipe) with PTRACE_ATTACH, then it can open the file--and get the correct contents! Then the parent calls ptrace again, with PTRACE_DETACH, before sending the message back to the child to terminate.

Resources