Cause all processes running under OpenMPI to dump core - linux

I'm running a distributed process under OpenMPI on linux.
When one process dies, mpirun will detect this and kill the other processes. But even though I get a core from the process which died, I don't get a core from the processes killed by OpenMPI.
Why am I not getting these other corefiles? How can I get these corefiles?

The other processes are just killed by Open MPI. They didn't segfault themselves. From an MPI perspective, their execution was erroneous, but from a C perspective, it was fine. As such, there's no reason for them to have dumped core.

OMPI's mpiexec kills the remaining ranks by first sending them SIGTERM and then SIGKILL (should any of them survive SIGTERM). None of those signals results in core being dumped. You could probably install a signal handler for SIGTERM that calls abort(3) in order to force core dumps on kill.
Here is some sample code that works with Open MPI 1.6.5:
#include <stdlib.h>
#include <signal.h>
#include <mpi.h>
void term_handler (int sig) {
// Restore the default SIGABRT disposition
signal(SIGABRT, SIG_DFL);
// Abort (dumps core)
abort();
}
int main (int argc, char **argv) {
int rank;
MPI_Init(&argc, &argv);
// Override the SIGTERM handler AFTER the call to MPI_Init
signal(SIGTERM, term_handler);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// Cause division-by-zero exception in rank 0
rank = 1 / rank;
// Make other ranks wait for rank 0
MPI_Bcast(&rank, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}
Open MPI's MPI_Init installs special handlers for some known signals that either print useful debug information or generate backtrace files (.btr). That's why the SIGTERM handler has to be installed after the call to MPI_Init and the default action of SIGABRT (used by abort(3)) has to be restored before calling abort().
Note that the signal handler will appear at the top of the call stack in the core file:
(gdb) bt
#0 0x0000003bfd232925 in raise () from /lib64/libc.so.6
#1 0x0000003bfd234105 in abort () from /lib64/libc.so.6
#2 0x0000000000400dac in term_handler (sig=15) at test.c:8
#3 <signal handler called>
#4 0x00007fbac7ad0bc7 in mca_btl_sm_component_progress () from /path/libmpi.so.1
#5 0x00007fbac7c9fca7 in opal_progress () from /path/libmpi.so.1
...
I would rather recommend that you should use a parallel debugger such as TotalView or DDT, if you have one at your disposal.

Related

Linux - Using mutex to synchonise serial port

I'm writing a C program for Linux OS.
The program can start a timer: both main program and timer can send and receive characters on a serial port.
My attempt is to serialize the serial port access by a mutex in a global structure initialized on the opening with:
if (pthread_mutex_init( &pED->lockSerial, NULL) != 0)
{
lwsl_err("lockSerial init failed\n");
}
I protected all the functions that send data on the port as follow:
ssize_t cmdFirmwareVersion(EngineData *pED)
{
if (pED->fdSerialPort==-1)
return -1;
LOCK_SERIAL;
unsigned char cmd[] = { 0x00, 0x00, 0x7F };
write( pED->fdSerialPort, cmd, sizeof(cmd));
int rx = read ( pED->fdSerialPort, rxbuffer, sizeof rxbuffer);
dump( rxbuffer, rx);
UNLOCK_SERIAL;
return rx;
}
where
#define LOCK_SERIAL if (0!=pthread_mutex_lock(&pED->lockSerial)) {printf("Err lock");return 0;}
#define UNLOCK_SERIAL pthread_mutex_unlock(&pED->lockSerial);
Running the program and starting the timer I see the requests are regular. When I trigger one of this calls on other way (from a rx websocket function) the program hangs and I need to kill it.
Why the entire program stops ??
If a process hangs, it could be because of circular wait for mutexes or holding mutex and trying to lock it again. This could cause deadlock.
ps output will show thread's state as D or S if it's waiting for a resource. it will appear as the process is hung.
D uninterruptible sleep (usually IO)
S interruptible sleep (waiting for an event to complete)
I have made a thread to hold mutex and try to lock it again.
ps output and GDB shows main thread and child thread are in sleep.
xxxx#virtualBox:~$ ps -eflT |grep a.out
0 S root 3982 3982 2265 0 80 0 - 22155 - 20:28 pts/0 00:00:00 ./a.out
1 S root 3982 3984 2265 0 80 0 - 22155 - 20:28 pts/0 00:00:00 ./a.out
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff7fdf740 (LWP 4625) "a.out" 0x00007ffff7bbed2d in __GI___pthread_timedjoin_ex (
threadid=140737345505024, thread_return=0x0, abstime=0x0, block= <optimized out>) at pthread_join_common.c:89
2 Thread 0x7ffff77c4700 (LWP 4629) "a.out" __lll_lock_wait ()
at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
Please check blog Tech Easy for more information on threads.

How to make mprotect() to make forward progress after handling pagefaulte exception? [duplicate]

I want to write a signal handler to catch SIGSEGV.
I protect a block of memory for read or write using
char *buffer;
char *p;
char a;
int pagesize = 4096;
mprotect(buffer,pagesize,PROT_NONE)
This protects pagesize bytes of memory starting at buffer against any reads or writes.
Second, I try to read the memory:
p = buffer;
a = *p
This will generate a SIGSEGV, and my handler will be called.
So far so good. My problem is that, once the handler is called, I want to change the access write of the memory by doing
mprotect(buffer,pagesize,PROT_READ);
and continue normal functioning of my code. I do not want to exit the function.
On future writes to the same memory, I want to catch the signal again and modify the write rights and then record that event.
Here is the code:
#include <signal.h>
#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>
#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)
char *buffer;
int flag=0;
static void handler(int sig, siginfo_t *si, void *unused)
{
printf("Got SIGSEGV at address: 0x%lx\n",(long) si->si_addr);
printf("Implements the handler only\n");
flag=1;
//exit(EXIT_FAILURE);
}
int main(int argc, char *argv[])
{
char *p; char a;
int pagesize;
struct sigaction sa;
sa.sa_flags = SA_SIGINFO;
sigemptyset(&sa.sa_mask);
sa.sa_sigaction = handler;
if (sigaction(SIGSEGV, &sa, NULL) == -1)
handle_error("sigaction");
pagesize=4096;
/* Allocate a buffer aligned on a page boundary;
initial protection is PROT_READ | PROT_WRITE */
buffer = memalign(pagesize, 4 * pagesize);
if (buffer == NULL)
handle_error("memalign");
printf("Start of region: 0x%lx\n", (long) buffer);
printf("Start of region: 0x%lx\n", (long) buffer+pagesize);
printf("Start of region: 0x%lx\n", (long) buffer+2*pagesize);
printf("Start of region: 0x%lx\n", (long) buffer+3*pagesize);
//if (mprotect(buffer + pagesize * 0, pagesize,PROT_NONE) == -1)
if (mprotect(buffer + pagesize * 0, pagesize,PROT_NONE) == -1)
handle_error("mprotect");
//for (p = buffer ; ; )
if(flag==0)
{
p = buffer+pagesize/2;
printf("It comes here before reading memory\n");
a = *p; //trying to read the memory
printf("It comes here after reading memory\n");
}
else
{
if (mprotect(buffer + pagesize * 0, pagesize,PROT_READ) == -1)
handle_error("mprotect");
a = *p;
printf("Now i can read the memory\n");
}
/* for (p = buffer;p<=buffer+4*pagesize ;p++ )
{
//a = *(p);
*(p) = 'a';
printf("Writing at address %p\n",p);
}*/
printf("Loop completed\n"); /* Should never happen */
exit(EXIT_SUCCESS);
}
The problem is that only the signal handler runs and I can't return to the main function after catching the signal.
When your signal handler returns (assuming it doesn't call exit or longjmp or something that prevents it from actually returning), the code will continue at the point the signal occurred, reexecuting the same instruction. Since at this point, the memory protection has not been changed, it will just throw the signal again, and you'll be back in your signal handler in an infinite loop.
So to make it work, you have to call mprotect in the signal handler. Unfortunately, as Steven Schansker notes, mprotect is not async-safe, so you can't safely call it from the signal handler. So, as far as POSIX is concerned, you're screwed.
Fortunately on most implementations (all modern UNIX and Linux variants as far as I know), mprotect is a system call, so is safe to call from within a signal handler, so you can do most of what you want. The problem is that if you want to change the protections back after the read, you'll have to do that in the main program after the read.
Another possibility is to do something with the third argument to the signal handler, which points at an OS and arch specific structure that contains info about where the signal occurred. On Linux, this is a ucontext structure, which contains machine-specific info about the $PC address and other register contents where the signal occurred. If you modify this, you change where the signal handler will return to, so you can change the $PC to be just after the faulting instruction so it won't re-execute after the handler returns. This is very tricky to get right (and non-portable too).
edit
The ucontext structure is defined in <ucontext.h>. Within the ucontext the field uc_mcontext contains the machine context, and within that, the array gregs contains the general register context. So in your signal handler:
ucontext *u = (ucontext *)unused;
unsigned char *pc = (unsigned char *)u->uc_mcontext.gregs[REG_RIP];
will give you the pc where the exception occurred. You can read it to figure out what instruction it
was that faulted, and do something different.
As far as the portability of calling mprotect in the signal handler is concerned, any system that follows either the SVID spec or the BSD4 spec should be safe -- they allow calling any system call (anything in section 2 of the manual) in a signal handler.
You've fallen into the trap that all people do when they first try to handle signals. The trap? Thinking that you can actually do anything useful with signal handlers. From a signal handler, you are only allowed to call asynchronous and reentrant-safe library calls.
See this CERT advisory as to why and a list of the POSIX functions that are safe.
Note that printf(), which you are already calling, is not on that list.
Nor is mprotect. You're not allowed to call it from a signal handler. It might work, but I can promise you'll run into problems down the road. Be really careful with signal handlers, they're tricky to get right!
EDIT
Since I'm being a portability douchebag at the moment already, I'll point out that you also shouldn't write to shared (i.e. global) variables without taking the proper precautions.
You can recover from SIGSEGV on linux. Also you can recover from segmentation faults on Windows (you'll see a structured exception instead of a signal). But the POSIX standard doesn't guarantee recovery, so your code will be very non-portable.
Take a look at libsigsegv.
You should not return from the signal handler, as then behavior is undefined. Rather, jump out of it with longjmp.
This is only okay if the signal is generated in an async-signal-safe function. Otherwise, behavior is undefined if the program ever calls another async-signal-unsafe function. Hence, the signal handler should only be established immediately before it is necessary, and disestablished as soon as possible.
In fact, I know of very few uses of a SIGSEGV handler:
use an async-signal-safe backtrace library to log a backtrace, then die.
in a VM such as the JVM or CLR: check if the SIGSEGV occurred in JIT-compiled code. If not, die; if so, then throw a language-specific exception (not a C++ exception), which works because the JIT compiler knew that the trap could happen and generated appropriate frame unwind data.
clone() and exec() a debugger (do not use fork() – that calls callbacks registered by pthread_atfork()).
Finally, note that any action that triggers SIGSEGV is probably UB, as this is accessing invalid memory. However, this would not be the case if the signal was, say, SIGFPE.
There is a compilation problem using ucontext_t or struct ucontext (present in /usr/include/sys/ucontext.h)
http://www.mail-archive.com/arch-general#archlinux.org/msg13853.html

Calling "clone()" on linux but it seems to malfunction

A simple test program, I expect it will "clone" to fork a child process, and each process can execute till its end
#include<stdio.h>
#include<sched.h>
#include<unistd.h>
#include<sys/types.h>
#include<errno.h>
int f(void*arg)
{
pid_t pid=getpid();
printf("child pid=%d\n",pid);
}
char buf[1024];
int main()
{
printf("before clone\n");
int pid=clone(f,buf,CLONE_VM|CLONE_VFORK,NULL);
if(pid==-1){
printf("%d\n",errno);
return 1;
}
waitpid(pid,NULL,0);
printf("after clone\n");
printf("father pid=%d\n",getpid());
return 0;
}
Ru it:
$g++ testClone.cpp && ./a.out
before clone
It didn't print what I expected. Seems after "clone" the program is in unknown state and then quit. I tried gdb and it prints:
Breakpoint 1, main () at testClone.cpp:15
(gdb) n-
before clone
(gdb) n-
waiting for new child: No child processes.
(gdb) n-
Single stepping until exit from function clone#plt,-
which has no line number information.
If I remove the line of "waitpid", then gdb prints another kind of weird information.
(gdb) n-
before clone
(gdb) n-
Detaching after fork from child process 26709.
warning: Unexpected waitpid result 000000 when waiting for vfork-done
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x00007fb18a446bf1 in clone () from /lib64/libc.so.6
ptrace: No such process.
Where did I get wrong in my program?
You should never call clone in a user-level program -- there are way too many restrictions on what you are allowed to do in the cloned process.
In particular, calling any libc function (such as printf) is a complete no-no (because libc doesn't know that your clone exists, and have not performed any setup for it).
As K. A. Buhr points out, you also pass too small a stack, and the wrong end of it. Your stack is also not properly aligned.
In short, even though K. A. Buhr's modification appears to work, it doesn't really.
TL;DR: clone, just don't use it.
The second argument to clone is a pointer to the child's stack. As per the manual page for clone(2):
Stacks grow downward on all processors that run Linux (except the HP PA processors), so child_stack usually points to the topmost address of the memory space set up for the child stack.
Also, 1024 bytes is a paltry amount for a stack. The following modified version of your program appears to run correctly:
// #define _GNU_SOURCE // may be needed if compiled as C instead of C++
#include <stdio.h>
#include <sched.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
int f(void*arg)
{
pid_t pid=getpid();
printf("child pid=%d\n",pid);
return 0;
}
char buf[1024*1024]; // *** allocate more stack ***
int main()
{
printf("before clone\n");
int pid=clone(f,buf+sizeof(buf),CLONE_VM|CLONE_VFORK,NULL);
// *** in previous line: pointer is to *end* of stack ***
if(pid==-1){
printf("%d\n",errno);
return 1;
}
waitpid(pid,NULL,0);
printf("after clone\n");
printf("father pid=%d\n",getpid());
return 0;
}
Also, #Employed Russian is right -- you probably shouldn't use clone except if you're trying to have some fun. Either fork or vfork are more sensible interfaces to clone whenever they meet your needs.

Does a call to MPI_Barrier affect every thread in an MPI process?

Does a call to MPI_Barrier affect every thread in an MPI process or only the thread
that makes the call?
For your information , my MPI application will run with MPI_THREAD_MULTIPLE.
Thanks.
The way to think of this is that MPI_Barrier (and other collectives) are blocking function calls, which block until all processes in the communicator have completed the function. That, I think, makes it a little easier to figure out what should happen; the function blocks, but other threads continue on their way unimpeded.
So consider the following chunk of code (The shared done flag being flushed to communicate between threads is not how you should be doing thread communication, so please don't use this as a template for anything. Furthermore, using a reference to done will solve this bug/optimization, see the end of comment 2):
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char**argv) {
int ierr, size, rank;
int provided;
volatile int done=0;
MPI_Comm comm;
ierr = MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided == MPI_THREAD_SINGLE) {
fprintf(stderr,"Could not initialize with thread support\n");
MPI_Abort(MPI_COMM_WORLD,1);
}
comm = MPI_COMM_WORLD;
ierr = MPI_Comm_size(comm, &size);
ierr = MPI_Comm_rank(comm, &rank);
if (rank == 1) sleep(10);
#pragma omp parallel num_threads(2) default(none) shared(rank,comm,done)
{
#pragma omp single
{
/* spawn off one thread to do the barrier,... */
#pragma omp task
{
MPI_Barrier(comm);
printf("%d -- thread done Barrier\n", rank);
done = 1;
#pragma omp flush
}
/* and another to do some printing while we're waiting */
#pragma omp task
{
int *p = &done;
while(!(*p) {
printf("%d -- thread waiting\n", rank);
sleep(1);
}
}
}
}
MPI_Finalize();
return 0;
}
Rank 1 sleeps for 10 seconds, and all the ranks start a barrier in one thread. If you run this with mpirun -np 2, you'd expect the first of rank 0s threads to hit the barrier, and the other to cycle around printing and waiting -- and sure enough, that's what happens:
$ mpirun -np 2 ./threadbarrier
0 -- thread waiting
0 -- thread waiting
0 -- thread waiting
0 -- thread waiting
0 -- thread waiting
0 -- thread waiting
0 -- thread waiting
0 -- thread waiting
0 -- thread waiting
0 -- thread waiting
1 -- thread waiting
0 -- thread done Barrier
1 -- thread done Barrier

Return code when OS kills your process

I've wanted to test if with multiply processes I'm able to use more than 4GB of ram on 32bit O.S (mine: Ubuntu with 1GB ram).
So I've written a small program that mallocs slightly less then 1GB, and do some action on that array, and ran 5 instances of this program vie forks.
The thing is, that I suspect that O.S killed 4 of them, and only one survived and displayed it's "PID: I've finished").
(I've tried it with small arrays and got 5 printing, also when I look at the running processes with TOP, I see only one instance..)
The weird thing is this - I've received return code 0 (success?) in ALL of the instances, including the ones that were allegedly killed by O.S.
I didn't get any massage stating that processes were killed.
Is this return code normal for this situation?
(If so, it reduces my trust in 'return codes'...)
thanks.
Edit: some of the answers suggested possible errors in the small program, so here it is. the larger program that forks and saves return codes is larger, and I have trouble uploading it here, but I think (and hope) it's fine.
Also I've noticed that if instead of running it with my forking program, I run it with terminal using './a.out & ./a.out & ./a.out & ./a.out &' (when ./a.out is the binary of the small program attached)
I do see some 'Killed' messages.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#define SMALL_SIZE 10000
#define BIG_SIZE 1000000000
#define SIZE BIG_SIZE
#define REAPETS 1
int
main()
{
pid_t my_pid = getpid();
char * x = malloc(SIZE*sizeof(char));
if (x == NULL)
{
printf("Malloc failed!");
return(EXIT_FAILURE);
}
int x2=0;
for(x2=0;x2<REAPETS;++x2)
{
int y;
for(y=0;y<SIZE;++y)
x[y] = (y+my_pid)%256;
}
printf("%d: I'm over.\n",my_pid);
return(EXIT_SUCCESS);
}
Well, if your process is unable to malloc() the 1GB of memory, the OS will not kill the process. All that happens is that malloc() returns NULL. So depending on how you wrote your code, it's possible that the process could return 0 anyway - if you wanted it to return an error code when a memory allocation fails (which is generally good practice), you'd have to program that behavior into it.
What signal was used to kill the processes?
Exit codes between 0 and 127, inclusive, can be used freely, and codes above 128 indicate that the process was terminated by a signal, where the exit code is
128 + the number of the signal used
A process' return status (as returned by wait, waitpid and system) contains more or less the following:
Exit code, only applies if process terminated normally
whether normal/abnormal termination occured
Termination signal, only applies if process was terminated by a signal
The exit code is utterly meaningless if your process was killed by the OOM killer (which will apparently send you a SIGKILL signal)
for more information, see the man page for the wait command.
This code shows how to get the termination status of a child:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int
main (void)
{
pid_t pid = fork();
if (pid == -1)
{
perror("fork()");
}
/* parent */
else if (pid > 0)
{
int status;
printf("Child has pid %ld\n", (long)pid);
if (wait(&status) == -1)
{
perror("wait()");
}
else
{
/* did the child terminate normally? */
if(WIFEXITED(status))
{
printf("%ld exited with return code %d\n",
(long)pid, WEXITSTATUS(status));
}
/* was the child terminated by a signal? */
else if (WIFSIGNALED(status))
{
printf("%ld terminated because it didn't catch signal number %d\n",
(long)pid, WTERMSIG(status));
}
}
}
/* child */
else
{
sleep(10);
exit(0);
}
return 0;
}
Have you checked the return value from fork()? There's a good chance that if fork() can't allocate enough memory for the new process' address space, then it will return an error (-1). A typical way to call fork() is:
pid_t pid;
switch(pid = fork())
{
case 0:
// I'm the child process
break;
case -1:
// Error -- check errno
fprintf(stderr, "fork: %s\n", strerror(errno));
break;
default:
// I'm the parent process
}
Exit code is only "valid" when WIFEXITED macro evaluates to true. See man waitpid(2).
You can use WIFSIGNALED macro to see if your program has been signaled.

Resources