Why won't MPI finalize gracefully? - linux

Whenever I try to finalize my MPI program, I get errors similar to the following:
[mpiexec] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[mpiexec] main (./pm/pmiserv/pmip.c:221): demux engine error waiting for event
[mpiexec] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:99): one of the processes terminated badly; aborting
[mpiexec] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error waiting for completion
[mpiexec] main (./ui/mpich/mpiexec.c:294): process manager error waiting for completion
Sometimes I get a glibc "double free or corruption" error instead. Each process is single-threaded, and each process definitely calls MPI_Finalize(). Any idea what could be going wrong here?

I've written a small test program that should exit without any errors. Please try running it. If it exits gracefully, then the problem is in your code.
#include <mpi.h>
#include <cstdio>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    int finalize_retcode = MPI_Finalize();
    if(0 == my_rank) fprintf(stderr, "Process, return_code\n");
    fprintf(stderr, "%i, %i\n", my_rank, finalize_retcode);
    return 0;
}

I just ran into a similar problem.
MPI_Request* req = (MPI_Request*) malloc(sizeof(MPI_Request)*2*numThings*numItems);
int count;
for( item in items ) {
    count = 0;
    for( thing in things ) {
        MPI_Irecv(<sendBufF>, 1, MPI_INT, <src>, <tag>, MPI_COMM_WORLD, &req[count++]);
        MPI_Isend(<recvBufF>, 1, MPI_INT, <dest>, <tag>, MPI_COMM_WORLD, &req[count++]);
    }
}
MPI_Status* stat = (MPI_Status*) malloc(sizeof(MPI_Status)*2*numThings*numItems);
MPI_Waitall(count, req, stat);
Because count is reset to 0 on every iteration of the outer loop, the call to MPI_Waitall(...) is made with a value of count that is less than the number of MPI_Isend and MPI_Irecv calls performed, so some messages are never received. Moving count = 0 outside the for loops resolved the MPI_Finalize(...) error.
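The counting error can be seen without MPI at all. The following is a minimal sketch, not the poster's code: the function name requests_counted and its flag are hypothetical, and each "post" only advances a counter where the real code would call MPI_Irecv or MPI_Isend.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the nested loops above: no MPI calls are made,
// each posted request only advances the counter.
std::size_t requests_counted(std::size_t numItems, std::size_t numThings,
                             bool reset_inside_outer_loop)
{
    std::size_t count = 0;  // correct placement: initialised once
    for (std::size_t item = 0; item < numItems; ++item) {
        if (reset_inside_outer_loop)
            count = 0;      // the bug: requests from earlier items are forgotten
        for (std::size_t thing = 0; thing < numThings; ++thing) {
            ++count;        // slot that MPI_Irecv would fill
            ++count;        // slot that MPI_Isend would fill
        }
    }
    return count;           // the value that would be passed to MPI_Waitall
}
```

With 3 items and 4 things, 24 requests are posted, but the buggy placement reports only 8 of them, so the remaining 16 would still be outstanding when MPI_Finalize is reached.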

Related

Why is the message queue not created?

I am learning about message queues and wrote this code to create one:
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>   /* declares msgget() */
#include <stdlib.h>
#include <errno.h>
int main()
{
    key_t key;
    int msgid;
    key = ftok("proj", 64);
    if (key == -1) {
        perror("ftok failed");
        exit(1);
    }
    printf("key:%x\n", key);
    // IPC_CREAT: create the message queue if it does not exist
    msgid = msgget(key, IPC_CREAT);
    if (msgid == -1) {
        perror("msgget failed");
        printf("errno:%d\n", errno);
        if (errno == ENOENT)
            printf("No message queue exists for key and msgflg did not specify IPC_CREAT\n");
        exit(2);
    }
    printf("msgid:%x\n", msgid);
    return 0;
}
Running ipcs -q afterwards did not show the queue:
panther2@ubuntu:~/c_codes/msg_queue$ ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
Can you please tell me whether I am making a mistake?
As far as I can see, there is nothing wrong with your code, but the behavior is really strange, even on my system.
Since msgget returns 0, everything is OK (it should return a non-negative number, which 0 is) and the queue can be used.
I added a for(;;); at the end of your program and started it again. ipcs now shows:
0x4025077b 0 krud 0 0 0
After running ipcrm -q 0 and starting the program again, I got a new id on each run. I then removed the endless loop and everything still works: every run gives me a message queue with a different number, which I always have to destroy before the next run.
That is really strange!
I found a lot of reports on this topic, e.g.:
https://www.unix.com/programming/248572-msgget-2-returns-0-workaround-fix.html
http://forums.codeguru.com/showthread.php?403036-strange-problem-in-using-msgget%28%29-in-Linux
Keep us informed if you find a valid solution!
As my system now creates a new message queue with an id > 0 on every run, I can't reproduce this behavior anymore. I did not want to reboot again ;)
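One detail the thread does not discuss: the low nine bits of the msgflg argument are the permission bits of a newly created queue, so msgget(key, IPC_CREAT) on its own creates a queue with mode 0, which matches the perms column showing 0 in the ipcs line above. Whether that explains the behavior seen here is an assumption, not something the thread confirms. The sketch below passes explicit permission bits and reads them back; the helper name created_queue_mode is hypothetical, and IPC_PRIVATE is used instead of an ftok() key so that no file path is needed.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

// Creates a queue with the given permission bits, reads the mode back with
// IPC_STAT, removes the queue again and returns the low nine mode bits.
// Returns -1 if creating or inspecting the queue fails.
int created_queue_mode(int permission_bits)
{
    // IPC_PRIVATE always creates a new queue (assumption: used here only to
    // keep the sketch independent of an ftok() path).
    int msgid = msgget(IPC_PRIVATE, IPC_CREAT | permission_bits);
    if (msgid == -1)
        return -1;

    struct msqid_ds info;
    int mode = -1;
    if (msgctl(msgid, IPC_STAT, &info) == 0)
        mode = info.msg_perm.mode & 0777;

    msgctl(msgid, IPC_RMID, nullptr);  // remove the queue so nothing is left behind
    return mode;
}
```

In the question's program the equivalent change would be msgget(key, IPC_CREAT | 0600), or 0666 if other users should be able to use the queue.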

Sending SIGINT to QProcess

I want to send SIGINT to a program started using QProcess.
I am working on Ubuntu.
Source code of my process looks like this:
#include <iostream>
#include <csignal>
#include <stdlib.h>
#include <unistd.h>
void int_handle(int sig)
{
    std::cout<<"Received SIGINT\n";
    exit(0);
}
int main()
{
    std::cout<<"Main called\n";
    signal(SIGINT, int_handle);
    while(1)
    {
        std::cout<<"Sleeping.....\n";
        sleep(1);
    }
    return 0;
}
I compiled this program into an executable named my_prog.
My QProcess code looks like this:
QProcess* process= new Qprocess();
QString command = "my_prog";
process->start(command);
process->waitForStarted();
Based on some event, I tried sending SIGINT in the following ways:
process->kill();
process->close();
process->write("0x03");
process->terminate();
kill(process->pid(), SIGINT);
QString command = kill -9 <PID>;
QByteArray ba = command.toLatin1();
system(ba.data());
Even after trying all of these, my program never receives SIGINT.
Please help me find the correct way to implement this.
EDIT1: Updated the example program.
I was trying to explain the problem and overlooked the syntax errors in the example.
Sorry for that.
Thanks in advance.
Apart from several syntax errors/typos in your example, which will prevent the code from even compiling, the program you are trying to kill has two and a half issues:
1. The signal handler has the wrong signature: it must take an integer parameter, as shown in the man page. This won't even compile with g++.
2. In main, no event loop or similar is started. Thus, when you execute the binary, it registers the signal handler and exits immediately afterwards, because signal() is non-blocking.
3. From the signal() man page: "Avoid its use: use sigaction(2) instead."
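The sigaction(2) recommendation from the man page can be sketched as follows. This is a minimal illustration, not a drop-in replacement for the question's program: the names got_sigint, on_sigint and install_sigint_handler are hypothetical, and the handler only sets a flag instead of printing, since stream I/O is not async-signal-safe.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Set by the handler; sig_atomic_t is the type that may be written there.
static volatile std::sig_atomic_t got_sigint = 0;

extern "C" void on_sigint(int /*sig*/)
{
    got_sigint = 1;  // only record the event; the main loop reacts to it
}

// Installs on_sigint for SIGINT via sigaction(2). Returns true on success.
bool install_sigint_handler()
{
    struct sigaction sa = {};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);  // block no additional signals in the handler
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, nullptr) == 0;
}
```

A main loop would then call install_sigint_handler() once and test got_sigint on each iteration, printing and exiting from normal code rather than from inside the handler.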
Edit
Points 1 and 2 are obsoleted by the OP's EDIT1; point 3 remains.
As pointed out by Murphy, QProcess captures stdout/stderr and makes them available through a QIODevice interface. If you don't forward the subprocess output to the parent process, you won't see any output.
After forwarding the process channels, you must also send the correct signal if you want your signal handler to be called. process->kill() sends SIGKILL, not SIGINT, so your signal handler wouldn't be invoked. Most of your attempts at killing the subprocess send the wrong signal.
Finally, be sure that your command is actually starting. I had to specify a relative local path ./my_prog in order to have the process start successfully.
Here is some code based on your incomplete example that works for me:
#include <QProcess>
#include <QDebug>
#include <unistd.h>
#include <csignal>
int main(int argc, char *argv[])
{
    QProcess *process = new QProcess();
    // Start process from local directory
    QString command = "./my_prog";
    // Forward output of process to parent stdout/stderr
    process->setProcessChannelMode(QProcess::ForwardedChannels);
    process->start(command);
    // Ensure process starts successfully; wait indefinitely
    if(process->waitForStarted(-1))
    {
        qDebug() << "Process started.";
        // Wait a little before sending signal
        sleep(1);
        // Send the correct signal
        kill(process->pid(), SIGINT);
    } else {
        qDebug() << "Failed to start process.";
    }
}
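The difference between SIGINT and SIGKILL described in this answer can also be observed without Qt. The sketch below uses plain fork(), kill() and waitpid(), which is a swapped-in technique rather than anything QProcess does internally; the names note_sigint and signal_child_and_report, the exit code 42 and the error value -1000 are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile std::sig_atomic_t interrupted = 0;

extern "C" void note_sigint(int /*sig*/) { interrupted = 1; }

// Forks a child that waits for SIGINT, sends it `signo`, and reports how the
// child ended: 42 if its SIGINT handler ran, the negated signal number if a
// signal killed it (so -SIGKILL for SIGKILL), or -1000 on error.
int signal_child_and_report(int signo)
{
    // Install the handler and block SIGINT before fork(), so the child
    // inherits both and the signal cannot slip in before it starts waiting.
    struct sigaction sa = {};
    struct sigaction old_sa;
    sa.sa_handler = note_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, &old_sa);

    sigset_t block, old_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    pid_t pid = fork();
    if (pid == 0) {
        sigset_t wait_mask = old_mask;
        sigdelset(&wait_mask, SIGINT);
        while (!interrupted)
            sigsuspend(&wait_mask);  // atomically unblock SIGINT and sleep
        _exit(42);                   // reached only after the handler ran
    }

    // Parent: restore its own signal state before doing anything else.
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    sigaction(SIGINT, &old_sa, nullptr);
    if (pid < 0)
        return -1000;

    kill(pid, signo);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return WIFSIGNALED(status) ? -WTERMSIG(status) : -1000;
}
```

Sending SIGINT lets the child run its handler and exit on its own terms, while SIGKILL ends it without the handler ever being called, which is why process->kill() never produced the "Received SIGINT" message.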

Can I read the exit value from the console application?

I want to read the exit value of my console application so that I can stop all the threads related to that application before it exits.
This works for me on Windows; try it:
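#include <csignal>
#include <iostream>
#include <ostream>
#include <string>
using namespace std;
namespace
{
    // Named exit_requested rather than exit, which would clash with the
    // library function exit() when the standard headers declare it
    volatile sig_atomic_t exit_requested;
    void signal_handler(int sig)
    {
        // Re-register the handler, then record that a signal arrived
        signal(sig, signal_handler);
        exit_requested = 1;
    }
}
int main()
{
    signal(SIGINT, signal_handler);
    signal(SIGTERM, signal_handler);
#ifdef SIGBREAK
    signal(SIGBREAK, signal_handler);
#endif
    while (!exit_requested)
    {
        /* do something */
    }
    // Catch signal here
}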
Take a look at https://stackoverflow.com/questions/298498/c-console-breaking. The standard header you need is csignal.
What you can do is register for the signals that force your app to close (SIGTERM) and perform your logic there, such as stopping your threads. This post suggests that this should work on Windows as well.
You could also register a function with atexit, which seems to catch a normal exit from main() etc.; I'm not sure whether closing the terminal counts as a "normal exit".
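What atexit does and does not catch can be checked with a small experiment. The sketch below is POSIX-only (fork, pipe, waitpid), which is an assumption that departs from the Windows setting of the question; the names cleanup_at_exit, ExitReport and run_child and the exit code 3 are hypothetical. A child process registers an atexit handler and then ends either with std::exit() or with _exit(), and the parent reports whether the handler ran.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;  // write end of the pipe, set in the child only

extern "C" void cleanup_at_exit()
{
    // Stand-in for real cleanup such as asking worker threads to stop.
    const char mark = 'c';
    ssize_t ignored = write(report_fd, &mark, 1);
    (void)ignored;
}

struct ExitReport {
    bool cleanup_ran;  // did the atexit handler run in the child?
    int exit_code;     // exit code seen by the parent, -1 on error
};

ExitReport run_child(bool normal_exit)
{
    ExitReport report = {false, -1};
    int fds[2];
    if (pipe(fds) != 0)
        return report;
    std::fflush(nullptr);  // avoid duplicating buffered output in the child

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return report;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        std::atexit(cleanup_at_exit);
        if (normal_exit)
            std::exit(3);  // normal termination: atexit handlers run
        _exit(3);          // immediate termination: atexit handlers are skipped
    }

    close(fds[1]);
    char mark = 0;
    report.cleanup_ran = (read(fds[0], &mark, 1) == 1);
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        report.exit_code = WEXITSTATUS(status);
    return report;
}
```

The handler runs for std::exit() (and for a return from main) but not for _exit(); a process that is killed outright is closer to the second case, so cleanup that must always happen should not rely on atexit alone.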
Edit: OK, so it seems you want to be notified as soon as the process exits. Sorry, I misread your question because of the term "exit value". If you start the process via the CreateProcess() API, you should be able to call WaitForSingleObject() on the handle. This function blocks until the process has exited, so you can place all the code that should run after the process has stopped after this call.
If you in fact want the exit code of a process (return X in main()):
Programmatically, you can use GetExitCodeProcess() from WinAPI:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683189(v=vs.85).aspx
In the shell, use the %errorlevel% variable.

How to create a process in Linux

I'm trying to create a process in Linux, but I keep getting an error. In my C++ code, I just want to open firefox.exe. Here's my code:
// header files
#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <iostream>
using namespace std;
// main function used to run the program
int main()
{
    // declaration of a process id variable
    pid_t pid;
    // fork a child process; the result is assigned
    // to the process id
    pid=fork();
    // code to show that the fork failed
    // if the process id is less than 0
    if(pid<0)
    {
        fprintf(stderr, "Fork Failed");// error occurred
        exit(-1); //exit
    }
    // code that runs if the process id equals 0
    // (the fork succeeded and this is the child)
    else if(pid==0)
    {
        // this statement replaces the child with the specified program
        execlp("usr/bin","firefox",NULL);//child process
    }
    // code that exits only once the child
    // process has completed
    else
    {
        wait(NULL);// parent will wait for the child process to complete
        cout << pid << endl;
        printf("Child Complete");
        exit(0);
    }
}
I get an error for the wait() function. I tried leaving it out, but then nothing happened.
You have to write:
execlp("/usr/bin/firefox","firefox",NULL);
You also need to put an _exit after execlp in case it fails.
I don't think that you have called execlp correctly.
It isn't going to append "firefox" to "usr/bin". Because it will search the PATH environment variable you can call it with execlp("firefox","firefox",NULL).
Aside: Yes, the exec family of functions allows you to break the nominal guarantee that argv[0] should name the executable. Sorry, that is just the way it is.
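Putting the advice from both answers together (let execlp search PATH, call _exit if it returns, and wait for the child) gives roughly the sketch below. The helper name run_and_wait, its fixed two optional arguments and the exit code 127 for a failed exec are assumptions made for illustration. Note that wait() and waitpid() are declared in sys/wait.h, which this sketch includes.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `program` (looked up in PATH by execlp) with up to two arguments and
// returns its exit code, 127 if the program could not be executed, or -1 on
// other errors. Pass nullptr for arguments that are not needed.
int run_and_wait(const char *program, const char *arg1, const char *arg2)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;  // fork failed

    if (pid == 0) {
        // Child: argv[0] is conventionally the program name; the argument
        // list ends at the first null pointer.
        execlp(program, program, arg1, arg2, (char *)nullptr);
        _exit(127);  // only reached if execlp failed
    }

    // Parent: wait for this particular child and decode its status.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the question's case the call would be run_and_wait("firefox", nullptr, nullptr), which returns once the browser process started by the child has exited.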
To create a process, you can use the system(), fork(), and execl() calls.
To learn how to create a process in Linux using these calls, follow the link below.
I think its examples will help you understand process creation better.
http://www.firmcodes.com/process-in-linux/

Problem waking up multiple threads using condition variable API in win32

I have a problem understanding how the WinAPI condition variables work.
More specifically, what I want is a couple of threads waiting on some condition. Then I want to use the WakeAllConditionVariable() call to wake up all the threads so that they can do work. Apart from the fact that I just want the threads started, there isn't any other prerequisite for them to start working (like you would have in an n producer / n consumer scenario).
Here's the code so far:
#define MAX_THREADS 4
CONDITION_VARIABLE start_condition;
SRWLOCK cond_rwlock;
bool wake_all;
__int64 start_times[MAX_THREADS];
Main thread:
int main()
{
    HANDLE h_threads[ MAX_THREADS ];
    int tc;
    for (tc = 0; tc < MAX_THREADS; tc++)
    {
        DWORD tid;
        h_threads[tc] = CreateThread(NULL,0,(LPTHREAD_START_ROUTINE)thread_routine,(void*)tc,0,&tid);
        if( h_threads[tc] == NULL )
        {
            cout << "Error while creating thread with index " << tc << endl;
            continue;
        }
    }
    InitializeSRWLock( &cond_rwlock );
    InitializeConditionVariable( &start_condition );
    AcquireSRWLockExclusive( &cond_rwlock );
    // set the flag to true, then wake all threads
    wake_all = true;
    WakeAllConditionVariable( &start_condition );
    ReleaseSRWLockExclusive( &cond_rwlock );
    WaitForMultipleObjects( tc, h_threads, TRUE, INFINITE );
    return 0;
}
And here is the code for the thread routine:
DWORD thread_routine( PVOID p_param )
{
    int t_index = (int)(p_param);
    AcquireSRWLockShared( &cond_rwlock );
    // main thread sets wake_all to true and calls WakeAllConditionVariable()
    // so this thread should start doing the work (?)
    while ( !wake_all )
        SleepConditionVariableSRW( &start_condition,&cond_rwlock, INFINITE,CONDITION_VARIABLE_LOCKMODE_SHARED );
    QueryPerformanceCounter((LARGE_INTEGER*)&start_times[t_index]);
    // do the actual thread related work here
    return 0;
}
This code does not do what I would expect it to do. Sometimes just one thread finishes the job, sometimes two or three, but never all of them. The main function never gets past the WaitForMultipleObjects() call.
I'm not exactly sure what I've done wrong, but I assume there is a synchronization issue somewhere?
Any help would be appreciated. (Sorry if I re-posted an older topic in different dressing :)
You initialize the cond_rwlock and start_condition variables too late. Move that code up, before you start the threads. A thread is likely to start running right away, especially on a multi-core machine.
Also test the return values of API functions. You don't know why it doesn't work because you never check for failure.
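The ordering this answer asks for (create the lock, condition variable and flag first, start the threads afterwards) can be sketched portably. The code below deliberately swaps the Win32 SRW lock and CONDITION_VARIABLE for std::mutex and std::condition_variable, so it illustrates the pattern rather than the Win32 calls themselves; the function name release_all_workers and the started counter are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Starts `n` workers that block until released, then releases them all at
// once. Returns how many workers ran after being released.
int release_all_workers(int n)
{
    // The synchronisation objects and the flag exist before any thread starts.
    std::mutex m;
    std::condition_variable start_condition;
    bool wake_all = false;
    int started = 0;

    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&] {
            std::unique_lock<std::mutex> lock(m);
            // The predicate covers spurious wake-ups and the case where the
            // flag was already set before this thread began waiting.
            start_condition.wait(lock, [&] { return wake_all; });
            ++started;  // the lock is held here, so the increment is safe
        });  // the lock is released when the worker returns
    }

    {
        std::lock_guard<std::mutex> lock(m);
        wake_all = true;  // set the flag under the lock
    }
    start_condition.notify_all();

    for (std::thread &t : workers)
        t.join();
    return started;
}
```

Calling release_all_workers(4) mirrors MAX_THREADS in the question. Because every worker releases the lock when it returns, the remaining workers can always reacquire it after waking.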
