I am using MPI_Probe to receive dynamically sized messages (where the receiver does not know in advance the size of the message being sent). My code looks somewhat like this:
if (world_rank == 0) {
int *buffer = ...
int bufferSize = ...
MPI_Send(buffer, bufferSize, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
MPI_Status status;
MPI_Probe(0, 0, MPI_COMM_WORLD, &status);
int count = -1;
MPI_Get_count(&status, MPI_INT, &count);
int* buffer = (int*)malloc(sizeof(int) * count);
MPI_Recv(buffer, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
If I am running this code in multiple threads, is there a chance that MPI_Probe gets called in one thread and the MPI_Recv gets called in another thread because the scheduler interleaves the threads? In essence, is the above code thread-safe?
First of all, MPI is not thread-safe by default. You'll have to check if your particular library has been compiled for thread safety and then initialize MPI using MPI_Init_thread instead of MPI_Init.
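A minimal sketch of that initialization, assuming an MPI library built with full threading support (the argc/argv names are whatever your main receives):

int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
    /* the library cannot give us full thread support, so bail out */
    fprintf(stderr, "This MPI library does not support MPI_THREAD_MULTIPLE\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
}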
Supposing that your MPI instance is initialized for thread-safe routines, your code is still not thread safe due to the race condition you already identified.
The pairing of MPI_Probe and MPI_Recv in a multi-threaded environment is not thread safe; this is a known problem in MPI-2: http://htor.inf.ethz.ch/publications/img/gregor-any_size-mpi3.pdf
There are at least two possible solutions: you can either use the MPI-3 functions MPI_Mprobe and MPI_Mrecv, or put a lock/mutex around the critical code. This could look as follows:
MPI-2 solution (using a mutex/lock):
int number_amount;
if (world_rank == 0) {
int *buffer = ...
int bufferSize = ...
MPI_Send(buffer, bufferSize, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
MPI_Status status;
int count = -1;
/* acquire mutex/lock */
MPI_Probe(0, 0, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &count);
int* buffer = (int*)malloc(sizeof(int) * count);
MPI_Recv(buffer, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* release mutex/lock */
}
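One possible way to realize the acquire/release placeholders above, assuming POSIX threads are used for the threading (this is just an illustration, not part of the original answer):

#include <pthread.h>

static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock(&probe_lock);     /* acquire mutex/lock */
/* ... MPI_Probe, MPI_Get_count, malloc, MPI_Recv as above ... */
pthread_mutex_unlock(&probe_lock);   /* release mutex/lock */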
MPI-3 solution:
int number_amount;
if (world_rank == 0) {
int *buffer = ...
int bufferSize = ...
MPI_Send(buffer, bufferSize, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
MPI_Status status;
MPI_Message msg;
int count = -1;
MPI_Mprobe(0, 0, MPI_COMM_WORLD, &msg, &status);
MPI_Get_count(&status, MPI_INT, &count);
int* buffer = (int*)malloc(sizeof(int) * count);
MPI_Mrecv(buffer, count, MPI_INT, &msg, MPI_STATUS_IGNORE);
}
I am starting to learn MPI. I was following a tutorial and wrote two files in C. Compiling and running the first file works fine, but when I do the same with the second file, it does not work. And after I encountered the error, now I can't run either of the files, even after recompiling them.
I could not find a solution to my problem anywhere on the web, including on Stack Overflow. This post was the closest that I came across, but it did not provide any solution: Error occurred in MPI_Send on communicator MPI_COMM_WORLD MPI_ERR_RANK: invalid rank
First file :
#include <stdio.h>
#include <mpi.h>
int main (int argc, char **argv){
//Initialize the MPI environment
MPI_Init(NULL, NULL);
//find out rank, size
int world_rank;
MPI_Comm_rank (MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size (MPI_COMM_WORLD, &world_size);
int number;
if (world_rank == 0){
number = -1;
MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
}
else if (world_rank == 1){
MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process 1 received number %d from process 0\n", number);
}
//Finalise the MPI environment
MPI_Finalize();
}
Second File :
#include <stdio.h>
#include <mpi.h>
int main (int argc, char **argv){
//Initialize the MPI environment
MPI_Init(NULL, NULL);
//find out rank, size
int world_rank;
MPI_Comm_rank (MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size (MPI_COMM_WORLD, &world_size);
int X, Y, Z;
if (world_rank == 0){
scanf("%d", &X);
scanf("%d", &Y);
MPI_Send(&X, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_Send(&Y, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
MPI_Recv(&Z, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process 1 received number %d from process 2\n", Z);
}
else if (world_rank == 1){
MPI_Recv(&X, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Recv(&Y, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
Z = X + Y;
MPI_Send(&Z, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
//Finalise the MPI environment
MPI_Finalize();
}
ERROR MESSAGES :
[ubuntu:2638] *** An error occurred in MPI_Send
[ubuntu:2638] *** reported by process [135921665,0]
[ubuntu:2638] *** on communicator MPI_COMM_WORLD
[ubuntu:2638] *** MPI_ERR_RANK: invalid rank
[ubuntu:2638] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ubuntu:2638] *** and potentially your MPI job)
UPDATE:
Here is the command line that I used:
mpicc -o 123 file1.c
mpirun 123
This was OK the first time, but not after running:
mpicc -o 123 file2.c
mpirun 123
This was where I first encountered the error.
I have a process with multiple threads. I have registered a prepare function and a parent handler using the __register_atfork(blocksigprof, restoresigprof, NULL, NULL) function.
Now let us assume that two threads call fork at the same time. I have a counter increment in blocksigprof and a counter decrement in restoresigprof.
Considering the above scenario, will blocksigprof and restoresigprof always be called in pairs?
Is there any locking mechanism inherently done in __register_atfork?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <sys/wait.h>

#define NUM_THREADS 8
static volatile int go = 0;    // release flag for the worker threads
static volatile int exec = 1;  // keep forking while set
static int ev_od = 0;
static void *
test_thread (void *arg) {
int j;
pid_t c, d;
while(!go) // Wait, so that all threads are here.
continue;
// All will fork, hopefully at same time because of go signal wait.
while(exec) {
c = fork();
if (c < 0) {
printf("SANJAY: fork() failed.\n");
exit(1);
} else if (c == 0) { // Child
exit(0);
}
else { // parent
d = waitpid(c, NULL, 0);
}
}
return NULL;
}
extern int __register_atfork(void (*)(void),void (*)(void),void (*)(void),void *);
static sigset_t s_new;
static sigset_t s_old;
static int count = 0;
static void blocksigprof(void){
count++;
#ifdef SYS_gettid
pid_t tid = syscall(SYS_gettid);
if (tid % 2) {
printf("sleep.\n");
usleep(1);
}
#else
#error "SYS_gettid unavailable on this system"
#endif
printf("Pre-fork. Count should be one. %d\n", count);
}
static void restoresigprof(void){
printf("Post-fork. Count should be one. %d\n", count);
count--;
}
int
main () {
pthread_t t[NUM_THREADS];
void *ptr;
long size = 500 * 1024 * 1024;
int i, m;
volatile int result = 0;
int g_iters = 100;
(void) __register_atfork(blocksigprof,restoresigprof,NULL,NULL);
// Increase size, so fork takes time.
printf("SANJAY: Increasing process size.\n");
for (i = 0; i < 9; ++i) { // same as the original nine repeated malloc/memset blocks
ptr = malloc(size);
memset(ptr, 0, size);
}
// Create threads.
for (i = 0; i < NUM_THREADS; ++i) {
pthread_create(&t[i], NULL, test_thread, NULL);
}
printf("SANJAY: Killing time.\n");
// Kill time, so that all threads are at same place post it, waiting for go. 100M cycles.
for (m = 0; m < 1000000; ++m)
for (i = 0; i < g_iters; ++i )
result ^= i;
// Give all threads go at same time.
printf("SANJAY: Let threads execute.\n");
go = 1;
usleep(10000000); // Wait for 10 sec.
exec = 0;
// Wait for all threads to finish.
for (i = 0; i < NUM_THREADS; ++i) {
pthread_join(t[i], NULL);
}
printf("SANJAY: Done.\n");
return 0;
}
The pthread_atfork specification doesn't require its implementation to serialize calls to the prepare and parent handlers, so the safe assumption is that there is no synchronization.
The glibc implementation does lock an internal mutex that prevents multiple threads from entering the handlers in parallel. However, that is an implementation detail. The comments in the code say that such an implementation is not POSIX-compliant, because POSIX requires pthread_atfork to be async-signal-safe, and using a mutex there makes it not async-signal-safe.
To make your code robust, I recommend using atomics or a mutex to protect your shared state from race conditions.
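As a minimal sketch (not taken from your code, and assuming C11 <stdatomic.h> is available), the shared counter in the handlers could be protected like this:

#include <stdatomic.h>
#include <stdio.h>

static atomic_int count = 0;

static void blocksigprof(void) {               /* prepare handler */
    int c = atomic_fetch_add(&count, 1) + 1;   /* atomic increment, c is the new value */
    printf("Pre-fork, count = %d\n", c);
}

static void restoresigprof(void) {             /* parent handler */
    int c = atomic_fetch_sub(&count, 1) - 1;   /* atomic decrement, c is the new value */
    printf("Post-fork, count = %d\n", c);
}

Note that printf itself is not async-signal-safe; it is only kept here to mirror the original test code.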
I am working on an MPI I/O problem. Rank 0 reads positions from a parameter file and then sends them to ranks 1, 2 and 3. These processes (1, 2, 3) read text from an input file at the position rank 0 gave them and write it to different lines of an output file. When I run the program on a single computer, everything is OK. But when I use 2 computers (still 4 processes, ranks 0 and 1 on the server and ranks 2 and 3 on the client), some random lines of the output file have gone missing! Here is my code:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
//define the message
#define MSG_MISSION_COMPLETE 78
#define MSG_EXIT 79
//define a structural message of MPI
int array_of_blocklengths[3] = { 1, 1, 1 };
MPI_Aint array_of_displacements[3] = { 0, sizeof(float), sizeof(float) + sizeof(int) };
MPI_Datatype array_of_types[3] = {MPI_FLOAT, MPI_INT, MPI_INT}; /* matches struct {float pause; int stand; int offset;} */
MPI_Datatype location;
int master();
int slave(MPI_File fhr, MPI_File fhw);
int main(int argc, char* argv[])
{
int rank;
MPI_File fhr, fhw;
char read[] = "./sharedReadSample1.txt";
char write[] = "./sharedWriteSample1.txt";
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("%d is speaking\n", rank);
MPI_File_open(MPI_COMM_WORLD, read, MPI_MODE_RDONLY, MPI_INFO_NULL, &fhr);
MPI_File_open(MPI_COMM_WORLD, write, MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &fhw);
if (rank == 0)//rank 0, dispatch the tasks
master();
else//other processes
slave(fhr, fhw);
MPI_Finalize();
printf("%d said byebye\n", rank);
MPI_File_close(&fhr);
MPI_File_close(&fhw);
return 0;
}
int master()//master, read the parameters, send them to other slave processes, get the message of task finishing, arrange next task to the slave who completed the task
{
int i, size, firstmsg, nslave;
int buf[256];
struct{
float pause;//pause time
int stand;//starting position in the file
int offset;//offset
}buf_str[10000] = { {0.0,0,0} };
MPI_Comm_size(MPI_COMM_WORLD, &size);
nslave = size - 1;//the number of slaves
FILE* fp;
FILE* fpm;//for log
fp = fopen("sharedAttributeSample1.txt", "rb");
if (fp == NULL)
{
printf("The file was not opened\n");
getchar();
//send a quit message to slaves, use the tag to tell them(>10000)
for (i = 10000; i < 10000 + nslave; i++)
{
buf[0] = MSG_EXIT;
MPI_Send(&buf[0], 1, MPI_INT, i - 10000 + 1, i, MPI_COMM_WORLD);
}
return 0;
}
else
printf("The file was opened\n");
fpm = fopen("./logs/log_master.txt","wb");
if (fpm == NULL)
printf("master log system failed to load!\n");
for (i = 0; i < 10000;i++)
{
fscanf(fp,"%f,%d,%d", &buf_str[i].pause, &buf_str[i].stand, &buf_str[i].offset);
}
MPI_Status status;
MPI_Type_struct(3, array_of_blocklengths, array_of_displacements, array_of_types, &location);
MPI_Type_commit(&location);
for (i = 0; i < nslave; i++)
{
MPI_Send(&buf_str[i], 1, location, i+1, i, MPI_COMM_WORLD);
fprintf(fpm, "initial message %d sent\n",i);
}
for (i = nslave; i < 10000; i++)
{
MPI_Recv(buf, 256, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);//receive messages from slaves
fprintf(fpm, "task %d complete massage received\n",status.MPI_TAG);
if (buf[0] == MSG_MISSION_COMPLETE)//send next task
{
firstmsg = status.MPI_SOURCE;
fprintf(fpm, "task %d is sent to %d \n", i, firstmsg);
MPI_Send(&buf_str[i], 1, location, firstmsg, i, MPI_COMM_WORLD);
}
}
for (i = 10000; i < 10000+nslave; i++)//send quitting message
{
buf[0] = MSG_EXIT;
MPI_Send(&buf_str[0], 1, location, i-10000+1, i, MPI_COMM_WORLD);
}
fclose(fp);
fclose(fpm);
return 0;
}
int slave(MPI_File fhr, MPI_File fhw)
{
struct{
float pause;
int stand;
int offset;
}buf_str;
char buf[256];
int buf_s[256];
int rank, size, nslave, i=0;
char name[30];
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
nslave = size - 1;
FILE* fps[nslave];
//open their own logging pointers
for(i=0;i<nslave;i++)
{
if(i == rank-1)
{
sprintf(name,"./logs/logfile_slave%d",i+1);
fps[i] = fopen(name, "w");
if(fps[i] == NULL)
printf("failed to open logfile of slave %d\n", i+1);
break;
}
}
MPI_Status status;
MPI_Status status_read;
MPI_Status status_write;
MPI_Type_struct(3, array_of_blocklengths, array_of_displacements, array_of_types, &location);
MPI_Type_commit(&location);
while (1)
{
//receive the message from master
MPI_Recv(&buf_str, 1, location, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
fprintf(fps[i], "process %d message %d received\n",rank,status.MPI_TAG);
if (status.MPI_TAG < 10000){//if it is a task
sleep(buf_str.pause);//sleep, to simulate a computing process
fprintf(fps[i], "process %d sleep for %f seconds\n", rank, buf_str.pause);
//read from the position given
MPI_File_read_at(fhr, buf_str.stand, buf, buf_str.offset, MPI_CHAR, &status_read);
buf[buf_str.offset] = '\n';//need a \n
MPI_File_write_at(fhw, status.MPI_TAG*(buf_str.offset+1), buf, buf_str.offset+1, MPI_CHAR, &status_write);
fprintf(fps[i], "%d has done task %d\n", rank, status.MPI_TAG);
//send task complete message to master
buf_s[0] = MSG_MISSION_COMPLETE;
MPI_Send(&buf_s, 1, MPI_INT, 0, status.MPI_TAG, MPI_COMM_WORLD);
}
else
break;
}
fclose(fps[i]);
return 0;
}
I am trying to use a SOCK_SEQPACKET socket with this code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int rc, len;
int worker_sd, pass_sd;
int socket_fd;
ssize_t bytes_received;
struct sockaddr_un server_address;
char buffer[80];
struct iovec iov[1];
struct msghdr msg;
memset(&msg, 0, sizeof(msg));
memset(iov, 0, sizeof(iov));
iov[0].iov_base = buffer;
iov[0].iov_len = sizeof(buffer);
msg.msg_iov = iov;
msg.msg_iovlen = 1;
if((socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0)) < 0)
{
perror("server: socket");
exit(-1);
}
memset(&server_address, 0, sizeof(server_address));
server_address.sun_family = AF_UNIX;
strcpy(server_address.sun_path, "/mysocket");
unlink("/mysocket");
if(bind(socket_fd, (const struct sockaddr *) &server_address, sizeof(server_address)) < 0)
{
close(socket_fd);
perror("server: bind error");
return 1;
}
while(1)
{
printf("wait for message\n");
bytes_received = recvmsg(socket_fd, &msg, MSG_WAITALL);
printf("%d bytes\n", bytes_received);
}
The problem is that the process does not wait but receives -1 from recvmsg and loops forever. Nowhere in the man pages is there any reference to which functions should be used with SOCK_SEQPACKET-style sockets; for example, I am not really sure whether recvmsg is even the correct function.
SOCK_SEQPACKET is connection-oriented, so you must first accept a connection and then do your I/O on the accepted client socket.
recvmsg() returns -1 when an error has occurred; errno will be set to the error number.
Read here: http://pubs.opengroup.org/onlinepubs/009695399/functions/recvmsg.html
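A minimal sketch of that server-side flow, reusing the socket_fd, server_address and msg set up in the question (error handling kept short; this is an illustration, not a drop-in fix):

/* after the successful bind() from the question */
if (listen(socket_fd, 5) < 0) {
    perror("server: listen");
    exit(1);
}

int client_fd = accept(socket_fd, NULL, NULL);  /* blocks until a peer connect()s */
if (client_fd < 0) {
    perror("server: accept");
    exit(1);
}

while (1) {
    printf("wait for message\n");
    ssize_t n = recvmsg(client_fd, &msg, 0);    /* receive on the accepted socket */
    if (n < 0) {
        perror("recvmsg");                      /* errno says what went wrong */
        break;
    }
    if (n == 0)                                 /* peer closed the connection */
        break;
    printf("%zd bytes\n", n);
}
close(client_fd);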
When I try to call multiple MPI_Send or MPI_Recv operations in the program, the executable hangs on the nodes and on the root; i.e., when it tries to execute the second MPI_Send or MPI_Recv, the communication blocks. At the same time the binaries run at 100% CPU on the machines.
When I tried to run this code on Windows 7 64-bit with OpenMPI 1.6.3 64-bit, it ran successfully. But the same code is not working on Linux, i.e., CentOS 6.3 x86_64 with OpenMPI 1.6.3 64-bit. What have I done wrong?
The code is posted below:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
MPI::Init();
int rank = MPI::COMM_WORLD.Get_rank();
int size = MPI::COMM_WORLD.Get_size();
char name[256] = { };
int len = 0;
MPI::Get_processor_name(name, len);
printf("Hi I'm %s:%d\n", name, rank);
if (rank == 0)
{
while (size >= 1)
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, 1, 0, status);
int source = status.Get_source();
printf("%s:%d received %d from %d\n", name, rank, val, source);
MPI::COMM_WORLD.Send(&stat, 1, MPI::INT, 1, 2);
printf("%s:%d sent status %d\n", name, rank, stat);
size--;
}
} else
{
int val = rank + 10;
int stat = 0;
printf("%s:%d sending %d...\n", name, rank, val);
MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 0);
printf("%s:%d sent %d\n", name, rank, val);
MPI::Status status;
MPI::COMM_WORLD.Recv(&stat, 1, MPI::INT, 0, 2, status);
int source = status.Get_source();
printf("%s:%d received status %d from %d\n", name, rank, stat, source);
}
size = MPI::COMM_WORLD.Get_size();
if (rank == 0)
{
while (size >= 1)
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, 1, 1, status);
int source = status.Get_source();
printf("%s:0 received %d from %d\n", name, val, source);
size--;
}
printf("all workers checked in!\n");
}
else
{
int val = rank + 10 + 5;
printf("%s:%d sending %d...\n", name, rank, val);
MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 1);
printf("%s:%d sent %d\n", name, rank, val);
}
MPI::Finalize();
return 0;
}
Hi Hristo, I have changed the source as you said, and I am posting the code again:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv)
{
int iNumProcess = 0, iRank = 0, iNameLen = 0, n;
char szNodeName[MPI_MAX_PROCESSOR_NAME] = {0};
MPI_Status stMPIStatus;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &iNumProcess);
MPI_Comm_rank(MPI_COMM_WORLD, &iRank);
MPI_Get_processor_name(szNodeName, &iNameLen);
printf("Hi I'm %s:%d\n", szNodeName, iRank);
if (iRank == 0)
{
int iNode = 1;
while (iNumProcess > 1)
{
int iVal = 0, iStat = 1;
MPI_Recv(&iVal, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received %d\n", szNodeName, iRank, iVal);
MPI_Send(&iStat, 1, MPI_INT, iNode, 1, MPI_COMM_WORLD);
printf("%s:%d sent Status %d\n", szNodeName, iRank, iStat);
MPI_Recv(&iVal, 1, MPI_INT, MPI_ANY_SOURCE, 2, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received %d\n", szNodeName, iRank, iVal);
iNumProcess--;
iNode++;
}
printf("all workers checked in!\n");
}
else
{
int iVal = iRank + 10;
int iStat = 0;
printf("%s:%d sending %d...\n", szNodeName, iRank, iVal);
MPI_Send(&iVal, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
printf("%s:%d sent %d\n", szNodeName, iRank, iVal);
MPI_Recv(&iStat, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received status %d\n", szNodeName, iRank, iVal);
iVal = 20;
printf("%s:%d sending %d...\n", szNodeName, iRank, iVal);
MPI_Send(&iVal, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
printf("%s:%d sent %d\n", szNodeName, iRank, iVal);
}
MPI_Finalize();
return 0;
}
I got the output as follows; i.e., after the second send/receive, the root waits infinitely and the nodes keep running at 100% CPU utilisation. The output is given below:
Hi I'm N1433:1
N1433:1 sending 11...
Hi I'm N1425:0
N1425:0 received 11
N1425:0 sent Status 1
N1433:1 sent 11
N1433:1 received status 11
N1433:1 sending 20...
Here N1433 and N1425 are machine names. Please help
The code for the master is wrong. It is always sending to and awaiting messages from the same rank - rank 1. Thus the program would only function correctly if run as mpiexec -np 2 .... What you probably wanted to do is to use MPI_ANY_SOURCE as the source rank and then use that source rank as the destination in the send operation. You also shouldn't use while (size >= 1), since rank 0 is not talking to itself and the number of communications is expected to be one less than size.
if (rank == 0)
{
while (size > 1)
// ^^^^^^^^
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, MPI_ANY_SOURCE, 0, status);
// Use wildcard source here ------------^^^^^^^^^^^^^^
int source = status.Get_source();
printf("%s:%d received %d from %d\n", name, rank, val, source);
MPI::COMM_WORLD.Send(&stat, 1, MPI::INT, source, 2);
// Send back to the same process --------^^^^^^
printf("%s:%d sent status %d\n", name, rank, stat);
size--;
}
} else
Doing something like this in the worker is pointless:
MPI::Status status;
MPI::COMM_WORLD.Recv(&stat, 1, MPI::INT, 0, 2, status);
// Source rank is fixed here ------------^
int source = status.Get_source();
printf("%s:%d received status %d from %d\n", name, rank, stat, source);
You have already specified rank 0 as the source in the receive operation so it would only be able to receive messages from rank 0. There is no way that status.Get_source() would return any value other than 0, unless some communication error had occurred, in which case an exception would get thrown by MPI::COMM_WORLD.Recv().
The same is also true for the second loop in your code.
By the way, you are using what used to be the official standard C++ bindings. They were deprecated in MPI-2.2, and the latest version of the standard (MPI-3.0) removed them completely as no longer supported by the MPI Forum. You should be using the C bindings instead, or rely on third-party C++ interfaces like Boost.MPI.
After installing MPICH2 instead of OpenMPI, it worked successfully. I think there is some problem with using OpenMPI 1.6.3 on my cluster machines.