Consumer thread is not synchronized until producer finishes its execution - multithreading

I'm learning C++ (14) and currently I'm working on the concurrency part.
I've written a small producer/consumer example to learn and understand condition_variables.
The idea is for one thread to fill up a vector with numbers and then the other to print it to console:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <atomic>
std::atomic<bool> done{false};
std::mutex ready_mutex;
std::condition_variable ready;
std::vector<int> numbers;
void produce() {
    auto idx = 0;
    while( true ) {
        ++idx;
        std::lock_guard<std::mutex> lk(ready_mutex);
        for(int i = 0; i < 5; ++i) {
            numbers.push_back(i);
        }
        if( 5 == idx) {
            done = true;
            break;
        }
        ready.notify_one();
    }
    ready.notify_one();
}
void consume() {
    while( true ) {
        std::unique_lock<std::mutex> lk(ready_mutex);
        ready.wait(lk, [](){ return !numbers.empty(); });
        std::cout << "(" << numbers.size() << ")" << std::endl;
        for(auto x: numbers) {
            std::cout << x << ", ";
        }
        std::cout << std::endl;
        numbers.clear();
        std::cout << "(" << numbers.size() << ")" << std::endl;
        std::cout.flush();
        if(done) break;
    }
}
int main() {
    std::thread t(produce);
    std::thread t2(consume);
    t.join();
    t2.join();
    return 0;
}
The problem is that the program is not showing me the expected output.
What I'm expecting is:
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
But what I get is:
(25)
0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4,
(0)
From what I can see, the producer thread runs all of its iterations in one go and only then notifies the consumer thread; after a few hours I still don't understand why the producer doesn't wake up the consumer before the end.
Also, I have noticed that this problem persists if, instead of using a global vector, I pass it by reference to the threads:
void producer(std::vector<int>& v);
void consumer(std::vector<int>& v);
std::vector<int> numbers;
std::thread t(producer, std::ref(numbers));
std::thread t2(consumer, std::ref(numbers));
OS: Debian 9
Compilers: g++ 6.3.0, clang++ 3.8.1-24
Compiling flags: -Werror -Wextra -Wall -std=c++14 -O0 -g3 -pthread

The issue here is that when the producer releases the lock, you're expecting the consumer to acquire the lock next. However, because of the while loop in the producer, it's possible that the producer re-acquires the lock before the consumer can. If the producer only "produced" once and then yielded to the consumer, there wouldn't be an issue but since you want a ping-pong effect, you need to ensure that the producer waits its turn in the same way the consumer is waiting its turn.
One way to solve this is with another boolean. I'll call it canProduce and set up the producer in the same way you set up the consumer. I'll also repeat the entire process 15 times in main, to minimize the chance of a fluke false positive, should the producer still not be correctly waiting its turn.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <atomic>
std::atomic<bool> done{false};
std::atomic<bool> canProduce{true};
std::mutex ready_mutex;
std::condition_variable ready;
std::vector<int> numbers;
void produce() {
    auto idx = 0;
    while( true ) {
        ++idx;
        std::unique_lock<std::mutex> lk(ready_mutex);
        ready.wait(lk, [](){ return canProduce == true; });
        for(int i = 0; i < 5; ++i) {
            numbers.push_back(i);
        }
        canProduce = false;
        if( 5 == idx ) {
            // Set done while the lock is still held, so the consumer cannot
            // check it before it is set and then go back to waiting forever.
            done = true;
        }
        // Manually unlock the mutex before notifying to ensure that when the
        // consumer thread wakes up it can acquire the lock immediately,
        // otherwise it'll go back to sleep.
        lk.unlock();
        ready.notify_one();
        if( 5 == idx ) break;
    }
}
void consume() {
    while( true ) {
        std::unique_lock<std::mutex> lk(ready_mutex);
        ready.wait(lk, [](){ return !numbers.empty(); });
        std::cout << "(" << numbers.size() << ") ";
        for(auto x: numbers) {
            std::cout << x << ", ";
        }
        numbers.clear();
        std::cout << "(" << numbers.size() << ")" << std::endl;
        std::cout.flush();
        if (done) {
            break;
        } else {
            canProduce = true;
            lk.unlock();
            ready.notify_one();
        }
    }
}
int main() {
    for (int i = 0; i < 15; i++) {
        std::cout << "Attempt " << i << std::endl;
        std::thread t(produce);
        std::thread t2(consume);
        t.join();
        t2.join();
        done = false;
        canProduce = true;
    }
    return 0;
}
Output
Attempt 1
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
Attempt 2
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
(5) 0, 1, 2, 3, 4, (0)
...
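For completeness, the same ping-pong can also be expressed without the extra boolean by making the producer wait until the consumer has drained the vector. The following is a minimal, untested sketch of that variation (my own, not part of the original answer); it reuses the globals done, ready_mutex, ready and numbers from above:
void produce() {
    for (int batch = 0; batch < 5; ++batch) {
        std::unique_lock<std::mutex> lk(ready_mutex);
        // Wait until the consumer has emptied the vector before refilling it.
        ready.wait(lk, [](){ return numbers.empty(); });
        for (int i = 0; i < 5; ++i) {
            numbers.push_back(i);
        }
        if (batch == 4) done = true; // last batch; set while the lock is held
        lk.unlock();
        ready.notify_one();
    }
}
void consume() {
    while (true) {
        std::unique_lock<std::mutex> lk(ready_mutex);
        ready.wait(lk, [](){ return !numbers.empty(); });
        std::cout << "(" << numbers.size() << ") ";
        for (auto x : numbers) std::cout << x << ", ";
        numbers.clear();
        std::cout << "(" << numbers.size() << ")" << std::endl;
        bool finished = done;
        lk.unlock();
        ready.notify_one(); // wake the producer for the next batch
        if (finished) break;
    }
}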

Related

I don't know why my IOCP's numberOfConcurrentThreads argument doesn't work

I have read the material below and tested some code.
Does IOCP creates its own threads?
My code is as below:
#include <windows.h>
#include <functional>
#include <stdio.h>
#include <iostream>
#include <crtdbg.h>
#include <conio.h>
#include <array>
#include <vector>
#include <unordered_map>
using namespace std;
namespace IOCP_test {
struct myOverlapped {
OVERLAPPED overLapped;
int number;
};
DWORD WINAPI myCallBack(LPVOID completionPort) {
DWORD NumberOfByteTransfered = 0;
VOID* CompletionKey = NULL;
OVERLAPPED* overlappedPointer = NULL;
while (true) {
auto success = GetQueuedCompletionStatus(
(HANDLE)completionPort,
&NumberOfByteTransfered,
(LPDWORD)&CompletionKey,
&overlappedPointer,
INFINITE);
if (success) {
myOverlapped* mO = (myOverlapped*)overlappedPointer;
while (true) {
cout << mO->number << endl;
}
//Sleep(10);
//PostQueuedCompletionStatus(completionPort, 0, 0, overlappedPointer);
}
}
}
void IOCP_test() {
// TODO: I didn't add any sleep... so why do all 5 worker threads end up running?
int workerThreadCount = 5;
HANDLE hIOCP = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 1);
vector<HANDLE> workerThreadVector;
DWORD NumberOfByteTransfered = 0;
VOID* CompletionKey = NULL;
OVERLAPPED* overlappedPointer = NULL;
for (DWORD i = 0; i < workerThreadCount; i++)
{
HANDLE WorkerThread = CreateThread(NULL, 0, myCallBack, hIOCP, 0, NULL);
workerThreadVector.push_back(WorkerThread);
}
myOverlapped a1;
a1.number = 1;
myOverlapped a2;
a2.number = 2;
myOverlapped a3;
a3.number = 3;
myOverlapped a4;
a4.number = 4;
myOverlapped a5;
a5.number = 5;
if (hIOCP) {
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a1);
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a2);
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a3);
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a4);
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a5);
}
char key;
while (true) {
key = _getch();
if (key == 'e') {
break;
}
if (key == 'n') {
cout << endl;
}
}
for (DWORD i = 0; i < workerThreadVector.size(); i++)
{
CloseHandle(workerThreadVector[i]);
}
if (hIOCP) {
CloseHandle(hIOCP);
}
}
//IOCP_test::IOCP_test();
}
int main()
{
IOCP_test::IOCP_test();
_CrtDumpMemoryLeaks();
return 0;
}
I thought that if numberOfConcurrentThreads worked, only the thread that prints '1' should be running, not all five threads.
However, the result is as below: all of the overlapped work items were processed by the worker threads.
Why were all of the overlapped work items processed by 5 worker threads?
I think only 1 worker thread should be doing the work, because numberOfConcurrentThreads is 1.
I hope for your wise answers.
Thank you for reading.
I figured it out.
It is because I called 'cout' in my worker thread's function.
I think 'cout' causes an I/O wait in my worker thread, and that makes the IOCP signal another worker thread.
My test code is below.
You can run this code and press any key except 'e' ('e' is for exit).
You can see which worker thread is doing the work in the IOCP.
#include <windows.h>
#include <functional>
#include <stdio.h>
#include <iostream>
#include <crtdbg.h>
#include <conio.h>
#include <array>
#include <vector>
#include <unordered_map>
using namespace std;
int iocp_test = -1;
namespace IOCP_test {
struct myOverlapped {
OVERLAPPED overLapped;
int number;
};
DWORD WINAPI myCallBack(LPVOID completionPort) {
DWORD NumberOfByteTransfered = 0;
VOID* CompletionKey = NULL;
OVERLAPPED* overlappedPointer = NULL;
while (true) {
auto success = GetQueuedCompletionStatus(
(HANDLE)completionPort,
&NumberOfByteTransfered,
(LPDWORD)&CompletionKey,
&overlappedPointer,
INFINITE);
if (success) {
myOverlapped* mO = (myOverlapped*)overlappedPointer;
while (true) {
iocp_test = mO->number;
}
//Sleep(10);
//PostQueuedCompletionStatus(completionPort, 0, 0, overlappedPointer);
}
}
}
void IOCP_test() {
// I didn't add any sleep... so why do all 5 worker threads run?
// It seems that when an I/O call such as cout is made from a worker thread, the IOCP treats that thread as no longer working and releases another one.
int workerThreadCount = 5;
HANDLE hIOCP = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 2);
vector<HANDLE> workerThreadVector;
DWORD NumberOfByteTransfered = 0;
VOID* CompletionKey = NULL;
OVERLAPPED* overlappedPointer = NULL;
for (DWORD i = 0; i < workerThreadCount; i++)
{
HANDLE WorkerThread = CreateThread(NULL, 0, myCallBack, hIOCP, 0, NULL);
workerThreadVector.push_back(WorkerThread);
}
myOverlapped a1;
a1.number = 1;
myOverlapped a2;
a2.number = 2;
myOverlapped a3;
a3.number = 3;
myOverlapped a4;
a4.number = 4;
myOverlapped a5;
a5.number = 5;
if (hIOCP) {
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a1);
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a2);
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a3);
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a4);
PostQueuedCompletionStatus(hIOCP, 0, 0, (LPOVERLAPPED)&a5);
}
char key;
while (true) {
key = _getch();
cout << iocp_test;
if (key == 'e') {
break;
}
if (key == 'n') {
cout << endl;
}
}
for (DWORD i = 0; i < workerThreadVector.size(); i++)
{
CloseHandle(workerThreadVector[i]);
}
if (hIOCP) {
CloseHandle(hIOCP);
}
}
//IOCP_test::IOCP_test();
}
int main()
{
IOCP_test::IOCP_test();
_CrtDumpMemoryLeaks();
return 0;
}
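To double-check that conclusion, here is a small sketch of my own (busyWorker and activeWorkers are hypothetical names, not from the original post) of a worker that makes no blocking calls after dequeuing a packet. With NumberOfConcurrentThreads set to 1 and workers like this, only one thread should ever receive a packet, so the counter should stay at 1:
volatile LONG activeWorkers = 0;
DWORD WINAPI busyWorker(LPVOID completionPort) {
    DWORD bytes = 0;
    ULONG_PTR key = 0;
    OVERLAPPED* ov = NULL;
    if (GetQueuedCompletionStatus((HANDLE)completionPort, &bytes, &key, &ov, INFINITE)) {
        InterlockedIncrement(&activeWorkers); // this thread received a packet at least once
        volatile LONG spin = 0;
        while (true) {
            ++spin; // busy work only: no cout, no Sleep, no blocking calls,
                    // so the port never considers this thread blocked and
                    // never releases another worker.
        }
    }
    return 0;
}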

My thread-safe queue code appears to work, any possible race conditions, deadlocks, or other design problems?

I am new to using condition_variables and unique_locks in C++. I am working on creating an event loop that polls two custom event-queues and a "boolean" (see integer acting as boolean), which can be acted upon by multiple sources.
I have a demo (below) that appears to work; I would greatly appreciate it if you could review it and confirm whether it follows best practices for using unique_lock and condition_variable, and point out any problems you foresee (race conditions, thread blocking, etc.).
In ThreadSafeQueue::enqueue(...): are we unlocking twice by calling notify and having the unique_lock go out of scope?
In the method ThreadSafeQueue::dequeueAll(): we assume it is being called by a method that has been notified (cond.notify) and therefore already holds the lock. Is there a better way to encapsulate this to keep the caller cleaner?
Do we need to make our class members volatile similar to this?
Is there a better way to mockup our situation that allows us to test if we've correctly implemented the locks? Perhaps without the sleep statements and automating the checking process?
ThreadSafeQueue.h:
#include <condition_variable>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <mutex>
#include <vector>
template <class T>
class ThreadSafeQueue {
public:
ThreadSafeQueue(std::condition_variable* cond, std::mutex* unvrsl_m)
: ThreadSafeQueue(cond, unvrsl_m, 1) {}
ThreadSafeQueue(std::condition_variable* cond, std::mutex* unvrsl_m,
uint32_t capacity)
: cond(cond),
m(unvrsl_m),
head(0),
tail(0),
capacity(capacity),
buffer((T*)malloc(get_size() * sizeof(T))),
scratch_space((T*)malloc(get_size() * sizeof(T))) {}
std::condition_variable* cond;
~ThreadSafeQueue() {
free(scratch_space);
free(buffer);
}
void resize(uint32_t new_cap) {
std::unique_lock<std::mutex> lock(*m);
check_params_resize(new_cap);
free(scratch_space);
scratch_space = buffer;
buffer = (T*)malloc(sizeof(T) * new_cap);
copy_cyclical_queue();
free(scratch_space);
scratch_space = (T*)malloc(new_cap * sizeof(T));
tail = get_size();
head = 0;
capacity = new_cap;
}
void enqueue(const T& value) {
std::unique_lock<std::mutex> lock(*m);
resize();
buffer[tail++] = value;
if (tail == get_capacity()) {
tail = 0;
} else if (tail > get_capacity())
throw("Something went horribly wrong TSQ: 75");
cond->notify_one();
}
// Assuming m has already been locked by the caller...
void dequeueAll(std::vector<T>* vOut) {
if (get_size() == 0) return;
scratch_space = buffer;
copy_cyclical_queue();
vOut->insert(vOut->end(), buffer, buffer + get_size());
head = tail = 0;
}
// Const functions because they shouldn't be modifying the internal variables
// of the object
bool is_empty() const { return get_size() == 0; }
uint32_t get_size() const {
if (head == tail)
return 0;
else if (head < tail) {
// 1 2 3
// 0 1 2
// 1
// 0
return tail - head;
} else {
// 3 _ 1 2
// 0 1 2 3
// capacity-head + tail+1 = 4-2+0+1 = 2 + 1
return get_capacity() - head + tail + 1;
}
}
uint32_t get_capacity() const { return capacity; }
//---------------------------------------------------------------------------
private:
std::mutex* m;
uint32_t head;
uint32_t tail;
uint32_t capacity;
T* buffer;
T* scratch_space;
uint32_t get_next_empty_spot();
void copy_cyclical_queue() {
uint32_t size = get_size();
uint32_t cap = get_capacity();
if (size == 0) {
return; // because we have nothing to copy
}
if (head + size <= cap) {
// _ 1 2 3 ... index = 1, size = 3, 1+3 = 4 = capacity... only need 1 copy
memcpy(buffer, scratch_space + head, sizeof(T) * size);
} else {
// 5 1 2 3 4 ... index = 1, size = 5, 1+5 = 6 = capacity... need to copy
// 1-4 then 0-1
// copy number of bytes: front = 1, to (5-1 = 4 elements)
memcpy(buffer, scratch_space + head, sizeof(T) * (cap - head));
// just copy the bytes from the front up to the first element in the old
// array
memcpy(buffer + (cap - head), scratch_space, sizeof(T) * tail);
}
}
void check_params_resize(uint32_t new_cap) {
if (new_cap < get_size()) {
std::cerr << "ThreadSafeQueue: check_params_resize: size(" << get_size()
<< ") > new_cap(" << new_cap
<< ")... data "
"loss will occur if this happens. Prevented."
<< std::endl;
}
}
void resize() {
uint32_t new_cap;
uint32_t size = get_size();
uint32_t cap = get_capacity();
if (size + 1 >= cap - 1) {
std::cout << "RESIZE CALLED --- BAD" << std::endl;
new_cap = 2 * cap;
check_params_resize(new_cap);
free(scratch_space); // free existing (too small) scratch space
scratch_space = buffer; // transfer pointer over
buffer = (T*)malloc(sizeof(T) * new_cap); // allocate a bigger buffer
copy_cyclical_queue();
// move over everything with memcpy from scratch_space to buffer
free(scratch_space); // free what used to be the too-small buffer
scratch_space =
(T*)malloc(sizeof(T) * new_cap); // recreate scratch space
tail = size;
head = 0;
// since we're done with the old array... delete for memory management->
capacity = new_cap;
}
}
};
// Event Types
// keyboard/mouse
// network
// dirty flag
Main.cpp:
#include <unistd.h>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <queue>
#include <sstream>
#include <thread>
#include "ThreadSafeQueue.h"
using namespace std;
void write_to_threadsafe_queue(ThreadSafeQueue<uint32_t> *q,
uint32_t startVal) {
uint32_t count = startVal;
while (true) {
q->enqueue(count);
cout << "Successfully enqueued: " << count << endl;
count += 2;
sleep(count);
}
}
void sleep_and_set_redraw(int *redraw, condition_variable *cond) {
while (true) {
sleep(3);
__sync_fetch_and_or(redraw, 1);
cond->notify_one();
}
}
void process_events(vector<uint32_t> *qOut, condition_variable *cond,
ThreadSafeQueue<uint32_t> *q1,
ThreadSafeQueue<uint32_t> *q2, int *redraw, mutex *m) {
while (true) {
unique_lock<mutex> lck(*m);
cond->wait(lck);
q1->dequeueAll(qOut);
q2->dequeueAll(qOut);
if (__sync_fetch_and_and(redraw, 0)) {
cout << "FLAG SET" << endl;
qOut->push_back(0);
}
for (auto a : *qOut) cout << a << "\t";
cout << endl;
cout << "PROCESSING: " << qOut->size() << endl;
qOut->clear();
}
}
void test_2_queues_and_bool() {
try {
condition_variable cond;
mutex m;
ThreadSafeQueue<uint32_t> q1(&cond, &m, 1024);
ThreadSafeQueue<uint32_t> q2(&cond, &m, 1024);
int redraw = 0;
vector<uint32_t> qOut;
thread t1(write_to_threadsafe_queue, &q1, 2);
thread t2(write_to_threadsafe_queue, &q2, 1);
thread t3(sleep_and_set_redraw, &redraw, &cond);
thread t4(process_events, &qOut, &cond, &q1, &q2, &redraw, &m);
t1.join();
t2.join();
t3.join();
t4.join();
} catch (system_error &e) {
cout << "MAIN TEST CRASHED" << e.what();
}
}
int main() { test_2_queues_and_bool(); }
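One remark on process_events: cond->wait(lck) is called without a predicate, so a notification sent before the consumer reaches the wait, or a spurious wakeup, is not handled. Below is a minimal sketch of a predicate-based version (my own illustration; the name process_events_with_predicate is hypothetical), assuming the two queues and the redraw flag all share the mutex that is passed in:
void process_events_with_predicate(vector<uint32_t> *qOut, condition_variable *cond,
                                   ThreadSafeQueue<uint32_t> *q1,
                                   ThreadSafeQueue<uint32_t> *q2, int *redraw, mutex *m) {
    while (true) {
        unique_lock<mutex> lck(*m);
        // Wake up only when there is actually something to process; this also
        // copes with notifications that arrived before we started waiting.
        cond->wait(lck, [&] { return !q1->is_empty() || !q2->is_empty() || *redraw != 0; });
        q1->dequeueAll(qOut);
        q2->dequeueAll(qOut);
        if (__sync_fetch_and_and(redraw, 0)) qOut->push_back(0);
        for (auto a : *qOut) cout << a << "\t";
        cout << endl;
        cout << "PROCESSING: " << qOut->size() << endl;
        qOut->clear();
    }
}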

How to fix error occurred in MPI_Send reported by process [135921655, 0] on communicator MPI_COMM_WORLD

I am starting to learn MPI; I was following a tutorial and wrote two files in C. Compiling and running the first file is fine, but when I do the same thing with the second file, it doesn't work. And after I encountered the error, now I can't run either of the files, even if I recompile them.
I could not find a solution to my problem anywhere on the web, including on Stack Overflow. This post was the closest that I came across, but it did not provide any solution: Error occurred in MPI_Send on communicator MPI_COMM_WORLD MPI_ERR_RANK:invalid rank
First file :
#include <stdio.h>
#include <mpi.h>
int main (int argc, char **argv){
//Initialize the MPI environment
MPI_Init(NULL, NULL);
//find out rank, size
int world_rank;
MPI_Comm_rank (MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size (MPI_COMM_WORLD, &world_size);
int number;
if (world_rank == 0){
number = -1;
MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
}
else if (world_rank == 1){
MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process 1 received number %d from process 0\n", number);
}
//Finalise the MPI environment
MPI_Finalize();
}
Second File :
#include <stdio.h>
#include <mpi.h>
int main (int argc, char **argv){
//Initialize the MPI environment
MPI_Init(NULL, NULL);
//find out rank, size
int world_rank;
MPI_Comm_rank (MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size (MPI_COMM_WORLD, &world_size);
int X, Y, Z;
if (world_rank == 0){
scanf("%d", &X);
scanf("%d", &Y);
MPI_Send(&X, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_Send(&Y, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
MPI_Recv(&Z, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process 1 received number %d from process 2\n", Z);
}
else if (world_rank == 1){
MPI_Recv(&X, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Recv(&Y, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
Z = X + Y;
MPI_Send(&Z, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
//Finalise the MPI environment
MPI_Finalize();
}
ERROR MESSAGES :
[ubuntu:2638] *** An error occurred in MPI_Send
[ubuntu:2638] *** reported by process [135921665,0]
[ubuntu:2638] *** on communicator MPI_COMM_WORLD
[ubuntu:2638] *** MPI_ERR_RANK: invalid rank
[ubuntu:2638] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ubuntu:2638] *** and potentially your MPI job)
UPDATE:
Here are the command lines that I used:
mpicc -o 123 file1.c
mpirun 123
This was OK the first time, but not afterwards.
mpicc -o 123 file2.c
mpirun 123
This was where I first encountered the error.
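For what it's worth, MPI_ERR_RANK: invalid rank on an MPI_Send to rank 1 usually means the job was started with fewer than 2 processes, so rank 1 does not exist; launching with something like mpirun -np 2 ./123 avoids that. A small guard added right after MPI_Comm_size (my own sketch, not part of the original code) makes that failure explicit:
if (world_size < 2) {
    /* refuse to run with too few processes instead of failing inside MPI_Send */
    fprintf(stderr, "This program needs at least 2 MPI processes, but only %d was started.\n",
            world_size);
    MPI_Abort(MPI_COMM_WORLD, 1);
}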

pass 2-d arrays with different names to one function

How can I pass 2-D arrays with different names (but the same structure, data type, and size) to a function?
using namespace std;
void OutputArray ();
int A[4][2] = {{1, 2} , {3, 4} , { 5, 7} , {8, 1} };
int B[4][2] = {{5, 6} , {7, 8} , { 3, 9} , {2, 2} };
int main ()
{
OutputArray(A[][2]);
OutputArray(B[][2]);
system("pause");
return 0;
}
void OutputArray(int intNumbersArray[][2])
{
for (int intCounter = 0; intCounter < 4 ; intCounter++)
{
cout << outputArray[intCounter][0] << outputArray[intCounter][1] << endl;
}
}
and consequently have A and B printed.
After fixing some bugs I got this code to work:
#include <iostream>
using namespace std;
void OutputArray (int intNumbersArray[][2]);
int A[4][2] = {{1, 2} , {3, 4} , { 5, 7} , {8, 1} };
int B[4][2] = {{5, 6} , {7, 8} , { 3, 9} , {2, 2} };
void OutputArray(int intNumbersArray[][2])
{
int intCounter = 0;
for (intCounter = 0; intCounter < 4 ; intCounter++)
{
cout << intNumbersArray[intCounter][0] << intNumbersArray[intCounter][1] << endl;
}
}
int main ()
{
OutputArray(A);
OutputArray(B);
cin.get();
return 0;
}
you needed #include <iostream> for cin and cout
you needed the signature of OutputArray to contain the parameter int intNumbersArray[][2]
you needed to pass only A and B, not A[][2] and B[][2]
in the function OutputArray you used outputArray instead of intNumbersArray, which is the array you meant
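A possible generalisation (my own sketch, not part of the original answer): take the array by reference through a template, so arrays with any number of rows work and the size never has to be hard-coded.
#include <cstddef>
#include <iostream>
template <std::size_t Rows, std::size_t Cols>
void OutputArray(const int (&arr)[Rows][Cols]) {
    // Rows and Cols are deduced from the argument, so the same function
    // prints A, B, or any other int array of this shape.
    for (std::size_t r = 0; r < Rows; ++r) {
        for (std::size_t c = 0; c < Cols; ++c) {
            std::cout << arr[r][c] << ' ';
        }
        std::cout << '\n';
    }
}
// Usage: OutputArray(A); OutputArray(B);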

OpenMPI multiple MPI_Send and MPI_recv not working

When I try to call multiple MPI_Send or MPI_Recv in the program, the executable hangs on the nodes and on the root, i.e., when it tries to execute the second MPI_Send or MPI_Recv, the communication blocks. At the same time the binaries are running at 100% on the machines.
When I tried to run this code on Windows 7 64-bit with OpenMPI 1.6.3 64-bit, it ran successfully. But the same code is not working on Linux, i.e., CentOS 6.3 x86_64 with OpenMPI 1.6.3 64-bit. What have I done wrong?
Posting the code below:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
MPI::Init();
int rank = MPI::COMM_WORLD.Get_rank();
int size = MPI::COMM_WORLD.Get_size();
char name[256] = { };
int len = 0;
MPI::Get_processor_name(name, len);
printf("Hi I'm %s:%d\n", name, rank);
if (rank == 0)
{
while (size >= 1)
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, 1, 0, status);
int source = status.Get_source();
printf("%s:%d received %d from %d\n", name, rank, val, source);
MPI::COMM_WORLD.Send(&stat, 1, MPI::INT, 1, 2);
printf("%s:%d sent status %d\n", name, rank, stat);
size--;
}
} else
{
int val = rank + 10;
int stat = 0;
printf("%s:%d sending %d...\n", name, rank, val);
MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 0);
printf("%s:%d sent %d\n", name, rank, val);
MPI::Status status;
MPI::COMM_WORLD.Recv(&stat, 1, MPI::INT, 0, 2, status);
int source = status.Get_source();
printf("%s:%d received status %d from %d\n", name, rank, stat, source);
}
size = MPI::COMM_WORLD.Get_size();
if (rank == 0)
{
while (size >= 1)
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, 1, 1, status);
int source = status.Get_source();
printf("%s:0 received %d from %d\n", name, val, source);
size--;
}
printf("all workers checked in!\n");
}
else
{
int val = rank + 10 + 5;
printf("%s:%d sending %d...\n", name, rank, val);
MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 1);
printf("%s:%d sent %d\n", name, rank, val);
}
MPI::Finalize();
return 0;
}
Hi Hristo, I have changed the source as you said and I am posting the code again:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv)
{
int iNumProcess = 0, iRank = 0, iNameLen = 0, n;
char szNodeName[MPI_MAX_PROCESSOR_NAME] = {};
MPI_Status stMPIStatus;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &iNumProcess);
MPI_Comm_rank(MPI_COMM_WORLD, &iRank);
MPI_Get_processor_name(szNodeName, &iNameLen);
printf("Hi I'm %s:%d\n", szNodeName, iRank);
if (iRank == 0)
{
int iNode = 1;
while (iNumProcess > 1)
{
int iVal = 0, iStat = 1;
MPI_Recv(&iVal, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received %d\n", szNodeName, iRank, iVal);
MPI_Send(&iStat, 1, MPI_INT, iNode, 1, MPI_COMM_WORLD);
printf("%s:%d sent Status %d\n", szNodeName, iRank, iStat);
MPI_Recv(&iVal, 1, MPI_INT, MPI_ANY_SOURCE, 2, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received %d\n", szNodeName, iRank, iVal);
iNumProcess--;
iNode++;
}
printf("all workers checked in!\n");
}
else
{
int iVal = iRank + 10;
int iStat = 0;
printf("%s:%d sending %d...\n", szNodeName, iRank, iVal);
MPI_Send(&iVal, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
printf("%s:%d sent %d\n", szNodeName, iRank, iVal);
MPI_Recv(&iStat, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &stMPIStatus);
printf("%s:%d received status %d\n", szNodeName, iRank, iVal);
iVal = 20;
printf("%s:%d sending %d...\n", szNodeName, iRank, iVal);
MPI_Send(&iVal, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
printf("%s:%d sent %d\n", szNodeName, iRank, iVal);
}
MPI_Finalize();
return 0;
}
I got the output as follows, i.e., after the second send/receive, the root waits forever and the nodes are running at 100% CPU utilisation. The output is given below:
Hi I'm N1433:1
N1433:1 sending 11...
Hi I'm N1425:0
N1425:0 received 11
N1425:0 sent Status 1
N1433:1 sent 11
N1433:1 received status 11
N1433:1 sending 20...
Here N1433 and N1425 are machine names. Please help
The code for the master is wrong. It is always sending to and awaiting messages from the same rank - rank 1. Thus the program would only function correctly if run as mpiexec -np 2 .... What you probably wanted to do is to use MPI_ANY_SOURCE as the source rank and then use that source rank as the destination in the send operation. You also shouldn't use while (size >= 1), since rank 0 is not talking to itself and the number of communications is expected to be one less than size.
if (rank == 0)
{
while (size > 1)
// ^^^^^^^^
{
int val, stat = 1;
MPI::Status status;
MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, MPI_ANY_SOURCE, 0, status);
// Use wildcard source here ------------^^^^^^^^^^^^^^
int source = status.Get_source();
printf("%s:%d received %d from %d\n", name, rank, val, source);
MPI::COMM_WORLD.Send(&stat, 1, MPI::INT, source, 2);
// Send back to the same process --------^^^^^^
printf("%s:%d sent status %d\n", name, rank, stat);
size--;
}
} else
Doing something like this in the worker is pointless:
MPI::Status status;
MPI::COMM_WORLD.Recv(&stat, 1, MPI::INT, 0, 2, status);
// Source rank is fixed here ------------^
int source = status.Get_source();
printf("%s:%d received status %d from %d\n", name, rank, stat, source);
You have already specified rank 0 as the source in the receive operation so it would only be able to receive messages from rank 0. There is no way that status.Get_source() would return any value other than 0, unless some communication error had occurred, in which case an exception would get thrown by MPI::COMM_WORLD.Recv().
The same is also true for the second loop in your code.
By the way, you are using what used to be the official standard C++ bindings. They were deprecated in MPI-2.2, and the latest version of the standard (MPI-3.0) removed them completely as they are no longer supported by the MPI Forum. You should be using the C bindings instead, or rely on third-party C++ interfaces like Boost.MPI.
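As a rough illustration of that last point, the first exchange could be written against the C bindings along these lines (a sketch of mine, not code from the answer; it reuses rank, size, and name from the question's first listing and assumes stdio.h is included):
int val = rank + 10, stat = 0;
MPI_Status status;
if (rank == 0) {
    for (int i = 1; i < size; ++i) {
        /* take the messages in whatever order the workers send them */
        MPI_Recv(&val, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        printf("%s:%d received %d from %d\n", name, rank, val, status.MPI_SOURCE);
        /* reply to whichever worker we just heard from */
        MPI_Send(&stat, 1, MPI_INT, status.MPI_SOURCE, 2, MPI_COMM_WORLD);
    }
} else {
    MPI_Send(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Recv(&stat, 1, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
}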
After installing MPICH2 instead of OpenMPI, it worked successfully. I think there is some problem with using OpenMPI 1.6.3 on my cluster machines.
