Nested parallelism in OpenMP

I want to map tasks to three threads as follows:
1. Each of taskA, taskB, and taskC must be executed by a separate thread.
2. taskA has subtasks task(1), task(2), and task(3).
3. taskB has subtasks task(11), task(12), and task(13).
4. taskC has subtasks task(21), task(22), and task(23).
5. If any one of taskA, taskB, and taskC finishes and there is at least one unstarted subtask of another task, the thread associated with the finished task should steal that unstarted subtask.
I was not able to achieve this setting. All I was able to write is the following MWE, in which the threads do not obey rules 2, 3, and 4.
Here is my MWE:
#include <stdio.h>
#include <omp.h>
double task(int taskid) {
    int tid = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    printf("%d/%d: taskid=%d\n", tid, nthreads, taskid);
    int i;
    double t = 1.1;
    // start at 1: with i == 0, t/i would divide by zero
    for(i = 1; i < 10000000*taskid; i++) {
        t *= t/i;
    }
    return t;
}
double taskA() {
    int tid = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    printf("%s %d/%d\n", __FUNCTION__, tid, nthreads);
    double a, b, c;
    //#pragma omp parallel
    //#pragma omp single
    {
        #pragma omp task untied shared(a)
        a=task(1);
        #pragma omp task untied shared(b)
        b=task(2);
        #pragma omp task untied shared(c)
        c=task(3);
    }
    // a, b and c are only set once the child tasks have finished
    #pragma omp taskwait
    return a+b+c;
}
double taskB() {
    int tid = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    printf("%s %d/%d\n", __FUNCTION__, tid, nthreads);
    double a, b, c;
    //#pragma omp parallel
    //#pragma omp single
    {
        #pragma omp task untied shared(a)
        a=task(11);
        #pragma omp task untied shared(b)
        b=task(12);
        #pragma omp task untied shared(c)
        c=task(13);
    }
    #pragma omp taskwait
    return a+b+c;
}
double taskC() {
    int tid = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    printf("%s %d/%d\n", __FUNCTION__, tid, nthreads);
    double a, b, c;
    //#pragma omp parallel
    //#pragma omp single
    {
        #pragma omp task untied shared(a)
        a=task(21);
        #pragma omp task untied shared(b)
        b=task(22);
        #pragma omp task untied shared(c)
        c=task(23);
    }
    #pragma omp taskwait
    return a+b+c;
}
int main() {
    omp_set_num_threads(3);
    double a,b,c;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task untied
        a=taskA();
        #pragma omp task untied
        b=taskB();
        #pragma omp task untied
        c=taskC();
        // wait here, inside the parallel region, for taskA/B/C to finish
        #pragma omp taskwait
    }
    printf("%g %g %g\n", a, b, c);
    return 0;
}
Compiled as:
icpc -Wall -fopenmp -O2 -o nestedomp nestedomp.c
Output:
taskC 1/3
1/3: taskid=21
taskA 2/3
taskB 0/3
0/3: taskid=23
2/3: taskid=22
1/3: taskid=1
1/3: taskid=2
2/3: taskid=3
0/3: taskid=11
1/3: taskid=12
2/3: taskid=13
Here, thread 0 starts processing task 23, but it should start with task 1 or 11.

You could use the thread ID to distribute the work:
#pragma omp parallel num_threads(3)
{
    int tid = omp_get_thread_num();
    if (tid == 0) {
        // Task 0
    } else if (tid == 1) {
        // Task 1
    } else {
        // Task 2
    }
}
You can set the number of threads according to your needs and introduce nesting at the task level.

Related

In OpenMP, is there a way for tasks to share variables?

In my experience, when I update a variable in one task, the variable is not updated in other tasks, even after the task that updated it has finished executing. For example, given this code:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int main(int argc, char **argv)
{
    int nThreads = (argc > 1) ? atoi(argv[1]) : 2;  // thread count from the command line
    omp_set_num_threads(nThreads);
    int currentInt = 0;
    int numEdges = 1000000;
    #pragma omp parallel shared(currentInt)
    {
        #pragma omp single
        {
            #pragma omp task shared(currentInt)
            {
                printf("I am doing kruskals: Thread %d\n", omp_get_thread_num());
                while(currentInt < numEdges)
                {
                    currentInt++;
                }
                printf("Kruskals Done! %d\n", currentInt);
            }
            #pragma omp task shared(currentInt)
            {
                for(int i = 0; i < 10000000; i++){
                }
                printf("Helper: Current Int %d Thread %d \n", currentInt, omp_get_thread_num());
            }
            #pragma omp taskwait
        }
    }
    return 0;
}
It always prints currentInt 0, even if the first task finishes before the second. I need this because I am trying to parallelize an algorithm in which a sequential task walks through a large array while many parallel tasks execute simultaneously on parts of that array. Once the sequential task reaches the portion of the array that a parallel task is working on, that parallel task can stop itself because it is no longer needed. The parallel and sequential tasks share no dependencies, so that is not a problem.
Any help will be appreciated.

Implementation of MergeSort algorithm using OpenMP

I am trying to implement the Mergesort algorithm using OpenMP for the first time. I came across this block of code, which uses the parallel sections directive to divide the unsorted array, but has no parallel implementation for repeatedly merging the subarrays into a new sorted array. I have added a parallel for directive to each loop in the merge method. Will this create thread overhead? I am not sure if this is right. Please correct me if I am wrong, and tell me how to proceed. Thanks.
void merge_conquer(int array[],int low,int mid,int high);
void merge_divide(int array[],int low,int high)
{
    int mid;
    if(low<high)
    {
        mid=(low+high)/2;
        #pragma omp parallel sections
        {
            #pragma omp section
            {
                merge_divide(array,low,mid);
            }
            #pragma omp section
            {
                merge_divide(array,mid+1,high);
            }
        }
        merge_conquer(array,low,mid,high);
    }
}
Merge method
void merge_conquer(int array[],int low,int mid,int high)
{
int temp[30];
int i,j,k,m;
j=low;
m=mid+1;
#pragma omp parallel for
for(i=low; j<=mid && m<=high ; i++)
{
if(array[j]<=array[m])
{
temp[i]=array[j];
j++;
}
else
{
temp[i]=array[m];
m++;
}
}
if(j>mid)
{
#pragma omp parallel for
for(k=m; k<=high; k++)
{
temp[i]=array[k];
i++;
}
}
else
{
#pragma omp parallel for
for(k=j; k<=mid; k++)
{
temp[i]=array[k];
i++;
}
}
#pragma omp parallel for
for(k=low; k<=high; k++)
array[k]=temp[k];
}

Pthread Mutex Lock Linux

I created a simple program that shows the use of a mutex lock. Here is the code:
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#define NUM_THREAD 2
pthread_mutex_t mutex;
int call_time;
void *makeCall(void *param)
{
call_time = 10;
pthread_mutex_lock(&mutex);
printf("Hi I'm thread #%u making a call\n", (unsigned int) pthread_self());
do{
printf("%d\n", call_time);
call_time--;
sleep(1);
}
while(call_time > 0);
pthread_mutex_unlock(&mutex);
return 0;
}
int main()
{
int i;
pthread_t thread[NUM_THREAD];
//init mutex
pthread_mutex_init(&mutex, NULL);
//create thread
for(i = 0; i < NUM_THREAD; i++)
pthread_create(&thread[i], NULL, makeCall, NULL);
//join thread
for(i = 0; i < NUM_THREAD; i++)
pthread_join(thread[i], NULL);
pthread_mutex_destroy(&mutex);
return 0;
}
The output is...
Hi I'm thread #3404384000 making a call
10
10
9
8
7
6
5
4
3
2
1
Hi I'm thread #3412776704 making a call
0
However, if I modify the function makeCall and move the assignment to call_time inside the mutex lock:
pthread_mutex_lock(&mutex);
call_time = 10;
/* ... rest of makeCall unchanged ... */
pthread_mutex_unlock(&mutex);
The program now gives the output I expect, where each thread counts down from 10. I don't understand what difference it makes to move call_time inside the lock. I hope someone can help me understand this behavior of my program. Cheers!

call_time is a shared variable that is accessed from two threads, so it must be protected. Here is what happens: the first thread starts, sets call_time to 10 and prints the first value. Then the second thread starts, resets call_time back to 10 and waits for the mutex. The first thread resumes and keeps counting down with call_time reset to 10. When it is done and releases the mutex, the second thread can run, but call_time is now 0 because the first thread left it at 0, so it prints only that last value.
Try this program, I think it will demonstrate threads better:
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#define NUM_THREAD 2
pthread_mutex_t mutex;
int call_time;
void *makeCall(void *param)
{
int temp;
do{
pthread_mutex_lock(&mutex);
printf("Hi I'm thread #%u making a call\n", (unsigned int) pthread_self());
printf("%d\n", call_time);
temp = call_time--;
pthread_mutex_unlock(&mutex);
//sleep(1); //try with and without this line and see the difference.
}
while(temp > 0);
return 0;
}
int main()
{
int i;
call_time = 100;
pthread_t thread[NUM_THREAD];
//init mutex
pthread_mutex_init(&mutex, NULL);
//create thread
for(i = 0; i < NUM_THREAD; i++)
pthread_create(&thread[i], NULL, makeCall, NULL);
//join thread
for(i = 0; i < NUM_THREAD; i++)
pthread_join(thread[i], NULL);
pthread_mutex_destroy(&mutex);
return 0;
}

Thread for interprocess communication in OpenMP

I have an OpenMP-parallelized program that looks like this:
[...]
#pragma omp parallel
{
//initialize threads
#pragma omp for
for(...)
{
//Work is done here
}
}
Now I'm adding MPI support. I need a thread that handles the communication: in my case, it calls GatherAll continuously and fills/empties a linked list for receiving/sending data from/to the other processes. That thread should keep sending/receiving until a flag is set. There is no MPI code in the example yet; my question is about how to implement that routine in OpenMP.
How do I implement such a thread? For example, I tried to introduce a single directive here:
[...]
int kill=0;
#pragma omp parallel shared(kill)
{
    //initialize threads
    #pragma omp single nowait
    {
        while(!kill)
            send_receive();
    }
    #pragma omp for
    for(...)
    {
        //Work is done here
    }
    kill=1;
}
but in this case the program gets stuck because the implicit barrier after the for-loop waits for the thread in the while-loop above.
Thank you, rugermini.

You could try adding a nowait clause to your single construct.
EDIT (responding to the first comment):
If you enable nested parallelism in OpenMP, you might be able to achieve what you want with two levels of parallelism. At the top level, you have two concurrent sections, one for the MPI communication and the other for the local computation. The latter can itself be parallelized, which gives you a second level of parallelisation. Only the threads executing this level are affected by its barriers.
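#include <omp.h>
int main()
{
    int kill = 0;
    omp_set_max_active_levels(2); // let the inner parallel region get its own team
    #pragma omp parallel sections num_threads(2) shared(kill)
    {
        #pragma omp section
        {
            int done = 0;
            while (!done) {
                /* manage MPI communications */
                #pragma omp atomic read
                done = kill;  // atomic read: the other section writes kill concurrently
            }
        }
        #pragma omp section
        {
            #pragma omp parallel
            #pragma omp for
            for (int i = 0; i < 10000 ; ++i) {
                /* your workload */
            }
            #pragma omp atomic write
            kill = 1;
        }
    }
    return 0;
}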
However, be aware that this code can hang if you don't have at least two threads, because the communication loop may then never see kill being set. That breaks the usual assumption that the sequential and parallelized versions of the code do the same thing.
It would be much cleaner to wrap your OpenMP kernel inside a more global MPI communication scheme (potentially using asynchronous communications to overlap communications with computations).

You have to be careful, because you can't just have your MPI calling thread "skip" the omp for loop; all threads in the thread team have to go through the for loop.
There are a couple of ways you could do this. With nested parallelism and tasks, you could launch one task to do the message passing and another to call a work routine that has an omp parallel for in it:
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
void work(int rank) {
const int n=14;
#pragma omp parallel for
for (int i=0; i<n; i++) {
int tid = omp_get_thread_num();
printf("%d:%d working on item %d\n", rank, tid, i);
}
}
void sendrecv(int rank, int sneighbour, int rneighbour, int *data) {
const int tag=1;
MPI_Sendrecv(&rank, 1, MPI_INT, sneighbour, tag,
data, 1, MPI_INT, rneighbour, tag,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
int main(int argc, char **argv) {
int rank, size;
int sneighbour;
int rneighbour;
int data;
int got;
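// sendrecv runs inside a task, i.e. possibly not on the main thread, so
// FUNNELED is not enough; SERIALIZED allows any one thread at a time.
MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &got);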
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
omp_set_nested(1);
sneighbour = rank + 1;
if (sneighbour >= size) sneighbour = 0;
rneighbour = rank - 1;
if (rneighbour <0 ) rneighbour = size-1;
#pragma omp parallel
{
#pragma omp single
{
#pragma omp task
{
sendrecv(rank, sneighbour, rneighbour, &data);
printf("Got data from %d\n", data);
}
#pragma omp task
work(rank);
}
}
MPI_Finalize();
return 0;
}
Alternatively, you could give your omp for loop schedule(dynamic), so that the other threads can pick up some of the slack while the master thread is communicating, and the master thread can pick up some work when it's done:
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
void sendrecv(int rank, int sneighbour, int rneighbour, int *data) {
const int tag=1;
MPI_Sendrecv(&rank, 1, MPI_INT, sneighbour, tag,
data, 1, MPI_INT, rneighbour, tag,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
int main(int argc, char **argv) {
int rank, size;
int sneighbour;
int rneighbour;
int data;
int got;
const int n=14;
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &got);
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
omp_set_nested(1);
sneighbour = rank + 1;
if (sneighbour >= size) sneighbour = 0;
rneighbour = rank - 1;
if (rneighbour <0 ) rneighbour = size-1;
#pragma omp parallel
{
#pragma omp master
{
sendrecv(rank, sneighbour, rneighbour, &data);
printf("Got data from %d\n", data);
}
#pragma omp for schedule(dynamic)
for (int i=0; i<n; i++) {
int tid = omp_get_thread_num();
printf("%d:%d working on item %d\n", rank, tid, i);
}
}
MPI_Finalize();
return 0;
}

Hmmm. If you are indeed adding MPI 'support' to your program, then you ought to be using mpi_allgather as mpi_gatherall does not exist. Note that mpi_allgather is a collective operation, that is all processes in the communicator call it. You can't have a process gathering data while the other processes do whatever it is they do. What you could do is use MPI single-sided communications to implement your idea; this will be a little tricky but no more than that if one process only reads the memory of other processes.
I'm puzzled by your use of the term 'thread' wrt MPI. I fear that you are confusing OpenMP and MPI, one of whose variants is called OpenMPI. Despite this name it is as different from OpenMP as chalk from cheese. MPI programs are written in terms of processes, not threads. The typical OpenMP implementation does indeed use threads, though the details are generally well-hidden from the programmer.
I'm seriously impressed that you are trying, or seem to be trying, to use MPI 'inside' your OpenMP code. This is exactly the opposite of work I do, and see others do on some seriously large computers. The standard mode for such 'hybrid' parallelisation is to write MPI programs which call OpenMP code. Many of today's very large computers comprise collections of what are, in effect, multicore boxes. A typical approach to programming one of these is to have one MPI process running on each box, and for each of those processes to use one OpenMP thread for each core in the box.

Missing OpenMP feature: Thread Priority

Has anyone thought about this? OpenMP has no feature for giving a piece of code more CPU "muscle". As far as my research goes, OpenMP cannot set the thread priority for executing a block of code. The only way I know to get threads with the highest priority is to create them myself (with _beginthreadex or CreateThread) and set their priority.
Here is some code for this, with the priority set manually:
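SYSTEM_INFO si;
GetSystemInfo( &si );
int numberOfCore = si.dwNumberOfProcessors; // logical processors (or use __cpuid)
HANDLE* hThreads = new HANDLE[ numberOfCore ];
hThreads[0] = (HANDLE)_beginthreadex( NULL, 0, someThreadFunc, NULL, 0, NULL );
// thread priority values (THREAD_PRIORITY_*), not HIGH_PRIORITY_CLASS, which is a process class
SetThreadPriority( hThreads[0], THREAD_PRIORITY_HIGHEST );
WaitForMultipleObjects(...);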
Here is what I would like to be able to write:
#pragma omp parallel
{
#pragma omp for ( threadpriority:HIGH_PRIORITY_CLASS )
for( ;; ) { ... }
}
Or
#pragma omp parallel
{
// A function like this would be greatly appreciated.
_omp_set_priority( HIGH_PRIORITY_CLASS );
#pragma omp for
for( ;; ) { ... }
}
If there is a way to set thread priority with OpenMP, please let me know.

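You can call SetThreadPriority in the body of the loop without requiring special support from OpenMP:
for (...)
{
    HANDLE self = GetCurrentThread();
    int priority = GetThreadPriority(self);            // remember the current priority
    SetThreadPriority(self, THREAD_PRIORITY_HIGHEST);
    // stuff
    SetThreadPriority(self, priority);                 // restore it
}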

Simple test reveals unexpected results:
I have run a simple test in Visual Studio 2010 (Windows 7):
#include <stdio.h>
#include <omp.h>
#include <windows.h>
int main()
{
int tid, nthreads;
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
#pragma omp parallel private(tid) num_threads(4)
{
tid = omp_get_thread_num();
printf("Thread %d: Priority = %d\n", tid, GetThreadPriority(GetCurrentThread()));
}
printf("\n");
#pragma omp parallel private(tid) shared(nthreads) num_threads(4)
{
tid = omp_get_thread_num();
#pragma omp master
{
printf("Master Thread %d: Priority = %d\n", tid, GetThreadPriority(GetCurrentThread()));
}
}
#pragma omp parallel num_threads(4)
{
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
}
printf("\n");
#pragma omp parallel private(tid) num_threads(4)
{
tid = omp_get_thread_num();
printf("Thread %d: Priority = %d\n", tid, GetThreadPriority(GetCurrentThread()));
}
return 0;
}
The output is:
Thread 1: Priority = 0
Thread 0: Priority = 1
Thread 2: Priority = 0
Thread 3: Priority = 0
Master Thread 0: Priority = 1
Thread 0: Priority = 1
Thread 1: Priority = 1
Thread 3: Priority = 1
Thread 2: Priority = 1
Explanation:
The OpenMP master thread runs with the thread priority of the main thread.
The other OpenMP threads are left at normal priority.
When the priority of the OpenMP threads is set manually, the threads keep that priority in later parallel regions.
