How to parallelize for loop in OpenMP?

How to parallelize for loop in OpenMP? - multithreading

This is my program which calculates sum of 10000 element which is assign to 1. The sum should be 5000 for 1st thread and 5000 for other but for every run it is giving different output
#include<omp.h>
#include<stdio.h>
int main()
{
int i,sum1=0,sum2=0,a[10000],sum_final=0;
for(i=0;i<10000;i++)
{
a[i]=1;
}
#pragma omp parallel
{
if(omp_get_thread_num()==0)
{
for(i=0;i<5000;i++)
{
sum1+=a[i];
}
printf("Sum1 is %d\n",sum1);
}
if(omp_get_thread_num()==1)
{
for(i=5000;i<10000;i++)
{
sum2+=a[i];
}
printf("Sum2 is %d\n",sum2);
}
}
return 0;
}

Your loop counter should be private. I think you should try
#pragma omp parallel private(i)

Related

In openMP is there a way to for tasks to share variables?

In my experience, when I update a varible in 1 task the variable is not updated in other tasks even if the first task that updated the variable is done executing. For example given the code,
int nThreads = atoi(argv[1]);
omp_set_num_threads(nThreads);
int currentInt = 0;
int numEdges = 1000000;
#pragma omp parallel shared(currentInt)
{
#pragma omp single
{
#pragma omp task shared(currentInt)
{
printf("I am doing kruskals: Thread %d\n", omp_get_thread_num());
while(currentInt < numEdges)
{
currentInt++;
}
printf("Kruskals Done! %d\n", currentInt);
#pragma omp shared(currentInt)
{
for(int i = 0; i < 10000000; i++){
}
printf("Helper: Current Int %d Thread %d \n", currentInt, omp_get_thread_num());
}
}
#pragma omp taskwait
}
}
It will always print currentInt 0. Even if the first task finishes before the second. I need this because I am trying to parallize an algorithm where a have a sequential task going through a large array and many parallel tasks excuting simultanously on parts of that array and once the sequential task reaches the portion of the array that a parallel task is working on the parallel task can stop itself because it is no longer needed. The parallel and sequential tasks share no dependancies so that is not a problem.
Any help will be appreciated.

Linux Thread priority , behaviour is abnormal

In the below code snippet, I am creating 6 threads. Each with different priorities. The priority is mentioned in global priority array. I am doing a continuous increment of global variables inside each thread based on thread index. I was expecting the count to be higher if thread priority is higher. but my output is not adhering to priority concepts pl. refer to the output order shown below. I am trying this out on Ubuntu 16.04 and Linux kernel 4.10.
O/P,
Thread=0
Thread=3
Thread=2
Thread=5
Thread=1
Thread=4
pid=32155 count=4522138740
pid=32155 count=4509082289
pid=32155 count=4535088439
pid=32155 count=4517943246
pid=32155 count=4522643905
pid=32155 count=4519640181
Code:
#include <stdio.h>
#include <pthread.h>
#define FAILURE -1
#define MAX_THREADS 15
long int global_count[MAX_THREADS];
/* priority of each thread */
long int priority[]={1,20,40,60,80,99};
void clearGlobalCounts()
{
int i=0;
for(i=0;i<MAX_THREADS;i++)
global_count[i]=0;
}
/**
thread parameter is thread index
**/
void funcDoNothing(void *threadArgument)
{
int count=0;
int index = *((int *)threadArgument);
printf("Thread=%d\n",index);
clearGlobalCounts();
while(1)
{
count++;
if(count==100)
{
global_count[index]++;
count=0;
}
}
}
int main()
{
int i=0;
for(int i=0;i<sizeof(priority)/sizeof(long int);i++)
create_thread(funcDoNothing, i,priority[i]);
sleep(3600);
for(i=0;i<sizeof(priority)/sizeof(long int);i++)
{
printf("pid=%d count=%ld\n",getpid(),
global_count[i]);
}
}
create_thread(void *func,int thread_index,int priority)
{
pthread_attr_t attr;
struct sched_param schedParam;
void *pParm=NULL;
int id;
int * index = malloc(sizeof(int));
*index = thread_index;
void *res;
/* Initialize the thread attributes */
if (pthread_attr_init(&attr))
{
printf("Failed to initialize thread attrs\n");
return FAILURE;
}
if(pthread_attr_setschedpolicy(&attr, SCHED_FIFO))
{
printf("Failed to pthread_attr_setschedpolicy\n");
return FAILURE;
}
if (pthread_attr_setschedpolicy(&attr, SCHED_FIFO))
{
printf("Failed to setschedpolicy\n");
return FAILURE;
}
/* Set the capture thread priority */
pthread_attr_getschedparam(&attr, &schedParam);;
schedParam.sched_priority = sched_get_priority_max(SCHED_FIFO) - 1;
schedParam.sched_priority = priority;
if (pthread_attr_setschedparam(&attr, &schedParam))
{
printf("Failed to setschedparam\n");
return FAILURE;
}
pthread_create(&id, &attr, (void *)func, index);
}

The documentation for pthread_attr_setschedparam says:
In order for the parameter setting made by
pthread_attr_setschedparam() to have effect when calling
pthread_create(3), the caller must use pthread_attr_setinheritsched(3)
to set
the inherit-scheduler attribute of the attributes object attr to PTHREAD_EXPLICIT_SCHED.
So you have to call pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED) , for example:
if (pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED) != 0) {
perror("pthread_attr_setinheritsched");
}
pthread_create(&id, &attr, (void *)func, index);
Note: Your code produces a lot of compiler warnings, you need to fix those. You do not want to try to test code which have a lot of undefined behavior - as indicated by some of the warnings. You should probably lower the sleep(3600) to just a few seconds, since when you get your threads running under SCHED_FIFO, they will hog your CPU and the machine appears freezed while they are running.

Implementation of MergeSort algorithm using OpenMP

I am trying to implement Mergesort algorithm using OpenMP for first time. I came across this block of code where they are using parallel sections directive to divide the unsorted array. But for Repeatedly merging the subarrays to produce new sorted array there is no parallel implementation. I have added the parallel directive for each loop in the merge method.Will this create the overhead on thread? I am not sure if this is right.Please correct me if I am wrong.How to proceed.Thanks.
void merge_divide(int array[],int low,int high)
{
int mid;
if(low<high)
{
mid=(low+high)/2;
#pragma omp parallel sections
{
#pragma omp section
{
merge_divide(array,low,mid);
}
#pragma omp section
{
merge_divide(array,mid+1,high);
}
}
merge_conquer(array,low,mid,high);
}
Merge method
void merge_conquer(int array[],int low,int mid,int high)
{
int temp[30];
int i,j,k,m;
j=low;
m=mid+1;
#pragma omp parallel for
for(i=low; j<=mid && m<=high ; i++)
{
if(array[j]<=array[m])
{
temp[i]=array[j];
j++;
}
else
{
temp[i]=array[m];
m++;
}
}
if(j>mid)
{
#pragma omp parallel for
for(k=m; k<=high; k++)
{
temp[i]=array[k];
i++;
}
}
else
{
#pragma omp parallel for
for(k=j; k<=mid; k++)
{
temp[i]=array[k];
i++;
}
}
#pragma omp parallel for
for(k=low; k<=high; k++)
array[k]=temp[k];
}

Linux - Control Flow in a linux kernel module

I am learning to write kernel modules and in one of the examples I had to make sure that a thread executed 10 times and exits, so I wrote this according to what I have studied:
#include <linux/module.h>
#include <linux/kthread.h>
struct task_struct *ts;
int flag = 0;
int id = 10;
int function(void *data) {
int n = *(int*)data;
set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(n*HZ); // after doing this it executed infinitely and i had to reboot
while(!kthread_should_stop()) {
printk(KERN_EMERG "Ding");
}
flag = 1;
return 0;
}
int init_module (void) {
ts = kthread_run(function, (void *)&id, "spawn");
return 0;
}
void cleanup_module(void) {
if (flag==1) { return; }
else { if (ts != NULL) kthread_stop(ts);
}
return;
}
MODULE_LICENSE("GPL");
What I want to know is :
a) How to make thread execute 10 times like a loop
b) How does the control flows in these kind of processes that is if we make it to execute 10 times then does it go back and forth between function and cleanup_module or init_module or what exactly happens?

If you control kthread with kthread_stop, the kthread shouldn't exit until be ing stopped (see also that answer). So, after executing all operations, kthread should wait until stopped.
Kernel already implements kthread_worker mechanism, when kthread just executes works, added to it.
DEFINE_KTHREAD_WORKER(worker);
struct my_work
{
struct kthread_work *work; // 'Base' class
int n;
};
void do_work(struct kthread_work *work)
{
struct my_work* w = container_of(work, struct my_work, work);
printk(KERN_EMERG "Ding %d", w->n);
// And free work struct at the end
kfree(w);
}
int init_module (void) {
int i;
for(i = 0; i < 10; i++)
{
struct my_work* w = kmalloc(sizeof(struct my_work), GFP_KERNEL);
init_kthread_work(&w->work, &do_work);
w->n = i + 1;
queue_kthread_work(&worker, &w->work);
}
ts = kthread_run(&kthread_worker_fn, &worker, "spawn");
return 0;
}
void cleanup_module(void) {
kthread_stop(ts);
}

Implementing boost::barrier in C++11

I've been trying to get a project rid of every boost reference and switch to pure C++11.
At one point, thread workers are created which wait for a barrier to give the 'go' command, do the work (spread through the N threads) and synchronize when all of them finish. The basic idea is that the main loop gives the go order (boost::barrier .wait()) and waits for the result with the same function.
I had implemented in a different project a custom made Barrier based on the Boost version and everything worked perfectly. Implementation is as follows:
Barrier.h:
class Barrier {
public:
Barrier(unsigned int n);
void Wait(void);
private:
std::mutex counterMutex;
std::mutex waitMutex;
unsigned int expectedN;
unsigned int currentN;
};
Barrier.cpp
Barrier::Barrier(unsigned int n) {
expectedN = n;
currentN = expectedN;
}
void Barrier::Wait(void) {
counterMutex.lock();
// If we're the first thread, we want an extra lock at our disposal
if (currentN == expectedN) {
waitMutex.lock();
}
// Decrease thread counter
--currentN;
if (currentN == 0) {
currentN = expectedN;
waitMutex.unlock();
currentN = expectedN;
counterMutex.unlock();
} else {
counterMutex.unlock();
waitMutex.lock();
waitMutex.unlock();
}
}
This code has been used on iOS and Android's NDK without any problems, but when trying it on a Visual Studio 2013 project it seems only a thread which locked a mutex can unlock it (assertion: unlock of unowned mutex).
Is there any non-spinning (blocking, such as this one) version of barrier that I can use that works for C++11? I've only been able to find barriers which used busy-waiting which is something I would like to prevent (unless there is really no reason for it).

class Barrier {
public:
explicit Barrier(std::size_t iCount) :
mThreshold(iCount),
mCount(iCount),
mGeneration(0) {
}
void Wait() {
std::unique_lock<std::mutex> lLock{mMutex};
auto lGen = mGeneration;
if (!--mCount) {
mGeneration++;
mCount = mThreshold;
mCond.notify_all();
} else {
mCond.wait(lLock, [this, lGen] { return lGen != mGeneration; });
}
}
private:
std::mutex mMutex;
std::condition_variable mCond;
std::size_t mThreshold;
std::size_t mCount;
std::size_t mGeneration;
};

Use a std::condition_variable instead of a std::mutex to block all threads until the last one reaches the barrier.
class Barrier
{
private:
std::mutex _mutex;
std::condition_variable _cv;
std::size_t _count;
public:
explicit Barrier(std::size_t count) : _count(count) { }
void Wait()
{
std::unique_lock<std::mutex> lock(_mutex);
if (--_count == 0) {
_cv.notify_all();
} else {
_cv.wait(lock, [this] { return _count == 0; });
}
}
};

Here's my version of the accepted answer above with Auto reset behavior for repetitive use; this was achieved by counting up and down alternately.
/**
* #brief Represents a CPU thread barrier
* #note The barrier automatically resets after all threads are synced
*/
class Barrier
{
private:
std::mutex m_mutex;
std::condition_variable m_cv;
size_t m_count;
const size_t m_initial;
enum State : unsigned char {
Up, Down
};
State m_state;
public:
explicit Barrier(std::size_t count) : m_count{ count }, m_initial{ count }, m_state{ State::Down } { }
/// Blocks until all N threads reach here
void Sync()
{
std::unique_lock<std::mutex> lock{ m_mutex };
if (m_state == State::Down)
{
// Counting down the number of syncing threads
if (--m_count == 0) {
m_state = State::Up;
m_cv.notify_all();
}
else {
m_cv.wait(lock, [this] { return m_state == State::Up; });
}
}
else // (m_state == State::Up)
{
// Counting back up for Auto reset
if (++m_count == m_initial) {
m_state = State::Down;
m_cv.notify_all();
}
else {
m_cv.wait(lock, [this] { return m_state == State::Down; });
}
}
}
};

Seem all above answers don't work in the case the barrier is placed too near
Example: Each thread run the while loop look like this:
while (true)
{
threadBarrier->Synch();
// do heavy computation
threadBarrier->Synch();
// small external calculations like timing, loop count, etc, ...
}
And here is the solution using STL:
class ThreadBarrier
{
public:
int m_threadCount = 0;
int m_currentThreadCount = 0;
std::mutex m_mutex;
std::condition_variable m_cv;
public:
inline ThreadBarrier(int threadCount)
{
m_threadCount = threadCount;
};
public:
inline void Synch()
{
bool wait = false;
m_mutex.lock();
m_currentThreadCount = (m_currentThreadCount + 1) % m_threadCount;
wait = (m_currentThreadCount != 0);
m_mutex.unlock();
if (wait)
{
std::unique_lock<std::mutex> lk(m_mutex);
m_cv.wait(lk);
}
else
{
m_cv.notify_all();
}
};
};
And the solution for Windows:
class ThreadBarrier
{
public:
SYNCHRONIZATION_BARRIER m_barrier;
public:
inline ThreadBarrier(int threadCount)
{
InitializeSynchronizationBarrier(
&m_barrier,
threadCount,
8000);
};
public:
inline void Synch()
{
EnterSynchronizationBarrier(
&m_barrier,
0);
};
};

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to parallelize for loop in OpenMP? - multithreading

Your loop counter should be private. I think you should try #pragma omp parallel private(i)

Related

In openMP is there a way to for tasks to share variables?

Linux Thread priority , behaviour is abnormal

Implementation of MergeSort algorithm using OpenMP

Linux - Control Flow in a linux kernel module

Implementing boost::barrier in C++11

Categories

Resources