omp single construct with other threads non waiting - multithreading

I have some code inside a parallel region that needs to be executed exactly once. Let's assume it is the function single_call(). Other threads have to wait until single_call() is finished. So far, an omp single is sufficient.
However, I don't need the other threads to wait for this one at the end, and I don't want to force all threads to execute that code.
This is the one solution I came up with:
bool executed = false;
#pragma omp parallel shared(executed)
{
    // parallel code
    #pragma omp critical
    {
        if (!executed)
        {
            single_call();
            executed = true;
        }
    }
    // parallel code, single_call was called once.
}
Does anyone have a better solution for this problem? Maybe there is an OpenMP built-in?
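For comparison, the C++ standard library offers std::call_once, which has the same semantics as the critical-section pattern above: the first thread to arrive runs the function, threads arriving meanwhile block until it has finished, and no thread waits for threads that have not arrived yet. The sketch below uses std::thread instead of an OpenMP parallel region. It is not an OpenMP built-in, and the names init_flag, call_count and run_workers are made up for illustration.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

std::once_flag init_flag;        // hypothetical name: guards single_call()
std::atomic<int> call_count{0};  // counts how often single_call() actually ran

void single_call() { ++call_count; }

// Starts num_threads workers and returns how many times single_call() ran.
int run_workers(int num_threads) {
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([] {
            // parallel code
            // The first arrival runs single_call(); later arrivals block
            // only while that call is still in progress.
            std::call_once(init_flag, single_call);
            // parallel code, single_call() has completed exactly once
        });
    }
    for (auto& t : workers) t.join();
    return call_count.load();
}
```

Calling run_workers(4) should report that single_call() ran once, regardless of which thread got there first.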

Related

How to create monitoring thread?

I have a question that came up while working on a ray tracer.
I have created multiple threads, split the image between them, and let each thread process its allocated part. The threads work as intended. I would like to monitor the progress in real time.
To do this, I have created one more thread that monitors the current state.
Here is the monitoring pseudo-code:
/* Global var */
int cnt = 0; // count the number of row processed
void* render_disp(void* arg){ // thread for monitoring current render-processing
/* monitoring global variable and calculate percentage to display */
double result = 100.*cnt/(h-1);
fprintf(stderr,"\r%3.2f%% of image is processed!", result);
}
void* process(void* arg){ // multiple threads work here
// Rendering process
for(........)
pthread_mutex_lock(&lock);
cnt++;
pthread_mutex_unlock(&lock);
for(........)
}
I wrote the code for initialization of pthread and mutex in main() function.
Basically, I think this monitoring thread should keep displaying the current state, but it seems to run only once and then quit.
How do I change this code so that the thread function keeps running until the whole rendering is finished?
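One possible approach is to put a loop inside the monitoring thread function so that it keeps polling the counter until all rows are done. The following is a minimal C++ sketch of that idea, not the asker's actual program: it swaps pthreads for std::thread and the mutex-protected cnt for a std::atomic<int>, and the names rows_done, render_with_monitor, total_rows and num_workers are assumptions made for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<int> rows_done{0};  // stands in for the mutex-protected cnt

// Processes total_rows rows on num_workers threads while a monitor thread
// reports progress; returns the final number of processed rows.
int render_with_monitor(int total_rows, int num_workers) {
    rows_done = 0;
    std::thread monitor([total_rows] {
        // Keep polling until every row is reported, instead of printing once.
        while (rows_done.load() < total_rows) {
            std::fprintf(stderr, "\r%6.2f%% of image is processed!",
                         100.0 * rows_done.load() / total_rows);
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
        std::fprintf(stderr, "\r100.00%% of image is processed!\n");
    });

    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([=] {
            // Each worker takes every num_workers-th row.
            for (int row = w; row < total_rows; row += num_workers) {
                // ... render the row here ...
                ++rows_done;
            }
        });
    }
    for (auto& t : workers) t.join();
    monitor.join();
    return rows_done.load();
}
```

The 50 ms polling interval is arbitrary. The monitor only reads the counter, so it never holds up the rendering threads.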

How to implement a re-usable thread barrier with std::atomic

I have N threads performing various tasks, and these threads must be regularly synchronized with a thread barrier, as illustrated below with 3 threads and 8 tasks. The || indicates the temporal barrier: all threads have to wait until the 8 tasks are complete before starting again.
Thread#1 |----task1--|---task6---|---wait-----||-taskB--| ...
Thread#2 |--task2--|---task5--|-------taskE---||----taskA--| ...
Thread#3 |-task3-|---task4--|-taskG--|--wait--||-taskC-|---taskD ...
I couldn’t find a workable solution, though The Little Book of Semaphores (http://greenteapress.com/semaphores/index.html) was inspiring. I came up with the solution shown below, which “seems” to work and uses three std::atomic variables.
I am worried about my code breaking on corner cases, hence the quoted verb. Can you share advice on verifying such code? Do you have simpler, foolproof code available?
std::atomic<int> barrier1(0);
std::atomic<int> barrier2(0);
std::atomic<int> barrier3(0);
void my_thread()
{
while(1) {
// pop task from queue
...
// and execute task
switch(task.id()) {
case TaskID::Barrier:
barrier2.store(0);
barrier1++;
while (barrier1.load() != NUM_THREAD) {
std::this_thread::yield();
}
barrier3.store(0);
barrier2++;
while (barrier2.load() != NUM_THREAD) {
std::this_thread::yield();
}
barrier1.store(0);
barrier3++;
while (barrier3.load() != NUM_THREAD) {
std::this_thread::yield();
}
break;
case TaskID::Task1:
...
}
}
}
Boost offers a barrier implementation as an extension to the C++11 standard thread library. If using Boost is an option, you should look no further than that.
If you have to rely on standard library facilities, you can roll your own implementation based on std::mutex and std::condition_variable without too much of a hassle.
#include <condition_variable>
#include <mutex>

class Barrier {
    int wait_count;
    int const target_wait_count;
    std::mutex mtx;
    std::condition_variable cond_var;
public:
    // constructor and wait() must be public to be usable from outside
    Barrier(int threads_to_wait_for)
        : wait_count(0), target_wait_count(threads_to_wait_for) {}
    void wait() {
        std::unique_lock<std::mutex> lk(mtx);
        ++wait_count;
        if(wait_count != target_wait_count) {
            // not all threads have arrived yet; go to sleep until they do
            cond_var.wait(lk,
                [this]() { return wait_count == target_wait_count; });
        } else {
            // we are the last thread to arrive; wake the others and go on
            cond_var.notify_all();
        }
        // note that if you want to reuse the barrier, you will have to
        // reset wait_count to 0 now before calling wait again
        // if you do this, be aware that the reset must be synchronized with
        // threads that are still stuck in the wait
    }
};
This implementation has the advantage over your atomics-based solution that threads waiting in condition_variable::wait should be put to sleep by your operating system's scheduler, so you don't block CPU cores by having waiting threads spin on the barrier.
A few words on resetting the barrier: The simplest solution is to just have a separate reset() method and have the user ensure that reset and wait are never invoked concurrently. But in many use cases, this is not easy to achieve for the user.
For a self-resetting barrier, you have to consider races on the wait count: If the wait count is reset before the last thread returned from wait, some threads might get stuck in the barrier. A clever solution here is to not have the terminating condition depend on the wait count variable itself. Instead you introduce a second counter, that is only increased by the thread calling the notify. The other threads then observe that counter for changes to determine whether to exit the wait:
void wait() {
std::unique_lock<std::mutex> lk(mtx);
unsigned int const current_wait_cycle = m_inter_wait_count;
++wait_count;
if(wait_count != target_wait_count) {
// wait condition must not depend on wait_count
cond_var.wait(lk,
[this, current_wait_cycle]() {
return m_inter_wait_count != current_wait_cycle;
});
} else {
// increasing the second counter allows waiting threads to exit
++m_inter_wait_count;
cond_var.notify_all();
}
}
This solution is correct under the (very reasonable) assumption that all threads leave the wait before m_inter_wait_count overflows.
With atomic variables, using three of them for a barrier is simply overkill that only serves to complicate the issue. You know the number of threads, so you can simply atomically increment a single counter every time a thread enters the barrier, and then spin until the counter becomes greater than or equal to N. Something like this:
void barrier(int N) {
    // brace initialization: std::atomic is not copyable, so "= 0" is rejected before C++17
    static std::atomic<unsigned int> gCounter{0};
    gCounter++;
    while((int)(gCounter - N) < 0) std::this_thread::yield();
}
If you have no more threads than CPU cores and expect short waits, you might want to remove the call to std::this_thread::yield(). This call is likely to be really expensive (more than a microsecond, I'd wager, but I haven't measured it). Depending on the size of your tasks, this may be significant.
If you want to do repeated barriers, just increment the N as you go:
unsigned int lastBarrier = 0;
while(1) {
switch(task.id()) {
case TaskID::Barrier:
barrier(lastBarrier += processCount);
break;
}
}
I would like to point out that in the solution given by @ComicSansMS,
wait_count should be reset to 0 before executing cond_var.notify_all();
Otherwise, when the barrier is used a second time, wait_count keeps growing past target_wait_count, so no thread ever takes the notifying branch and all threads stay stuck in the wait.
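Putting the two-counter idea and the reset together, a self-resetting barrier could look roughly like the sketch below. This is an illustration rather than the answerer's exact code: the class name ReusableBarrier and the helper barrier_rounds_ok are made up, and the helper only checks that no thread gets past round r before every thread has arrived in round r.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class ReusableBarrier {
public:
    explicit ReusableBarrier(int threads_to_wait_for)
        : target_wait_count(threads_to_wait_for) {}

    void wait() {
        std::unique_lock<std::mutex> lk(mtx);
        unsigned int const current_wait_cycle = inter_wait_count;
        if (++wait_count != target_wait_count) {
            // Wait for the cycle counter to change, not for wait_count.
            cond_var.wait(lk, [this, current_wait_cycle] {
                return inter_wait_count != current_wait_cycle;
            });
        } else {
            wait_count = 0;      // reset for the next round before waking anyone
            ++inter_wait_count;  // lets the waiting threads leave
            cond_var.notify_all();
        }
    }

private:
    int wait_count = 0;
    unsigned int inter_wait_count = 0;
    int const target_wait_count;
    std::mutex mtx;
    std::condition_variable cond_var;
};

// Hypothetical check: runs `rounds` barrier rounds on num_threads threads.
bool barrier_rounds_ok(int num_threads, int rounds) {
    ReusableBarrier barrier(num_threads);
    std::atomic<int> arrived{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&] {
            for (int r = 1; r <= rounds; ++r) {
                ++arrived;
                barrier.wait();
                // Past the barrier, every thread must have arrived r times.
                if (arrived.load() < r * num_threads) ok = false;
            }
        });
    }
    for (auto& t : threads) t.join();
    return ok.load();
}
```

Because both counters are only touched while holding the mutex, the reset cannot race with threads that are still waking up from the previous round.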

How to synchronize this, nicely?

Given the following C++11 code fragment:
#include <condition_variable>
#include <mutex>
std::mutex block;
long count;
std::condition_variable cv;
void await()
{
std::unique_lock<std::mutex> lk(block);
if (count > 0)
cv.wait(lk);
}
void countDown()
{
std::lock_guard<std::mutex> lk(block);
if (count > 0)
{
count--;
if (count==0) cv.notify_all();
}
}
If it is not clear what I am trying to accomplish, I am wanting calls to await to pause the calling thread while count is greater than 0, and if it has already been reduced to zero, then it should not pause at all. Other threads may call countDown() which will wake all threads that had previously called await.
The above code seems to work in all cases that I've tried, but I have a nagging doubt about it. Suppose the thread calling await() is preempted immediately after its condition test has been evaluated, just before it is actually suspended by the cv.wait() call. If countDown() is called at that moment and the count reaches 0, it would notify the condition variable before the first thread is waiting on it. When the thread calling await() resumes, it would then stop at the cv.wait() call and wait indefinitely.
I actually haven't seen this happen yet in practice, but I would like to harden the code against the eventuality.
It is good that you are thinking about these possibilities. But in this case your code is correct and safe.
If await gets preempted immediately after its condition test has been evaluated and just before the thread is actually suspended by the cv.wait() call, and if the countDown function is getting called at this time, the latter thread will block while trying to obtain the block mutex until await actually calls cv.wait(lk).
The call to cv.wait(lk) implicitly releases the lock on block, and thus now another thread can obtain the lock on block in countDown(). And as long as a thread holds the lock on block in countDown() (even after cv.notify_all() is called), the await thread can not return from cv.wait(). The await thread implicitly blocks on trying to re-lock block during the return from cv.wait().
Update
I did make a rookie mistake while reviewing your code though <blush>.
cv.wait(lk) may return spuriously. That is, it may return even though it hasn't been notified. To guard against this you should place your wait under a while loop, instead of under an if:
void await()
{
std::unique_lock<std::mutex> lk(block);
while (count > 0)
cv.wait(lk);
}
Now if the wait returns spuriously, it re-checks the condition, and if still not satisfied, waits again.
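The same loop can also be written with the predicate overload of std::condition_variable::wait, which re-checks the condition after every wake-up. Below is a small self-contained sketch that wraps the question's globals in a class. The class name CountDownLatch, the remaining() accessor and the driver function run_latch are assumptions added for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class CountDownLatch {
public:
    explicit CountDownLatch(long initial) : count(initial) {}

    void await() {
        std::unique_lock<std::mutex> lk(block);
        // Equivalent to: while (count > 0) cv.wait(lk);
        cv.wait(lk, [this] { return count <= 0; });
    }

    void countDown() {
        std::lock_guard<std::mutex> lk(block);
        if (count > 0 && --count == 0) cv.notify_all();
    }

    long remaining() {
        std::lock_guard<std::mutex> lk(block);
        return count;
    }

private:
    std::mutex block;
    std::condition_variable cv;
    long count;
};

// Hypothetical driver: each worker counts down once; returns what is left
// on the latch after await() has returned.
long run_latch(int num_workers) {
    CountDownLatch latch(num_workers);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_workers; ++i)
        workers.emplace_back([&latch] { latch.countDown(); });
    latch.await();  // returns only after the count has reached zero
    long left = latch.remaining();
    for (auto& t : workers) t.join();
    return left;
}
```

If every worker has already counted down before await() is called, the predicate is true immediately and the caller never blocks.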

Using worker threads to add new tasks to a taskPool in D

This is a simplification and narrowing of another of my questions: Need help parallel traversing a dag in D
Say you've got some code that you want to parallelize. The problem is, some of the things you need to do have prerequisites. So you have to make sure that those prerequisites are done before you add the new task into the pool. The simple conceptual answer is to add new tasks as their prerequisites finish.
Here I have a little chunk of code that emulates that pattern. The problem is, it throws an exception because pool.finish() gets called before a new task is put on the queue by the worker thread. Is there a way to just wait 'till all threads are idle or something? Or is there another construct that would allow this pattern?
Please note: this is a simplified version of my code to illustrate the problem. I can't just use taskPool.parallel() in a foreach.
import std.stdio;
import std.parallelism;
void simpleWorker(uint depth, uint maxDepth, TaskPool pool){
writeln("Depth is: ",depth);
if (++depth < maxDepth){
pool.put( task!simpleWorker(depth,maxDepth,pool));
}
}
void main(){
auto pool = new TaskPool();
auto t = task!simpleWorker(0,5,pool);
pool.put(t);
pool.finish(true);
if (t.done()){ //rethrows the exception thrown by the thread.
writeln("Done");
}
}
I fixed it: http://dpaste.dzfl.pl/eb9e4cfc
I changed the for loop to:
void cleanNodeSimple(Node node, TaskPool pool){
node.doProcess();
foreach (cli; pool.parallel(node.clients,1)){ // using parallel to make it concurrent
if (cli.canProcess()) {
cleanNodeSimple(cli, pool);
// no explicit task creation (already handled by parallel)
}
}
}

pthread_cond_wait never unblocking - thread pools

I'm trying to implement a sort of thread pool whereby I keep threads in a FIFO and process a bunch of images. Unfortunately, for some reason my cond_wait doesn't always wake even though it's been signaled.
// Initialize the thread pool
for(i=0;i<numThreads;i++)
{
pthread_t *tmpthread = (pthread_t *) malloc(sizeof(pthread_t));
struct Node* newNode;
newNode=(struct Node *) malloc(sizeof(struct Node));
newNode->Thread = tmpthread;
newNode->Id = i;
newNode->threadParams = 0;
pthread_cond_init(&(newNode->cond),NULL);
pthread_mutex_init(&(newNode->mutx),NULL);
pthread_create( tmpthread, NULL, someprocess, (void*) newNode);
push_back(newNode, &threadPool);
}
for() //stuff here
{
//...stuff
pthread_mutex_lock(&queueMutex);
struct Node *tmpNode = pop_front(&threadPool);
pthread_mutex_unlock(&queueMutex);
if(tmpNode != 0)
{
pthread_mutex_lock(&(tmpNode->mutx));
pthread_cond_signal(&(tmpNode->cond)); // Not starting mutex sometimes?
pthread_mutex_unlock(&(tmpNode->mutx));
}
//...stuff
}
destroy_threads=1;
//loop through and signal all the threads again so they can exit.
//pthread_join here
}
void *someprocess(void* threadarg)
{
do
{
//...stuff
pthread_mutex_lock(&(threadNode->mutx));
pthread_cond_wait(&(threadNode->cond), &(threadNode->mutx));
// Doesn't always seem to resume here after signalled.
pthread_mutex_unlock(&(threadNode->mutx));
} while(!destroy_threads);
pthread_exit(NULL);
}
Am I missing something? It works about half of the time, so I would assume that I have a race somewhere, but the only thing I can think of is that I'm screwing up the mutexes? I read something about not signalling before locking or something, but I don't really understand what's going on.
Any suggestions?
Thanks!
Firstly, your example shows you locking the queueMutex around the call to pop_front, but not round push_back. Typically you would need to lock round both, unless you can guarantee that all the pushes happen-before all the pops.
Secondly, your call to pthread_cond_wait doesn't seem to have an associated predicate. Typical usage of condition variables is:
pthread_mutex_lock(&mtx);
while(!ready)
{
pthread_cond_wait(&cond,&mtx);
}
do_stuff();
pthread_mutex_unlock(&mtx);
In this example, ready is some variable that is set by another thread whilst that thread holds a lock on mtx.
If the waiting thread is not blocked in the pthread_cond_wait when pthread_cond_signal is called then the signal will be ignored. The associated ready variable allows you to handle this scenario, and also allows you to handle so-called spurious wake-ups where the call to pthread_cond_wait returns without a corresponding call to pthread_cond_signal from another thread.
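To illustrate why the predicate matters when the signal arrives early, here is a minimal C++ sketch using std::condition_variable in place of the pthread calls. The struct WorkSlot (a stand-in for the question's Node) and the function signal_before_wait are hypothetical. The signalling side sets the flag and notifies before the worker even exists, yet the worker still proceeds because it checks the flag before waiting.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

struct WorkSlot {  // hypothetical stand-in for the question's Node
    std::mutex mtx;
    std::condition_variable cond;
    bool ready = false;  // the predicate, only touched under mtx
    int result = 0;
};

// Signals first, starts the waiting thread afterwards; returns the result
// written by the worker.
int signal_before_wait() {
    WorkSlot slot;
    {
        std::lock_guard<std::mutex> lk(slot.mtx);
        slot.ready = true;  // record the event under the mutex
    }
    slot.cond.notify_one();  // nobody is waiting yet, so this wakes no one

    std::thread worker([&slot] {
        std::unique_lock<std::mutex> lk(slot.mtx);
        // ready is already true, so the loop body never runs and the
        // thread does not block on the notification it missed.
        while (!slot.ready) slot.cond.wait(lk);
        slot.result = 42;
    });
    worker.join();
    return slot.result;
}
```

Without the ready flag, the worker would call wait() after the notification had already been issued and would block forever, which matches the intermittent hang described in the question.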
I'm not sure, but I think you don't have to (you must not) lock the mutex in the thread pool before calling pthread_cond_signal(&(tmpNode->cond)); , otherwise, the thread which is woken up won't be able to lock the mutex as part of pthread_cond_wait(&(threadNode->cond), &(threadNode->mutx)); operation.
