I have two groups of threads, A and B. Threads in A execute function funA, and threads in B execute function funB. It is OK for threads within each group to execute their functions concurrently, but funA and funB must not be executed concurrently. How do we achieve this? Does this problem have a name, so I can read about it online?
One possible solution is the following:
std::atomic<std::int64_t> counter{};

void funA() {
    std::int64_t curr = 0;
    while (
        (curr = counter.load(std::memory_order_relaxed)) < 0 ||
        !counter.compare_exchange_weak(curr, curr + 1, std::memory_order_relaxed));
    // implementation of funA goes here
    counter.fetch_sub(1); // better to happen using RAII
}

void funB() {
    std::int64_t curr = 0;
    while (
        (curr = counter.load(std::memory_order_relaxed)) > 0 ||
        !counter.compare_exchange_weak(curr, curr - 1, std::memory_order_relaxed));
    // implementation of funB goes here
    counter.fetch_add(1); // better to happen using RAII
}
Is this correct? Is it the best we can do? What I don't like about it is that threads in the same group compete against each other in those while loops.
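Note: the "better to happen using RAII" comments refer to something like the following sketch (ScopedEntry is a hypothetical guard name; the entry loop is unchanged):

class ScopedEntry {
public:
    ScopedEntry(std::atomic<std::int64_t>& c, std::int64_t delta)
        : counter_(c), delta_(delta) {}
    // the destructor undoes the entry increment/decrement, so the release
    // happens even on early return or exception
    ~ScopedEntry() { counter_.fetch_sub(delta_, std::memory_order_relaxed); }
private:
    std::atomic<std::int64_t>& counter_;
    std::int64_t delta_;
};

void funA() {
    std::int64_t curr = 0;
    while (
        (curr = counter.load(std::memory_order_relaxed)) < 0 ||
        !counter.compare_exchange_weak(curr, curr + 1, std::memory_order_relaxed));
    ScopedEntry guard(counter, +1); // funB would use -1 here
    // implementation of funA goes here
}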
There are n threads. I'm trying to implement a function (pseudocode) that directly blocks any thread that calls it. Every thread is blocked until the function has been called by more than n/2 threads; once more than n/2 threads have called it, the function no longer blocks and instead returns immediately.
I did it like this, but I'm not sure if I did the last part correctly, where the function should immediately return if more than n/2 threads have called it? :S
(Pseudocode is highly appreciated because then I have a better chance of understanding it! :) )
int n = total amount of threads
sem waiter = 0
sem mutex = 1
int counter = 0

function void barrier()
    int x
    P(mutex)
    if counter > n / 2 then
        V(mutex)
        for x = 0; x <= n / 2; x++
            V(waiter)
        end for
    end if
    else
        counter++
        V(mutex)
        P(waiter)
    end else
end function
What you describe is a non-resetting barrier. Pthreads has a barrier implementation, but it is of the resetting variety.
To implement what you're after with pthreads, you will want a mutex, a condition variable, and a shared counter. A thread entering the function locks the mutex and checks the counter. If not enough other threads have yet arrived, it waits on the CV; otherwise, it broadcasts on the CV to wake all the waiting threads. If you wish, you can make it so that only the thread that tips the scale broadcasts. Example:
struct my_barrier {
    pthread_mutex_t barrier_mutex;
    pthread_cond_t barrier_cv;
    int threads_to_await;   // how many more threads must arrive; 0 = released
};

void barrier(struct my_barrier *b) {
    pthread_mutex_lock(&b->barrier_mutex);
    if (b->threads_to_await > 0) {
        if (--b->threads_to_await == 0) {
            // this thread tips the scale: wake everyone
            pthread_cond_broadcast(&b->barrier_cv);
        } else {
            // re-check the counter after every wakeup to guard against spurious wakeups
            do {
                pthread_cond_wait(&b->barrier_cv, &b->barrier_mutex);
            } while (b->threads_to_await);
        }
    }
    pthread_mutex_unlock(&b->barrier_mutex);
}
Update: pseudocode
Or since a pseudocode representation is important to you, here's the same thing in a pseudocode language similar to the one used in the question:
int n = total amount of threads
mutex m
condition_variable cv
int to_wait_for = n / 2

function void barrier()
    lock(m)
    if to_wait_for == 1 then
        to_wait_for = 0
        broadcast(cv)
    else if to_wait_for > 1 then
        to_wait_for = to_wait_for - 1
        wait(cv)
    end if
    unlock(m)
end function
That's slightly higher-level than your pseudocode, in that it does not assume that the mutex is implemented as a semaphore. (And with pthreads, which you tagged, you would need a pthreads mutex, not a semaphore, to go with a pthreads condition variable.) It also omits the details of the real C code that deal with spurious wakeup from waiting on the condition variable and with initializing the mutex and cv. Also, it presents the variables as if they are all globals -- such a function can be implemented that way in practice, but it is poor form.
Note also that it assumes pthreads semantics for the condition variable: waiting on the cv temporarily releases the mutex, allowing other threads to lock it, but a thread that waits on the cv reacquires the mutex before itself proceeding past the wait.
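The initialization details omitted above are small. A minimal sketch (the static-initializer form assumes the barrier has static storage duration; threads_to_await would be set to n / 2 before any thread calls barrier()):

#include <pthread.h>

/* static initialization of the mutex and condition variable */
struct my_barrier b = {
    PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_COND_INITIALIZER,
    0 /* threads_to_await: set to n / 2 before use */
};

/* or, for a barrier that is allocated dynamically: */
void my_barrier_init(struct my_barrier *b, int threads_to_await) {
    pthread_mutex_init(&b->barrier_mutex, NULL);
    pthread_cond_init(&b->barrier_cv, NULL);
    b->threads_to_await = threads_to_await;
}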
A few assumptions I am making within my answer:
P(...) is analogous to sem_wait(...)
V(...) is analogous to sem_post(...)
the barrier cannot be reset
I'm not sure if I did the last part correctly where the function will immediately return if more than n/2 threads called it
The pseudocode should work fine for the most part, but the early return/exit conditions could be significantly improved upon.
Some concerns (but nothing major):
The first time the condition counter > n / 2 is met, the waiter semaphore is signaled (i.e. V(...)) (n / 2) + 1 times (since the loop runs from 0 to n / 2 inclusive), rather than once per thread actually waiting (the value of counter at that moment).
Every subsequent invocation after counter > n / 2 is first met will also signal (i.e. V(...)) the waiter semaphore another (n / 2) + 1 times. Instead, it should early return and not re-signal.
These can be resolved with a few minor tweaks.
int n = total count of threads
sem mutex = 1
sem waiter = 0
int counter = 0
bool released = FALSE

function void barrier()
    P(mutex)
    // instead of the `released` flag, this could be the condition `counter > n / 2 + 1`
    if released then
        // ensure the mutex is released prior to returning
        V(mutex)
        return
    end if
    if counter > n / 2 then
        // more than n/2 threads have tried to wait; mark the barrier as released
        released = TRUE
        // the mutex can be released at this point, as any thread that acquires
        // `mutex` afterwards will see that `released` is TRUE and return early
        V(mutex)
        // release all blocked threads; counter is guaranteed to never be incremented again
        int x
        for x = 0; x < counter; x++
            V(waiter)
        end for
    end if
    else
        counter++
        V(mutex)
        P(waiter)
    end else
end function
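Using the P(...) ≈ sem_wait(...) and V(...) ≈ sem_post(...) mapping from the assumptions above, a concrete C sketch of the tweaked pseudocode might look like this (error handling and thread setup omitted; n is assumed to be set before any thread calls barrier()):

#include <semaphore.h>
#include <stdbool.h>

static int n;              /* total number of threads */
static sem_t mutex_sem;    /* sem_init(&mutex_sem, 0, 1) at startup */
static sem_t waiter_sem;   /* sem_init(&waiter_sem, 0, 0) at startup */
static int counter = 0;
static bool released = false;

void barrier(void) {
    sem_wait(&mutex_sem);               /* P(mutex) */
    if (released) {                     /* barrier already open: return at once */
        sem_post(&mutex_sem);           /* V(mutex) */
        return;
    }
    if (counter > n / 2) {              /* more than n/2 callers: open the barrier */
        released = true;
        sem_post(&mutex_sem);
        for (int x = 0; x < counter; x++)
            sem_post(&waiter_sem);      /* wake every blocked thread */
    } else {
        counter++;
        sem_post(&mutex_sem);
        sem_wait(&waiter_sem);          /* P(waiter): block until released */
    }
}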
I'm new to atomic techniques and am trying to implement a thread-safe version of the following code:
// say m_cnt is unsigned
void Counter::dec_counter()
{
    if (0 == m_cnt)
        return;
    --m_cnt;
    if (0 == m_cnt)
    {
        // Do something
    }
}
Every thread that calls dec_counter must decrement it by one, and "Do something" should be done only once: when the counter is decremented to 0.
After fighting with it, I came up with the following code, which does it well (I think), but I wonder if this is the way to do it, or whether there is a better way. Thanks.
// m_cnt is std::atomic<unsigned>
void Counter::dec_counter()
{
    // loop until the decrement is done
    unsigned uiExpectedValue;
    unsigned uiNewValue;
    do
    {
        uiExpectedValue = m_cnt.load();
        // if another thread already decremented it to 0, then do nothing
        if (0 == uiExpectedValue)
            return;
        uiNewValue = uiExpectedValue - 1;
        // in the short window after doing
        //     uiExpectedValue = m_cnt.load();
        // another thread may have decremented m_cnt, in which case it won't be
        // equal to uiExpectedValue here; hence the loop, to be sure we do a decrement
    } while (!m_cnt.compare_exchange_weak(uiExpectedValue, uiNewValue));
    // if we get here, we did the decrement; so if it was to 0, do something
    if (0 == uiNewValue)
    {
        // do something
    }
}
The thing with atomics is that only that one operation is atomic.
If you write
std::atomic<int> i{20};
...
if (!--i)
    ...
Then just 1 thread will enter the if.
However, if you split up the change and the test, then other threads can get into the gap, and you may get strange results:
std::atomic<int> i{20};
...
--i;
// other thread(s) can modify i just here
if (!i)
    ...
Of course you can decouple the test from the decrement safely by capturing the result in a local variable:
std::atomic<int> i{20};
...
int j = --i;
// other thread(s) can modify i just here, but the test below uses j
if (!j)
    ...
All the simple math operations are generally supported efficiently for small atomic types in C++.
For more complex types and expressions, you need to use the read/modify/write member methods.
These allow you to read the current value, calculate the new value, and then call compare_exchange_strong or compare_exchange_weak to say "if the value has not changed, then store my new value, otherwise give me the new current value" as a single atomic operation. You can put this in a loop and keep recalculating the new value until you are lucky enough that your thread is the only writer. If there are not too many threads trying to change the value too often, this is reasonably efficient as well.
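As a minimal sketch of that loop (the "halve but never below 1" update is a made-up example of a computation that the read/modify/write member methods don't provide directly):

#include <atomic>

std::atomic<int> value{20};

void halve_with_floor()
{
    int expected = value.load();
    int desired;
    do
    {
        // recompute the new value from the freshest snapshot
        desired = (expected > 1) ? expected / 2 : 1;
        // on failure, compare_exchange_weak reloads `expected` with the
        // current value, so the next iteration starts from fresh data
    } while (!value.compare_exchange_weak(expected, desired));
}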
I am working on a shared-counter increment application using the combining tree concept. My goal is to make this application work on 2^n cores, such as 4, 8, 16, 32, etc. This algorithm might err on any thread failure; the assumption is that there would be no thread failures and no very slow threads.
Two threads compete at the leaf nodes, and the later one to arrive goes up the tree.
The first one to arrive waits until the second one goes up the hierarchy and comes back down with the correct return value.
The second thread then wakes the first thread up, and each thread gets the correct fetchAndAdd value.
But this algorithm sometimes gets stuck inside the while (nodes[index].isActive == 1) or while (nodes[index].waiting == 1) loop. I don't see any possibility of a deadlock because only two threads are competing at each node. Could you enlighten me on this problem?
int increment(int threadId, int index, int value) {
    int lastValue = __sync_fetch_and_add(&nodes[index].firstValue, value);
    if (index == 0) return lastValue;
    while (nodes[index].isActive == 1) {
    }
    if (lastValue == 0) {
        while (nodes[index].waiting == 1) {
        }
        nodes[index].waiting = 1;
        nodes[index].isActive = 0;
    } else {
        nodes[index].isActive = 1;
        nodes[index].result = increment(threadId, (index - 1) / 2, nodes[index].firstValue);
        nodes[index].firstValue = 0;
        nodes[index].waiting = 0;
    }
    return nodes[index].result + lastValue;
}
I don't think that will work on one core: you loop infinitely on isActive, because the code that would set isActive back to 0 can only run once the spin on isActive has already finished.
I'm not sure if your code has a mechanism to stop this, but here's my best crack at it. Here is an interleaving of threads that causes the problem:
ex)
thread 2: nodes[10].isActive = 1
thread 1: // next run on index 10
thread 1: while (nodes[index].isActive == 1) { // here is the deadlock }
It's hard to understand exactly what's going on here and what you're trying to do, but I would recommend that you somehow make it possible to deactivate nodes[index].isActive. You may want to set it to 0 at the end of the function.
I have a function that boils down to:
while (doWork)
{
    config = generateConfigurationForTesting();
    result = executeWork(config);
    doWork = isDone(result);
}
How can I rewrite this for efficient asynchronous execution, assuming all functions are thread-safe and independent of previous iterations, and that the number of iterations required will probably exceed the maximum number of allowable threads?
The problem here is we don't know how many iterations are required in advance so we can't make a dispatch_group or use dispatch_apply.
This is my first attempt, but it looks a bit ugly to me because of the arbitrarily chosen values and the sleeping:
int thread_count = 0;
bool doWork = true;
int max_threads = 20; // arbitrarily chosen number

dispatch_queue_t queue =
    dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

while (doWork)
{
    if (thread_count < max_threads)
    {
        dispatch_async(queue, ^{
            Config myconfig = generateConfigurationForTesting();
            Result myresult = executeWork(myconfig);
            dispatch_async(queue, ^{ checkResult(myresult); });
        });
        thread_count++;
    }
    else
        usleep(100); // don't consume too much CPU
}

void checkResult(Result value)
{
    if (value == good) doWork = false;
    thread_count--;
}
Based on your description, it looks like generateConfigurationForTesting is some kind of randomization technique or otherwise a generator that can produce a near-infinite number of configurations (hence your comment that you don't know ahead of time how many iterations you will need). With that as an assumption, you are basically stuck with the model that you've created, since your executor needs to be limited by some reasonable assumptions about the queue and you don't want to over-generate, as that would just extend the length of the run after you have already succeeded in finding a value == good measurement.
I would suggest you consider using a queue (or OSAtomicIncrement* and OSAtomicDecrement*) to protect access to thread_count and doWork. As it stands, the thread_count increment and decrement will happen in two different queues (main_queue for the main thread and the default queue for the background task) and thus could simultaneously increment and decrement the thread count. This could lead to an undercount (which would cause more threads to be created than you expect) or an overcount (which would cause you to never complete your task).
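For example, a minimal sketch of the atomic-counter option (this uses std::atomic purely as an illustration, rather than the OSAtomic* functions; Result and good are from the question):

#include <atomic>

std::atomic<int> thread_count{0};
std::atomic<bool> doWork{true};

void checkResult(Result value)
{
    // both updates are now atomic, so the main thread's increment and
    // this decrement can no longer race into an under- or overcount
    if (value == good) doWork.store(false);
    thread_count.fetch_sub(1);
}

The main loop's thread_count++ would likewise become thread_count.fetch_add(1).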
Another option for making this look a little nicer would be to have checkResult add new elements into the queue if value != good. This way, you load up the initial elements of the queue using dispatch_apply(20, queue, ^{ ... }) and you don't need the thread_count at all. The first 20 will be added using dispatch_apply (or an amount that dispatch_apply feels is appropriate for your configuration), and then each time checkResult is called you can either set doWork = false or add another operation to the queue.
dispatch_apply() works for this; just pass ncpu as the number of iterations (apply never uses more than ncpu worker threads) and keep each instance of your worker block running for as long as there is more work to do (i.e. loop back to generateConfigurationForTesting() while doWork is still true).
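A minimal sketch of that shape (using the function-pointer variant dispatch_apply_f to avoid blocks syntax; worker_loop, the atomic doWork flag, and the hard-coded ncpu are my assumptions, while Config, Result, good, generateConfigurationForTesting, and executeWork come from the question):

#include <dispatch/dispatch.h>
#include <atomic>

std::atomic<bool> doWork{true};

// each of the ncpu workers loops until some iteration finds a good result
static void worker_loop(void *context, size_t i)
{
    (void)context;
    (void)i;
    while (doWork.load()) {
        Config config = generateConfigurationForTesting();
        Result result = executeWork(config);
        if (result == good)        // the checkResult test from the question
            doWork.store(false);   // every worker observes this and winds down
    }
}

int main(void)
{
    size_t ncpu = 8; // in practice, query the machine's core count
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    // dispatch_apply_f blocks until all ncpu worker iterations return
    dispatch_apply_f(ncpu, queue, NULL, worker_loop);
    return 0;
}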
I am working on multithreaded programming and I am stuck on something.
In my program there are two tasks and two types of robots for carrying out the tasks:
Task 1 requires any two robots (of either type), and
task 2 requires two robots of type robot1 and two of type robot2.
The total numbers of robot1 and robot2, and pointers to these two types, are given at initialization. Threads share these robots, and a robot stays reserved until a thread is done with it.
The actual task is done in the doTask1(robot **) function, which takes a pointer to a robot pointer as its parameter, so I need to pass the robots that I reserved. I want to provide concurrency; obviously, if I lock everything it will not be concurrent. robot1 is of type Robot **. Since it is used by all threads, another thread can overwrite robot1 before one thread calls doTask or finishes with it, which changes things. I know this is because robot1 is shared by all threads. Could you explain how I can solve this problem? I don't want to pass any arguments to the thread start routine.
rsc is my struct holding the numbers of robots and the pointers that are given in an initialization function.
void *task1(void *arg)
{
    int tid;
    tid = *((int *) arg);
    cout << "TASK 1 with thread id " << tid << endl;
    pthread_mutex_lock(&mutexUpdateRob);
    while (rsc->totalResources < 2)
    {
        pthread_cond_wait(&noResource, &mutexUpdateRob);
    }
    if (rsc->numOfRobotA > 0 && rsc->numOfRobotB > 0)
    {
        rsc->numOfRobotA--;
        rsc->numOfRobotB--;
        robot1[0] = &rsc->robotA[counterA];
        robot1[1] = &rsc->robotB[counterB];
        counterA++;
        counterB++;
        flag1 = true;
        rsc->totalResources -= 2;
    }
    pthread_mutex_unlock(&mutexUpdateRob);
    doTask1(robot1);
    pthread_mutex_lock(&mutexUpdateRob);
    if (flag1)
    {
        rsc->numOfRobotA++;
        rsc->numOfRobotB++;
        rsc->totalResources += 2;
    }
    if (rsc->totalResources >= 2)
    {
        pthread_cond_signal(&noResource);
    }
    pthread_mutex_unlock(&mutexUpdateRob);
    pthread_exit(NULL);
}
If robots are global resources, threads should not dispose of them; that should be the duty of the main thread's exit (or cleanup) function.
Also, there should be a way for threads to locate the robots unambiguously, and to lock their use.
The robot1 array seems to store the robots, and it seems to be a global array. However:
its access was not protected by a mutex (pthread_mutex_t), though it seems now that you've taken care of that.
Also, the code in task1 always modifies entries 0 and 1 of this array. If two or more threads execute that code, the entries will be overwritten. I don't think that is what you want. How will that array be used afterwards?
In fact, why does this array need to be global?
The bottom line is this: as long as this array is shared by threads, they will have problems working concurrently. Think about it this way:
You have two companies using robots to work, but they're using the same truck (robot1) to move the robots around. How are these two companies supposed to function properly and efficiently with only one truck?
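To make that concrete, the fix is essentially to give each thread its own truck: reserve into a local array and pass that to doTask1. A rough sketch along those lines (names such as rsc, mutexUpdateRob, noResource, Robot, and doTask1 come from the question; the index bookkeeping is simplified and is an assumption on my part):

void *task1(void *arg)
{
    int tid = *((int *) arg);
    cout << "TASK 1 with thread id " << tid << endl;

    Robot *myRobots[2];   // local to this thread: no other thread can overwrite it
    bool reserved = false;

    pthread_mutex_lock(&mutexUpdateRob);
    while (rsc->totalResources < 2)
        pthread_cond_wait(&noResource, &mutexUpdateRob);
    if (rsc->numOfRobotA > 0 && rsc->numOfRobotB > 0)
    {
        // take one robot of each type; simplified indexing
        myRobots[0] = &rsc->robotA[--rsc->numOfRobotA];
        myRobots[1] = &rsc->robotB[--rsc->numOfRobotB];
        rsc->totalResources -= 2;
        reserved = true;
    }
    pthread_mutex_unlock(&mutexUpdateRob);

    if (reserved)
    {
        doTask1(myRobots);   // this thread's reservation cannot be overwritten

        pthread_mutex_lock(&mutexUpdateRob);
        rsc->numOfRobotA++;
        rsc->numOfRobotB++;
        rsc->totalResources += 2;
        pthread_cond_signal(&noResource);
        pthread_mutex_unlock(&mutexUpdateRob);
    }
    pthread_exit(NULL);
}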