Shared counter using combining tree deadlock issue - multithreading

I am working on a shared counter increment application using the combining tree concept. My goal is to make it work on 2^n cores (4, 8, 16, 32, and so on). The algorithm may fail if any thread fails, so I assume that no thread fails or runs very slowly.
Two threads compete at each leaf node, and the one that arrives later goes up the tree.
The one that arrives first waits until the second one has gone up the hierarchy and come back down with the correct return value.
The second thread then wakes the first thread up.
Each thread gets the correct fetchAndAdd value.
However, the algorithm sometimes gets stuck inside the while (nodes[index].isActive == 1) or while (nodes[index].waiting == 1) loop. I don't see how a deadlock is possible, because only two threads compete at each node. Could you help me understand this problem?
int increment(int threadId, int index, int value) {
    int lastValue = __sync_fetch_and_add(&nodes[index].firstValue, value);
    if (index == 0) return lastValue;
    while (nodes[index].isActive == 1) {
    }
    if (lastValue == 0) {
        while (nodes[index].waiting == 1) {
        }
        nodes[index].waiting = 1;
        nodes[index].isActive = 0;
    } else {
        nodes[index].isActive = 1;
        nodes[index].result = increment(threadId, (index - 1)/2, nodes[index].firstValue);
        nodes[index].firstValue = 0;
        nodes[index].waiting = 0;
    }
    return nodes[index].result + lastValue;
}

I don't think that will work on 1 core. You loop forever on isActive, because the only code that sets isActive to 0 comes after the loop that waits for it to become 0.
I'm not sure whether your code has a mechanism to prevent this, but here is my best attempt. This interleaving of two threads causes the problem:
Example:
thread1 thread 2
nodes[10].isActive = 1
//next run on index 10
while (nodes[index].isActive == 1) { /* here is the deadlock */ }
It's hard to understand exactly what's going on here and what you're trying to do, but I would recommend adding some way to deactivate nodes[index].isActive. You may want to set it to 0 at the end of the function.
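The suggestion to reset the flag at the end of the function can be sketched in C++ as follows. This is only an illustration under assumptions, not the asker's algorithm: Node, ActiveGuard, waitUntilInactive and climb are hypothetical names, and the flag is declared as std::atomic<int> because spinning on a plain int that another thread writes is a data race, which allows the compiler to read the value once and never again.

```cpp
#include <atomic>
#include <thread>

// Hypothetical node that holds only the flag discussed above.
struct Node {
    std::atomic<int> isActive{0};
};

// Marks the node active for the lifetime of the guard and clears the
// flag again on every exit path (normal return or exception).
struct ActiveGuard {
    explicit ActiveGuard(Node& n) : node(n) {
        node.isActive.store(1, std::memory_order_release);
    }
    ~ActiveGuard() { node.isActive.store(0, std::memory_order_release); }
    ActiveGuard(const ActiveGuard&) = delete;
    ActiveGuard& operator=(const ActiveGuard&) = delete;
    Node& node;
};

// Spin until the node is inactive. The atomic load is repeated on every
// iteration, so a store from another thread is eventually observed.
inline void waitUntilInactive(Node& n) {
    while (n.isActive.load(std::memory_order_acquire) == 1) {
        std::this_thread::yield();
    }
}

// Stand-in for the work done while the node is marked active.
inline int climb(Node& n, int value) {
    ActiveGuard guard(n);
    return value * 2;  // placeholder for the recursive call up the tree
}
```

Whether clearing the flag at this point is right for the combining tree depends on the rest of the protocol; the sketch only shows the mechanics of a flag that cannot stay set by accident.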

Related

barrier code and waiting for all thread to reach rendezvous and then enter critical section

semaphore mutex = 1;
semaphore barrier = 0;
int count = 0;
void barrier-done() {
    wait(mutex);
    count++;
    if (count < N ) {
        post(mutex);
        wait(barrier);
    }
    else {
        post(mutex);
        count = 0;
        for (int i = 1; i < N; i++) {
            post(barrier);
        }
    }
}
Does anyone know the problem with this code? I'm trying to implement a barrier.
I assume N is the number of threads you expect to wait at the barrier.
For example, with N=10, threads 1 to 9 find the if condition true and wait on barrier.
The 10th thread to call this finds the condition false, because 10 is not less than 10.
So it goes ahead and posts barrier 9 times.
I am not sure exactly what you want to achieve, but this is what I understood from your code. You may need to tweak the if condition a bit.
I had the same issue. The problem is that you can't use a minus sign in a function name such as "barrier-done". After fixing this bug, the code is correct.

Create a function that will block until it was called by more than n/2 threads (pseudocode)

There are n threads. I'm trying to implement a function (in pseudocode) that blocks any thread that calls it. Every caller stays blocked until the function has been called by more than n/2 threads. Once more than n/2 threads have called it, the function no longer blocks other threads and returns immediately instead.
I did it like this but I'm not sure if I did the last part correctly where the function will immediately return if more than n/2 threads called it? :S
(Pseudocode is highly appreciated because then I have a better chance to understand it! :) )
int n = total amount of threads
sem waiter = 0
sem mutex = 1
int counter = 0
function void barrier()
int x
P(mutex)
if counter > n / 2 then
V(mutex)
for x = 0; x <= n / 2; x++;
V(waiter)
end for
end if
else
counter++
V(mutex)
P(waiter)
end else
end function
What you describe is a non-resetting barrier. Pthreads has a barrier implementation, but it is of the resetting variety.
To implement what you're after with pthreads, you will want a mutex plus a condition variable, and a shared counter. A thread entering the function locks the mutex and checks the counter. If not enough other threads have yet arrived then it waits on the CV, otherwise it broadcasts to it to wake all the waiting threads. If you wish, you can make it just the thread that tips the scale that broadcasts. Example:
struct my_barrier {
    pthread_mutex_t barrier_mutex;
    pthread_cond_t barrier_cv;
    int threads_to_await;
};
void barrier(struct my_barrier *b) {
    pthread_mutex_lock(&b->barrier_mutex);
    if (b->threads_to_await > 0) {
        if (--b->threads_to_await == 0) {
            pthread_cond_broadcast(&b->barrier_cv);
        } else {
            do {
                pthread_cond_wait(&b->barrier_cv, &b->barrier_mutex);
            } while (b->threads_to_await);
        }
    }
    pthread_mutex_unlock(&b->barrier_mutex);
}
Update: pseudocode
Or since a pseudocode representation is important to you, here's the same thing in a pseudocode language similar to the one used in the question:
int n = total amount of threads
mutex m
condition_variable cv
// "more than n / 2" callers means n / 2 + 1 with integer division
int to_wait_for = n / 2 + 1
function void barrier()
    lock(m)
    if to_wait_for == 1 then
        to_wait_for = 0
        broadcast(cv)
    else if to_wait_for > 1 then
        to_wait_for = to_wait_for - 1
        wait(cv)
    end if
    unlock(m)
end function
That's slightly higher-level than your pseudocode, in that it does not assume that the mutex is implemented as a semaphore. (And with pthreads, which you tagged, you would need a pthreads mutex, not a semaphore, to go with a pthreads condition variable.) It also omits the details of the real C code that deal with spurious wakeups from waiting on the condition variable and with initializing the mutex and cv. Also, it presents the variables as if they were all globals. Such a function can be implemented that way in practice, but it is poor form.
Note also that it assumes pthreads semantics for the condition variable: waiting on the cv temporarily releases the mutex, allowing other threads to lock it, and a thread that waits on the cv reacquires the mutex before it proceeds past the wait.
A few assumptions I am making within my answer:
P(...) is analogous to sem_wait(...)
V(...) is analogous to sem_post(...)
the barrier cannot be reset
I'm not sure if I did the last part correctly where the function will immediately return if more than n/2 threads called it
The pseudocode should work fine for the most part, but the early return/exit conditions could be significantly improved upon.
Some concerns (but nothing major):
The first time the condition counter > n / 2 is met, the waiter semaphore is signaled (i.e. V(...)) (n / 2) + 1 times, since the loop runs from 0 to n / 2 inclusive. That matches the number of blocked threads only because counter happens to equal (n / 2) + 1 at that moment; looping up to counter makes the intent explicit.
Every subsequent invocation after counter > n / 2 is first met will also signal (i.e. V(...)) the waiter semaphore another (n / 2) + 1 times. Instead, it should early return and not re-signal.
These can be resolved with a few minor tweaks.
int n = total count of threads
sem mutex = 1;
sem waiter = 0;
int counter = 0;
bool released = FALSE;
function void barrier() {
    P(mutex)
    // once the barrier has been released, every later caller returns immediately
    if released then
        // ensure the mutex is released prior to returning
        V(mutex)
        return
    end if
    if counter > n / 2 then
        // more than n/2 threads have tried to wait, mark barrier as released
        released = TRUE
        // mutex can be released at this point, as any thread acquiring `mutex` after will see that `released` is TRUE and early return
        V(mutex)
        // release all blocked threads; counter is guaranteed to never be incremented again
        int x
        for x = 0; x < counter; x++
            V(waiter)
        end for
    else
        counter++
        V(mutex)
        P(waiter)
    end if
}

Atomic increment if non-negative and atomic decrement if non-positive

I have two groups of threads, A and B. Threads in A execute function funA, and threads in B execute function funB. It is OK for threads in the same group to execute their function concurrently, but funA and funB must not be executed concurrently. How do we achieve this? Does this problem have a name, so I can read about it online?
One possible solution is the following:
std::atomic<std::int64_t> counter{};
void funA() {
std::int64_t curr = 0;
while(
(curr=counter.load(std::memory_order_relaxed)) < 0 ||
!counter.compare_exchange_weak(curr,curr+1,std::memory_order_relaxed));
// implementation of funA goes here
counter.fetch_sub(1); // better to happen using RAII
}
void funB() {
std::int64_t curr = 0;
while(
(curr=counter.load(std::memory_order_relaxed)) > 0 ||
!counter.compare_exchange_weak(curr,curr-1,std::memory_order_relaxed));
// implementation of funB goes here
counter.fetch_add(1); // better to happen using RAII
}
Is this correct? Is it the best we can do? What I don't like about it is that threads in the same group compete against each other in those while loops.

Is it possible to block on wait on a semaphore without using the data when it's available?

The software I'm working on is a data analyzer with a sliding window. I have two threads, one producer and one consumer, that share a circular buffer.
The consumer must process data only if the first element in the buffer is old enough, which means there are at least X elements in the buffer. After processing, only X/4 elements can be deleted, because the window slides.
My solution below works quite well, except that I have to trade speed (busy waiting in the check) against efficiency (sleeping for some time). The right sleep time varies with load, thread scheduling and processing complexity, so a fixed sleep can slow things down.
Is there a way to wait on a semaphore until at least X elements are available, blocking the thread otherwise, but acquire only X/4 once the processing is done? tryAcquire does not work, because when the thread wakes up it has acquired all the data, not just part of it.
I've thought about copying the elements into a second buffer, but there are actually 7 circular buffers of large data, so I'd like to avoid duplicating or even moving data.
//common structs
QSemaphore written;
QSemaphore free;
int writtenIndex = 0;
int readIndex = 0;
myCircularBuffer buf;
bool scan = true;
//producer
void produceData(data d)
{
    while ( free.tryAcquire(1, 1000) == false && scan == true)
    {
        //avoid deadlock!
        //once per second give up waiting and check if closing
    }
    if (scan == false) return;
    buf.at(writtenIndex) = d;
    writtenIndex = (writtenIndex+1) % bufferSize;
    written.release();
}
//consumer
void consumeData()
{
    while(1)
    {
        //here goes the problem: usleep (slow), sched_yield (B.F.O.W.) or what?
        if (buf.at(writtenIndex).age - buf.at(readIndex).age < X)
        {
            //usleep(100); ? how much time?
            //sched_yield(); ?
            //tryAcquire not an option!
            continue;
        }
        processTheData();
        written.acquire(X/4);
        readIndex = (readIndex + X/4) % bufferSize;
        free.release(X/4);
    }
}

Can there be a deadlock in Bakery Algorithm max() operation?

According to the resources I have read, the Bakery algorithm is supposed to be deadlock-free.
But when I tried to understand the pseudocode, I came across a line that could cause a deadlock (as far as I understand).
Referring to the code below,
in the lock() function, we have a line saying
label[i] = max( label[0], ..., label[n-1] ) + 1;
What if two threads reach that line at the same time and, since max is not atomic, the two labels get the same value?
Then, since the two labels have the same value, both threads with those labels will get permission to enter the critical section at the same time. Wouldn't that cause a deadlock?
I tried my best to explain the problem here. Comment if it is still not clear. Thanks.
class Bakery implements Lock {
    volatile boolean[] flag;
    volatile Label[] label;
    public Bakery (int n) {
        flag = new boolean[n];
        label = new Label[n];
        for (int i = 0; i < n; i++) {
            flag[i] = false; label[i] = 0;
        }
    }
    public void lock() {
        flag[i] = true;
        label[i] = max(label[0], ..., label[n-1]) + 1;
        while (∃k flag[k] && (label[i],i) > (label[k],k)) {}
    }
    public void unlock() {
        flag[i] = false;
    }
}
Then, since the two labels have the same value, both threads with those labels will get permission to enter the critical section at the same time. Wouldn't that cause a deadlock?
To begin with, you probably mean a race, not a deadlock.
However, no, there won't be a race here. If you look, there's the condition
(label[i],i) > (label[k],k)
and while this happens, the thread effectively busy-waits.
This means that even if label[i] is the same as label[k] (as both performed the max concurrently), the thread numbered higher will defer to the thread numbered lower.
(Arguably, this is a problem with the algorithm, as it inherently prioritizes the threads.)
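The tie-breaking comparison can be seen in a small C++ sketch of the lock. It rests on assumptions: BakeryLock and comesFirst are hypothetical names, each thread passes its own index i explicitly, all atomic operations use the default sequentially consistent ordering, and labels are 64-bit counters that are assumed not to overflow.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <memory>
#include <thread>

class BakeryLock {
public:
    explicit BakeryLock(std::size_t n)
        : n_(n),
          flag_(new std::atomic<bool>[n]),
          label_(new std::atomic<unsigned long long>[n]) {
        for (std::size_t k = 0; k < n_; ++k) {
            flag_[k].store(false);
            label_[k].store(0);
        }
    }

    void lock(std::size_t i) {
        flag_[i].store(true);
        // Taking the maximum is not atomic as a whole, so two threads may
        // compute the same label. The index breaks the tie below.
        unsigned long long highest = 0;
        for (std::size_t k = 0; k < n_; ++k) {
            highest = std::max(highest, label_[k].load());
        }
        const unsigned long long mine = highest + 1;
        label_[i].store(mine);
        for (std::size_t k = 0; k < n_; ++k) {
            if (k == i) continue;
            // Wait while k is interested and (label[k], k) < (label[i], i).
            while (flag_[k].load() && comesFirst(label_[k].load(), k, mine, i)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock(std::size_t i) { flag_[i].store(false); }

private:
    // Lexicographic order on (label, thread index).
    static bool comesFirst(unsigned long long la, std::size_t a,
                           unsigned long long lb, std::size_t b) {
        return la < lb || (la == lb && a < b);
    }

    std::size_t n_;
    std::unique_ptr<std::atomic<bool>[]> flag_;
    std::unique_ptr<std::atomic<unsigned long long>[]> label_;
};
```

If threads 1 and 2 both end up with label 5, thread 2 sees (5, 1) < (5, 2) and keeps waiting, while thread 1 sees that (5, 2) does not come first and proceeds.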
