Using fetch-and-add as a lock - multithreading

I am trying to understand how fetch-and-add can be used as a lock. Here is what the book (Operating Systems: Three Easy Pieces) says:
The basic operation is pretty simple: when
a thread wishes to acquire a lock, it first does an atomic fetch-and-add
on the ticket value; that value is now considered this thread’s “turn”
(myturn). The globally shared lock->turn is then used to determine
which thread’s turn it is; when (myturn == turn) for a given thread,
it is that thread’s turn to enter the critical section.
What I do not understand is how a thread checks whether the lock is held by another thread before entering the critical section. All I can read is that the value will be incremented; there is no mention of checks!
Another part says:
Unlock is accomplished
simply by incrementing the turn such that the next waiting thread (if
there is one) can now enter the critical section.
Which I cannot interpret in a way where checks are not performed, which cannot be true because it would compromise the whole purpose of locking critical sections. What am I missing here? Thanks.

What I do not understand is how a thread checks whether the lock is held by another thread before entering the critical section.
You need an "atomic fetch" for this, maybe something like "while( atomic_fetch(currently_serving) != my_ticket) { /* wait */ }".
If you have "atomic fetch and add", then you can implement "atomic fetch" by doing "atomic fetch and add the value zero", maybe something like "while( atomic_fetch_and_add(currently_serving, 0) != my_ticket) { /* wait */ }".
For reference; the full sequence could be something like:
my_ticket = atomic_fetch_and_add(ticket_counter, 1);
while (atomic_fetch_and_add(currently_serving, 0) != my_ticket) {
    /* wait */
}
/* Critical section (lock successfully acquired). */
atomic_fetch_and_add(currently_serving, 1); /* Release the lock */
Of course you might have a better atomic fetch you can use instead (e.g. for some CPUs any normal aligned load is atomic).
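To make the two-counter scheme concrete, here is a minimal C++11 sketch of a ticket lock (the class and member names are my own illustration, not code from the book; std::atomic provides the atomic fetch directly):
#include <atomic>

class TicketLock {
public:
    void lock() {
        // take a ticket; fetch_add returns the value *before* the increment
        unsigned my_ticket = next_ticket.fetch_add(1);
        // this spin is the "check" the question asks about
        while (now_serving.load() != my_ticket) {
            /* wait */
        }
    }
    void unlock() {
        // hand the lock to the holder of the next ticket
        now_serving.fetch_add(1);
    }
private:
    std::atomic<unsigned> next_ticket{0};
    std::atomic<unsigned> now_serving{0};
};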

Related

Where can PTHRED_MUTEX_ADAPTIVE_NP be specified and how does it work?

I found that there's a macro called PTHRED_MUTEX_ADAPTIVE_NP which is somehow given as a value to a mutex so that the mutex does adaptive spinning, meaning that it spins for roughly as long as an immediate wakeup through the kernel would take. But how do I apply this configuration macro to a thread?
And as I've developed an improved shared reader-writer lock (it needs only one atomic operation in the best case, in contrast to the three operations given in the Wikipedia solution) with relative writer priority (further readers are stalled when there's a writer, while the readers that arrived before it are allowed to proceed), which could also make use of adaptive spinning: how is the number of spin cycles calculated?
I found that there's a macro called PTHRED_MUTEX_ADAPTIVE_NP
Some pthreads implementations provide a macro PTHREAD_MUTEX_ADAPTIVE_NP (note spelling) that is one of the possible values of the kind_np mutex attribute, but neither that attribute nor the macro are standard. It looks like at least BSD and AIX have them, or at least did at one time, but this is not something you should be using in new code.
But how do I utilize this configuration-macro to a thread ?
You don't. Even if you are using a pthreads implementation that supports it, this is the value of a mutex attribute, not a thread attribute. You obtain a mutex with that attribute value by explicitly requesting it when you initialize the mutex. It would look something like this:
pthread_mutexattr_t attr;
pthread_mutex_t mutex;
int rval;
// Return-value checks omitted for brevity and clarity
rval = pthread_mutexattr_init(&attr);
rval = pthread_mutexattr_setkind_np(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
rval = pthread_mutex_init(&mutex, &attr);
There are other mutex attributes that you can set in analogous ways, which is one of the reasons I wrote this answer. Although you should not be using the kind_np attribute, you can follow this general model for other mutex attributes. There are also thread attributes, which work similarly.
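For example, the standard type attribute can be set in exactly the same way (a sketch, with return-value checks again omitted; pthread_mutexattr_settype and PTHREAD_MUTEX_RECURSIVE are standard, unlike kind_np):
pthread_mutexattr_t attr;
pthread_mutex_t mutex;

pthread_mutexattr_init(&attr);
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE); /* standard attribute */
pthread_mutex_init(&mutex, &attr);
pthread_mutexattr_destroy(&attr); /* the mutex keeps its own copy */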
I found the code in glibc:
That's the "adaptive" mutex-locking code of pthread_mutex_lock in glibc 2.31:
else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex)
                           == PTHREAD_MUTEX_ADAPTIVE_NP, 1))
  {
    if (! __is_smp)
      goto simple;

    if (LLL_MUTEX_TRYLOCK (mutex) != 0)
      {
        int cnt = 0;
        int max_cnt = MIN (max_adaptive_count (),
                           mutex->__data.__spins * 2 + 10);
        do
          {
            if (cnt++ >= max_cnt)
              {
                LLL_MUTEX_LOCK (mutex);
                break;
              }
            atomic_spin_nop ();
          }
        while (LLL_MUTEX_TRYLOCK (mutex) != 0);

        mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
      }
    assert (mutex->__data.__owner == 0);
  }
So the spin limit is the stored spin estimate doubled plus 10, capped at a system-configurable maximum (1000 if there's no configuration); after the lock is acquired, one eighth of the difference between the actual number of spins and the stored estimate is added to the estimate used for the next acquisition.
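The same strategy can be sketched in portable C++ (illustrative only, not the glibc implementation; the literal 1000 stands in for max_adaptive_count()):
#include <algorithm>
#include <atomic>
#include <mutex>

class AdaptiveMutex {
public:
    void lock() {
        if (inner.try_lock())
            return;                       // uncontended fast path
        int max_cnt = std::min(1000, spins.load(std::memory_order_relaxed) * 2 + 10);
        int cnt = 0;
        while (!inner.try_lock()) {
            if (cnt++ >= max_cnt) {
                inner.lock();             // give up spinning, block instead
                break;
            }
            // a CPU pause/yield hint would go here
        }
        // exponential moving average of the observed spin count,
        // mirroring mutex->__data.__spins += (cnt - __spins) / 8
        int s = spins.load(std::memory_order_relaxed);
        spins.store(s + (cnt - s) / 8, std::memory_order_relaxed);
    }
    void unlock() { inner.unlock(); }

private:
    std::mutex inner;
    std::atomic<int> spins{0};
};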

Java 8 Stamped Lock: Why this piece of code doesnt result into a deadlock?

In my attempt to understand optimistic locking in Java 8, I came across the piece of code below.
Original Blog Here.
As explained in the blog, this piece of code attempts to convert a read lock into a write lock. The code requests an explicit write lock if the conversion of the read lock to a write lock fails.
It puzzles me: how can the explicit write lock be expected to be granted when the thread is already holding a read lock? It doesn't look like the read lock is released at any point before the write lock is forcefully requested. To my (flawed) understanding, the thread would wait forever for the write lock, since the read lock is never released, creating a deadlock.
Why this doesn't result in a deadlock here?
ExecutorService executor = Executors.newFixedThreadPool(2);
StampedLock lock = new StampedLock();

executor.submit(() -> {
    long stamp = lock.readLock();
    try {
        if (count == 0) {
            stamp = lock.tryConvertToWriteLock(stamp);
            if (stamp == 0L) {
                System.out.println("Could not convert to write lock");
                stamp = lock.writeLock();
            }
            count = 23;
        }
        System.out.println(count);
    } finally {
        lock.unlock(stamp);
    }
});

stop(executor);
Thanks in advance!
Does this help? From the API documentation for tryConvertToWriteLock:
If the lock state matches the given stamp, atomically performs one of the following actions. If the stamp represents holding a write lock, returns it. Or, if a read lock, if the write lock is available, releases the read lock and returns a write stamp. Or, if an optimistic read, returns a write stamp only if immediately available. This method returns zero in all other cases.

using atomic c++11 to implement a thread safe down counter to zero

I'm new to atomic techniques and am trying to implement a thread-safe version of the following code:
// say m_cnt is unsigned
void Counter::dec_counter()
{
    if (0 == m_cnt)
        return;
    --m_cnt;
    if (0 == m_cnt)
    {
        // Do something
    }
}
Every thread that calls dec_counter must decrement it by one, and "Do something" should be done only once - when the counter is decremented to 0.
After fighting with it, I came up with the following code, which I think does the job, but I wonder whether this is the right way to do it, or whether there is a better way. Thanks.
// m_cnt is std::atomic<unsigned>
void Counter::dec_counter()
{
    // loop until the decrement is done
    unsigned uiExpectedValue;
    unsigned uiNewValue;
    do
    {
        uiExpectedValue = m_cnt.load();
        // if another thread has already decremented it to 0, do nothing
        if (0 == uiExpectedValue)
            return;
        uiNewValue = uiExpectedValue - 1;
        // in the short window since m_cnt.load(), another thread may have
        // decremented m_cnt, so it may no longer equal uiExpectedValue -
        // hence the loop, to be sure our decrement really happens
    } while (!m_cnt.compare_exchange_weak(uiExpectedValue, uiNewValue));
    // if we get here, we performed the decrement; if it brought the
    // counter to 0, do something
    if (0 == uiNewValue)
    {
        // do something
    }
}
The thing with std::atomic is that only a single operation on it is atomic.
If you write
std::atomic<int> i {20};
...
if (!--i)
    ...
then exactly one thread will enter the if.
However, if you split up the modification and the test, other threads can get into the gap, and you may get strange results:
std::atomic<int> i {20};
...
--i;
// other thread(s) can modify i right here
if (!i)
    ...
Of course you can safely separate the test from the decrement by capturing the decremented value in a local variable:
std::atomic<int> i {20};
...
int j = --i;
// other thread(s) can modify i right here, but j is unaffected
if (!j)
    ...
All the simple math operations are generally supported efficiently for small atomic types in C++.
For more complex types and expressions, you need to use the read/modify/write member functions.
These let you read the current value, compute the new value, and then call compare_exchange_strong or compare_exchange_weak to say "if the value has not changed, store my new value; otherwise give me the current value" as a single atomic operation. You can put this in a loop and keep recomputing the new value until your thread happens to be the only writer. If there are not too many threads trying to change the value too often, this is reasonably efficient as well.
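In fact, if dec_counter is guaranteed never to be called more times than the counter's initial value (so the early return at zero can never trigger), the whole CAS loop above collapses into a single fetch_sub - the same pattern used for releasing reference counts. A sketch under that assumption:
#include <atomic>

std::atomic<unsigned> m_cnt{20};

void dec_counter()
{
    // fetch_sub returns the value *before* the subtraction, so exactly one
    // caller - the one taking the counter from 1 to 0 - sees 1 here
    if (m_cnt.fetch_sub(1) == 1)
    {
        // do something (runs exactly once)
    }
}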

How to create a fair multithreading double barrier?

I have a double-barrier multithreaded program working, but I don't know how to create a fair mechanism (using POSIX mutex and condition-variable barrier functions) -
meaning: groups of threads should enter the first barrier in order of their arrival at the barrier.
Pseudocode for the code I have so far (summarized; the original code has more validation - I hope it's clear enough):
pthread_mutex_lock(&_barrier->m_mutex);
++_barrier->m_predicate;

/* block all threads (except the last thread) -
   pending at the barrier rendezvous point */
if (_barrier->m_predicate != _barrier->m_barrierSize)
{
    pthread_cond_wait(&_barrier->m_cond, &_barrier->m_mutex);
}
else
{
    /* *Unblock all threads (in scheduling-policy order)
       that are currently blocked on the cond member of the barrier
       **Reset: predicate value is "0" --> a new batch of threads
       can enter the 1st barrier */
    pthread_cond_broadcast(&_barrier->m_cond);
    ResetBarrier(_barrier);
}
/* end of critical code block */
pthread_mutex_unlock(&_barrier->m_mutex);
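For what it's worth, one common way to get arrival-order fairness is to combine a ticket counter (as in the fetch-and-add question above) with a batch index, so each group of N threads is released strictly in arrival order. A minimal C++11 sketch under that assumption, covering the first barrier only (all names are illustrative, not the asker's code):
#include <condition_variable>
#include <mutex>

class FairBarrier {
public:
    explicit FairBarrier(unsigned n) : size(n) {}

    void wait() {
        std::unique_lock<std::mutex> lk(m);
        unsigned long ticket = next_ticket++;   // strict arrival order
        unsigned long batch  = ticket / size;   // which group of N we are in
        if (ticket % size == size - 1) {
            // last arrival of this batch: release the whole batch
            released = static_cast<long>(batch);
            cv.notify_all();
        } else {
            // the predicate also guards against spurious wakeups
            cv.wait(lk, [&] { return released >= static_cast<long>(batch); });
        }
    }

private:
    std::mutex m;
    std::condition_variable cv;
    const unsigned size;
    unsigned long next_ticket = 0;
    long released = -1;   // index of the newest batch released so far
};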

Best equivalent for EnterCriticalSection on Mac OS X?

What's the best equivalent? I didn't find any reasonable solution for such a simple function. The choices I'm aware of:
1) MPEnterCriticalRegion - this is unfortunately extremely inefficient, probably because despite its name it enters kernel mode, so for repeated locks it just takes way too much time...
2) OSSpinLockLock - unusable, because it is apparently not recursive. If it were recursive, it would be the correct equivalent.
3) pthread_mutex_lock - didn't try it, but I don't expect much, because it will probably just be emulated using the critical region or another system resource.
Assuming you have a correctly working non-recursive lock, it's rather easy to get an efficient recursive lock (no idea about Mac APIs, so this is pseudo code):
class RecursiveLock {
public:
    void acquire() {
        auto tid = get_thread_id();
        if (owner.load() == tid) {
            lockCnt++;          // we already hold the lock: just recurse
        } else {
            AcquireLock(lock);  // block until the inner lock is ours
            owner.store(tid);
            lockCnt = 1;
        }
    }

    void release() {
        assert(owner.load() == get_thread_id());
        lockCnt--;
        if (lockCnt == 0) {
            owner.store(0);     // some illegal value for a thread id
            ReleaseLock(lock);
        }
    }

private:
    int lockCnt = 0;
    std::atomic<uintptr_t> owner{0};
    void *lock;                 // use whatever non-recursive lock you like here
};
It's simple to reason about:
If tid == owner, it's guaranteed that we have already acquired the lock ourselves.
If tid != owner, either someone else holds the lock or it's free; in both cases we try to acquire the inner lock and block until we get it. After we acquire the lock, we set owner to tid. So there's a window where we have acquired the lock but owner still holds the illegal value. But that's no problem, since the illegal value won't compare equal to any real thread's tid either, so other threads will go into the else branch as well and have to wait for the lock.
Notice the std::atomic part - we do need ordering guarantees for the owner field to make this legal. If you don't have C++11, use a compiler intrinsic for that.
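A quick usage sketch of the class above, showing the re-entry it buys you (rlock, outer, and inner are illustrative names):
RecursiveLock rlock;

void inner() {
    rlock.acquire();   // same thread re-enters: lockCnt goes to 2
    /* ... */
    rlock.release();   // lockCnt back to 1, inner lock still held
}

void outer() {
    rlock.acquire();
    inner();           // no deadlock, unlike a plain non-recursive lock
    rlock.release();   // lockCnt reaches 0, inner lock released
}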
