Difference between completion variables and semaphores - linux

In the Linux kernel, semaphores are used to provide mutual exclusion for critical sections of data, and completion variables are used to synchronize two threads when one of them waits for an event. Why not use semaphores for such synchronization? Is there any advantage to using a completion variable over a semaphore?

Explanation of why completions were originally implemented:
http://lkml.indiana.edu/hypermail/linux/kernel/0107.3/0674.html
The basic summary is that we had this (fairly common) way of waiting for certain events by having a locked semaphore on the stack of the waiter, and then having the waiter do a "down()" which caused it to block until the thing it was waiting for did an "up()".
This works fairly well, but it has a really small (and quite unlikely) race on SMP, that is not so much a race of the idea itself, as of the implementation of the semaphores. We could have fixed the semaphores, but there were a few reasons not to:
- the semaphores are optimized (on purpose) for the non-contention case. The "wait for completion" usage has the opposite default case.
- the semaphores are quite involved and architecture-specific, exactly due to this optimization. Trying to change them is painful as hell.
So instead, I introduced the notion of "wait for completion":
More recent thread about completions vs semaphores
http://lkml.org/lkml/2008/4/11/323

There are two reasons you might want to use a completion instead of a semaphore. First, multiple threads can wait for a completion, and they can all be released with one call to complete_all(). It's more complex to have a semaphore wake up an unknown number of threads.
Second, if the waiting thread is going to deallocate the synchronization object, there is a race condition if you're using semaphores. That is, the waiter might get woken up and deallocate the object before the waking thread is done with up(). This race doesn't exist for completions. (See Lasse's post.)
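The kernel's struct completion cannot be exercised from userspace, so here is a minimal sketch, assuming POSIX threads, of a hypothetical userspace analogue. The names mirror the kernel's (init_completion, wait_for_completion, complete, complete_all), but this is not the kernel implementation; it only illustrates the first point, that a single complete_all() call releases any number of waiters, including ones that arrive later.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

/* Hypothetical userspace analogue of the kernel's struct completion. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned int    done;   /* pending completions, or COMPLETE_ALL */
};

#define COMPLETE_ALL UINT_MAX  /* sentinel: release current and future waiters */

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)                    /* recheck after every wakeup */
        pthread_cond_wait(&c->cond, &c->lock);
    if (c->done != COMPLETE_ALL)
        c->done--;                          /* consume one complete() */
    pthread_mutex_unlock(&c->lock);
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    if (c->done != COMPLETE_ALL)
        c->done++;
    pthread_cond_signal(&c->cond);          /* wake one waiter */
    pthread_mutex_unlock(&c->lock);
}

static void complete_all(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = COMPLETE_ALL;
    pthread_cond_broadcast(&c->cond);       /* wake every waiter with one call */
    pthread_mutex_unlock(&c->lock);
}

/* Demo: start n waiters and release all of them with one complete_all(). */
static struct completion demo_done;
static int woken;
static pthread_mutex_t woken_lock = PTHREAD_MUTEX_INITIALIZER;

static void *waiter(void *arg)
{
    (void)arg;
    wait_for_completion(&demo_done);
    pthread_mutex_lock(&woken_lock);
    woken++;
    pthread_mutex_unlock(&woken_lock);
    return NULL;
}

static int demo_complete_all(int n)
{
    pthread_t tids[16];
    assert(n > 0 && n <= 16);
    init_completion(&demo_done);
    woken = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, waiter, NULL);
    complete_all(&demo_done);
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    return woken;
}
```

As with the kernel primitive, complete() releases one waiter per call, while complete_all() leaves the object permanently "done" until it is reinitialized.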

Related

Terminology question: mutex lock, spin lock, sleepable lock

All over Stack Overflow and the net I see people distinguish mutexes from spinlocks by saying that a mutex is a mutual exclusion lock providing acquire() and release() functions, and that if the lock is taken, acquire() allows the process to be preempted.
Nevertheless, A. Silberschatz, in Operating System Concepts, says in section 6.5:
... The simplest of these tools is the mutex lock. (In fact, the term mutex is short for mutual exclusion.) We use the mutex lock to protect critical sections and thus prevent race conditions. That is, a process must acquire the lock before entering a critical section; it releases the lock when it exits the critical section. The acquire() function acquires the lock, and the release() function releases the lock.
He then describes a spinlock, adding a bit later:
The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.
So a spinlock is just a type of mutex lock, as opposed to sleepable locks that allow a process to be preempted. That is, spinlocks and sleepable locks are both mutexes: locks used through acquire() and release() functions.
It seems entirely logical to me to define mutex locks the way Silberschatz did (though he does so a bit implicitly).
What approach would you agree with?
The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.
Maybe you're misreading the book (that is, "The type of mutex lock we have been describing" might not refer to the exact passage you think it does), or the book is outdated. Modern terminology is quite clear in what a mutex is, but spinlocks get a bit muddy.
A mutex is a concurrency primitive that allows one agent at a time to access its resource, while the others have to wait until exclusive access is released. How they wait is not specified and irrelevant: their process might go to sleep, get written to disk, spin in a loop, or perhaps you are using cooperative concurrency (also known as "asynchronous programming") and passing control to the event loop as your 'waiting operation'.
A spinlock does not have a clear definition. It might be used to refer to:
A synonym for mutex (this is in my opinion wrong, but it happens).
A specific mutex implementation that always waits in a busy loop.
Any sort of busy-waiting loop waiting for a resource. A semaphore, for example, might also get implemented using a 'spinlock'.
I would consider any use of the word to refer to a (part of a) specific implementation of a concurrency primitive that waits in a busy loop to be correct, if a more general term is not appropriate. That is, use mutex (or whatever primitive you desire) unless you specifically want to talk about a busy-waiting concurrency primitive.
The words that one author uses in one book or manual will not always have the same exact meaning in every book and every manual. The meanings of the words evolve over time, and it can happen fast when the words are names for new ideas.
Not every book was written at the same time. Not every author is the same age or had the same teachers. It's just something you'll have to get used to.
"Mutex" was a name for a new idea not so very long ago.
In one book, it might mean nothing more than a thing that keeps two or more threads from entering the same critical section at the same time. In another book, it might refer to a specific type of object in a certain operating system or library that is used for that same purpose.
A spinlock is a lock/mutex whose implementation relies mainly on a spinning loop.
More advanced locks/mutexes may have spinning parts in their implementation, however those often last for no more than a few microseconds or so.
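To make the "spinning parts" idea concrete, here is a minimal sketch of a spin-then-block lock built only from standard pthreads calls: it retries pthread_mutex_trylock() a bounded number of times and then falls back to pthread_mutex_lock(), which may put the thread to sleep. SPIN_LIMIT and adaptive_lock are hypothetical names; real implementations (for example glibc's non-portable PTHREAD_MUTEX_ADAPTIVE_NP mutex type) make their own decisions about how long to spin.

```c
#include <assert.h>
#include <pthread.h>

#define SPIN_LIMIT 100  /* hypothetical tuning knob: how long to spin first */

/* Spin briefly with trylock, then block and let the kernel sleep us. */
static void adaptive_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;                 /* acquired while spinning */
    }
    pthread_mutex_lock(m);          /* give up spinning and block */
}

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *worker(void *arg)
{
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        adaptive_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads workers that each increment the shared counter iters times. */
static long run_adaptive(int nthreads, long iters)
{
    pthread_t tids[8];
    assert(nthreads > 0 && nthreads <= 8);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```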

Is a spinlock lock free?

I am a little bit confused about the two concepts.
definition of lock-free on wiki:
A non-blocking algorithm is lock-free if there is guaranteed system-wide progress
definition of non-blocking:
an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread
I thought a spinlock was lock-free, or at least non-blocking. But now I'm not sure, because by definition "a spinlock is not lock-free" also makes sense to me: if the thread holding the spinlock gets suspended, it will cause the suspension of the other threads spinning outside. So, by definition, a spinlock is not even non-blocking, let alone lock-free.
I'm so confused now. Can anyone explain it clearly?
Anything that can be called a lock (exclude other threads from a critical section until the current thread unlocks) is by definition not lock-free. And yes, spinlocks are a kind of lock.
If a thread sleeps while holding the lock, no other thread can acquire it and make forward progress, and spinlocks can't prevent this. The OS can de-schedule a thread whenever it wants, even if it's in the middle of a critical section.
Note that "lock-free" isn't the same thing as "wait-free", so a lock-free algorithm can still have stuff like cmpxchg retry loops, but as long as at least one thread succeeds each time around, it's lock-free.
A wait-free algorithm can't even have that, and at most has to wait for cache misses / hardware arbitration of contended atomic operations. Wikipedia's non-blocking algorithm article defines wait-free and lock-free in more detail.
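A lock-free (but not wait-free) retry loop can be sketched with C11 atomics. cas_max is a hypothetical helper that raises a shared value to at least v: a thread whose compare-exchange fails simply retries, and apart from the spurious failures compare_exchange_weak is allowed to have, a failure means another thread's update succeeded, so the system as a whole keeps making progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Raise *p to at least v; returns the value seen before our update. */
static long cas_max(_Atomic long *p, long v)
{
    long old = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure, compare_exchange reloads 'old' with the current value,
     * so the loop retries until it installs v or sees a value >= v. */
    while (old < v && !atomic_compare_exchange_weak(p, &old, v))
        ;
    return old;
}

static _Atomic long max_seen;

static void *push_values(void *arg)
{
    long base = *(long *)arg;
    for (long i = 0; i < 10000; i++)
        cas_max(&max_seen, base + i);
    return NULL;
}

/* Four threads race to publish their values; the largest one must win. */
static long run_max(void)
{
    pthread_t t[4];
    long bases[4] = {0, 10000, 20000, 30000};
    atomic_store(&max_seen, -1);
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, push_values, &bases[i]);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&max_seen);
}
```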
I think you're mixing up two definitions of "blocking".
I think you're talking about a spin_trylock function that tries to acquire a spinlock, and returns with an error if it fails instead of spinning. So this is non-blocking in the same sense as non-blocking I/O: fail with an error instead of waiting for resource availability.
That doesn't mean any thread in the system is making forward progress on the thing protected by the spinlock. It just means your thread can go and do something else before trying again, instead of needing to use separate threads to do something in parallel with waiting to acquire a lock.
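The non-blocking-I/O sense of "non-blocking" is what POSIX pthread_spin_trylock() provides: it returns EBUSY instead of spinning when the lock is already held. A minimal sketch, assuming a POSIX system with spin locks such as Linux/glibc; demo_trylock and try_from_other_thread are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_spinlock_t sl;
static int trylock_result;

static void *try_from_other_thread(void *arg)
{
    (void)arg;
    /* Fails immediately with EBUSY instead of spinning. */
    trylock_result = pthread_spin_trylock(&sl);
    if (trylock_result == 0)
        pthread_spin_unlock(&sl);
    return NULL;
}

/* Returns what a second thread saw when it tried the lock while we held it. */
static int demo_trylock(void)
{
    pthread_t t;
    pthread_spin_init(&sl, PTHREAD_PROCESS_PRIVATE);
    pthread_spin_lock(&sl);
    pthread_create(&t, NULL, try_from_other_thread, NULL);
    pthread_join(t, NULL);   /* the other thread returned: it did not spin */
    pthread_spin_unlock(&sl);
    pthread_spin_destroy(&sl);
    return trylock_result;
}
```

Note that the lock itself still made no progress while the other thread gave up; the caller merely gets to do something else.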
Spinning in an infinite loop counts as blocking / not-making-progress. For this definition, there's no difference between a pure spinlock and one that (with OS assistance) sleeps until another thread unlocks.
The definition of lock-free isn't concerned with wasting CPU time / power to make room for independent work to happen.
Somewhat related: acquiring an uncontended spinlock doesn't require a system call, which means it's a "light-weight" lock. Some lock implementations always use a (relatively slow) system call even in the uncontended case. See Jeff Preshing's Always Use a Lightweight Mutex article. Also read Jeff's other posts to learn more about lock-free programming, because they're excellent. So good in fact that the [lock-free] tag wiki links to them.

mlock() and Threading

Can mlock() be called safely from independently executing OpenMP or Posix threads, given that each thread is operating on a different region of virtual memory? Does it create a systemwide synchronization barrier or force all threads to stall in some way?
I apologize if this is a duplicate; I was surprised when google searches for "mlock openmp"/"mlock thread safety" did not turn up the answer immediately. Closest I could find was the second answer of Non-blocking mlock(), which seems to indicate that mlock() CAN be called from separate threads and does not enforce or require any synchronization barriers.
mlock() is safe to call from multiple threads at once.
As to whether it synchronises against other calls to mlock(), it's a quality-of-implementation issue: in principle any system call could synchronise against any other, and there's no text in POSIX that disallows it. In practice you will often find that system calls that work on the process's memory map tend to contend with each other (so mlock() might contend not just with other mlock() calls but also with mmap()). You will need to test to see whether contention is actually a problem in your use case.
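A minimal sketch of the pattern in the question, assuming Linux/glibc (MAP_ANONYMOUS is used for brevity): each thread maps, locks, touches, unlocks and unmaps its own page. It only shows that the calls are independent per thread; it says nothing about how much they contend inside the kernel, which, as the answer says, you have to measure. mlock() can legitimately fail with ENOMEM, EPERM or EAGAIN when RLIMIT_MEMLOCK is low, so those are treated as expected outcomes. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NTHREADS 4

/* Each thread works on its OWN region: map, lock, touch, unlock, unmap. */
static void *lock_own_region(void *arg)
{
    int *status = arg;
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        *status = errno;
        return NULL;
    }
    if (mlock(p, len) == 0) {
        memset(p, 0xAB, len);   /* pages stay resident while locked */
        munlock(p, len);
        *status = 0;
    } else {
        *status = errno;        /* saved before munmap can change errno */
    }
    munmap(p, len);
    return NULL;
}

/* Returns how many threads ended in an expected state: locked successfully,
 * or refused only because of memory-locking limits or privileges. */
static int run_mlock_threads(void)
{
    pthread_t t[NTHREADS];
    int status[NTHREADS];
    int expected = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, lock_own_region, &status[i]);
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        int s = status[i];
        if (s == 0 || s == ENOMEM || s == EPERM || s == EAGAIN)
            expected++;
    }
    return expected;
}
```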

Difference between Mutex, Semaphore & Spin Locks

I am doing experiments with IPC, especially with Mutex, Semaphore and Spin Lock.
What I learned is that a mutex is an asynchronous locking mechanism (with sleeping, according to the theory I read on the net), a semaphore is a synchronous locking mechanism (with signaling and sleeping), and a spin lock is a synchronous but non-sleeping mechanism.
Can anyone help me clarify these concepts in depth?
Another doubt is about the mutex: when I wrote a program with threads and a mutex, while one thread was running, the other thread was not in a sleep state but continuously tried to acquire the lock. So does a mutex sleep or not???
First, remember the goal of these 'synchronizing objects':
These objects were designed to provide efficient and coherent use of 'shared data' between two or more threads, either within one process or across different processes.
These objects can be 'acquired' or 'released'.
That is it!!! End of story!!!
Now, if it helps, let me add my grain of sand:
1) Critical Section= User object used for allowing the execution of just one active thread out of many others within one process. The other, non-selected threads (those trying to acquire this object) are put to sleep.
[No interprocess capability, very primitive object].
2) Mutex Semaphore (aka Mutex)= Kernel object used for allowing the execution of just one active thread out of many others, within one process or among different processes. The other, non-selected threads (those trying to acquire this object) are put to sleep. This object supports thread ownership, thread termination notification, recursion (multiple 'acquire' calls from the same thread) and 'priority inversion avoidance'.
[Interprocess capability, very safe to use, a kind of 'high level' synchronization object].
3) Counting Semaphore (aka Semaphore)= Kernel object used for allowing the execution of a group of active threads out of many others, within one process or among different processes. The other, non-selected threads (those trying to acquire this object) are put to sleep.
[Interprocess capability, however not very safe to use because it lacks the following 'mutex' attributes: thread termination notification, recursion?, 'priority inversion avoidance'?, etc.]
4) And now, talking about 'spinlocks', first some definitions:
Critical Region= A region of memory shared by 2 or more processes.
Lock= A variable whose value allows or denies the entrance to a 'critical region'. (It could be implemented as a simple 'boolean flag').
Busy waiting= Continuously testing a variable until some value appears.
Finally:
Spin-lock (aka Spinlock)= A lock which uses busy waiting. (The lock is acquired with xchg or similar atomic operations.)
[No thread sleeping, mostly used at kernel level. Inefficient for user-level code.]
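The counting semaphore from item 3 can be sketched with POSIX unnamed semaphores instead of the Windows kernel objects the answer has in mind (a swapped-in API). In this minimal sketch, assuming Linux/glibc, at most SLOTS threads are inside the guarded region at once and the rest sleep in sem_wait(); SLOTS, NTHREADS and the peak-tracking counters are hypothetical additions used only to observe the limit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <unistd.h>

#define SLOTS 2      /* at most 2 threads inside at once */
#define NTHREADS 6

static sem_t slots;
static atomic_int inside, max_inside;

static void *use_resource(void *arg)
{
    (void)arg;
    sem_wait(&slots);                   /* sleeps while no slot is free */
    int now = atomic_fetch_add(&inside, 1) + 1;
    int seen = atomic_load(&max_inside);
    while (now > seen && !atomic_compare_exchange_weak(&max_inside, &seen, now))
        ;                               /* record the peak concurrency */
    usleep(1000);                       /* pretend to work */
    atomic_fetch_sub(&inside, 1);
    sem_post(&slots);                   /* free the slot, waking one sleeper */
    return NULL;
}

/* Returns the highest number of threads observed inside at the same time. */
static int run_counting_semaphore(void)
{
    pthread_t t[NTHREADS];
    sem_init(&slots, 0, SLOTS);         /* pshared = 0: threads of one process */
    atomic_store(&inside, 0);
    atomic_store(&max_inside, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, use_resource, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&slots);
    return atomic_load(&max_inside);
}
```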
As a last comment, I am not sure but I can bet you some big bucks that the above first 3 synchronizing objects (#1, #2 and #3) make use of this simple beast (#4) as part of their implementation.
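The "simple beast" from item 4 can be sketched with C11 atomics: atomic_exchange writes 1 and returns the previous value (on x86 this typically compiles to an xchg instruction), and a waiter keeps retrying without ever sleeping. This is a minimal illustration with hypothetical names, not a production spinlock: there is no backoff, no pause instruction and no fairness.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical minimal spinlock: 0 = free, 1 = held. */
struct simple_spinlock {
    atomic_int locked;
};

static void simple_spin_lock(struct simple_spinlock *l)
{
    /* Atomically write 1 and get the old value. If it was already 1,
     * someone else holds the lock: busy-wait by trying again. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 1)
        ;
}

static void simple_spin_unlock(struct simple_spinlock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

static struct simple_spinlock counter_lock;
static long counter;

static void *worker(void *arg)
{
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        simple_spin_lock(&counter_lock);
        counter++;
        simple_spin_unlock(&counter_lock);
    }
    return NULL;
}

static long run_spin(int nthreads, long iters)
{
    pthread_t t[8];
    assert(nthreads > 0 && nthreads <= 8);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```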
Have a good day!
References:
- Real-Time Concepts for Embedded Systems by Qing Li with Caroline Yao (CMP Books).
- Modern Operating Systems (3rd) by Andrew Tanenbaum (Pearson Education International).
- Programming Applications for Microsoft Windows (4th) by Jeffrey Richter (Microsoft Programming Series).
Here is a great explanation of the difference between semaphores and mutexes:
http://blog.feabhas.com/2009/09/mutex-vs-semaphores-–-part-1-semaphores/
The short answer has to do with ownership, at least with binary semaphores, but I suggest you read the entire article.
A mutex is a locking mechanism, while a semaphore is a wait-and-signal mechanism.
Both have different applications.
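The ownership difference can be shown with POSIX calls: an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) refuses an unlock from a thread that does not own it and returns EPERM, while a semaphore has no owner, so any thread may sem_post() it to wake a waiter. A minimal sketch, assuming Linux/glibc; the variable and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t owned;
static sem_t posted;
static int foreign_unlock_result = -1;
static int owner_unlock_result = -1;

static void *other_thread(void *arg)
{
    (void)arg;
    /* Not the owner: an error-checking mutex refuses the unlock. */
    foreign_unlock_result = pthread_mutex_unlock(&owned);
    /* A semaphore has no owner: any thread may post it. */
    sem_post(&posted);
    return NULL;
}

static void demo_ownership(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&owned, &attr);
    pthread_mutexattr_destroy(&attr);
    sem_init(&posted, 0, 0);

    pthread_mutex_lock(&owned);              /* this thread becomes the owner */
    pthread_create(&t, NULL, other_thread, NULL);
    sem_wait(&posted);                       /* woken by the other thread's post */
    pthread_join(t, NULL);
    owner_unlock_result = pthread_mutex_unlock(&owned);  /* owner may unlock */

    pthread_mutex_destroy(&owned);
    sem_destroy(&posted);
}
```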
There is a very good explanation given by a professor at IISc.
Link for video

linux thread synchronization

I am new to linux and linux threads. I have spent some time googling to try to understand the differences between all the functions available for thread synchronization. I still have some questions.
I have found all of these different types of synchronizations, each with a number of functions for locking, unlocking, testing the lock, etc.
gcc atomic operations
futexes
mutexes
spinlocks
seqlocks
rculocks
conditions
semaphores
My current (but probably flawed) understanding is this:
semaphores are process wide, involve the filesystem (virtually I assume), and are probably the slowest.
Futexes might be the base locking mechanism used by mutexes, spinlocks, seqlocks, and rculocks. Futexes might be faster than the locking mechanisms that are based on them.
Spinlocks don't block and thus avoid context switches. However, they avoid the context switch at the expense of consuming all the cycles on a CPU until the lock is released (spinning). They should only be used on multiprocessor systems, for obvious reasons. Never sleep while holding a spinlock.
A seqlock just tells you, when you have finished your work, whether a writer changed the data the work was based on. In that case you have to go back and repeat the work.
Atomic operations are the fastest synchronization primitive, and are probably used in all the above locking mechanisms. You do not want to use atomic operations on every field of your shared data. When you are accessing multiple data fields, you want to use a lock (mutex, futex, spin, seq, rcu) or a single atomic operation on a lock flag.
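The seqlock idea can be sketched in C11 for the single-writer case: the writer makes the sequence counter odd while it updates, and a reader retries if it saw an odd value or if the counter changed during its read. In this minimal sketch the data fields are atomics read with relaxed ordering, because a plain racing read would be a data race in C11; the fences follow the commonly described fence-based C11 seqlock pattern. The pair invariant b == 2 * a is a hypothetical example used only to detect torn reads, and the Linux kernel's seqlock API is different.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical single-writer seqlock guarding two fields read together. */
struct seq_pair {
    atomic_uint seq;        /* even = stable, odd = write in progress */
    atomic_long a, b;       /* read racily, so accessed as relaxed atomics */
};

static void seq_write(struct seq_pair *s, long a, long b)
{
    unsigned q = atomic_load_explicit(&s->seq, memory_order_relaxed);
    atomic_store_explicit(&s->seq, q + 1, memory_order_relaxed);   /* odd */
    atomic_thread_fence(memory_order_release);  /* odd seq before the data */
    atomic_store_explicit(&s->a, a, memory_order_relaxed);
    atomic_store_explicit(&s->b, b, memory_order_relaxed);
    atomic_store_explicit(&s->seq, q + 2, memory_order_release);   /* even */
}

static void seq_read(struct seq_pair *s, long *a, long *b)
{
    unsigned q1, q2;
    do {
        q1 = atomic_load_explicit(&s->seq, memory_order_acquire);
        *a = atomic_load_explicit(&s->a, memory_order_relaxed);
        *b = atomic_load_explicit(&s->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* data reads before re-check */
        q2 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((q1 & 1u) || q1 != q2);   /* a write overlapped: redo the work */
}

static struct seq_pair pair;
static atomic_int stop;

static void *writer(void *arg)
{
    (void)arg;
    for (long i = 1; i <= 200000; i++)
        seq_write(&pair, i, 2 * i);
    atomic_store(&stop, 1);
    return NULL;
}

static void *reader(void *arg)
{
    long *torn = arg;
    long a, b;
    while (!atomic_load(&stop)) {
        seq_read(&pair, &a, &b);
        if (b != 2 * a)
            (*torn)++;
    }
    return NULL;
}

/* Returns how many inconsistent (torn) pairs the readers observed. */
static long run_seqlock(void)
{
    pthread_t w, r[2];
    long torn[2] = {0, 0};
    atomic_store(&stop, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&r[i], NULL, reader, &torn[i]);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    for (int i = 0; i < 2; i++)
        pthread_join(r[i], NULL);
    return torn[0] + torn[1];
}
```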
My questions go like this:
1. Am I right so far with my assumptions?
2. Does anyone know the CPU cycle cost of the various options? I am adding parallelism to the app so we can get better wall-time response at the expense of running fewer app instances per box. Performance is the utmost consideration. I don't want to consume CPU with context switching, spinning, or lots of extra CPU cycles to read and write shared memory. I am absolutely concerned with the number of CPU cycles consumed.
3. Which (if any) of the locks prevent interruption of a thread by the scheduler or an interrupt... or am I just an idiot and all synchronization mechanisms do this? What kinds of interruption are prevented? Can I block all threads, or just threads on the locking thread's CPU? This question stems from my fear of interrupting a thread holding a lock for a very commonly used function. I expect that the scheduler might schedule any number of other workers who will likely run into this function and then block because it is locked. A lot of context switching would be wasted until the thread with the lock gets rescheduled and finishes. I can rewrite this function to minimize lock time, but it is still so commonly called that I would like to use a lock that prevents interruption... across all processors.
4. I am writing user code... so I get software interrupts, not hardware ones... right? I should stay away from any functions (spin/seq locks) that have the word "irq" in them.
5. Which locks are for writing kernel or driver code, and which are meant for user mode?
6. Does anyone think using an atomic operation to have multiple threads move through a linked list is nuts? I am thinking of atomically changing the current item pointer to the next item in the list. If the attempt works, then the thread can safely use the data the current item pointed to before it was moved. Other threads would now be moved along the list.
7. Futexes? Any reason to use them instead of mutexes?
8. Is there a better way than using a condition to sleep a thread when there is no work?
9. When using gcc atomic ops, specifically test_and_set, can I get a performance increase by doing a non-atomic test first and then using test_and_set to confirm? I know this will be case specific, so here is the case. There is a large collection of work items, say thousands. Each work item has a flag that is initialized to 0. When a thread has exclusive access to the work item, the flag will be 1. There will be lots of worker threads. Any time a thread is looking for work, it can non-atomically test for 1. If it reads a 1, we know for certain that the work is unavailable. If it reads a 0, it needs to perform the atomic test_and_set to confirm. So if the atomic test_and_set is 500 CPU cycles because it is disabling pipelining, causes CPUs to communicate and L2 caches to flush/fill... and a simple test is 1 cycle... then as long as I had a better ratio than 500 to 1 when it came to stumbling upon already completed work items, this would be a win.
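The last question describes what is usually called test-and-test-and-set. A minimal sketch with C11 atomics, with one deliberate change: the cheap pre-check is a relaxed atomic load rather than a plain read, because a plain read racing with an atomic write is a data race (undefined behavior) in C11. Whether it actually saves cycles depends on the hardware and on how often items are already taken; the sketch does not measure that. try_claim and the item arrays are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NITEMS 1000
#define NWORKERS 4

static atomic_int flag[NITEMS];         /* 0 = free, 1 = taken */
static atomic_int claim_count[NITEMS];  /* how many threads claimed each item */

static bool try_claim(atomic_int *f)
{
    /* Cheap test: if already taken, skip the expensive read-modify-write. */
    if (atomic_load_explicit(f, memory_order_relaxed) != 0)
        return false;
    /* Confirm with an atomic exchange: only one thread can see the old 0. */
    return atomic_exchange_explicit(f, 1, memory_order_acquire) == 0;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITEMS; i++)
        if (try_claim(&flag[i]))
            atomic_fetch_add(&claim_count[i], 1);
    return NULL;
}

/* Returns the number of items that were claimed exactly once. */
static int run_claims(void)
{
    pthread_t t[NWORKERS];
    int exactly_once = 0;
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < NITEMS; i++)
        if (atomic_load(&claim_count[i]) == 1)
            exactly_once++;
    return exactly_once;
}
```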
I hope to use mutexes or spinlocks to sparingly protect sections of code that I want only one thread on the SYSTEM (not just the CPU) to access at a time. I hope to sparingly use gcc atomic ops to select work and minimize the use of mutexes and spinlocks. For instance, a flag in a work item can be checked to see if a thread has worked it (0 = no, 1 = yes or in progress). A simple test_and_set tells the thread whether it has work or needs to move on. I hope to use conditions to wake up threads when there is work.
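For the plan to use conditions to wake threads when there is work, the usual shape is a predicate loop around pthread_cond_wait(): the worker sleeps while there is no work and rechecks after every wakeup, since wakeups can be spurious. A minimal sketch, with a hypothetical counter standing in for a real work queue:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static int pending;        /* hypothetical stand-in for a real work queue */
static int processed;
static int shutting_down;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&q_lock);
    for (;;) {
        /* Sleep while there is nothing to do; recheck after every wakeup. */
        while (pending == 0 && !shutting_down)
            pthread_cond_wait(&q_cond, &q_lock);
        if (pending == 0)          /* shutting down and the queue is drained */
            break;
        pending--;                 /* take one item */
        pthread_mutex_unlock(&q_lock);
        /* ... real work would happen here, outside the lock ... */
        pthread_mutex_lock(&q_lock);
        processed++;
    }
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

static void add_work(void)
{
    pthread_mutex_lock(&q_lock);
    pending++;
    pthread_cond_signal(&q_cond);       /* wake one sleeping worker */
    pthread_mutex_unlock(&q_lock);
}

static int run_workers(int nworkers, int nitems)
{
    pthread_t t[8];
    assert(nworkers > 0 && nworkers <= 8);
    for (int i = 0; i < nworkers; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < nitems; i++)
        add_work();
    pthread_mutex_lock(&q_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&q_cond);    /* wake everyone so they can exit */
    pthread_mutex_unlock(&q_lock);
    for (int i = 0; i < nworkers; i++)
        pthread_join(t[i], NULL);
    return processed;
}
```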
Thanks!
Application code should probably use posix thread functions. I assume you have man pages so type
man pthread_mutex_init
man pthread_rwlock_init
man pthread_spin_init
Read up on them and the functions that operate on them to figure out what you need.
If you're doing kernel mode programming then it's a different story. You'll need to have a feel for what you are doing, how long it takes, and what context it gets called in to have any idea what you need to use.
Thanks to all who answered. We resorted to using gcc atomic operations to synchronize all of our threads. The atomic ops were about 2x slower than setting a value without synchronization, but orders of magnitude faster than locking a mutex, changing the value, and then unlocking the mutex (this becomes super slow when you start having threads bang into the locks...). We only use pthread_create, attr, cancel, and kill. We use pthread_kill to signal threads that we put to sleep to wake up. This method is 40x faster than cond_wait. So basically... use pthread mutexes if you have time to waste.
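The asker's idea of moving a shared "current item" pointer along a linked list with an atomic operation can be sketched with the GCC __atomic builtins this answer relies on. Whichever thread's compare-exchange moves the cursor from an item to its successor owns that item. This minimal sketch assumes the list is built before the threads start and is never modified or freed while they walk it; with concurrent insertion or node reuse, this approach runs into the ABA problem. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NITEMS 1000
#define NTHREADS 4

struct item {
    struct item *next;
    int claims;               /* how many threads processed this item */
};

static struct item items[NITEMS];
static struct item *cursor;   /* shared "current item" pointer */

static struct item *claim_next(void)
{
    struct item *cur = __atomic_load_n(&cursor, __ATOMIC_ACQUIRE);
    /* Try to swing the cursor from cur to cur->next. On failure the builtin
     * stores the cursor's current value into cur, and we retry with it. */
    while (cur != NULL &&
           !__atomic_compare_exchange_n(&cursor, &cur, cur->next, false,
                                        __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
        ;
    return cur;               /* NULL once the list is exhausted */
}

static void *walker(void *arg)
{
    (void)arg;
    struct item *it;
    while ((it = claim_next()) != NULL)
        it->claims++;         /* only the CAS winner touches this item */
    return NULL;
}

/* Returns how many items were processed exactly once. */
static int run_list(void)
{
    pthread_t t[NTHREADS];
    int once = 0;
    for (int i = 0; i < NITEMS; i++) {
        items[i].next = (i + 1 < NITEMS) ? &items[i + 1] : NULL;
        items[i].claims = 0;
    }
    cursor = &items[0];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, walker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < NITEMS; i++)
        if (items[i].claims == 1)
            once++;
    return once;
}
```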
In addition, you should check the following books:
- Pthreads Programming: A POSIX Standard for Better Multiprocessing
- Programming with POSIX(R) Threads
Regarding question #8:
Is there a better way than using a condition to sleep a thread when there is no work?
Yes, I think the best approach, instead of using sleep, is to use functions like sem_post() and sem_wait() from "semaphore.h".
Regards
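Following that suggestion, here is a minimal sketch assuming Linux/glibc: the semaphore counts posted work items, each sem_post() wakes at most one sleeping worker, and a few extra posts serve as "stop" tokens. NITEMS, NWORKERS and the index counter are hypothetical stand-ins for a real work queue.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

#define NITEMS 1000
#define NWORKERS 4

static sem_t work_available;     /* counts posted, not yet taken, items */
static atomic_int next_index;    /* hypothetical work-queue index */
static atomic_int processed;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Sleeps while the count is 0 (a robust version would retry on EINTR). */
        sem_wait(&work_available);
        int k = atomic_fetch_add(&next_index, 1);
        if (k >= NITEMS)
            return NULL;                 /* one of the extra "stop" posts */
        atomic_fetch_add(&processed, 1); /* process item k */
    }
}

static int run_sem_workers(void)
{
    pthread_t t[NWORKERS];
    sem_init(&work_available, 0, 0);     /* pshared = 0: threads of one process */
    atomic_store(&next_index, 0);
    atomic_store(&processed, 0);
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NITEMS + NWORKERS; i++)
        sem_post(&work_available);       /* each post wakes at most one sleeper */
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&work_available);
    return atomic_load(&processed);
}
```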
A note on futexes - they are more descriptively called fast userspace mutexes. With a futex, the kernel is involved only when arbitration is required, which is what provides the speed up and savings.
Implementing locks on top of futexes can be extremely tricky (PDF), and debugging them can lead to madness. Unless you really, really, really need the speed, it's usually best to use the pthread mutex implementation.
Synchronization is never exactly easy, but trying to implement your own in userspace makes it inordinately difficult.
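For illustration only, here is a condensed version of the three-state mutex described in Ulrich Drepper's paper "Futexes Are Tricky", written with C11 atomics and the raw futex system call (glibc provides no futex() wrapper, so syscall(SYS_futex, ...) is used). It is Linux-only, and casting atomic_int * to int * assumes the two share a representation, as they do on Linux. The uncontended lock and unlock paths make no system call, which is where the speed comes from; as the answer says, prefer pthread mutexes in real code. fmutex and futex_call are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible */
typedef struct { atomic_int val; } fmutex;

/* Assumes atomic_int has the same representation as int (true on Linux). */
static long futex_call(atomic_int *addr, int op, int v)
{
    return syscall(SYS_futex, (int *)addr, op, v, NULL, NULL, 0);
}

static void fmutex_lock(fmutex *m)
{
    int c = 0;
    /* Fast path: 0 -> 1 with one CAS and no system call. */
    if (atomic_compare_exchange_strong(&m->val, &c, 1))
        return;
    /* Slow path: mark the lock contended (2), then sleep until released. */
    if (c != 2)
        c = atomic_exchange(&m->val, 2);
    while (c != 0) {
        /* The kernel only sleeps if the value is still 2 (no lost wakeup). */
        futex_call(&m->val, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->val, 2);
    }
}

static void fmutex_unlock(fmutex *m)
{
    /* Fast path: 1 -> 0 means nobody is waiting, so no system call. */
    if (atomic_fetch_sub(&m->val, 1) != 1) {
        atomic_store(&m->val, 0);
        futex_call(&m->val, FUTEX_WAKE_PRIVATE, 1);  /* wake one sleeper */
    }
}

static fmutex counter_lock;
static long counter;

static void *worker(void *arg)
{
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        fmutex_lock(&counter_lock);
        counter++;
        fmutex_unlock(&counter_lock);
    }
    return NULL;
}

static long run_futex_mutex(int nthreads, long iters)
{
    pthread_t t[8];
    assert(nthreads > 0 && nthreads <= 8);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```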
