How does the current implementation of semaphores work? Does it use spinlocks or signals?
How does the scheduler know which one to invoke if signals are used?
Also, how does it work in user space? Kernel locking recommends spinlocks, but user space does not. So are the semaphore implementations different in user space and kernel space?
Use the power of Open Source - just look at the source code.
The kernel-space semaphore is defined as
struct semaphore {
    raw_spinlock_t lock;
    unsigned int count;
    struct list_head wait_list;
};
lock is used to protect count and wait_list.
All tasks waiting on a semaphore reside in wait_list. When the semaphore is upped, one task is woken up.
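For illustration, here is roughly what up() does - a simplified sketch of the code in kernel/semaphore.c; note that the spinlock only guards the structure, the waiters themselves sleep:

/* Simplified sketch of up(); see kernel/semaphore.c for the real code. */
void up(struct semaphore *sem)
{
    unsigned long flags;

    raw_spin_lock_irqsave(&sem->lock, flags);
    if (list_empty(&sem->wait_list))
        sem->count++;          /* nobody waiting: just bump the count */
    else
        __up(sem);             /* wake the first task on wait_list */
    raw_spin_unlock_irqrestore(&sem->lock, flags);
}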
User-space semaphores rely on the semaphore-related system calls that the kernel provides. The definition of a user-space semaphore is:
/* One semaphore structure for each semaphore in the system. */
struct sem {
    int semval;                   /* current value */
    int sempid;                   /* pid of last operation */
    spinlock_t lock;              /* spinlock for fine-grained semtimedop */
    struct list_head sem_pending; /* pending single-sop operations */
};
The kernel's definition of the user-space semaphore is similar to the kernel-space one. sem_pending is a list of waiting processes plus some additional info.
I should highlight again that neither the kernel-space semaphore nor the user-space one uses a spinlock to wait on the lock. The spinlock is included in both structures only to protect the structure members from concurrent access. After the structure is modified, the spinlock is released and the task rests in the list until woken.
Furthermore, spinlocks are unsuitable for waiting on an event from another thread. Before acquiring a spinlock, the kernel disables preemption; if a task then spun waiting for another thread, on a uniprocessor machine that other thread could never run, so the lock would never be released.
I should also note that user-space semaphores, while serving on behalf of user space, execute in kernel space.
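As a minimal sketch of that path, user space drives these kernel objects through the SysV semaphore system calls (semget/semop/semctl); the identifiers here are just example values:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun { int val; };   /* glibc requires the caller to define this */

int main(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600); /* one semaphore */
    union semun arg = { .val = 1 };
    semctl(id, 0, SETVAL, arg);                        /* initial value 1 */

    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    semop(id, &op, 1);          /* "down": may put the caller to sleep */

    op.sem_op = 1;
    semop(id, &op, 1);          /* "up": wakes a sleeping waiter, if any */

    semctl(id, 0, IPC_RMID);    /* remove the semaphore set */
    return 0;
}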
P.S. Source code for the kernel-space semaphore resides in include/linux/semaphore.h and kernel/semaphore.c, for user-space one in ipc/sem.c
In Linux, I have a scenario where two threads execute a critical section: one acquires the lock (threadA) and the other (threadB) waits for the lock. Later, threadA releases the mutex lock. I am trying to understand how threadB is moved to the running state and acquires the lock. How does threadB (or the operating system) know that the lock has been released by threadA?
I have a theory; please correct me if I am wrong: threadB enters the TASK_INTERRUPTIBLE state (blocked on the mutex, waiting) and receives a signal when threadA unlocks the mutex, so it comes back to the run queue (TASK_RUNNING).
The Linux mutex struct keeps track of the current owner of the mutex (if any):
struct mutex {
    atomic_long_t owner;
    // ...
};
There's also a struct to keep track of what other tasks are waiting on a mutex:
/*
* This is the control structure for tasks blocked on mutex,
* which resides on the blocked task's kernel stack:
*/
struct mutex_waiter {
    struct list_head list;
    struct task_struct *task;
    struct ww_acquire_ctx *ww_ctx;
#ifdef CONFIG_DEBUG_MUTEXES
    void *magic;
#endif
};
Simplifying quite a bit: when you unlock a mutex, the kernel looks at what other tasks are waiting on that mutex. It picks one of them to become the owner, sets the mutex's owner field to refer to the selected task, and removes that task from the list of tasks waiting for the mutex. At that point, there's at least a good chance that task has become unblocked, in which case it's ready to run; it's then up to the scheduler to decide when to actually run it.
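As a rough sketch of that idea (simplified, not the exact code in kernel/locking/mutex.c, which also handles handoff and uses a wake queue; my_mutex_unlock_slowpath is a made-up name):

/* Very simplified sketch of the unlock slow path (details omitted). */
void my_mutex_unlock_slowpath(struct mutex *lock)
{
    raw_spin_lock(&lock->wait_lock);
    if (!list_empty(&lock->wait_list)) {
        struct mutex_waiter *waiter =
            list_first_entry(&lock->wait_list, struct mutex_waiter, list);

        /* make the selected task runnable; the scheduler decides
         * when it actually gets the CPU */
        wake_up_process(waiter->task);
    }
    raw_spin_unlock(&lock->wait_lock);
}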
Optimization
Since mutexes are used a lot, and they get locked and unlocked quite a bit, they use some optimization to help speed. For example, consider the following:
/*
* #owner: contains: 'struct task_struct *' to the current lock owner,
* NULL means not owned. Since task_struct pointers are aligned at
* at least L1_CACHE_BYTES, we have low bits to store extra state.
*
* Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
* Bit1 indicates unlock needs to hand the lock to the top-waiter
* Bit2 indicates handoff has been done and we're waiting for pickup.
*/
#define MUTEX_FLAG_WAITERS 0x01
#define MUTEX_FLAG_HANDOFF 0x02
#define MUTEX_FLAG_PICKUP 0x04
#define MUTEX_FLAGS 0x07
So, when you ask the kernel to unlock a mutex, it can "glance" at one bit in the owner pointer to figure out whether this is a "simple" case (nobody's waiting on the mutex, so just mark it as unlocked, and off we go) or a more complex one (at least one task is waiting on the mutex, so a task needs to be selected to be unblocked and marked as the new owner of the mutex).
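Here is a hedged sketch of that check - the idea only, not the exact kernel code; my_mutex_unlock and my_mutex_unlock_slowpath are made-up names:

/* Sketch of the idea behind the unlock fast path. */
void my_mutex_unlock(struct mutex *lock)
{
    unsigned long curr = (unsigned long)current;  /* owner with no flag bits */

    /* If owner == current and no flag bits are set, one atomic op both
     * proves that nobody is waiting and releases the lock. */
    if (atomic_long_cmpxchg_release(&lock->owner, curr, 0UL) == curr)
        return;                                   /* simple case: done */

    /* Some flag bit (e.g. MUTEX_FLAG_WAITERS) was set: take the slow
     * path that selects and wakes a waiter. */
    my_mutex_unlock_slowpath(lock);
}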
References
https://github.com/torvalds/linux/blob/master/include/linux/mutex.h
https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c
Disclaimer
The code extracts above are (I believe) current as I write this answer. But as noted above, mutexes get used a lot. If you look at the code for a mutex 5 or 10 years from now, chances are you'll find that somebody has done some work on optimizing the code, so it may not precisely match what I've quoted above. Most of the concepts are likely to remain similar, but changes in details (especially the optimizations) are to be expected.
I have the following situation:
Two C++11 threads are working on a calculation and they are synchronized through a std::mutex.
Thread A locks the mutex until the data is ready for the operation Thread B executes. When the mutex is unlocked, Thread B starts to work.
Thread B tries to lock the mutex and is blocked until it is unlocked by Thread A.
void ThreadA (std::mutex* mtx, char* data)
{
    mtx->lock();
    // do something useful with data
    mtx->unlock();
}

void ThreadB (std::mutex* mtx, char* data)
{
    mtx->lock(); // wait until Thread A is ready
    // do something useful with data
    // .....
}
It is ensured that Thread A locks the mutex first.
Now I am wondering whether the mtx->lock() in Thread B waits actively or passively. So, is Thread B polling the mutex state and wasting processor time, or is it released passively by the scheduler when the mutex is unlocked?
In the different C++ references it is only mentioned that the thread is blocked, but not in which way.
Could it be, however, that the std::mutex implementation is highly dependent on the platform and OS used?
It's highly implementation-defined, even for the same compiler and OS.
For example, on VC++ in Visual Studio 2010, std::mutex was implemented with a Win32 CRITICAL_SECTION. EnterCriticalSection(CRITICAL_SECTION*) has a nice feature: first it tries to take the CRITICAL_SECTION by iterating on the lock again and again. After a specified number of iterations, it makes a kernel call which puts the thread to sleep, only to be woken up again when the lock is released, and the whole deal starts again.
In this case, the mechanism polls the lock again and again before going to sleep; then control switches to the kernel.
Visual Studio 2012 came with a different implementation: std::mutex was implemented with a Win32 mutex. A Win32 mutex shifts control immediately to the kernel; there is no active polling done by the lock.
You can read about the implementation switch in this answer: std::mutex performance compared to win32 CRITICAL_SECTION
So, it is unspecified how the mutex acquires the lock, and it is best not to rely on such behaviour.
P.S. Do not lock the mutex manually; use std::lock_guard instead. Also, you might want to use a condition_variable for a more refined way of controlling your synchronization.
I was studying the raw_spinlock struct, which is in /usr/src/linux/include/linux/spinlock_types.h:
typedef struct raw_spinlock {
    arch_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
    unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
    unsigned int magic, owner_cpu;
    void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
    struct lockdep_map dep_map;
#endif
} raw_spinlock_t;
I think raw_lock is an architecture-dependent lock and dep_map is a data structure used to avoid deadlocks, but what do break_lock, magic, owner_cpu, and *owner mean?
spinlock
spinlock is the public API for spinlocks in kernel code.
See Documentation/locking/spinlocks.txt.
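For reference, a minimal usage sketch of that API (my_lock and the function are just example names):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

static void touch_shared_state(void)
{
    unsigned long flags;

    /* never sleep while the spinlock is held */
    spin_lock_irqsave(&my_lock, flags);
    /* ... short critical section ... */
    spin_unlock_irqrestore(&my_lock, flags);
}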
raw_spinlock
raw_spinlock is the actual implementation of normal spinlocks. On non-RT kernels, spinlock is just a wrapper for raw_spinlock. On RT kernels, spinlock doesn't always use raw_spinlock.
See this article on LWN.
arch_spinlock
arch_spinlock is the platform-specific part of the spinlock implementation. raw_spinlock is generally platform-independent and delegates low-level operations to arch_spinlock.
lockdep_map
lockdep_map is a dependency map for the locking correctness validator.
See Documentation/locking/lockdep-design.txt.
break_lock
On SMP kernels, when spin_lock() on one CPU starts looping while the lock is held on another CPU, it sets this flag to 1. The CPU that holds the lock can periodically check this flag using spin_is_contended() and then briefly release the lock with spin_unlock(), as sketched below.
This achieves two goals at the same time:
avoid frequent locking/unlocking;
avoid holding the lock for a long time, preventing others from acquiring it.
See also this article.
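A hedged sketch of that pattern (have_more_work() and do_some_work() are placeholders; the kernel wraps a similar pattern in helpers such as cond_resched_lock()):

spin_lock(&my_lock);
while (have_more_work()) {
    do_some_work();
    if (spin_is_contended(&my_lock) || need_resched()) {
        spin_unlock(&my_lock);   /* let the waiter (or scheduler) in */
        cond_resched();
        spin_lock(&my_lock);
    }
}
spin_unlock(&my_lock);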
magic, owner, owner_cpu
These fields are enabled when CONFIG_DEBUG_SPINLOCK is set and help to detect common bugs:
magic is set to a randomly chosen constant when the spinlock is created (SPINLOCK_MAGIC, which is 0xdead4ead);
owner is set to current process in spin_lock();
owner_cpu is set to current CPU id in spin_lock().
spin_unlock() checks that it is called when current process and CPU are the same as they were when spin_lock() was called.
spin_lock() checks that magic is equal to SPINLOCK_MAGIC to ensure that the caller passed a pointer to a correctly initialized spinlock and that (hopefully) no memory corruption occurred.
See kernel/locking/spinlock_debug.c.
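Roughly, the unlock-time checks look like this (a sketch modeled on that file; SPIN_BUG_ON reports a bug when the condition is true):

static inline void debug_spin_unlock(raw_spinlock_t *lock)
{
    SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
    SPIN_BUG_ON(!raw_spin_is_locked(lock), lock, "already unlocked");
    SPIN_BUG_ON(lock->owner != current, lock, "wrong owner");
    SPIN_BUG_ON(lock->owner_cpu != raw_smp_processor_id(), lock, "wrong CPU");
}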
In the book Linux Device Drivers, 3rd edition, a mutex is implemented via a semaphore using init_MUTEX(sem). Newer kernels, such as 3.2.x, however, have removed this function and added dedicated mutex support.
But when I encounter the codes:
if (down_interruptible(&sem))
    return -ERESTARTSYS;
I can't ensure that whether there is a counterpart of this method for mutex. In other words, how can I interrupt the waiting on particular mutex?
I can't ensure that whether there is a counterpart of this method for mutex. In other words, how can I interrupt the waiting on particular mutex?
Yes, mutexes are pessimistic locks that replace semaphores in newer kernels. If you want to take an interruptible lock using a mutex, use:
mutex_lock_interruptible()
Refer to the header file:
#include <linux/mutex.h>
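A minimal sketch of the mutex counterpart of the down_interruptible() pattern (my_mutex is just an example name):

#include <linux/mutex.h>

static DEFINE_MUTEX(my_mutex);

/* ... inside a file operation or similar sleepable context ... */
if (mutex_lock_interruptible(&my_mutex))
    return -ERESTARTSYS;       /* a signal interrupted the wait */

/* ... critical section ... */
mutex_unlock(&my_mutex);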
A friend of mine and I disagree on how synchronization is handled at the userspace level (in the pthread library).
a. I think that during a pthread_mutex_lock, the thread actively waits. Meaning the Linux scheduler schedules this thread and lets it execute its code, which should look something like:
while (mutex_resource->locked);
Then another thread is scheduled, which potentially frees the locked field, etc.
So this means that the scheduler waits for the thread to use up its time slice before switching to the next one, no matter what the thread is doing.
b. My friend thinks that the waiting thread somehow tells the kernel "Hey, I'm asleep, don't wait for me at all".
In this case, the kernel would schedule the next thread right away, without waiting for the current thread to complete its schedule time, being aware this thread is sleeping.
From what I see in the pthread code, it seems there is a loop handling the lock. But maybe I missed something.
In embedded systems, it could make sense to prevent the kernel from waiting. So he may be right (but I hope he is not :D).
Thanks!
a. I think that during a pthread_mutex_lock, the thread actively waits.
Yes, glibc's NPTL pthread_mutex_lock has an active wait (spinning),
BUT the spinning is used only for a very short amount of time and only for some types of mutexes. After that, pthread_mutex_lock goes to sleep by calling the Linux futex syscall with the WAIT argument.
Only mutexes of type PTHREAD_MUTEX_ADAPTIVE_NP will spin; the default, PTHREAD_MUTEX_TIMED_NP (a normal mutex), does not spin. (Check MAX_ADAPTIVE_COUNT in the __pthread_mutex_lock sources.)
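If you want that spin-then-sleep behaviour explicitly, this is a sketch of how to request it (PTHREAD_MUTEX_ADAPTIVE_NP is a GNU extension; init_adaptive_mutex is just an example name):

#define _GNU_SOURCE            /* PTHREAD_MUTEX_ADAPTIVE_NP is a GNU extension */
#include <pthread.h>

static pthread_mutex_t m;

static void init_adaptive_mutex(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
}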
If you want to do infinite spinning (active waiting), use the pthread_spin_lock function with pthread_spinlock_t-type locks.
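A minimal sketch of that (use_spinlock is just an example name; keep the critical section very short):

#include <pthread.h>

static pthread_spinlock_t sl;

static void use_spinlock(void)
{
    pthread_spin_lock(&sl);    /* busy-waits until the lock is free */
    /* ... very short critical section ... */
    pthread_spin_unlock(&sl);
}

int main(void)
{
    pthread_spin_init(&sl, PTHREAD_PROCESS_PRIVATE);
    use_spinlock();
    pthread_spin_destroy(&sl);
    return 0;
}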
I'll consider the rest of your question as if you are using pthread_spin_lock:
Then, another thread is scheduled which potentially free the locked field, etc. So this means that the scheduler waits for the thread to complete its schedule time before switching to the next one, no matter what the thread is doing.
Yes, if there is contention for CPU cores, your actively spinning thread may block another thread from executing, even if that other thread is the one that will unlock the mutex (spinlock) your thread needs.
But if there is no contention (no thread oversubscription), and the threads are scheduled on different cores (by coincidence, or by manually setting CPU affinity with sched_setaffinity or pthread_setaffinity_np), spinning will let you proceed faster than going through the OS-based futex.
b. My friend thinks that the waiting thread somehow tells the kernel "Hey, I'm asleep, don't wait for me at all". In this case, the kernel would schedule the next thread right away, without waiting for the current thread to complete...
Yes, he is right.
futex is the modern way to tell the OS that this thread is waiting for some value in memory (for some mutex to be unlocked); in the current implementation, futex also puts the thread to sleep. There is no need to wake it up to spin if the kernel knows when to wake this thread. How does it know? The lock owner, when doing pthread_mutex_unlock, checks whether any other threads are sleeping on this mutex. If there are, the lock owner calls futex with FUTEX_WAKE, telling the OS to wake one of the threads registered as a sleeper on this mutex.
There is no need to spin if the thread registers itself as a waiter in the OS.
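To make the WAIT/WAKE dance concrete, here is a hedged, minimal sketch of a lock built directly on the futex syscall. This shows the idea only, not glibc's actual implementation; lock_word, my_lock and my_unlock are made-up names, and unlike a real mutex it naively wakes on every unlock instead of tracking whether anyone is waiting:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

static atomic_int lock_word;       /* 0 = unlocked, 1 = locked */

static void my_lock(void)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&lock_word, &expected, 1)) {
        /* sleep in the kernel as long as the word is still 1 */
        syscall(SYS_futex, &lock_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;
    }
}

static void my_unlock(void)
{
    atomic_store(&lock_word, 0);
    /* wake at most one sleeper registered on this address */
    syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0);
}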
Some debugging with gdb for this test program:
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;

void* thr_func(void *arg)
{
    pthread_mutex_lock(&x);   /* blocks forever: main() already holds x */
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t thr;
    pthread_mutex_lock(&x);
    pthread_create(&thr, NULL, thr_func, NULL);
    pthread_join(thr, NULL);
    return 0;
}
shows that a call to pthread_mutex_lock on a locked mutex results in a call to the futex system call with the op parameter set to FUTEX_WAIT (http://man7.org/linux/man-pages/man2/futex.2.html).
And this is the description of FUTEX_WAIT:
FUTEX_WAIT
This operation atomically verifies that the futex address uaddr still contains the value val, and sleeps awaiting FUTEX_WAKE on this futex address. If the timeout argument is non-NULL, its contents describe the maximum duration of the wait, which is infinite otherwise. The arguments uaddr2 and val3 are ignored.
So from this description I can say that if a mutex is locked, the thread will sleep rather than actively wait, and it will sleep until futex is called with op equal to FUTEX_WAKE.