I was studying the raw_spinlock struct, which is in /usr/src/linux/include/linux/spinlock_types.h:
typedef struct raw_spinlock {
arch_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu;
void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif
} raw_spinlock_t;
I think raw_lock is an architecture-dependent lock and dep_map is a kind of data structure for avoiding deadlocks, but what do break_lock, magic, owner_cpu, and *owner mean?
spinlock
spinlock is the public API for spinlocks in kernel code.
See Documentation/locking/spinlocks.txt.
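For example, typical usage looks like this (a minimal sketch, not code from the kernel tree; DEFINE_SPINLOCK and spin_lock_irqsave are the real API):
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);   /* declares and initializes a spinlock_t */

static void example_critical_section(void)
{
    unsigned long flags;

    spin_lock_irqsave(&my_lock, flags);     /* take the lock and disable local IRQs */
    /* ... touch data shared with other CPUs / IRQ handlers ... */
    spin_unlock_irqrestore(&my_lock, flags);
}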
raw_spinlock
raw_spinlock is the actual implementation of normal spinlocks. On non-RT kernels, spinlock is just a wrapper for raw_spinlock. On RT kernels, spinlock doesn't always use raw_spinlock.
See this article on LWN.
arch_spinlock
arch_spinlock is the platform-specific part of the spinlock implementation. raw_spinlock is generally platform-independent and delegates the low-level operations to arch_spinlock.
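As a rough illustration of that delegation, the non-debug lock path boils down to something like this sketch (simplified; the real do_raw_spin_lock() adds annotations and other details):
/* Simplified: the generic raw_spinlock layer hands the work to the arch code. */
static inline void raw_spin_lock_sketch(raw_spinlock_t *lock)
{
    arch_spin_lock(&lock->raw_lock);   /* arch-specific test-and-set / ticket / qspinlock */
}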
lockdep_map
lockdep_map is a dependency map for the locking correctness validator (lockdep).
See Documentation/locking/lockdep-design.txt.
break_lock
On SMP kernels, when spin_lock() on one CPU starts looping because the lock is held on another CPU, it sets this flag to 1. The CPU that holds the lock can periodically check this flag using spin_is_contended() and, if it is set, call spin_unlock() to let the waiter in.
This achieves two goals at the same time:
avoid frequent locking/unlocking;
avoid holding the lock for a long time and preventing others from acquiring it.
See also this article.
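As an illustration (a hedged sketch, not kernel code; have_more_items() and process_one_item() are hypothetical stand-ins for real work), a lock holder can poll for contention like this:
#include <linux/spinlock.h>

extern bool have_more_items(void);    /* hypothetical helper */
extern void process_one_item(void);   /* hypothetical helper */

static void process_items(spinlock_t *lock)
{
    spin_lock(lock);
    while (have_more_items()) {
        process_one_item();
        if (spin_is_contended(lock)) {   /* another CPU is spinning on this lock */
            spin_unlock(lock);           /* give it a chance to get in */
            cpu_relax();
            spin_lock(lock);
        }
    }
    spin_unlock(lock);
}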
magic, owner, owner_cpu
These fields are enabled when CONFIG_DEBUG_SPINLOCK is set and help to detect common bugs:
magic is set to an arbitrarily chosen constant when the spinlock is created (SPINLOCK_MAGIC, which is 0xdead4ead);
owner is set to the current process in spin_lock();
owner_cpu is set to the current CPU id in spin_lock().
spin_unlock() checks that it is called by the same process and on the same CPU as the ones recorded by spin_lock().
spin_lock() checks that magic is equal to SPINLOCK_MAGIC to ensure that the caller passed a pointer to a correctly initialized spinlock and that (hopefully) no memory corruption has occurred.
See kernel/locking/spinlock_debug.c.
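For reference, the unlock-time checks look roughly like this (a simplified sketch modeled on debug_spin_unlock() in that file):
static void debug_spin_unlock_sketch(raw_spinlock_t *lock)
{
    SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
    SPIN_BUG_ON(!raw_spin_is_locked(lock), lock, "already unlocked");
    SPIN_BUG_ON(lock->owner != current, lock, "wrong owner");
    SPIN_BUG_ON(lock->owner_cpu != raw_smp_processor_id(), lock, "wrong CPU");
}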
Related
In Linux, I have a scenario where two threads execute a critical section: one (thread A) acquires the lock and the other (thread B) waits for it. Later, thread A releases the mutex. I am trying to understand how thread B is moved to the running state and acquires the lock. How does thread B (or the operating system) know that the lock has been released by thread A?
I have a theory; please correct me if I am wrong. Thread B enters the TASK_INTERRUPTIBLE state (blocked on the mutex and so waiting), and it receives a signal when thread A unlocks the mutex, so it goes back to the run queue (TASK_RUNNING).
The Linux mutex struct keeps track of the current owner of the mutex (if any):
struct mutex {
atomic_long_t owner;
// ...
};
There's also a struct to keep track of what other tasks are waiting on a mutex:
/*
* This is the control structure for tasks blocked on mutex,
* which resides on the blocked task's kernel stack:
*/
struct mutex_waiter {
struct list_head list;
struct task_struct *task;
struct ww_acquire_ctx *ww_ctx;
#ifdef CONFIG_DEBUG_MUTEXES
void *magic;
#endif
};
Simplifying quite a bit: when you unlock a mutex, the kernel looks at which other tasks are waiting on that mutex. It picks one of them to become the owner, sets the mutex's owner field to refer to the selected task, and removes that task from the list of tasks waiting for the mutex. At that point the selected task is unblocked and ready to run; it's then up to the scheduler to decide when to actually run it.
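Very roughly, the slow path of the unlock looks something like this sketch (not the actual kernel code, which lives in kernel/locking/mutex.c and also handles handoff and optimistic spinning):
static void mutex_unlock_slowpath_sketch(struct mutex *lock)
{
    struct task_struct *next = NULL;

    raw_spin_lock(&lock->wait_lock);            /* protects wait_list */
    if (!list_empty(&lock->wait_list)) {
        struct mutex_waiter *waiter =
            list_first_entry(&lock->wait_list, struct mutex_waiter, list);
        next = waiter->task;                    /* candidate new owner */
    }
    raw_spin_unlock(&lock->wait_lock);

    if (next)
        wake_up_process(next);                  /* scheduler decides when it runs */
}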
Optimization
Since mutexes are used a lot, and they get locked and unlocked quite a bit, they use some optimizations to help with speed. For example, consider the following:
/*
* @owner: contains: 'struct task_struct *' to the current lock owner,
* NULL means not owned. Since task_struct pointers are aligned at
* at least L1_CACHE_BYTES, we have low bits to store extra state.
*
* Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
* Bit1 indicates unlock needs to hand the lock to the top-waiter
* Bit2 indicates handoff has been done and we're waiting for pickup.
*/
#define MUTEX_FLAG_WAITERS 0x01
#define MUTEX_FLAG_HANDOFF 0x02
#define MUTEX_FLAG_PICKUP 0x04
#define MUTEX_FLAGS 0x07
So, when you ask the kernel to unlock a mutex, it can "glance" at one bit in the owner pointer to figure out whether this is a "simple" case (nobody's waiting on the mutex, so just mark it as unlocked, and off we go), or a more complex one (at least one task is waiting on the mutex, so a task needs to be selected to be unblocked and marked as the new owner of the mutex).
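As an illustration of that "glance" (a hedged sketch; the helper name is made up, but the bit layout is the one quoted above):
static bool unlock_needs_wakeup_sketch(struct mutex *lock)
{
    unsigned long val = atomic_long_read(&lock->owner);
    struct task_struct *owner = (struct task_struct *)(val & ~MUTEX_FLAGS);

    (void)owner;                        /* the aligned pointer part */
    return val & MUTEX_FLAG_WAITERS;    /* waiters present: take the slow path */
}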
References
https://github.com/torvalds/linux/blob/master/include/linux/mutex.h
https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c
Disclaimer
The code extracts above are (I believe) current as I write this answer. But as noted above, mutexes get used a lot. If you look at the code for a mutex 5 or 10 years from now, chances are you'll find that somebody has done some work on optimizing the code, so it may not precisely match what I've quoted above. Most of the concepts are likely to remain similar, but changes in details (especially the optimizations) are to be expected.
I'm working on porting some existing Windows code over to Linux, and I've come across something I'm not entirely sure how to handle.
The code is originally RTX Windows and must be deterministic. The first thing I've come across is a structure that contains semaphore and mutex objects and sets up pointers to the mutex and semaphore to be passed around/used by other callers.
volatile struct mystruct {
volatile pthread_mutex_t *qmutexid;
volatile sem_t *qsemid;
volatile int processID;
volatile int msize;
volatile char msgarray[];
};
This struct is cast over a large piece of memory that has data coming in and out of it via a linked-list queue, but the semaphore and mutex are a necessity to ensure integrity.
What I want to know is whether the following assignment for the pointer is valid.
myfunctioninit (*qname, msg_size, depth)
{
struct mystruct struct1;
pthread_mutex_t mutexQueAccess;
int status;

status = pthread_mutex_init(&mutexQueAccess, NULL);
struct1.qmutexid = &mutexQueAccess;
}
The other part of this is that mutexes in Windows are assigned/accessed by name. Other processes need access to this mutex; how do I go about making the mutex shareable across multiple processes/threads?
Say for example, I have an exclusive atomic-ops-based spin lock implementation as below:
bool TryLock(volatile TInt32 * pFlag)
{
return !(AtomicOps::Exchange32(pFlag, 1) == 1);
}
void Lock (volatile TInt32 * pFlag)
{
while (AtomicOps::Exchange32(pFlag, 1) == 1) {
AtomicOps::ThreadYield();
}
}
void Unlock (volatile TInt32 * pFlag)
{
*pFlag = 0; // is this OK, or is atomicity needed here as well for the load and store?
}
Where AtomicOps::Exchange32 is implemented on Windows using InterlockedExchange and on Linux using __atomic_exchange_n.
In most cases, for releasing the resource, just resetting the lock to zero (as you do) is almost OK (e.g. on an Intel Core processor), but you also need to make sure that the compiler will not reorder instructions (see below; see also g-v's post). If you want to be rigorous (and portable), there are two things that need to be considered:
What the compiler does: it may reorder instructions when optimizing the code, and thus introduce subtle bugs if it is not "aware" of the multithreaded nature of the code. To avoid that, it is possible to insert a compiler barrier.
What the processor does: some processors (like Intel Itanium, used in professional servers, or ARM processors used in smartphones) have a so-called "relaxed memory model". In practice, it means that the processor may decide to change the order of the operations. Again, this can be avoided by using special instructions (load barrier and store barrier). For instance, on an ARM processor, the DMB instruction ensures that all store operations are completed before the next instruction (and it needs to be inserted in the function that releases a lock).
Conclusion: it is very tricky to make this kind of code correct. If you have compiler/OS support for these functionalities (e.g. stdatomic.h, or std::atomic in C++0x), it is much better to rely on them than to write your own (but sometimes you have no choice). In the specific case of a standard Intel Core processor, I think that what you do is correct, provided you insert a compiler barrier in the release operation (see g-v's post).
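For example, a minimal C11 <stdatomic.h> spinlock sketch looks like this (illustrative names, not a library API); the _explicit memory orders provide both the processor and the compiler ordering discussed above:
#include <stdatomic.h>

typedef struct { atomic_flag flag; } spin_t;

static spin_t example_lock = { ATOMIC_FLAG_INIT };

static void spin_acquire(spin_t *l)
{
    /* test-and-set with acquire ordering (also acts as a compiler barrier) */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;  /* spin; a real implementation would add a pause/yield here */
}

static void spin_release(spin_t *l)
{
    /* release ordering: prior writes become visible before the lock appears free */
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}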
On compile-time versus run-time memory ordering, see: https://en.wikipedia.org/wiki/Memory_ordering
My code for some atomic / spinlocks implemented on different architectures:
http://alice.loria.fr/software/geogram/doc/html/atomics_8h.html
(but I'm unsure it's 100 % correct)
You need two memory barriers in spinlock implementation:
"acquire barrier" or "import barrier" in TryLock() and Lock(). It forces operations issued while spinlock is acquired to be visible only after pFlag value is updated.
"release barrier" or "export barrier" in Unlock(). It forces operations issued until spinlock was released to be visible before pFlag value is updated.
You also need two compiler barriers for the same reasons.
See this article for details.
This approach is for the generic case. On x86/64:
there are no separate acquire/release barriers, only a single full barrier (memory fence);
there is no need for memory barriers here at all, since this architecture is strongly ordered;
you still need compiler barriers.
More details are provided here.
Below is an example implementation using GCC atomic builtins. It will work for all architectures supported by GCC:
it will insert acquire/release memory barriers on architectures where they are required (or full barrier if acquire/release barriers are not supported but architecture is weakly ordered);
it will insert compiler barriers on all architectures.
Code:
bool TryLock(volatile bool* pFlag)
{
// acquire memory barrier and compiler barrier
return !__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE);
}
void Lock(volatile bool* pFlag)
{
for (;;) {
// acquire memory barrier and compiler barrier
if (!__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE)) {
return;
}
// relaxed waiting, usually no memory barriers (optional)
while (__atomic_load_n(pFlag, __ATOMIC_RELAXED)) {
CPU_RELAX();
}
}
}
void Unlock(volatile bool* pFlag)
{
// release memory barrier and compiler barrier
__atomic_clear(pFlag, __ATOMIC_RELEASE);
}
For the "relaxed waiting" loop, see this and this question.
See also Linux kernel memory barriers as a good reference.
In your implementation:
Lock() calls AtomicOps::Exchange32(), which already includes a compiler barrier and perhaps an acquire or full memory barrier (we don't know, because you didn't provide the actual arguments to __atomic_exchange_n()).
Unlock() is missing both the memory barrier and the compiler barrier, so it's broken (see the sketch below).
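For example, on the GCC/Clang side a fixed Unlock() could look like this sketch (TInt32 is the typedef from the question; on Windows a release store or InterlockedExchange, which is a full barrier, would serve the same purpose):
void Unlock(volatile TInt32 *pFlag)
{
    // release store: provides both the memory barrier and the compiler barrier
    __atomic_store_n(pFlag, 0, __ATOMIC_RELEASE);
}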
Also consider using pthread_spin_lock() if it is an option.
typedef struct { int counter; } atomic_t;
What does atomic_t mean? How does the compiler treat it? Historically, counter has been declared volatile, which implies it's a CPU register, right?
The reason it is declared as a struct like that is so that the programmer using it is forced (gently reminded, rather) to use the access functions to manipulate it. For example, aval = 27 would not compile. Neither would aval++.
The volatile keyword has always meant the opposite of a CPU register: it means a value that has to be read from and written to memory directly.
If counter was historically declared volatile, that was wrong, because volatile has never been good enough on its own to ensure proper atomic updates. I believe that the current atomic manipulator functions use a cast through a volatile pointer combined with the appropriate barrier functions, plus machine code for the operations that the compiler cannot do properly.
atomic_t indicates it's an atomic type. The compiler treats it as a typedef'd struct. I don't know what history says, but volatile is usually used to prevent certain compiler optimizations, and it doesn't imply a CPU register.
Well, as its name implies, all of its operations are atomic, i.e. done at once, and can't be scheduled out partway. atomic_t has a set of helpers (like atomic_{inc,dec}, atomic_or, and many others) for manipulating atomic data. During manipulation of an atomic type, the helpers usually insert a bus lock, so they are not interrupted and the whole operation is atomic.
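For example, in kernel code the helpers are used roughly like this (a minimal sketch):
#include <linux/atomic.h>

static atomic_t refs = ATOMIC_INIT(0);

static void atomic_usage_sketch(void)
{
    atomic_inc(&refs);                  /* refs.counter += 1, atomically */
    atomic_or(0x4, &refs);              /* atomic bitwise OR on the counter */
    if (atomic_dec_and_test(&refs))     /* decrement and test for zero in one step */
        ;                               /* last reference dropped */
    /* refs = 27; would not compile: use atomic_set(&refs, 27) instead */
}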
How does the current implementation of semaphores work? Does it use spinlocks or signals?
How does the scheduler know which one to invoke if signals are used?
Also, how does it work in user space? Kernel locking recommends spinlocks, but user space does not. So are the implementations of semaphores different in user space and kernel space?
Use the power of Open Source - just look at the source code.
The kernel-space semaphore is defined as
struct semaphore {
raw_spinlock_t lock;
unsigned int count;
struct list_head wait_list;
};
lock is used to protect count and wait_list.
All tasks waiting on a semaphore reside in wait_list. When the semaphore is upped, one task is woken up.
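A simplified sketch of what up() does (modeled on kernel/semaphore.c; the semaphore_waiter struct is internal to that file):
void up_sketch(struct semaphore *sem)
{
    unsigned long flags;

    raw_spin_lock_irqsave(&sem->lock, flags);      /* protect count and wait_list */
    if (list_empty(&sem->wait_list)) {
        sem->count++;                              /* nobody waiting: just bump the count */
    } else {
        /* wake the first waiter; its down() call will then return */
        struct semaphore_waiter *waiter =
            list_first_entry(&sem->wait_list, struct semaphore_waiter, list);
        list_del(&waiter->list);
        waiter->up = true;
        wake_up_process(waiter->task);
    }
    raw_spin_unlock_irqrestore(&sem->lock, flags);
}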
User-space semaphores rely on the semaphore-related system calls the kernel provides. The kernel's definition of a user-space semaphore is:
/* One semaphore structure for each semaphore in the system. */
struct sem {
int semval; /* current value */
int sempid; /* pid of last operation */
spinlock_t lock; /* spinlock for fine-grained semtimedop */
struct list_head sem_pending; /* pending single-sop operations */
};
The kernel's definition of the user-space semaphore is similar to the kernel-space one. sem_pending is a list of waiting processes plus some additional info.
I should highlight again that neither the kernel-space semaphore nor the user-space one uses a spinlock to wait for the lock. The spinlock is included in both structures only to protect the structure members from concurrent access. After the structure is modified, the spinlock is released and the task rests in the list until it is woken.
Furthermore, spinlocks are unsuitable for waiting on an event from another thread: before acquiring a spinlock, the kernel disables preemption, so on a uniprocessor machine the thread that should release the lock would never get to run.
I should also note that user-space semaphores, while serving user space, execute in kernel space.
P.S. Source code for the kernel-space semaphore resides in include/linux/semaphore.h and kernel/semaphore.c, and for the user-space one in ipc/sem.c.