All over StackOverflow and the net I see folks to distinguish mutexes and spinlocks as like mutex is a mutual exclusion lock providing acquire() and release() functions and if the lock is taken, then acquire() will allow a process to be preempted.
Nevertheless, A. Silberschatz in his Operating System Concepts says in the section 6.5:
... The simplest of these tools is the mutex lock. (In fact, the term mutex is short for mutual exclusion.) We use the mutex lock to protect critical sections and thus prevent race conditions. That is, a process must acquire the lock before entering a critical section; it releases the lock when it exits the critical section. The acquire() function acquires the lock, and the release() function releases the lock.
and then he describes a spinlock, though adding a bit later
The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.
so as spinlock is just a type of mutex lock as opposed to sleepable locks allowing a process to be preempted. That is, spinlocks and sleepable locks are all mutexes: locks by means of acquire() and release() functions.
I see totally logical to define mutex locks in the way Silberschatz did (though a bit implicitly).
What approach would you agree with?
The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.
Maybe you're misreading the book (that is, "The type of mutex lock we have been describing" might not refer to the exact passage you think it does), or the book is outdated. Modern terminology is quite clear in what a mutex is, but spinlocks get a bit muddy.
A mutex is a concurrency primitive that allows one agent at a time access to its resource, while the others have to wait in the meantime until it the exclusive access is released. How they wait is not specified and irrelevant, their process might go to sleep, get written to disk, spin in a loop, or perhaps you are using cooperative concurrency (also known as "asynchronous programming") and passing control to the event loop as your 'waiting operation'.
A spinlock does not have a clear definition. It might be used to refer to:
A synonym for mutex (this is in my opinion wrong, but it happens).
A specific mutex implementation that always waits in a busy loop.
Any sort of busy-waiting loop waiting for a resource. A semaphore, for example, might also get implemented using a 'spinlock'.
I would consider any use of the word to refer to a (part of a) specific implementation of a concurrency primitive that waits in a busy loop to be correct, if a more general term is not appropriate. That is, use mutex (or whatever primitive you desire) unless you specifically want to talk about a busy-waiting concurrency primitive.
The words that one author uses in one book or manual will not always have the same exact meaning in every book and every manual. The meanings of the words evolve over time, and it can happen fast when the words are names for new ideas.
Not every book was written at the same time. Not every author is the same age or had the same teachers. It's just something you'll have to get used to.
"Mutex" was a name for a new idea not so very long ago.
In one book, it might mean nothing more than a thing that keeps two or more threads from entering the same critical section at the same time. In another book, it might refer to a specific type of object in a certain operating system or library that is used for that same purpose.
A spinlock is a lock/mutex whose implementation relies mainly on a spinning loop.
More advanced locks/mutexes may have spinning parts in their implementation, however those often last for no more than a few microseconds or so.
When a lock (or try_lock) function of a mutex discovers that the mutex is already locked (by another thread (presumably)), could it try to determine whether the owning thread is (or was extremely recently) running on another core?
Knowing whether the owner is running gives an indication of a possible reason the thread still holds the lock (why it couldn't unlock it): if the thread owning the mutex is either "runnable" (waiting for an available core and time slice) or "sleeping" or waiting for I/O... then clearly spinning on the mutex is not useful.
I understand that non recursive, non "safe", non priority inheritance mutexes (that I call "anonymous" mutexes: mutexes that in practice can be unlocked by another thread as they are just common semaphores) carry as little information as possible, but surely that could be determined for those "owned" mutexes that know the identify of the locking thread.
Another more useful information would be more how much time it was locked, but that would potentially require adding a field in the mutex object.
Is that ever done?
I am a little bit confused about the two concepts.
definition of lock-free on wiki:
A non-blocking algorithm is lock-free if there is guaranteed
system-wide progress
definition of non-blocking:
an algorithm is called non-blocking if failure or suspension of any
thread cannot cause failure or suspension of another thread
I thought spinlock is lock-free, or at least non-blocking. But now I'm not sure. Because by definition, "spinlock is not lock-free" also makes sense to me. Like, if the thread holding the spinlock gets suspended, then it will cause suspension of other threads spinning outside. So, by definition, spinlock is not even non-blocking, let alone lock-free.
I'm so confused now. Can anyone explain it clearly?
Anything that can be called a lock (exclude other threads from a critical section until the current thread unlocks) is by definition not lock-free. And yes, spinlocks are a kind of lock.
If a thread sleeps while holding the lock, no other thread can acquire it and make forward progress, and spinlocks can't prevent this. The OS can de-schedule a thread whenever it wants, even if it's in the middle of a critical section.
Note that "lock-free" isn't the same thing as "wait-free", so a lock-free algorithm can still have stuff like cmpxchg retry loops, but as long as one thread succeeds every time, it's lock free.
A wait-free algorithm can't even have that, and at most has to wait for cache misses / hardware arbitration of contended atomic operations. Wikipedia's non-blocking algorithm article defines wait-free and lock-free in more detail.
I think you're mixing up two definitions of "blocking".
I think you're talking about a spin_trylock function that tries to acquire a spinlock, and returns with an error if it fails instead of spinning. So this is non-blocking in the same sense as non-blocking I/O: fail with an error instead of waiting for resource availability.
That doesn't mean any thread in the system is making forward progress on the thing protected by the spinlock. It just means your thread can go and do something else before trying again, instead of needing to use separate threads to do something in parallel with waiting to acquire a lock.
Spinning in an infinite loop counts as blocking / not-making-progress. For this definition, there's no difference between a pure spinlock and one that (with OS assistance) sleeps until another thread unlocks.
The definition of lock-free isn't concerned with wasting CPU time / power to make room for independent work to happen.
Somewhat related: acquiring an uncontended spinlock doesn't require a system call, which means it's a "light-weight" lock. Some lock implementations always use a (relatively slow) system call even in the uncontended case. See Jeff Preshing's Always Use a Lightweight Mutex article. Also read Jeff's other posts to learn more about lock-free programming, because they're excellent. So good in fact that the [lock-free] tag wiki links to them.
Does a mutex lock access to variables globally, or just those in the same scope as the locked mutex?
Note that I had to change the title of this question, as a lot of answers seem to be confused as to what I was asking. This is not a question about the scope (global or otherwise) of a "mutex object", it is a question about what scope of variables are "locked" by a mutex.
I believe the answer to be that a mutex locks access to all variables, ie; all global and locally scoped variables. (This is a result of a mutex blocking thread execution rather than access to specific regions of memory.)
I am attempting to understand Mutexes.
I was attempting to understand what sections of memory, or equivalently, which variables, a mutex would lock.
However my understanding from reading around online is that Mutexes do not lock memory, they lock (or block) simultaneously running threads which are all members of the same process. (Is that correct?)
https://mortoray.com/2011/12/16/how-does-a-mutex-work-what-does-it-cost/
So my question has become simply "are mutexes global?"
... or are they perhaps "generally speaking global, but the stackoverflow community can imagine some special cases in which they are not?"
When originally considering my question, I was interested in things such as those shown in the following example.
// both in global scope, this mutex will lock any global scope variable?
int global_variable;
mutex global_variable_mutex;
int main()
{
// one thread operates here and locks global_variable_mutex
// before reading/writing
{
// local variables in a loop
// launch some threads here, and wait later
int local_variable;
mutex local_variable_mutex;
// wait for launched thread to return
// does the mutex here prevent data races to the variable
// global_variable ???
}
}
One may assume this is pseudo-code for C++ or C, or any other similarly relevant language.
2021 edit: Question title has been changed to better reflect the contents of the question and associated answers.
So my question has become simply "are mutexes global?"
No. A mutex has a lock() and an unlock() method, and the only thing a mutex does is cause its lock() call (from any thread) not to return for as long as another thread has that mutex locked. When the thread that was holding the mutex locked calls unlock(), that is when the lock() call will return in the first thread. That way it is guaranteed that only a single thread will be holding the mutex-lock (i.e. executing in the region between its lock() call and its unlock() call) at any given time.
That's really all there is to it. So a mutex will effect only the threads that call lock() on that particular mutex, and nothing else.
Mutex stands for "Mutual Exclusion" - using one correctly ensures that only one thread at a time will ever be executing any "critical section" protected by the same mutex.
If there are some variables you only ever modify inside critical sections protected by the same mutex, your code doesn't have a data race. No matter whether they're global, static, or pointed to by different variables in different threads or any other way two threads might have a reference to the same object.
When I asked this question I was confused...
When I originally asked this question, I was confused because I had no conceputal understanding of how a "mutex" functions in hardware, whereas I did have a conceptual understanding of many other things that exist in hardware. (For example, how a compiler converts text into machine readable instructions. How cache and memory work. How graphics or coprocessors work. How network hardware and interfaces work, etc.)
Misconception 1: Mutex does not lock memory locations
When I first heard about Mutex, long before writing this question, I misunderstood a mutex to be a feature which locks regions of memory. (That region might be global.)
This is not what happens. Other threads and processes can continue to access main memory and cache if another thread locks a mutex. You can see immediatly why such a design would be inefficient, since it would block all other system processes, for the sake of synchronizing one.
Misconception 2: The scope in which a mutex object is declared is irrelevant
The context of this is C code, and C like languages where you have scoped blocks defined by { and } however the same logic could apply to Python where scope is defined by indentation.
I believe that this misunderstanding came from the existance of scoped_lock objects, and similar concepts where scope is used to manage the lifetime (locking and unlocking, resources) of a Mutex object.
One could also argue that since pointers and references to a Mutex can be passed around a program, the scope of a Mutex couldn't be used to define what variables are "locked" by a mutex.
For example, I misunderstood the following snippet:
{
int x, y, z;
Mutex m;
m.lock();
}
I believed that the above snippet would lock access to variables x, y and z from all other threads because x, y and z are declared in the same scope as the mutex m. This is also not how a mutex works.
Understanding 1: Mutex is typically implemented in hardware using atomic operations
Atomic operations are completely seperate from the concept of mutex, however they are a prerequisite to understanding how a mutex can exist, and how it can work.
When a CPU executes something like c = a + b, this involves a sequence of individual (atomic) operations. The word Atom is derived from Atomos meaning "indivisible", or "fundamental". (Atoms are divisible, but when theorists of Ancient Greece originally concieved of the objects from which matter was composed, they assumed that particles must be divisible down to some fundamental smallest possible component, which itself is indivisible. They were not too far wrong, since an atom is made from other fundamental particles which so far we understand to be indivisible.)
Returning to the point: c = a + b is something like the following:
load a from memory into register 1
load b from memory into register 2
do operation add: add contents of register 2 to register 1, result is in register 1
save register 1 to memory c
The add operation might take several clock cycles, and loading/saving to memory takes typically of order 100 clock cycles on modern x86 machines. However each operation is atomic in the sense that a single CPU instruction is being completed, and this instruction cannot be divided into any smaller step of smaller instructions. The instructions are themselves fundamental computing operations.
With that understood, there exists a set of atomic instructions which can do things such as:
load a value from memory increment it and save it to memory
load a value from memory decrement it and save it to memory
load a value from memory, compare it to a value which is already loaded into a register, and branch depending on the comparison result
Note that such operations are typically significantly slower than their non-atomic sequence counterparts. This is because optimizations such as pipelining are forfit when executing the above instructions. (I think?)
At this point my knowledge becomes a bit less accurate and more hand-wavey, but as far as I understand, these operations are typically implemented by having some digital logic inside the processor which blocks all other processes from running while these atomic operations (listed above) are executing.
Meaning: If there are 8 CPU cores running, if one core encounters an instruction like the above, it signals the other cores to stop running until it has finished that atomic operation. (It is at least something approximatly along these lines.)
Understanding 2: Actual mutex operation
Given the above, it is possible to implement a mutex using these atomic machine instructions. Other answers posted here suggest possible ways of doing it including something similar to reference counting. (Semaphore.)
How an acutal mutex in C++ works is this:
Each mutex object has a variable in memory associated with it, the value of this variable indicates whether a mutex is locked or not
This mutex variable is updated using the special atomic operations that a CPU supports for the purpose of allowing a mutex to be programmed
Elsewhere in memory there are some other variables/data which you want to protect/synchronize access to
This synchronization is done using the mutex variable/data
Before a thread reads/writes to some data/variable which needs to be accessed mutually exclusively by all threads which operate on it, that thread must first "lock" the special mutex data/variable
This is done using the atomic operations built into a CPU for the purpose of supporting mutex programming
So you see, the data which is "locked" and accessed mutually exclusively is entirely independent from the actual data used to store the state of the mutex.
If another thread wants to read/write the data which must be accessed mutually exclusively, it will try to lock the mutex. If the mutex is already locked, that means another thread has the right to access this data, and no other thread is permitted to, therefore this thread will typically go to sleep, and will be re-woken by the operating system when the mutex is next unlocked.
It is important to note the operating system thread (kernel) is critically involved in the mutex process. Typically, before a thread sleeps, it will tell the operating sytem that it wishes to be woken up again when the mutex is free. The operating system is also notified when other threads lock or unlock a mutex. Hence synchronization of information about the state of a mutex is passed via messages through the operating system kernel.
This is why writing a multiple thread OS kernel is (proabably) impossible (if not very difficult). I don't know if this has actually been done successfully. It sounds like a difficult problem which might be the subject of current CS research.
This is pretty much everything I know about the subject. Obviously my knowledge is not absolute...
Note: Feel free to correct my Greek history or x86 Machine Instruction knowledge in the comments section. No doubt not everything here is perfectly accurate.
As your question suggests, I assume you are asking your question independent of any programming language.
First it is important to understand what is a mutex and how it works? A mutex is a binary semaphore. Then what is a semaphore? A semaphore is an integer with following attributes,
You can initialize it into any permitted value (For a mutex, it is 1 or 0).
A thread can access the semaphore and it can increment or decrement its integer value.
When a thread decrements it,
If the result is positive or zero, that thread can continue its process.
If the result is negative, that thread will be waiting and the semaphore value will not be further decremented by any later thread.
If a thread increments it, (in that case semaphore value will be either positive or 0) and the result is 0, one of the waiting threads can continue execution.
So when there's a situation where a thread is trying to access a shared resource it will decrement the mutex value (from 0, so that other thread is waiting). And when it finishes, it will increment the mutex value (So that the waiting thread can continue). That's how the access control happens by means of a mutex (Binary semaphore).
I think you understand that your question is a non-applicable one here. As a simple answer for
So my question has become simply "are mutexes global?"
is simply NO.
A mutex has whatever scope you assign to it. It can be global or local again based on where and how you declare it. If for example you declare a mutex in global memory in a place where you can access it globally, then it is indeed global. If instead you declare it at function or private class scope level, then only that function or class will have access to it.
That said, in order to be useful for synchronization, the mutex needs to be declared in a scope that can be accessed by the threads needing to synchronize on it. Whether that's at global scope or some local scope depends on your program structure. I'd advise declaring it at the highest scope accessible to the threads but no higher.
In your particular example, the mutex is indeed global because you've declared it in global memory.
Locking doesn't operate on the variables it protects, it just works by giving threads a way to arrange that only one thread at a time will be doing something (like reading+writing a data structure). And that it will be finished, with memory effects visible, before the next thread's turn to read and maybe modify that data. (A readers+writers lock allows multiple readers but only one writer).
Any thread that can access the mutex object can lock / unlock it. The mutex object itself is a normal variable that you can put in any scope you want, even a local variable and then put a pointer to it somewhere that other threads can see. (Although normally you wouldn't do that.)
Mutex is named for "Mutual Exclusion" - using one correctly ensures that only one thread at a time will ever be executing any "critical section" (wikipedia) protected by the same mutex. Separate mutexes can allow different threads to hold different locks. Different functions or blocks that use the same mutex (normally because they access the same data) won't both run at once.
If there are some variables you only ever modify inside critical sections protected by the same mutex, those accesses won't be data race, and if you don't have other bugs, your code is thread-safe. No matter whether they're global, static, or pointed to by different variables in different threads or any other way two threads might have a reference to the same object.
If you write code that accesses shared data without taking a lock on a mutex, it might see a partially-updated value, especially for a struct with multiple pointers / integers. (And in C++, simultaneous accesses to non-atomic variables is undefined behaviour if they're not all reads).
Locking is a cooperative activity, normally nothing stops you from getting it wrong. If you're familiar with file locking, you may have heard of advisory vs. mandatory locks (the OS will deny open calls by other programs). Mutexes in multi-threaded programs are advisory; no memory protection or other hardware mechanism stops another thread from executing code that accesses the bytes of an object.
(At a low enough level, that's actually useful for lock-free atomics, especially with some control over ordering of those operations from memory barriers and/or release-store / acquire-load. And CPU cache hardware is up to the task of maintaining coherency from multiple accesses. But if you use locking, you don't have to worry about any of that. If you use locking incorrectly, understanding the possible symptoms might help identify that there is a locking problem.)
Some programs have phases where only a single thread is running, or only one that would need to touch certain variables, so enforced locking for every access to a variable isn't something that every language provides. (C++ std::atomic<T> is sort of like that; every access is as-if there was a lock/unlock of a lock protecting just that T object, except it's limited to operations that most CPUs can do without needing to lock/unlock a separate lock. Unless you use a large T, then there actually is a lock. Or if you use a memory order weaker than the default seq_cst, you can see orderings that wouldn't have been possible if all accesses acquiring/releasing locks.)
Besides, consistency between multiple variables is often important, so it matters that you hold one lock across multiple operations on multiple variables, or multiple members of the same struct.
Some tools can help detect code that doesn't respect a mutex while other threads are running, though, like clang -fsanitize=thread.
Back in my days as a BeOS programmer, I read this article by Benoit Schillings, describing how to create a "benaphore": a method of using atomic variable to enforce a critical section that avoids the need acquire/release a mutex in the common (no-contention) case.
I thought that was rather clever, and it seems like you could do the same trick on any platform that supports atomic-increment/decrement.
On the other hand, this looks like something that could just as easily be included in the standard mutex implementation itself... in which case implementing this logic in my program would be redundant and wouldn't provide any benefit.
Does anyone know if modern locking APIs (e.g. pthread_mutex_lock()/pthread_mutex_unlock()) use this trick internally? And if not, why not?
What your article describes is in common use today. Most often it's called "Critical Section", and it consists of an interlocked variable, a bunch of flags and an internal synchronization object (Mutex, if I remember correctly). Generally, in the scenarios with little contention, the Critical Section executes entirely in user mode, without involving the kernel synchronization object. This guarantees fast execution. When the contention is high, the kernel object is used for waiting, which releases the time slice conductive for faster turnaround.
Generally, there is very little sense in implementing synchronization primitives in this day and age. Operating systems come with a big variety of such objects, and they are optimized and tested in significantly wider range of scenarios than a single programmer can imagine. It literally takes years to invent, implement and test a good synchronization mechanism. That's not to say that there is no value in trying :)
Java's AbstractQueuedSynchronizer (and its sibling AbstractQueuedLongSynchronizer) works similarly, or at least it could be implemented similarly. These types form the basis for several concurrency primitives in the Java library, such as ReentrantLock and FutureTask.
It works by way of using an atomic integer to represent state. A lock may define the value 0 as unlocked, and 1 as locked. Any thread wishing to acquire the lock attempts to change the lock state from 0 to 1 via an atomic compare-and-set operation; if the attempt fails, the current state is not 0, which means that the lock is owned by some other thread.
AbstractQueuedSynchronizer also facilitates waiting on locks and notification of conditions by maintaining CLH queues, which are lock-free linked lists representing the line of threads waiting either to acquire the lock or to receive notification via a condition. Such notification moves one or all of the threads waiting on the condition to the head of the queue of those waiting to acquire the related lock.
Most of this machinery can be implemented in terms of an atomic integer representing the state as well as a couple of atomic pointers for each waiting queue. The actual scheduling of which threads will contend to inspect and change the state variable (via, say, AbstractQueuedSynchronizer#tryAcquire(int)) is outside the scope of such a library and falls to the host system's scheduler.