How does a Mutex work? Does a mutex protect variables globally? Does the scope in which it is defined matter?

Does a mutex lock access to variables globally, or just those in the same scope as the locked mutex?
Note that I had to change the title of this question, as a lot of answers seem to be confused as to what I was asking. This is not a question about the scope (global or otherwise) of a "mutex object", it is a question about what scope of variables are "locked" by a mutex.
I believe the answer to be that a mutex locks access to all variables, i.e., all global and locally scoped variables. (This is a result of a mutex blocking thread execution rather than access to specific regions of memory.)
I am attempting to understand Mutexes.
I was attempting to understand what sections of memory, or equivalently, which variables, a mutex would lock.
However my understanding from reading around online is that Mutexes do not lock memory, they lock (or block) simultaneously running threads which are all members of the same process. (Is that correct?)
https://mortoray.com/2011/12/16/how-does-a-mutex-work-what-does-it-cost/
So my question has become simply "are mutexes global?"
... or are they perhaps "generally speaking global, but the stackoverflow community can imagine some special cases in which they are not?"
When originally considering my question, I was interested in things such as those shown in the following example.
// both in global scope, this mutex will lock any global scope variable?
int global_variable;
mutex global_variable_mutex;

int main()
{
    // one thread operates here and locks global_variable_mutex
    // before reading/writing
    {
        // local variables in a loop
        // launch some threads here, and wait later
        int local_variable;
        mutex local_variable_mutex;
        // wait for launched thread to return
        // does the mutex here prevent data races to the variable
        // global_variable ???
    }
}
One may assume this is pseudo-code for C++ or C, or any other similarly relevant language.
2021 edit: Question title has been changed to better reflect the contents of the question and associated answers.

So my question has become simply "are mutexes global?"
No. A mutex has a lock() and an unlock() method, and the only thing a mutex does is cause its lock() call (from any thread) not to return for as long as another thread has that mutex locked. When the thread that was holding the mutex locked calls unlock(), that is when the lock() call will return in the first thread. That way it is guaranteed that only a single thread will be holding the mutex-lock (i.e. executing in the region between its lock() call and its unlock() call) at any given time.
That's really all there is to it. So a mutex will affect only the threads that call lock() on that particular mutex, and nothing else.
Mutex stands for "Mutual Exclusion" - using one correctly ensures that only one thread at a time will ever be executing any "critical section" protected by the same mutex.
If there are some variables you only ever access inside critical sections protected by the same mutex, your code doesn't have a data race on them. This holds no matter whether they're global, static, or pointed to by different variables in different threads, or reachable in any other way that lets two threads have a reference to the same object.

When I originally asked this question, I was confused because I had no conceptual understanding of how a "mutex" functions in hardware, whereas I did have a conceptual understanding of many other things that exist in hardware. (For example, how a compiler converts text into machine readable instructions. How cache and memory work. How graphics or coprocessors work. How network hardware and interfaces work, etc.)
Misconception 1: Mutex does not lock memory locations
When I first heard about Mutex, long before writing this question, I misunderstood a mutex to be a feature which locks regions of memory. (That region might be global.)
This is not what happens. Other threads and processes can continue to access main memory and cache if another thread locks a mutex. You can see immediately why such a design would be inefficient, since it would block all other system processes for the sake of synchronizing one.
Misconception 2: The scope in which a mutex object is declared is irrelevant
The context of this is C code and C-like languages, where scoped blocks are defined by { and }. However, the same logic could apply to Python, where scope is defined by indentation.
I believe that this misunderstanding came from the existence of scoped_lock objects and similar concepts, where scope is used to manage the lifetime (locking and unlocking, resources) of a Mutex object.
One could also argue that since pointers and references to a Mutex can be passed around a program, the scope of a Mutex couldn't be used to define what variables are "locked" by a mutex.
For example, I misunderstood the following snippet:
{
    int x, y, z;
    Mutex m;
    m.lock();
}
I believed that the above snippet would lock access to variables x, y and z from all other threads because x, y and z are declared in the same scope as the mutex m. This is also not how a mutex works.
Understanding 1: Mutex is typically implemented in hardware using atomic operations
Atomic operations are completely separate from the concept of mutex, however they are a prerequisite to understanding how a mutex can exist, and how it can work.
When a CPU executes something like c = a + b, this involves a sequence of individual (atomic) operations. The word Atom is derived from Atomos meaning "indivisible", or "fundamental". (Atoms are divisible, but when theorists of Ancient Greece originally conceived of the objects from which matter was composed, they assumed that particles must be divisible down to some fundamental smallest possible component, which itself is indivisible. They were not too far wrong, since an atom is made from other fundamental particles which so far we understand to be indivisible.)
Returning to the point: c = a + b is something like the following:
load a from memory into register 1
load b from memory into register 2
do operation add: add contents of register 2 to register 1, result is in register 1
save register 1 to memory c
The add operation might take several clock cycles, and loading/saving to memory typically takes of order 100 clock cycles on modern x86 machines. However each operation is atomic in the sense that a single CPU instruction is being completed, and this instruction cannot be divided into a sequence of smaller instructions. The instructions are themselves fundamental computing operations.
With that understood, there exists a set of atomic instructions which can do things such as:
load a value from memory increment it and save it to memory
load a value from memory decrement it and save it to memory
load a value from memory, compare it to a value which is already loaded into a register, and branch depending on the comparison result
Note that such operations are typically significantly slower than their non-atomic sequence counterparts. This is because optimizations such as pipelining are forfeited when executing the above instructions. (I think?)
At this point my knowledge becomes a bit less accurate and more hand-wavey, but as far as I understand, these operations are typically implemented by having some digital logic inside the processor which blocks all other processes from running while these atomic operations (listed above) are executing.
Meaning: If there are 8 CPU cores running, if one core encounters an instruction like the above, it signals the other cores to stop running until it has finished that atomic operation. (It is at least something approximately along these lines.)
Understanding 2: Actual mutex operation
Given the above, it is possible to implement a mutex using these atomic machine instructions. Other answers posted here suggest possible ways of doing it including something similar to reference counting. (Semaphore.)
How an actual mutex in C++ works is this:
Each mutex object has a variable in memory associated with it, the value of this variable indicates whether a mutex is locked or not
This mutex variable is updated using the special atomic operations that a CPU supports for the purpose of allowing a mutex to be programmed
Elsewhere in memory there are some other variables/data which you want to protect/synchronize access to
This synchronization is done using the mutex variable/data
Before a thread reads/writes to some data/variable which needs to be accessed mutually exclusively by all threads which operate on it, that thread must first "lock" the special mutex data/variable
This is done using the atomic operations built into a CPU for the purpose of supporting mutex programming
So you see, the data which is "locked" and accessed mutually exclusively is entirely independent from the actual data used to store the state of the mutex.
If another thread wants to read/write the data which must be accessed mutually exclusively, it will try to lock the mutex. If the mutex is already locked, that means another thread has the right to access this data, and no other thread is permitted to, therefore this thread will typically go to sleep, and will be re-woken by the operating system when the mutex is next unlocked.
It is important to note that the operating system (kernel) is critically involved in the mutex process. Typically, before a thread sleeps, it will tell the operating system that it wishes to be woken up again when the mutex is free. The operating system is also notified when other threads lock or unlock a mutex. Hence synchronization of information about the state of a mutex is passed via messages through the operating system kernel.
This is why writing a multithreaded OS kernel is (probably) impossible (if not very difficult). I don't know if this has actually been done successfully. It sounds like a difficult problem which might be the subject of current CS research.
This is pretty much everything I know about the subject. Obviously my knowledge is not absolute...
Note: Feel free to correct my Greek history or x86 Machine Instruction knowledge in the comments section. No doubt not everything here is perfectly accurate.

As your question suggests, I assume you are asking your question independent of any programming language.
First it is important to understand what a mutex is and how it works. A mutex is a binary semaphore. What, then, is a semaphore? A semaphore is an integer with the following attributes:
You can initialize it into any permitted value (For a mutex, it is 1 or 0).
A thread can access the semaphore and it can increment or decrement its integer value.
When a thread decrements it,
If the result is positive or zero, that thread can continue its process.
If the result is negative, that thread will be waiting and the semaphore value will not be further decremented by any later thread.
If a thread increments it, (in that case semaphore value will be either positive or 0) and the result is 0, one of the waiting threads can continue execution.
So when a thread is trying to access a shared resource, it will decrement the mutex value (any other thread that then decrements it from 0 has to wait). And when it finishes, it will increment the mutex value (so that the waiting thread can continue). That's how access control happens by means of a mutex (binary semaphore).
I think you can see that your question does not really apply here. The simple answer to
So my question has become simply "are mutexes global?"
is NO.

A mutex has whatever scope you assign to it. It can be global or local again based on where and how you declare it. If for example you declare a mutex in global memory in a place where you can access it globally, then it is indeed global. If instead you declare it at function or private class scope level, then only that function or class will have access to it.
That said, in order to be useful for synchronization, the mutex needs to be declared in a scope that can be accessed by the threads needing to synchronize on it. Whether that's at global scope or some local scope depends on your program structure. I'd advise declaring it at the highest scope accessible to the threads but no higher.
In your particular example, the mutex is indeed global because you've declared it in global memory.

Locking doesn't operate on the variables it protects, it just works by giving threads a way to arrange that only one thread at a time will be doing something (like reading+writing a data structure). And that it will be finished, with memory effects visible, before the next thread's turn to read and maybe modify that data. (A readers+writers lock allows multiple readers but only one writer).
Any thread that can access the mutex object can lock / unlock it. The mutex object itself is a normal variable that you can put in any scope you want, even a local variable and then put a pointer to it somewhere that other threads can see. (Although normally you wouldn't do that.)
Mutex is named for "Mutual Exclusion" - using one correctly ensures that only one thread at a time will ever be executing any "critical section" (wikipedia) protected by the same mutex. Separate mutexes can allow different threads to hold different locks. Different functions or blocks that use the same mutex (normally because they access the same data) won't both run at once.
If there are some variables you only ever modify inside critical sections protected by the same mutex, those accesses won't be a data race, and if you don't have other bugs, your code is thread-safe. This holds no matter whether they're global, static, or pointed to by different variables in different threads, or reachable in any other way that lets two threads have a reference to the same object.
If you write code that accesses shared data without taking a lock on a mutex, it might see a partially-updated value, especially for a struct with multiple pointers / integers. (And in C++, simultaneous accesses to non-atomic variables are undefined behaviour if they're not all reads).
Locking is a cooperative activity, normally nothing stops you from getting it wrong. If you're familiar with file locking, you may have heard of advisory vs. mandatory locks (the OS will deny open calls by other programs). Mutexes in multi-threaded programs are advisory; no memory protection or other hardware mechanism stops another thread from executing code that accesses the bytes of an object.
(At a low enough level, that's actually useful for lock-free atomics, especially with some control over ordering of those operations from memory barriers and/or release-store / acquire-load. And CPU cache hardware is up to the task of maintaining coherency from multiple accesses. But if you use locking, you don't have to worry about any of that. If you use locking incorrectly, understanding the possible symptoms might help identify that there is a locking problem.)
Some programs have phases where only a single thread is running, or only one that would need to touch certain variables, so enforced locking for every access to a variable isn't something that every language provides. (C++ std::atomic<T> is sort of like that; every access is as-if there was a lock/unlock of a lock protecting just that T object, except it's limited to operations that most CPUs can do without needing to lock/unlock a separate lock. Unless you use a large T, then there actually is a lock. Or if you use a memory order weaker than the default seq_cst, you can see orderings that wouldn't have been possible if all accesses were acquiring/releasing locks.)
Besides, consistency between multiple variables is often important, so it matters that you hold one lock across multiple operations on multiple variables, or multiple members of the same struct.
Some tools can help detect code that doesn't respect a mutex while other threads are running, though, like clang -fsanitize=thread.

Related

What does a mutex lock?

In every tutorial about mutexes, a mutex is described as a way to prevent, for example, multiple threads from accessing the same resources at the same time. But what are those resources? I know that the resources can be a lot of things, like for example variables, but how do I define those variables that shouldn't be used at the same time by another thread? How does a mutex know which variables to "lock"? I don't understand how the compiler can know before executing the code what the mutex should lock between the functions mutex.lock and mutex.release.

The answer depends on how you want to think about it.
At a low level, a mutex locks nothing but itself. Two threads will never be allowed to lock the same mutex at the same time. End of story.
At a higher level, a mutex locks whatever data you want to lock with it. A mutex is a type of advisory lock. It's like a sign hanging on a door knob that says, "in-use, do not enter." It will keep out whoever respects the sign, but it has no actual power to keep anybody out.
If you have some data shared by several threads, and if you don't want any two threads to ever access* the data at the same time, then you can set up a mutex, and declare that, "None shall access these data unless they have the mutex locked." That declaration is what @Wyck called a "protocol" in a comment, above.
It's up to you to ensure that no thread in your program ever accesses the data without holding the mutex locked. I.e., it's up to you to ensure that your code obeys the protocol.
Also note! Nowhere did I mention "method" or "function." There's never any inherent benefit to locking a method or a function. It's always about the data that the method or the function accesses.
* "Access" doesn't just mean "update." If one thread merely tries to read the data while some other thread is in the middle of updating it, the reading thread could see an inconsistent or invalid snapshot of the data, and it could make arbitrarily bad decisions based on what it saw. The consequences could be fatal to the process, or worse.

Mutexes. What even?

I am learning about computer architecture and how operating systems work. I have a few questions about how mutexes work.
Question 1
add_to_list(&list, &elem):
    mutex m;
    lock_mutex(m);
    ...
remove_from_list(&list):
    mutex m;
    lock_mutex(m);
    ...
These two functions instantiate their own mutex, which means they point to different places in memory and so one does not lock the other and effectively doesn't accomplish what we want--list to be protected.
How do we get two different functions to use the same mutex? Do we define a global variable? If so, how do you share this global variable throughout an entire program that is potentially spread throughout multiple files?
Question 2
mutex m;
modify_A():
    lock_mutex(m);
    A += 1;
modify_B():
    lock_mutex(m);
    B += 1;
These two functions modify different spaces in memory. Does that mean I need a unique mutex for each function / or piece of data? If I were to have a global mutex variable that I used for both functions, a thread calling modify_A() would block another thread trying to call modify_B()
Which brings me to my last question...
Question 3
A mutex seems like it just blocks a thread from running a piece of code until whatever thread is currently running that same code finishes. This is to create atomicity and protect the integrity of the data being used by a thread. However, the same piece of memory can be modified from many different places in a program. Which makes me think we have to use one mutex throughout an entire program, which would result in a lot of needless blocking of other threads.
Considering that pretty much every function in a given program is going to be modifying data, if we use a single mutex throughout a program, that means each function call will be blocked while that mutex is in use by another thread, even if the data it needs to access is unrelated.
Doesn't that effectively eliminate the gains from having multiple threads? If only one thread can run at a given time?
I feel like I'm totally misunderstanding how mutexes work, so please ELI5!
Thanks in advance.

Yes, you make it a global variable, or otherwise accessible to the required functions through some kind of convenience method or whatever. Global variables can be shared between translation units too, but that's language/system dependent. In C you'd just put an extern mutex m in a header that everyone shares and then define that mutex as mutex m in exactly one of your translation units.
If you don't want changes to B to block other threads from modifying A, yes, you'd use two different mutexes. If you want to lock both at the same time, you would share the mutex.
Multiple threads can run at the same time as long as no two of them are inside the critical section protected by a certain mutex at the same time. That's the whole point - everything goes on nice and parallel, but you use the mutex to serialize access to a specific resource or critical section you need protected.

You typically use a mutex to protect some particular piece of shared data. If the vast majority of your code's time is spent accessing one single piece of shared data, then you won't get much of a performance improvement from threads precisely because only one thread can safely access that piece of shared data at a time.
If you happen to fall into this situation, there are more complex techniques than mutexes. Fortunately, it's fairly rare (unless you're implementing operating systems or low-level libraries) so you can get away with using mutexes for a very large fraction of your synchronization needs.

Will mutex protection fail because of register promotion?

In an article about C++11 memory ordering, the author shows an example to argue that "thread libraries will not work in C++03":
for (...) {
    ...
    if (mt) pthread_mutex_lock(...);
    x = ...x...
    if (mt) pthread_mutex_unlock(...);
}
// should not have a data race
// but if a "clever" compiler uses a technique called
// "register promotion", the code becomes like this:
r = x;
for (...) {
    ...
    if (mt) {
        x = r; pthread_mutex_lock(...); r = x;
    }
    r = ...r...
    if (mt) {
        x = r; pthread_mutex_unlock(...); r = x;
    }
}
x = r;
There are 3 questions:
1. Does this promotion break mutex protection only in C++03? What about the C language?
2. Do C++03 thread libraries stop working?
3. Could any other promotion cause the same problem?
If the example is wrong and thread libraries do work, then what about "Threads Cannot be Implemented as a Library" by Hans Boehm?

POSIX functions pthread_mutex_lock and pthread_mutex_unlock are memory barriers, the compiler and/or CPU cannot reorder loads and stores around them. Otherwise the mutexes would be useless. That article is probably inaccurate.
See POSIX 4.12 Memory Synchronization:
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads: [see the list on the website]

For single thread code, the state in the abstract machine is not directly observable: objects that aren't volatile are not guaranteed to have any particular state when you pause the only thread with a signal and observe it via ptrace or the equivalent. The only requirement is that the program execution has the same observable behavior as a behavior of one possible execution of the abstract machine.
The observables are the interactions with external world; basically, input/output on streams and actions on volatile objects.
A compiler for single-threaded code can generate code that performs operations on global variables or other objects that happen to be shared between threads, as long as the single-thread semantics are respected. This is obviously the case if a global variable is changed in such a way that it gets back its original value.
For example, a compiler might emit code that increments then decrements a variable, at least in some rare cases; the goal would be to emit simple code, at the cost of the occasional few unneeded operations.
Such changes to shared variables that don't exist in the abstract machine would obviously break multithreaded code that concurrently performs a real operation; such code does not have any race condition on the accesses of the shared variable, that are properly serialized, but the generated code introduced a race that breaks the program.

Kernel Programming - Mutexes

So I'm trying to use mutex_init(), mutex_lock(), mutex_unlock() for thread synchronization.
I am currently trying to schedule threads in a round robin fashion (but more than 1 thread could be running at a time), and I set the current state of a thread to TASK_INTERRUPTIBLE, followed by waking up another thread whose PID I have in a list.
I need to iterate over this list for my logic.
As I understand it, I need to lock this list as I access its elements, or another thread might miss a new entry while I'm making changes to it. Also, as one mutex has locked a resource, no other mutex can unlock it, until the original mutex releases it.
But, I'm still not sure if I'm locking it correctly. (I release the lock before I call schedule(), and re-lock after that)
I declare a mutex locally within a thread and lock the list. After my current thread locks
mutex_lock(&lock);
and I iterate over the list, till I find something(or ends if it doesn't find anything), then unlocks.
mutex_unlock(&lock);
I assume locking while I iterate is legal. I have never seen examples of this though.
Also, is it normal for the process to have a state of (TASK_UNINTERRUPTIBLE) while it holds a mutex lock?
EDIT : I am adding some more information based on the answer below.
It is possible my program may be run on a virtual machine with a single core. Therefore, I do not want to risk infinite polling using spin_lock().
I am trying to maintain scheduling between threads that have a certain id. For example if there are 4 threads. 2 in set 'A' and 2 in set 'B'. I allow only 1 thread to run in each set. But I switch between threads in a given set. However, a thread in set 'A' should not switch to any thread in set 'B'
(I know the kernel scheduler wont be perfect, so an approximate switching will do).
My Reasoning for TASK_STATE's:
1) Initial thread that gets created is running.
2) If another thread in the same set is running (and this one hasn't executed for a given time), set the other thread to TASK_INTERRUPTIBLE, while calling schedule(); Note: There can be more than 2 threads in each set, but let's keep it simple by considering only 2 for now.
3) If it has executed for enough time, set this task to TASK_INTERRUPTIBLE, set the other task in the same set to TASK_RUNNING, while calling schedule();
All this logic happens while I am accessing certain data structures which are locked by a (now) Global Mutex. I unlock the mutex just before I call schedule(), and instantly re-lock afterward. After my logic part is done, I completely unlock the mutex.
Is there anything fundamentally wrong with the approach?

As I understand it, I need to lock this list as I access its elements
Yes, that is true. But if you use a mutex, you're going to be really sad because a call to lock/unlock is a call to the scheduler. Therefore, calling it from inside the scheduler should result in deadlock. What you need to do depends on if your processor is multi-core or (the mythical) single-core. (Is this a virtual system?) On a single-core processor you can disable interrupts. On a multi-core processor, disabling interrupts is not sufficient (it only disables interrupts for that one core, and another core may still be interrupted). The simplest thing to do on a multi-core is to use a spinlock. Unlike the mutex, both of these locking mechanisms can be unlocked from different threads.
I set the current state of a thread to TASK_INTERRUPTIBLE
Is the thread being taken off the CPU? If so, it's not running, so I suspect that TASK_INTERRUPTIBLE is the wrong state. It would be helpful if you could list the possible states for me or if you could describe what the state is supposed to indicate. Because to me "TASK_INTERRUPTIBLE" sounds like a running task.
I declare a mutex locally within a thread and lock the list
Local mutexes are a red flag! The resource you are locking should be protected by a mutex with the same scope. If the list is global, it should have a global mutex to protect it. Threads that want to use the list must first acquire its mutex. Of course, as I already talked about, you probably want to use a different kind of locking to protect the list of ready-to-run processes.
I assume locking while I iterate is legal
It is perfectly legal (assuming of course that your mutual exclusion scheme is bug-free). In fact, it's required. If another thread were allowed to, for example, remove a node from the list while you were reading it, you could end up dereferencing a deleted node.
Also, is it normal for the process to have a state of TASK_UNINTERRUPTIBLE while it holds a mutex lock?
No, not while it holds the lock if the process is currently running on a CPU. A mutex is available to user code. If holding a mutex made the process uninterruptible, that would mean that a process could hijack the system by simply locking a mutex and never releasing it. Now, you will find that the lock and unlock functions need to be uninterruptible on a single-core processor. However, it doesn't make sense to set the state for the process because it's actually the scheduler that must not be interrupted.

Does a PTHREAD mutex only avoid simultaneous access to a resource, or it does anything more?

Example:
A thread finishes writing to a shared variable, and then it unlocks the mutex, but continues to use that variable's value (without changing it).
And immediately, another thread successfully locks that mutex and reads the shared variable.
For my (mis-)understanding, some things could be happening on this situation:
On the WRITER thread:
A compiler optimization could make the write occur only at some later point
The written value could be retained in the current CPU core's cache, and flushed to the memory at some later point
On the READER thread:
The value of the variable may have been read before the mutex lock(), and because of some compiler optimization or just the usual work of the CPU cache, still be considered "already read from memory" and thus, not fetched from the memory again.
Thus, the value we have here is not the updated one from the other thread.
Do the pthread mutex lock/unlock() functions execute any code to "flush" the current cache to the memory and anything else needed to make sure the current thread is synchronized with everything else (I cannot think of anything else than the cache), or is it just not needed (at least in all known architectures)?
Because if all the mutexes do is just what the name says - mutual exclusion on its reference - then, if I have thousands of threads dealing with the same data and from my algorithm's point of view, I already know that when one thread is using a variable, no other thread will try to use it at the same time, then does it mean I don't need a mutex? Or will my code be missing some low level and architecture-specific method(s) implemented inside the PTHREAD library to avoid the problems above?

The pthreads mutex lock and unlock functions are among the list of functions in POSIX "...that synchronize thread execution and also synchronize memory with respect to other threads". So yes, they do more than just interlock execution.
Whether or not they need to issue additional instructions to the hardware is of course architecture dependent (noting that almost every modern CPU architecture will at least happily reorder reads with respect to each other unless told otherwise), but in every case those functions must act as "compiler barriers" - that is, they ensure that the compiler won't reorder, coalesce or omit memory accesses in situations where it would otherwise be allowed to.
It is allowed to have multiple threads reading a shared value without mutual exclusion though - all you need to ensure is that both the writing and reading threads executed some synchronising function between the write and the read. For example, an allowable situation is to have many reading threads that defer reading the shared state until they have passed a barrier (pthread_barrier_wait()) and a writing thread that performs all its writes to the shared state before it passes the barrier. Reader-writer locks (pthread_rwlock_*) are also built around this idea.
