What is the difference between std::sync::Mutex and tokio::sync::Mutex? - rust

What is an "async" mutex as opposed to a "normal" mutex? I believe this is the difference between tokio's Mutex and the normal std lib Mutex. But I don't get, conceptually, how a mutex can be "async". Isn't the whole point that only one thing can use it at a time?

Here's a simple comparison of their usage:
let mtx = std::sync::Mutex::new(0);
let _guard = mtx.lock().unwrap();
let mtx = tokio::sync::Mutex::new(0);
let _guard = mtx.lock().await;
Both ensure mutual exclusion. The only difference between an asynchronous mutex and a synchronous mutex lies in their behavior when trying to acquire a lock. If a synchronous mutex tries to acquire the lock while it is already locked, it blocks the current thread. If an asynchronous mutex tries to acquire the lock while it is already locked, it yields execution to the executor.
If your code is synchronous, there's no reason to use an asynchronous mutex. As shown above, locking an asynchronous mutex is Future-based and is designed to be used in async/await contexts.
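Since the standard library has no async mutex, the "yield instead of block" behavior can be sketched with std alone. In this minimal illustration (LockFuture, lock_async, ThreadWaker and block_on are hypothetical names, not tokio's API), locking is a future that calls std::sync::Mutex::try_lock; when the lock is taken, it returns Poll::Pending instead of blocking, which is the point where a real executor would run other tasks. This sketch simply asks to be polled again (a busy retry), whereas tokio's Mutex queues waiting tasks and wakes them when the lock is released. block_on is a toy executor that drives a single future, for demonstration only.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex, MutexGuard, TryLockError};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// Hypothetical future that acquires a std Mutex without blocking the thread.
struct LockFuture<'a, T> {
    mutex: &'a Mutex<T>,
}

impl<'a, T> Future for LockFuture<'a, T> {
    type Output = MutexGuard<'a, T>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mutex: &'a Mutex<T> = self.mutex;
        match mutex.try_lock() {
            Ok(guard) => Poll::Ready(guard),
            Err(TryLockError::WouldBlock) => {
                // Contended: instead of blocking the thread, hand control back
                // to the executor and ask to be polled again later.
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            // A previous holder panicked; take the guard anyway.
            Err(TryLockError::Poisoned(poisoned)) => Poll::Ready(poisoned.into_inner()),
        }
    }
}

fn lock_async<T>(mutex: &Mutex<T>) -> LockFuture<'_, T> {
    LockFuture { mutex }
}

/// Toy single-future executor: parks the thread while the future is pending.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let held = counter.lock().unwrap(); // the lock is already taken

    let c = Arc::clone(&counter);
    let task = thread::spawn(move || {
        // Polls Pending until the lock is free; never blocks inside lock().
        *block_on(lock_async(&*c)) += 1;
    });

    thread::sleep(Duration::from_millis(50));
    drop(held); // release: the pending future can now complete
    task.join().unwrap();
    println!("counter = {}", *counter.lock().unwrap()); // counter = 1
}
```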
If your code is asynchronous, you may still want to use a synchronous mutex since there is less overhead. However, you should be mindful that blocking in an async/await context is to be avoided at all costs. Therefore, you should only use a synchronous mutex if acquiring the lock is not expected to block. Some cases to keep in mind:
If you need to hold the lock across an .await point, use an asynchronous mutex. The compiler will usually reject the synchronous alternative anyway when the future must be Send (e.g. when it is spawned on a multi-threaded runtime), since most synchronous mutex guards, including std::sync::MutexGuard, are not Send and cannot be moved to another thread.
If your lock is contended (i.e. if you expect the mutex to already be locked when you want it), you should use an asynchronous mutex. This can happen when synchronizing multiple tasks into a pool or bounded queue.
If you have complicated and/or computationally-heavy updates, those should probably be moved to a blocking pool anyway where you'd use a synchronous mutex.
The above cases are all three sides of the same coin: if you expect to block, use an asynchronous mutex. If you don't know whether your mutex usage will block or not, err on the side of caution and use an asynchronous mutex. Using an asynchronous mutex where a synchronous one would suffice only leaves a small amount of performance on the table, but using a synchronous mutex where you should've used an asynchronous one could be catastrophic.
Most situations I run into with mutexes are when synchronizing simple data structures, where the update methods are well-encapsulated to acquire the lock, update the data, and release the lock. Did you know a simple println! requires locking a mutex? Those uses of mutexes can be synchronous and used even in an asynchronous context. Even if the lock does block, it often is no more impactful than a process context switch which happens all the time anyway.
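A sketch of such a well-encapsulated structure, assuming a hypothetical HitCounter type: each method acquires a std::sync::Mutex, performs a short in-memory update, and releases the lock before returning, so callers in async code never hold the guard across an .await. Plain threads stand in for tasks to keep the example std-only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared statistics object. Every method locks, does a small
/// in-memory update, and unlocks before returning.
#[derive(Default)]
struct HitCounter {
    hits: Mutex<HashMap<String, u64>>,
}

impl HitCounter {
    fn record(&self, path: &str) {
        let mut hits = self.hits.lock().unwrap();
        *hits.entry(path.to_string()).or_insert(0) += 1;
    } // guard dropped here: the lock is held only for the map update

    fn get(&self, path: &str) -> u64 {
        self.hits.lock().unwrap().get(path).copied().unwrap_or(0)
    }
}

fn main() {
    // Threads stand in for async tasks so that this sketch needs only std.
    let counter = Arc::new(HitCounter::default());
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..100 {
                    counter.record("/index");
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    println!("{}", counter.get("/index")); // 400
}
```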
Note: Tokio's Mutex does have a .blocking_lock() method which is helpful if both locking behaviors are needed. So the mutex can be both synchronous and asynchronous!
See also:
Why do I get a deadlock when using Tokio with a std::sync::Mutex?
std::sync::Mutex vs futures:lock:Mutex vs futures_lock::Mutex for async on the Rust forum
Which kind of mutex should you use? in the Tokio documentation
On using std::sync::Mutex in the Tokio tutorial on shared state

Related

When or why should I use a Mutex over an RwLock?

When I read the documentation of Mutex and RwLock, the difference I see is the following:
Mutex can have only one reader or writer at a time,
RwLock can have one writer or multiple readers at a time.
Put that way, RwLock always seems better (less limited) than Mutex, so why would I ever use a Mutex?
Sometimes it is better to use a Mutex over an RwLock in Rust:
RwLock<T> needs more bounds for T to be thread-safe:
Mutex requires T: Send to be Sync,
RwLock requires T to be Send and Sync to be itself Sync.
In other words, Mutex can make a type that is Send but not Sync shareable between threads, which RwLock cannot. I found a good and intuitive explanation on reddit:
Because of those bounds, RwLock requires its contents to be Sync, i.e. it's safe for two threads to have a &ptr to that type at the same time. Mutex only requires the data to be Send, because conceptually you can think of it like when you lock the Mutex it sends the data to your thread, and when you unlock it the data gets sent to another thread.
Use Mutex when your T is only Send and not Sync.
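The bound difference can be checked at compile time. In this sketch (assert_sync and parallel_count are illustrative helpers), Cell<i32> is Send but not Sync; Mutex<Cell<i32>> is therefore Sync and can be shared through an Arc, while the commented-out RwLock<Cell<i32>> line would fail to compile.

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex};
use std::thread;

/// Compiles only if T is Sync (i.e. &T may be shared between threads).
fn assert_sync<T: Sync>() {}

/// Cell<i32> is Send but not Sync; wrapping it in a Mutex makes it shareable.
fn parallel_count(threads: usize, iters: usize) -> i32 {
    let shared = Arc::new(Mutex::new(Cell::new(0)));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..iters {
                    let cell = shared.lock().unwrap();
                    cell.set(cell.get() + 1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = shared.lock().unwrap().get();
    total
}

fn main() {
    assert_sync::<Mutex<Cell<i32>>>(); // OK: Mutex<T> is Sync when T: Send
    // assert_sync::<std::sync::RwLock<Cell<i32>>>(); // error: RwLock<T> needs T: Sync
    println!("{}", parallel_count(4, 1000)); // 4000
}
```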
Preventing writer starvation
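RwLock does not guarantee any particular priority policy; the standard library documentation says it depends on the underlying operating system's implementation. Some read-write locks are subject to writer starvation, where a steady stream of readers keeps a writer waiting indefinitely. A Mutex has no separate reader and writer paths, so it does not have this particular problem.
Prefer Mutex when there may be so many readers that writers would otherwise never get the lock.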
A mutex is a simple locking mechanism for controlling access to shared resources.
Only one thread can hold a mutex at a time, and only the thread holding it can access the shared resource.
If another thread tries to lock a mutex that is already held, it blocks until the holding thread releases the mutex.
Read-write locks are more complex than mutexes.
A mutex gives threads no read concurrency.
When there are many read operations and few write operations, a read-write lock can be used to improve read concurrency.
Let me summarize for myself:
The implementation of a read-write lock is more complex than that of a mutual exclusion lock, and its per-operation overhead is higher.
A read-write lock supports simultaneous reading by multiple threads; a mutex does not, so a read-write lock offers higher read concurrency.
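A small sketch of read concurrency with std::sync::RwLock, assuming a hypothetical concurrent_reads helper: every reader holds its read guard while waiting at a Barrier that opens only when all readers have arrived, so the function can only return if the readers really hold the lock at the same time. With a Mutex in place of the RwLock, it would deadlock for more than one reader.

```rust
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

/// Spawns `readers` threads that each take a read lock and then wait at a
/// barrier while still holding it; returns the sum of the lengths they read.
fn concurrent_reads(readers: usize) -> usize {
    let data = Arc::new(RwLock::new(vec![1, 2, 3]));
    let barrier = Arc::new(Barrier::new(readers));
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let (data, barrier) = (Arc::clone(&data), Arc::clone(&barrier));
            thread::spawn(move || {
                let guard = data.read().unwrap();
                barrier.wait(); // every reader holds a read guard at this point
                guard.len()
            })
        })
        .collect();
    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    // A writer still gets exclusive access once the readers are gone.
    data.write().unwrap().push(4);
    total
}

fn main() {
    println!("{}", concurrent_reads(4)); // 12
}
```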

Why isn't mutex_trylock safe for use in interrupts?

Linux Kernel Development by Robert Love states:
A mutex cannot be acquired by an interrupt handler or bottom half, even with mutex_trylock()
At http://landley.net/kdocs/htmldocs/kernel-locking.html, it's mentioned that
mutex_trylock() does not suspend your task but returns non-zero if it could lock the mutex on the first try or 0 if not. This function cannot be safely used in hardware or software interrupt contexts despite not sleeping.
I don't understand why it can't be used in such cases when it doesn't go to sleep?
Imagine if you had a platform whose native, low-level primitive mutexes do not have a "try lock" operation. In that case, to implement a high-level mutex that does, you'd have to use a condition variable and a boolean "is locked" protected by the low-level mutex to indicate the high-level mutex was locked.
So a waitable mutex could be implemented using a low-level primitive mutex (that does not support a "trylock" operation) to implement a high-level mutex (that does). The "high-level mutex" can just be a boolean that's protected by the low-level mutex.
With that design, mutex_lock would be implemented as follows:
1. Acquire the low-level mutex (this is a real lock operation on the primitive implementation mutex).
2. If the high-level mutex is held, do a condition wait for the high-level mutex.
3. Acquire the high-level mutex (just locked = true;).
4. Release the low-level mutex.
And mutex_unlock would be implemented as follows:
1. Acquire the low-level mutex.
2. Release the high-level mutex (just locked = false;).
3. Signal the condition variable.
4. Release the low-level mutex.
In that case, mutex_trylock would be implemented as follows:
1. Acquire the low-level mutex.
2. Check whether the high-level mutex is held.
3. If so, release the low-level mutex and return failure.
4. Take the high-level mutex.
5. Release the low-level mutex.
6. Return success.
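Imagine if we're interrupted after step 2 but before step 3. At that point the interrupted code holds the low-level mutex, so an interrupt handler that calls mutex_trylock would have to wait in its own first step for a lock that cannot be released until the handler returns.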
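The design in this answer can be sketched in Rust, with std::sync::Mutex<bool> playing the low-level primitive (its own try_lock is deliberately ignored) and a Condvar for the condition wait. HighLevelMutex and count_with_lock are illustrative names, and this models the hypothetical platform from the answer, not the Linux implementation. The point to notice is that try_lock still begins with a real lock of the low-level mutex, which can wait.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// The "high-level" mutex: just a bool, protected by a "low-level" mutex,
/// plus a condition variable for waiters.
struct HighLevelMutex {
    locked: Mutex<bool>, // low-level mutex guarding the high-level state
    cond: Condvar,
}

impl HighLevelMutex {
    fn new() -> Self {
        HighLevelMutex { locked: Mutex::new(false), cond: Condvar::new() }
    }

    fn lock(&self) {
        let mut locked = self.locked.lock().unwrap(); // 1. acquire low-level mutex
        while *locked {
            // 2. condition wait: atomically releases the low-level mutex while waiting
            locked = self.cond.wait(locked).unwrap();
        }
        *locked = true; // 3. acquire high-level mutex
    } // 4. low-level guard dropped here

    fn unlock(&self) {
        let mut locked = self.locked.lock().unwrap(); // 1. acquire low-level mutex
        *locked = false; // 2. release high-level mutex
        self.cond.notify_one(); // 3. signal the condition variable
    } // 4. low-level guard dropped here

    fn try_lock(&self) -> bool {
        // 1. Even the "try" operation starts with a real lock of the low-level
        //    mutex, which can wait if another context is inside lock/unlock/try_lock.
        let mut locked = self.locked.lock().unwrap();
        if *locked {
            return false; // 2-3. held: low-level guard dropped, report failure
        }
        *locked = true; // 4. take high-level mutex
        true // 5-6. low-level guard dropped, report success
    }
}

/// Uses the high-level mutex to protect a deliberately non-atomic increment.
fn count_with_lock(threads: usize, iters: usize) -> usize {
    let m = Arc::new(HighLevelMutex::new());
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let (m, counter) = (Arc::clone(&m), Arc::clone(&counter));
            thread::spawn(move || {
                for _ in 0..iters {
                    m.lock();
                    // Separate load and store: only correct under mutual exclusion.
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    m.unlock();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    let m = HighLevelMutex::new();
    assert!(m.try_lock()); // free: succeeds
    assert!(!m.try_lock()); // held: fails without waiting on the high-level mutex
    m.unlock();
    println!("{}", count_with_lock(4, 1000)); // 4000
}
```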
Maybe it is because mutex_trylock does not disable interrupts. If, in interrupt context, a mutex is locked using mutex_trylock and another interrupt arrives that tries to acquire the same mutex, the result can be a deadlock-like situation.
The reason is that we require the mutex semantics to allow for full Priority-Inheritance, even if the default implementation does not do that (PREEMPT_RT switches mutex to rt_mutex and then it does get to have PI).
Suppose your interrupt does mutex_trylock() successfully: then the interrupt, or rather the task that was interrupted, becomes the lock owner. Any contending mutex_lock() would try to PI-boost that owner. One might argue that because interrupts are non-preemptible, the context is effectively a prio-ceiling and all is well. However, if the interrupt hit the idle task, we'll end up trying to boost the idle task, and that is a big no-no.
Also read the thread here:
https://lkml.kernel.org/r/20191218135047.GS2844@hirez.programming.kicks-ass.net

How is the atomic unlock-and-block achieved in the implementation of synchronization primitives like mutexes and condition variables?

For example, suppose you are using an atomic spinlock on an integer flag to ensure that only one thread modifies the mutex's wait queue at any given time. When a thread tries to lock the mutex, we want it to enqueue itself and set the flag to zero before it blocks itself, and we want the unlocker to dequeue a thread from the queue and make it runnable.
Consider only two threads, one locking and the other releasing the mutex at the same time. If the locker is preempted after it has added itself to the queue and set the flag to zero (but before it has blocked itself), and the unlocker then dequeues it and makes it runnable, that has no effect, since the thread has not blocked yet. So the make-runnable call would be wasted, and, more importantly, the locker would then block itself and remain blocked forever.
How is this atomicity achieved to ensure correctness? A similar scenario can be imagined in condition variables with the release of mutex and blocking itself.
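One widely used technique is to make the wakeup "sticky". Rust's thread::park and Thread::unpark keep a per-thread token, so an unpark that arrives before the park is not lost; Linux futexes address the same race by having FUTEX_WAIT atomically re-check the lock word before sleeping, and std's Condvar::wait atomically unlocks the mutex and blocks. The sketch below (WaitQueue, wait and wake_one are hypothetical names, and a std Mutex replaces the spinlock-protected queue for brevity) allows exactly the interleaving from the question.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, Thread};

/// Hypothetical wait queue for a lock implementation.
struct WaitQueue {
    waiters: Mutex<VecDeque<(Thread, Arc<AtomicBool>)>>,
}

impl WaitQueue {
    fn new() -> Self {
        WaitQueue { waiters: Mutex::new(VecDeque::new()) }
    }

    /// Locker side: enqueue this thread, then block until woken.
    fn wait(&self) {
        let woken = Arc::new(AtomicBool::new(false));
        self.waiters
            .lock()
            .unwrap()
            .push_back((thread::current(), Arc::clone(&woken)));
        // Being preempted right here is harmless: if wake_one() runs now, it sets
        // `woken` and calls unpark(), which stores a wakeup token for this thread,
        // so the park() below returns immediately instead of sleeping forever.
        while !woken.load(Ordering::Acquire) {
            thread::park(); // may also wake spuriously, hence the loop
        }
    }

    /// Unlocker side: dequeue one waiter (if any) and make it runnable.
    fn wake_one(&self) -> bool {
        let next = self.waiters.lock().unwrap().pop_front();
        match next {
            Some((waiter, woken)) => {
                woken.store(true, Ordering::Release);
                waiter.unpark();
                true
            }
            None => false,
        }
    }
}

fn main() {
    let queue = Arc::new(WaitQueue::new());
    let q = Arc::clone(&queue);
    let waiter = thread::spawn(move || q.wait());
    // Retry until the waiter has enqueued itself; it may or may not have
    // parked yet when unpark() is called, and both orders work.
    while !queue.wake_one() {
        thread::yield_now();
    }
    waiter.join().unwrap();
    println!("waiter woke up");
}
```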

What is the difference between semaphore and mutex in implementation?

I read that a mutex and a binary semaphore differ in only one aspect: with a mutex, the locking thread has to unlock it, but with a semaphore, the locking and unlocking threads can be different. Is that right?
Which one is more efficient?
Assuming you know the basic differences between a semaphore and a mutex:
For fast, simple synchronization, use a critical section.
To synchronize threads across process boundaries, use mutexes.
To synchronize access to limited resources, use a semaphore.
Apart from the fact that mutexes have an owner, the two objects may be optimized for different usage. Mutexes are designed to be held only for a short time; violating this can cause poor performance and unfair scheduling. For example, a running thread may be permitted to acquire a mutex even though another thread is already blocked on it, which can starve the blocked thread. Semaphores may provide more fairness, or fairness can be forced using several condition variables.
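To illustrate the ownerless, counting behavior, here is a minimal counting semaphore built from a Mutex and a Condvar (the Rust standard library has no public semaphore type; Semaphore and max_concurrency are hypothetical names). main shows a release from a different thread than the one that acquires, and max_concurrency shows a semaphore limiting access to a resource with a fixed number of slots.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore. There is no owner: any thread may release.
struct Semaphore {
    permits: Mutex<usize>,
    cond: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), cond: Condvar::new() }
    }

    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.cond.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

/// Runs `workers` threads through a semaphore with `permits` slots and returns
/// the highest number of workers observed inside at the same time.
fn max_concurrency(permits: usize, workers: usize) -> usize {
    let sem = Arc::new(Semaphore::new(permits));
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (sem, active, peak) = (Arc::clone(&sem), Arc::clone(&active), Arc::clone(&peak));
            thread::spawn(move || {
                sem.acquire();
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(10)); // use the limited resource
                active.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    // Signalling: the thread that releases is not the one that acquires.
    let sem = Arc::new(Semaphore::new(0));
    let giver = {
        let sem = Arc::clone(&sem);
        thread::spawn(move || sem.release())
    };
    sem.acquire(); // blocks until the other thread releases
    giver.join().unwrap();

    println!("peak = {}", max_concurrency(2, 6)); // never more than 2
}
```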

Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex)

POSIX allows mutexes to be recursive. That means the same thread can lock the same mutex twice and won't deadlock. Of course it also needs to unlock it twice, otherwise no other thread can obtain the mutex. Not all systems supporting pthreads also support recursive mutexes, but if they want to conform to POSIX, they have to.
Other APIs (more high-level APIs) also usually offer mutexes, often called locks. Some systems/languages (e.g. Cocoa Objective-C) offer both recursive and non-recursive mutexes. Some languages offer only one or the other. E.g. in Java, mutexes are always recursive (the same thread may "synchronize" twice on the same object). Depending on what other thread functionality they offer, not having recursive mutexes might be no problem, as they can easily be written yourself (I have already implemented recursive mutexes myself on top of simpler mutex/condition operations).
What I don't really understand: What are non-recursive mutexes good for? Why would I want to have a thread deadlock if it locks the same mutex twice? Even high level languages that could avoid that (e.g. testing if this will deadlock and throwing an exception if it does) usually don't do that. They will let the thread deadlock instead.
Is this only for cases, where I accidentally lock it twice and only unlock it once and in case of a recursive mutex, it would be harder to find the problem, so instead I have it deadlock immediately to see where the incorrect lock appears? But couldn't I do the same with having a lock counter returned when unlocking and in a situation, where I'm sure I released the last lock and the counter is not zero, I can throw an exception or log the problem? Or is there any other, more useful use-case of non recursive mutexes that I fail to see? Or is it maybe just performance, as a non-recursive mutex can be slightly faster than a recursive one? However, I tested this and the difference is really not that big.
The difference between a recursive and non-recursive mutex has to do with ownership. In the case of a recursive mutex, the kernel has to keep track of the thread who actually obtained the mutex the first time around so that it can detect the difference between recursion vs. a different thread that should block instead. As another answer pointed out, there is a question of the additional overhead of this both in terms of memory to store this context and also the cycles required for maintaining it.
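The ownership bookkeeping described above can be sketched on top of a non-recursive mutex. In this illustration (RecursiveMutex is a hypothetical type; Rust's std does not expose a stable reentrant lock), the lock records the owning ThreadId and a recursion count: a second lock() from the owner just increments the count, other threads wait, and unlock() from a non-owner is rejected.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, ThreadId};

/// Sketch of a recursive mutex built on a non-recursive one. It has to store
/// the owner's ThreadId and a recursion count: extra state and extra work.
struct RecursiveMutex {
    state: Mutex<(Option<ThreadId>, usize)>, // (owner, recursion depth)
    cond: Condvar,
}

impl RecursiveMutex {
    fn new() -> Self {
        RecursiveMutex { state: Mutex::new((None, 0)), cond: Condvar::new() }
    }

    fn lock(&self) {
        let me = thread::current().id();
        let mut state = self.state.lock().unwrap();
        loop {
            let owner = state.0;
            match owner {
                None => {
                    *state = (Some(me), 1); // first acquisition: record the owner
                    return;
                }
                Some(id) if id == me => {
                    state.1 += 1; // re-entry by the owner: just count it
                    return;
                }
                Some(_) => state = self.cond.wait(state).unwrap(), // owned elsewhere
            }
        }
    }

    /// Only the owning thread may unlock; anyone else gets an error.
    fn unlock(&self) -> Result<(), &'static str> {
        let mut state = self.state.lock().unwrap();
        if state.0 != Some(thread::current().id()) {
            return Err("not the owner");
        }
        state.1 -= 1;
        if state.1 == 0 {
            state.0 = None; // fully released
            self.cond.notify_one();
        }
        Ok(())
    }
}

fn main() {
    let m = Arc::new(RecursiveMutex::new());
    m.lock();
    m.lock(); // same thread: recursion count 2, no deadlock
    let other = Arc::clone(&m);
    let stranger = thread::spawn(move || other.unlock()).join().unwrap();
    println!("unlock from another thread: {:?}", stranger); // Err("not the owner")
    m.unlock().unwrap();
    m.unlock().unwrap(); // count reaches 0: the mutex is free again
}
```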
However, there are other considerations at play here too.
Because the recursive mutex has a sense of ownership, the thread that grabs the mutex must be the same thread that releases the mutex. In the case of non-recursive mutexes, there is no sense of ownership and any thread can usually release the mutex no matter which thread originally took the mutex. In many cases, this type of "mutex" is really more of a semaphore action, where you are not necessarily using the mutex as an exclusion device but use it as synchronization or signaling device between two or more threads.
Another property that comes with a sense of ownership in a mutex is the ability to support priority inheritance. Because the kernel can track the thread owning the mutex and also the identity of all the blocker(s), in a priority threaded system it becomes possible to escalate the priority of the thread that currently owns the mutex to the priority of the highest priority thread that is currently blocking on the mutex. This inheritance prevents the problem of priority inversion that can occur in such cases. (Note that not all systems support priority inheritance on such mutexes, but it is another feature that becomes possible via the notion of ownership).
If you refer to classic VxWorks RTOS kernel, they define three mechanisms:
mutex - supports recursion, and optionally priority inheritance. This mechanism is commonly used to protect critical sections of data in a coherent manner.
binary semaphore - no recursion, no inheritance, simple exclusion, taker and giver do not have to be the same thread, broadcast release available. This mechanism can be used to protect critical sections, but is also particularly useful for coherent signalling or synchronization between threads.
counting semaphore - no recursion or inheritance, acts as a coherent resource counter from any desired initial count, threads only block where net count against the resource is zero.
Again, this varies somewhat by platform - especially what they call these things, but this should be representative of the concepts and various mechanisms at play.
The answer is not efficiency. Non-reentrant mutexes lead to better code.
Example: A::foo() acquires the lock. It then calls B::bar(). This worked fine when you wrote it. But sometime later someone changes B::bar() to call A::baz(), which also acquires the lock.
Well, if you don't have recursive mutexes, this deadlocks. If you do have them, it runs, but it may break. A::foo() may have left the object in an inconsistent state before calling bar(), on the assumption that baz() couldn't get run because it also acquires the mutex. But it probably shouldn't run! The person who wrote A::foo() assumed that nobody could call A::baz() at the same time - that's the entire reason that both of those methods acquired the lock.
The right mental model for using mutexes: The mutex protects an invariant. When the mutex is held, the invariant may change, but before releasing the mutex, the invariant is re-established. Reentrant locks are dangerous because the second time you acquire the lock you can't be sure the invariant is true any more.
If you are happy with reentrant locks, it is only because you have not had to debug a problem like this before. Java has non-reentrant locks these days in java.util.concurrent.locks, by the way.
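One common way to follow this model with a non-recursive lock, sketched here with a hypothetical Ledger type, is to split each operation into a public method that takes the lock exactly once and private "_locked" helpers that receive the already-locked data rather than the mutex. A helper can then be reused (here, to charge a fee after a transfer) without locking again, and the invariant only has to hold at the points where the guard is released.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical two-account ledger. Invariant: a + b == total whenever the
/// mutex is not held.
struct Accounts {
    a: i64,
    b: i64,
}

struct Ledger {
    accounts: Mutex<Accounts>,
    total: i64,
}

impl Ledger {
    fn new(a: i64, b: i64) -> Self {
        Ledger { accounts: Mutex::new(Accounts { a, b }), total: a + b }
    }

    /// Public entry points: each acquires the lock exactly once.
    fn transfer(&self, amount: i64) {
        let mut accounts = self.accounts.lock().unwrap();
        Self::transfer_locked(&mut accounts, amount);
        assert!(Self::invariant_holds(&accounts, self.total));
    } // guard dropped here, with the invariant re-established

    fn transfer_with_fee(&self, amount: i64, fee: i64) {
        let mut accounts = self.accounts.lock().unwrap();
        // Reuses the helper twice under one lock: no re-locking, no recursion.
        // (The fee is modelled as a second transfer from a to b.)
        Self::transfer_locked(&mut accounts, amount);
        Self::transfer_locked(&mut accounts, fee);
        assert!(Self::invariant_holds(&accounts, self.total));
    }

    /// "_locked" helper: takes the data, not the mutex, so it can only be called
    /// by code that already holds the lock, and it never tries to lock again.
    fn transfer_locked(accounts: &mut Accounts, amount: i64) {
        accounts.a -= amount; // invariant temporarily broken...
        accounts.b += amount; // ...and restored before the caller unlocks
    }

    fn invariant_holds(accounts: &Accounts, total: i64) -> bool {
        accounts.a + accounts.b == total
    }

    fn balances(&self) -> (i64, i64) {
        let accounts = self.accounts.lock().unwrap();
        (accounts.a, accounts.b)
    }
}

fn main() {
    let ledger = Arc::new(Ledger::new(100, 100));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let ledger = Arc::clone(&ledger);
            thread::spawn(move || {
                for _ in 0..1000 {
                    ledger.transfer(if i % 2 == 0 { 1 } else { -1 });
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    ledger.transfer_with_fee(10, 1);
    println!("{:?}", ledger.balances()); // (89, 111)
}
```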
As written by Dave Butenhof himself:
"The biggest of all the big problems with recursive mutexes is that they encourage you to completely lose track of your locking scheme and scope. This is deadly. Evil. It's the "thread eater". You hold locks for the absolutely shortest possible time. Period. Always. If you're calling something with a lock held simply because you don't know it's held, or because you don't know whether the callee needs the mutex, then you're holding it too long. You're aiming a shotgun at your application and pulling the trigger. You presumably started using threads to get concurrency; but you've just PREVENTED concurrency."
The right mental model for using mutexes: The mutex protects an invariant.
Why are you sure that this is really the right mental model for using mutexes? I think the right model is protecting data, not invariants.
The problem of protecting invariants exists even in single-threaded applications and has nothing in common with multi-threading and mutexes.
Furthermore, if you need to protect invariants, you can still use a binary semaphore, which is never recursive.
One main reason recursive mutexes are useful is when the same thread enters the protected methods multiple times. For example, if a mutex protects a bank account during a withdrawal, and a fee is also associated with that withdrawal, the fee has to be charged under the same mutex.
The only good use case for a recursive mutex is an object with multiple methods, each of which modifies the object's contents and therefore must lock the object until its state is consistent again.
If those methods call other methods (e.g. addNewArray() calls addNewPoint() and finishes with recheckBounds()), and each of those functions also needs to lock the mutex on its own, then a recursive mutex is a win-win.
Any other use (papering over bad code, or sharing the mutex across different objects) is clearly wrong!
What are non-recursive mutexes good for?
They are absolutely good when you have to make sure the mutex is unlocked before doing something. This is because pthread_mutex_unlock can guarantee that the mutex is unlocked only if it is non-recursive.
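#include <pthread.h>

void bar(void); // external function, defined elsewhere

// Statically initialized with the default type, which is not recursive.
pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;

void foo(void)
{
    pthread_mutex_lock(&g_mutex);
    // Do something.
    pthread_mutex_unlock(&g_mutex);
    bar(); // called with g_mutex unlocked
}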
If g_mutex is non-recursive, the code above is guaranteed to call bar() with the mutex unlocked.
This eliminates the possibility of a deadlock in case bar() happens to be an unknown external function that may do something resulting in another thread trying to acquire the same mutex. Such scenarios are not uncommon in applications built on thread pools, and in distributed applications, where an interprocess call may spawn a new thread without the client programmer even realising it. In all such scenarios it's best to invoke such external functions only after the lock is released.
If g_mutex were recursive, there would simply be no way to make sure it is unlocked before making the call.
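The same guarantee can be sketched in Rust, whose std::sync::Mutex is not recursive: once the guard goes out of scope, the mutex is unlocked, so an external call made afterwards cannot run under it. Here bar is a hypothetical stand-in for the external function, and uses try_lock to report whether the mutex is free.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the external function bar(): reports whether
/// it could take the lock, i.e. whether nobody holds `m` right now.
fn bar(m: &Mutex<i32>) -> bool {
    m.try_lock().is_ok()
}

fn foo(m: &Mutex<i32>) -> bool {
    {
        let mut guard = m.lock().unwrap();
        *guard += 1; // Do something.
    } // guard dropped: std's Mutex is not recursive, so it is now fully unlocked
    bar(m)
}

fn main() {
    let m = Mutex::new(0);
    println!("bar saw the mutex unlocked: {}", foo(&m)); // true
}
```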
IMHO, most arguments against recursive locks (which are what I have used 99.9% of the time over some 20 years of concurrent programming) mix the question of whether they are good or bad with other software design issues, which are quite unrelated. To name one, the "callback" problem is elaborated on exhaustively, and without any multithreading-related point of view, for example in the book Component Software: Beyond Object-Oriented Programming.
As soon as you have some inversion of control (e.g. events fired), you face re-entrance problems, independent of whether mutexes and threading are involved.
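#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class EvilFoo {
    std::vector<std::string> data;
    std::vector<std::function<void(EvilFoo&)>> changedEventHandlers;
public:
    std::size_t registerChangedHandler(std::function<void(EvilFoo&)> handler) {
        // minimal illustrative body
        changedEventHandlers.push_back(std::move(handler));
        return changedEventHandlers.size() - 1;
    }
    void unregisterChangedHandler(std::size_t handlerId) {
        // minimal illustrative body: erasing invalidates iterators held by fireChangedEvent()
        changedEventHandlers.erase(changedEventHandlers.begin() + handlerId);
    }
    void fireChangedEvent() {
        // bad bad, even evil idea!
        for (auto& handler : changedEventHandlers) {
            handler(*this); // a handler may re-enter EvilFoo or unregister itself here
        }
    }
    void AddItem(const std::string& item) {
        data.push_back(item);
        fireChangedEvent();
    }
};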
Now, with code like the above you get all the error cases usually named in the context of recursive locks, only without any locks at all. An event handler can unregister itself once it has been called, which would lead to a bug in a naively written fireChangedEvent(). Or it could call other member functions of EvilFoo, which cause all sorts of problems. The root cause is re-entrance.
Worst of all, this might not even be obvious, as it could happen over a whole chain of events firing events until we eventually arrive back at our EvilFoo (non-local).
So, re-entrance is the root problem, not the recursive lock.
Now, if you feel safer using a non-recursive lock, how would such a bug manifest itself? As a deadlock whenever unexpected re-entrance occurs.
And with a recursive lock? The same way it would manifest itself in code without any locks.
So the evil part of EvilFoo is the events and how they are implemented, not so much a recursive lock. For starters, fireChangedEvent() would need to create a copy of changedEventHandlers first and iterate over that copy.
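A sketch of that fix in Rust (Foo, Handler and the method names are hypothetical counterparts of EvilFoo): fire_changed_event copies the handler list while holding the lock and calls the handlers only after releasing it. A handler can then unregister itself or call back into Foo without deadlocking on std's non-recursive Mutex and without invalidating the iteration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Change handler type for the hypothetical Foo below.
type Handler = Arc<dyn Fn(&Foo) + Send + Sync>;

#[derive(Default)]
struct Foo {
    data: Mutex<Vec<String>>,
    handlers: Mutex<Vec<(usize, Handler)>>,
    next_id: AtomicUsize,
}

impl Foo {
    fn register_changed_handler(&self, handler: Handler) -> usize {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        self.handlers.lock().unwrap().push((id, handler));
        id
    }

    fn unregister_changed_handler(&self, id: usize) {
        self.handlers.lock().unwrap().retain(|(handler_id, _)| *handler_id != id);
    }

    fn handler_count(&self) -> usize {
        self.handlers.lock().unwrap().len()
    }

    fn item_count(&self) -> usize {
        self.data.lock().unwrap().len()
    }

    fn fire_changed_event(&self) {
        // Copy the handler list under the lock; the guard is dropped at the end
        // of this statement, before any handler runs.
        let snapshot: Vec<Handler> =
            self.handlers.lock().unwrap().iter().map(|(_, h)| Arc::clone(h)).collect();
        for handler in snapshot {
            handler(self); // may (un)register handlers or call back into Foo
        }
    }

    fn add_item(&self, item: &str) {
        self.data.lock().unwrap().push(item.to_string()); // data lock released here
        self.fire_changed_event();
    }
}

fn main() {
    let foo = Foo::default();
    let calls = Arc::new(AtomicUsize::new(0));
    let my_id = Arc::new(AtomicUsize::new(usize::MAX));
    let (c, id_cell) = (Arc::clone(&calls), Arc::clone(&my_id));
    let id = foo.register_changed_handler(Arc::new(move |f: &Foo| {
        c.fetch_add(1, Ordering::SeqCst);
        let _items = f.item_count(); // re-enters Foo and locks `data` again
        f.unregister_changed_handler(id_cell.load(Ordering::SeqCst)); // one-shot
    }));
    my_id.store(id, Ordering::SeqCst);
    foo.add_item("a");
    foo.add_item("b");
    println!(
        "calls = {}, handlers left = {}",
        calls.load(Ordering::SeqCst),
        foo.handler_count()
    ); // calls = 1, handlers left = 0
}
```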
Another aspect often coming into the discussion is the definition of what a lock is supposed to do in the first place:
Protect a piece of code from re-entrance
Protect a resource from being used concurrently (by multiple threads).
The way I do my concurrent programming, I have a mental model of the latter (protect a resource). This is the main reason why I am good with recursive locks. If some (member) function needs locking of a resource, it locks. If it calls another (member) function while doing what it does and that function also needs locking - it locks. And I don't need an "alternate approach", because the ref-counting of the recursive lock is quite the same as if each function wrote something like:
void EvilFoo::bar() {
auto_lock lock(this); // this->lock_holder = this->lock_if_not_already_locked_by_same_thread())
// do what we gotta do
// ~auto_lock() { if (lock_holder) unlock() }
}
And once events or similar constructs (visitors?!) come into play, I do not expect a non-recursive lock to solve all the ensuing design problems.
