RwLock deadlock and safety of the primitive - rust

In a textbook read/write lock, if a writer takes a lock, it blocks all new readers and waits for existing readers to exit. However, Rust docs suggest that some system-specific implementations can deadlock (while the textbook one can't?):
The priority policy of the lock is dependent on the underlying operating system’s implementation, and this type does not guarantee that any particular policy will be used. In particular, a writer which is waiting to acquire the lock in write might or might not block concurrent calls to read [...]
(docs).
Curious if anyone has more details that could explain this difference between implementations, or maybe how to select a desired safe policy? It seems like without a guarantee, a deadlock is almost certain.

The documentation is saying that the fairness of the lock is OS-dependent. If the lock is unfair to writers, then a continuous stream of readers (even if they individually unlock and relock) can keep the writer from ever acquiring the lock. If this happens, the writer is not deadlocked, it is starved.
If this is a concern, you can consider using a fair lock like parking_lot's RwLock.
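As a minimal sketch (assuming the parking_lot crate as a dependency), swapping it in is nearly drop-in; its guards are returned directly rather than wrapped in a Result, and its task-fair policy keeps a steady stream of readers from starving a writer:

use parking_lot::RwLock; // assumed dependency, e.g. parking_lot = "0.12"
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(RwLock::new(0u32));

    let reader = {
        let data = Arc::clone(&data);
        thread::spawn(move || {
            // read() returns the guard directly; no .unwrap(), because
            // parking_lot locks cannot be poisoned.
            let value = *data.read();
            println!("read: {value}");
        })
    };

    // The task-fair policy means pending writers are not starved by
    // a continuous stream of readers.
    *data.write() += 1;

    reader.join().unwrap();
}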

Your textbook is correct; however, I don't see any part of the documentation stating that a deadlock may occur. The documentation spells this out because thread safety and concurrency are a big part of Rust, yet Rust cannot make guarantees about system-specific implementations beyond what the existing standards provide. If even a single compilation target may block with a system-specific implementation, then that has to be stated in the documentation, since it could otherwise count as undefined behavior in safe Rust.
But this really isn't cause for concern. A RwLock can offer better performance than a mutex for many concurrent reads, since readers don't block each other. Even if a system-specific policy is problematic, you will at least have the protection provided by the type system, which enforces that a RwLock is only Sync if its contents are also Sync.
// This is also the reason why a RwLock has an extra requirement for Sync
unsafe impl<T: ?Sized + Send> Sync for Mutex<T> {}
unsafe impl<T: ?Sized + Send + Sync> Sync for RwLock<T> {}

Related

Terminology question: mutex lock, spin lock, sleepable lock

All over Stack Overflow and the net I see folks distinguish mutexes from spinlocks like this: a mutex is a mutual exclusion lock providing acquire() and release() functions, and if the lock is already taken, acquire() allows the process to be preempted while it waits.
Nevertheless, A. Silberschatz, in his Operating System Concepts, says in section 6.5:
... The simplest of these tools is the mutex lock. (In fact, the term mutex is short for mutual exclusion.) We use the mutex lock to protect critical sections and thus prevent race conditions. That is, a process must acquire the lock before entering a critical section; it releases the lock when it exits the critical section. The acquire() function acquires the lock, and the release() function releases the lock.
and then he describes a spinlock, though adding a bit later
The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.
So a spinlock is just a type of mutex lock, as opposed to sleepable locks that allow a process to be preempted. That is, spinlocks and sleepable locks are both mutexes: locks acquired and released by means of acquire() and release() functions.
It seems totally logical to me to define mutex locks the way Silberschatz did (though a bit implicitly).
What approach would you agree with?
The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.
Maybe you're misreading the book (that is, "The type of mutex lock we have been describing" might not refer to the exact passage you think it does), or the book is outdated. Modern terminology is quite clear about what a mutex is, but spinlocks get a bit muddy.
A mutex is a concurrency primitive that allows one agent at a time access to its resource, while the others have to wait until the exclusive access is released. How they wait is not specified and irrelevant: their process might go to sleep, get written to disk, spin in a loop, or perhaps you are using cooperative concurrency (also known as "asynchronous programming") and passing control to the event loop as your 'waiting operation'.
A spinlock does not have a clear definition. It might be used to refer to:
A synonym for mutex (this is in my opinion wrong, but it happens).
A specific mutex implementation that always waits in a busy loop.
Any sort of busy-waiting loop waiting for a resource. A semaphore, for example, might also get implemented using a 'spinlock'.
I would consider any use of the word to refer to a (part of a) specific implementation of a concurrency primitive that waits in a busy loop to be correct, if a more general term is not appropriate. That is, use mutex (or whatever primitive you desire) unless you specifically want to talk about a busy-waiting concurrency primitive.
The words that one author uses in one book or manual will not always have the same exact meaning in every book and every manual. The meanings of the words evolve over time, and it can happen fast when the words are names for new ideas.
Not every book was written at the same time. Not every author is the same age or had the same teachers. It's just something you'll have to get used to.
"Mutex" was a name for a new idea not so very long ago.
In one book, it might mean nothing more than a thing that keeps two or more threads from entering the same critical section at the same time. In another book, it might refer to a specific type of object in a certain operating system or library that is used for that same purpose.
A spinlock is a lock/mutex whose implementation relies mainly on a spinning loop.
More advanced locks/mutexes may have spinning phases in their implementation, but those typically last no more than a few microseconds.
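A minimal sketch of that busy-waiting idea in Rust (a toy, not a production lock; real implementations add backoff, fairness, and an RAII guard):

use std::sync::atomic::{AtomicBool, Ordering};

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn acquire(&self) {
        // Spin until we successfully flip the flag from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop(); // tell the CPU we are busy-waiting
        }
    }

    pub fn release(&self) {
        self.locked.store(false, Ordering::Release);
    }
}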

Understanding the Send trait

I am trying to wrap my head around the Send + Sync traits. I get the intuition behind Sync - this is the traditional thread safety (like in C++). The object does the necessary locking (interior mutability if needed), so threads can safely access it.
But the Send part is a bit unclear. I understand why things like Rc are not Send - the object can be given to a different thread, but its non-atomic reference-count operations make it thread-unsafe.
What is the intuition behind Send? Does it mean the object can be copied/moved into another thread context, and continues to be valid after the copy/move?
Any example scenarios for "Sync but not Send" would really help. Please also point me to any Rust libraries for this case (I found several for the opposite, though).
For (2), I found some threads which use structs with pointers to data on the stack/thread-local storage as examples. But these are unsafe anyway (Sync or otherwise).
Sync allows an object to be used by two threads A and B at the same time. This is trivial for non-mutable objects, but mutations need to be synchronized (performed in sequence with the same order being seen by all threads). This is often done using a Mutex or RwLock, which allows one thread to proceed while others must wait. By enforcing a shared order of changes, these types can turn a non-Sync object into a Sync object. Another mechanism for making objects Sync is to use atomic types, which are essentially Sync primitives.
Send allows an object to be used by two threads A and B at different times. Thread A can create and use an object, then send it to thread B, so thread B can use the object while thread A cannot. The Rust ownership model can be used to enforce this non-overlapping use. Hence the ownership model is an important part of Rust's Send thread safety, and may be the reason that Send is less intuitive than Sync when comparing with other languages.
Using the above definitions, it should be apparent why there are few examples of types that are Sync but not Send. If an object can be used safely by two threads at the same time (Sync) then it can be used safely by two threads at different times (Send). Hence, Sync usually implies Send. Any exception probably relates to Send's transfer of ownership between threads, which affects which thread runs the Drop handler and deallocates the value.
Most objects can be used safely by different threads if the uses can be guaranteed to be at different times. Hence, most types are Send.
Rc is an exception. It does not implement Send. Rc allows data to have multiple owners. If one owner in thread A could send the Rc to another thread, giving ownership to thread B, there could be other owners in thread A that can still use the object. Since the reference count is modified non-atomically, the value of the count on the two threads may get out of sync and one thread may drop the pointed-at value while there are owners in the other thread.
Arc is an Rc that uses an atomic type for the reference count. Hence it can be used by multiple threads without the count getting out of sync. If the data that the Arc points to is Sync, the entire object is Sync. If the data is not Sync (e.g. a mutable type), it can be made Sync using a Mutex. Hence the proliferation of Arc<Mutex<T>> types in multithreaded Rust code.
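For instance, the classic shared-counter sketch:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc provides shared ownership across threads;
    // Mutex provides synchronized interior mutability.
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);
}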
Send means that a type is safe to move from one thread to another. If the same type also implements Copy, this also means that it is safe to copy from one thread to another.
Sync means that a type is safe to reference from multiple threads at the same time. Specifically, that &T is Send and can be moved/copied to another thread if T is Sync.
So Send and Sync capture two different aspects of thread safety:
Non-Send types can only ever be owned by a single thread, since they cannot be moved or copied to other threads.
Non-Sync types can only be used by a single thread at any single time, since their references cannot be moved or copied to other threads. They can still be moved between threads if they implement Send.
It rarely makes sense to have Sync without Send, as being able to use a type from different threads would usually mean that moving ownership between threads should also be possible. The two are technically independent, though, so it is conceivable that certain types can be Sync but not Send.
Most types that own data will be Send, as there are few cases where data can't be moved from one thread to another (and not be accessed from the original thread afterwards).
Some common exceptions:
Raw pointers are neither Send nor Sync.
Types that share ownership of data without thread synchronization (for instance Rc).
Types that borrow data that is not Sync.
Types from external libraries or the operating system that are not thread safe.
Overall
Send and Sync exist to help you think about types when many threads are involved. In a single-threaded world, there would be no need for Send and Sync to exist.
It may also help not to always think of Send and Sync as allowing you to do something, or giving you the power to do something. On the contrary, think of !Send and !Sync as ways of forbidding or preventing you from doing multithread-problematic stuff.
For the definition of Send and Sync
If some type X is Send, then if you have an owned X, you can move it into another thread.
This can be problematic if X is somehow related to multi/shared-ownership.
Rc has a problem with this, since having one Rc allows you to create more owned Rc's (by cloning it), and you don't want any of those to pass into other threads. The problem is that many threads could be making more clones of that Rc at the same time, and the owner counter inside of it doesn't work well in that multithreaded situation - because even though each thread would own an Rc, there would really be only one counter, and access to it would not be synchronized.
Arc may work better. At least its owner counter is capable of dealing with the situation mentioned above, so in that regard Arc is fine to allow Send'ing - but only if the inner type is both Send and Sync. For example, an Arc<Rc> is still problematic - remember that Rc forbids Send (!Send) - because multiple threads having their own owned clone of that Arc<Rc> could still trigger the Rc's own multithreading problems; the Arc itself can't protect the threads from doing that. The other requirement for Arc<T> to be Send - that T is also Sync - is not that big of a deal, because if a type already forbids Send'ing, it will likely also forbid Sync'ing.
So if some type forbids Send'ing, then it doesn't matter what other types you wrap around it: you won't be able to make it "sendable" into another thread.
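A small sketch of that point (the commented-out line does not compile, because no wrapper can restore Send):

use std::rc::Rc;
use std::sync::Arc;

fn main() {
    let wrapped = Arc::new(Rc::new(1));

    // Does not compile: `Rc<i32>` cannot be sent between threads safely,
    // and wrapping it in Arc does not change that.
    // std::thread::spawn(move || println!("{}", **wrapped));

    println!("{}", **wrapped); // fine on the owning thread
}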
If some type X is Sync, then if multiple threads somehow happen to have an &X each, they can all safely use that &X.
This is problematic if &X allows interior mutability, so you'd want to forbid Sync to prevent multiple threads from having &X at the same time.
So if X has a problem with Send'ing, it will basically also have a problem with Sync'ing.
It's also problematic for Cell - which doesn't actually forbid Send'ing. Since Cell allows interior mutation through a mere &Cell, and that mutation doesn't guarantee anything in a multithreaded situation, it must forbid Sync'ing - that is, the situation of multiple threads having &Cell must not be allowed (in general). Regarding it being Send, an owned Cell can still be moved into another thread, as long as there won't be &Cell's anywhere else.
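A minimal sketch of that split (the commented-out scope does not compile, since Cell is !Sync):

use std::cell::Cell;
use std::thread;

fn main() {
    // Cell is Send: ownership can move to another thread.
    let c = Cell::new(1);
    thread::spawn(move || {
        c.set(2); // the spawned thread now owns the Cell exclusively
    })
    .join()
    .unwrap();

    // But Cell is !Sync: sharing &Cell across threads does not compile.
    // let shared = Cell::new(1);
    // thread::scope(|s| {
    //     s.spawn(|| shared.set(3)); // error: `Cell<i32>` cannot be shared between threads safely
    // });
}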
Mutex may work better. It also allows interior mutation, but it knows how to deal with many threads trying to mutate at once - the Mutex will only require that nothing inside of it forbids Send'ing (otherwise it's the same problem Arc would have to deal with). All being good, the Mutex is both Send and Sync.
This is not a practical example, but a curious note: if we have a Mutex<Cell> (which is redundant, but oh well), where the Cell itself forbids Sync, the Mutex is able to deal with that problem and still be (or "re-allow") Sync. This is because, once a thread gets access to that Cell, we know it won't have to deal with other threads trying to access other &Cell's at the same time, since the Mutex will be locked, preventing this from happening.
Mutating a value across threads
In theory you could share a Mutex between threads!
If you simply move an owned Mutex, you can do it, but it's of no use, since you'd want multiple threads to have some access to it at the same time.
Since it's Sync, you're allowed to share a &Mutex between threads, and its lock method indeed only requires a &Mutex.
But trying this is problematic. Let's say you're in the main thread; you create a Mutex, then a reference to it, a &Mutex, and then create another thread Z to which you try to pass the &Mutex.
The problem is that the Mutex has only one owner, and that owner lives inside the main thread. If for some reason thread Z outlives the main thread, that &Mutex would be dangling. So even though Sync on the Mutex doesn't particularly forbid sending/sharing a &Mutex between threads, you'll likely not get it done this way, for lifetime reasons. Arc to the rescue!
Arc gets rid of that lifetime problem. Instead of being owned by a particular scope in a particular thread, the value can be multi-owned, by multiple threads.
So using an Arc<Mutex> allows a value to be co-owned and shared, and offers interior mutability across many threads. In sum, the Mutex itself re-allows Sync'ing while not particularly forbidding Send'ing, and the Arc doesn't particularly forbid either, while offering shared ownership (avoiding the lifetime problems).
Small list of types
Types that are Send and Sync are those that don't particularly forbid either:
primitives, Arc, Mutex - depending on the inner types
Types that are Send and !Sync are those that offer (multithread-unsynchronized) interior mutability:
Cell, RefCell - depending on the inner type
Types that are !Send and !Sync are those that offer (multithread-unsynchronized) co-ownership:
Rc
I don't know of types that are !Send and Sync.
According to
Rustonomicon: Send and Sync
A type is Send if it is safe to send it to another thread.
A type is Sync if it is safe to share between threads (T is Sync if and only if &T is Send).

In which cases is Arc<Mutex<T>> not the best way for sharing data across threads in Rust?

From what I've learned, I should always choose Arc<T> for shared read access across threads and Arc<Mutex<T>> for shared write access across threads. Are there cases where I don't want to use Arc<T>/Arc<Mutex<T>> and instead do something completely different? E.g. do something like this:
unsafe impl Sync for MyStruct {}
unsafe impl Send for MyStruct {}
let shared_data_for_writing = Arc::from(MyStruct::new());
Sharing across threads
Besides Arc<T>, we can share objects across threads using scoped threads, e.g. by using crossbeam::scope and Scope::spawn. Scoped threads allow us to send borrowed pointers (&'a T) to threads spawned in a scope. The scope guarantees that the thread will terminate before the referent is dropped. Borrowed pointers have no runtime overhead compared to Arc<T> (Arc<T> takes a bit more memory and needs to maintain a reference counter using atomic instructions).
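A minimal sketch (assuming crossbeam as a dependency, e.g. crossbeam = "0.8"); the spawned closures borrow the data directly, with no reference counting involved:

fn main() {
    let data = vec![1, 2, 3, 4];

    // Scoped threads may borrow `data` because the scope guarantees
    // they terminate before `data` is dropped.
    crossbeam::scope(|s| {
        for chunk in data.chunks(2) {
            s.spawn(move |_| {
                let sum: i32 = chunk.iter().sum();
                println!("partial sum: {sum}");
            });
        }
    })
    .unwrap();
}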
Mutating across threads
Mutex<T> is the most basic general-purpose wrapper for ensuring at most one thread may mutate a value at any given time. Mutex<T> has one drawback: if there are many threads that only want to read the value in the mutex, they can't do so concurrently, even though it would be safe. RwLock<T> solves this by allowing multiple concurrent readers (while still ensuring a writer has exclusive access).
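A small illustration with the standard library's RwLock - multiple read guards may coexist, while a write guard is exclusive:

use std::sync::RwLock;

fn main() {
    let lock = RwLock::new(5);

    // Many readers may hold the lock at once...
    let r1 = lock.read().unwrap();
    let r2 = lock.read().unwrap();
    assert_eq!(*r1 + *r2, 10);
    drop((r1, r2)); // release the read guards before writing

    // ...but a writer gets exclusive access.
    *lock.write().unwrap() += 1;
    assert_eq!(*lock.read().unwrap(), 6);
}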
Atomic types such as AtomicUsize also allow mutation across threads, but only for small values (8, 16, 32 or 64 bits - some processors support atomic operations on 128-bit values, but that's not exposed in the standard library yet; see atomic::Atomic for that). For example, instead of Arc<Mutex<usize>>, you could use Arc<AtomicUsize>. Atomic types do not require locking, but they are manipulated through atomic machine instructions. The set of atomic instructions is a bit different from the set of non-atomic instructions, so switching from a non-atomic type to an atomic type might not always be a "drop-in replacement".
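For example, a sketch of the Arc<AtomicUsize> alternative mentioned above:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // A lock-free shared counter: no Mutex needed for a single integer.
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                counter.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(counter.load(Ordering::Relaxed), 4);
}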

Do I ever *not* want to use a read/write lock instead of a vanilla mutex?

When synchronizing access to a shared resource, is there ever a reason not to use a read/write lock instead of a vanilla mutex (which is basically just a write lock), besides the philosophical reason of it having more features than I may need?
In other words, if I just default to read/write locks as my preferred synchronization construct of choice, am I shooting myself in the foot?
It seems to me that one good rationale for always choosing a read/write lock, and using the read vs. write lock accordingly, is that I can implement some synchronization, then never have to think about it again, while gaining the possible benefit of better performance scalability if one day I drop the code into a more high-contention environment. So, assuming it has a potential benefit with no actual cost, it would make sense to use it all the time. Does that make sense?
This is on a system that isn't really resource limited, it's probably more of a performance question. Also I've phrased this question generally but I have Qt's QReadWriteLock and QMutex (C++) specifically in mind, if it matters.
In practice, the write lock in a read/write lock pair is more expensive than a simple mutex. Read/write locks always have some coordination strategy that must be applied when you acquire or release a lock. Depending on the particular implementation, this strategy can be cheap or expensive but it always exists.
In the case of QReadWriteLock, there's some logic that gives priority to writers. Even if the implementation of that logic is efficient and there are no readers in the waiting queue, it's never totally free.
I'm not familiar with all the details of the QMutex and QReadWriteLock implementations, but the documentation says that QMutex is heavily optimized for the non-contended case. QReadWriteLock carries no such remark. Maybe they just forgot to make such a note, or maybe its behaviour under those circumstances is not as good as QMutex's.
I think, in the best case, the penalty for using a read/write lock is negligible. But in the worst case, when you are fighting for every nanosecond, it might be noticeable.
It really boils down to the contention characteristics of your lock. The case where the simple mutex performs significantly better is heavy contention with writer preference.
This is a very lengthy and debatable subject. May I recommend reading Spinlocks and Read-Write Locks and Sleeping Read-Write Locks, so you can make an educated decision?

Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex)

POSIX allows mutexes to be recursive. That means the same thread can lock the same mutex twice without deadlocking. Of course it also needs to unlock it twice; otherwise no other thread can obtain the mutex. Not all systems supporting pthreads also support recursive mutexes, but if they want to be POSIX-conformant, they have to.
Other (more high-level) APIs also usually offer mutexes, often called locks. Some systems/languages (e.g. Cocoa/Objective-C) offer both recursive and non-recursive mutexes. Some languages offer only one or the other. E.g. in Java mutexes are always recursive (the same thread may "synchronize" twice on the same object). Depending on what other thread functionality they offer, not having recursive mutexes may be no problem, since you can easily write them yourself (I have already implemented recursive mutexes myself on the basis of simpler mutex/condition operations).
What I don't really understand: what are non-recursive mutexes good for? Why would I want a thread to deadlock if it locks the same mutex twice? Even high-level languages that could avoid that (e.g. by testing whether this will deadlock and throwing an exception if it does) usually don't. They let the thread deadlock instead.
Is this only for cases where I accidentally lock it twice and only unlock it once, where a recursive mutex would make the problem harder to find, so instead I have it deadlock immediately to see where the incorrect lock appears? But couldn't I do the same by having a lock counter returned when unlocking, and in a situation where I'm sure I released the last lock and the counter is not zero, throw an exception or log the problem? Or is there any other, more useful use case for non-recursive mutexes that I fail to see? Or is it maybe just performance, since a non-recursive mutex can be slightly faster than a recursive one? However, I tested this and the difference is really not that big.
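(For reference, here is a toy sketch of the "write it yourself" approach mentioned above - in Rust, but any mutex/condition-variable pair will do; a real implementation would add an RAII guard and error handling:)

use std::sync::{Condvar, Mutex};
use std::thread::{self, ThreadId};

// A toy recursive mutex built from a plain Mutex and a Condvar.
pub struct RecursiveMutex {
    state: Mutex<(Option<ThreadId>, usize)>, // (owner, lock count)
    cond: Condvar,
}

impl RecursiveMutex {
    pub fn new() -> Self {
        RecursiveMutex { state: Mutex::new((None, 0)), cond: Condvar::new() }
    }

    pub fn lock(&self) {
        let me = thread::current().id();
        let mut st = self.state.lock().unwrap();
        // Wait until the mutex is free or already owned by this thread.
        while st.0.is_some() && st.0 != Some(me) {
            st = self.cond.wait(st).unwrap();
        }
        st.0 = Some(me);
        st.1 += 1; // recursion depth
    }

    pub fn unlock(&self) {
        let mut st = self.state.lock().unwrap();
        assert_eq!(st.0, Some(thread::current().id()));
        st.1 -= 1;
        if st.1 == 0 {
            st.0 = None;
            self.cond.notify_one();
        }
    }
}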
The difference between a recursive and non-recursive mutex has to do with ownership. In the case of a recursive mutex, the kernel has to keep track of the thread that actually obtained the mutex the first time around, so that it can detect the difference between recursion and a different thread that should block instead. As another answer pointed out, there is a question of the additional overhead of this, both in terms of the memory to store this context and the cycles required for maintaining it.
However, there are other considerations at play here too.
Because the recursive mutex has a sense of ownership, the thread that grabs the mutex must be the same thread that releases the mutex. In the case of non-recursive mutexes, there is no sense of ownership and any thread can usually release the mutex no matter which thread originally took the mutex. In many cases, this type of "mutex" is really more of a semaphore action, where you are not necessarily using the mutex as an exclusion device but use it as synchronization or signaling device between two or more threads.
Another property that comes with a sense of ownership in a mutex is the ability to support priority inheritance. Because the kernel can track the thread owning the mutex and also the identity of all the blocker(s), in a priority threaded system it becomes possible to escalate the priority of the thread that currently owns the mutex to the priority of the highest priority thread that is currently blocking on the mutex. This inheritance prevents the problem of priority inversion that can occur in such cases. (Note that not all systems support priority inheritance on such mutexes, but it is another feature that becomes possible via the notion of ownership).
If you refer to the classic VxWorks RTOS kernel, it defines three mechanisms:
mutex - supports recursion, and optionally priority inheritance. This mechanism is commonly used to protect critical sections of data in a coherent manner.
binary semaphore - no recursion, no inheritance, simple exclusion; the taker and giver do not have to be the same thread, and broadcast release is available. This mechanism can be used to protect critical sections, but is also particularly useful for coherent signalling or synchronization between threads.
counting semaphore - no recursion or inheritance; acts as a coherent resource counter from any desired initial count, and threads only block when the net count against the resource is zero.
Again, this varies somewhat by platform - especially what they call these things, but this should be representative of the concepts and various mechanisms at play.
The answer is not efficiency. Non-reentrant mutexes lead to better code.
Example: A::foo() acquires the lock. It then calls B::bar(). This worked fine when you wrote it. But sometime later someone changes B::bar() to call A::baz(), which also acquires the lock.
Well, if you don't have recursive mutexes, this deadlocks. If you do have them, it runs, but it may break. A::foo() may have left the object in an inconsistent state before calling bar(), on the assumption that baz() couldn't run because it also acquires the mutex. But it probably shouldn't run! The person who wrote A::foo() assumed that nobody could call A::baz() at the same time - that's the entire reason both of those methods acquire the lock.
The right mental model for using mutexes: The mutex protects an invariant. When the mutex is held, the invariant may change, but before releasing the mutex, the invariant is re-established. Reentrant locks are dangerous because the second time you acquire the lock you can't be sure the invariant is true any more.
If you are happy with reentrant locks, it is only because you have not had to debug a problem like this before. Java has non-reentrant locks these days in java.util.concurrent.locks, by the way.
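(A hypothetical sketch of that hazard in Rust; the names Account, foo and baz are illustrative, and parking_lot's ReentrantMutex is an assumed dependency:)

use parking_lot::ReentrantMutex;
use std::cell::RefCell;

struct Account {
    // ReentrantMutex only hands out shared references, so the balance
    // needs interior mutability (RefCell) to be mutated at all.
    state: ReentrantMutex<RefCell<i64>>,
}

impl Account {
    fn foo(&self) {
        let guard = self.state.lock();
        *guard.borrow_mut() -= 100; // invariant temporarily broken
        self.baz(); // re-entrant lock succeeds: baz() observes the broken state
        *guard.borrow_mut() += 100; // invariant restored
    }

    fn baz(&self) {
        let guard = self.state.lock(); // same thread, so no deadlock
        println!("balance: {}", guard.borrow()); // may see a mid-update value
    }
}

fn main() {
    let acct = Account { state: ReentrantMutex::new(RefCell::new(0)) };
    acct.foo(); // prints "balance: -100": the invariant was not re-established
}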
As written by Dave Butenhof himself:
"The biggest of all the big problems with recursive mutexes is that
they encourage you to completely lose track of your locking scheme and
scope. This is deadly. Evil. It's the "thread eater". You hold locks for
the absolutely shortest possible time. Period. Always. If you're calling
something with a lock held simply because you don't know it's held, or
because you don't know whether the callee needs the mutex, then you're
holding it too long. You're aiming a shotgun at your application and
pulling the trigger. You presumably started using threads to get
concurrency; but you've just PREVENTED concurrency."
The right mental model for using mutexes: The mutex protects an invariant.
Why are you sure that this is really the right mental model for using mutexes?
I think the right model is protecting data, not invariants.
The problem of protecting invariants is present even in single-threaded applications and has nothing in common with multithreading and mutexes.
Furthermore, if you need to protect invariants, you can still use a binary semaphore, which is never recursive.
One main reason recursive mutexes are useful is when the same thread accesses protected methods multiple times. For example, if a mutex lock protects a bank account for withdrawals, and there is also a fee associated with a withdrawal, then the same mutex has to be used.
The only good use case for a recursive mutex is when an object contains multiple methods, any of which may modify the content of the object and therefore must hold the lock until the state is consistent again.
If the methods use other methods (i.e.: addNewArray() calls addNewPoint(), and finalizes with recheckBounds()), but any of those functions by itself needs to lock the mutex, then a recursive mutex is a win-win.
Any other case (papering over bad coding, or using it even across different objects) is clearly wrong!
What are non-recursive mutexes good for?
They are absolutely good when you have to make sure the mutex is unlocked before doing something. This is because pthread_mutex_unlock can guarantee that the mutex is unlocked only if it is non-recursive.
pthread_mutex_t g_mutex;

void foo()
{
    pthread_mutex_lock(&g_mutex);
    // Do something.
    pthread_mutex_unlock(&g_mutex);
    bar();
}
If g_mutex is non-recursive, the code above is guaranteed to call bar() with the mutex unlocked.
This eliminates the possibility of a deadlock in case bar() happens to be an unknown external function that may well do something resulting in another thread trying to acquire the same mutex. Such scenarios are not uncommon in applications built on thread pools, and in distributed applications, where an interprocess call may spawn a new thread without the client programmer even realising it. In all such scenarios it's best to invoke the said external functions only after the lock is released.
If g_mutex were recursive, there would simply be no way to make sure it is unlocked before making the call.
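(In Rust the same guarantee falls out structurally, since unlocking is tied to dropping the guard; a small sketch, where bar is a stand-in for the unknown external function:)

use std::sync::Mutex;

static G_MUTEX: Mutex<()> = Mutex::new(());

fn bar() {
    // Stand-in for an unknown external function that might, directly or
    // indirectly, cause another thread to acquire G_MUTEX.
}

fn foo() {
    {
        let _guard = G_MUTEX.lock().unwrap();
        // Do something.
    } // guard dropped here: the mutex is guaranteed to be unlocked

    bar(); // called with the lock released
}

fn main() {
    foo();
}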
IMHO, most arguments against recursive locks (which are what I have used 99.9% of the time over some 20 years of concurrent programming) mix the question of whether they are good or bad with other software design issues that are quite unrelated. To name one, the "callback" problem, which is elaborated on exhaustively, and without any multithreading point of view, for example in the book Component Software - Beyond Object-Oriented Programming.
As soon as you have some inversion of control (e.g. events being fired), you face re-entrance problems, independent of whether mutexes and threading are involved or not:
class EvilFoo {
    std::vector<std::string> data;
    std::vector<std::function<void(EvilFoo&)>> changedEventHandlers;
public:
    size_t registerChangedHandler(std::function<void(EvilFoo&)> handler) { // ...
    }
    void unregisterChangedHandler(size_t handlerId) { // ...
    }
    void fireChangedEvent() {
        // bad bad, even evil idea!
        for (auto& handler : changedEventHandlers) {
            handler(*this);
        }
    }
    void AddItem(const std::string& item) {
        data.push_back(item);
        fireChangedEvent();
    }
};
Now, with code like the above you get all the error cases that would usually be named in the context of recursive locks - only without any of them. An event handler can unregister itself once it has been called, which would lead to a bug in a naively written fireChangedEvent(). Or it could call other member functions of EvilFoo, which cause all sorts of problems. The root cause is re-entrance.
Worst of all, this might not even be very obvious, as the re-entrance could occur over a whole chain of events firing events until eventually we are back at our EvilFoo (non-local).
So, re-entrance is the root problem, not the recursive lock.
Now, if you felt more on the safe side using a non-recursive lock, how would such a bug manifest itself? In a deadlock whenever unexpected re-entrance occurs.
And with a recursive lock? The same way it would manifest itself in code without any locks.
So the evil part of EvilFoo are the events and how they are implemented, not so much a recursive lock. fireChangedEvent() would need to first create a copy of changedEventHandlers and use that for iteration, for starters.
Another aspect often coming into the discussion is the definition of what a lock is supposed to do in the first place:
Protect a piece of code from re-entrance
Protect a resource from being used concurrently (by multiple threads).
The way I do my concurrent programming, I have a mental model of the latter (protecting a resource). This is the main reason why I am fine with recursive locks. If some (member) function needs to lock a resource, it locks. If it calls another (member) function while doing what it does, and that function also needs to lock, it locks too. And I don't need an "alternate approach", because the ref-counting of the recursive lock is quite the same as if each function wrote something like:
void EvilFoo::bar() {
    auto_lock lock(this); // this->lock_holder = this->lock_if_not_already_locked_by_same_thread()
    // do what we gotta do
    // ~auto_lock() { if (lock_holder) unlock(); }
}
And once events or similar constructs (visitors?!) come into play, I do not hope to get all the ensuing design problems solved by some non-recursive lock.
