Code Re-entrancy vs. Thread Safety

Code Re-entrancy vs. Thread Safety - multithreading

What is the difference between the concepts of "Code Re-entrancy" and "Thread Safety"? As per the link mentioned below, a piece of code can be either of them, both of them or neither of them.
Reentrant and Thread safe code
I was not able to understand the explaination clearly. Help would be appreciated.

Re-entrant code has no state in a single point. You can call the code while something is executing in the code. If the code uses global state, one call can conceivably overwrite the global state, breaking the computation in the other call.
Thread safe code is code with no race conditions or other concurrency issues. A race condition is where the order in which two threads do something affects the computation. A typical concurrency issue is where a change to a shared data structure can be partially completed and left in an inconsistent state. In order to avoid this, you have to use concurrency control mechanisms such as semaphores of mutexes to ensure that nothing else can access the data structure until the operation is completed.
For example, a piece of code can be non re-entrant but thread-safe if it is guarded externally by a mutex but still has a global data structure where the state must be consistent for the entire duration of the call. In this case, the same thread could initiate a call-back into the procedure while still protected by an external coarse-grained mutex. If the call-back occured from within the non re-entrant procedure the call could leave the data structure in a state that could break the computation from the caller's point of view.
A piece of code can be re-entrant but non thread-safe if it can make a non-atomic change to a shared (and sharable) data structure that could be interrupted in the middle of the update leaving the data structure in an incosistent state. In this case another thread accessing the data structure could be affected by the half-changed data structure and either crash or perform an operation that corrupts the data.

That article says:
"a function can be either reentrant, thread-safe, both, or neither."
It also says:
"Non-reentrant functions are thread-unsafe".
I can see how this may cause a muddle. They mean that standard functions documented as not required to be re-entrant are also not required to be thread-safe, which is true of the POSIX libraries iirc (and POSIX declares it to be true of the ANSI/ISO libraries too, ISO having no concept of threads and hence no concept of thread-safety). In other words, "if a function says it is non-reentrant, then it is saying it's thread-unsafe too". That's not a logical necessity, it's just a convention.
Here's some pseudo-code which is thread-safe (well, there's plenty of opportunity for callbacks to create deadlocks due to locking inversion, but let's assume the documentation contains sufficient information for users to avoid that) but not re-entrant. It is supposed to increment the global counter, and perform the callback:
take_global_lock();
int i = get_global_counter();
do_callback(i);
set_global_counter(i+1);
release_global_lock();
If the callback calls this routine again, resulting in another callback, then both levels of callback will get the same parameter (which might be OK, depending on the API), but the counter will only be incremented once (which is almost certainly not the API you want, so it would have to be banned).
That's assuming the lock is recursive, of course. If the lock is non-recursive, then of course the code is non-reentrant anyway, since taking the lock the second time won't work.
Here's some pseudo-code which is "weakly re-entrant" but not thread-safe:
int i = get_global_counter();
do_callback(i);
set_global_counter(get_global_counter()+1);
Now it's fine to call the function from the callback, but it's not safe to call the function concurrently from different threads. It's also not safe to call it from a signal handler, because re-entrancy from a signal handler could likewise break the count if the signal happened to occur at the right time. So the code is non-re-entrant by the proper definition.
Here's some code which arguably is fully re-entrant (except I think the standard distinguishes between reentrant and 'non-interruptible by signals', and I'm not sure where this falls), but still isn't thread-safe:
int i = get_global_counter();
do_callback(i);
disable_signals(); // and any other kind of interrupts on your system
set_global_counter(get_global_counter()+1);
restore_signal_state();
On a single-threaded app, this is fine, assuming that the OS supports disabling everything that needs to be disabled. It prevents re-entrancy from occurring at the critical point. Depending how signals are disabled, it may be safe to call from a signal handler, although in this particular example there's still the issue of the parameter passed to the callback being the same for separate calls. It can still go wrong multi-threaded, though.
In practice, non-thread-safe often implies non-re-entrant, since (informally) anything that can go wrong due to the thread being interrupted by the scheduler, and the function called again from another thread, can also go wrong if the thread is interrupted by a signal, and the function is called again from the signal handler. But then the "fix" to prevent signals (disabling them) is different from the "fix" to prevent concurrency (locks, usually). This is at best a rule of thumb.
Note that I've implied globals here, but exactly the same considerations would apply if the function took as a parameter a pointer to the counter and the lock. It's just that the various cases would be thread-unsafe or non-re-entrant when called with the same parameter, rather than when called at all.

Related

Calling action from within lock/semaphore

I have a function which accepts an action. The function obtains a semaphore lock (but for the purposes of the question could also be a monitor lock) and then calls the action.
A code reviewer has stated does not represent an effective way to implement thread-safety because it is prone to deadly embrace. Thread-safe code should be encapsulated but you break this by allowing a third-party to invoke an external action. (It's like raising an event inside a lock.)
Ignoring the encapsulation bit, is there any special case with calling actions from with a lock? My instinct is to say an action is no more likely to incur a deadlock than any other code but before i challenge that, is he right??

The problem with calling external code when lock is acquired is that you cannot guarantee anymore that your code is deadlock-safe.
The caller can do anything in the callback action.
Here is a few examples when action might perform 'dangerous':
Recursive call of a function. Deadlock is possible in that case.
Perform some long running operation. That might downgrade the others threads performance if they need the same synch object (monitor or semaphore).
I believe there are the other negative cases also possible.
The purpose of a monitor (or a semaphore) is to prevent simultaneous entering into the code section, which is definitely should not be run simultaneously. That is not the case for the callback action.
So, there is no good reason to call action within the lock.
I would suggest here instead to call the callback action either before the lock is acquired or after it is released.

Why do we need a lock to be reentrant?

I understand (somewhat) the features of the jdk 5 ReentrantLock here
But why we would want a 're-entrant' lock? i.e if a Thread already has the lock on an Object, why would it need to acquire it again?

Consider this theoretical example: You are using a lock to protect some back-end data while updating some items in a list box in your GUI. You loop through and modify the items. While doing so, the list box fires an event (perhaps a Selection Changed event or something) for which you have a handler registered. This handler also locks the same lock in order to process the new item. If the lock is not recursive, this thread would deadlock on the second attempt to acquire the lock.

Reentrant locks are useful in cases where a resource cannot tolerate all forms of arbitrarily-timed accesses, but can tolerate certain patterns of access which can occur in nested execution contexts. In many cases their usage is unaesthetic and sloppy, but it may be easier to arrange things so that a reentrant lock can be guaranteed to work than it would be to arrange things so as to make one unnecessary.
Note that while many languages default to making locks reentrant, that is not necessarily a good thing. If code acquires a lock and then other code in that thread tries to acquire a token for that same lock, it's clear that that having the second request wait until lock has been released isn't going to be very productive. That does not imply, however, that the second request should allow access to the lock. In many cases a proper course of action would be for the second request to throw an immediate exception (access shouldn't be granted until the lock is released, and that can't happen until either the request is granted (which shouldn't happen) or the code exits some other way (an exception being the most natural choice). Such a situation would apply if the a method which was modifying a lock-guarded data structure called some outside code which wasn't expected to use the data structure while the data structure was in an inconsistent state. If the code unexpectedly does try to use the data structure, having it fail immediately with an exception may be better than having it wait forever for a lock it's never going to get, or blithely proceed into a lock and access invalid data.
There are many cases where code will call nested routines at times when a guarded resource satisfies some but not all of its invariants, and where the outside code may expect the nested routines to make some kinds of changes to it but not others. In such cases, reentrant locks may be appropriate, but care is required to ensure that code doesn't do things it shouldn't. One advantage of reentrant locks is that if code which makes nested calls with the lock held sets flags to indicate its promises/requirements, and code which acquires the lock tests those flags on entry, one can guarantee that the flags will only be manipulated in predictable sequences. Such a thing would not be possible if two different threads were trying to use the resource simultaneously.

Should access to a shared resource be locked by a parent thread before spawning a child thread that accesses it?

If I have the following psuedocode:
sharedVariable = somevalue;
CreateThread(threadWhichUsesSharedVariable);
Is it theoretically possible for a multicore CPU to execute code in threadWhichUsesSharedVariable() which reads the value of sharedVariable before the parent thread writes to it? For full theoretical avoidance of even the remote possibility of a race condition, should the code look like this instead:
sharedVariableMutex.lock();
sharedVariable = somevalue;
sharedVariableMutex.unlock();
CreateThread(threadWhichUsesSharedVariable);
Basically I want to know if the spawning of a thread explicitly linearizes the CPU at that point, and is guaranteed to do so.
I know that the overhead of thread creation probably takes enough time that this would never matter in practice, but the perfectionist in me is afraid of the theoretical race condition. In extreme conditions, where some threads or cores might be severely lagged and others are running fast and efficiently, I can imagine that it might be remotely possible for the order of execution (or memory access) to be reversed unless there was a lock.

I would say that your pseudocode is safe on any correctly functioning
multiprocessor system. The C++ compiler cannot generate a call to
CreateThread() before sharedVariable has received a correct value
unless it can prove to itself that doing so is safe. You are guaranteed
that your single-threaded code executes equivalently to a completely
non-reordered linear execution path. Any system that "time warps" the
thread creation ahead of the variable assignment is seriously broken.
I don't think declaring sharedVariable as volatile does anything
useful in this case.

Given your example and if you were using Java then the answer would be "No". In Java it is not possible for the thread to spawn and read your value before the assignment operation is complete. In some other languages this might be a different story.
"Variables shared between multiple threads (e.g., instance variables of objects) have atomic assignment guaranteed by the Java language specification for all data types except longs and doubles... If a method consists solely of a single variable access or assignment, there is no need to make it synchronized for thread-safety, and every reason not to do so for performance."
reference
If your double or long is declared volatile, then you are also guaranteed that the assignment is an atomic operation.
Update:
Your example is going to work in C++ just like it works in Java. Theoretically there is no way that the thread spawning will begin or complete before the assignment, even with Out of Order Execution.
Note that your example is VERY specific and in any other case it is recommended that you ensure the shared resource is protected properly. The new C++ standard is coming out with a lot of atomic stuff, so you could declare your variable as atomic and the assignment operation will be visible to all threads without the need of locking. CAS (compare and set) is a your next best option.

How to define threadsafe?

Threadsafe is a term that is thrown around documentation, however there is seldom an explanation of what it means, especially in a language that is understandable to someone learning threading for the first time.
So how do you explain Threadsafe code to someone new to threading?
My ideas for options are the moment are:
Do you use a list of what makes code
thread safe vs. thread unsafe
The book definition
A useful metaphor

Multithreading leads to non-deterministic execution - You don't know exactly when a certain piece of parallel code is run.
Given that, this wonderful multithreading tutorial defines thread safety like this:
Thread-safe code is code which has no indeterminacy in the face of any multithreading scenario. Thread-safety is achieved primarily with locking, and by reducing the possibilities for interaction between threads.
This means no matter how the threads are run in particular, the behaviour is always well-defined (and therefore free from race conditions).

Eric Lippert says:
When I'm asked "is this code thread safe?" I always have to push back and ask "what are the exact threading scenarios you are concerned about?" and "exactly what is correct behaviour of the object in every one of those scenarios?".
It is unhelpful to say that code is "thread safe" without somehow communicating what undesirable behaviors the utilized thread safety mechanisms do and do not prevent.

G'day,
A good place to start is to have a read of the POSIX paper on thread safety.
Edit: Just the first few paragraphs give you a quick overview of thread safety and re-entrant code.
HTH
cheers,

i maybe wrong but one of the criteria for being thread safe is to use local variables only. Using global variables can have undefined result if the same function is called from different threads.

A thread safe function / object (hereafter referred to as an object) is an object which is designed to support multiple concurrent calls. This can be achieved by serialization of the parallel requests or some sort of support for intertwined calls.
Essentially, if the object safely supports concurrent requests (from multiple threads), it is thread safe. If it is not thread safe, multiple concurrent calls could corrupt its state.
Consider a log book in a hotel. If a person is writing in the book and another person comes along and starts to concurrently write his message, the end result will be a mix of both messages. This can also be demonstrated by several threads writing to an output stream.

I would say to understand thread safe, start with understanding difference between thread safe function and reentrant function.
Please check The difference between thread-safety and re-entrancy for details.

Tread-safe code is code that won't fail because the same data was changed in two places at once. Thread safe is a smaller concept than concurrency-safe, because it presumes that it was in fact two threads of the same program, rather than (say) hardware modifying data, or the OS.

A particularly valuable aspect of the term is that it lies on a spectrum of concurrent behavior, where thread safe is the strongest, interrupt safe is a weaker constraint than thread safe, and reentrant even weaker.
In the case of thread safe, this means that the code in question conforms to a consistent api and makes use of resources such that other code in a different thread (such as another, concurrent instance of itself) will not cause an inconsistency, so long as it also conforms to the same use pattern. the use pattern MUST be specified for any reasonable expectation of thread safety to be had.
The interrupt safe constraint doesn't normally appear in modern userland code, because the operating system does a pretty good job of hiding this, however, in kernel mode this is pretty important. This means that the code will complete successfully, even if an interrupt is triggered during its execution.
The last one, reentrant, is almost guaranteed with all modern languages, in and out of userland, and it just means that a section of code may be entered more than once, even if execution has not yet preceeded out of the code section in older cases. This can happen in the case of recursive function calls, for instance. It's very easy to violate the language provided reentrancy by accessing a shared global state variable in the non-reentrant code.

Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex)

POSIX allows mutexes to be recursive. That means the same thread can lock the same mutex twice and won't deadlock. Of course it also needs to unlock it twice, otherwise no other thread can obtain the mutex. Not all systems supporting pthreads also support recursive mutexes, but if they want to be POSIX conform, they have to.
Other APIs (more high level APIs) also usually offer mutexes, often called Locks. Some systems/languages (e.g. Cocoa Objective-C) offer both, recursive and non recursive mutexes. Some languages also only offer one or the other one. E.g. in Java mutexes are always recursive (the same thread may twice "synchronize" on the same object). Depending on what other thread functionality they offer, not having recursive mutexes might be no problem, as they can easily be written yourself (I already implemented recursive mutexes myself on the basis of more simple mutex/condition operations).
What I don't really understand: What are non-recursive mutexes good for? Why would I want to have a thread deadlock if it locks the same mutex twice? Even high level languages that could avoid that (e.g. testing if this will deadlock and throwing an exception if it does) usually don't do that. They will let the thread deadlock instead.
Is this only for cases, where I accidentally lock it twice and only unlock it once and in case of a recursive mutex, it would be harder to find the problem, so instead I have it deadlock immediately to see where the incorrect lock appears? But couldn't I do the same with having a lock counter returned when unlocking and in a situation, where I'm sure I released the last lock and the counter is not zero, I can throw an exception or log the problem? Or is there any other, more useful use-case of non recursive mutexes that I fail to see? Or is it maybe just performance, as a non-recursive mutex can be slightly faster than a recursive one? However, I tested this and the difference is really not that big.

The difference between a recursive and non-recursive mutex has to do with ownership. In the case of a recursive mutex, the kernel has to keep track of the thread who actually obtained the mutex the first time around so that it can detect the difference between recursion vs. a different thread that should block instead. As another answer pointed out, there is a question of the additional overhead of this both in terms of memory to store this context and also the cycles required for maintaining it.
However, there are other considerations at play here too.
Because the recursive mutex has a sense of ownership, the thread that grabs the mutex must be the same thread that releases the mutex. In the case of non-recursive mutexes, there is no sense of ownership and any thread can usually release the mutex no matter which thread originally took the mutex. In many cases, this type of "mutex" is really more of a semaphore action, where you are not necessarily using the mutex as an exclusion device but use it as synchronization or signaling device between two or more threads.
Another property that comes with a sense of ownership in a mutex is the ability to support priority inheritance. Because the kernel can track the thread owning the mutex and also the identity of all the blocker(s), in a priority threaded system it becomes possible to escalate the priority of the thread that currently owns the mutex to the priority of the highest priority thread that is currently blocking on the mutex. This inheritance prevents the problem of priority inversion that can occur in such cases. (Note that not all systems support priority inheritance on such mutexes, but it is another feature that becomes possible via the notion of ownership).
If you refer to classic VxWorks RTOS kernel, they define three mechanisms:
mutex - supports recursion, and optionally priority inheritance. This mechanism is commonly used to protect critical sections of data in a coherent manner.
binary semaphore - no recursion, no inheritance, simple exclusion, taker and giver does not have to be same thread, broadcast release available. This mechanism can be used to protect critical sections, but is also particularly useful for coherent signalling or synchronization between threads.
counting semaphore - no recursion or inheritance, acts as a coherent resource counter from any desired initial count, threads only block where net count against the resource is zero.
Again, this varies somewhat by platform - especially what they call these things, but this should be representative of the concepts and various mechanisms at play.

The answer is not efficiency. Non-reentrant mutexes lead to better code.
Example: A::foo() acquires the lock. It then calls B::bar(). This worked fine when you wrote it. But sometime later someone changes B::bar() to call A::baz(), which also acquires the lock.
Well, if you don't have recursive mutexes, this deadlocks. If you do have them, it runs, but it may break. A::foo() may have left the object in an inconsistent state before calling bar(), on the assumption that baz() couldn't get run because it also acquires the mutex. But it probably shouldn't run! The person who wrote A::foo() assumed that nobody could call A::baz() at the same time - that's the entire reason that both of those methods acquired the lock.
The right mental model for using mutexes: The mutex protects an invariant. When the mutex is held, the invariant may change, but before releasing the mutex, the invariant is re-established. Reentrant locks are dangerous because the second time you acquire the lock you can't be sure the invariant is true any more.
If you are happy with reentrant locks, it is only because you have not had to debug a problem like this before. Java has non-reentrant locks these days in java.util.concurrent.locks, by the way.

As written by Dave Butenhof himself:
"The biggest of all the big problems with recursive mutexes is that
they encourage you to completely lose track of your locking scheme and
scope. This is deadly. Evil. It's the "thread eater". You hold locks for
the absolutely shortest possible time. Period. Always. If you're calling
something with a lock held simply because you don't know it's held, or
because you don't know whether the callee needs the mutex, then you're
holding it too long. You're aiming a shotgun at your application and
pulling the trigger. You presumably started using threads to get
concurrency; but you've just PREVENTED concurrency."

The right mental model for using
mutexes: The mutex protects an
invariant.
Why are you sure that this is really right mental model for using mutexes?
I think right model is protecting data but not invariants.
The problem of protecting invariants presents even in single-threaded applications and has nothing common with multi-threading and mutexes.
Furthermore, if you need to protect invariants, you still may use binary semaphore wich is never recursive.

One main reason that recursive mutexes are useful is in case of accessing the methods multiple times by the same thread. For example, say if mutex lock is protecting a bank A/c to withdraw, then if there is a fee also associated with that withdrawal, then the same mutex has to be used.

The only good use case for recursion mutex is when an object contains multiple methods. When any of the methods modify the content of the object, and therefore must lock the object before the state is consistent again.
If the methods use other methods (ie: addNewArray() calls addNewPoint(), and finalizes with recheckBounds()), but any of those functions by themselves need to lock the mutex, then recursive mutex is a win-win.
For any other case (solving just bad coding, using it even in different objects) is clearly wrong!

What are non-recursive mutexes good for?
They are absolutely good when you have to make sure the mutex is unlocked before doing something. This is because pthread_mutex_unlock can guarantee that the mutex is unlocked only if it is non-recursive.
pthread_mutex_t g_mutex;
void foo()
{
pthread_mutex_lock(&g_mutex);
// Do something.
pthread_mutex_unlock(&g_mutex);
bar();
}
If g_mutex is non-recursive, the code above is guaranteed to call bar() with the mutex unlocked.
Thus eliminating the possibility of a deadlock in case bar() happens to be an unknown external function which may well do something that may result in another thread trying to acquire the same mutex. Such scenarios are not uncommon in applications built on thread pools, and in distributed applications, where an interprocess call may spawn a new thread without the client programmer even realising that. In all such scenarios it's best to invoke the said external functions only after the lock is released.
If g_mutex was recursive, there would be simply no way to make sure it is unlocked before making a call.

IMHO, most arguments against recursive locks (which are what I use 99.9% of the time over like 20 years of concurrent programming) mix the question if they are good or bad with other software design issues, which are quite unrelated. To name one, the "callback" problem, which is elaborated on exhaustively and without any multithreading related point of view, for example in the book Component software - beyond Object oriented programming.
As soon as you have some inversion of control (e.g. events fired), you face re-entrance problems. Independent of whether there are mutexes and threading involved or not.
class EvilFoo {
std::vector<std::string> data;
std::vector<std::function<void(EvilFoo&)> > changedEventHandlers;
public:
size_t registerChangedHandler( std::function<void(EvilFoo&)> handler) { // ...
}
void unregisterChangedHandler(size_t handlerId) { // ...
}
void fireChangedEvent() {
// bad bad, even evil idea!
for( auto& handler : changedEventHandlers ) {
handler(*this);
}
}
void AddItem(const std::string& item) {
data.push_back(item);
fireChangedEvent();
}
};
Now, with code like the above you get all error cases, which would usually be named in the context of recursive locks - only without any of them. An event handler can unregister itself once it has been called, which would lead to a bug in a naively written fireChangedEvent(). Or it could call other member functions of EvilFoo which cause all sorts of problems. The root cause is re-entrance.
Worst of all, this could not even be very obvious as it could be over a whole chain of events firing events and eventually we are back at our EvilFoo (non- local).
So, re-entrance is the root problem, not the recursive lock.
Now, if you felt more on the safe side using a non-recursive lock, how would such a bug manifest itself? In a deadlock whenever unexpected re-entrance occurs.
And with a recursive lock? The same way, it would manifest itself in code without any locks.
So the evil part of EvilFoo are the events and how they are implemented, not so much a recursive lock. fireChangedEvent() would need to first create a copy of changedEventHandlers and use that for iteration, for starters.
Another aspect often coming into the discussion is the definition of what a lock is supposed to do in the first place:
Protect a piece of code from re-entrance
Protect a resource from being used concurrently (by multiple threads).
The way I do my concurrent programming, I have a mental model of the latter (protect a resource). This is the main reason why I am good with recursive locks. If some (member) function needs locking of a resource, it locks. If it calls another (member) function while doing what it does and that function also needs locking - it locks. And I don't need an "alternate approach", because the ref-counting of the recursive lock is quite the same as if each function wrote something like:
void EvilFoo::bar() {
auto_lock lock(this); // this->lock_holder = this->lock_if_not_already_locked_by_same_thread())
// do what we gotta do
// ~auto_lock() { if (lock_holder) unlock() }
}
And once events or similar constructs (visitors?!) come into play, I do not hope to get all the ensuing design problems solved by some non-recursive lock.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string