libuv: uv_close and thread safety

libuv: uv_close and thread safety - multithreading

I've seen it mentioned that "uv_close is not thread safe". I'm experienced with writing multi-threaded C/C++ code, but I'm still not sure what is being said here.
Does this mean that uv_close must always be called in the main thread?
Or, is this simply warning that uv_close must not be called in parallel to other uses of the handle (seems obvious..)?
I'm dealing with an uv_async_t handle, if that's relevant...

Per the discussion here:
https://github.com/libuv/libuv/issues/709
uv_close may not be called outside of the loop thread. In addition, it should be mentioned that no libuv functions, aside from uv_async_send, are safe to call outside of the loop thread.

Related

Can the functions in AXUIElement.h be safely called from threads other than the main thread?

Are the macOS accessibility functions thread-safe, or safe to be called in threads other than the application's main thread? (i.e. the functions defined in AXUIElement.h)
I've been working with them for years, and I still haven't been able to figure out in what contexts it's safe to call these functions. In the past I've had issues with calling them from threads other than an application's main thread, but oftentimes these functions are slow and it's not possible to use them in the main thread without causing the application's GUI to block, hence me needing to use them in a separate thread.
As near as I've been able to find, the documentation and header file give no indication of what context it's safe to call these functions in or their thread safety.

I contacted an Apple engineer to get an answer directly, and the answer is that Accessibility functions must be called in the application's main thread.

In which thread methods should Synchronize be used?

I know Synchronize must be used in the Execute procedure, but should it be used in Create and Destroy methods too, or is it safe to do whatever I want?

I know Synchronize must be used in the Execute procedure.
That is somewhat vague. You need to use Synchronize when you have code that must execute on the main thread. So the answer to whether or not you will need to use Synchronize depends crucially on what the code under consideration actually does. The question that you must ask yourself, and which is one that only you can answer, is do you have code that must run on the main thread?
As a general rule it would be considered prudent for you not to need to call Synchronize outside the Execute method. If you can find a way to avoid doing so then that would be wise. Remember that the ideal scenario with threads is that they never need to block with Synchronize if at all possible.
You might also wish to consider which thread executes the constructor and destructor.
The constructor Create runs in the thread that calls it. It does not run in the newly created thread. Therefore it is unlikely that you would need to use Synchronize there.
The destructor Destroy runs in the thread that calls it. Typically this is the thread that calls Free on the thread object. And usually that would be called from the same thread that originally created the thread. The common exception to that is a FreeOnTerminate thread which calls Free from the thread.

There is a need to use Synchronize() when the code is executing outside of the context of the main (GUI) thread of the application. Therefore the answer to your question depends on whether the constructor and destructor are called from that thread or not.
If you are unsure you can check that by comparing the result of the Windows API function GetCurrentThreadId() with the variable MainThreadID - if they equal the code executes in the context of the main thread.
Threads that have FreeOnTerminate set will have their destructor called from another thread context, so you would need to use Synchronize() or Queue(). Or you use the termination event the VCL already provides, I believe it is executed in the main thread, but check the documentation for details.

First of all, you don't want to call Synchronize() unnecessarily, because that simply defeats the purpose of using a thread. So the decision should be based on whether: (a) it's possible to encounter race conditions with shared data. (b) you'll be using VCL code which usually has to run on the main thread.
It's unlikely you would need to synchronise in the constructor because TThread instances are usually created from the main thread already. (The exception being if you're creating some TThread's from another child thread.)
NOTE: It won't cause any harm though because Synchronize() already checks if you're on the main thread and will call the synchronised method immediately if you are.
class procedure TThread.Synchronize(ASyncRec: PSynchronizeRecord; QueueEvent: Boolean = False);
var
SyncProc: TSyncProc;
SyncProcPtr: PSyncProc;
begin
if GetCurrentThreadID = MainThreadID then
ASyncRec.FMethod
As for the destructor there are 3 usage patterns:
The TThread instances destroys itself.
Another thread (possibly the main thread) can WaitFor the instance to finish, then destroy it.
You can intercept the OnTerminate event. This is fired when the instance is finished, and you could then destroy it.
NOTE: The OnTerminate event will already be synchronised.
procedure TThread.DoTerminate;
begin
if Assigned(FOnTerminate) then Synchronize(CallOnTerminate);
end;
Given the above, the only time you might need to synchronise is if the thread self-destructs.
However, I'd advise that you rather avoid putting code into your destructor that might need to be synchronised. If you need some results of a calculation from your thread instance, OnTerminate is the more appropriate place to get this.

To add to what has been said in other answers...
You never need to use Synchronize at all. Synchronize may be useful, however, in the following circumstance:
In the context of your thread you need to execute code that touches objects that have affinity to the main thread.
You require your thread to block until that code has been executed.
Even in that case, there are other ways to achive the same goal, but Synchronize provides a convenient way to satisfy those two needs. If you need only one of those two items, there are better strategies available.
On topic #1, the obvious objects are user interface objects. These are objects that have thread affinity to the main thread simply by virtue of the fact that the main thread is continually reading and writing the properties of those objects (not the least because it needs to paint them to the screen, etc) and it does so at its own convenience. This means that your thread cannot safely access those components with a guarantee that the main thread will not also be accessing or modifying them at the same time. In order to prevent corruption, the thread has to pass the work to the main thread (since the main thread can only do one thing at a time and can't, obviously, interfere with itself). Synchronize simply places the work onto the main thread's queue and waits until the main thread gets around to completing it before returning.
This gets to point #2. Do you need to (or, equally, can you afford to) wait around until the main thread finishes the work? There are three cases and two options.
Yes, you can or must wait. (Synchronize is a good fit)
No, you cannot wait. (Synchronize is not a good fit)
Don't care. (Synchronize is easy, so it's a sensible option)
If you are simply updating a status display that will soon be overwritten anyway and your thread has more pressing issues, then it's probably sensible to just post a message to the main thread and carry on doing things, for example. If your thread is just waiting around doing nothing, mostly, and it's not worth the time to code anything more sophisticated, then Synchronize is just fine, and it can be replaced with something better if needs dictate so in the future.
As others have said, it really depends on what you are doing. The more important question, I think, at least conceptually, is to sort out when you need to worry about concurrency and when you don't. Any time you have more than one thread that requires access to a single resource you need to use some sort of mechanism to coordinate that access to avoid the threads crashing into each other. Synchronize is one of those methods, but it not the least nor the last of them.

Does Node.js actually use multiple threads underneath?

After all the literature i've read on node.js I still come back to the question, does node.js itself make use of multiple threads under the hood? I think the answer is yes because if we use the simple asynch file read example something has to be doing the work to read the file but if the main event loop of node is not processing this work than that must mean there should be a POSIX thread running somewhere that takes care of the file reading and then upon completion places the call back in the event loop to be executed. So when we say Node.js runs in one thread do we actually mean that the event loop of node.js is only one thread? Or am i missing something here.....

To a Javascript program on node.js, there is only one thread.
If you're looking for technicalities, node.js is free to use threads to solve asynchronous I/O if the underlying operating system requires it.
The important thing is to never break the "there is only one thread" abstraction to the Javascript program. If there are more threads, all they can do is queue up work for the main thread in the Javascript program, they can never execute any Javascript code themselves.

How to define threadsafe?

Threadsafe is a term that is thrown around documentation, however there is seldom an explanation of what it means, especially in a language that is understandable to someone learning threading for the first time.
So how do you explain Threadsafe code to someone new to threading?
My ideas for options are the moment are:
Do you use a list of what makes code
thread safe vs. thread unsafe
The book definition
A useful metaphor

Multithreading leads to non-deterministic execution - You don't know exactly when a certain piece of parallel code is run.
Given that, this wonderful multithreading tutorial defines thread safety like this:
Thread-safe code is code which has no indeterminacy in the face of any multithreading scenario. Thread-safety is achieved primarily with locking, and by reducing the possibilities for interaction between threads.
This means no matter how the threads are run in particular, the behaviour is always well-defined (and therefore free from race conditions).

Eric Lippert says:
When I'm asked "is this code thread safe?" I always have to push back and ask "what are the exact threading scenarios you are concerned about?" and "exactly what is correct behaviour of the object in every one of those scenarios?".
It is unhelpful to say that code is "thread safe" without somehow communicating what undesirable behaviors the utilized thread safety mechanisms do and do not prevent.

G'day,
A good place to start is to have a read of the POSIX paper on thread safety.
Edit: Just the first few paragraphs give you a quick overview of thread safety and re-entrant code.
HTH
cheers,

i maybe wrong but one of the criteria for being thread safe is to use local variables only. Using global variables can have undefined result if the same function is called from different threads.

A thread safe function / object (hereafter referred to as an object) is an object which is designed to support multiple concurrent calls. This can be achieved by serialization of the parallel requests or some sort of support for intertwined calls.
Essentially, if the object safely supports concurrent requests (from multiple threads), it is thread safe. If it is not thread safe, multiple concurrent calls could corrupt its state.
Consider a log book in a hotel. If a person is writing in the book and another person comes along and starts to concurrently write his message, the end result will be a mix of both messages. This can also be demonstrated by several threads writing to an output stream.

I would say to understand thread safe, start with understanding difference between thread safe function and reentrant function.
Please check The difference between thread-safety and re-entrancy for details.

Tread-safe code is code that won't fail because the same data was changed in two places at once. Thread safe is a smaller concept than concurrency-safe, because it presumes that it was in fact two threads of the same program, rather than (say) hardware modifying data, or the OS.

A particularly valuable aspect of the term is that it lies on a spectrum of concurrent behavior, where thread safe is the strongest, interrupt safe is a weaker constraint than thread safe, and reentrant even weaker.
In the case of thread safe, this means that the code in question conforms to a consistent api and makes use of resources such that other code in a different thread (such as another, concurrent instance of itself) will not cause an inconsistency, so long as it also conforms to the same use pattern. the use pattern MUST be specified for any reasonable expectation of thread safety to be had.
The interrupt safe constraint doesn't normally appear in modern userland code, because the operating system does a pretty good job of hiding this, however, in kernel mode this is pretty important. This means that the code will complete successfully, even if an interrupt is triggered during its execution.
The last one, reentrant, is almost guaranteed with all modern languages, in and out of userland, and it just means that a section of code may be entered more than once, even if execution has not yet preceeded out of the code section in older cases. This can happen in the case of recursive function calls, for instance. It's very easy to violate the language provided reentrancy by accessing a shared global state variable in the non-reentrant code.

Code Re-entrancy vs. Thread Safety

What is the difference between the concepts of "Code Re-entrancy" and "Thread Safety"? As per the link mentioned below, a piece of code can be either of them, both of them or neither of them.
Reentrant and Thread safe code
I was not able to understand the explaination clearly. Help would be appreciated.

Re-entrant code has no state in a single point. You can call the code while something is executing in the code. If the code uses global state, one call can conceivably overwrite the global state, breaking the computation in the other call.
Thread safe code is code with no race conditions or other concurrency issues. A race condition is where the order in which two threads do something affects the computation. A typical concurrency issue is where a change to a shared data structure can be partially completed and left in an inconsistent state. In order to avoid this, you have to use concurrency control mechanisms such as semaphores of mutexes to ensure that nothing else can access the data structure until the operation is completed.
For example, a piece of code can be non re-entrant but thread-safe if it is guarded externally by a mutex but still has a global data structure where the state must be consistent for the entire duration of the call. In this case, the same thread could initiate a call-back into the procedure while still protected by an external coarse-grained mutex. If the call-back occured from within the non re-entrant procedure the call could leave the data structure in a state that could break the computation from the caller's point of view.
A piece of code can be re-entrant but non thread-safe if it can make a non-atomic change to a shared (and sharable) data structure that could be interrupted in the middle of the update leaving the data structure in an incosistent state. In this case another thread accessing the data structure could be affected by the half-changed data structure and either crash or perform an operation that corrupts the data.

That article says:
"a function can be either reentrant, thread-safe, both, or neither."
It also says:
"Non-reentrant functions are thread-unsafe".
I can see how this may cause a muddle. They mean that standard functions documented as not required to be re-entrant are also not required to be thread-safe, which is true of the POSIX libraries iirc (and POSIX declares it to be true of the ANSI/ISO libraries too, ISO having no concept of threads and hence no concept of thread-safety). In other words, "if a function says it is non-reentrant, then it is saying it's thread-unsafe too". That's not a logical necessity, it's just a convention.
Here's some pseudo-code which is thread-safe (well, there's plenty of opportunity for callbacks to create deadlocks due to locking inversion, but let's assume the documentation contains sufficient information for users to avoid that) but not re-entrant. It is supposed to increment the global counter, and perform the callback:
take_global_lock();
int i = get_global_counter();
do_callback(i);
set_global_counter(i+1);
release_global_lock();
If the callback calls this routine again, resulting in another callback, then both levels of callback will get the same parameter (which might be OK, depending on the API), but the counter will only be incremented once (which is almost certainly not the API you want, so it would have to be banned).
That's assuming the lock is recursive, of course. If the lock is non-recursive, then of course the code is non-reentrant anyway, since taking the lock the second time won't work.
Here's some pseudo-code which is "weakly re-entrant" but not thread-safe:
int i = get_global_counter();
do_callback(i);
set_global_counter(get_global_counter()+1);
Now it's fine to call the function from the callback, but it's not safe to call the function concurrently from different threads. It's also not safe to call it from a signal handler, because re-entrancy from a signal handler could likewise break the count if the signal happened to occur at the right time. So the code is non-re-entrant by the proper definition.
Here's some code which arguably is fully re-entrant (except I think the standard distinguishes between reentrant and 'non-interruptible by signals', and I'm not sure where this falls), but still isn't thread-safe:
int i = get_global_counter();
do_callback(i);
disable_signals(); // and any other kind of interrupts on your system
set_global_counter(get_global_counter()+1);
restore_signal_state();
On a single-threaded app, this is fine, assuming that the OS supports disabling everything that needs to be disabled. It prevents re-entrancy from occurring at the critical point. Depending how signals are disabled, it may be safe to call from a signal handler, although in this particular example there's still the issue of the parameter passed to the callback being the same for separate calls. It can still go wrong multi-threaded, though.
In practice, non-thread-safe often implies non-re-entrant, since (informally) anything that can go wrong due to the thread being interrupted by the scheduler, and the function called again from another thread, can also go wrong if the thread is interrupted by a signal, and the function is called again from the signal handler. But then the "fix" to prevent signals (disabling them) is different from the "fix" to prevent concurrency (locks, usually). This is at best a rule of thumb.
Note that I've implied globals here, but exactly the same considerations would apply if the function took as a parameter a pointer to the counter and the lock. It's just that the various cases would be thread-unsafe or non-re-entrant when called with the same parameter, rather than when called at all.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string