What does it mean that Python 3 asyncio's Queue is not thread-safe?

I find it quite confusing how thread safety is defined in Python.
Some say built-in types are thread-safe by virtue of CPython's implementation.
The asyncio Queue documentation, on the other hand, says it is not thread-safe.
It seems they mean different things when they talk about thread safety. What is it really?

asyncio's Queue is not thread safe
Someone said it is thread safe by implementation of CPython.
No. The link you provided says that "Python's built-in structures" are thread-safe, meaning that the data types available without imports (like int, list, dict, etc.) are thread-safe.
It doesn't mean that every object in Python standard library is thread-safe.
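The distinction is easy to see with asyncio.Queue itself: it is safe to use from coroutines on one event loop, but handing it items from another OS thread has to go through the loop. A minimal sketch (the names and data are my own):

```python
import asyncio
import threading

async def main():
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def producer():
        # Calling queue.put_nowait(i) directly from this thread would be
        # unsafe: asyncio.Queue does no internal locking.  Instead, hand
        # each call over to the event loop's own thread:
        for i in range(3):
            loop.call_soon_threadsafe(queue.put_nowait, i)

    threading.Thread(target=producer).start()
    # Consume on the loop's thread as usual.
    return [await queue.get() for _ in range(3)]

items = asyncio.run(main())
print(items)  # [0, 1, 2]
```

call_soon_threadsafe is the one documented thread-safe entry point into a running loop; everything else about the queue assumes single-threaded access from the loop.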

Related

How can you wait for detached threads or control how threads are spawned?

When spawn is called, a JoinHandle is returned, but if that handle is discarded (or not available, somewhere inside a crate) the thread is "detached".
Is there any way to find all threads currently running and recover a JoinHandle for them?
...my feeling is that, in general, the answer is no.
In that case, is there any way to override either how Thread is invoked, or how JoinHandle is dropped, globally?
...but looking through the source, I can't see any way this might be possible.
As motivation, this very long discussion proposes using scopes to mandate the termination of child threads; effectively executing a join on every child thread when the scope ends. However, it requires that child threads be spawned via a custom method to work; it would be very interesting to be able to do something similar in Rust, where any thread spawned is intercepted and parented to the active ambient scope held in a thread-local.
I'll accept any answer that either:
demonstrates how to recover a JoinHandle by whatever means possible
demonstrates how to override the behavior of thread::spawn() in some way so that a discarded JoinHandle from a thread invoked in some arbitrary sub-function can be recovered before it is dropped.
Is there any way to find all threads currently running and recover a JoinHandle for them?
No. This would likely impose restrictions or overhead on everyone who wants to use threads, which is antithetical to a systems programming language.
You could write your own solution for this by using something like Arc/Weak and a global singleton. Then you have your own registry of threads.
is there any way to override either how Thread is invoked, or how JoinHandle is dropped, globally?
No, there is no ability to do this with the Rust libraries as they exist now. In fact, "overriding" something on that scale is fairly antithetical to the concepts of a statically-compiled language. Imagine if any library you used could decide to "override" how addition worked or what println did. Some languages do allow this dynamism, but it comes at a cost. Rust is not the right language for that.
In fact, the right solution for this is nothing new: just use dependency injection. "Starting a thread" is a non-trivial collaborator and likely doesn't belong to the purview of most libraries as it's an application-wide resource.
can be recovered before it is dropped
In Rust, values are dropped at the end of the scope that owns them. This would require running arbitrary code at the prologue of arbitrary functions anywhere in the program. Such a feature is highly unlikely to ever be implemented.
There's some discussion about creating a method that returns a handle that joins the thread when it is dropped, which might do what you want, but people still have to call it.

Is a Lua table thread-safe?

Say I have a Lua table t = { a = {}, b = {} }.
My question is, I have two threads, thread A, thread B.
If I create a lua_State for each of these two threads through lua_newthread, and thread A only reads/writes t.a while thread B only reads/writes t.b:
Should I use lua_lock in each thread above?
If the answer is YES, then does every operation on t need a lua_lock?
TL;DR: A Lua engine-state is not thread-safe, so there's no reason to make the Lua table thread-safe.
A lua_State is not the engine state, though it references it. Instead, it is the state of a Lua thread, which bears no relation to an application thread. Lua threads sharing the same engine state cannot execute concurrently, as the Lua engine is inherently single-threaded (you may use an arbitrary number of engines at the same time, though); instead, they are cooperatively multitasked.
So lua_State *lua_newthread (lua_State *L); creates a new Lua thread, not an OS thread.
lua_lock and the like do not refer to thread safety, but were the way native code can keep hold of Lua objects across calls into the engine in version 2 of the implementation: https://www.lua.org/manual/2.1/subsection3_5_6.html
The modern way is using the registry, a Lua table accessible from native code: http://www.lua.org/manual/5.3/manual.html#4.5
A Lua table isn't thread-safe, but there's no need to lock if the threads don't read/write the same elements.
No, a Lua table is not thread-safe.
And yes, every operation on table t will need lua_lock, because none of them is atomic.

Threads in Tcl don't really work like C threads

In the Tcl thread package, a created thread does not share variables or namespaces with the main thread, which is quite different from the C implementation of threads. Why this contradiction in the Tcl thread design? Or am I missing something in the code? Do all scripting languages have a similar threading design?
Below is a quote from the Tcl thread documentation PDF:
thread::create ... All other extensions must be loaded explicitly into each thread that needs to use them.
It's not a contradiction. It's just a different model. It has its advantages and its disadvantages. The key disadvantage you already know: scripts and variables are not shared (unless you take special steps). The key advantage is that the Tcl implementation has no big global locks, and that makes it much easier to use multi-core hardware effectively and means that there are very few gotchas when doing so. Contrast this with the Python Global Interpreter Lock, which is necessary because Python uses the C-like global shared state model.
At the low level, Tcl's threading is strongly isolated, with very few thread-shared variables behind the scenes, so that locks can be avoided (including in the memory management much of the time, which would otherwise be a key bottleneck). Inter-thread communication is built on top of Tcl's built-in event queueing system: when two threads communicate, one sends a message and (optionally) waits for the other to respond, with the message placed on the receiver's internal event queue until it is in a state ready to handle it. This does slow down inter-thread communication, but everything is much faster when the threads are not communicating.
It is actually similar to one way you'd use threads in C: message passing. Of course, you can use threads in other ways as well in C. But message passing is one way to completely avoid deadlocks since the semaphores/mutexes can be completely managed around the message queues and you don't need them anywhere else in your code.
This is in fact what Tcl implements at the C level. And it is in fact why it was done this way: to avoid the need for semaphores (and to prevent users from deadlocking themselves).
Most other scripting languages simply provide a thin wrapper around pthreads, so you can deadlock yourself if you're not careful. I remember that way back in the early 2000s, the general advice for threaded programming in C and most other languages was to implement a message-passing architecture to avoid deadlocks.
Since Tcl generally takes the view that the API exposed at the script level should be high-level, threading was implemented with a message-passing architecture built in. Of course, there is also the convenient fact that this avoids having to make the Tcl interpreter itself thread-safe (which would mean introducing mutexes all over the interpreter source code).
Making interpreters thread-safe is non-trivial. Some languages suffer mysterious crashes to this day when running threaded applications; some took over a decade to iron out all their threading bugs. Tcl just decided not to try: the Tcl interpreter is small enough and spins up quickly enough that the solution was simply to run one interpreter per thread.
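The message-passing model described in this answer can be sketched in any language; here is a minimal, hypothetical Python version in which a worker thread owns an inbox and other threads interact with it only by posting messages, never by touching its state directly:

```python
import queue
import threading

def worker(inbox, replies):
    # The worker is the sole owner of its state; all interaction
    # arrives as messages on its inbox queue.
    while True:
        msg = inbox.get()
        if msg is None:           # sentinel: shut down
            break
        replies.put(msg.upper())  # "handle" the message, send a reply

inbox, replies = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, replies))
t.start()

inbox.put("hello")        # send a message...
result = replies.get()    # ...and (optionally) wait for the response
print(result)             # HELLO
inbox.put(None)
t.join()
```

All the mutexes live inside the queues, so the application code itself has no opportunity to deadlock on them; this is the same trade Tcl makes at the C level.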

How to define "thread-safe"?

"Thread-safe" is a term that is thrown around in documentation, yet there is seldom an explanation of what it means, especially in language understandable to someone learning threading for the first time.
So how do you explain Threadsafe code to someone new to threading?
My ideas for options at the moment are:
a list of what makes code thread-safe vs. thread-unsafe,
the book definition,
a useful metaphor.
Multithreading leads to non-deterministic execution: you don't know exactly when a certain piece of parallel code will run.
Given that, this wonderful multithreading tutorial defines thread safety like this:
Thread-safe code is code which has no indeterminacy in the face of any multithreading scenario. Thread-safety is achieved primarily with locking, and by reducing the possibilities for interaction between threads.
This means no matter how the threads are run in particular, the behaviour is always well-defined (and therefore free from race conditions).
Eric Lippert says:
When I'm asked "is this code thread safe?" I always have to push back and ask "what are the exact threading scenarios you are concerned about?" and "exactly what is correct behaviour of the object in every one of those scenarios?".
It is unhelpful to say that code is "thread safe" without somehow communicating what undesirable behaviors the utilized thread safety mechanisms do and do not prevent.
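As a concrete illustration of these definitions, here is a minimal Python sketch: four threads increment a shared counter, and the lock is what removes the indeterminacy. Without it, each thread's read-modify-write could interleave with another's and lose updates:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # `counter += 1` is a read, an add, and a write.  The lock makes
        # that sequence atomic with respect to the other threads; without
        # it, two threads can read the same old value and one update is lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- the same on every run
```

With the lock the result is well-defined under any scheduling, which is exactly the "no indeterminacy in the face of any multithreading scenario" the tutorial describes.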
G'day,
A good place to start is to have a read of the POSIX paper on thread safety.
Edit: Just the first few paragraphs give you a quick overview of thread safety and re-entrant code.
HTH
cheers,
I may be wrong, but one of the criteria for being thread-safe is to use only local variables. Using global variables can have undefined results if the same function is called from different threads.
A thread safe function / object (hereafter referred to as an object) is an object which is designed to support multiple concurrent calls. This can be achieved by serialization of the parallel requests or some sort of support for intertwined calls.
Essentially, if the object safely supports concurrent requests (from multiple threads), it is thread safe. If it is not thread safe, multiple concurrent calls could corrupt its state.
Consider a log book in a hotel. If a person is writing in the book and another person comes along and starts writing their message at the same time, the end result will be a mix of both messages. This can also be demonstrated by several threads writing to the same output stream.
I would say that to understand thread safety, start by understanding the difference between a thread-safe function and a reentrant function.
Please check The difference between thread-safety and re-entrancy for details.
Thread-safe code is code that won't fail because the same data was changed in two places at once. "Thread-safe" is a narrower concept than "concurrency-safe", because it presumes that it was in fact two threads of the same program, rather than (say) hardware or the OS modifying the data.
A particularly valuable aspect of the term is that it lies on a spectrum of concurrent behavior, where thread-safe is the strongest constraint, interrupt-safe is weaker, and reentrant weaker still.
In the case of thread-safe, this means that the code in question conforms to a consistent API and makes use of resources such that other code in a different thread (such as another, concurrent instance of itself) will not cause an inconsistency, so long as it also conforms to the same usage pattern. The usage pattern MUST be specified for any reasonable expectation of thread safety.
The interrupt safe constraint doesn't normally appear in modern userland code, because the operating system does a pretty good job of hiding this, however, in kernel mode this is pretty important. This means that the code will complete successfully, even if an interrupt is triggered during its execution.
The last one, reentrant, is almost guaranteed by all modern languages, in and out of userland; it just means that a section of code may be entered more than once, even if execution has not yet proceeded out of it. This can happen in the case of recursive function calls, for instance. It's very easy to violate the reentrancy the language provides by accessing shared global state in the non-reentrant code.
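For instance, here is a hypothetical Python sketch of a function that violates reentrancy through shared global state, next to a reentrant rewrite:

```python
_buffer = []

def flatten_bad(items):
    # Non-reentrant: resets shared global state on every call, so the
    # recursive call below wipes out what the outer call had collected.
    _buffer.clear()
    for x in items:
        if isinstance(x, list):
            flatten_bad(x)        # re-entry clobbers _buffer
        else:
            _buffer.append(x)
    return list(_buffer)

def flatten_good(items):
    # Reentrant: all state is local, so nested calls cannot interfere.
    out = []
    for x in items:
        if isinstance(x, list):
            out.extend(flatten_good(x))
        else:
            out.append(x)
    return out

data = [1, [2, 3], 4]
print(flatten_bad(data))   # [2, 3, 4] -- the leading 1 was lost
print(flatten_good(data))  # [1, 2, 3, 4]
```

The same global-state mistake that breaks recursive re-entry here is what breaks thread safety when two threads enter the function concurrently.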

How to output data from one thread to another without locking?

I'm developing a DirectShow application. I've encountered a deadlock; the problem seems to be caused by acquiring a lock in a callback function called from a thread. This is the question I asked in the MSDN forum:
http://social.msdn.microsoft.com/Forums/en-US/windowsdirectshowdevelopment/thread/f9430f17-6274-45fc-abd1-11ef14ef4c6a
Now I have to avoid acquiring the lock in that thread. But the problem is that I have to output the audio to another thread. How can I pass data to another thread without a lock?
Someone told me that I can use PostMessage from the Win32 SDK to post data to another thread. However, to receive the message, I would have to run a Windows message loop. My program is a Python C++ extension module, so it would be very difficult to add a loop to pump messages. So I'm looking for another way to pass data between threads without locking.
(Actually... the producer thread can't be locked, but the consumer thread can do that. )
To lock or not to lock, that's the question.
So the question is: how?
Thanks.
------EDIT------
I think I know why I got a deadlock; it might not be a DirectShow problem at all.
The main thread is owned by Python. It calls stop, so it holds the GIL. stop waits for the DirectShow callback running in another thread to return, but that callback is waiting to acquire the GIL.
It looks like this
Main(Hold GIL) -> Stop(Wait callback) -> Callback(Wait GIL) -> GIL(Hold by Main thread)
Damn it! That's why I don't like multi-thread so much.
No matter what, thanks for your help.
If you were doing this in pure Python, I'd use a Queue object; these buffer up data which is written but block on read until something is available, and do any necessary locking under the hood.
This is an extremely common datatype, and some equivalent should be available whatever your current language or toolchain. There's std::queue in C++, for instance, but the standard doesn't specify its thread-safety characteristics (so see your implementation's docs).
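A minimal sketch of that pattern with Python's queue.Queue (the chunks and names are placeholders, not the asker's real data): the producer deposits chunks and the consumer blocks until something is available, with all the locking handled inside the queue:

```python
import queue
import threading

buf = queue.Queue()          # internal locking; no explicit locks needed here

def producer():
    for chunk in (b"aa", b"bb", b"cc"):
        buf.put(chunk)       # never blocks on an unbounded queue
    buf.put(None)            # sentinel: no more data

def consumer(out):
    while True:
        chunk = buf.get()    # blocks until a chunk is available
        if chunk is None:
            break
        out.append(chunk)

received = []
threading.Thread(target=producer).start()
consumer(received)
print(received)  # [b'aa', b'bb', b'cc']
```

Note this only hides the lock rather than eliminating it, so it helps with convenience, not with the GIL-ordering deadlock described in the edit above.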
Well, theoretically locks can be avoided if both of your threads can work on duplicate copies of the same data. After reading your question in the MSDN forum...
"So to avoid deadlock, I should not acquire any lock in the graber callback function? How can I do if I want to output audio to another thread?"
I think that you should be able to deposit your audio data in a deque (the STL std::deque class) and then fetch the data from another thread, which can then process it. Note that std::deque itself does no locking, so concurrent access must still be synchronized.
I am glad that your problem has been resolved. The reason I asked about your OS was that the documentation you referred to said you should not wait on other threads because of some problem with Win16 mutexes. There are no Win16 mutexes on Windows XP (except when programs are running under NTVDM/WOW16), so you should be able to use locks to synchronize these threads.
