I have a multithreaded C++ program in which the main thread creates two Tcl interpreters, interp#1 and interp#2. While running in parallel, the main thread and one worker thread each invoke different commands through interp#1 and interp#2 respectively. At some point a memory error occurs and the program crashes.
The log file shows that some values in kObjv[] for interp#1 have been overwritten by values belonging to interp#2.
I also ran Helgrind to check for possible data races, and it reports plenty of potential races inside the underlying Tcl library APIs, such as Tcl_NewStringObj/TclFreeObj/ResetObjResult/TclNREvalObjv, etc.
It looks like the underlying memory is shared by interpreters created in the same thread. Is that true? My program statically links the Tcl 8.6 library, which was built with threads enabled.
The Tcl library uses thread-bound memory pooling to (hugely!) reduce pressure on global locks, with the consequence that every Tcl interpreter object is also strongly bound to the thread that created it. (This is the Apartment Threading Model, if you're familiar with that.) You cannot safely use a Tcl interpreter from any other thread. If you want to have access to a Tcl interpreter in each thread, each thread should create its own interpreter and use that.
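The one-interpreter-per-thread rule can be shown with a minimal C++ sketch. This is not the Tcl API. `Interp` is a hypothetical stand-in for any object whose state is bound to the thread that created it. Each worker thread constructs its own instance, uses it, and publishes only a plain result, so no locking is needed.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an interpreter: it owns mutable state
// (a result buffer) that must only ever be touched by one thread.
struct Interp {
    std::string result;
    void eval(const std::string& cmd) { result = "ran:" + cmd; }
};

// Each thread creates its own Interp and never shares it.
std::vector<std::string> run_per_thread(const std::vector<std::string>& cmds) {
    std::vector<std::string> results(cmds.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < cmds.size(); ++i) {
        threads.emplace_back([&results, &cmds, i] {
            Interp interp;              // created on this thread
            interp.eval(cmds[i]);       // used only on this thread
            results[i] = interp.result; // each thread writes its own slot
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

The crash in the question corresponds to creating both `Interp` objects on the main thread and then handing one of them to the worker. In this sketch the construction happens inside each thread's lambda instead.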
There are a few operations that allow safe inter-thread communication, specifically Tcl_ThreadQueueEvent() and Tcl_ThreadAlert(), which allow you to lodge a message for the other thread to handle when it is ready (every thread with a Tcl interpreter on it has an event queue associated with it inside the Tcl library; this is in the core of the Tcl event notifier engine).
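The queue-then-alert idea can be sketched generically in C++. Treat it only as an analogy: `EventQueue`, `post()` and `run_until_stopped()` are hypothetical names, not Tcl functions. Other threads only enqueue work and wake the owner. The owning thread alone executes that work, so the state it touches never leaves that thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical per-thread event queue. post() plays the role of
// "queue an event, then alert the owning thread"; only the owner runs events.
class EventQueue {
public:
    void post(std::function<void()> ev) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(ev));
        }
        cv_.notify_one(); // the "alert": wake the owner if it is waiting
    }

    // Called only by the owning thread; an empty function means "stop".
    void run_until_stopped() {
        for (;;) {
            std::function<void()> ev;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !q_.empty(); });
                ev = std::move(q_.front());
                q_.pop();
            }
            if (!ev) return;
            ev(); // runs outside the lock, on the owner thread
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
};
```

Note that the only mutex lives inside the queue. The data the events operate on needs no lock, because a single thread runs them all.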
The Tcl thread package (which should be part of any good Tcl 8.6 installation and is available for older versions too) is the recommended way to do inter-thread work in Tcl. Apart from the complexity of getting each side to know the handle of the other thread, it's really quite easy to use.
I have been studying multithreading in Python for a while, but I am confused about a few issues:
Firstly, are the threads created by the Python threading library user-level or kernel-level threads?
Books say that user-level threads must be mapped to kernel threads and that the operating system only creates and maintains kernel-level threads.
Which threading model does the Python threading library use? Further, who makes the choice between kernel-level and user-level threads? Is it the operating system, or can the programmer have a say?
If the many-to-one model is used, I think it is not real multithreading, since all the threads map to a single kernel thread.
Is there a way to direct the operating system to adhere to a certain threading model in my Python program?
Can all running threads of a process be listed with their state, each marked as either kernel-level or user-level? Also, can the mappings between the two levels (user and kernel) be shown?
Usually, you never create 'kernel-level threads' directly: everything you do in user space executes in user space. Otherwise even a random bit of browser JavaScript would be executing at the kernel level, guaranteeing that within seconds the whole internet would go dark.
Thus, in most languages, a threading interface (if supported) is far removed from the actual 'kernel threads'. Depending on the implementation, it will either link to a lower-level threading interface (pthreads, for example) or simply simulate threading unbeknownst to the user. Going down that chain, pthreads may or may not map to actual 'kernel' threads (it happens to be true on Linux, but on Windows there is another level of separation). Even then, the code executes in user space; the 'supporting' kernel thread exists so that the kernel can schedule that code independently.
When it comes to CPython, its threading interface links to pthreads (on POSIX systems), so, technically, there is a chain from a Python thread all the way down to the kernel threads. However, Python also has the dreaded GIL, which guarantees that no two threads ever execute Python bytecode at the same time. The main exceptions are blocking I/O and some C extensions, which release the GIL. In practice its threads take turns rather than run in parallel. That said, since on most systems processes are also backed by kernel threads, you can still utilize them in all their glory by using the multiprocessing interface.
Also, unless you have multiple cores/CPUs on your system, even kernel threads only take turns on a single CPU (time-slicing), so, technically, kernel threads don't guarantee actual parallel execution as you're describing it.
As for how to list threads, on Linux you can use top -H -p <pid> (or ps -T -p <pid>) to show each thread of a process separately.
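If you want the raw data rather than an interactive view, each thread of a process on Linux appears as a directory under /proc/<pid>/task. The minimal C++17 sketch below assumes Linux with /proc mounted. `list_thread_ids` is a hypothetical helper. It lists only kernel-visible threads, so purely user-level threads, if a runtime had any, would not appear.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

// Linux-only: every kernel-visible thread of a process has an entry
// /proc/<pid>/task/<tid>. Pass "self" to inspect the calling process.
std::vector<std::string> list_thread_ids(const std::string& pid = "self") {
    std::vector<std::string> tids;
    for (const auto& entry :
         std::filesystem::directory_iterator("/proc/" + pid + "/task")) {
        tids.push_back(entry.path().filename().string()); // the thread id
    }
    return tids;
}
```

For a CPython process, each threading.Thread shows up as one of these entries, which is consistent with the 1:1 mapping onto pthreads described above.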
I am trying to embed a Ruby 1.9 interpreter in a program. I am currently using forkOS in my hruby package, but it seems this only works for Ruby 1.8 and 2.x. It looks like 1.9 needs to execute in the primary thread. As a side note, there is no documentation on how to do such a thing, so the only pointer to my current problem is here.
Is there a way to take control of the primary thread to run all my FFI calls?
Having done some testing and reading of documentation, I have come to the following conclusions. The Haskell Report says all of this is implementation-defined, so there is no standard way. The Control.Concurrent module states in its documentation that main is a bound thread; however, it doesn't require that it is the same as the primary OS thread.
Experimentally (at least on 64-bit Linux with GHC 7.8 and 7.10-rc3), the main thread is the primary OS thread. Given that the main thread is bound, there seems to be no reason for this to be different on other GHC platforms, but I cannot test other platforms.
In terms of actually implementing this: if you want to program as if Ruby were in a different thread, you can run most non-Ruby code in a different thread and communicate with the main thread (which talks to the Ruby interpreter) via either MVars or TVars. See the comment by @chi for an example of how this is done in gtk.
In terms of a library interface, you can have an initialisation function that takes a continuation. Your library hijacks the thread at initialisation and then calls the continuation on another thread. Of course, you then need to document that users must call it from the main thread.
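The continuation approach can be sketched in C++. The answer itself is about Haskell, where MVars would play the role of the queue and futures used here. `MainThreadExecutor`, `with_main_thread()` and `call()` are hypothetical names. The primary thread calls `with_main_thread()`, which starts the user's continuation on a new thread. The primary thread then keeps itself busy running every job that must execute there, such as calls into an interpreter that insists on the primary thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class MainThreadExecutor {
public:
    // Called from the continuation's thread; blocks until the primary
    // thread has run `job`, then returns its result.
    int call(std::function<int()> job) {
        std::packaged_task<int()> task(std::move(job));
        std::future<int> result = task.get_future();
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push(std::move(task));
        }
        cv_.notify_one();
        return result.get();
    }

    // Must be called on the primary thread: it "hijacks" that thread,
    // runs `cont` elsewhere, and serves jobs until `cont` returns.
    void with_main_thread(std::function<void(MainThreadExecutor&)> cont) {
        std::thread worker([this, &cont] {
            cont(*this);
            {
                std::lock_guard<std::mutex> lk(m_);
                done_ = true;
            }
            cv_.notify_one();
        });
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
            if (jobs_.empty()) break; // continuation finished, nothing left
            std::packaged_task<int()> task = std::move(jobs_.front());
            jobs_.pop();
            lk.unlock();
            task(); // executes on the primary thread
        }
        worker.join();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> jobs_;
    bool done_ = false;
};
```

Calling `call()` from the primary thread itself would deadlock. That restriction, like the "call this from main" rule, would need to be documented for users.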
In the Tcl thread package (as used from tclsh), a created thread does not share variables and namespaces with the main thread, which is quite different from the C implementation of threads. Why is there this contradiction in Tcl's thread design? Or am I missing something in the code? Do all scripting languages have a similar threading design?
Below is a quote from the Tcl thread documentation PDF, under thread::create:
"… All other extensions must be loaded explicitly into each thread that needs to use them."
It's not a contradiction. It's just a different model. It has its advantages and its disadvantages. The key disadvantage you already know: scripts and variables are not shared (unless you take special steps). The key advantage is that the Tcl implementation has no big global locks, and that makes it much easier to use multi-core hardware effectively and means that there are very few gotchas when doing so. Contrast this with the Python Global Interpreter Lock, which is necessary because Python uses the C-like global shared state model.
At the low level, Tcl's threading is strongly isolated, with plenty of thread-specific variables behind the scenes so that locks can be avoided (much of the time this includes memory management, which would otherwise be a key bottleneck). Inter-thread communication is built on top of Tcl's built-in event queueing system. When two threads communicate, one sends a message and (optionally) waits for the other to respond. The message is placed on the receiver's internal event queue until the receiver is ready to handle it. This does slow down inter-thread communication, but it makes threads much faster when they are not communicating.
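Here is a minimal C++ sketch of "send a message and optionally wait for the reply". It assumes a single worker that owns its state. `Worker` and `send()` are hypothetical names and not part of the Tcl thread package. The only lock guards the inbox. The worker's own data (`log_`) is never locked, because no other thread can reach it.

```cpp
#include <cassert>
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

class Worker {
public:
    Worker() : thread_([this] { loop(); }) {}
    ~Worker() {
        send("");       // an empty message is the shutdown request
        thread_.join();
    }
    // Returns a future: wait on it for a synchronous send, or ignore it.
    std::future<std::size_t> send(std::string msg) {
        Message m;
        m.text = std::move(msg);
        std::future<std::size_t> reply = m.reply.get_future();
        {
            std::lock_guard<std::mutex> lk(m_);
            inbox_.push(std::move(m));
        }
        cv_.notify_one();
        return reply;
    }

private:
    struct Message {
        std::string text;
        std::promise<std::size_t> reply;
    };
    void loop() {
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !inbox_.empty(); });
            Message m = std::move(inbox_.front());
            inbox_.pop();
            lk.unlock();
            if (m.text.empty()) { m.reply.set_value(log_.size()); return; }
            log_ += m.text;                 // only this thread touches log_
            m.reply.set_value(log_.size()); // reply: total bytes received
        }
    }
    std::string log_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Message> inbox_;
    std::thread thread_; // declared last so the members above exist first
};
```

Messages from one sender are handled in order. A fire-and-forget send followed by a waiting send therefore sees the effect of both.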
It is actually similar to one way you'd use threads in C: message passing. Of course, you can use threads in other ways as well in C. But message passing is one way to avoid lock-related deadlocks, since the semaphores/mutexes can be confined to the message queues and you don't need them anywhere else in your code.
This is in fact what Tcl implements at the C level. And it is in fact why it was done this way: to avoid the need for semaphores (to prevent users from deadlocking themselves).
Most other scripting languages simply provide a thin wrapper around pthreads, so you can deadlock yourself if you're not careful. I remember that back in the early 2000s the general advice for threaded programming in C and most other languages was to implement a message-passing architecture to avoid deadlocks.
Since Tcl generally takes the view that the API exposed at the script level should be high-level, threading was implemented with a built-in message-passing architecture. Of course, there is also the convenient fact that this avoids having to make the Tcl interpreter thread-safe (which would mean introducing mutexes all over the interpreter source code).
Making interpreters thread-safe is non-trivial. Some languages suffer mysterious crashes to this day when running threaded applications; some took over a decade to iron out all their threading bugs. Tcl just decided not to try. The Tcl interpreter is small and spins up quite fast, so the solution was simply to run one interpreter per thread.
My application uses Lua in a multithreaded environment with a global mutex. It is implemented like this:
1. Thread locks mutex
2. Call lua_newthread
3. Perform some initialization on coroutine
4. Run lua_resume on coroutine
5. Unlocks mutex
lua_lock/lua_unlock are not implemented, and the GC is stopped while Lua works with the coroutine.
My question is: can I perform steps 2 and 3 without locking, if the initialisation process does not require any global Lua structs? Can I perform this whole process without locking at all, if the coroutine does not require globals either?
In what cases can I generally use Lua functions without locking?
Lua does not guarantee thread safety if you use a single Lua state from separate OS threads without lua_lock/lua_unlock. If you want a multithreaded environment, you need to use an individual state for each OS thread.
Look at some multithreading solutions, e.g. https://github.com/effil/effil.
In what cases can I generally use Lua functions without locking?
On the same Lua state (or threads derived from the same source Lua state)?
None.
Lua is thread-safe in the sense that separate Lua state instances can be executed in parallel. There are absolutely no thread safety guarantees when you call any Lua API function from two different threads on the same Lua state instance.
You cannot do any of the steps 2, 3, or 4 outside of some synchronization mechanism to prevent concurrent access to the same state. It doesn't matter if it's just creating a new thread (which allocates memory) or some "initialization process" (which will likely allocate memory). Even things that don't allocate memory are still not allowed.
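In C++, one way to make that rule hard to break is to make the shared state reachable only through a function that holds the lock for the whole call. `Locked` and `ToyState` here are hypothetical, and `ToyState` is not the Lua API. With a real `lua_State`, steps 2 to 4 (create the coroutine, initialise it, resume it) would all go inside one `with()` call, and step 5 happens automatically when it returns.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// The wrapped state can only be touched inside with(), which holds the
// mutex for the entire callback.
template <typename T>
class Locked {
public:
    template <typename F>
    auto with(F&& f) {
        std::lock_guard<std::mutex> lk(m_);
        return std::forward<F>(f)(state_);
    }

private:
    std::mutex m_;
    T state_{};
};

// Toy stand-in for interpreter state: every step "allocates".
struct ToyState {
    std::vector<int> allocations;
};
```

Keeping the state private to the wrapper means there is simply no code path that can reach it without taking the mutex. That matches the answer's point that even non-allocating calls are not allowed outside the lock.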
Lua offers no guarantees about thread-safety within a Lua state.
I haven't been able to write a program in Lua that will load more than one CPU. Since Lua supports the concept via coroutines, I believe it's achievable.
The reason for my failure could be one of:
It's not possible in Lua
I'm not able to write it ☺ (and I hope that's the case)
Can someone more experienced (I discovered Lua two weeks ago) point me in the right direction?
The point is to write a number-crunching script that puts a high load on ALL cores, to demonstrate the power of Lua.
Thanks...
Lua coroutines are not the same thing as threads in the operating system sense.
OS threads are preemptive. That means that they will run at arbitrary times, stealing timeslices as dictated by the OS. They will run on different processors if they are available. And they can run at the same time where possible.
Lua coroutines do not do this. Coroutines may have the type "thread", but there can only ever be a single coroutine active at once. A coroutine will run until the coroutine itself decides to stop running by issuing a coroutine.yield command. And once it yields, it will not run again until another routine issues a coroutine.resume command to that particular coroutine.
Lua coroutines provide cooperative multithreading, which is why they are called coroutines. They cooperate with each other. Only one thing runs at a time, and you only switch tasks when the tasks explicitly say to do so.
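The following C++ sketch illustrates the scheduling model only: cooperative round-robin on a single OS thread. It is an analogy, not how Lua implements coroutines. Each hypothetical `Task` is a step function: calling it stands in for `coroutine.resume`, returning stands in for `coroutine.yield`, and returning false means the task has finished.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Task = std::function<bool()>; // true = "yielded, resume me again"

// Runs all tasks on the calling thread, one at a time, in turn.
// A task is never interrupted; it only gives up control by returning.
void run_round_robin(std::vector<Task> tasks) {
    bool any_alive = true;
    while (any_alive) {
        any_alive = false;
        for (auto& t : tasks) {
            if (t && t()) {
                any_alive = true; // task yielded and wants another turn
            } else {
                t = nullptr;      // task finished (or was already done)
            }
        }
    }
}
```

Everything here runs on one OS thread, so however many tasks you add, it can never keep more than one core busy.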
You might think that you could just create OS threads, create some coroutines in Lua, and then resume each one in a different OS thread. This would work so long as each OS thread was executing code in a different Lua instance. The Lua API is reentrant: you are allowed to call into it from different OS threads, but only if you are calling into different Lua instances. If you try to multithread through the same Lua instance, Lua will likely do unpleasant things.
All of the Lua threading modules that exist create alternate Lua instances for each thread. Lua-llthreads just makes an entirely new Lua instance for each thread; there is no API for thread-to-thread communication beyond copying the parameters passed to the new thread. LuaLanes does provide some cross-connecting code.
It is not possible with the core Lua libraries (if you don't count creating multiple processes and communicating via input/output), but I think there are Lua bindings for different threading libraries out there.
The answer from jpjacobs to one of the related questions links to LuaLanes, which seems to be a multi-threading library. (I have no experience, though.)
If you embed Lua in an application, you will usually want the multithreading to be linked to your application's multithreading somehow.
In addition to LuaLanes, take a look at llthreads
In addition to already suggested LuaLanes, llthreads and other stuff mentioned here, there is a simpler way.
If you're on a POSIX system, try doing it the old-fashioned way with posix.fork() (from luaposix): split the task into batches, fork as many processes as you have cores, crunch the numbers, and collate the results.
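The same split / fork / crunch / collate recipe can be sketched in C++ using the underlying POSIX calls (fork, pipe, waitpid) rather than luaposix. `parallel_sum` is a hypothetical helper, and summing 1..n is only a stand-in workload. Each child sums its slice and writes the partial result to its own pipe; the parent reads and adds them up.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Sums 1..n using `workers` child processes (POSIX only).
std::int64_t parallel_sum(std::int64_t n, int workers) {
    std::vector<int> read_ends;
    std::vector<pid_t> pids;
    for (int w = 0; w < workers; ++w) {
        int fds[2];
        if (pipe(fds) != 0) return -1;
        std::int64_t lo = 1 + n * w / workers;   // this batch: [lo, hi]
        std::int64_t hi = n * (w + 1) / workers;
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {                          // child: crunch one batch
            close(fds[0]);
            std::int64_t partial = 0;
            for (std::int64_t i = lo; i <= hi; ++i) partial += i;
            ssize_t written = write(fds[1], &partial, sizeof partial);
            (void)written;
            _exit(0);                            // skip the parent's cleanup
        }
        close(fds[1]);                           // parent keeps the read end
        read_ends.push_back(fds[0]);
        pids.push_back(pid);
    }
    std::int64_t total = 0;                      // collate
    for (int fd : read_ends) {
        std::int64_t partial = 0;
        if (read(fd, &partial, sizeof partial) ==
            static_cast<ssize_t>(sizeof partial))
            total += partial;
        close(fd);
    }
    for (pid_t pid : pids) waitpid(pid, nullptr, 0);
    return total;
}
```

Because each batch runs in its own process, the kernel is free to put every child on a different core, with no shared interpreter state to protect.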
Also, make sure that you're using LuaJIT 2 to get the max speed.
It's very easy: just create multiple Lua interpreters and run Lua programs inside all of them.
Lua multithreading is a shared-nothing model. If you need to exchange data, you must serialize it into strings and pass them from one interpreter to the other, using a C extension, sockets or any other kind of IPC.
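Here is a minimal C++ sketch of that shared-nothing pattern, with a thread standing in for the second interpreter. The helper names and the space-separated text format are assumptions for illustration. Only a flat string crosses the thread boundary, and each side rebuilds its own private copy of the data.

```cpp
#include <cassert>
#include <future>
#include <sstream>
#include <string>
#include <vector>

std::string serialize(const std::vector<long>& v) {
    std::ostringstream out;
    for (long x : v) out << x << ' ';
    return out.str();
}

std::vector<long> deserialize(const std::string& s) {
    std::istringstream in(s);
    std::vector<long> v;
    for (long x; in >> x;) v.push_back(x);
    return v;
}

// The "other interpreter": receives a string by value, works on its own
// private copy of the data, and replies with another string.
std::string remote_square_all(std::string msg) {
    std::vector<long> local = deserialize(msg);
    for (long& x : local) x *= x;
    return serialize(local);
}

std::vector<long> square_in_other_thread(const std::vector<long>& data) {
    std::future<std::string> reply =
        std::async(std::launch::async, remote_square_all, serialize(data));
    return deserialize(reply.get());
}
```

The cost is a copy (plus parsing) per message. The benefit is that the two sides never share a data structure, so neither needs a lock.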
Serializing data via IPC-like transport mechanisms is not the only way to share data across threads.
If you're programming in an object-oriented language like C++, then it's quite possible for multiple threads to access shared objects via object pointers. It's just not safe to do so unless you provide some kind of guarantee that no two threads will attempt to simultaneously read and write the same data.
There are many options for how you might do that; lock-free and wait-free mechanisms are becoming increasingly popular.
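Here is a small example of the lock-free option in C++: `std::atomic` read-modify-write operations let several threads update shared data without a mutex. `Stats`, `update_max` and `record_from_threads` are hypothetical names. The compare-exchange retry loop is the general pattern to use when no single atomic operation does what you need. Such a loop is lock-free, not necessarily wait-free.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Atomically raises `target` to `value` if `value` is larger.
void update_max(std::atomic<int>& target, int value) {
    int current = target.load();
    // On failure compare_exchange_weak reloads `current`, so the loop
    // retries until value is no larger or the store succeeds.
    while (current < value &&
           !target.compare_exchange_weak(current, value)) {
    }
}

struct Stats {
    std::atomic<long> count{0};
    std::atomic<int> max{0};
};

// Several threads update the same Stats object with no mutex at all.
void record_from_threads(Stats& s, int threads, int per_thread) {
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t)
        ts.emplace_back([&s, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                s.count.fetch_add(1);                  // single atomic RMW
                update_max(s.max, t * per_thread + i); // CAS retry loop
            }
        });
    for (auto& th : ts) th.join();
}
```

Each operation is indivisible, so concurrent updates are never lost. Atomics only protect one variable at a time, though: keeping several fields consistent with each other still needs a lock or a more elaborate lock-free design.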