Lua - how to simulate "multithreading"?

I have 2 functions:
function func1()
  while true do
    -- listen on connection
  end
end

function func2()
  while true do
    -- execute other code
  end
end
I want to run both functions "simultaneously" while sharing variables between them. I have tried to create a dispatcher that wraps the two functions in coroutines, but I can't think of a way to schedule them so that they quickly alternate their execution (func1 runs for a second, func2 runs for a second, func1 runs again, and so on).

Lua does not support preemptive multithreading; it only supports cooperative threading. That means the two "threads" have to be designed to give each other time to execute, and such designs are usually highly dependent on what you're trying to accomplish.
Your example has one thread listening for a connection and the other doing something else (either with data from that connection or not; it's not exactly clear). In such a system, it would be a good idea to have func1 yield to func2's coroutine whenever the connection hasn't provided new data, and to have func2 yield control back to func1 only once it has finished processing something.
But there is no one-size-fits-all solution to cooperative multithreading.
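For illustration, here is a minimal sketch of one such design (the connection-polling step is just a placeholder comment): each function yields wherever it is safe to hand over control, and a round-robin dispatcher alternates between them.

local function func1()
  while true do
    -- poll the connection without blocking (placeholder)
    coroutine.yield() -- nothing to process? let func2 run
  end
end

local function func2()
  while true do
    -- execute other code, a small step at a time
    coroutine.yield() -- hand control back to the dispatcher
  end
end

local tasks = { coroutine.create(func1), coroutine.create(func2) }
while true do -- round-robin dispatcher
  for _, co in ipairs(tasks) do
    local ok, err = coroutine.resume(co)
    if not ok then error(err) end
  end
end

Since both coroutines live in the same Lua state, they can share variables simply by closing over the same upvalues or using globals; no locking is needed because only one of them runs at a time.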

There exist C libraries for Lua which expose methods for multithreading or multiprocessing. Some examples would be
MPI for Lua
lua-llthreads
Lua 5.1 pthread
lua---parallel (which apparently depends on Torch)
Torch has some parallel methods
All of these are third-party solutions and, as the other answer explains, Lua has no built-in support for preemptive multithreading.
I think lua-llthreads comes closest to what you describe. It supports communication between threads via ZeroMQ.

Related

How can I integrate asyncio with an external event loop?

I'm writing an event-driven program with the event scheduling written in C. The program uses Python for extension modules. I would like to allow extension modules to use async/await syntax to implement coroutines. The coroutines will only interact with parts of my program; no I/O is involved. My C scheduler is single-threaded, and I need the coroutines to execute in its thread.
In a pure Python program, I would just use asyncio as-is and let my program use its event loop to drive all events. That is, however, not an option: my event loop needs to serve millions of C-based events per second, and I cannot afford Python's overhead.
I tried to write my own event loop implementation that delegates all scheduling to my C scheduler. I tried a few approaches:
Re-implement EventLoop, Future, Task, etc. to imitate how asyncio works (minus I/O), so that call_soon delegates scheduling to my C event loop. This is safe, but it requires a fair amount of work, and my implementation will always be inferior to asyncio when it comes to documentation, debugging support, intricate semantic details, and correctness/test coverage.
Use the vanilla Task, Future, etc. from asyncio and only create a custom implementation of AbstractEventLoop, delegating scheduling to my C event loop in the same way. This is pretty straightforward, but I can see that the vanilla EventLoop accesses non-obvious internals (task._source_traceback, _asyncgen_finalizer_hook, _set_running_loop), so my implementation is still second-class. I also have to rely on the undocumented Handle._run to actually invoke callbacks.
Things appeared to get simpler and better when I subclassed BaseEventLoop instead of AbstractEventLoop (though the docs say I shouldn't do that). I still need Handle._run, though.
I could spawn a separate thread that calls run_forever on a vanilla asyncio.DefaultEventLoop and run all my coroutines there, but the coroutines depend on my program's extension API, which does not support concurrent calls. So I must somehow make DefaultEventLoop pause my C event loop while calling Handle._run(). I don't see a reasonable way to achieve that.
Any ideas on how to best do this? How did others solve this problem?
I found that trio, a third-party alternative to asyncio, provides explicit support for integration with alien event loops through something called guest mode. Solves my problem!

Threads in Tcl don't really work like C threads

In the Tcl thread package, a created thread does not share variables or namespaces with the main thread, which is quite different from the C implementation of threads. Why this contradiction in the design of Tcl threads? Or am I missing something in the code? Do all scripting languages have a similar threading design?
Below is a quote from the Tcl thread documentation PDF, under thread::create: "All other extensions must be loaded explicitly into each thread that needs to use them."
It's not a contradiction. It's just a different model. It has its advantages and its disadvantages. The key disadvantage you already know: scripts and variables are not shared (unless you take special steps). The key advantage is that the Tcl implementation has no big global locks, and that makes it much easier to use multi-core hardware effectively and means that there are very few gotchas when doing so. Contrast this with the Python Global Interpreter Lock, which is necessary because Python uses the C-like global shared state model.
At the low level, Tcl's threading is strongly isolated, with thread-shared state kept to a minimum behind the scenes so that locks can be avoided (including in the memory management much of the time, which would otherwise be a key bottleneck). Inter-thread communication is built on top of Tcl's built-in event queueing system: when two threads communicate, one sends a message and (optionally) waits for the other to respond, and the message sits on the receiver's internal event queue until it is in a state that is ready to handle it. This slows down inter-thread communication, but everything is much faster when the threads are not communicating.
It is actually similar to one way you'd use threads in C: message passing. Of course, you can use threads in other ways as well in C. But message passing is one way to completely avoid deadlocks since the semaphores/mutexes can be completely managed around the message queues and you don't need them anywhere else in your code.
This is in fact what Tcl implements at the C level, and it is why it was done this way: to avoid the need for semaphores (and to prevent users from deadlocking themselves).
Most other scripting languages simply provide a thin wrapper around pthreads, so you can deadlock yourself if you're not careful. I remember that way back in the early 2000s, the general advice for threaded programming in C and most other languages was to implement a message-passing architecture to avoid deadlocks.
Since Tcl generally takes the view that APIs exposed at the script level should be high level, its threading was implemented with a message-passing architecture built in. There is also the convenient fact that this avoids having to make the Tcl interpreter itself thread-safe (which would introduce mutexes all over the interpreter source code).
Making interpreters thread-safe is non-trivial. Some languages suffer mysterious crashes to this day when running threaded applications; some took over a decade to iron out all their threading bugs. Tcl simply decided not to try: the Tcl interpreter is small enough and spins up quickly enough that the solution was to run one interpreter per thread.

Lua :: How to write a simple program that will load multiple CPUs?

I haven't been able to write a program in Lua that will load more than one CPU. Since Lua supports the concept via coroutines, I believe it's achievable.
The reason for my failure must be one of the following:
It's not possible in Lua
I'm not able to write it ☺ (and I hope that's the case)
Can someone more experienced (I discovered Lua two weeks ago) point me in right direction?
The point is to write a number-crunching script that puts a high load on ALL cores...
For demonstration purposes, to show the power of Lua.
Thanks...
Lua coroutines are not the same thing as threads in the operating system sense.
OS threads are preemptive. That means that they will run at arbitrary times, stealing timeslices as dictated by the OS. They will run on different processors if they are available. And they can run at the same time where possible.
Lua coroutines do not do this. Coroutines may have the type "thread", but there can only ever be a single coroutine active at once. A coroutine will run until the coroutine itself decides to stop running by issuing a coroutine.yield command. And once it yields, it will not run again until another routine issues a coroutine.resume command to that particular coroutine.
Lua coroutines provide cooperative multithreading, which is why they are called coroutines. They cooperate with each other. Only one thing runs at a time, and you only switch tasks when the tasks explicitly say to do so.
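A small example makes the hand-off explicit. Note that nothing runs concurrently and that values travel through yield and resume (this is plain standard Lua):

local co = coroutine.create(function(a, b)
  print("started with", a, b)
  local c = coroutine.yield(a + b) -- pause here; send a + b back to the resumer
  print("resumed with", c)
  return "done"
end)

print(coroutine.resume(co, 1, 2)) --> true  3
print(coroutine.resume(co, 10))   -- prints "resumed with  10", then --> true  done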
You might think that you could just create some OS threads, create some coroutines in Lua, and then resume each one in a different OS thread. This would work as long as each OS thread executed code in a different Lua instance. The Lua API is reentrant; you are allowed to call into it from different OS threads, but only if you are calling from different Lua instances. If you try to multithread through the same Lua instance, Lua will likely do unpleasant things.
All of the Lua threading modules that exist create a separate Lua instance for each thread. lua-llthreads just makes an entirely new Lua instance per thread; there is no API for thread-to-thread communication beyond copying the parameters passed to the new thread. LuaLanes does provide some cross-connecting code.
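For a flavor of what that looks like in practice, here is a hedged sketch using LuaLanes (the configure/generator options vary between versions); each lane runs the crunching function in its own OS thread with its own Lua instance:

local lanes = require "lanes".configure()

-- CPU-bound work: sum of squares over a range (illustrative only)
local function crunch(first, last)
  local sum = 0
  for i = first, last do sum = sum + i * i end
  return sum
end

local gen = lanes.gen("base", crunch) -- lane generator; "base" loads the base library
local h1 = gen(1, 5000000)            -- each call starts a lane on its own OS thread
local h2 = gen(5000001, 10000000)
print(h1[1] + h2[1]) -- indexing a lane handle blocks until its result is ready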
It is not possible with the core Lua libraries (if you don't count creating multiple processes and communicating via input/output), but I think there are Lua bindings for different threading libraries out there.
The answer from jpjacobs to one of the related questions links to LuaLanes, which seems to be a multithreading library. (I have no experience with it, though.)
If you embed Lua in an application, you will usually want to have the multithreading somehow linked to your applications multithreading.
In addition to LuaLanes, take a look at llthreads
In addition to already suggested LuaLanes, llthreads and other stuff mentioned here, there is a simpler way.
If you're on a POSIX system, try doing it the old-fashioned way with posix.fork() (from luaposix). You know: split the task into batches, fork the same number of processes as there are cores, crunch the numbers, collate the results (a sketch follows below).
Also, make sure that you're using LuaJIT 2 to get the max speed.
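A hedged sketch of that fork approach (the module layout is assumed from recent luaposix releases; older versions expose posix.fork directly):

local unistd = require "posix.unistd"
local wait   = require "posix.sys.wait"

local NWORKERS = 4 -- e.g. the number of cores

for w = 1, NWORKERS do
  if unistd.fork() == 0 then
    -- child process: crunch this worker's share of the range, then exit
    local sum = 0
    for i = w, 10000000, NWORKERS do sum = sum + i * i end
    print(("worker %d: %d"):format(w, sum))
    os.exit(0)
  end
end

for _ = 1, NWORKERS do wait.wait() end -- parent: wait for all children

In a real run you would collate the results through pipes or temporary files rather than printing them.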
It's very easy: just create multiple Lua interpreters and run Lua programs inside all of them.
Lua multithreading is a shared-nothing model. If you need to exchange data, you must serialize it into strings and pass them from one interpreter to another, via a C extension, sockets, or any other kind of IPC.
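As a hedged sketch with lua-llthreads (API assumed from its README; lua-llthreads2 is similar): the child thread is a brand-new interpreter, so the code crosses the boundary as a string, and the arguments and results as plain copyable values:

local llthreads = require "llthreads"

local thread = llthreads.new([[
  local first, last = ...
  local sum = 0
  for i = first, last do sum = sum + i * i end
  return sum
]], 1, 1000000)

thread:start()
local ok, sum = thread:join() -- blocks until the child interpreter finishes
print(ok, sum)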
Serializing data via IPC-like transport mechanisms is not the only way to share data across threads.
If you're programming in an object-oriented language like C++, it's quite possible for multiple threads to access shared objects via object pointers; it's just not safe to do so unless you provide some kind of guarantee that no two threads will attempt to read and write the same data simultaneously.
There are many options for how you might provide that guarantee; lock-free and wait-free mechanisms are becoming increasingly popular.

In Go, does it make sense to write non-blocking code?

Coming from a node.js point of view, where all code is non-blocking.
In Go, non-blocking behavior is easily achieved using channels.
If one were writing a node.js-type server in Go, does it make sense to make it non-blocking? For example, should a database connect() function return a channel, as opposed to blocking while waiting for the connection to occur?
To me, this seems the correct approach, but...
Blocking and non-blocking aren't really about performance, they are about an interface.
If you have a single thread of execution then a blocking call prevents your program from doing any useful work while it's waiting.
But if you have multiple threads of execution a blocking call doesn't really matter because you can just leave that thread blocked and do useful work in another.
In Go, a goroutine is swapped out for another one when it blocks on I/O. The Go runtime uses non-blocking I/O syscalls to avoid having the operating system block the thread, so a different goroutine can be run on it while the first is waiting for its I/O.
Goroutines are really cheap so writing non-blocking style code is not needed.
Write blocking functions. The language allows you to easily turn a synchronous call into an asynchronous one.
If you want to call a function asynchronously, use a go statement. Something like this:
c := make(chan bool)
go func() {
    blockingFunction() // runs concurrently with the code below
    c <- true          // signal completion
}()
// do some other stuff here while the blocking function runs
// wait for the blocking function to finish if it hasn't already
<-c
In Go, system calls are implemented in a non-blocking way using the most efficient underlying mechanism that the OS supports (e.g. epoll). If you have no other code to run while you wait for the result of a call, then it blocks the thread (for lack of a better thing to do), but if you have alternate goroutines active, then they will run instead.
Callbacks (as you're used to using in js) allow for essentially the same underlying mechanics, but with arguably more mental gymnastics necessary for the programmer.
In Go, your code to run after a function call is specified immediately following the function call rather than defined as a callback. Code that you want to run parallel to an execution path should be wrapped in a goroutine, with communication through channels.
For typical web-server type applications, I would recommend not making everything asynchronous. There are a few reasons.
It's easier to reason about serial blocking code than async code (easier to see bugs)
Go's error handling is based on defer, panic, and recover, which probably won't give you what you want with 100% asynchronous code.
Goroutines can leak if you're not careful. The more async behavior you have, the harder it becomes to track down these types of problems, and the more likely they are to show up.
One strategy is to focus the asynchronicity at a high level and leave everything else blocking. So you might have a "database handler" blob that is logically distinct from the "request handler" blob. They both run in separate goroutines and communicate using channels. But within the "database handler", the calls to establish a database connection and execute each query are blocking.
You don't have to choose 100% asynchronous or 0% asynchronous.
Blocking interfaces are always simpler and better than non-blocking ones. The beauty of Go is that it allows you to write concurrent (and parallel) code in a simple, and easy to reason about, blocking style.
The fashion for non-blocking programming is all due to deficiencies in the languages people are using (especially JavaScript), not because non-blocking programming is intrinsically better.

How to implement a practical fiber scheduler?

I know the very basics of using coroutines as a base and implementing a toy scheduler. But I assume that's an oversimplified view of asynchronous schedulers as a whole; there are holes in my understanding.
How do you keep the CPU from spinning when the scheduler is idle or waiting? Some fibers just sleep; others wait for input from the operating system.
You'd need to multiplex the I/O operations into an event-based interface (select/poll), so you can leverage the OS to do the waiting while still being able to schedule other fibers. select/poll take a timeout argument; for fibers that want to sleep, you can keep a priority queue of wake-up times and use its earliest deadline as the select/poll timeout to emulate a sleep call.
Trying to serve fibers that perform blocking operations (read/write/sleep, etc.) directly won't work unless you schedule each fiber on its own native thread, which rather defeats the purpose.
See http://swtch.com/libtask/ for a working implementation.
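To make the select/poll-plus-sleep-queue idea concrete, here is a toy sketch in Lua (coroutines as fibers, LuaSocket's socket.select as the waiting primitive; with real I/O, the descriptors being watched would go in select's first argument):

local socket = require "socket"

local sleepers = {} -- fibers waiting to run, kept sorted by wake-up time

local function sleep(seconds)
  table.insert(sleepers, { wake = socket.gettime() + seconds,
                           co = coroutine.running() })
  table.sort(sleepers, function(a, b) return a.wake < b.wake end)
  coroutine.yield() -- back to the scheduler until the deadline passes
end

local function spawn(fn)
  table.insert(sleepers, { wake = 0, co = coroutine.create(fn) }) -- runnable now
end

spawn(function() for i = 1, 3 do print("A", i); sleep(1) end end)
spawn(function() for i = 1, 3 do print("B", i); sleep(0.5) end end)

while #sleepers > 0 do
  -- the earliest deadline in the queue becomes the select() timeout,
  -- so an idle scheduler sleeps in the OS instead of spinning
  local timeout = math.max(0, sleepers[1].wake - socket.gettime())
  socket.select(nil, nil, timeout)
  while sleepers[1] and sleepers[1].wake <= socket.gettime() do
    coroutine.resume(table.remove(sleepers, 1).co)
  end
end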
You should probably take a look at the setcontext family of functions (http://en.wikipedia.org/wiki/Setcontext). This means that within your application you will need to re-implement all functions that may block (read, write, sleep, etc.) in asynchronous forms that return to the scheduler.
Only the "scheduler fiber" gets to wait for completion events using select(), poll(), or epoll(). This means that when the scheduler is idle, the process sleeps in the select/poll/epoll call instead of taking up CPU.
Though it's a little late to answer, I'd like to mention that I have a practical implementation of a fiber library in C, called libevfibers.
Despite being a young project, it is used in production. It provides a solution not only for classical asynchronous operations like reading and writing a socket, but also addresses filesystem I/O in a non-blocking manner. The project leverages three great libraries: libcoro, libev, and libeio.
You can also control the flow of execution via coroutines. A library that supports creating them is Boost.Asio.
A good example is available here: Boost Stackful Coroutines
From an implementation point of view, you can start with an asynchronous event loop implementation. Then you can just implement the fiber scheduling on top of that by using the asynchronous event handlers to switch to the corresponding fiber.
A sleeping/waiting fiber just means that it isn't scheduled at the moment - it just switches to the event loop instead.
BTW, if you are looking for some actual code, have a look at http://svn.cmeerw.net/src/nginetd/trunk/ which is still work in progress, but tries to implement a fiber scheduler on top of a multi-threaded event loop (with Win32 I/O completion ports or Linux's edge-triggered epoll).
