What are the alternatives to Perl interpreter threads? - multithreading

perldoc threads says:
The use of interpreter-based threads in perl is officially discouraged.
Are there any other Perl based threads? Or should we just not use threads in Perl?

Depends on what you're trying to accomplish. I still use threads extensively, and there isn't a massive problem with them.
The biggest problem with them is that they are not lightweight, where if you've threaded in other languages, you might expect them to be.
They're more the opposite - spawning a thread is like starting your code again, but with some useful hooks for IPC. That means you really don't want to be doing a task-per-thread model of program, like you might be thinking of.
Instead, you would be much better served by a Thread::Queue worker-thread style model. Here's an example of that:
Perl daemonize with child daemons
However, you may want to consider using fork as an alternative. fork - because of it's implementation on Unix - is a very efficient system call, and can be quite efficient for spawning new processes.
The downside is - it's not quite as friendly for doing IPC.
Parallel::ForkManager is one module that I like for doing forking for multiprocessing.
But in either case you should note - multiprocessing isn't a magic bullet. It lets you hog more CPUs if you have the right sort of problem to solve. It won't make your disks go any faster :)

That warning is poppycock. It should be removed. The developers of Perl explained that it means "The use of interpreter-based threads in perl is officially discouraged if you want a lightweight system for multitasking".
Since creating new threads can be expensive, just use a model that involves reusable worker threads.

As long as I know, there are not any reliable thread implemantations. You should stick to some event-based modules, like Coro, AnyEvent, IO::Async etc.

Related

Thread in Tcl is not really working as C threads

In Tclsh thread package, a created thread is not sharing variables and namespace with main thread, which is quite different from C implementation of threads. Why is this contradiction in tcl thread design. Or am i missing something in the code? Does all scripting language have similar threaded design with them?
Below is the quote from Tcl thread documentation PDF,
thread::create
. All other extensions must be loaded
explicitly into each thread
that needs to use them
It's not a contradiction. It's just a different model. It has its advantages and its disadvantages. The key disadvantage you already know: scripts and variables are not shared (unless you take special steps). The key advantage is that the Tcl implementation has no big global locks, and that makes it much easier to use multi-core hardware effectively and means that there are very few gotchas when doing so. Contrast this with the Python Global Interpreter Lock, which is necessary because Python uses the C-like global shared state model.
At the low level, Tcl's threading is strongly isolated with plenty of thread-shared variables behind the scenes so that locks can be avoided (including in the memory management a lot of time, which would otherwise be a key bottleneck). Inter-thread communications are based on top of Tcl's built-in event queueing system; when two threads communicate, one sends a message and (optionally) waits for the other to respond, with the receiver getting the message placed on its internal queue of events until it is in a state that is ready to handle it. This does slow down inter-thread communications, but is much faster when they're not communicating.
It is actually similar to one way you'd use threads in C: message passing. Of course, you can use threads in other ways as well in C. But message passing is one way to completely avoid deadlocks since the semaphores/mutexes can be completely managed around the message queues and you don't need them anywhere else in your code.
This is in fact what Tcl implements at the C level. And it is in fact why it was done this way: to avoid the need for semaphores (to prevent the user form deadlocking himself).
Most other scripting languages simply provide a thin wrapper around pthreads so you can deadlock yourself if you're not careful. I remember way back in the early 2000s the general advice for threaded programming in C and most other languages is to implement a message passing architecture to avoid deadlocks.
Since tcl generally takes the view that API exposed at the script level should be high level, the thread implementation was implemented with a message passing architecture built-in. Of course, there is also the convenient fact that it also avoids having to make the tcl interpreter thread-safe (thus introducing mutexes all over the interpreter source code).
Making interpreters thread-safe is non trivial. Some languages suffer mysterious crashes to this day when running threaded applications. Some languages took over a decade to iron out all threading bugs. Tcl just decided not to try. The tcl interpreter is small enough and spins up quite fast so the solution was to simply run one interpreter per thread.

Use cases for ithreads (interpreter threads) in Perl and rationale for using or not using them?

If you want to learn how to use Perl interpreter threads, there's good documentation in perlthrtut (threads tutorial) and the threads pragma manpage. It's definitely good enough to write some simple scripts.
However, I have found little guidance on the web on why and what to sensibly use Perl's interpreter threads for. In fact, there's not much talk about them, and if people talk about them it's quite often to discourage people from using them.
These threads, available when perl -V:useithreads is useithreads='define'; and unleashed by use threads, are also called ithreads, and maybe more appropriately so as they are very different from threads as offered by the Linux or Windows operating systems or the Java VM in that nothing is shared by default and instead a lot of data is copied, not just the thread stack, thus significantly increasing the process size. (To see the effect, load some modules in a test script, then create threads in a loop pausing for key presses each time around, and watch memory rise in task manager or top.)
[...] every time you start a thread all data structures are copied to
the new thread. And when I say all, I mean all. This e.g. includes
package stashes, global variables, lexicals in scope. Everything!
-- Things you need to know before programming Perl ithreads (Perlmonks 2003)
When researching the subject of Perl ithreads, you'll see people discouraging you from using them ("extremely bad idea", "fundamentally flawed", or "never use ithreads for anything").
The Perl thread tutorial highlights that "Perl Threads Are Different", but it doesn't much bother to explain how they are different and what that means for the user.
A useful but very brief explanation of what ithreads really are is from the Coro manpage under the heading WINDOWS PROCESS EMULATION. The author of that module (Coro - the only real threads in perl) also discourages using Perl interpreter threads.
Somewhere I read that compiling perl with threads enabled will result in a significantly slower interpreter.
There's a Perlmonks page from 2003 (Things you need to know before programming Perl ithreads), in which the author asks: "Now you may wonder why Perl ithreads didn't use fork()? Wouldn't that have made a lot more sense?" This seems to have been written by the author of the forks pragma. Not sure the info given on that page still holds true in 2012 for newer Perls.
Here are some guidelines for usage of threads in Perl I have distilled from my readings (maybe erroneously so):
Consider using non-blocking IO instead of threads, like with HTTP::Async, or AnyEvent::Socket, or Coro::Socket.
Consider using Perl interpreter threads on Windows only, not on UNIX because on UNIX, forks are more efficient both for memory and execution speed.
Create threads at beginning of program, not when memory concumption already considerable - see "ideal way to reduce these costs" in perlthrtut.
Minimize communication between threads because it's slow (all answers on that page).
So far my research. Now, thanks for any more light you can shed on this issue of threads in Perl. What are some sensible use cases for ithreads in Perl? What is the rationale for using or not using them?
The short answer is that they're quite heavy (you can't launch 100+ of them cheaply), and they exhibit unexpected behaviours (somewhat mitigated by recent CPAN modules).
You can safely use Perl ithreads by treating them as independent Actors.
Create a Thread::Queue::Any for "work".
Launch multiple ithreads and "result" Queues passing them the ("work" + own "result") Queues by closure.
Load (require) all the remaining code your application requires (not before threads!)
Add work for the threads into the Queue as required.
In "worker" ithreads:
Bring in any common code (for any kind of job)
Blocking-dequeue a piece of work from the Queue
Demand-load any other dependencies required for this piece of work.
Do the work.
Pass the result back to the main thread via the "result" queue.
Back to 2.
If some "worker" threads start to get a little beefy, and you need to limit "worker" threads to some number then launch new ones in their place, then create a "launcher" thread first, whose job it is to launch "worker" threads and hook them up to the main thread.
What are the main problems with Perl ithreads?
They're a little inconvenient with for "shared" data as you need to explicity do the sharing (not a big issue).
You need to look out for the behaviour of objects with DESTROY methods as they go out of scope in some thread (if they're still required in another!)
The big one: Data/variables that aren't explicitly shared are CLONED into new threads. This is a performance hit and probably not at all what you intended. The work around is to launch ithreads from a pretty much "pristine" condition (not many modules loaded).
IIRC, there are modules in the Threads:: namespace that help with making dependencies explicit and/or cleaning up cloned data for new threads.
Also, IIRC, there's a slightly different model using ithreads called "Apartment" threads, implemented by Thread::Appartment which has a different usage pattern and another set of trade-offs.
The upshot:
Don't use them unless you know what you're doing :-)
Fork may be more efficient on Unix, but the IPC story is much simpler for ithreads. (This may have been mitigated by CPAN modules since I last looked :-)
They're still better than Python's threads.
There might, one day, be something much better in Perl 6.
I have used perl's "threads" on several occasions. They're most useful for launching some process and continuing on with something else. I don't have a lot of experience in the theory of how they work under the hood, but I do have a lot of practical coding experience with them.
For example, I have a server thread that listens for incoming network connections and spits out a status response when someone asks for it. I create that thread, then move on and create another thread that monitors the system, checking five items, sleeping a few seconds, and looping again. It might take 3-4 seconds to collect the monitor data, then it gets shoved into a shared variable, and the server thread can read that when needed and immediately return the last known result to whomever asks. The monitor thread, when it finds that an item is in a bad state, kicks off a separate thread to repair that item. Then it moves on, checking the other items while the bad one is repaired, and kicking off other threads for other bad items or joining finished repair threads. The main program all the while is looping every few seconds, making sure that the monitor and server threads aren't joinable/still running. All of this could be written as a bunch of separate programs utilizing some other form of IPC, but perl's threads make it simple.
Another place where I've used them is in a fractal generator. I would split up portions of the image using some algorithm and then launch as many threads as I have CPUs to do the work. They'd each stuff their results into a single GD object, which didn't cause problems because they were each working on different portions of the array, and then when done I'd write out the GD image. It was my introduction to using perl threads, and was a good introduction, but then I rewrote it in C and it was two orders of magnitude faster :-). Then I rewrote my perl threaded version to use Inline::C, and it was only 20% slower than the pure C version. Still, in most cases where you'd want to use threads due to being CPU intensive, you'd probably want to just choose another language.
As mentioned by others, fork and threads really overlap for a lot of purposes. Coro, however, doesn't really allow for multi-cpu use or parallel processing like fork and thread do, you'll only ever see your process using 100%. I'm over-simplifying this, but I think the easiest way to describe Coro is that it's a scheduler for your subroutines. If you have a subroutine that blocks you can hop to another and do something else while you wait, for example of you have an app that calculates results and writes them to a file. One block might calculate results and push them into a channel. When it runs out of work, another block starts writing them to disk. While that block is waiting on disk, the other block can start calculating results again if it gets more work. Admittedly I haven't done much with Coro; it sounds like a good way to speed some things up, but I'm a bit put off by not being able to do two things at once.
My own personal preference if I want to do multiprocessing is to use fork if I'm doing lots of small or short things, threads for a handful of large or long-lived things.

Lua :: How to write simple program that will load multiple CPUs?

I haven't been able to write a program in Lua that will load more than one CPU. Since Lua supports the concept via coroutines, I believe it's achievable.
Reason for me failing can be one of:
It's not possible in Lua
I'm not able to write it ☺ (and I hope it's the case )
Can someone more experienced (I discovered Lua two weeks ago) point me in right direction?
The point is to write a number-crunching script that does hi-load on ALL cores...
For demonstrative purposes of power of Lua.
Thanks...
Lua coroutines are not the same thing as threads in the operating system sense.
OS threads are preemptive. That means that they will run at arbitrary times, stealing timeslices as dictated by the OS. They will run on different processors if they are available. And they can run at the same time where possible.
Lua coroutines do not do this. Coroutines may have the type "thread", but there can only ever be a single coroutine active at once. A coroutine will run until the coroutine itself decides to stop running by issuing a coroutine.yield command. And once it yields, it will not run again until another routine issues a coroutine.resume command to that particular coroutine.
Lua coroutines provide cooperative multithreading, which is why they are called coroutines. They cooperate with each other. Only one thing runs at a time, and you only switch tasks when the tasks explicitly say to do so.
You might think that you could just create OS threads, create some coroutines in Lua, and then just resume each one in a different OS thread. This would work so long as each OS thread was executing code in a different Lua instance. The Lua API is reentrant; you are allowed to call into it from different OS threads, but only if are calling from different Lua instances. If you try to multithread through the same Lua instance, Lua will likely do unpleasant things.
All of the Lua threading modules that exist create alternate Lua instances for each thread. Lua-lltreads just makes an entirely new Lua instance for each thread; there is no API for thread-to-thread communication outside of copying parameters passed to the new thread. LuaLanes does provide some cross-connecting code.
It is not possible with the core Lua libraries (if you don't count creating multiple processes and communicating via input/output), but I think there are Lua bindings for different threading libraries out there.
The answer from jpjacobs to one of the related questions links to LuaLanes, which seems to be a multi-threading library. (I have no experience, though.)
If you embed Lua in an application, you will usually want to have the multithreading somehow linked to your applications multithreading.
In addition to LuaLanes, take a look at llthreads
In addition to already suggested LuaLanes, llthreads and other stuff mentioned here, there is a simpler way.
If you're on POSIX system, try doing it in old-fashioned way with posix.fork() (from luaposix). You know, split the task to batches, fork the same number of processes as the number of cores, crunch the numbers, collate results.
Also, make sure that you're using LuaJIT 2 to get the max speed.
It's very easy just create multiple Lua interpreters and run lua programs inside all of them.
Lua multithreading is a shared nothing model. If you need to exchange data you must serialize the data into strings and pass them from one interpreter to the other with either a c extension or sockets or any kind of IPC.
Serializing data via IPC-like transport mechanisms is not the only way to share data across threads.
If you're programming in an object-oriented language like C++ then it's quite possible for multiple threads to access shared objects across threads via object pointers, it's just not safe to do so, unless you provide some kind of guarantee that no two threads will attempt to simultaneously read and write to the same data.
There are many options for how you might do that, lock-free and wait-free mechanisms are becoming increasingly popular.

What is so great about TPL

I have done this POC and verified that when you create 4 threads and run them on Quad core machine, all cores get busy - so, CLR is already scheduling threads on different cores effectively, so why the class TASK?
I agree Task simplifies the creation and use of threads, but apart from that what? Its just a wrapper around threads and threadpools right? Or does it in some way help scheduling threads on multicore machines?
I am specifially looking at whats with Task wrt multicore that wasnt there in 2.0 threads.
"I agree Task simplifies the creation and use of threads"
Isn't that enough? Isn't it fabulous that it provides higher-level building blocks so that us mere mortals can build lock-free multithreaded code which is safe because really smart people like Joe Duffy have done the work for us?
If TPL really just consisted of a way of starting a new task, it wouldn't be much use - the work-stealing etc is nice, but probably not crucial to most of us. It's the building blocks around tasks - and in particular around the idea of a "future" - which provide the value. Do you really want to write Parallel.ForEach yourself? Do you want to want to work out how to perform partitioning efficiently yourself? I know that if I tried doing that, it would take me a long time and I'd certainly do a worse job of it than the PFX team.
Many of the advances in development haven't been about making it possible to do something which was impossible before - they've been about raising the abstraction level so that a problem can be solved once and then that solution reused. Do you feel the same way about the CLR itself? You could do the same thing in assembly yourself, obviously... but by raising the abstraction level, the CLR and C# make us more productive.
Although you could do everything equivalently in TPL or threadpool, for a better abstraction, and scalability patterns TPL is preferred over Threadpool. But it is upto the programmer, and if you know exactly what you are doing, and based on your scheduling and synchronization requirements play out in your specific application you could use Threadpool more effectively. There are some stuff you get free with TPL which you've got to code up when using Threadpool, like following few I can think of now.
work stealing
worker thread local pool
scheduling groups of actions like Parallel.For
The TPL lets you program in terms of Tasks not threads. Thinking of tasks solely as a better thread abstraction would be a mistake. Tasks allow you to specify the work you want to get executed not how you want the work executed (threads). This allows you to express the potential parallelism of your application and have the TPL runtime--the scheduler and thread pool--handle how that work gets executed. This means that the TPL will take a lot of the burden off you of having your application deal with ensuring the best perfromance on a wide variety of hardware with different numbers of cores.
For example, the TPL makes it easy to implement key several design patterns that allow you to express the potential parallelism of your application.
http://msdn.microsoft.com/en-us/library/ff963553.aspx
Like Futures (mentioned by Jon) as well as pipelines and parallel loops.

How to make perl thread without copying all variables?

I have one perl program, where using some form of parallelism would really be helpful.
However, I have quite a lot of data in variables, that I don't need at all at that part of the program.
If I use perl threads, it copies all variables every time I create a new thread. In my case, that hurts a lot.
What should I use to make a new thread without the copying? Or are there some better thread implementations, that don't copy everything?
Like the syntax an ease of threads but not all the fat? Use the amazing forks module! It implements the threads interface using fork and IPC making it easy to share data between child processes.
Really, you just have to avoid ithreads. They're horrible, and unlike every other form of threads on the planet they're more expensive than regular heavyweight processes. My preferred solution is to use an event-based framework like POE or AnyEvent (I use POE) and break out any tasks that can't be made nonblocking into subprocesses using POE::Wheel::Run (or fork_call for AnyEvent). It does take more up-front design work to write an app in that manner, but done right, it will give you some efficient code. From time to time I've also written code that simply uses fork and pipe (or open '-|') and IO::Select and waitpid directly within its own event loop, but you should probably consider that a symptom of my having learned C before perl, and not a recommendation. :)
A word to the wise, though: if you're running on Windows, then this approach might be almost as bad as using ithreads directly, since Perl makes up for win32's lack of fork() by using ithreads, so you'll pay that same ithread-creation cost (in CPU and memory) on every fork. There isn't really a good solution to that one.
Use the fork(2) system call to take advantage of Copy-on-write.

Resources