why isn't my thread joinable in perl? - multithreading

I wrote a very short Perl script that uses multithreading.
My problem is that the thread I create is not joinable. So I am wondering: what conditions must be met for a thread to be joinable?
What is the limit of a thread in Perl?
#!/usr/bin/env perl
#
#
use lib "$::XCATROOT/lib/perl";
use strict;
use threads;
use Safe;
sub test
{
    my $parm = shift;
}
my $newchassis = ["1", "2", "3"];
my @snmp_threads;
for my $item (@$newchassis)
{
    my $thread = threads->create(\&test, $item);
    push @snmp_threads, $thread;
}
for my $t (@snmp_threads)
{
    $t->join();
}
This is tricky because it works fine on RHEL 6.3 but fails on SLES 11 SP2.

Though there is no code, I will go ahead and assume that you are using $_->join foreach @threads; to join the threads. Whether you need to join depends on the post-processing. Without seeing your code it is difficult to know what you are doing, but this is how it works:
If the post-processing step needs all threads to finish before
beginning work, then the wait for individual threads is unavoidable.
If the post-processing step is specific to the results of each
thread, it should be possible to make the post-processing part of
the thread itself.
In both cases, $_->join foreach @threads; is the way to go.
If there is no need to wait for the threads to finish, use the
detach command instead of join. However, any results that the
threads may return will be discarded.
Are you sure you have a valid post-processing scenario for your activity?
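On the general question of when a thread is joinable: according to the threads module documentation, is_joinable() returns true once a thread has finished running, has not been detached, and has not yet been joined. The following is only a minimal sketch of checking that state; the double_it worker is hypothetical and exists purely for illustration.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: returns its argument doubled.
sub double_it {
    my ($n) = @_;
    return $n * 2;
}

my $thr = threads->create(\&double_it, 21);

# Poll (with a 50 ms pause) until the thread has finished running.
# Only then does is_joinable() return true.
select(undef, undef, undef, 0.05) until $thr->is_joinable();

my $result = $thr->join();    # returns the worker's return value
print "result: $result\n";

# After join() the thread is no longer joinable.
print "still joinable? ", ($thr->is_joinable() ? "yes" : "no"), "\n";
```

Calling join() directly without polling also works; it simply blocks until the thread finishes.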

Related

Threads in Perl: order of detach()

I am trying to get into Perl's use of threads. Reading the documentation I came across the following code:
use threads;
my $thr = threads->create(\&sub1); # Spawn the thread
$thr->detach(); # Now we officially don't care any more
sleep(15); # Let thread run for awhile
sub sub1 {
my $count = 0;
while (1) {
$count++;
print("\$count is $count\n");
sleep(1);
}
}
The goal, it seems, would be to create one thread running sub1 for 15 seconds, and in the mean time print some strings. However, I don't think I understand what's going on at the end of the programme.
First of all, detach() is defined as follows:
Once a thread is detached, it'll run until it's finished; then Perl
will clean up after it automatically.
However, when does the subroutine finish? while(1) never finishes. Nor do I find any information in sleep() that it'd cause to break a loop. On top of that, from the point we detach we are 'waiting for the script to finish and then clean it up' for 15 seconds, so if we are waiting for the subroutine to finish, why do we need sleep() in the main script? The position is awkward to me; it suggests that the main programme sleeps for 15 seconds. But what is the point of that? The main programme (thread?) sleeps while the sub-thread keeps running, but how is the subroutine then terminated?
I guess the idea is that after sleep-ing is done, the subroutine ends, after which we can detach/clean up. But how is this syntactically clear? Where in the definition of sleep is it said that sleep terminates a subroutine (and why), and how does it know which one to terminate in case there are more than one threads?
All threads end when the program ends. The program ends when the main thread ends. The sleep in the main thread merely keeps the program running for a short time, after which the main thread (therefore the program, therefore all created threads) also ends.
So what's up with detach? It just says "I'm never going to bother joining to this thread, and I don't care what it returns". If you don't either detach a thread or join to it, you'd get a warning when the program ends.
Detaching a thread means "I don't care any more", and that does actually mean that when your process exits, the thread will error and terminate.
Practically speaking - I don't think you ever want to detach a thread in perl - just add a join at the end of your code, so it can exit cleanly, and signal it via a semaphore or Thread::Queue in order to terminate.
$_ -> join for threads -> list;
Will do the trick.
That code example is, in my opinion, a bad one. It's just plain messy to sleep so that a detached thread has a chance to complete, when you could just join and know that it has finished. This is especially true of Perl threads: it's deceptive to assume they're lightweight and can be trivially started (and detached). If you're ever spawning so many that the overhead of joining them is too high, then you're using Perl threads wrong, and should probably fork instead.
You're quite right - the thread will never terminate, and so your code will always have a 'dirty' exit.
So instead I'd rewrite:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
my $run : shared;
$run = 1;
sub sub1 {
my $count = 0;
while ($run) {
$count++;
print("\$count is $count\n");
sleep(1);
}
print "Terminating\n";
}
my $thr = threads->create( \&sub1 ); # Spawn the thread
sleep(15); # Let thread run for awhile
$run = 0;
$thr->join;
That way your main thread signals the worker thread to say "I'm done" and waits for it to finish its current loop.
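The Thread::Queue alternative mentioned above can be sketched in the same spirit. This is only an illustration under assumptions: the work items 1..5 are made up, and an undef sentinel is used as the "stop" signal so that the worker can be joined cleanly.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new();

# Worker: process items until it receives the undef sentinel.
my $thr = threads->create(sub {
    my $count = 0;
    while (defined(my $item = $q->dequeue())) {
        $count++;               # stand-in for real work on $item
    }
    return $count;
});

$q->enqueue($_) for 1 .. 5;     # hypothetical work items
$q->enqueue(undef);             # sentinel: tells the worker to stop

my $processed = $thr->join();   # waits for the worker to exit cleanly
print "processed $processed items\n";
```

Because dequeue() blocks while the queue is empty, the worker does not need a sleep in its loop.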

How To Prevent Perl Thread Printouts from Intercepting?

Suppose I made three perl threads that each do this:
print "hi I am thread $threadnum\n";
print "this is a test!\n";
you would expect an output of:
hi I am thread 1
this is a test!
hi I am thread 2
this is a test!
hi I am thread 3
this is a test!
but instead, what happens about half of the time is:
hi I am thread 1
this is a test!
hi I am thread2
hi I am thread3
this is a test!
this is a test!
Is there any way to ensure that they output in the correct order WITHOUT condensing it all into one line?
First of all: don't use perl interpreter threads.
That being said, to prevent those lines from being printed separately, your options are either:
Acquire a semaphore while printing these two lines to prevent two threads from entering that critical section at the same time.
Print a single line of text with an embedded newline, e.g:
print "hi I am thread $threadnum\nthis is a test!\n";
Fundamentally here, you're misunderstanding what threads do. They're designed to operate in parallel and asynchronously.
That means different threads hit different bits of the program at different times, and potentially run on different processors.
One of the drawbacks of this is, as you have found, that you cannot guarantee ordering or atomicity of operations. That's true of compound operations too - you can't actually guarantee that even a print statement is an atomic operation - you can end up with split lines.
You should always assume that any operation isn't atomic, unless you know for sure otherwise, and lock accordingly. Most of the time you will get away with it, but you can find yourself tripping over some truly horrific and hard to find bugs, because in a small percentage of cases, your non-atomic operations are interfering with each other. Even things like ++ may not be. (This isn't a problem on variables local to your thread though, just any time you're interacting with a shared resource, like a file, STDOUT, shared variable, etc.)
This is a very common problem with parallel programming though, and so there are a number of solutions:
Use lock and a shared variable:
##outside threads:
use threads::shared;
my $lock : shared;
And inside the threads:
{
lock $lock;
### do atomic operation
}
When the lock leaves scope, it'll be released. A thread will 'block' waiting to obtain that lock, so for that one bit, you are no longer running parallel.
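Put together, the lock approach might look like the sketch below. The speak sub and the shared @log array are assumptions added for illustration; @log only records the order of events so the pairing can be inspected afterwards.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $print_lock : shared;    # exists only to be locked
my @log : shared;           # records the order of events

# Hypothetical worker: prints its two lines while holding the lock.
sub speak {
    my ($threadnum) = @_;
    lock($print_lock);      # released when this sub's scope ends
    print "hi I am thread $threadnum\n";
    push @log, "hi $threadnum";
    print "this is a test!\n";
    push @log, "test $threadnum";
    return;
}

my @threads = map { threads->create(\&speak, $_) } 1 .. 3;
$_->join for @threads;
```

The lock keeps each thread's two lines adjacent. It does not guarantee that thread 1 prints before thread 2; that still depends on scheduling.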
Use a Thread::Semaphore
Much like a lock - you have the Thread::Semaphore module, that ... in your case works the same. But it's built around limited (but more than 1) resources. I wouldn't use this in your scenario, but it can be useful if you're trying to e.g. limit concurrent disk IO and concurrent processor usage - set a semaphore:
use Thread::Semaphore;
my $limit = Thread::Semaphore -> new ( 8 );
And inside the threads:
$limit -> down();
#do protected bit
$limit -> up();
You can of course, set that 8 to 1, but you don't gain that much over lock. (Just the ability to up() to remove it, rather than letting it go out of scope).
Use a 'IO handler' thread with Thread::Queue
(In forking, you might use a pipe here).
use Thread::Queue;
my $output = Thread::Queue -> new ();
sub print_output_thread {
    # dequeue blocks until a message is available
    while ( defined( my $message = $output -> dequeue ) ) {
        print $message;
    }
}
threads -> create ( \&print_output_thread );
And in your threads, rather than print you would use:
$output -> enqueue ( "Print this message \n" );
This thread serialises your output, and will ensure each message is atomic - but note that if you do two enqueue operations, they might be interleaved again, for exactly the same reason.
So you would need to enqueue both messages in a single call:
$output -> enqueue ( "Print this message\n", "And this message too\n" );
(you can also lock your queue as in the first example). That's again, perhaps a bit overkill for your example, but it can be useful if you're trying to collate results into a particular ordering.
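A complete version of the printer-thread idea, including shutdown, could look roughly like this. It is a sketch under assumptions: three worker threads, and an undef sentinel (not part of the original snippet) to tell the printer when to stop.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $output = Thread::Queue->new();

# Printer thread: the only thread that writes to STDOUT.
my $printer = threads->create(sub {
    my $printed = 0;
    while (defined(my $msg = $output->dequeue())) {
        print $msg;
        $printed++;
    }
    return $printed;
});

my @workers = map {
    my $n = $_;
    threads->create(sub {
        # One enqueue call with two items keeps the pair adjacent.
        $output->enqueue("hi I am thread $n\n", "this is a test!\n");
    });
} 1 .. 3;

$_->join for @workers;
$output->enqueue(undef);          # sentinel: no more output is coming
my $printed = $printer->join();   # number of messages printed
```

Joining the workers before sending the sentinel ensures that no message is enqueued after the printer has been told to stop.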

Perl threading model

I have 100+ tasks to do, I can do it in a loop, but that will be slow
I want to do these jobs by threading, let's say, 10 threads
There is no dependency between the jobs, each can run independently, and stop if failed
I want these threads to pick up my jobs and do it, there should be no more than 10 threads in total, otherwise it may harm the server
These threads keep doing the jobs until all finished
Stop the job in the thread when timeout
I was searching for information about this on the Internet and found Thread::Pool, Thread::Queue...
But I can't be sure which one is better for my case. Could anyone give me some advice?
You could use Thread::Queue and threads.
IPC (communication) between threads is much easier than between processes.
To fork or not to fork?
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
# Worker threads: 10 detached workers, each pulling items from the queue
for (1..10) {
    threads->create(sub {
        while (defined(my $item = $q->dequeue())) {
            # Do work on $item
        }
    })->detach();
}
my $dbh = ...
while (1) {
    # Get items from the DB
    my @items = get_items_from_db($dbh);
    # Send work to the threads
    $q->enqueue(@items);
    print "Pending items: " . $q->pending() . "\n";
    sleep 15; # Check the DB every 15 seconds
}
I'd never use Perl threads. The reason is that, conceptually speaking, they aren't threads: you have to specify what data is to be shared between them. Each thread runs its own Perl interpreter, which is why they are called interpreter threads or ithreads. Needless to say, this consumes a lot of memory just to run things in parallel. fork() shares all the memory up until the fork point. So if the tasks are independent, always use fork. It's also the most Unix way of doing things.
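For comparison, a fork-based pool that never runs more than 10 children at once can be sketched with core Perl only. Everything specific here is an assumption for illustration: the job list, the do_job placeholder, and the 5-second limit enforced with alarm (whose default SIGALRM action terminates a stuck child).

```perl
use strict;
use warnings;

my @jobs    = (1 .. 25);   # hypothetical job IDs
my $max     = 10;          # at most 10 children at once
my $timeout = 5;           # assumed per-job time limit, in seconds
my %running;               # pid => job
my %status;                # job => exit status from wait()

sub do_job { my ($job) = @_; return; }   # placeholder for real work

while (@jobs or %running) {
    # Start children until the limit is reached.
    while (@jobs and keys(%running) < $max) {
        my $job = shift @jobs;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            alarm $timeout;   # unhandled SIGALRM ends a stuck child
            do_job($job);
            exit 0;
        }
        $running{$pid} = $job;
    }
    # Reap one finished child, which frees a slot.
    my $pid = wait();
    last if $pid == -1;
    $status{ delete $running{$pid} } = $?;
}
```

A non-zero entry in %status marks a job that failed or timed out. The children share nothing with each other, so a failure in one does not affect the rest.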

Perl parallel HTTP requests - out of memory

First of all, I'm new to Perl.
I want to make multiple (e.g. 160) HTTP GET requests on a REST API in Perl. Executing them one after another takes much time, so I was thinking of running the requests in parallel. Therefore I used threads to execute more requests at the same time and limited the number of parallel requests to 10.
This worked just fine for the first time I ran the program, the second time I ran 'out of memory' after the 40th request.
Here's the code: (@urls contains the 160 URLs for the requests)
while (@urls) {
    my @threads;
    for (my $j = 0; $j < 10 and @urls; $j++) {
        my $url = shift(@urls);
        push @threads, async { $ua->get($url) };
    }
    for my $thread (@threads) {
        my $response = $thread->join;
        print "$response\n";
    }
}
So my question is, why am I NOT running out of memory the first time but the second time (am I missing something crucial in my code)? And what can I do to prevent it?
Or is there a better way of executing parallel GET requests?
I'm not sure why you would get an OOM error on a second run when you don't get one on the first run; when you run a Perl script and the perl binary exits, it releases all of its memory back to the OS. Nothing is kept between executions. Is the exact same data being returned by the REST service each time? Maybe there's more data the second time you run and it's pushing you over the edge.
One problem I notice is that you're launching 10 threads and running them to completion, then spawning 10 more threads. A better solution may be a worker-thread model. Spawn 10 threads (or however many you want) at the start of the program, put the URLs into a queue, and allow the threads to process the queue themselves. Here's a quick example that may help:
use strict;
use warnings;
use threads;
use Thread::Queue;
# $ua and @urls are assumed to be defined as in the question
my $q = Thread::Queue->new();
my @thr = map {
    threads->create(sub {
        my @responses = ();
        # Block until a URL is available; undef means "no more work"
        while (defined (my $url = $q->dequeue())) {
            push @responses, $ua->get($url);
        }
        return @responses;
    });
} 1..10;
$q->enqueue($_) for @urls;
$q->enqueue(undef) for 1..10; # one sentinel per worker thread
foreach (@thr) {
    my @responses_of_this_thread = $_->join();
    print for @responses_of_this_thread;
}
Note, I haven't tested this to make sure it works. In this example, you create a new thread queue and spawn up 10 worker threads. Each thread will block on the dequeue method until there is something to be read. Next, you queue up all the URLs that you have, and an undef for each thread. The undef will allow the threads to exit when there is no more work to perform. At this point, the threads will go through and process the work, and you will gather the responses via the join at the end.
Whenever I need an asynchronous solution in Perl, I first look at the POE framework. In this particular case I used the POE HTTP request module, which allows us to send multiple requests simultaneously and provides a callback mechanism where you can process your HTTP responses.
Perl threads are scary and can crash your application, especially when you join or detach them. If responses do not take a long time to process, a single threaded POE solution would work beautifully.
Sometimes, though, we have to rely on threading because the application gets blocked by long-running tasks. In those cases, I create a certain number of threads BEFORE initiating anything in the application. Then with Thread::Queue I pass the data from the main thread to these workers AND never join/detach them; I always keep them around for stability purposes.
(Not an ideal solution for every case.)
POE supports threads now and each thread can run a POE::Kernel. The kernels can communicate with each other through TCP sockets (for which POE provides nice non-blocking interfaces).

Does Perl's thread join method wait for the first thread to finish if I launch more than one thread?

foreach $name (@project_list)
{
    # a thread is created for each project
    my $t = threads->new(\&do_work, $name);
    push(@threads, $t);
}
foreach (@threads) {
    my $thrd = $_->join;
    print "Thread $thrd done\n";
}
sub do_work {
# execute some commands here...
}
The project_list is a list of 40 items. When I spawn a thread for each item, will the join method wait for the first thread to finish and then move over to the next one and so on?
If that is the case, then is it possible to avoid it? I mean, some threads will finish faster than others, so why wait?
Please let me know if more information is required.
Thank you.
$_->join waits for the thread designated by $_ to finish. Since you're pushing them in order, and foreach traverses the list in order, yes, you'll wait first for the first thread.
But that doesn't matter since you're waiting for all threads to finish. It doesn't matter if you wait for the fastest finishers or the slowest ones first - you'll be waiting for everyone anyway.
Why wait? It depends on the scope of the post-processing step.
If the post-processing step needs all threads to finish before beginning work, then the wait for individual threads is unavoidable.
If the post-processing step is specific to the results of each thread, it should be possible to make the post-processing part of the thread itself.
In both cases, $_->join foreach @threads; is the way to go.
If there is no need to wait for the threads to finish, use the detach command instead of join. However, any results that the threads may return will be discarded.
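If results should be handled as soon as each thread finishes, rather than in creation order, one option is to poll threads->list(threads::joinable) and join whatever is ready. This is a rough sketch: the do_work body, which just pauses for the given number of seconds, and the three durations are assumptions for illustration.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: pauses for the given time, then returns it.
sub do_work {
    my ($secs) = @_;
    select(undef, undef, undef, $secs);
    return $secs;
}

my @threads = map { threads->create(\&do_work, $_) } (0.3, 0.1, 0.2);
my @finished;

while (@finished < 3) {
    # Join only threads that have already finished, so join() never blocks.
    for my $thr (threads->list(threads::joinable)) {
        push @finished, $thr->join();
    }
    select(undef, undef, undef, 0.02);   # brief pause between polls
}
print "finished in order: @finished\n";
```

The total waiting time is the same as joining in creation order; the only gain is that each result can be processed as soon as it becomes available.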
The main thread has to live for the duration of all threads, so you need to know when they are all done. This can be done with queues or semaphores. Semaphores are the simplest:
use Thread::Semaphore;
my $S = Thread::Semaphore->new();
foreach $name (@project_list)
{
    # a thread is created for each project
    my $t = threads->new(\&do_work, $name);
    $S->down_force(); # take one for each thread
}
$S->down(); # this blocks until worker threads release one each
print "All threads done\n";
sub do_work {
# execute some commands here...
$S->up(); # here the worker gives one back when done.
}

Resources