Basic perl threading: threads->list() does not decrease - multithreading

It looks like thread_sleep is not ending properly.
I could handle it using thread queues, semaphores and such but I am interested in what the problem may be here.
This code never ends, as the threads->list() size never decreases.
use strict;
use warnings;
use Thread;
my @threads;
my $count = 0;
while ( scalar( @threads ) < 10 ) {
    my $thr = threads->create( 'thread_sleep' );
    push @threads, $thr;
    $count++;
    print "Spawned Thread nr. $count\n";
    while ( threads->list() > 4 ) {
        print "too many threads, sleeping a second...\n";
        sleep( 1 );
    }
}
sub thread_sleep {
    sleep( 5 );
}

Threads work a lot like processes -- after a thread exits, it stays around in the thread list as a "zombie" thread until another thread (not necessarily its parent) calls $thr->join to collect its return value.
You aren't calling $thr->join anywhere, so these threads are piling up. You can use threads->list(threads::joinable) to check which threads have exited and are now joinable.
(Alternatively, consider using Parallel::ForkManager to manage multiple worker processes. Perl interpreter threads are messy and are best avoided.)
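A minimal sketch of that fix, assuming a Perl built with ithreads and the current threads module. The variable names ($max_running, $total, $joined) and the one-second worker are illustrative, not from the question. Finished threads are joined inside the throttle loop so that they leave threads->list():

```perl
use strict;
use warnings;
use threads;

# Worker body; the short sleep stands in for real work.
sub thread_sleep {
    sleep 1;
    return;
}

my $max_running = 4;     # assumed cap, mirrors the "> 4" check in the question
my $total       = 10;    # number of threads to spawn overall
my $spawned     = 0;
my $joined      = 0;

while ( $spawned < $total ) {
    threads->create( \&thread_sleep );
    $spawned++;
    print "Spawned thread nr. $spawned\n";

    # Throttle: while too many threads are un-joined, reap the finished ones.
    while ( threads->list() > $max_running ) {
        my @done = threads->list(threads::joinable);
        if (@done) {
            $_->join() for @done;    # joining removes them from threads->list()
            $joined += @done;
        }
        else {
            print "too many threads, sleeping a second...\n";
            sleep 1;
        }
    }
}

# Collect whatever is still running; join blocks until each thread ends.
for my $thr ( threads->list() ) {
    $thr->join();
    $joined++;
}
print "Joined $joined threads\n";
```

The only change in behaviour from the question's loop is the join call: without it, exited threads remain in the list and the inner loop never terminates.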

You are relying on a very outdated Perl module: it was introduced in 1998, and the thread model behind it was removed from core in 2007. You don't say what version of Perl you're running, but software doesn't work like automobiles, where a pristine example of a very old edition is laudable.
You need to update your installation.
The documentation for the Thread module says this (markup original)
DEPRECATED
The Thread module served as the frontend to the old-style thread model, called 5005threads, that was introduced in release 5.005. That model was deprecated, and has been removed in version 5.10.
For old code and interim backwards compatibility, the Thread module has been reworked to function as a frontend for the new interpreter threads (ithreads) model. However, some previous functionality is not available. Further, the data sharing models between the two thread models are completely different, and anything to do with data sharing has to be thought differently. With ithreads, you must explicitly share() variables between the threads.
You are strongly encouraged to migrate any existing threaded code to the new model (i.e., use the threads and threads::shared modules) as soon as possible.

Related

Perl modules to use for parallel processing

I want to parallelize a program written in Perl.
The code loops over multiple files and calls a subroutine for each file.
I also need to share some read only local data structures with the subroutine.
sub process_in_parallel {
    my $readOnlySchema = foo();
    foreach my $file ( @files ) {
        validate_the_file($file, $readOnlySchema);
    }
}
Which Perl modules can the Perl monks recommend for this scenario? I tried some of the following:
threads
The problem with this is managing the threads. Is there an efficient thread manager or thread pool library that can help me with this? I am also not sure if I can share the read only object.
Parallel::ForkManager
The problem with this is that it forks processes rather than threads and is increasing the time of execution in my case.
I have the same question posted here : http://perlmonks.com/?node_id=1182517

Does join in perl threads block SIGALRM?

I have a small sample program that hangs on perl 5.16.3. I am attempting to use an alarm to trigger if two threads don't finish working in time in a much more complicated program, but this boils down the gist of it. I know there's plenty of other ways to do this, but for the sake of argument, let's say I'm stuck with the code the way it is. I'm not sure if this is a bug in perl, or something that legitimately shouldn't work.
I have researched this on the Internet, and it seems like mixing alarms and threads is generally discouraged, but I've seen plenty of examples where people claim that this is a perfectly reasonable thing to do, such as this other SO question, Perl threads with alarm. The code provided in the accepted answer on that question also hangs on my system, which is why I'm wondering if maybe this is something that's now broken, at least as of 5.16.3.
It appears that in the code below, if I call join before the alarm goes off, the alarm never triggers. If I replace the join with while(1){} and go into a busy-wait loop, then the alarm goes off just fine, so it appears that join is blocking the SIGALRM for some reason.
My expectation is that the join happens, and then a few seconds later I see "Alarm!" printed on the screen, but this never happens, so long as that join gets called before the alarm goes off.
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
sub worker {
    print "Worker thread started.\n";
    while(1){}
}
my $thread = threads->create(\&worker);
print "Setting alarm.\n";
$SIG{ALRM} = sub { print "Alarm!\n" };
alarm 2;
print "Joining.\n";
$thread->join();
The problem has nothing to do with threads. Signals are only processed between Perl ops, and join is written in C, so the signal will only be handled when join returns. The following demonstrates this:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
sub worker {
    print "Worker thread started.\n";
    for (1..5) {
        sleep(1);
        print(".\n");
    }
}
my $thread = threads->create(\&worker);
print "Setting alarm.\n";
$SIG{ALRM} = sub { print "Alarm!\n" };
alarm 2;
print "Joining.\n";
$thread->join();
Output:
Setting alarm.
Joining.
Worker thread started.
.
.
.
.
.
Alarm!
join is essentially a call to pthread_join. Unlike other blocking system calls, pthread_join does not get interrupted by signals.
By the way, I renamed $tid to $thread since threads->create returns a thread object, not a thread id.
I'm going to post an answer to my own question to add some detail to ikegami's response above, and summarize our conversation, which should save future visitors from having to read through the huge comment trail it collected.
After discussing things with ikegami, I went and did some more reading on perl signals, consulted some other perl experts, and discovered the exact reason why join isn't being "interrupted" by the interpreter. As ikegami said, signals only get delivered in between perl operations. In perl, this is called Deferred Signals, or Safe Signals.
Deferred Signals were released in 5.8.0, back in 2002, which could be one of the reasons I was seeing older posts on the Net which don't appear to work. They probably worked with "unsafe signals", which act more like signal delivery that we're used to in C. In fact, as of 5.8.1, you can turn off deferred signal delivery by setting the environment variable PERL_SIGNALS=unsafe before executing your script. When I do this, the threads::join call is indeed interrupted as I was expecting, just as pthread_join is interrupted in C in this same scenario.
Unlike other I/O operations, such as read, which return EINTR when a signal interrupts them, threads::join doesn't do this. Under the hood it's a call to the C library function pthread_join, which the man page confirms does not return EINTR. Under deferred signals, when the interpreter gets the SIGALRM, it schedules delivery of the signal, deferring it until the threads::join->pthread_join library call returns. Since pthread_join doesn't "interrupt" and return EINTR, my SIGALRM is effectively being swallowed by the threads::join. Other I/O operations would "interrupt" and return EINTR, giving the perl interpreter a chance to deliver the signal and then restart the system call via SA_RESTART.
Obviously, running in unsafe signals mode is probably a Bad Thing, so as an alternative, according to perlipc, you can use the POSIX module to install a signal handler directly via sigaction. This then makes the one particular signal "unsafe".

Perl threading model

I have 100+ tasks to do, I can do it in a loop, but that will be slow
I want to do these jobs by threading, let's say, 10 threads
There is no dependency between the jobs, each can run independently, and stop if failed
I want these threads to pick up my jobs and do it, there should be no more than 10 threads in total, otherwise it may harm the server
These threads keep doing the jobs until all finished
Stop the job in the thread when timeout
I was searching for information about this on the Internet: Thread::Pool, Thread::Queue...
But I can't be sure which one is better for my case. Could anyone give me some advice?
You could use Thread::Queue and threads.
The IPC (communication between threads) is much easier than between processes.
To fork or not to fork?
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new();    # A new empty queue
# Worker threads: 10 detached workers, each pulling items off the queue
threads->create(sub {
    while (defined(my $item = $q->dequeue())) {
        # Do work on $item
    }
})->detach() for 1 .. 10;
my $dbh = ...
while (1) {
    # Get items from the DB
    my @items = get_items_from_db($dbh);
    # Send work to the threads
    $q->enqueue(@items);
    print "Pending items: " . $q->pending() . "\n";
    sleep 15;    # Check the DB every 15 seconds
}
I'd never use Perl threads. The reason is that they aren't, conceptually speaking, threads: you have to specify what data is to be shared between them. Each thread runs its own Perl interpreter, which is why they are called interpreter threads, or ithreads. Needless to say, this consumes a lot of memory just to run things in parallel. fork() shares all the memory up until the fork point. So if the tasks are independent, always use fork. It's also the most Unix way of doing things.
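A rough sketch of the fork approach for the requirements in the question (at most 10 workers, independent jobs, a per-job timeout), using only core Perl on a Unix-like system. The job list, do_job, and the 5-second limit are placeholders invented for illustration:

```perl
use strict;
use warnings;

my @jobs        = ( 1 .. 20 );    # hypothetical job identifiers
my $max_workers = 10;             # never more than this many children at once
my $timeout     = 5;              # assumed per-job limit, in seconds
my %running;                      # pid => job currently handled by that child
my $finished = 0;
my $failed   = 0;

# Placeholder for the real work; returns true on success.
sub do_job {
    my ($job) = @_;
    return 1;
}

# Block until one child exits, then record how it ended.
sub reap_one {
    my $pid = waitpid( -1, 0 );
    die "no child left to reap" if $pid <= 0;
    delete $running{$pid};
    if   ( $? == 0 ) { $finished++ }
    else             { $failed++ }    # non-zero exit or killed by a signal
    return;
}

for my $job (@jobs) {
    # Wait for a free slot before starting another child.
    reap_one() while keys(%running) >= $max_workers;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: with no handler installed, SIGALRM terminates the process,
        # which enforces the timeout for this job only.
        alarm $timeout;
        my $ok = do_job($job);
        exit( $ok ? 0 : 1 );
    }
    $running{$pid} = $job;
}

# Collect the remaining children.
reap_one() while %running;
print "finished=$finished failed=$failed\n";
```

A failed or timed-out job only takes down its own child process; the parent counts it as failed and carries on with the rest of the list.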

why isn't my thread joinable in perl?

I wrote a very short script with Perl and I used multi-thread in it.
My problem is, the thread I created is not joinable. So I am wondering, what is the condition to make thread joinable?
What is the limit of a thread in Perl?
#!/usr/bin/env perl
#
#
use lib "$::XCATROOT/lib/perl";
use strict;
use threads;
use Safe;
sub test
{
    my $parm = shift;
}
my $newchassis = ["1", "2", "3"];
my @snmp_threads;
for my $item (@$newchassis)
{
    my $thread = threads->create(\&test, $item);
    push @snmp_threads, $thread;
}
for my $t (@snmp_threads)
{
    $t->join();
}
This can be very tricky, as it works fine on RHEL 6.3 but fails on SLES 11sp2.
Though there is no code, I will go ahead and assume that you are using join foreach @threads; to join the threads. How the threads should be joined depends on the post-processing. Without seeing your code it's difficult to know what you are doing, but this is how it works:
If the post-processing step needs all threads to finish before
beginning work, then the wait for individual threads is unavoidable.
If the post-processing step is specific to the results of each
thread, it should be possible to make the post-processing part of
the thread itself.
In both cases, $_->join foreach @threads; is the way to go.
If there is no need to wait for the threads to finish, use the
detach command instead of join. However, any results that the
threads may return will be discarded.
Are you sure you have provided a valid post-processing scenario for your activity?

Perl parallel HTTP requests - out of memory

First of all, I'm new to Perl.
I want to make multiple (e.g. 160) HTTP GET requests on a REST API in Perl. Executing them one after another takes much time, so I was thinking of running the requests in parallel. Therefore I used threads to execute more requests at the same time and limited the number of parallel requests to 10.
This worked just fine for the first time I ran the program, the second time I ran 'out of memory' after the 40th request.
Here's the code: (@urls contains the 160 URLs for the requests)
while (@urls) {
    my @threads;
    for (my $j = 0; $j < 10 and @urls; $j++) {
        my $url = shift(@urls);
        push @threads, async { $ua->get($url) };
    }
    for my $thread (@threads) {
        my $response = $thread->join;
        print "$response\n";
    }
}
So my question is, why am I NOT running out of memory the first time but the second time (am I missing something crucial in my code)? And what can I do to prevent it?
Or is there a better way of executing parallel GET requests?
I'm not sure why you would get an OOM error on a second run when you don't get one on the first run; when you run a Perl script and the perl binary exits, it releases all of its memory back to the OS. Nothing is kept between executions. Is the exact same data being returned by the REST service each time? Maybe there's more data the second time you run and it's pushing you over the edge.
One problem I notice is that you're launching 10 threads and running them to completion, then spawning 10 more threads. A better solution may be a worker-thread model. Spawn 10 threads (or however many you want) at the start of the program, put the URLs into a queue, and allow the threads to process the queue themselves. Here's a quick example that may help:
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new();
my @thr = map {
    threads->create(sub {
        my @responses = ();
        while (defined (my $url = $q->dequeue())) {
            push @responses, $ua->get($url);
        }
        return @responses;
    });
} 1..10;
$q->enqueue($_) for @urls;
$q->enqueue(undef) for 1..10;
foreach (@thr) {
    my @responses_of_this_thread = $_->join();
    print for @responses_of_this_thread;
}
Note, I haven't tested this to make sure it works. In this example, you create a new thread queue and spawn up 10 worker threads. Each thread will block on the dequeue method until there is something to be read. Next, you queue up all the URLs that you have, and an undef for each thread. The undef will allow the threads to exit when there is no more work to perform. At this point, the threads will go through and process the work, and you will gather the responses via the join at the end.
Whenever I need an asynchronous solution in Perl, I first look at the POE framework. In this particular case I used the POE HTTP request module, which allows us to send multiple requests simultaneously and provides a callback mechanism where you can process your HTTP responses.
Perl threads are scary and can crash your application, especially when you join or detach them. If responses do not take a long time to process, a single threaded POE solution would work beautifully.
Sometimes, though, we have to rely on threading because the application gets blocked by long-running tasks. In those cases, I create a certain number of threads BEFORE initiating anything in the application. Then with Thread::Queue I pass the data from the main thread to these workers AND never join/detach them; I always keep them around for stability purposes.
(Not an ideal solution for every case.)
POE supports threads now, and each thread can run a POE::Kernel. The kernels can communicate with each other through TCP sockets (for which POE provides nice non-blocking interfaces).

Resources