Perl threads stop working suddenly after long runs - multithreading

I have a perl script, that goes through a queue of files and process files in different threads:
Below is a snippet after filling the files queue.
The problem is, sometimes when I leave it running for too long (12-24 hours), I come back to see the script hanging.
I have log files for each thread, I see that the last time stamp of activity is always the same for all of them. But the script does not exit, meaning that the threads do not give back the semaphore.
Also I am sure that the threads did not end properly, because I can still see the queue is full of files to be handled.
I have a log before and after calling the EXE, and the last trace is always the one after calling the EXE. Cannot find a rational explanation.
#initiate all threads
for( my $i = 0; $i < $max_thread; $i++ )
{
my $my_thread = threads->new( sub { start() } );
push( #Threads, $my_thread );
}
$semaphore->down($max_thread);
terminate();
sub start
{
$SIG{INT} = sub { thread_exit() };
while( (my $file = $file_queue->dequeue_nb) )
{
#This function calls an external EXE
processFile( $file );
}
#Thread ended
$semaphore->up();
}

If an uncaught exception is thrown in your thread, the semaphore will never be "upped". It looks like you are using the semaphore to wait for the thread to exit, which can be done more safely using $thread->join.
That's assuming you are correct about the thread having exited. It could also be that the thread has stalled (e.g. deadlocked) or entered an infinite loop.

Related

How to get started multithreading in Perl

I have a perl program that takes over 13 hours to run. I think it could benefit from introducing multithreading but I have never done this before and I'm at a loss as to how to begin.
Here is my situation:
I have a directory of hundreds of text files. I loop through every file in the directory using a basic for loop and do some processing (text processing on the file itself, calling an outside program on the file, and compressing it). When complete I move on to the next file. I continue this way doing each file, one after the other, in a serial fashion. The files are completely independent from each other and the process returns no values (other than success/failure codes) so this seems like a good candidate for multithreading.
My questions:
How do I rewrite my basic loop to take advantage of threads? There appear to be several moduals for threading out there.
How do I control how many threads are currently running? If I have N cores available, how do I limit the number of threads to N or N - n?
Do I need to manage the thread count manually or will Perl do that for me?
Any advice would be much appreciated.
Since your threads are simply going to launch a process and wait for it to end, best to bypass the middlemen and just use processes. Unless you're on a Windows system, I'd recommend Parallel::ForkManager for your scenario.
use Parallel::ForkManager qw( );
use constant MAX_PROCESSES => ...;
my $pm = Parallel::ForkManager->new(MAX_PROCESSES);
my #qfns = ...;
for my $qfn (#qfns) {
my $pid = $pm->start and next;
exec("extprog", $qfn)
or die $!;
}
$pm->wait_all_children();
If you wanted you avoid using needless intermediary threads in Windows, you'd have to use something akin to the following:
use constant MAX_PROCESSES => ...;
my #qfns = ...;
my %children;
for my $qfn (#qfns) {
while (keys(%children) >= MAX_PROCESSES) {
my $pid = wait();
delete $children{$pid};
}
my $pid = system(1, "extprog", $qfn);
++$children{$pid};
}
while (keys(%children)) {
my $pid = wait();
delete $children{$pid};
}
Someone's given your a forking example. Forks aren't native on Windows, so I'd tend to prefer threading.
For the sake of completeness - here's a rough idea of how threading works (and IMO is one of the better approaches, rather than respawning threads).
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
my $nthreads = 5;
my $process_q = Thread::Queue->new();
my $failed_q = Thread::Queue->new();
#this is a subroutine, but that runs 'as a thread'.
#when it starts, it inherits the program state 'as is'. E.g.
#the variable declarations above all apply - but changes to
#values within the program are 'thread local' unless the
#variable is defined as 'shared'.
#Behind the scenes - Thread::Queue are 'shared' arrays.
sub worker {
#NB - this will sit a loop indefinitely, until you close the queue.
#using $process_q -> end
#we do this once we've queued all the things we want to process
#and the sub completes and exits neatly.
#however if you _don't_ end it, this will sit waiting forever.
while ( my $server = $process_q->dequeue() ) {
chomp($server);
print threads->self()->tid() . ": pinging $server\n";
my $result = `/bin/ping -c 1 $server`;
if ($?) { $failed_q->enqueue($server) }
print $result;
}
}
#insert tasks into thread queue.
open( my $input_fh, "<", "server_list" ) or die $!;
$process_q->enqueue(<$input_fh>);
close($input_fh);
#we 'end' process_q - when we do, no more items may be inserted,
#and 'dequeue' returns 'undefined' when the queue is emptied.
#this means our worker threads (in their 'while' loop) will then exit.
$process_q->end();
#start some threads
for ( 1 .. $nthreads ) {
threads->create( \&worker );
}
#Wait for threads to all finish processing.
foreach my $thr ( threads->list() ) {
$thr->join();
}
#collate results. ('synchronise' operation)
while ( my $server = $failed_q->dequeue_nb() ) {
print "$server failed to ping\n";
}
If you need to move complicated data structures around, I'd recommend having a look at Storable - specifically freeze and thaw. These will let you shuffle around objects, hashes, arrays etc. easily in queues.
Note though - for any parallel processing option, you get good CPU utilisation, but you don't get more disk IO - that's often a limiting factor.

How to pause Thread::Queue access while waiting for prior threads to finish up?

So this is what I have right now:
for my $action (#actionList){
$q->enqueue([$_, $action]) for #component_dirs;
print "\nWaiting for prior actions to finish up...\n";
until (!defined($q->peek())) {}
}
$q->end();
$_->join() for threads->list();
But this doesn't seem to work.. is there a better way to force the queue to wait for previous $action items to complete before allowing access again?
edit: Oddly enough, it's magically started working... maybe it was working all along and I just didn't make the output apparent enough. Either way, my question still stands - is there a better way?
Your code doesn't wait until the previous action has completed, it just wastes CPU until another thread starts working on the last job.
For things like “flags”, you should generally use semaphores instead. Semaphores are thread-safe counters with up and down methods. For example, we could pass a semaphore along with the job, which starts with count zero. Each thread increments the semaphore when it finishes a job. Our main thread tries to decrement the semaphore by the count of jobs, which will block until all threads have finished:
my $q = Thread::Queue->new;
my #workers = map { threads->create(\&worker, $q) } 1 .. $NUM_WORKERS;
for my $action (#actionList) {
my $sem = Thread::Semaphore->new(0);
$q->enqueue([$_, $action, $sem]) for #component_dirs;
$sem->down(0+#component_dirs); # wait for the threads
}
$q->end;
$_->join for #workers;
sub worker {
my ($q) = #_;
while (my $job = $q->dequeue) {
my ($component, $action, $sem) = #$job;
...
$sem->up;
}
}
Actually, we could reuse the semaphore.
See the Thread::Semaphore docs for more details.
This usage is similar to barriers.

perl system call causing hang when using threads

I am a newbie to perl, so please excuse my ignorance. (I'm using windows 7)
I have borrowed echicken's threads example script and wanted to use it as a basis for a script to make a number of system calls, but I have run into an issue which is beyond my understanding. To illustrate the issue I am seeing, I am doing a simple ping command in the example code below.
$nb_process is the number or simultaneous running threads allowed.
$nb_compute as the number of times we want to run the sub routine (i.e the total number of time we will issue the ping command).
When I set $nb_compute and $nb_process to be same value as each other, it works perfectly.
However when I reduce $nb_process (to restrict the number of running threads at any one time), it seems to lock once the number of threads defined in $nb_process have started.
It works fine if I remove the system call (ping command).
I see the same behaviour for other system calls (it'd not just ping).
Please could someone help? I have provided the script below.
#!/opt/local/bin/perl -w
use threads;
use strict;
use warnings;
my #a = ();
my #b = ();
sub sleeping_sub ( $ $ $ );
print "Starting main program\n";
my $nb_process = 3;
my $nb_compute = 6;
my $i=0;
my #running = ();
my #Threads;
while (scalar #Threads < $nb_compute) {
#running = threads->list(threads::running);
print "LOOP $i\n";
print " - BEGIN LOOP >> NB running threads = ".(scalar #running)."\n";
if (scalar #running < $nb_process) {
my $thread = threads->new( sub { sleeping_sub($i, \#a, \#b) });
push (#Threads, $thread);
my $tid = $thread->tid;
print " - starting thread $tid\n";
}
#running = threads->list(threads::running);
print " - AFTER STARTING >> NB running Threads = ".(scalar #running)."\n";
foreach my $thr (#Threads) {
if ($thr->is_running()) {
my $tid = $thr->tid;
print " - Thread $tid running\n";
}
elsif ($thr->is_joinable()) {
my $tid = $thr->tid;
$thr->join;
print " - Results for thread $tid:\n";
print " - Thread $tid has been joined\n";
}
}
#running = threads->list(threads::running);
print " - END LOOP >> NB Threads = ".(scalar #running)."\n";
$i++;
}
print "\nJOINING pending threads\n";
while (scalar #running != 0) {
foreach my $thr (#Threads) {
$thr->join if ($thr->is_joinable());
}
#running = threads->list(threads::running);
}
print "NB started threads = ".(scalar #Threads)."\n";
print "End of main program\n";
sub sleeping_sub ( $ $ $ ) {
my #res2 = `ping 136.13.221.34`;
print "\n#res2";
sleep(3);
}
The main problem with your program is that you have a busy loop that tests whether a thread can be joined. This is wasteful. Furthermore, you could reduce the amount of global variables to better understand your code.
Other eyebrow-raiser:
Never ever use prototypes, unless you know exactly what they mean.
The sleeping_sub does not use any of its arguments.
You use the threads::running list a lot without contemplating whether this is actually correct.
It seems you only want to run N workers at once, but want to launch M workers in total. Here is a fairly elegant way to implement this. The main idea is that we have a queue between threads where threads that just finished can enqueue their thread ID. This thread will then be joined. To limit the number of threads, we use a semaphore:
use threads; use strict; use warnings;
use feature 'say'; # "say" works like "print", but appends newline.
use Thread::Queue;
use Thread::Semaphore;
my #pieces_of_work = 1..6;
my $num_threads = 3;
my $finished_threads = Thread::Queue->new;
my $semaphore = Thread::Semaphore->new($num_threads);
for my $task (#pieces_of_work) {
$semaphore->down; # wait for permission to launch a thread
say "Starting a new thread...";
# create a new thread in scalar context
threads->new({ scalar => 1 }, sub {
my $result = worker($task); # run actual task
$finished_threads->enqueue(threads->tid); # report as joinable "in a second"
$semaphore->up; # allow another thread to be launched
return $result;
});
# maybe join some threads
while (defined( my $thr_id = $finished_threads->dequeue_nb )) {
join_thread($thr_id);
}
}
# wait for all threads to be finished, by "down"ing the semaphore:
$semaphore->down for 1..$num_threads;
# end the finished thread ID queue:
$finished_threads->enqueue(undef);
# join any threads that are left:
while (defined( my $thr_id = $finished_threads->dequeue )) {
join_thread($thr_id);
}
With join_thread and worker defined as
sub worker {
my ($task) = #_;
sleep rand 2; # sleep random amount of time
return $task + rand; # return some number
}
sub join_thread {
my ($tid) = #_;
my $thr = threads->object($tid);
my $result = $thr->join;
say "Thread #$tid returned $result";
}
we could get the output:
Starting a new thread...
Starting a new thread...
Starting a new thread...
Starting a new thread...
Thread #3 returned 3.05652608754778
Starting a new thread...
Thread #1 returned 1.64777186731541
Thread #2 returned 2.18426146087901
Starting a new thread...
Thread #4 returned 4.59414651998983
Thread #6 returned 6.99852684265667
Thread #5 returned 5.2316971836585
(order and return values are not deterministic).
The usage of a queue makes it easy to tell which thread has finished. Semaphores make it easier to protect resources, or limit the amount of parallel somethings.
The main benefit of this pattern is that far less CPU is used, when contrasted to your busy loop. This also shortens general execution time.
While this is a very big improvement, we could do better! Spawning threads is expensive: This is basically a fork() without all the copy-on-write optimizations on Unix systems. The whole interpreter is copied, including all variables, all state etc. that you have already created.
Therefore, as threads should be used sparingly, and be spawned as early as possible. I already introduced you to queues that can pass values between threads. We can extend this so that a few worker threads constantly pull work from an input queue, and return via an output queue. The difficulty now is to have the last thread to exit finish the output queue.
use threads; use strict; use warnings;
use feature 'say';
use Thread::Queue;
use Thread::Semaphore;
# define I/O queues
my $input_q = Thread::Queue->new;
my $output_q = Thread::Queue->new;
# spawn the workers
my $num_threads = 3;
my $all_finished_s = Thread::Semaphore->new(1 - $num_threads); # a negative start value!
my #workers;
for (1 .. $num_threads) {
push #workers, threads->new( { scalar => 1 }, sub {
while (defined( my $task = $input_q->dequeue )) {
my $result = worker($task);
$output_q->enqueue([$task, $result]);
}
# we get here when the input queue is exhausted.
$all_finished_s->up;
# end the output queue if we are the last thread (the semaphore is > 0).
if ($all_finished_s->down_nb) {
$output_q->enqueue(undef);
}
});
}
# fill the input queue with tasks
my #pieces_of_work = 1 .. 6;
$input_q->enqueue($_) for #pieces_of_work;
# finish the input queue
$input_q->enqueue(undef) for 1 .. $num_threads;
# do something with the data
while (defined( my $result = $output_q->dequeue )) {
my ($task, $answer) = #$result;
say "Task $task produced $answer";
}
# join the workers:
$_->join for #workers;
With worker defined as before, we get:
Task 1 produced 1.15207098293783
Task 4 produced 4.31247785766295
Task 5 produced 5.96967474718984
Task 6 produced 6.2695013168678
Task 2 produced 2.02545636412421
Task 3 produced 3.22281619053999
(The three threads would get joined after all output is printed, so that output would be boring).
This second solution gets a bit simpler when we detach the threads – the main thread won't exit before all threads have exited, because it is listening to the input queue which is finished by the last thread.

Strange variable behaviour using Perl ithreads

I'm trying to implement a multithreaded application based on a slightly altered boss/worker model. Basically the main thread creates several boss threads, which in turn spawn two worker threads each (possibly more). That's because the boss threads deal with one host or network device each, and the worker threads could take a while to complete their work.
I'm using Thread::Pool to realize this concept, and so far it works quite well; I also don't think my problem is related to Thread::Pool (see below). Very simplified pseudocode ahead:
use strict;
use warnings;
my $bosspool = create_bosspool(); # spawns all boss threads
my $taskpool = undef; # created in each boss thread at
# creation of each boss thread
# give device jobs to boss threads
while (1) {
foreach my $device ( #devices ) {
$bosspool->job($device);
}
sleep(1);
}
# This sub is called for jobs passed to the $bosspool
sub process_boss
{
my $device = shift;
foreach my $task ( $device->{tasks} ) {
# process results as they become available
process_result() while ( $taskpool->results );
# give task jobs to task threads
scalar $taskpool->job($device, $task);
sleep(1); ### HACK ###
}
# process remaining results / wait for all tasks to finish
process_result() while ( $taskpool->results || $taskpool->todo );
# happy result processing
}
sub process_result
{
my $result = $taskpool->result_any();
# mangle $result
}
# This sub is called for jobs passed to the $taskpool of each boss thread
sub process_task
{
# not so important stuff
return $result;
}
By the way, the reason I'm not using the monitor()-routine is because I have to wait for all jobs in the $taskpool to finish. Now, this code works just wonderful, unless you remove the ### HACK ### line. Without sleeping, $taskpool->todo() won't deliver the right number of jobs still open if you add them or receive their results too "fast". Like, you add 4 jobs in total but $taskpool->todo() will only return 2 afterwards (with no pending results). This leads to all sorts of interesting effects.
OK, so Thread::Pool->todo() is crap, let's try a workaround:
sub process_boss
{
my $device = shift;
my $todo = 0;
foreach my $task ( $device->{tasks} ) {
# process results as they become available
while ( $taskpool->results ) {
process_result();
$todo--;
}
# give task jobs to task threads
scalar $taskpool->job($device, $task);
$todo++;
}
# process remaining results / wait for all tasks to finish
while ( $todo ) {
process_result();
sleep(1); ### HACK ###
$todo--;
}
}
This will also work fine, as long as I keep the ### HACK ### line. Without this line, this code will reproduce the problems of Thread::Pool->todo(), as $todo does not only get decremented by 1, but 2 or even more.
I've tested this code with only one boss thread, so there was basically no multithreading involved (when it comes to this subroutine). $bosspool, $taskpool and especially $todo aren't :shared, no side effects possible, right? What's happening in this subroutine, which gets executed by only one boss thread, with no shared variables, semaphores, etc.?
I would suggest that the best way to implement a 'worker' threads model, is with Thread::Queue. The problem with doing something like this, is figuring out when queues are complete, or whether items are dequeued and pending processing.
With Thread::Queue you can use a while loop to fetch elements from the queue, and end the queue, such that the while loop returns undef and the threads exit.
So you don't always need multiple 'boss' threads, you can just use multiple different flavours of worker and input queues. I would question why you need a 'boss' thread model in that instance. It seems unnecessary.
With reference to:
Perl daemonize with child daemons
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
my $nthreads = 4;
my #targets = qw ( device1 device2 device3 device4 );
my $task_one_q = Thread::Queue->new();
my $task_two_q = Thread::Queue->new();
my $results_q = Thread::Queue->new();
sub task_one_worker {
while ( my $item = task_one_q->dequeue ) {
#do something with $item
$results_q->enqueue("$item task_one complete");
}
}
sub task_two_worker {
while ( my $item = task_two_q->dequeue ) {
#do something with $item
$results_q->enqueue("$item task_two complete");
}
}
#start threads;
for ( 1 .. $nthreads ) {
threads->create( \&task_one_worker );
threads->create( \&task_two_worker );
}
foreach my $target (#targets) {
$task_one_q->enqueue($target);
$task_two_q->enqueue($target);
}
$task_one_q->end;
$task_two_q->end;
#Wait for threads to exit.
foreach my $thr ( threads->list() ) {
threads->join();
}
$results_q->end();
while ( my $item = $results_q->dequeue() ) {
print $item, "\n";
}
You could do something similar with a boss thread if you were desirous - you can create a queue per boss and pass it by reference to the workers. I'm not sure that it's necessary though.

Perl - Capturing an unexpected termination

I have some code that spawns 5 threads of itself, at a time. I detach those threads, but have a shared variable $THREADCOUNT that I increment at the beginning of the subprocedure that is called by the thread call, and decrement at the end of the thread. When $THREADCOUNT equals 0, I spawn another 5 times.
The problem is, sometimes the thread exits unexpectedly and the $THREADCOUNT never makes it to 0, so the program stops. Is there someway to capture an exit like this and have $THREADCOUNT-- on unexpected exit?
Thanks so much. This is my first post so appologies if it's a little unclear.
Chris
Is the shared $THREADCOUNT variable really necessary? A call to threads->list(threads::running) will tell you whether any of your spawn are still running.
I can't figure out when anyone would ever want to use detach. I'd use something like
use threads;
my %workers;
sub start_worker {
my $thread = threads->create(#_);
$workers{$thread->tid} = $thread;
return $thread;
}
sub collect_finished_workers {
for my $thread (threads->list(threads::joinable)) {
$thread->join()
if delete($workers{$threads->tid}); # Don't assume we own all threads.
}
}
sub get_worker_count {
collect_finished_workers();
return 0+keys(%workers);
}
Note that this fixes the problem in your code where a thread isn't counted as started for a short while after it has started.

Resources