Where do I have to undefine the queue when using multiple threads?

I have a script that creates a queue and some workers that read their jobs from the queue. My problem is that the script does not terminate and call printData() because the threads are idling. This happens because I never set the queue to undef.
I have tried many different ways, but all of them lead to problems:
Either the queue was terminated although there were still jobs in it,
or the queue was empty at that moment although a thread was still working and trying to push new work into it.
I use the following code:
# -------------------------
# Main
# -------------------------
my @threads = map threads->create(\&doOperation), 1 .. $maxNumberOfParallelJobs;
pullDataFromDbWithDirectory($directory);
#$worker->enqueue((undef) x $maxNumberOfParallelJobs);
$_->join for @threads;
sub pullDataFromDbWithDirectory {
    my $_dir = $_[0];
    if ($itemCount <= $maxNumberOfItems) {
        my @retval = grep { /^Dir|^File/ } qx($omnidb -filesystem $filesystem '$label' -listdir '$_dir');
        foreach my $item (@retval) {
            $itemCount++;
            (my $filename = $item) =~ s/^File\s+|^Dir\s+|\n//g;
            my $file = "$_dir/$filename";
            push(@data, $file);
            if ($item =~ /^Dir/) {
                $worker->enqueue($file);
                print "Add $file to queue\n" if $debug;
            }
        }
    }
}
sub doOperation () {
    my $ithread = threads->tid();
    do {
        my $folder = $worker->dequeue();
        print "Read $folder from queue with thread $ithread\n" if $debug;
        pullDataFromDbWithDirectory($folder);
    } while ($worker->pending());
    push(@IDLE_THREADS, $ithread);
}
EDIT:
I found an ugly solution. Maybe there are better ones? I add the workers to an IDLE array and sleep until all the workers are in there:
sleep 0.01 while (scalar @IDLE_THREADS < $maxNumberOfParallelJobs);
$worker->enqueue((undef) x $maxNumberOfParallelJobs);
$_->join for @threads;

You can't use ->pending() without having threads die off prematurely. Fix:
use Time::HiRes qw( sleep );   # the built-in sleep truncates 0.01 to 0
my $busy: shared = 0;          # number of workers currently processing an item
sub doOperation {              # the worker sub passed to threads->create
    my $tid = threads->tid();
    while (defined( my $folder = $q->dequeue() )) {
        { lock $busy; ++$busy; }
        print "Worker thread $tid processing folder $folder.\n" if $debug;
        pullDataFromDbWithDirectory($folder);
        { lock $busy; --$busy; }
    }
    print "Worker thread $tid exiting.\n" if $debug;
}
sleep 0.01 while $q->pending || $busy;
$q->end();
$_->join for @threads;
But that introduces a race condition:
A worker thread dequeues the last item currently in the queue.
The main thread checks pending (false).
The main thread checks the number of busy threads (none).
The main thread signals the workers to end.
All other worker threads exit.
The worker that dequeued the item above marks itself busy.
That worker starts processing the last item, tries to add a bunch of items to the queue, and fails.
The dequeuing plus the busy incrementing needs to be atomic, and the pending check plus the busy check needs to be atomic.
That's not possible without changing Thread::Queue. You can't just throw a lock around those two pieces of code, because a worker blocked in ->dequeue would hold the lock, which would prevent the master from checking whether all of the threads are idle whenever one of them is idle.
We need to split ->dequeue into its waiting component and its dequeuing component. We have the latter (->dequeue_nb), so we just need the former.
use Thread::Queue 3.01;
sub T_Q_wait {
    my $self = shift;
    lock(%$self);
    my $queue = $$self{'queue'};
    my $count = @_ ? $self->_validate_count(shift) : 1;
    # Wait for requisite number of items
    cond_wait(%$self) while ((@$queue < $count) && ! $$self{'ENDED'});
    cond_signal(%$self) if (@$queue);
    return !$$self{'ENDED'};
}
Now we can write the solution:
my $busy: shared = 0;
sub doOperation {    # the worker sub passed to threads->create
    my $tid = threads->tid();
    WORKER_LOOP:
    while (T_Q_wait($q)) {
        my $folder;
        {
            # Dequeue and mark busy under one lock so the two happen atomically
            lock $busy;
            $folder = $q->dequeue_nb();
            next WORKER_LOOP if !defined($folder);
            ++$busy;
        }
        print "Worker thread $tid processing folder $folder.\n" if $debug;
        pullDataFromDbWithDirectory($folder);
        {
            lock $busy;
            --$busy;
            cond_signal($busy) if !$busy;
        }
    }
}
{
    # Main thread: wait until no worker is busy, then end the queue
    lock $busy;
    cond_wait($busy) while $busy;
    $q->end();
    $_->join() for threads->list();
}
The next is there in case another thread snagged the work between wait and dequeue_nb.

Related

Scalars leaked: -2 Scalars leaked: 2 warning in a multi threaded perl script

Near the end of my multithreaded Perl script, I get messages like the ones below. The number changes from time to time.
Scalars leaked: -2
Scalars leaked: 2
What could be the cause of this problem? Are they just warnings?
I have created my threads in the following way:
our $threads1=3;
our $threads2=3;
for(my $i = 0; $i<$threads1; $i++)
{
$threadpool1[$i] = threads->create( \&sub1, $arg1, $arg2 , $arg2, $threads1, $threads2);
}
#Add work to queue1
foreach my $work (keys %{$workobj})
{
$queue1->enqueue( $work );
}
for(my $i = 0; $i<$threads2; $i++)
{
$threadpool2[$i] = threads->create( \&sub2, $arg1 , $arg2);
}
#Wait until worker threads complete the work
$_->join for @threadpool1;
$_->join for @threadpool2;
sub sub1($arg1, $arg2 , $arg2, $threads1, $threads2)
{
while($queue1->dequeue)
{
#do some work
#send work to queue 2
$queue2->enqueue(work);
$queue1->enqueue(undef x threads1);
}
# if all work has been sent to second queue, send undef to second set of threads
$queue1->enqueue(undef x $threads2);
return;
}
sub sub1($arg1, $arg2)
{
while($queue2->dequeue)
{
#do some work
}
return;
}
Any ideas on where I am going wrong?

PERL - Multithreading - Children + grandchildren - limit of threads running

I have written a Perl script that creates threads (with a limit on how many run at the same time), and each thread creates its own children, which should also be limited in number.
Where I host my script, I cannot run more than X threads per Perl script at the same time. In the example below, X = 3 x 7 = 21 threads at most at any one time:
3 for the 1st job ($nb_process_first)
7 for the 2nd job ($nb_process_second)
Questions:
Is there a better way to manage threads and their children? (Queues, for example. Could you please give me a code example? I have tried with no success.)
My current script does not terminate with all the threads joined, although I loop over all running threads to join them (see the end of the script).
#!/usr/bin/perl -s
use threads;
my @threads;
my $nb_process_first = 3;
my @running = ();
print "START" . "\n";
$current = 1;
while ( $current <= 10 ) {
    @running = threads->list(threads::running);
    if ( scalar @running < $nb_process_first ) {
        print "Launch firstJob=" . scalar @running . "\n";
        my $thread = threads->create( \&firstJob );
        push( @threads, $thread );
    } else {
        redo;
    }
    $current++;
}
my @joinable = threads->list(threads::joinable);
while ( scalar @joinable != 0 ) {
    foreach my $thr ( threads->list() ) {
        $thr->join();
    }
    @joinable = threads->list(threads::joinable);
}
print "END" . "\n";
sub secondJob {
    for ( $i = 0; $i <= 15; $i++ ) {
        print "secondJob=" . $i . "\n";
        sleep 1;
    }
    threads->exit();
}
sub firstJob {
    my $nb_process_second = 7;
    my @running = ();
    $current = 1;
    while ( $current <= 10 ) {
        @running = threads->list(threads::running);
        if ( scalar @running < $nb_process_second ) {
            print "firstJob/Launch secondJob=" . scalar @running . "-" . $current . "\n";
            my $secondthread = threads->create( \&secondJob );
            push( @threads, $secondthread );
            sleep 2;
        }
        $current++;
    }
    threads->exit();
}
Thread::Queue is a handy fit for the basic 'worker thread' model of threaded code.
It goes a bit like this:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
my $firstworkitem_q = Thread::Queue -> new();
my $secondworkitem_q = Thread::Queue -> new();
my $nthreads = 10;
# Placeholder work items; replace with your real list.
my @things_to_process = ( 1 .. 20 );
sub first_worker {
    # dequeue() returns undef once the queue has been ended and drained
    while ( my $item = $firstworkitem_q -> dequeue() ) {
        print "First worker picked up $item, and queues it to second worker\n";
        $secondworkitem_q -> enqueue ( $item );
    }
}
sub second_worker {
    while ( my $item = $secondworkitem_q -> dequeue() ) {
        print "Second worker got $item\n";
    }
}
my @first_workers;
for ( 1..$nthreads ) {
    my $thr = threads -> create ( \&first_worker );
    push ( @first_workers, $thr );
}
for ( 1..$nthreads ) {
    my $thr = threads -> create ( \&second_worker );
}
$firstworkitem_q -> enqueue ( @things_to_process );
$firstworkitem_q -> end;
foreach my $firstworker ( @first_workers ) {
    $firstworker -> join();
}
#here all the first workers have finished, so we know nothing will be queued to second work queue.
$secondworkitem_q -> end();
foreach my $thr ( threads -> list() ) {
    $thr -> join();
}
You stuff things into the queue and iterate on it for processing. When you end the queue, the while loop gets an undef once the queue is empty and thus terminates, making your thread joinable.
You don't need to track @running the way you do, because threads -> list() will do that. More importantly, you'd need to make @running a shared variable and lock it, because otherwise you've got a different copy of it in each thread.
I'd steer away from having firstJob spawn secondJob, because it can create all manner of fruity bugs. I'd suggest spawning two classes of worker thread. Use $queue -> end() to trigger the first set of workers to close.
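As a minimal sketch of the queue-ending behaviour described above (not the answerer's code): the queue names, the worker count, and the doubling "work" are assumptions made up for illustration, and it assumes a Thread::Queue recent enough to provide end(), which the other answer on this page requires as 3.01. Testing with defined() keeps a work item of 0 from ending the loop early.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01;    # assumed minimum version for end()

my $work_q   = Thread::Queue->new();
my $result_q = Thread::Queue->new();

sub worker {
    # dequeue() blocks while the queue is empty. Once the queue has been
    # ended and drained it returns undef, which terminates the loop.
    while ( defined( my $item = $work_q->dequeue() ) ) {
        $result_q->enqueue( $item * 2 );    # hypothetical "work"
    }
    return;
}

my @workers = map { threads->create( \&worker ) } 1 .. 3;

$work_q->enqueue( 1 .. 10 );
$work_q->end();               # no more work will be added
$_->join() for @workers;      # every worker leaves its loop and can be joined

# All workers are done, so the results can be drained without blocking.
my @results;
while ( defined( my $r = $result_q->dequeue_nb() ) ) {
    push @results, $r;
}
@results = sort { $a <=> $b } @results;
print "@results\n";
```

Because end() is called only after all items are enqueued, no sentinel undef values have to be counted out per worker.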
As for your second question, threads are only joinable if they have finished running (see this answer). Because some of the threads aren't done when the second while loop runs, it finishes without joining them.
Your loop should wait based on the number of active threads, not the number of joinable threads. Something like this:
while (threads->list() > 0)
{
    foreach my $joinable (threads->list(threads::joinable))
    {
        $joinable->join();
    }
}
As for the first question, there are certainly other ways to manage threads. However, it is not possible to say what you should do without knowing your task.
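If polling is not needed at all, a simpler variant is to call join() on every thread object directly, since join() waits for a thread that is still running. The sketch below only illustrates that point; the job sub and its one-second sleep are hypothetical stand-ins, not part of the original script.

```perl
use strict;
use warnings;
use threads;

# Hypothetical job: pretend to work for a second, then return a value.
sub job {
    my ($n) = @_;
    sleep 1;
    return $n * $n;
}

my @workers = map { threads->create( \&job, $_ ) } 1 .. 3;

# The threads are probably still running here, so they are not yet
# "joinable", but join() simply blocks until each one has finished.
my @squares = map { $_->join() } @workers;

print "squares: @squares\n";
print "threads left: ", scalar( threads->list() ), "\n";
```

The trade-off is that the main thread blocks on the threads in creation order, which is fine when it has nothing else to do until all of them are finished.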

In Perl, how can a child thread signal to the main thread that no more threads should be created?

I'm having some problems in returning a value from a thread in perl.
The code I'm using is this:
use threads;
foreach $num (1 .. 100)
{
    push(@threads, threads->create(\&readnum, $num));
    sleep(1) while (scalar threads->list(threads::running) >= 10);
}
$_->join foreach @threads;
sub readnum {
    # some code here
}
so I want to return a value from readnum i.e:
use threads;
foreach $num (1 .. 100)
{
    if ($ok)
    {
        push(@threads, threads->create(\&readnum, $num));
        sleep(1) while (scalar threads->list(threads::running) >= 10);
    }
}
$_->join foreach @threads;
sub readnum {
    # some code here
    return $ok ? "1" : "0";
}
So I want to check the value of $ok; if it's true, a new thread is created.
edit:
What I want is to check the value of $ok: if it's true, create a new thread and keep going; otherwise stop. The same idea without threads:
foreach $num (1 .. 100)
{
    $ok = readnum($num);
    print "runing\n";
    die "stoped\n" if $ok eq 1;
}
sub readnum {
    # some code here
    $_[0]/5 eq 2 ? return 1 : return 0;
}
But with threads I can't put the returned value in $ok.
I hope it's clear now. Thanks.
What you are trying to do is not return a value from a thread, but give one thread the ability to signal to the main thread that no more threads should be created. You can do that with a shared global. In the example below, there is a 1/20 chance that any given thread will decide that processing should stop.
There may be other, better ways of dealing with your specific problem (for example, each thread pushing results to a shared array, and the main thread checking how many results are there etc), but this seems to match your situation.
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;
my $KEEP_GOING :shared;
$KEEP_GOING = 1;
my @threads;
THREAD:
for my $num (1 .. 100) {
    sleep 1 while threads->list(threads::running) >= 10;
    {
        lock $KEEP_GOING;
        if ($KEEP_GOING) {
            push @threads, threads->create(\&readnum, $num);
        }
        else {
            print "Won't create thread for $num .. Goodbye!\n";
            last THREAD;
        }
    }
}
$_->join for @threads;
sub readnum {
    my $num = shift;
    printf "Thread id: %d\tnum = %d\n", threads->tid, $num;
    sleep 1 + rand(3);
    {
        lock $KEEP_GOING;
        if (0.05 > rand) {
            $KEEP_GOING = 0;
        }
    }
}
Output:
Thread id: 1 num = 1
Thread id: 2 num = 2
Thread id: 3 num = 3
Thread id: 4 num = 4
Thread id: 5 num = 5
Thread id: 6 num = 6
Thread id: 7 num = 7
Thread id: 8 num = 8
Thread id: 9 num = 9
Thread id: 10 num = 10
Won't create thread for 11 .. Goodbye!
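The alternative mentioned above, where each thread pushes its result to a shared array for the main thread to inspect, might look roughly like this. It is only a sketch: the multiple-of-5 test and the "num=ok" result format are assumptions standing in for whatever readnum really computes.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @results :shared;

# Hypothetical stand-in for readnum(): flags multiples of 5.
sub readnum {
    my ($num) = @_;
    my $ok = ( $num % 5 == 0 ) ? 1 : 0;
    {
        lock(@results);               # serialise access to the shared array
        push @results, "$num=$ok";
    }
    return;
}

my @workers = map { threads->create( \&readnum, $_ ) } 1 .. 10;
$_->join() for @workers;

# The main thread sees everything the workers pushed.
my @hits = grep { /=1$/ } @results;
print scalar(@results), " results, ", scalar(@hits), " flagged\n";
```

The order of entries in @results depends on thread scheduling, so each entry carries its own number instead of relying on position.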

How can I share a common thread pool amongst objects in Perl?

I've been trying to extend the first answer at Perl Monks (http://www.perlmonks.org/?node_id=735923) to a threaded model, to no avail. I keep running into problems with not being able to pass a coderef.
In my superclass I define the threadpool as a package variable so it can be shared amongst the subclasses:
package Things::Generic;
my $Qwork = new Thread::Queue;
my $Qresults = new Thread::Queue;
my @pool = map { threads->create(\&worker, $Qwork, $Qresults) } 1..$MAX_THREADS;
sub worker {
    my $tid = threads->tid;
    my( $Qwork, $Qresults ) = @_;
    while( my $work = $Qwork->dequeue ) {
        my $result = $work->process_thing();
        $Qresults->enqueue( $result );
    }
    $Qresults->enqueue( undef ); ## Signal this thread is finished
}
sub enqueue {
    my $self = shift;
    $Qwork->enqueue($self);
}
sub new {
    #Blessing and stuff
}
.
.
Now for the subclasses. It is guaranteed that they have a process_thing() method.
package Things::SpecificN;
use base qw (Things::Generic);
sub new() {
#instantiate
}
sub do_things {
my $self = shift;
#enqueue self into the shared worker pool so that "process_thing" is called
$self->enqueue();
}
sub process_thing() {
#Do some work here
return RESULT;
}
# Main
my @things;
push @things, Things::Specific1->new();
push @things, Things::Specific2->new();
.
.
push @things, Things::SpecificN->new();
#Asynchronously kick off "work"
foreach my $thing (@things) {
    $thing->do_things();
}
My goal is to put a list of "work" on the queue. Each thread will pull work from the queue and execute it, no matter what it is. Each Thing has its own unique work, but the function that does the work is guaranteed to be called "process_thing". I just want the thread pool to grab an entry from the queue and do the "something". I think I am describing functionality similar to Android's AsyncTask.
My Perl is not high enough for Thread::Queue::Any
$Qwork->enqueue($self); instead of $self->enqueue();

Controlling the max number of threads running simultaneously at a given time in Perl

I have an array which contains a list of files: @arr = (a.txt, b.txt, c.txt);
I iterate over the array and process the files with a foreach loop; each line of a file generates an SQL statement that is run on the DB server.
I want to create one thread for each line of the file and query the DB. I also want to control the maximum number of threads running simultaneously.
You can use a Thread::Pool based system. Or any Boss/Worker model based system.
That's just a simple worker model, an ideal scenario. No problem.
use threads;
use Thread::Queue qw( );
use constant NUM_WORKERS => 5;
sub work {
    my ($dbh, $job) = @_;
    ...
}
{
    my $q = Thread::Queue->new();
    my @threads;
    for (1..NUM_WORKERS) {
        push @threads, async {
            my $dbh = ...;
            # Each worker loops until it dequeues the undef sentinel
            while (my $job = $q->dequeue()) {
                work($dbh, $job);
            }
        };
    }
    while (<>) {
        chomp;
        $q->enqueue($_);
    }
    # One undef per worker tells each of them to stop
    $q->enqueue(undef) for 1..@threads;
    $_->join() for @threads;
}
Pass the file names to the script as arguments, or assign them to @ARGV within the script.
local @ARGV = qw( a.txt b.txt c.txt );
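To see how assigning to @ARGV feeds the <> loop, here is a small self-contained sketch without any threads. The temporary files and their contents are made up for the demonstration; in the real script each line would be enqueued as a job instead of being pushed onto @jobs.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Create two throwaway input files (hypothetical stand-ins for a.txt, b.txt).
my $dir = tempdir( CLEANUP => 1 );
my @files;
for my $name (qw( a.txt b.txt )) {
    my $path = "$dir/$name";
    open( my $fh, '>', $path ) or die "Cannot write $path: $!";
    print {$fh} "line 1 of $name\nline 2 of $name\n";
    close($fh);
    push @files, $path;
}

my @jobs;
{
    # <> reads every file named in @ARGV in turn, line by line.
    local @ARGV = @files;
    while (<>) {
        chomp;
        push @jobs, $_;    # the real script would call $q->enqueue($_)
    }
}

print scalar(@jobs), " jobs read\n";
```

Using local restores the original @ARGV when the block ends, so command-line arguments remain available to the rest of the script.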
Interesting. I control how many threads run manually, using a hash keyed by thread ID.
[code snip]
my %thr; # my hash of threads, keyed by thread ID
$count = 1;
$maxthreads = 5;
while (shift(@data)) {
    $syncthr = threads->create(sub {callfunction here}, {pass variables});
    $tid = $syncthr->tid; # get the thread ID
    $thr{$tid} = $syncthr;
    if ($count >= $maxthreads) {
        threads->yield();
        while (1) { # loop until threads are completed
            $num_run_threads = keys(%thr);
            foreach $one_thread ( keys %thr ) {
                if ($thr{$one_thread}->is_running()) { # if thread running check for error state
                    if ($err = $thr{$one_thread}->error()) {
                        [ do stuff here ]
                    }
                    # otherwise move on to next thread check
                } else { # thread is either completed or has error
                    if ($err = $thr{$one_thread}->error()) {
                        [ check for error again, can't hurt to double check ]
                    }
                    if ($err = $thr{$one_thread}->join()) {
                        print "Thread completed id: [$one_thread]\n";
                    }
                    delete $thr{$one_thread}; # delete the hash entry since the thread is no more
                    $num_run_threads = $num_run_threads - 1; # reduce the number of running threads
                }
            } # close foreach loop
            @threads = threads->list(threads::running); # get threads
            if ($num_run_threads < $maxthreads) {
                $count = $num_run_threads; # reset the counter to number of threads running
                if ( $#data != -1 ) { # check to make sure we still have data
                    last; # exit the infinite while loop
                } else {
                    if (@threads) {
                        next; # we still have threads, continue with processing
                    } else {
                        { no more threads to process exit program or do something else }
                    }
                } # end else
            } # end threads running
        } # end the while statement
        # Check the threads to see if they are joinable
        undef @threads;
        @threads = threads->list(threads::joinable);
        if (@threads) {
            foreach $mthread (@threads) {
                if ($mthread != 0) {
                    $mthread->join();
                }
            } # end foreach
        } # end @threads
    } # end the if statement
    $count++; # Increment the counter to get to number of max threads to spawn
}
This is by no means a complete program, and I have stripped it down to be very bland. However, I've been using this approach for a while with success, especially in OO Perl. It works for me and has quite a lot of uses. I may be missing some error checking, especially around timeouts, but I handle that in the thread itself. The thread, by the way, is actually a subroutine that I am calling.

Resources