Perl Multithread Program - multithreading

I am new in Perl. I want to write a Perl script using thread.I have few files say 20 files and want to process those files using 5 threads in 4 batches. I am printing the thread no. After completing one batch ,the thread no must start with 1 for the next batch. But instead of that its creating 20 threads.please help. my code is as follows:
#!/usr/bin/perl -w
use strict;
use warnings;
use threads;
use threads::shared;
my $INPUT_DIR="/home/Documents/myscript/IMPORTLDIF/";
opendir(DIR, $INPUT_DIR) ;
my #files = grep { /^InputFile/ } readdir DIR;
my $count = #files;
#print "Total Files: $count \n";
my #threads;
my $noofthread = 5;
my $nooffiles = $count;
my $noofbatch = $nooffiles / $noofthread;
#print "No of batch: $noofbatch \n";
my $fileIndex = 0;
my $batch = 1;
while ($fileIndex < $nooffiles) {
print "Batch: $batch \n";
for (my $i=0; $i < $noofthread && $fileIndex < $nooffiles ; $i++) {
my $t = threads->new(\&doOperation, $files[$fileIndex], $i)->join;
push(#threads, $t);
$fileIndex++;
print "FileIndex: $fileIndex \n";
}
$batch++;
}
sub doOperation () {
my $ithread = threads->tid() ;
print "Thread Index : [id=$ithread]\n" ;
foreach my $item (#_){
my $filename = $item;
print "Filename name: $filename \n";
}
Edited program using thread queue:
#!/usr/bin/perl -w
# This is compiled with threading support
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
my $INPUT_DIR="/home/Documents/myscript/IMPORTLDIF/";
opendir(DIR, $INPUT_DIR) or die "Cannot opendir: $!";
my #thrs = threads->create(\&doOperation ) for 1..5;#for 5 threads
#my #files = `ls -1 /home/Documents/myscript/IMPORTLDIF/`;
my #files = grep { /^Input/ } readdir DIR or die "File not present present. \n";
chomp(#files);
#add files to queue
foreach my $f (#files){
# Send work to the thread
$q->enqueue($f);
print "Pending items: " + $q->pending()."\n";
}
$q->enqueue('_DONE_') for #thrs;
$_->join() for #thrs;
sub doOperation () {
my $ithread = threads->tid() ;
while (my $filename = $q->dequeue()) {
# Do work on $item
return 1 if $filename eq '_DONE_';
print "[id=$ithread]\t$filename\n";
}
return 1;
}

You are spawning a thread and then waiting for it to complete before spawning the next, each thread handling one file. That is why you see as many threads as you have files.
my $t = threads->new(\&doOperation, $files[$fileIndex], $i)->join;
^^^^--- This will block
Instead try something like this:
....
# split the workload into N batches
#
while (my #batch = splice(#files, 0, $batch_size)) {
push #threads, threads->new(\&doOperation, #batch);
}
# now wait for all workers to finish
#
for my $thr (#threads) {
$thr->join;
}
As an aside, Thread::Queue and Thread-Pool might imply better designs for the work you want to do.

You could use Paralel:Queue and create 4 thread and pass them items that they could work on.
To fork or not to fork?
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
my #thrs;
push #thrs, threads->create(\&doOperation ) for 1..5;#for 5 threads
my #files = `ls -1 /tmp/`;chomp(#files);
#add files to queue
foreach my $f (#files){
# Send work to the thread
$q->enqueue($f);
print "Pending items: "$q->pending()."\n";
}
$q->enqueue('_DONE_') for #thrs;
$_->join() for threads->list();
sub doOperation () {
my $ithread = threads->tid() ;
while (my $filename = $q->dequeue()) {
# Do work on $item
return 1 if $filename eq '_DONE_';
print "[id=$ithread]\t$filename\n";
}
return 1;
}

Related

Perl - multithreading / fork / synchronize problem:

I'm trying to work as an example for the following code:
my $milon;
my $pid = fork();
die if not defined $pid;
if (not $pid)
{
$milon->{$pid} = $pid;
exit;
}
$milon->{3} = 4;
my $finished = wait();
print( "debug10: TEST = ", Dumper($milon));
output:
debug10: TEST = $VAR1 = {
'3' => 4
};
How do I make the dictionary keep both 3 => 4 and also the $pid => $pid?
It doesn't have to be forking, it could be multithreading or NonBlocking IO, whichever is better according to what you think.
This is an example of course, I just want to conclude from this example to my real code.
You need some memory that is shared between your threads/processes. The easiest is probably to use interpreter-based threads and threads::shared. For instance:
use threads;
use threads::shared;
my %milon :shared;
for (1 .. 2) {
threads->create(sub {
my $tid = threads->tid();
$milon{$tid} = $tid;
});
}
$milon{3} = 4;
$_->join for threads->list; # Wait for all threads to be done
print Dumper \%milon;
This outputs:
$VAR1 = {
'1' => 1,
'2' => 2,
'3' => 4
};
Following sample code demonstrates usage of fork() to compute square of a number from an array #numbers (total 100) in parallel execution.
REAPER function assigned to $SIG{CHLD} signal cleans up completed child processes to avoid zombie processes hanging around in process table.
Investigate if fork() approach will fit your problem/task.
use strict;
use warnings;
use POSIX qw(strftime :sys_wait_h);
use Time::HiRes qw(usleep);
my $limit = 10;
my $threads = $limit;
my #numbers = map { int(rand(100)) } 1..100;
sub REAPER {
local $!;
while( (my $pid = waitpid(-1, WNOHANG) ) > 0 && WIFEXITED($?) ) {
$threads++;
}
$SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;
for ( #numbers ) {
while( $threads == 0 or $threads > $limit ) { usleep(1) }
my $pid = fork();
die $! unless defined $pid;
if( $pid ) {
# parent
$threads--;
} else {
# child
my $n = compute_square($_);
printf "Process %6d: %3d => %d\n", $$, $_, $n;
exit 0;
}
}
sub compute_square {
my $num = shift;
return $num*$num;
}

Scalars leaked: -2 Scalars leaked: 2 warning in a multi threaded perl script

During the end of my multi threaded perl script, I get error like below. The number changes from time to time.
Scalars leaked: -2
Scalars leaked:2
What could be the cause of this problem? Are they just warnings?
I have created my threads in the following way:
our $threads1=3;
our $threads2=3;
for(my $i = 0; $i<$threads1; $i++)
{
$threadpool1[$i] = threads->create( \&sub1, $arg1, $arg2 , $arg2, $threads1, $threads2);
}
#Add work to queue1
foreach my $work (keys %{$workobj})
{
$queue1->enqueue( $work );
}
for(my $i = 0; $i<$threads2; $i++)
{
$threadpool2[$i] = threads->create( \&sub2, $arg1 , $arg2);
}
#Wait until worker threads complete the work
$_->join for #threadpool1;
$_->join for #threadpool2;
sub sub1($arg1, $arg2 , $arg2, $threads1, $threads2)
{
while($queue1->dequeue)
{
#do some work
#send work to queue 2
$queue2->enqueue(work);
$queue1->enqueue(undef x threads1);
}
# if all work has been sent to second queue, send undef to second set of threads
$queue1->enqueue(undef x $threads2);
return;
}
sub sub1($arg1, $arg2)
{
while($queue2->dequeue)
{
#do some work
}
return;
}
Any ideas on where I am going wrong?

Are these two code different

foreach $thr (1..5)
{
$threads[$thr]=threads->create("worker");
}
and
foreach (1..5)
{
push #threads,threads->create("worker");
}
the latter works well, well the former gives warning.
#!/usr/bin/perl
use strict;
use threads;
use threads::shared;
use Thread::Queue;
my $queue = Thread::Queue->new();
my #threads;
my $thr;
#----------------------------------------create
#send work first,and then creat threads, the first thread work earlier.
$queue->enqueue(1..10000);
foreach (1..5)
{
push #threads,threads->create("worker");
}
$queue->end();
sub worker
{
while (my #DataElement = $queue->dequeue(100))
{
my $tid = threads->tid();
#open (my $out,">>$tid.txt") or die $!;
print "Threads ID:$tid\t#DataElement\n";
#print $out "Threads ID:$tid\t#DataElement\n";
#close $out;
}
}
#----------------------------------------cut
my $thr_num=1;
my $i;
while ($thr_num)
{
$i++;
foreach (#threads) #store threads, TRUE even if joined.
{
$thr_num = threads->list();
print "threads total: $thr_num\n";
if ($_->is_running())
{
sleep 1; #wait
next;
}
if ($_->is_joinable())
{
$_->join();
}
sleep 1;# wait
}
print $i,"\n";
}
this is the whole code. and the warning is can't call method "is_running" on an undefined value at threadqueue2(1).plx.line42. Perl exited with active threads.
No. You will end up with different data structures. As you can see from this simplified version of your code.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
my #threads;
foreach my $thr (1 .. 5) {
$threads[$thr] = 'A Thread';
}
say Dumper \#threads;
#threads = ();
foreach (1 .. 5) {
push #threads, 'A Thread';
}
say Dumper \#threads;
The output is:
$VAR1 = [
undef,
'A Thread',
'A Thread',
'A Thread',
'A Thread',
'A Thread'
];
$VAR1 = [
'A Thread',
'A Thread',
'A Thread',
'A Thread',
'A Thread'
];
In your first example, you begin populating the array at element 1, so the first element (which has an index of 0) contains undef.

Perl: Make thread sleep

I have a basic Perl script that runs a function with 20 threads.
use threads;
use threads::shared;
use Thread::Queue;
use File::Slurp;
$| = 1; my $numthreads = 20;
my $fetch_q = Thread::Queue->new();
sub fetch {
while ( my $target = $fetch_q->dequeue() ) {
my $request = `curl "http://WEBSITE" -s -o /dev/null -w "%{http_code}"`; # Returns HTTP Status code of request (i.e. 200, 302, 404, etc)
if ($request eq "200") {
print "Success. Sleeping for 5 seconds.";
sleep(5);
}
else {
print "Fail. Will try again in 10 seconds.";
sleep(10);
redo;
}
}
}
my #workers = map { threads->create( \&fetch ) } 1 .. $numthreads;
$fetch_q->enqueue( 1 .. $max );
$fetch_q->end();
foreach my $thr (#workers) {$thr->join();}
If a condition is true, I want the thread to sleep for 5 seconds but have all the other threads continue. Right now when I use sleep(5) it seems like the entire script sleeps for 5 seconds.
How can I use sleep() for individual threads?
sleep is the right tool. I don't know why you think it seems to block all threads, but it doesn't. The following illustrates this:
use threads;
use threads::shared;
use Thread::Queue;
my $fetch_q = Thread::Queue->new();
sub fetch {
while (defined( my $job = $fetch_q->dequeue() )) {
if ($job) {
print threads->tid, " Starting to sleep\n";
sleep(5);
print threads->tid, " Finished sleeping\n";
}
else {
print threads->tid, " Starting to sleep\n";
sleep(10);
print threads->tid, " Finished sleeping\n";
}
}
}
my #workers = map { threads->create( \&fetch ) } 1 .. 2;
$fetch_q->enqueue($_) for 0, 1, 1;
$fetch_q->end();
foreach my $thr (#workers) {$thr->join();}
Output:
1 Starting to sleep
2 Starting to sleep
2 Finished sleeping
2 Starting to sleep <-- The second job picked up work while the first was still sleeping.
1 Finished sleeping
2 Finished sleeping

How to pause and resume a multithread perl script?

I have written the perl script to pause and resume.When the user enters Ctrl+c it has to pause and on pressing c it should resume. But is not working properly as expected. Can anyone help me on this what mistake i am making:
use strict;
use threads;
use threads::shared;
use Thread::Suspend;
use Lens;
$SIG{'INT'} = 'Pause';
#$| = 1;
print chr(7);
my $nthreads = 64;
my #thrs;
for(1..$nthreads)
{
print "START $_ \n";
my ($thr) = threads->create(\&worker, $_);
push #thrs ,$thr;
}
$_->join for #thrs;
exit;
sub worker
{
my $id = shift;
my $tmp;
my $lens = Lens->new("172.16.1.65:2000");
die "cannot create object" unless defined $lens;
die "cannot connect to XRay at " unless defined $lens->open("172.16.1.65:2000");
for(1..100000)
{
print "Thread $id \n";
}
print "$id>LOAD EXIT\n";
}
sub Pause
{
sleep(1);
print "\nCaught ^C\n";
print "Press \"c\" to continue, \"e\" to exit: ";
$_->suspend() for #thrs;
while (1)
{
my $input = lc(getc());
chomp ($input);
if ($input eq 'c') {
#clock($hour,$min,$sec);
$_->resume() for #thrs;
return;
}
elsif ($input eq 'e') {
exit 1;
}
}
}
Well, you haven't been too specific as to how it's "not working properly". But I would suggest looking at using Thread::Semaphore for a 'suspend' mechanism.
I would also suggest not using signal and instead doing something like:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;
use Term::ReadKey;
my $nthreads = 64;
my $thread_semaphore = Thread::Semaphore->new($nthreads);
sub worker {
for ( 1 .. 10 ) {
$thread_semaphore->down();
print threads->self->tid(), "\n";
sleep 1;
$thread_semaphore->up();
}
}
for ( 1 .. $nthreads ) {
threads->create( \&worker );
}
my $keypress;
ReadMode 4;
while ( threads->list(threads::running) ) {
while ( not defined( $keypress = ReadKey(-1) )
and threads->list(threads::running) )
{
print "Waiting\nRunning:". threads->list(threads::running) . "\n";
sleep 1;
}
print "Got $keypress\n";
if ( $keypress eq "p" ) {
print "Pausing...";
$thread_semaphore -> down_force($nthreads);
print "All paused\n";
}
if ( $keypress eq "c" ) {
print "Resuming...";
$thread_semaphore -> up ( $nthreads );
}
}
ReadMode 0;
foreach my $thr ( threads->list ) {
$thr->join();
}
It'll 'suspend' by setting the semaphores to zero (or negative) and relies on the threads checking if they should be stopping here or not.
I think the root of your problem though, will probably be signal propagation - your signal handler is global across your threads. You might find configuring $SIG{'INT'} for your threads separately will yield better results. (E.g. set the signal handler to 'IGNORE' at the start of your code, and set specific ones in the thread/main once the threads have been spawned).

Resources