I have written the perl script to pause and resume.When the user enters Ctrl+c it has to pause and on pressing c it should resume. But is not working properly as expected. Can anyone help me on this what mistake i am making:
use strict;
use threads;
use threads::shared;
use Thread::Suspend;
use Lens;
$SIG{'INT'} = 'Pause';
#$| = 1;
print chr(7);
my $nthreads = 64;
my #thrs;
for(1..$nthreads)
{
print "START $_ \n";
my ($thr) = threads->create(\&worker, $_);
push #thrs ,$thr;
}
$_->join for #thrs;
exit;
sub worker
{
my $id = shift;
my $tmp;
my $lens = Lens->new("172.16.1.65:2000");
die "cannot create object" unless defined $lens;
die "cannot connect to XRay at " unless defined $lens->open("172.16.1.65:2000");
for(1..100000)
{
print "Thread $id \n";
}
print "$id>LOAD EXIT\n";
}
sub Pause
{
sleep(1);
print "\nCaught ^C\n";
print "Press \"c\" to continue, \"e\" to exit: ";
$_->suspend() for #thrs;
while (1)
{
my $input = lc(getc());
chomp ($input);
if ($input eq 'c') {
#clock($hour,$min,$sec);
$_->resume() for #thrs;
return;
}
elsif ($input eq 'e') {
exit 1;
}
}
}
Well, you haven't been too specific as to how it's "not working properly". But I would suggest looking at using Thread::Semaphore for a 'suspend' mechanism.
I would also suggest not using signal and instead doing something like:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;
use Term::ReadKey;
my $nthreads = 64;
my $thread_semaphore = Thread::Semaphore->new($nthreads);
sub worker {
for ( 1 .. 10 ) {
$thread_semaphore->down();
print threads->self->tid(), "\n";
sleep 1;
$thread_semaphore->up();
}
}
for ( 1 .. $nthreads ) {
threads->create( \&worker );
}
my $keypress;
ReadMode 4;
while ( threads->list(threads::running) ) {
while ( not defined( $keypress = ReadKey(-1) )
and threads->list(threads::running) )
{
print "Waiting\nRunning:". threads->list(threads::running) . "\n";
sleep 1;
}
print "Got $keypress\n";
if ( $keypress eq "p" ) {
print "Pausing...";
$thread_semaphore -> down_force($nthreads);
print "All paused\n";
}
if ( $keypress eq "c" ) {
print "Resuming...";
$thread_semaphore -> up ( $nthreads );
}
}
ReadMode 0;
foreach my $thr ( threads->list ) {
$thr->join();
}
It'll 'suspend' by setting the semaphores to zero (or negative) and relies on the threads checking if they should be stopping here or not.
I think the root of your problem though, will probably be signal propagation - your signal handler is global across your threads. You might find configuring $SIG{'INT'} for your threads separately will yield better results. (E.g. set the signal handler to 'IGNORE' at the start of your code, and set specific ones in the thread/main once the threads have been spawned).
Related
I'm trying to work as an example for the following code:
my $milon;
my $pid = fork();
die if not defined $pid;
if (not $pid)
{
$milon->{$pid} = $pid;
exit;
}
$milon->{3} = 4;
my $finished = wait();
print( "debug10: TEST = ", Dumper($milon));
output:
debug10: TEST = $VAR1 = {
'3' => 4
};
How do I make the dictionary keep both 3 => 4 and also the $pid => $pid?
It doesn't have to be forking, it could be multithreading or NonBlocking IO, whichever is better according to what you think.
This is an example of course, I just want to conclude from this example to my real code.
You need some memory that is shared between your threads/processes. The easiest is probably to use interpreter-based threads and threads::shared. For instance:
use threads;
use threads::shared;
my %milon :shared;
for (1 .. 2) {
threads->create(sub {
my $tid = threads->tid();
$milon{$tid} = $tid;
});
}
$milon{3} = 4;
$_->join for threads->list; # Wait for all threads to be done
print Dumper \%milon;
This outputs:
$VAR1 = {
'1' => 1,
'2' => 2,
'3' => 4
};
Following sample code demonstrates usage of fork() to compute square of a number from an array #numbers (total 100) in parallel execution.
REAPER function assigned to $SIG{CHLD} signal cleans up completed child processes to avoid zombie processes hanging around in process table.
Investigate if fork() approach will fit your problem/task.
use strict;
use warnings;
use POSIX qw(strftime :sys_wait_h);
use Time::HiRes qw(usleep);
my $limit = 10;
my $threads = $limit;
my #numbers = map { int(rand(100)) } 1..100;
sub REAPER {
local $!;
while( (my $pid = waitpid(-1, WNOHANG) ) > 0 && WIFEXITED($?) ) {
$threads++;
}
$SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;
for ( #numbers ) {
while( $threads == 0 or $threads > $limit ) { usleep(1) }
my $pid = fork();
die $! unless defined $pid;
if( $pid ) {
# parent
$threads--;
} else {
# child
my $n = compute_square($_);
printf "Process %6d: %3d => %d\n", $$, $_, $n;
exit 0;
}
}
sub compute_square {
my $num = shift;
return $num*$num;
}
I have a perl scipt that I need it to end every two hours on a Linux machine. I was going to do a separate script and add it on cron.d to accomplish this, but I wanted an easier way. It has to do a graceful exit because after doing a CTRL+C, it writes a log file and killing it won't write the file.
You can set up an alarm at the beginning of the script, and provide a handler for it:
alarm 60 * 60 * 2;
local $SIG{ALRM} = sub {
warn "Time over!\n";
# Do the logging here...
exit
};
The question is how you would restart the script again.
A wrapper keeps things simple.
#!/usr/bin/perl
# usage:
# restarter program [arg [...]]
use strict;
use warnings;
use IPC::Open3 qw( open3 );
use POSIX qw( WNOHANG );
use constant RESTART_AFTER => 2*60*60;
use constant KILL_INT_WAIT => 30;
use constant KILL_TERM_WAIT => 30;
use constant WAIT_POLL => 15;
sub start_it {
open(local *NULL, '<', '/dev/null')
or die($!);
return open3('<&NULL', '>&STDOUT', '>&STDERR', #_);
}
sub wait_for_it {
my ($pid, $max_wait) = #_;
my $end_time = time + $max_wait;
while (1) {
if (waitpid($pid, WNOHANG) > 0) {
return 1;
}
my $time = time;
if ($end_time >= $time) {
return 0;
}
sleep(1);
}
}
sub end_it {
my ($pid) = #_;
kill(INT => $pid)
or die($!);
return if wait_for_it($pid, KILL_INT_WAIT);
kill(TERM => $pid)
or die($!);
return if wait_for_it($pid, KILL_TERM_WAIT);
kill(KILL => $pid)
or die($!);
waitpid($pid, 0);
}
sub run_it {
my $end_time = time + RESTART_AFTER;
my $pid = start_it(#_);
while (1) {
if (waitpid($pid, WNOHANG) > 0) {
last;
}
my $time = time;
if ($end_time >= $time) {
end_it($pid);
last;
}
my $sleep_time = $end_time - $time;
$sleep_time = WAIT_POLL if $sleep_time > WAIT_POLL; # Workaround for race condition.
sleep($sleep_time);
}
my $status = $?;
if ($? & 0x7F) { warn("Child killed by signal ".($? & 0x7F)."\n"); }
elsif ($? >> 8) { warn("Child exited with error ".($? >> 8)."\n"); }
else { warn("Child exited with succcesfully.\n"); }
}
run_it(#ARGV) while 1;
You might want to forward signals sent to the handler to the child.
You can catch the signal sent by Ctrl-C by setting a subroutine in $SIG{INT}:
$ perl -e '$SIG{INT} = sub { print "Caught signal, cleaning up\n"; exit 0 }; while(1) {}'
Do your cleanup within the sub, and there you go.
I have made a Perl script where I create threads (limited in terms of threads running in the meantime) and each threads create its own children which should be also limited in number.
Where I host my script, I cannot launch more than X threads per Perl script in the meantime. In the below example, I have X = 3 x 7 = 21 threads maximum in the meantime.
3 for the 1st job ($nb_process_first)
7 for the 2nd job ($nb_process_second)
Questions:
Is there a better way to manage threads and their children? (queues for example - could you please bring me some code example because I have tried with no success)
My current script is not terminating with all the threads joined, although I use a loop on all running threads to join them (cf. at the end of the script).
#!/usr/bin/perl -s
use threads;
my #threads;
my $nb_process_first = 3;
my #running = ();
print "START" . "\n";
$current = 1;
while ( $current <= 10 ) {
#running = threads->list(threads::running);
if ( scalar #running < $nb_process_first ) {
print "Launch firstJob=" . scalar #running . "\n";
my $thread = threads->create( \&firstJob );
push( #threads, $thread );
} else {
redo;
}
$current++;
}
my #joinable = threads->list(threads::joinable);
while ( scalar #joinable != 0 ) {
foreach my $thr ( threads->list() ) {
$thr->join();
}
#joinable = threads->list(threads::joinable);
}
print "END" . "\n";
sub secondJob {
for ( $i = 0; $i <= 15; $i++ ) {
print "secondJob=" . $i . "\n";
sleep 1;
}
threads->exit();
}
sub firstJob {
my $nb_process_second = 7;
my #running = ();
$current = 1;
while ( $current <= 10 ) {
#running = threads->list(threads::running);
if ( scalar #running < $nb_process_second ) {
print "firstJob/Launch secondJob=" . scalar #running . "-" . $current . "\n";
my $secondthread = threads->create( \&secondJob );
push( #threads, $secondthread );
sleep 2;
}
$current++;
}
threads->exit();
}
Thread::Queue is a handy model for basic 'worker thread' model of threaded code.
It goes a bit like this:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
my $firstworkitem_q = Thread::Queue -> new();
my $secondworkitem_q = Thread::Queue -> new();
my $nthreads = 10;
sub first_worker {
while ( my $item = $firstworkitem_q -> dequeue() ) {
print "First worker picked up $item, and queues it to second worker\n";
$secondworkitem_q -> enqueue ( $item );
}
}
sub second_worker {
while ( my $item = $secondworkitem_q -> dequeue() ) {
print "Second worker got $item";
}
}
my #first_workers;
for ( 1..$nthreads ) {
my $thr = threads -> create ( \&first_worker );
push ( #first_workers, $thr );
}
for ( 1..$nthreads ) {
my $thr = threads -> create ( \&second_worker );
}
$firstworkitem_q -> enqueue ( #things_to_processs );
$firstworkitem_q -> end;
foreach my $firstworker ( #first_workers ) {
$firstworker -> join();
}
#here all the first workers have finished, so we know nothing will be queued to second work queue.
$secondworkitem_q -> end();
foreach my $thr ( threads -> list() ) {
$thr -> join();
}
You stuff things into the queue, and iterate on it for processing. When you end the queue, the while loop gets an undef and thus terminates - making your thread joinable.
You don't need to track #running the way you do, because threads -> list() will do that. And more importantly - you'd need to make #running a shared variable and lock it, because otherwise you've got a different copy of it in each thread.
Having firstJob spawn secondJob I'd steer away from, because it can create all manner of fruity bugs. I'd suggest spawning two classes of worker thread. Use $queue -> end() to trigger the first set of workers to close.
As for your second question, threads are only joinable if they have finished running (see this answer). Because some of the threads aren't done when the second while loop runs, it finishes without joining them.
Your loop should wait based on the number of active threads, not the number of joinable threads. Something like this:
while (threads->list() > 0)
{
foreach my $joinable (threads->list(threads::joinable))
{
$joinable->join();
}
}
As for the first question, there are certainly other ways to manage threads. However, it is not possible to say what you should do without knowing your task.
I am new in Perl. I want to write a Perl script using thread.I have few files say 20 files and want to process those files using 5 threads in 4 batches. I am printing the thread no. After completing one batch ,the thread no must start with 1 for the next batch. But instead of that its creating 20 threads.please help. my code is as follows:
#!/usr/bin/perl -w
use strict;
use warnings;
use threads;
use threads::shared;
my $INPUT_DIR="/home/Documents/myscript/IMPORTLDIF/";
opendir(DIR, $INPUT_DIR) ;
my #files = grep { /^InputFile/ } readdir DIR;
my $count = #files;
#print "Total Files: $count \n";
my #threads;
my $noofthread = 5;
my $nooffiles = $count;
my $noofbatch = $nooffiles / $noofthread;
#print "No of batch: $noofbatch \n";
my $fileIndex = 0;
my $batch = 1;
while ($fileIndex < $nooffiles) {
print "Batch: $batch \n";
for (my $i=0; $i < $noofthread && $fileIndex < $nooffiles ; $i++) {
my $t = threads->new(\&doOperation, $files[$fileIndex], $i)->join;
push(#threads, $t);
$fileIndex++;
print "FileIndex: $fileIndex \n";
}
$batch++;
}
sub doOperation () {
my $ithread = threads->tid() ;
print "Thread Index : [id=$ithread]\n" ;
foreach my $item (#_){
my $filename = $item;
print "Filename name: $filename \n";
}
Edited program using thread queue:
#!/usr/bin/perl -w
# This is compiled with threading support
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
my $INPUT_DIR="/home/Documents/myscript/IMPORTLDIF/";
opendir(DIR, $INPUT_DIR) or die "Cannot opendir: $!";
my #thrs = threads->create(\&doOperation ) for 1..5;#for 5 threads
#my #files = `ls -1 /home/Documents/myscript/IMPORTLDIF/`;
my #files = grep { /^Input/ } readdir DIR or die "File not present present. \n";
chomp(#files);
#add files to queue
foreach my $f (#files){
# Send work to the thread
$q->enqueue($f);
print "Pending items: " + $q->pending()."\n";
}
$q->enqueue('_DONE_') for #thrs;
$_->join() for #thrs;
sub doOperation () {
my $ithread = threads->tid() ;
while (my $filename = $q->dequeue()) {
# Do work on $item
return 1 if $filename eq '_DONE_';
print "[id=$ithread]\t$filename\n";
}
return 1;
}
You are spawning a thread and then waiting for it to complete before spawning the next, each thread handling one file. That is why you see as many threads as you have files.
my $t = threads->new(\&doOperation, $files[$fileIndex], $i)->join;
^^^^--- This will block
Instead try something like this:
....
# split the workload into N batches
#
while (my #batch = splice(#files, 0, $batch_size)) {
push #threads, threads->new(\&doOperation, #batch);
}
# now wait for all workers to finish
#
for my $thr (#threads) {
$thr->join;
}
As an aside, Thread::Queue and Thread-Pool might imply better designs for the work you want to do.
You could use Paralel:Queue and create 4 thread and pass them items that they could work on.
To fork or not to fork?
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
my #thrs;
push #thrs, threads->create(\&doOperation ) for 1..5;#for 5 threads
my #files = `ls -1 /tmp/`;chomp(#files);
#add files to queue
foreach my $f (#files){
# Send work to the thread
$q->enqueue($f);
print "Pending items: "$q->pending()."\n";
}
$q->enqueue('_DONE_') for #thrs;
$_->join() for threads->list();
sub doOperation () {
my $ithread = threads->tid() ;
while (my $filename = $q->dequeue()) {
# Do work on $item
return 1 if $filename eq '_DONE_';
print "[id=$ithread]\t$filename\n";
}
return 1;
}
I have little scraping application and trying to add multithreading to it. Here is code (MyMech is WWW::Mechanize subclass used to process HTTP errors):
#!/usr/bin/perl
use strict;
use MyMech;
use File::Basename;
use File::Path;
use HTML::Entities;
use threads;
use threads::shared;
use Thread::Queue;
use List::Util qw( max sum );
my $page = 1;
my %CONFIG = read_config();
my $mech = MyMech->new( autocheck => 1 );
$mech->quiet(0);
$mech->get( $CONFIG{BASE_URL} . "/site-map.php" );
my #championship_links =
$mech->find_all_links( url_regex => qr/\d{4}-\d{4}\/$/ );
foreach my $championship_link (#championship_links) {
my #threads;
my $queue = Thread::Queue->new;
my $queue_processed = Thread::Queue->new;
my $url = sprintf $championship_link->url_abs();
print $url, "\n";
next unless $url =~ m{soccer}i;
$mech->get($url);
my ( $last_round_loaded, $current_round ) =
find_current_round( $mech->content() );
unless ($last_round_loaded) {
print "\tLoading rounds data...\n";
$mech->submit_form(
form_id => "leagueForm",
fields => {
round => $current_round,
},
);
}
my #match_links =
$mech->find_all_links( url_regex => qr/matchdetails\.php\?matchid=\d+$/ );
foreach my $link (#match_links) {
$queue->enqueue($link);
}
print "Starting printing thread...\n";
my $printing_thread = threads->create(
sub { printing_thread( scalar(#match_links), $queue_processed ) } )
->detach;
push #threads, $printing_thread;
print "Starting threads...\n";
foreach my $thread_id ( 1 .. $CONFIG{NUMBER_OF_THREADS} ) {
my $thread = threads->create(
sub { scrape_match( $thread_id, $queue, $queue_processed ) } )
->join;
push #threads, $thread;
}
undef $queue;
undef $queue_processed;
foreach my $thread ( threads->list() ) {
if ( $thread->is_running() ) {
print $thread->tid(), "\n";
}
}
#sleep 5;
}
print "Finished!\n";
sub printing_thread {
my ( $number_of_matches, $queue_processed ) = #_;
my #fields =
qw (
championship
year
receiving_team
visiting_team
score
average_home
average_draw
average_away
max_home
max_draw
max_away
date
url
);
while ($number_of_matches) {
if ( my $match = $queue_processed->dequeue_nb ) {
open my $fh, ">>:encoding(UTF-8)", $CONFIG{RESULT_FILE} or die $!;
print $fh join( "\t", #{$match}{#fields} ), "\n";
close $fh;
$number_of_matches--;
}
}
threads->exit();
}
sub scrape_match {
my ( $thread_id, $queue, $queue_processed ) = #_;
while ( my $match_link = $queue->dequeue_nb ) {
my $url = sprintf $match_link->url_abs();
print "\t$url", "\n";
my $mech = MyMech->new( autocheck => 1 );
$mech->quiet(0);
$mech->get($url);
my $match = parse_match( $mech->content() );
$match->{url} = $url;
$queue_processed->enqueue($match);
}
return 1;
}
And i have some strange things with this code. Sometimes it run but sometimes it exit with no errors (at the ->detach point). I know that #match_links contain data but threads are not created and it just close. Usually it terminates after processing second $championship_link entry.
May be i'm doing something wrong?
Update
Here is code for find_current_round subroutine (but i'm sure it's not related to the question):
sub find_current_round {
my ($html) = #_;
my ($select_html) = $html =~ m{
<select\s+name="round"[^>]+>\s*
(.+?)
</select>
}isx;
my ( $option_html, $current_round ) = $select_html =~ m{
(<option\s+value="\d+"(?:\s+ selected="selected")?>(\d+)</option>)\Z
}isx;
my ($last_round_loaded) = $option_html =~ m{selected};
return ( $last_round_loaded, $current_round );
}
First off - don't use dequeue_nb(). This is a bad idea, because if a queue is temporarily empty, it'll return undef and your thread will exit.
Use instead dequeue and and end. dequeue will block, but once you end your queue, the while will exit.
You're also doing some decidedly odd things with your threads - I would suggest that you rarely want to detach a thread. You're just assuming your thread is going to complete before your program, which isn't a good plan.
Likewise this;
my $thread = threads->create(
sub { scrape_match( $thread_id, $queue, $queue_processed ) } )
->join;
You're spawning a thread, and then instantly joining it. And so that join call will... block waiting for your thread to exit. You don't need threads at all to do that...
You also scope your queues within your foreach loop. I don't think that's a good plan. I would suggest instead - scope them externally, and spawn a defined number of 'worker' threads (and one 'printing' thread).
And then just feed them through the queue mechanism. Otherwise you'll end up creating multiple queue instances, because they're lexically scoped.
And once you've finished queuing stuff, issue a $queue -> end which'll terminate the while loop.
You also don't need to give a thread a $thread_id because ... they already have one. Try: threads -> self -> tid(); instead.