Is there a multiprocessing module for Perl?

Is there a multiprocessing module for Perl? Something that has similar functionality to what's offered by Python's multiprocessing module.
I understand I could build similar functionality using Perl, but I'm looking for something already implemented.

forks provides the same awesome interface as threads, but uses processes instead of threads.
use forks; # Or: use threads;
use Thread::Queue;

use constant NUM_WORKERS => 4;  # pick a pool size to suit

my $q = Thread::Queue->new();

my @workers;
for (1..NUM_WORKERS) {
    push @workers, async {
        while (defined(my $job = $q->dequeue())) {
            ...
        }
    };
}

$q->enqueue(...);

$q->enqueue(undef) for @workers;
$_->join for @workers;
Comparing forks with Forks::Super.
Keep in mind, these are supposed to be the cases where Forks::Super excels!
use Forks::Super;

sub do_something { my @args = @_; ... }

$process = fork { sub => \&do_something, args => [@args] };
$process->wait;
can be written as
use forks;

sub do_something { my @args = @_; ... }

$process = async { do_something(@args) };
$process->join;
---
use Forks::Super;

my $x = 42;
my @y = ();
my %z = ();

sub do_something_else {
    $x = 19;
    @y = qw(foo bar);
    %z = (foo => 'bar');
}

$process = fork { sub => 'do_something_else', share => [ \$x, \@y, \%z ] };
$process->wait;
can be written as
use forks;
use forks::shared;

my $x :shared = 42;
my @y :shared = ();
my %z :shared = ();

sub do_something_else {
    $x = 19;
    @y = qw(foo bar);
    %z = (foo => 'bar');
}

$process = async { do_something_else() };
$process->join;
---
use Forks::Super;
use IO::Handle;

pipe my $child_read,  my $parent_write;
pipe my $parent_read, my $child_write;
$parent_write->autoflush(1);
$child_write->autoflush(1);

sub square {
    while (my $x = <$child_read>) {
        chomp($x);
        print {$child_write} $x ** 2, "\n";
    }
    close $child_write;
}

$process = fork { sub => 'square' };
print {$parent_write} "9\n";
chomp( my $result = <$parent_read> );  # 81
close $parent_write;
$process->wait;
can be written as
use forks;
use Thread::Queue;

my $req  = Thread::Queue->new();
my $resp = Thread::Queue->new();

sub square { $_[0] ** 2 }

$process = async {
    while (defined(my $x = $req->dequeue())) {
        $resp->enqueue( square($x) );
    }
};

$req->enqueue(9);
my $result = $resp->dequeue();  # 81

$req->enqueue(undef);  # tell the worker to exit its loop
$process->join;
---
use Forks::Super;

sub square_root {
    sleep 1 && seek STDIN,0,1 while eof(STDIN);  # ok, this is a workaround for an existing bug :-(
    while (my $x = <STDIN>) {
        chomp($x);
        print sqrt($x), "\n";
    }
}

$process = fork { sub => 'square_root', child_fh => 'in,out,block' };
$process->write_stdin("81\n");
chomp( $result = $process->read_stdout() );  # 9
$process->close_fh('stdin');
$process->wait;
can be written as
use forks;
use Thread::Queue;

my $req  = Thread::Queue->new();
my $resp = Thread::Queue->new();

$process = async {
    while (defined(my $x = $req->dequeue())) {
        $resp->enqueue( sqrt($x) );
    }
};

$req->enqueue(81);
my $result = $resp->dequeue();  # 9

$req->enqueue(undef);  # tell the worker to exit its loop
$process->join;

I think Forks::Super comes pretty close. It has a few features for running an arbitrary subroutine (or external command) in a background process, monitoring and signalling the background process, and making interprocess communication a little less painful.
use Forks::Super;

sub do_something { my @args = @_; ... }

$process = fork { sub => \&do_something, args => [@args] };
$process->wait;
my $x = 42;
my @y = ();
my %z = ();

sub do_something_else {
    $x = 19;
    @y = qw(foo bar);
    %z = (foo => 'bar');
}

$process = fork { sub => 'do_something_else', share => [ \$x, \@y, \%z ] };
$process->wait;
# $x, @y, and %z are now updated with changes made in the background process
# create your own pipes to use for IPC
use IO::Handle;

pipe my $child_read,  my $parent_write;
pipe my $parent_read, my $child_write;
$parent_write->autoflush(1);
$child_write->autoflush(1);

sub square {
    while (my $x = <$child_read>) {
        print {$child_write} $x ** 2, "\n";
    }
    close $child_write;
}

$process = fork { sub => 'square' };
print {$parent_write} "9\n";
my $result = <$parent_read>;  # should be "81\n"
close $parent_write;
# or use the standard I/O handles for IPC
sub square_root {
    sleep 1 && seek STDIN,0,1 while eof(STDIN);  # ok, this is a workaround for an existing bug :-(
    while (my $x = <STDIN>) {
        print sqrt($x), "\n";
    }
}

$process = fork { sub => 'square_root', child_fh => 'in,out,block' };
$process->write_stdin("81\n");
$result = $process->read_stdout();  # => "9\n"
Both the multiprocessing module and Forks::Super have a lot of features. Which ones are you specifically interested in?
I am the author of Forks::Super and my goal is to include any features for parallel processing that people find useful, so if there's a feature in multiprocessing that you want in Perl, let me know.

What about POE: Perl Object Environment? It has support for asynchronous child processes.
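For instance, here is a minimal sketch of running a child process under POE with POE::Wheel::Run, adapted from the common POE pattern; the event names and the child's body are arbitrary placeholders:
use strict;
use warnings;
use POE qw(Wheel::Run);

POE::Session->create(
    inline_states => {
        _start => sub {
            # Run a Perl sub in a separate child process.
            $_[HEAP]{child} = POE::Wheel::Run->new(
                Program     => sub { print 9 ** 2, "\n" },
                StdoutEvent => 'got_stdout',
            );
            # Register to reap the child so it doesn't become a zombie.
            $_[KERNEL]->sig_child($_[HEAP]{child}->PID, 'got_sigchld');
        },
        got_stdout  => sub { print "child said: $_[ARG0]\n" },
        got_sigchld => sub { delete $_[HEAP]{child} },
    },
);

POE::Kernel->run();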

You can use MCE (Many-Core Engine): https://github.com/marioroy/mce-perl
It is similar to Python's multiprocessing module.
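For example, a minimal sketch using MCE::Map, one of MCE's parallel-map interfaces (assuming MCE is installed from CPAN):
use strict;
use warnings;
use MCE::Map;

# Squares are computed across multiple worker processes,
# much like multiprocessing.Pool.map() in Python.
my @squares = mce_map { $_ * $_ } 1 .. 100;

print "$squares[0] .. $squares[-1]\n";  # 1 .. 10000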

Related

Perl - multithreading / fork / synchronize problem:

I'm trying to use the following code as an example:
use Data::Dumper;

my $milon;
my $pid = fork();
die if not defined $pid;
if (not $pid) {
    $milon->{$pid} = $pid;
    exit;
}
$milon->{3} = 4;
my $finished = wait();
print( "debug10: TEST = ", Dumper($milon) );
output:
debug10: TEST = $VAR1 = {
          '3' => 4
        };
How do I make the dictionary keep both 3 => 4 and also the $pid => $pid?
It doesn't have to be forking, it could be multithreading or NonBlocking IO, whichever is better according to what you think.
This is an example of course, I just want to conclude from this example to my real code.
You need some memory that is shared between your threads/processes. The easiest is probably to use interpreter-based threads and threads::shared. For instance:
use threads;
use threads::shared;
use Data::Dumper;

my %milon :shared;

for (1 .. 2) {
    threads->create(sub {
        my $tid = threads->tid();
        $milon{$tid} = $tid;
    });
}

$milon{3} = 4;

$_->join for threads->list;  # Wait for all threads to be done

print Dumper \%milon;
This outputs:
$VAR1 = {
          '1' => 1,
          '2' => 2,
          '3' => 4
        };
The following sample code demonstrates the use of fork() to compute the square of each number in an array @numbers (100 in total) in parallel.
The REAPER function assigned to the $SIG{CHLD} handler reaps completed child processes, so that zombie processes do not hang around in the process table.
Investigate whether the fork() approach fits your problem/task.
use strict;
use warnings;
use POSIX qw(strftime :sys_wait_h);
use Time::HiRes qw(usleep);

my $limit   = 10;
my $threads = $limit;   # number of free worker slots
my @numbers = map { int(rand(100)) } 1..100;

sub REAPER {
    local $!;
    while ( (my $pid = waitpid(-1, WNOHANG)) > 0 && WIFEXITED($?) ) {
        $threads++;
    }
    $SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;

for ( @numbers ) {
    while ( $threads == 0 or $threads > $limit ) { usleep(1) }
    my $pid = fork();
    die $! unless defined $pid;
    if ( $pid ) {
        # parent
        $threads--;
    } else {
        # child
        my $n = compute_square($_);
        printf "Process %6d: %3d => %d\n", $$, $_, $n;
        exit 0;
    }
}

# added: wait until all children have been reaped before exiting
usleep(1) while $threads < $limit;

sub compute_square {
    my $num = shift;
    return $num * $num;
}
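For comparison, the same bounded-parallelism idea is shorter with Parallel::ForkManager; a minimal sketch, assuming that module is available from CPAN:
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(10);   # at most 10 children at once
my @numbers = map { int(rand(100)) } 1..100;

for my $num (@numbers) {
    $pm->start and next;   # parent: forks and moves on to the next number
    printf "Process %6d: %3d => %d\n", $$, $num, $num * $num;
    $pm->finish;           # child exits here
}
$pm->wait_all_children;    # reap everything before the parent exits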

Perl: hand a module's subroutine to threads

I am trying to pass a subroutine from a self-written module to threads using the following code.
This is my first time using threads, so I'm not very familiar with it.
Main script (shortened)
#!/usr/bin/perl -w
use strict;
use threads;
use lib 'PATH TO LIB';
use goldstandard;

# ($tagging, $mate_anna, $mate_model, etc. are declared elsewhere in the full script)
my $delete_raw_files = 0;
my $outfolder = '/PATH/';
my %folder = goldstandard->create_folder($outfolder, $delete_raw_files);

&tagging if $tagging == 1;

sub tagging {
    my %hash = goldstandard->tagging_hash(\%folder);
    my @threads;
    foreach (keys %hash) {
        if ($_ =~ m/mate/) {
            my $arguments = "goldstandard -> mate_tagging($hash{$_}{raw},$hash{$_}{temp},$hash{$_}{tagged},$mate_anna,$mate_model)";
            push(@threads, $arguments);
        }
        if ($_ =~ m/morpheus/) {
            my $arguments = "goldstandard -> morpheus_tagging($hash{$_}{source},$hash{$_}{tagged},$morpheus_stemlib,$morpheus_cruncher)";
            push(@threads, $arguments);
        }
    }
    foreach (@threads) {
        my $thread = threads->create($_);
        $thread->join();
    }
}
Module
package goldstandard;
use strict;
use warnings;

sub mate_tagging {
    my $Referenz = shift;
    my $input = shift;
    my $output_temp_dir = shift;
    my $output_mate_human = shift;
    my $anna = shift;
    my $model = shift;
    opendir(DIR, "$input");
    my @dir = readdir(DIR);
    my $anzahl = @dir;
    foreach (@dir) {
        unless ($_ =~ m/^\./) {
            my $name = $_;
            my $path = $input . $_;
            my $out_temp = $output_temp_dir . $name;
            my $out_mate_human_final = $output_mate_human . $name;
            qx(java -Xmx10G -classpath $anna is2.tag.Tagger -model $model -test $path -out $out_temp);
            open(OUT, "> $out_mate_human_final");
            open(TEMP, "< $out_temp");
            my $output_text;
            while (<TEMP>) {
                unless ($_ =~ m/^\s+$/) {
                    if ($_ =~ m/^\d+\t(.*?)\t_\t_\t_\t(.*?)\t_\t/) {
                        my $tags = $2;
                        my $words = $1;
                        print OUT "$words\t$tags\n";
                    }
                }
            }
        }
    }
}

sub morpheus_tagging {
    my $Referenz = shift;
    my $input = shift;
    my $output = shift;
    my $stemlib = shift;
    my $cruncher = shift;
    opendir(DIR, "$input");
    my @dir = readdir(DIR);
    foreach (@dir) {
        unless ($_ =~ m/^\./) {
            my $name = $_;
            my $path = $input . $_;
            my $out = $output . $name;
            qx(env MORPHLIB='$stemlib' '$cruncher' < '$path' > '$out');
        }
    }
}

1;
Executing this code gets me
Thread 1 terminated abnormally: Undefined subroutine &main::goldstandard -> morpheus_tagging(...) called at ... line 43.
I guess either the way I am calling the threads or the way I am providing the arguments is wrong. I hope someone can help me with that. I also found something on safe and unsafe modules, but I'm not sure if this is really the problem. Thanks in advance.
You must pass the name of a sub or a reference to a sub, plus arguments, to threads->create. So you need something like
my $method_ref = $invoker->can($method_name);
threads->create($method_ref, $invoker, @args);
That said, passing arguments to threads->create has issues that can be avoided by using a closure.
threads->create(sub { $invoker->$method_name(@args) })
The above can be written more simply as follows:
async { $invoker->$method_name(@args) }
This gets us the following:
sub tagging {
    my %hash = goldstandard->tagging_hash(\%folder);

    my @jobs;
    for (keys %hash) {
        if (/mate/) {
            push @jobs, [ 'goldstandard', 'mate_tagging',
                $hash{$_}{raw},
                $hash{$_}{temp},
                $hash{$_}{tagged},
                $mate_anna,
                $mate_model,
            ];
        }
        if (/morpheus/) {
            push @jobs, [ 'goldstandard', 'morpheus_tagging',
                $hash{$_}{source},
                $hash{$_}{tagged},
                $morpheus_stemlib,
                $morpheus_cruncher,
            ];
        }
    }

    my @threads;
    for my $job (@jobs) {
        my ($invoker, $method_name, @args) = @$job;
        push @threads, async { $invoker->$method_name(@args) };
    }

    $_->join for @threads;
}
or just
sub tagging {
    my %hash = goldstandard->tagging_hash(\%folder);

    my @threads;
    for (keys %hash) {
        if (/mate/) {
            push @threads, async {
                goldstandard->mate_tagging(
                    $hash{$_}{raw},
                    $hash{$_}{temp},
                    $hash{$_}{tagged},
                    $mate_anna,
                    $mate_model,
                );
            };
        }
        if (/morpheus/) {
            push @threads, async {
                goldstandard->morpheus_tagging(
                    $hash{$_}{source},
                    $hash{$_}{tagged},
                    $morpheus_stemlib,
                    $morpheus_cruncher,
                );
            };
        }
    }

    $_->join for @threads;
}
Note that I delayed the calls to join until after all the threads were created. Your way made it so only one thread would run at a time.
But what we have isn't great. We have no way of limiting how many threads are active at a time, and we (expensively) create many threads instead of reusing them. We can use a worker pool to solve both of these problems.
use constant NUM_WORKERS => 5;

use Thread::Queue 3.01 qw( );

my $q;

sub tagging {
    my %hash = goldstandard->tagging_hash(\%folder);
    for (keys %hash) {
        if (/mate/) {
            $q->enqueue(sub {
                goldstandard->mate_tagging(
                    $hash{$_}{raw},
                    $hash{$_}{temp},
                    $hash{$_}{tagged},
                    $mate_anna,
                    $mate_model,
                );
            });
        }
        if (/morpheus/) {
            $q->enqueue(sub {
                goldstandard->morpheus_tagging(
                    $hash{$_}{source},
                    $hash{$_}{tagged},
                    $morpheus_stemlib,
                    $morpheus_cruncher,
                );
            });
        }
    }
}

{
    $q = Thread::Queue->new();

    for (1..NUM_WORKERS) {
        async {
            while ( my $job = $q->dequeue() ) {
                $job->();
            }
        };
    }

    ... call tagging and whatever ...

    $q->end();
    $_->join() for threads->list();
}

Exit perl script automatically every 2 hours

I have a Perl script that I need to end every two hours on a Linux machine. I was going to write a separate script and add it to cron.d to accomplish this, but I wanted an easier way. It has to make a graceful exit: after a CTRL+C it writes a log file, and killing it outright won't write the file.
You can set up an alarm at the beginning of the script, and provide a handler for it:
local $SIG{ALRM} = sub {
    warn "Time over!\n";
    # Do the logging here...
    exit;
};
alarm 60 * 60 * 2;  # deliver SIGALRM in two hours
The question is how you would restart the script again.
A wrapper keeps things simple.
#!/usr/bin/perl
# usage:
#    restarter program [arg [...]]

use strict;
use warnings;

use IPC::Open3 qw( open3 );
use POSIX      qw( WNOHANG );

use constant RESTART_AFTER  => 2*60*60;
use constant KILL_INT_WAIT  => 30;
use constant KILL_TERM_WAIT => 30;
use constant WAIT_POLL      => 15;

sub start_it {
    open(local *NULL, '<', '/dev/null')
        or die($!);
    return open3('<&NULL', '>&STDOUT', '>&STDERR', @_);
}

sub wait_for_it {
    my ($pid, $max_wait) = @_;
    my $end_time = time + $max_wait;
    while (1) {
        if (waitpid($pid, WNOHANG) > 0) {
            return 1;
        }
        my $time = time;
        if ($time >= $end_time) {
            return 0;
        }
        sleep(1);
    }
}

sub end_it {
    my ($pid) = @_;

    kill(INT => $pid)
        or die($!);
    return if wait_for_it($pid, KILL_INT_WAIT);

    kill(TERM => $pid)
        or die($!);
    return if wait_for_it($pid, KILL_TERM_WAIT);

    kill(KILL => $pid)
        or die($!);
    waitpid($pid, 0);
}

sub run_it {
    my $end_time = time + RESTART_AFTER;
    my $pid = start_it(@_);
    while (1) {
        if (waitpid($pid, WNOHANG) > 0) {
            last;
        }
        my $time = time;
        if ($time >= $end_time) {
            end_it($pid);
            last;
        }
        my $sleep_time = $end_time - $time;
        $sleep_time = WAIT_POLL if $sleep_time > WAIT_POLL;  # Workaround for race condition.
        sleep($sleep_time);
    }

    my $status = $?;
    if    ($status & 0x7F) { warn("Child killed by signal ".($status & 0x7F)."\n"); }
    elsif ($status >> 8)   { warn("Child exited with error ".($status >> 8)."\n"); }
    else                   { warn("Child exited successfully.\n"); }
}

run_it(@ARGV) while 1;
You might want to forward signals sent to the wrapper on to the child.
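A rough sketch of what that forwarding could look like; this is an assumption on my part, not part of the wrapper above (it exposes the child's PID through a package variable so the handlers can see it):
# Hypothetical addition to the wrapper: forward INT/TERM to the child.
our $child_pid;

$SIG{INT}  = sub { kill INT  => $child_pid if $child_pid };
$SIG{TERM} = sub { kill TERM => $child_pid if $child_pid };

# ... and inside run_it(), right after start_it() returns:
# local $child_pid = $pid;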
You can catch the signal sent by Ctrl-C by setting a subroutine in $SIG{INT}:
$ perl -e '$SIG{INT} = sub { print "Caught signal, cleaning up\n"; exit 0 }; while(1) {}'
Do your cleanup within the sub, and there you go.
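Putting the alarm and the Ctrl-C handler together, a minimal sketch; write_log() here is a hypothetical stand-in for the script's real log-writing code:
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical logger; replace with the script's real log-writing code.
sub write_log { print "writing log before exit\n" }

$SIG{INT}  = sub { write_log(); exit 0 };   # graceful exit on Ctrl-C
$SIG{ALRM} = sub { write_log(); exit 0 };   # graceful exit after two hours
alarm 2 * 60 * 60;

while (1) {
    # ... the script's main work ...
    sleep 1;
}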

Perl: How to push a hash into an array that is outside of a subroutine

I originally experimented with trying to send a hash object through Thread::Queue, but according to this link, my versions of Thread::Queue and threads::shared are too old. Unfortunately, since the system I'm testing on isn't mine, I can't upgrade.
I then tried to use a common array to store my hashes. Here is the code so far:
#!/usr/bin/perl
use strict;
use warnings;

use threads;
use Thread::Queue;

use constant NUM_WORKERS => 10;

my @out_array;

test1();

sub test1
{
    my $in_queue = Thread::Queue->new();

    foreach (1..NUM_WORKERS) {
        async {
            while (my $job = $in_queue->dequeue()) {
                test2($job);
            }
        };
    }

    my @sentiments = ("Axe Murderer", "Mauler", "Babyface", "Dragon");
    $in_queue->enqueue(@sentiments);
    $in_queue->enqueue(undef) for 1..NUM_WORKERS;

    $_->join() for threads->list();

    foreach my $element (@out_array) {
        print "element: $element\n";
    }
}

sub test2
{
    my $string = $_[0];
    my %hash = (Skeleton => $string);
    push @out_array, \%hash;
}
However, at the end of the procedure, @out_array is always empty. If I remove the threading parts of the script, then @out_array is correctly populated. I suspect I'm implementing threading incorrectly here.
How would I correctly populate @out_array in this instance?
You need to make it shared:
use threads::shared;
my @out_array :shared;
I don't think you need to lock it if all you do is push onto it, but if you did, you'd use
lock @out_array;
You need to share any array or hash referenced by a value you push onto it, using the tools in threads::shared.
push @out_array, share(%hash);
Though as I mentioned earlier, I'd use a Thread::Queue.
sub test2 {
    my ($string) = @_;
    my %hash = ( Skeleton => $string );
    return \%hash;
}

...

my $response_q = Thread::Queue->new();
my $running :shared = NUM_WORKERS;

...

async {
    while (my $job = $request_q->dequeue()) {
        $response_q->enqueue(test2($job));
    }
    { lock $running; $response_q->enqueue(undef) if !--$running; }
};

...

$request_q->enqueue(@sentiments);
$request_q->enqueue(undef) for 1..NUM_WORKERS;

while (my $response = $response_q->dequeue()) {
    print "Skeleton: $response->{Skeleton}\n";
}

$_->join() for threads->list();
Note the lack of anything thread-specific in test2. This is good. You should always strive for separation of concerns.
You need to return your data from the thread:
...
async {
    my $data;
    while (my $job = $in_queue->dequeue()) {
        $data = test2($job);
    }
    return $data;
};
...
for ( threads->list() ) {
    my $data = $_->join();
    # now you have this thread's return value in $data
}

sub test2
{
    my $string = $_[0];
    my %hash = (Skeleton => $string);
    return \%hash;
}
I found my answer in the example here.
I had to change 3 things:
share the @out_array outside both subs
share the %hash in test2
add return; to the end of test2
Code outside both subs:
my @out_array :shared = ();
test2 sub:
sub test2
{
    my $string = $_[0];
    my %hash :shared;
    $hash{Skeleton} = $string;
    push @out_array, \%hash;
    return;
}

Perl threads::shared

I wrote some code and I need to make it multithreaded. Everything works, but every loop repeats 4 times:
use LWP::UserAgent;
use HTTP::Cookies;
use threads;
use threads::shared;

$| = 1;
$threads = 4;

# ($web is an LWP::UserAgent set up elsewhere in the full script)
my @groups :shared = loadf('groups.txt');

my @thread_list = ();
$thread_list[$_] = threads->create(\&thread) for 0 .. $threads - 1;
$_->join for @thread_list;
thread();

sub thread
{
    my $url = 'http://www.site.ru/';
    my $response = $web->post($url, Content =>
        ['st.redirect' => '']
    );
    foreach $i (@groups)
    {
        my $response = $web->get($i);
        if(!($response->header('Location')))
        {
            ---------;
        }
        else
        {
            ----------;
        }
    }
}

sub loadf {
    open (F, "<".$_[0]) or erroropen($_[0]);
    chomp(my @data = <F>);
    close F;
    return @data;
}
groups.txt :
http://www.odnoklassniki.ru/group/47357692739634
http://www.odnoklassniki.ru/group/56099517562922
I understand that I need to use threads::shared, but I can't understand how to use it.
The problem is that you never remove from @groups, so all threads do all jobs in @groups.
Here's one solution.
use threads;
use Thread::Queue 3.01 qw( );

my $NUM_WORKERS = 4;

sub worker {
    my ($url) = @_;
    ... download the page ...
}

my $q = Thread::Queue->new();

for (1..$NUM_WORKERS) {
    async {
        while (my $url = $q->dequeue()) {
            worker($url);
        }
    };
}

$q->enqueue($_) for loadf('groups.txt');
$q->end();
$_->join() for threads->list;
Why do you need to make it threaded? Perl does much better using forks in most cases.
That said, your code starts 4 threads, each of which processes everything in @groups. It sounds like that's not what you want to do. If you want @groups to be a queue of work to do, take a look at Thread::Queue (or Parallel::ForkManager).
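For completeness, a rough Parallel::ForkManager sketch of the same work-queue idea; worker() and loadf() are the placeholders from the code above:
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);   # at most 4 concurrent children

for my $url (loadf('groups.txt')) {
    $pm->start and next;   # parent continues the loop
    worker($url);          # child handles exactly one group
    $pm->finish;
}
$pm->wait_all_children;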
