How can I pass shared data to threads in Perl? - multithreading

use threads;
use threads::shared;
sub test {
my $s :shared = 22;
my $thread = threads->new(\&thrsub);
$thread->join();
print $s;
}
sub thrsub {
$s = 33;
}
test;
Why isn't the data being shared in the thread?

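The variable is shared, but thrsub is accessing a different variable from the one you shared. The my $s inside test is lexically scoped to test, so the $s in thrsub is the unrelated package variable $main::s. (use strict; would have told you there were different variables in this case. Always start with use strict; use warnings;.) The fix is to use a single variable that both subs can see.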
my $s :shared = 22;
sub test {
my $thread = threads->new(\&thrsub);
$thread->join();
print $s;
}
sub thrsub {
$s = 33;
}
test;

You misunderstood what threads::shared does. It does not give access to variables across lexical scopes. If you want thrsub to affect $s, you'll have to pass a reference to it when you create the thread.
use strict; use warnings;
use threads;
use threads::shared;
sub test {
my $s = 22;
my $s_ref = share $s;
my $thread = threads->new(\&thrsub, $s_ref);
$thread->join();
print $s;
}
sub thrsub {
my $s_ref = shift;
$$s_ref = 33;
return;
}
test;
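
As a small illustration of both answers, here is a sketch of several threads updating one shared scalar. The counter, the bump sub and the loop counts are made up for the example; the lock call is there because ++ on a shared variable is not atomic.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# One shared scalar, declared at file scope so every sub sees the same variable.
my $counter :shared = 0;

# Hypothetical worker: increments the shared counter $times times.
sub bump {
    my ($times) = @_;
    for (1 .. $times) {
        lock($counter);    # held until the end of this loop body
        $counter++;
    }
    return;
}

# Start all threads first, then wait for all of them.
my @workers = map { threads->create(\&bump, 1000) } 1 .. 4;
$_->join() for @workers;

print "$counter\n";    # 4000
```

Without the lock, the final count could come out lower, because two threads can read the same old value before either writes back.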

Related

Perl hand Module to threads

I am trying to pass a subroutine from a self-written module to threads using the following code.
This is my first time using threads, so I'm not very familiar with them.
Main script (shortened)
#!/usr/bin/perl -w
use strict;
use threads;
use lib 'PATH TO LIB';
use goldstandard;
my $delete_raw_files = 0;
my $outfolder = /PATH/;
my %folder = goldstandard -> create_folder($outfolder,$delete_raw_files);
&tagging if $tagging == 1;
sub tagging{
my %hash = goldstandard -> tagging_hash(\%folder);
my @threads;
foreach(keys %hash){
if($_ =~ m/mate/){
my $arguments = "goldstandard -> mate_tagging($hash{$_}{raw},$hash{$_}{temp},$hash{$_}{tagged},$mate_anna,$mate_model)";
push(@threads,$arguments);
}
if($_ =~ m/morpheus/){
my $arguments = "goldstandard -> morpheus_tagging($hash{$_}{source},$hash{$_}{tagged},$morpheus_stemlib,$morpheus_cruncher)";
push(@threads,$arguments)
}
}
foreach(@threads){
my $thread = threads->create($_);
$thread ->join();
}
}
Module
package goldstandard;
use strict;
use warnings;
sub mate_tagging{
my $Referenz = shift;
my $input = shift;
my $output_temp_dir = shift;
my $output_mate_human = shift;
my $anna = shift;
my $model = shift;
opendir(DIR,"$input");
my @dir = readdir(DIR);
my $anzahl = @dir;
foreach(@dir){
unless($_ =~ m/^\./){
my $name = $_;
my $path = $input . $_;
my $out_temp = $output_temp_dir . $name;
my $out_mate_human_final = $output_mate_human . $name;
qx(java -Xmx10G -classpath $anna is2.tag.Tagger -model $model -test $path -out $out_temp);
open(OUT, "> $out_mate_human_final");
open(TEMP, "< $out_temp");
my $output_text;
while(<TEMP>){
unless($_ =~ m/^\s+$/){
if ($_ =~ m/^\d+\t(.*?)\t_\t_\t_\t(.*?)\t_\t/) {
my $tags = $2;
my $words = $1;
print OUT "$words\t$tags\n";
}
}
}
}
}
}
sub morpheus_tagging{
my $Referenz = shift;
my $input = shift;
my $output = shift;
my $stemlib = shift;
my $cruncher = shift;
opendir(DIR,"$input");
my @dir = readdir(DIR);
foreach(@dir){
unless($_ =~ m/^\./){
my $name = $_;
my $path = $input . $_;
my $out = $output . $name;
qx(env MORPHLIB='$stemlib' '$cruncher' < '$path' > '$out');
}
}
}
1;
Executing this code gets me
Thread 1 terminated abnormally: Undefined subroutine &main::goldstandard -> morpheus_tagging(...) called at ... line 43.
I guess either the way I am calling the threads or the way I am providing the arguments is wrong. I hope someone can help me with that. I also found something on safe and unsafe modules, but I'm not sure if this is really the problem. Thanks in advance.
You must pass the name of a sub or a reference to a sub, plus arguments, to threads->create. So you need something like
my $method_ref = $invoker->can($method_name);
threads->create($method_ref, $invoker, @args);
That said, passing arguments to threads->create has issues that can be avoided by using a closure.
threads->create(sub { $invoker->$method_name(@args) })
The above can be written more simply as follows:
async { $invoker->$method_name(@args) }
This gets us the following:
sub tagging {
    my %hash = goldstandard->tagging_hash(\%folder);
    my @jobs;
    for (keys %hash) {
        if (/mate/) {
            push @jobs, [ 'goldstandard', 'mate_tagging',
                $hash{$_}{raw},
                $hash{$_}{temp},
                $hash{$_}{tagged},
                $mate_anna,
                $mate_model,
            ];
        }
        if (/morpheus/) {
            push @jobs, [ 'goldstandard', 'morpheus_tagging',
                $hash{$_}{source},
                $hash{$_}{tagged},
                $morpheus_stemlib,
                $morpheus_cruncher,
            ];
        }
    }
    my @threads;
    for my $job (@jobs) {
        my ($invoker, $method_name, @args) = @$job;
        push @threads, async { $invoker->$method_name(@args) };
    }
    $_->join for @threads;
}
or just
sub tagging {
    my %hash = goldstandard->tagging_hash(\%folder);
    my @threads;
    for (keys %hash) {
        if (/mate/) {
            push @threads, async {
                goldstandard->mate_tagging(
                    $hash{$_}{raw},
                    $hash{$_}{temp},
                    $hash{$_}{tagged},
                    $mate_anna,
                    $mate_model,
                );
            };
        }
        if (/morpheus/) {
            push @threads, async {
                goldstandard->morpheus_tagging(
                    $hash{$_}{source},
                    $hash{$_}{tagged},
                    $morpheus_stemlib,
                    $morpheus_cruncher,
                );
            };
        }
    }
    $_->join for @threads;
}
Note that I delayed the calls to join until after all the threads are created. Your way made it so only one thread would run at a time.
But what we have isn't great. We have no way of limiting how many threads are active at a time, and we (expensively) create many threads instead of reusing them. We can use a worker pool to solve both of these problems.
use constant NUM_WORKERS => 5;
use Thread::Queue 3.01 qw( );
my $q;
sub tagging {
    my %hash = goldstandard->tagging_hash(\%folder);
    for (keys %hash) {
        # Queue plain data (method name plus arguments), not a code ref:
        # Thread::Queue cannot share code refs between threads.
        if (/mate/) {
            $q->enqueue([ 'mate_tagging',
                $hash{$_}{raw},
                $hash{$_}{temp},
                $hash{$_}{tagged},
                $mate_anna,
                $mate_model,
            ]);
        }
        if (/morpheus/) {
            $q->enqueue([ 'morpheus_tagging',
                $hash{$_}{source},
                $hash{$_}{tagged},
                $morpheus_stemlib,
                $morpheus_cruncher,
            ]);
        }
    }
}
{
    $q = Thread::Queue->new();
    for (1..NUM_WORKERS) {
        async {
            while ( my $job = $q->dequeue() ) {
                my ($method_name, @args) = @$job;
                goldstandard->$method_name(@args);
            }
        };
    }
    # ... call tagging and whatever ...
    $q->end();
    $_->join() for threads->list();
}
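
For a self-contained illustration of the first suggestion (resolve the method with can, pass the invoker plus arguments to threads->create, and join only after every thread has been started), here is a minimal sketch. Demo::Tagger and its tag method are hypothetical stand-ins for the goldstandard module, which is not reproduced here.

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the goldstandard module.
{
    package Demo::Tagger;
    sub tag {
        my ($class, $name) = @_;
        return "tagged:$name";
    }
}

my @jobs = (
    [ 'Demo::Tagger', 'tag', 'a.txt' ],
    [ 'Demo::Tagger', 'tag', 'b.txt' ],
);

my @threads;
for my $job (@jobs) {
    my ($invoker, $method_name, @args) = @$job;
    my $method_ref = $invoker->can($method_name)
        or die "No method $method_name in $invoker";
    # Assigning to a scalar creates the thread in scalar context,
    # so join() hands back the method's single return value.
    my $thr = threads->create($method_ref, $invoker, @args);
    push @threads, $thr;
}

# Join only after every thread has been started.
my @results = map { scalar $_->join() } @threads;
print "$_\n" for @results;    # tagged:a.txt, then tagged:b.txt
```

The context in which a thread is created decides what join returns, which is why the thread object is assigned to a scalar before being pushed onto the list.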

error code is 'Thread 1 terminated abnormally: Invalid value for shared scalar at'

This is my code.
It has a problem with sharing a hash.
use strict;
use warnings;
use threads;
use threads::shared;
my %db;
share(%db);
my @threads;
sub test{
my $db_ref = $_[0];
my @arr = ('a','b');
push @{$db_ref->{'key'}}, \@arr;
}
foreach(1..2){
my $t = threads->new(
sub {
test(\%db);
}
);
push(@threads,$t);
}
foreach (@threads) {
$_->join;
}
Error output:
Thread 1 terminated abnormally: Invalid value for shared scalar at test1.pl line 13.
Thread 2 terminated abnormally: Invalid value for shared scalar at test1.pl line 13.
I want to use threads::shared, but I don't know what the problem is.
Please help.
You can only place references to shared objects into shared vars. @arr isn't shared, and neither is the array onto which you push a reference to @arr.
Replace
my @arr = ('a','b');
push @{$db_ref->{'key'}}, \@arr;
with
my @arr :shared = ('a','b');
lock %$db_ref;
# We can't use autovivification as we need a shared array.
$db_ref->{'key'} = shared_clone([]);
push @{$db_ref->{'key'}}, \@arr;
I changed the code, but it still does not save all of the data in the hash (%db). The following code checks this.
use strict;
use warnings;
use threads;
use threads::shared;
my %db;
share(%db);
my @threads;
sub test{
my $db_ref = $_[0];
my @arr :shared = ('a','b');
lock %$db_ref;
$db_ref->{'key'} = shared_clone([]);
push @{$db_ref->{'key'}}, \@arr;
}
foreach(1..5){
my $t = threads->new(
sub {
test(\%db);
}
);
push(@threads,$t);
}
foreach (@threads) {
$_->join;
}
while(my ($key, $val) = each %db){
print "$key => $val\n";
foreach my $value (@$val) {
foreach (@$value) {
print $_, " ";
}
print "\n";
}
}
Only one entry (a, b) ends up in %db.
There should be more than one entry in %db.
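
A likely cause is that test assigns a fresh shared_clone([]) to $db_ref->{'key'} on every call, so each thread throws away the rows stored by the previous ones. The sketch below creates the array only when the key is still undefined, using the //= operator. The sub name add_row and the way arguments are passed are choices made for this example, not part of the original code.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %db :shared;

# Hypothetical helper: appends one row to the shared array stored under 'key'.
sub add_row {
    my ($db_ref, @values) = @_;
    lock(%$db_ref);
    # Create the shared array only if the key has none yet, instead of
    # replacing it (and discarding earlier rows) on every call.
    $db_ref->{key} //= shared_clone([]);
    push @{ $db_ref->{key} }, shared_clone([@values]);
    return;
}

my @threads = map { threads->create(\&add_row, \%db, 'a', 'b') } 1 .. 5;
$_->join() for @threads;

printf "%d rows\n", scalar @{ $db{key} };    # 5 rows
for my $row (@{ $db{key} }) {
    print join(' ', @$row), "\n";
}
```

The lock on the hash matters here: without it, two threads could both see an undefined key and each install its own array.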

Perl: How to push a hash into an array that is outside of a subroutine

I originally experimented with trying to send a hash object through Thread::Queue, but according to this link, my versions of Thread::Queue and threads::shared are too old. Unfortunately, since the system I'm testing on isn't mine, I can't upgrade.
I then tried to use a common array to store my hashes. Here is the code so far:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use constant NUM_WORKERS => 10;
my @out_array;
test1();
sub test1
{
my $in_queue = Thread::Queue->new();
foreach (1..NUM_WORKERS) {
async {
while (my $job = $in_queue->dequeue()) {
test2($job);
}
};
}
my @sentiments = ("Axe Murderer", "Mauler", "Babyface", "Dragon");
$in_queue->enqueue(@sentiments);
$in_queue->enqueue(undef) for 1..NUM_WORKERS;
$_->join() for threads->list();
foreach my $element (@out_array) {
print "element: $element\n";
}
}
sub test2
{
my $string = $_[0];
my %hash = (Skeleton => $string);
push @out_array, \%hash;
}
However, at the end of the procedure, @out_array is always empty. If I remove the threading parts of the script, then @out_array is correctly populated. I suspect I'm implementing threading incorrectly here.
How would I correctly populate @out_array in this instance?
You need to make it shared:
use threads::shared;
my @out_array :shared;
I don't think you need to lock it if all you do is push onto it, but if you did, you'd use
lock @out_array;
You also need to share any array or hash referenced by a value you push onto it, using the tools in threads::shared.
push @out_array, share(%hash);
Though as I mentioned earlier, I'd use a Thread::Queue.
sub test2 {
my ($string) = @_;
my %hash = ( Skeleton => $string );
return \%hash;
}
...
my $response_q = Thread::Queue->new();
my $running :shared = NUM_WORKERS;
...
async {
while (my $job = $request_q->dequeue()) {
$response_q->enqueue(test2($job));
}
{ lock $running; $response_q->enqueue(undef) if !--$running; }
};
...
$request_q->enqueue(@sentiments);
$request_q->enqueue(undef) for 1..NUM_WORKERS;
while (my $response = $response_q->dequeue()) {
print "Skeleton: $response->{Skeleton}\n";
}
$_->join() for threads->list();
Note the lack of anything thread-specific in test2. This is good. You should always strive for separation of concerns.
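You need to return your data from the thread:
....
my @workers;
# Create the thread in list context (here, inside push) so join() can return a list.
push @workers, async {
my @data;
while (my $job = $in_queue->dequeue()) {
push @data, test2($job);   # keep every result, not just the last one
}
return @data;
};
...
for my $thr (@workers) {
my @data = $thr->join();
# now you have this thread's return values in @data
}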
sub test2
{
my $string = $_[0];
my %hash = (Skeleton => $string);
return \%hash;
}
I found my answer in the example here.
I had to change three things:
share @out_array outside both subs
share the %hash in test2
add return; to the end of test2
Code outside both subs:
my @out_array : shared = ();
test2 sub:
sub test2
{
my $string = $_[0];
my %hash : shared;
$hash{Skeleton} = $string;
push @out_array, \%hash;
return;
}
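
Putting those changes together, a complete script might look roughly like the sketch below. It sticks to undef terminators rather than the queue's end method, on the assumption that the installed Thread::Queue is an older release. The worker count of 3 and the sorted output are choices made for this example.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

use constant NUM_WORKERS => 3;

my @out_array :shared;

sub test2 {
    my ($string) = @_;
    my %hash :shared;           # the hash itself must be shared ...
    $hash{Skeleton} = $string;
    lock(@out_array);
    push @out_array, \%hash;    # ... before its reference can be stored
    return;
}

my $in_queue = Thread::Queue->new();
my @workers = map {
    threads->create(sub {
        while (defined(my $job = $in_queue->dequeue())) {
            test2($job);
        }
    });
} 1 .. NUM_WORKERS;

$in_queue->enqueue("Axe Murderer", "Mauler", "Babyface", "Dragon");
$in_queue->enqueue(undef) for 1 .. NUM_WORKERS;    # one terminator per worker
$_->join() for @workers;

# Threads finish in no particular order, so sort before printing.
my @names = sort map { $_->{Skeleton} } @out_array;
print "Skeleton: $_\n" for @names;
```

Only plain strings travel through the queue here, which is what keeps the sketch usable with an old Thread::Queue; the hashes are built inside the worker and stored in the shared array.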

Can you put $self into a Thread::Queue in Perl?

I'm having issues with trying to put $self into the thread queue. Perl complains about CODE refs. Is it possible to put an object instance onto the thread queue?
generic.pm (Superclass)
package Things::Generic;
use Thread::Queue;
use threads;
our $work_queue = new Thread::Queue;
our $result_queue = new Thread::Queue;
my @worker_pool = map { threads->create (\&delegate_task, $work_queue, $result_queue) } 1 .. $MAX_THREADS;
sub delegate_task {
my( $Qwork, $Qresults ) = @_;
while( my $work = $Qwork->dequeue ) {
#The item on the queue contains "self" that was passed in,
# so call its do_work method
$work->do_work();
$Qresults->enqueue( "lol" );
}
$Qresults->enqueue( undef ); ## Signal this thread is finished
}
sub new {
my $class = shift;
my $self = {
_options => shift,
};
bless $self, $class;
return $self;
}
.
.
.
#other instance methods
#
object.pm (Subclass)
package Things::Specific;
use base qw ( Things::Generic );
sub new {
my $class = shift;
my $self = $class->SUPER::new(@_);
return $self;
}
sub do_stuff {
my $self = shift;
$Things::Generic::work_queue->enqueue($self);
}
sub do_work {
print "DOING WORK\n";
}
It's not objects it has a problem with; it's with a code ref within. That's not unreasonable. Why are you trying to share objects with code refs? You should be sharing data between threads, not code.
While I'm not certain of this, the likely root cause is not that you're passing an object, but that the object in question is storing an anonymous coderef in it (a callback, iterator, or the like). You may be able to refactor the object to eliminate this or perform some sort of serialization that allows it to recreate the coderef in the other thread.
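
One way to follow that advice is to queue only the plain data needed to rebuild the object, and to construct the object (code refs included) inside the worker thread. The sketch below assumes a reasonably recent Thread::Queue that accepts hash references. Demo::Thing, its _callback field and the name option are hypothetical and stand in for the classes in the question.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical class standing in for Things::Specific.
{
    package Demo::Thing;
    sub new {
        my ($class, $options) = @_;
        # The code ref lives only inside the thread that builds the object.
        return bless {
            _options  => $options,
            _callback => sub { uc $_[0] },
        }, $class;
    }
    sub do_work {
        my ($self) = @_;
        return $self->{_callback}->( $self->{_options}{name} );
    }
}

my $work_queue   = Thread::Queue->new();
my $result_queue = Thread::Queue->new();

my $worker = threads->create(sub {
    while (defined(my $options = $work_queue->dequeue())) {
        my $thing = Demo::Thing->new($options);    # rebuilt inside the worker
        $result_queue->enqueue( $thing->do_work() );
    }
    $result_queue->enqueue(undef);    # signal that this worker is finished
});

# Only plain data (a hash of options) crosses the thread boundary.
$work_queue->enqueue({ name => 'first' }, { name => 'second' });
$work_queue->enqueue(undef);

my @results;
while (defined(my $result = $result_queue->dequeue())) {
    push @results, $result;
}
$worker->join();
print "$_\n" for @results;    # FIRST, then SECOND
```

With a single worker the results arrive in submission order; with several workers the order would depend on scheduling.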

thread shared perl

I wrote some code and I need to make it multithreaded. Everything works, but every loop repeats 4 times:
use LWP::UserAgent;
use HTTP::Cookies;
use threads;
use threads::shared;
$| = 1;
$threads = 4;
my @groups :shared = loadf('groups.txt');
my @thread_list = ();
$thread_list[$_] = threads->create(\&thread) for 0 .. $threads - 1;
$_->join for @thread_list;
thread();
sub thread
{
my $url = 'http://www.site.ru/';
my $response = $web->post($url, Content =>
['st.redirect' => ''
]);
foreach $i (@groups)
{
my $response = $web->get($i);
if(!($response->header('Location')))
{
---------;
}
else
{
----------;
}
}
}
sub loadf {
open (F, "<".$_[0]) or erroropen($_[0]);
chomp(my @data = <F>);
close F;
return @data;
}
groups.txt :
http://www.odnoklassniki.ru/group/47357692739634
http://www.odnoklassniki.ru/group/56099517562922
I understand that I need to use threads::shared, but I can't understand how to use it.
Your post does not have much context to explain the code sections; please explain your scenario more clearly.
The problem is that you never remove from @groups, so all threads do all jobs in @groups.
Here's one solution.
use threads;
use Thread::Queue 3.01 qw( );
my $NUM_WORKERS = 4;
sub worker {
my ($url) = @_;
... download the page ...
}
my $q = Thread::Queue->new();
for (1..$NUM_WORKERS) {
async {
while (my $url = $q->dequeue()) {
worker($url);
}
};
}
$q->enqueue($_) for loadf('groups.txt');
$q->end();
$_->join() for threads->list;
Why do you need to make it threaded? perl does much better using forks in most cases.
That said, your code starts 4 threads, each of which processes everything in @groups. It sounds like that's not what you want to do. If you want @groups to be a queue of work to do, take a look at Thread::Queue (or Parallel::ForkManager).
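
If you would rather stay with threads::shared than switch to a queue, each thread can claim one item at a time by shifting it off the shared array under a lock. This is only a sketch: the URLs are placeholders, the @done array exists just to show the result, and the actual HTTP request is left out.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical work list; in the question it comes from loadf('groups.txt').
my @groups :shared = map { "http://example.invalid/group/$_" } 1 .. 8;
my @done   :shared;

sub worker {
    while (1) {
        my $url;
        {
            lock(@groups);
            $url = shift @groups;    # claim one item so no other thread gets it
        }
        last if !defined $url;
        # ... fetch $url here ...
        lock(@done);
        push @done, $url;
    }
    return;
}

my @thread_list = map { threads->create(\&worker) } 1 .. 4;
$_->join() for @thread_list;

printf "%d URLs processed, %d left\n", scalar @done, scalar @groups;    # 8 URLs processed, 0 left
```

Each URL is handled exactly once because the shift happens while the array is locked.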

Resources