I'm trying to write a simple script that uses threads and shares a variable, but I don't want to make this variable global to the whole script. Below is a simplified example.
use strict;
use warnings;
use threads;
use threads::shared;
my $val:shared;
# Create threads
for my $i (1 .. 5) {
threads->create(\&do_something, $i);
}
# Wait for all threads to complete
map { $_->join(); } threads->list();
# $val is global to the script so this line will work!
print "VAL IS: $val\n";
sub do_something {
my $i = shift;
print "Doing something with thread $i!\n";
{
lock $val;
$val = "SOMETHING IS $i";
print "$val\n\n";
}
}
Output:
Doing something with thread 1!
SOMETHING IS 1
Doing something with thread 2!
SOMETHING IS 2
Doing something with thread 3!
SOMETHING IS 3
Doing something with thread 4!
SOMETHING IS 4
Doing something with thread 5!
SOMETHING IS 5
VAL IS: SOMETHING IS 5
How can I get this effect without making $val accessible to the whole script? In other words, how can I make it so attempting to print VAL IS: $val will fail, but the variable will still be successfully shared by the threads?
I can't define it like this:
# Create threads
for my $i (1 .. 5) {
my $val:shared;
threads->create(\&do_something, $i);
}
Or I will get:
Global symbol "$val" requires explicit package
What is the right way to lexically scope a shared variable?
Pass a reference to it as an argument.
sub do_something {
my ($id, $lock_ref) = #_;
print("$id: Started\n");
{
lock $$lock_ref;
print("$id: Exclusive\n");
sleep(1);
}
print("$id: done.\n");
}
{
my $lock :shared;
for my $id (1..5) {
async { do_something($id, \$lock); };
}
}
Or scope it so only the worker subs can see it.
{
my $lock :shared;
sub do_something {
my ($id) = #_;
print("$id: Started\n");
{
lock $lock;
print("$id: Exclusive\n");
sleep(1);
}
print("$id: done.\n");
}
}
for my $id (1..5) {
async { do_something($id); };
}
You can limit the scope of shared variable (make sure that perl sees shared variable before thread creation),
# ..
{
my $val:shared;
sub do_something {
my $i = shift;
print "Doing something with thread $i!\n";
{
lock $val;
$val = "SOMETHING IS $i";
print "$val\n\n";
}
}
}
# Create threads
for my $i (1 .. 5) {
threads->create(\&do_something, $i);
}
# ...
Related
During the end of my multi threaded perl script, I get error like below. The number changes from time to time.
Scalars leaked: -2
Scalars leaked:2
What could be the cause of this problem? Are they just warnings?
I have created my threads in the following way:
our $threads1=3;
our $threads2=3;
for(my $i = 0; $i<$threads1; $i++)
{
$threadpool1[$i] = threads->create( \&sub1, $arg1, $arg2 , $arg2, $threads1, $threads2);
}
#Add work to queue1
foreach my $work (keys %{$workobj})
{
$queue1->enqueue( $work );
}
for(my $i = 0; $i<$threads2; $i++)
{
$threadpool2[$i] = threads->create( \&sub2, $arg1 , $arg2);
}
#Wait until worker threads complete the work
$_->join for #threadpool1;
$_->join for #threadpool2;
sub sub1($arg1, $arg2 , $arg2, $threads1, $threads2)
{
while($queue1->dequeue)
{
#do some work
#send work to queue 2
$queue2->enqueue(work);
$queue1->enqueue(undef x threads1);
}
# if all work has been sent to second queue, send undef to second set of threads
$queue1->enqueue(undef x $threads2);
return;
}
sub sub1($arg1, $arg2)
{
while($queue2->dequeue)
{
#do some work
}
return;
}
Any ideas on where I am going wrong?
i am trying to pass a subroutine from an self written module to threads using the following code.
This is my first time using threads so I'm kinda not familiar with it.
Main Script (shortend)
#!/usr/bin/perl -w
use strict;
use threads;
use lib 'PATH TO LIB';
use goldstandard;
my $delete_raw_files = 0;
my $outfolder = /PATH/;
my %folder = goldstandard -> create_folder($outfolder,$delete_raw_files);
&tagging if $tagging == 1;
sub tagging{
my %hash = goldstandard -> tagging_hash(\%folder);
my #threads;
foreach(keys %hash){
if($_ =~ m/mate/){
my $arguments = "goldstandard -> mate_tagging($hash{$_}{raw},$hash{$_}{temp},$hash{$_}{tagged},$mate_anna,$mate_model)";
push(#threads,$arguments);
}
if($_ =~ m/morpheus/){
my $arguments = "goldstandard -> morpheus_tagging($hash{$_}{source},$hash{$_}{tagged},$morpheus_stemlib,$morpheus_cruncher)";
push(#threads,$arguments)
}
}
foreach(#threads){
my $thread = threads->create($_);
$thread ->join();
}
}
Module
package goldstandard;
use strict;
use warnings;
sub mate_tagging{
my $Referenz = shift;
my $input = shift;
my $output_temp_dir = shift;
my $output_mate_human = shift;
my $anna = shift;
my $model = shift;
opendir(DIR,"$input");
my #dir = readdir(DIR);
my $anzahl = #dir;
foreach(#dir){
unless($_ =~ m/^\./){
my $name = $_;
my $path = $input . $_;
my $out_temp = $output_temp_dir . $name;
my $out_mate_human_final = $output_mate_human . $name;
qx(java -Xmx10G -classpath $anna is2.tag.Tagger -model $model -test $path -out $out_temp);
open(OUT, "> $out_mate_human_final");
open(TEMP, "< $out_temp");
my $output_text;
while(<TEMP>){
unless($_ =~ m/^\s+$/){
if ($_ =~ m/^\d+\t(.*?)\t_\t_\t_\t(.*?)\t_\t/) {
my $tags = $2;
my $words = $1;
print OUT "$words\t$tags\n";
}
}
}
}
}
}
sub morpheus_tagging{
my $Referenz = shift;
my $input = shift;
my $output = shift;
my $stemlib = shift;
my $cruncher = shift;
opendir(DIR,"$input");
my #dir = readdir(DIR);
foreach(#dir){
unless($_ =~ m/^\./){
my $name = $_;
my $path = $input . $_;
my $out = $output . $name;
qx(env MORPHLIB='$stemlib' '$cruncher' < '$path' > '$out');
}
}
}
1;
Executing this code gets me
Thread 1 terminated abnormally: Undefined subroutine &main::goldstandard -> morpheus_tagging(...) called at ... line 43.
I guess eather the way I am calling the treads or the way I am providing the arguments are wrong. I Hope some can help me with that? I Also found something on safe and unsafe modules bum I'm not sure is this is realy the problem.
I guess eather the way I am calling the treads or the way I am providing the arguments are wrong. I Hope some can help me with that? I Also found something on safe and unsafe modules bum I'm not sure is this is realy the problem.Thanks in advance
You must pass the name of a sub or a reference to a sub, plus arguments, to threads->create. So you need something like
my $method_ref = $invoker->can($method_name);
threads->create($method_ref, $invoker, #args);
That said, passing arguments to threads->create has issues that can be avoided by using a closure.
threads->create(sub { $invoker->$method_name(#args) })
The above can be written more simply as follows:
async { $invoker->$method_name(#args) }
This gets us the following:
sub tagging {
my %hash = goldstandard->tagging_hash(\%folder);
my #jobs;
for (keys %hash) {
if (/mate/) {
push #jobs, [ 'goldstandard', 'mate_tagging',
$hash{$_}{raw},
$hash{$_}{temp},
$hash{$_}{tagged},
$mate_anna,
$mate_model,
];
}
if (/morpheus/) {
push #jobs, [ 'goldstandard', 'morpheus_tagging',
$hash{$_}{source},
$hash{$_}{tagged},
$morpheus_stemlib,
$morpheus_cruncher,
];
}
}
my #threads;
for my $job (#jobs) {
my ($invoker, $method_name, #args) = #$job;
push #threads, async { $invoker->$method_name(#args) };
}
$_->join for #threads;
}
or just
sub tagging {
my %hash = goldstandard->tagging_hash(\%folder);
my #threads;
for (keys %hash) {
if (/mate/) {
push #threads, async {
goldstandard->mate_tagging(
$hash{$_}{raw},
$hash{$_}{temp},
$hash{$_}{tagged},
$mate_anna,
$mate_model,
);
};
}
if (/morpheus/) {
push #threads, async {
goldstandard->morpheus_tagging(
$hash{$_}{source},
$hash{$_}{tagged},
$morpheus_stemlib,
$morpheus_cruncher,
);
};
}
}
$_->join for #threads;
}
Notes that I delayed the calls to join until after all the threads are created. Your way made it so only one thread would run at a time.
But what we have isn't great. We have no way of limiting how many threads are active at a time, and we (expensively) create many threads instead of reusing them. We can use a worker pool to solve both of these problems.
use constant NUM_WORKERS => 5;
use Thread::Queue 3.01 qw( );
my $q;
sub tagging {
my %hash = goldstandard->tagging_hash(\%folder);
my #threads;
for (keys %hash) {
if (/mate/) {
$q->enqueue(sub {
goldstandard->mate_tagging(
$hash{$_}{raw},
$hash{$_}{temp},
$hash{$_}{tagged},
$mate_anna,
$mate_model,
);
});
}
if (/morpheus/) {
$q->enqueue(sub {
goldstandard->morpheus_tagging(
$hash{$_}{source},
$hash{$_}{tagged},
$morpheus_stemlib,
$morpheus_cruncher,
);
});
}
}
}
{
$q = Thread::Queue->new();
for (1..NUM_WORKERS) {
async {
while ( my $job = $q->dequeue() ) {
$job->();
}
};
}
... call tagging and whatever ...
$q->end();
$_->join() for threads->list();
}
foreach $thr (1..5)
{
$threads[$thr]=threads->create("worker");
}
and
foreach (1..5)
{
push #threads,threads->create("worker");
}
the latter works well, well the former gives warning.
#!/usr/bin/perl
use strict;
use threads;
use threads::shared;
use Thread::Queue;
my $queue = Thread::Queue->new();
my #threads;
my $thr;
#----------------------------------------create
#send work first,and then creat threads, the first thread work earlier.
$queue->enqueue(1..10000);
foreach (1..5)
{
push #threads,threads->create("worker");
}
$queue->end();
sub worker
{
while (my #DataElement = $queue->dequeue(100))
{
my $tid = threads->tid();
#open (my $out,">>$tid.txt") or die $!;
print "Threads ID:$tid\t#DataElement\n";
#print $out "Threads ID:$tid\t#DataElement\n";
#close $out;
}
}
#----------------------------------------cut
my $thr_num=1;
my $i;
while ($thr_num)
{
$i++;
foreach (#threads) #store threads, TRUE even if joined.
{
$thr_num = threads->list();
print "threads total: $thr_num\n";
if ($_->is_running())
{
sleep 1; #wait
next;
}
if ($_->is_joinable())
{
$_->join();
}
sleep 1;# wait
}
print $i,"\n";
}
this is the whole code. and the warning is can't call method "is_running" on an undefined value at threadqueue2(1).plx.line42. Perl exited with active threads.
No. You will end up with different data structures. As you can see from this simplified version of your code.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
my #threads;
foreach my $thr (1 .. 5) {
$threads[$thr] = 'A Thread';
}
say Dumper \#threads;
#threads = ();
foreach (1 .. 5) {
push #threads, 'A Thread';
}
say Dumper \#threads;
The output is:
$VAR1 = [
undef,
'A Thread',
'A Thread',
'A Thread',
'A Thread',
'A Thread'
];
$VAR1 = [
'A Thread',
'A Thread',
'A Thread',
'A Thread',
'A Thread'
];
In your first example, you begin populating the array at element 1, so the first element (which has an index of 0) contains undef.
I originally experimented with trying to send a hash object through Thread::Queue, but according to this link, my versions of Thread::Queue and threads::shared is too old. Unfortunately, since the system I'm testing on isn't mine, I can't upgrade.
I then tried to use a common array to store my hashes. Here is the code so far:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use constant NUM_WORKERS => 10;
my #out_array;
test1();
sub test1
{
my $in_queue = Thread::Queue->new();
foreach (1..NUM_WORKERS) {
async {
while (my $job = $in_queue->dequeue()) {
test2($job);
}
};
}
my #sentiments = ("Axe Murderer", "Mauler", "Babyface", "Dragon");
$in_queue->enqueue(#sentiments);
$in_queue->enqueue(undef) for 1..NUM_WORKERS;
$_->join() for threads->list();
foreach my $element (#out_array) {
print "element: $element\n";
}
}
sub test2
{
my $string = $_[0];
my %hash = (Skeleton => $string);
push #out_array, \%hash;
}
However, at the end of the procedure, #out_array is always empty. If I remove the threading parts of the script, then #out_array is correctly populated. I suspect I'm implementing threading incorrectly here.
How would I correctly populate #out_array in this instance?
You need to make it shared
use threads::shared;
my #out_array :shared;
I don't think you need to lock it if all you do is push onto it, but if you did, you'd use
lock #out_array;
You need to share any array or hash referenced by a value you push onto it using the tools in thread::shared.
push #out_array, share(%hash);
Though as I mentioned earlier, I'd use a Thread::Queue.
sub test2 {
my ($string) = #_;
my %hash = ( Skeleton => $string );
return \%hash;
}
...
my $response_q = Thread::Queue->new()
my $running :shared = NUM_WORKERS;
...
async {
while (my $job = $request_q->dequeue()) {
$response_q->enqueue(test2($job));
}
{ lock $running; $response_q->enqueue(undef) if !--$running; }
};
...
$request_q->enqueue(#sentiments);
$request_q->enqueue(undef) for 1..NUM_WORKERS;
while (my $response = $response_q->dequeue()) {
print "Skeleton: $response->{Skeleton}\n";
}
$_->join() for threads->list();
Note that lack of anything thread-specific in test2. This is good. You should always strive for separation of concerns.
You need to return your data from thread:
....
async {
my $data;
while (my $job = $in_queue->dequeue()) {
$data = test2($job);
}
return $data;
};
...
for ( threads->list() ) {
my $data = $_->join();
#now you have this thread return value in $data
}
sub test2
{
my $string = $_[0];
my %hash = (Skeleton => $string);
return \%hash;
}
I found my answer in the example here.
I had to change 2 things:
share the #out_array outside both subs
share the %hash in test2
add return; to the end of test2
Code outside both subs:
my #out_array : shared = ();
test2 sub:
sub test2
{
my $string = $_[0];
my %hash : shared;
$hash{Skeleton} = $string;
push #out_array, \%hash;
return;
}
I have an array which contains a list of file #arr=(a.txt,b.txt,c.txt);
I am iterating the array and processing the files with foreach loop; each line of the file will generate a sql and will run on the DB server.
I want to create one thread with each line of the file and query the DB. I also want to control the max no of threads at a time running simultaneously.
You can use a Thread::Pool based system. Or any Boss/Worker model based system.
That's just a simple worker model, an ideal scenario. No problem.
use threads;
use Thread::Queue qw( );
use constant NUM_WORKERS => 5;
sub work {
my ($dbh, $job) = #_;
...
}
{
my $q = Thread::Queue->new();
my #threads;
for (1..NUM_WORKERS) {
push #threads, async {
my $dbh = ...;
while (my $job = $q->dequeue())
work($dbh, $job);
}
};
}
while (<>) {
chomp;
$q->enqueue($_);
}
$q->enqueue(undef) for 1..#threads;
$_->join() for #threads;
}
Pass the file names to the script as arguments, or assign them to #ARGV within the script.
local #ARGV = qw( a.txt b.txt c.txt );
Interesting I manually control how many threads to run. I use Hash of the thread id
[code snip]
my %thr; #my hashes for threads
$count=1;
$maxthreads=5;
while (shift (#data) {
$syncthr = threads->create(sub {callfunction here}, {pass variables});
$tid = $syncthr->tid; #get the thread ID
$thr{$tid} = $syncthr;
if ($count >= $maxthreads) {
threads->yield();
while (1) { # loop until threads are completed
$num_run_threads = keys (%thr);
foreach $one_thread ( keys %thr ) {
if ($thr{$one_thread}->is_running() ) { # if thread running check for error state
if ($err = $thr{$one_thread}->error() } {
[ do stuff here]
}
# otherwise move on to next thread check
} else { # thread is either completed or has error
if ($err = $thr{$one_thread}->error()) {
[ check for error again cann't hurt to double check ]
}
if ($err = $thr{$one_thread}->join()) {
print "Thread completed id: [$one_thread]\n";
}
delete $thr{$one_thread}; # delete the hash since the thread is no more
$num_run_threads = $num_run_threads - 1; # reduce the number of running threads
}
} # close foreach loop
#threads = threads->list(threads::running); # get threads
if ($num_run_threads < $maxthreads ) {
$count = $num_run_threads; # reset the counter to number of threads running
if ( $#data != -1 ) { # check to make sure we still have data
last; # exit the infinite while loop
} else {
if (#threads) {
next; # we still have threads continue with processing
} else {
{ no more threads to process exit program or do something else }
}
} # end else
} # end threads running
} # end the while statement
#Check the threads to see if they are joinable
undef #threads;
#threads = threads->joinable()
if (#threads) {
foreach $mthread(#threads) {
if ($mthreads != 0) {
$thr->join();
}
} #end foreach
} #end #threads
} #end the if statement
$count++; Increment the counter to get to number of max threads to spawn
}
This is by no means a complete program. Furthermore, I have changed it to be very bland. However, I've been using this for a while with success. Especially in the OO Perl. This works for me and have quite a lot of uses. I maybe missing a few more error checking especially with timeout but I do that in the thread itself. Which by the way the thread is actually a sub routine that I am calling.