How do you share arrays in threaded perl? - multithreading

Inside the below sub (a simplified version of code causing an error), each thread should be adding to a master output list. While inside the sub, the array seems to be populating, but when I get back to the main calling portion, it’s empty again. What have I done wrong?
#!/usr/bin/env perl
use threads;
use strict;
my $num_threads = 8;
my #threads = initThreads();
our #outputArray;
foreach(#threads){
$_ = threads->create(\&do_search);
}
foreach(#threads){
$_->join();
}
print "#outputArray";
sub initThreads{
# An array to place our threads in
my #initThreads;
for(my $i = 1;$i<=$num_threads+1;$i++){
push(#initThreads,$i);
}
return #initThreads;
}
sub do_search{
my $id = threads->tid();
push(#outputArray,$id);
threads->exit();
}

According to the threads::shared documentation that #mpapec cited
By default, variables are private to each thread, and each newly
created thread gets a private copy of each existing variable.
So the solution is the module:
use threads::shared ;
our #outputArray :shared ;
There are other forms you can use and a lot of limitations, so reading the entire document is recommended.

Related

Perl: build complex object tree using multiple threads

I am trying to optimize code that loads a file and parses the data in Perl.
Background
The data ends up in a complicated object tree.
The top level object is a blessed package.
Some of the nested objects are new'ed as another blessed package type call Item.
The first pass separates the binary data into units and segments
within a unit which are all stored in multiple arrays.
There could be 20 or 50 units with 8 segments each.
The second pass performs the decoding of the binary data and is what needs optimized for speed.
Approach with threads
I am trying to use the modules threads and threads::shared.
I want each thread to process a subset of the units and populate the data into a common object tree.
I am looking for some sample code that demonstrates how to share blessed objects that may be allocated and blessed from any thread context and inserted into a shared object tree. And then is accessible from the main thread to walk the data for data lookups. The decoding threads will return once the decoding is completed.
I am having difficulty seeing how to have multiple threads insert objects to a common object tree using the threads:shared module. In particular when object of Item type are blessed from a thread context. The class (package) functions won't be bound to the object as I understand.
I do realize that at certain points in the code the code will need to use the threads::shared::lock() function before adding objects to the object tree.
In particular the nested blessed Item objects would be allocated from each thread context.
The threads::shared documentation says "Note that it is often not wise to share an object unless the class itself has been written to support sharing".
Is there a sample code that demonstrates how to accomplish this?
The documentation also says "object's destructor may get called multiple times, one for each thread's scope exit". How is this handled properly?
Thanks
J.R.
OK, so backtracking a bit - threads::shared really does 'single data structures' and doesn't really support more complicated things. That's because when you 'thread' you actually create separate program instances with (some) shared memory space, but practically speaking each 'thread' is a separate program anyway.
So, supporting sharing in an object gets really quite messy. I've found a better approach is to ... not. Use Thread::Queue to pass data between threads, and have one thread that acts to collate results. If you need to pass more complicated data structures, you can use Storable and freeze/thaw to serialise the object, and enqueue it.
That way you don't have to worry about tripping over shared nested data structures - and it's quite likely that you will, because there's no such thing as a 'deep share' option on an object - you have to explicitly share every internal array/hash(reference).
So I would tackle it like this:
#!/usr/bin/perl
use strict;
use warnings;
package Test_Object;
sub new {
my ( $class, $id ) = #_;
my $self = {};
$self->{id} = $id;
bless $self, $class;
return $self;
}
sub set_result {
my ( $self, $result_code ) = #_;
$self->{result} = $result_code;
}
sub get_id {
my ($self) = #_;
return $self->{id};
}
sub get_result {
my ($self) = #_;
return $self->{result};
}
package main;
use strict;
use warnings qw/ all /;
use threads;
use Thread::Queue;
use Storable qw/ freeze thaw/;
my $work_q = Thread::Queue->new();
my $result_q = Thread::Queue->new();
sub worker {
my $tid = threads->self->tid;
print "$tid: starting\n";
while ( my $item = $work_q->dequeue() ) {
my $object = thaw($item);
print "$tid: got object with ID of ", $object->get_id, "\n";
$object->set_result( $object->get_id . " : $tid" );
$result_q->enqueue( freeze $object );
}
}
sub collator {
while ( my $result = $result_q->dequeue ) {
my $object = thaw $result;
print "Collator got object with result code of ", $object->get_result,
"\n";
}
## do something with collated wossnames - pass back to main maybe?
}
my #workers;
for ( 1 .. 5 ) {
my $thr = threads->create( \&worker );
push #workers, $thr;
}
my $collator = threads->create( \&collator );
for ( 1 .. 200 ) {
my $work_object = Test_Object->new($_);
$work_q->enqueue( freeze $work_object );
}
$work_q->end;
foreach my $thr (#workers) {
$thr->join;
}
$result_q->end;
foreach my $thr ( threads->list ) {
$thr->join;
}

Perl seg fault while joining threads

I have a code similar to the below. I have one main script which is calling another module named initial.pm. initial.pm opens up connection with an AMQP server (In my case RabbitMQ)and using Net::AMQP::RabbitMQ library for establishing the connection. Everything works fine except when I try to join my threads I get segmentation fault.
I think the Net::AMQP::RabbitMQ is not thread safe. But this is only being used by the main thread. Im pretty sure you can reproduce the error if you just copy and past the codes below.
How do I fix it ?
main.pl
#!/usr/bin/perl
use Cwd qw/realpath/;
use File::Basename qw/dirname/;
use lib 'lib';
use threads;
use threads::shared;
use initial;
my #threads = ();
my $run :shared = 1;
my $init = load initial($name);
$SIG{'TERM'} = sub {
$run = 0;
};
threads->create(\&proc1);
threads->create(\&proc2);
while($run){
sleep(1);
print "I am main thread\n";
}
$_->join() for threads->list();
sub proc1 {
while($run){
sleep(1);
print "I am child thread 1 \n"
}
}
sub proc2 {
while($run){
sleep(1);
print "I am child thread 2 \n";
}
}
lib/initial.pm
package initial;
use Net::AMQP::RabbitMQ;
use Cwd qw/realpath/;
use File::Basename qw/dirname/;
my $mq;
my $stop = 0;
sub load {
my $class = shift;
my $self = {};
connectq();
bless $self,$class;
return $self;
}
sub connectq {
$mq = Net::AMQP::RabbitMQ->new();
my ($host,$port,$user,$pass) = ('localhost','5672','guest','guest');
$mq->connect($host, {
user => $user,
password => $pass,
port => $port,
timeout => 10,
});
$mq->channel_open(1);
$mq->consume(1, 'logger');
}
1;
I can't reproduce your problem directly, because I don't have the library installed.
One way of 'faking' thread safety in a not-thread-safe module is to rescope your 'use' to only the bit where you'll be using it.
You see, when you start a thread, it copies the program state - loaded libraries and everything.
If your run (something like):
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
sub thread1 {
print threads->self->tid.": Includes:", Dumper \%INC,"\n";
}
#main;
print "Main includes:", Dumper \%INC,"\n";
threads -> create ( \&thread1 );
You'll see XML::Twig is loaded in both. If the process of 'loading' the module causes some state changes (and it can) then you immediately have a potential thread-safety issue.
However if you instead do:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Data::Dumper;
sub thread1 {
require XML::Twig;
XML::Twig -> import;
print threads->self->tid.": Includes:", Dumper (\%INC),"\n";
}
#main;
print "Main includes:", Dumper (\%INC),"\n";
threads -> create ( \&thread1 );
foreach my $thr ( threads -> list ) {
$thr -> join;
}
You effectively cause the module to be dynamically loaded within the thread - the module is only present in one 'code instance' so you are much less likely to be tripped up by 'thread safety' issues.
Alternatively - forking instead of threading ... might be an alternative. This has slightly different 'safety' problems.
But there really is no way to avoid this. Even with shared variables, the core problem is - when you thread, bits of code happen in a different order. There's all sorts of fruity things that can happen as a result. A shared var is one way of ensuring it's the same variable being checked each time - e.g. share $init, but that may well make things worse, because you're then potentially trampling over the same instance/sockets with different threads.
You can, however, reduce the 'thread safety' problem to a limited scope, and use e.g. Thread::Queue to pass messages to/from your 'module user' thread.

Perl: Correctly passing array for threads to work on

I'm learning how to do threading in Perl. I was going over the example code here and adapted the solution code slightly:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;
my $sem = Thread::Semaphore->new(2); # max 2 threads
my #names = ("Kaku", "Tyson", "Dawkins", "Hawking", "Goswami", "Nye");
my #threads = map {
# request a thread slot, waiting if none are available:
foreach my $whiz (#names) {
$sem->down;
threads->create(\&mySubName, $whiz);
}
} #names;
sub mySubName {
return "Hello Dr. " . $_[0] . "\n";
# release slot:
$sem->up;
}
foreach my $t (#threads) {
my $hello = $t->join();
print "$hello";
}
Of course, this is now completely broken and does not work. It results in this error:
C:\scripts\perl\sandbox>threaded.pl
Can't call method "join" without a package or object reference at C:\scripts\perl\sandbox\threaded.pl line 24.
Perl exited with active threads:
0 running and unjoined
9 finished and unjoined
0 running and detached
My objective was two-fold:
Enforce max number of threads allowed at any given time
Provide the array of 'work' for the threads to consume
In the original solution, I noticed that the 0..100; code seems to specify the amount of 'work' given to the threads. However, in my case where I want to supply an array of work, do I still need to supply something similar?
Any guidance and corrections very welcome.
You're storing the result of foreach into #threads rather than the result of threads->create.
Even if you fix this, you collect completed threads too late. I'm not sure how big of a problem that is, but it might prevent more than 64 threads from being started on some systems. (64 is the max number of threads a program can have at a time on some systems.)
A better approach is to reuse your threads. This solves both of your problems and avoids the overhead of repeatedly creating threads.
use threads;
use Thread::Queue 3.01 qw( );
use constant NUM_WORKERS => 2;
sub work {
my ($job) = #_;
...
}
{
my $q = Thread::Queue->new();
for (1..NUM_WORKERS) {
async {
while (my $job = $q->dequeue()) {
work($job);
}
};
}
$q->enqueue(#names); # Can be done over time.
$q->end(); # When you're done adding.
$_->join() for threads->list();
}

Perl threads not printing correctly

If there are explicit examples in another post please let me know. I am having problems with interleaved printing from my threads. I am trying to control my threads by using a shared variable across all threads. The pseudo code below highlights the pieces of my code giving me problems. I have tried everything to make the threads wait their turns to print. Right now only a few output lines are being destroyed.
#!/usr/bin/perl
use threads;
use threads::shared;
my $PRINTFLAG :shared = 1;
Run_Threads();
sub Do_stuff{
lock($PRINTFLAG);
cond_wait($PRINTFLAG) until $PRINTFLAG == 1;
$PRINTFLAG = 0;
print "$working\n";
$PRINTFLAG =1;
}
Sub to spawn threads.
sub Run_Threads{
my #threads;
for (my $i = 1; $i <= 5; $i++){
push #threads, threads->create(\&Do_stuff);
}
foreach (#threads){
$_->join;
}
}
It would seem that each thread has its own handle, and thus its own output buffer. Considering that Perl file handles can't be shared using the mechanisms in threads::shared, that's not very surprising.
That means you need to flush the handle's buffer before releasing the lock. You can do that explicitly:
select->flush(); # Flush handle currently default for print.
Or you can have perl flush automatically after every print to that handle:
select->autoflush(1); # Autoflush handle currently default for print.
$| = 1; # Autoflush handle currently default for print.
Note: To use the ->flush and ->autoflush methods (but not for $|=1;) before Perl 5.14, you'll need to also load IO::Handle.
By the way,
my $PRINTFLAG :shared = 1;
lock($PRINTFLAG);
cond_wait($PRINTFLAG) until $PRINTFLAG == 1;
$PRINTFLAG = 0;
print "$d\n";
$PRINTFLAG =1;
can be simplified to
my $PRINTMUTEX :shared;
lock($PRINTMUTEX);
print "$d\n";
From my experience with threads it's much better/simple to use Threads::Queue.
I have two queues: one for the tasks for running threads and another for the results from threads.
So in my master thread I just check the result's queue and print from it. So no any conflicts accessing result file etc.

Error using ithreads with Memoize

I just introduced threads to a Perl program, where one of its modules was using Memoize.
I'm getting this error message:
Thread 1 terminated abnormally: Anonymous function called in forbidden scalar context; faulting
The error occurs if I have both threads and Memoize, but will disappear if I take away one of these elements. But the problem isn't because Memoize isn't thread-safe - in my code, all the memoization happens within the same thread.
Is this a bug with Memoize? Is there a way I can work around this? Otherwise I'm going to get rid of Memoize.
Here's some sample code to isolate the problem:
use strict;
use warnings;
use threads;
use Thread::Semaphore;
use Memoize;
my $semaphore = Thread::Semaphore->new;
memoize('foo');
sub foo {
return shift;
}
sub invoke_foo {
$semaphore->down; # ensure memoization is thread-safe
my $result = foo(#_);
$semaphore->up;
return $result;
}
my #threads;
foreach (1 .. 5) {
my $t = threads->create( sub { invoke_foo($_) });
push #threads, $t;
}
$_->join foreach #threads;
Memoize stores the caches for every memoized function in one hash (instead of using a closure). It uses the address of the function as the index into that hash.
The problem is that the address of the function changes when it's cloned into a new thread. (Add print(\&foo, "\n"); in invoke_foo.). It's a bug in Memoize.
Workaround: Load the memoised module from within the threads. the following simulates (the relevant aspects of) that:
use strict;
use warnings;
use threads;
use Memoize;
sub foo {
return shift;
}
sub invoke_foo {
return foo(#_);
}
my #threads;
foreach (1 .. 5) {
my $t = threads->create( sub {
memoize('foo');
invoke_foo($_);
});
push #threads, $t;
}
$_->join foreach #threads;
By the way, each thread has its own cache. that could also be considered a bug.
As noted, Memoize is not thread aware. If you want per thread memoization, ikegami's restructuring will work well. If instead you want global memoization, then replacing Memoize with something like the following could work:
use strict;
use warnings;
use 5.010;
use threads;
use threads::shared;
sub memoize_shared {
my $name = shift;
my $glob = do {
no strict 'refs';
\*{(caller)."::$name"}
};
my $code = \&$glob;
my $sep = $;;
my (%scalar, %list) :shared;
no warnings 'redefine';
*$glob = sub {
my $arg = join $sep => #_;
if (wantarray) {
#{$list{$arg} ||= sub {\#_}->(&$code)}
}
else {
exists $scalar{$arg}
? $scalar{$arg}
:($scalar{$arg} = &$code)
}
}
}
and to use it:
sub foo {
my $x = shift;
say "foo called with '$x'";
"foo($x)"
}
memoize_shared 'foo';
for my $t (1 .. 4) {
threads->create(sub {
my $x = foo 'bar';
say "thread $t got $x"
})->join
}
which prints:
foo called with 'bar'
thread 1 got foo(bar)
thread 2 got foo(bar)
thread 3 got foo(bar)
thread 4 got foo(bar)
The memoize_shared function above is fairly complicated because it deals with propegating list and scalar contexts as well as replacing the named subroutine. It is sometimes easier to just build the memoziation into the target subroutine:
{my %cache :shared;
sub foo {
my $x = shift;
if (exists $cache{$x}) {$cache{$x}}
else {
say "foo called with '$x'";
$cache{$x} = "foo($x)"
}
}}
Building the memoization into the subroutine does make it a bit more complicated, but it will be faster than using a wrapper function like memoize. And it gives you exact control over how to memoize the subroutine, including things like using a threads::shared cache.
Memoize should work under threads, albeit a bit slower:
"There is some problem with the way goto &f works under threaded Perl, perhaps because of the lexical scoping of #_. This is a bug in
Perl, and until it is resolved, memoized functions will see a slightly
different caller() and will perform a little more slowly on threaded
perls than unthreaded perls."

Resources