I'm having issues with trying to put $self into the thread queue. Perl complains about CODE refs. Is it possible to put an object instance onto the thread queue?
generic.pm (Superclass)
package Things::Generic;
use Thread::Queue;
use threads;
our $work_queue = new Thread::Queue;
our $result_queue = new Thread::Queue;
my #worker_pool = map { threads->create (\&delegate_task, $work_queue, $result_queue) } 1 .. $MAX_THREADS;
sub delegate_task {
my( $Qwork, $Qresults ) = #_;
while( my $work = $Qwork->dequeue ) {
#The item on the queue contains "self" taht was passed in,
# so call it's do_work method
$work->do_work();
$Qresults->enqueue( "lol" );
}
$Qresults->enqueue( undef ); ## Signal this thread is finished
}
sub new {
my $class = shift;
my $self = {
_options => shift,
};
bless $self, $class;
return $self;
}
.
.
.
#other instance methods
#
object.pm (Subclass)
package Things::Specific;
use base qw ( Things::Generic )
sub new {
my $class = shift;
my $self = $class->SUPER::new(#_);
return $self;
}
sub do_stuff {
my $self = shift;
$Things::Generic::work_queue->enqueue($self);
}
sub do_work {
print "DOING WORK\n";
}
It's not objects it has a problem with; it's with a code ref within. That's not unreasonable. Why are you trying to share objects with code refs? You should be sharing data between threads, not code.
While I'm not certain of this, the likely root cause is not that you're passing an object, but that the object in question is storing an anonymous coderef in it (a callback, iterator, or the like). You may be able to refactor the object to eliminate this or perform some sort of serialization that allows it to recreate the coderef in the other thread.
Related
I have an existing perl script which I have to modify. For this, I need some struct-like container for my data. I do not have any 'outside' modules, nor do I possess the capability to obtain them, and my perl is 5.8.8. I've written a package containing one var and two arrays for my needs, however I cannot get it to work and I am not sure why. Here it is:
{
package TestData;
sub new
{
my $class = shift;
my $self = {
_id => shift,
_genUsers => [],
_testSymbols => [],
};
return bless ($self, $class);
}
sub setId
{
my ($self, $id) = #_;
$self->{_id} = $id if defined($id);
}
sub addGenUser
{
my ($self, $user) = #_;
push #{$self->{_genUsers}}, $user;
}
sub addTestSymbol
{
my ($self, $sym) = #_;
push #{$self->{_testSymbols}}, $sym;
}
sub getId
{
my $self = #_;
return $self->{_id};
}
sub getGenUserList
{
my $self = #_;
return #{$self->{_genUsers}};
}
sub getTestSymbolList
{
my $self = #_;
return #{$self->{_testSymbols}};
}
}
use strict;
use POSIX;
my $id = "test";
my #a;
push #a, "this";
my #b;
push #b, "that";
my $obj = new TestData($id, #a, #b);
print "DEBUG: " . $obj->getId() . "\n";
Last line always throws "use of uninitialized value". What's going on?
Also would it be possible to do something like this:
my #c;
push #c, $obj;
foreach(#c){
print "DEBUG2: " . $_->getId() . "\n";
}
Thank you.
EDIT: Thank you everyone for your replies. This is what the fully working end result looked like:
{
use strict;
use POSIX;
package TestData;
sub new
{
my $class = shift;
my $self = {
_alpha => shift,
_beta => shift,
_gamma => shift,
_delta => shift,
_theta => shift
};
return bless ($self, $class);
}
sub getAlpha
{
my $self = shift;
return $self->{_alpha};
}
sub getBeta
{
my $self = shift;
return $self->{_beta};
}
sub getGamma
{
my $self = shift;
return $self->{_gamma};
}
sub getDelta
{
my $self = shift;
return $self->{_delta};
}
sub getTheta
{
my $self = shift;
return $self->{_theta};
}
}
The bug is in the line(s) that read:
my $self = #_;
This is a scalar assignment, and so assigns the length on #_ to $self. Trying to use the length as a hash reference then gives the "use of uninitialized value" warning you see, and returns undef.
By the way, in addition to enabling warnings, you should get into the habit of starting all your scripts and modules with use strict;. If you'd done that, it would've caught this bug, and you would've received a runtime error saying something like:
Can't use string ("1") as a HASH ref while "strict refs" in use at foo.pl line 36.
Anyway, to fix your code, you should replace the line above with either:
my ($self) = #_;
or:
my $self = shift;
or even:
my $self = $_[0];
With my ($self) = #_ the parentheses turn it into a list assignment; with my $self = shift it's still a scalar assignment, but shift (which, inside a sub, is shorthand for shift #_) pulls the first scalar out of #_.
The choice of which style to prefer basically comes down to personal taste and consistency; if you're used to pulling all your method args out of #_ with a single assignment like:
my ($self, $foo, $bar, $baz) = #_;
then, for methods that take no arguments other than $self, a single-element list assignment seems more consistent. On the other hand, if you prefer to pick your args out of #_ one at a time, as in:
my $self = shift;
my $foo = shift;
my $bar = shift;
my $baz = shift;
the you obviously should follow that pattern for single-arg methods too.
Change
my $self = #_;
to
my $self = shift;
in your methods that only expect one argument and especifically the called one (i.e.):
sub getId
{
my $self = shift;
return $self->{_id};
}
You can also wrap the $self around parentheses to change context and do the same thing as commented
Output:
DEBUG: test
Rather than roll your own, you might want to check out Class::Struct, which has been part of Perl's core since 5.4.
I originally experimented with trying to send a hash object through Thread::Queue, but according to this link, my versions of Thread::Queue and threads::shared is too old. Unfortunately, since the system I'm testing on isn't mine, I can't upgrade.
I then tried to use a common array to store my hashes. Here is the code so far:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use constant NUM_WORKERS => 10;
my #out_array;
test1();
sub test1
{
my $in_queue = Thread::Queue->new();
foreach (1..NUM_WORKERS) {
async {
while (my $job = $in_queue->dequeue()) {
test2($job);
}
};
}
my #sentiments = ("Axe Murderer", "Mauler", "Babyface", "Dragon");
$in_queue->enqueue(#sentiments);
$in_queue->enqueue(undef) for 1..NUM_WORKERS;
$_->join() for threads->list();
foreach my $element (#out_array) {
print "element: $element\n";
}
}
sub test2
{
my $string = $_[0];
my %hash = (Skeleton => $string);
push #out_array, \%hash;
}
However, at the end of the procedure, #out_array is always empty. If I remove the threading parts of the script, then #out_array is correctly populated. I suspect I'm implementing threading incorrectly here.
How would I correctly populate #out_array in this instance?
You need to make it shared
use threads::shared;
my #out_array :shared;
I don't think you need to lock it if all you do is push onto it, but if you did, you'd use
lock #out_array;
You need to share any array or hash referenced by a value you push onto it using the tools in thread::shared.
push #out_array, share(%hash);
Though as I mentioned earlier, I'd use a Thread::Queue.
sub test2 {
my ($string) = #_;
my %hash = ( Skeleton => $string );
return \%hash;
}
...
my $response_q = Thread::Queue->new()
my $running :shared = NUM_WORKERS;
...
async {
while (my $job = $request_q->dequeue()) {
$response_q->enqueue(test2($job));
}
{ lock $running; $response_q->enqueue(undef) if !--$running; }
};
...
$request_q->enqueue(#sentiments);
$request_q->enqueue(undef) for 1..NUM_WORKERS;
while (my $response = $response_q->dequeue()) {
print "Skeleton: $response->{Skeleton}\n";
}
$_->join() for threads->list();
Note that lack of anything thread-specific in test2. This is good. You should always strive for separation of concerns.
You need to return your data from thread:
....
async {
my $data;
while (my $job = $in_queue->dequeue()) {
$data = test2($job);
}
return $data;
};
...
for ( threads->list() ) {
my $data = $_->join();
#now you have this thread return value in $data
}
sub test2
{
my $string = $_[0];
my %hash = (Skeleton => $string);
return \%hash;
}
I found my answer in the example here.
I had to change 2 things:
share the #out_array outside both subs
share the %hash in test2
add return; to the end of test2
Code outside both subs:
my #out_array : shared = ();
test2 sub:
sub test2
{
my $string = $_[0];
my %hash : shared;
$hash{Skeleton} = $string;
push #out_array, \%hash;
return;
}
I've been trying to extend the first answer at Perl Monks (http://www.perlmonks.org/?node_id=735923) to a threaded model to no avail. I keep getting issues with not being able to pass a coderef
In my superclass I define the threadpool as a package variable so it can be shared amongst the subclasses:
package Things::Generic;
my $Qwork = new Thread::Queue;
my $Qresults = new Thread::Queue;
my #pool = map { threads->create(\&worker, $Qwork, $Qresults) } 1..$MAX_THREADS;
sub worker {
my $tid = threads->tid;
my( $Qwork, $Qresults ) = #_;
while( my $work = $Qwork->dequeue ) {
my $result = $work->process_thing();
$Qresults->enqueue( $result );
}
$Qresults->enqueue( undef ); ## Signal this thread is finished
}
sub enqueue {
my $self = shift;
$Qwork->enqueue($self);
}
sub new {
#Blessing and stuff
}
.
.
Now for the subclasses. It is guaranteed that they have a process_thing() method.
package Things::SpecificN;
use base qw (Things::Generic);
sub new() {
#instantiate
}
sub do_things {
my $self = shift;
#enqueue self into the shared worker pool so that "process_thing" is called
$self->enqueue();
}
sub process_thing() {
#Do some work here
return RESULT;
}
#
Main
my #things;
push #things, Things::Specific1->new();
push #things, Things::Specific2->new();
.
.
push #things, Things::SpecificN->new();
#Asynchronously kick off "work"
foreach my $thing (#things) {
$thing->do_things();
}
My goal is to put a list of "work" on the queue. Each thread will pull work from the queue and execute it, no matter what it. Each Thing has it's own unique work, however the function to do the work will be guaranteed to be called "process_thing". I just want the thread pool to grab an entry from the queue and do the "something". I think I am describing functionality similar to Android AsyncTask.
My Perl is not high enough for Thread::Queue::Any
$Qwork->enqueue($self); instead of $self->enqueue();
i wrote a code and i need to make it multithreaded. Evething works, but every loop repeats 4 times:
use LWP::UserAgent;
use HTTP::Cookies;
use threads;
use threads::shared;
$| = 1;
$threads = 4;
my #groups :shared = loadf('groups.txt');
my #thread_list = ();
$thread_list[$_] = threads->create(\&thread) for 0 .. $threads - 1;
$_->join for #thread_list;
thread();
sub thread
{
my $url = 'http://www.site.ru/';
my $response = $web->post($url, Content =>
['st.redirect' => ''
]);
foreach $i (#groups)
{
my $response = $web->get($i);
if(!($response->header('Location')))
{
---------;
}
else
{
----------;
}
}
}
sub loadf {
open (F, "<".$_[0]) or erroropen($_[0]);
chomp(my #data = <F>);
close F;
return #data;
}
groups.txt :
http://www.odnoklassniki.ru/group/47357692739634
http://www.odnoklassniki.ru/group/56099517562922
I understand that i need to use threads::shared; but i can't undestand how to use it.
Your post does not have much context to explain the code sections; please explain your scenario more clearly.
The problem is that you never remove from #groups, so all threads do all jobs in #groups.
Here's one solution.
use threads;
use Thread::Queue 3.01 qw( );
my $NUM_WORKERS = 4;
sub worker {
my ($url) = #_;
... download the page ...
}
my $q = Thread::Queue->new();
for (1..$NUM_WORKERS) {
async {
while (my $url = $q->dequeue()) {
worker($url);
}
};
}
$q->enqueue($_) for loadf('groups.txt');
$q->end();
$_->join() for threads->list;
Why do you need to make it threaded? perl does much better using forks in most cases.
That said, your code starts 4 threads, each of which processes everything in #groups. It sounds like that's not what you want to do. If you want #groups to be a queue of work to do, take a look at Thread::Queue (or Parallel::ForkManager).
I have little scraping application and trying to add multithreading to it. Here is code (MyMech is WWW::Mechanize subclass used to process HTTP errors):
#!/usr/bin/perl
use strict;
use MyMech;
use File::Basename;
use File::Path;
use HTML::Entities;
use threads;
use threads::shared;
use Thread::Queue;
use List::Util qw( max sum );
my $page = 1;
my %CONFIG = read_config();
my $mech = MyMech->new( autocheck => 1 );
$mech->quiet(0);
$mech->get( $CONFIG{BASE_URL} . "/site-map.php" );
my #championship_links =
$mech->find_all_links( url_regex => qr/\d{4}-\d{4}\/$/ );
foreach my $championship_link (#championship_links) {
my #threads;
my $queue = Thread::Queue->new;
my $queue_processed = Thread::Queue->new;
my $url = sprintf $championship_link->url_abs();
print $url, "\n";
next unless $url =~ m{soccer}i;
$mech->get($url);
my ( $last_round_loaded, $current_round ) =
find_current_round( $mech->content() );
unless ($last_round_loaded) {
print "\tLoading rounds data...\n";
$mech->submit_form(
form_id => "leagueForm",
fields => {
round => $current_round,
},
);
}
my #match_links =
$mech->find_all_links( url_regex => qr/matchdetails\.php\?matchid=\d+$/ );
foreach my $link (#match_links) {
$queue->enqueue($link);
}
print "Starting printing thread...\n";
my $printing_thread = threads->create(
sub { printing_thread( scalar(#match_links), $queue_processed ) } )
->detach;
push #threads, $printing_thread;
print "Starting threads...\n";
foreach my $thread_id ( 1 .. $CONFIG{NUMBER_OF_THREADS} ) {
my $thread = threads->create(
sub { scrape_match( $thread_id, $queue, $queue_processed ) } )
->join;
push #threads, $thread;
}
undef $queue;
undef $queue_processed;
foreach my $thread ( threads->list() ) {
if ( $thread->is_running() ) {
print $thread->tid(), "\n";
}
}
#sleep 5;
}
print "Finished!\n";
sub printing_thread {
my ( $number_of_matches, $queue_processed ) = #_;
my #fields =
qw (
championship
year
receiving_team
visiting_team
score
average_home
average_draw
average_away
max_home
max_draw
max_away
date
url
);
while ($number_of_matches) {
if ( my $match = $queue_processed->dequeue_nb ) {
open my $fh, ">>:encoding(UTF-8)", $CONFIG{RESULT_FILE} or die $!;
print $fh join( "\t", #{$match}{#fields} ), "\n";
close $fh;
$number_of_matches--;
}
}
threads->exit();
}
sub scrape_match {
my ( $thread_id, $queue, $queue_processed ) = #_;
while ( my $match_link = $queue->dequeue_nb ) {
my $url = sprintf $match_link->url_abs();
print "\t$url", "\n";
my $mech = MyMech->new( autocheck => 1 );
$mech->quiet(0);
$mech->get($url);
my $match = parse_match( $mech->content() );
$match->{url} = $url;
$queue_processed->enqueue($match);
}
return 1;
}
And i have some strange things with this code. Sometimes it run but sometimes it exit with no errors (at the ->detach point). I know that #match_links contain data but threads are not created and it just close. Usually it terminates after processing second $championship_link entry.
May be i'm doing something wrong?
Update
Here is code for find_current_round subroutine (but i'm sure it's not related to the question):
sub find_current_round {
my ($html) = #_;
my ($select_html) = $html =~ m{
<select\s+name="round"[^>]+>\s*
(.+?)
</select>
}isx;
my ( $option_html, $current_round ) = $select_html =~ m{
(<option\s+value="\d+"(?:\s+ selected="selected")?>(\d+)</option>)\Z
}isx;
my ($last_round_loaded) = $option_html =~ m{selected};
return ( $last_round_loaded, $current_round );
}
First off - don't use dequeue_nb(). This is a bad idea, because if a queue is temporarily empty, it'll return undef and your thread will exit.
Use instead dequeue and and end. dequeue will block, but once you end your queue, the while will exit.
You're also doing some decidedly odd things with your threads - I would suggest that you rarely want to detach a thread. You're just assuming your thread is going to complete before your program, which isn't a good plan.
Likewise this;
my $thread = threads->create(
sub { scrape_match( $thread_id, $queue, $queue_processed ) } )
->join;
You're spawning a thread, and then instantly joining it. And so that join call will... block waiting for your thread to exit. You don't need threads at all to do that...
You also scope your queues within your foreach loop. I don't think that's a good plan. I would suggest instead - scope them externally, and spawn a defined number of 'worker' threads (and one 'printing' thread).
And then just feed them through the queue mechanism. Otherwise you'll end up creating multiple queue instances, because they're lexically scoped.
And once you've finished queuing stuff, issue a $queue -> end which'll terminate the while loop.
You also don't need to give a thread a $thread_id because ... they already have one. Try: threads -> self -> tid(); instead.