I am using perl with threads to process socket info
ConnectionThread
is responsible for receiving packets and parsing, and enque in a hash
QueueThread
Is responsible for processing Queue elelemts (hash entries ) and update DB
The Hash is Events and declared as my %Events:shared;
I am passing a hash reference to the Threads, but i noticed that each thread is getting a difference hash ref value
my $hash_ref1 = \%Events ; # referencing
print "Hash ref in main is 1 " . $hash_ref1 ."\n";
my $thr1 = threads->create(\&ConnectionThread, $hash_ref1 );
my $thr2 = threads->create(\&QueueThread, $hash_ref1);
The output is as below
Hash ref in main is 1 HASH(0x825faa4)
Hash ref is ConnectionThread is HASH(0x8493544)
Thread started ..
Hash ref is Queue thread is HASH(0x852dd9c)
below is the full code ( illustrative )
use strict;
use warnings;
use Socket;
use threads;
use threads::shared;
use DBI;
my %Events:shared;
sub checkSize {
my $size;
$size =keys %Events;
print "Size of Queue in Check is ***" .$size ." \n";
}
sub QueueThread {
my ($hash_ref) = #_;
print "Hash ref is Queue thread is " . $hash_ref ." \n";
while (1==1) {
sleep (5);
}
}
sub ConnectionThread {
my ( $hash_ref ) = #_;
print "Hash ref is ConnectionThread is " . $hash_ref ." \n";
while (1==1) {
sleep(5);
}
}
my $hash_ref1 = \%Events;
print "Hash ref in main is 1 " . $hash_ref1 ."\n";
my $thr1 = threads->create(\&ConnectionThread, $hash_ref1 );
my $thr2 = threads->create(\&QueueThread, $hash_ref1);
print "Thread started ..\n";
while (1==1) {
sleep 10;
}
You are not directly accessing the same variable in all threads. If you did, you'd have to explicitly guarantee mutual access to the variable every time you access it (even if just to read it) to avoid crashing the program.
Each thread (including the one in which the variable is created) gets a
"proxy" to the data-containing variable. The proxy is a magical variable, meaning accessing the elements of the proxy results in getters and setters being called. These getters and setters ensure the data-containing variable is never in an inconsistent state by providing mutually exclusive access to it.
$ perl -Mthreads -Mthreads::shared -MDevel::Peek \
-E'my %h :shared; ( async { Dump(%h); } )->join; Dump(%h);' 2>&1 |
grep -P '^\S|^ {8}IV'
SV = PVHV(0x1ed5f90) at 0x1f6fd68 <-----+
IV = 31930352 <--+ |
SV = PVHV(0x1d70af0) at 0x1d880d0 <--|--+------ Two different vars
IV = 31930352 <--+--------- Proxying the same var (0x1e737f0)
Yes, this will happen. Threads do not share memory. You can sort of fake it with shared which allows you to have common variables - but you won't necessarily see the same hash memory locations.
Despite %Events being shared that's not going to print the same memory address in each thread if you print \%Events;
Given you're talking about queueing though, can I suggest instead using Thread::Queue which allows you to 'do' queue/dequeue operations in a nice easy and thread safe manner.
Related
I am trying to optimize code that loads a file and parses the data in Perl.
Background
The data ends up in a complicated object tree.
The top level object is a blessed package.
Some of the nested objects are new'ed as another blessed package type call Item.
The first pass separates the binary data into units and segments
within a unit which are all stored in multiple arrays.
There could be 20 or 50 units with 8 segments each.
The second pass performs the decoding of the binary data and is what needs optimized for speed.
Approach with threads
I am trying to use the modules threads and threads::shared.
I want each thread to process a subset of the units and populate the data into a common object tree.
I am looking for some sample code that demonstrates how to share blessed objects that may be allocated and blessed from any thread context and inserted into a shared object tree. And then is accessible from the main thread to walk the data for data lookups. The decoding threads will return once the decoding is completed.
I am having difficulty seeing how to have multiple threads insert objects to a common object tree using the threads:shared module. In particular when object of Item type are blessed from a thread context. The class (package) functions won't be bound to the object as I understand.
I do realize that at certain points in the code the code will need to use the threads::shared::lock() function before adding objects to the object tree.
In particular the nested blessed Item objects would be allocated from each thread context.
The threads::shared documentation says "Note that it is often not wise to share an object unless the class itself has been written to support sharing".
Is there a sample code that demonstrates how to accomplish this?
The documentation also says "object's destructor may get called multiple times, one for each thread's scope exit". How is this handled properly?
Thanks
J.R.
OK, so backtracking a bit - threads::shared really does 'single data structures' and doesn't really support more complicated things. That's because when you 'thread' you actually create separate program instances with (some) shared memory space, but practically speaking each 'thread' is a separate program anyway.
So, supporting sharing in an object gets really quite messy. I've found a better approach is to ... not. Use Thread::Queue to pass data between threads, and have one thread that acts to collate results. If you need to pass more complicated data structures, you can use Storable and freeze/thaw to serialise the object, and enqueue it.
That way you don't have to worry about tripping over shared nested data structures - and it's quite likely that you will, because there's no such thing as a 'deep share' option on an object - you have to explicitly share every internal array/hash(reference).
So I would tackle it like this:
#!/usr/bin/perl
use strict;
use warnings;
package Test_Object;
sub new {
my ( $class, $id ) = #_;
my $self = {};
$self->{id} = $id;
bless $self, $class;
return $self;
}
sub set_result {
my ( $self, $result_code ) = #_;
$self->{result} = $result_code;
}
sub get_id {
my ($self) = #_;
return $self->{id};
}
sub get_result {
my ($self) = #_;
return $self->{result};
}
package main;
use strict;
use warnings qw/ all /;
use threads;
use Thread::Queue;
use Storable qw/ freeze thaw/;
my $work_q = Thread::Queue->new();
my $result_q = Thread::Queue->new();
sub worker {
my $tid = threads->self->tid;
print "$tid: starting\n";
while ( my $item = $work_q->dequeue() ) {
my $object = thaw($item);
print "$tid: got object with ID of ", $object->get_id, "\n";
$object->set_result( $object->get_id . " : $tid" );
$result_q->enqueue( freeze $object );
}
}
sub collator {
while ( my $result = $result_q->dequeue ) {
my $object = thaw $result;
print "Collator got object with result code of ", $object->get_result,
"\n";
}
## do something with collated wossnames - pass back to main maybe?
}
my #workers;
for ( 1 .. 5 ) {
my $thr = threads->create( \&worker );
push #workers, $thr;
}
my $collator = threads->create( \&collator );
for ( 1 .. 200 ) {
my $work_object = Test_Object->new($_);
$work_q->enqueue( freeze $work_object );
}
$work_q->end;
foreach my $thr (#workers) {
$thr->join;
}
$result_q->end;
foreach my $thr ( threads->list ) {
$thr->join;
}
I have a code similar to the below. I have one main script which is calling another module named initial.pm. initial.pm opens up connection with an AMQP server (In my case RabbitMQ)and using Net::AMQP::RabbitMQ library for establishing the connection. Everything works fine except when I try to join my threads I get segmentation fault.
I think the Net::AMQP::RabbitMQ is not thread safe. But this is only being used by the main thread. Im pretty sure you can reproduce the error if you just copy and past the codes below.
How do I fix it ?
main.pl
#!/usr/bin/perl
use Cwd qw/realpath/;
use File::Basename qw/dirname/;
use lib 'lib';
use threads;
use threads::shared;
use initial;
my #threads = ();
my $run :shared = 1;
my $init = load initial($name);
$SIG{'TERM'} = sub {
$run = 0;
};
threads->create(\&proc1);
threads->create(\&proc2);
while($run){
sleep(1);
print "I am main thread\n";
}
$_->join() for threads->list();
sub proc1 {
while($run){
sleep(1);
print "I am child thread 1 \n"
}
}
sub proc2 {
while($run){
sleep(1);
print "I am child thread 2 \n";
}
}
lib/initial.pm
package initial;
use Net::AMQP::RabbitMQ;
use Cwd qw/realpath/;
use File::Basename qw/dirname/;
my $mq;
my $stop = 0;
sub load {
my $class = shift;
my $self = {};
connectq();
bless $self,$class;
return $self;
}
sub connectq {
$mq = Net::AMQP::RabbitMQ->new();
my ($host,$port,$user,$pass) = ('localhost','5672','guest','guest');
$mq->connect($host, {
user => $user,
password => $pass,
port => $port,
timeout => 10,
});
$mq->channel_open(1);
$mq->consume(1, 'logger');
}
1;
I can't reproduce your problem directly, because I don't have the library installed.
One way of 'faking' thread safety in a not-thread-safe module is to rescope your 'use' to only the bit where you'll be using it.
You see, when you start a thread, it copies the program state - loaded libraries and everything.
If your run (something like):
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
sub thread1 {
print threads->self->tid.": Includes:", Dumper \%INC,"\n";
}
#main;
print "Main includes:", Dumper \%INC,"\n";
threads -> create ( \&thread1 );
You'll see XML::Twig is loaded in both. If the process of 'loading' the module causes some state changes (and it can) then you immediately have a potential thread-safety issue.
However if you instead do:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Data::Dumper;
sub thread1 {
require XML::Twig;
XML::Twig -> import;
print threads->self->tid.": Includes:", Dumper (\%INC),"\n";
}
#main;
print "Main includes:", Dumper (\%INC),"\n";
threads -> create ( \&thread1 );
foreach my $thr ( threads -> list ) {
$thr -> join;
}
You effectively cause the module to be dynamically loaded within the thread - the module is only present in one 'code instance' so you are much less likely to be tripped up by 'thread safety' issues.
Alternatively - forking instead of threading ... might be an alternative. This has slightly different 'safety' problems.
But there really is no way to avoid this. Even with shared variables, the core problem is - when you thread, bits of code happen in a different order. There's all sorts of fruity things that can happen as a result. A shared var is one way of ensuring it's the same variable being checked each time - e.g. share $init, but that may well make things worse, because you're then potentially trampling over the same instance/sockets with different threads.
You can, however, reduce the 'thread safety' problem to a limited scope, and use e.g. Thread::Queue to pass messages to/from your 'module user' thread.
I am a newbie to perl, so please excuse my ignorance. (I'm using windows 7)
I have borrowed echicken's threads example script and wanted to use it as a basis for a script to make a number of system calls, but I have run into an issue which is beyond my understanding. To illustrate the issue I am seeing, I am doing a simple ping command in the example code below.
$nb_process is the number or simultaneous running threads allowed.
$nb_compute as the number of times we want to run the sub routine (i.e the total number of time we will issue the ping command).
When I set $nb_compute and $nb_process to be same value as each other, it works perfectly.
However when I reduce $nb_process (to restrict the number of running threads at any one time), it seems to lock once the number of threads defined in $nb_process have started.
It works fine if I remove the system call (ping command).
I see the same behaviour for other system calls (it'd not just ping).
Please could someone help? I have provided the script below.
#!/opt/local/bin/perl -w
use threads;
use strict;
use warnings;
my #a = ();
my #b = ();
sub sleeping_sub ( $ $ $ );
print "Starting main program\n";
my $nb_process = 3;
my $nb_compute = 6;
my $i=0;
my #running = ();
my #Threads;
while (scalar #Threads < $nb_compute) {
#running = threads->list(threads::running);
print "LOOP $i\n";
print " - BEGIN LOOP >> NB running threads = ".(scalar #running)."\n";
if (scalar #running < $nb_process) {
my $thread = threads->new( sub { sleeping_sub($i, \#a, \#b) });
push (#Threads, $thread);
my $tid = $thread->tid;
print " - starting thread $tid\n";
}
#running = threads->list(threads::running);
print " - AFTER STARTING >> NB running Threads = ".(scalar #running)."\n";
foreach my $thr (#Threads) {
if ($thr->is_running()) {
my $tid = $thr->tid;
print " - Thread $tid running\n";
}
elsif ($thr->is_joinable()) {
my $tid = $thr->tid;
$thr->join;
print " - Results for thread $tid:\n";
print " - Thread $tid has been joined\n";
}
}
#running = threads->list(threads::running);
print " - END LOOP >> NB Threads = ".(scalar #running)."\n";
$i++;
}
print "\nJOINING pending threads\n";
while (scalar #running != 0) {
foreach my $thr (#Threads) {
$thr->join if ($thr->is_joinable());
}
#running = threads->list(threads::running);
}
print "NB started threads = ".(scalar #Threads)."\n";
print "End of main program\n";
sub sleeping_sub ( $ $ $ ) {
my #res2 = `ping 136.13.221.34`;
print "\n#res2";
sleep(3);
}
The main problem with your program is that you have a busy loop that tests whether a thread can be joined. This is wasteful. Furthermore, you could reduce the amount of global variables to better understand your code.
Other eyebrow-raiser:
Never ever use prototypes, unless you know exactly what they mean.
The sleeping_sub does not use any of its arguments.
You use the threads::running list a lot without contemplating whether this is actually correct.
It seems you only want to run N workers at once, but want to launch M workers in total. Here is a fairly elegant way to implement this. The main idea is that we have a queue between threads where threads that just finished can enqueue their thread ID. This thread will then be joined. To limit the number of threads, we use a semaphore:
use threads; use strict; use warnings;
use feature 'say'; # "say" works like "print", but appends newline.
use Thread::Queue;
use Thread::Semaphore;
my #pieces_of_work = 1..6;
my $num_threads = 3;
my $finished_threads = Thread::Queue->new;
my $semaphore = Thread::Semaphore->new($num_threads);
for my $task (#pieces_of_work) {
$semaphore->down; # wait for permission to launch a thread
say "Starting a new thread...";
# create a new thread in scalar context
threads->new({ scalar => 1 }, sub {
my $result = worker($task); # run actual task
$finished_threads->enqueue(threads->tid); # report as joinable "in a second"
$semaphore->up; # allow another thread to be launched
return $result;
});
# maybe join some threads
while (defined( my $thr_id = $finished_threads->dequeue_nb )) {
join_thread($thr_id);
}
}
# wait for all threads to be finished, by "down"ing the semaphore:
$semaphore->down for 1..$num_threads;
# end the finished thread ID queue:
$finished_threads->enqueue(undef);
# join any threads that are left:
while (defined( my $thr_id = $finished_threads->dequeue )) {
join_thread($thr_id);
}
With join_thread and worker defined as
sub worker {
my ($task) = #_;
sleep rand 2; # sleep random amount of time
return $task + rand; # return some number
}
sub join_thread {
my ($tid) = #_;
my $thr = threads->object($tid);
my $result = $thr->join;
say "Thread #$tid returned $result";
}
we could get the output:
Starting a new thread...
Starting a new thread...
Starting a new thread...
Starting a new thread...
Thread #3 returned 3.05652608754778
Starting a new thread...
Thread #1 returned 1.64777186731541
Thread #2 returned 2.18426146087901
Starting a new thread...
Thread #4 returned 4.59414651998983
Thread #6 returned 6.99852684265667
Thread #5 returned 5.2316971836585
(order and return values are not deterministic).
The usage of a queue makes it easy to tell which thread has finished. Semaphores make it easier to protect resources, or limit the amount of parallel somethings.
The main benefit of this pattern is that far less CPU is used, when contrasted to your busy loop. This also shortens general execution time.
While this is a very big improvement, we could do better! Spawning threads is expensive: This is basically a fork() without all the copy-on-write optimizations on Unix systems. The whole interpreter is copied, including all variables, all state etc. that you have already created.
Therefore, as threads should be used sparingly, and be spawned as early as possible. I already introduced you to queues that can pass values between threads. We can extend this so that a few worker threads constantly pull work from an input queue, and return via an output queue. The difficulty now is to have the last thread to exit finish the output queue.
use threads; use strict; use warnings;
use feature 'say';
use Thread::Queue;
use Thread::Semaphore;
# define I/O queues
my $input_q = Thread::Queue->new;
my $output_q = Thread::Queue->new;
# spawn the workers
my $num_threads = 3;
my $all_finished_s = Thread::Semaphore->new(1 - $num_threads); # a negative start value!
my #workers;
for (1 .. $num_threads) {
push #workers, threads->new( { scalar => 1 }, sub {
while (defined( my $task = $input_q->dequeue )) {
my $result = worker($task);
$output_q->enqueue([$task, $result]);
}
# we get here when the input queue is exhausted.
$all_finished_s->up;
# end the output queue if we are the last thread (the semaphore is > 0).
if ($all_finished_s->down_nb) {
$output_q->enqueue(undef);
}
});
}
# fill the input queue with tasks
my #pieces_of_work = 1 .. 6;
$input_q->enqueue($_) for #pieces_of_work;
# finish the input queue
$input_q->enqueue(undef) for 1 .. $num_threads;
# do something with the data
while (defined( my $result = $output_q->dequeue )) {
my ($task, $answer) = #$result;
say "Task $task produced $answer";
}
# join the workers:
$_->join for #workers;
With worker defined as before, we get:
Task 1 produced 1.15207098293783
Task 4 produced 4.31247785766295
Task 5 produced 5.96967474718984
Task 6 produced 6.2695013168678
Task 2 produced 2.02545636412421
Task 3 produced 3.22281619053999
(The three threads would get joined after all output is printed, so that output would be boring).
This second solution gets a bit simpler when we detach the threads – the main thread won't exit before all threads have exited, because it is listening to the input queue which is finished by the last thread.
I'm learning how to do threading in Perl. I was going over the example code here and adapted the solution code slightly:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;
my $sem = Thread::Semaphore->new(2); # max 2 threads
my #names = ("Kaku", "Tyson", "Dawkins", "Hawking", "Goswami", "Nye");
my #threads = map {
# request a thread slot, waiting if none are available:
foreach my $whiz (#names) {
$sem->down;
threads->create(\&mySubName, $whiz);
}
} #names;
sub mySubName {
return "Hello Dr. " . $_[0] . "\n";
# release slot:
$sem->up;
}
foreach my $t (#threads) {
my $hello = $t->join();
print "$hello";
}
Of course, this is now completely broken and does not work. It results in this error:
C:\scripts\perl\sandbox>threaded.pl
Can't call method "join" without a package or object reference at C:\scripts\perl\sandbox\threaded.pl line 24.
Perl exited with active threads:
0 running and unjoined
9 finished and unjoined
0 running and detached
My objective was two-fold:
Enforce max number of threads allowed at any given time
Provide the array of 'work' for the threads to consume
In the original solution, I noticed that the 0..100; code seems to specify the amount of 'work' given to the threads. However, in my case where I want to supply an array of work, do I still need to supply something similar?
Any guidance and corrections very welcome.
You're storing the result of foreach into #threads rather than the result of threads->create.
Even if you fix this, you collect completed threads too late. I'm not sure how big of a problem that is, but it might prevent more than 64 threads from being started on some systems. (64 is the max number of threads a program can have at a time on some systems.)
A better approach is to reuse your threads. This solves both of your problems and avoids the overhead of repeatedly creating threads.
use threads;
use Thread::Queue 3.01 qw( );
use constant NUM_WORKERS => 2;
sub work {
my ($job) = #_;
...
}
{
my $q = Thread::Queue->new();
for (1..NUM_WORKERS) {
async {
while (my $job = $q->dequeue()) {
work($job);
}
};
}
$q->enqueue(#names); # Can be done over time.
$q->end(); # When you're done adding.
$_->join() for threads->list();
}
I have a Perl script running on version 5.10 build 1004 of ActiveStates Active Perl on windows xp which creates a UI and then runs a long process after a button press. During this process I would like to update the UI (a list box) with status on what is going on during the execution of this thread. Here is a stripped down version of the code.
#!/usr/local/bin/perl
use warnings;
use strict;
use Tkx;
use threads;
use threads::shared;
my $outputText = " {a} {b}";
my $mw = Tkx::widget->new(".");
$mw->g_wm_title("MD5 Checker");
$mw->g_wm_minsize(300,200);
my $content = $mw->new_ttk__frame(-padding => "12 12 12 12");
my $btnCompare = $content->new_ttk__button(-text => "Compare", -command => sub{startWork()});
my $lstbxOutput = $content->new_tk__listbox(-listvariable => \$outputText, -height => 5);
my $scollListBox = $content->new_ttk__scrollbar(-orient => 'vertical', -command => [$lstbxOutput, 'yview']);
$lstbxOutput->configure(-yscrollcommand => [$scollListBox, 'set']);
sub startWork()
{
print "Starting thread \n";
my $t = threads->create(\&doWork, 1);
sleep (5);
print $outputText . "\n";
}
sub doWork()
{
for (my $a = 0; $a<10; $a++)
{
$outputText .= " {$a}";
print "Counting $a\n";
sleep(2);
}
print "End thread\n";
}
Currently the print commands are for my debugging so I know what the main and child threads are doing. From what I have read about threading I need the use threads::shared; to allow threads to share variables. At the moment my list box does not update at all during the child threads execution nor when the thread has ended. Without the threading, the list box would update after the main thread was done with the loop. What am I missing to get the UI to update during the threads execution?
Thanks
Wesley
One problem is that the listbox variable needs to be shared between the threads. Tk doesn't seem happy with the listbox variable shared directly, so I made two copies, and set up a periodic status update to copy the shared version to the non-shared version.
However, using threads with Tkx may be dicey. I was getting segfaults when I tried to join the thread rather than detach it, and I get a segfault with the code below if I move my $t inside startWork(). This discussion suggests that you may need to start the thread before creating any Tk widgets for it to work reliably.
Here is the code I ended up with:
my $outputTextShared :shared = " {a} {b}";
my $outputText = " {a} {b}";
my $t;
sub startWork()
{
print "Starting thread \n";
$t = threads->create(\&doWork, 1);
}
sub updateStatus()
{
$outputText = $outputTextShared;
}
sub doWork()
{
threads->detach();
for (my $a = 0; $a<10; $a++)
{
$outputTextShared .= " {$a}";
print "Counting $a\n";
sleep(1);
}
print "End thread\n";
}
my $update;
$update = sub {
Tkx::after (1000, $update);
updateStatus();
};
Tkx::after (1000, $update);
Tkx::MainLoop();
Threading is nice because the UI doesn't block and you can do things like kill the child process if it's taking too long. That power comes with complexity, though. If all you care about is updating the task status in the UI you can do that without using threads; you just have to do it manually.
$outputText = 'some message';
Tkx::update('idletasks');