I have a Perl script running on version 5.10 build 1004 of ActiveStates Active Perl on windows xp which creates a UI and then runs a long process after a button press. During this process I would like to update the UI (a list box) with status on what is going on during the execution of this thread. Here is a stripped down version of the code.
#!/usr/local/bin/perl
use warnings;
use strict;
use Tkx;
use threads;
use threads::shared;
my $outputText = " {a} {b}";
my $mw = Tkx::widget->new(".");
$mw->g_wm_title("MD5 Checker");
$mw->g_wm_minsize(300,200);
my $content = $mw->new_ttk__frame(-padding => "12 12 12 12");
my $btnCompare = $content->new_ttk__button(-text => "Compare", -command => sub{startWork()});
my $lstbxOutput = $content->new_tk__listbox(-listvariable => \$outputText, -height => 5);
my $scollListBox = $content->new_ttk__scrollbar(-orient => 'vertical', -command => [$lstbxOutput, 'yview']);
$lstbxOutput->configure(-yscrollcommand => [$scollListBox, 'set']);
sub startWork()
{
print "Starting thread \n";
my $t = threads->create(\&doWork, 1);
sleep (5);
print $outputText . "\n";
}
sub doWork()
{
for (my $a = 0; $a<10; $a++)
{
$outputText .= " {$a}";
print "Counting $a\n";
sleep(2);
}
print "End thread\n";
}
Currently the print commands are for my debugging so I know what the main and child threads are doing. From what I have read about threading I need the use threads::shared; to allow threads to share variables. At the moment my list box does not update at all during the child threads execution nor when the thread has ended. Without the threading, the list box would update after the main thread was done with the loop. What am I missing to get the UI to update during the threads execution?
Thanks
Wesley
One problem is that the listbox variable needs to be shared between the threads. Tk doesn't seem happy with the listbox variable shared directly, so I made two copies, and set up a periodic status update to copy the shared version to the non-shared version.
However, using threads with Tkx may be dicey. I was getting segfaults when I tried to join the thread rather than detach it, and I get a segfault with the code below if I move my $t inside startWork(). This discussion suggests that you may need to start the thread before creating any Tk widgets for it to work reliably.
Here is the code I ended up with:
my $outputTextShared :shared = " {a} {b}";
my $outputText = " {a} {b}";
my $t;
sub startWork()
{
print "Starting thread \n";
$t = threads->create(\&doWork, 1);
}
sub updateStatus()
{
$outputText = $outputTextShared;
}
sub doWork()
{
threads->detach();
for (my $a = 0; $a<10; $a++)
{
$outputTextShared .= " {$a}";
print "Counting $a\n";
sleep(1);
}
print "End thread\n";
}
my $update;
$update = sub {
Tkx::after (1000, $update);
updateStatus();
};
Tkx::after (1000, $update);
Tkx::MainLoop();
Threading is nice because the UI doesn't block and you can do things like kill the child process if it's taking too long. That power comes with complexity, though. If all you care about is updating the task status in the UI you can do that without using threads; you just have to do it manually.
$outputText = 'some message';
Tkx::update('idletasks');
Related
$^T stores the start time of a perl program in second since epoch.
Because I need to know how many seconds a child thread costs, the question is:
Does $^T in child thread store the beginning time of itself? or simply copy the value from its mother thread?
Running this:
#!/usr/bin/env perl
use strict;
use warnings;
sub test_th {
print $^T,"\n";
}
print $^T."\n"
sleep 10;
my $thr = threads -> create ( \&test_th );
$thr -> join;
Prints the same value twice.
Which is as expected, since when you thread, you effectively inherit all your parent variables.
If you try this via forking:
#!/usr/bin/env perl
use strict;
use warnings;
use Parallel::ForkManager;
print $^T, "\n";
for ( 1 .. 2 ) {
sleep 10;
$mgr->start and next;
print $^T, "\n";
$mgr->finish;
}
$mgr->wait_all_children;
You get the same value, despite the 'start' of the fork being 10s later.
So to answer your question - no, $^T is started at program instantiation. If you wish to measure things like thread run times, you'll have to find other ways of doing it.
Although, given "elapsed time" is at best a very crude metric (processors doing things like scheduling, such that 'real time' and 'run time' don't really correlate particularly)
But perhaps calling time() at start and end of each thread would give you what you need? Or perhaps something like Devel::NYTProf?
A quick test will reveal that $^T is defined for the populated at process startup, not at thread startup
But nothing's stopping you from noting when the thread starts. You can even save the time stamp in $^T since it's a per-thread variable!
use feature qw( say );
use threads;
sub thread {
my ($n) = #_;
sleep $n;
}
sub wrapper {
my ($n) = #_;
$^T = time;
thread($n);
say sprintf "Thread %s ran for %s seconds.", threads->tid, time-$^T;
}
async { wrapper(5) };
sleep 2;
async { wrapper(2) };
$_->join for threads->list;
Output:
Thread 2 ran for 2 seconds.
Thread 1 ran for 5 seconds.
Note that assigning to $^T coerces the stored value into an integer, so it would not be an appropriate place to store the result of Time::HiRes::time().
I have a code similar to the below. I have one main script which is calling another module named initial.pm. initial.pm opens up connection with an AMQP server (In my case RabbitMQ)and using Net::AMQP::RabbitMQ library for establishing the connection. Everything works fine except when I try to join my threads I get segmentation fault.
I think the Net::AMQP::RabbitMQ is not thread safe. But this is only being used by the main thread. Im pretty sure you can reproduce the error if you just copy and past the codes below.
How do I fix it ?
main.pl
#!/usr/bin/perl
use Cwd qw/realpath/;
use File::Basename qw/dirname/;
use lib 'lib';
use threads;
use threads::shared;
use initial;
my #threads = ();
my $run :shared = 1;
my $init = load initial($name);
$SIG{'TERM'} = sub {
$run = 0;
};
threads->create(\&proc1);
threads->create(\&proc2);
while($run){
sleep(1);
print "I am main thread\n";
}
$_->join() for threads->list();
sub proc1 {
while($run){
sleep(1);
print "I am child thread 1 \n"
}
}
sub proc2 {
while($run){
sleep(1);
print "I am child thread 2 \n";
}
}
lib/initial.pm
package initial;
use Net::AMQP::RabbitMQ;
use Cwd qw/realpath/;
use File::Basename qw/dirname/;
my $mq;
my $stop = 0;
sub load {
my $class = shift;
my $self = {};
connectq();
bless $self,$class;
return $self;
}
sub connectq {
$mq = Net::AMQP::RabbitMQ->new();
my ($host,$port,$user,$pass) = ('localhost','5672','guest','guest');
$mq->connect($host, {
user => $user,
password => $pass,
port => $port,
timeout => 10,
});
$mq->channel_open(1);
$mq->consume(1, 'logger');
}
1;
I can't reproduce your problem directly, because I don't have the library installed.
One way of 'faking' thread safety in a not-thread-safe module is to rescope your 'use' to only the bit where you'll be using it.
You see, when you start a thread, it copies the program state - loaded libraries and everything.
If your run (something like):
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
sub thread1 {
print threads->self->tid.": Includes:", Dumper \%INC,"\n";
}
#main;
print "Main includes:", Dumper \%INC,"\n";
threads -> create ( \&thread1 );
You'll see XML::Twig is loaded in both. If the process of 'loading' the module causes some state changes (and it can) then you immediately have a potential thread-safety issue.
However if you instead do:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Data::Dumper;
sub thread1 {
require XML::Twig;
XML::Twig -> import;
print threads->self->tid.": Includes:", Dumper (\%INC),"\n";
}
#main;
print "Main includes:", Dumper (\%INC),"\n";
threads -> create ( \&thread1 );
foreach my $thr ( threads -> list ) {
$thr -> join;
}
You effectively cause the module to be dynamically loaded within the thread - the module is only present in one 'code instance' so you are much less likely to be tripped up by 'thread safety' issues.
Alternatively - forking instead of threading ... might be an alternative. This has slightly different 'safety' problems.
But there really is no way to avoid this. Even with shared variables, the core problem is - when you thread, bits of code happen in a different order. There's all sorts of fruity things that can happen as a result. A shared var is one way of ensuring it's the same variable being checked each time - e.g. share $init, but that may well make things worse, because you're then potentially trampling over the same instance/sockets with different threads.
You can, however, reduce the 'thread safety' problem to a limited scope, and use e.g. Thread::Queue to pass messages to/from your 'module user' thread.
I am a newbie to perl, so please excuse my ignorance. (I'm using windows 7)
I have borrowed echicken's threads example script and wanted to use it as a basis for a script to make a number of system calls, but I have run into an issue which is beyond my understanding. To illustrate the issue I am seeing, I am doing a simple ping command in the example code below.
$nb_process is the number or simultaneous running threads allowed.
$nb_compute as the number of times we want to run the sub routine (i.e the total number of time we will issue the ping command).
When I set $nb_compute and $nb_process to be same value as each other, it works perfectly.
However when I reduce $nb_process (to restrict the number of running threads at any one time), it seems to lock once the number of threads defined in $nb_process have started.
It works fine if I remove the system call (ping command).
I see the same behaviour for other system calls (it'd not just ping).
Please could someone help? I have provided the script below.
#!/opt/local/bin/perl -w
use threads;
use strict;
use warnings;
my #a = ();
my #b = ();
sub sleeping_sub ( $ $ $ );
print "Starting main program\n";
my $nb_process = 3;
my $nb_compute = 6;
my $i=0;
my #running = ();
my #Threads;
while (scalar #Threads < $nb_compute) {
#running = threads->list(threads::running);
print "LOOP $i\n";
print " - BEGIN LOOP >> NB running threads = ".(scalar #running)."\n";
if (scalar #running < $nb_process) {
my $thread = threads->new( sub { sleeping_sub($i, \#a, \#b) });
push (#Threads, $thread);
my $tid = $thread->tid;
print " - starting thread $tid\n";
}
#running = threads->list(threads::running);
print " - AFTER STARTING >> NB running Threads = ".(scalar #running)."\n";
foreach my $thr (#Threads) {
if ($thr->is_running()) {
my $tid = $thr->tid;
print " - Thread $tid running\n";
}
elsif ($thr->is_joinable()) {
my $tid = $thr->tid;
$thr->join;
print " - Results for thread $tid:\n";
print " - Thread $tid has been joined\n";
}
}
#running = threads->list(threads::running);
print " - END LOOP >> NB Threads = ".(scalar #running)."\n";
$i++;
}
print "\nJOINING pending threads\n";
while (scalar #running != 0) {
foreach my $thr (#Threads) {
$thr->join if ($thr->is_joinable());
}
#running = threads->list(threads::running);
}
print "NB started threads = ".(scalar #Threads)."\n";
print "End of main program\n";
sub sleeping_sub ( $ $ $ ) {
my #res2 = `ping 136.13.221.34`;
print "\n#res2";
sleep(3);
}
The main problem with your program is that you have a busy loop that tests whether a thread can be joined. This is wasteful. Furthermore, you could reduce the amount of global variables to better understand your code.
Other eyebrow-raiser:
Never ever use prototypes, unless you know exactly what they mean.
The sleeping_sub does not use any of its arguments.
You use the threads::running list a lot without contemplating whether this is actually correct.
It seems you only want to run N workers at once, but want to launch M workers in total. Here is a fairly elegant way to implement this. The main idea is that we have a queue between threads where threads that just finished can enqueue their thread ID. This thread will then be joined. To limit the number of threads, we use a semaphore:
use threads; use strict; use warnings;
use feature 'say'; # "say" works like "print", but appends newline.
use Thread::Queue;
use Thread::Semaphore;
my #pieces_of_work = 1..6;
my $num_threads = 3;
my $finished_threads = Thread::Queue->new;
my $semaphore = Thread::Semaphore->new($num_threads);
for my $task (#pieces_of_work) {
$semaphore->down; # wait for permission to launch a thread
say "Starting a new thread...";
# create a new thread in scalar context
threads->new({ scalar => 1 }, sub {
my $result = worker($task); # run actual task
$finished_threads->enqueue(threads->tid); # report as joinable "in a second"
$semaphore->up; # allow another thread to be launched
return $result;
});
# maybe join some threads
while (defined( my $thr_id = $finished_threads->dequeue_nb )) {
join_thread($thr_id);
}
}
# wait for all threads to be finished, by "down"ing the semaphore:
$semaphore->down for 1..$num_threads;
# end the finished thread ID queue:
$finished_threads->enqueue(undef);
# join any threads that are left:
while (defined( my $thr_id = $finished_threads->dequeue )) {
join_thread($thr_id);
}
With join_thread and worker defined as
sub worker {
my ($task) = #_;
sleep rand 2; # sleep random amount of time
return $task + rand; # return some number
}
sub join_thread {
my ($tid) = #_;
my $thr = threads->object($tid);
my $result = $thr->join;
say "Thread #$tid returned $result";
}
we could get the output:
Starting a new thread...
Starting a new thread...
Starting a new thread...
Starting a new thread...
Thread #3 returned 3.05652608754778
Starting a new thread...
Thread #1 returned 1.64777186731541
Thread #2 returned 2.18426146087901
Starting a new thread...
Thread #4 returned 4.59414651998983
Thread #6 returned 6.99852684265667
Thread #5 returned 5.2316971836585
(order and return values are not deterministic).
The usage of a queue makes it easy to tell which thread has finished. Semaphores make it easier to protect resources, or limit the amount of parallel somethings.
The main benefit of this pattern is that far less CPU is used, when contrasted to your busy loop. This also shortens general execution time.
While this is a very big improvement, we could do better! Spawning threads is expensive: This is basically a fork() without all the copy-on-write optimizations on Unix systems. The whole interpreter is copied, including all variables, all state etc. that you have already created.
Therefore, as threads should be used sparingly, and be spawned as early as possible. I already introduced you to queues that can pass values between threads. We can extend this so that a few worker threads constantly pull work from an input queue, and return via an output queue. The difficulty now is to have the last thread to exit finish the output queue.
use threads; use strict; use warnings;
use feature 'say';
use Thread::Queue;
use Thread::Semaphore;
# define I/O queues
my $input_q = Thread::Queue->new;
my $output_q = Thread::Queue->new;
# spawn the workers
my $num_threads = 3;
my $all_finished_s = Thread::Semaphore->new(1 - $num_threads); # a negative start value!
my #workers;
for (1 .. $num_threads) {
push #workers, threads->new( { scalar => 1 }, sub {
while (defined( my $task = $input_q->dequeue )) {
my $result = worker($task);
$output_q->enqueue([$task, $result]);
}
# we get here when the input queue is exhausted.
$all_finished_s->up;
# end the output queue if we are the last thread (the semaphore is > 0).
if ($all_finished_s->down_nb) {
$output_q->enqueue(undef);
}
});
}
# fill the input queue with tasks
my #pieces_of_work = 1 .. 6;
$input_q->enqueue($_) for #pieces_of_work;
# finish the input queue
$input_q->enqueue(undef) for 1 .. $num_threads;
# do something with the data
while (defined( my $result = $output_q->dequeue )) {
my ($task, $answer) = #$result;
say "Task $task produced $answer";
}
# join the workers:
$_->join for #workers;
With worker defined as before, we get:
Task 1 produced 1.15207098293783
Task 4 produced 4.31247785766295
Task 5 produced 5.96967474718984
Task 6 produced 6.2695013168678
Task 2 produced 2.02545636412421
Task 3 produced 3.22281619053999
(The three threads would get joined after all output is printed, so that output would be boring).
This second solution gets a bit simpler when we detach the threads – the main thread won't exit before all threads have exited, because it is listening to the input queue which is finished by the last thread.
I'm learning how to do threading in Perl. I was going over the example code here and adapted the solution code slightly:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;
my $sem = Thread::Semaphore->new(2); # max 2 threads
my #names = ("Kaku", "Tyson", "Dawkins", "Hawking", "Goswami", "Nye");
my #threads = map {
# request a thread slot, waiting if none are available:
foreach my $whiz (#names) {
$sem->down;
threads->create(\&mySubName, $whiz);
}
} #names;
sub mySubName {
return "Hello Dr. " . $_[0] . "\n";
# release slot:
$sem->up;
}
foreach my $t (#threads) {
my $hello = $t->join();
print "$hello";
}
Of course, this is now completely broken and does not work. It results in this error:
C:\scripts\perl\sandbox>threaded.pl
Can't call method "join" without a package or object reference at C:\scripts\perl\sandbox\threaded.pl line 24.
Perl exited with active threads:
0 running and unjoined
9 finished and unjoined
0 running and detached
My objective was two-fold:
Enforce max number of threads allowed at any given time
Provide the array of 'work' for the threads to consume
In the original solution, I noticed that the 0..100; code seems to specify the amount of 'work' given to the threads. However, in my case where I want to supply an array of work, do I still need to supply something similar?
Any guidance and corrections very welcome.
You're storing the result of foreach into #threads rather than the result of threads->create.
Even if you fix this, you collect completed threads too late. I'm not sure how big of a problem that is, but it might prevent more than 64 threads from being started on some systems. (64 is the max number of threads a program can have at a time on some systems.)
A better approach is to reuse your threads. This solves both of your problems and avoids the overhead of repeatedly creating threads.
use threads;
use Thread::Queue 3.01 qw( );
use constant NUM_WORKERS => 2;
sub work {
my ($job) = #_;
...
}
{
my $q = Thread::Queue->new();
for (1..NUM_WORKERS) {
async {
while (my $job = $q->dequeue()) {
work($job);
}
};
}
$q->enqueue(#names); # Can be done over time.
$q->end(); # When you're done adding.
$_->join() for threads->list();
}
I'm trying to implement a multithreaded application based on a slightly altered boss/worker model. Basically the main thread creates several boss threads, which in turn spawn two worker threads each (possibly more). That's because the boss threads deal with one host or network device each, and the worker threads could take a while to complete their work.
I'm using Thread::Pool to realize this concept, and so far it works quite well; I also don't think my problem is related to Thread::Pool (see below). Very simplified pseudocode ahead:
use strict;
use warnings;
my $bosspool = create_bosspool(); # spawns all boss threads
my $taskpool = undef; # created in each boss thread at
# creation of each boss thread
# give device jobs to boss threads
while (1) {
foreach my $device ( #devices ) {
$bosspool->job($device);
}
sleep(1);
}
# This sub is called for jobs passed to the $bosspool
sub process_boss
{
my $device = shift;
foreach my $task ( $device->{tasks} ) {
# process results as they become available
process_result() while ( $taskpool->results );
# give task jobs to task threads
scalar $taskpool->job($device, $task);
sleep(1); ### HACK ###
}
# process remaining results / wait for all tasks to finish
process_result() while ( $taskpool->results || $taskpool->todo );
# happy result processing
}
sub process_result
{
my $result = $taskpool->result_any();
# mangle $result
}
# This sub is called for jobs passed to the $taskpool of each boss thread
sub process_task
{
# not so important stuff
return $result;
}
By the way, the reason I'm not using the monitor()-routine is because I have to wait for all jobs in the $taskpool to finish. Now, this code works just wonderful, unless you remove the ### HACK ### line. Without sleeping, $taskpool->todo() won't deliver the right number of jobs still open if you add them or receive their results too "fast". Like, you add 4 jobs in total but $taskpool->todo() will only return 2 afterwards (with no pending results). This leads to all sorts of interesting effects.
OK, so Thread::Pool->todo() is crap, let's try a workaround:
sub process_boss
{
my $device = shift;
my $todo = 0;
foreach my $task ( $device->{tasks} ) {
# process results as they become available
while ( $taskpool->results ) {
process_result();
$todo--;
}
# give task jobs to task threads
scalar $taskpool->job($device, $task);
$todo++;
}
# process remaining results / wait for all tasks to finish
while ( $todo ) {
process_result();
sleep(1); ### HACK ###
$todo--;
}
}
This will also work fine, as long as I keep the ### HACK ### line. Without this line, this code will reproduce the problems of Thread::Pool->todo(), as $todo does not only get decremented by 1, but 2 or even more.
I've tested this code with only one boss thread, so there was basically no multithreading involved (when it comes to this subroutine). $bosspool, $taskpool and especially $todo aren't :shared, no side effects possible, right? What's happening in this subroutine, which gets executed by only one boss thread, with no shared variables, semaphores, etc.?
I would suggest that the best way to implement a 'worker' threads model, is with Thread::Queue. The problem with doing something like this, is figuring out when queues are complete, or whether items are dequeued and pending processing.
With Thread::Queue you can use a while loop to fetch elements from the queue, and end the queue, such that the while loop returns undef and the threads exit.
So you don't always need multiple 'boss' threads, you can just use multiple different flavours of worker and input queues. I would question why you need a 'boss' thread model in that instance. It seems unnecessary.
With reference to:
Perl daemonize with child daemons
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
my $nthreads = 4;
my #targets = qw ( device1 device2 device3 device4 );
my $task_one_q = Thread::Queue->new();
my $task_two_q = Thread::Queue->new();
my $results_q = Thread::Queue->new();
sub task_one_worker {
while ( my $item = task_one_q->dequeue ) {
#do something with $item
$results_q->enqueue("$item task_one complete");
}
}
sub task_two_worker {
while ( my $item = task_two_q->dequeue ) {
#do something with $item
$results_q->enqueue("$item task_two complete");
}
}
#start threads;
for ( 1 .. $nthreads ) {
threads->create( \&task_one_worker );
threads->create( \&task_two_worker );
}
foreach my $target (#targets) {
$task_one_q->enqueue($target);
$task_two_q->enqueue($target);
}
$task_one_q->end;
$task_two_q->end;
#Wait for threads to exit.
foreach my $thr ( threads->list() ) {
threads->join();
}
$results_q->end();
while ( my $item = $results_q->dequeue() ) {
print $item, "\n";
}
You could do something similar with a boss thread if you were desirous - you can create a queue per boss and pass it by reference to the workers. I'm not sure that it's necessary though.