Perl thread shared data - multithreading

In my script there are n worker threads (0, 1, ..., n-1), and each one works only on its own (Nth) item of the following arrays. The input array provides input to the threads, and the output array receives their output. A thread never accesses any other item of either array. In that case, should I declare the arrays as shared?
my @ThreadInput :shared = ();
my @ThreadOutput :shared = ();

(I shall call the thread that populates @ThreadInput and consumes @ThreadOutput the "caller".)
Perl variables aren't shared between threads unless marked with :shared. Each thread gets its own copy of every variable not marked with :shared.
So,
If the caller populates @ThreadInput before the workers start, @ThreadInput does not need to be shared, but sharing it avoids creating a copy of the array for each worker.
If the caller populates @ThreadInput after the workers start, @ThreadInput must be shared. If it isn't, changes to the caller's @ThreadInput won't affect the workers' copies.
@ThreadOutput must be shared. If it isn't, changes to a worker's @ThreadOutput won't affect the caller's copy.
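The copy-versus-share behaviour described above can be seen in a few lines. This is only a minimal sketch; the array names @plain and @shared are made up for the illustration and are not part of the original script.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @plain          = (0);   # hypothetical name: each thread gets its own copy
my @shared :shared = (0);   # hypothetical name: one array visible to every thread

# The worker writes to both arrays, then finishes.
threads->create(sub {
    $plain[0]  = 1;   # changes only the worker's private copy
    $shared[0] = 1;   # changes the array the caller also sees
})->join();

print "plain=$plain[0] shared=$shared[0]\n";   # prints "plain=0 shared=1"
```

The caller still sees 0 in the unshared array because the worker modified its own copy, which is why an output array has to be shared.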
It's going to be very hard to reuse workers with that model. You should probably be using something more like the following:
use threads;
use Thread::Queue 3.01; # 3.01+ is needed for ->end(); or Thread::Queue::Any
use constant NUM_WORKERS => ...;
sub handle_request {
    my ($request) = @_;
    return ...response...;
}
{
    my $request_q = Thread::Queue->new();
    my $response_q = Thread::Queue->new();
    my @threads;
    my $threads = 0; # Number of workers still running
    for (1..NUM_WORKERS) {
        ++$threads;
        push @threads, async {
            # dequeue() returns undef once the queue is ended and empty.
            while (defined(my $request = $request_q->dequeue())) {
                $response_q->enqueue([ $request => handle_request($request) ]);
            }
            # Tell the caller that this worker is done.
            $response_q->enqueue(undef);
        };
    }
    ... Add stuff to queue $request_q->enqueue(...) ...
    $request_q->end(); # Can be done later if you want to add more items later.
    while ($threads) {
        my $job = $response_q->dequeue();
        if (!defined($job)) {
            --$threads;
            next;
        }
        my ($request, $response) = @$job;
        ... handle response ...
    }
    $_->join for @threads;
}

Related

How to Start thread only on demand in perl?

In C#, we can create a thread and start it only when needed, like the following (if I am correct):
Thread th = new Thread(function);
th.Start();
But in Perl, the thread starts as soon as I create it. For example:
$thread1 = threads->create("function");
But I want to create 4 threads and start each one only when needed. I also have to check whether a thread is running; if it is not, I have to start the same thread again with a different parameter. How can I do that in Perl?
Multiple jobs can be sent into the queue, where they wait for their turn to be passed to a worker.
use strict;
use warnings;
use threads;
use Thread::Queue;
my $no_of_workers = 4;
my $q = Thread::Queue->new();
# Worker threads
my @thr = map {
    my $t = threads->create(sub {
        # Thread will loop until no more work
        while (defined(my $item = $q->dequeue())) {
            # Do work on $item
            print "$item\n";
        }
    });
    {queue => $q, thread => $t, id => $_};
} 1 .. $no_of_workers;
# Send work to each thread
$_->{queue}->enqueue("Job for thread $_->{id}") for @thr;
# Signal that there is no more work to be sent: one undef per worker.
# With Thread::Queue 3.01 or later, $q->end() does the same job.
$q->enqueue(undef) for @thr;
# Wait for the threads to finish
$_->{thread}->join() for @thr;
Assigning jobs 0..19 to the workers in a circular way:
for my $i (0 .. 19) {
    my $t = $thr[$i % @thr]; # $i % @thr => 0,1,2,3, 0,1,2,3, ..
    $t->{queue}->enqueue("Job for thread $t->{id}");
}
You don't want a queue for each thread! You'll end up with idle threads even if work's available.
use strict;
use warnings;
use threads;
use Thread::Queue 3.01 qw( );
use constant NUM_WORKERS => 4;
sub worker {
    my ($job) = @_;
    print("Job: $job\n");
    sleep(rand(4)); # Pretending to do $job
}
{
    my $q = Thread::Queue->new();
    for (1..NUM_WORKERS) {
        async {
            while (defined(my $item = $q->dequeue())) {
                worker($item);
            }
        };
    }
    # Give the workers work to do.
    $q->enqueue($_) for 1..14;
    # When you're done adding, wait for the workers to finish.
    $q->end();
    $_->join() for threads->list;
}
This code only does 4 threads, and then stops. It doesn't process the remaining 6 items in the queue.

Adding item to Thread::Queue in Perl gives me error "lock can be used only on shared values"

I have a process that accesses a $taskobj, which holds a Thread::Queue queue, and starts child threads. I have no trouble using $taskobj before forking, but after forking, when I call enqueue on the queue object in $taskobj from the child, I get this error:
lock can only be used on shared values
Task.pm
sub new {
    my $class = shift;
    my $self = {};
    $self->{ID} = 0;
    my $taskqueue = Thread::Queue->new();
    $self->{TaskQ} = $taskqueue;
    return bless $self, $class;
}
sub queueing {
    my $self = shift;
    my $id = $self->{ID};
    my $que = $self->{TaskQ};
    $que->enqueue($id); # getting error here after forking
    $self->{ID} += 1;
    return $id;
}
Parent process
sub initializethreads {
    my $taskobj = new task();
    $taskobj->queueing();
}
Child process
use Parallel::ForkManager;
my $paralellprocess = new Parallel::ForkManager();
sub initializethreads {
    my $taskobj = new task();
    $taskobj->queueing; # error
}
You seem to have become very confused. fork and threads are different things, and you are using both Thread::Queue and Parallel::ForkManager. It is a very bad idea to use both unless you absolutely have to.
To work correctly, the Thread::Queue object must be shared across all the threads that need to access the queue. You don't show what you are doing with Parallel::ForkManager, but if you fork a secondary process, $que will no longer be shared between the main process and the forked process unless you make a special arrangement to share it using forks::shared.
Since Thread::Queue is already using threads::shared to share the queue between threads, you would end up with a mess of a program that is highly unlikely to work.
Because you need your data to be shared, I recommend you stick with threads and Thread::Queue and forget about forking. If you prefer, you could try doing it with fork, but not both at the same time.
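A threads-only version of the idea might look like the following. This is a minimal sketch under assumptions: the queue is created before any thread starts, and the names $task_q and @workers, and the message text, are invented for the example rather than taken from the question's Task.pm.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Create the queue before starting any thread so every thread uses the same queue.
my $task_q = Thread::Queue->new();

# Each worker enqueues one item from inside its own thread.
my @workers = map {
    my $id = $_;
    threads->create(sub { $task_q->enqueue("task from worker $id"); });
} 1 .. 3;
$_->join() for @workers;

# Back in the main thread, drain whatever the workers queued (non-blocking).
my @tasks;
while (defined(my $item = $task_q->dequeue_nb())) {
    push @tasks, $item;
}
print scalar(@tasks), " tasks queued\n";   # prints "3 tasks queued"
```

Because no process is forked, the queue stays shared and enqueue works from every thread.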

Perl async tasks for "any" code, no matter what it is?

I've been writing a "checker" system that performs various "checks" on various services, systems, databases, files, etc. A "check" is generic in nature and can be anything. All checks are reported in a common format whether they pass or fail, whatever that may be.
It is written in a modular OO fashion so that developers can simply follow the framework and write checks independently of one another. Each object contains a shared reporting object; after running a check, it simply calls $self->{'reporting'}->report(params). The params are defined and developers are assumed to report appropriately. The reporting object then indexes these reports. My main loader script has entries such as the following:
my $reportingObject = new Checks::Reporting(params);
my @checks;
push @checks, new Checks::Check_One($reportingObject, params);
push @checks, new Checks::Check_One($reportingObject, params);
.
.
push @checks, new Checks::Check_N($reportingObject, params);
To kick off the checks and finalize the report once they are done I have been doing:
foreach my $check (@checks) {
    $check->run_stuff();
}
$reportingObject->finalize_report();
Now since these checks are totally independent (do not worry about the reporting object) they can be run in parallel. As an improvement I have done:
my @threads;
foreach my $check (@checks) {
    push @threads, async { $check->run_stuff(); }
}
foreach my $thread (@threads) {
    $thread->join;
}
#All threads are complete, and thus all checks are done
$reportingObject->finalize_report();
As I said earlier the developers will write Checks independently of each other. Some checks are simple and others are not. The simple checks may not have asynchronous code in them, but others might need to run asynchronously internally such as
sub do_check {
    my $self = shift;
    my @threads;
    my @list = @{$self->{'list'}};
    foreach my $item (@list) {
        push @threads, async {
            #do_work_on_$item
            #return 1 or 0 for success or fail
        };
    }
    foreach my $thread (@threads) {
        my $res = $thread->join;
        if($res == 1) {
            $self->{'reporting'}->report(params_here);
        }
    }
}
As you can see the threading model allows me to do things in very vague terms. Each "Check" no matter what it is runs independently in its own thread. If an individual developer has asynchronous stuff to do, no matter what it is, he simply does it independently in its own thread. I want a model similar to this.
Unfortunately threads are slow and inefficient. All of the async libraries have specific watchers such as IO, etc. I do not want anything specific. I would like an event based model that allows me to simply kick off async tasks, no matter what they are, and simply notify when they are all done so I can move on.
Hopefully that explains it and you can point me in the right direction.
This seems like a good fit for a boss-worker model:
Spawn a few workers at the beginning of the program. Make sure they all have access to a queue.
Enqueue as many checks as you like. The workers dequeue the checks, execute them, and enqueue the results in an output queue.
Your main thread reads the results from the output queue and does whatever it wants with them.
Join the workers in an END block.
You probably want to look at Thread::Queue::Any if there is a chance you want to put coderefs into the queue.
Here is a fully runnable example:
use strict; use feature 'say';
use threads; use threads::shared; use Thread::Queue::Any;
use constant NUM_THREADS => 5;
local $Storable::Deparse = 1; local $Storable::Eval = 1; # needed to serialize code
my $check_q = Thread::Queue::Any->new;
my $result_q = Thread::Queue::Any->new;
# start the workers
{
    my $running :shared = NUM_THREADS;
    my @threads = map threads->new(\&worker, $check_q, $result_q, \$running), 1..NUM_THREADS;
    END { $_->join for @threads }
}
# enqueue the checks
$check_q->enqueue($_) for sub {1}, sub{2}, sub{"hi"}, sub{ die };
$check_q->enqueue(undef) for 1..NUM_THREADS; # end the queue
while(defined( my $result = $result_q->dequeue )) {
    report($$result);
}
sub report {
    say shift // "FAILED";
}
sub worker {
    my ($in, $out, $running_ref) = @_;
    while (defined( my $check = $in->dequeue )) {
        my $result = eval { $check->() };
        $out->enqueue(\$result);
    }
    # last thread closes the door
    lock $$running_ref;
    --$$running_ref || $out->enqueue(undef);
}
This prints
1
2
hi
FAILED
in a slightly random order.

How to implement semaphore thread communication in Perl?

My Perl script needs to run multiple threads simultaneously...
use threads ('yield', 'exit' => 'threads_only');
use threads::shared;
use strict;
use warnings;
no warnings 'threads';
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Async;
use ...
...and such threads need to obtain some information from web, so HTTP::Async is used.
my $request = HTTP::Request->new;
$request->protocol('HTTP/1.1');
$request->method('GET');
$request->header('User-Agent' => '...');
my $async = HTTP::Async->new( slots => 100,
timeout => REQUEST_TIMEOUT,
max_request_time => REQUEST_TIMEOUT );
But some threads need to access web only when other thread(s) says so.
my $start = [Time::HiRes::gettimeofday()];
my @threads = ();
foreach ... {
    $thread = threads->create(
        sub {
            local $SIG{KILL} = sub { threads->exit };
            my $url = shift;
            if ($url ... ) {
                # wait for "go" signal from other threads
            }
            my ($response, $data);
            $request->url($url);
            $data = '';
            $async->add($request);
            while ($response = $async->wait_for_next_response) {
                threads->yield();
                $data .= $response->as_string;
            }
            if ($data ... ) {
                # send "go" signal to waiting threads
            }
        }, $_);
    if (defined $thread) {
        $thread->detach;
        push (@threads, $thread);
    }
}
There might be one or more threads waiting for the "go" signal, and one or more threads that can send it. At the beginning the state of the semaphore is "wait"; once it turns to "go", it stays that way.
Finally, app checks max running time. If threads are running too long, self-termination signal is sent.
my $running;
do {
    $running = 0;
    foreach my $thread (@threads) {
        $running++ if $thread->is_running();
    }
    threads->yield();
} until (($running == 0) ||
         (Time::HiRes::tv_interval($start) > MAX_RUN_TIME));
$running = 0;
foreach my $thread (@threads) {
    if ($thread->is_running()) {
        $thread->kill('KILL');
        $running++;
    }
}
threads->yield();
Now to the point. My questions are:
How can I most effectively code waiting "semaphore" in the script (see comments in script above). Should I simply use just shared variable with some dummy sleep loop?
Do I need to add some sleep loop at the end of app to give time to threads for self-destruction?
You might look at Thread::Queue to perform this work. You could setup a queue that would handle the signaling between the threads waiting for the 'go' signal and the threads sending the 'go' signal. Here's a quick mock-up that I haven't tested:
...
use Thread::Queue;
...
# In main body
my $q = Thread::Queue->new();
...
$thread = threads->create(
    sub {
        local $SIG{KILL} = sub { threads->exit };
        my $url = shift;
        if ($url ... ) {
            # wait for "go" signal from other threads
            my $mesg = $q->dequeue();
            # you could put in some termination code if the $mesg isn't 'go'
            if ($mesg ne 'go') { ... }
        }
        ...
        if ($data ... ) {
            # send "go" signal to waiting threads
            $q->enqueue('go');
        }
    }, $_);
...
The threads that need to wait for a 'go' signal will wait on the dequeue method until something enters the queue. Once a message enters the queue one thread and only one thread will grab the message and process it.
If you wish to stop the threads so that they won't run, you can insert a stop message to the head of the queue.
$q->insert(0, 'stop') foreach (@threads);
There are examples in Thread::Queue and threads CPAN distributions that show this in more detail.
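Since a queue message is consumed by exactly one thread, while the question describes a "go" state that stays set and may release several waiters, one alternative is a shared flag combined with cond_wait and cond_broadcast from threads::shared. The following is only a sketch of that alternative, not the approach from the mock-up above; the helper names wait_for_go and send_go are hypothetical.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $go :shared = 0;   # 0 = "wait", 1 = "go"; once set it is never cleared

# Hypothetical helper: block until the flag has been set.
sub wait_for_go {
    lock($go);
    cond_wait($go) until $go;   # returns at once if "go" was already sent
}

# Hypothetical helper: set the flag and wake every waiting thread.
sub send_go {
    lock($go);
    $go = 1;
    cond_broadcast($go);
}

# Three threads wait for the signal; the main thread sends it.
my @waiters = map {
    threads->create(sub { wait_for_go(); return "released"; })
} 1 .. 3;
send_go();
my @results = map { $_->join() } @waiters;
print "@results\n";   # prints "released released released"
```

Because the flag is checked under the lock before waiting, a thread that arrives after the signal was sent does not block, and no sleep loop is needed.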
In response to your second question, the answer is, unfortunately, it depends. When you proceed to terminate your threads, what kind of clean up is required for a clean shutdown? What's the worst case scenario that could occur if the rug was yanked out from beneath the thread? You would want to plan in any time for the clean up to occur. The other option you could do is wait on each thread to actually complete.
The reason for my comment asking if you could remove the detach call is because this method allows the main thread to exit and not care what was happening to any child threads. Instead, if you remove this call, and add:
$_->join() foreach threads->list();
to the end of your main block, this will require the main application to wait for each thread to actually complete.
If you leave the detach call in place, then you will need to sleep at the end of your code if your threads have to perform any sort of clean-up. When you call detach on a thread, you are telling Perl that you don't care what the thread is doing when your main thread exits. If the main thread exits while detached threads are still running, the program will finish with no warnings. However, if you don't require any clean-up and you still call detach, feel free to exit whenever you like.
Try out something like this....
#!/usr/bin/perl
use threads;
use threads::shared;
$|=1;
my ($global):shared;
my (@threads);
push(@threads, threads->new(\&mySub,1));
push(@threads, threads->new(\&mySub,2));
push(@threads, threads->new(\&mySub,3));
my $i = 0;
foreach my $myThread (@threads)
{
    my @ReturnData = $myThread->join;
    print "Thread $i returned: @ReturnData\n";
    $i++;
}
sub mySub
{
    my ($threadID) = @_;
    for(0..1000)
    {
        { lock($global); $global++; } # lock only around the shared update
        print "Thread ID: $threadID >> $_ >> GLB: $global\n";
        sleep(1);
    }
    return( $threadID );
}

Perl Threads and Unsafe Signals

So I recently wanted to thread one of my Perl programs to increase its speed. Taking in a list of websites, I wanted to start a thread for each URL, get the content of each website, and then look for a company description on the page. Once one thread found a result, or all threads failed to, I wanted to exit, write my result, and read in the URLs for my next company.
The problem that I see is that I use the Perl::Unsafe::Signals module inside the function that I call when creating a thread. I need the unsafe signals to interrupt regular expressions that get "stuck". However, this seems to cause all sorts of problems, mainly the program crashing with the error message "Alarm Clock".
Therefore, is there a way to use Perl::Unsafe::Signals and threads safely? Is there another way to time out a regular expression by sending a signal to the function (like the 'KILL' signal I send below)? Thanks.
Note: I stripped down the code to all pertinent parts, let me know if you need more.
use threads ('exit' => 'threads_only');
use threads::shared;
my @descrip;
share(@descrip);
my $lock;
share($lock);
URL:foreach my $url(@unique_urls) {
#skip blank urls
if(!$url) { next URL; }#if
#find description
my $thread = threads->create(\&findCompanyDescription, $PREV_COMPANY, $PREV_BASE_URL, $url);
#while a description has not been found and there are still active threads, keep looking
#there may be a better way to do this, but this seems to work for me
while(!@descrip && threads->list() != 0) {;}
#kill all threads, write output, read in next batch of urls
my @threads = threads->list();
foreach(@threads) { print("detaching\n"); $_->kill('KILL')->detach(); }#foreach
#######SUBROUTINE CALLED BY THREAD CREATE
sub findCompanyDescription {
    my($company_full, $base_url, $url) = @_;
    my($descrip, $raw_meta, $raw) = '';
    my @company;
    $SIG{'KILL'} = sub { alarm(0); threads->exit(); };
    eval {
        local $SIG{ALRM} = sub { die("alarm\n") }; # NB: \n required
        alarm(5);
        use Perl::Unsafe::Signals;
        UNSAFE_SIGNALS {
            while($company) {
                my @matches = ($content =~ m!.*<([\w\d]+).*?>\s*about\s+$company[\w\s\-_]*<.*?>(?:<.*?>|\s)*(.*?)</\1.*?>!sig);
                MATCH:for(my $ndx=1; $ndx<@matches; $ndx+=2) {
                    ($raw, $descrip) = &filterResult($matches[$ndx], $company_full);
                    if($descrip) {
                        $company = undef;
                        last(MATCH);
                    }#if
                }#for
                #reduce the company name and try again
                $company = &reduceCompanyName($company);
            }#while
            alarm(0);
        };#unsafe_signals
    };#eval
    if($@) {
        if($@ eq "alarm\n" && $DEBUG) { print("\nWebpage Timeout [].\n"); }#if
    }#if
    if($descrip) { lock($lock); {
        @descrip = ($PREV_ID, $company_full, $base_url, $url, 1, $raw, $descrip); }
    }#if
In general, "unsafe" signals are unsafe in both single-threaded and multi-threaded programs. You've only increased your peril by using threads and unsafe signals. Perl's usual safe signal handler sets the flag signal_pending without meaningfully interrupting execution. The VM checks that flag when it's between opcodes.
Your regexp execution is a single, "atomic" opcode. Of course, the regexp engine is itself another VM with its own opcodes, but the perl signal handler currently has no visibility into it.
Frankly, I've no good idea about how to interrupt the regexp engine. It's got some global C state which, prior to perl-5.10, prevented it from being reentrant. It might not be safe for the universal interruptibility you're attempting. If you really want it to be fully interruptible, you might want to fork and have your child process do the regexp and communicate the results back over a pipe.
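require JSON;
require IO::Select;
my $TIMEOUT_SECONDS = 2.5; # seconds
my ( $read, $write );
pipe $read, $write;
my @matches;
my $pid = fork;
die "fork failed: $!" if !defined $pid;
if ( $pid ) {
    # Parent: close our copy of the write end so the read sees EOF when the child is done.
    close $write;
    my $select = IO::Select->new( $read );
    if ( $select->can_read( $TIMEOUT_SECONDS ) ) {
        local $/;
        my $json = <$read>;
        if ( $json ) {
            my $matches_ref = JSON::from_json( $json );
            if ( $matches_ref ) {
                @matches = @$matches_ref;
            }
        }
    }
    else {
        # Timed out: stop the child so waitpid does not block on a stuck regexp.
        kill 'KILL', $pid;
    }
    waitpid $pid, 0;
}
else {
    # Child: run the regexp and send the matches back as JSON.
    close $read;
    my @r = $content =~ m!.*<([\w\d]+).*?>\s*about\s+$company[\w\s\-_]*<.*?>(?:<.*?>|\s)*(.*?)</\1.*?>!sig;
    my $json = JSON::to_json( \@r );
    print { $write } $json;
    close $write;
    exit;
}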
IMHO, mixing signals and threads is a challenging task per se (i.e. w/o perl-specific things).
Remember that even in a single-threaded program you can safely call only async-signal-safe functions from the signal handler because the program may be interrupted at any point.
Perl adds another layer of abstraction, so I have no idea about the safety of calling "die" from a signal handler when unsafe signals are in use.
If I remember properly, SIGALRM is an asynchronous signal, so it must be handled synchronously. Your way of handling it is generally incorrect in multi-threaded programs.
Moreover, IMHO perl threads just do not work as most people expect.
Just avoid using them and use processes instead.
P.S.
The following line doesn't make sense:
$SIG{'KILL'} = sub { alarm(0); threads->exit(); };
SIGKILL (as well as SIGSTOP) cannot be caught.
I'm not really a specialist on Perl multithreading, but one thing you are apparently missing is that signals are global to the whole process; they are not thread-specific. On POSIX systems you can't set a signal handler for a single thread: signals are delivered to the whole process. In other words, an alarm() call affects the whole process, not only the thread that calls it. And even local %SIG in a multi-threaded context doesn't do what one might think it does, because local is a thing of syntax.

Resources