Why does a shared perl hash not share updates with other threads?

I'm trying to make a web server whose requests are farmed out to a set of interpreters hidden behind open2(), based on which 'device' is indicated in the cgi parameters.
The trouble is, I want it multi-threaded, but the hash I'm using to keep track of each device's event queue doesn't remember the device created for each request: the server below only ever prints this sort of thing:
Did not find default-device in (alreadyThere)...
Added default-device with Sun Oct 27 20:43:35 2013 to alreadyThere, default-device
Now... does (alreadyThere, default-device) persist for the next request?
Here is the script:
#!/usr/bin/perl -w
use strict;
use threads;
use threads::shared;

use base qw(Net::Server::HTTP);

our $monkeys = shared_clone({ alreadyThere => { 'a' => 'b' } });

sub process_http_request {
    require CGI;
    my $cgi    = CGI->new;
    my $device = $cgi->param('device') || 'default-device';

    print "HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\n<pre>";
    unless (exists $monkeys->{$device}) {
        print "Did not find $device in (" . join(", ", sort keys %$monkeys) . ")...\n";
        lock $monkeys;
        unless (exists $monkeys->{$device}) {
            my $t = localtime;
            $monkeys->{$device} = $t;
            print "\nAdded $device with " . $t . " to " . join(", ", sort keys %$monkeys);
        } else {
            print "\nSurprise device... " . $device;
        }
    } else {
        print "\nFound device... " . $device;
    }
    print "\nNow... does (" . join(", ", sort keys %$monkeys) . ") persist for the next request?</pre>";
}

__PACKAGE__->run(port => 8080);
It's not the $t bit - that was previously shared_clone({ id => $t }), but I'm darned if I can see why $monkeys never seems to update.

The different requests are served by different processes, not threads.
Net::Server doesn't have a multi-threaded "personality"[1], so you're going to have to use a different sharing mechanism.
Notes:
[1] "in the near future, we would like to add a 'Thread' personality"

Building on Ikegami's answer, I'm trying to fake a 'threaded' personality with this additional code, with some success (and some problems with 'open3' misbehaving):
sub default_server_type { 'Single' }

sub loop {
    my $self = shift;
    while ($self->accept) {
        async {
            $self->run_client_connection;
        };
        last if $self->done;
    }
}

a) Is there any reason to use Net::Server::HTTP instead of the higher-level and easier-to-use Plack?
b) I've had to solve a problem not unlike this one recently, and settled on an event-based httpd with AnyEvent (or the higher abstraction, Coro). There's Net::Server::Coro if you need a drop-in replacement for your code, or even a plethora of canned AnyEvent-based httpds like Twiggy, Feersum, etc.
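For comparison, here is what a minimal PSGI counterpart might look like (a sketch of my own, not from the comment; note %devices is still per-process state, so the cross-process caveat above applies to it too):
# app.psgi - run with e.g. `plackup app.psgi` or `plackup -s Twiggy app.psgi`
use strict;
use warnings;
use Plack::Request;

my %devices;    # per-process state; separate workers won't share it

my $app = sub {
    my ($env)  = @_;
    my $device = Plack::Request->new($env)->param('device') || 'default-device';
    $devices{$device} //= scalar localtime;
    return [ 200,
             [ 'Content-Type' => 'text/plain' ],
             [ "$device => $devices{$device}\n" ] ];
};

$app;    # a .psgi file must end with the app coderef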

Related

Running self-feeding channels in Perl 6

I would like to set up a number of threads operating concurrently on a channel, and every one of those threads should be also feeding the channel. One of the threads would decide when to stop. However, this is the closest I have come to doing that:
use Algorithm::Evolutionary::Simple;

my $length = 32;

my $supplier     = Supplier.new;
my $supply       = $supplier.Supply;
my $channel-one  = $supply.Channel;

my $pairs-supply = $supply.batch( elems => 2 );
my $channel-two  = $pairs-supply.Channel;

my $single = start {
    react {
        whenever $channel-one -> $item {
            say "via Channel 1:", max-ones($item);
        }
    }
}

my $pairs = start {
    react {
        whenever $channel-two -> @pair {
            my @new-chromosome = crossover( @pair[0], @pair[1] );
            say "In Channel 2: ", @new-chromosome;
            $supplier.emit( @new-chromosome[0] );
            $supplier.emit( @new-chromosome[1] );
        }
    }
}

await (^10).map: -> $r {
    start {
        sleep $r/100.0;
        $supplier.emit( random-chromosome($length) );
    }
}

$supplier.done;
This stops after a number of emissions, and it's probably not running concurrently anyway. I am using channels instead of supplies and taps because the latter run asynchronously, not concurrently. I need supplies because I want a pseudo-channel that takes the elements in pairs, as is done above; I haven't seen a way of doing that with pure channels.
There is no difference above if I change the supply's emit to channel's send.
So, several questions here:
Are these react blocks run in different threads? If not, what would be the way of doing that?
Even if they are not, why does it stop even if $pairs is emitting to the channel all the time?
Could I have "batch" channels created automatically from single-item channels?
Update 1: if I eliminate $supplier.done from the end, it will just block. If I create a promise in whenever, one for each read, it just blocks and does nothing.
The answer is here, stripped down to the minimum necessary:
my Channel $c .= new;
my Channel $c2 = $c.Supply.batch( elems => 2 ).Channel;
my Channel $output .= new;
my $count = 0;

$c.send(1) for ^2;

my $more-work = start react whenever $c2 -> @item {
    if ( $count++ < 32 ) {
        $c.send( @item[1] );
        my $sum = sum @item;
        $c.send( $sum );
        $output.send( $sum );
    } else {
        $c.close;
    }
}
await $more-work;

loop {
    if my $item = $output.poll {
        $item.say;
    } else {
        $output.close;
    }
    if $output.closed { last };
}
A second channel, which batches the first channel's elements in twos, is created by turning the first channel into a supply ($c.Supply), batching that supply in batches of two (batch( elems => 2 )), and turning it back into a channel. A third channel is created for output.
To avoid exhausting the supply and hanging the channel, every second element read from the first (and, actually, only) channel is put back into it. That way the second channel, which reads in twos, is never left hung or waiting for new elements.
Every new element is also sent to the output channel, with an external counter to finish the operation when needed; that output channel is read in a non-blocking way and closed when there's nothing left to read, in the last lines.
To answer my original questions precisely:
Yes, they are, only they are stealing elements from each other.
Because the two threads were reading from the same channel; the first one to stumble onto an element reads it.
Yes, by turning channels into supplies, batching them, and turning them back into channels. Just bear in mind that they are not copies; they will be sharing the selfsame elements.

Perl: share complex data-structure between threads

I'd like to share a complex data structure between threads.
As far as I know, that is not possible with threads::shared (only basic types are shareable).
So I'm thinking about serializing/deserializing the structure with JSON or Storable, so it is just a string, which I can share perfectly well. But I need to unpack it before use and pack it after a change.
Is that a common way to work around this problem?
Are there better ways?
Would you prefer JSON or Storable, or something else?
Thanks for the help!
EDIT
I just did some tests with Storable and JSON. JSON is quicker and produces smaller serialized strings. I did not expect that.
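For anyone who wants to reproduce that comparison, a minimal round-trip benchmark sketch (the module choice and test structure are mine; JSON::PP is in core, and JSON::XS, if installed, is much faster):
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use JSON::PP  qw(encode_json decode_json);
use Storable  qw(freeze thaw);

# A made-up structure; results will vary with shape and size.
my $data = { map { "key$_" => [ 1 .. 20 ] } 1 .. 100 };

cmpthese( -2, {    # run each candidate for at least 2 CPU seconds
    json     => sub { my $copy = decode_json( encode_json($data) ) },
    storable => sub { my $copy = thaw( freeze($data) ) },
} );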
When dealing with this problem, I use Thread::Queue to pass my objects around, and usually use Storable to serialise.
I haven't bothered doing performance comparisons, because usually my data-passing overhead isn't the limiting factor.
Note - the key advantage of Storable is that it allows some limited object support (but be careful - it only works if your object is self-contained):
#!/usr/bin/env perl
use strict;
use warnings;

package MyObject;

sub new {
    my ( $class, $id ) = @_;
    my $self = {};
    $self->{id}           = $id;
    $self->{access_count} = 0;
    bless $self, $class;
    return $self;
}

sub access_thing {
    my ($self) = @_;
    return $self->{access_count}++;
}

sub get_id {
    my ($self) = @_;
    return $self->{id};
}

package main;

use threads;
use Thread::Queue;
use Storable qw ( freeze thaw );

my $thread_count = 10;
my $work_q       = Thread::Queue->new;

sub worker {
    while ( my $item = $work_q->dequeue ) {
        my $obj = thaw($item);
        print $obj->get_id, ": ", $obj->access_thing, "\n";
    }
}

for ( 1 .. $thread_count ) {
    threads->create( \&worker );
}

for my $id ( 0 .. 1000 ) {
    my $obj = MyObject->new($id);
    $work_q->enqueue( freeze($obj) );
}
$work_q->end;

$_->join for threads->list;
JSON, by contrast, would limit you to array/hash data structures - which may be fine for your use case.
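To make the object-support difference concrete, a small sketch (note encode_json refuses blessed refs by default, which is why the hash is copied into a plain one first):
use strict;
use warnings;
use JSON::PP  qw(encode_json);
use Storable  qw(freeze thaw);

my $obj  = bless { id => 1 }, 'MyObject';

my $copy = thaw( freeze($obj) );       # round-trips as a MyObject
print ref $copy, "\n";                 # prints "MyObject"

print encode_json( { %$obj } ), "\n";  # plain data only; the class is lost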
Complex data structures can be shared using shared_clone. The components of the data structure need to be cloned and shared before being added to it.
use strict;
use feature 'say';
use Data::Dump qw(dd);

use threads;
use threads::shared;

my $cds = {
    k1 => shared_clone( { k1_l2 => [ 1..2   ] } ),
    k2 => shared_clone( { k2_l2 => [ 10..11 ] } )
};

my @threads = map { async(\&proc_ds, $cds->{$_}) } keys %$cds;
$_->join() for @threads;

dd $cds;

sub proc_ds {
    my ($ds) = @_;
    lock $ds;
    push @{ $ds->{$_} }, 10 + threads->tid for keys %$ds;
}
Note that you don't want to allow autovivification when working with shared values, as it would create unshared (and empty) components in the structure. Check explicitly for existence.
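For instance, a sketch of that advice against the structure above:
# guard the lookup instead of letting it autovivify along the way
if ( exists $cds->{k1} and exists $cds->{k1}{k1_l2} ) {
    push @{ $cds->{k1}{k1_l2} }, 42;
}
# a bare $cds->{k3}{new} = 1; would quietly autovivify $cds->{k3} first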
An already-built data structure needs to be cloned-and-shared:
my $cds = { k => [ 5..7 ] };   # already built, needs to be shared
my $cds_share = shared_clone( $cds );

my @threads = map { async(\&proc_ds, $cds_share) } 1..3;
$_->join() for @threads;
With the same proc_ds() as above, this prints the structure (condensed output):
{ 'k' => [ '5', '6', '7', '11', '12', '13' ] };
When the data structure is populated for sharing, as in the first example, there is less overhead to pay. Otherwise a data copy is involved, as in the second example, and whether that is OK depends on the details (data size, how often a copy is made, etc.).
The idea of serializing data is workable as well, but how suitable it is again depends on the details, since in that case you'd not only copy data but might hit the disk as well.
In that case JSON is certainly one good way to go, being a data format that is simple, readable, and shareable between tools. Storable is binary, works directly with Perl data structures, and is supposed to be fast (which should show with larger data).
One other option is to use a worker model and pass data over a message queue. Then you'd use Thread::Queue, or perhaps make use of Thread::Queue::Any, for communication channels.
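If that route appeals, a minimal sketch with Thread::Queue::Any (assuming the module is installed; it freezes and thaws references for you via Storable, so every dequeued structure is a private copy):
use strict;
use warnings;
use threads;
use Thread::Queue::Any;

my $q = Thread::Queue::Any->new;

my $worker = threads->create(sub {
    while (my ($job) = $q->dequeue) {      # dequeue returns what was enqueued
        last unless defined $job;          # undef is our stop signal
        print "got $job->{id}: @{ $job->{list} }\n";
    }
});

$q->enqueue( { id => $_, list => [ 1 .. 3 ] } ) for 1 .. 5;
$q->enqueue(undef);
$worker->join;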

Case insensitive hash keys perl

Problem
I have a hash/array structure, but some of the hash keys are not in the same case.
I would like to know if there is a way to handle this without manually checking the keys of every hash.
In the example below I would like all the ID/iD/id/Id fields to be printed.
Example code
use warnings;
use strict;

my $Hash = {
    Server => [
        { Id => 123 },
        { iD => 456 },
        { ID => 789 },
    ]
};

for (@{ $Hash->{Server} }) {
    print "$_->{ID}\n";    # This is the problematic part
}
Other
Perl version: v5.10.0
This data is received from elsewhere and must remain in its original case; the example above is minimal, and I cannot simply change all the keys to the same case.
If any more info is needed, let me know.
Well, it depends a little bit on your source of information. This looks like you've parsed something, so there may be a better solution.
However, with what we've got here, I'd do it like this:
for my $entry (@{ $Hash->{Server} }) {
    # grep, find first match. Dupes discarded.
    my ( $key ) = grep { /^id$/i } keys %$entry;
    print "$key => ", $entry->{$key}, "\n";
}
This works by using grep with the i flag for a case-insensitive regex on the keys, and grabbing whatever comes out first. So if you have multiple matches for /id/i, it'll be random which one you get (sort could help with that, though).
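For example, to make the pick deterministic when several variants exist, something like this sketch:
my ( $key ) = sort grep { /^id$/i } keys %$entry;    # ASCII order: 'ID' sorts before 'iD'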
Given you're working with XML though, I'd probably backtrack a bit, throw out XML::Simple and do it like this instead:
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => { '_all_' => sub { $_->lc_attnames } } );

$twig->parse( \*DATA );

print "XML looks like:\n";
$twig->set_pretty_print('indented_a');
$twig->print;

print "Output:\n";
foreach my $server ( $twig->get_xpath('//Server') ) {
    print $server->att('id'), "\n";
}

__DATA__
<XML>
    <Server ID="123" />
    <Server Id="456" />
    <Server id="789" />
</XML>
Or you can just:
foreach my $server ( $twig->get_xpath('//Server') ) {
    $server->lc_attnames;
    print $server->att('id'), "\n";
}
in lieu of doing it in the twig handlers. The first answer will 'fix' all of your XML to have lower-case attributes, which might not be what you want; but then, it might be useful in other scenarios, which is why I've given two examples.
There is no built-in way to do that. What you could do is use List::Util's first to at least do fewer checks, trying each candidate key until one fits.
use strict;
use warnings;
use feature 'say';
use List::Util 'first';

my $Hash = {
    Server => [
        { Id => 123 },
        { iD => 456 },
        { ID => 789 },
    ]
};

foreach my $thing ( @{ $Hash->{Server} } ) {
    # first returns the first match in the list, like grep, so we use it
    # to pick the key variant that exists, then look up the actual value
    # (checking existence rather than truth, so a value of 0 still works)
    say $thing->{ first { exists $thing->{$_} } qw/id ID iD Id/ };
}
If there are a lot of other keys in the data structure, this is cheaper than looking at all the keys, because you look up at most all the possible id keys plus one, and at best just two.
If you want the list of possible keys to auto-generate and the uppercase and lowercase letters can be arbitrarily mixed, take a look at this answer.
I would suggest using a regex with the i flag to match the keys case-insensitively.
for my $item ( @{ $Hash->{Server} } ) {
    for ( keys %{$item} ) {
        print $item->{$_}, "\n" if /^ID$/i;
    }
}
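If mutating your in-memory copy were acceptable (the question's keep-the-same-case constraint may rule this out), one more option is to normalize each hashref's keys once and use plain lowercase lookups from then on; a sketch:
for my $server ( @{ $Hash->{Server} } ) {
    # rebuild the hash with lowercased keys; the RHS list is built
    # in full before the assignment, so this is safe
    %$server = map { lc($_) => $server->{$_} } keys %$server;
}
print "$_->{id}\n" for @{ $Hash->{Server} };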

Perl Tk error "Invalid value for shared scalar"

I've got a problem with shared scalars in my program.
My code looks like this:
use threads;
use threads::shared;
use Tk;

my $mw = MainWindow->new;

my $label = undef;
share($label);
my $ok = undef;
share($ok);

# ... here a Button is created with the option -command => \&sub1 ...

threads->create('sub2');

sub sub1 {
    my $top = $mw->Toplevel();
    $label  = $top->Label( -text => 'something' )->pack();
    $ok     = 1;
}

sub sub2 {
    while (1) {
        if ($ok) {
            $label->configure( -text => 'i need this' );
            $label->update;
        }
    }
}

MainLoop;
I get an error at $label->configure(-text => 'i need this'); like this:
Invalid value for shared scalar at xxx.pl
I need to update my label text from another thread and I can't do this.
Thanks for any advice.
That error suggests that Tk Label objects simply weren't written to support being shared under ithreads, a circumstance which I'd guess is very arduous to remedy.
I'd suggest instead you make a thread responsible for updating the UI widgets and have that thread receive update instructions from other threads. Awkward, but workable.
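A minimal sketch of that pattern (assuming plain Tk and Thread::Queue; the widget names and timings are mine): worker threads enqueue plain update instructions, and only the main (Tk) thread touches widgets, polling the queue with a repeating timer.
use strict;
use warnings;
use threads;
use Thread::Queue;
use Tk;

my $updates = Thread::Queue->new;

# Spawn the worker before creating any Tk widgets.
threads->create(sub {
    for my $n (1 .. 10) {
        sleep 1;
        $updates->enqueue("update $n");    # an instruction, not a widget
    }
    $updates->end;
})->detach;

my $mw    = MainWindow->new;
my $label = $mw->Label(-text => 'waiting...')->pack;

# Poll the queue from the UI thread every 100 ms.
$mw->repeat(100, sub {
    while (defined(my $text = $updates->dequeue_nb)) {
        $label->configure(-text => $text);
    }
});

MainLoop;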

Perl Queues and Threading

I'm trying to accomplish the following:
- Have a thread that reads data from a very large file (say, about 10GB) and pushes it onto the queue. (I do not wish for the queue to get very large either.)
- While the buildQueue thread is pushing data onto the queue, have about 5 worker threads dequeue and process data at the same time.
I've made an attempt but my other threads are unreachable because of a continuous loop in my buildQueue thread.
My approach may be totally wrong. Thanks for any help, it's much appreciated.
Here's the code for buildQueue:
sub buildQueue {
    print "Enter a file name: ";
    my $dict_path = <STDIN>;
    chomp($dict_path);
    open DICT_FILE, $dict_path or die("Sorry, could not open file!");
    while (1) {
        if (<DICT_FILE>) {
            if ($queue->pending() < 100) {
                my $query = <DICT_FILE>;
                chomp($query);
                $queue->enqueue($query);
                my $count = $queue->pending();
                print "Queue Size: $count Query: $query\n";
            }
        }
    }
}
And, as I expected, when this thread gets executed nothing after it runs, because this thread never finishes.
my $builder = new Thread(&buildQueue);
Since the builder thread will be running for a long time I never get to create worker threads.
Here's the entire code:
#!/usr/bin/perl -w
use strict;
use Thread;
use Thread::Queue;

my $queue = new Thread::Queue();
my @threads;

sub buildQueue {
    print "Enter a file name: ";
    my $dict_path = <STDIN>;
    chomp($dict_path);
    open dict_file, $dict_path or die("Sorry, could not open file!");
    while (1) {
        if (<dict_file>) {
            if ($queue->pending() < 100) {
                my $query = <dict_file>;
                chomp($query);
                $queue->enqueue($query);
                my $count = $queue->pending();
                print "Queue Size: $count Query: $query\n";
            }
        }
    }
}

sub processor {
    my $query;
    while (1) {
        if ($query = $queue->dequeue) {
            print "$query\n";
        }
    }
}

my $builder = new Thread(&buildQueue);
push @threads, new Thread(&processor) for 1..5;
You'll need to mark when you want your threads to exit (via either join or detach). The fact that you have infinite loops with no last statements to break out of them is also a problem.
Edit: I also forgot a very important part! Each worker thread will block, waiting for another item to process off of the queue, until it gets an undef from the queue. Hence we specifically enqueue undef once for each worker thread after the queue builder is done.
Try:
#!/usr/bin/perl -w
use strict;
use threads;
use Thread::Queue;

my $queue = new Thread::Queue();
our @threads;    # Do you really need our instead of my?

sub buildQueue {
    print "Enter a file name: ";
    my $dict_path = <STDIN>;
    chomp($dict_path);

    # Three-argument open, please!
    open my $dict_file, "<", $dict_path or die("Sorry, could not open file!");

    while (my $query = <$dict_file>) {
        chomp($query);
        while (1) {    # Wait to see if our queue has < 100 items...
            if ($queue->pending() < 100) {
                $queue->enqueue($query);
                print "Queue Size: " . $queue->pending . "\n";
                last;    # This breaks out of the infinite loop
            }
        }
    }
    close($dict_file);

    foreach (1..5) {
        $queue->enqueue(undef);
    }
}

sub processor {
    my $query;
    while ($query = $queue->dequeue) {
        print "Thread " . threads->tid . " got $query\n";
    }
}

my $builder = threads->create(\&buildQueue);
push @threads, threads->create(\&processor) for 1..5;

# Waiting for our threads to finish.
$builder->join;
foreach (@threads) {
    $_->join;
}
The MCE module for Perl loves big files. With MCE, one can chunk many lines at once, slurp a big chunk as a scalar string, or read 1 line at a time. Chunking many lines at once reduces the overhead for IPC.
MCE 1.504 is out now. It provides MCE::Queue with support for child processes including threads. In addition, the 1.5 release comes with 5 models (MCE::Flow, MCE::Grep, MCE::Loop, MCE::Map, and MCE::Stream) which take care of instantiating the MCE instance as well as auto-tuning max_workers and chunk_size. One may override these options btw.
Below, MCE::Loop is used for the demonstration.
use MCE::Loop;

print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);

mce_loop_f {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    foreach my $line ( @$chunk_ref ) {
        chomp $line;
        ## add your code here to process $line
    }
} $dict_path;
If you want to specify the number of workers and/or chunk_size, then there are 2 ways to do it.
use MCE::Loop max_workers => 5, chunk_size => 300000;
Or...
use MCE::Loop;

MCE::Loop::init {
    max_workers => 5,
    chunk_size  => 300000
};
Although chunking is preferred for large files, one can compare the time against chunking one line at a time. One may omit the first line inside the block (commented out below). Notice how there's no need for an inner for loop: $chunk_ref is still an array ref containing 1 line. The input scalar $_ contains the line when chunk_size equals 1; otherwise it points to $chunk_ref.
use MCE::Loop;

MCE::Loop::init {
    max_workers => 5,
    chunk_size  => 1
};

print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);

mce_loop_f {
    # my ($mce, $chunk_ref, $chunk_id) = @_;
    my $line = $_;
    ## add your code here to process $line or $_
} $dict_path;
I hope that this demonstration was helpful for folks wanting to process a file in parallel.
It sounds like this case could do with the Parallel::ForkManager module.
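For completeness, a rough sketch of that approach (assuming Parallel::ForkManager is installed; the file name and batch size are made up). Each forked child gets a batch of lines, since forking per line would be far too expensive for a 10GB file:
#!/usr/bin/env perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm    = Parallel::ForkManager->new(5);    # at most 5 children at once
my $batch = 10_000;                           # lines per child (made up)

open my $fh, '<', 'big_file.txt' or die $!;   # hypothetical input
until (eof $fh) {
    my @lines;
    while (@lines < $batch and defined(my $line = <$fh>)) {
        chomp $line;
        push @lines, $line;
    }
    $pm->start and next;    # parent keeps reading; the child gets the batch
    # ... process @lines in the child ...
    $pm->finish;            # child exits
}
$pm->wait_all_children;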
A different approach: you can also use user_tasks in MCE 1.2+ and create two multi-worker, multithreading tasks: one task for reading (since it's a big file, you could also benefit from parallel reading while preserving the file read seek) and one task for processing, etc.
The code below still uses Thread::Queue to manage your buffer queue.
The buildQueue sub has your queue-size control and pushes the data directly to the manager process's $R_QUEUE; since we've used threads, it has access to the parent's memory space. If you want to use forks instead, you can still access the queue through a callback function. But here I chose simply to push to the queue.
The processQueue sub will simply de-queue whatever is in the queue until there's nothing more pending.
The task_end sub in each task is run only once by the manager process at the end of each task, so we use it to signal a stop to our worker processes.
Obviously, there's a lot of freedom in how you want to chunk your data to the workers, so you can decide upon the size of the chunk or even how to slurp your data in.
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use MCE;

my $R_QUEUE = Thread::Queue->new;
my $queue_workers   = 8;
my $process_workers = 8;
my $chunk_size      = 1;

print "Enter a file name: ";
my $input_file = <STDIN>;
chomp($input_file);

sub buildQueue {
    my ($self, $chunk_ref, $chunk_id) = @_;
    if ($R_QUEUE->pending() < 100) {
        $R_QUEUE->enqueue($chunk_ref);
        $self->sendto('stdout', "Queue Size: " . $R_QUEUE->pending . "\n");
    }
}

sub processQueue {
    my $self = shift;
    my $wid  = $self->wid;
    while (my $buff = $R_QUEUE->dequeue) {
        $self->sendto('stdout', "Thread " . $wid . " got $$buff");
    }
}

my $mce = MCE->new(
    # input_data could be a file path, a file handle, or even a scalar to
    # treat like a file; check the documentation for more details.
    input_data  => $input_file,
    chunk_size  => $chunk_size,
    use_slurpio => 1,

    user_tasks => [
        {   # queueing task
            max_workers => $queue_workers,
            user_func   => \&buildQueue,
            # use threads to have access to the parent's variables
            # in shared memory.
            use_threads => 1,
            # signal stop to our process workers when they hit the end
            # of the queue. Thanks, Jack Maney!
            task_end    => sub { $R_QUEUE->enqueue( (undef) x $process_workers ) },
        },
        {   # process task
            max_workers => $process_workers,
            user_func   => \&processQueue,
            # use threads to have access to the parent's variables
            # in shared memory
            use_threads => 1,
            task_end    => sub { print "Finished processing!\n"; },
        }
    ]
);

$mce->run();

exit;
