I would like to share a complex data structure between threads.
As far as I know, that is not possible with threads::shared (only basic types are shareable).
So I am thinking about serializing/deserializing the structure with JSON or Storable, so that it is just a string that I can share without problems. But then I need to unpack it before each use and pack it again after each change.
Is that a common way to work on that problem?
Are there better ways?
Would you prefer JSON or Storable or something else?
Thanks for the help!
EDIT
I just did some tests with Storable and JSON. JSON is quicker and produces smaller serialized strings. I did not expect that.
When dealing with this problem, I use Thread::Queue to pass my objects around, and usually use Storable to serialise.
I haven't bothered doing performance comparisons, because usually my data-passing overhead isn't the limiting factor.
Note: the key advantage of Storable is that it allows some limited object support (but be careful: it only works if your object is self-contained):
#!/usr/bin/env perl
use strict;
use warnings;
package MyObject;
sub new {
my ( $class, $id ) = @_;
my $self = {};
$self -> {id} = $id;
$self -> {access_count} = 0;
bless $self, $class;
return $self;
}
sub access_thing {
my ( $self ) = @_;
return $self -> {access_count}++;
}
sub get_id {
my ( $self ) = @_;
return $self -> {id};
}
package main;
use threads;
use Thread::Queue;
use Storable qw ( freeze thaw );
my $thread_count = 10;
my $work_q = Thread::Queue -> new;
sub worker {
while ( my $item = $work_q -> dequeue ) {
my $obj = thaw ( $item );
print $obj -> get_id, ": ", $obj -> access_thing,"\n";
}
}
for (1..$thread_count) {
threads -> create (\&worker);
}
for my $id ( 0..1000 ) {
my $obj = MyObject -> new ( $id );
$work_q -> enqueue ( freeze ( $obj ) );
}
$work_q -> end;
$_ -> join for threads -> list;
JSON would limit you to plain array/hash data structures, which may be fine for your use case.
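To make the JSON/Storable difference concrete, here is a small sketch using the core modules Storable and JSON::PP (the `Point` class is made up for illustration). Storable's freeze/thaw brings the object back with its class; JSON::PP refuses to encode a blessed reference unless you enable allow_blessed/convert_blessed, so objects have to be turned into plain hashes first, and they come back unblessed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Storable qw( freeze thaw );
use JSON::PP qw( encode_json decode_json );

# Hypothetical class, used only for this illustration.
package Point;

sub new {
    my ( $class, %args ) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub get_x { return $_[0]{x} }

package main;

my $point = Point->new( x => 3, y => 4 );

# Storable: the thawed copy is still a blessed Point object.
my $copy = thaw( freeze($point) );
print ref($copy), " x=", $copy->get_x, "\n";

# JSON::PP croaks on a blessed reference unless allow_blessed/convert_blessed
# is enabled (or the class provides TO_JSON), so check with eval.
my $json_ok = eval { encode_json($point); 1 };
print $json_ok ? "JSON encoded the object\n" : "JSON refused the object\n";

# Copying the fields into a plain hash round-trips fine, but the class is lost.
my %fields = %$point;
my $plain  = decode_json( encode_json( \%fields ) );
print ref($plain), " y=", $plain->{y}, "\n";
```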
Complex data structures can be shared using shared_clone. The components of the data structure need to be cloned before being added to it.
use strict;
use feature 'say';
use Data::Dump qw(dd);
use threads;
use threads::shared;
my $cds = {
k1 => shared_clone( { k1_l2 => [ 1..2 ] } ),
k2 => shared_clone( { k2_l2 => [10..11] } )
};
my @threads = map { async(\&proc_ds, $cds->{$_}) } keys %$cds;
$_->join() for @threads;
dd $cds;
sub proc_ds {
my ($ds) = @_;
lock $ds;
push @{$ds->{$_}}, 10+threads->tid for keys %$ds;
}
Note that you don't want to allow autovivification when working with shared values, as it would create unshared (and empty) components in the structure. Check explicitly for existence.
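A minimal sketch of such an explicit check, assuming a shared hashref like the ones above (`add_value` and the key names are hypothetical): take the lock, create the missing component with shared_clone only if the key does not exist yet, then push onto it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;

my $registry = shared_clone( {} );

# Hypothetical helper: append $value to the list stored under $key,
# creating that list as a shared array if the key is missing.
sub add_value {
    my ( $ds, $key, $value ) = @_;
    lock $ds;    # lock follows the reference to the shared hash
    $ds->{$key} = shared_clone( [] ) unless exists $ds->{$key};
    push @{ $ds->{$key} }, $value;
}

my @threads = map {
    my $n = $_;
    threads->create( sub { add_value( $registry, 'k' . ( $n % 2 ), $n ) } );
} 1 .. 4;
$_->join for @threads;

for my $key ( sort keys %$registry ) {
    print "$key: ", join( ",", sort { $a <=> $b } @{ $registry->{$key} } ), "\n";
}
```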
An already-built data structure needs to be cloned and shared:
my $cds = { k => [ 5..7 ] }; # already built, need be shared
my $cds_share = shared_clone( $cds );
my @threads = map { async(\&proc_ds, $cds_share) } 1..3;
$_->join() for @threads;
With the same proc_ds() as above, this prints the structure (condensed output):
{ 'k' => [ '5', '6', '7', '11', '12', '13' ] };
When the data structure is built for sharing from the start, as in the first example, there is less overhead to pay. Otherwise a data copy is involved, as in the second example, and whether that is OK depends on the details (data size, how often a copy is made, etc.).
The idea of serializing data is workable as well, but how suitable it is again depends on the details, since in that case you not only copy the data but also pay for encoding and decoding it (and for disk I/O, if the serialized data goes to files).
In that case, JSON is certainly one good way to go: it is a simple, readable data format that can also be shared between tools. Storable is binary, works directly with Perl data structures, and is supposed to be fast (which should show with larger data).
One other option is to use a worker model and pass data over a message queue. Then you'd use Thread::Queue, or perhaps make use of Thread::Queue::Any, for communication channels.
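With a reasonably recent Thread::Queue, whose documentation says that non-shared references passed to enqueue are cloned into thread-shared structures (older versions only accepted plain scalars), the worker model can pass nested data directly. A minimal sketch under that assumption; the job fields, the summing work and the worker count are made up:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

# Each worker takes a hashref job, does some work, and enqueues a hashref result.
sub worker {
    while ( defined( my $job = $work_q->dequeue ) ) {
        my $sum = 0;
        $sum += $_ for @{ $job->{numbers} };
        $result_q->enqueue( { id => $job->{id}, sum => $sum } );
    }
}

my @workers = map { threads->create( \&worker ) } 1 .. 3;

# Nested data goes straight onto the queue; Thread::Queue clones it for sharing.
$work_q->enqueue( { id => $_, numbers => [ 1 .. $_ ] } ) for 1 .. 5;
$work_q->end;    # no more jobs: dequeue returns undef once the queue is drained
$_->join for @workers;
$result_q->end;

my %sum_for;
while ( defined( my $result = $result_q->dequeue ) ) {
    $sum_for{ $result->{id} } = $result->{sum};
}
print "$_ => $sum_for{$_}\n" for sort { $a <=> $b } keys %sum_for;
```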
Use of global arrays in different threads
I'm using Dancer2 and File::Tail to tail files on the web. When a WebSocket connection is opened, its $conn is stored in an array, and when File::Tail detects new data, it tries to send that data to the sockets stored in the array. But it doesn't work as expected.
The array that is filled when a WebSocket connection occurs is probably not behaving like a global variable.
# it doesn't work.
foreach (@webs) {
$_->send_utf8("test2!!!!!!!!");
}
I tried to use threads::shared, Cache::Memcached, etc., but I failed.
I don't know Perl very well. I tried to solve it myself for a long time but couldn't, so I'm asking here.
This is the whole code.
use File::Tail ();
use threads;
use threads::shared;
use Net::WebSocket::Server;
use strict;
use Dancer2;
my @webs = ();
# my %clients :shared = ();
my $conns :shared = 4;
threads->create(sub {
print "start-end:", "$conns", "\n";
my @files = glob( $ARGV[0] . '/*' );
my @fs = ();
foreach my $fileName(@files) {
my $file = File::Tail->new(name=>"$fileName",
tail => 1000,
maxinterval=>1,
interval=>1,
adjustafter=>5,resetafter=>1,
ignore_nonexistant=>1,
maxbuf=>32768);
push(@fs, $file);
}
do {
my $timeout = 1;
(my $nfound,my $timeleft,my @pending)=
File::Tail::select(undef,undef,undef,$timeout,@fs);
unless ($nfound) {
} else {
foreach (@pending) {
my $str = $_->read;
print $_->{"input"} . " ||||||||| ".localtime(time)." ||||||||| ".$str;
# it doesn't work.
foreach (@webs) {
$_->send_utf8("test!!!!!!!!");
}
}
}
} until(0);
})->detach();
threads->create(sub {
Net::WebSocket::Server->new(
listen => 8080,
on_connect => sub {
my ($serv, $conn) = @_;
push(@webs, $conn);
$conn->on(
utf8 => sub {
my ($conn, $msg) = @_;
$conn->send_utf8($msg);
# it works.
foreach (@webs) {
$_->send_utf8("test!!!!!!!!");
}
},
);
},
)->start;
})->detach();
get '/' => sub {
my $ws_url = "ws://127.0.0.1:8080/";
return <<"END";
<html>
<head><script>
var urlMySocket = "$ws_url";
var mySocket = new WebSocket(urlMySocket);
mySocket.onmessage = function (evt) {
console.log( "Got message " + evt.data );
};
mySocket.onopen = function(evt) {
console.log("opening");
setTimeout( function() {
mySocket.send('hello'); }, 2000 );
};
</script></head>
<body><h1>WebSocket client</h1></body>
</html>
END
};
dance;
Threads in Perl are not lightweight: each thread gets its own copy of the interpreter, so they behave much like separate instances of the program.
Each thread starts with a copy of everything that existed when it was created; after that, changes made in one thread are not visible to the others unless the variable is shared.
By declaring variables as shared, you can share data structures between threads. However, I'd warn you to be cautious here: without some form of locking, you can easily create a race condition.
In your case, you could declare @webs as :shared. Values inserted into it will then be visible to all your threads. But you still need some caution there, because *when* things are added is still nondeterministic. Also note that a shared array can only hold plain scalars or references to shared data; pushing an unshared reference such as $conn onto it dies with "Invalid value for shared scalar".
But anyway, this basically works:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;
use Data::Dumper;
my @shared_struct : shared;
sub reader {
print "Starting reader\n";
for ( 1..10 ) {
print threads -> self() -> tid(), ":", join (",", @shared_struct ), "\n";
sleep 1;
}
}
sub writer {
print "starting writer\n";
for ( 1..10 ) {
push @shared_struct, rand(10);
print Dumper \@shared_struct;
sleep 1;
}
}
## start the threads;
my $reader = threads -> create ( \&reader );
my $writer = threads -> create ( \&writer );
while ( 1 ) {
print @shared_struct;
sleep 1;
}
More generally, I'd suggest you almost never actually want to detach a thread in Perl: doing so says 'I don't care about your execution'. And clearly that's not the case in your code, since you're trying to talk to the threads.
Just creating the thread accomplishes what you want (parallel execution), and then you can have:
for my $thread ( threads -> list ) {
$thread -> join;
}
at the point where you're ready to wait for the threads to terminate.
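To illustrate the locking advice in this answer, a minimal sketch (the thread and iteration counts are arbitrary): several threads increment one shared counter, `lock` makes each read-modify-write exclusive, and the threads are joined rather than detached, so the main thread knows when they have finished.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;

my $counter : shared = 0;

sub bump {
    for ( 1 .. 1000 ) {
        lock $counter;    # held until the end of this loop iteration
        $counter++;
    }
}

my @workers = map { threads->create( \&bump ) } 1 .. 4;
$_->join for @workers;    # wait for every thread instead of detaching it

print "counter = $counter\n";    # 4 threads x 1000 increments
```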
Problem
I have a hash/array structure; some of the hash keys are not in the same case, though.
I would like to know if there is a way to handle this case without manually checking the keys of every hash.
In the example below, I would like all ID/iD/id/Id fields to be printed.
Example code
use warnings;
use strict;
my $Hash = {
Server => [
{
Id=>123
},
{
iD=>456
},
{
ID=>789
}
]
};
for (@{$Hash->{Server}}){
print "$_->{ID}\n"
#This is the problematic part
}
Other
Perl version: v5.10.0
This data is received from elsewhere and must keep its original case; the example above is minimal, and I cannot simply change them all to the same case.
If any more info is needed, let me know.
Well, it depends a little bit on your source of information. This looks like you've parsed something, so there may be a better solution.
However, with what we've got here, I'd do it like this:
for my $entry (@{$Hash->{Server}}){
#grep, find first match. Dupes discarded.
my ( $key ) = grep { /^id$/i } keys %$entry;
print "$key => ",$entry -> {$key},"\n";
}
This works by using grep with a case-insensitive (/i) regex on the keys and taking whatever comes out first. Since hash key order is unpredictable, if an entry has several keys matching /^id$/i, which one you get is effectively random (sorting the keys could help with that, though).
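If you want a deterministic result when several spellings are present, you could sort the matching keys first. A sketch; `get_ci` is a hypothetical helper, and taking the first key in sort order is an arbitrary tie-breaker:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the value of the first key (in sorted order)
# that equals $wanted case-insensitively, or undef if there is none.
sub get_ci {
    my ( $hashref, $wanted ) = @_;
    my @matches = grep { lc($_) eq lc($wanted) } keys %$hashref;
    my ($key) = sort @matches;
    return defined $key ? $hashref->{$key} : undef;
}

my $Hash = { Server => [ { Id => 123 }, { iD => 456 }, { ID => 789 }, { name => 'x' } ] };

my @ids = map { get_ci( $_, 'id' ) } @{ $Hash->{Server} };
for my $id (@ids) {
    print defined $id ? "$id\n" : "(no id)\n";
}
```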
Given you're working with XML though, I'd probably backtrack a bit, throw out XML::Simple and do it like this instead:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> new ( twig_handlers => { '_all_' => sub { $_ -> lc_attnames }} );
$twig -> parse ( \*DATA );
print "XML looks like:\n";
$twig -> set_pretty_print ( 'indented_a');
$twig -> print;
print "Output:\n";
foreach my $server ( $twig -> get_xpath('//Server') ) {
print $server -> att('id'),"\n";
}
__DATA__
<XML>
<Server ID="123" />
<Server Id="456" />
<Server id="789" />
</XML>
Or you can just:
foreach my $server ( $twig -> get_xpath('//Server') ) {
$server -> lc_attnames;
print $server -> att('id'),"\n";
}
in lieu of doing it in the twig handlers. The first version will 'fix' all of your XML to have lower-case attribute names, which might not be what you want. But then, it might be useful for other scenarios, which is why I've given two examples.
There is no built-in way to do that. What you could do is use List::Util's first to at least do fewer checks: try each possible spelling of the key until one exists.
use strict;
use warnings;
use feature 'say';
use List::Util 'first';
my $Hash = {
Server => [
{
Id => 123
},
{
iD => 456
},
{
ID => 789
}
]
};
foreach my $thing ( @{ $Hash->{Server} } ) {
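# first returns the first matching key in the list, like grep would,
# and we use that key to look up the actual value; exists (rather than
# truthiness) makes sure a value such as 0 still counts as a match
say $thing->{ first { exists $thing->{$_} } qw/id ID iD Id/ };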
}
If there are a lot of other keys in the data structure, this is cheaper than looking at all the keys: at most you do one lookup per possible id spelling plus one to fetch the value, and at best two.
If you want the list of possible keys to be generated automatically, with upper- and lower-case letters arbitrarily mixed, take a look at this answer.
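Alternatively, the candidate spellings can be generated. A sketch, assuming plain ASCII keys (`case_variants` is a hypothetical helper): each letter contributes a lower- and an upper-case form, so a key of n letters gives 2**n spellings, which are then tried in turn with `first`. It tests with `exists`, so a stored value of 0 still counts as found.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'first';

# Hypothetical helper: all upper/lower-case spellings of $word.
sub case_variants {
    my ($word) = @_;
    my @variants = ('');
    for my $char ( split //, $word ) {
        my @forms = lc($char) eq uc($char) ? ($char) : ( lc($char), uc($char) );
        @variants = map { my $prefix = $_; map { $prefix . $_ } @forms } @variants;
    }
    return @variants;
}

my @id_keys = case_variants('id');    # id, iD, Id, ID

my $Hash = { Server => [ { Id => 123 }, { iD => 456 }, { ID => 0 } ] };

for my $thing ( @{ $Hash->{Server} } ) {
    my $key = first { exists $thing->{$_} } @id_keys;
    print defined $key ? "$thing->{$key}\n" : "no id\n";
}
```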
I would suggest using a regex with the i flag to match the keys case-insensitively.
for my $item ( @{ $Hash->{Server} }) {
for(keys %{$item}) {
print $item -> {$_},"\n" if /^ID$/i;
}
}
I'm trying to make a web server whose requests are farmed out to a set of interpreters hidden behind open2(), based on which 'device' is indicated in the CGI parameters.
The trouble is, I want it multi-threaded, but the hash I'm using to keep track of the event queue for each device doesn't remember the new device created for each request. The server below only ever prints this sort of thing:
Did not find default-device in (alreadyThere)...
Added default-device with Sun Oct 27 20:43:35 2013 to alreadyThere, default-device
Now... does (alreadyThere, default-device) persist for the next request?
Here is the script:
#!/usr/bin/perl -w
use strict;
use threads;
use threads::shared;
use base qw(Net::Server::HTTP);
our $monkeys = shared_clone({ alreadyThere => { 'a' => 'b' } });
sub process_http_request {
require CGI;
my $cgi = CGI->new;
my $device = $cgi->param('device') || 'default-device';
print "HTTP/1.0 200 OK\r\nContent-type: text/html\r\n\r\n<pre>";
unless (exists $monkeys->{$device}) {
print "Did not find $device in (".join(", ", sort keys %$monkeys).")...\n";
lock $monkeys;
unless (exists $monkeys->{$device}) {
my $t = localtime;
$monkeys->{$device} = $t;
print "\nAdded $device with ".$t." to ".join(", ", sort keys %$monkeys);
} else {
print "\nSurprise device... ".$device;
}
} else {
print "\nFound device... ".$device;
}
print "\nNow... does (".join(", ", sort keys %$monkeys).") persist for the next request?</pre>";
}
__PACKAGE__->run(port => 8080);
It's not the $t bit - that was previously shared_clone({ id => $t }), but I'm darned if I can see why $monkeys never seems to update.
The different requests are served by different processes, not threads.
Net::Server doesn't have a multi-threaded "personality"[1], so you're going to have to use a different sharing mechanism.
Notes:
[1] "in the near future, we would like to add a 'Thread' personality"
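Because the requests are handled by separate processes, the data has to live outside any single process. One possible mechanism (a sketch, not something Net::Server provides; the file name and the `update_state` helper are hypothetical) is a Storable file guarded by an exclusive flock on a separate lock file, so that each request does its read-modify-write while holding the lock.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl qw( :flock );
use File::Temp qw( tempdir );
use POSIX ();
use Storable qw( nstore retrieve );

# Hypothetical helper: take an exclusive lock, load the state, let $code
# modify it, save it, then release the lock.
sub update_state {
    my ( $path, $code ) = @_;
    open my $lock_fh, '>>', "$path.lock" or die "Cannot open lock file: $!";
    flock( $lock_fh, LOCK_EX ) or die "Cannot lock: $!";
    my $state = -e $path ? retrieve($path) : {};
    $code->($state);
    nstore( $state, $path );
    close $lock_fh;    # closing the handle releases the lock
    return $state;
}

my $path = tempdir( CLEANUP => 1 ) . '/monkeys.sto';

# Simulate two requests, each handled by its own forked process.
for my $device (qw( default-device other-device )) {
    my $pid = fork() // die "fork failed: $!";
    if ( $pid == 0 ) {
        update_state( $path, sub { $_[0]{$device} = scalar localtime } );
        POSIX::_exit(0);    # skip END blocks so the parent's temp dir survives
    }
    waitpid( $pid, 0 );
}

# A later "request" sees what the earlier processes stored.
my $seen = update_state( $path, sub { } );
print join( ", ", sort keys %$seen ), "\n";
```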
Building on Ikegami's answer, I'm trying with this additional code to fake a 'threaded' personality with some success (and some problems with 'open3' misbehaving):
sub default_server_type { 'Single' }
sub loop {
my $self = shift;
while( $self->accept ){
async {
$self->run_client_connection;
};
last if $self->done;
}
}
a) Is there any reason to use Net::Server::HTTP instead of the higher level and easier to use Plack?
b) I recently had to solve a problem not unlike this one, and settled on using an event-based httpd with AnyEvent (or a higher abstraction, Coro). There's Net::Server::Coro if you need a drop-in replacement for your code, or a plethora of ready-made AnyEvent-based httpds like Twiggy, Feersum, etc.
I get the following error when trying to run my test code:
thread failed to start: Invalid value for shared scalar at ./threaded_test.pl line 47.
Line 47 is:
%hoh = hoh(@new_array);
My observations:
If I remove line 47 and other lines referencing %hoh, then the script runs without errors
I can create a new hash %new_hash = (itchy => "Scratchy"); without errors, but when I try to "return" a hash from another sub (line 47), it results in the error above.
Unfortunately, I cannot use an in/out queue, because the version of Thread::Queue I have is too old (and installed on a system I have no control over) and doesn't support returning hash and hash-ref types via a queue (according to this). Apparently, my version only supports returning strings via queues.
Is there a way to successfully do this: $hash{$string}{"jc"} = \%hoh;
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use constant NUM_WORKERS => 10;
my @out_array : shared = ();
main();
sub main
{
my @results = test1();
foreach my $item (@results) {
print "item: $item\n";
}
}
sub test1
{
my $my_queue = Thread::Queue->new();
foreach (1..NUM_WORKERS) {
async {
while (my $job = $my_queue->dequeue()) {
test2($job);
}
};
}
my @sentiments = ("Axe Murderer", "Mauler", "Babyface", "Dragon");
$my_queue->enqueue(@sentiments);
$my_queue->enqueue(undef) for 1..NUM_WORKERS;
$_->join() for threads->list();
my @return_array = @out_array;
return @return_array;
}
sub test2
{
my $string = $_[0];
my %hash : shared;
my @new_array : shared;
my %new_hash : shared;
my %hoh : shared;
@new_array = ("tom", "jerry");
%new_hash = (itchy => "Scratchy");
%hoh = hoh(@new_array);
my %anon : shared;
$hash{$string} = \%anon;
$hash{$string}{"Grenade"} = \@new_array;
$hash{$string}{"Pipe bomb"} = \%new_hash;
$hash{$string}{"jc"} = \%hoh;
push @out_array, \%hash;
return;
}
sub hoh
{
my %hoh;
foreach my $item (@_) {
$hoh{"jeepers"}{"creepers"} = $item;
}
return %hoh;
}
The problem is that you're trying to store a reference to something that isn't shared (the inner hashes built by hoh()) in a shared variable. You need to use share or shared_clone, as previously mentioned, or you need to serialise the data structure.
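For the test2() in the question, one way to apply this is shared_clone from threads::shared, which makes a deep, shared copy of an unshared structure so that it can be stored in a shared variable. A sketch of just the relevant part (hoh() is copied from the question; `build_entry` is a hypothetical name):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;

# Unchanged from the question: builds an ordinary (unshared) hash of hashes.
sub hoh {
    my %hoh;
    foreach my $item (@_) {
        $hoh{"jeepers"}{"creepers"} = $item;
    }
    return %hoh;
}

# Hypothetical helper: build the nested entry, then shared_clone it so every
# level is shared before it is stored in a shared variable.
sub build_entry {
    my ($string) = @_;
    my %hoh = hoh( "tom", "jerry" );
    return shared_clone(
        {   $string => {
                "Grenade"   => [ "tom", "jerry" ],
                "Pipe bomb" => { itchy => "Scratchy" },
                "jc"        => \%hoh,
            }
        }
    );
}

my @out_array : shared;
my @workers = map {
    my $name = $_;
    threads->create( sub { lock @out_array; push @out_array, build_entry($name) } );
} ( "Axe Murderer", "Mauler" );
$_->join for @workers;

my @keys = map { keys %$_ } @out_array;
print "$_\n" for sort @keys;
```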
#!/perl/bin/perl
use strict;
use threads;
use threads::shared;
my %hm_n2g:shared = ();
my $row = &share([]);
$hm_n2g{"aa"}=$row;
$row->[0]=1;
$row->[1]=2;
my @arr = @{$hm_n2g{"aa"}};
print $arr[0]." ".$arr[1]."\n";
#If you want to lock the hash in a thread-subroutine
{
lock(%hm_n2g);
}
I have a very large hash of hashes with upwards of 50,000 entries. I want to process it multi-threaded due to time constraints.
Is it possible for each call to dequeue() to return the next item from the hash, and not the complete hash? In the example below I want dequeue() to return just:
flintstones => {
old_address => "0x1231234a",
new_address => "0x1234234d",
source => "sym"
},
I can then process that in my thread whilst another thread dequeues another item from the hash until all items are processed. My code example below.
If I need to change the storage format (HoH), that's not a problem. Perhaps an array of hashes would work? Any help or pointers appreciated.
use strict;
use warnings;
use threads;
use Thread::Queue;
use Data::Dumper;
my %hoh = (
flintstones => {
old_address => "0x1231234a",
new_address => "0x1234234d",
source => "sym"
},
jetsons => {
old_address => "0x12712343",
new_address => "0x12142344",
source => "sym"
},
simpsons => {
old_address => "0x12f12347",
new_address => "0x12a42348",
source => "dwarf"
},
);
my $href = \%hoh;
my $queue= Thread::Queue->new($href);
my $t = threads->create('start_sub');
my $result = $t->join;
sub start_sub {
print "items on queue = " . $queue->pending() . "\n";
while( $queue->pending() ) {
my $item = $queue->dequeue_nb();
#
## dequeue_nb returns undef when queue empty
if( $item ) {
print "Doing work in thread " . threads->tid() . " on:\n";
print Dumper($item);
print "Done =====================\n"
}
}
}
But it would be better to use a reference instead of a hash: an arrayref that contains hashrefs. That would be more efficient.
This is for your current code:
sub dequeue_nb {
my $key_of_first_item = (keys %hoh)[0]; # keys come back in no particular order
my $item = $hoh{$key_of_first_item};
delete $hoh{$key_of_first_item};
return $item;
}
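To do what the question actually asks, each hash entry can be enqueued as its own item, so that each dequeue() returns a single record. A sketch, assuming a Thread::Queue recent enough to accept references (older versions only take plain scalars) and an arbitrary worker count; each queue item is a [ name, record ] pair, and `%done` only records which thread handled what:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my %hoh = (
    flintstones => { old_address => "0x1231234a", new_address => "0x1234234d", source => "sym" },
    jetsons     => { old_address => "0x12712343", new_address => "0x12142344", source => "sym" },
    simpsons    => { old_address => "0x12f12347", new_address => "0x12a42348", source => "dwarf" },
);

my $queue = Thread::Queue->new;
my %done : shared;    # records which entries were processed, and by which thread

sub worker {
    # dequeue() blocks until an item arrives and returns undef after end()
    while ( defined( my $item = $queue->dequeue ) ) {
        my ( $name, $record ) = @$item;
        lock %done;
        $done{$name} = threads->tid() . ": $record->{source}";
    }
}

my @workers = map { threads->create( \&worker ) } 1 .. 2;
$queue->enqueue( [ $_, $hoh{$_} ] ) for sort keys %hoh;    # one item per entry
$queue->end;
$_->join for @workers;

print "$_ => $done{$_}\n" for sort keys %done;
```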