Use of global arrays in different threads in perl - multithreading

Use of global arrays in different threads
I'm going to use Dancer2 and File::Tail to use Tail on the web. So when the Websocket is opened, it stores the $conn in an array, and when File::Tail is detected, it tries to send data to the socket stored in the array. But it doesn't work as expected.
The array that is saved when a websocket connection occurs is probably not a global variable.
# it doesn't works.
foreach (#webs) {
$_->send_utf8("test2!!!!!!!!");
}
I tried to use threads::shared and Cache:::Memcached etc, but I failed.
I don't know perl very well. I tried to solve it myself, but I couldn't solve it for too long, so I leave a question.
This is the whole code.
use File::Tail ();
use threads;
use threads::shared;
use Net::WebSocket::Server;
use strict;
use Dancer2;
my #webs = ();
# my %clients :shared = ();
my $conns :shared = 4;
threads->create(sub {
print "start-end:", "$conns", "\n";
my #files = glob( $ARGV[0] . '/*' );
my #fs = ();
foreach my $fileName(#files) {
my $file = File::Tail->new(name=>"$fileName",
tail => 1000,
maxinterval=>1,
interval=>1,
adjustafter=>5,resetafter=>1,
ignore_nonexistant=>1,
maxbuf=>32768);
push(#fs, $file);
}
do {
my $timeout = 1;
(my $nfound,my $timeleft,my #pending)=
File::Tail::select(undef,undef,undef,$timeout,#fs);
unless ($nfound) {
} else {
foreach (#pending) {
my $str = $_->read;
print $_->{"input"} . " ||||||||| ".localtime(time)." ||||||||| ".$str;
# it doesn't works.
foreach (#webs) {
$_->send_utf8("test!!!!!!!!");
}
}
}
} until(0);
})->detach();
threads->create(sub {
Net::WebSocket::Server->new(
listen => 8080,
on_connect => sub {
my ($serv, $conn) = #_;
push(#webs, $conn);
$conn->on(
utf8 => sub {
my ($conn, $msg) = #_;
$conn->send_utf8($msg);
# it works.
foreach (#webs) {
$_->send_utf8("test!!!!!!!!");
}
},
);
},
)->start;
})->detach();
get '/' => sub {
my $ws_url = "ws://127.0.0.1:8080/";
return <<"END";
<html>
<head><script>
var urlMySocket = "$ws_url";
var mySocket = new WebSocket(urlMySocket);
mySocket.onmessage = function (evt) {
console.log( "Got message " + evt.data );
};
mySocket.onopen = function(evt) {
console.log("opening");
setTimeout( function() {
mySocket.send('hello'); }, 2000 );
};
</script></head>
<body><h1>WebSocket client</h1></body>
</html>
END
};
dance;

Threads in perl are not lightweight. They're separate instances of the program.
The only thing that threads have in common, are things that exist prior to the threads instantating.
You can - with declaring shared variables - allow data structures to share between threads, however I'd warn you to be cautious here - without some manner of locking, you potentially create yourself a race condition.
In your case, you could declare #webs as : shared. This will mean values inserted into it will be visible to all your threads. But you still need a degree of caution there, because 'when stuff is added' is still nondeterministic.
But anyway, this basically works:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;
use Data::Dumper;
my #shared_struct : shared;
sub reader {
print "Starting reader\n";
for ( 1..10 ) {
print threads -> self() -> tid(), ":", join (",", #shared_struct ), "\n";
sleep 1;
}
}
sub writer {
print "starting writer\n";
for ( 1..10 ) {
push #shared_struct, rand(10);
print Dumper \#shared_struct;
sleep 1;
}
}
## start the threads;
my $reader = threads -> create ( \&reader );
my $writer = threads -> create ( \&writer );
while ( 1 ) {
print #shared_struct;
sleep 1;
}
More generally, I'd suggest you almost never actually want to detach a thread in perl - in doing so, what you're saying is 'I don't care about your execution'. And clearly that's not the case in your code - you're trying to talk to the threads.
Just creating the thread accomplishes what you want - parallel execution and you can have:
for my $thread ( threads -> list ) {
$thread -> join;
}
As and when you're ready for the thread to terminate.

Related

Perl: share complex data-structure between threads

I like to share a complex-datastructure between threads.
As far I know that is not possible with threads:shared (only basic types are shareable).
So I think about serialize/deserialize the structure with JSON or Storable so it is just a string that I can share perfectly. But I need to unpack it before use and pack it after a change.
Is that a common way to work on that problem?
Are there better ways?
Whould you prefer JSON or Storable or something else?
Thanks for help!
EDIT
I just did some tests with Storable and JSON. JSON is quicker and produces smaller serialized strings. I did not expect that.
When dealing with this problem, I use Thread::Queue to pass my objects around, and usually use Storable to serialise.
I haven't bothered doing performance comparisons, because usually my data-passing overhead isn't the limiting factor.
Note - the key advantage of Storable is that it allows some limited object support (Not - be careful - it only works if your object is self contained):
#!/usr/bin/env perl
use strict;
use warnings;
package MyObject;
sub new {
my ( $class, $id ) = #_;
my $self = {};
$self -> {id} = $id;
$self -> {access_count} = 0;
bless $self, $class;
return $self;
}
sub access_thing {
my ( $self ) = #_;
return $self -> {access_count}++;
}
sub get_id {
my ( $self ) = #_;
return $self -> {id};
}
package main;
use threads;
use Thread::Queue;
use Storable qw ( freeze thaw );
my $thread_count = 10;
my $work_q = Thread::Queue -> new;
sub worker {
while ( my $item = $work_q -> dequeue ) {
my $obj = thaw ( $item );
print $obj -> get_id, ": ", $obj -> access_thing,"\n";
}
}
for (1..$thread_count) {
threads -> create (\&worker);
}
for my $id ( 0..1000 ) {
my $obj = MyObject -> new ( $id );
$work_q -> enqueue ( freeze ( $obj ) );
}
$work_q -> end;
$_ -> join for threads -> list;
If JSON would limit you to array/hash data structures - which may be fine for your use case.
Complex data structures can be shared using shared_clone. The components of the data structure need be cloned before being added to it.
use strict;
use feature 'say';
use Data::Dump qw(dd);
use threads;
use threads::shared;
my $cds = {
k1 => shared_clone( { k1_l2 => [ 1..2 ] } ),
k2 => shared_clone( { k2_l2 => [10..11] } )
};
my #threads = map { async(\&proc_ds, $cds->{$_}) } keys %$cds;
$_->join() for #threads;
dd $cds;
sub proc_ds {
my ($ds) = #_;
lock $ds;
push #{$ds->{$_}}, 10+threads->tid for keys %$ds;
}
Note that you don't want to allow autovivification when working with shared values, as it would create unshared (and empty) components in the structure. Check explicitly for existence.
A ready data structure needs to be cloned-and-shared
my $cds = { k => [ 5..7 ] }; # already built, need be shared
my $cds_share = shared_clone( $cds );
my #threads = map { async(\&proc_ds, $cds_share) } 1..3;
$_->join() for #threads;
With the same proc_ds() as above this prints the structure (condensed output)
{ 'k' => [ '5', '6', '7', '11', '12', '13' ] };
When data structure is populated for sharing, as in the first example, then there is less overhead to pay. Otherwise there is a data copy involved, as in the second example, and whether that is OK depends on details (data size, how often a copy is made, etc).
The idea of serializing data is workable as well, but how suitable it is again depends on details since in that case you'd not only copy data but would go to disks as well.
In that case JSON is certainly one good way to go, being a data format that is simple and readable, and can also be shared between tools. The Storable is binary, works directly with Perl data structures, and is supposed to be fast (what should show with larger data).
One other option is to use a worker model and pass data over a message queue. Then you'd use Thread::Queue, or perhaps make use of Thread::Queue::Any, for communication channels.

Change thread priority ERROR_INVALID_HANDLE

I'm trying to change a thread priority within my script, without success, here are the details.
$thr = threads->new(\&someFunction,
$shared variable 1,
$shared variable 2,
);
I've tried using threads::State;
$thr->priority(2);
Without success
So, I thought the Win32::API must work
my $functionGetLastError= Win32::API->new('Kernel32',
'GetLastError',
'',
'N'
);
my $functionSetThreadPriority= Win32::API->new('Kernel32',
'SetThreadPriority',
'II', # I've tried 'PI' and 'II' as well
'N'
);
my $h = $thr->_handle();
my $success = $functionSetThreadPriority->Call( $h, 2 );
warn "Return Error #".$functionGetLastError->Call() if !$success;
Again, without success: (, but now I have a clue, the script return error number
last Error 6
From MSDN site, System Error Codes (0-499), it seems that the error is
ERROR_INVALID_HANDLE
What am I doing wrong?
$thread->_handle weirdly returns a HANDLE*, while SetThreadPriority expects a HANDLE. You need to dereference the pointer, which you can do as follows:
use constant THREAD_PRIORITY_HIGHEST => 2;
sub SetThreadPriority {
my ($thread, $priority) = #_;
# $thread->_handle() returns a HANDLE*.
my $handle_ptr = $thread->_handle();
my $packed_handle = unpack('P'.HANDLE_SIZE, pack(PTR_FORMAT, $handle_ptr));
my $handle = unpack(HANDLE_FORMAT, $packed_handle);
state $SetThreadPriority = (
Win32::API->new('Kernel32', 'SetThreadPriority', 'Ni', 'i')
or die("Loading SetThreadPriority: $^E\n")
);
return $SetThreadPriority->Call($handle, $priority);
}
Here's the full test program:
use strict;
use warnings;
use feature qw( say state );
use threads;
use threads::shared;
use Carp qw( croak );
use Config qw( %Config );
use Win32::API qw( );
sub uint_format {
$_[0] == 4 ? 'L'
: $_[0] == 8 ? 'Q'
: croak("Unsupported")
}
use constant PTR_SIZE => $Config{ptrsize};
use constant PTR_FORMAT => uint_format(PTR_SIZE);
use constant HANDLE_SIZE => PTR_SIZE;
use constant HANDLE_FORMAT => PTR_FORMAT;
use constant THREAD_PRIORITY_HIGHEST => 2;
sub SetThreadPriority {
my ($thread, $priority) = #_;
# $thread->_handle() returns a HANDLE*.
my $handle_ptr = $thread->_handle();
my $packed_handle = unpack('P'.HANDLE_SIZE, pack(PTR_FORMAT, $handle_ptr));
my $handle = unpack(HANDLE_FORMAT, $packed_handle);
state $SetThreadPriority = (
Win32::API->new('Kernel32', 'SetThreadPriority', 'Ni', 'i')
or die("Loading SetThreadPriority: $^E\n")
);
return $SetThreadPriority->Call($handle, $priority);
}
{
my $done :shared = 0;
my $thread = async {
{ lock($done); cond_wait($done) while !$done; }
};
my $rv = SetThreadPriority($thread, THREAD_PRIORITY_HIGHEST);
say $rv ? "Success" : "Error: $^E";
{ lock($done); $done = 1; cond_broadcast($done); }
$thread->join();
}
Notice that you can use $^E to access GetLastError.
SetThreadPriority($handle, THREAD_PRIORITY_HIGHEST)
or die("SetThreadPriority: $^E\n";
ERROR_INVALID_HANDLE
Which suggests that what _handle returns is not something Win32::API understands. I suspect "P" wants a string buffer not an integer-casted pointer. "I" may be the wrong thing because it's the wrong size on 64-bit, I would try "N" myself.
Also, for future readers running into this issue on Unix: try my POSIX::RT::Scheduler module.

How do you reuse a queue from Thread::Queue?

I was provided some guidance on here at one time, with the following snippet:
my $q = Thread::Queue->new();
sub worker {
my ($job, $action) = #_;
Build($job, $action);
}
for (1..NUM_WORKERS) {
async {
while (defined(my $job = $q->dequeue())) {
worker($job, 'clean');
}
};
}
$q->enqueue($_) for #compsCopy;
# When you're done adding to the queue.
$q->end();
$_->join() for threads->list();
What is the best option for reusing q? Currently, I'm just making new q objects, q2, q3 and doing all of this over again for each $action that I want to perform. Is there a better way though? I could potentially pass in an array of "actions" that I would like to perform, and would like to avoid duplicating this code 7 times if possible.
Maybe I don't fully understand what a Thread::Queue is..
You should use one queue for one direction. If you just would like to some operation paralel, use one queue. If you would like to report back errors and process those error in main or another thread then you use two queue.
for simple use here is for your reference:
use strict;
use warnings;
use threads;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
my %seen: shared;
# Worker thread
my #thrs = threads->create(\&doOperation ) for 1..5;#for 5 threads
add_file_to_q('/tmp/');
$q->enqueue('//_DONE_//') for #thrs;
$_->join() for #thrs;
sub add_file_to_q {
my $dir = shift;
my #files = `ls -1 $dir/`;chomp(#files);
#add files to queue
foreach my $f (#files){
# Send work to the thread
$q->enqueue($f);
print "Pending items: "$q->pending()."\n";
}
}
sub doOperation () {
my $ithread = threads->tid() ;
while (my $filename = $q->dequeue()) {
# Do work on $item
sleep(1) if ! defined $filename;
return 1 if $filename eq '//_DONE_//';
next if $seen{$filename};
print "[id=$ithread]\t$filename\n";
$seen{$filename} = 1;
### add files if it is a directory (check with symlinks, no file with //_DONE_// name!)
add_file_to_q($filename) if -d $filename;
}
return 1;
}

Perl Error: thread failed to start: Invalid value for shared scalar

I get the following error when trying to run my test code:
thread failed to start: Invalid value for shared scalar at ./threaded_test.pl line 47.
Line 47 is:
%hoh = hoh(#new_array);
My observations:
If I remove line 47 and other lines referencing %hoh, then the script runs without errors
I can create a new hash %new_hash = (itchy => "Scratchy"); without errors, but when I try to "return" a hash from another sub (line 47), it results in the error above.
Unfortunately, I cannot use a in/out Queue because the version of Thread::Queue that I use is too old (and installed on a system I have no control over) and doesn't support hash and hash-ref types to be returned via a Queue (according to this). Apparently, my version only support strings to be returned via queues.
Is there a way to successfully do this: $hash{$string}{"jc"} = \%hoh;
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use constant NUM_WORKERS => 10;
my #out_array : shared = ();
main();
sub main
{
my #results = test1();
foreach my $item (#results) {
print "item: $item\n";
}
}
sub test1
{
my $my_queue = Thread::Queue->new();
foreach (1..NUM_WORKERS) {
async {
while (my $job = $my_queue->dequeue()) {
test2($job);
}
};
}
my #sentiments = ("Axe Murderer", "Mauler", "Babyface", "Dragon");
$my_queue->enqueue(#sentiments);
$my_queue->enqueue(undef) for 1..NUM_WORKERS;
$_->join() for threads->list();
my #return_array = #out_array;
return #return_array;
}
sub test2
{
my $string = $_[0];
my %hash : shared;
my #new_array : shared;
my %new_hash : shared;
my %hoh : shared;
#new_array = ("tom", "jerry");
%new_hash = (itchy => "Scratchy");
%hoh = hoh(#new_array);
my %anon : shared;
$hash{$string} = \%anon;
$hash{$string}{"Grenade"} = \#new_array;
$hash{$string}{"Pipe bomb"} = \%new_hash;
$hash{$string}{"jc"} = \%hoh;
push #out_array, \%hash;
return;
}
sub hoh
{
my %hoh;
foreach my $item (#_) {
$hoh{"jeepers"}{"creepers"} = $item;
}
return %hoh;
}
The problem is that your trying to store a reference to something that isn't shared in a shared variable. You need to use share as previously mentioned, or you need to serialise the data structure.
#!/perl/bin/perl
use strict;
use threads;
use threads::shared;
my %hm_n2g:shared = ();
my $row = &share([]);
$hm_n2g{"aa"}=$row;
$row->[0]=1;
$row->[1]=2;
my #arr = #{$hm_n2g{"aa"}};
print #arr[0]." ".#arr[1]."\n";
#If you want to lock the hash in a thread-subroutine
{
lock(%hm_n2g)
}

Perl Queues and Threading

I'm trying to accomplish the following:
Have a thread that reads data from a very large file say about
10GB and push them into the queue. (I do not wish for the queue to
get very large either)
While the buildQueue thread is pushing data to the queue at the same time have
about 5 worker threads de-queue and process data.
I've made an attempt but my other threads are unreachable because of a continuous loop in my buildQueue thread.
My approach may be totally wrong. Thanks for any help, it's much appreciated.
Here's the code for buildQueue:
sub buildQueue {
print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);
open DICT_FILE, $dict_path or die("Sorry, could not open file!");
while (1) {
if (<DICT_FILE>) {
if ($queue->pending() < 100) {
my $query = <DICT_FILE>;
chomp($query);
$queue->enqueue($query);
my $count = $queue->pending();
print "Queue Size: $count Query: $query\n";
}
}
}
}
And as I've expected when this thread gets executed nothing else after will be executed because this thread will not finish.
my $builder = new Thread(&buildQueue);
Since the builder thread will be running for a long time I never get to create worker threads.
Here's the entire code:
#!/usr/bin/perl -w
use strict;
use Thread;
use Thread::Queue;
my $queue = new Thread::Queue();
my #threads;
sub buildQueue {
print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);
open dict_file, $dict_path or die("Sorry, could not open file!");
while (1) {
if (<dict_file>) {
if ($queue->pending() < 100) {
my $query = <dict_file>;
chomp($query);
$queue->enqueue($query);
my $count = $queue->pending();
print "Queue Size: $count Query: $query\n";
}
}
}
}
sub processor {
my $query;
while (1) {
if ($query = $queue->dequeue) {
print "$query\n";
}
}
}
my $builder = new Thread(&buildQueue);
push #threads, new Thread(&processor) for 1..5;
You'll need to mark when you want your threads to exit (via either joinor detach ). The fact that you have infinite loops with no last statements to break out of them is also a problem.
Edit: I also forgot a very important part! Each worker thread will block, waiting for another item to process off of the queue until they get an undef in the queue. Hence why we specifically enqueue undef once for each thread after the queue builder is done.
Try:
#!/usr/bin/perl -w
use strict;
use threads;
use Thread::Queue;
my $queue = new Thread::Queue();
our #threads; #Do you really need our instead of my?
sub buildQueue
{
print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);
#Three-argument open, please!
open my $dict_file, "<",$dict_path or die("Sorry, could not open file!");
while(my $query=<$dict_file>)
{
chomp($query);
while(1)
{ #Wait to see if our queue has < 100 items...
if ($queue->pending() < 100)
{
$queue->enqueue($query);
print "Queue Size: " . $queue->pending . "\n";
last; #This breaks out of the infinite loop
}
}
}
close($dict_file);
foreach(1..5)
{
$queue->enqueue(undef);
}
}
sub processor
{
my $query;
while ($query = $queue->dequeue)
{
print "Thread " . threads->tid . " got $query\n";
}
}
my $builder=threads->create(\&buildQueue);
push #threads,threads->create(\&process) for 1..5;
#Waiting for our threads to finish.
$builder->join;
foreach(#threads)
{
$_->join;
}
The MCE module for Perl loves big files. With MCE, one can chunk many lines at once, slurp a big chunk as a scalar string, or read 1 line at a time. Chunking many lines at once reduces the overhead for IPC.
MCE 1.504 is out now. It provides MCE::Queue with support for child processes including threads. In addition, the 1.5 release comes with 5 models (MCE::Flow, MCE::Grep, MCE::Loop, MCE::Map, and MCE::Stream) which take care of instantiating the MCE instance as well as auto-tuning max_workers and chunk_size. One may override these options btw.
Below, MCE::Loop is used for the demonstration.
use MCE::Loop;
print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);
mce_loop_f {
my ($mce, $chunk_ref, $chunk_id) = #_;
foreach my $line ( #$chunk_ref ) {
chomp $line;
## add your code here to process $line
}
} $dict_path;
If you want to specify the number of workers and/or chunk_size, then there are 2 ways to do it.
use MCE::Loop max_workers => 5, chunk_size => 300000;
Or...
use MCE::Loop;
MCE::Loop::init {
max_workers => 5,
chunk_size => 300000
};
Although chunking is preferred for large files, one can compare the time with chunking one line at a time. One may omit the first line inside the block (commented out). Notice how there's no need for an inner for loop. $chunk_ref is still an array ref containing 1 line. The input scalar $_ contains the line when chunk_size equals 1, otherwise points to $chunk_ref.
use MCE::Loop;
MCE::Loop::init {
max_workers => 5,
chunk_size => 1
};
print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);
mce_loop_f {
# my ($mce, $chunk_ref, $chunk_id) = #_;
my $line = $_;
## add your code here to process $line or $_
} $dict_path;
I hope that this demonstration was helpful for folks wanting to process a file in parallel.
:) mario
It sounds like this case could do with the Parallel::ForkManager module.
A different approach: You can also use user_tasks in MCE 1.2+ and create two multi-worker multithreading tasks, one task for reading (since it's a big file, you could also benefit from parallel reading while preserving file read seek) and one task for processing, etc.
The code below still uses Thread::Queue to manage your buffer queue.
The buildQueue sub has your queue size control and it pushes the data directly to the manager process' $R_QUEUE since we've used threads, so it has access to the parent's memory space. If you want to use forks instead, you can still access the queue through a call back function. But here I chose to simply just push to the queue.
The processQueue sub will simply de-queue whatever is in the queue until there's nothing more pending.
The task_end sub in each task is run only once by the manager process at the end of each task, so we use it to signal a stop to our worker processes.
Obviously, there's a lot of freedom in how you want to chunk your data to the workers, so you can decide upon the size of the chunk or even how to slurp your data in.
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use MCE;
my $R_QUEUE = Thread::Queue->new;
my $queue_workers = 8;
my $process_workers = 8;
my $chunk_size = 1;
print "Enter a file name: ";
my $input_file = <STDIN>;
chomp($input_file);
sub buildQueue {
my ($self, $chunk_ref, $chunk_id) = #_;
if ($R_QUEUE->pending() < 100) {
$R_QUEUE->enqueue($chunk_ref);
$self->sendto('stdout', "Queue Size: " . $R_QUEUE->pending ."\n");
}
}
sub processQueue {
my $self = shift;
my $wid = $self->wid;
while (my $buff = $R_QUEUE->dequeue) {
$self->sendto('stdout', "Thread " . $wid . " got $$buff");
}
}
my $mce = MCE->new(
input_data => $input_file, # this could be a filepath or a file handle or even a scalar to treat like a file, check the documentation for more details.
chunk_size => $chunk_size,
use_slurpio => 1,
user_tasks => [
{ # queueing task
max_workers => $queue_workers,
user_func => \&buildQueue,
use_threads => 1, # we'll use threads to have access to the parent's variables in shared memory.
task_end => sub { $R_QUEUE->enqueue( (undef) x $process_workers ) } # signal stop to our process workers when they hit the end of the queue. Thanks > Jack Maney!
},
{ # process task
max_workers => $process_workers,
user_func => \&processQueue,
use_threads => 1, # we'll use threads to have access to the parent's variables in shared memory
task_end => sub { print "Finished processing!\n"; }
}
]
);
$mce->run();
exit;

Resources