Perl. shared hash threads in 2 files - multithreading

please help me resolve my problems with shared hash threads in 2 files
my first file "H.pl" contain defined hash (very huge with many levels):
#!usr/bin/perl
use strict;
use warnings;
our %h;
%h= (
"hd",
{
"0", {"type", "fix", "ln", "8", "descr", "P"},
"1", {"type", "hex", "ln", "2", "descr", "H"},
},
"hdr",
{
"0", {"type", "fix", "ln", "8", "descr", "E"},
},
);
second file "V.pl" contain main code and threads for change hash:
#!usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Data::Dumper;
use lib '.';
require 'H.pl';
our %h;
threads->create(sub{
print "befor change in threads \n";
print Dumper \%h;
$h{"hd"}{"0"}{"value"} = "hello";
$h{"hd"}{"0"}{"descr"} = "R";
$h{"hdr"}{"0"}{"value"} = "hello";
print "after change in threads \n";
print Dumper \%h;
}
);
sleep 1;
print "without threads \n";
print Dumper \%h;
I tried use our %h:shared; in both files, but always get error Invalid value for shared scalar at H.pl
How I can share my hash?
Thanks for help

Per the documentation:
Shared variables can only store scalars, refs of shared variables, or
refs of shared data (discussed in next section):
If you want to share a hash that's not shallow, you need to share the inner references first.
You can use the shared_clone function for that. Adding the following line into V.pl below our %h makes the programs work:
%h = %{ shared_clone(\%h) };
Also, store the result of create to a variable $t and instead of sleep, run $t->join.
It's usually better to use a higher level module like Thread::Queue and only send the data you actually need between threads.

Related

Why this Perl script run out of memory gradually

I have a trouble in running a Perl script in muti-threads. It continued consume memory and finally the system ran out of memory and killed it. It seems that the sub-threads were detached but the system resource were not released when they finished. I am pretty new to Perl and couldn't find which part went wrong. This is part of the script that may cause this problem. Could anyone help me with this?
use strict;
use warnings;
print "different number:\t";
my $num1=<>;
chomp $num1;
if($num1!~/[1 2 3 4 5]/)
{
print "invalid input number\n";
END;
}
my $i=0;
my $no;
my #spacer1;
my $nn;
my #spacer2;
open IN,"file1.txt"or die"$!";
while(<IN>)
{
chomp;
if($_=~ /^>((\d)+)\|((\d)+)/)
{
$no=$1;
$spacer1[$no][0]=$3;
}
else
{
$spacer1[$no][1]=$_;
}
}
close IN;
open IN, "file2.txt" or die "$!";
while(<IN>)
{
chomp;
if($_=~ /^>((\d)+)\|((\d)+)/)
{
$nn=$1;
$spacer2[$nn][0]=$3;
}
else
{
$spacer2[$nn][1]=$_;
}
}
close IN;
#-----------------------------------------------------------------#create threads
use subs qw(sg_ana);
use threads;
use Thread::Semaphore;
my $cycl=(int($no/10000))+1;
my $c;
my #thd;
my $thread_limit= Thread::Semaphore -> new (3);
foreach $c(1..$cycl)
{
$thread_limit->down();
$thd[$c]=threads->create("sg_ana",$c-1,$c,$num1);
$thd[$c]->detach();
}
&waitquit;
#-------------------------------------------------------------#limite threads num
sub waitquit
{
print "waiting\n";
my $num=0;
while($num<3)
{
$thread_limit->down();
$num++;
}
}
#---------------------------------------------------------------#alignment
my $n;
my $n1;
my $j;
my $k;
my $l;
my $m;
my $num;#number of match
my $num2=0;;#arrange num
sub sg_ana
{
my $c1=shift;
my $c2=shift;
$num1=shift;
open OUT,">$num1.$c2.txt" or die "$!";
if($num1==1)
{
foreach $n($c1*10000..$c2*10000-1)
{
if($spacer2[$n][1])
{
my $presult1;
my $presult2;
$num2=-1;
foreach $i(0..19)
{
$num=0;
$num2++;
my $tmp1=(substr $spacer2[$n][1],0,$i)."\\"."w".(substr $spacer2[$n][1],$i+1,19-$i);
foreach $n1(0..#spacer1-1)
{
if($spacer1[$n1][1])
{
my $tmp2=substr $spacer1[$n1][1],0,20;
if($tmp2=~/$tmp1/)
{
$num++;
$presult1.=$n1.",";
}
}
}
$presult2=$i+1;
if($num>=4)
{
print OUT "\n";
}
}
}
}
}
close OUT;
$thread_limit->up();
}
Rule one of debugging perl is enable use strict; and use
warnings; and then sort out the errors. Actually, you should
probably do that first of all, before you even start writing code.
You're creating and limiting threads via a Semaphore - but actually
this is really inefficient because of how perl does threads - they
aren't lightweight, so spawning loads is a bad idea. A better way of doing this is via Thread::Queue a bit like this.
Please use 3 arg open and lexical file handles. e.g. open ( my
$out, '>', "$num.$c2.txt" ) or die $!;. You're probably getting
away with it here, but you have got OUT as a global namespace
variable being used by multiple threads. That way lies dragons.
Don't use single letter variables. And given how you you use $c
then you'd be far better off:
foreach my $value ( 1..$cycl ) {
## do stuff
}
The same is true of all your other single letter variables though - they're not meaningful.
You pass $num before it's initialised, so it's always going to
be undef within your sub. So your actual subroutine is just:
sub sg_ana
{
my $c1=shift;
my $c2=shift;
$num1=shift;
open OUT,">$num1.$c2.txt" or die "$!";
close OUT;
$semaphore->up();
}
Looking at it - I think you may be trying to do something with a shared variable there, but you're not actually sharing it. I can't decode the logic of your program though (thanks to having a load of single letter variables most likely) so I can't say for sure.
You're calling a subroutine &waitquit;. That's not good style -
prefixing with an ampersand and supplying no arguments does
something subtly different to just invoking the sub 'normally' - so
you should avoid it.
Don't instantiate your semaphore like this:
my $semaphore=new Thread::Semaphore(3);
That's an indirect procedure call, and bad style. It would be better written as:
my $thread_limit = Thread::Semaphore -> new ( 3 );
I would suggest rather than using Semaphores like that, you'd be far better off not detatching your threads, and just using join. You also don't need an array of threads - threads -> list does that for you.
I can't reproduce your problem, because your sub isn't doing
anything. Have you by any chance modified it for posting? But a classic reason for perl memory exhaustion when threading is because each thread clones the parent process - and so 100 threads is 100x the memory.

Perl: the value of variable $^T in child threads

$^T stores the start time of a perl program in second since epoch.
Because I need to know how many seconds a child thread costs, the question is:
Does $^T in child thread store the beginning time of itself? or simply copy the value from its mother thread?
Running this:
#!/usr/bin/env perl
use strict;
use warnings;
sub test_th {
print $^T,"\n";
}
print $^T."\n"
sleep 10;
my $thr = threads -> create ( \&test_th );
$thr -> join;
Prints the same value twice.
Which is as expected, since when you thread, you effectively inherit all your parent variables.
If you try this via forking:
#!/usr/bin/env perl
use strict;
use warnings;
use Parallel::ForkManager;
print $^T, "\n";
for ( 1 .. 2 ) {
sleep 10;
$mgr->start and next;
print $^T, "\n";
$mgr->finish;
}
$mgr->wait_all_children;
You get the same value, despite the 'start' of the fork being 10s later.
So to answer your question - no, $^T is started at program instantiation. If you wish to measure things like thread run times, you'll have to find other ways of doing it.
Although, given "elapsed time" is at best a very crude metric (processors doing things like scheduling, such that 'real time' and 'run time' don't really correlate particularly)
But perhaps calling time() at start and end of each thread would give you what you need? Or perhaps something like Devel::NYTProf?
A quick test will reveal that $^T is defined for the populated at process startup, not at thread startup
But nothing's stopping you from noting when the thread starts. You can even save the time stamp in $^T since it's a per-thread variable!
use feature qw( say );
use threads;
sub thread {
my ($n) = #_;
sleep $n;
}
sub wrapper {
my ($n) = #_;
$^T = time;
thread($n);
say sprintf "Thread %s ran for %s seconds.", threads->tid, time-$^T;
}
async { wrapper(5) };
sleep 2;
async { wrapper(2) };
$_->join for threads->list;
Output:
Thread 2 ran for 2 seconds.
Thread 1 ran for 5 seconds.
Note that assigning to $^T coerces the stored value into an integer, so it would not be an appropriate place to store the result of Time::HiRes::time().

Perl seg fault while joining threads

I have a code similar to the below. I have one main script which is calling another module named initial.pm. initial.pm opens up connection with an AMQP server (In my case RabbitMQ)and using Net::AMQP::RabbitMQ library for establishing the connection. Everything works fine except when I try to join my threads I get segmentation fault.
I think the Net::AMQP::RabbitMQ is not thread safe. But this is only being used by the main thread. Im pretty sure you can reproduce the error if you just copy and past the codes below.
How do I fix it ?
main.pl
#!/usr/bin/perl
use Cwd qw/realpath/;
use File::Basename qw/dirname/;
use lib 'lib';
use threads;
use threads::shared;
use initial;
my #threads = ();
my $run :shared = 1;
my $init = load initial($name);
$SIG{'TERM'} = sub {
$run = 0;
};
threads->create(\&proc1);
threads->create(\&proc2);
while($run){
sleep(1);
print "I am main thread\n";
}
$_->join() for threads->list();
sub proc1 {
while($run){
sleep(1);
print "I am child thread 1 \n"
}
}
sub proc2 {
while($run){
sleep(1);
print "I am child thread 2 \n";
}
}
lib/initial.pm
package initial;
use Net::AMQP::RabbitMQ;
use Cwd qw/realpath/;
use File::Basename qw/dirname/;
my $mq;
my $stop = 0;
sub load {
my $class = shift;
my $self = {};
connectq();
bless $self,$class;
return $self;
}
sub connectq {
$mq = Net::AMQP::RabbitMQ->new();
my ($host,$port,$user,$pass) = ('localhost','5672','guest','guest');
$mq->connect($host, {
user => $user,
password => $pass,
port => $port,
timeout => 10,
});
$mq->channel_open(1);
$mq->consume(1, 'logger');
}
1;
I can't reproduce your problem directly, because I don't have the library installed.
One way of 'faking' thread safety in a not-thread-safe module is to rescope your 'use' to only the bit where you'll be using it.
You see, when you start a thread, it copies the program state - loaded libraries and everything.
If your run (something like):
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
sub thread1 {
print threads->self->tid.": Includes:", Dumper \%INC,"\n";
}
#main;
print "Main includes:", Dumper \%INC,"\n";
threads -> create ( \&thread1 );
You'll see XML::Twig is loaded in both. If the process of 'loading' the module causes some state changes (and it can) then you immediately have a potential thread-safety issue.
However if you instead do:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Data::Dumper;
sub thread1 {
require XML::Twig;
XML::Twig -> import;
print threads->self->tid.": Includes:", Dumper (\%INC),"\n";
}
#main;
print "Main includes:", Dumper (\%INC),"\n";
threads -> create ( \&thread1 );
foreach my $thr ( threads -> list ) {
$thr -> join;
}
You effectively cause the module to be dynamically loaded within the thread - the module is only present in one 'code instance' so you are much less likely to be tripped up by 'thread safety' issues.
Alternatively - forking instead of threading ... might be an alternative. This has slightly different 'safety' problems.
But there really is no way to avoid this. Even with shared variables, the core problem is - when you thread, bits of code happen in a different order. There's all sorts of fruity things that can happen as a result. A shared var is one way of ensuring it's the same variable being checked each time - e.g. share $init, but that may well make things worse, because you're then potentially trampling over the same instance/sockets with different threads.
You can, however, reduce the 'thread safety' problem to a limited scope, and use e.g. Thread::Queue to pass messages to/from your 'module user' thread.

Perl Hash reference is changing in threads

I am using perl with threads to process socket info
ConnectionThread
is responsible for receiving packets and parsing, and enque in a hash
QueueThread
Is responsible for processing Queue elelemts (hash entries ) and update DB
The Hash is Events and declared as my %Events:shared;
I am passing a hash reference to the Threads, but i noticed that each thread is getting a difference hash ref value
my $hash_ref1 = \%Events ; # referencing
print "Hash ref in main is 1 " . $hash_ref1 ."\n";
my $thr1 = threads->create(\&ConnectionThread, $hash_ref1 );
my $thr2 = threads->create(\&QueueThread, $hash_ref1);
The output is as below
Hash ref in main is 1 HASH(0x825faa4)
Hash ref is ConnectionThread is HASH(0x8493544)
Thread started ..
Hash ref is Queue thread is HASH(0x852dd9c)
below is the full code ( illustrative )
use strict;
use warnings;
use Socket;
use threads;
use threads::shared;
use DBI;
my %Events:shared;
sub checkSize {
my $size;
$size =keys %Events;
print "Size of Queue in Check is ***" .$size ." \n";
}
sub QueueThread {
my ($hash_ref) = #_;
print "Hash ref is Queue thread is " . $hash_ref ." \n";
while (1==1) {
sleep (5);
}
}
sub ConnectionThread {
my ( $hash_ref ) = #_;
print "Hash ref is ConnectionThread is " . $hash_ref ." \n";
while (1==1) {
sleep(5);
}
}
my $hash_ref1 = \%Events;
print "Hash ref in main is 1 " . $hash_ref1 ."\n";
my $thr1 = threads->create(\&ConnectionThread, $hash_ref1 );
my $thr2 = threads->create(\&QueueThread, $hash_ref1);
print "Thread started ..\n";
while (1==1) {
sleep 10;
}
You are not directly accessing the same variable in all threads. If you did, you'd have to explicitly guarantee mutual access to the variable every time you access it (even if just to read it) to avoid crashing the program.
Each thread (including the one in which the variable is created) gets a
"proxy" to the data-containing variable. The proxy is a magical variable, meaning accessing the elements of the proxy results in getters and setters being called. These getters and setters ensure the data-containing variable is never in an inconsistent state by providing mutually exclusive access to it.
$ perl -Mthreads -Mthreads::shared -MDevel::Peek \
-E'my %h :shared; ( async { Dump(%h); } )->join; Dump(%h);' 2>&1 |
grep -P '^\S|^ {8}IV'
SV = PVHV(0x1ed5f90) at 0x1f6fd68 <-----+
IV = 31930352 <--+ |
SV = PVHV(0x1d70af0) at 0x1d880d0 <--|--+------ Two different vars
IV = 31930352 <--+--------- Proxying the same var (0x1e737f0)
Yes, this will happen. Threads do not share memory. You can sort of fake it with shared which allows you to have common variables - but you won't necessarily see the same hash memory locations.
Despite %Events being shared that's not going to print the same memory address in each thread if you print \%Events;
Given you're talking about queueing though, can I suggest instead using Thread::Queue which allows you to 'do' queue/dequeue operations in a nice easy and thread safe manner.

Perl: Correctly passing array for threads to work on

I'm learning how to do threading in Perl. I was going over the example code here and adapted the solution code slightly:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;
my $sem = Thread::Semaphore->new(2); # max 2 threads
my #names = ("Kaku", "Tyson", "Dawkins", "Hawking", "Goswami", "Nye");
my #threads = map {
# request a thread slot, waiting if none are available:
foreach my $whiz (#names) {
$sem->down;
threads->create(\&mySubName, $whiz);
}
} #names;
sub mySubName {
return "Hello Dr. " . $_[0] . "\n";
# release slot:
$sem->up;
}
foreach my $t (#threads) {
my $hello = $t->join();
print "$hello";
}
Of course, this is now completely broken and does not work. It results in this error:
C:\scripts\perl\sandbox>threaded.pl
Can't call method "join" without a package or object reference at C:\scripts\perl\sandbox\threaded.pl line 24.
Perl exited with active threads:
0 running and unjoined
9 finished and unjoined
0 running and detached
My objective was two-fold:
Enforce max number of threads allowed at any given time
Provide the array of 'work' for the threads to consume
In the original solution, I noticed that the 0..100; code seems to specify the amount of 'work' given to the threads. However, in my case where I want to supply an array of work, do I still need to supply something similar?
Any guidance and corrections very welcome.
You're storing the result of foreach into #threads rather than the result of threads->create.
Even if you fix this, you collect completed threads too late. I'm not sure how big of a problem that is, but it might prevent more than 64 threads from being started on some systems. (64 is the max number of threads a program can have at a time on some systems.)
A better approach is to reuse your threads. This solves both of your problems and avoids the overhead of repeatedly creating threads.
use threads;
use Thread::Queue 3.01 qw( );
use constant NUM_WORKERS => 2;
sub work {
my ($job) = #_;
...
}
{
my $q = Thread::Queue->new();
for (1..NUM_WORKERS) {
async {
while (my $job = $q->dequeue()) {
work($job);
}
};
}
$q->enqueue(#names); # Can be done over time.
$q->end(); # When you're done adding.
$_->join() for threads->list();
}

Resources