Perl Thread Safe Modules - multithreading

I am trying to take a Perl program I wrote and thread it. The problem is I read that some modules aren't "thread safe". How do I know if a module is thread safe? I've looked around for a list and cannot locate one.
To test out one module I use frequently (Text::CSV_XS) I tried the following code out:
use strict;
use warnings;
use threads;
use threads::shared;
require Text::CSV_XS;
my $CSV = Text::CSV_XS->new ({ binary => 1, eol => "\n" }) or die("Cannot use CSV: ".Text::CSV->error_diag());
open my $OUTPUT , ">:encoding(utf8)", "test.csv" or die("test.csv: $!");
share($CSV);
my $thr1 = threads->create(\&sayHello('1'));
my $thr2 = threads->create(\&sayHello('2'));
my $thr3 = threads->create(\&sayHello('3'));
sub sayHello
{
    my($num) = @_;
    print("Hello thread number: $num\n");
    my @row = ($num);
    lock($CSV);{
        $CSV->print($OUTPUT, \@row);
        $OUTPUT->autoflush(1);
    }#lock
}#sayHello
The output I receive is the following:
Hello thread number: 1
Segmentation fault
Does this mean the module is not thread safe, or is it another problem?
Thanks

Generally speaking, core and high-visibility modules are thread-safe unless their documentation says otherwise.
That said, there are a few missteps in your post:
share($CSV)
This clears $CSV (a blessed hashref), just as documented in threads. Generally, you want to share() complex objects prior to initialization or, perhaps in this case, share() some dumb $lock variable between threads.
Since $CSV holds state for the underlying XS, this might lead to undefined behavior.
But this isn't your segfault.
threads->create(\&sayHello('1'));
You are mistakenly invoking sayHello(1) in the main thread and passing a reference to its return value to threads->create() as a (bogus) start routine.
You meant to say:
threads->create(\&sayHello, '1');
But this isn't your segfault.
(EDIT Just to clarify -- a bad start routine here doesn't risk a SEGV in any case. threads::create properly complains if an unrecognized subroutine name or non-CODE ref is passed in. In your case, however, you are segfaulting too quickly to reach this error handling.)
Encodings are not thread-safe.
Again, as documented in encoding, the encoding module is not thread-safe.
Here's the smallest possible code I could get to reproduce your symptoms:
use threads;
open my $OUTPUT , ">:encoding(utf8)", "/dev/null" or die $!;
threads->create( sub {} )->join;
That's perl 5.12.1 with threads-1.77 on i686-linux-thread-multi, if you're interested. Drop the "utf8" magic, and it works just fine.
This is your segfault
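Putting the three points together, a minimal sketch of the test program might look like the following. It assumes a plain shared scalar as the mutex (the name $lock is chosen here for illustration), opens the file without an :encoding layer, and writes plain lines instead of going through Text::CSV_XS, so it only illustrates the threading pattern, not CSV quoting.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Handle;

# A plain shared scalar used only as a mutex; no object is shared.
my $lock :shared;

# Opened without an :encoding layer, since that layer triggered the crash above.
open my $OUTPUT, '>', 'test.csv' or die "test.csv: $!";
$OUTPUT->autoflush(1);

sub say_hello {
    my ($num) = @_;
    lock($lock);    # held until the sub returns
    print {$OUTPUT} "$num\n";
    return $num;
}

# Pass the code reference and its arguments separately.
my @threads = map { threads->create( \&say_hello, $_ ) } 1 .. 3;
$_->join for @threads;

close $OUTPUT or die "test.csv: $!";
```

If the output really has to be UTF-8 encoded, one option under these assumptions is to encode each line by hand (for example with Encode::encode) before printing it to the plain handle.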

Related

SOAP::Lite - clients using version 1.1 and 1.2 threaded in mod_perl

I have several SOAP::Lite clients running under mod_perl in Apache httpd.
Some of them use 1.1 soap-servers and some of them use 1.2 servers. So I have code like:
# In client 1:
my $soap1 = SOAP::Lite->soapversion("1.1");
$result1 = $soap1->method1();
# In client 2:
my $soap2 = SOAP::Lite->soapversion("1.2");
$result2 = $soap2->method2();
This works in stand-alone clients, but when I run the code under mod_perl, I seem to get stung by the fact that the soapversion method has side effects:
# From SOAP::Lite.pm
sub soapversion {
    my $self = shift;
    my $version = shift or return $SOAP::Constants::SOAP_VERSION;
    ($version) = grep {
        $SOAP::Constants::SOAP_VERSIONS{$_}->{NS_ENV} eq $version
    } keys %SOAP::Constants::SOAP_VERSIONS
        unless exists $SOAP::Constants::SOAP_VERSIONS{$version};
    die qq!$SOAP::Constants::WRONG_VERSION Supported versions:\n@{[
        join "\n", map {" $_ ($SOAP::Constants::SOAP_VERSIONS{$_}->{NS_ENV})"} keys %SOAP::Constants::SOAP_VERSIONS
    ]}\n!
        unless defined($version) && defined(my $def = $SOAP::Constants::SOAP_VERSIONS{$version});
    foreach (keys %$def) {
        eval "\$SOAP::Constants::$_ = '$SOAP::Constants::SOAP_VERSIONS{$version}->{$_}'";
    }
    $SOAP::Constants::SOAP_VERSION = $version;
    return $self;
}
This is what I believe happens:
Basically, the soapversion call redefines essential constants in $SOAP::Constants. And since this is mod_perl, the $SOAP::Constants are global and shared between every server thread (I believe; please correct me if I'm wrong). This leads to a race condition: most of the time, the lines get executed more or less in the sequence seen above. But once in a while (about 2% of the calls) the execution sequence is:
Thread1: my $soap1 = SOAP::Lite->soapversion("1.1");
Thread2: my $soap2 = SOAP::Lite->soapversion("1.2");
Thread1: $result1 = $soap1->method1();
Thread2: $result2 = $soap2->method2();
And so $soap1->method1() gets called with $SOAP::Constants set for version 1.2, causing several namespaces to be wrong, notably:
xmlns:soapenc="http://www.w3.org/2003/05/soap-encoding"
This is wrong for 1.1, which expects:
xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
If I could somehow make $SOAP::Constants localized to a serverthread or something like that, I guess things would be fine. But any solution will be appreciated.
Run Apache with the prefork model instead of the threading model (mpm_prefork_module instead of mpm_event_module or mpm_worker_module), so that each Apache child will have its own Perl interpreter, hence its own set of constants.
Otherwise, have a look at the mod_perl documentation regarding the PerlOptions directive, specifically the clone and/or parent value. But dropping threads seems simpler to me; threads and Perl were never friends.

Perl multithreading - thread doesn't start

I need some help, I can't figure out why my thread doesn't want to start. I don't have experience with perl, and was asked to make a script that will process a file row by row. Depending on the row, the process should execute other functions (not in snippet), call the same function on a new file or call the same function on a new file in parallel (thread).
Below, I pasted a snippet of the actual code (removed the non-relevant code).
I'm testing the multithreading part on a function called "test" which should print "ok".
The process executes correctly, "start" is printed, but then it gets stuck and after a brief delay, the process stops executing completely.
A thousand thanks to whoever may help me!
use strict;
use warnings;
use IO::Prompter;
use Getopt::Long;
use Log::Message::Simple;
use File::Basename;
use File::Spec;
use IO::Socket::INET;
use UUID::Tiny ':std';
use threads;
use threads::shared;
# *bunch of code deleted*
process_file( $cmdline{csvfile}, 1 );
sub test {
    print "ok\n";
}
sub process_file {
    # get parameters
    my ( $input_file, $flowid ) = @_;
    # init variables
    # open input file
    open( my $fh, '<:encoding(UTF-8)', $input_folder . $input_file )
        or die "Could not open file '$input_file' $!";
    # process file
    while ( my $row = <$fh> ) {
        chomp $row;
        @request = split ";", $row;
        $flow_type = $request[0];
        $flow = $request[1];
        # *bunch of code deleted*
        $filename = "$flow.csv";
        $keep_flowid = $request[2]; # keep flowid?
        $tmp_flowid = $keep_flowid ? $flowid : undef; # set flowid
        $thread = $request[3];
        if ( $thread == 1 ) {
            ### Create new thread
            print "start\n";
            my $process_thread = threads->create("test");
            $process_thread->join();
        }
        elsif ( $thread == 0 ) {
            # wait on process to complete
            process_file( $filename, $tmp_flowid );
        }
        # *bunch of code deleted*
    }
    # close file
    close $fh or die "Couldn't close inputfile: $input_file";
}
It's hard to say exactly why you're having this problem. The major possibility seems to be:
$thread = $request[3];
if ($thread == 1){
This is input from your filehandle, so a real possibility is that "$request[3]" isn't actually 1.
I am a bit suspicious though: your code has use strict; use warnings at the top, but you're not declaring e.g. $thread, $flow etc. with my. That either means you're not using strict, or you're reusing variables, which is a good way to end up with annoying glitches (like this one).
But as it stands, we can't tell you for sure, because we cannot reproduce the problem to test it. In order to do this, we would need some sample input and an MCVE.
To expand on the point about threads made in the comments: you may see warnings that they are "discouraged". The major reason is that Perl threads are not like threads in other languages: they aren't lightweight, whereas in other languages they are. They're perfectly viable solutions to particular classes of problems, specifically the ones where you need parallelism with more IPC than a fork-based concurrency model would give you.
I suspect you are experiencing this bug, fixed in Perl 5.24.
If so, you could work around it by performing your own decoding rather than using an encoding layer.
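The manual decoding mentioned above could be sketched roughly as follows. The helper name read_rows is hypothetical, and the file is assumed to be UTF-8; the point is only that the handle is opened with :raw, so no encoding layer is attached when a thread is created later.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read a file as raw bytes and decode each line by hand,
# so that no :encoding layer sits on the filehandle.
sub read_rows {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Could not open file '$path': $!";
    my @rows;
    while ( my $bytes = <$fh> ) {
        chomp $bytes;
        push @rows, decode( 'UTF-8', $bytes );    # bytes in, characters out
    }
    close $fh or die "Couldn't close input file: $path";
    return @rows;
}
```

Each returned row can then be passed to split ";" exactly as in the original loop.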

basic chat system on perl under linux

I'm trying to write a basic chat system just to learn Perl. I'm trying to keep the chat log in one file and print a new message whenever one appears in the chatlog.dat file. So I've written a function that does almost that, but I've got some problems and don't know how to solve them.
So now I have 2 problems!
1. I could not understand how to keep the checkFile function always active (like multiprocessing) to continuously check for new messages.
2. This problem occurs when I'm trying to write a new message that will be appended to the chat log. The interpreter waits for my input on the line my $newMessage = <STDIN>;, but what if someone else writes a new message? It will not be shown until I press enter... how do I avoid that?
my ($sec,$min,$hour) = localtime();
while(1){
    my $userMessage = <STDIN>;
    last if $userMessage eq "::quit";
    `echo "($hour:$min:$sec): $userMessage" >>chatlog.dat`;
}
sub checkFile{
    my $lastMessage = "";
    my $newMessage = "";
    while (1) {
        my $context = `cat chatlog.dat`;
        split(/\n/, $context);
        $newMessage = $_[$#_];
        if ($newMessage ne $lastMessage) {
            print $newMessage;
            $lastMessage = $newMessage;
        }
    }
}
First:
Don't use echo within a Perl script. It's nasty to shell-escape when you've got perfectly good IO routines.
Using cat to read files is about as nasty as using echo.
Reading <STDIN> like that is a blocking call, which means your script will pause.
But that's not as bad as it sounds, because otherwise you're running a 'busy wait' loop which will repeatedly cat the file. This is a very bad idea.
You're assuming that writing a file like that is an atomic operation, when it's not. You'll hit problems with that too.
What I would suggest is to look at IO::Handle, and also consider using flock to ensure you've got the file locked for IO. You may also wish to consider File::Tail instead.
I would actually suggest, though, that you consider a different mode of IPC, as 'file swapping' is quite inefficient. If you really want to use the filesystem for your IO, you might want to consider a FIFO pipe: have each 'client' open its own, and have a server reading and coalescing them.
Either way, you'll need either IO::Select or perhaps multithreading, just to swap back and forth between reading and writing. http://perldoc.perl.org/IO/Select.html
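As a rough sketch of the first few points, the echo and cat calls could be replaced by ordinary file IO guarded with flock. The helper names append_message and read_new_messages are made up for this example, and the line format simply mirrors the one used in the question. The reader remembers a byte offset, so it only returns lines added since the last call instead of re-reading the whole file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Hypothetical helper: append one chat line while holding an exclusive lock.
sub append_message {
    my ( $file, $user, $text ) = @_;
    my ( $sec, $min, $hour ) = localtime();
    open my $fh, '>>', $file or die "Cannot open $file: $!";
    flock( $fh, LOCK_EX ) or die "Cannot lock $file: $!";
    printf {$fh} "(%02d:%02d:%02d):(%s) %s\n", $hour, $min, $sec, $user, $text;
    close $fh or die "Cannot close $file: $!";    # closing releases the lock
    return;
}

# Hypothetical helper: return the new offset and the lines added since $offset.
sub read_new_messages {
    my ( $file, $offset ) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    flock( $fh, LOCK_SH ) or die "Cannot lock $file: $!";
    seek( $fh, $offset, SEEK_SET ) or die "Cannot seek in $file: $!";
    my @lines      = <$fh>;
    my $new_offset = tell($fh);
    close $fh;
    return ( $new_offset, @lines );
}
```

A polling loop would call read_new_messages with the offset from the previous call and sleep briefly between calls, which avoids the busy wait described above.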
Answering my own question
sub checkFile{
    my $lastMessage = "";
    my $newMessage = "";
    my $userName = $_[0];
    while (1) {
        my $context = `cat chatlog.dat`;
        split(/\n/, $context);
        $newMessage = $_[$#_];
        if ($newMessage ne $lastMessage) {
            $newMessage =~ /^\(.+\):\((.+)\) (.+$)/;
            if ($1 ne $userName) { print "[$1]: $2";}
            $lastMessage = $newMessage;
        }
    }
}
my $userName = "Rocker";
my ($sec,$min,$hour) = localtime();
my $thr = threads -> create ( \&checkFile, $userName ); #Starting a thread to continuously check for the file update
while (1) {
    my $userMessage = <STDIN>; #STDIN will not interfere file checking
    last if $userMessage eq "::quit";
    `echo "($hour:$min:$sec):($userName) $userMessage" >>chatlog.dat` if $userMessage =~ /\S+/;
}
$thr -> join();

How to deal with multiple threads in perl which turn into zombie

It seems that using pipes in threads might cause the threads to turn into zombies. In fact, it is the commands in the pipe that turn into zombies, not the threads. This does not happen every time, which is annoying since it makes the real problem hard to find. How do I deal with this issue? What causes it? Is it related to the pipe? How can I avoid it?
The following is the code that creates the sample files.
#buildTest.pl
use strict;
use warnings;
sub generateChrs{
    my ($outfile, $num, $range)=@_;
    open OUTPUT, "|gzip>$outfile";
    my @set=('A','T','C','G');
    my $cnt=0;
    while ($cnt<$num) {
        # body...
        my $pos=int(rand($range));
        my $str = join '' => map $set[rand @set], 1 .. rand(200)+1;
        print OUTPUT "$cnt\t$pos\t$str\n";
        $cnt++
    }
    close OUTPUT;
}
sub new_chr{
    my @chrs=1..22;
    push @chrs,("X","Y","M", "Other");
    return @chrs;
}
for my $chr (&new_chr){
    generateChrs("$chr.gz",50000,100000)
}
The following code occasionally creates zombie processes. The reason or trigger remains unknown.
#paralRM.pl
use strict;
use threads;
use Thread::Semaphore;
my $s = Thread::Semaphore->new(10);
sub rmDup{
    my $reads_chr=$_[0];
    print "remove duplication $reads_chr START TIME: ",`date`;
    return 0 if(!-s $reads_chr);
    my $dup_removed_file=$reads_chr . ".rm.gz";
    $s->down();
    open READCHR, "gunzip -c $reads_chr |sort -n -k2 |" or die "Error: cannot open $reads_chr";
    open OUTPUT, "|sort -k4 -n|gzip>$dup_removed_file";
    my ($last_id, $last_pos, $last_reads)=split('\t',<READCHR>);
    chomp($last_reads);
    my $last_length=length($last_reads);
    my $removalCnts=0;
    while (<READCHR>) {
        chomp;
        my @line=split('\t',$_);
        my ($id, $pos, $reads)=@line;
        my $cur_length=length($reads);
        if($last_pos==$pos){
            #may dup
            if($cur_length>$last_length){
                ($last_id, $last_pos, $last_reads)=@line;
                $last_length=$cur_length;
            }
            $removalCnts++;
            next;
        }else{
            #not dup
        }
        print OUTPUT join("\t",$last_id, $last_pos, $last_reads, $last_length, "\n");
        ($last_id, $last_pos, $last_reads)=@line;
        $last_length=$cur_length;
    }
    print OUTPUT join("\t",$last_id, $last_pos, $last_reads, $last_length, "\n");
    close OUTPUT;
    close READCHR;
    $s->up();
    print "remove duplication $reads_chr END TIME: ",`date`;
    #unlink("$reads_chr")
    return $removalCnts;
}
sub parallelRMdup{
    my @chrs=@_;
    my %jobs;
    my @removedCnts;
    my @processing;
    foreach my $chr(@chrs){
        while (${$s}<=0) {
            # body...
            sleep 10;
        }
        $jobs{$chr}=async {
            return &rmDup("$chr.gz")
        };
        push @processing, $chr;
    };
    #wait for all threads finish
    foreach my $chr(@processing){
        push @removedCnts, $jobs{$chr}->join();
    }
}
sub new_chr{
    my @chrs=1..22;
    push @chrs,("X","Y","M", "Other");
    return @chrs;
}
&parallelRMdup(&new_chr);
As the comments on your originating post suggest - there isn't anything obviously wrong with your code here. What might be helpful to understand is what a zombie process is.
Specifically - it's a spawned process (by your open) which has exited, but the parent hasn't collected its return code yet.
For short running code, that's not all that significant - when your main program exits, the zombies will 'reparent' to init which will clean them up automatically.
For longer-running programs, you can use waitpid to clean them up and collect return codes.
Now in this specific case - I can't see a specific problem, but I would guess it's to do with how you're opening your filehandles. The downside of opening filehandles like you are, is that they're globally scoped - and that's just generally bad news when you're doing thready things.
I would imagine if you changed your open calls to:
my $pid = open ( my $exec_fh, "|-", "executable" );
And then called waitpid on that $pid following your close then your zombies would finish. Test the return from waitpid to get an idea of which of your execs has errored (if any), which should help you track down why.
Alternatively - set $SIG{CHLD} = "IGNORE"; which will mean you - effectively - tell your child processes to 'just go away immediately' - but you won't be able to get a return code from them if they die.
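A minimal sketch of the lexical-filehandle idea, under some assumptions: the helper name write_through is invented for this example, and it uses the list form of a piped open, which bypasses the shell and so cannot express a pipeline with redirection such as the "|sort -k4 -n|gzip>..." string in the question. Because close on a piped handle itself waits for the child and sets $?, the sketch reads the exit status from $? rather than calling waitpid separately.

```perl
use strict;
use warnings;

# Hypothetical helper: pipe lines into an external command through a
# lexical handle and return the command's exit status.
sub write_through {
    my ( $lines, @cmd ) = @_;

    # List-form piped open: no shell involved, handle scoped to this sub.
    open( my $out, '|-', @cmd )
        or die "Cannot start '$cmd[0]': $!";

    print {$out} $_ for @$lines;

    # close() on a piped handle waits for the child, so the child is
    # reaped here and its status is left in $?.
    close $out;
    return $? >> 8;
}
```

A call such as write_through(\@rows, 'gzip', '-c') would then return gzip's exit code, which makes it possible to see which command failed, if any.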

Perl Reading from Thread::Queue with timeout

I am working in a boss worker crew multithreaded scenario with Thread::Queue in Perl.
The boss enqueues tasks and the workers dequeue from the queue.
I need the worker crew to send downstream ping messages if the boss does not send a task via the queue for x seconds.
Unfortunately there seems to be no dequeue method with a timeout.
Have I missed something or would you recommend a different approach/different data structure?
You can add the functionality yourself, knowing that a Thread::Queue object is a blessed reference to a shared array (which I believe is the implementation from 5.8 through 5.16):
package Thread::Queue::TimedDequeue;
use parent 'Thread::Queue';
use threads::shared qw(lock cond_timedwait);
sub timed_dequeue {
    my ($q, $patience) = @_; # XXX revert to $q->dequeue() if $patience is negative?
                             # $q->dequeue_nb() if $patience is zero?
    my $timelimit = time() + $patience;
    lock(@$q);
    until (@$q) {
        last if !cond_timedwait(@$q, $timelimit);
    }
    return shift @$q if @$q; # We got an element
    # else we timed out.
}
1;
Then you'd do something like:
# main.pl
use threads;
use strict; use warnings;
use Thread::Queue::TimedDequeue;
use constant WORKER_PATIENCE => 10; # 10 seconds
my $queue = Thread::Queue::TimedDequeue->new();
...
sub worker {
    my $item = $queue->timed_dequeue(WORKER_PATIENCE);
    timedout() unless $item;
    ...
}
Note that the above approach assumes you do not enqueue undef or an otherwise false value.
There is nothing wrong with your approach/structure, you just need to add some timeout control over your "Thread::Queue". That is either:
create some "yield" based loop to check your queue from the child side while using a time reference to detect timeout.
use the "Thread::Queue::Duplex" or "Thread::Queue::Multiplex" modules which might be a bit overkill but do implement timeout controls.
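For what it's worth, more recent releases of Thread::Queue ship a dequeue_timed method, so a subclass may not be needed at all. The sketch below assumes version 3.01 or later (to my knowledge the release that introduced dequeue_timed and end) and a threaded perl; the one-second patience and the log entries are purely illustrative.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01;    # assumed minimum version with dequeue_timed() and end()

use constant WORKER_PATIENCE => 1;    # seconds; illustrative value

my $queue = Thread::Queue->new();

sub worker {
    my @log;
    while (1) {
        my $item = $queue->dequeue_timed(WORKER_PATIENCE);
        if ( defined $item ) {
            push @log, "task:$item";    # real work would happen here
            next;
        }
        # undef means either a timeout or an ended, drained queue.
        last if !defined $queue->pending();
        push @log, 'ping';              # timed out: send the downstream ping here
    }
    return @log;
}

# Ask for list context so the worker's whole log comes back from join().
my $thr = threads->create( { context => 'list' }, \&worker );
$queue->enqueue('a');
sleep 3;          # the boss stays silent long enough for at least one ping
$queue->end();    # tell the worker that no more tasks are coming
my @log = $thr->join();
print "$_\n" for @log;
```

As with the subclass above, this assumes undef is never enqueued as a task, since undef is what signals a timeout.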
