Perl threads sometimes get stuck (some of them waiting for futex) - multithreading

I'm facing a problem with a multi-threaded Perl application that I'm trying to run (on Red Hat 7.4 using Perl 5.10.1). I have reproduced the problem with the setup below, which is similar to the original Perl application:
There are 2 files: main.pl, module1.pm.
main.pl:
The top level script to complete a set of tasks, invoked as "main.pl <NumberOfTestsToRun> <MaxWorkersToUse>"
# main.pl
#!/apps/perl/5.10.1/bin/perl
use strict;
use warnings;
use v5.10.1;
use threads;
use threads::shared;
use module1;
use lib "<path to Thread::Queue module>";
use Thread::Queue;
my $glNumTests = shift(); # Number of tests(jobs) to run
my $glMaxThreads = shift(); # What is the max allowed number of workers (threads) for queue mode
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
sub worker
{
# Thread will loop until no more work
while (defined(my $item = $q->dequeue())) {
# Do work on $item
sleep(5); # dummy pre-work
$item->dummy(); # Actual task that needs to be run by each thread
}
}
sub start_threads
{
my $loNumThreads = shift();
$loNumThreads = $loNumThreads > $glMaxThreads ? $glMaxThreads : $loNumThreads;
print "Creating $loNumThreads threads...";
for ( 1..$loNumThreads )
{
threads->create(\&worker);
}
print "done\n";
}
sub initialize
{
my @lotests;
my $loNumTests = shift();
for my $i (1..$loNumTests)
{
push(@lotests,"Test_".$i);
}
return \@lotests;
}
sub launchjobs
{
my @lotests = @{shift()};
my $tests = {};
# Create objects
foreach my $lotest (@lotests)
{
$tests->{$lotest}->{"obj"} = module1->new($lotest);
}
# Start the threads before adding work to the queue
start_threads(scalar(@lotests));
# Adding work to queue
foreach my $lotest (@lotests)
{
$q->enqueue($tests->{$lotest}->{"obj"});
}
# No more work to be added
$q->end();
# Wait for threads to finish
foreach my $thr ( threads -> list() )
{
$thr->join();
}
}
launchjobs(initialize($glNumTests));
module1.pm:
Support module which runs another command (echo in this case) using the IPC::Run module
# module1.pm
package module1;
use lib "<InstallationPath>/IPC-Run-20200505.0/lib";
use IPC::Run qw( run );
sub new
{
my $class = shift();
my $test = shift();
my $self = {};
$self->{"testName"} = $test;
system("\\mkdir -p test_output/$self->{testName}");
# Create new file track.log for this object
open(my $OFH,">","test_output/$self->{testName}/track.log") || die "Cannot open track.log for writing in new\n";
close($OFH);
bless($self,$class);
return $self;
}
sub logTracker
{
# Writes out the message to track.log file
my $self = shift();
my $message = shift();
open(my $OFH,">>","test_output/$self->{testName}/track.log") || die"Cannot open track.log for writing\n";
print $OFH $message;
close($OFH);
}
sub dummy
{
my $self = shift();
print "running $self->{testName}\n";
my $loCmd = "echo"; # Command to be run
my $loArgs = "This is test $self->{testName}"; # Arguments to the above command
$self->logTracker("Calling run\n");
run [$loCmd,$loArgs],'>&',"test_output/$self->{testName}/output";
$self->logTracker("run completed\n");
}
1;
The output directory structure is as below:
<pwd>/test_output/Test_<TestNumber>/
Each Test_<TestNumber> directory contains two files: track.log and output.
The problem I'm facing is that sometimes some of the threads get stuck at the IPC::Run call (the track.log file doesn't contain the "run completed" line).
For instance, when I last ran "main.pl 1000 128" (1000 tests using 128 workers), many threads did not complete and main.pl kept running (waiting). When I ran strace, I got the following output:
strace: Process 41187 attached with 8 threads
[pid 42343] read(22, <unfinished ...>
[pid 42292] read(20, <unfinished ...>
[pid 42291] read(25, <unfinished ...>
[pid 42282] read(24, <unfinished ...>
[pid 42234] read(43, <unfinished ...>
[pid 42212] read(18, <unfinished ...>
[pid 41187] futex(0x7ff2597939d0, FUTEX_WAIT, 42212, NULL <unfinished ...>
[pid 42338] read(16,
The process tree for the above run is as shown below (once all the other threads have completed):
-perl,41187 main.pl 1000 128
|-perl,42614 main.pl 1000 128
|-perl,42615 main.pl 1000 128
|-perl,42616 main.pl 1000 128
|-perl,42617 main.pl 1000 128
|-perl,42618 main.pl 1000 128
|-perl,42620 main.pl 1000 128
|-perl,42621 main.pl 1000 128
|-{perl},42212
|-{perl},42234
|-{perl},42282
|-{perl},42291
|-{perl},42292
|-{perl},42338
`-{perl},42343
I'm not able to understand why this is happening (it probably has to do with Thread::Queue or pipes, etc.). It would be really great if someone could help me with this. I finally want to be able to run all the tests without any futex wait/hang problems. I have tried:
Using the timeout option of IPC::Run to check whether control returns, but no luck there.
Using open3() instead of IPC::Run: the problem still exists and occurs more frequently than with IPC::Run.
Note:
This problem doesn't occur when I use Perl 5.16.3. But I need to make the original application work using Perl 5.10.1, so it would be great if someone could help me understand how to fix this issue :)
In module1::dummy(), if $loCmd is changed to "sleep" and $loArgs to (say) "10", the problem doesn't appear (giving the impression that it might have something to do with pipes/IO buffers).
The number of threads that get stuck varies between runs, and sometimes all the threads complete without any issues (race conditions might be present).

Related

Perl multithreading - thread doesn't start

I need some help; I can't figure out why my thread doesn't want to start. I don't have experience with Perl, and was asked to make a script that processes a file row by row. Depending on the row, the process should execute other functions (not in the snippet), call the same function on a new file, or call the same function on a new file in parallel (in a thread).
Below, I pasted a snippet of the actual code (the non-relevant code is removed).
I'm testing the multithreading part on a function called "test", which should print "ok".
The process executes correctly and "start" is printed, but then it gets stuck, and after a brief delay the process stops executing completely.
A thousand thanks to whoever may help me!
use strict;
use warnings;
use IO::Prompter;
use Getopt::Long;
use Log::Message::Simple;
use File::Basename;
use File::Spec;
use IO::Socket::INET;
use UUID::Tiny ':std';
use threads;
use threads::shared;
# *bunch of code deleted*
process_file( $cmdline{csvfile}, 1 );
sub test {
print "ok\n";
}
sub process_file {
# get parameters
my ( $input_file, $flowid ) = @_;
# init variables
# open input file
open( my $fh, '<:encoding(UTF-8)', $input_folder . $input_file )
or die "Could not open file '$input_file' $!";
# process file
while ( my $row = <$fh> ) {
chomp $row;
@request = split ";", $row;
$flow_type = $request[0];
$flow = $request[1];
# *bunch of code deleted*
$filename = "$flow.csv";
$keep_flowid = $request[2]; # keep flowid?
$tmp_flowid = $keep_flowid ? $flowid : undef; # set flowid
$thread = $request[3];
if ( $thread == 1 ) {
### Create new thread
print "start\n";
my $process_thread = threads->create("test");
$process_thread->join();
}
elsif ( $thread == 0 ) {
# wait on process to complete
process_file( $filename, $tmp_flowid );
}
# *bunch of code deleted*
}
# close file
close $fh or die "Couldn't close inputfile: $input_file";
}
It's hard to say exactly why you're having this problem - the major possibility seems to be:
$thread = $request[3];
if ($thread == 1){
This is input from your filehandle, so a real possibility is that $request[3] isn't actually 1.
I am a bit suspicious though - your code has use strict; use warnings at the top, but you're not declaring e.g. $thread, $flow etc. with my. That either means you're not actually using strict, or you're reusing variables - which is a good way to end up with annoying glitches (like this one).
But as it stands, we can't tell you for sure, because we cannot reproduce the problem to test it. In order to do that, we would need some sample input and an MCVE.
To expand on the point about threads made in the comments - you may see warnings that they are "discouraged". The major reason for this is that Perl threads are not like threads in other languages: they aren't lightweight, whereas in other languages they are. They're still perfectly viable solutions to particular classes of problems - specifically, the ones where you need parallelism with more IPC than a fork-based concurrency model would give you.
I suspect you are experiencing this bug, fixed in Perl 5.24.
If so, you could work around it by performing your own decoding rather than using an encoding layer.
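A minimal sketch of that workaround, adapting the open/while loop from your snippet ($input_folder and $input_file as in your code): read raw bytes and decode each line yourself with Encode, so the handle never carries an :encoding() layer.
use Encode qw(decode);

# Sketch only: same loop as above, but without the :encoding(UTF-8) layer,
# so the threads never interact with a PerlIO encoding layer.
open( my $fh, '<', $input_folder . $input_file )
    or die "Could not open file '$input_file' $!";

while ( my $raw = <$fh> ) {
    my $row = decode( 'UTF-8', $raw );   # decode manually instead
    chomp $row;
    # ... process $row as before ...
}
close $fh;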

How to restart child process with Parallel::ForkManager on finish

I'd like to have Parallel::ForkManager use a callback to get something back from a child process and then also restart it. Is that possible? The following is from the Parallel::ForkManager docs:
use strict;
use Parallel::ForkManager;
my $max_procs = 5;
my @names = qw( Fred Jim Lily Steve Jessica Bob Dave Christine Rico Sara );
# hash to resolve PID's back to child specific information
my $pm = new Parallel::ForkManager($max_procs);
# Setup a callback for when a child finishes up so we can
# get it's exit code
$pm->run_on_finish(
sub { my ($pid, $exit_code, $ident) = @_;
print "** $ident just got out of the pool ".
"with PID $pid and exit code: $exit_code\n";
}
);
$pm->run_on_start(
sub { my ($pid,$ident)=@_;
print "** $ident started, pid: $pid\n";
}
);
$pm->run_on_wait(
sub {
print "** Have to wait for one children ...\n"
},
0.5
);
foreach my $child ( 0 .. $#names ) {
my $pid = $pm->start($names[$child]) and next;
# This code is the child process
print "This is $names[$child], Child number $child\n";
sleep ( 2 * $child );
print "$names[$child], Child $child is about to get out...\n";
sleep 1;
$pm->finish($child); # pass an exit code to finish
#####here is where I'd like each child process to restart
}
So when $pm->finish happens, the callback confirms the "child" is "out of the pool." How can I both get the callback to fire and immediately put the child back in the pool as they come out, so that it runs forever?
I think you're misunderstanding what's happening. Under the covers, what Parallel::ForkManager is doing is calling a fork(). Two processes exist at this point, with only a single difference - different PIDs.
Your child process goes and runs some stuff, then exits, generating an exit status, which your parent then reaps.
Restarting the process... well, you just need to fork again and run your code.
Now, what you're doing is a foreach loop that, for each array element, forks, and then the fork exits.
So really, all you need to do is call $pm->start again; one way to wire that up is sketched below. How you figure out which one exited (and thus the child name) is more difficult though - your callback runs in the parent process, so data isn't being passed back aside from the exit status of your child. You'll need to figure out some sort of IPC to pass back any other details you need.
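For instance, here's a rough sketch (not your exact code, and not tested against your real workload) that keeps the pool running forever. It leans on two things: the identifier you pass to start() comes back to run_on_finish as $ident, and start() blocks and reaps a finished child (firing the callback) once the pool is full.
use strict;
use warnings;
use Parallel::ForkManager;

my @names = qw( Fred Jim Lily Steve Jessica Bob Dave Christine Rico Sara );
my $pm    = Parallel::ForkManager->new(5);

my @todo = @names;                  # names waiting to (re)start
$pm->run_on_finish(
    sub {
        my ( $pid, $exit_code, $ident ) = @_;
        print "** $ident got out of the pool (PID $pid, exit $exit_code)\n";
        push @todo, $ident;         # put it straight back in the queue
    }
);

# Assumes more names than process slots, so @todo never runs dry: whenever
# the pool is full, start() reaps a finished child first, refilling @todo.
while (1) {
    my $name = shift @todo;
    $pm->start($name) and next;     # parent: hand out the next name

    # ---- child process ----
    print "This is $name, PID $$\n";
    sleep 2;
    $pm->finish(0);                 # exit code delivered to run_on_finish
}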
Although I'd point out @names isn't a hash, so treating it like one is going to have strange behaviour :).
Have you considered threading as an alternative? Threads are good for shared-memory operations, and passing data back from keyed subprocesses is something they are good at.

Make Perl find "wanted" function multithreaded

I have the following code:
find(\&jpegCompress, @search_paths);
sub jpegCompress()
{
#do processing
}
Currently it steps through each file one by one in series, which is quite slow. Is there anyway to have the jpegCompress function create a thread (if the thread count is < maxThreads) and return to the find function quickly?
The Parallel::ForkManager module provides simple parallel processing. Example:
use Parallel::ForkManager;
$pm = new Parallel::ForkManager($MAX_PROCESSES);
foreach $file (@jpeg_files) {
# Forks and returns the pid for the child:
my $pid = $pm->start and next;
jpegCompress($file);
$pm->finish; # Terminates the child process
}
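If the files come from File::Find, one way to put the two together (a sketch; the .jpg filter and the worker count of 4 are just illustrative) is to let find() collect the paths single-threaded and then fork over the resulting list:
use strict;
use warnings;
use File::Find;
use Parallel::ForkManager;

my @search_paths = @ARGV;
my @jpeg_files;

# find() stays single-threaded and only gathers file names.
find( sub { push @jpeg_files, $File::Find::name if /\.jpe?g$/i }, @search_paths );

my $pm = Parallel::ForkManager->new(4);    # illustrative max worker count
for my $file (@jpeg_files) {
    $pm->start and next;                   # parent immediately moves on
    jpegCompress($file);                   # child does the compression
    $pm->finish;
}
$pm->wait_all_children;

sub jpegCompress {
    my ($file) = @_;
    # ... do the actual processing on $file ...
}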

How to deal with multiple threads in perl which turn into zombie

It seems that using pipes in threads might cause zombies. In fact, the commands in the pipe turned into zombies, not the threads. This does not happen every time, which is annoying since it's hard to find the real problem. How do I deal with this issue? What causes it? Is it related to the pipe? How do I avoid it?
The following is the codes that creates sample files.
#buildTest.pl
use strict;
use warnings;
sub generateChrs{
my ($outfile, $num, $range)=@_;
open OUTPUT, "|gzip>$outfile";
my @set=('A','T','C','G');
my $cnt=0;
while ($cnt<$num) {
# body...
my $pos=int(rand($range));
my $str = join '' => map $set[rand @set], 1 .. rand(200)+1;
print OUTPUT "$cnt\t$pos\t$str\n";
$cnt++
}
close OUTPUT;
}
sub new_chr{
my @chrs=1..22;
push @chrs,("X","Y","M", "Other");
return @chrs;
}
for my $chr (&new_chr){
generateChrs("$chr.gz",50000,100000)
}
The following code occasionally creates zombies. The reason or trigger remains unknown.
#paralRM.pl
use strict;
use threads;
use Thread::Semaphore;
my $s = Thread::Semaphore->new(10);
sub rmDup{
my $reads_chr=$_[0];
print "remove duplication $reads_chr START TIME: ",`date`;
return 0 if(!-s $reads_chr);
my $dup_removed_file=$reads_chr . ".rm.gz";
$s->down();
open READCHR, "gunzip -c $reads_chr |sort -n -k2 |" or die "Error: cannot open $reads_chr";
open OUTPUT, "|sort -k4 -n|gzip>$dup_removed_file";
my ($last_id, $last_pos, $last_reads)=split('\t',<READCHR>);
chomp($last_reads);
my $last_length=length($last_reads);
my $removalCnts=0;
while (<READCHR>) {
chomp;
my @line=split('\t',$_);
my ($id, $pos, $reads)=@line;
my $cur_length=length($reads);
if($last_pos==$pos){
#may dup
if($cur_length>$last_length){
($last_id, $last_pos, $last_reads)=@line;
$last_length=$cur_length;
}
$removalCnts++;
next;
}else{
#not dup
}
print OUTPUT join("\t",$last_id, $last_pos, $last_reads, $last_length, "\n");
($last_id, $last_pos, $last_reads)=@line;
$last_length=$cur_length;
}
print OUTPUT join("\t",$last_id, $last_pos, $last_reads, $last_length, "\n");
close OUTPUT;
close READCHR;
$s->up();
print "remove duplication $reads_chr END TIME: ",`date`;
#unlink("$reads_chr")
return $removalCnts;
}
sub parallelRMdup{
my @chrs=@_;
my %jobs;
my @removedCnts;
my @processing;
foreach my $chr(@chrs){
while (${$s}<=0) {
# body...
sleep 10;
}
$jobs{$chr}=async {
return &rmDup("$chr.gz");
};
push @processing, $chr;
}
#wait for all threads finish
foreach my $chr(@processing){
push @removedCnts, $jobs{$chr}->join();
}
}
sub new_chr{
my @chrs=1..22;
push @chrs,("X","Y","M", "Other");
return @chrs;
}
&parallelRMdup(&new_chr);
As the comments on your originating post suggest - there isn't anything obviously wrong with your code here. What might be helpful to understand is what a zombie process is.
Specifically, it's a spawned process (created by your open) which has exited, but whose return code the parent hasn't collected yet.
For short-running code, that's not all that significant - when your main program exits, the zombies will 'reparent' to init, which will clean them up automatically.
For longer-running code, you can use waitpid to clean them up and collect return codes.
Now in this specific case I can't see a specific problem, but I would guess it's to do with how you're opening your filehandles. The downside of opening filehandles the way you are is that they're globally scoped - and that's just generally bad news when you're doing thready things.
I would imagine if you changed your open calls to:
my $pid = open ( my $exec_fh, "|-", "executable" );
and then called waitpid on that $pid following your close, your zombies would finish. Test the return from waitpid to get an idea of which of your execs has errored (if any), which should help you track down why.
Alternatively, set $SIG{CHLD} = "IGNORE"; this effectively tells your child processes to 'just go away immediately' - but you won't be able to get a return code from them if they die.
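As a concrete sketch of the lexical-filehandle version (gzip standing in for the real pipeline; note that for a piped open, close itself waits for the child and leaves its exit status in $?):
my $pid = open( my $exec_fh, "|-", "gzip > $dup_removed_file" )
    or die "Cannot start gzip: $!";

print {$exec_fh} "... output lines go here ...\n";

# close() on a piped handle waits for the child and puts its exit status
# in $?; check it so a failing child doesn't go unnoticed (or linger).
close $exec_fh
    or warn "child $pid exited with status " . ( $? >> 8 ) . "\n";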

Updating a label from a thread in Perl

I'm using Perl on a Linux box, and I have 2 devices - a PC (the Linux box) and a router/DSL-thingy - on my local net at IP addresses 192.168.1.1 & 192.168.1.2. I am trying to list or show the progress of pinging these, plus a test of 8 other non-existent devices, with the code below, but I'm having trouble getting my StatusLabel to update. Any help?
for($i=1;$i<=10;++$i) { # --- $i<$VarClients --- 254
my $thr_List = ("ping$i");
$thr_List = threads->create(\&pingingthreads, "$i");
}
sub pingingthreads{
my @pingpong = ping("$localAddress$i", '-c 1', '-i .2'); # -i may not count for much?
print "Pinging: $localAddress$i\n"; # output goes from address1 - address10 ok
$StatusLabel = "Pinging: $localAddress$i"; # only the last responding one(device) seems to be shown in my statuslabel?!
$val = ($val + 10); # 0.392156863
print "$val\% done...\n"; # goes to 100% for me ok
# $indicatorbar->value( $val ); # I have a ProgressBar and it gets stuck on 20% also
if ($val == 100){$val = 0;
} # reset after scanning
# then after the last ping, update the statusLable:
#my @ParamList = ('something', 'testing', 7, 8, 9);
#$thr5 = threads->create(\&updateStatusLable, @ParamList); # starting a thread within a thread ???
# ping response text...
for( @pingpong ) { # need to do something for none responding clients & any time laps/ping latency..., or *** ???
$pong=$_;
chop ($pong); # Get rid of the trailling \n ??
if ($pong =~ m/1 packets transmitted, 1 received, 0% packet loss/) {
push(@boxs, "$localAddress$i");
} else{
# see the other lines from the ping's output
# print "$pong\n";
}
}
}
# For $localAddress$i icmp_seq=1 Destination Host Unreachable ???
--------------------- # StatusBar/progress label & bar ----------------
my $sb = $main->StatusBar();
$sb->addLabel( -textvariable => \$StatusLabel,
-relief => 'flat',
-font => $font,
-foreground => "$statusbartextColour",
);
my $indicatorbar = $sb->ProgressBar( -padx=>2, -pady=>2, -borderwidth=>2,
-troughcolor=>"$Colour2",
-colors=>[ 0, "$indicatorcolour" ],
-length=>106,
-relief => 'flat',
-value => "$val",
)->pack;
# $val = 0;
# $indicatorbar->value( $val );
=====================================
my $StatusLabel :shared = ();
my $val :shared = (0); # var for progress bar value
I have uploaded my full code here (http://cid-99cdb89630050fff.office.live.com/browse.aspx/.Public) if needed, its in the Boxy.zip...
By default data in Perl threads are private; updates to a variable in one thread will not change the value of that variable in other threads (or in the main thread). You will want to declare $val as a shared variable.
See threads::shared.
I see you have declared $val as shared at the bottom of the script, so I didn't see it until it was too late. Not coincidentally, the Perl interpreter is also not going to see that declaration until it is too late. The top 95% of your program is manipulating the global, thread-private variable $val and not the lexical, shared $val you declare at the end of your script. Move this declaration to the top of the script.
Putting use strict at the top of your program would have caught this and saved you minutes, if not hours, of grief.
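For example, the top of the script would then start out along these lines (a minimal sketch, not your full code):
use strict;
use warnings;
use threads;
use threads::shared;

# Shared variables declared before any code that touches them.
my $StatusLabel :shared = "";
my $val         :shared = 0;    # progress bar value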
You don't. GUI frameworks tend not to be thread-safe. You communicate the info to the thread in which the GUI runs instead. Example
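In rough outline (a sketch of the pattern, not the linked example, with the ping work stubbed out): worker threads push status strings onto a Thread::Queue and never touch Tk, while the GUI thread polls the queue on a timer.
use strict;
use warnings;
use threads;
use Thread::Queue;
use Tk;

my $status_q = Thread::Queue->new();

# Worker thread: created before any Tk widgets exist; it only talks to
# the GUI through the queue.
threads->create(sub {
    for my $i (1 .. 10) {
        # ... ping 192.168.1.$i here ...
        $status_q->enqueue("Pinging: 192.168.1.$i");
        sleep 1;
    }
    $status_q->enqueue("Ready...");
})->detach();

# GUI thread: owns all Tk objects and polls the queue every 100 ms.
my $StatusLabel = "Starting...";
my $mw = MainWindow->new();
$mw->Label(-textvariable => \$StatusLabel)->pack();
$mw->repeat(100, sub {
    while (defined(my $msg = $status_q->dequeue_nb())) {
        $StatusLabel = $msg;
    }
});
MainLoop;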
First, sorry for replying here, but I have lost my cookie or the ability to reply and edit, etc...
Thanks ikegami, I will have to play with the example for a while to see if I can work things out and mix it into what I'm doing... but at first sight it looks just right... Thanks very much.
I was able to update the $StatusLabel using:
# in 3 seconds maybe do a fade to: Ready...
my @ParamList = ('ping', 'testing', 4, 5, 6);
$thr2 = threads->create(\&updateStatusLable, @ParamList);
sub updateStatusLable {
# should probably check if threads are running already first???
# and if so ***/*** them ???
my @InboundParameters = @_;
my $tid = threads->tid();
# my $thr_object = threads->self(); # Get a thread's object
# print("This is a new thread\($tid\)... and I am counting to 3...\n");
sleep(3);
$StatusLabel = "Ready..."; # print "Am now trying to change the status bar's label to \"Ready...\"\n";
# try updating better/smoother... My main window needs "focus and a mouse move" I think
# for the new text to appear...
# print('Recieved the parameters: ', join(', ', @InboundParameters), ".\n" );
# $returnedvalue = "the thread should return this...";
# return($returnedvalue); # try returning any value just to test/see how...
}
but will try your method... Thanks again.
