Perl multithreading: capturing stdio of subthread children with "mixed" results

I wrote a massively multithreaded application in Perl which basically scans a file or directory structure for changes (either using inotify or polling). When it detects changes, it launches subthreads that execute programs with the changed files as arguments, according to a configuration.
This works quite nicely so far, with the exception that my application also tries to capture stdout and stderr of the externally executed programs and write them to log files in a structured manner.
I am, however, experiencing an occasional but serious mixup of output here: every now and then (usually under heavy workload, of course, so the normal tests always run fine) stdout from a program on thread A ends up in the stdout pipe FH of another program running on thread B at the very same time.
My in-thread code to run the externally executed programs and capture the output from them looks like this:
my $out;
$pid = open($out, "( stdbuf -oL ".$cmd." | stdbuf -oL sed -e 's/^/:::LOG:::/' ) 2>&1 |") or xlog('async execution failed for: '.$cmd, LOG_LEVEL_NORMAL, LOG_ERROR);
# catch all worker output here
while(<$out>)
{
    if( $_ =~ /^:::LOG:::/ )
    {
        push(@log, $wname.':::'.$acnt.':::'.time().$_);
    } else {
        push(@log, $wname.':::'.$acnt.':::'.time().':::ERR:::'.$_);
    }
    if (time() - $last > 1)
    {
        mlog($acnt, @log);
        $last = time();
        @log = ();
    }
}
close($out);
waitpid($pid, 0);
push(@log, $wname.':::'.$acnt.':::'.time().':::LOG:::--- thread finished ---');
stdbuf is used here to suppress buffering delays wherever possible, and the sed pipe is used to avoid having to handle multiple fds in the reader while still being able to separate normal output from errors.
Captured log lines are stuffed into a local array by the while loop, and roughly once a second the contents of that array are handed over to a thread-safe global logging method that uses semaphores to make sure nothing gets mixed up.
To avoid unnecessary feedback loops from you: I have certainly made sure (using debug output) that the output really is mixed up at the thread level already and is not the result of locking mistakes later in the output chain!
My question is: how can it be that the thread-locally defined $out FH in thread A receives output that definitely comes from a totally different program running in thread B and therefore should end up in the separately defined thread-local $out FH of thread B? Did I make a grave mistake at some point here, or is it just that Perl threading is a mess? And, finally, what would be a recommended way to separate the data properly (and preferably in some elegant way)?
Update: due to popular demand I have added the full thread method here:
sub async_command {
    my $wname = shift;
    my $cmd = shift;
    my $acnt = shift;
    my $delay = shift;
    my $errlog = shift;
    my $last = time();
    my $pid = 0;
    my @log;
    my $out;

    push(@log, $wname.':::'.$acnt.':::'.$last.':::LOG:::--- thread started ---'.($delay ? ' (sleeping for '.$delay.' seconds)':''));
    push(@log, $wname.':::'.$acnt.':::'.$last.':::ERR:::--- thread started ---') if ($errlog);

    if ($delay) { sleep($delay); }

    # Start worker with output pipe. stdbuf prevents unwanted buffering
    # sed tags stdout vs stderr
    $pid = open($out, "( stdbuf -oL ".$cmd." | stdbuf -oL sed -e 's/^/:::LOG:::/' ) 2>&1 |") or xlog('async execution failed for: '.$cmd, LOG_LEVEL_NORMAL, LOG_ERROR);

    # catch all worker output here
    while(<$out>)
    {
        if( $_ =~ /^:::LOG:::/ )
        {
            push(@log, $wname.':::'.$acnt.':::'.time().$_);
        } else {
            push(@log, $wname.':::'.$acnt.':::'.time().':::ERR:::'.$_);
        }
        if (time() - $last > 1)
        {
            mlog($acnt, @log);
            $last = time();
            @log = ();
        }
    }
    close($out);
    waitpid($pid, 0);

    push(@log, $wname.':::'.$acnt.':::'.time().':::LOG:::--- thread finished ---');
    push(@log, $wname.':::'.$acnt.':::'.time().':::ERR:::--- thread finished ---') if ($errlog);
    mlog($acnt, @log);
    byebye();
}
So... here you can see that @log as well as $out are thread-local variables. The xlog (global log) and mlog methods (worker logs) actually use Thread::Queue for further processing. I just don't want to use it more than once a second per thread, to avoid too much locking overhead.
I have duplicated the push(@log, ...) statements into xlog() calls for debugging. Since the worker name $wname is somewhat tied to the $cmd executed and $acnt is a number unique to each thread, I can see clearly that there is log output being read from the $out FH that definitely comes from a different $cmd than the one executed in this thread, while $acnt and $wname stay the ones that actually belong to the thread. I can also see that these log lines then do NOT appear on the $out FH of the other thread where they should be.
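For illustration, here is a simplified sketch of how mlog hands batches over via Thread::Queue. This is not the exact code, just the shape of it; the queue layout and the dedicated logger thread are assumptions:

use threads;
use Thread::Queue;

# one shared queue: worker threads enqueue batches, a single logger thread drains them
my $logq = Thread::Queue->new();

# worker side: hand over a batch of pre-formatted lines in one atomic enqueue
sub mlog {
    my ($acnt, @lines) = @_;
    $logq->enqueue([$acnt, @lines]);
}

# logger thread: the only place that ever writes to the log files
my $logger = threads->create(sub {
    while (defined(my $batch = $logq->dequeue())) {
        my ($acnt, @lines) = @{$batch};
        # ... append @lines to the per-worker log file selected by $acnt ...
    }
});

# at shutdown: $logq->end(); $logger->join();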

Related

Perl: close open2 handle from a background thread

I am simply trying to find out how to properly use the open2 function.
See the example below. It works for a small $max, but naturally, if I write enough data to $hIn, it will eventually block because nothing is continuously reading the data on the output side.
use 5.26.0;
use IPC::Open2;

my $max = 100000;
my $pid = open2(my $hOut, my $hIn, "cat") || die "failed 'cat': $!";

{
    my $cnt = 0;
    # When $max is big (e.g. 100000) the code below will block
    # while writing to $hIn
    while ($cnt<$max) {say $hIn $cnt++;}
    close($hIn) || say "can't close hIn";
}

while(<$hOut>) { print; }
close($hOut) || say "can't close hOut";
waitpid( $pid, 0 );
The only solution I can think of is launching another thread that does the writing in the background.
With the code below I can write as much data as I want into $hIn and read it back in the main thread, but $hIn does not seem to get closed. Because of that, the while(<$hOut>) loop never finishes, as it keeps waiting for more output.
use 5.26.0;
use threads;
use IPC::Open2;

my $max = 100000;
my $pid = open2(my $hOut, my $hIn, "cat") || die "failed 'cat': $!";

my $thr = threads->create(sub {
    my $cnt = 0;
    while ($cnt<$max) {say $hIn $cnt++;}
    # The close does not have any effect here (although no error is produced)
    close($hIn) || say "can't close hIn";
});

# This outputs all the data written to $hIn but never leaves the loop...
while(<$hOut>) { print; }
close($hOut) || say "can't close hOut";
$thr->join;
waitpid( $pid, 0 );
My questions are:
Provided that my approach with threads is OK, how can I close the file handle from the background thread?
If it is not OK (use threads is, after all, discouraged in Perl), can someone provide a working example of open2 that can write and read a lot of data without the risk of blocking while waiting for the reader or writer?
EDIT: Following your suggestions here is an implementation of the code above using IPC::Run:
use 5.26.0;
use IPC::Run qw/ run /;

my $max = 1000000;

run sub {
        my $cnt = 0;
        while ($cnt<$max) {say $cnt++;}
    },
    "|", "cat",
    "|", sub {
        while(<>) {
            print;
        }
    }
    or die "run sub | cat | sub failed: $?";
It runs without flaws, and the code is very readable... I am very happy to have learned about this module. Thanks to everyone!
Yet, I consider the question unanswered. If it is not possible to implement this functionality using open2 directly, why does it even exist and confuse people? Also, not being able to close the file handle from a different thread looks like a bug to me (and certainly the close should at least report an error).
Your program stopped because the pipe to which it was writing became full.
The pipe to cat became full because cat stopped reading from it.
cat stopped because the pipe to which it was writing became full.
The pipe from cat became full because your program isn't reading from it.
So you have two programs waiting for each other to do something. This is a deadlock.
The low-level solution is to use select to monitor both ends of the pipe.
The high-level solution is to let IPC::Run or IPC::Run3 do that hard work for you.
use IPC::Run qw( run );
my $cnt_max = 100000;
my $cnt = 0;
run [ "cat" ],
'<', sub { $cnt < $cnt_max ? $cnt++ . "\n" : undef };
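IPC::Run3 handles the same job with even less ceremony, although it collects the input and output in memory (or temp files) rather than streaming them. A minimal sketch:

use 5.26.0;
use IPC::Run3 qw( run3 );

my $max = 100000;
my $in  = join '', map { "$_\n" } 0 .. $max - 1;

# run3 feeds $in to cat's stdin and captures its stdout into $out,
# handling the plumbing so the script cannot deadlock on a full pipe
run3 [ 'cat' ], \$in, \my $out;
print $out;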

Fork and return without waiting for child in perl

I have a perl CGI program that uses fork() to launch a child process. The child does some long-running (~60 second) calculations and eventually saves the results in a tmp file. The parent is supposed to return control to the browser, which displays a progress message and checks periodically whether the child has written its tmp file; once the file appears, the results are displayed. So the code looks like this:
# Get a unique name for this job
my $guid = Data::GUID->new;
my $id = $guid->as_hex;

# Do the fork
my $pid = fork;
if (!defined $pid) {
    die "cannot fork: $!";
# Child does this
} elsif ($pid == 0) {
    # Do some calculations
    # Save results in a filename with $id
# Parent does this
} else {
    # Return the location of the tmp files to the client
    return "Content-Type: text/html\n\n", $id;
    # Browser uses contents of $id to check for the result file
}
I originally set this up on RedHat Linux, and it worked just fine. However, now I'm trying to port it to a server running Ubuntu, and it seems that the parent is waiting for the long-running child to finish before it returns. I believe this to be the case because the browser hangs for the duration of the calculation, never shows the progress page, and immediately jumps to the results once the calculation is done.
I suspect this has something to do with fork emulation, but I'm not certain, and I haven't been able to find a way around it. Thanks for any suggestions.
EDIT: This code is part of a subroutine - I use CGI::Application, so the calling code just "uses" this code. Hence the return statement. I doubt this accounts for the problem, though, because I haven't touched the code since porting from Red Hat (where it worked) to Ubuntu (where it doesn't).
Randal Schwartz's Watching long processes through CGI provides a useful example. It is straightforward to fork:
if ( my $pid = fork ) {           # parent does
    delete_all();                 # clear parameters
    param( 'session', $session );
    print redirect( self_url() );
}
elsif ( defined $pid ) {          # child does
    close STDOUT;                 # so parent can go on
    unless ( open F, "-|" ) {
        open STDERR, ">&=1";
        exec "/usr/sbin/traceroute", $host;
        die "Cannot execute traceroute: $!";
    }
The rest of the article explains how to help the user keep track of a running job.
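Applied to the code in the question, the key point is that the child must let go of the connection the web server is holding open. A minimal sketch, living inside the run-mode subroutine as in the question; the calculation is a placeholder, and whether STDERR/STDIN also need closing depends on the setup:

my $guid = Data::GUID->new;
my $id   = $guid->as_hex;

my $pid = fork;
die "cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # child: detach from the HTTP connection so the server can finish the response
    close STDOUT;
    close STDERR;
    close STDIN;
    # ... long-running calculation, save results in a tmp file named after $id ...
    exit 0;
}

# parent: return immediately; the browser polls for the tmp file using $id
return "Content-Type: text/html\n\n", $id;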

Track and kill a process on timeout using Perl script

I want to write a Perl script which can monitor a running process. If the process runs for longer than the expected time, it should be killed.
I am trying to do this on a Linux machine (x86_64).
I cannot achieve this with a cron job, because I want to embed it in another Perl script that I have been using for a long time.
Any suggestions are welcome.
I have code to do that, but the problem is that my Perl script runs a process using the system command, and I want to track the PID of that invoked process and kill it on timeout.
=========================
#!/usr/pde/bin/perl
my $pid;
my $finish=0;

# actions after timeout to keep SIGHANDLER short
#
sub timeout {
    print "Timed out pid $pid\n";
    # kill the process group, but not the parent process
    local $SIG{INT}='IGNORE';
    local $SIG{TERM}='IGNORE';
    kill 'INT' = -$$;
    # eventually try also with TERM and KILL if necessary
    die 'alarm';
}

eval {
    local $SIG{ALRM} = sub { $finish=1 };
    alarm 5;
    die "Can't fork!" unless defined ($pid=fork); # check also this!
    if ($pid) { # parent
        warn "child pid: $pid\n";
        # Here's the code that checks for the timeout and do the work:
        while (1) {
            $finish and timeout() and last;
            sleep 1;
        }
        waitpid ($pid, 0);
    }
    else { # child
        exec (q[perl -e 'while (1) {print 1}' tee test.txt]);
        exit; # the child shouldn't execute code hereafter
    }
    alarm 0;
};
warn "\$@=$@\n";
die "Timeout Exit\n" if $@ and $@ =~ /alarm/;
print "Exited normally.\n";
__END__
Based on your code - there is a reason why use strict and use warnings are strongly recommended.
Specifically:
Can't modify constant item in scalar assignment at line 17, near "$$;"
You aren't doing what you think you're doing there.
If you set it to
kill ( 'INT', -$$ );
Then you will send a SIGINT to the current process group - parent and child. I'm not sure why you're doing this when you don't want to kill the parent.
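If the goal really is to signal the command started by the child (and anything it spawns) without touching the parent, one way is to put the child in its own process group and signal that group. A self-contained sketch; the command and the timeout handling are only illustrative:

use strict;
use warnings;

my @cmd = ( 'perl', '-e', 'sleep 60' );   # illustrative long-running command
my $pid = fork;
die "Can't fork: $!" unless defined $pid;

if ( $pid == 0 ) {                        # child
    setpgrp( 0, 0 );                      # become leader of a new process group
    exec @cmd or die "exec failed: $!";
}

sleep 5;                                  # stand-in for the real timeout logic
kill 'INT', -$pid;                        # signal the child's whole group, not the parent
waitpid $pid, 0;
print "child terminated, status=$?\n";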
I'd suggest you can simplify this greatly by:
else { # child
    alarm 5;
    exec (q[perl -e 'while (1) {print 1}' tee test.txt]);
    exit; # the child shouldn't execute code hereafter
}
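Putting that together: an alarm armed in the child survives exec, so the exec'd program receives SIGALRM after the timeout and the parent only has to wait and inspect $?. A minimal sketch along those lines (command and timeout are again illustrative):

use strict;
use warnings;

my $timeout = 5;
my $pid     = fork;
die "Can't fork: $!" unless defined $pid;

if ( $pid == 0 ) {          # child
    alarm $timeout;         # the pending alarm is inherited across exec
    exec 'sleep', '60' or die "exec failed: $!";
}

waitpid $pid, 0;            # parent just waits for the child to finish or time out
if ( my $sig = $? & 127 ) {
    print "child killed by signal $sig (timed out)\n";
}
else {
    printf "child exited normally with status %d\n", $? >> 8;
}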

Perl 5.8: possible to get any return code from backticks when SIGCHLD in use

When a CHLD signal handler is used in Perl, even uses of system and backticks will send the CHLD signal. But for the system and backtick sub-processes, neither wait nor waitpid seems to set $? within the signal handler on SuSE 11 Linux. Is there any way to determine the return code of a backtick command when a CHLD signal handler is active?
Why do I want this? Because I want to fork(?) and start a medium-length command, then call a Perl package that takes a long time to produce an answer (and which executes external commands with backticks and checks their return codes in $?), and know when my command is finished so I can take action, such as starting a second command. (Suggestions for how to accomplish this without using SIGCHLD are also welcome.) But since the signal handler destroys the backtick $? value, that package fails.
Example:
use warnings;
use strict;
use POSIX ":sys_wait_h";

sub reaper {
    my $signame = shift @_;
    while (1) {
        my $pid = waitpid(-1, WNOHANG);
        last if $pid <= 0;
        my $rc = $?;
        print "wait()=$pid, rc=$rc\n";
    }
}
$SIG{CHLD} = \&reaper;

# system can be made to work by not using $?, instead using system return value
my $rc = system("echo hello 1");
print "hello \$?=$?\n";
print "hello rc=$rc\n";

# But backticks, for when you need the output, cannot be made to work??
my @IO = `echo hello 2`;
print "hello \$?=$?\n";
exit 0;
Yields a -1 return code in all places I might try to access it:
hello 1
wait()=-1, rc=-1
hello $?=-1
hello rc=0
wait()=-1, rc=-1
hello $?=-1
So I cannot find anywhere to access the backticks' return value.
This same issue has been bugging me for a few days now. I believe there are 2 solutions required depending on where you have your backticks.
If you have your backticks inside the child code:
The solution was to put the line below inside the child fork. I think your statement above "if I completely turn off the CHLD handler around the backticks then I might not get the signal if the child ends" is incorrect. You will still get a callback in the parent when the child exits because the signal is only disabled inside the child. So the parent still gets a signal when the child exits. It's just the child doesn't get a signal when the child's child (the part in backticks) exits.
local $SIG{'CHLD'} = 'DEFAULT'
I'm no Perl expert; I have read that you should set the CHLD signal to the string 'IGNORE', but that did not work in my case. In fact I believe it may have been causing the problem. Leaving it out completely also appears to solve the problem, which I guess is the same as setting it to DEFAULT.
If you have backticks inside the parent code:
Add this line to your reaper function:
local ($!, $?);
What is happening is the reaper is being called when your code inside the backticks completes and the reaper is setting $?. By making $? local it does not set the global $?.
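In other words, the reaper keeps its own copies of $! and $?, so the values seen by the code around the backticks stay intact. A minimal sketch of the reaper with that change (otherwise it is the question's WNOHANG loop):

sub reaper {
    local ( $!, $? );    # keep the handler from clobbering the globals
    while ( ( my $pid = waitpid( -1, WNOHANG ) ) > 0 ) {
        print "reaped pid $pid\n";
    }
}
$SIG{CHLD} = \&reaper;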
So, building on MikeKull's answer, here is a working example where the forked child uses backticks and still gets the proper return code. This example is a better representation of what I was doing; the original example did not use forks and so could not convey the entire issue.
use warnings;
use strict;
use POSIX ":sys_wait_h";

# simple child which returns code 5
open F, ">", "exit5.sh" or die "$!";
print F <<EOF;
#!/bin/bash
echo exit5 pid=\$\$
exit 5
EOF
close F;
chmod 0755, "exit5.sh";    # make the helper script executable

sub reaper
{
    my $signame = shift @_;
    while (1)
    {
        my $pid = waitpid(-1, WNOHANG);
        print "no child waiting\n" if $pid < 0;
        last if $pid <= 0;
        my $rc = $? >> 8;
        print "wait()=$pid, rc=$rc\n";
    }
}
$SIG{CHLD} = \&reaper;

if (!fork)
{
    print "child pid=$$\n";
    { local $SIG{CHLD} = 'DEFAULT'; print `./exit5.sh`; }
    print "\$?=" . ($? >> 8) . "\n";
    exit 3;
}

# sig CHLD will interrupt sleep, so do multiple
sleep 2;sleep 2;sleep 2;
exit 0;
The output is:
child pid=32307
exit5 pid=32308
$?=5
wait()=32307, rc=3
no child waiting
So the expected return code 5 was received in the child when the parent's reaper was disabled before calling the child, but as indicated by ikegami the parent still gets the CHLD signal and a proper return code when the child exits.

How can I make Perl wait for child processes started in the background with system()?

I have some Perl code that executes a shell script for multiple parameters. To simplify, assume I have code that looks like this:
for $p (@a){
    system("/path/to/file.sh $p &");
}
I'd like to do some more things after that, but I can't find a way to wait for all the child processes to finish before continuing.
Converting the code to use fork() would be difficult. Isn't there an easier way?
Using fork/exec/wait isn't so bad:
my @a = (1, 2, 3);

for my $p (@a) {
    my $pid = fork();
    if ($pid == -1) {
        die;
    } elsif ($pid == 0) {
        exec '/bin/sleep', $p or die;
    }
}

while (wait() != -1) {}
print "Done\n";
You are going to have to change something; changing the code to use fork is probably simpler. But if you are dead set against fork, you could use a wrapper shell script that touches a file when it is done, and then have your Perl code check for the existence of those files.
Here is the wrapper:
#!/bin/bash
$*
touch /tmp/$2.$PPID
Your Perl code would look like:
for my $p (@a){
    system("/path/to/wrapper.sh /path/to/file.sh $p &");
}

while (@a) {
    delete $a[0] if -f "/tmp/$a[0].$$";
}
But I think the forking code is safer and clearer:
my @pids;
for my $p (@a) {
    die "could not fork" unless defined(my $pid = fork);
    unless ($pid) { # child execs
        exec "/path/to/file.sh", $p;
        die "exec of file.sh failed";
    }
    push @pids, $pid; # parent stores children's pids
}

# wait for all children to finish
for my $pid (@pids) {
    waitpid $pid, 0;
}
Converting to fork() might be difficult, but it is the correct tool. system() is a blocking call; you're getting the non-blocking behavior by executing a shell and telling it to run your scripts in the background. That means that Perl has no idea what the PIDs of the children might be, which means your script does not know what to wait for.
You could try to communicate the PIDs up to the Perl script, but that quickly gets out of hand. Use fork().
