How can I change the current directory in a thread-safe manner in Perl? - multithreading

I'm using Thread::Pool::Simple to create a few working threads. Each working thread does some stuff, including a call to chdir followed by an execution of an external Perl script (from the jbrowse genome browser, if it matters). I use capturex to call the external script and die on its failure.
I discovered that when I use more than one thread, things start to get messy. After some research, it seems that the current directory of some threads is not the correct one.
Perhaps chdir propagates between threads (i.e. isn't thread-safe)?
Or perhaps it's something with capturex?
So, how can I safely set the working directory for each thread?
** UPDATE **
Following the suggestions to change dir while executing, I'd like to ask how exactly should I pass these two commands to capturex?
Currently I have:
my @args = ( "bin/flatfile-to-json.pl", "--gff=$gff_file", "--tracklabel=$track_label", "--key=$key", @optional_args );
capturex( [0], @args );
How do I add another command to @args?
Will capturex continue to die on errors from any of the commands?

I think that you can solve your "how do I chdir in the child before running the command" problem pretty easily by abandoning IPC::System::Simple as not the right tool for the job.
Instead of doing
my $output = capturex($cmd, @args);
do something like:
use autodie qw(open close);
my $pid = open my $fh, '-|';
unless ($pid) { # this is the child
chdir($wherever) or die "Can't chdir to $wherever: $!";
exec($cmd, @args) or exit 255;
}
my $output = do { local $/; <$fh> };
# If child exited with error or couldn't be run, the exception will
# be raised here (via autodie; feel free to replace it with
# your own handling)
close ($fh);
If you were getting a list of lines instead of scalar output from capturex, the only thing that needs to change is the second-to-last line (to my @output = <$fh>;).
More info on forking-open is in perldoc perlipc.
The good thing about this in preference to capture("chdir wherever ; $cmd @args") is that it doesn't give the shell a chance to do bad things to your @args.
Updated code (doesn't capture output)
my $pid = fork;
die "Couldn't fork: $!" unless defined $pid;
unless ($pid) { # this is the child
chdir($wherever) or die "Can't chdir to $wherever: $!";
open STDOUT, '>', '/dev/null'; # optional: silence subprocess output
open STDERR, '>', '/dev/null'; # even more optional
exec($cmd, @args) or exit 255;
}
wait;
die "Child error $?" if $?;

I don't think "current working directory" is a per-thread property. I'd expect it to be a property of the process.
It's not clear exactly why you need to use chdir at all though. Can you not launch the external script setting the new process's working directory appropriately instead? That sounds like a more feasible approach.
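As a rough sketch of that idea, the forking-open approach from the other answer can be wrapped in a helper so that the chdir only ever happens in the child process and the threads' shared working directory is never touched. The name capture_in_dir is hypothetical (it is not part of IPC::System::Simple), the child perl one-liner is only a stand-in for the real script, and the sketch assumes a Unix-like system where fork is available.

```perl
use strict;
use warnings;
use POSIX ();
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical helper: run @cmd with $dir as the child's working directory
# and return everything the child wrote to STDOUT. Dies if the child fails.
sub capture_in_dir {
    my ($dir, @cmd) = @_;
    my $pid = open(my $fh, '-|');           # forking open: child's STDOUT feeds $fh
    die "Couldn't fork: $!" unless defined $pid;
    if ($pid == 0) {                        # child
        unless (chdir $dir) {
            warn "Can't chdir to $dir: $!\n";
            POSIX::_exit(254);
        }
        exec { $cmd[0] } @cmd;              # list form: no shell involved
        warn "Can't exec $cmd[0]: $!\n";
        POSIX::_exit(255);
    }
    my $output = do { local $/; <$fh> };    # parent: slurp the child's output
    close $fh or die "'@cmd' failed in $dir (status $?)\n";
    return defined $output ? $output : '';
}

# Demo: the child reports its own working directory; the parent's is unchanged.
my $dir    = tempdir(CLEANUP => 1);
my $before = getcwd();
my $seen   = capture_in_dir($dir, $^X, '-MCwd=getcwd', '-e', 'print getcwd()');
my $after  = getcwd();
print "child ran in $seen, parent stayed in $after\n";
```

In the question's setting, the call would look something like capture_in_dir($jbrowse_dir, "bin/flatfile-to-json.pl", "--gff=$gff_file", ...), where $jbrowse_dir is an assumed variable holding the directory the script expects to run from.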

Related

Pass an indicator from Bash back to Perl over SSH via STDIN

We have a Linux server which can run a diagnostic script, diag.pl, which coordinates reporting over other servers.
diag.pl iterates over the child servers, and for each of them, SSHs in and runs a bash script, which passes information back:
my $cmd=sprintf("(ssh %s sudo /usr/lib/support/report.sh -e %s | uudecode -o \"%s-outfile.tgz\") 2>&1 |", $server, $specialparam, $servername);
The line of code in report.sh that sends the data back is:
uuencode --base64 ${REPORT}.tar.gz /dev/stdout
I would like to update report.sh to send back an additional line of information, something like:
echo "special-file-found=${SFF}" > /tmp/sff.cfg
uuencode --base64 /tmp/sff.cfg > /dev/stdout
Once the special file has been found, the Perl script will update so that it no longer sends the specialparam back to subsequent report.sh calls.
Is there a good way to send that input so that it will be easy for Perl to catch it?
What have I tried
Setting a user.comment attr on the tar.gz using setattr, but the comment does not survive the uuencoding
Currently thinking that my best bet is to use the pseudocode above, creating a new file to encode and send along, and update the Perl script to check it with each new transmission until it finds the special file.
I take it that the objective is to modify a shell script which returns to the caller an encoded file, so that it sends yet more information, specifically a string to be used as a flag in the caller.
It is not clear how the shell script is run from the Perl script, but there are ways to do this so that the caller gets back separate "lines" that are printed, either as they are emitted or altogether after the run completes.
Then you can just add the needed extra print to STDOUT in the shell script, and in the caller check each line of shell output to see whether it conforms to some "protocol"; for example, whether it is, or starts with, the special-file-found string. Then you can set flags for further calls, write a control file for following runs, etc. Otherwise, the line is part of the encoded file.
A made-up basic example using pipe-open (see by the end of the page)
use warnings;
use strict;
use feature 'say';
my @cmd = qw(ls -l ./);
my $file_found = quotemeta 'special-file-found';
my ($flag, $binfile);
my $pid = open(my $out, '-|', @cmd) // die "Can't open @cmd: $!";
while (<$out>) {
chomp;
if (/^$file_found/) {
$flag = 1;
}
else {
$binfile = $_;
# whatever else need be done, or perhaps last;
}
}
close $out;
This example runs the command ls -l ./ but instead of it you can run any executable, like @cmd = ('report.sh', 'arg1', 'arg2',...).
Another way is to use backticks (qx) and assign its return to an array, in which case each element receives a line of output.
Yet another, better, way is to use a module which manages external commands. For example, from simple to more capable: IPC::System::Simple, Capture::Tiny, IPC::Run3, IPC::Run.
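To make the "protocol" idea a bit more concrete, here is a minimal sketch that separates flag lines from payload lines. It assumes report.sh prints a line of the form special-file-found=VALUE on STDOUT alongside the encoded data, and that the Perl side reads the command's output itself instead of piping it straight into uudecode. The helper name split_report is made up, and the child perl one-liner merely stands in for the ssh command.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from $fh and separate "key=value" flag
# lines (here only special-file-found) from the lines of the encoded payload.
sub split_report {
    my ($fh) = @_;
    my (%flags, @payload);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(special-file-found)=(.*)$/) {
            $flags{$1} = $2;                # remember the flag for later calls
        }
        else {
            push @payload, $line;           # everything else is encoded data
        }
    }
    return (\%flags, \@payload);
}

# Stand-in for the real ssh pipeline: one flag line plus two payload lines.
my @cmd = ($^X, '-e',
    'print "special-file-found=yes\n", "begin-base64 644 report.tgz\n", "====\n"');
open(my $out, '-|', @cmd) or die "Can't run @cmd: $!";
my ($flags, $payload) = split_report($out);
close $out or die "Command failed with status $?\n";

print "flag: $flags->{'special-file-found'}\n" if exists $flags->{'special-file-found'};
print scalar(@$payload), " payload lines\n";
```

The payload lines could then be written to a file and decoded, and the flag used to decide whether to pass the special parameter on the next call.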

How to get PID of perl daemon in init script?

I have the following perl script:
#!/usr/bin/perl
use strict;
use warnings;
use Proc::Daemon;
Proc::Daemon::Init;
my $continue = 1;
$SIG{TERM} = sub { $continue = 0 };
while ($continue) {
# stuff
}
I have the following in my init script:
DAEMON='/path/to/perl/script.pl'
start() {
PID=`$DAEMON > /dev/null 2>&1 & echo $!`
echo $PID > /var/run/mem-monitor.pid
}
The problem is, this returns the wrong PID! This returns the PID of the parent process which is started when the daemon is run, but that process is immediately killed off. I need to get the PID of the child process!
The Proc::Daemon documentation says
Proc::Daemon does the following:
...
9. The first child transfers the PID of the second child (daemon) to the parent. Additionally the PID of the daemon process can be written into a file if 'pid_file' is defined. Then the first child exits.
and then later, under new ( %ARGS )
pid_file
Defines the path to a file (owned by the parent user) where the PID of the daemon process will be stored. Defaults to undef (= write no file).
Also look at Init() method description. This all implies that you may want to use new first.
The point is that it is the grandchild process that is the daemon. However, the child passes the PID along and it is available to the parent. If pid_file => $file_name is set in the constructor, the daemon's PID is written to that file.
A comment asks to not have shell script rely on a file written by another script.
I can see two ways to do that.
Print the pid, returned by the $daemon->Init(), from the parent and pick it up in the shell. This is defeated by redirects in the question, but I don't know why they are needed. The parent and child exit right as all is set up, while the daemon is detached from everything.
Shell script can start the Perl script with the desired log-file name as an argument, letting it write the daemon pid to that file by the above process. The file is still output by Perl, but what matters about it is decided by the shell script.
I'd like to include a statement from my comment below. I consider these superior to two other things that come to mind: picking the filename from a config-style file kept by the shell is more complicated, while parsing the process table may be unreliable.
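To illustrate why the shell's $! sees the wrong PID and how the daemon's PID can still reach the parent, here is a hand-rolled sketch of the double fork. This is not Proc::Daemon's actual implementation; the helper name spawn_daemon and the pipe-based handoff are assumptions made for the illustration, and it assumes a Unix-like system.

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# Hypothetical helper: double-fork, run $code in the grandchild (the daemon),
# and hand the daemon's PID back to the original process through a pipe.
sub spawn_daemon {
    my ($code) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $child = fork;
    die "fork: $!" unless defined $child;
    if ($child == 0) {                      # first child
        close $reader;
        defined setsid() or POSIX::_exit(1);
        my $daemon = fork;
        POSIX::_exit(1) unless defined $daemon;
        if ($daemon) {                      # still the first child
            print {$writer} "$daemon\n";    # pass the daemon's PID upward
            close $writer;
            POSIX::_exit(0);                # first child exits right away
        }
        close $writer;                      # grandchild: the daemon itself
        open STDIN,  '<', '/dev/null';      # detach from the caller's streams
        open STDOUT, '>', '/dev/null';
        open STDERR, '>', '/dev/null';
        $code->();
        POSIX::_exit(0);
    }
    close $writer;                          # original process
    my $daemon_pid = <$reader>;
    close $reader;
    waitpid($child, 0);                     # reap the short-lived first child
    die "daemon did not start\n" unless defined $daemon_pid;
    chomp $daemon_pid;
    return $daemon_pid;
}

my $pid = spawn_daemon(sub { sleep 30 });   # stand-in for the real work loop
print "$pid\n";                             # the shell can capture this
```

With a script like this, an init script could pick the PID up with PID=$(/path/to/script.pl), since the original process prints it before exiting and the daemon no longer holds the shell's STDOUT open.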
I've seen this before and had to resort to using STDERR to send back the child's PID to the calling shell script. I've always assumed it was due to the mentioned unreliability of exit codes - but details were not clear in the documentation. Please try something like this:
#!/usr/bin/perl
use strict;
use warnings;
use Proc::Daemon;
if( my $pid = Proc::Daemon::Init() ) {
print STDERR $pid;
exit;
}
my $continue = 1;
$SIG{TERM} = sub { $continue = 0 };
while ($continue) {
sleep(20);
exit;
}
With a calling script like this:
#!/bin/bash
DAEMON='./script.pl'
start() {
PID=$($DAEMON 2>&1 >/dev/null)
echo $PID > ./mem-monitor.pid
}
start;
When the bash script is run, it will capture the STDERR output (containing the correct PID), and store it in the file. Any STDOUT the Perl script produces would be sent to /dev/null - though this is unlikely as the 1st level Perl script does (in this case) exit fairly early on.
Thank you to the suggestions from zdim and Hakon. They are certainly workable, and got me on the right track, but ultimately I went a different route. Rather than relying on $!, I used ps and awk to get the PID, as follows:
DAEMON='/path/to/perl/script.pl'
start() {
$DAEMON > /dev/null 2>&1
PID=`ps aux | grep -v 'grep' | grep "$DAEMON" | awk '{print $2}'`
echo $PID > /var/run/mem-monitor.pid
}
This works and satisfies my OCD! Note the double quotes around "$DAEMON" in grep "$DAEMON".

How to handle updates from a continuous process pipe in Perl

I am trying to follow log files in Perl on Fedora but unfortunately, Fedora uses journalctl to read binary log files that I cannot parse directly. This, according to my understanding, means I can only read Fedora's log files by calling journalctl.
I tried using IO::Pipe to do this, but the problem is that $p->reader(..) waits until journalctl --follow is done writing output (which will be never since --follow is like tail -F) and then allows me to print everything out which is not what I want. I would like to be able to set a callback function to be called each time a new line is printed to the process pipe so that I can parse/handle each new log event.
use IO::Pipe;
my $p = IO::Pipe->new();
$p->reader("journalctl --follow"); #Waits for process to exit
while (<$p>) {
print;
}
I assume that journalctl is working like tail -f. If this is correct, a simple open should do the job:
use Fcntl qw(:seek); # Import SEEK_CUR
my $pid = open my $fh, '-|', 'journalctl --follow'
or die "Error $! starting journalctl";
while (kill 0, $pid) {
while (<$fh>) {
print $_; # Print log line
}
sleep 1; # Wait some time for new lines to appear
seek($fh,0,SEEK_CUR); # Reset EOF
}
open opens a filehandle for reading the output of the called command: http://perldoc.perl.org/functions/open.html
seek is used to reset the EOF marker: http://perldoc.perl.org/functions/seek.html Without reset, all subsequent <$fh> calls will just return EOF even if the called script issued additional output in the meantime.
kill 0,$pid will be true as long as the child process started by open is alive.
You may replace sleep 1 by usleep from Time::HiRes or select undef,undef,undef,$fractional_seconds; to wait less than a second depending on the frequency of incoming lines.
AnyEvent should also be able to do the job via its AnyEvent::Handle.
Update:
Adding use POSIX ":sys_wait_h"; at the beginning and waitpid($pid, WNOHANG) to the outer loop would also detect (and reap) a zombie journalctl process:
while (kill(0, $pid) and waitpid($pid, WNOHANG) != $pid) {
A daemon might also want to check if $pid is still a child of the current process ($$) and if it's still the original journalctl process.
I have no access to journalctl, but if you avoid IO::Pipe and open the piped output directly, then the data will not be buffered:
use strict;
use warnings 'all';
open my $follow_fh, '-|', 'journalctl --follow' or die $!;
print while <$follow_fh>;
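Since the question asks for a callback per log line, the direct pipe-open can be wrapped in a small helper. This is only a sketch: the name follow is made up, and a child perl one-liner stands in for journalctl --follow so that the example terminates.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd and call $on_line once for each line of its
# output as it arrives. Returns the command's exit status once it finishes.
sub follow {
    my ($on_line, @cmd) = @_;
    open(my $fh, '-|', @cmd) or die "Can't start @cmd: $!";
    while (my $line = <$fh>) {              # blocks until a full line arrives
        chomp $line;
        $on_line->($line);
    }
    close $fh;                              # waits for the child, sets $?
    return $? >> 8;
}

# Stand-in for journalctl: a child that emits three lines, unbuffered.
my @seen;
my $status = follow(
    sub { push @seen, $_[0] },
    $^X, '-e', '$| = 1; print "event $_\n" for 1..3',
);
print "handled ", scalar(@seen), " lines, exit status $status\n";
```

For the real log, the call would be along the lines of follow(\&handle_event, 'journalctl', '--follow'), with handle_event being whatever parsing routine is needed; in that case the loop only ends when journalctl exits.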

GetAttributes uses wrong working directory in subthread

I used File::Find to traverse a directory tree and Win32::File's GetAttributes function to look at the attributes of files found in it. This worked in a single-threaded program.
Then I moved the directory traversal into a separate thread, and it stopped working. GetAttributes failed on every file with "The system cannot find the file specified" as the error message in $^E.
I traced the problem to the fact that File::Find uses chdir, and apparently GetAttributes doesn't use the current directory. I could work around this by passing it an absolute path, but then I could run into path length limits, and long paths are definitely going to be present where this script will run, so I really need to take advantage of chdir and relative paths.
To demonstrate the problem, here is a script which creates a file in the current directory, another file in a subdirectory, chdir's to the subdirectory, and looks for the file 3 ways: system("dir"), open, and GetAttributes.
When the script is run without arguments, dir shows the subdirectory, open finds the file in the subdirectory, and GetAttributes returns its attributes successfully. When run with --thread, all the tests are done in a subthread, and the dir and open still work, but the GetAttributes fails. Then it calls GetAttributes on the file that is in the original directory (which we have chdir'ed out of) and it finds that one! Somehow GetAttributes is using the original working directory of the process - or maybe the working directory of the main thread - unlike all the other file operations.
How can I fix this? I can guarantee that the main thread won't do any chdir'ing, if that matters.
use strict;
use warnings;
use threads;
use Data::Dumper;
use Win32::File qw/GetAttributes/;
sub doit
{
chdir("testdir") or die "chdir: $!\n";
system "dir";
my $attribs;
open F, '<', "file.txt" or die "open: $!\n";
print "open succeeded. File contents:\n-------\n", <F>, "\n--------\n";
close F;
my $x = GetAttributes("file.txt", $attribs);
print Dumper [$x, $attribs, $!, $^E];
if(!$x) {
# If we didn't find the file we were supposed to find, how about the
# bad one?
$x = GetAttributes("badfile.txt", $attribs);
if($x) {
print "GetAttributes found the bad file!\n";
if(open F, '<', "badfile.txt") {
print "opened the bad file\n";
close F;
} else {
print "But open didn't open it. Error: $! ($^E)\n";
}
}
}
}
# Setup
-d "testdir" or mkdir "testdir" or die "mkdir testdir: $!\n";
if(!-f "badfile.txt") {
open F, '>', "badfile.txt" or die "create badfile.txt: $!\n";
print F "bad\n";
close F;
}
if(!-f "testdir/file.txt") {
open F, '>', "testdir/file.txt" or die "create testdir/file.txt: $!\n";
print F "hello\n";
close F;
}
# Option 1: do it in the main thread - works fine
if(!(@ARGV && $ARGV[0] eq '--thread')) {
doit();
}
# Option 2: do it in a secondary thread - GetAttributes fails
if(@ARGV && $ARGV[0] eq '--thread') {
my $thr = threads->create(\&doit);
$thr->join();
}
Eventually, I figured out that perl is maintaining some kind of secondary cwd that only applies to perl built-in operators, while GetAttributes is using the native cwd. I don't know why it does this or why it only happens in the secondary thread; my best guess is that perl is trying to emulate the unix rule of one cwd per process, and failing because the Win32::* modules don't play along.
Whatever the reason, it's possible to work around it by forcing the native cwd to be the same as perl's cwd whenever you're about to do a Win32::* operation, like this:
use Cwd;
use Win32::FindFile qw/SetCurrentDirectory/;
...
SetCurrentDirectory(getcwd());
Arguably File::Find should do this when running on Win32.
Of course this only makes the "pathname too long" problem worse, because now every directory you visit will be the target of an absolute-path SetCurrentDirectory; try to work around it with a series of smaller SetCurrentDirectory calls and you have to figure out a way to get back where you came from, which is hard when you don't even have fchdir.

How do I send two commands' output to standard out in parallel?

I want to use xinput to monitor # of keystrokes and # of mouse movement presses. For simplification let's say what I want is these two commands:
xinput test 0
xinput test 1
to write to the screen at the same time.
I am using this in a Perl script like:
open(my $fh, '-|', 'xinput test 0') or die $!;
while(my $line = <$fh>) {
...stuff to keep count instead of logging directly to file
}
EDIT:
something like:
open(my $fh, '-|', 'xinput test 0 & xinput test 1') or die $!;
doesn't work.
I'm not sure what you want to do with the output, but it sounds like you want to run the commands simultaneously. In that case, my first thought would be to fork the Perl process once per command and then exec the child processes to the commands you care about.
foreach my $command ( @commands ) { # filter @commands for taint, etc
if( fork ) { ... } #parent
else { # child
exec $command or die "Could not exec [$command]! $!";
}
}
The forked processes share the same standard filehandles. If you need their data in the parent process, you'd have to set up some sort of communication between the two.
There are also several Perl frameworks on CPAN for handling asynchronous multi-process stuff, such as POE, AnyEvent, and so on. They'd handle all these details for you.
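If the parent needs the children's output (for example, to keep counts), one possible way to set up that communication without a framework is to open each command on its own pipe and multiplex the pipes with IO::Select. The sketch below is an illustration under assumptions: run_parallel is a made-up helper name, and two child perl one-liners stand in for the xinput commands.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: start every command in @$commands at once and call
# $on_line->($index, $line) for each output line, in whatever order they arrive.
sub run_parallel {
    my ($commands, $on_line) = @_;
    my $select = IO::Select->new;
    my (%index_of, %buffer);
    for my $i (0 .. $#$commands) {
        my @cmd = @{ $commands->[$i] };
        open(my $fh, '-|', @cmd) or die "Can't run @cmd: $!";
        $index_of{$fh} = $i;
        $buffer{$fh}   = '';
        $select->add($fh);
    }
    while ($select->count) {
        for my $fh ($select->can_read) {
            # sysread avoids mixing buffered reads with select
            my $n = sysread($fh, $buffer{$fh}, 4096, length $buffer{$fh});
            die "read failed: $!" unless defined $n;
            while ($buffer{$fh} =~ s/^(.*)\n//) {   # peel off complete lines
                $on_line->($index_of{$fh}, $1);
            }
            if ($n == 0) {                          # EOF: this command is done
                $on_line->($index_of{$fh}, $buffer{$fh}) if length $buffer{$fh};
                $select->remove($fh);
                close $fh;
            }
        }
    }
}

# Stand-ins for "xinput test 0" and "xinput test 1".
my %count;
run_parallel(
    [
        [ $^X, '-e', 'print "a$_\n" for 1..3' ],
        [ $^X, '-e', 'print "b$_\n" for 1..2' ],
    ],
    sub { my ($i, $line) = @_; $count{$i}++ },
);
print "command $_ produced $count{$_} lines\n" for sort keys %count;
```

For the question's case, the command list would be something like [ [qw(xinput test 0)], [qw(xinput test 1)] ], with the callback updating the keystroke and mouse counters.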
If you want both commands to write to the console simultaneously, simply run them in the background:
xinput test 0 &
xinput test 1 &
But first you have to make sure that the terminal is in a mode that allows this, otherwise the background processes will be stopped when they try to write to the console. This command switches off the stty tostop option:
stty -tostop

Resources