Need commands to be run in parallel in Perl Expect - Linux

I have the script below. The command labeled "Command which needs to be run" (in the for loop) executes for some time, but then the process forked to run it is killed, so only part of that command's work completes. Instead, I want every instance of that command to run in parallel: if the for loop runs for 50 iterations, 50 different processes should be executing at once, and none should be stopped partway. Could someone please suggest how to do this?
#!/usr/bin/perl
use Expect;

sub hostuser_expect() {
    $expect = Expect->new;
    $expect->raw_pty(1);
    $expect->spawn($cmd)
        or die "Cannot spawn $cmd: $!\n";
    $expect->expect($timeout,
        [ qr/.*\?/i,
          sub {
              my $self = shift;
              $self->send("yes\r\n");
              exp_continue;
          }
        ]);
    $expect->expect($timeout,
        [ qr/password:/i,
          sub {
              my $self = shift;
              $self->send("$password\n");
              exp_continue;
          }
        ]);
    #$expect->expect(1500, '-re', 'Mails marked as moved successfully\.$');
    #$expect->hard_close();
}

$timeout  = 5;
$password = "changeme";

for ($i = 1; $i <= 50; $i++) {
    $cmd = "Command which needs to be run";
    print "Invoking script - $cmd\n";
    hostuser_expect();
    sleep(30);
}

I believe Parallel::ForkManager will do what you want. I don't use Expect, so I've simplified the sub as an example. I've also added use strict; and use warnings;, removed the prototype parens from the sub definition, and replaced the C-style for() loop.
Just change the number in the call to Parallel::ForkManager->new() to raise or lower the maximum number of forks to be executing at any one time.
use warnings;
use strict;
use Parallel::ForkManager;

sub hostuser_expect {
    my $num = shift;
    print "in child $num\n";
}

my $pm = Parallel::ForkManager->new(5);

COMMANDS:
for (1..5){
    $pm->start and next COMMANDS;
    hostuser_expect($_);
    $pm->finish;
}
$pm->wait_all_children;
Output:
in child 1
in child 2
in child 5
in child 3
in child 4
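
For completeness, here is an untested sketch of the question's Expect sub driven by Parallel::ForkManager. It keeps the placeholders from the question ($cmd string, $password); the soft_close call makes each child wait for its spawned command to finish instead of exiting early:
use warnings;
use strict;
use Expect;
use Parallel::ForkManager;

my $timeout  = 5;
my $password = "changeme";

sub hostuser_expect {
    my $cmd = shift;
    my $expect = Expect->new;
    $expect->raw_pty(1);
    $expect->spawn($cmd) or die "Cannot spawn $cmd: $!\n";
    $expect->expect($timeout,
        [ qr/password:/i, sub { my $self = shift; $self->send("$password\n"); exp_continue; } ],
    );
    $expect->soft_close();                   # wait for the spawned command to finish
}

my $pm = Parallel::ForkManager->new(50);     # all 50 children may run at once
for my $i (1 .. 50) {
    $pm->start and next;                     # parent: continue the loop
    hostuser_expect("Command which needs to be run");
    $pm->finish;                             # child: exit once the command is done
}
$pm->wait_all_children;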

Perl run the same script for different directories at the same time

I have a directory that contains other directories (the number of directories is arbitrary), like this:
Main_directory_samples/
    subdirectory_sample_1/
    subdirectory_sample_2/
    subdirectory_sample_3/
    subdirectory_sample_4/
I have a script that takes one directory as input and needs about an hour to run per directory. To run it for every directory I have the following code:
opendir DIR, $maindirectory or die "Can't open directory!!";
while (my $dir = readdir DIR) {
    if ($dir ne '.' && $dir ne '..') {
        system("/bin/bash", "my_script.sh", $maindirectory.'/'.$dir);
    }
}
closedir DIR;
However, I want to run the script for different directories at the same time. For instance, subdirectory_sample_1/ and subdirectory_sample_2/ would run in one thread, and subdirectory_sample_3/ and subdirectory_sample_4/ in another. But I just can't find a way to do this.
As you're just starting external processes and waiting for them, a non-threading option:
use strict;
use warnings;
use Path::Tiny;
use IO::Async::Loop;
use Future::Utils 'fmap_concat';

my $loop = IO::Async::Loop->new;
my $maindirectory = '/foo/bar';
my @subdirs = grep { -d } path($maindirectory)->children; # excludes . and ..

# runs this code to maintain up to 'concurrent' pending futures at once
my $main_future = fmap_concat {
    my $dir = shift;
    my $future = $loop->new_future;
    my $process = $loop->open_process(
        command      => ['/bin/bash', 'my_script.sh', $dir],
        on_finish    => sub { $future->done(@_) },
        on_exception => sub { $future->fail(@_) },
    );
    return $future;
} foreach => \@subdirs, concurrent => 2;

# run event loop until all futures are done or one fails, throw exception on failure
my @exit_codes = $main_future->get;
See the docs for IO::Async::Loop and Future::Utils.
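
If you need the exit statuses, note that each future is marked done with whatever on_finish received; assuming the IO::Async::Process convention of (process object, exit code), a sketch of checking them might be:
# hypothetical follow-up: @exit_codes holds (process, exit code) pairs
while (my ($proc, $status) = splice @exit_codes, 0, 2) {
    warn "my_script.sh exited with status " . ($status >> 8) if $status != 0;
}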
One way is to fork, and process a group of directories in each child.
A basic example:
use warnings;
use strict;
use feature 'say';
use List::MoreUtils qw(natatime);
use POSIX qw(:sys_wait_h);    # for WNOHANG
use Time::HiRes qw(sleep);    # for fractional seconds

my @all_dirs = qw(d1 d2 d3 d4);
my $path = 'maindir';

my @procs;

# Get iterator over groups (of 2)
my $it = natatime 2, @all_dirs;

while (my @dirs = $it->()) {
    my $pid = fork // do {
        warn "Can't fork for @dirs: $!";
        next;
    };
    if ($pid == 0) {
        foreach my $dir (@dirs) {
            my @cmd = ('/bin/bash', 'my_script.sh', "$path/$dir");
            say "in $$, \@cmd: (@cmd)";
            # system(@cmd) == 0 or do { inspect $? };
        }
        exit;
    }
    push @procs, $pid;
}

# Poll with non-blocking wait for processes (reap them)
my $gone;
while (($gone = waitpid(-1, WNOHANG)) > -1) {
    my $status = $?;
    say "Process $gone exited with $status" if $gone > 0;
    sleep 0.1;
}
See system and/or exec for details, in particular on error checking, as well as the $? variable. It can be unpacked to retrieve more details about the error; or, at least, print a warning and skip to the next item (which happens above anyway).
The code above prints out the command and the PIDs with their exit status. Replace @cmd with a test command of no consequence and un-comment the system line to try this out.
Watch how many jobs you run. A basic rule of thumb is not to go above 2 per core, at which point performance starts to suffer, but this depends on many details. Experiment to find the sweet spot for your case. I like to have a job per core and at least one core free. To throttle this, see the modules linked at the end.
To break all jobs (directories) into groups I used natatime from List::MoreUtils (n-at-a-time). If there are more specific criteria for how to group directories, adjust that.
See Forks::Super and Parallel::ForkManager for higher-level ways to work with forked processes.
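For comparison, here is an untested Parallel::ForkManager sketch of the same task, assuming the my_script.sh and directory layout from the question:
use warnings;
use strict;
use Parallel::ForkManager;

my $maindirectory = 'Main_directory_samples';
opendir my $dh, $maindirectory or die "Can't open directory: $!";
my @dirs = grep { -d "$maindirectory/$_" && !/^\.\.?$/ } readdir $dh;
closedir $dh;

my $pm = Parallel::ForkManager->new(2);     # at most two scripts at a time
for my $dir (@dirs) {
    $pm->start and next;                    # parent moves on; child runs below
    system('/bin/bash', 'my_script.sh', "$maindirectory/$dir") == 0
        or warn "my_script.sh failed for $dir: $?";
    $pm->finish;
}
$pm->wait_all_children;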

Using threads in loop

I want to use threads in a loop: start the threads, wait for them to finish, sleep for some predefined amount of time, and then start those threads again.
Actually I want to run these threads once every hour, which is why I am using sleep. I know an hourly run could be done via cron, but that isn't an option here, so I am using sleep.
I am getting this error when I am trying to run my code:
Thread already joined at ex.pl line 33.
Perl exited with active threads:
5 running and unjoined
0 finished and unjoined
0 running and detached
This is my code:
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new();
my @threads_arr;

sub main {
    $queue->enqueue($_) for 1 .. 5;
    $queue->enqueue(undef) for 1 .. 5;
}

sub thread_body {
    while ( my $num = $queue->dequeue() ) {
        print "$num is popped by " . threads->tid . " \n";
        sleep(5);
    }
}

while (1) {
    my $main_thread = threads->new( \&main );
    push @threads_arr, threads->create( \&thread_body ) for 1 .. 5;
    $main_thread->join();
    foreach my $x (@threads_arr) {
        $x->join();
    }
    sleep(1);
    print "sleep \n";
}
I can also see other questions similar to this one, but I haven't been able to get any of them to work.
Your @threads_arr array never gets cleared after you join the first 5 threads. The old (already joined) threads still exist in the array the second time around the loop, so Perl throws the "Thread already joined" error when attempting to join them. Clearing or locally initializing @threads_arr every time around the loop will fix the problem.
@threads_arr = ();   # Reinitialize the array
push @threads_arr, threads->create( \&thread_body ) for 1 .. 5;
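Put together, a sketch of the corrected loop, declaring the array inside the loop so it starts empty on every pass:
while (1) {
    my @threads_arr;                                   # fresh, empty array each pass
    my $main_thread = threads->new( \&main );
    push @threads_arr, threads->create( \&thread_body ) for 1 .. 5;
    $main_thread->join();
    $_->join() for @threads_arr;
    sleep(1);                                          # e.g. sleep(3600) for an hourly run
    print "sleep\n";
}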

Perl: Run multiple system commands at once

In Perl, I have some code like
my $enter = `curl -s -m 10 http://URL`;
How would I use threading to run this function 10 times at once?
I found this, but I am not sure how to use it to set a specific number of threads.
Edit: I guess I misunderstood what Thread::Queue was doing. My original question still stands: how do I run multiple commands at once?
You can use fork(). In this example, I use the Parallel::ForkManager module. $max_forks is the number of processes to run simultaneously (set to two for an example), and you'd put your system/curl code after ### add curl logic here, and remove the print() and sleep() example statements from there as well.
#!/usr/bin/perl
use warnings;
use strict;
use Parallel::ForkManager;

my $max_forks = 2;
my $fork = Parallel::ForkManager->new($max_forks);

my @urls = (
    'http://perlmonks.org',
    'http://stackoverflow.com',
    'http://slashdot.org',
    'http://wired.com',
);

# on start callback
$fork->run_on_start(
    sub {
        my $pid = shift;
        print "Starting PID $pid\n";
    }
);

# on finish callback
$fork->run_on_finish(
    sub {
        my ($pid, $exit, $ident, $signal, $core) = @_;
        if ($core) {
            print "PID $pid core dumped.\n";
        }
        else {
            print "PID $pid exited with exit code $exit " .
                  "and signal $signal\n";
        }
    }
);

# forking code
for my $url (@urls) {
    $fork->start and next;
    ### add curl logic here
    print "$url\n";
    sleep(2);
    $fork->finish;
}
$fork->wait_all_children;
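
For the curl command from the question, the loop body might become something like this (the backtick output only exists inside the child process, so process or print it there):
for my $url (@urls) {
    $fork->start and next;
    my $enter = `curl -s -m 10 $url`;     # runs in the child
    print "fetched ", length($enter), " bytes from $url\n";
    $fork->finish;
}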

Perl 5.8: possible to get any return code from backticks when SIGCHLD in use

When a CHLD signal handler is used in Perl, even uses of system and backticks will send the CHLD signal. But for the system and backtick sub-processes, neither wait nor waitpid seems to set $? within the signal handler on SuSE 11 Linux. Is there any way to determine the return code of a backtick command when a CHLD signal handler is active?
Why do I want this? Because I want to fork(?) and start a medium length command and then call a perl package that takes a long time to produce an answer (and which executes external commands with backticks and checks their return code in $?), and know when my command is finished so I can take action, such as starting a second command. (Suggestions for how to accomplish this without using SIGCHLD are also welcome.) But since the signal handler destroys the backtick $? value, that package fails.
Example:
use warnings;
use strict;
use POSIX ":sys_wait_h";

sub reaper {
    my $signame = shift @_;
    while (1) {
        my $pid = waitpid(-1, WNOHANG);
        last if $pid <= 0;
        my $rc = $?;
        print "wait()=$pid, rc=$rc\n";
    }
}
$SIG{CHLD} = \&reaper;

# system can be made to work by not using $?, instead using system return value
my $rc = system("echo hello 1");
print "hello \$?=$?\n";
print "hello rc=$rc\n";

# But backticks, for when you need the output, cannot be made to work??
my @IO = `echo hello 2`;
print "hello \$?=$?\n";

exit 0;
Yields a -1 return code in all places I might try to access it:
hello 1
wait()=-1, rc=-1
hello $?=-1
hello rc=0
wait()=-1, rc=-1
hello $?=-1
So I cannot find anywhere to access the backticks' return value.
This same issue has been bugging me for a few days now. I believe there are 2 solutions required depending on where you have your backticks.
If you have your backticks inside the child code:
The solution was to put the line below inside the child fork. I think your statement above, "if I completely turn off the CHLD handler around the backticks then I might not get the signal if the child ends", is incorrect. You will still get a callback in the parent when the child exits, because the signal is only disabled inside the child. It's just that the child doesn't get a signal when the child's child (the part in backticks) exits.
local $SIG{'CHLD'} = 'DEFAULT';
I'm no Perl expert, but I have read that you should set the CHLD signal to the string 'IGNORE'; this did not work in my case. In fact I believe it may have been causing the problem. Leaving it out completely also appears to solve the problem, which I guess is the same as setting it to DEFAULT.
If you have backticks inside the parent code:
Add this line to your reaper function:
local ($!, $?);
What is happening is that the reaper is being called when your code inside the backticks completes, and the reaper is setting $?. By making $? local, the handler no longer clobbers the global $?.
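A sketch of the question's reaper with that line added (local saves $! and $?, so whatever waitpid sets inside the handler is undone when the handler returns):
sub reaper {
    local ($!, $?);    # changes to $! and $? are rolled back on handler exit
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        print "reaped $pid with status ", $? >> 8, "\n";
    }
}
$SIG{CHLD} = \&reaper;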
So, building on MikeKull's answer, here is a working example where the forked child uses backticks and still gets the proper return code. This example is a better representation of what I was doing; the original example did not use forks, so it could not convey the entire issue.
use warnings;
use strict;
use POSIX ":sys_wait_h";

# simple child which returns code 5
open F, ">", "exit5.sh" or die "$!";
print F <<EOF;
#!/bin/bash
echo exit5 pid=\$\$
exit 5
EOF
close F;
chmod 0755, "exit5.sh";    # make the script executable

sub reaper
{
    my $signame = shift @_;
    while (1)
    {
        my $pid = waitpid(-1, WNOHANG);
        print "no child waiting\n" if $pid < 0;
        last if $pid <= 0;
        my $rc = $? >> 8;
        print "wait()=$pid, rc=$rc\n";
    }
}
$SIG{CHLD} = \&reaper;

if (!fork)
{
    print "child pid=$$\n";
    { local $SIG{CHLD} = 'DEFAULT'; print `./exit5.sh`; }
    print "\$?=" . ($? >> 8) . "\n";
    exit 3;
}

# sig CHLD will interrupt sleep, so do multiple
sleep 2; sleep 2; sleep 2;
exit 0;
The output is:
child pid=32307
exit5 pid=32308
$?=5
wait()=32307, rc=3
no child waiting
So the expected return code 5 was received in the child because the inherited reaper was disabled in the child before running the backticks; and, as indicated by ikegami, the parent still gets the CHLD signal and a proper return code when the child exits.

Perl: write value in thread

I am trying to read the text of two large files. To speed it up I tried threads.
Before I used threads the script worked; now it does not.
The problem is: I save everything I read from the file into a hash.
When I print the size (or keys/values) after the read-in inside the sub that the thread executed, it shows a correct number > 0; when I print the size of the hash anywhere else (after the threads have run), it shows me 0.
print ": ".keys(%c);
is used two times and has different output each time.
(In the final program two threads are running, and a method to compare the data is called after the threads have finished.)
Example code:
my %c;
my @threads = initThreads();
@threads[0] = threads->create(\&ce);

foreach (@threads) {
    $_->join();
}
print ": " . keys(%c);

sub initThreads {
    my @initThreads;
    for (my $i = 0; $i < 2; $i++) {
        push(@initThreads, $i);
    }
    return @initThreads;
}

sub ce() {
    my $id = threads->tid();
    open my $file, "<", @arg1[1] or die $!;
    my @cXY;
    my @cDa;
    while (my $line = <$file>) {
        # some regex and push to arrays, works
        @c{@cXY} = @cDa;
    }
    print "Thread $id is done\n";
    close $file;
    print ": " . keys(%c);
    threads->exit();
}
Do I have to run the code that comes after the first two threads in another thread that waits until the first two are finished?
Or what am I doing wrong with threads?
Thanks.
%c isn't shared across your threads.
use threads;
use threads::shared;

my %c :shared;
See threads::shared.
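A minimal demonstration that a :shared hash written in one thread is visible after the join:
use strict;
use warnings;
use threads;
use threads::shared;

my %c :shared;
my $t = threads->create(sub { $c{answer} = 42 });
$t->join();
print scalar(keys %c), "\n";    # prints 1; without :shared it would print 0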
In Perl, threads don't share memory. Each thread operates on a different copy of %c, so the changes aren't reflected to the parent thread. While sharing a variable across threads is possible, this is not generally advisable.
Make use of the possibility to return data from a thread, e.g.:
my %c = map %{ $_->join }, @threads; # flatten all returned hashes

sub ce {
    my %hash;
    ...;
    return \%hash;
}
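Applied to the question, a sketch might look like this (the file names are placeholders):
use strict;
use warnings;
use threads;

my @threads = map { threads->create(\&ce, $_) } 'file1.txt', 'file2.txt';
my %c = map %{ $_->join }, @threads;    # merge the hash refs the threads return

sub ce {
    my ($filename) = @_;
    my %hash;
    open my $fh, '<', $filename or die "Can't open $filename: $!";
    while (my $line = <$fh>) {
        # ... fill %hash from $line ...
    }
    return \%hash;                      # copied back to the joining thread
}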
Some other suggestions:
use strict; use warnings; if you aren't already.
Use better variable names.
You only seem to be spawning one thread (in $threads[0]).
my @array; for (my $i = 0; $i < 2; $i++) { push(@array, $i) } is equivalent to my @array = 0 .. 1.
@arg1 is not declared in the current scope.
Manually exiting a thread is not necessary in your case.
