The extract_system_kpis function is not being called; only the puts statements before it are printed.
proc process_all_scenarios_multithreaded {results_folder} {
    package require dai
    package require Thread
    set scenario_list [::simulation::get_scenarios Simulation]
    set curr_dir [pwd]
    # Use multithreading to run scenarios in parallel via a Tcl thread pool
    set pool [tpool::create -maxworkers 100 -initcmd {
        proc process_scenarios {scenario} {
            set sname [lindex $scenario 0]
            puts "Scenario: $sname"
            set sdir "$curr_dir/$results_folder/$sname"
            puts "Results from: $sdir"
            extract_system_kpis $sname "SUCCESS" $sdir
        }
    }]
    foreach scenario $scenario_list {
        lappend work [tpool::post -nowait $pool [list process_scenarios $scenario]]
    }
    # Wait until all threads complete
    foreach id $work {
        tpool::wait $pool $id
    }
    tpool::release $pool
}
I suspect the problem is that the package that provides extract_system_kpis (is that from dai?) is not being loaded into the worker threads.
You probably need to make the thread pool with this:
set pool [tpool::create -maxworkers 100 -initcmd {
    package require dai; # CRITICAL!
    proc process_scenarios {scenario} {
        set sname [lindex $scenario 0]
        puts "Scenario: $sname"
        # Note: $curr_dir and $results_folder are variables of the main
        # thread; they are not visible inside a worker thread (or inside
        # this proc body) either, so they would need to be passed in as
        # arguments or baked into the -initcmd script.
        set sdir "$curr_dir/$results_folder/$sname"
        puts "Results from: $sdir"
        extract_system_kpis $sname "SUCCESS" $sdir
    }
}]
You're also super-dependent on the dai package being happy being loaded into multiple threads at once. It might be entirely happy, or it might not; there's no easy way to tell by inspection from outside, and it depends on how it is implemented.
I am very new to the Perl Parallel::ForkManager module. Here is the code:
my $max_procs = 3;
my $forking = Parallel::ForkManager->new($max_procs);
my @collect_processed_data;
foreach my $row (@data_array) {
    $forking->start and next;
    # Code processing -- Started
    push(@collect_processed_data, $obtained_data);
    # where $obtained_data is an array which we get from the DB.
    # Code Processing -- Ended.
    # print Dumper @collect_processed_data; --> Works here.
    $forking->finish; # do the exit in the child process
}
$forking->wait_all_children;
print Dumper @collect_processed_data; # --> But not working here
Here is the issue: the forking works fine, but I am not able to collect the data outside the foreach loop. At the same time, I am able to collect it before $forking->finish.
Is there anything I need to do with $forking->finish;?
Any help would be appreciated.
Everything between $pm->start and $pm->finish executes in a fork of the parent. You're adding to the child's copy of @collect_processed_data, but printing the parent's @collect_processed_data.
P::FM provides a mechanism for returning data back to the parent, which is documented under the heading "RETRIEVING DATASTRUCTURES from child processes".
my $pm = Parallel::ForkManager->new($max_procs);
my @collect_processed_data;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $processed_data) = @_;
    if    ($exit_signal) { warn("$pid killed by signal $exit_signal\n"); }
    elsif ($exit_code)   { warn("$pid exited with error $exit_code\n"); }
    push @collect_processed_data, $processed_data
        if defined($processed_data);
});

for (...) {
    $pm->start and next;
    ...
    my $processed_data = ...;
    ...
    $pm->finish(0, $processed_data);
}
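Note that the second argument to finish must be a reference (P::FM serializes it with Storable through temporary files to get it back across the process boundary), so return e.g. \@rows or \%stats rather than a plain list.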
I have a subroutine which I would like to execute in parallel with Coro:
use strict;
use warnings;
use Coro;
sub mysub {
    my ($in) = @_;
    print "$in \n";
    foreach my $i (0..100000000){
        $i=$i+1;
    }
    return 1;
}
from the Coro intro I read how I can create threads:
for (
    ( async { mysub "A" } ),
    ( async { mysub "B" } ),
    ( async { mysub "C" } ),
    ( async { mysub "X" } ),
    ( async { mysub "Y" } )
) {
    $_->join;
}
However, the threads are created but do not run in parallel; how can I make that happen? The example states that Coro::Socket (or better, AnyEvent::Socket) makes parallel execution possible, but how can I make this work in my simple example?
Also (but this is a second question): why do the arguments to mysub get passed in the above for-loop, but not in the example below?
my @letters = ("A", "B", "C", "X", "Y");
my @in = map { ( async { mysub $_ } ) } @letters;
for (@in) { $_->join }
Coro is a co-operative multitasking system. A thread will only cede the CPU to another when the program explicitly does so, or when it's blocked waiting for an event in a Coro-aware call.
For example, the following will wait for HTTP responses in parallel:
use Coro qw( async );
use LWP::Protocol::AnyEvent::http qw( );
use LWP::UserAgent qw( );
...
for my $url (@urls) {
async { process( $ua->get($url) ) };
}
...
Coro is powerless to split arithmetic among CPUs as your example attempts to do since it doesn't create any OS threads.
Coro does not run the coroutines in parallel, only asynchronously. See the documentation:
...They are similar to kernel threads but don't (in general) run in parallel at the same time even on SMP machines...
Instead, it switches between "threads" at points that would usually block, like read and write, but only one "thread" is running at any specific time.
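To see the switching happen in the original CPU-bound example, the program has to cede explicitly. Here is a minimal sketch (the loop bound is shortened, and $_ is copied into a lexical so each coroutine keeps its own letter, which is why the map version in the question loses its arguments):
use strict;
use warnings;
use Coro;   # exports async and cede

sub mysub {
    my ($in) = @_;
    for my $i (0 .. 5) {
        print "$in: $i\n";
        cede;   # explicitly hand the CPU to the next ready coroutine
    }
    return 1;
}

# Copy $_ into a lexical; by the time the coroutine runs, map's $_ is gone.
my @coros = map { my $letter = $_; async { mysub $letter } } qw(A B C X Y);
$_->join for @coros;
This still uses only one CPU, as both answers note; it merely interleaves the output instead of running each coroutine to completion.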
I currently have a script that kicks off threads to perform various actions on several directories. A snippet of my script is:
#main
sub BuildInit {
    my $actionStr = "";
    my $compStr   = "";
    my @component_dirs;
    my @compToBeBuilt;
    foreach my $comp (@compList) {
        @component_dirs = GetDirs($comp); # populates @component_dirs
    }
    print "Printing Action List: @actionList\n";
    #---------------------------------------
    #---- Setup Worker Threads ----------
    for ( 1 .. NUM_WORKERS ) {
        async {
            while ( defined( my $job = $q->dequeue() ) ) {
                worker($job);
            }
        };
    }
    #-----------------------------------
    #---- Enqueue The Work ----------
    for my $action (@actionList) {
        my $sem = Thread::Semaphore->new(0);
        $q->enqueue( [ $_, $action, $sem ] ) for @component_dirs;
        $sem->down( scalar @component_dirs );
        print "\n------>> Waiting for prior actions to finish up... <<------\n";
    }
    # Nothing more to do - notify the Queue that we're not adding anything else
    $q->end();
    $_->join() for threads->list();
    return 0;
}
#worker
sub worker {
    my ($job) = @_;
    my ( $component, $action, $sem ) = @$job;
    Build( $component, $action );
    $sem->up();
}
#builder method
sub Build {
    my ( $comp, $action ) = @_;
    my $cmd     = "$MAKE $MAKE_INVOCATION_PATH/$comp ";
    my $retCode = -1;
    given ($action) {
        when ("depend") { $cmd .= "$action >nul 2>&1" } # suppress output
        when ("clean")  { $cmd .= $action }
        when ("build")  { $cmd .= 'l1' }
        when ("link")   { $cmd .= '' } # add nothing; default is to link
        default         { die "Action: $action is unknown to me." }
    }
    print "\n\t\t*** Performing Action: \'$cmd\' on $comp ***" if $verbose;
    if ( $action eq "link" ) {
        # hack around potential race conditions -- will only be an issue during linking
        my $tries = 1;
        until ( $retCode == 0 or $tries == 0 ) {
            last if ( $retCode = system($cmd) ) == 2; # compile error; stop trying
            $tries--;
        }
    }
    else {
        $retCode = system($cmd);
    }
    push( @retCodes, ( $retCode >> 8 ) );
    #testing
    if ( $retCode != 0 ) {
        print "\n\t\t*** ERROR IN $comp: $@ !! ***\n";
        print "\t\t*** Action: $cmd -->> Error Level: " . ( $retCode >> 8 ) . "\n";
        #exit(-1);
    }
    return $retCode;
}
The print statement I'd like to be thread-safe is: print "\n\t\t*** Performing Action: \'$cmd\' on $comp ***" if $verbose; Ideally, I would like this output first, and then each component that is having the $action performed on it would output in related chunks. However, this obviously doesn't work right now: the output is interleaved for the most part, with each thread spitting out its own information.
E.g.,:
ComponentAFile1.cpp
ComponentAFile2.cpp
ComponentAFile3.cpp
ComponentBFile1.cpp
ComponentCFile1.cpp
ComponentBFile2.cpp
ComponentCFile2.cpp
ComponentCFile3.cpp
... etc.
I considered executing the system commands using backticks, and capturing all of the output in a big string or something, then output it all at once, when the thread terminates. But the issue with this is (a) it seems super inefficient, and (b) I need to capture stderr.
Can anyone see a way to keep my output for each thread separate?
clarification:
My desired output would be:
ComponentAFile1.cpp
ComponentAFile2.cpp
ComponentAFile3.cpp
------------------- #some separator
ComponentBFile1.cpp
ComponentBFile2.cpp
------------------- #some separator
ComponentCFile1.cpp
ComponentCFile2.cpp
ComponentCFile3.cpp
... etc.
To ensure your output isn't interrupted, access to STDOUT and STDERR must be mutually exclusive. That means that between the time a thread starts printing and finishes printing, no other thread can be allowed to print. This can be done using Thread::Semaphore.
Capturing the output and printing it all at once allows you to reduce the amount of time a thread holds the lock. If you don't do that, you'll effectively make your program single-threaded, as each thread attempts to lock STDOUT and STDERR while one thread runs.
Other options include:
Using a different output file for each thread.
Prepending a job id to each line of output so the output can be sorted later.
In both of those cases, you only need to lock it for a very short time span.
# Once
my $mutex = Thread::Semaphore->new(); # Shared by all threads.
# When you want to print.
$mutex->down();
print ...;
STDOUT->flush();
STDERR->flush();
$mutex->up();
or
# Once
my $mutex = Thread::Semaphore->new(); # Shared by all threads.
STDOUT->autoflush();
STDERR->autoflush();
# When you want to print.
$mutex->down();
print ...;
$mutex->up();
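As a concrete illustration of the capture-then-print idea, here is a hedged sketch using backticks with shell redirection (which also addresses concern (b) from the question, since 2>&1 folds STDERR of the command into the captured string). run_job is a hypothetical helper, not code from the original script:
use strict;
use warnings;
use threads;
use Thread::Semaphore;

my $mutex = Thread::Semaphore->new();   # shared by all threads

sub run_job {
    my ($cmd) = @_;

    # Do the slow work WITHOUT holding the lock; 2>&1 folds the
    # command's STDERR into the captured output.
    my $output = `$cmd 2>&1`;
    my $status = $? >> 8;

    # Hold the lock only long enough to emit one uninterrupted chunk.
    $mutex->down();
    print $output;
    print "*** '$cmd' exited with status $status\n" if $status;
    STDOUT->flush();
    $mutex->up();

    return $status;
}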
You can utilize the blocking behavior of $sem->down if it attempts to decrease the semaphore counter below zero, as mentioned in perldoc perlthrtut:
If down() attempts to decrement the counter below zero, it blocks
until the counter is large enough.
So here's what one could do:
Initialize a semaphore with counter 1 that is shared across all threads
my $sem = Thread::Semaphore->new( 1 );
Pass a thread counter to worker and Build
for my $thr_counter ( 1 .. NUM_WORKERS ) {
    async {
        while ( defined( my $job = $q->dequeue() ) ) {
            worker( $job, $thr_counter );
        }
    };
}

sub worker {
    my ( $job, $counter ) = @_;
    Build( $component, $action, $counter );
}
Call ->down and ->up inside Build (and nowhere else)
sub Build {
    my ( $comp, $action, $counter ) = @_;
    ... # Execute all concurrently-executed code here
    $sem->down( 1 << ( $counter - 1 ) );
    print "\n\t\t*** Performing Action: \'$cmd\' on $comp ***" if $verbose;
    # Execute all sequential 'chunks' here
    $sem->up( 1 << ( $counter - 1 ) );
}
By using the thread counter to left-shift the semaphore counter, it guarantees that the threads won't trample on one another:
+-----------+---+---+---+---+
| Thread | 1 | 2 | 3 | 4 |
+-----------+---+---+---+---+
| Semaphore | 1 | 2 | 4 | 8 |
+-----------+---+---+---+---+
I've approached this problem differently in the past, by creating an IO thread, and using that to serialise the file access.
E.g.
my $output_q = Thread::Queue->new();

sub writer {
    open( my $output_fh, ">", $output_filename )
        or die "Cannot open $output_filename: $!";
    while ( my $line = $output_q->dequeue() ) {
        print {$output_fh} $line;
    }
    close($output_fh);
}
And within threads, 'print' by:
$output_q->enqueue("text_to_print\n");
Either with or without a wrapper - e.g. for timestamping statements if they're going to a log. (You probably want to timestamp when queued, rather than when actually printed.) A minimal wrapper is sketched below.
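A sketch of such a wrapper and of the shutdown sequence, building on the writer sub above (log_line is a hypothetical name for the wrapper):
use threads;

# Stamp the line when it is queued, not when the writer prints it.
sub log_line {
    my ($text) = @_;
    $output_q->enqueue( scalar(localtime) . " $text\n" );
}

my $writer = threads->create(\&writer);
# ... worker threads call log_line(...) while doing their work ...
$output_q->end();   # dequeue now returns undef, ending the writer's loop
$writer->join();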
I was provided some guidance on here at one time, with the following snippet:
my $q = Thread::Queue->new();

sub worker {
    my ($job, $action) = @_;
    Build($job, $action);
}

for (1..NUM_WORKERS) {
    async {
        while (defined(my $job = $q->dequeue())) {
            worker($job, 'clean');
        }
    };
}

$q->enqueue($_) for @compsCopy;

# When you're done adding to the queue.
$q->end();
$_->join() for threads->list();
What is the best option for reusing q? Currently, I'm just making new q objects, q2, q3 and doing all of this over again for each $action that I want to perform. Is there a better way though? I could potentially pass in an array of "actions" that I would like to perform, and would like to avoid duplicating this code 7 times if possible.
Maybe I don't fully understand what a Thread::Queue is..
You should use one queue per direction. If you just want to run some operations in parallel, one queue is enough. If you also want to report errors back and process them in the main thread (or another thread), use two queues.
For simple use, here is an example for your reference:
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $q = Thread::Queue->new();    # A new empty queue
my %seen :shared;

# Worker threads
my @thrs = map { threads->create(\&doOperation) } 1 .. 5;    # 5 threads

add_file_to_q('/tmp/');
$q->enqueue('//_DONE_//') for @thrs;
$_->join() for @thrs;

sub add_file_to_q {
    my $dir = shift;
    my @files = `ls -1 $dir/`;
    chomp(@files);
    # add files to queue
    foreach my $f (@files) {
        # Send work to a thread
        $q->enqueue($f);
        print "Pending items: " . $q->pending() . "\n";
    }
}

sub doOperation {
    my $ithread = threads->tid();
    while (my $filename = $q->dequeue()) {
        # Do work on $filename
        sleep(1) if ! defined $filename;
        return 1 if $filename eq '//_DONE_//';
        next if $seen{$filename};
        print "[id=$ithread]\t$filename\n";
        $seen{$filename} = 1;
        ### add files if it is a directory (beware of symlinks; no file may be named //_DONE_//!)
        add_file_to_q($filename) if -d $filename;
    }
    return 1;
}
I'm trying to accomplish the following:
Have a thread that reads data from a very large file (say, about 10GB) and pushes it onto the queue. (I do not wish for the queue to get very large either.)
While the buildQueue thread is pushing data onto the queue, have about 5 worker threads dequeue and process the data.
I've made an attempt but my other threads are unreachable because of a continuous loop in my buildQueue thread.
My approach may be totally wrong. Thanks for any help, it's much appreciated.
Here's the code for buildQueue:
sub buildQueue {
    print "Enter a file name: ";
    my $dict_path = <STDIN>;
    chomp($dict_path);
    open DICT_FILE, $dict_path or die("Sorry, could not open file!");
    while (1) {
        if (<DICT_FILE>) {
            if ($queue->pending() < 100) {
                my $query = <DICT_FILE>;
                chomp($query);
                $queue->enqueue($query);
                my $count = $queue->pending();
                print "Queue Size: $count Query: $query\n";
            }
        }
    }
}
And, as I expected, when this thread gets executed nothing after it will run, because this thread never finishes.
my $builder = new Thread(&buildQueue);
Since the builder thread will be running for a long time I never get to create worker threads.
Here's the entire code:
#!/usr/bin/perl -w
use strict;
use Thread;
use Thread::Queue;

my $queue = new Thread::Queue();
my @threads;

sub buildQueue {
    print "Enter a file name: ";
    my $dict_path = <STDIN>;
    chomp($dict_path);
    open dict_file, $dict_path or die("Sorry, could not open file!");
    while (1) {
        if (<dict_file>) {
            if ($queue->pending() < 100) {
                my $query = <dict_file>;
                chomp($query);
                $queue->enqueue($query);
                my $count = $queue->pending();
                print "Queue Size: $count Query: $query\n";
            }
        }
    }
}

sub processor {
    my $query;
    while (1) {
        if ($query = $queue->dequeue) {
            print "$query\n";
        }
    }
}

my $builder = new Thread(&buildQueue);
push @threads, new Thread(&processor) for 1..5;
You'll need to mark when you want your threads to exit (via either join or detach). The fact that you have infinite loops with no last statements to break out of them is also a problem.
Edit: I also forgot a very important part! Each worker thread will block, waiting for another item to process off the queue, until it gets an undef. That is why we specifically enqueue undef once for each thread after the queue builder is done.
Try:
#!/usr/bin/perl -w
use strict;
use threads;
use Thread::Queue;

my $queue = new Thread::Queue();
our @threads; # Do you really need our instead of my?

sub buildQueue
{
    print "Enter a file name: ";
    my $dict_path = <STDIN>;
    chomp($dict_path);

    # Three-argument open, please!
    open my $dict_file, "<", $dict_path or die("Sorry, could not open file!");
    while (my $query = <$dict_file>)
    {
        chomp($query);
        while (1)
        {   # Wait to see if our queue has < 100 items...
            if ($queue->pending() < 100)
            {
                $queue->enqueue($query);
                print "Queue Size: " . $queue->pending . "\n";
                last; # This breaks out of the infinite loop
            }
        }
    }
    close($dict_file);

    foreach (1..5)
    {
        $queue->enqueue(undef);
    }
}

sub processor
{
    my $query;
    while ($query = $queue->dequeue)
    {
        print "Thread " . threads->tid . " got $query\n";
    }
}

my $builder = threads->create(\&buildQueue);
push @threads, threads->create(\&processor) for 1..5;

# Waiting for our threads to finish.
$builder->join;
foreach (@threads)
{
    $_->join;
}
The MCE module for Perl loves big files. With MCE, one can chunk many lines at once, slurp a big chunk as a scalar string, or read 1 line at a time. Chunking many lines at once reduces the overhead for IPC.
MCE 1.504 is out now. It provides MCE::Queue with support for child processes including threads. In addition, the 1.5 release comes with 5 models (MCE::Flow, MCE::Grep, MCE::Loop, MCE::Map, and MCE::Stream) which take care of instantiating the MCE instance as well as auto-tuning max_workers and chunk_size. One may override these options btw.
Below, MCE::Loop is used for the demonstration.
use MCE::Loop;

print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);

mce_loop_f {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    foreach my $line ( @$chunk_ref ) {
        chomp $line;
        ## add your code here to process $line
    }
} $dict_path;
If you want to specify the number of workers and/or chunk_size, then there are 2 ways to do it.
use MCE::Loop max_workers => 5, chunk_size => 300000;
Or...
use MCE::Loop;
MCE::Loop::init {
    max_workers => 5,
    chunk_size  => 300000
};
Although chunking is preferred for large files, one can compare the time with chunking one line at a time. One may omit the first line inside the block (commented out). Notice how there's no need for an inner for loop. $chunk_ref is still an array ref containing 1 line. The input scalar $_ contains the line when chunk_size equals 1, otherwise points to $chunk_ref.
use MCE::Loop;

MCE::Loop::init {
    max_workers => 5,
    chunk_size  => 1
};

print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);

mce_loop_f {
    # my ($mce, $chunk_ref, $chunk_id) = @_;
    my $line = $_;
    ## add your code here to process $line or $_
} $dict_path;
I hope that this demonstration was helpful for folks wanting to process a file in parallel.
:) mario
It sounds like this case could do with the Parallel::ForkManager module.
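A hedged sketch of what that might look like, assuming batching (forking once per line of a 10GB file would be far too expensive) and a hypothetical process() routine standing in for the per-line work:
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(5);   # at most 5 concurrent children

print "Enter a file name: ";
my $dict_path = <STDIN>;
chomp($dict_path);

open my $dict_file, '<', $dict_path or die "Cannot open $dict_path: $!";
until (eof $dict_file) {
    # The parent reads a batch of lines, then hands the whole batch to a
    # child; the batch is available in the child because it exists at
    # fork time, so no shared queue is needed.
    my @batch;
    while (@batch < 1000 and defined(my $line = <$dict_file>)) {
        chomp $line;
        push @batch, $line;
    }
    $pm->start and next;     # parent: loop on to read the next batch
    process($_) for @batch;  # child: hypothetical per-line work
    $pm->finish;
}
close $dict_file;
$pm->wait_all_children;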
A different approach: you can also use user_tasks in MCE 1.2+ and create two multi-worker, multithreaded tasks: one task for reading (since it's a big file, you could also benefit from parallel reading, with MCE handling the file seeks) and one task for processing, etc.
The code below still uses Thread::Queue to manage your buffer queue.
The buildQueue sub has your queue size control, and it pushes the data directly to the manager process's $R_QUEUE; since we've used threads, it has access to the parent's memory space. If you want to use forks instead, you can still access the queue through a callback function. But here I chose simply to push to the queue.
The processQueue sub will simply de-queue whatever is in the queue until there's nothing more pending.
The task_end sub in each task is run only once by the manager process at the end of each task, so we use it to signal a stop to our worker processes.
Obviously, there's a lot of freedom in how you want to chunk your data to the workers, so you can decide upon the size of the chunk or even how to slurp your data in.
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use MCE;

my $R_QUEUE = Thread::Queue->new;
my $queue_workers = 8;
my $process_workers = 8;
my $chunk_size = 1;

print "Enter a file name: ";
my $input_file = <STDIN>;
chomp($input_file);

sub buildQueue {
    my ($self, $chunk_ref, $chunk_id) = @_;
    if ($R_QUEUE->pending() < 100) {
        $R_QUEUE->enqueue($chunk_ref);
        $self->sendto('stdout', "Queue Size: " . $R_QUEUE->pending . "\n");
    }
}

sub processQueue {
    my $self = shift;
    my $wid = $self->wid;
    while (my $buff = $R_QUEUE->dequeue) {
        $self->sendto('stdout', "Thread " . $wid . " got $$buff");
    }
}

my $mce = MCE->new(
    input_data => $input_file, # a file path, a file handle, or even a scalar ref to treat like a file; check the documentation for details.
    chunk_size => $chunk_size,
    use_slurpio => 1,

    user_tasks => [
        { # queueing task
            max_workers => $queue_workers,
            user_func   => \&buildQueue,
            use_threads => 1, # we'll use threads to have access to the parent's variables in shared memory.
            task_end    => sub { $R_QUEUE->enqueue( (undef) x $process_workers ) } # signal stop to our process workers when they hit the end of the queue. Thanks, Jack Maney!
        },
        { # process task
            max_workers => $process_workers,
            user_func   => \&processQueue,
            use_threads => 1, # we'll use threads to have access to the parent's variables in shared memory.
            task_end    => sub { print "Finished processing!\n"; }
        }
    ]
);

$mce->run();

exit;