Efficiency of Threading/Forking - multithreading

So, I was considering using forking or threading to do some simple parallelization. To make sure it was worth it, I wrote three simple scripts to benchmark sequential vs. threading vs. forking. I used two very simple methods: one to initialize an array of arrays, and another to find the max element in each array and write it to a file.
Methods:
sub initialize
{
    for (my $i = 0; $i <= 2; $i++)
    {
        for (my $j = 0; $j < 5000000; $j++)
        {
            $array[$i][$j] = $j + $i;
        }
    }
}
sub getMax
{
    my $num    = shift;
    my $array  = shift;
    my $length = scalar(@{$array});
    my $max = -9**9**9;
    my @ra;
    for (my $i = 0; $i < $length; $i++)
    {
        if ($max < ${$array}[$i])
        {
            $max = ${$array}[$i];
        }
    }
    tie @ra, 'Tie::File', "test.txt" or die;
    $ra[$num] = $max;
}
Sequential:
my $start = Time::HiRes::time();
for (my $count = 0; $count <= 2; $count++)
{
    getMax($count, $array[$count]);
}
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "End of main program\n";
Threading:
my @threads = ();
my $start = Time::HiRes::time();
for (my $count = 0; $count <= 2; $count++)
{
    my $t = threads->new(\&getMax, $count, $array[$count]);
    push(@threads, $t);
}
foreach (@threads)
{
    my $num = $_->join;
}
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "End of main program\n";
Forking:
my $pm = Parallel::ForkManager->new(3);
my $start = Time::HiRes::time();
for (my $count = 0; $count <= 2; $count++)
{
    my $pid = $pm->start and next;
    getMax($count, $array[$count]);
    $pm->finish;
}
$pm->wait_all_children;
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "\nEnd of main program\n";
Sequential: 2.88 sec
Threading: 4.10 sec
Forking: 3.88 sec
I guess that for my purposes (obviously not this, but something not too much more computationally intensive), threading/forking is not helpful. I understand that the two are not solely used for temporal efficiency, but I imagine that's one of the benefits depending on what you're doing. So, my question is when exactly does threading/forking actually make one's code run faster?

The processor and memory are the fastest components of a computer. Because fast memory is also expensive, disk drives are used to store large amounts of data inexpensively, with the trade-off that they are much slower to access.
When computer programs rely on data from slow media, the faster components can often be left with nothing to do until the necessary data arrives. The primary use of multithreading is to allow the processor to get on with something else while waiting for a required resource.
The sorts of things that can be done in parallel are
Keeping the user interface functional while waiting for something to complete
Doing multi-processor calculations
Fetching data from multiple internet sites
Reading from multiple disk drives
The important thing about all of these is that multithreading is only advantageous if the threads don't compete with each other for the same resources.
Trying to speed up a disk read by reading half the data in each of two threads, for instance, will not be successful, because there is a bottleneck at the disk controller and a limit to how fast it can return data. But RAID drives can speed things up by reading part of the data from each of several drives at the same time.
In your example, there is only one processor that can do the maximum calculation. Getting several threads doing it doesn't mean the processor can do the work any faster, and in fact it will be slowed down by having to switch between threads. However, if you could arrange for each thread to be run on a separate processor of a multi-processor system you would get an advantage. This technique is often used by audio-visual software to get the maximum speed of processing.
Similarly, fetching data from multiple internet sources in parallel can be very useful, but only until the capacity of the link has been reached, when the threads will start competing with each other for bandwidth.
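To make the network case concrete, here is a rough, untested sketch (plain threads plus LWP::Simple, with placeholder URLs, not code from the question) where the threads spend most of their time waiting on the network rather than competing for the processor:
use strict;
use warnings;
use threads;
use LWP::Simple qw(get);

# Placeholder URLs; the point is that each fetch is dominated by network latency.
my @urls = ('http://example.com/a', 'http://example.com/b', 'http://example.com/c');

# One thread per URL: while one thread is blocked waiting for a response,
# the others can make progress.
my @workers = map { threads->create( sub { get($_[0]) }, $_ ) } @urls;

for my $t (@workers) {
    my $content = $t->join();
    printf "fetched %d bytes\n", defined $content ? length($content) : 0;
}
Once the link is saturated, adding more threads stops helping, for exactly the reason described above.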

Related

Address certain core for threads in Perl

I have a list of 40 files, which I want to modify through my script.
Since every file is processed in the same way, I want to use threads to speed it up.
Therefore I have this construct:
my $threads_ = sub
{
    while (defined(my $taskRef = $q->dequeue()))
    {
        my $work = shift(@{$taskRef});
        &{\&{$work}}(@{$taskRef});
        my $open = $q->pending();    # items still waiting (Thread::Queue has no open() method)
    }
};
my @Working;
for (my $i = 1; $i < 8; $i++)
{
    push @Working, threads->new($threads_);
}
And I have this code for starting a thread for every file
foreach my $File (@Filelist)
{
    $q->enqueue(['mySub', $FirstVar, $SecondVar]);
}
But it still takes way too long.
My question is, is there a certain way to assign each thread to a single Core, in order to speed it up?
I'd use Parallel::ForkManager for something like this; it works great. I'd recommend not brewing your own when an accepted standard solution exists. By "address certain core", I take it to mean your purpose is to limit the number of concurrent tasks to the number of available processors and ForkManager will do this for you -- just set the max number of processes when you initialize your ForkManager object.
The commenters above were absolutely correct to point out that I/O will eventually limit your throughput, but it's easy enough to determine when adding more processes fails to speed things up.
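For the 40-file case in the question, a minimal sketch might look like the following (process_file() is a hypothetical stand-in for whatever work you do per file; build @Filelist however you already do):
use strict;
use warnings;
use Parallel::ForkManager;

my @Filelist = glob('*.txt');             # or however you build the 40-file list

my $pm = Parallel::ForkManager->new(4);   # cap concurrency, e.g. at the number of cores

for my $file (@Filelist) {
    $pm->start and next;     # parent: start() returns the child PID (true), so skip to the next file
    process_file($file);     # child: do the per-file work here (hypothetical sub)
    $pm->finish;             # child exits
}
$pm->wait_all_children;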

How to get started multithreading in Perl

I have a perl program that takes over 13 hours to run. I think it could benefit from introducing multithreading but I have never done this before and I'm at a loss as to how to begin.
Here is my situation:
I have a directory of hundreds of text files. I loop through every file in the directory using a basic for loop and do some processing (text processing on the file itself, calling an outside program on the file, and compressing it). When complete I move on to the next file. I continue this way doing each file, one after the other, in a serial fashion. The files are completely independent from each other and the process returns no values (other than success/failure codes) so this seems like a good candidate for multithreading.
My questions:
How do I rewrite my basic loop to take advantage of threads? There appear to be several modules for threading out there.
How do I control how many threads are currently running? If I have N cores available, how do I limit the number of threads to N or N - n?
Do I need to manage the thread count manually or will Perl do that for me?
Any advice would be much appreciated.
Since your threads are simply going to launch a process and wait for it to end, best to bypass the middlemen and just use processes. Unless you're on a Windows system, I'd recommend Parallel::ForkManager for your scenario.
use Parallel::ForkManager qw( );
use constant MAX_PROCESSES => ...;
my $pm = Parallel::ForkManager->new(MAX_PROCESSES);
my @qfns = ...;
for my $qfn (@qfns) {
    my $pid = $pm->start and next;
    exec("extprog", $qfn)
        or die $!;
}
$pm->wait_all_children();
If you wanted to avoid using needless intermediary threads on Windows, you'd have to use something akin to the following:
use constant MAX_PROCESSES => ...;
my @qfns = ...;
my %children;
for my $qfn (@qfns) {
    while (keys(%children) >= MAX_PROCESSES) {
        my $pid = wait();
        delete $children{$pid};
    }
    my $pid = system(1, "extprog", $qfn);
    ++$children{$pid};
}
while (keys(%children)) {
    my $pid = wait();
    delete $children{$pid};
}
Someone's given you a forking example. Forks aren't native on Windows, so I'd tend to prefer threading.
For the sake of completeness, here's a rough idea of how threading works (and IMO this is one of the better approaches, since it avoids respawning threads).
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $nthreads = 5;

my $process_q = Thread::Queue->new();
my $failed_q  = Thread::Queue->new();

# This is a subroutine, but one that runs 'as a thread'.
# When it starts, it inherits the program state 'as is'. E.g.
# the variable declarations above all apply - but changes to
# values within the program are 'thread local' unless the
# variable is defined as 'shared'.
# Behind the scenes - Thread::Queue objects are 'shared' arrays.
sub worker {
    # NB - this will sit in a loop indefinitely, until you close the queue
    # using $process_q->end.
    # We do this once we've queued all the things we want to process,
    # and then the sub completes and exits neatly.
    # However, if you _don't_ end it, this will sit waiting forever.
    while ( my $server = $process_q->dequeue() ) {
        chomp($server);
        print threads->self()->tid() . ": pinging $server\n";
        my $result = `/bin/ping -c 1 $server`;
        if ($?) { $failed_q->enqueue($server) }
        print $result;
    }
}

# Insert tasks into the thread queue.
open( my $input_fh, "<", "server_list" ) or die $!;
$process_q->enqueue(<$input_fh>);
close($input_fh);

# We 'end' process_q - when we do, no more items may be inserted,
# and 'dequeue' returns 'undef' when the queue is emptied.
# This means our worker threads (in their 'while' loop) will then exit.
$process_q->end();

# Start some threads.
for ( 1 .. $nthreads ) {
    threads->create( \&worker );
}

# Wait for threads to all finish processing.
foreach my $thr ( threads->list() ) {
    $thr->join();
}

# Collate results ('synchronise' operation).
while ( my $server = $failed_q->dequeue_nb() ) {
    print "$server failed to ping\n";
}
If you need to move complicated data structures around, I'd recommend having a look at Storable - specifically freeze and thaw. These will let you shuffle around objects, hashes, arrays etc. easily in queues.
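For instance, here is a small sketch (separate from the ping example above, with a made-up task structure) of freezing a hash before queueing it and thawing it in the consumer:
use strict;
use warnings;
use threads;
use Thread::Queue;
use Storable qw(freeze thaw);

my $q = Thread::Queue->new();

# Producer: serialise an arbitrary structure into a plain scalar.
my %task = ( host => 'server1', tries => 3 );
$q->enqueue( freeze(\%task) );

# Consumer (typically inside a worker thread): rebuild the structure.
my $task = thaw( $q->dequeue() );   # a hashref again
print "pinging $task->{host}\n";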
Note though - for any parallel processing option, you get good CPU utilisation, but you don't get more disk IO - that's often a limiting factor.

How do I queue perl subroutines to a thread queue instead of data?

Background:
In reading how to multithread my perl script, I read (from http://perldoc.perl.org/threads.html#BUGS-AND-LIMITATIONS)
On most systems, frequent and continual creation and destruction of
threads can lead to ever-increasing growth in the memory footprint of
the Perl interpreter. While it is simple to just launch threads and
then ->join() or ->detach() them, for long-lived applications, it is
better to maintain a pool of threads, and to reuse them for the work
needed, using queues to notify threads of pending work.
My script will be long-lived; it's a PKI LDAP directory monitoring daemon that will always be running. The enterprise monitoring solution will generate an alarm if it stops running for any reason. My script will check that I can reach another PKI LDAP directory, as well as validate revocation lists on both.
Problem: Everything I can find on google shows passing variables (e.g. scalars) to the thread queue rather than the subroutine itself... I think I'm just not understanding how to implement a thread queue properly compared to how you implement a thread (without queues).
Question 1: How can I "maintain a pool of threads" to avoid the perl interpreter from slowly eating up more and more memory?
Question 2: (Unrelated but while I have this code posted) Is there a safe amount of sleep at the end of the main program so that I don't start a thread more than once in a minute? 60 seems obvious but could that ever cause it to run more than once if the loop is fast, or perhaps miss a minute because of processing time or something?
Thanks in advance!
#!/usr/bin/perl
use feature ":5.10";
use warnings;
use strict;
use threads;
use Proc::Daemon;
#
### Global Variables
use constant false => 0;
use constant true => 1;
my $app = $0;
my $continue = true;
$SIG{TERM} = sub { $continue = false };
# Directory Server Agent (DSA) info
my @ListOfDSAs = (
    { name => "Myself (inbound)",
      host => "ldap.myco.ca",
      base => "ou=mydir,o=myco,c=ca",
    },
    { name => "Company 2",
      host => "ldap.comp2.ca",
      base => "ou=their-dir,o=comp2,c=ca",
    }
);
#
### Subroutines
sub checkConnections
{ # runs every 5 minutes
    my (@DSAs, $logfile) = @_;
    # Code to ldapsearch
    threads->detach();
}
sub validateRevocationLists
{ # runs every hour on minute xx:55
    my (@DSAs, $logfile) = @_;
    # Code to validate CRLs haven't expired, etc
    threads->detach();
}
#
### Main program
Proc::Daemon::Init;
while ($continue)
{
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    # Question 1: Queues??
    if ($min % 5 == 0 || $min == 0)
    { threads->create(\&checkConnections, @ListOfDSAs, "/var/connect.log"); }
    if ($min % 55 == 0)
    { threads->create(\&validateRevocationLists, @ListOfDSAs, "/var/RLs.log"); }
    sleep 60; # Question 2: Safer/better way to prevent multiple threads being started for same check in one matching minute?
}
# TERM RECEIVED
exit 0;
__END__
use threads;
use Thread::Queue 3.01 qw( );

my $check_conn_q      = Thread::Queue->new();
my $validate_revoke_q = Thread::Queue->new();

my @threads;
push @threads, async {
    while (my $job = $check_conn_q->dequeue()) {
        check_conn(@$job);
    }
};
push @threads, async {
    while (my $job = $validate_revoke_q->dequeue()) {
        validate_revoke(@$job);
    }
};

while ($continue) {
    my ($S,$M,$H,$m,$d,$Y) = localtime; $m+=1; $Y+=1900;
    $check_conn_q->enqueue([ @ListOfDSAs, "/var/connect.log" ])
        if $M % 5 == 0;
    $validate_revoke_q->enqueue([ @ListOfDSAs, "/var/RLs.log" ])
        if $M == 55;
    sleep 30;
}

$check_conn_q->end();
$validate_revoke_q->end();
$_->join for @threads;
I'm not sure parallelisation is needed here. If it's not, you could simply use
use List::Util qw( min );

sub sleep_until {
    my ($until) = @_;
    my $time = time;
    return if $time >= $until;
    sleep($until - $time);
}

my $next_check_conn = my $next_validate_revoke = time;
while ($continue) {
    sleep_until min $next_check_conn, $next_validate_revoke;
    last if !$continue;
    my $time = time;
    if ($time >= $next_check_conn) {
        check_conn(@ListOfDSAs, "/var/connect.log");
        $next_check_conn = time + 5*60;
    }
    if ($time >= $next_validate_revoke) {
        validate_revoke(@ListOfDSAs, "/var/RLs.log");
        $next_validate_revoke = time + 60*60;
    }
}
I would recommend just running the checks one at a time, as there does not appear to be a compelling reason to use threads here, and you don't want to add unnecessary complexity to a program that will be running all the time.
If you do want to learn how use a thread pool, there are examples included with the threads module. There is also a Thread::Pool module that may be useful.
As for ensuring you don't repeat the checks in the same minute, you are correct that sleeping for 60 seconds will be inadequate. No matter what value you choose to sleep, you will have edge cases in which it fails: either it will be slightly shorter than a minute, and you will occasionally have two checks in the same minute, or it will be slightly longer than a minute, and you will occasionally miss a check altogether.
Instead, use a variable to remember when the task was last done. You can then use a shorter sleep time without worrying about multiple checks per minute.
my $last_task_time = -1;
while ($continue)
{
    my $min = (localtime(time))[1];
    if ($last_task_time != $min &&
        ($min % 5 == 0 || $min > ($last_task_time+5)%60))
    {
        #Check connections here.
        if ($min == 55 || ($last_task_time < 55 && $min > 55))
        {
            #Validate revocation lists here.
        }
        $last_task_time = $min;
    }
    else
    {
        sleep 55; #Ensures there is at least one check per minute.
    }
}
Update: I fixed the code so that it will recover if the last task ran too long. This would be fine if it occasionally takes a long time. If the tasks are frequently taking longer than five minutes, though, you need a different solution (threads would probably make sense in that case).

Perl script execution keeps getting killed - running out of memory

I am trying to execute a perl script that processes a small 12 x 2 text file (approx. 260 bytes) and a large .bedgraph file (at least 1.3 MB in size). From these two files, the script outputs a new bedgraph file.
I have run this script on 3 other .bedgraph files, but when I try to run it on the rest of them, the process keeps getting killed.
It should take about 20 minutes on average for the perl script to run on each of the .bedgraph files.
I'm running the perl script on my local machine (not from a server). I'm using a Linux OS Ubuntu 12.04 system 64-bit 4GB RAM.
Why does my perl script keep getting killed, and how can I fix this?
Here's the script:
# input file handle
open(my $sizes_fh, '<', 'S_lycopersicum_chromosomes.size') or die $!;
# output file handle
open(my $output, '+>', 'tendaysafterbreaker_output.bedgraph') or die $!;
my @array;
while(<$sizes_fh>){
    chomp;
    my ($chrom1, $size) = split(/\t/, $_);
    @array = (0) x $size;
    open(my $bedgraph_fh, '<', 'Solanum_lycopersicum_tendaysafterbreaker.bedgraph') or die $!;
    while(<$bedgraph_fh>){
        chomp;
        my ($chrom2, $start, $end, $FPKM) = split(/\t/, $_);
        if ($chrom1 eq $chrom2){
            for(my $i = $start; $i < $end; $i++){
                $array[$i] += $FPKM;
            }
        }
    }
    close $bedgraph_fh or warn $!;
    my ($last_start, $last_end) = 0;
    my $last_value = $array[0];
    for (my $i = 1; $i < $#array; $i++){
        my $curr_val = $array[$i];
        my $curr_pos = $i;
        # if the current value is not equal to the last value
        if ($curr_val != $last_value){
            my $last_value = $curr_val;
            print $output "$chrom1\t$last_start\t$last_end\t$last_value\n";
            $last_start = $last_end = $curr_pos;
        } else {
            $last_end = $i;
        }
    }
}
close $sizes_fh or warn $!;
close $sizes_fh or warn $!;
You are trying to allocate an array of 90,000,000 elements. Perl, due to its flexible typing and other advanced variable features, uses a lot more memory for this than you would expect.
On my (Windows 7) machine, a program that just allocates such an array and does nothing else eats up 3.5 GB of RAM.
There are various ways to avoid this huge memory usage. Here are a couple:
The PDL module for scientific data processing, which is designed to efficiently store huge numeric arrays in memory. This will change the syntax for allocating and using the array, though (and it messes around with Perl's syntax in various other ways).
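As a rough, untested sketch of the PDL route (sample sizes made up; the slice mirrors the original $start/$end loop, please double-check against the PDL docs):
use strict;
use warnings;
use PDL;

my ($size, $start, $end, $FPKM) = (90_000_000, 100, 200, 1.5);   # sample values only

my $coverage = zeroes(double, $size);   # packed C doubles, far smaller than a Perl array

# roughly equivalent to: $array[$i] += $FPKM for ($start .. $end - 1)
my $win = $coverage->slice("$start:" . ($end - 1));   # dataflow-connected view into $coverage
$win .= $win + $FPKM;                                  # assign back through the view

print $coverage->at($start), "\n";      # read back a single element as a Perl scalar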
DBM::Deep is a module that allocates a database in a file--and then lets you access that database through a normal array or hash:
use DBM::Deep;
my @array;
my $db = tie @array, "DBM::Deep", "array.db";
# Now you can use @array like a normal array, but it will be stored in a database.
If you know a bit of C, it is quite simple to offload the array manipulation into low-level code. Using a C array takes less space, and is a lot faster. However, you lose nice stuff like bounds checking. Here is an implementation with Inline::C:
use Inline 'C';
...;
__END__
__C__

// note: I don't know if your data contains only ints or doubles. Adjust types as needed
int array_len = -1; // last index
int *array = NULL;

void make_array(int size) {
    free(array);
    // if this fails, start checking the return value of malloc for != NULL
    // note: malloc does not zero the memory; use calloc if you need the "(0) x $size" behaviour
    array = (int*) malloc(sizeof(int) * size);
    array_len = size - 1;
}

// returns false on bounds error
int array_increment(int start, int end, int fpkm) {
    if ((end - 1) > array_len) return 0;
    int i;
    for (i = start; i < end; i++) {
        array[i] += fpkm;
    }
    return 1;
}

// please check if this is actually equivalent to your code.
// I removed some unnecessary-looking variables.
void loop_over_array(char* chrom1) {
    int
        i,
        last_start = 0,
        last_end = 0,
        last_value = array[0];
    for (i = 1; i < array_len; i++) { // are you sure not `i <= array_len`?
        if (array[i] != last_value) {
            last_value = array[i];
            // I don't know how to use Perl filehandles from C,
            // so just redirect the output on the command line
            printf("%s\t%d\t%d\t%d\n", chrom1, last_start, last_end, last_value);
            last_start = i;
        }
        last_end = i;
    }
}

void free_array() {
    free(array);
}
Minimal testing code:
use Test::More;
make_array(15);
ok !array_increment(0, 16, 2);
make_array(95_000_000);
ok array_increment(0, 3, 1);
ok array_increment(2, 95_000_000, 1);
loop_over_array("chrom");
free_array();
done_testing;
The output of this test case is
chrom 0 1 2
chrom 2 2 1
(with testing output removed). It may take a second to compile, but after that it should be quite fast.
In the records read from $bedgraph_fh, what's a typical value for $start? Although hashes have more overhead per entry than arrays, you may be able to save some memory if @array starts with a lot of unused entries. e.g., if you have an @array of 90 million elements, but the first 80 million are never used, then there's a good chance you'll be better off with a hash.
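Here is a sketch of the hash variant, reusing $bedgraph_fh, $chrom1 and the other variables from the question's inner loop (only positions that actually receive coverage consume memory):
# Replace "@array = (0) x $size" and the inner loop with a sparse hash.
my %coverage;    # position => accumulated FPKM
while (<$bedgraph_fh>) {
    chomp;
    my ($chrom2, $start, $end, $FPKM) = split /\t/;
    next unless $chrom1 eq $chrom2;
    $coverage{$_} += $FPKM for $start .. $end - 1;
}
# Walk the touched positions in order for the run-length output step:
for my $pos (sort { $a <=> $b } keys %coverage) {
    # ... same merging/printing logic as before, using $coverage{$pos}
}
Whether this actually saves memory depends entirely on how densely the positions are covered.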
Other than that, I don't see any obvious cases of this code holding on to data that's not needed by the algorithm it implements, although, depending on your actual objective, it is possible that there may be an alternative algorithm which doesn't require as much data to be held in memory.
If you really need to be dealing with a set of 90 million active data elements, though, then your primary options are going to be either buy a lot of RAM or use some form of database. In the latter case, I'd opt for SQLite (via DBD::SQLite) for simplicity and light weight, but YMMV.

Memory leak in WebTestCase or in Symfony2-Kernel - how to report to the gurus?

I think I have found a memory-leak in the WebTestCase class or in the Kernel itself. My questions at the end.
To reproduce, I make a new empty WebTestCase that does $max asserts (in my tests, $max ranges from 1 to 100,000) without instantiating any kernel.
<?php
use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;
class DemoTest extends WebTestCase
{
public function testHello( )
{
$max = 100000;
for( $i = 0; $i < $max; $i++ )
{
$this->assertTrue( true );
}
}
}
With $max=1
Time: 0 seconds, Memory: 5.75Mb
OK (1 test, 1 assertion)
With $max=100000
Time: 1 second, Memory: 5.75Mb
OK (1 test, 100000 assertions)
As expected, whether running 1 time or 100,000 times, I consume the same memory.
Now I try the loop creating any arbitrary class, working with it and destroying it. I choose DOMDocument the same way I could have chosen any other class.
public function testHello( )
{
$max = 100000;
for( $i = 0; $i < $max; $i++ )
{
$dummy = new \DOMDocument();
$dummy->loadHTML( '<html><head><title>dummy</title></head><body>dummy</body></html>' );
unset( $dummy );
$this->assertTrue( true );
}
}
With $max=1
Time: 0 seconds, Memory: 5.75Mb
OK (1 test, 1 assertion)
With $max=100000
Time: 2 seconds, Memory: 5.75Mb
OK (1 test, 100000 assertions)
Again, regardless of whether I execute it 1 time or 100,000 times, I consume the same memory.
But... Now I create and destroy kernels within the loop. I do NEITHER the ->boot() NOR the ->shutdown(); I just obtain new kernel objects which, without even being booted, are then unreferenced, and so they should die, freeing all their resources.
public function testHello( )
{
$max = 100000;
for( $i = 0; $i < $max; $i++ )
{
$dummy = static::createKernel();
unset( $dummy );
$this->assertTrue( true );
}
}
With $max=1
Time: 0 seconds, Memory: 6.25Mb
OK (1 test, 1 assertion)
With $max=100000
Time: 9 seconds, Memory: 165.50Mb
OK (1 test, 100000 assertions)
It seems logical that it rises from 5.75 MB to 6.25 MB; that might be the size of the kernel's resources, about half a megabyte. But what looks incorrect is that creating 100,000 kernels, without performing any operation on them, ends up consuming 165 MB. Something is not properly freeing the resources it allocated.
My questions:
Is this a leak? Or is it expected behaviour that, for some strange reason I don't understand, is supposed to work this way?
I use 2.0.10. If it is a leak, how can I know if this is corrected in a later version?
If it has not been corrected, what is the easiest way to report it to the core team?
Thanks!
Xavi.
Symfony2 code is hosted on github.com, and issues must be declared on this site: https://github.com/symfony/symfony/issues
I encounter this kind of issue with heavy web test cases too, and it's not fixed in the 2.0.x versions. I can't tell you for certain whether it's a memory leak or normal behaviour...
Google Groups for developers : http://groups.google.com/group/symfony-devs?pli=1
Google Groups for users : http://groups.google.com/group/symfony2?pli=1
Symfony2 forum : http://forum.symfony-project.org/viewforum.php?f=23
I'd advise the Google Groups mailing list first, and then GitHub issues.
