Memory leak in WebTestCase or in Symfony2-Kernel - how to report to the gurus?

I think I have found a memory leak in the WebTestCase class or in the Kernel itself. My questions are at the end.
To reproduce, I create a new empty WebTestCase that does $max asserts (in my tests, $max ranges from 1 to 100,000) without instantiating any kernel.
<?php
use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;

class DemoTest extends WebTestCase
{
    public function testHello( )
    {
        $max = 100000;
        for( $i = 0; $i < $max; $i++ )
        {
            $this->assertTrue( true );
        }
    }
}
With $max=1
Time: 0 seconds, Memory: 5.75Mb
OK (1 test, 1 assertion)
With $max=100000
Time: 1 second, Memory: 5.75Mb
OK (1 test, 100000 assertions)
As expected, whether I run it 1 time or 100,000 times, I consume the same memory.
Now I try the loop creating an arbitrary class, working with it and destroying it. I chose DOMDocument, but I could have chosen any other class.
public function testHello( )
{
    $max = 100000;
    for( $i = 0; $i < $max; $i++ )
    {
        $dummy = new \DOMDocument();
        $dummy->loadHTML( '<html><head><title>dummy</title></head><body>dummy</body></html>' );
        unset( $dummy );
        $this->assertTrue( true );
    }
}
With $max=1
Time: 0 seconds, Memory: 5.75Mb
OK (1 test, 1 assertion)
With $max=100000
Time: 2 seconds, Memory: 5.75Mb
OK (1 test, 100000 assertions)
Again, regardless of whether I execute it 1 time or 100,000 times, I consume the same memory.
But... now I create and destroy kernels within the loop. I do NEITHER ->boot() NOR ->shutdown(); I just obtain the new kernel objects which, without even being booted, are then unreferenced, and so they should die, freeing all their resources.
public function testHello( )
{
    $max = 100000;
    for( $i = 0; $i < $max; $i++ )
    {
        $dummy = static::createKernel();
        unset( $dummy );
        $this->assertTrue( true );
    }
}
With $max=1
Time: 0 seconds, Memory: 6.25Mb
OK (1 test, 1 assertion)
With $max=100000
Time: 9 seconds, Memory: 165.50Mb
OK (1 test, 100000 assertions)
It is logical that it rises from 5.75 to 6.25 MB; that might be the size of the kernel's resources, half a megabyte. But what is wrong is that by creating 100,000 kernels without performing any operation on them, we end up consuming 165 MB. Something is not properly freeing the resources it allocated.
My questions:
Is this a leak? Or is it expected behaviour that, for some strange reason I don't get, is supposed to work the way it does?
I use 2.0.10. If it is a leak, how can I know whether it has been fixed in a later version?
If it has not been fixed, what is the easiest way to report it to the core team?
Thanks!
Xavi.

Symfony2 code is hosted on github.com and issues must be reported there: https://github.com/symfony/symfony/issues
I encounter this kind of issue with heavy web test cases too, and it is not fixed in the 2.0.x versions. I can't tell you for certain whether it's a memory leak or normal behaviour...
Google Groups for developers : http://groups.google.com/group/symfony-devs?pli=1
Google Groups for users : http://groups.google.com/group/symfony2?pli=1
Symfony2 forum : http://forum.symfony-project.org/viewforum.php?f=23
I advise you to start with the Google Groups mailing lists, and then the GitHub issues.

Related

NodeJS, Promises and performance

My question is about performance in my NodeJS app...
If my program runs 12 iterations of 1,250,000 each = 15,000,000 iterations all together, it takes the following time to process on dedicated servers at Amazon:
r3.large: 2 vCPU, 6.5 ECU, 15 GB memory --> 123 minutes
4.8xlarge: 36 vCPU, 132 ECU, 60 GB memory --> 102 minutes
I have some code similar to the code below...
start();

function start() {
    for (var i = 0; i < 12; i++) {
        function2(); // Iterates over a collection which contains data split up by date intervals. This function is actually also recursive, because it runs through the data many times (max 50-100 times) due to the different interval sizes...
    }
}

function function2() {
    return new Promise(function (resolve, reject) {
        for (var i = 0; i < 1250000; i++) {
            return new Promise(function (resolve, reject) {
                function3(); // Simply iterates through all possible combinations and calls function3 with each given value/combination
            });
        }
    });
}

function function3() {
    return new Promise(function (resolve, reject) {
        // Simply makes some calculations based on the given values/combination, and then returns the result to function2, which in the end decides which result/combination was the best...
    });
}
This is equal to 0.411 milliseconds / 411 microseconds per iteration!
When I look at performance and memory usage in the taskbar... the CPU is not running at 100%, but more like 50%... the entire time?
The memory usage starts very low, but KEEPS growing by gigabytes every minute until the process is done - BUT the (allocated) memory is only released when I press CTRL+C in the Windows CMD... so it's like the NodeJS garbage collection doesn't work optimally - or maybe it's simply the design of the code again...
When I execute the app I use the memory option like:
node --max-old-space-size="50000" server.js
PLEASE tell me everything you think I can do to make my program FASTER!
Thank you all - so much!
It's not that the garbage collector doesn't work optimally but that it doesn't work at all - you don't give it any chance to.
When developing the tco module that does tail call optimization in Node, I noticed a strange thing. It seemed to leak memory and I didn't know why. It turned out that it was because of a few console.log() calls in various places that I had used for testing: seeing the result of a recursive call millions of levels deep took some time, so I wanted to see something while it was working.
Your example is pretty similar to that.
Remember that Node is single-threaded. When your computations run, nothing else can - including the GC. Your code is completely synchronous and blocking - even though it's generating millions of promises in a blocking manner. It is blocking because it never reaches the event loop.
Consider this example:
var a = 0, b = 10000000;
function numbers() {
while (a < b) {
console.log("Number " + a++);
}
}
numbers();
It's pretty simple - you want to print 10 million numbers. But when you run it, it behaves very strangely - for example, it prints numbers up to some point and then stops for several seconds, then it keeps going, or maybe starts thrashing if you're using swap, or maybe gives you the error that I got right after seeing Number 8486:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted
What's going on here is that the main thread is blocked in a synchronous loop where it keeps creating objects but the GC has no chance to release them.
For such long running tasks you need to divide your work and get into the event loop once in a while.
Here is how you can fix this problem:
var a = 0, b = 10000000;
function numbers() {
var i = 0;
while (a < b && i++ < 100) {
console.log("Number " + a++);
}
if (a < b) setImmediate(numbers);
}
numbers();
It does the same - it prints numbers from a to b but in bunches of 100 and then it schedules itself to continue at the end of the event loop.
Output of $(which time) -v node numbers1.js 2>&1 | egrep 'Maximum resident|FATAL'
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Maximum resident set size (kbytes): 1495968
It used 1.5GB of memory and crashed.
Output of $(which time) -v node numbers2.js 2>&1 | egrep 'Maximum resident|FATAL'
Maximum resident set size (kbytes): 56404
It used 56MB of memory and finished.
See also those answers:
How to write non-blocking async function in Express request handler
How node.js server serve next request, if current request have huge computation?
Maximum call stack size exceeded in nodejs
Node; Q Promise delay
How to avoid jimp blocking the code node.js

Address certain core for threads in Perl

I have a list of 40 files which I want to modify with my script.
Since every file is processed in the same way, I want to use threads to speed it up.
Therefore I have this construct:
my $threads_ = sub
{
    while (defined(my $taskRef = $q->dequeue()))
    {
        my $work = shift(@{$taskRef});
        &{\&{$work}}(@{$taskRef});
        my $open = $q->open() - 1;
    }
};
my @Working;
for( my $i = 1; $i < 8; $i++)
{
    push @Working, threads->new($threads_);
}
And I have this code for queuing a task for every file:
foreach my $File (@Filelist)
{
    $q->enqueue(['mySub',$FirstVar,$SecondVar]);
}
But it still takes way too long.
My question is: is there a way to assign each thread to a specific core, in order to speed it up?
I'd use Parallel::ForkManager for something like this; it works great. I'd recommend not brewing your own when an accepted standard solution exists. By "address certain core", I take it that your purpose is to limit the number of concurrent tasks to the number of available processors; ForkManager will do this for you -- just set the maximum number of processes when you initialize your ForkManager object.
The commenters above were absolutely correct to point out that I/O will eventually limit your throughput, but it's easy enough to determine when adding more processes fails to speed things up.
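For reference, a minimal Parallel::ForkManager sketch along those lines might look like this (the glob pattern, the limit of 8 processes, and the worker sub mySub are assumptions for illustration, not taken from the question):
use strict;
use warnings;
use Parallel::ForkManager;
sub mySub { my ($file) = @_; print "processing $file\n"; }   # stand-in for the real per-file work
my @Filelist = glob('data/*.txt');        # hypothetical list of the 40 files
my $pm = Parallel::ForkManager->new(8);   # cap concurrency at roughly the number of cores
foreach my $File (@Filelist)
{
    $pm->start and next;   # parent: skip ahead to the next file
    mySub($File);          # child: process one file
    $pm->finish;           # child exits here
}
$pm->wait_all_children;
Each file is handled in a separate forked process, and at most 8 run at once; raising the limit beyond the core count usually stops helping once I/O becomes the bottleneck.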

How do I queue perl subroutines to a thread queue instead of data?

Background:
In reading how to multithread my perl script, I read (from http://perldoc.perl.org/threads.html#BUGS-AND-LIMITATIONS)
On most systems, frequent and continual creation and destruction of threads can lead to ever-increasing growth in the memory footprint of the Perl interpreter. While it is simple to just launch threads and then ->join() or ->detach() them, for long-lived applications, it is better to maintain a pool of threads, and to reuse them for the work needed, using queues to notify threads of pending work.
My script will be long-lived; it's a PKI LDAP directory monitoring daemon that will always be running. The enterprise monitoring solution will generate an alarm if it stops running for any reason. My script will check that I can reach another PKI LDAP directory, as well as validate revocation lists on both.
Problem: Everything I can find on Google shows passing variables (e.g. scalars) to the thread queue rather than the subroutine itself... I think I'm just not understanding how to implement a thread queue properly, compared to how you implement a thread without queues.
Question 1: How can I "maintain a pool of threads" to avoid the perl interpreter from slowly eating up more and more memory?
Question 2: (Unrelated but while I have this code posted) Is there a safe amount of sleep at the end of the main program so that I don't start a thread more than once in a minute? 60 seems obvious but could that ever cause it to run more than once if the loop is fast, or perhaps miss a minute because of processing time or something?
Thanks in advance!
#!/usr/bin/perl
use feature ":5.10";
use warnings;
use strict;
use threads;
use Proc::Daemon;
#
### Global Variables
use constant false => 0;
use constant true => 1;
my $app = $0;
my $continue = true;
$SIG{TERM} = sub { $continue = false };
# Directory Server Agent (DSA) info
my @ListOfDSAs = (
{ name => "Myself (inbound)",
host => "ldap.myco.ca",
base => "ou=mydir,o=myco,c=ca",
},
{ name => "Company 2",
host => "ldap.comp2.ca",
base => "ou=their-dir,o=comp2,c=ca",
}
);
#
### Subroutines
sub checkConnections
{ # runs every 5 minutes
my (@DSAs, $logfile) = @_;
# Code to ldapsearch
threads->detach();
}
sub validateRevocationLists
{ # runs every hour on minute xx:55
my (@DSAs, $logfile) = @_;
# Code to validate CRLs haven't expired, etc
threads->detach();
}
#
### Main program
Proc::Daemon::Init;
while ($continue)
{
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
# Question 1: Queues??
if ($min % 5 == 0 || $min == 0)
{ threads->create(&checkConnections, @ListOfDSAs, "/var/connect.log"); }
if ($min % 55 == 0)
{ threads->create(&validateRevocationLists, @ListOfDSAs, "/var/RLs.log"); }
sleep 60; # Question 2: Safer/better way to prevent multiple threads being started for same check in one matching minute?
}
# TERM RECEIVED
exit 0;
__END__
use threads;
use Thread::Queue 3.01 qw( );
my $check_conn_q = Thread::Queue->new();
my $validate_revoke_q = Thread::Queue->new();
my @threads;
push @threads, async {
    while (my $job = $check_conn_q->dequeue()) {
        check_conn(@$job);
    }
};
push @threads, async {
    while (my $job = $validate_revoke_q->dequeue()) {
        validate_revoke(@$job);
    }
};
while ($continue) {
    my ($S,$M,$H,$m,$d,$Y) = localtime; $m+=1; $Y+=1900;
    $check_conn_q->enqueue([ @ListOfDSAs, "/var/connect.log" ])
        if $M % 5 == 0;
    $validate_revoke_q->enqueue([ @ListOfDSAs, "/var/RLs.log" ])
        if $M == 55;
    sleep 30;
}
$check_conn_q->end();
$validate_revoke_q->end();
$_->join for @threads;
I'm not sure parallelisation is needed here. If it's not, you could simply use
use List::Util qw( min );
sub sleep_until {
my ($until) = @_;
my $time = time;
return if $time >= $until;
sleep($until - $time);
}
my $next_check_conn = my $next_validate_revoke = time;
while ($continue) {
sleep_until min $next_check_conn, $next_validate_revoke;
last if !$continue;
my $time = time;
if ($time >= $next_check_conn) {
check_conn(@ListOfDSAs, "/var/connect.log");
$next_check_conn = time + 5*60;
}
if ($time >= $next_validate_revoke) {
validate_revoke(@ListOfDSAs, "/var/RLs.log");
$next_validate_revoke = time + 60*60;
}
}
I would recommend just running the checks one at a time, as there does not appear to be a compelling reason to use threads here, and you don't want to add unnecessary complexity to a program that will be running all the time.
If you do want to learn how use a thread pool, there are examples included with the threads module. There is also a Thread::Pool module that may be useful.
As for ensuring you don't repeat the checks in the same minute, you are correct that sleeping for 60 seconds will be inadequate. No matter what value you choose to sleep, you will have edge cases in which it fails: either it will be slightly shorter than a minute, and you will occasionally have two checks in the same minute, or it will be slightly longer than a minute, and you will occasionally miss a check altogether.
Instead, use a variable to remember when the task was last done. You can then use a shorter sleep time without worrying about multiple checks per minute.
my $last_task_time = -1;
while ($continue)
{
my $min = (localtime(time))[1];
if ($last_task_time != $min &&
($min % 5 == 0 || $min > ($last_task_time+5)%60))
{
#Check connections here.
if ($min == 55 || ($last_task_time < 55 && $min > 55))
{
#Validate revocation lists here.
}
$last_task_time = $min;
}
else
{
sleep 55; #Ensures there is at least one check per minute.
}
}
Update: I fixed the code so that it will recover if the last task ran too long. This would be fine if it occasionally takes a long time. If the tasks are frequently taking longer than five minutes, though, you need a different solution (threads would probably make sense in that case).

Perl script execution keeps getting killed - running out of memory

I am trying to execute a perl script that processes a small 12 x 2 text file (approx. 260 bytes) and a large .bedgraph file (at least 1.3 MB in size). From these two files, the script outputs a new bedgraph file.
I have run this script on 3 other .bedgraph files, but when I try to run it on the rest of them the process keeps getting killed.
It should take about 20 minutes on average for the perl script to run on each of the .bedgraph files.
I'm running the perl script on my local machine (not on a server), a 64-bit Ubuntu 12.04 Linux system with 4 GB RAM.
Why does my perl script execution keep getting killed, and how can I fix this?
Here's the script:
# input file handle
open(my $sizes_fh, '<', 'S_lycopersicum_chromosomes.size') or die $!;
# output file handles
open(my $output, '+>', 'tendaysafterbreaker_output.bedgraph') or die $!;
my @array;
while(<$sizes_fh>){
chomp;
my ($chrom1, $size) = split(/\t/, $_);
@array = (0) x $size;
open(my $bedgraph_fh, '<', 'Solanum_lycopersicum_tendaysafterbreaker.bedgraph') or die $!;
while(<$bedgraph_fh>){
chomp;
my ($chrom2, $start, $end, $FPKM) = split(/\t/, $_);
if ($chrom1 eq $chrom2){
for(my $i = $start; $i < $end; $i++){
$array[$i] += $FPKM;
}
}
}
close $bedgraph_fh or warn $!;
my ($last_start, $last_end) = 0;
my $last_value = $array[0];
for (my $i = 1; $i < $#array; $i++){
my $curr_val = $array[$i];
my $curr_pos = $i;
# if the current value is not equal to the last value
if ($curr_val != $last_value){
my $last_value = $curr_val;
print $output "$chrom1\t$last_start\t$last_end\t$last_value\n";
$last_start = $last_end = $curr_pos;
} else {
$last_end = $i;
}
}
}
close $sizes_fh or warn $!;
You are trying to allocate an array of 90,000,000 elements. Perl, due to its flexible typing and other advanced variable features, uses a lot more memory for this than you would expect.
On my (Windows 7) machine, a program that just allocates such an array and does nothing else eats up 3.5 GB of RAM.
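If you want to see this on your own machine, a rough check is to allocate the array and watch the process's resident memory in top or the Task Manager while the script sleeps:
my @array = (0) x 90_000_000;   # same size as in the question
print "allocated, check memory now\n";
sleep 60;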
There are various ways to avoid this huge memory usage. Here are a couple:
The PDL module for scientific data processing, which is designed to efficiently store huge numeric arrays in memory. This will change the syntax for allocating and using the array, though (and it messes around with Perl's syntax in various other ways).
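For example, a rough PDL sketch of just the accumulation step might look like this (the positions and FPKM value are made up, and an integer type is assumed, as in the Inline::C version further down):
use strict;
use warnings;
use PDL;
my $size  = 90_000_000;
my $array = zeroes(long, $size);                 # one packed 4-byte integer per position instead of a full Perl scalar
my ($start, $end, $FPKM) = (1_000, 2_000, 3);    # hypothetical values from one bedgraph line
$array->slice("$start:" . ($end - 1)) += $FPKM;  # add $FPKM to every position in [$start, $end)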
DBM::Deep is a module that allocates a database in a file--and then lets you access that database through a normal array or hash:
use DBM::Deep;
my @array;
my $db = tie @array, "DBM::Deep", "array.db";
# Now you can use @array like a normal array, but it will be stored in a database.
If you know a bit of C, it is quite simple to offload the array manipulation into low-level code. Using a C array takes less space and is a lot faster. However, you lose nice stuff like bounds checking. Here is an implementation with Inline::C:
use Inline 'C';
...;
__END__
__C__
// note: I don't know if your data contains only ints or doubles. Adjust types as needed
int array_len = -1; // last index
int *array = NULL;
void make_array(int size) {
free(array);
// if this fails, start checking return value of malloc for != NULL
array = (int*) malloc(sizeof(int) * size);
array_len = size - 1;
}
// returns false on bounds error
int array_increment(int start, int end, int fpkm) {
if ((end - 1) > array_len) return 0;
int i;
for (i = start; i < end; i++) {
array[i] += fpkm;
}
return 1;
}
// please check if this is actually equivalent to your code.
// I removed some unnecessary-looking variables.
void loop_over_array(char* chrom1) {
int
i,
last_start = 0,
last_end = 0,
last_value = array[0];
for(i = 1; i < array_len; i++) { // are you sure not `i <= array_len`?
if (array[i] != last_value) {
last_value = array[i];
// I don't know how to use Perl filehandles from C,
// so just redirect the output on the command line
printf("%s\t%d\t%d\t%d\n", chrom1, last_start, last_end, last_value);
last_start = i;
}
last_end = i;
}
}
void free_array() {
free(array);
}
Minimal testing code:
use Test::More;
make_array(15);
ok !array_increment(0, 16, 2);
make_array(95_000_000);
ok array_increment(0, 3, 1);
ok array_increment(2, 95_000_000, 1);
loop_over_array("chrom");
free_array();
done_testing;
The output of this test case is
chrom 0 1 2
chrom 2 2 1
(with testing output removed). It may take a second to compile, but after that it should be quite fast.
In the records read from $bedgraph_fh, what's a typical value for $start? Although hashes have more overhead per entry than arrays, you may be able to save some memory if #array starts with a lot of unused entries. e.g., If you have an #array of 90 million elements, but the first 80 million are never used, then there's a good chance you'll be better off with a hash.
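A sketch of that idea, reusing the variable names from the question (whether it saves anything depends entirely on how sparse the covered positions really are):
my %coverage;   # position => accumulated FPKM, created only for positions that are actually touched
while (<$bedgraph_fh>) {
    chomp;
    my ($chrom2, $start, $end, $FPKM) = split /\t/;
    next unless $chrom1 eq $chrom2;
    $coverage{$_} += $FPKM for $start .. $end - 1;
}
# Afterwards, walk only the used positions in order instead of scanning 0 .. $#array:
for my $pos (sort { $a <=> $b } keys %coverage) {
    # same run-length merging logic as before, using $coverage{$pos}
}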
Other than that, I don't see any obvious cases of this code holding on to data that's not needed by the algorithm it implements, although, depending on your actual objective, it is possible that there may be an alternative algorithm which doesn't require as much data to be held in memory.
If you really need to be dealing with a set of 90 million active data elements, though, then your primary options are going to be either buy a lot of RAM or use some form of database. In the latter case, I'd opt for SQLite (via DBD::SQLite) for simplicity and light weight, but YMMV.
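For completeness, a minimal DBD::SQLite sketch of the database route (the table layout, file name and sample values are invented for illustration):
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect('dbi:SQLite:dbname=coverage.db', '', '',
                       { RaiseError => 1, AutoCommit => 0 });
$dbh->do('CREATE TABLE IF NOT EXISTS coverage (pos INTEGER PRIMARY KEY, fpkm REAL NOT NULL)');
my $upd = $dbh->prepare('UPDATE coverage SET fpkm = fpkm + ? WHERE pos = ?');
my $ins = $dbh->prepare('INSERT INTO coverage (pos, fpkm) VALUES (?, ?)');
sub add_fpkm {
    my ($pos, $fpkm) = @_;
    my $rows = $upd->execute($fpkm, $pos);      # execute() returns "0E0" (numerically 0) if no row matched
    $ins->execute($pos, $fpkm) if $rows == 0;
}
add_fpkm(1_000, 2.5);   # hypothetical position and FPKM value
$dbh->commit;
The data then lives on disk rather than in a 90-million-element Perl array, at the cost of speed.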

Efficiency of Threading/Forking

So, I was considering using forking or threading to do some simple parallelization. To make sure that it was worth it, I wrote three simple scripts to benchmark sequential vs threading vs forking. I used a very simple method to initialize an array of arrays, and then another method to find the max element in each array and write it to a file.
Methods:
sub initialize
{
for (my $i=0; $i <= 2; $i++)
{
for (my $j=0; $j < 5000000; $j++)
{
$array[$i][$j]=$j+$i;
}
}
}
sub getMax
{
my $num = shift;
my $array = shift;
my $length=scalar(@{$array});
my $max=-9**9**9;
my @ra;
for (my $i=0; $i < $length; $i++)
{
if ($max < ${$array}[$i])
{
$max=${$array}[$i];
}
}
tie @ra, 'Tie::File', "test.txt" or die;
$ra[$num]=$max;
}
Sequential:
my $start = Time::HiRes::time();
for (my $count = 0; $count <= 2; $count++)
{
getMax($count,$array[$count]);
}
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "End of main program\n";
Threading:
my @threads=();
my $start = Time::HiRes::time();
for (my $count = 0; $count <= 2; $count++)
{
my $t = threads->new(\&getMax, $count, $array[$count]);
push(@threads,$t);
}
foreach (@threads)
{
my $num = $_->join;
}
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "End of main program\n";
Forking:
my $pm = Parallel::ForkManager->new(3);
my $start = Time::HiRes::time();
for (my $count = 0; $count <= 2; $count++)
{
my $pid = $pm->start and next;
getMax($count,$array[$count]);
$pm->finish;
}
$pm->wait_all_children;
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "\nEnd of main program\n";
Sequential: 2.88 sec
Threading: 4.10 sec
Forking: 3.88 sec
I guess that for my purposes (obviously not this, but something not too much more computationally intensive), threading/forking is not helpful. I understand that the two are not solely used for temporal efficiency, but I imagine that's one of the benefits depending on what you're doing. So, my question is when exactly does threading/forking actually make one's code run faster?
The processor and memory are the fastest components of a computer. Because fast memory is also expensive, disk drives are used to store large amounts of data inexpensively, with the trade-off that it is very much slower to access.
When computer programs rely on data from slow media, the faster components can often be left with nothing to do until the necessary data arrives. The primary use of multithreading is to allow the processor to get on with something else while waiting for a required resource.
The sorts of things that can be done in parallel are
Keeping the user interface functional while waiting for something to complete
Doing multi-processor calculations
Fetching data from multiple internet sites
Reading from multiple disk drives
The important thing about all of these is that multithreading is only advantageous if the threads don't compete with each other for the same resources.
Trying to speed up a disk read by reading half the data in each of two threads, for instance, will not be successful, because there is a bottleneck at the disk controller and a limit to how fast it can return data. But RAID drives can speed things up by reading part of the data from each of several drives at the same time.
In your example, there is only one processor that can do the maximum calculation. Getting several threads doing it doesn't mean the processor can do the work any faster, and in fact it will be slowed down by having to switch between threads. However, if you could arrange for each thread to be run on a separate processor of a multi-processor system you would get an advantage. This technique is often used by audio-visual software to get the maximum speed of processing.
Similarly, fetching data from multiple internet sources in parallel can be very useful, but only until the capacity of the link has been reached, when the threads will start competing with each other for bandwidth.
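As a small illustration of the I/O-bound case where threads do pay off (the URLs are placeholders and LWP::Simple is assumed to be installed), each thread spends most of its time waiting on the network, so the downloads overlap instead of running back to back:
use strict;
use warnings;
use threads;
use LWP::Simple qw(get);
my @urls = map { "http://example.com/page$_" } 1 .. 5;       # placeholder URLs
my @workers = map { threads->create(\&fetch, $_) } @urls;    # one thread per URL
my @pages   = map { $_->join } @workers;                     # collect the responses
sub fetch {
    my ($url) = @_;
    return get($url);   # returns the page body, or undef on failure
}
printf "fetched %d pages\n", scalar grep { defined } @pages;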
