Torch: collectgarbage() not deallocating memory of torch tensors - memory-leaks

I am running code that has the following structure:
network = createNetwork() -- loading a pre-trained network.
function train()
  for i = 1, #trainingsamples do
    local ip = loadInput()
    local ip_1 = someImageProcessing(ip)
    local ip_2 = someImageProcessing(ip)
    network:forward( ...some manipulation on ip_1,ip_2... )
    network:backward()
    collectgarbage('collect')
    print debug.getlocal -- all local variables.
  end
end
I am expecting collectgarbage() to release all the memory held by ip_1, ip_2, and ip, but I can see that the memory is not released, which causes a memory leak. I am wondering what is happening. Can someone please help me understand this strange behavior of collectgarbage() and fix the memory leak?
I am really sorry that I could not add the full code. I hope the snippet I have added is sufficient to understand the flow; my network training code is very similar to standard CNN training code.
EDIT:
Sorry for not mentioning that the variables were declared local, and for using a Lua keyword as a variable name in the sample snippet; I have edited it now. The only global variable is the network, which is declared outside the train function, and I feed ip_1 and ip_2 as inputs to it. I have also added a trimmed version of my actual code below.
network = createNetwork()
function trainNetwork()
  local parameters, gradParameters = network:getParameters()
  network:training() -- set flag for dropout
  local bs = 1
  local lR = params.learning_rate / torch.sqrt(bs)
  local optimConfig = {learningRate = params.learning_rate,
                       momentum = params.momentum,
                       learningRateDecay = params.lr_decay,
                       beta1 = params.optim_beta1,
                       beta2 = params.optim_beta2,
                       epsilon = params.optim_epsilon}
  local nfiles = getNoofFiles('train')
  local weights = torch.Tensor(params.num_classes):fill(1)
  criterion = nn.ClassNLLCriterion(weights)

  for ep = 1, params.epochs do
    IMAGE_SEQ = 1
    while (IMAGE_SEQ <= nfiles) do
      xlua.progress(IMAGE_SEQ, nfiles)

      local input, inputd2
      local color_image, depth_image2, target_image
      local nextInput = loadNext('train')
      color_image = nextInput.data.rgb
      depth_image2 = nextInput.data.depth
      target_image = nextInput.data.labels

      input = network0:forward(color_image)               -- process RGB
      inputd2 = networkd:forward(depth_image2):squeeze()  -- HHA
      local input_concat = torch.cat(input, inputd2, 1):squeeze() -- concat RGB, HHA
      collectgarbage('collect')

      target = target_image:reshape(params.imWidth * params.imHeight) -- reshape target as a vector

      -- create closure to evaluate f(X) and df/dX
      local loss = 0
      local feval = function(x)
        -- get new parameters
        if x ~= parameters then parameters:copy(x) end
        collectgarbage()
        -- reset gradients
        gradParameters:zero()
        -- f is the average of all criterions
        -- evaluate function for complete mini batch
        local output = network:forward(input_concat)   -- run forward pass
        local err = criterion:forward(output, target)  -- compute loss
        loss = loss + err
        -- estimate df/dW
        local df_do = criterion:backward(output, target)
        network:backward(input_concat, df_do)           -- accumulate gradients
        local _, predicted_labels = torch.max(output, 2)
        predicted_labels = torch.reshape(predicted_labels:squeeze():float(), params.imHeight, params.imWidth)
        return err, gradParameters
      end -- feval

      pm('Training loss: ' .. loss, 3)
      _, current_loss = optim.adam(feval, parameters, optimConfig)
      print('epoch / current_loss ', ep, current_loss[1])
      os.execute('cat /proc/$PPID/status | grep RSS')
      collectgarbage('collect')

      -- for memory leakage debugging
      print('locals')
      for x, v in pairs(locals()) do
        if type(v) == 'userdata' then
          print(x, v:size())
        end
      end
      print('upvalues')
      for x, v in pairs(upvalues()) do
        if type(v) == 'userdata' then
          print(x, v:size())
        end
      end
    end -- while
    print(string.format('Loss: %.4f Epoch: %d grad-norm: %.4f',
          current_loss[1], ep, torch.norm(parameters) / torch.norm(gradParameters)))
    if (current_loss[1] ~= current_loss[1] or gradParameters ~= gradParameters) then
      print('nan loss or gradParams. quitting...')
      abort()
    end
    -- some validation code here
  end -- epochs
  print('Training completed')
end

As @Adam said in the comment, the in_1 and in_2 variables continue to be referenced, so their values can't be garbage collected. Even if you change them to local variables, they won't be garbage collected at that point, because the block in which they are defined has not been closed yet.
What you can do is set the in_1 and in_2 values to nil before calling collectgarbage, which should make the previously assigned values unreachable and eligible for garbage collection. This will only work if no other variable is storing the same value.

+1 to Paul's answer above, but note the word "should". Almost all of the time you will be fine. However, if your code gets more complicated (and you start passing memory objects around and working on them), you may find that occasionally the Lua GC decides to hold onto a memory object a little longer than expected. But don't worry (or waste time trying to work out why); eventually all unused memory objects will be collected by the Lua GC. A garbage collector is a complicated algorithm and can appear a little non-deterministic at times.

You create global variables to store values, so those variables remain reachable the whole time; until you overwrite them, the GC cannot collect their values.
Make the variables local and call the GC from outside their scope.
Also, the first GC cycle may only run finalizers and the second one actually free the memory (I am not sure about that), so you can try calling the GC twice.
function train()
  do
    local in = loadInput()
    local in_1 = someImageProcessing(in)
    local in_2 = someImageProcessing(in)
    network:forward( ...some manipulation on in_1,in_2... )
    network:backward()
  end
  collectgarbage('collect')
  collectgarbage('collect')
  print debug.getlocal -- all local variables.
end
PS: in is not a valid variable name in Lua.

Related

Why is my merge sort algorithm not working?

I am implementing the merge sort algorithm in Python. I previously implemented the same algorithm in C, and it works fine there, but when I implement it in Python, it outputs an unsorted array.
I have already rechecked the algorithm and the code, and to my knowledge the code seems to be correct.
I think the issue is related to the scope of variables in Python, but I don't have any clue how to solve it.
from random import shuffle

# Function to merge the arrays
def merge(a,beg,mid,end):
    i = beg
    j = mid+1
    temp = []
    while(i<=mid and j<=end):
        if(a[i]<a[j]):
            temp.append(a[i])
            i += 1
        else:
            temp.append(a[j])
            j += 1
    if(i>mid):
        while(j<=end):
            temp.append(a[j])
            j += 1
    elif(j>end):
        while(i<=mid):
            temp.append(a[i])
            i += 1
    return temp

# Function to divide the arrays recursively
def merge_sort(a,beg,end):
    if(beg<end):
        mid = int((beg+end)/2)
        merge_sort(a,beg,mid)
        merge_sort(a,mid+1,end)
        a = merge(a,beg,mid,end)
    return a

a = [i for i in range(10)]
shuffle(a)
n = len(a)
a = merge_sort(a, 0, n-1)
print(a)
To make it work you need to change the merge_sort definition slightly:
def merge_sort(a,beg,end):
    if(beg<end):
        mid = int((beg+end)/2)
        merge_sort(a,beg,mid)
        merge_sort(a,mid+1,end)
        a[beg:end+1] = merge(a,beg,mid,end) # < this line changed
    return a
Why:
temp is constructed to hold no more than end-beg+1 elements, but a is the full initial array; if you managed to replace all of it, things would break quickly. Therefore we take a "slice" of a and replace the values in that slice.
Why not:
Your a luckily was not getting replaced, because of Python's inner workings; that is a bit tricky to explain, but I'll try.
Every variable in Python is a reference. a is a reference to a list of elements a[i], which are in turn references to values in memory.
When you pass a to a function, it creates a new local variable a that points to the same list. That means that when you reassign it with a = ..., it only changes where the local a points. You can only pass changes outside either via "slices" or via a return statement.
Why "slices" work:
Slices are tricky. As I said, a points to an array of other references (basically the a[i]), which in turn point to data in memory. When you reassign a slice, Python goes through the slice element by element and changes where those individual references point; since a inside and outside the function still point to the same list, the changes go through.
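For example, here is a minimal, self-contained sketch (the helper names are made up for illustration) showing the difference between rebinding the parameter and assigning through a slice:
def rebind(a):
    # Rebinding only changes where the local name 'a' points;
    # the caller's list is untouched.
    a = [0, 0, 0]

def assign_slice(a):
    # Slice assignment writes into the existing list object,
    # so the caller sees the change.
    a[:] = [0, 0, 0]

x = [1, 2, 3]
rebind(x)
print(x)          # [1, 2, 3]  (unchanged)
assign_slice(x)
print(x)          # [0, 0, 0]  (changed in place)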
Hope it makes sense.
You don't use the results of the recursive merges, so you essentially report the result of the merge of the two unsorted halves.
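An alternative fix, sketched below with a made-up name (merge_sort_returning) so it is clearly not the original function, is to have the sort return a new list at every level and actually use the recursive results:
def merge_sort_returning(a):
    # Illustrative only: return a new sorted list at every level,
    # so the results of the recursive calls are actually used.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort_returning(a[:mid])
    right = merge_sort_returning(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # whatever remains of either half
    merged.extend(right[j:])
    return merged

print(merge_sort_returning([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]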

Python 3.6: Memory address of a value vs Memory address of a variable

I am currently using Python 3.6, and I was playing around with the id() function.
When I run the following code in IDLE,
x = 1
print(id(x), id(1))
The two memory addresses are the same (1499456272 for me). My understanding is that the integer 1, which is an object, has a memory address, and when the object is assigned to x, the variable takes on the same memory address as the object (not sure if this is correct).
When I replicate the above code using a string, for instance
s = "a"
print(id(s), id("a"))
I also get two memory addresses which are the same. Again, my current reasoning for why this occurs is the same as above.
However, when I try this using lists, I don't get the same memory address. For example,
l = [1]
print(id(l), id([1]))
gives me 1499456272 and 67146456.
Can anyone explain to me why this occurs? Perhaps my current reasoning for why ints and strings have the same memory address is flawed. Thanks :D
CPython interns all integers from -5 to 256, as well as string literals. This means that whenever you get such a value, Python knows it already has a copy of it in memory and returns that same object.
However, the way this happens is different for the two types.
For integers, those values are always interned, which allows the process to be dynamic.
String interning, on the other hand, happens at compile time and thus is specific to string literals.
We can experiment with is, which is equivalent to comparing the id of two live objects.
x = 1 + 1
y = 3 - 1
x is y # True
x = 'b'
y = 'b'
x is y # True
x = 257
y = 257
x is y # False
x = 'ab'.replace('a', '')
y = 'b'
x is y # False
This is not the case for objects of other types, such as lists, because they are mutable; you absolutely would not want the same object to be returned.
[] is [] # False
The bottom line is that this is an implementation optimisation and you should not rely on it in your code. In general, assume that different expressions return different objects; the cases above are exceptions.
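As a final, purely illustrative sketch (not from the original answer): compare values with == and reserve is for singletons such as None, since whether two equal immutables share an identity is an implementation detail.
x = 257
y = 257
print(x == y)         # True: value equality is what you normally want
print(x is y)         # may be True or False, depending on how CPython compiled this code

value = None
print(value is None)  # identity checks are idiomatic only for singletons like None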

Non blocking reads with Julia

I would like to read user input without blocking the main thread, much like the getch() function from conio.h. Is this possible in Julia?
I tried with @async, but it looked like my input wasn't being read, although the main thread wasn't blocked.
The problem, I believe, is either that you are running in global scope, which makes @async create its own local variables (when it reads, it reads into a variable in another scope), or that you are using an old version of Julia.
The following examples read an integer from STDIN in a non-blocking fashion.
function foo()
    a = 0
    @async a = parse(Int64, readline())
    println("See, it is not blocking!")
    while (a == 0)
        print("")
    end
    println(a)
end
The following example does the job in global scope, using an array. You can do the same trick with other mutable objects.
Array example:
function nonblocking_readInt()
    arr = [0]
    @async arr[1] = parse(Int64, readline())
    arr
end

r = nonblocking_readInt() # is an array
println("See, it is not blocking!")
while (r[1] == 0) # sentinel value check
    print("")
end
println(r[1])

How to use metaprogramming with function args?

It's my second day learning and experimenting with Julia. Although I read the documentation on metaprogramming carefully (but maybe not carefully enough) and several similar threads, I still can't figure out how to use it inside a function.
I tried to make the following function, which simulates some data, more flexible:
using Distributions

function gendata(N,NLATENT,NITEMS)
    latent = repeat(rand(Normal(6,2),N,NLATENT), inner=(1,NITEMS))
    errors = rand(Normal(0,1),N,NLATENT*NITEMS)
    x = latent+errors
end
By doing this:
using Distributions

function gendata(N,NLATENT,NITEMS,LATENT_DIST="Normal(0,1)",ERRORS_DIST="Normal(0,1)")
    to_eval_latent = parse("latent = repeat(rand($LATENT_DIST,N,NLATENT), inner=(1,NITEMS))")
    eval(to_eval_latent)
    to_eval_errors = parse("error = rand($ERRORS_DIST,N,NLATENT*NITEMS)")
    eval(to_eval_errors)
    x = latent+errors
end
But since eval doesn't work in the local scope, it doesn't work. What can I do to work around this?
Also, the original function doesn't seem to be that fast; did I make any major mistakes concerning performance?
I really appreciate any recommendation.
Thanks in advance.
There is no need to use eval there; you can retain the same flexibility by passing the distribution types as keyword args (or named args with default values). Parsing and eval'ing "stringly-typed" arguments will often defeat optimizations and should be avoided.
function gendata(N,NLATENT,NITEMS; LATENT_DIST=Normal(0,1),ERRORS_DIST=Normal(0,1))
    latent = repeat(rand(LATENT_DIST,N,NLATENT), inner=(1,NITEMS))
    errors = rand(ERRORS_DIST,N,NLATENT*NITEMS)
    x = latent+errors
end
julia> gendata(10,2,3, LATENT_DIST=Pareto(.3))
...
julia> gendata(10,2,3, ERRORS_DIST=Gamma(.6))
...
etc.
You're not really supposed to use eval here (it's slower, won't produce type information, will interfere with compilation, etc.), but in case you're trying to understand what went wrong, here's how you would do it:
Either separate it from the rest of the code:
function gendata(N,NLATENT,NITEMS,LDIST_EX="Normal(0,1)",EDIST_EX="Normal(0,1)")
    # Eval your expressions separately
    LATENT_DIST = eval(parse(LDIST_EX))
    ERRORS_DIST = eval(parse(EDIST_EX))
    # Do your thing
    latent = repeat(rand(LATENT_DIST,N,NLATENT), inner=(1,NITEMS))
    errors = rand(ERRORS_DIST,N,NLATENT*NITEMS)
    x = latent+errors
end
Or use interpolation with quoted expressions:
function gendata(N,NLATENT,NITEMS,LDIST_EX="Normal(0,1)",EDIST_EX="Normal(0,1)")
    # Obtain expression objects
    LATENT_DIST = parse(LDIST_EX)
    ERRORS_DIST = parse(EDIST_EX)
    # Eval but interpolate in everything that's local to the function
    # And you can't introduce local variables with eval so keep them
    # out of it.
    latent = eval( :(repeat(rand($LATENT_DIST,$N,$NLATENT), inner=(1,$NITEMS))) )
    errors = eval( :(rand($ERRORS_DIST, $N, $NLATENT*$NITEMS)) )
    x = latent+errors
end
You can also use a single eval with a let block to introduce a self-contained scope:
function gendata(N,NLATENT,NITEMS,LDIST_EX="Normal(0,1)",EDIST_EX="Normal(0,1)")
    LATENT_DIST = parse(LDIST_EX)
    ERRORS_DIST = parse(EDIST_EX)
    x = @eval let
        latent = repeat(rand($LATENT_DIST,$N,$NLATENT), inner=(1,$NITEMS))
        errors = rand($ERRORS_DIST, $N, $NLATENT*$NITEMS)
        latent+errors
    end
end
(Note that (@eval x) is equivalent to eval(:(x)).)
Well, I hope you understand the eval thing a little better now. Day two, I mean; you should be experimenting ;)

Why is my MATLAB job taking a long time to run?

I have a function (a convolution) which can get very slow if it operates on matrices with many, many columns (function code below). I therefore want to parallelize the code.
Example MATLAB code:
x = zeros(1,100);
x(rand(1,100)>0.8) = 1;
x = x(:);
c = convContinuous(1:100,x,@(t,p)p(1)*exp(-(t-p(2)).*(t-p(2))./(2*p(3).*p(3))),[1,0,3],false)
plot(1:100,x,1:100,c)
If x is a matrix with many columns, the code gets very slow. My first attempt was to change the for to a parfor statement, but it went wrong (see Concluding remarks below).
My second attempt was to follow this example, which shows how to schedule tasks in a job and then submit the job to a local server. That example is implemented in my function below by setting the last argument isParallel to true.
The example MATLAB code would be:
x = zeros(1,100);
x(rand(1,100)>0.8) = 1;
x = x(:);
c = convContinuous(1:100,x,@(t,p)p(1)*exp(-(t-p(2)).*(t-p(2))./(2*p(3).*p(3))),[1,0,3],true)
Now, MATLAB tells me:
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
Warning: This job will remain queued until the Parallel Pool is closed.
And the MATLAB terminal stays on hold, waiting for something to finish. I then open the Job Monitor via Home -> Parallel -> Monitor Jobs and see there are two jobs, one of which is in the state "running". But neither of them ever finishes.
Questions
Why does it take so long to run, given that it is a really simple task?
What would be the best way to parallelize my function below? (The "heavy" part is in the separate function convolveSeries.)
File convContinuous.m
function res = convContinuous(tData, sData, smoothFun, par, isParallel)
% performs the convolution of a series of deltas with a smooth function of parameters par
% tData = temporal space
% sData = matrix of delta series (each column is a different series that will be convolved with smoothFun)
% smoothFun = function used to convolve with each column of sData
%             must be of the form smoothFun(t, par)
% par = parameters of the smoothing function
if nargin < 5 || isempty(isParallel)
    isParallel = false;
end
if isvector(sData)
    [mm,nn] = size(sData);
    sData = sData(:);
end
res = zeros(size(sData));
[ ~, n ] = size(sData);
if ~isParallel
    %parfor i = 1:n % uncomment this and comment the line below for the strange error
    for i = 1:n
        res(:,i) = convolveSeries(tData, sData(:,i), smoothFun, par);
    end
else
    myPool = gcp; % creates a parallel pool if needed
    sched = parcluster; % creates the scheduler
    job = createJob(sched);
    task = cell(1,n);
    for i = 1:n
        task{i} = createTask(job, @convolveSeries, 1, {tData, sData(:,i), smoothFun, par});
    end
    submit(job);
    wait(job);
    jobRes = fetchOutputs(job);
    for i = 1:n
        res(:,i) = jobRes{i,1}(:);
    end
    delete(job);
end
if isvector(sData)
    res = reshape(res, mm, nn);
end
end
function r = convolveSeries(tData, s, smoothFun, par)
r = zeros(size(s));
tSpk = s == 1;
j = 1;
for t = tData
    for tt = tData(tSpk)
        if (tt > t)
            break;
        end
        r(j) = r(j) + smoothFun(t - tt, par);
    end
    j = j + 1;
end
end
Concluding remarks
As a side note, I was not able to do it using parfor because MATLAB R2015a gave me a strange error:
Error using matlabpool (line 27)
matlabpool has been removed.
To query the size of an already started parallel pool, query the 'NumWorkers' property of the pool.
To check if a pool is already started use 'isempty(gcp('nocreate'))'.
Error in parallel_function (line 317)
Nworkers = matlabpool('size');
Error in convContinuous (line 18)
parfor i = 1:n
My version command outputs
Parallel Computing Toolbox Version 6.6 (R2015a)
which is compatible with my MATLAB version. Almost all other tests I have done are OK. I am therefore inclined to think that this is a MATLAB bug.
I tried changing matlabpool to gcp and then retrieving the number of workers by parPoolObj.NumWorkers, and after altering this detail in two different built-in functions, I received another error:
Error in convContinuous>makeF%1/F% (line 1)
function res = convContinuous(tData, sData, smoothFun, par)
Output argument "res" (and maybe others) not assigned during call to "convContinuous>makeF%1/F%".
Error in parallel_function>iParFun (line 383)
output.data = processInfo.fun(input.base, input.limit, input.data);
Error in parProcess (line 167)
data = processFunc(processInfo, data);
Error in parallel_function (line 358)
stateInfo = parProcess(#iParFun, #iConsume, #iSupply, ...
Error in convContinuous (line 14)
parfor i = 1:numel(sData(1,:))
I suspect that this last error is generated because the function call inside the parfor loop requires many arguments, but I don't really know.
Solving the errors
Thanks to the comments of people here (saying they could not reproduce my errors), I kept looking for the source of the error. I realized it was a local problem caused by having pforfun (downloaded long ago from the File Exchange) in my pathdef.m.
Once I removed pforfun from my pathdef.m, parfor (line 18 in the convContinuous function) started working well.
Thank you in advance!
The parallel pool you created is blocking your job from running. When you are using the jobs and tasks API you do not need (and must not have) a pool open. When you looked in the Job Monitor, the running job you saw was the job that backs the parallel pool, which only finishes when the pool is deleted.
If you delete the line in convContinuous that says myPool = gcp, then it should work. As an optimization, you can use the vectorised form of createTask, which is much more efficient than creating tasks in a loop, i.e.:
inputCell = cell(1, n);
for i = 1:n
    inputCell{i} = {tData, sData(:,i), smoothFun, par};
end
task = createTask(job, @convolveSeries, 1, inputCell);
However, having said all that, you should be able to make this code work using parfor. The first error you encountered was due to matlabpool being removed; it has now been replaced by parpool.
The second error appears to be caused by your function not returning the correct outputs, but the error message does not appear to correspond to the code you posted, so I'm not sure. Specifically I don't know what convContinuous>makeF%1/F% (line 1) refers to.
