Corona SDK / Lua / memory leak

So I have this rain module for a game that I am developing, which is causing a massive system-memory leak that leads to lag and ultimately crashes the application.
The function "t.start" is called by a timer every 50 ms.
However hard I've tried, I can't find the cause of this! Maybe I am overlooking something, but I can't see it. As you can see, I niled out the graphics-related locals... Does anyone notice anything?
As a secondary issue: does anyone have tips on preloading the next scene for a smooth scene change? The loading itself causes a short freeze when I put it in "scene:show()"...
Thanks for your help!
Greetings, Nils
local t = {}
local composer = require("composer")

t.drops = {}

function t.fall(drops, group)
    for i = 1, #drops, 1 do
        local thisDrop = drops[i]
        function thisDrop:enterFrame()
            if aboutToBeDestroyed == true then
                Runtime:removeEventListener("enterFrame", self)
                return true
            end
            local randomY = math.random(32, 64)
            if self.x ~= nil then
                self:translate(0, randomY)
                if self.y > 2000 then
                    self:removeSelf()
                    Runtime:removeEventListener("enterFrame", self)
                    self = nil
                end
            end
        end
        Runtime:addEventListener("enterFrame", drops[i])
        thisDrop = nil
    end
end

t.clean = function()
    for i = 1, #t.drops, 1 do
        if t.drops[i] ~= nil then
            table.remove(t.drops, i)
            t.drops[i] = nil
        end
    end
end

function t.start(group)
    local drops = {}
    local theGroup = group
    for i = 1, 20, 1 do
        local randomWidth = math.random(5, 30)
        local dropV = display.newRect(group, 1, 1, randomWidth, 30)
        local drop1 = display.newSnapshot(dropV.contentWidth, dropV.contentHeight * 3)
        drop1.canvas:insert(dropV)
        drop1.fill.effect = "filter.blurVertical"
        drop1.fill.effect.blurSize = 30
        drop1.fill.effect.sigma = 140
        drop1:invalidate("canvas")
        drop1:scale(0.75, 90)
        drop1:invalidate("canvas")
        drop1:scale(1, 1 / 60)
        drop1:invalidate("canvas")
        local drop = display.newSnapshot(drop1.contentWidth * 1.5, drop1.contentHeight)
        drop.canvas:insert(drop1)
        drop.fill.effect = "filter.blurHorizontal"
        drop.fill.effect.blurSize = 10
        drop:invalidate("canvas")
        drop.alpha = 0.375
        local randomY = math.random(-500, 500)
        drop.y = randomY
        drop.anchorY = 0
        drop.x = (i - 1) * 54
        drops[i] = drop
        table.insert(t.drops, drop)
        local dropV, drop1, drop = nil
    end
    composer.setVariable("drops", t.drops)
    t.fall(drops, group)
    drops = nil
    t.clean()
end

return t
EDIT: I found out that it definitely has something to do with the nested snapshots, which are created for the purpose of applying filter effects. I removed one snapshot, so that I only have a vector object inside a single snapshot, and voila: memory grows far more slowly. The question is: why?

Generally, you don't need an enterFrame event at all - you can simply run a transition from the start point (math.random(-500, 500)) to the end point (2000 in your code). Just randomise the speed and use an onComplete handler to remove the object:
local targetY = 2000
local speedPerMs = math.random(32, 64) * 60 / 1000   -- 32-64 px/frame at 60 fps, converted to px/ms
local timeToTravel = (targetY - randomY) / speedPerMs -- randomY is the drop's start y from the question
transition.to( drop, {
    time = timeToTravel,
    x = xx,                                           -- xx: the horizontal end position (e.g. drop.x for a straight fall)
    y = targetY,
    onComplete = function()
        drop:removeSelf()
    end
} )
Edit 1: I found that with your code removing drop is not enough. This works for me:
drop:removeSelf()
dropV:removeSelf()
drop1:removeSelf()
Some ideas about memory consumption:
1) You can probably use one enterFrame handler for the whole array of drops - this will reduce memory consumption (see the sketch after this list). Also, don't add methods to local objects like 'function thisDrop:enterFrame()' - this is not optimal here, because you are creating 20 new functions every 50 ms.
2) Your code creates 400 'drop' objects every second, and they usually live no more than ~78 frames (about 1.3 s in a 60 fps environment). It is better to keep a pool of objects and reuse existing ones.
3) An enterFrame handler depends on the current fps of the device, so your rain will be slower at low fps. Low fps -> objects fall slower -> more objects on the scene -> fps drops further. I suggest calculating the deltaTime between two enterFrame calls and adjusting the falling speed according to it.
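A minimal sketch of ideas 1) and 3): one shared enterFrame handler that moves every drop by a deltaTime-scaled amount. The handler name and the drop.speed field are made up for illustration; the 32-64 px-per-frame speed and the 2000 px cut-off come from the question.
local allDrops = {}                      -- fill this with the drops created in t.start
local lastTime = system.getTimer()

local function onRainFrame()
    local now = system.getTimer()
    local dt = (now - lastTime) / (1000 / 60)   -- 1.0 at exactly 60 fps
    lastTime = now
    for i = #allDrops, 1, -1 do
        local drop = allDrops[i]
        drop:translate(0, drop.speed * dt)      -- drop.speed = math.random(32, 64)
        if drop.y > 2000 then
            table.remove(allDrops, i)
            drop:removeSelf()                   -- with a pool (idea 2) you would recycle it here instead
        end
    end
end

Runtime:addEventListener("enterFrame", onRainFrame)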
Edit 2: It seems like :removeSelf() on a snapshot doesn't remove its child objects. I modified your code and memory consumption drops a lot:
if self.y > 2000 then
    local drop1 = self.group[1]   -- self.group holds the snapshot's children
    local dropV = drop1.group[1]
    dropV:removeSelf()
    drop1:removeSelf()
    self:removeSelf()
    Runtime:removeEventListener("enterFrame", self)
    self = nil
end
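A more general sketch of the same cleanup, assuming the nesting from the question (outer snapshot -> inner snapshot -> rect); removeSnapshotDeep is a made-up helper name, not part of the original code:
local function removeSnapshotDeep(snapshot)
    local group = snapshot.group
    for i = group.numChildren, 1, -1 do
        local child = group[i]
        if child.group ~= nil then       -- nested snapshot: clean it up recursively
            removeSnapshotDeep(child)
        else
            child:removeSelf()           -- plain display object (the rect)
        end
    end
    snapshot:removeSelf()
end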

Related

Drawing lines in love2d crashes program

I'm trying to draw a hexagonal grid with love2d using love.graphics.line, but after drawing around 15000 lines the program crashes with only the message abort (core dumped).
function love.load()
    ww, wh = love.graphics.getDimensions()
    hexradius = 10
    size = 50
    hexgrid = newGrid(hexradius, size, size)
end

function love.draw(dt)
    drawgrid()
end

function drawgrid()
    local jxOffset = hexgrid.rad * -math.tan(math.pi/1.5)
    local ixOffset = jxOffset/4
    local iyOffet = jxOffset * math.sin(math.pi/3)
    for i=1,hexgrid.size.x do
        for j=1,hexgrid.size.y do
            love.graphics.push()
            love.graphics.translate(ixOffset + j * jxOffset, i * iyOffet)
            love.graphics.line(
                hexgrid.hex[1].x, hexgrid.hex[1].y,
                hexgrid.hex[2].x, hexgrid.hex[2].y,
                hexgrid.hex[3].x, hexgrid.hex[3].y,
                hexgrid.hex[4].x, hexgrid.hex[4].y,
                hexgrid.hex[5].x, hexgrid.hex[5].y,
                hexgrid.hex[6].x, hexgrid.hex[6].y,
                hexgrid.hex[1].x, hexgrid.hex[1].y)
            love.graphics.pop()
        end
        ixOffset = -ixOffset
    end
end

function newGrid(rad, xsize, ysize)
    local g = {
        rad = rad,
        hex = {},
        size = {
            x = xsize,
            y = ysize,
        },
    }
    for i=1,6 do
        local dir = math.pi/3 * (i+0.5)
        g.hex[i] = {}
        g.hex[i].x = g.rad * math.cos(dir)
        g.hex[i].y = g.rad * math.sin(dir)
    end
    return g
end
I'm quite new to love2d and graphical stuff in general, so maybe I'm trying to draw too many lines and it's just not possible, but the grid I'm generating does not appear to be that big. I'm using LOVE 11.3.
Thanks in advance!
I'd try drawing with a polygon instead and see if you get better results.
love.graphics.polygon( mode, vertices )
https://love2d.org/wiki/love.graphics.polygon
for i = 1, hexgrid.size.x do
    for j = 1, hexgrid.size.y do
        love.graphics.push()
        love.graphics.translate( ixOffset + j * jxOffset, i * iyOffet )
        love.graphics.polygon( 'line',
            hexgrid.hex[1].x, hexgrid.hex[1].y,
            hexgrid.hex[2].x, hexgrid.hex[2].y,
            hexgrid.hex[3].x, hexgrid.hex[3].y,
            hexgrid.hex[4].x, hexgrid.hex[4].y,
            hexgrid.hex[5].x, hexgrid.hex[5].y,
            hexgrid.hex[6].x, hexgrid.hex[6].y )
        love.graphics.pop()
    end
    ...
Alternatively, circles drawn with the optional segments parameter set to 6 are hexagons.
love.graphics.circle( mode, x, y, radius, segments )
https://love2d.org/wiki/love.graphics.circle
for i = 1, hexgrid.size.x do
    for j = 1, hexgrid.size.y do
        love.graphics.circle( 'line', ixOffset + j * jxOffset, i * iyOffet, hexradius, 6 )
    end
    ...
As Luther suggested in a comment, it's a bug in Löve. It's already fixed and will be available in the next release, Löve 11.4.
Thanks!

Matrix multiplication is slower when multithreading in Julia

I am working with big matrices (around 30k rows and ~100 columns). I am doing some matrix multiplication and the process takes around 20 seconds. This is my code:
@time begin
    result = -1
    data = -1
    for i=1:size
        first_matrix = @view data[i * split,:]
        for j=1:size
            second_matrix = @view Qg[j * split,:]
            matrix_multiplication = first_matrix * second_matrix'
            current_sum = sum(matrix_multiplication)
            global result
            if current_sum > result
                result = current_sum
                data = matrix_multiplication[1,1]
            end
        end
    end
end
Trying to optimize this a little more, I tried multi-threading (julia --threads 4) to get better performance.
@time begin
    global result = -1
    global data = -1
    lk = ReentrantLock()
    for i=1:size
        first_matrix = @view data[i * split,:]
        Threads.@threads for j=1:size
            second_matrix = @view Qg[j * split,:]
            matrix_multiplication = first_matrix * second_matrix'
            current_sum = sum(matrix_multiplication)
            global result
            if current_sum > result
                lock(lk)
                result = current_sum
                data = matrix_multiplication[1,1]
                unlock(lk)
            end
        end
    end
end
By adding multi-threading I thought I would get an increase in performance, but the performance got worse (~40 seconds). I removed the lock to see if that was the issue, but still got the same performance. I am running this on a Dual-Core Intel Core i5 (MacBook pro). Does anyone know why my multi-threading code doesn't work?

How to create a watchdog on a program in python?

I want to know whether it is even possible to create a watchdog on a program.
I am trying to do discrete event simulation to simulate a functioning machine.
The problem is: if I inspect my machine at, say, time = 12 (with an inspection duration of 2 hours) and the failure event is at 13 time units, there is no way it can happen, because I am "busy inspecting".
So, is there a sort of "watchdog" that constantly tests whether the value of a variable has reached a certain limit, and stops what the program is doing?
Here is my inspection program:
def machine_inspection(tt, R, Dpmi, Dinv, FA, TRF0, Tswitch, Trfn):
    End = 0
    TPM = 0
    logging.debug(' cycle time %f' % tt)
    TRF0 = TRF0 - Dinv
    Tswitch = Tswitch - Dinv
    Trfn = Trfn - Dinv
    if R == 0:
        if falsealarm == 1:
            FA += 1
    else:
        tt = tt + Dpmi
        TPM = 1
    End = 1
    return (tt, End, R, TPM, FA, TRF0, Trfn, Tswitch)
Thank you very much!
Basically, you can't be inspecting for x time units if tt + x would exceed the time to failure TRF0 or Trfn.

Sleep a long running process

I have an app that iterates over tens of thousands of records using various enumerators (such as directory enumerators).
I am seeing OS X report my process as "Caught burning CPU", since it's taking a large amount of CPU while doing so.
What I would like to do is build in a "pressure valve" such as a
[NSThread sleepForTimeInterval:cpuDelay];
that does not block other processes/threads on things like a dual core machine.
My processing is happening on a separate thread, but I can't break out of and re-enter the enumerator loop and use NSTimers to allow the machine to "breathe"
Any suggestions - should [NSThread sleepForTimeInterval:cpuDelay]; be working?
I run this stuff inside a dispatch queue:
if (!primaryTask) primaryTask = dispatch_queue_create("com.me.app.task1", backgroundPriorityAttr);
dispatch_async(primaryTask, ^{
    [self doSync];
});
Try wrapping your processing in an NSOperation and setting a lower QoS priority. Here is a little more information:
http://nshipster.com/nsoperation/
Here is a code example I made up. The operation is triggered in the view load event:
let processingQueue = NSOperationQueue()

override func viewDidLoad() {
    let backgroundOperation = NSBlockOperation {
        // My very long and intensive processing
        let nPoints = 100000000000
        var nPointsInside = 0
        for _ in 1...nPoints {
            let (x, y) = (drand48() * 2 - 1, drand48() * 2 - 1)
            if x * x + y * y <= 1 {
                nPointsInside += 1
            }
        }
        let _ = 4.0 * Double(nPointsInside) / Double(nPoints)
    }
    backgroundOperation.queuePriority = .Low
    backgroundOperation.qualityOfService = .Background
    processingQueue.addOperation(backgroundOperation)
}

Torch out of memory in thread when using torch.serialize twice

I'm trying to add a parallel dataloader to the torch-dataframe in order to add torchnet compatibility. I've used the tnt.ParallelDatasetIterator and changed it so that:
A basic batch is loaded outside the threads
The batch is serialized and sent to the thread
In the thread, the batch is deserialized and the batch data is converted to tensors
The tensors are returned in a table that has the input and target keys in order to match the tnt.Engine setup.
The problem occurs the second time enqueue is called, with an error: .../torch_distro/install/bin/luajit: not enough memory. I'm currently only working with mnist, using an adapted mnist-example. The enqueue loop now looks like this (with debugging memory output):
-- `samplePlaceholder` stands in for samples which have been
-- filtered out by the `filter` function
local samplePlaceholder = {}

-- The enqueue does the main loop
local idx = 1
local function enqueue()
    while idx <= size and threads:acceptsjob() do
        local batch, reset = self.dataset:get_batch(batch_size)
        if (reset) then
            idx = size + 1
        else
            idx = idx + 1
        end

        if (batch) then
            local serialized_batch = torch.serialize(batch)

            -- In the parallel section only the to_tensor is run in parallel
            -- this should though be the computationally expensive operation
            threads:addjob(
                function(argList)
                    io.stderr:write("\n Start");
                    io.stderr:write("\n 1: " .. tostring(collectgarbage("count")))
                    local origIdx, serialized_batch, samplePlaceholder = unpack(argList)

                    io.stderr:write("\n 2: " .. tostring(collectgarbage("count")))
                    local batch = torch.deserialize(serialized_batch)
                    serialized_batch = nil

                    collectgarbage()
                    collectgarbage()

                    io.stderr:write("\n 3: " .. tostring(collectgarbage("count")))
                    batch = transform(batch)

                    io.stderr:write("\n 4: " .. tostring(collectgarbage("count")))
                    local sample = samplePlaceholder
                    if (filter(batch)) then
                        sample = {}
                        sample.input, sample.target = batch:to_tensor()
                    end
                    io.stderr:write("\n 5: " .. tostring(collectgarbage("count")))

                    collectgarbage()
                    collectgarbage()
                    io.stderr:write("\n 6: " .. tostring(collectgarbage("count")))

                    io.stderr:write("\n End \n");
                    return {
                        sample,
                        origIdx
                    }
                end,
                function(argList)
                    sample, sampleOrigIdx = unpack(argList)
                end,
                {idx, serialized_batch, samplePlaceholder}
            )
        end
    end
end
I've sprinkled in collectgarbage calls and also tried to remove any objects that aren't needed. The memory output is rather straightforward:
Start
1: 374840.87695312
2: 374840.94433594
3: 372023.79101562
4: 372023.85839844
5: 372075.41308594
6: 372023.73632812
End
The function that loops the enqueue is the non-ordered function, which is trivial (the memory error is thrown at the second enqueue):
iterFunction = function()
    while threads:hasjob() do
        enqueue()
        threads:dojob()
        if threads:haserror() then
            threads:synchronize()
        end
        enqueue()

        if table.exact_length(sample) > 0 then
            return sample
        end
    end
end
So the problem was torch.serialize, where the function in the set-up coupled the entire dataset to the function. When adding:
serialized_batch = nil
collectgarbage()
collectgarbage()
the problem was resolved. I further wanted to know what was taking up so much space, and the culprit turned out to be that I had defined the function in an environment with a large dataset that got intertwined with the function, massively increasing its serialized size. Here is the original definition of the data:
mnist = require 'mnist'
local dataset = mnist[mode .. 'dataset']()

-- PROBLEMATIC LINE BELOW --
local ext_resource = dataset.data:reshape(dataset.data:size(1),
    dataset.data:size(2) * dataset.data:size(3)):double()

-- Create a Dataframe with the label. The actual images will be loaded
-- as an external resource
local df = Dataframe(
    Df_Dict{
        label = dataset.label:totable(),
        row_id = torch.range(1, dataset.data:size(1)):totable()
    })

-- Since the mnist package already has taken care of the data
-- splitting we create a single subsetter
df:create_subsets{
    subsets = Df_Dict{core = 1},
    class_args = Df_Tbl({
        batch_args = Df_Tbl({
            label = Df_Array("label"),
            data = function(row)
                return ext_resource[row.row_id]
            end
        })
    })
}
It turns out that removing the line that I highlighted reduces the memory usage from 358 MB down to 0.0008 MB! The code that I used for testing the performance was:
local mem = {}
table.insert(mem, collectgarbage("count"))

local ser_data = torch.serialize(batch.dataset)
table.insert(mem, collectgarbage("count"))

local ser_retriever = torch.serialize(batch.batchframe_defaults.data)
table.insert(mem, collectgarbage("count"))

local ser_raw_retriever = torch.serialize(function(row)
    return ext_resource[row.row_id]
end)
table.insert(mem, collectgarbage("count"))

local serialized_batch = torch.serialize(batch)
table.insert(mem, collectgarbage("count"))

for i=2,#mem do
    print(i-1, (mem[i] - mem[i-1])/1024)
end
This originally produced the output:
1 0.0082607269287109
2 358.23344707489
3 0.0017471313476562
4 358.90182781219
and after the fix:
1 0.0094480514526367
2 0.00080204010009766
3 0.00090408325195312
4 0.010146141052246
I tried using setfenv for the function, but it didn't resolve the issue. There is still a performance penalty for sending the serialized data to the thread, but the main problem is resolved, and without the expensive data retriever the function is considerably smaller.
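To see the upvalue effect described above in isolation, here is a minimal sketch (the tensor size and the variable names are made up for illustration, not taken from the original code): a closure that captures a large tensor as an upvalue drags the whole tensor along through torch.serialize, while an equivalent closure without the capture stays tiny.
local torch = require 'torch'

local big = torch.rand(60000, 784)            -- stand-in for the mnist image data
local captures_big = function(row) return big[row] end
local captures_nothing = function(row) return row end

print(#torch.serialize(captures_big))         -- hundreds of megabytes: `big` is serialized as an upvalue
print(#torch.serialize(captures_nothing))     -- a handful of bytes: nothing extra to drag along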
