I have an app that iterates over tens of thousands of records using various enumerators (such as directory enumerators).
I am seeing OS X report my process as "Caught burning CPU", since it is taking a large amount of CPU in doing so.
What I would like to do is build in a "pressure valve" such as a
[NSThread sleepForTimeInterval:cpuDelay];
that does not block other processes/threads on, for example, a dual-core machine.
My processing happens on a separate thread, but I can't break out of the enumerator loop and re-enter it later via NSTimers to let the machine "breathe".
Any suggestions? Should [NSThread sleepForTimeInterval:cpuDelay]; be working here?
I run this stuff inside a dispatch queue:
if (!primaryTask)
    primaryTask = dispatch_queue_create("com.me.app.task1", backgroundPriorityAttr);
dispatch_async(primaryTask, ^{
    [self doSync];
});
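For context, a sketch of how backgroundPriorityAttr could be created and where the sleep would sit in the loop (assuming it is meant to be a background-QoS queue attribute; the batch size of 1000 and the enumerator variable are placeholders):
dispatch_queue_attr_t backgroundPriorityAttr =
    dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL,
                                            QOS_CLASS_BACKGROUND, 0);
primaryTask = dispatch_queue_create("com.me.app.task1", backgroundPriorityAttr);
dispatch_async(primaryTask, ^{
    NSUInteger processed = 0;
    for (NSURL *url in enumerator) {
        // ... process one record ...
        if (++processed % 1000 == 0) {
            // sleeps only this queue's thread; other threads keep running
            [NSThread sleepForTimeInterval:cpuDelay];
        }
    }
});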
Try wrapping your processing in an NSOperation and setting a lower quality of service (QoS). Here is a little more information:
http://nshipster.com/nsoperation/
Here is a code example I made up; the operation is triggered in the view load event:
let processingQueue = NSOperationQueue()

override func viewDidLoad() {
    super.viewDidLoad()
    let backgroundOperation = NSBlockOperation {
        // My very long and intensive processing:
        // a Monte Carlo estimate of pi
        let nPoints = 100_000_000_000
        var nPointsInside = 0
        for _ in 1...nPoints {
            let (x, y) = (drand48() * 2 - 1, drand48() * 2 - 1)
            if x * x + y * y <= 1 {
                nPointsInside += 1
            }
        }
        let _ = 4.0 * Double(nPointsInside) / Double(nPoints)
    }
    backgroundOperation.queuePriority = .Low
    backgroundOperation.qualityOfService = .Background
    processingQueue.addOperation(backgroundOperation)
}
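(Note: in Swift 3 and later, NSOperationQueue and NSBlockOperation are renamed OperationQueue and BlockOperation, and the enum cases are lowercased: .Low becomes .low and .Background becomes .background.)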
I'm trying to better understand lock-free programming:
Suppose we have two threads in a data race:
// Thread 1
x = 1
// Thread 2
x = 2
Is there a lock-free way a third thread can know the result of the race without being able to read x?
Suppose thread 3 consumes a lock-free queue, and the code is:
// Thread 1
x = 1
queue.push(1)
// Thread 2
x = 2
queue.push(2)
Then the operations could be ordered as:
x = 1
x = 2
queue.push(1)
queue.push(2)
or
x = 1
x = 2
queue.push(2)
queue.push(1)
So having a lock-free queue alone would not suffice for thread 3 to know the value of x after the race.
If you know the value of x before the race began, the following code using atomic Read-Modify-Write operations should do the job.
// Notes:
// x == 0 initially; x and winner are both C11 atomics
// atomic_exchange(&var, v) atomically stores v into var and returns
// the previous value, meaning that no other thread can interfere
// between the read and the write
#include <stdatomic.h>
#include <stdio.h>
atomic_int x = 0;
atomic_int winner = 0;

// thread-1:
int t = atomic_exchange(&x, 1);
if (t != 0) {
    // x was non-zero when thread-1 called the swap operation
    // --> thread-2 was faster, so thread-1's write is the one that survived
    winner = 1;
}

// thread-2:
int t = atomic_exchange(&x, 2);
if (t != 0) {
    // x was non-zero when thread-2 called the swap operation
    // --> thread-1 was faster, so thread-2's write is the one that survived
    winner = 2;
}

// thread-3:
while (winner == 0) { /* spin */ }
printf("Winner is %d\n", winner);
Note that exactly one of the two writers sees a non-zero previous value (whichever swap runs second), so winner is assigned exactly once, and it names the thread whose value remained in x.
So I have this rain module for a game that I am developing, and it is causing a massive memory leak that leads to lag and ultimately a crash of the application.
The function "t.start" is called with a timer every 50 ms.
Though I've tried, I can't really find the cause of this! Maybe I am overlooking something, but I can't see it. As you can see, I niled out the graphics-related locals... Does anyone notice something?
As a secondary issue: does anyone have tips on preloading the next scene for a smooth scene change? The loading itself causes a short freeze when I put it in "scene:show()"...
Thanks for your help!
Greetings, Nils
local t = {}
local composer = require("composer")
t.drops = {}
function t.fall(drops, group)
for i = 1, #drops, 1 do
local thisDrop = drops[i]
function thisDrop:enterFrame()
if aboutToBeDestroyed == true then
Runtime:removeEventListener("enterFrame", self)
return true
end
local randomY = math.random(32, 64)
if self.x ~= nil then
self:translate(0, randomY)
if self.y > 2000 then
self:removeSelf()
Runtime:removeEventListener("enterFrame", self)
self = nil
end
end
end
Runtime:addEventListener("enterFrame", drops[i])
thisDrop = nil
end
end
t.clean = function()
for i = 1, #t.drops, 1 do
if t.drops[i] ~= nil then
table.remove(t.drops, i)
t.drops[i] = nil
end
end
end
function t.start(group)
local drops = {}
local theGroup = group
for i = 1, 20, 1 do
local randomWidth = math.random(5, 30)
local dropV = display.newRect(group, 1, 1, randomWidth, 30)
local drop1 = display.newSnapshot(dropV.contentWidth , dropV.contentHeight * 3)
drop1.canvas:insert(dropV)
drop1.fill.effect = "filter.blurVertical"
drop1.fill.effect.blurSize = 30
drop1.fill.effect.sigma = 140
drop1:invalidate("canvas")
drop1:scale(0.75, 90)
drop1:invalidate("canvas")
drop1:scale(1, 1 / 60)
drop1:invalidate("canvas")
local drop = display.newSnapshot(drop1.contentWidth * 1.5, drop1.contentHeight)
drop.canvas:insert(drop1)
drop.fill.effect = "filter.blurHorizontal"
drop.fill.effect.blurSize = 10
drop:invalidate("canvas")
drop.alpha = 0.375
local randomY = math.random(-500, 500)
drop.y = randomY
drop.anchorY = 0
drop.x = (i - 1) * 54
drops[i] = drop
table.insert(t.drops, drop)
local dropV, drop1, drop = nil
end
composer.setVariable("drops", t.drops)
t.fall(drops, group)
drops = nil
t.clean()
end
return t
EDIT: I found out that it definitely has something to do with the nested snapshots, which are created for the purpose of applying filter effects. I removed one snapshot, so that I only have a vector object inside a single snapshot, and voilà: memory increases much more slowly. The question is: why?
Generally, you don't need the enterFrame event at all: you can simply run a transition from the start point (math.random(-500, 500)) to the end point (2000 in your code). Just randomize the speed and use an onComplete handler to remove the object:
local targetY = 2000
local speedPerMs = math.random(32, 64) * 60 / 1000 -- pixels per frame at 60 fps, converted to pixels per ms
local timeToTravel = (targetY - drop.y) / speedPerMs
transition.to( drop, {
    time = timeToTravel,
    y = targetY,
    onComplete = function()
        drop:removeSelf()
    end
} )
Edit 1: I found that with your code, removing drop is not enough. This works for me:
drop:removeSelf()
dropV:removeSelf()
drop1:removeSelf()
Some ideas about memory consumption:
1) You can probably use one enterFrame handler for the whole array of drops; this will reduce memory consumption. Also, don't add methods to local objects as in 'function thisDrop:enterFrame()'; that is not optimal here, because you are creating 20 new functions every 50 ms.
2) Your code creates 400 'drop' objects every second, and they usually live no more than ~78 frames (about 1.3 s in a 60 fps environment). It is better to keep a pool of objects and reuse them; see the sketch after this list.
3) An enterFrame function depends on the current fps of the device, so your rain will fall more slowly at low fps. Low fps -> objects fall more slowly -> more objects on the scene -> fps drops further. I suggest you calculate the deltaTime between two enterFrame calls and adjust the falling speed according to deltaTime.
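For point 2, a minimal object-pool sketch (createDrop is a hypothetical factory standing in for your snapshot-building code in t.start):
local pool = {}

local function obtainDrop()
    -- reuse a parked drop if one exists, otherwise build a new one
    local drop = table.remove(pool)
    if drop == nil then
        drop = createDrop() -- hypothetical: your snapshot-building code
    end
    drop.isVisible = true
    return drop
end

local function recycleDrop(drop)
    -- park the drop for reuse instead of destroying it
    drop.isVisible = false
    pool[#pool + 1] = drop
end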
Edit 2: It seems like :removeSelf() on a snapshot does not remove its child objects. I modified your code, and memory consumption dropped a lot:
if self.y > 2000 then
local drop1 = self.group[1]
local dropV = drop1.group[1]
dropV:removeSelf()
drop1:removeSelf()
self:removeSelf()
Runtime:removeEventListener("enterFrame", self)
self = nil
end
Basically, I want to change the following into a limited-threading solution, because in my situation the list of calculations is too large, spawning too many threads, and I'd like to experiment with and measure performance using fewer threads.
// the trivial approach (and largely my current situation)
let doWork() =
[1 .. 10]
|> List.map (fun i -> async {
do! Async.Sleep (100 * i) // longest thread will run 1 sec
return i * i // some complex calculation returning a certain type
})
|> Async.Parallel
|> Async.RunSynchronously // works, total wall time 1s
My new approach is shown below; the code is borrowed from/inspired by this online snippet from Tomas Petricek (which I tested; it works, but I need it to return a value, not unit).
type LimitAgentMessage =
| Start of Async<int> * AsyncReplyChannel<int>
| Finished
let threadingLimitAgent limit = MailboxProcessor.Start(fun inbox -> async {
let queue = System.Collections.Generic.Queue<_>()
let count = ref 0
while true do
let! msg = inbox.Receive()
match msg with
| Start (work, reply) -> queue.Enqueue((work, reply))
| Finished -> decr count
if count.Value < limit && queue.Count > 0 then
incr count
let work, reply = queue.Dequeue()
// Start it in a thread pool (on background)
Async.Start(async {
let! x = work
do! async {reply.Reply x }
inbox.Post(Finished)
})
})
// given a synchronous list of tasks, run each task asynchronously,
// return calculated values in original order
let worker lst =
// this doesn't work as expected, it waits for each reply
let agent = threadingLimitAgent 10
lst
|> List.map(fun x ->
agent.PostAndReply(
fun replyChannel -> Start(x, replyChannel)))
Now, with this in place, the original code would become:
let doWork() =
[1 .. 10]
|> List.map (fun i -> async {
do! Async.Sleep (100 * i) // longest thread will run 1 sec
return i * i // some complex calculation returning a certain type
})
|> worker // worker is not working (correct output, runs 5.5s)
All in all, the output is correct (it does calculate and propagate the replies back), but it does not do the work concurrently on the limited set of threads.
I've been playing around a bit, but I think I'm missing the obvious (and besides, who knows, someone may like the idea of a limited-threads mailbox processor that returns its calculations in order).
The problem is the call to agent.PostAndReply. PostAndReply blocks until the work has finished, so calling it inside List.map causes the work items to be executed sequentially. One solution is to use PostAndAsyncReply, which does not block and also returns an async handle for getting the result back.
let worker lst =
let agent = threadingLimitAgent 10
lst
|> List.map(fun x ->
agent.PostAndAsyncReply(
fun replyChannel -> Start(x, replyChannel)))
|> Async.Parallel
let doWork() =
[1 .. 10]
|> List.map (fun i -> async {
do! Async.Sleep (100 * i)
return i * i
})
|> worker
|> Async.RunSynchronously
That's of course only one possible solution (getting all async handles back and awaiting them in parallel).
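For completeness: if your FSharp.Core version is recent enough (4.7 or later, if I remember correctly), Async.Parallel itself accepts a throttling argument, which removes the need for a custom agent; a sketch:
let doWork() =
    [1 .. 10]
    |> List.map (fun i -> async {
        do! Async.Sleep (100 * i)
        return i * i
    })
    |> fun work -> Async.Parallel(work, maxDegreeOfParallelism = 10)
    |> Async.RunSynchronously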
I am trying to run some code 1 million times. I initially wrote it using Threads, but this seemed clunky. I did some more reading and came across ForkJoin. This seemed like exactly what I needed, but I can't figure out how to translate what I have below into "Scala style". Can someone explain the best way to use ForkJoin in my code?
val l = (1 to 1000000) map {_.toLong}
println("running......be patient")
l.foreach{ x =>
if(x % 10000 == 0) println("got to: "+x)
val thread = new Thread {
override def run {
//my code (API calls) here. writes to file if call success
}
}
}
The easiest way is to use par (it will use a ForkJoinPool automatically):
val l = (1 to 1000000).map(_.toLong).toList
l.par.foreach { x =>
  if (x % 10000 == 0) println("got to: " + x) // will be executed in parallel
  // your code (API calls) here. It will also run in parallel
  // (but on the same thread as the println above)
}
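If you also want to cap how many worker threads par uses, you can swap in a custom task support; a sketch, assuming Scala 2.12, where the parallel collections still ship with the standard library:
import scala.collection.parallel.ForkJoinTaskSupport
import java.util.concurrent.ForkJoinPool

val parList = l.par
parList.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4)) // at most 4 workers
parList.foreach { x =>
  // your code (API calls) here
}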
Another way is to use Future:
import scala.concurrent._
import ExecutionContext.Implicits.global // the global execution context, backed by a ForkJoinPool
val l = (1 to 1000000) map {_.toLong}
println("running......be patient")
l.foreach { x =>
if(x % 10000 == 0) println("got to: "+x)
Future {
//your code (API calls) here. writes to file if call success
}
}
If you need work stealing, you should mark blocking code with scala.concurrent.blocking:
Future {
scala.concurrent.blocking {
//blocking API call here
}
}
This tells the ForkJoinPool to compensate for the blocked thread with a new one, so you can avoid thread starvation (but there are some disadvantages).
In Scala, you can use Future:
import scala.concurrent._
import ExecutionContext.Implicits.global

val l = (1 to 1000000) map {
  _.toLong
}
println("running......be patient")
l.foreach { x =>
  if (x % 10000 == 0) println("got to: " + x)
  Future {
    println(x)
  }
}
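Note that the loop above only schedules the futures and returns immediately; if you need to block until every call has finished, one option is Future.sequence plus Await (a sketch, reusing the imports above):
import scala.concurrent.duration._

val futures = l.map { x =>
  Future {
    // your code (API calls) here
    x
  }
}
// block until all futures have completed
Await.result(Future.sequence(futures), Duration.Inf)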
How can I use the Timer class and timer events to turn this loop into one that executes chunks at a time?
My current method of just running the loop keeps freezing the Flash/AIR UI.
I'm trying to achieve pseudo-multithreading. Yes, this is from wavwriter.as:
// Write to file in chunks of converted data.
while (dataInput.bytesAvailable > 0)
{
tempData.clear();
// Resampling logic variables
var minSamples:int = Math.min(dataInput.bytesAvailable/4, 8192);
var readSampleLength:int = minSamples;//Math.floor(minSamples/soundRate);
var resampleFrequency:int = 100; // Every X frames drop or add frames
var resampleFrequencyCheck:int = (soundRate-Math.floor(soundRate))*resampleFrequency;
var soundRateCeil:int = Math.ceil(soundRate);
var soundRateFloor:int = Math.floor(soundRate);
var jlen:int = 0;
var channelCount:int = (numOfChannels-inputNumChannels);
/*
trace("resampleFrequency: " + resampleFrequency + " resampleFrequencyCheck: " + resampleFrequencyCheck
+ " soundRateCeil: " + soundRateCeil + " soundRateFloor: " + soundRateFloor);
*/
var value:Number = 0;
// Assumes data is in samples of float value
for (var i:int = 0;i < readSampleLength;i+=4)
{
value = dataInput.readFloat();
// Check for sanity of float value
if (value > 1 || value < -1)
throw new Error("Audio samples not in float format");
// Special case with 8bit WAV files
if (sampleBitRate == 8)
value = (bitResolution * value) + bitResolution;
else
value = bitResolution * value;
// Resampling Logic for non-integer sampling rate conversions
jlen = (resampleFrequencyCheck > 0 && i % resampleFrequency < resampleFrequencyCheck) ? soundRateCeil : soundRateFloor;
for (var j:int = 0; j < jlen; j++)
{
writeCorrectBits(tempData, value, channelCount);
}
}
dataOutput.writeBytes(tempData);
}
}
I once implemented pseudo multithreading in AS3 by splitting the task into chunks, instead of splitting the data into chunks.
My solution might not be optimal, but it worked well for me in the context of performing a large depth-first search while keeping the Flash game flowing smoothly.
Use a variable ticks to count computation "ticks", similar to CPU clock cycles. Every time you perform some operation, increment this counter by 1; increment it by more after a heavier operation.
In specific parts of your code, insert checkpoints where you check if ticks > threshold, where threshold is a parameter you want to tune after you have this pseudo multithreading working.
If ticks > threshold at the checkpoint, you save the current state of your task, set ticks to zero, then exit the function.
The method has to be retried later, so here you employ a Timer with an interval parameter that should also be tuned later.
When restarting the method, use the saved state of your paused task to detect where your task should be resumed.
For your specific situation, I would suggest splitting up the work of the for loops, instead of thinking about the while loop. The idea is to interrupt the for loops, remember their state, and then continue from there after the resting interval.
To simplify, imagine that we have only the outermost for loop. A sketch of the new method is:
WhileLoop: while (dataInput.bytesAvailable > 0 && ticks < threshold)
{
    if (!didSubTaskA) {
        // do subtask A...
        ticks += 2;
        didSubTaskA = true;
    }
    if (ticks > threshold) {
        ticks = 0;
        restTimer.reset();
        restTimer.start(); // This dispatches an event that should re-trigger this method
        break WhileLoop;
    }
    for (var i:int = next_unused_i; i < readSampleLength; i += 4) {
        next_unused_i = i + 4; // resume at the next sample; the loop steps by 4
        // do subtask B...
        ticks += 1;
        if (ticks > threshold) {
            ticks = 0;
            restTimer.reset();
            restTimer.start();
            break WhileLoop;
        }
    }
    next_unused_i = 0;
    didSubTaskA = false;
}
if (ticks > threshold) {
    ticks = 0;
    restTimer.reset();
    restTimer.start();
}
The variables ticks, threshold, restTimer, next_unused_i, and didSubTaskA are important and can't be local method variables; they could be static or class variables. Subtask A is the part where you set up the "resampling logic variables", and the variables used there can't be locals either, so make them class variables as well, so that their values persist when you leave and come back to the method.
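The restTimer wiring assumed in the sketch could look like this (interval and processChunk are placeholders for your tuned rest interval and the method containing the loop):
import flash.utils.Timer;
import flash.events.TimerEvent;

var restTimer:Timer = new Timer(interval, 1); // fire once per resting period
restTimer.addEventListener(TimerEvent.TIMER, onRestDone);

function onRestDone(e:TimerEvent):void {
    processChunk(); // re-enter the interrupted method shown above
}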
You can make it look nicer by creating your own Task class and storing the whole interrupted state of your "threaded" algorithm in it. You could also make the checkpoint a local function.
I didn't test the code above, so I can't guarantee it works, but the idea is basically that. I hope it helps!