I have ARC enabled in my app and noticed that if I create a ton of images, my app would crash. As part of my investigation, I created a small project that reproduces the issue which can be found here. The meat of the sample project code is the following:
int width = 10;
int height = 10;
uint8_t data[100];
while (true)
{
    CGColorSpaceRef colorspace = CGColorSpaceCreateDeviceGray();
    CGContextRef context = CGBitmapContextCreate(data, width, height, 8, width, colorspace, kCGBitmapByteOrderDefault | kCGImageAlphaNone);
    CGImageRef cgimage = CGBitmapContextCreateImage(context);
    // Remove this line and memory is stable; keep it and you lose 15-20 MB per second. Why?
    UIImage* uiimage = [UIImage imageWithCGImage:cgimage];
    CGImageRelease(cgimage);
    CGContextRelease(context);
    CGColorSpaceRelease(colorspace);
}
While running this code, the sidebar in Xcode will show the total memory of the app increasing at around 15-20 MB per second. If you comment out the line that creates the UIImage, the leak disappears.
There are a number of questions on Stack Overflow about whether or not you should release a CGImage after creating a UIImage via imageWithCGImage, and it doesn't look like there is a real consensus. However, if I don't call CGImageRelease(cgimage), then the memory usage increases by over 100 MB per second, so I'm certain that manually releasing the image is the correct thing to do.
Since I have ARC enabled, I tried setting uiimage to nil after releasing everything, which didn't work. Not storing the return value of the call to imageWithCGImage: also doesn't prevent the leak.
Is there something fundamental I'm missing about how to use Core Graphics?
Is there something fundamental I'm missing about how to use Core Graphics?
It seems more likely that you are missing something fundamental about memory management.
Many Foundation / Cocoa framework methods, especially convenience constructors that hand you a ready-made object, return objects that are autoreleased. That means you don't have to release the object yourself; it will be released later, automatically. But how is that possible? Such objects go into the autorelease pool, and the pool is drained, releasing everything in it, later on, when there is an opportunity. But you are looping continuously, so there is no such opportunity. So you need to wrap your troublesome line in an @autoreleasepool { ... } block so as to construct and drain your own pool.
Also note that there can be intermediate autoreleased objects of which you are unaware. The autorelease pool can help with those too.
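For instance, a minimal sketch of the loop from the question, using the same variables, with a pool drained on every iteration:

while (true)
{
    @autoreleasepool
    {
        CGColorSpaceRef colorspace = CGColorSpaceCreateDeviceGray();
        CGContextRef context = CGBitmapContextCreate(data, width, height, 8, width, colorspace, kCGBitmapByteOrderDefault | kCGImageAlphaNone);
        CGImageRef cgimage = CGBitmapContextCreateImage(context);
        UIImage* uiimage = [UIImage imageWithCGImage:cgimage]; // autoreleased into the inner pool
        CGImageRelease(cgimage);
        CGContextRelease(context);
        CGColorSpaceRelease(colorspace);
    } // the pool drains here, releasing the UIImage each time around
}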
See this section of my book for more information about autoreleased objects.
I am trying to access pixel data and save images from an in-game camera to disk. Initially, the simple approach was to use a render target and subsequently RenderTarget->ReadPixels(), but as the native implementation of ReadPixels() contains a call to FlushRenderingCommands(), it would block the game thread until the image is saved. Being a computationally intensive operation, this was lowering my FPS way too much.
To solve this problem, I am trying to create a dedicated thread that can access the camera as a CaptureComponent and then follow a similar approach. But since FlushRenderingCommands() can only be called from the game thread, I had to rewrite ReadPixels() without that call, in a non-blocking way of sorts, inspired by the tutorial at https://wiki.unrealengine.com/Render_Target_Lookup. Even then, I am facing a problem with my in-game FPS being jerky whenever an image is saved (I confirmed this is not because of the actual saving-to-disk operation, but because of the pixel data access). My rewritten ReadPixels() function looks as below; I was hoping to get some suggestions as to what could be going wrong here. I am not sure whether ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER can be called from a non-game thread, and whether that's part of my problem.
APIPCamera* cam = GameThread->CameraDirector->getCamera(0);
USceneCaptureComponent2D* capture = cam->getCaptureComponent(EPIPCameraType::PIP_CAMERA_TYPE_SCENE, true);
if (capture != nullptr) {
    if (capture->TextureTarget != nullptr) {
        FTextureRenderTargetResource* RenderResource = capture->TextureTarget->GetRenderTargetResource();
        if (RenderResource != nullptr) {
            width = capture->TextureTarget->GetSurfaceWidth();
            height = capture->TextureTarget->GetSurfaceHeight();
            // Read the render target surface data back.
            struct FReadSurfaceContext
            {
                FRenderTarget* SrcRenderTarget;
                TArray<FColor>* OutData;
                FIntRect Rect;
                FReadSurfaceDataFlags Flags;
            };
            bmp.Reset();
            FReadSurfaceContext ReadSurfaceContext =
            {
                RenderResource,
                &bmp,
                FIntRect(0, 0, RenderResource->GetSizeXY().X, RenderResource->GetSizeXY().Y),
                FReadSurfaceDataFlags(RCM_UNorm, CubeFace_MAX)
            };
            ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER(
                ReadSurfaceCommand,
                FReadSurfaceContext, Context, ReadSurfaceContext,
                {
                    RHICmdList.ReadSurfaceData(
                        Context.SrcRenderTarget->GetRenderTargetTexture(),
                        Context.Rect,
                        *Context.OutData,
                        Context.Flags
                    );
                });
        }
    }
}
EDIT: One more thing I have noticed is that the stuttering goes away if I disable HDR in my render target settings (but this results in low-quality images), so it seems plausible that the size of the image is still blocking one of the core threads because of the way I am implementing it.
It should be possible to call ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER from any thread, since underneath it dispatches through the Task Graph. You can see this when you analyze the code this macro generates:
if(ShouldExecuteOnRenderThread())
{
    CheckNotBlockedOnRenderThread();
    TGraphTask<EURCMacro_##TypeName>::CreateTask().ConstructAndDispatchWhenReady(ParamValue1);
}
You should be cautious about accessing UObjects (like USceneCaptureComponent2D) from different threads, because these are managed by the garbage collector and owned by the game thread.
(...) but even then I am facing a problem with my in-game FPS being jerky whenever an image is saved
Did you check which thread is causing the FPS drop, with the stat unit or stat unitgraph console command? You could also use the profiling tools to get more detailed insight and to make sure there are no other causes of the lag.
Edit:
I've found yet another method of accessing pixel data. Try this without actually copying the data in the for loop, and check if there is any improvement in FPS. It could be a bit faster because there is no pixel manipulation/conversion in between.
@property (nonatomic, retain) CMMotionManager *motionManager; // declare as a property.
motionManager = [[CMMotionManager alloc] init]; // init it.
motionManager.accelerometerUpdateInterval = 0.2f;
Everything below runs on a thread (not the main thread).
- (void)handle {
    [motionManager startAccelerometerUpdates];
    while (1) {
        CMAcceleration acceleration = motionManager.accelerometerData.acceleration;
        NSLog(@"%f %f %f", acceleration.x, acceleration.y, acceleration.z);
        sleep(0.5); // note: sleep() takes whole seconds, so 0.5 is truncated to 0
    }
}
When I run the app in Xcode -> Instruments, I found the live memory increasing uninterruptedly, until the app received a memory warning and was killed by the system.
I tried to release the accelerometerData in the while block, but it had no effect: [motionManager.accelerometerData release];
I don't wish to use startAccelerometerUpdatesToQueue:[NSOperationQueue currentQueue] withHandler:^(CMAccelerometerData *accelerometerData, NSError *error), because I want to run it in the background, and this block stops functioning when the app is suspended.
Can anyone help me?
Screenshot image: http://img.blog.csdn.net/20130702120140593
Are you using ARC? You should use it.
You must also make sure your background thread has an autorelease pool in place. The syntax for creating an autorelease pool with ARC enabled is @autoreleasepool { ... }, and the syntax without ARC is NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init]; ... [pool release];.
Apple has excellent documentation on how autorelease pools work. One is created automatically for the main thread, but you must manually create it for background threads. You need to spend a couple of hours learning how it works; it's mandatory learning for any Obj-C programmer.
Without seeing all your code, I can't tell you how it needs to work... but most likely the block or method you create the thread with needs its contents wrapped in an autorelease pool, and the contents of your while loop need a second autorelease pool.
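For instance, here is a minimal sketch of that structure applied to the -handle method from the question (untested, but it shows where the two pools go):

- (void)handle
{
    @autoreleasepool { // pool for the thread's whole lifetime
        [motionManager startAccelerometerUpdates];
        while (1) {
            @autoreleasepool { // inner pool, drained every iteration
                CMAcceleration acceleration = motionManager.accelerometerData.acceleration;
                NSLog(@"%f %f %f", acceleration.x, acceleration.y, acceleration.z);
                [NSThread sleepForTimeInterval:0.5];
            }
        }
    }
}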
EDIT: now that I've seen your code, here is an example of how @autoreleasepool must be used to avoid leaking memory. I added lines 6 and 23 to this code: https://gist.github.com/abhibeckert/5907754
I haven't tested, but that should solve your problem. It will definitely leak memory without those autorelease pools.
Basically, if you have a background thread or a long while loop, each needs to have its own autorelease pool. I recommend reading this: http://developer.apple.com/library/ios/#documentation/cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html
I've been looking at D today and on the surface it looks quite amazing. I like how it includes many higher-level constructs directly in the language, so silly hacks or terse methods don't have to be used. One thing that really worries me is the GC. I know this is a big issue and have read many discussions about it.
My own simple tests, sprouted from a question here, show that the GC is extremely slow: over 10 times slower than straight C++ doing the same thing. (Obviously the test does not directly translate to the real world, but the performance hit is extreme and would slow down real-world apps that behave similarly, i.e. by allocating many small objects quickly.)
I'm looking into writing a real-time, low-latency audio application, and it is possible that the GC will hurt its performance enough to make it nearly useless. In a sense, if it has any issues it will ruin the real-time audio aspect, which is much more crucial since, unlike graphics, audio runs at a much higher frame rate (44000+ samples per second vs 30-60 frames). (Due to its low latency, it is more crucial than a standard audio player, which can buffer significant amounts of data.)
Disabling the GC improved the results to within about 20% of the C++ code. This is significant. I'll give the code at the end for analysis.
My questions are:
How difficult is it to replace D's GC with a standard smart-pointer implementation, so that libraries that rely on the GC can still be used? If I remove the GC completely, I'll lose a lot of grunt work, as D already has limited libraries compared to C++.
Does GC.disable only halt garbage collection temporarily (preventing the GC thread from running), and does GC.enable pick back up where it left off? If so, I could potentially disable the GC in high-CPU-usage moments to prevent latency issues.
Is there any way to enforce a pattern of not using the GC consistently? (This is because I've not yet programmed in D, and when I start writing my classes that do not use the GC, I would like to be sure I don't forget to implement their own cleanup.)
Is it possible to replace the GC in D easily? (Not that I want to, but it might be fun to play around with different methods of GC one day... this is similar to 1, I suppose.)
What I'd like to do is trade memory for speed. I do not need the GC to run every few seconds. In fact, if I can properly implement my own memory management for my data structures, then chances are it will not need to run very often at all. I might need to run it only when memory becomes scarce. From what I've read, though, the longer you wait to call it, the slower it will be. Since there generally will be times in my application where I can get away with calling it without issues, this will help alleviate some of the pressure (but then again, there might be hours when I won't be able to call it).
I am not worried about memory constraints as much. I'd prefer to "waste" memory over speed(up to a point, of course). First and foremost is the latency issues.
From what I've read, I can, at the very least, go the route of C/C++ as long as I don't use any libraries or language constructs that rely on the GC. The problem is, I do not know which ones do. I've seen strings, new, etc. mentioned, but does that mean I can't use the built-in strings if I don't enable the GC?
I've read in some bug reports that the GC might be really buggy and that could explain its performance problems?
Also, D uses a bit more memory; in fact, D runs out of memory before the C++ program does. I guess it is about 15% more or so in this case. I suppose that is because of the GC.
I realize the following code is not representative of your average program, but what it says is that when programs are instantiating a lot of objects (say, at startup), they will be much slower (10 times is a large factor). If the GC could be "paused" at startup, then it wouldn't necessarily be an issue.
What would really be nice is if I could somehow have the compiler automatically GC a local object if I do not specifically deallocate it. This would almost give the best of both worlds.
e.g.,
{
    Foo f = new Foo();
    ....
    dispose f; // Causes f to be disposed of immediately and treats f outside the GC.
               // If left out, then f is passed to the GC.
               // I suppose this might actually end up creating two kinds of Foo
               // behind the scenes.
    Foo g = new manualGC!Foo(); // Maybe something like this will keep the GC's hands off
                                // g and allow it to be manually disposed of.
}
In fact, it might be nice to actually be able to associate different types of GCs with different types of data, with each GC being completely self-contained. This way I could tailor the performance of the GC to my types.
Code:
module main;

import std.stdio, std.conv, core.memory;
import core.stdc.time;

class Foo
{
    int x;
    this(int _x) { x = _x; }
}

void main(string[] args)
{
    clock_t start, end;
    double cpu_time_used;

    //GC.disable();
    start = clock();

    //int n = to!int(args[1]);
    int n = 10000000;
    Foo[] m = new Foo[n];
    foreach (i; 0 .. n)
    //for (int i = 0; i < n; i++)
    {
        m[i] = new Foo(i);
    }

    end = clock();
    cpu_time_used = (end - start);
    cpu_time_used = cpu_time_used / 1000.0;
    writeln(cpu_time_used);
    getchar();
}
C++ code
#include <cstdlib>
#include <iostream>
#include <time.h>
#include <math.h>
#include <stdio.h>

using namespace std;

class Foo {
public:
    int x;
    Foo(int _x);
};

Foo::Foo(int _x) {
    x = _x;
}

int main(int argc, char** argv) {
    int n = 120000000;
    clock_t start, end;
    double cpu_time_used;

    start = clock();
    Foo** gx = new Foo*[n];
    for (int i = 0; i < n; i++) {
        gx[i] = new Foo(i);
    }
    end = clock();

    cpu_time_used = (end - start);
    cpu_time_used = cpu_time_used / 1000.0;
    cout << cpu_time_used;
    std::cin.get();
    return 0;
}
D can use pretty much any C library, just define the functions needed. D can also use C++ libraries, but D does not understand certain C++ constructs. So... D can use almost as many libraries as C++. They just aren't native D libs.
From D's Library reference.
core.memory:
static nothrow void disable();
Disables automatic garbage collections performed to minimize the process footprint. Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition. This function is reentrant, but enable must be called once for each call to disable.
static pure nothrow void free(void* p);
Deallocates the memory referenced by p. If p is null, no action occurs. If p references memory not originally allocated by this garbage collector, or if it points to the interior of a memory block, no action will be taken. The block will not be finalized regardless of whether the FINALIZE attribute is set. If finalization is desired, use delete instead.
static pure nothrow void* malloc(size_t sz, uint ba = 0);
Requests an aligned block of managed memory from the garbage collector. This memory may be deleted at will with a call to free, or it may be discarded and cleaned up automatically during a collection run. If allocation fails, this function will call onOutOfMemory which is expected to throw an OutOfMemoryError.
So yes. Read more here: http://dlang.org/garbage.html
And here: http://dlang.org/memory.html
If you really need classes, look at this: http://dlang.org/memory.html#newdelete
delete has been deprecated, but I believe you can still free() it.
Don't use classes, use structs. Structs are stack-allocated; classes are heap-allocated. Unless you need polymorphism or the other things classes support, they are overhead for what you are doing. You can use malloc and free if you want to.
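A minimal sketch of the difference:

struct PointS { int x, y; }
class  PointC { int x, y; }

void demo()
{
    PointS s;               // value type: lives on the stack, no GC involvement
    PointC c = new PointC;  // reference type: GC heap allocation plus object header
}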
More or less... fill out the function definitions here: https://github.com/D-Programming-Language/druntime/blob/master/src/gcstub/gc.d . There's a GC proxy system set up to allow you to customize the GC. So it's not like it is something that the designers do not want you to do.
Little GC knowledge here:
The garbage collector is not guaranteed to run the destructor for all unreferenced objects. Furthermore, the order in which the garbage collector calls destructors for unreferenced objects is not specified. This means that when the garbage collector calls a destructor for an object of a class that has members that are references to garbage-collected objects, those references may no longer be valid. This means that destructors cannot reference sub-objects. This rule does not apply to auto objects or objects deleted with the DeleteExpression, as the destructor is not being run by the garbage collector, meaning all references are valid.
import std.c.stdlib; that should have malloc and free.
import core.memory; this has GC.malloc, GC.free, GC.addRoot, GC.addRange (register external memory with the GC), and more.
Strings require the GC because they are dynamic arrays of immutable chars (immutable(char)[]). Dynamic arrays require the GC; static arrays do not.
If you want manual management, go ahead.
import std.c.stdlib;
import core.memory;

char* one = cast(char*) GC.malloc(char.sizeof * 8); // 8 bytes from the GC heap
GC.free(one); // freed manually, without waiting for a collection
Why create a wrapper class for an int? You are doing nothing more than slowing things down and wasting memory.
class Foo { int n; this(int _n) { n = _n; } }
writeln(Foo.sizeof); // it's 8 bytes, btw
writeln(int.sizeof); // it's *half* the size of Foo: 4 bytes

Foo[] m; // = new Foo[n]; // 8 sec
m.length = n; // 7 sec; a minor optimization, at least on my machine
foreach (i; 0..n)
    m[i] = new Foo(i);

int[] m; // default-initialized to 0
m.length = n;
foreach (i; 0..n)
    m[i] = i; // .145 sec
If you really need to, then write the Time-sensitive function in C, and call it from D.
Heck, if time is really that big of a deal, use D's inline assembly to optimize everything.
I suggest you read this article: http://3d.benjamin-thaut.de/?p=20
There you will find a version of the standard library that does its own memory management and completely avoids garbage collection.
D's GC simply isn't as sophisticated as others like Java's. It's open-source so anyone can try to improve it.
There is an experimental concurrent GC named CDGC and there is a current GSoC project to remove the global lock: http://www.google-melange.com/gsoc/project/google/gsoc2012/avtuunainen/17001
Make sure to use LDC or GDC for compilation to get better optimized code.
The XomB project also uses a custom runtime but it's D version 1 I think.
http://wiki.xomb.org/index.php?title=Main_Page
You can also just allocate all the memory blocks you need, then use a memory pool to get blocks without the GC.
And by the way, it's not as slow as you mentioned. And GC.disable() doesn't really disable it.
We might look at the problem from a somewhat different view. Suboptimal performance when allocating many little objects, which you mention as a rationale for the question, has little to do with the GC alone. Rather, it's a matter of balance between general-purpose (but suboptimal) and highly performant (but task-specialized) memory management tools. The idea is: the presence of a GC doesn't prevent you from writing a real-time app; you just have to use more specific tools (say, object pools) for special cases.
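As an illustration, here is a minimal sketch of such a pool in D, for a struct type T (the Pool type and its methods are hypothetical, not a library API): every slot is allocated once up front, and acquire/release never touch the GC.

struct Pool(T, size_t N)
{
    private T[N] storage;   // all objects, allocated together with the pool
    private T*[N] freeList; // stack of pointers to free slots
    private size_t top;

    void initialize()
    {
        foreach (i, ref slot; storage)
            freeList[i] = &slot;
        top = N;
    }

    // Returns null when the pool is exhausted instead of allocating.
    T* acquire() { return top ? freeList[--top] : null; }

    void release(T* p) { freeList[top++] = p; }
}

A real-time audio path would acquire and release from a pool like this inside the processing loop, and fall back to general-purpose allocation only outside the latency-critical sections.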
Since this hasn't been closed yet, recent versions of D have the std.container library which contains an Array data structure that is significantly more efficient with respect to memory than the built-in arrays. I can't confirm that the other data structures in the library are also efficient, but it may be worth looking into if you need to be more memory conscious without having to resort to manually creating data structures that don't require garbage collection.
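A minimal usage sketch, assuming a current compiler and Phobos:

import std.container : Array;

void main()
{
    Array!int a;        // malloc-backed storage with deterministic lifetime
    a.reserve(1000);    // one allocation up front
    foreach (i; 0 .. 1000)
        a.insertBack(i);
    assert(a.length == 1000);
}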
D is constantly evolving. Most of the answers here are 9+ years old, so I figured I'd answer these questions again for anyone curious what the current situation is.
(...) replace D's GC with a standard smart pointers implementation so that libraries that rely on the GC can still be used. (...)
Replacing the GC itself with smart pointers is not something I've looked into (i.e. where new creates a smart pointer). There are several D libraries that add smart pointers. You can interface with any C library. Interfacing with C++ and even Objective-C is also supported to some degree, so that should cover you pretty well.
Does GC.disable only halt the garbage collection temporarily (preventing the GC thread from running) and GC.enable pick back up where it left off. (...)
"Collections may continue to occur in instances where the implementation deems necessary for correct program behaviour, such as during an out of memory condition."
[source]
So mostly, yes. You can also manually invoke collection during down-time.
Is there any way to enforce a pattern to not use GC consistently. (...) when I start writing my classes that do not use the GC I would like to (...)
Classes are always allocated on the GC heap and are reference types. Structs should be used instead. However, keep in mind that structs are value types, so by default they're copied when being moved. You can @disable the copy constructor if you don't like this behaviour, but then your struct won't be POD.
What you're probably looking for is @nogc, which is a function attribute that stops a function from using the GC. You can't mark a struct type as @nogc, but you can mark each of its methods as @nogc. Just keep in mind that @nogc code can't call GC code. There's also nothrow.
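A minimal sketch of the attribute in use (the sum function is hypothetical):

@nogc nothrow int sum(const(int)[] data)
{
    int total = 0;
    foreach (x; data)
        total += x; // any GC allocation in this body would be a compile error
    return total;
}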
If you intend to never use GC, you ought to look into Better C. It's a D language setting that removes all of D's runtime, standard library (Phobos), GC and all GC-reliant features (namely associative arrays and exceptions) in favour of using C's runtime and the C Standard Library.
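A minimal Better C sketch (assuming dmd; compile with dmd -betterC):

import core.stdc.stdio : printf;

extern(C) void main()
{
    printf("no D runtime, no GC\n"); // only the C runtime is linked in
}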
Is it possible to replace the GC in D (...)
Yes it is: https://dlang.org/spec/garbage.html#gc_registry
And you can configure the pre-existing GC to better suit your needs if you don't want to make your own GC.
Dear community, I would like to understand a little task which should help me improve my application's performance.
I have an array of dictionaries in a singleton, with NSDictionary objects and the keys:
code
country
specific
I have to retrieve the country and specific values from this array.
My first version of the application used a predicate, but later I found a lot of memory leaks and performance issues with that approach. The application was too slow and didn't empty the memory stack quickly; it grew to around 1 GB and crashed.
My second version was a little more complicated. I filled the array in the singleton with one object per code, and used the function you can see below.
- (void)codeIsSame:(NSArray *)codeForCheck
{
    //@synchronized(self) {
    NSString *code = [codeForCheck objectAtIndex:0];
    if ([_code isEqualToString:code])
    {
        code = nil;
        NSUInteger queneNumberInt = [[codeForCheck objectAtIndex:1] intValue];
        NSLog(@"We match code:%@ country:%@ specific:%@ quene:%lu", _code, _country, _specific, queneNumberInt);
        [[ProjectArrays sharedProjectArrays].arrayDictionaryesForCountryCodesResult insertObject:_result atIndex:queneNumberInt];
    }
    code = nil;
    //}
    return;
}
The way to retrieve the necessary values is:
SEL selector = @selector(codeIsSame:);
[[ProjectArrays sharedProjectArrays].myCountrySpecificCodeListWithClass makeObjectsPerformSelector:selector withObject:codePlusQueueNumber];
This version works much better: no memory leaks, and it is very quick, but it is too hard to debug. Sometimes I receive an empty result; I tried to synchronize the thread jobs, but it still doesn't work stably. The main problem with this approach is that, for some strange reason, I sometimes don't have a result in my singleton array. I tried to debug it, using the array index for different threads, and found that the class simply missed the answer.
Core Data doesn't allow me to make a copy of the main MOC, and I can't use it for a multithreaded design (lock and unlock is not a good idea, and it produces too many errors in the lock/unlock part of the code).
Maybe somebody can suggest what I could do better in this case? I need a solution that will work stably and be easy to code and understand.
My current solution uses an NSDictionary whose keys are the codes; under each code I have a dictionary with country/specific. It works fine as well, but it doesn't solve the main task: using Core Data when you need access to the same data from too many threads.
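For reference, a minimal sketch of that lookup, where codeIndex is a hypothetical name for the singleton dictionary; one hash lookup replaces scanning the whole array with a predicate:

// codeIndex maps code -> @{ @"country" : ..., @"specific" : ... }
NSDictionary *entry = codeIndex[code];
NSString *country = entry[@"country"];
NSString *specific = entry[@"specific"];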
I remember someone telling me a good one, but I cannot remember it. I spent the last 20 minutes with Google trying to learn more.
What are examples of bad/not-great code that causes a performance hit due to garbage collection?
From an old Sun tech tip: sometimes it helps to explicitly nullify references in order to make them eligible for garbage collection earlier.
public class Stack {
    private static final int MAXLEN = 10;
    private Object stk[] = new Object[MAXLEN];
    private int stkp = -1;

    public void push(Object p) { stk[++stkp] = p; }

    public Object pop() { return stk[stkp--]; }
}
Rewriting the pop method in this way helps ensure that garbage collection gets done in a timely fashion:
public Object pop() {
    Object p = stk[stkp];
    stk[stkp--] = null; // drop the stack's reference so the object can be collected
    return p;
}
What are examples of bad/not great code that causes a performance hit due to garbage collection ?
The following will be inefficient when using a generational garbage collector:
Mutating references in the heap, because write barriers are significantly more expensive than pointer writes. Consider replacing heap allocation and references with an array of value types and an integer index into the array, respectively (see the sketch below).
Creating long-lived temporaries. When they survive the nursery generation they must be marked, copied, and all pointers to them updated. If it is possible to coalesce updates in order to reuse an old version of a collection, do so.
Complicated heap topologies. Again, consider replacing many references with indices.
Deep thread stacks. Try to keep stacks shallow to make it easier for the GC to collate the global roots.
However, I would not call these "bad" because there is nothing objectively wrong with them. They are only inefficient when used with this kind of garbage collector. With manual memory management, none of the issues arise (although many are replaced with equivalent issues, e.g. performance of malloc vs pool allocators). With other kinds of GC some of these issues disappear, e.g. some GCs don't have a write barrier, mark-region GCs should handle long-lived temporaries better, not all VMs need thread stacks.
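As a sketch of the first point, in Java (the Node and NodePool types are hypothetical): a linked structure can be flattened into parallel primitive arrays, so mutating a link becomes a plain int store with no write barrier.

// Pointer-chasing version: every write to 'next' goes through a write barrier,
// and each node is a separate heap object the collector must trace.
class Node {
    int value;
    Node next;
}

// Index-based version: two flat arrays; the collector sees only two objects.
class NodePool {
    final int[] value;
    final int[] next; // -1 plays the role of null

    NodePool(int capacity) {
        value = new int[capacity];
        next = new int[capacity];
    }

    void link(int from, int to) {
        next[from] = to; // a plain int store, not a reference mutation
    }
}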
When you have some loop involving the creation of new object instances: if the number of cycles is very high, you produce a lot of garbage, causing the garbage collector to run more frequently and so decreasing performance.
One example would be object references that are kept in member variables or static variables. Here is an example:
class Something {
    static HugeInstance instance = new HugeInstance();
}
The problem is that the garbage collector has no way of knowing when this instance is not needed anymore. So it's usually better to keep things in local variables and have small functions.
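A minimal sketch of the contrast (HugeInstance and its process method are hypothetical):

class Something {
    // Pinned for the lifetime of the class; the GC can never reclaim it.
    static HugeInstance pinned = new HugeInstance();

    void doWork() {
        // Scoped to this call; eligible for collection once doWork returns.
        HugeInstance local = new HugeInstance();
        local.process();
    }
}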
String foo = new String("a" + "b" + "c");
I understand Java is better about this now, but in the early days that would involve the creation and destruction of 3 or 4 string objects.
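For comparison, a sketch of the usual fix, which builds the string in one mutable buffer instead of via intermediate String objects:

StringBuilder sb = new StringBuilder();
sb.append("a").append("b").append("c");
String foo = sb.toString();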
I can give you an example that will work with the .NET CLR GC:
If you override the Finalize method of a class and do not call the base class's Finalize method, such as
protected override void Finalize() {
    Console.WriteLine("Im done");
    //base.Finalize(); // => you should call it!
}
When you resurrect an object by accident:
protected override void Finalize() {
    Application.ObjHolder = this; // resurrects the object
}

class Application {
    static public object ObjHolder;
}
When you use an object that has a finalizer, it takes two GC collections to get rid of the data, and in either of the above cases you won't delete it.
frequent memory allocations
lack of memory reuse (when dealing with large memory chunks)
keeping objects longer than needed (keeping references to obsolete objects)
In most modern collectors, any use of finalization will slow the collector down. And not just for the objects that have finalizers.
Your custom service does not have a load limiter on it, so:
A lot of requests come in for some reason at the same time (say, everyone logs on in the morning).
The service takes longer to process each request, as it now has hundreds of threads (one per request).
Yet more partly-processed requests build up due to the longer processing time.
Each partly-processed request has created lots of objects that live until the end of processing that request.
The garbage collector spends lots of time trying to free memory; however, it can't, due to the above.
Yet more partly-processed requests build up due to the longer processing time... (including time spent in GC).
I encountered a nice example while doing some parallel cell-based simulation in Python. Cells are initialized and sent to worker processes after pickling for running. If you have too many cells at any one time, the master node runs out of RAM. The trick is to make a limited number of cells, pack them, and send them off to cluster nodes before making some more, remembering to set the objects already sent off to None. This allows you to perform large simulations using the total RAM of the cluster in addition to its computing power.
The application here was a cell-based fire simulation; only the cells actively burning were kept as objects at any one time.