Lua's GC and realtime games

As far as I know, a tracing GC can't avoid blocking the thread during a full collection.
I used XNA with C#, and the GC pauses were impossible to eliminate. So I switched to a lower-level language, C, but then realized I need a scripting language. I'm considering Lua, but I'm worried about its GC mechanism. Lua uses an incremental tracing GC, so it should block the thread as well.
So how should I handle this in a realtime game?

The power of Lua is that it gets out of your way. Want classes? They can be built with metatables. Want sandboxing? Use lua_setfenv.
As for the garbage collector, use it as is first. If you later find performance issues, use lua_gc to fine-tune its behavior.
Some examples:
Disable the garbage collector during those times when a slowdown would be a problem.
Leave the garbage collector disabled and only step it when game logic says you've got some headroom on your FPS count. You could pre-tune the step size, or discover the optimal step size at runtime.
Disable the collector and perform a full collection at stopping points, i.e. a load screen, a cut scene, or a turn change in a hot-seat game.
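The second and third strategies can be sketched as a frame loop. Lua's real controls are the lua_gc calls; the sketch below uses Python's gc module purely as a stand-in to show the control flow. FRAME_BUDGET, GC_RESERVE, has_headroom and run_frames are hypothetical names chosen for this illustration, and the 2 ms reserve is an assumed value you would tune.

```python
import gc
import time

FRAME_BUDGET = 1 / 60   # seconds per frame at an assumed 60 FPS target
GC_RESERVE = 0.002      # assumed: only collect if at least 2 ms is left


def has_headroom(elapsed, budget=FRAME_BUDGET, reserve=GC_RESERVE):
    """True when enough of the frame budget is left to afford GC work."""
    return budget - elapsed >= reserve


def run_frames(update, frames):
    """Run `frames` iterations of `update`, collecting only in slack time."""
    collections = 0
    was_enabled = gc.isenabled()
    gc.disable()                      # no automatic cycles during gameplay
    try:
        for _ in range(frames):
            start = time.perf_counter()
            update()
            if has_headroom(time.perf_counter() - start):
                gc.collect(0)         # cheapest pass: youngest generation only
                collections += 1
    finally:
        gc.collect()                  # full collection at a "stopping point"
        if was_enabled:
            gc.enable()
    return collections
```

In Lua the same shape would use a GC step call where this uses gc.collect(0) and a full collect at the load screen; the decision logic in has_headroom is the part that carries over.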
You might also consider an alternative scripting language. Squirrel tries very hard to be a second-generation Lua. It tries to keep all of Lua's good features while ditching its design mistakes. One of the big differences between the two is that Squirrel uses reference counting instead of a tracing garbage collector. It turns out that reference counting can be a bit slower than tracing collection, but it is very deterministic (AKA realtime).

The correct way to handle this is:
Write a small prototype with just the core things that you want to test.
Profile it a lot, reproducing the different scenarios that could happen in your game (lots of memory available, little memory available, different numbers of threads, that kind of thing).
If you don't find a visible bottleneck, you can use Lua. Otherwise, you will have to look for alternative solutions (maybe Lisp or JavaScript).

You can patch the Lua GC so as to time-limit each collection cycle. E.g: http://www.altdevblogaday.com/2011/07/23/predictable-garbage-collection-with-lua/
I believe that it is still possible to have long GC step times when collecting very large tables. Therefore you need to adopt a programming style that avoids large tables.
The following article discusses two strategies for using Lua for real-time robot control (1. don't generate garbage, or 2. use an O(1) allocator and tune when GC collection is run):
https://www.osadl.org/?id=1117

Related

What is the advantage of runtime GC over compile-time ARC?

Some newer languages are implementing ARC into their compilers (Swift and Rust, to name a couple). As I understand this achieves the same thing as runtime GC (taking the burden of manual deallocation away from the programmer), while being significantly more efficient.
I understand that ARC could become a complex process, but with the complexity of modern garbage collectors it seems like it would be no more complex to implement ARC. However, there are still tons of languages and frameworks using GC for memory management, and even the Go language, which targets systems programming, uses GC.
I really cannot understand why GC would be preferable to ARC. Am I missing something here?
There are a bunch of tradeoffs involved here; it's a complex topic. Here are the big ones, though:
GC pros:
Tracing garbage collectors can handle cycles in object graphs. Automatic reference counting will leak memory unless cycles are manually broken either by removing a reference or figuring out which edge of the graph should be weak. This is quite a common problem in practice in reference counted apps.
Tracing garbage collectors can actually be moderately faster (in terms of throughput) than reference counting, by doing work concurrently, by batching work up, by deferring work, and by not messing up caches touching reference counts in hot loops.
Copying collectors can compact the heap, reclaiming fragmented pages to reduce footprint.
ARC pros:
Because object destruction happens immediately when the reference count hits 0, object lifetimes can be used to manage non-memory resources. With garbage collection, lifetimes are non-deterministic, so this isn't safe.
Collection work is typically more spread out, resulting in much shorter pauses (it's still possible to get a pause if you deallocate a large subgraph of objects).
Because memory is collected synchronously, it's not possible to "outrun the collector" by allocating faster than it can clean up. This is particularly important when VM paging comes into play, since there are degenerate cases where the GC thread hits a page that's been paged out, and falls far behind.
On a related note, tracing garbage collectors have to walk the entire object graph, which forces unnecessary page-ins (there are mitigations for this like https://people.cs.umass.edu/~emery/pubs/f034-hertz.pdf, but they're not widely deployed).
Tracing garbage collectors typically need more "scratch space" than reference counting if they want to hit their full throughput.
My personal take on this is that the only two points that really matter for most cases are:
ARC doesn't collect cycles
GC doesn't have deterministic lifetimes
I feel that both of these issues are deal breakers, but in the absence of a better idea, you just have to pick which horrifying problem sounds worse to you.
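Both points can be observed directly in CPython, which combines reference counting with a separate tracing cycle collector. The sketch below is specific to CPython (other Python implementations may behave differently), and Node is a hypothetical class used only for this demonstration. With the cycle collector switched off, an acyclic object dies the instant its last reference goes away, while a two-object cycle survives until a tracing pass runs.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()  # switch off the cycle collector so only reference counting runs

# Acyclic object: destroyed the moment the last reference goes away.
a = Node()
a_alive = weakref.ref(a)
del a
acyclic_freed_immediately = a_alive() is None

# Two objects that reference each other: their counts never reach zero.
x, y = Node(), Node()
x.other, y.other = y, x
x_alive = weakref.ref(x)
del x, y
cycle_leaked = x_alive() is not None

gc.collect()  # a tracing pass finds the unreachable cycle and frees it
cycle_freed_by_tracing = x_alive() is None
gc.enable()
```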

Using D for a realtime application?

I am considering using D for my ongoing graphics engine. The one thing that puts me off is the GC.
I am still a young programmer and I probably have a lot of misconceptions about GCs, and I hope you can clarify some concerns.
I am aiming for low latency, and timing in general is crucial. From what I know, GCs are pretty unpredictable. For example, my application could render a frame every 16.6ms, and when the GC kicks in the frame time could go up to any number, like 30ms, because it is not deterministic, right?
I read that you can turn off the GC in D, but then you can't use the majority of D's standard library and the GC is not completely off. Is this true?
Do you think it makes sense to use D in a timing crucial application?
Short answer: it requires a lot of customization and can be really difficult if you are not an experienced D developer.
List of issues:
Memory management itself is not that big a problem. In real-time applications you never ever want to allocate memory in the main loop. Having pre-allocated memory pools for all main data is pretty much the de facto standard way to write such applications. In that sense, D is no different: you still call C malloc directly to get some heap for your pools, and this memory won't be managed by the GC; it won't even know about it.
However, certain language features and large parts of Phobos do use the GC automagically. For example, you can't really concatenate slices without some form of automatically managed allocation. And Phobos simply has not had a strong policy about this for quite a long time.
A few language-triggered allocations won't be a problem on their own, as most memory used is managed via pools anyway. However, there is a killer issue for real-time software in stock D: the default D garbage collector is stop-the-world. Even if there is almost no garbage, your whole program will hit a latency spike when a collection cycle is run, as all threads get blocked.
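The pre-allocated pool idea from the first point can be sketched in a few lines. This is Python rather than D, for brevity, and Pool, acquire and release are hypothetical names; in D the backing storage would come from malloc as described, whereas here the "pool" is simply a list of objects created up front.

```python
class Pool:
    """Fixed-size object pool: every allocation happens before the main loop."""

    def __init__(self, factory, size):
        # Pay the allocation cost once, at startup.
        self._free = [factory() for _ in range(size)]

    def acquire(self):
        """Hand out a pre-built object; never allocates."""
        if not self._free:
            raise RuntimeError("pool exhausted")
        return self._free.pop()

    def release(self, obj):
        """Return an object to the pool so it can be reused."""
        self._free.append(obj)
```

Failing loudly on exhaustion, rather than growing the pool, is a deliberate choice in this sketch: it keeps the main loop allocation-free and makes an undersized pool show up during testing.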
What can be done:
1) Use GC.disable(); to switch off collection cycles. It will solve the stop-the-world issue, but now your program will start to leak memory in some cases, as GC-based allocations still work.
2) Dump hidden GC allocations. There was a pull request for a -vgc switch which I can't find right now, but in its absence you can compile your own druntime version that prints a backtrace upon each gc_malloc() call. You may want to run this as part of an automatic test suite.
3) Avoid Phobos entirely and use something like https://bitbucket.org/timosi/minlibd as an alternative.
Doing all this should be enough to target the soft real-time requirements typical for game dev, but as you can see it is not simple at all and requires stepping outside the stock D distribution.
Future alternative:
Once Leandro Lucarella ports his concurrent garbage collector to D2 (which is planned, but not scheduled), the situation will become much simpler. A small amount of GC-managed memory plus a concurrent implementation will make it possible to meet soft real-time requirements even without disabling the GC. Even Phobos can be used after it is stripped of its most annoying allocations. But I don't think it will happen any time soon.
But what about hard real-time?
You'd better not even try. But that is yet another story to tell.
If you do not like GC - disable it.
Here is how:
    import core.memory;

    void main(string[] args) {
        GC.disable(); // stops automatic collection cycles; GC allocations still succeed
        // your code here
    }
Naturally, you will then have to manage memory yourself. It is doable, and there are several articles about it. It has been discussed here too, I just do not remember the thread.
dlang.org also has useful information about this. This article, http://dlang.org/memory.html , touches the topic of real-time programming and you should read it.
Yet another good article: http://3d.benjamin-thaut.de/?p=20 .

d garbage collector and realtime applications

I'm thinking about learning D (basically "C++ done right and with garbage collection and message passing between threads") and talked to a colleague who's been long-time C++ programmer and basically he complained that the garbage collector as such has severe timing issues even in soft realtime type applications.
It's not that I need to write realtime app - far from it - but I'm curious how problematic GC would be in developing, say, database? (abstracting from additional memory usage overhead that GC seems to impose, statistically)
(now I know that GC can be turned off in D but that's like saying you can get rid of problems related to a car by getting rid of a car - true but that's not the solution I'd like to choose)
Is this true? How severe are such issues in practice? Is developing, say, a device driver in D with the GC practical, sensible, or good practice?
While D has a GC, it does not force you to use it for everything. D also has structs, which act like C++ classes and structs (minus the polymorphism).
In modern managed languages, the GC is not a problem as long as you have enough memory. This is also true for unmanaged languages like C++ - but in C++, running out of memory means you can't allocate any more memory, while in Java running out of memory means a delay while the GC kicks in.
So, if you are planning to allocate tons of objects then yes, the GC can be a problem. But you probably don't really need to allocate so many objects. In Java, you have to use objects to store things like strings and dates and coordinates, and that can really fill up your heap and invoke the GC (luckily, modern JVMs use generational GC to optimize for those types of objects). In D, you'll just use structs for these things, and only use classes for cases that actually require GC.
As a rule of thumb, you'll usually want to use structs wherever you can, but if you find yourself doing something special to take care of deallocating or to prevent copying and destructing (though it's really fast in D), make that type a class without a second thought.
I personally don't really approve of the statement "as long as you have enough memory a GC is not a problem". I mean, that basically means you go ahead and waste your memory instead of properly taking care of it, and when it runs out you suddenly have to wait more than a second for the GC to collect everything.
For one thing, that only happens if it's a really bad GC. The GC in C#, for example, collects objects extremely fast and often. You won't get a problem even if you allocate in a frequently called function, and it won't wait until you run out of memory to do a collection.
I am not fully up to date on the current features of the D2 GC (we use D1), but the behavior at the time was that it would allocate a pool of memory and for each of your allocations it would give you some of it. When it has given out 90% and you need more, it would start a collection and/or allocate more from the system (or something like that). There is (for D1) also the concurrent GC, which would start collections earlier but run them in the background; it is Linux-only, as it uses the fork syscall.
So, I think the current D GC can cause small but noticeable freezes if not used with care. But you can disable and enable it: e.g. when you do something real-time critical, disable it, and when that critical part of the code is over, enable it again.
For a database, I don't think the D GC is ready yet. I would heavily re-use memory and not rely on the GC at all for that kind of application.

How to implement a garbage collector?

Could anyone point me to a good source on how to implement garbage collection? I am making a lisp-like interpreted language. It currently uses reference counting, but of course that fails at freeing circularly dependent objects.
I've been reading about mark and sweep, tricolor marking, moving and nonmoving, incremental and stop-the-world, but... I don't know what the best way is to keep the objects neatly separated into sets while keeping per-object memory overhead at a minimum, or how to do things incrementally.
I've read some languages with reference counting use circular reference detection, which I could use. I am aware I could use freely available collectors like Boehm, but I would like to learn how to do it myself.
I would appreciate any online material with some sort of tutorial or help for people with no experience on the topic like myself.
Could anyone point me to a good source on how to implement garbage collection?
There's a lot of advanced material about garbage collection out there. The Garbage Collection Handbook is great. But I found there was precious little basic introductory information, so I wrote some articles about it. "Prototyping a mark-sweep garbage collector" describes a minimal mark-sweep GC written in F#. "The Very Concurrent Garbage Collector" describes a more advanced concurrent collector. HLVM is a virtual machine I wrote that includes a stop-the-world collector that handles threading.
The simplest way to implement a garbage collector is:
Make sure you can collate the global roots. These are the local and global variables that contain references into the heap. For local variables, push them on to a shadow stack for the duration of their scope.
Make sure you can traverse the heap, e.g. every value in the heap is an object that implements a Visit method that returns all of the references from that object.
Keep the set of all allocated values.
Allocate by calling malloc and inserting the pointer into the set of all allocated values.
When the total size of all allocated values exceeds a quota, kick off the mark and then sweep phases. The mark phase recursively traverses the heap from the roots, accumulating the set of all reachable values.
The set difference of the allocated values minus the reachable values is the set of unreachable values. Iterate over them calling free and removing them from the set of allocated values.
Set the quota to twice the total size of all allocated values.
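The steps above can be sketched as a toy heap. This is a simulation in Python, not the F# code from the articles: Obj and Heap are hypothetical names, the `refs` list stands in for the Visit method, `roots` stands in for globals plus the shadow stack, sizes are counted in objects rather than bytes, and "free" just means dropping the object from the allocated set. The minimum quota of 1 is an assumption added so the quota never collapses to zero.

```python
class Obj:
    """A heap value; `refs` plays the role of the Visit method."""

    def __init__(self, name):
        self.name = name
        self.refs = []


class Heap:
    def __init__(self, quota=4):
        self.allocated = set()   # every value handed out and not yet freed
        self.roots = []          # stand-in for globals and the shadow stack
        self.quota = quota       # counted in objects here, not bytes

    def alloc(self, name):
        # Collect before allocating so the new, not yet rooted value survives.
        if len(self.allocated) + 1 > self.quota:
            self.collect()
        obj = Obj(name)
        self.allocated.add(obj)
        return obj

    def mark(self):
        """Traverse from the roots, accumulating all reachable values."""
        reachable, stack = set(), list(self.roots)
        while stack:
            obj = stack.pop()
            if obj not in reachable:
                reachable.add(obj)
                stack.extend(obj.refs)
        return reachable

    def collect(self):
        unreachable = self.allocated - self.mark()   # set difference
        self.allocated -= unreachable                # "free" them
        self.quota = max(2 * len(self.allocated), 1) # assumed minimum of 1
        return unreachable


heap = Heap(quota=4)
a = heap.alloc("a")
heap.roots.append(a)
b = heap.alloc("b")
a.refs.append(b)
c = heap.alloc("c")
d = heap.alloc("d")
c.refs.append(d)
d.refs.append(c)             # c and d form an unreachable cycle
e = heap.alloc("e")          # exceeds the quota, so this triggers a collection
survivors = sorted(o.name for o in heap.allocated)
```

Note that the cycle between c and d is reclaimed without any special handling, which is the property reference counting lacks.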
Check out the following page. It has many links. http://lua-users.org/wiki/GarbageCollection
As suggested by delnan, I started with a very naïve stop-the-world tri-color mark and sweep algorithm. I managed to keep the objects in the sets by making them linked-list nodes, but it does add a lot of data to each object (the virtual pointer, two pointers to nodes, one enum to hold the color). It works perfectly, no memory lost on valgrind :) From here I might try to add a free list for recycling, or some sort of thing that detects when it is convenient to stop the world, or an incremental approach, or a special allocator to avoid fragmentation, or something else. If you can point me where to find info or advice (I don't know whether you can comment on an answered question) on how to do these things or what to do, I'd be very thankful. I'll be checking Lua's GC in the meantime.
I have implemented a Cheney-style copying garbage collector in C in about 400 SLOC. I did it for a statically-typed language and, to my surprise, the harder part was actually communicating the information which things are pointers and which things aren't. In a dynamically typed language this is probably easier since you must already use some form of tagging scheme.
There also is a new version of the standard book on garbage collection coming out: "The Garbage Collection Handbook: The Art of Automatic Memory Management" by Jones, Hosking, Moss. (The Amazon UK site says 19 Aug 2011.)
One thing I haven't yet seen mentioned is the use of memory handles. One may avoid the need to double-up on memory (as would be needed with the Cheney-style copying algorithm) if each object reference is a pointer to a structure which contains the real address of the object in question. Using handles for memory objects will make certain routines a little slower (one must reread the memory address of an object any time something might have happened that would move it), but for single-threaded systems where garbage collection will only happen at predictable times, this isn't too much of a problem and doesn't require special compiler support (multi-threaded GC systems are likely to require compiler-generated metadata whether they use handles or direct pointers).
If one uses handles, and uses one linked list for live handles (the same storage can be used to hold a linked list for dead handles needing reallocation), one can, after marking the master record for each handle, proceed through the list of handles, in allocation order, and copy the block referred to by that handle to the beginning of the heap. Because handles will be copied in order, there will be no need to use a second heap area. Further, generations may be supported by keeping track of some top-of-heap pointers. When compactifying memory, start by just compactifying items added since the last GC. If that doesn't free up enough space, compactify items added since the last level 1 GC. If that doesn't free up enough space, compactify everything. The marking phase would probably have to act upon objects of all generations, but the expensive compactifying stage would not.
Actually, using a handle-based approach, if one is marking things of all generations, one could if desired compute on each GC pass the amount of space that could be freed in each generation. If half the objects in Gen2 are dead, it may be worthwhile to do a Gen2 collection so as to reduce the frequency of Gen1 collections.
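A minimal sketch of the handle idea, assuming a single-threaded heap backed by one bytearray. HandleHeap and its methods are hypothetical names; `free` stands in for "this handle was not marked", and generations and reuse of dead handle slots are left out. Because callers only ever hold a handle index, compaction can slide live blocks toward the start of the heap in allocation order without a second heap area.

```python
class HandleHeap:
    def __init__(self, size):
        self.mem = bytearray(size)
        self.top = 0          # first unused byte
        self.handles = []     # handle -> [offset, length, live], allocation order

    def alloc(self, data):
        n = len(data)
        if self.top + n > len(self.mem):
            raise MemoryError("heap full; compact first")
        self.mem[self.top:self.top + n] = data
        self.handles.append([self.top, n, True])
        self.top += n
        return len(self.handles) - 1

    def read(self, handle):
        # The address is re-read through the handle on every access.
        offset, n, _ = self.handles[handle]
        return bytes(self.mem[offset:offset + n])

    def free(self, handle):
        self.handles[handle][2] = False   # stand-in for "not marked"

    def compact(self):
        """Slide live blocks to the start of the heap, in allocation order."""
        dst = 0
        for record in self.handles:
            offset, n, live = record
            if live:
                self.mem[dst:dst + n] = self.mem[offset:offset + n]
                record[0] = dst           # only the handle table is patched
                dst += n
        self.top = dst


hh = HandleHeap(16)
h0 = hh.alloc(b"aaaa")
h1 = hh.alloc(b"bb")
h2 = hh.alloc(b"cccc")
hh.free(h1)
hh.compact()
```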
Garbage collection implementation in Lisp
Building LISP | http://www.lwh.jp/lisp/
Arcadia | https://github.com/kimtg/arcadia
Read Memory Management: Algorithms and Implementations in C/C++. It's a good place to start.
I'm doing similar work for my PostScript interpreter (more info via my question). I agree with Delnan's comment that a simple mark-sweep algorithm is a good place to start. You'll need functions to set-mark, check-mark, clear-mark, and iterators for all your containers. One easy optimization is to clear-mark whenever allocating a new object, and clear-mark during the sweep; otherwise you'll need an entire pass to clear marks before you start setting them.
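The clear-mark optimization can be shown with a per-object mark bit. This is a sketch in Python rather than the interpreter's own code, and Cell, mark and sweep are hypothetical names. Marks start cleared at allocation and are cleared again as the sweep visits each survivor, so no separate clearing pass is needed before the next cycle.

```python
class Cell:
    def __init__(self, refs=()):
        self.marked = False      # clear-mark at allocation
        self.refs = list(refs)


def mark(roots):
    """Set the mark bit on everything reachable from the roots."""
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if not cell.marked:
            cell.marked = True
            stack.extend(cell.refs)


def sweep(heap):
    """Keep marked cells, clearing each mark as it is visited."""
    survivors = []
    for cell in heap:
        if cell.marked:
            cell.marked = False  # clear-mark during sweep, ready for next cycle
            survivors.append(cell)
    return survivors
```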

Why does Lua use a garbage collector instead of reference counting?

I've heard and experienced it myself: Lua's garbage collector can cause serious FPS drops in games as their scripted part grows.
This is, as I found out, related to the garbage collector: for example, every Vector() userdata object created temporarily lies around until it gets garbage collected.
I know that Python uses reference counting, and that is why it doesn't need any huge, performance-eating steps like Lua's GC has to do.
Why doesn't Lua use reference counting to get rid of garbage?
Because reference counting garbage collectors can easily leak objects.
Trivial example: a doubly-linked list. Each node has a pointer to the next node - and is itself pointed to by the next one. If you just un-reference the list itself and expect it to be collected, you just leaked the entire list - none of the nodes have a reference count of zero, and hence they'll all keep each other alive. With a reference counting garbage collector, any time you have a cyclic object, you basically need to treat that as an unmanaged object and explicitly dispose of it yourself when you're finished.
Note that Python uses a proper garbage collector in addition to reference counting.
While others have explained why you need a garbage collector, keep in mind that you can configure the garbage collection cycles in Lua to either be smaller, less frequent, or on demand. If you have a lot of memory allocated and are busy drawing frames, then make the thresholds very large to avoid a collection cycle until there is a break in the game.
Lua 5.1 Manual on garbage collection
Reference Counting alone is not enough for a garbage collector to work correctly because it does not detect cycles. Even Python does not use reference counting alone.
Imagine that objects A and B each hold a reference to each other. Even once you, the programmer, no longer hold a reference to either object, reference counting will still say that objects A and B have references pointing to them.
There are many different garbage collecting schemes out there and some will work better in some circumstances and some will work better in other circumstances. It is up to the language designers to try and choose a garbage collector that they think will work best for their language.
What version of Lua is being used in the games you are basing this claim on? When World of Warcraft switched from Lua 5.0 to 5.1, all the performance issues caused by garbage collection were severely diminished.
With Lua 5.0's garbage collection, the amount of time spent collecting garbage (and blocking anything else from happening at the same time) was proportional to the amount of memory currently in use, leading to lots of effort to minimize the memory usage of WoW addons.
With Lua 5.1's garbage collection, the collector changed to being incremental so it doesn't lock up the game while collecting garbage like it previously did. Now garbage collection has a very minimal impact on performance compared to the larger issue of horribly inefficient code in the majority of user created addons.
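The difference between the two behaviours can be illustrated with a marker that does a bounded amount of work per step. This is a simplified sketch in Python, not Lua's actual collector: incremental_mark and `budget` are hypothetical names, the heap is modelled as a dict from node name to referenced names, and the write barrier a real incremental collector needs (because the program mutates the graph between steps) is left out.

```python
def incremental_mark(graph, roots, budget):
    """Mark at most `budget` new nodes per step, yielding between steps."""
    marked = set()
    gray = list(roots)               # discovered but not yet scanned
    while gray:
        work = 0
        while gray and work < budget:
            node = gray.pop()
            if node not in marked:
                marked.add(node)
                gray.extend(graph[node])
                work += 1
        yield set(marked)            # hand control back to the game loop


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "x": ["x"]}
steps = list(incremental_mark(graph, ["a"], budget=2))
```

A stop-the-world collector corresponds to running this loop to completion in one go, so the pause grows with the amount of live data; with a budget, each pause is bounded by the step size and the total work is spread over many frames.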
In general, reference counting isn't an exact substitute for garbage collection because of the potential of circular references. You might want to read this page on why garbage collection is preferred to reference counting.
You might also be interested in the Lua Gem about optimization which also has a part that handles garbage collection.
Take a look at some of the CPython sources. A good portion of the C code is Py_DECREF and Py_INCREF. That nasty, tedious and error-prone book keeping just goes away in Lua.
If required, there's nothing to stop you writing Lua modules in C that manage any heavy, private allocations manually.
It's a tradeoff. People have explained some reasons some languages (this really has nothing to do with Lua) use collectors, but haven't touched on the drawbacks.
Some languages, notably ObjC, use reference counting exclusively. The huge advantage of this is that deallocation is deterministic--as soon as you let go of the last reference, it's guaranteed that the object will be freed immediately. This is critical when you have memory constraints. With Lua's allocator, if memory constraints require predictable deallocation, you have to add methods to force the underlying storage to be freed immediately, which defeats the point of having garbage collection.
"WuHoUnited" is wrong in saying you can't do this--it works extremely well with ObjC on iOS, and with shared_ptr in C++. You just have to understand the environment you're in, to avoid cycles or break them when necessary.

Resources