I would like to ensure the garbage collector is not overused in haxe (cpp target).
I already have pools of large objects and frequently reused objects that I managed to efficiently recycle. But there are still some slowdowns. I'm sure I can limit some of the inconsistent slowdowns and skipped frames by reducing garbage collection.
How can I collect data about the gc? I would like to see the list of classes collected, the number of times they are collected and the number of objects registered in the GC.
Is there an option for that?
untyped __cpp__('code'); will let you execute arbitrary cpp code(passed as it is). Using this construction you can access any inner mechanisms, including garbage collector, so if youlook in gc implementatio, you can do anything you want I guess. You also can directly patch gc part of haxe after it was compiled to cpp.
For Haxe 3.1, use cpp.vm.ExecutionTrace.traceFunctions(); , traceLines, traceOff
Related
What techniques do modern garbage collectors (as in CLR, JVM) use to tell which heap objects are referenced from the stack?
Specifically how can a VM work back from knowing where the stack starts to interpreting all local references to heap objects?
In Java (and likely in the CLR although I know its internals less well), the bytecode is typed with object vs primitive information. As a result, there are data structures in the bytecode that describe which variables in each stack frame are objects and which are primitives. When the GC needs to scan the root set, it uses these StackMapTables to differentiate between references and non-references.
CLR and Java have to have some mechanism like this because they are exact collectors. There are conservative collectors like the boehm collector that treat every offset on the stack as a possible pointer. They look to see if the value (when treated as a pointer) is an offset into the heap, and if so, they mark it as alive.
Take a look at this Artima article from August 1996, Java's Garbage-Collected Heap; especially page 2.
Any garbage collection algorithm must do two basic things. First, it must detect garbage objects. Second, it must reclaim the heap space used by the garbage objects and make it available to the program. Garbage detection is ordinarily accomplished by defining a set of roots and determining reachability from the roots. An object is reachable if there is some path of references from the roots by which the executing program can access the object. The roots are always accessible to the program. Any objects that are reachable from the roots are considered live. Objects that are not reachable are considered garbage, because they can no longer affect the future course of program execution.
In a JVM the root set is implementation dependent but would always include any object references in the local variables. In the JVM, all objects reside on the heap. The local variables reside on the Java stack, and each thread of execution has its own stack. Each local variable is either an object reference or a primitive type, such as int, char, or float. Therefore the roots of any JVM garbage-collected heap will include every object reference on every thread's stack. Another source of roots are any object references, such as strings, in the constant pool of loaded classes. The constant pool of a loaded class may refer to strings stored on the heap, such as the class name, superclass name, superinterface names, field names, field signatures, method names, and method signatures.
Any object referred to by a root is reachable and is therefore a live object. Additionally, any objects referred to by a live object are also reachable. The program is able to access any reachable objects, so these objects must remain on the heap. Any objects that are not reachable can be garbage collected because there is no way for the program to access them.
The article continues to explore different garbage collection strategies, including reference counting collectors, tracing collectors, compacting collectors and copying collectors.
Though this article is old, it still applies today; not much has really changed. There have been performance improvements to the different collection strategies, but no new major advancements.
The Oracle HotSpot JVM, for example, has a new Garbage-First Garbage Collector which is a copying collector with performance tweaks for multi-core processors and large heap sizes (see this answer for more on the G1 Garbage Collector).
Interesting documentation on this topic posted up by the .Net team shortly after the they made CoreCLR open source: Stack Walking
Garbage collection involves walking through a list of allocated objects (either all objects or objects in a particular generation) and determining which are reachable.
How is this list maintained? Do runtimes for GC languages keep a giant list of all objects?
Also, from what I understand, GC involves walking the call stack to look for object references - how does the algorithm distinguish between GC-able pointers and primitive data?
The memory management system keeps track of the size of each allocated object, just like it does in C or C++. One way this is commonly done is for the memory management system to allocate an extra size_t before each allocation, that keeps track of the size of each objecct. The memory manager likewise has to keep track of the size of each free block, so that it can reuse blocks to allocate them.
The garbage collector works in two phases: the mark phase, and the sweep phase. In the mark phase, the garbage collector starts walks object references in order to find objects that are still reachable. The garbage collector starts at a few basic places where the object references are stored and given names (the stack, and global storage, and static storage), and then traverses references in the objects.
In the sweep phase, the garbage collector walks the heap from bottom to top, jumping from allocation to allocation based on those size_ts, and frees anything that isn't marked.
Some languages (like Ruby) tag all of the primitives so that they can be identified separately from the object references at runtime. Other garbage collectors are ver conservative and follow primatives as through they were object references (though some checks must be performed to make sure that the garbage collector doesn't stick a mark in the middle of some other object). Still other languages use runtime type information to be more precise about whether they follow primatives.
Ruby's garbage collector sometimes called "conservative" because it doesn't check whether the space on the stack is actually being used, so it sometimes keeps dead objects alive by following ghost references on the stack. But since it always knows exactly whether the data it's looking at is a reference or a primative, I don't call it conservative here.
Garbage collection involves walking through a list of allocated objects (either all objects or objects in a particular generation) and determining which are reachable.
Not really. GCs are categorized into tracing and reference counting (see A unified theory of garbage collection). Tracing GCs start from a set of global roots and trace all objects reachable from them. Reference counting GCs count the number of references to each object and reclaim it when the count reaches zero. Neither require a list including unreachable objects.
How is this list maintained? Do runtimes for GC languages keep a giant list of all objects?
Pedagogical solutions like the one in HLVM can keep a list of all objects because it is simple but this is rare.
Also, from what I understand, GC involves walking the call stack to look for object references - how does the algorithm distinguish between GC-able pointers and primitive data?
Again, there are many different strategies. Conservative GCs are unable to distinguish between pointers and non-pointers so they conservatively consider that non-pointers might be pointers. Pedagogical GCs like the one in HLVM can use algorithms like Henderson's Accurate GC in an uncooperative environment. Production GCs store enough information in the OS thread stack to determine exactly which words are pointers (and which stack frames to skip because they are not affiliated with managed code) and then use a stack walker to find them.
Note that you also have to find local references held in registers as well as on the stack.
This site ( How Java’s Garbage Collector Works? ) has a good, brief explanation on how garbage collectors work, not just the default Java one.
I've heard and experienced it myself: Lua's garbage collector can cause serious FPS drops in games as their scripted part grows.
This is as I found out related to the garbage collector, where for example every Vector() userdata object created temporarily lies around until getting garbage collected.
I know that Python uses reference counting, and that is why it doesn't need any huge, performance eating steps like Luas GC has to do.
Why doesn't Lua use reference counting to get rid of garbage?
Because reference counting garbage collectors can easily leak objects.
Trivial example: a doubly-linked list. Each node has a pointer to the next node - and is itself pointed to by the next one. If you just un-reference the list itself and expect it to be collected, you just leaked the entire list - none of the nodes have a reference count of zero, and hence they'll all keep each other alive. With a reference counting garbage collector, any time you have a cyclic object, you basically need to treat that as an unmanaged object and explicitly dispose of it yourself when you're finished.
Note that Python uses a proper garbage collector in addition to reference counting.
While others have explained why you need a garbage collector, keep in mind that you can configure the garbage collection cycles in Lua to either be smaller, less frequent, or on demand. If you have a lot of memory allocated and are busy drawing frames, then make the thresholds very large to avoid a collection cycle until there is a break in the game.
Lua 5.1 Manual on garbage collection
Reference Counting alone is not enough for a garbage collector to work correctly because it does not detect cycles. Even Python does not use reference counting alone.
Imagine that objects A and B each hold a reference to each other. Even once you, the programmer no longer hold a reference to either object, reference counting will still say that objects A and B have references pointing to them.
There are many different garbage collecting schemes out there and some will work better in some circumstances and some will work better in other circumstances. It is up to the language designers to try and choose a garbage collector that they think will work best for their language.
What version of Lua is being used in the games you are basing this claim on? When World of Warcraft switched from Lua 5.0 to 5.1, all the performance issues caused by garbage collection were severely diminished.
With Lua 5.0's garbage collection, the amount of time spent collecting garbage (and blocking anything else from happening at the same time) was proportional to the amount of memory currently in use, leading to lots of effort to minimize the memory usage of WoW addons.
With Lua 5.1's garbage collection, the collector changed to being incremental so it doesn't lock up the game while collecting garbage like it previously did. Now garbage collection has a very minimal impact on performance compared to the larger issue of horribly inefficient code in the majority of user created addons.
In general, reference counting isn't an exact substitute for garbage collection because of the potential of circular references. You might want to read this page on why garbage collection is preferred to reference counting.
You might also be interested in the Lua Gem about optimization which also has a part that handles garbage collection.
Take a look at some of the CPython sources. A good portion of the C code is Py_DECREF and Py_INCREF. That nasty, tedious and error-prone book keeping just goes away in Lua.
If required, there's nothing to stop you writing Lua modules in C that manage any heavy, private allocations manually.
It's a tradeoff. People have explained some reasons some languages (this really has nothing to do with Lua) use collectors, but haven't touched on the drawbacks.
Some languages, notably ObjC, use reference counting exclusively. The huge advantage of this is that deallocation is deterministic--as soon as you let go of the last reference, it's guaranteed that the object will be freed immediately. This is critical when you have memory constraints. With Lua's allocator, if memory constraints require predictable deallocation, you have to add methods to force the underlying storage to be freed immediately, which defeats the point of having garbage collection.
"WuHoUnited" is wrong in saying you can't do this--it works extremely well with ObjC on iOS, and with shared_ptr in C++. You just have to understand the environment you're in, to avoid cycles or break them when necessary.
As far as I know GC is used only when JVM needs more memory, but I'm not sure about it. So, please, someone suggest an answer to this question.
The garbage collection algorithm of Java is as I understand, pretty complex and not as straightforward. Also, there is more than algorithm available for GC, which can be chosen at VM launchtime with an argument passed to the JVM.
There's a FAQ about garbage collection here: http://www.oracle.com/technetwork/java/faq-140837.html
Oracle has also published an article "Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine" which contains deep insights into garbage collection, and will probably help you understand the matter better: http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
The JVM and java specs don't say anything about when garbage collection occurs, so its entirely up to the JVM implementors what policies they wish to use. Each JVM should probably have some documention about how it handles GC.
In general though, most JVMs will trigger GC when a memory allocation pushes the total amount of allocated memory above some threshold. There may be mulitple levels of gc (full vs partial/generational) that occur at different thresholds.
Garbage Collection is deliberately vaguely described in the Java Language Specification, to give the JVM implementors best working conditions to provide good garbage collectors.
Hence garbage collectors and their behaviour are extremely vendor dependent.
The simplest but least fun is the one that stops the whole program when needing to clean. Others are more sophisticated and run quietly along your program cleaning up as you go more or less aggressively.
The most fun way to investigate garbage collection is to run jvisualvm in the Sun 6 JDK. It allows you to see many, many internal things many relevant to garbage collection.
https://visualvm.dev.java.net/monitor_tab.html (but the newest version has plenty more)
In the older days garbage collector were empirical in nature. At some set interval or based on certain condition they would kick in and examine each of the object.
Modern days collectors are smarter in the sense that they differentiate based on the fact that objects are different lifespan. The objects are differentiated between young generation objects and tenured generation objects.
Memory is managed in pools according to generation. Whenever the young generation memory pool is filled, a minor collection happens. The surviving objects are moved to tenured generation memory pool. When the tenured generation memory pool gets filled a major collection happens. A third generation is kept which is known as permanent generation and may contain objects defining classes and methods.
I noticed that garbage collection is not yet implemented in gccgo.
http://golang.org/doc/gccgo_install.html#Unimplemented
Does the standard Go compiler (gc) support garbage collection yet?
gccgo has its own runtime, the plan is to switch to use a single runtime shared by both gc and gccgo.
Also, the current garbage collector in gc is rather simple, a concurrent and much faster implementation based on research done by IBM is under development, and will probably be the one used by both gccgo and gc.
Yes.