Copying Garbage Collector - garbage-collection

How does a copying garbage collector avoid memory fragmentation? Also, what consequences for heap-space use?
From my understanding, a copying garbage collector, copies all reachable objects from the heap into another section of the heap. All objects that have been left behind are no longer needed and thus removed.
If this is a correct understanding, how does this avoid memory fragmentation?
This process must use a lot of heap-space, because it will have duplicates of all items it copied, right?

If this is a correct understanding, how does this avoid memory fragmentation?
Because when you copy the objects to the "new heap", you stick them right next to each other without leaving any gaps.
This process must use a lot of heap-space, because it will have duplicates of all items it copied, right?
Only during the collection process. After you've done that, all the "originals" are deallocated and that space is freed up again.
Additionally, garbage collectors like this are often "generational" - copying garbage collection is used on short-lived objects, with longer-lived objects being treated differently. This helps ease the space issue, as well as making collections take less time.

Your basic understanding is correct. It avoids fragmentation because as it copies the reachable objects, it can put then close together, leaving free space in one block. it does require lots of space, in fact, it requires potentially 2x the space plus some change for bookkeeping.

Memory fragmentation occurs when chunks of memory are deallocated in between two active chunks. Think of a block of memory like so...
AAAAAAAAAAAAAAAABBBBCCCCCCCCCCCC
Assume that B is no longer needed. If we free up the space that B was using we having something like...
AAAAAAAAAAAAAAAA----CCCCCCCCCCC
Now we have a gap that we can only put rather small objects in. A copying garbage collector might move things around so that we have...
AAAAAAAAAAAAAAAACCCCCCCCCCC---- (more free space here)
Most modern collectors can move things in-place. That is, you can see how C could be "shifted" to take up B's old space and thus there is no memory overhead.

Related

Why is garbage collection necessary?

Suppose that an object on the heap goes out of scope. Why can't the program free the memory right after the scope ends? Or, if we have a pointer to an object that is replaced by the address to a new object, why can't the program deallocate the old one before assigning the new one? I'm guessing that it's faster not to free it immediately and instead have the freeing be done asynchronously at a later point in time, but I'm not really sure.
Why is garbage collection necessary?
It is not strictly necessary. Given enough time and effort you can always translate a program that depends on garbage collection to one that doesn't.
In general, garbage collection involves a trade-off.
On the one hand, garbage collection allows you to write an application without worrying about the details of memory allocation and deallocation. (And the pain of debugging crashes and memory leaks caused by getting the deallocation logic wrong.)
The downside of garbage collection is that you need more memory. A typical garbage collector is not efficient if it doesn't have plenty of spare space1.
By contrast, if you do manual memory management, you can code your application to free up heap objects as soon as they are no longer used. Furthermore, you don't get awkward "pauses" while the GC is doing its thing.
The downside of manual memory management is that you have to write the code that decides when to call free, and you have to get it correct. Furthermore, if you try to manage memory by reference counting:
you have the cost of incrementing and decrementing ref counts whenever pointers are assign or variables go out of scope,
you have to deal with cycles in your data structures, and
it is worse when your application is multi-threaded and you have to deal with memory caches, synchronization, etc.
For what it is worth, if you use a decent garbage collector and tune it appropriately (e.g. give it enough memory, etc) then the CPU costs of GC and manual storage management are comparable when you apply them to a large application.
Reference:
"The measured cost of conservative garbage collection" by Benjamin Zorn
1 - This is because the main cost of a modern collector is in traversing and dealing with the non-garbage objects. If there is not a lot of garbage because you are being miserly with the heap space, the GC does a lot of work for little return. See https://stackoverflow.com/a/2414621/139985 for an analysis.
It's more complicated, but
1) what if there is memory pressure before the scope is over? Scope is only a language notion, not related to reachability. So an object can be "freed" before it goes out of scope ( java GCs do that on regular basis). Also, if you free objects after each scope is done, you might be doing too little work too often
2) As far as the references go, you are not considering that the reference might have hierarchies and when you change one, there has to be code that traverses those. It might not be the right time to do it when that happens.
In general, there is nothing wrong with such a proposal that you describer, as a matter of fact this is almost exactly how Rust programming language works, from a high level point of view.

Is it possible to skip an object from collecting by v8 GC?

I have a lot of a long live objects in memory(~10GB) and I definitely know that these objects never be collected by GC. The problem is that a mark-sweep gc action take a long time(90sec) to check all objects in memory and its relations. I need some way to skip my objects from collecting.
I've tried to use Persistent::MarkIndependent, but it doesn't work for me.
If the objects in question are references held live via handles from C++ than they will not be collected. However, the collector still has to traverse them, because it has to find all references to other objects that they contain. If it didn't do that then you would potentially get dangling pointers and make the VM crash.
So, no, at least the way you describe it, that can't be done. (If, on the other hand, these objects cannot contain random pointers because e.g. they are array buffers or strings then the GC knows that it doesn't need to traverse them so there shouldn't be a performance issue.)

How to implement a garbage collector?

Could anyone point me to a good source on how to implement garbage collection? I am making a lisp-like interpreted language. It currently uses reference counting, but of course that fails at freeing circularly dependent objects.
I've been reading of mark and sweep, tricolor marking, moving and nonmoving, incremental and stop-the-world, but... I don't know what the best way to keep the objects neatly separated into sets while keeping per-object memory overhead at a minimum, or how to do things incrementally.
I've read some languages with reference counting use circular reference detection, which I could use. I am aware I could use freely available collectors like Boehm, but I would like to learn how to do it myself.
I would appreciate any online material with some sort of tutorial or help for people with no experience on the topic like myself.
Could anyone point me to a good source on how to implement garbage collection?
There's a lot of advanced material about garbage collection out there. The Garbage Collection Handbook is great. But I found there was precious little basic introductory information so I wrote some articles about it. Prototyping a mark-sweep garbage collector describes a minimal mark-sweep GC written in F#. The Very Concurrent Garbage Collector describes a more advanced concurrent collector. HLVM is a virtual machine I wrote that includes a stop-the-world collector that handles threading.
The simplest way to implement a garbage collector is:
Make sure you can collate the global roots. These are the local and global variables that contain references into the heap. For local variables, push them on to a shadow stack for the duration of their scope.
Make sure you can traverse the heap, e.g. every value in the heap is an object that implements a Visit method that returns all of the references from that object.
Keep the set of all allocated values.
Allocate by calling malloc and inserting the pointer into the set of all allocated values.
When the total size of all allocated values exceeds a quota, kick off the mark and then sweep phases. This recursively traverses the heap accumulating the set of all reachable values.
The set difference of the allocated values minus the reachable values is the set of unreachable values. Iterate over them calling free and removing them from the set of allocated values.
Set the quota to twice the total size of all allocated values.
Check out the following page. It has many links. http://lua-users.org/wiki/GarbageCollection
As suggested by delnan, I started with a very naïve stop-the-world tri-color mark and sweep algorithm. I managed to keep the objects in the sets by making them linked-list nodes, but it does add a lot of data to each object (the virtual pointer, two pointers to nodes, one enum to hold the color). It works perfectly, no memory lost on valgrind :) From here I might try to add a free list for recycling, or some sort of thing that detects when it is convenient to stop the world, or an incremental approach, or a special allocator to avoid fragmentation, or something else. If you can point me where to find info or advice (I don't know whether you can comment on an answered question) on how to do these things or what to do, I'd be very thankful. I'll be checking Lua's GC in the meantime.
I have implemented a Cheney-style copying garbage collector in C in about 400 SLOC. I did it for a statically-typed language and, to my surprise, the harder part was actually communicating the information which things are pointers and which things aren't. In a dynamically typed language this is probably easier since you must already use some form of tagging scheme.
There also is a new version of the standard book on garbage collection coming out: "The Garbage Collection Handbook: The Art of Automatic Memory Management" by Jones, Hosking, Moss. (The Amazon UK site says 19 Aug 2011.)
One thing I haven't yet seen mentioned is the use of memory handles. One may avoid the need to double-up on memory (as would be needed with the Cheney-style copying algorithm) if each object reference is a pointer to a structure which contains the real address of the object in question. Using handles for memory objects will make certain routines a little slower (one must reread the memory address of an object any time something might have happened that would move it) but for single-threaded systems where garbage collection will only happen at predictable times, this isn't too much of a problem and doesn't require special compiler support (multi-threaded GC systems will are likely to require compiler-generated metadata whether they use handles or direct pointers).
If one uses handles, and uses one linked list for live handles (the same storage can be used to hold a linked list for dead handles needing reallocation), one can, after marking the master record for each handle, proceed through the list of handles, in allocation order, and copy the block referred to by that handle to the beginning of the heap. Because handles will be copied in order, there will be no need to use a second heap area. Further, generations may be supported by keeping track of some top-of-heap pointers. When compactifying memory, start by just compactifying items added since the last GC. If that doesn't free up enough space, compactify items added since the last level 1 GC. If that doesn't free up enough space, compactify everything. The marking phase would probably have to act upon objects of all generations, but the expensive compactifying stage would not.
Actually, using a handle-based approach, if one is marking things of all generations, one could if desired compute on each GC pass the amount of space that could be freed in each generation. If half the objects in Gen2 are dead, it may be worthwhile to do a Gen2 collection so as to reduce the frequency of Gen1 collections.
Garbage collection implementation in Lisp
Building LISP | http://www.lwh.jp/lisp/
Arcadia | https://github.com/kimtg/arcadia
Read Memory Management: Algorithms and Implementations in C/C++. It's a good place to start.
I'm doing similar work for my postscript interpreter. more info via my question. I agree with Delnan's comment that a simple mark-sweep algorithm is a good place to start. You'll need functions to set-mark, check-mark, clear-mark, and iterators for all your containers. One easy optimization is to clear-mark whenever allocating a new object, and clear-mark during the sweep; otherwise you'll need an entire pass to clear marks before you start setting them.

How can garbage collectors be faster than explicit memory deallocation?

I was reading this html generated, (may expire, Here is the original ps file.)
GC Myth 3: Garbage collectors are always slower than explicit memory deallocation.
GC Myth 4: Garbage collectors are always faster than explicit memory deallocation.
This was a big WTF for me. How would GC be faster then explicit memory deallocation? isnt it essentially calling a explicit memory deallocator when it frees the memory/make it for use again? so.... wtf.... what does it actually mean?
Very small objects & large sparse
heaps ==> GC is usually cheaper,
especially with threads
I still don't understand it. Its like saying C++ is faster then machine code (if you don't understand the wtf in this sentence please stop programming. Let the -1 begin). After a quick google one source suggested its faster when you have a lot of memory. What i am thinking is it means it doesn't bother will the free at all. Sure that can be fast and i have written a custom allocator that does that very thing, not free at all (void free(void*p){}) in ONE application that doesnt free any objects (it only frees at end when it terminates) and has the definition mostly in case of libs and something like stl. So... i am pretty sure this will be faster the GC as well. If i still want free-ing i guess i can use an allocator that uses a deque or its own implementation thats essentially
if (freeptr < someaddr) {
*freeptr=ptr;
++freeptr;
}
else
{
freestuff();
freeptr = freeptrroot;
}
which i am sure would be really fast. I sort of answered my question already. The case the GC collector is never called is the case it would be faster but... i am sure that is not what the document means as it mention two collectors in its test. i am sure the very same application would be slower if the GC collector is called even once no matter what GC used. If its known to never need free then an empty free body can be used like that one app i had.
Anyways, i post this question for further insight.
How would GC be faster then explicit memory deallocation?
GCs can pointer-bump allocate into a thread-local generation and then rely upon copying collection to handle the (relatively) uncommon case of evacuating the survivors. Traditional allocators like malloc often compete for global locks and search trees.
GCs can deallocate many dead blocks simultaneously by resetting the thread-local allocation buffer instead of calling free on each block in turn, i.e. O(1) instead of O(n).
By compacting old blocks so more of them fit into each cache line. The improved locality increases cache efficiency.
By taking advantage of extra static information such as immutable types.
By taking advantage of extra dynamic information such as the changing topology of the heap via the data recorded by the write barrier.
By making more efficient techniques tractable, e.g. by removing the headache of manual memory management from wait free algorithms.
By deferring deallocation to a more appropriate time or off-loading it to another core. (thanks to Andrew Hill for this idea!)
One approach to make GC faster then explicit deallocation is to deallocate implicitly :the heap is divided in partitions, and the VM switches between the partitions from time to time (when a partition gets too full for example). Live objects are copied to the new partition and all the dead objects are not deallocated - they are just left forgotten. So the deallocation itself ends up costing nothing. The additional benefit of this approach is that the heap defragmentation is a free bonus.Please note this is a very general description of the actual processes.
The trick is, that the underlying allocator for garbage collector can be much simpler than the explicit one and take some shortcuts that the explicit one can't.
If the collector is copying (java and .net and ocaml and haskell runtimes and many others actually use one), freeing is done in big blocks and allocating is just pointer increment and cost is payed per object surviving collection. So it's faster especially when there are many short-lived temporary objects, which is quite common in these languages.
Even for a non-copying collector (like the Boehm's one) the fact that objects are freed in batches saves a lot of work in combining the adjacent free chunks. So if the collection does not need to be run too often, it can easily be faster.
And, well, many standard library malloc/free implementations just suck. That's why there are projects like umem and libraries like glib have their own light-weight version.
A factor not yet mentioned is that when using manual memory allocation, even if object references are guaranteed not to form cycles, determining when the last entity to hold a reference has abandoned it can be expensive, typically requiring the use of reference counters, reference lists, or other means of tracking object usage. Such techniques aren't too bad on single-processor systems, where the cost of an atomic increment may be essentially the same as an ordinary one, but they scale very badly on multi-processor systems, where atomic-increment operations are comparatively expensive.

Why does Lua use a garbage collector instead of reference counting?

I've heard and experienced it myself: Lua's garbage collector can cause serious FPS drops in games as their scripted part grows.
This is as I found out related to the garbage collector, where for example every Vector() userdata object created temporarily lies around until getting garbage collected.
I know that Python uses reference counting, and that is why it doesn't need any huge, performance eating steps like Luas GC has to do.
Why doesn't Lua use reference counting to get rid of garbage?
Because reference counting garbage collectors can easily leak objects.
Trivial example: a doubly-linked list. Each node has a pointer to the next node - and is itself pointed to by the next one. If you just un-reference the list itself and expect it to be collected, you just leaked the entire list - none of the nodes have a reference count of zero, and hence they'll all keep each other alive. With a reference counting garbage collector, any time you have a cyclic object, you basically need to treat that as an unmanaged object and explicitly dispose of it yourself when you're finished.
Note that Python uses a proper garbage collector in addition to reference counting.
While others have explained why you need a garbage collector, keep in mind that you can configure the garbage collection cycles in Lua to either be smaller, less frequent, or on demand. If you have a lot of memory allocated and are busy drawing frames, then make the thresholds very large to avoid a collection cycle until there is a break in the game.
Lua 5.1 Manual on garbage collection
Reference Counting alone is not enough for a garbage collector to work correctly because it does not detect cycles. Even Python does not use reference counting alone.
Imagine that objects A and B each hold a reference to each other. Even once you, the programmer no longer hold a reference to either object, reference counting will still say that objects A and B have references pointing to them.
There are many different garbage collecting schemes out there and some will work better in some circumstances and some will work better in other circumstances. It is up to the language designers to try and choose a garbage collector that they think will work best for their language.
What version of Lua is being used in the games you are basing this claim on? When World of Warcraft switched from Lua 5.0 to 5.1, all the performance issues caused by garbage collection were severely diminished.
With Lua 5.0's garbage collection, the amount of time spent collecting garbage (and blocking anything else from happening at the same time) was proportional to the amount of memory currently in use, leading to lots of effort to minimize the memory usage of WoW addons.
With Lua 5.1's garbage collection, the collector changed to being incremental so it doesn't lock up the game while collecting garbage like it previously did. Now garbage collection has a very minimal impact on performance compared to the larger issue of horribly inefficient code in the majority of user created addons.
In general, reference counting isn't an exact substitute for garbage collection because of the potential of circular references. You might want to read this page on why garbage collection is preferred to reference counting.
You might also be interested in the Lua Gem about optimization which also has a part that handles garbage collection.
Take a look at some of the CPython sources. A good portion of the C code is Py_DECREF and Py_INCREF. That nasty, tedious and error-prone book keeping just goes away in Lua.
If required, there's nothing to stop you writing Lua modules in C that manage any heavy, private allocations manually.
It's a tradeoff. People have explained some reasons some languages (this really has nothing to do with Lua) use collectors, but haven't touched on the drawbacks.
Some languages, notably ObjC, use reference counting exclusively. The huge advantage of this is that deallocation is deterministic--as soon as you let go of the last reference, it's guaranteed that the object will be freed immediately. This is critical when you have memory constraints. With Lua's allocator, if memory constraints require predictable deallocation, you have to add methods to force the underlying storage to be freed immediately, which defeats the point of having garbage collection.
"WuHoUnited" is wrong in saying you can't do this--it works extremely well with ObjC on iOS, and with shared_ptr in C++. You just have to understand the environment you're in, to avoid cycles or break them when necessary.

Resources