card table and write barriers in .NET GC - garbage-collection

Can anybody explain the concept of the card table and write barriers in the garbage collection process in .NET?
I really can't get the explanation of these terms, i.e. what they are, how they are useful, and how they participate in GC.
Any help would be really appreciated.

The card table is an array of bits, one bit for each chunk of 256 bytes of memory in the old generation. The bits are normally zero, but when a field of an object in the old generation is written to, the bit corresponding to the object's memory address is set to one. That is called executing the write barrier.
The garbage collector in .NET is generational and has a phase which only traces and collects objects in the young generation. So it goes through the object graph starting with the roots but does not recurse into objects in the old generation. In that way, it only traces a small fraction of the whole object graph.
To find the roots to start tracing from, it scans the program's local and global variables for young generation objects. But it would miss objects only referenced from old generation objects. Therefore it also scans the fields of objects in the old generation whose card table bit is set.
Then, after the young generation collection is complete, it resets all card table bits to zero.
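A minimal sketch of this mechanism in C may help make it concrete. It is illustrative only: the 256-byte card size matches the description above, but the names, the heap layout, and the trace_young_refs_in helper are invented for the example and are not the CLR's actual implementation.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define CARD_BYTES   256                         /* heap bytes covered by one card bit */
    #define OLD_GEN_SIZE (1 << 20)                    /* illustrative 1 MiB old generation  */
    #define NUM_CARDS    (OLD_GEN_SIZE / CARD_BYTES)

    static uint8_t old_gen[OLD_GEN_SIZE];             /* stand-in for the old generation    */
    static uint8_t card_table[NUM_CARDS / 8];         /* one bit per 256-byte card          */

    /* Stub for the illustration: a real collector would scan this card's worth of
       old-generation memory for references into the young generation and treat
       them as extra roots for the minor collection. */
    static void trace_young_refs_in(uint8_t *card_start, size_t card_bytes)
    {
        (void)card_start;
        (void)card_bytes;
    }

    /* The write barrier: executed on every reference store into an old-gen object.
       It marks the card covering the written field as dirty. */
    static void write_barrier(void **field, void *new_value)
    {
        *field = new_value;                                   /* the actual store */
        size_t card = (size_t)((uint8_t *)field - old_gen) / CARD_BYTES;
        card_table[card / 8] |= (uint8_t)(1u << (card % 8));  /* set the card bit */
    }

    /* During a young-generation collection: visit only the dirty cards, then
       clear the whole table once the collection is complete. */
    static void scan_dirty_cards(void)
    {
        for (size_t card = 0; card < NUM_CARDS; card++)
            if (card_table[card / 8] & (1u << (card % 8)))
                trace_young_refs_in(&old_gen[card * CARD_BYTES], CARD_BYTES);

        memset(card_table, 0, sizeof card_table);
    }

The appeal of the bitmap is that the barrier adds only a few cheap instructions and a one-byte store to every reference write, while the collector still avoids scanning the entire old generation.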

Related

How is the card table structure actually used by the garbage collector across multiple threads?

I actually have two questions. 1) I have studied various articles and answers here about garbage collection, and I can't understand the answer to this question: how is the "card table" structure used by the garbage collector across multiple threads? I think I'm missing something. 2) Is it right that this "card table" structure is used only in concurrent garbage collectors?
A Card Table is a primitive implementation of a Remembered Set based on a bitmap. One bit in a Card Table corresponds to one or more words in a heap generation (or region).
The purpose of a remembered set is to track references from the old generation to the young generation, in order to update references in the old gen when doing a young-only collection. So a remembered set, or a Card Table as its particular implementation, is inherent to generational/regional collectors, whether concurrent or not.
A Card Table is not specific to concurrent collectors, and it has nothing to do with multithreading. Even the Serial GC uses a Card Table. I found traces of gc/gen/cardtable.c in the JDK 1.2 sources, dated 1999, when there were no concurrent garbage collectors at all.

What is the sequential store buffer structure in GC, specifically?

I have been reading a garbage collection book, and it mentioned a data structure called the sequential store buffer. Could anyone explain how it works, or the principle behind it, or point me to a paper about it?
For generational collectors, different regions of the heap get collected at different times (minor for the young generation, major for the old generation). To ensure consistency of collection, a remembered set is typically used that records links from objects in the old generation to the young generation.
There are different ways of recording the remembered set, as described in the GC book you mention. A common way is the use of a card table, which is how the G1 collector does it.
An alternative is the sequential store buffer. This is an area of memory that is treated roughly like a stack, i.e. there is a pointer to where the next piece of data can be stored. Once the data is saved, the pointer is bumped by the size of the data. This is very efficient (and is also the way space is allocated in the young generation). For a GC algorithm that uses a write barrier (most do), this is a good way of reducing the load created by the write barrier. It is also very efficient on pipelined architectures with branch prediction.
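A minimal sketch of that shape in C may help. It is illustrative only: here the buffer records the addresses of updated slots, and flush_ssb is a hypothetical stand-in for whatever the collector does when the buffer fills (for example, transferring the entries into a remembered set).

    #include <stddef.h>

    #define SSB_CAPACITY 4096

    /* The sequential store buffer: a flat array plus a bump pointer. */
    static void **ssb[SSB_CAPACITY];
    static void ***ssb_top = ssb;

    /* Hypothetical: hand the recorded slots over to the collector and
       reset the buffer so it can be filled again. */
    static void flush_ssb(void)
    {
        /* ... process entries ssb[0] .. ssb_top[-1] here ... */
        ssb_top = ssb;
    }

    /* Write barrier using the SSB: store the address of the updated slot at
       the bump pointer and advance it, the same cheap "store, then bump"
       pattern used for allocation in the young generation. */
    static void write_barrier(void **slot, void *new_value)
    {
        *slot = new_value;                 /* the actual reference store          */
        *ssb_top++ = slot;                 /* append the slot address to the SSB  */
        if (ssb_top == ssb + SSB_CAPACITY)
            flush_ssb();                   /* buffer full: hand it off to the GC  */
    }

Because the common path is a store, a pointer bump, and one highly predictable branch, the barrier stays cheap, which is the property the paragraph above attributes to pipelined, branch-predicting hardware.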

How is the lifetime of an object allocated in the heap measured?

When I read The Garbage Collection Handbook, chapter 9 implies that "object lifetimes are better measured by the number of bytes of heap space allocated between their birth and death." I don't really understand this sentence. Why can a lifetime be measured by allocated bytes? I tried to Google it, but I got no answer.
Can anyone explain that to me? Thanks!
By measuring object lifetimes in terms of bytes allocated between instantiation and death, it is easier for the GC algorithm to adapt to program behaviour.
If the rate of object allocation is very slow, a simple time measurement would show long pauses between collections, which would appear to be good. However, if the byte-allocation measure of object lifetimes is high, objects may be getting promoted to a survivor space or the old generation too quickly. By measuring byte allocation, the collector can size the heap more effectively, for example by expanding the young generation to increase the number of objects that become garbage before a minor collection occurs. Using time alone as the measure would not make the need for heap resizing obvious.
As the book points out, with multi-threaded applications it is hard to measure byte allocation for individual threads so collectors tend to measure lifetimes in terms of how many collections an object survives. This is a simpler number to monitor and requires less space to record.
“Time” is only a scale that allows us to bring an order to events. There are many possible units, even in the real world. Inside the computer, for the purpose of garbage collection, no real-world time unit is needed; all the garbage collector usually wants to know is which object is older than another.
For this purpose, just assigning an ascending number to each allocated object would be sufficient, but this would imply maintaining an additional counter. In contrast, the number of allocated bytes comes for free. It’s important that we accumulate the allocated bytes only, never subtracting deallocated bytes, so we have an always growing number.
In generational memory management, this number doesn't need to be updated on every allocation, as objects are allocated contiguously in a dedicated space, so their addresses represent their relative age within this memory region, while the start of the region is associated with the last garbage collection. Only when the garbage collector runs and moves the surviving objects does it have to merge this information into an absolute age, if needed.
Implementations like the HotSpot JVM simplify this further. For each surviving object, they maintain a small counter holding the number of garbage collection cycles it has survived. After surviving a configurable number of collection cycles, the object is promoted to the old generation, and beyond that point its age becomes irrelevant.
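A small sketch of both measures in C, with field and function names invented for the illustration (real collectors typically pack the age into a few object-header bits rather than keeping full counters per object):

    #include <stddef.h>
    #include <stdint.h>

    /* Monotonic counter: total bytes ever allocated. Never decremented,
       so it only grows, exactly as described above. */
    static uint64_t bytes_allocated_total = 0;

    typedef struct object_header {
        uint64_t bytes_at_birth;   /* snapshot of the counter at allocation time */
        uint8_t  survived_gcs;     /* HotSpot-style age: collections survived    */
    } object_header;

    static void on_allocate(object_header *obj, size_t size)
    {
        bytes_allocated_total += size;
        obj->bytes_at_birth = bytes_allocated_total;
        obj->survived_gcs   = 0;
    }

    /* "Age" in the allocated-bytes sense: how much allocation has happened
       since this object was born. */
    static uint64_t age_in_bytes(const object_header *obj)
    {
        return bytes_allocated_total - obj->bytes_at_birth;
    }

    /* The simpler measure: bump a small counter each time the object survives
       a collection, and promote it once the counter crosses a threshold. */
    static int should_promote(object_header *obj, uint8_t tenuring_threshold)
    {
        return ++obj->survived_gcs >= tenuring_threshold;
    }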

Stop and copy garbage collector in two phases

When implementing a stop and copy garbage collector as a pair, I need two memory banks (the old one and a free new one). One memory bank consists of the-cars and the-cdrs. So basically, when I make a new address, it is a pointer into the-cars and the-cdrs.
When I am allocating new memory and I see that I don't have enough space, I start a GC. What it does is:
switch the memory banks
move: read the car and cdr from the old bank, copy them to the new bank, and put a pointer in the old bank to the new location for later.
scan: loop over the memory and copy everything from old to new.
Now the question is: why do I need to scan first and move after? Why can't I do both together?
It sounds like you are going through the really awesome garbage collection assignment where you implement your own collectors (mark and sweep, stop and copy, generational).
General answer to your question: two-pass algorithms typically use less memory than one-pass algorithms, by trading time for space.
More specific answer: in a stop-and-copy collector, you do it in two passes by (1) first copying the live data over to the new semispace, and (2) adjusting internal references in the live data to refer to elements in the new semispace, mapping old memory to new memory.
You must realize that the information necessary to do the mapping isn't magically available: you need memory to keep track of how to redirect the moved memory. And remember: your collector itself is a program, and it must use a bounded, small amount of memory! Keeping a hash table in your collector to do the bookkeeping, for example, would be verboten: it's not playing by the rules. So one thing you need to keep track of is making sure the collector is playing with a limited amount of memory. That explains why a stop-and-copy collector will reuse the old semispace and write those redirect records there.
With that constraint in mind: it's important to realize that we need to be careful of how we're traversing the live set. Which approach we choose may or may not require additional memory, in some very subtle and surprising ways. In particular, recursion in the general case is not free! Technically, in the first pass we should be using the new semispace not only as the target of our copying, but as a funky representation of the control stack that we use to implement the recursive process that walks through the live data.
Concretely, if we're doing a one-pass approach like this to copy the live set:
;; copy-live-set: number -> void
;; copies the live set starting from memory-location.
Pseudocode:
to copy-live-set starting at memory-location:
    copy the block at memory-location over to the new semispace, and
    record a redirection record in the old semispace
    for each internal-reference in the block:
        recursively call copy-live-set at the internal-reference
            if it hasn't been copied already
        remap the internal-reference to that new memory location
then you may be surprised to learn that we've messed up our memory guarantees. The above breaks the promise that the collector is making to the runtime environment, because the recursion here is not iterative! During the live-set traversal it will consume control stack space proportional to the depth of the structures we're walking across. Oops.
If you try an alternative approach for walking through the live set, you should eventually see that there's a good way to traverse the whole live set while still guaranteeing bounded, small control stack usage. Hint: consider how graph traversal algorithms can be written as a simple while loop, with an explicit container that holds what to visit next till we exhaust the container. If you squint just right, the intermediate values in the new semispace look awfully like that container.
Once you discover how to traverse the live set in constant control stack space, you'll see that you'll need those two passes to do the complete copy-and-rewrite-internal-references thing. Worrying about these details is messy, but it's important in seeing how garbage collectors actually work. A real collector needs to do something like this, to be concerned about control stack usage, to ensure it uses bounded memory during the collection.
Summary: a two-pass algorithm is a solution that helps us with memory at the cost of some time. But we don't pay much in terms of performance: though we pass through the live set twice, the process is still linear in the size of the live set.
History: see Cheney's Algorithm, and note the emphasis in the title of the seminal paper: "A Nonrecursive List Compacting Algorithm". That single highlighted word, "Nonrecursive", is the key to what motivates the two-pass approach: it's trying to avoid consuming the control stack. Cheney's paper is an extension of the paper by Fenichel and Yochelson, "A LISP Garbage-Collector for Virtual-Memory Computer Systems", in which the authors proposed basically the recursive, stack-using approach first. To improve the situation, Fenichel and Yochelson then proposed using the non-recursive (but complicated!) Schorr-Waite DFS algorithm to do the traversal. Cheney's approach is an improvement because the traversal is simpler.
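For reference, here is a sketch of that nonrecursive, two-finger traversal in C, under invented assumptions about object layout (a forwarding slot, a field count, then pointer-sized fields). It illustrates the idea in Cheney's paper rather than being drop-in collector code.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Illustrative uniform layout; real systems encode this in object headers. */
    typedef struct obj {
        struct obj *forward;      /* NULL until copied, then the new address     */
        size_t      nfields;
        struct obj *field[];      /* nfields references to other objects or NULL */
    } obj;

    static uint8_t *free_ptr;     /* where the next copied object goes           */
    static uint8_t *scan_ptr;     /* first copied object not yet scanned         */

    static size_t obj_size(const obj *o)
    {
        return sizeof(obj) + o->nfields * sizeof(obj *);
    }

    /* Copy one object into to-space if it has not been copied yet, and return
       its new address. The forwarding pointer left in the old semispace is the
       "redirection record" described above. */
    static obj *evacuate(obj *o)
    {
        if (o == NULL) return NULL;
        if (o->forward != NULL) return o->forward;   /* already moved            */

        obj *copy = (obj *)free_ptr;
        memcpy(copy, o, obj_size(o));
        free_ptr += obj_size(o);                     /* bump the free pointer    */
        o->forward = copy;
        return copy;
    }

    /* Cheney's loop: to_space must point at a suitably sized, pointer-aligned
       new semispace. The region between scan_ptr and free_ptr is the explicit
       "container" of work still to do, so no recursion and no extra stack. */
    static void collect(obj **roots, size_t nroots, uint8_t *to_space)
    {
        scan_ptr = free_ptr = to_space;

        for (size_t i = 0; i < nroots; i++)
            roots[i] = evacuate(roots[i]);           /* move the roots first     */

        while (scan_ptr < free_ptr) {
            obj *o = (obj *)scan_ptr;
            for (size_t i = 0; i < o->nfields; i++)
                o->field[i] = evacuate(o->field[i]); /* rewrite internal refs    */
            scan_ptr += obj_size(o);
        }
    }

Newly copied but not yet scanned objects sit between scan_ptr and free_ptr, forming a queue inside the new semispace itself; that is exactly the role the hint above assigns to the intermediate values in the new semispace.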

Does the GHC garbage collector have any special optimisations for large objects?

Does the GHC garbage collector handle "large" objects specially? Or does it treat them exactly the same as any other object?
Some GC engines put large objects in a separate area, which gets scanned less regularly and possibly has a different collection algorithm (e.g., compacting instead of copying, or maybe even using freelists rather than attempting to defragment). Does GHC do anything like this?
Yes. The GHC heap is not kept in one contiguous stretch of memory; rather, it is organized into blocks.
When an allocated object's size is above a specific threshold (block_size*8/10, where block_size is 4k, so roughly 3.2k), the block holding the object is marked as large (BF_LARGE). Now, when garbage collection occurs, rather than copying large objects from this block to a new one, the block itself is added to the new generation's set of blocks; this involves fiddling with a linked list (a large object list, to be precise).
Because this means it may take a while for us to reclaim dead space inside a large block, large objects can suffer from fragmentation, as seen in bug 7831. However, this doesn't usually occur until individual allocations hit half of the megablock size, 1M.
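As a conceptual sketch only (this is not GHC's RTS code; apart from the BF_LARGE flag and the 4k block size mentioned above, the names and types are invented), the "relink the block instead of copying the object" idea looks roughly like this:

    #include <stdbool.h>
    #include <stddef.h>

    #define BLOCK_SIZE 4096u
    #define LARGE_OBJECT_THRESHOLD (BLOCK_SIZE * 8 / 10)   /* roughly 3.2k, per the text */

    /* Simplified block descriptor: a flag plus a link for a generation's
       large-object list. GHC's real block descriptor has many more fields. */
    typedef struct block {
        bool          is_large;   /* stands in for the BF_LARGE flag        */
        struct block *next;       /* link in a generation's block list      */
    } block;

    typedef struct generation {
        block *large_objects;     /* head of this generation's large-object list */
    } generation;

    /* At allocation time: objects above the threshold get a dedicated block
       that is marked large. */
    static bool is_large_allocation(size_t nbytes)
    {
        return nbytes > LARGE_OBJECT_THRESHOLD;
    }

    /* At collection time: instead of copying the object's bytes into to-space,
       move the whole block onto the destination generation's list. */
    static void promote_large_block(block *b, generation *dest)
    {
        b->next = dest->large_objects;
        dest->large_objects = b;
    }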
