How is the lifetime of an object allocated in the heap measured?

When I read The Garbage Collection Handbook, chapter 9 states: "object lifetimes are better measured by the number of bytes of heap space allocated between their birth and death." I don't quite understand this sentence. Why can a lifetime be measured by the number of allocated bytes? I tried to Google for it, but found no answer.
Can anyone explain this to me? Thanks!

By measuring object lifetimes in terms of bytes allocated between instantiation and death, it is easier for the GC algorithm to adapt to program behaviour.
If the rate of object allocation is very slow, a simple time measurement would show long pauses between collections, which would appear to be good. However, if object lifetimes measured in bytes allocated are long, objects may be getting promoted to a survivor space or the old generation too quickly. By measuring byte allocation, the collector can size the heap more effectively, for example by expanding the young generation so that more objects become garbage before a minor collection occurs. Using time alone as the measure would not make the need for heap resizing obvious.
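As a rough illustration only (the structure, names, and the 20% threshold are made up, not taken from any real collector), a collector could use the ratio of bytes promoted to bytes allocated since the last minor collection as its signal for growing the young generation:

    /* Hypothetical sizing heuristic: if too many bytes survive a minor
     * collection relative to the bytes allocated since the last one,
     * objects are probably being promoted too early, so grow the nursery. */
    #include <stddef.h>

    struct young_gen {
        size_t capacity;          /* current nursery size in bytes            */
        size_t bytes_allocated;   /* bytes allocated since the last minor GC  */
        size_t bytes_promoted;    /* bytes that survived the last minor GC    */
    };

    static void resize_after_minor_gc(struct young_gen *yg)
    {
        /* 20% survival is an arbitrary, illustrative threshold. */
        if (yg->bytes_promoted * 5 > yg->bytes_allocated)
            yg->capacity *= 2;    /* give objects more time to die young */
        yg->bytes_allocated = 0;
        yg->bytes_promoted  = 0;
    }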
As the book points out, with multi-threaded applications it is hard to measure byte allocation for individual threads, so collectors tend to measure lifetimes in terms of how many collections an object survives. This is a simpler number to monitor and requires less space to record.

“time” is only a scale that allows us to bring an order to events. There are many possible units, even in the real world. Inside the computer, for the purpose of garbage collection, no real-world time unit is needed; all the garbage collector usually wants to know is which object is older than another.
For this purpose, just assigning an ascending number to each allocated object would be sufficient, but this would imply maintaining an additional counter. In contrast, the number of allocated bytes comes for free. It’s important that we accumulate the allocated bytes only, never subtracting deallocated bytes, so we have an always growing number.
With generational memory management, this number doesn't even need to be updated on every allocation: objects are allocated contiguously in a dedicated space, so their addresses represent their relative age within this memory region, where the start of the region is associated with the last garbage collection. Only when the garbage collector runs and moves the surviving objects does it have to merge this information into an absolute age, if needed.
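A minimal sketch of that idea (purely illustrative, not from any particular runtime): in a bump-pointer nursery, an object's address already records how many bytes were allocated before it, so its relative age falls out of the allocator for free, and a single ever-growing counter serves as the logical clock:

    #include <stddef.h>

    static char   nursery[1 << 20];   /* dedicated allocation region            */
    static size_t bump;               /* bytes allocated since the last GC      */
    static size_t total_allocated;    /* ever-growing "clock", never decremented */

    void *gc_alloc(size_t size)
    {
        if (bump + size > sizeof nursery)
            return NULL;              /* a real VM would trigger a minor GC here */
        void *obj = &nursery[bump];
        bump += size;
        total_allocated += size;      /* the logical time of this object's birth */
        return obj;
    }

    /* Within the nursery a lower address means an older object:   */
    /* relative_age = bump - ((char *)obj - nursery);               */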
Implementations like the HotSpot JVM simplify this further. For each surviving object, the collector maintains a small counter holding the number of garbage collection cycles the object has survived. After surviving a configurable number of collection cycles, the object gets promoted to the old generation, and beyond that point its age becomes irrelevant.
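A hedged sketch of that simplification (the field name and helper are illustrative, not HotSpot's actual code; 15 is HotSpot's default MaxTenuringThreshold): each object carries a small age counter that is bumped whenever the object survives a minor collection, and once it crosses the threshold the object is promoted:

    #include <stdbool.h>
    #include <stdint.h>

    #define PROMOTION_THRESHOLD 15      /* e.g. HotSpot's default MaxTenuringThreshold */

    struct object_header {
        uint8_t age;                    /* collections survived so far */
        /* ... other header bits ...                                   */
    };

    /* Called for every object that survives a minor collection. */
    static bool should_promote(struct object_header *h)
    {
        if (h->age < PROMOTION_THRESHOLD)
            h->age++;
        return h->age >= PROMOTION_THRESHOLD;   /* true: move to old generation */
    }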

What is the sequential store buffer structure in GC, specifically?

I have read the garbage collection book; it mentions a data structure called a sequential store buffer. Could anyone explain how it works, the principle behind it, or where I can find a paper about it?
For generational collectors, different regions of the heap get collected at different times (minor for the young generation, major for the old generation). To keep collections consistent, a remembered set is typically used that records references from objects in the old generation to objects in the young generation.
There are different ways of recording the remembered set, as described in the GC book you mention. A common way is the use of a card table, which is how the G1 collector does it.
An alternative is the sequential store buffer. This is an area of memory that is treated roughly like a stack, i.e. there is a pointer to where the next piece of data can be stored. Once the data is saved, the pointer is bumped by the size of the data. This is very efficient (and is also the way space is allocated in the young generation). For a GC algorithm that uses a write barrier (most do), this is a good way of reducing the load created by the barrier. It is also very efficient on pipelined architectures with branch prediction.
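A minimal sketch of such a write barrier (names are illustrative and not taken from any particular VM; a real barrier would usually filter so that only old-to-young stores are recorded):

    #include <stddef.h>

    #define SSB_CAPACITY 4096

    static void **ssb[SSB_CAPACITY];        /* the sequential store buffer    */
    static void ***ssb_top = ssb;           /* next free slot (bump pointer)  */

    void ssb_drain(void);                   /* hands the entries to the collector */

    /* Write barrier: called on every store of a reference into a field. */
    static inline void write_barrier(void **field, void *new_value)
    {
        *field = new_value;
        *ssb_top++ = field;                 /* fast path: one store and a bump */
        if (ssb_top == ssb + SSB_CAPACITY) {
            ssb_drain();                    /* rare slow path when the buffer fills */
            ssb_top = ssb;
        }
    }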

Card table and write barriers in the .NET GC

Can anybody explain the concepts of the card table and write barriers in the garbage collection process in .NET?
I really can't find a good explanation of these terms, i.e. what they are, how they are useful, and how they participate in GC.
Any help would be really appreciated.
The card table is an array of bits, one bit for each chunk of 256 bytes of memory in the old generation. The bits are normally zero, but when a field of an object in the old generation is written to, the bit corresponding to the object's memory address is set to one. That is called executing the write barrier.
The garbage collector in .NET is generational and has a phase which only traces and collects objects in the young generation. So it goes through the object graph starting with the roots but does not recurse into objects in the old generation. In that way, it only traces a small fraction of the whole object graph.
To find the roots to start tracing from, it scans the program's local and global variables for young-generation objects. But that alone would miss objects referenced only from old-generation objects, so it also scans the fields of objects in the old generation whose card table bit is set.
Then, after the young-generation collection is complete, it resets all card table bits to zero.
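A hedged sketch of the mechanism described above (the 256-byte card size matches the description; everything else, including using a byte per card instead of a bit, is illustrative and not the actual .NET runtime code):

    #include <stdint.h>
    #include <stddef.h>

    #define CARD_SIZE 256                       /* bytes of old-gen memory per card */

    extern uint8_t   card_table[];              /* one entry per card               */
    extern uintptr_t old_gen_base;

    /* Write barrier: run after every store of a reference into an object field. */
    static inline void write_barrier(void **field)
    {
        uintptr_t addr = (uintptr_t)field;
        card_table[(addr - old_gen_base) / CARD_SIZE] = 1;   /* mark the card dirty */
    }

    /* During a young-generation collection, fields on dirty cards act as extra roots. */
    void scan_dirty_cards(size_t ncards, void (*trace_card)(size_t card))
    {
        for (size_t i = 0; i < ncards; i++) {
            if (card_table[i]) {
                trace_card(i);                  /* trace the objects on this card   */
                card_table[i] = 0;              /* reset after the minor collection */
            }
        }
    }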

Does the GHC garbage collector have any special optimisations for large objects?

Does the GHC garbage collector handle "large" objects specially? Or does it treat them exactly the same as any other object?
Some GC engines put large objects in a separate area, which gets scanned less regularly and possibly has a different collection algorithm (e.g., compacting instead of copying, or maybe even using freelists rather than attempting to defragment). Does GHC do anything like this?
Yes. The GHC heap is not kept in one contiguous stretch of memory; rather, it is organized into blocks.
When an allocated object’s size is above a specific threshold (block_size*8/10, where block_size is 4k, so roughly 3.2k), the block holding the object is marked as large (BF_LARGE). Now, when garbage collection occurs, rather than copy large objects from this block to a new one, the block itself is added to the new generation's set of blocks; this involves fiddling with a linked list (a large object list, to be precise).
Since this means it may take a while for us to reclaim dead space inside a large block, large objects can suffer from fragmentation, as seen in bug 7831. However, this doesn't usually occur until individual allocations hit half of the megablock size, 1M.
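A hedged sketch of the idea, not the actual GHC RTS code (the names are illustrative): instead of copying a large object, the collector unlinks its block from the old generation's list and links it into the new generation's large-object list:

    /* Illustrative only: relinking a large-object block instead of copying it. */
    struct block {
        struct block *next;
        int           flags;          /* e.g. a BF_LARGE-style bit              */
        /* ... the object's payload lives inside the block itself ...           */
    };

    struct generation {
        struct block *large_objects;  /* linked list of large-object blocks     */
    };

    static void promote_large_block(struct block *b, struct generation *to)
    {
        /* b must already be unlinked from its old generation's list.           */
        b->next = to->large_objects;  /* O(1) pointer fiddling, no copying      */
        to->large_objects = b;
    }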

Some points about implementing a garbage collector

I'm trying to implement a simple programming language. I don't want its users to have to manage memory, so I decided to implement a garbage collector. The simplest way I can think of, after checking out some material, is like this:
There are two kinds of heap zones. The first is for storing big objects (bigger than 85,000 bytes), the other is for small objects. In the following I use BZ for the first and SZ for the second.
The BZ uses the mark and sweep algorithm, because moving a big object is expensive. I don't compact, so there will be fragmentation.
The SZ uses generations with mark-compact. There are three generations: 0, 1, and 2. Allocation requests go directly to generation 0, and when generation 0 is full I will do garbage collection on it; the survivors will be promoted to generation 1. Generations 1 and 2 will also be collected when they are full.
When the virtual machine starts, it will allocate a big chunk of memory from the OS to be used as the heap. The BZ and every generation in the SZ will occupy a fixed portion of that memory, and when an allocation request can't be satisfied, the virtual machine will report an out-of-memory error. This has a problem: when the virtual machine starts, even if the program running on it needs only a little memory, the VM still takes a lot. A better way would be for the virtual machine to get a small amount of memory from the OS and then, when the program needs more, get more from the OS. In that case I would allocate a larger memory zone for generation 2 of the SZ and copy everything in generation 2 to the new zone, and do the same thing for the BZ.
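Roughly, the fixed-partition layout I have in mind looks like this (just a sketch; the names and fields are placeholders):

    #include <stddef.h>

    struct zone {
        char  *base;       /* start of this zone's fixed slice of the heap */
        size_t capacity;   /* fixed at VM start-up                          */
        size_t used;
    };

    struct heap {
        struct zone bz;        /* big objects, mark-sweep, never compacted  */
        struct zone sz_gen[3]; /* small objects, generations 0..2           */
    };

    /* An allocation only ever looks at its own zone, which is why a full  */
    /* BZ cannot borrow the free space sitting in the SZ.                  */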
The other problem occurs when the BZ is full and the SZ is empty: it would be silly not to be able to satisfy a big-object allocation request even though we in fact have enough free heap space for it in the SZ. How do I deal with this problem?
I am trying to understand your methodology. Since you didn't describe your strategy completely, I am making some assumptions.
NOTE: the following is my hypothetical analysis and may not be practical, so please skip this answer if you don't have time.
You are trying to use a generational GC with some changes. There are two classes of objects:
(1) big objects, BZ, and
(2) small objects, SZ.
SZ performs generational GC with compaction.
From the above we know that SZG2 holds long-lived objects. I expect that GC in SZG2 is not as frequent as in SZG1 or SZG0: long-lived objects generally tend to keep living, so less garbage is collected there, and the size of SZG2 grows as time passes, so GC'ing it takes a long time traversing all the elements. Frequent GC on SZG2 is therefore less productive (a long GC spike, so a noticeable delay for the user) compared to SZG1 or SZG0.
Similarly, BZ may have a large memory requirement (as big objects occupy more space). So, to address your query:
"The other problem occurs when the BZ is full and the SZ is empty: it would be silly not to be able to satisfy a big-object allocation request even though we in fact have enough free heap space for it in the SZ. How do I deal with this problem?"
Since you said that "when the program needs more memory the virtual machine will get more from the OS":
I have a small idea; it may not be productive, it may not be possible to implement, and it depends completely on your implementation of the GCHeap structure.
Let your virtual machine lay out its memory as follows.
Coming to the possibility (I borrowed the idea from the classic "memory segments of a program" layout), the following is possible at a low level.
As shown in figure A, the GCHeap structure has to be defined in such a way that SZG0 and BZ expand towards each other. To implement the GCHeap structure in figures A and B, we need a proper convention for how the SZG[0-2] zones and BZ grow.
So if you want to divide your application's heap into multiple heaps, you can pile figure A on top of figure B to decrease fragmentation (when I say fragmentation I mean "when the BZ is full and the SZ is empty, it would be silly not to be able to satisfy a big-object allocation request even though we in fact have enough free heap space for it in the SZ").
So the effective structure will be:
B
|
B
|
B
|
B
|
A
Now it completely depends on your heuristics whether to treat the GCHeap data structure as multiple heaps like GCHeapA and GCHeapB, or as a single heap, based on your requirements.
If you don't want multiple heaps, you can use figure A with a small correction: set the base address of SZG2 to the top of the heap.
The key reasoning behind figure A is as follows:
We know that SZG0 gets GC'ed frequently, so it has more free space than SZG1 and SZG2, since dead objects are removed and surviving objects are moved into SZG1 and SZG2. So if BZ allocation is heavy, BZ can grow towards SZG0.
In figure A the base addresses of SZG1 and SZG2 are contiguous, because SZG2 is more prone to out-of-memory errors, as old-generation objects tend to live longer and GC'ing doesn't sweep much there (NOTE: this is just my assumption and opinion), so SZG2 is kept bounded on the outside.
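A hedged sketch of that layout (purely illustrative): BZ and SZG0 share one reserved region and allocate from opposite ends, so either one can use the free space in the middle until the two allocation pointers meet:

    #include <stddef.h>

    static char   region[1 << 24];            /* one chunk reserved from the OS */
    static size_t sz_top = 0;                 /* SZG0 grows upward              */
    static size_t bz_top = sizeof region;     /* BZ grows downward              */

    void *alloc_small(size_t size)            /* SZG0 allocation                */
    {
        if (bz_top - sz_top < size)
            return NULL;                      /* the two zones have met         */
        void *p = &region[sz_top];
        sz_top += size;
        return p;
    }

    void *alloc_big(size_t size)              /* BZ allocation                  */
    {
        if (bz_top - sz_top < size)
            return NULL;
        bz_top -= size;
        return &region[bz_top];
    }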

How much extra memory does garbage collection require?

I heard once that for a language to implement and run garbage collection correctly, on average 3x more memory is required. I am not sure if this assumes the application is small, large, or either.
So I wanted to know if there is any research or actual numbers on garbage collection overhead. Also, I want to say GC is a very nice feature.
The amount of memory headroom you need depends on the allocation rate within your program. If you have a high allocation rate, you need more room for growth while the GC works.
The other factor is object lifetime. If your objects typically have a very short lifetime, then you may be able to manage with slightly less headroom with a generational collector.
There are plenty of research papers that may interest you. I'll edit a bit later to reference some.
Edit (January 2011):
I was thinking of a specific paper that I can't seem to find right now. The ones below are interesting and contain some relevant performance data. As a rule of thumb, you are usually ok with about twice as much memory available as your program residency. Some programs need more, but other programs will perform very well even in constrained environments. There are lots of variables that influence this, but allocation rate is the most important one.
Immix: a mark-region garbage collector with space efficiency, fast collection, and mutator performance
Myths and realities: the performance impact of garbage collection
Edit (February 2013): This edit adds a balanced perspective on a paper cited, and also addresses objections raised by Tim Cooper.
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management, as noted by Natan Yellin, is actually the reference I was first trying to remember back in January 2011. However, I don't think the interpretation Natan has offered is correct. That study does not compare GC against conventional manual memory management. Rather, it compares GC against an oracle which performs perfect explicit releases. In other words, it leaves us not knowing how well conventional manual memory management compares to the magic oracle. It is also very hard to find this out, because the source programs are written either with GC in mind or with manual memory management in mind. So any benchmark retains an inherent bias.
Following Tim Cooper's objections, I'd like to clarify my position on the topic of memory headroom. I do this mainly for posterity, as I believe Stack Overflow answers should serve as a long-term resource for many people.
There are many memory regions in a typical GC system, but three abstract kinds are:
Allocated space (contains live, dead, and untraced objects)
Reserved space (from which new objects are allocated)
Working region (long-term and short-term GC data structures)
What is headroom anyway? Headroom is the minimum amount of reserved space needed to maintain a desired level of performance. I believe that is what the OP was asking about. You can also think of the headroom as memory, additional to the actual program residency (maximum live memory), necessary for good performance.
Yes -- increasing the headroom can delay garbage collection and increase throughput. That is important for offline non-critical operations.
In reality most problem domains require a realtime solution. There are two kinds of realtime, and they are very different:
hard-realtime concerns worst case delay (for mission critical systems) -- a late response from the allocator is an error.
soft-realtime concerns either average or median delay -- a late response from the allocator is ok, but shouldn't happen often.
Most state of the art garbage collectors aim for soft-realtime, which is good for desktop applications as well as for servers that deliver services on demand. If one eliminates realtime as a requirement, one might as well use a stop-the-world garbage collector in which headroom begins to lose meaning. (Note: applications with predominantly short-lived objects and a high allocation rate may be an exception, because the survival rate is low.)
Now suppose that we are writing an application that has soft-realtime requirements. For simplicity let's suppose that the GC runs concurrently on a dedicated processor. Suppose the program has the following artificial properties:
mean residency: 1000 KB
reserved headroom: 100 KB
GC cycle duration: 1000 ms
And:
allocation rate A: 100 KB/s
allocation rate B: 200 KB/s
Now we might see the following timeline of events with allocation rate A:
T+0000 ms: GC cycle starts, 100 KB available for allocations, 1000 KB already allocated
T+1000 ms:
0 KB free in reserved space, 1100 KB allocated
GC cycle ends, 100 KB released
100 KB free in reserve, 1000 KB allocated
T+2000 ms: same as above
The timeline of events with allocation rate B is different:
T+0000 ms: GC cycle starts, 100 KB available for allocations, 1000 KB already allocated
T+0500 ms:
0 KB free in reserved space, 1100 KB allocated
either
delay until end of GC cycle (bad, but sometimes mandatory), or
increase reserved size to 200 KB, with 100 KB free (assumed here)
T+1000 ms:
0 KB free in reserved space, 1200 KB allocated
GC cycle ends, 200 KB released
200 KB free in reserve, 1000 KB allocated
T+2000 ms:
0 KB free in reserved space, 1200 KB allocated
GC cycle ends, 200 KB released
200 KB free in reserve, 1000 KB allocated
Notice how the allocation rate directly impacts the size of the headroom required? With allocation rate B, we require twice the headroom to prevent pauses and maintain the same level of performance.
This was a very simplified example designed to illustrate only one idea. There are plenty of other factors, but it does show what was intended. Keep in mind the other major factor I mentioned: average object lifetime. Short lifetimes cause low survival rates, which work together with the allocation rate to influence the amount of memory required to maintain a given level of performance.
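Here is a back-of-the-envelope version of the arithmetic in the example above (assuming a concurrent GC on its own processor and a steady allocation rate): the reserved headroom must cover everything allocated during one GC cycle, so headroom ≈ allocation rate × cycle duration.

    #include <stdio.h>

    int main(void)
    {
        double cycle_s = 1.0;                 /* GC cycle duration in seconds     */
        double rates[] = { 100.0, 200.0 };    /* allocation rates A and B in KB/s */

        for (int i = 0; i < 2; i++) {
            /* headroom needed to avoid stalling while the concurrent GC runs */
            double headroom_kb = rates[i] * cycle_s;
            printf("rate %c: %.0f KB of headroom\n", 'A' + i, headroom_kb);
        }
        return 0;                             /* prints 100 KB and 200 KB         */
    }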
In short, one cannot make general claims about the headroom required without knowing and understanding the characteristics of the application.
According to the 2005 study Quantifying the Performance of Garbage Collection vs. Explicit Memory Management (PDF), generational garbage collectors need 5 times the memory to achieve equal performance. The emphasis below is mine:
We compare explicit memory management to both copying and non-copying garbage collectors across a range of benchmarks, and include real (non-simulated) runs that validate our results. These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational garbage collector with a non-copying mature space matches the performance of explicit memory management. With only three times as much memory, it runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%. When physical memory is scarce, paging causes garbage collection to run an order of magnitude slower than explicit memory management.
I hope whoever made the original claim clearly stated what they regard as correct usage of garbage collection and the context of their claim.
The overhead certainly depends on many factors; e.g., the overhead is larger if you run your garbage collector less frequently; a copying garbage collector has a higher overhead than a mark and sweep collector; and it is much easier to write a garbage collector with lower overhead in a single-threaded application than in the multi-threaded world, especially for anything that moves objects around (copying and/or compacting gc).
So I wanted to know if there is any research or actual numbers on garbage collection overhead.
Almost 10 years ago I studied two equivalent programs I had written, one in C++ using the STL (GCC on Linux) and one in OCaml using its garbage collector. I found that the C++ version used 2x more memory on average. I tried to improve it by writing custom STL allocators but was never able to match the memory footprint of the OCaml version.
Furthermore, GCs typically do a lot of compaction which further reduces the memory footprint. So I would challenge the assumption that there is a memory overhead compared to typical unmanaged code (e.g. C++ using what are now the standard library collections).
