When is garbage collector used in java? - garbage-collection

As far as I know GC is used only when JVM needs more memory, but I'm not sure about it. So, please, someone suggest an answer to this question.

The garbage collection algorithm of Java is as I understand, pretty complex and not as straightforward. Also, there is more than algorithm available for GC, which can be chosen at VM launchtime with an argument passed to the JVM.
There's a FAQ about garbage collection here: http://www.oracle.com/technetwork/java/faq-140837.html
Oracle has also published an article "Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine" which contains deep insights into garbage collection, and will probably help you understand the matter better: http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html

The JVM and java specs don't say anything about when garbage collection occurs, so its entirely up to the JVM implementors what policies they wish to use. Each JVM should probably have some documention about how it handles GC.
In general though, most JVMs will trigger GC when a memory allocation pushes the total amount of allocated memory above some threshold. There may be mulitple levels of gc (full vs partial/generational) that occur at different thresholds.

Garbage Collection is deliberately vaguely described in the Java Language Specification, to give the JVM implementors best working conditions to provide good garbage collectors.
Hence garbage collectors and their behaviour are extremely vendor dependent.
The simplest but least fun is the one that stops the whole program when needing to clean. Others are more sophisticated and run quietly along your program cleaning up as you go more or less aggressively.
The most fun way to investigate garbage collection is to run jvisualvm in the Sun 6 JDK. It allows you to see many, many internal things many relevant to garbage collection.
https://visualvm.dev.java.net/monitor_tab.html (but the newest version has plenty more)

In the older days garbage collector were empirical in nature. At some set interval or based on certain condition they would kick in and examine each of the object.
Modern days collectors are smarter in the sense that they differentiate based on the fact that objects are different lifespan. The objects are differentiated between young generation objects and tenured generation objects.
Memory is managed in pools according to generation. Whenever the young generation memory pool is filled, a minor collection happens. The surviving objects are moved to tenured generation memory pool. When the tenured generation memory pool gets filled a major collection happens. A third generation is kept which is known as permanent generation and may contain objects defining classes and methods.

Related

Why is garbage collection necessary?

Suppose that an object on the heap goes out of scope. Why can't the program free the memory right after the scope ends? Or, if we have a pointer to an object that is replaced by the address to a new object, why can't the program deallocate the old one before assigning the new one? I'm guessing that it's faster not to free it immediately and instead have the freeing be done asynchronously at a later point in time, but I'm not really sure.
Why is garbage collection necessary?
It is not strictly necessary. Given enough time and effort you can always translate a program that depends on garbage collection to one that doesn't.
In general, garbage collection involves a trade-off.
On the one hand, garbage collection allows you to write an application without worrying about the details of memory allocation and deallocation. (And the pain of debugging crashes and memory leaks caused by getting the deallocation logic wrong.)
The downside of garbage collection is that you need more memory. A typical garbage collector is not efficient if it doesn't have plenty of spare space1.
By contrast, if you do manual memory management, you can code your application to free up heap objects as soon as they are no longer used. Furthermore, you don't get awkward "pauses" while the GC is doing its thing.
The downside of manual memory management is that you have to write the code that decides when to call free, and you have to get it correct. Furthermore, if you try to manage memory by reference counting:
you have the cost of incrementing and decrementing ref counts whenever pointers are assign or variables go out of scope,
you have to deal with cycles in your data structures, and
it is worse when your application is multi-threaded and you have to deal with memory caches, synchronization, etc.
For what it is worth, if you use a decent garbage collector and tune it appropriately (e.g. give it enough memory, etc) then the CPU costs of GC and manual storage management are comparable when you apply them to a large application.
Reference:
"The measured cost of conservative garbage collection" by Benjamin Zorn
1 - This is because the main cost of a modern collector is in traversing and dealing with the non-garbage objects. If there is not a lot of garbage because you are being miserly with the heap space, the GC does a lot of work for little return. See https://stackoverflow.com/a/2414621/139985 for an analysis.
It's more complicated, but
1) what if there is memory pressure before the scope is over? Scope is only a language notion, not related to reachability. So an object can be "freed" before it goes out of scope ( java GCs do that on regular basis). Also, if you free objects after each scope is done, you might be doing too little work too often
2) As far as the references go, you are not considering that the reference might have hierarchies and when you change one, there has to be code that traverses those. It might not be the right time to do it when that happens.
In general, there is nothing wrong with such a proposal that you describer, as a matter of fact this is almost exactly how Rust programming language works, from a high level point of view.

How to implement a garbage collector?

Could anyone point me to a good source on how to implement garbage collection? I am making a lisp-like interpreted language. It currently uses reference counting, but of course that fails at freeing circularly dependent objects.
I've been reading of mark and sweep, tricolor marking, moving and nonmoving, incremental and stop-the-world, but... I don't know what the best way to keep the objects neatly separated into sets while keeping per-object memory overhead at a minimum, or how to do things incrementally.
I've read some languages with reference counting use circular reference detection, which I could use. I am aware I could use freely available collectors like Boehm, but I would like to learn how to do it myself.
I would appreciate any online material with some sort of tutorial or help for people with no experience on the topic like myself.
Could anyone point me to a good source on how to implement garbage collection?
There's a lot of advanced material about garbage collection out there. The Garbage Collection Handbook is great. But I found there was precious little basic introductory information so I wrote some articles about it. Prototyping a mark-sweep garbage collector describes a minimal mark-sweep GC written in F#. The Very Concurrent Garbage Collector describes a more advanced concurrent collector. HLVM is a virtual machine I wrote that includes a stop-the-world collector that handles threading.
The simplest way to implement a garbage collector is:
Make sure you can collate the global roots. These are the local and global variables that contain references into the heap. For local variables, push them on to a shadow stack for the duration of their scope.
Make sure you can traverse the heap, e.g. every value in the heap is an object that implements a Visit method that returns all of the references from that object.
Keep the set of all allocated values.
Allocate by calling malloc and inserting the pointer into the set of all allocated values.
When the total size of all allocated values exceeds a quota, kick off the mark and then sweep phases. This recursively traverses the heap accumulating the set of all reachable values.
The set difference of the allocated values minus the reachable values is the set of unreachable values. Iterate over them calling free and removing them from the set of allocated values.
Set the quota to twice the total size of all allocated values.
Check out the following page. It has many links. http://lua-users.org/wiki/GarbageCollection
As suggested by delnan, I started with a very naïve stop-the-world tri-color mark and sweep algorithm. I managed to keep the objects in the sets by making them linked-list nodes, but it does add a lot of data to each object (the virtual pointer, two pointers to nodes, one enum to hold the color). It works perfectly, no memory lost on valgrind :) From here I might try to add a free list for recycling, or some sort of thing that detects when it is convenient to stop the world, or an incremental approach, or a special allocator to avoid fragmentation, or something else. If you can point me where to find info or advice (I don't know whether you can comment on an answered question) on how to do these things or what to do, I'd be very thankful. I'll be checking Lua's GC in the meantime.
I have implemented a Cheney-style copying garbage collector in C in about 400 SLOC. I did it for a statically-typed language and, to my surprise, the harder part was actually communicating the information which things are pointers and which things aren't. In a dynamically typed language this is probably easier since you must already use some form of tagging scheme.
There also is a new version of the standard book on garbage collection coming out: "The Garbage Collection Handbook: The Art of Automatic Memory Management" by Jones, Hosking, Moss. (The Amazon UK site says 19 Aug 2011.)
One thing I haven't yet seen mentioned is the use of memory handles. One may avoid the need to double-up on memory (as would be needed with the Cheney-style copying algorithm) if each object reference is a pointer to a structure which contains the real address of the object in question. Using handles for memory objects will make certain routines a little slower (one must reread the memory address of an object any time something might have happened that would move it) but for single-threaded systems where garbage collection will only happen at predictable times, this isn't too much of a problem and doesn't require special compiler support (multi-threaded GC systems will are likely to require compiler-generated metadata whether they use handles or direct pointers).
If one uses handles, and uses one linked list for live handles (the same storage can be used to hold a linked list for dead handles needing reallocation), one can, after marking the master record for each handle, proceed through the list of handles, in allocation order, and copy the block referred to by that handle to the beginning of the heap. Because handles will be copied in order, there will be no need to use a second heap area. Further, generations may be supported by keeping track of some top-of-heap pointers. When compactifying memory, start by just compactifying items added since the last GC. If that doesn't free up enough space, compactify items added since the last level 1 GC. If that doesn't free up enough space, compactify everything. The marking phase would probably have to act upon objects of all generations, but the expensive compactifying stage would not.
Actually, using a handle-based approach, if one is marking things of all generations, one could if desired compute on each GC pass the amount of space that could be freed in each generation. If half the objects in Gen2 are dead, it may be worthwhile to do a Gen2 collection so as to reduce the frequency of Gen1 collections.
Garbage collection implementation in Lisp
Building LISP | http://www.lwh.jp/lisp/
Arcadia | https://github.com/kimtg/arcadia
Read Memory Management: Algorithms and Implementations in C/C++. It's a good place to start.
I'm doing similar work for my postscript interpreter. more info via my question. I agree with Delnan's comment that a simple mark-sweep algorithm is a good place to start. You'll need functions to set-mark, check-mark, clear-mark, and iterators for all your containers. One easy optimization is to clear-mark whenever allocating a new object, and clear-mark during the sweep; otherwise you'll need an entire pass to clear marks before you start setting them.

What's the Gambit-C's GC mechanism?

What's the Gambit-C's GC mechanism? I'm curious about this for making interactive app. I want to know whether it can avoid burst GC operation or not.
According to these threads:
https://mercure.iro.umontreal.ca/pipermail/gambit-list/2005-December/000521.html
https://mercure.iro.umontreal.ca/pipermail/gambit-list/2008-September/002645.html
Gambit has traditional stop-the-world GC at least until September 2008. People in thread recommended using pre-allocated object pooling to avoid GC operation itself. I couldn't find out about current implementation.
*It's hard to agree with the conversation. Because I can't pool object not written by myself and finally full-GC will happen at sometime by accumulated small/non-pooled temporary objects. But the method mentioned by #Gregory may help to avoid this problem. However, I wish incremental GC added to Gambit :)
According to http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Debugging#Garbage_collection_threshold gambit has some controls:
Garbage collection threshold
Pay attention to the runtime options h (maximum heapsize in kilobytes) and l (livepercent). See the reference manual for more information. Setting livepercent to five means that garbage collection will take place at the time that there are nineteen times more memory allocated for objects that should be garbage collected, than there is memory allocated for objects that should not. The reason the livepercent option is there, is to give a way to control how sparing/generous the garbage collector should be about memory consumption, vs. how heavy/light it should be in CPU load.
You can always force garbage collection by (##gc).
If you force garbage collection after some small number of operations, or schedule it near continuously, or set the livepercent to like 90 then presumably the gc will run frequently and not do very much on each run. This is likely to be more expensive overall, but avoid bursts of expense. You can then fairly easily budget for that expense to make the service fast despite.

How can garbage collectors be faster than explicit memory deallocation?

I was reading this html generated, (may expire, Here is the original ps file.)
GC Myth 3: Garbage collectors are always slower than explicit memory deallocation.
GC Myth 4: Garbage collectors are always faster than explicit memory deallocation.
This was a big WTF for me. How would GC be faster then explicit memory deallocation? isnt it essentially calling a explicit memory deallocator when it frees the memory/make it for use again? so.... wtf.... what does it actually mean?
Very small objects & large sparse
heaps ==> GC is usually cheaper,
especially with threads
I still don't understand it. Its like saying C++ is faster then machine code (if you don't understand the wtf in this sentence please stop programming. Let the -1 begin). After a quick google one source suggested its faster when you have a lot of memory. What i am thinking is it means it doesn't bother will the free at all. Sure that can be fast and i have written a custom allocator that does that very thing, not free at all (void free(void*p){}) in ONE application that doesnt free any objects (it only frees at end when it terminates) and has the definition mostly in case of libs and something like stl. So... i am pretty sure this will be faster the GC as well. If i still want free-ing i guess i can use an allocator that uses a deque or its own implementation thats essentially
if (freeptr < someaddr) {
*freeptr=ptr;
++freeptr;
}
else
{
freestuff();
freeptr = freeptrroot;
}
which i am sure would be really fast. I sort of answered my question already. The case the GC collector is never called is the case it would be faster but... i am sure that is not what the document means as it mention two collectors in its test. i am sure the very same application would be slower if the GC collector is called even once no matter what GC used. If its known to never need free then an empty free body can be used like that one app i had.
Anyways, i post this question for further insight.
How would GC be faster then explicit memory deallocation?
GCs can pointer-bump allocate into a thread-local generation and then rely upon copying collection to handle the (relatively) uncommon case of evacuating the survivors. Traditional allocators like malloc often compete for global locks and search trees.
GCs can deallocate many dead blocks simultaneously by resetting the thread-local allocation buffer instead of calling free on each block in turn, i.e. O(1) instead of O(n).
By compacting old blocks so more of them fit into each cache line. The improved locality increases cache efficiency.
By taking advantage of extra static information such as immutable types.
By taking advantage of extra dynamic information such as the changing topology of the heap via the data recorded by the write barrier.
By making more efficient techniques tractable, e.g. by removing the headache of manual memory management from wait free algorithms.
By deferring deallocation to a more appropriate time or off-loading it to another core. (thanks to Andrew Hill for this idea!)
One approach to make GC faster then explicit deallocation is to deallocate implicitly :the heap is divided in partitions, and the VM switches between the partitions from time to time (when a partition gets too full for example). Live objects are copied to the new partition and all the dead objects are not deallocated - they are just left forgotten. So the deallocation itself ends up costing nothing. The additional benefit of this approach is that the heap defragmentation is a free bonus.Please note this is a very general description of the actual processes.
The trick is, that the underlying allocator for garbage collector can be much simpler than the explicit one and take some shortcuts that the explicit one can't.
If the collector is copying (java and .net and ocaml and haskell runtimes and many others actually use one), freeing is done in big blocks and allocating is just pointer increment and cost is payed per object surviving collection. So it's faster especially when there are many short-lived temporary objects, which is quite common in these languages.
Even for a non-copying collector (like the Boehm's one) the fact that objects are freed in batches saves a lot of work in combining the adjacent free chunks. So if the collection does not need to be run too often, it can easily be faster.
And, well, many standard library malloc/free implementations just suck. That's why there are projects like umem and libraries like glib have their own light-weight version.
A factor not yet mentioned is that when using manual memory allocation, even if object references are guaranteed not to form cycles, determining when the last entity to hold a reference has abandoned it can be expensive, typically requiring the use of reference counters, reference lists, or other means of tracking object usage. Such techniques aren't too bad on single-processor systems, where the cost of an atomic increment may be essentially the same as an ordinary one, but they scale very badly on multi-processor systems, where atomic-increment operations are comparatively expensive.

Why Is Garbage Collection So Important? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I don't understand garbage collection so good, then I want to know, why it's so important to a language and to the developer?
Many other answers have stated that garbage collection can help to prevent memory leaks but, surprisingly, nobody seems to have mentioned the most important benefit that GC facilitates memory safety. This means that most garbage collected environments completely abstract away the notion of memory locations (i.e. raw pointers) and, consequently, eliminate a major class of bugs.
For example, a common mistake with manual memory management is to accidentally free a value slightly too early and continue to use the memory at that location after it has been freed. This can be extremely difficult to debug because freed memory might not be reallocated and, consequently, seemingly valid operations can be performed on the freed memory that can only fail sporadically with corruption or memory access violations or segmentation faults later in the program's execution, often with no direct link back to the offending code. This class of bugs simply do not exist in most garbage collected environments.
Garbage Collection is a part of many modern languages that attempts to abstract the disposal and reallocation of memory with less direction intervention by the developer.
When you hear talk of "safe" objects, this usually refers to something whose memory can be automatically reallocated by the Garbage Collector after an object falls out of scope, or is explicitly disposed.
While you can write the same program without a garbage collector to help manage memory usage, abstracting this away lets the developer think about more high level things and deliver value to the end user more quickly and efficiently without having to necessarily concentrate as much on lower level portions of the program.
In essence the developer can say
Give me a new object
..and some time later when the object is no longer being used (falls out of scope) the developer does not have to remember to say
throw this object away
Developers are lazy (a good virtue) and sometimes forget things. When working with GC properly, it's okay to forget to take out the trash, the GC won't let it pile up and start to smell.
Garbage Collection is a form of automatic memory management. It is a special case of resource management, in which the limited resource being managed is memory.
Benefits for the programmer is that garbage collection frees the programmer from manually dealing with memory allocation and deallocation.
The bottom line is that garbage collection helps to prevent memory leaks. In .NET, for example, when nothing references an object, the resources used by the object are flagged to be garbage collected. In unmanaged languages, like C and C++, it was up to the developer to take care of cleaning up.
It's important to note, however, that garbage collection isn't perfect. Check out this article on a problem that occurred because the developers weren't aware of a large memory leak.
In many older and less strict languages deallocating memory was hard-coded into programs by the programmer; this of course will cause problems if not done correctly as the second you reference memory that hasn't been deallocated your program will break. To combat this garbage collection was created, to automatically deallocate memory that was no longer being used. The benefits of such a system is easy to see; programs become far more reliable, deallocating memory is effectively removed from the design process, debugging and testing times are far shorter and more.
Of course, you don't get something for nothing. What you lose is performance, and sometimes you'll notice irregular behaviour within your programs, although nowadays with more modern languages this rarely is the case. This is the reason many typical applications are written in Java, it's quick and simple to write without the trauma of chasing memory leaks and it does the job, it's perfect for the world of business and the performance costs are little with the speed of computers today. Obviously some industries need to manage their own memory within their programs (the Games industry) for performance reasons, which is why nearly all major games are written in C++. A lecturer once told me that if every software house was in the same area, with a bar in the middle you'd be able to tell the game developers apart from the rest because they'd be the ones drinking heavily long into the night.
Garbage collection is one of the features required to allow the automatic management of memory allocation. This is what allows you to allocate various objects, maybe introduce other variables referencing or containing these in a fashion or other, and yet never worry about disposing of the object (when it is effectively not in use anymore).
The garbage collection, specifically takes care of "cleaning up" the heap(s) where all these objects are found, by removing unused objects an repacking the others together.
You probably hear a lot about it, because this is a critical function, which happens asynchronously with the program and which, if not handled efficiently can produce some random performance lagging in the program, etc. etc. Nowadays, however the algorithms related to the memory management at-large and the GC (garbage collection) in particular are quite efficient.
Another reason why the GC is sometimes mentioned is in relation to the destructor of some particular object. Since the application has no (or little) control over when particular objects are Garbage-Collected (hence destroyed), it may be an issue if an object waits till its destructor to dispose of some resource and such. That is why many objects implement a Dispose() method, which allow much of that clean-up (of the object itself) to be performed explicitly, rather than be postponed till the destructor is eventually called from the GC logic.
Automatic garbage collection, like java, reuses memory leaks or memory that is no longer being used, making your program efficient. In c++, you have to control the memory yourself, and if you lose access to a memory, then that memory can no longer be used, wasting space.
This is what I know so far from one year of computer science and using/learning java and c++.
Because someone can write code like
consume(produce())
without caring about cleanup. Just like in our current society.

Resources