I Know that Static references variable or instance variable not reclaimed by Garbage Collector but can class level non static variable reclaimed by garbage collection.
The idea behind garbage collection is that memory is references are monitored automatically.
A class instance serves as a reference holder of its member variables so the garbage collector should never collect the members before the instance.
Related
When searching for an answer, I found this question, however there is no mention of static lifetime objects. Can the method mentioned in this answer (calling drop() on the object) be used for static lifetime objects?
I was imagining a situation like a linked list. You need to keep nodes of the list around for (potentially) the entire lifetime of the program, however you also may remove items from the list. It seems wasteful to leave them in memory for the entire execution of the program.
Thanks!
No. The very point of a static is that it's static: It has a fixed address in memory and can't be moved from there. As a consequence, everybody is free to have a reference to that object, because it's guaranteed to be there as long as the program is executing. That's why you only get to use a static in the form of a &'static-reference and can never claim ownership.
Besides, doing this for the purpose of memory conservation is pointless: The object is baked into the executable and mapped to memory on access. All that could happen is for the OS to relinquish the memory mapping. Yet, since the memory is never allocated from the heap in the first place, there is no saving to be had.
The only thing you could do is to replace the object using unsafe mutable access. This is both dangerous (because the compiler is free to assume that the object does not in fact change) and pointless, due to the fact that the memory can't be freed, as it's part of the executable's memory mapping.
It seems like Box.clone() copies the heap memory. As I know, Box will get destructed after it gets out of its scope, as well as the memory area it is pointing to.
So I'd like to ask a way to create more than one Box object pointing to the same memory area.
By definition, you shall not.
Box is explicitly created with the assumption that it is the sole owner of the object inside.
When multiple owners are required, you can use instead Rc and Arc, those are reference-counted owners and the object will only be dropped when the last owner is destroyed.
Note, however, that they are not without downsides:
the contained object cannot be mutated without runtime checks; if mutation is needed this requires using Cell, RefCell or some Mutex for example,
it is possible to accidentally form cycles of objects, and since Rust has no Garbage Collector such cycles will be leaked.
What techniques do modern garbage collectors (as in CLR, JVM) use to tell which heap objects are referenced from the stack?
Specifically how can a VM work back from knowing where the stack starts to interpreting all local references to heap objects?
In Java (and likely in the CLR although I know its internals less well), the bytecode is typed with object vs primitive information. As a result, there are data structures in the bytecode that describe which variables in each stack frame are objects and which are primitives. When the GC needs to scan the root set, it uses these StackMapTables to differentiate between references and non-references.
CLR and Java have to have some mechanism like this because they are exact collectors. There are conservative collectors like the boehm collector that treat every offset on the stack as a possible pointer. They look to see if the value (when treated as a pointer) is an offset into the heap, and if so, they mark it as alive.
Take a look at this Artima article from August 1996, Java's Garbage-Collected Heap; especially page 2.
Any garbage collection algorithm must do two basic things. First, it must detect garbage objects. Second, it must reclaim the heap space used by the garbage objects and make it available to the program. Garbage detection is ordinarily accomplished by defining a set of roots and determining reachability from the roots. An object is reachable if there is some path of references from the roots by which the executing program can access the object. The roots are always accessible to the program. Any objects that are reachable from the roots are considered live. Objects that are not reachable are considered garbage, because they can no longer affect the future course of program execution.
In a JVM the root set is implementation dependent but would always include any object references in the local variables. In the JVM, all objects reside on the heap. The local variables reside on the Java stack, and each thread of execution has its own stack. Each local variable is either an object reference or a primitive type, such as int, char, or float. Therefore the roots of any JVM garbage-collected heap will include every object reference on every thread's stack. Another source of roots are any object references, such as strings, in the constant pool of loaded classes. The constant pool of a loaded class may refer to strings stored on the heap, such as the class name, superclass name, superinterface names, field names, field signatures, method names, and method signatures.
Any object referred to by a root is reachable and is therefore a live object. Additionally, any objects referred to by a live object are also reachable. The program is able to access any reachable objects, so these objects must remain on the heap. Any objects that are not reachable can be garbage collected because there is no way for the program to access them.
The article continues to explore different garbage collection strategies, including reference counting collectors, tracing collectors, compacting collectors and copying collectors.
Though this article is old, it still applies today; not much has really changed. There have been performance improvements to the different collection strategies, but no new major advancements.
The Oracle HotSpot JVM, for example, has a new Garbage-First Garbage Collector which is a copying collector with performance tweaks for multi-core processors and large heap sizes (see this answer for more on the G1 Garbage Collector).
Interesting documentation on this topic posted up by the .Net team shortly after the they made CoreCLR open source: Stack Walking
If I have a garbage collector that tracks every object allocated and deallocates them as soon as they no longer have usable references to them can you still have a memory leak?
Considering a memory leak is allocations without any reference isn't that impossible or am I missing something?
Edit: So what I'm counting as a memory leak is allocations which you no longer have any reference to in the code. Large numbers of accumulating allocations which you still have references to aren't the leaks I'm considering here.
I'm also only talking about normal state of the art G.C., It's been a while but I know cases like cyclical references don't trip them up. I don't need a specific answer for any language, this is just coming from a conversation I was having with a friend. We were talking about Actionscript and Java but I don't care for answers specific to those.
Edit2: From the sounds of it, there doesn't seem to be any reason code can completely lose the ability to reference an allocation and not have a GC be able to pick it up, but I'm still waiting for more to weigh in.
If your question is really this:
Considering a memory leak is allocations without any reference isn't
that impossible or am I missing something?
Then the answer is "yes, that's impossible" because a properly implemented garbage collector will reclaim all allocations that don't have active references.
However, you can definitely have a "memory leak" in (for example) Java. My definition of a "memory leak" is an allocation that still has an active reference (so that it won't be reclaimed by the garbage collector) but the programmer doesn't know that the object isn't reclaimable (ie: for the programmer, this object is dead and should be reclaimed). A simple example is something like this:
ObjectA -> ObjectB
In this example, ObjectA is an object in active use in the code. However, ObjectA contains a reference to ObjectB that is effectively dead (ie: ObjectB has been allocated and used and is now, from the programmer's perspective, dead) but the programmer forgot to set the reference in ObjectA to null. In this case, ObjectB has been "leaked".
Doesn't sound like a big problem, but there are situations where these leaks are cumulative. Let's imagine that ObjectA and ObjectB are actually instances of the same class. And this problem that the programmer forgot to set the reference to null happens every time such an instance is used. Eventually you end up with something like this:
ObjectA -> ObjectB -> ObjectC -> ObjectD -> ObjectE -> ObjectF -> ObjectG -> ObjectH -> etc...
Now ObjectB through ObjectH are all leaked. And problems like this will (eventually) cause your program to crash. Even with a properly implemented garbage collector.
To decide whether a program has a memory leak, one must first define what a leak is. I would define a program as having a memory leak if there exists some state S and series of inputs I such that:
If the program is in state `S` and it receives inputs `I`, it will still be in state `S` (if it doesn't crash), but...
The amount of memory required to repeat the above sequence `N` times will increase without bound.
It is definitely possible for programs that run entirely within garbage-collected frameworks to have memory leaks as defined above. A common way in which that can occur is with event subscriptions.
Suppose a thread-safe collection exposes a CollectionModified event, and the IEnumerator<T> returned by its IEnumerable<T>.GetEnumerator() method subscribes to that event on creation, and unsubscribes on Dispose; the event is used to allow enumeration to proceed sensibly even when the collection is modified (e.g. ensuring that objects are in the collection continuously throughout the enumeration will be returned exactly once; those that exist during part of it will be returned no more than once). Now suppose a long-lived instance of that collection class is created, and some particular input will cause it to be enumerated. If the CollectionModified event holds a strong reference to every non-disposed IEnumerator<T>, then repeatedly enumerating the collection will create and subscribe an unbounded number of enumerator objects. Memory leak.
Memory leaks don't just depend how efficient a garbage collection algorithm is, if your program holds on to object references which have long life time, say in an instance variable or static variable without being used, your program will have memory leaks.
Reference count have a known problem of cyclic refernces meaning
Object 1 refers to Object 2 and Object 2 refers to Object 1
but no one else refers to object 1 or Object 2, reference count algorithm will fail in this scenario.
Since you are working on garbage collector itself, its worth reading about different implementation strategies.
You can have memory leaks with a GC in another way: if you use a conservative garbage collector that naively scans the memory and for everything that looks like a pointer, doesn't free the memory it "points to", you may leave unreachable memory allocated.
Garbage collection involves walking through a list of allocated objects (either all objects or objects in a particular generation) and determining which are reachable.
How is this list maintained? Do runtimes for GC languages keep a giant list of all objects?
Also, from what I understand, GC involves walking the call stack to look for object references - how does the algorithm distinguish between GC-able pointers and primitive data?
The memory management system keeps track of the size of each allocated object, just like it does in C or C++. One way this is commonly done is for the memory management system to allocate an extra size_t before each allocation, that keeps track of the size of each objecct. The memory manager likewise has to keep track of the size of each free block, so that it can reuse blocks to allocate them.
The garbage collector works in two phases: the mark phase, and the sweep phase. In the mark phase, the garbage collector starts walks object references in order to find objects that are still reachable. The garbage collector starts at a few basic places where the object references are stored and given names (the stack, and global storage, and static storage), and then traverses references in the objects.
In the sweep phase, the garbage collector walks the heap from bottom to top, jumping from allocation to allocation based on those size_ts, and frees anything that isn't marked.
Some languages (like Ruby) tag all of the primitives so that they can be identified separately from the object references at runtime. Other garbage collectors are ver conservative and follow primatives as through they were object references (though some checks must be performed to make sure that the garbage collector doesn't stick a mark in the middle of some other object). Still other languages use runtime type information to be more precise about whether they follow primatives.
Ruby's garbage collector sometimes called "conservative" because it doesn't check whether the space on the stack is actually being used, so it sometimes keeps dead objects alive by following ghost references on the stack. But since it always knows exactly whether the data it's looking at is a reference or a primative, I don't call it conservative here.
Garbage collection involves walking through a list of allocated objects (either all objects or objects in a particular generation) and determining which are reachable.
Not really. GCs are categorized into tracing and reference counting (see A unified theory of garbage collection). Tracing GCs start from a set of global roots and trace all objects reachable from them. Reference counting GCs count the number of references to each object and reclaim it when the count reaches zero. Neither require a list including unreachable objects.
How is this list maintained? Do runtimes for GC languages keep a giant list of all objects?
Pedagogical solutions like the one in HLVM can keep a list of all objects because it is simple but this is rare.
Also, from what I understand, GC involves walking the call stack to look for object references - how does the algorithm distinguish between GC-able pointers and primitive data?
Again, there are many different strategies. Conservative GCs are unable to distinguish between pointers and non-pointers so they conservatively consider that non-pointers might be pointers. Pedagogical GCs like the one in HLVM can use algorithms like Henderson's Accurate GC in an uncooperative environment. Production GCs store enough information in the OS thread stack to determine exactly which words are pointers (and which stack frames to skip because they are not affiliated with managed code) and then use a stack walker to find them.
Note that you also have to find local references held in registers as well as on the stack.
This site ( How Java’s Garbage Collector Works? ) has a good, brief explanation on how garbage collectors work, not just the default Java one.