How does the Garbage Collector decide when to kill objects held by WeakReferences? - garbage-collection

I have an object, which I believe is held only by a WeakReference. I've traced its reference holders using SOS and SOSEX, and both confirm that this is the case (I'm not an SOS expert, so I could be wrong on this point).
The standard explanation of WeakReferences is that the GC ignores them when doing its sweeps. Nonetheless, my object survives an invocation to GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced).
Is it possible for an object that is only referenced with a WeakReference to survive that collection? Is there an even more thorough collection that I can force? Or, should I re-visit my belief that the only references to the object are weak?
Update and Conclusion
The root cause was that there was a reference on the stack that was locking the object. It is unclear why neither SOS nor SOSEX was showing that reference. User error is always a possibility.
In the course of diagnosing the root cause, I ran several experiments which showed that weakly referenced gen 2 objects can stick around for a surprisingly long time. However, a weakly referenced gen 2 object will not survive GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced).
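The root cause described above (a strong reference on the stack keeping the object alive) can be reproduced in miniature. This is a rough Python analogue, not .NET code: `Payload` and `survives_collection` are hypothetical names, and Python's `weakref`/`gc` modules only approximate the behaviour of `WeakReference` and `GC.Collect`.

```python
import gc
import weakref


class Payload:
    """Stand-in for the object under investigation (hypothetical)."""


def survives_collection(keep_strong_ref):
    obj = Payload()
    ref = weakref.ref(obj)      # weak reference: does not keep obj alive
    if not keep_strong_ref:
        del obj                 # drop the only strong reference
    gc.collect()                # force a full collection
    # If the local variable `obj` still exists, it acts as a strong
    # reference on the stack and the weak reference stays valid.
    return ref() is not None
```

Calling `survives_collection(True)` models the situation in the question: the collection is forced, but a live local variable still pins the object.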

As per Wikipedia: "An object referenced only by weak references is considered unreachable (or "weakly reachable") and so may be collected at any time. Weak references are used to avoid keeping memory referenced by unneeded objects."
I am not sure if your case is about weak references...

Try calling GC.WaitForPendingFinalizers() right after GC.Collect().
Another possible option: don't ever use a WeakReference for any purpose. In the wild, I've only ever seen them used as a mechanism for lowering an application's memory footprint (i.e. a form of caching). As the mighty MSDN says:
Avoid using weak references as an automatic solution to memory management problems. Instead, develop an effective caching policy for handling your application's objects.

I recommend checking for other references to the weakly referenced object. If another reference is still alive, the object won't be collected.

Weakly referenced objects do get removed by garbage collection.
I've had the pleasure of debugging event systems where events were not getting fired... It turned out to be because the subscriber was only weakly referenced and so after some eventual random delay the GC would eventually collect it. At which point the UI stopped updating. :)

Yes, it is possible. If the weakly referenced object lives in a different generation from the one being collected (for example, the object is in generation 2 and the GC only does a gen 0 collection), it will survive. It should not, however, survive a full gen 2 collection that completes and in which all finalizers run.
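The generational effect can be sketched with CPython's cycle collector, which is also generational. This is only an analogy to the .NET GC, and it assumes a standard (non-free-threaded) CPython build; `Node` and `generation_demo` are hypothetical names.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.me = self           # self-cycle: only the cycle collector can free it


def generation_demo():
    was_enabled = gc.isenabled()
    gc.disable()                 # keep automatic collections out of the picture
    try:
        node = Node()
        ref = weakref.ref(node)
        gc.collect()             # node is still referenced, so it survives and ages
        del node                 # only the weak reference and the cycle remain
        gc.collect(0)            # youngest generation only
        alive_after_young = ref() is not None
        gc.collect(2)            # full collection of all generations
        alive_after_full = ref() is not None
        return alive_after_young, alive_after_full
    finally:
        if was_enabled:
            gc.enable()
```

On a typical CPython build the first flag is expected to be true (the young-generation pass never looks at the older object), while the second is false once the full collection has run.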

Related

Why does Concurrent-Mark-Sweep (CMS) remark phase need to re-examine the thread-stacks instead of just looking at the mutator's write-queues?

The standard CMS algorithm starts by making the application undergo a STW pause to calculate the GC-root-set. It then resumes mutator threads and both application and collector threads run concurrently until the marking is done. Any pointer store updated by a mutator-thread is protected by a write-barrier that will add that pointer reference to a write-queue.
When the marking phase is done we then proceed to the Remarking phase: it must then look into this write-queue and proceed to mark anything it finds there that was not already marked.
All of this makes sense. What I fail to understand is why would we need to:
Have this remarking phase recalculate the GC-root-set from scratch (including all thread stacks) -- does not doing this result in an incorrect algorithm, in the sense of it marking actually live and reachable objects as garbage to be reclaimed?;
Have this remarking phase be another STW event (maybe this is because of having to analyse all the thread-stacks?)
Reading one of the original papers on CMS, "A Generational Mostly-concurrent Garbage Collector", one finds:
The original mostly-concurrent algorithm, proposed by Boehm et al. [5], is a concurrent “tricolor” collector [9]. It uses a write barrier to cause updates of fields of heap objects to shade the containing object gray. Its main innovation is that it trades off complete concurrency for better throughput, by allowing root locations (globals, stacks, registers), which are usually updated more frequently than heap locations, to be written without using a barrier to maintain the tricolor invariant.
This makes it look like a trade-off that stems from a conscious decision not to cover stack writes with the write barrier. Is that right?
Thanks
Have this remarking phase recalculate the GC-root-set from scratch (including all thread stacks) -- does not doing this result in an incorrect algorithm, in the sense of it marking actually live and reachable objects as garbage to be reclaimed?
No, tricolor marking marks live objects (objects still unmarked by the time the "grey" set is exhausted are unreachable). Remark adds the rediscovered root objects to the "grey" set, together with all references caught by the write barrier, so more objects could be marked as live.
In summary, after CMS remark all live objects are marked, though some dead objects could be marked too.
Have this remarking phase be another STW event (maybe this is because of having to analyse all the thread-stacks?)
Yes, remark is STW pause in CMS algorithm in HotSpot JVM (you can read more about CMS phases here).
And answering question from title
Why does Concurrent-Mark-Sweep (CMS) remark phase need to re-examine the thread-stacks instead of just looking at the mutator's write-queues?
CMS does not use "mutator's write-queues"; it uses a card-marking write barrier (shared with the young generation copy collector).
Generally, all algorithms that use write barriers need an STW pause to avoid the "turtle and arrow" paradox.
CMS starts with an initial tri-color marking. When that completes, "some" live objects are marked, but because of concurrent modifications the marking may have missed certain objects. The write barrier captures all heap mutations, though, so "pre-clean" adds all mutated references to the "gray" set and resumes marking to reach the missed objects. For this process to converge, however, a final remark with the mutators stopped is required.
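To make the role of the root rescan concrete, here is a toy simulation. It is a deliberately simplified model, not HotSpot's implementation: the heap is a dictionary, the `dirty` set stands in for the card table, and `drain` and `simulate` are hypothetical names. The mutator moves one reference into an already-scanned object (caught by the barrier) and another onto its stack (not caught, since stack writes have no barrier).

```python
def drain(heap, grey, marked):
    """Mark everything reachable from the grey worklist."""
    while grey:
        obj = grey.pop()
        if obj in marked:
            continue
        marked.add(obj)
        grey.extend(heap[obj])


def simulate(rescan_roots):
    heap = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
    stack = ["A"]                # mutator stack; "A" was the root at initial mark
    dirty = set()                # objects whose fields were written (card table stand-in)

    # Concurrent mark, step 1: the collector has scanned A; B is still grey.
    marked = {"A"}
    grey = ["B"]

    # The mutator runs before B is scanned.
    heap["A"].append("C")        # heap store into already-scanned A: barrier fires
    dirty.add("A")
    stack.append("D")            # stack store: no barrier, nothing recorded
    heap["B"] = []               # B drops its references to C and D
    dirty.add("B")

    # Concurrent mark, step 2: the collector scans B and finds no children.
    drain(heap, grey, marked)

    # Remark: rescan the fields of dirty objects, optionally rescan the stack.
    grey = [child for obj in dirty for child in heap[obj]]
    if rescan_roots:
        grey.extend(stack)
    drain(heap, grey, marked)
    return marked
```

In this model `simulate(False)` finds C through the dirty object A but never marks D, which is live only through the stack; `simulate(True)` marks all four objects. Stopping the mutators during remark is what keeps the stack from changing again while it is being rescanned.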

Does calling `gc()` manually, result in all `finalizers` being executed immediately?

I have some code that I suspect is leaking memory.
The code uses ccall and keeps significant information behind pointers, which are supposed to be freed by code that is ccalled during finalizers.
In my debugging I am calling gc().
I want to know whether this will immediately trigger all finalizers attached to objects that have gone out of scope.
Answers should be concerned only with Julia 0.5+.
After the discussion on @Isaiah's answer (deleted), I decided to poke some internals folks and get some clarity on this. As a result, I have it on good authority that when gc() is called at the top level – i.e. not in a local scope – then the following assurance can be relied upon:
if an object is unreachable and you call gc() it’ll be finalized
which is pretty clear cut. The top-level part is significant since when you call gc() in a local scope, local references may or may not be considered reachable, even if they will never be used again.
This assurance does sweep some uncertainty under the carpet of "reachability" since it may not be obvious whether an object is reachable or not because the language runtime may keep references to some objects for various reasons. These reasons should be exhaustively documented, but currently they are not. A couple of notable cases where the runtime holds onto objects are:
The unique instance of a singleton type is permanent and will never be collected or finalized;
Method caches are also permanent, which in particular, means that modules are not freed when you might otherwise expect them to be since method caches keep references to the modules in which they are defined.
Under "normal circumstances" however – which is what I suspect this question is getting at – yes, calling gc() when an object is no longer reachable will cause it to be collected and finalized "immediately", i.e. before the gc() call returns.
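For comparison only, the same "finalized before the collection call returns" behaviour can be observed in CPython. This is an analogy, not Julia code: `Handle` and `finalize_on_collect` are hypothetical names, and `weakref.finalize` plays the role of Julia's finalizer registration.

```python
import gc
import weakref


class Handle:
    """Hypothetical wrapper around a native resource."""

    def __init__(self):
        self.me = self           # self-cycle, so only gc.collect() can free it


def finalize_on_collect():
    log = []
    handle = Handle()
    # The callback must not reference `handle` itself, or it would keep it alive.
    weakref.finalize(handle, log.append, "freed")
    del handle                   # unreachable now, but not yet collected
    gc.collect()                 # the finalizer runs during this call
    return log
```

As in the answer, reachability is the catch: if `del handle` were omitted, the local variable would still count as a reference and the finalizer would not run during the collection.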

Bindings and memory leaks

Problem
Example use case:
I have a control which displays a status gauge. The visual status of the gauge is bound to a property of the control
The control is part of a topology graph. Depending on the topology, e.g. 100 of these controls may be displayed at once
There are several topologies. Every time you switch to another topology view the whole graph is regenerated
Question
Could this cause a memory leak and do you have to perform a manual unbind in the old topology view before you create the new one? Similar to the bindings, do you have to remove event handlers manually?
The bindings and the event handlers are inside the control. So once the control isn't accessible anymore it should be possible that it's garbage collected. So I think you don't have to do anything, but I don't know.
Thank you very much for the expertise!
If you look into the JavaDocs:
[...] All bindings in our implementation use instances of WeakInvalidationListener, which means usually a binding does not need to be disposed. But if you plan to use your application in environments that do not support WeakReferences you have to dispose unused Bindings to avoid memory leaks.
So if you use or extend the default Bindings the Garbage Collector should be able to do its work.
If you do not, be sure to implement and call Binding.dispose().
As always: If an object is no longer referenced by any other object it gets garbage collected (at some point in the future). So usually one does not need to specifically implement in this direction, as it tends to clutter the code.
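The weak-listener idea behind those bindings can be sketched in a few lines. This is a Python illustration of the pattern, not JavaFX's implementation; `Property`, `Gauge`, and `add_weak_listener` are hypothetical names.

```python
import gc
import weakref


class Property:
    """Observable value that holds its listeners only weakly."""

    def __init__(self, value):
        self._value = value
        self._listeners = []     # weak references to bound methods

    def add_weak_listener(self, method):
        self._listeners.append(weakref.WeakMethod(method))

    def set(self, value):
        self._value = value
        live = []
        for weak_method in self._listeners:
            callback = weak_method()
            if callback is not None:     # listener's owner is still alive
                callback(value)
                live.append(weak_method)
        self._listeners = live           # drop listeners whose owner is gone

    def listener_count(self):
        return len(self._listeners)


class Gauge:
    """Control whose displayed state follows a property."""

    def __init__(self, status):
        self.shown = None
        status.add_weak_listener(self.on_change)

    def on_change(self, value):
        self.shown = value
```

Because the property never holds a strong reference to the gauge, discarding the old topology view is enough: once the gauge is collected, the next update silently drops its listener.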

Garbage collect certain object

Is it possible to garbage collect a certain object in Pharo?
E.g. I know that a certain object is not (or should not be) referenced by any other object, and it takes up a lot of space. Does it make sense to just run a general garbage collection on the system? Or is it possible to remove just a specific object or tree from the heap?
Smalltalk garbage collectors can't garbage-collect just a single object.
There are two basic techniques used - generation scavenging and mark and sweep. Generation scavenging works on new and relatively new objects by copying the used objects into another unused space and ignoring all the garbage. Objects that get copied a lot of times are moved to "old space". Old space is garbage collected by a mark and sweep algorithm. This algorithm loops through all Smalltalk objects and marks them as "unmarked". It then traverses through all accessible objects and marks them as "marked". In the final sweep, anything that's still marked as "unmarked" is freed.
There's no way to run either algorithm on a single object.
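The mark and sweep steps described above can be sketched as follows. This is a toy model in Python rather than Smalltalk VM code: the heap is a dictionary from object name to the names it references, and `mark_and_sweep` is a hypothetical helper.

```python
def mark_and_sweep(heap, roots):
    """Free every object not reachable from `roots`; return the freed names."""
    # Step 1: start with every object flagged as "unmarked".
    marked = {obj: False for obj in heap}

    # Step 2: traverse everything accessible from the roots and mark it.
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if not marked[obj]:
            marked[obj] = True
            pending.extend(heap[obj])

    # Step 3: sweep, freeing whatever is still unmarked.
    garbage = [obj for obj, is_marked in marked.items() if not is_marked]
    for obj in garbage:
        del heap[obj]
    return sorted(garbage)
```

The sketch also shows why a single object cannot be collected in isolation: whether `big` is garbage is only known after the whole graph has been traversed from the roots.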
No, it does not make sense, and it is not possible.
It also does not make sense to run the garbage collector manually (which you can do, of course)... the system should run the GC when needed, and you will get that space back.
The whole purpose of a gc is that you do not have to take care about that.
I think you're looking for a reference list.
(i.e. which object is keeping your object not garbage collected).
Might be a Global variable somewhere. Something in a class variable....

Can I use [self retain] to hold the object itself in objective-c?

I'm using [self retain] to hold an object itself, and [self release] to free it elsewhere. This is very convenient sometimes. But this is actually a reference loop, or deadlock, which most garbage-collection systems aim to solve. I wonder if Objective-C's autorelease pool may find the loops and give me surprises by releasing the object before reaching [self release]. Is my way encouraged or not? How can I ensure that the garbage collection, if there is any, won't be too smart?
This way of working is very discouraged. It looks like you need some pointers on memory management.
Theoretically, an object should live as long as it is useful. Useful objects can easily be spotted: they are directly referenced somewhere on a thread stack, or, if you made a graph of all your objects, reachable through some path linked to an object referenced somewhere on a thread stack. Objects that live "by themselves", without being referenced, cannot be useful, since no thread can reach to them to make them perform something.
This is how a garbage collector works: it traverses your object graph and collects every unreferenced object. Mind you, Objective-C is not always garbage-collected, so some rules had to be established. These are the memory management guidelines for Cocoa.
In short, it is based on the concept of 'ownership'. When you look at the reference count of an object, you immediately know how many other objects depend on it. If an object has a reference count of 3, it means that three other objects need it to work properly (and thus own it). Every time you keep a reference to an object (except in rare conditions), you should call its retain method. And before you drop the reference, you should call its release method.
There are some other important rules regarding the creation of objects. When you call alloc, copy or mutableCopy, the object you get already has a refcount of 1. In this case, it means the calling code is responsible for releasing the object once it's not required. This can be problematic when you return references to objects: once you return it, in theory, you don't need it anymore, but if you call release on it, it'll be destroyed right away! This is where NSAutoreleasePool objects come in. By calling autorelease on an object, you give up ownership of it (as if you called release), except that the reference is not immediately revoked: instead, it is transferred to the NSAutoreleasePool, which will release it once it receives the release message itself. (Whenever some of your code is called back by the Cocoa framework, you can be assured that an autorelease pool already exists.)
It also means that you do not own objects if you did not call alloc, copy or mutableCopy on them; in other words, if you obtain a reference to such an object otherwise, you don't need to call release on it. If you need to keep around such an object, as usual, call retain on it, and then release when you're done.
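The ownership rules above can be modelled with a small counter. This is a Python toy model of the retain/release convention, not Cocoa's implementation; `Counted`, `AutoreleasePool`, and `make_label` are hypothetical names.

```python
class Counted:
    """Object with a manual reference count."""

    def __init__(self, name):
        self.name = name
        self.count = 1           # like alloc/copy: the creator owns it
        self.alive = True

    def retain(self):
        self.count += 1
        return self

    def release(self):
        self.count -= 1
        if self.count == 0:
            self.alive = False   # stands in for dealloc

    def autorelease(self, pool):
        pool.add(self)           # hand the pending release over to the pool
        return self


class AutoreleasePool:
    def __init__(self):
        self._pending = []

    def add(self, obj):
        self._pending.append(obj)

    def drain(self):
        for obj in self._pending:
            obj.release()        # the deferred release happens here
        self._pending.clear()


def make_label(pool):
    # The factory gives up ownership without destroying the object right away.
    return Counted("label").autorelease(pool)
```

A caller that wants to keep the result past the pool's lifetime calls `retain()` on it and later balances that with `release()`; a caller that does nothing loses the object when the pool drains.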
Now, if we try to apply this logic to your use case, it stands out as odd. An object cannot logically own itself, as it would mean that it can exist, standalone in memory, without being referenced by a thread. Obviously, if you have the occasion to call release on yourself, it means that one of your methods is being executed; therefore, there's gotta be a reference around for you, so you shouldn't need to retain yourself in the first place. I can't really say with the few details you've given, but you probably need to look into NSAutoreleasePool objects.
If you're using the retain/release memory model, it shouldn't be a problem. Nothing will go looking for your [self retain] and subvert it. That may not be the case, however, if you ever switch over to using garbage collection, where -retain and -release are no-ops.
Here's another thread on SO on the same topic.
I'd reiterate the answer that includes the phrase "overwhelming sense of ickyness." It's not illegal, but it feels like a poor plan unless there's a pretty strong reason. If nothing else, it seems sneaky, and that's never good in code. Do heed the warning in that thread to use -autorelease instead of -release.
