Garbage collect certain object - garbage-collection

Is is possible to garbage collect a certain object in Pharo?
E.g. I know that certain object is not (should be not) referenced by any other object. And it takes a lot of space. Does it make sense to just run general garbage collect on system? Or it is possible to remove from heap just specific object/tree

Smalltalk garbage collectors can't garbage-collect just a single object.
There are two basic techniques used - generation scavenging and mark and sweep. Generation scavenging works on new and relatively new objects by copying the used objects into another unused space and ignoring all the garbage. Objects that get copied a lot of times are moved to "old space". Old space is garbage collected by a mark and sweep algorithm. This algorithm loops through all Smalltalk objects and marks them as "unmarked". It then traverses through all accessible objects and marks them as "marked". In the final sweep, anything that's still marked as "unmarked" is freed.
There's no way to run either algorithm on a single object.

No, it does not makes sense, and is not possible.
Also it does not make sense to manually run the garbage collector (which you can do, of course)... system should run gc when needed and you will get that space back.
The whole purpose of a gc is that you do not have to take care about that.

I think you're looking for a reference list.
(i.e. which object is keeping your object not garbage collected).
Might be a Global variable somewhere. Something in a class variable....

Related

Groovy memoized annotation protectedCacheSize usage

When reading the document I'm curious why the #Memoized have these 2nd parameter. What is the use case? When is the relation between the garbage collection and cache replace behavior?
int protectedCacheSize: Number of cached return values to protect from garbage collection.
Groovy memoized protectedCacheSize
This documentation mentions:
By default, the size of the cache is unlimited and no cache result is protected from garbage collection.
Setting a protectedCacheSize>0 would create an unlimited cache with some results protected.
Setting maxCacheSize>0 would create a limited cache but without any protection from garbage protection.
Setting both would create a limited, protected cache.
So a use case would be if you e.g. have a memory intensive application that might run the risk of the cache being garbage collected too often or deeply, then using the protectedCacheSize can forcibly protect a certain number of the most recently used cache entries from garbage collection.
This will make even more sense if the cached values are expensive to compute and the protectedCacheSize is not too high as to prevent the GC from regaining needed memory.
Digging a bit deeper the AST for the #Memoized annotation is handled by the groovy.transform.MemoizedASTTransformation.java which converts the method call to a Closure.
Depending on the max and protected args being 0 or not different memoize methods are then called on the Closure, see the implementation for more detail (scroll down for the three other memoize methods).

Why does Concurrent-Mark-Sweep (CMS) remark phase need to re-examine the thread-stacks instead of just looking at the mutator's write-queues?

The standard CMS algorithm starts by making the application undergo a STW pause to calculate the GC-root-set. It then resumes mutator threads and both application and collector threads run concurrently until the marking is done. Any pointer store updated by a mutator-thread is protected by a write-barrier that will add that pointer reference to a write-queue.
When the marking phase is done we then proceed to the Remarking phase: it must then look into this write-queue and proceed to mark anything it finds there that was not already marked.
All of this makes sense. What I fail to understand is why would we need to:
Have this remarking phase recalculate the GC-root-set from scratch (including all thread stacks) -- does not doing this result in an incorrect algorithm, in the sense of it marking actually live and reachable objects as garbage to be reclaimed?;
Have this remarking phase be another STW event (maybe this is because of having to analyse all the thread-stacks?)
When reading one of the original papers on CMS A Generational Mostly-concurrent Garbage Collector one can see:
The original mostly-concurrent algorithm, proposed by
Boehm et al. [5], is a concurrent “tricolor” collector [9]. It
uses a write barrier to cause updates of fields of heap objects
to shade the containing object gray. Its main innovation is
that it trades off complete concurrency for better throughput, by allowing root locations (globals, stacks, registers),
which are usually updated more frequently than heap locations, to be written without using a barrier to maintain
the tricolor invariant.
it makes it look like this is just a trade-off emanating from a conscious decision to not involve what's happening on the stack in the write-barriers?
Thanks
Have this remarking phase recalculate the GC-root-set from scratch (including all thread stacks) -- does not doing this result in an incorrect algorithm, in the sense of it marking actually live and reachable objects as garbage to be reclaimed?
No, tricolor marking marks live objects (objects unmarked by then "grey" set is exhausted are unreachable). Remark add rediscovered root objects to "grey" set together with all reference caught by write-barrier, so more objects could be marked as live.
In summary, after CMS remark all live objects are marked, though some dead objects could be marked too.
Have this remarking phase be another STW event (maybe this is because of having to analyse all the thread-stacks?)
Yes, remark is STW pause in CMS algorithm in HotSpot JVM (you can read more about CMS phases here).
And answering question from title
Why does Concurrent-Mark-Sweep (CMS) remark phase need to re-examine the thread-stacks instead of just looking at the mutator's write-queues?
CMS does not use "mutator's write-queues", it does utilize card marking write barrier (shared with young generation copy collector).
Generally all algorithms using write barriers need STW pause to avoid "turtle and arrow" paradox.
CMS starts initial tri-color marking. Then it completed "some" live objects are marked, but due to concurrent modifications marking could miss certain objects. Though write-barrier captures all mutations, thus "pre clean" add all mutated references to "gray" set and resume marking reaching missed objects. Though for this process to converge, final remark with mutator stopped is required.

repeated add/remove results in memory leak

I have a use case where I repeatedly fetch data from the server and display it using cytoscape. For this, I just have a single cy object and I repeatedly remove and add the elements. This happens once every second or two. I notice the browser memory growing with time. The documentation says "Though the elements specified to this function are removed from the graph, they may still exist in memory"
So, do I need to do anything with the collection returned by calling remove? How do I ensure memory is cleared.
Well javascript is already a garbage collected language, so it will drop all of your references to your nodes eventually. If you remove nodes from the graph and you don't have any references to it, then the garbage collector will clean it up ... eventually :)
Due to the fact that these memory leaks exists, my educated guess is, that there may be some internal entanglement with a global scope or sth like that which prevents elements to be discarded before the whole graph is reinitialized (maybe try that?).

Can I use [self retain] to hold the object itself in objective-c?

I'm using [self retain] to hold an object itself, and [self release] to free it elsewhere. This is very convenient sometimes. But this is actually a reference-loop, or dead-lock, which most garbage-collection systems target to solve. I wonder if objective-c's autorelease pool may find the loops and give me surprises by release the object before reaching [self release]. Is my way encouraged or not? How can I ensure that the garbage-collection, if there, won't be too smart?
This way of working is very discouraged. It looks like you need some pointers on memory management.
Theoretically, an object should live as long as it is useful. Useful objects can easily be spotted: they are directly referenced somewhere on a thread stack, or, if you made a graph of all your objects, reachable through some path linked to an object referenced somewhere on a thread stack. Objects that live "by themselves", without being referenced, cannot be useful, since no thread can reach to them to make them perform something.
This is how a garbage collector works: it traverses your object graph and collects every unreferenced object. Mind you, Objective-C is not always garbage-collected, so some rules had to be established. These are the memory management guidelines for Cocoa.
In short, it is based over the concept of 'ownership'. When you look at the reference count of an object, you immediately know how many other objects depend on it. If an object has a reference count of 3, it means that three other objects need it to work properly (and thus own it). Every time you keep a reference to an object (except in rare conditions), you should call its retain method. And before you drop the reference, you should call its release method.
There are some other importants rule regarding the creation of objects. When you call alloc, copy or mutableCopy, the object you get already has a refcount of 1. In this case, it means the calling code is responsible for releasing the object once it's not required. This can be problematic when you return references to objects: once you return it, in theory, you don't need it anymore, but if you call release on it, it'll be destroyed right away! This is where NSAutoreleasePool objects come in. By calling autorelease on an object, you give up ownership on it (as if you called release), except that the reference is not immediately revoked: instead, it is transferred to the NSAutoreleasePool, that will release it once it receives the release message itself. (Whenever some of your code is called back by the Cocoa framework, you can be assured that an autorelease pool already exists.)
It also means that you do not own objects if you did not call alloc, copy or mutableCopy on them; in other words, if you obtain a reference to such an object otherwise, you don't need to call release on it. If you need to keep around such an object, as usual, call retain on it, and then release when you're done.
Now, if we try to apply this logic to your use case, it stands out as odd. An object cannot logically own itself, as it would mean that it can exist, standalone in memory, without being referenced by a thread. Obviously, if you have the occasion to call release on yourself, it means that one of your methods is being executed; therefore, there's gotta be a reference around for you, so you shouldn't need to retain yourself in the first place. I can't really say with the few details you've given, but you probably need to look into NSAutoreleasePool objects.
If you're using the retain/release memory model, it shouldn't be a problem. Nothing will go looking for your [self retain] and subvert it. That may not be the case, however, if you ever switch over to using garbage collection, where -retain and -release are no-ops.
Here's another thread on SO on the same topic.
I'd reiterate the answer that includes the phrase "overwhelming sense of ickyness." It's not illegal, but it feels like a poor plan unless there's a pretty strong reason. If nothing else, it seems sneaky, and that's never good in code. Do heed the warning in that thread to use -autorelease instead of -release.

How does the Garbage Collector decide when to kill objects held by WeakReferences?

I have an object, which I believe is held only by a WeakReference. I've traced its reference holders using SOS and SOSEX, and both confirm that this is the case (I'm not an SOS expert, so I could be wrong on this point).
The standard explanation of WeakReferences is that the GC ignores them when doing its sweeps. Nonetheless, my object survives an invocation to GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced).
Is it possible for an object that is only referenced with a WeakReference to survive that collection? Is there an even more thorough collection that I can force? Or, should I re-visit my belief that the only references to the object are weak?
Update and Conclusion
The root cause was that there was a reference on the stack that was locking the object. It is unclear why neither SOS nor SOSEX was showing that reference. User error is always a possibility.
In the course of diagnosing the root cause, I did do several experiments that demonstrated that WeakReferences to 2nd generation objects can stick around a surprisingly long time. However, a WRd 2nd gen object will not survive GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced).
As per wikipedia "An object referenced only by weak references is considered unreachable (or "weakly reachable") and so may be collected at any time. Weak references are used to avoid keeping memory referenced by unneeded objects"
I am not sure if your case is about weak references...
Try calling GC.WaitForPendingFinalizers() right after GC.Collect().
Another possible option: don't ever use a WeakReference for any purpose. In the wild, I've only ever seen them used as a mechanism for lowering an application's memory footprint (i.e. a form of caching). As the mighty MSDN says:
Avoid using weak references as an
automatic solution to memory
management problems. Instead, develop
an effective caching policy for
handling your application's objects.
I recommend you to check for the "other" references to the weakly referenced objects. Because, if there is another reference still alive, the objects won't be GCed.
Weakly referenced objects do get removed by garbage collection.
I've had the pleasure of debugging event systems where events were not getting fired... It turned out to be because the subscriber was only weakly referenced and so after some eventual random delay the GC would eventually collect it. At which point the UI stopped updating. :)
Yes it is possible. If the WeakReference is located in another generation than the one being collected, for example, if it is in the 2nd Generation, and the GC only does a Gen 0 collection; it will survive. It should not survive a full 2nd Gen collection that completes and where all finalizers run, however.

Resources