Groovy @Memoized annotation: protectedCacheSize usage

While reading the documentation, I'm curious why @Memoized has this second parameter. What is the use case? What is the relation between garbage collection and the cache replacement behavior?
int protectedCacheSize: Number of cached return values to protect from garbage collection.
Groovy memoized protectedCacheSize

This documentation mentions:
By default, the size of the cache is unlimited and no cache result is protected from garbage collection.
Setting a protectedCacheSize>0 would create an unlimited cache with some results protected.
Setting maxCacheSize>0 would create a limited cache but without any protection from garbage collection.
Setting both would create a limited, protected cache.
So a use case would be a memory-intensive application where cached results risk being garbage collected too often or too aggressively. Setting protectedCacheSize forcibly protects a certain number of the most recently used cache entries from garbage collection.
This makes even more sense if the cached values are expensive to compute, provided protectedCacheSize is not so high that it prevents the GC from reclaiming memory it needs.
Digging a bit deeper, the AST transformation for the @Memoized annotation is handled by org.codehaus.groovy.transform.MemoizedASTTransformation, which converts the method body into a Closure.
Depending on whether the max and protected arguments are 0 or not, a different memoize method is then called on the Closure: memoize(), memoizeAtMost(), memoizeAtLeast() or memoizeBetween(). See the implementation for more detail (scroll down for the three other memoize methods).
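To make the relation between the cache and the garbage collector more concrete, here is a minimal Java sketch of the idea. It is an illustrative model only, not Groovy's actual implementation: the class name ProtectedMemoizer and its methods are made up for this example, and it assumes that ordinary cached results are held through soft references while the protected entries are held strongly in a small LRU map.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical class for illustration; not part of Groovy.
class ProtectedMemoizer<K, V> implements Function<K, V> {
    private final Function<K, V> compute;
    // Softly held results: the GC may clear these under memory pressure.
    private final Map<K, SoftReference<V>> cache = new HashMap<>();
    // Strongly held, most recently used results: the GC cannot reclaim these.
    private final Map<K, V> protectedEntries;

    ProtectedMemoizer(Function<K, V> compute, int protectedCacheSize) {
        this.compute = compute;
        // accessOrder=true turns the LinkedHashMap into an LRU structure.
        this.protectedEntries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > protectedCacheSize;
            }
        };
    }

    @Override
    public synchronized V apply(K key) {
        SoftReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Never computed, or the soft reference was cleared by the GC.
            value = compute.apply(key);
            cache.put(key, new SoftReference<>(value));
        }
        // Touching the entry keeps it among the protected, recently used ones.
        protectedEntries.put(key, value);
        return value;
    }

    synchronized int protectedCount() {
        return protectedEntries.size();
    }
}
```

In this model an entry that falls out of the protected map is still cached, but only softly, so it survives until the GC needs the memory. A maximum cache size would additionally bound the soft map, which this sketch leaves out.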

Related

repeated add/remove results in memory leak

I have a use case where I repeatedly fetch data from the server and display it using cytoscape. For this, I just have a single cy object and I repeatedly remove and add the elements. This happens once every second or two. I notice the browser memory growing with time. The documentation says "Though the elements specified to this function are removed from the graph, they may still exist in memory"
So, do I need to do anything with the collection returned by calling remove? How do I ensure the memory is cleared?
Well, JavaScript is already a garbage-collected language, so it will drop all of your references to your nodes eventually. If you remove nodes from the graph and you don't hold any references to them, then the garbage collector will clean them up ... eventually :)
Since these memory leaks exist, my educated guess is that there may be some internal entanglement with a global scope or something similar, which prevents elements from being discarded before the whole graph is reinitialized (maybe try that?).

Garbage collect certain object

Is it possible to garbage collect a certain object in Pharo?
E.g. I know that a certain object is not (or should not be) referenced by any other object, and it takes a lot of space. Does it make sense to just run a general garbage collection on the system? Or is it possible to remove just a specific object/tree from the heap?
Smalltalk garbage collectors can't garbage-collect just a single object.
There are two basic techniques used - generation scavenging and mark and sweep. Generation scavenging works on new and relatively new objects by copying the used objects into another unused space and ignoring all the garbage. Objects that get copied a lot of times are moved to "old space". Old space is garbage collected by a mark and sweep algorithm. This algorithm loops through all Smalltalk objects and marks them as "unmarked". It then traverses through all accessible objects and marks them as "marked". In the final sweep, anything that's still marked as "unmarked" is freed.
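The mark and sweep pass described above can be sketched on a toy object graph. This is a simplified Java model for illustration, not Pharo's collector: the ToyHeap and Obj names are invented here, and the roots list stands in for globals, class variables and stack references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy heap; real collectors work on raw memory, not Java lists.
class ToyHeap {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> objects = new ArrayList<>(); // every allocated object
    final List<Obj> roots = new ArrayList<>();   // starting points for reachability

    Obj allocate(String name) {
        Obj o = new Obj(name);
        objects.add(o);
        return o;
    }

    // Returns the number of objects freed.
    int collect() {
        // Step 1: reset every object to "unmarked".
        for (Obj o : objects) {
            o.marked = false;
        }
        // Step 2: mark everything reachable from the roots.
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (!o.marked) {
                o.marked = true;
                stack.addAll(o.refs);
            }
        }
        // Step 3: sweep, freeing whatever is still unmarked.
        int before = objects.size();
        objects.removeIf(o -> !o.marked);
        return before - objects.size();
    }
}
```

Note that the sweep has to visit the whole heap, and reachability of any one object depends on every other object, which is why the pass cannot be restricted to a single object.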
There's no way to run either algorithm on a single object.
No, it does not make sense, and it is not possible.
It also does not make sense to run the garbage collector manually (which you can do, of course): the system should run the GC when needed, and you will get that space back.
The whole purpose of a gc is that you do not have to take care about that.
I think you're looking for a reference list, i.e. which object is keeping your object from being garbage collected.
It might be a global variable somewhere, or something in a class variable.

Are there greenDAO thread safety best practices?

I'm having a go with greenDAO and so far it's going pretty well. One thing that doesn't seem to be covered by the docs or website (or anywhere :( ) is how it handles thread safety.
I know the basics mentioned elsewhere, like "use a single dao session" (general practice for Android + SQLite), and I understand the Java memory model quite well. The library internals even appear threadsafe, or at least built with that intention. But nothing I've seen covers this:
greenDAO caches entities by default. This is excellent for a completely single-threaded program - transparent and a massive performance boost for most uses. But if I e.g. loadAll() and then modify one of the elements, I'm modifying the same object globally across my app. If I'm using it on the main thread (e.g. for display), and updating the DB on a background thread (as is right and proper), there are obvious threading problems unless extra care is taken.
Does greenDAO do anything "under the hood" to protect against common application-level threading problems? For example, modifying a cached entity in the UI thread while saving it in a background thread (better hope they don't interleave! especially when modifying a list!)? Are there any "best practices" to protect against them, beyond general thread safety concerns (i.e. something that greenDAO expects and works well with)? Or is the whole cache fatally flawed from a multithreaded-application safety standpoint?
I've no experience with greenDAO but the documentation here:
http://greendao-orm.com/documentation/queries/
Says:
If you use queries in multiple threads, you must call forCurrentThread() on the query to get a Query instance for the current thread. Starting with greenDAO 1.3, object instances of Query are bound to their owning thread that built the query. This lets you safely set parameters on the Query object while other threads cannot interfere. If other threads try to set parameters on the query or execute the query bound to another thread, an exception will be thrown. Like this, you don’t need a synchronized statement. In fact you should avoid locking because this may lead to deadlocks if concurrent transactions use the same Query object.
To avoid those potential deadlocks completely, greenDAO 1.3 introduced the method forCurrentThread(). This will return a thread-local instance of the Query, which is safe to use in the current thread. Every time, forCurrentThread() is called, the parameters are set to the initial parameters at the time the query was built using its builder.
As far as I can see, the documentation doesn't explicitly say anything else about multithreading, but this seems to make it pretty clear that it is handled. The passage is about multiple threads using the same Query object, so clearly multiple threads can access the same database. It's certainly normal for databases and DAOs to handle concurrent access, and there are a lot of proven techniques for working with caches in this situation.
By default greenDAO caches and returns cached entity instances to improve performance. To prevent this behaviour, you need to call:
daoSession.clear()
to clear all cached instances. Alternatively you can call:
objectDao.detachAll()
to clear cached instances only for the specific DAO object.
You will need to call these methods every time you want to clear the cached instances, so if you want to disable all caching, I recommend calling them in your Session or DAO accessor methods.
Documentation:
http://greenrobot.org/greendao/documentation/sessions/#Clear_the_identity_scope
Discussion: https://github.com/greenrobot/greenDAO/issues/776

LMAX Disruptor: Must EventHandler clone object received from EventHandler#onEvent

I have an application with many producers and consumers.
From my understanding, the RingBuffer creates its objects when it is initialized; you then copy data into them when you publish to the ring, and read them back out in the EventHandler.
My application's LogHandler buffers received events in a List and sends them on in batch mode once the list has reached a certain size. So EventHandler#onEvent puts the received object in the list; once the list has reached that size, it sends it over RMI to a server and clears it.
My question is: do I need to clone the object before I put it in the list? As I understand it, once consumed, the objects can be reused.
Do I need to synchronize access to the list in my EventHandler#onEvent ?
Yes - your understanding is correct. You copy your values in and out of the ringbuffer slots.
I would suggest that yes you clone the values as you extract it from the ring buffer and into your event handler list; otherwise the slot can be reused.
You should not need to synchronise access to the list as long as it is a private member variable of your Event Handler and you only have one event handler instance per thread. If you have multiple event handlers adding to the same (eg static) List instance then you would need synchronisation.
Clarification:
Be sure to read the background in OzgurH's comments below. If you stick to using the endOfBatch flag on disruptor and use that to decide the size of your batch, you do not have to copy objects out of the list. If you are using your own accumulation strategy (such as size - as per the question), then you should clone objects out as the slot could be reused before you have had the chance to send.
Also worth noting that if you are needing to synchronize on the list instance, then you have missed a big opportunity with disruptor and will destroy your performance anyway.
It is possible to use slots in the Disruptor's RingBuffer (including ones containing a List) without cloning/copying values. This may be a preferable solution for you depending on whether you are worried about garbage creation, and whether you actually need to be concerned about concurrent updates to the objects being placed in the RingBuffer. If all the objects being placed in the slot's list are immutable, or if they are only being updated/read by a single thread at a time (a precondition which the Disruptor is often used to enforce), there will be nothing gained from cloning them as they are already immune to data races.
On the subject of batching, note that the Disruptor framework itself provides a mechanism for taking items from the RingBuffer in batches in your EventHandler threads. This approach is fully thread-safe and lock-free, and could yield better performance by making your memory access patterns more predictable to the CPU.
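As a rough illustration of the "clone when using your own accumulation strategy" advice, here is a minimal, self-contained Java sketch. It does not use the Disruptor library: LogEvent, BatchingLogHandler and the sent list are hypothetical stand-ins, and onEvent only mirrors the shape of EventHandler#onEvent (event, sequence, endOfBatch). The sent list takes the place of the RMI call from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable event; a ring buffer would preallocate and reuse such slots.
class LogEvent {
    String message;

    LogEvent copy() {
        LogEvent c = new LogEvent();
        c.message = message;
        return c;
    }
}

// Hypothetical handler that accumulates by size, as described in the question.
class BatchingLogHandler {
    private final int batchSize;
    // Private to this handler instance, so no synchronization is needed
    // as long as only one thread calls onEvent.
    private final List<LogEvent> pending = new ArrayList<>();
    // Stand-in for the remote call: records each batch that would be sent.
    final List<List<LogEvent>> sent = new ArrayList<>();

    BatchingLogHandler(int batchSize) {
        this.batchSize = batchSize;
    }

    // Same parameter shape as EventHandler#onEvent; endOfBatch is ignored here
    // because the batch boundary is decided by size.
    public void onEvent(LogEvent event, long sequence, boolean endOfBatch) {
        // Copy first: the slot may be overwritten by a producer before the batch is sent.
        pending.add(event.copy());
        if (pending.size() >= batchSize) {
            sent.add(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

If the batch were flushed on endOfBatch instead of on size, the copy could be dropped, since the slots are not released to producers until the handler has finished with that batch.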

How does the Garbage Collector decide when to kill objects held by WeakReferences?

I have an object, which I believe is held only by a WeakReference. I've traced its reference holders using SOS and SOSEX, and both confirm that this is the case (I'm not an SOS expert, so I could be wrong on this point).
The standard explanation of WeakReferences is that the GC ignores them when doing its sweeps. Nonetheless, my object survives an invocation to GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced).
Is it possible for an object that is only referenced with a WeakReference to survive that collection? Is there an even more thorough collection that I can force? Or, should I re-visit my belief that the only references to the object are weak?
Update and Conclusion
The root cause was that there was a reference on the stack that was locking the object. It is unclear why neither SOS nor SOSEX was showing that reference. User error is always a possibility.
In the course of diagnosing the root cause, I ran several experiments that demonstrated that weakly referenced 2nd generation objects can stick around a surprisingly long time. However, a weakly referenced 2nd gen object will not survive GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced).
As per Wikipedia: "An object referenced only by weak references is considered unreachable (or "weakly reachable") and so may be collected at any time. Weak references are used to avoid keeping memory referenced by unneeded objects."
I am not sure if your case is about weak references...
Try calling GC.WaitForPendingFinalizers() right after GC.Collect().
Another possible option: don't ever use a WeakReference for any purpose. In the wild, I've only ever seen them used as a mechanism for lowering an application's memory footprint (i.e. a form of caching). As the mighty MSDN says:
Avoid using weak references as an automatic solution to memory management problems. Instead, develop an effective caching policy for handling your application's objects.
I recommend you to check for the "other" references to the weakly referenced objects. Because, if there is another reference still alive, the objects won't be GCed.
Weakly referenced objects do get removed by garbage collection.
I've had the pleasure of debugging event systems where events were not getting fired... It turned out to be because the subscriber was only weakly referenced and so after some eventual random delay the GC would eventually collect it. At which point the UI stopped updating. :)
Yes, it is possible. If the object held by the WeakReference is in a different generation from the one being collected (for example, it is in the 2nd generation and the GC only does a Gen 0 collection), it will survive. It should not, however, survive a full 2nd gen collection that completes and where all finalizers run.
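The root cause from the update (a lingering strong reference keeping the weakly referenced object alive) can be reproduced in a few lines. The sketch below is a Java analogue rather than .NET code, so it uses java.lang.ref.WeakReference and System.gc() instead of GC.Collect; the WeakRefDemo class and its static holder field are invented for the example, with the field standing in for the stray stack reference.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; a Java analogue of the .NET scenario above.
class WeakRefDemo {
    // Stand-in for the overlooked strong reference (a stack slot in the original case).
    static Object holder;

    // Returns true if the weakly referenced object is still alive after a GC request
    // made while a strong reference to it exists.
    static boolean survivesWhileStronglyHeld() {
        holder = new Object();
        WeakReference<Object> weak = new WeakReference<>(holder);
        System.gc(); // only a request; the JVM may or may not collect now
        return weak.get() != null; // strongly reachable via holder, so never cleared
    }

    // Drops the strong reference and requests a collection.
    // The result is not guaranteed either way, because System.gc() is only a hint.
    static boolean clearedAfterRelease() {
        holder = new Object();
        WeakReference<Object> weak = new WeakReference<>(holder);
        holder = null;
        System.gc();
        return weak.get() == null;
    }
}
```

The first method always reports survival, regardless of how thorough the collection is. The second usually reports that the reference was cleared, but the specification does not promise when that happens.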
