Is there a way to access Node.js (V8) GC reference counts? - node.js

So I've implemented an experimental cache for my memory-hungry app, and thrown a heap into the mix so I can easily get the least-accessed objects once the cache outgrows a certain limit. The idea is to purge from the cache those objects that are unlikely to be re-used any time soon and, should they be needed again, retrieve them from the database.
So far, so good, except there may be objects that have not yet been written to the database and should not be purged. I can handle that by setting a 'dirty' bit, no problem. But there is another source of problems: what if there are still valid references to a given cached object lurking around somewhere? This may lead to a situation where function f holds a reference A to an object with an ID of xxx, which then gets purged from the cache, and then another function g requests an object with the same ID of xxx, but gets another reference B, distinct from A. So far I'm building my software on the assumption that there should only ever be a single instance of any persisted object with a given ID (maybe that's stupid?).
My guess so far is that I could profit from a garbage-collection-related method like gc.get_reference_count( value ). Any count above 1 (since value itself is in the cache) would mean some closure is still holding on to value, so it should not be purged.
I haven't found anything useful in this direction. Does the problem in general call for another solution?
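V8 does not expose per-object reference counts, but modern Node (>= 14.6) exposes WeakRef, which lets the GC itself answer the "is anyone still holding this?" question. The following is a minimal sketch, not a drop-in solution for the post above; the IdentityCache name and shape are my own assumption:

```javascript
// Sketch: a cache whose entries do NOT keep objects alive by themselves.
// While any closure still holds a strong reference, get() returns the
// very same instance, preserving the one-instance-per-ID invariant.
// Once nothing references an object, its entry dies and get() misses,
// so the object can be re-fetched from the database.
class IdentityCache {
  constructor() {
    this.refs = new Map(); // id -> WeakRef; does not prevent collection
  }
  set(id, obj) {
    this.refs.set(id, new WeakRef(obj));
  }
  get(id) {
    const ref = this.refs.get(id);
    const obj = ref && ref.deref(); // undefined once GC has collected it
    if (ref && !obj) this.refs.delete(id); // drop the stale entry
    return obj;
  }
}
```

Dirty objects would still need a separate strong-referenced structure until they are written out, since a WeakRef alone cannot guarantee they survive until the flush.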

Related

Repeated add/remove results in memory leak

I have a use case where I repeatedly fetch data from the server and display it using cytoscape. For this, I just have a single cy object and I repeatedly remove and add the elements. This happens once every second or two. I notice the browser memory growing with time. The documentation says "Though the elements specified to this function are removed from the graph, they may still exist in memory"
So, do I need to do anything with the collection returned by calling remove? How do I ensure memory is cleared?
Well, JavaScript is already a garbage-collected language, so unreferenced objects will be reclaimed eventually. If you remove nodes from the graph and you don't hold any references to them, then the garbage collector will clean them up ... eventually :)
Given that these memory leaks exist, my educated guess is that there may be some internal entanglement with a global scope or something similar that prevents elements from being discarded before the whole graph is reinitialized (maybe try that?).
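The documentation's warning that removed elements "may still exist in memory" is ordinary JavaScript reachability, which a plain-JS sketch (no cytoscape involved; the Map stands in for the graph) can illustrate:

```javascript
// Removing an element from the graph structure does not free it while
// some other binding still references it.
const graph = new Map([['n1', { id: 'n1', payload: 'big blob' }]]);

// Analog of cy.remove(): detach the element but hand it back to the caller.
const removed = graph.get('n1');
graph.delete('n1');

// The graph no longer knows 'n1', yet `removed` keeps it reachable,
// so the GC cannot reclaim it until that reference is dropped too.
```

So the practical fix is simply not to keep the collection returned by remove() around (or to null out the variable holding it) once you no longer need it; there is no manual "free" in JavaScript.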

CLIPS: do multiple EnvEval queries invalidate previous result objects?

I had another strange problem that I have already solved, but I'm not sure whether I just got lucky or I really understand what's going on. Basically, I perform a query on my facts via:
DATA_OBJECT decay_tree_fact_list;
std::stringstream clips_query;
clips_query << "(find-all-facts ((?f DecayTree)) TRUE)";
EnvEval(clips_environment_, clips_query.str().c_str(), &decay_tree_fact_list);
Then I go through the list of facts and retrieve the needed information. There I also make another "subquery" for each of the found facts above in the following way
DATA_OBJECT spin_quantum_number_fact_list;
std::stringstream clips_query;
clips_query << "(find-fact ((?f SpinQuantumNumber)) (= ?f:unique_id "
            << spin_quantum_number_unique_id << "))";
EnvEval(clips_environment_, clips_query.str().c_str(),
        &spin_quantum_number_fact_list);
This all works fine for the first DecayTree fact, no matter at which position I start, but it crashes for the next one, because the fact address is bogus. I traced the problem down to the subquery I make. So what I did to solve the problem was to save all of the DecayTree fact addresses in a vector first and then process that list. Since I could not find any information about my theory so far, I wanted to ask here.
So my question is quite simple, and would be: If I perform two queries, after each other, does the retrieved information of the first query get invalidated as soon as I call the second query?
The EnvEval function should be marked in the documentation as triggering garbage collection, but it is not. CLIPS internally represents strings, integers, floats, and other primitives similarly to languages (such as Java) that provide classes like String, Integer, and Float. As these values are dynamically created, they need to be subject to garbage collection when they are no longer used. Internally, CLIPS uses reference counts to determine whether these values are referenced, but when they are returned to a user's code it is not possible to know whether they are still referenced without some action from the user's code.
When you call EnvEval, the value it returns is exempt from garbage collection. It is not exempt the next time EnvEval is called. So if you immediately process the value returned or save it (i.e. allocate storage for a string and copy the value from CLIPS or save the fact addresses from a multifield in an array), then you don't need to worry about the value returned by CLIPS being garbage collected by a subsequent EnvEval call.
If you want to execute a series of EnvEval calls (or other CLIPS functions which may trigger garbage collection) without having to worry about values being garbage collected, wrap the calls within EnvIncrementGCLocks/EnvDecrementGCLocks:
EnvIncrementGCLocks(theEnv);
... Your Calls ...
EnvDecrementGCLocks(theEnv);
Garbage collection for all the values returned to your code will be temporarily disabled while you make the calls, and when you finish by calling EnvDecrementGCLocks the values will become subject to garbage collection again.
There's some additional information on garbage collection in section 1.4 of the Advanced Programming Guide.

Core Data: cases in which an XML database can be corrupted

I got an error from Core Data that a value "" could not be parsed.
This value belonged to a non-optional entity attribute of type double with 0 as its default.
What can cause such data corruption?
I think the answer to your question "what could cause such data corruption" is "faulting".
Core Data will only fetch the attributes when it needs them. This is a feature, not a bug, as it helps manage memory and performance efficiently behind the scenes. However, if you use a construct returned by a Core Data fetch (such as an array of fetch results) to build an XML document, it is conceivable that the faults are not filled (i.e., Core Data does not go to the persistent store to fetch the faulted data automatically).
Your observation that everything is there once you explicitly access the relationship, as in children = entity.children, corroborates this theory.
So, no: not access observers, but faulting is responsible for your data loss.

Can I use [self retain] to hold the object itself in Objective-C?

I'm using [self retain] to hold an object itself, and [self release] to free it elsewhere. This is very convenient sometimes. But this is actually a reference loop, or deadlock, of the kind most garbage-collection systems aim to solve. I wonder if Objective-C's autorelease pool may find the loop and surprise me by releasing the object before [self release] is reached. Is my way encouraged or not? How can I ensure that the garbage collection, if there is any, won't be too smart?
This way of working is very discouraged. It looks like you need some pointers on memory management.
Theoretically, an object should live as long as it is useful. Useful objects are easy to spot: they are directly referenced somewhere on a thread stack or, if you made a graph of all your objects, reachable through some path from an object that is referenced on a thread stack. Objects that live "by themselves", without being referenced, cannot be useful, since no thread can reach them to make them perform anything.
This is how a garbage collector works: it traverses your object graph and collects every unreferenced object. Mind you, Objective-C is not always garbage-collected, so some rules had to be established. These are the memory management guidelines for Cocoa.
In short, it is based over the concept of 'ownership'. When you look at the reference count of an object, you immediately know how many other objects depend on it. If an object has a reference count of 3, it means that three other objects need it to work properly (and thus own it). Every time you keep a reference to an object (except in rare conditions), you should call its retain method. And before you drop the reference, you should call its release method.
There are some other important rules regarding the creation of objects. When you call alloc, copy or mutableCopy, the object you get already has a refcount of 1. In this case, it means the calling code is responsible for releasing the object once it's no longer required. This can be problematic when you return references to objects: once you return one, in theory, you don't need it anymore, but if you call release on it, it'll be destroyed right away! This is where NSAutoreleasePool objects come in. By calling autorelease on an object, you give up ownership of it (as if you called release), except that the reference is not immediately revoked: instead, it is transferred to the NSAutoreleasePool, which will release it once it receives the release message itself. (Whenever some of your code is called back by the Cocoa framework, you can be assured that an autorelease pool already exists.)
It also means that you do not own objects if you did not call alloc, copy or mutableCopy on them; in other words, if you obtain a reference to such an object otherwise, you don't need to call release on it. If you need to keep around such an object, as usual, call retain on it, and then release when you're done.
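The ownership bookkeeping described above can be made concrete with a toy model. This is purely illustrative and written in JavaScript (which is itself garbage collected, so nothing is really freed here); it only mirrors the counting rules Cocoa applies:

```javascript
// Toy model of Cocoa-style manual reference counting (illustration only).
class RefCounted {
  constructor() {
    this.refCount = 1;      // alloc/copy/mutableCopy hand you a count of 1
    this.deallocated = false;
  }
  retain() {                // a new owner declares its interest
    this.refCount += 1;
    return this;
  }
  release() {               // an owner gives up its interest
    this.refCount -= 1;
    if (this.refCount === 0) this.deallocated = true; // analog of -dealloc
  }
}
```

With two owners, the object survives the first release and dies only on the last one, which is exactly the "an object lives as long as someone owns it" rule.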
Now, if we try to apply this logic to your use case, it stands out as odd. An object cannot logically own itself, as that would mean it can exist, standalone in memory, without being referenced by a thread. Obviously, if you have the occasion to call release on yourself, one of your methods is being executed; therefore, there must still be a reference to you around, so you shouldn't need to retain yourself in the first place. I can't really say with the few details you've given, but you probably need to look into NSAutoreleasePool objects.
If you're using the retain/release memory model, it shouldn't be a problem. Nothing will go looking for your [self retain] and subvert it. That may not be the case, however, if you ever switch over to using garbage collection, where -retain and -release are no-ops.
Here's another thread on SO on the same topic.
I'd reiterate the answer that includes the phrase "overwhelming sense of ickyness." It's not illegal, but it feels like a poor plan unless there's a pretty strong reason. If nothing else, it seems sneaky, and that's never good in code. Do heed the warning in that thread to use -autorelease instead of -release.

What's the best way to keep count of the data set size information in Core Data?

Right now, whenever I need to access my data set size (and it can be quite frequent), I perform a countForFetchRequest on the managedObjectContext. Is this a bad thing to do? Should I manage the count locally instead? The reason I went this route is to ensure I get a 100% correct answer. With Core Data being accessed from more than one place (for example, through NSFetchedResultsController as well), it's hard to keep an accurate count locally.
-countForFetchRequest: is always evaluated in the persistent store. When using the SQLite store, this will result in I/O being performed.
Suggested strategy:
Cache the count returned from -countForFetchRequest:.
Observe NSManagedObjectContextObjectsDidChangeNotification for your own context.
Observe NSManagedObjectContextDidSaveNotification for related contexts.
For the simple case (no fetch predicate), you can update the count from the information contained in the notification without additional I/O.
Alternately, you can invalidate your cached count and refresh via -countForFetchRequest: as necessary.
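The strategy above can be sketched language-neutrally; here is a JavaScript illustration (all names are hypothetical, and in Cocoa the change events would come from the two notifications mentioned):

```javascript
// Cache-and-invalidate counting: serve the count from a cache, adjust it
// from change notifications, and fall back to the expensive authoritative
// count only when the cache has been invalidated.
class CountCache {
  constructor(recount) {
    this.recount = recount; // expensive query, analog of countForFetchRequest
    this.cached = null;
  }
  count() {
    if (this.cached === null) this.cached = this.recount();
    return this.cached;
  }
  // Analog of handling objects-did-change: no I/O, just arithmetic.
  onChange({ inserted = 0, deleted = 0 }) {
    if (this.cached !== null) this.cached += inserted - deleted;
  }
  invalidate() {
    this.cached = null; // next count() re-queries the store
  }
}
```

The trade-off is the usual one: onChange keeps the cached value exact as long as every mutation is observed, while invalidate() is the safe fallback whenever you are unsure you saw all changes.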
