Clips multiple EnvEval queries invalidate previous result objects? - object

I had a another strange problem that I solved already. But I'm not sure I just luckily fixed it or I really understand what's going on. So basically I have perform a query on my facts via:
DATA_OBJECT decay_tree_fact_list;
std::stringstream clips_query;
clips_query << "(find-all-facts ((?f DecayTree)) TRUE)";
EnvEval(clips_environment_, clips_query.str().c_str(), &decay_tree_fact_list);
Then I go through the list of facts and retrieve the needed information. There I also make another "subquery" for each of the found facts above in the following way
DATA_OBJECT spin_quantum_number_fact_list;
std::stringstream clips_query;
clips_query << "(find-fact ((?f SpinQuantumNumber)) (= ?f:unique_id "
<< spin_quantum_number_unique_id << "))";
EnvEval(clips_environment_, clips_query.str().c_str(),
&spin_quantum_number_fact_list);
This all works fine for the first DecayTree fact, no matter at which position I start, but for the next one it crashes, because the fact address is bogus. I traced the problem down to the subquery I make. So what I did to solve the problem was to save all the DecayTree fact addresses in a vector and then process that. Since I could not find any information about my theory so far I wanted to ask here.
So my question is quite simple, and would be: If I perform two queries, after each other, does the retrieved information of the first query get invalidated as soon as I call the second query?

The EnvEval function should be marked in the documentation as triggering garbage collection, but it is not. CLIPS internally represents string, integers, floats, and other primitives similar to other languages (such as Java) which allow instances of classes such as String, Integer, and Float. As these values are dynamically created, they need to be subject to garbage collection when they are no longer used. Internally CLIPS uses reference counts to determine whether these values are referenced, but when these values are returned to a user's code it is not possible to know if they are referenced without some action from the user's code.
When you call EnvEval, the value it returns is exempt from garbage collection. It is not exempt the next time EnvEval is called. So if you immediately process the value returned or save it (i.e. allocate storage for a string and copy the value from CLIPS or save the fact addresses from a multifield in an array), then you don't need to worry about the value returned by CLIPS being garbage collected by a subsequent EnvEval call.
If you want to execute a series of EnvEval calls (or other CLIPS function which may trigger garbage collection) without having to worry about values being garbage collected, wrap the calls within EnvIncrementGCLocks/EnvDecrementGCLocks
EnvIncrementGCLocks(theEnv);
... Your Calls ...
EnvDecrementGCLocks(theEnv);
Garbage collection for all the values returned to your code will be temporarily disabled while you make the calls and then when you finish by calling EnvDecrementGCLocks the values will be garbage collected.
There's some additional information on garbage collection in section 1.4 of the Advanced Programming Guide.

Related

is there a way to access to NodeJS (V8) GC reference counts?

so i've implemented an experimental cache for my memory-hungry app, and thrown a heap into the mix so i can easily get the least accessed objects once the cache outgrows a certain limit—the idea is to purge the cache from objects that are likely not re-used any time soon and if so, retrieve them from the database.
so far, so fine, except there may be objects that have not yet been written to the database and should not be purged. i can handle that by setting a 'dirty' bit, no problem. but there is another source of problems: what if there are still valid references to a given, cached object lurking around somewhere? this may lead to a situation where function f holds a reference A to an object with an ID of xxx, which then gets purged from cache, and then another function g requests an object with the same ID of xxx, but gets another reference B, distinct from A. so far i'm building my software on the assumption that there should only ever be a single instance of any persisted object with a given ID (maybe that's stupid?).
my guess so far is that i could profit from a garbage-collection-related method like gc.get_reference_count( value )—checking that and knowing any count above 1 (since value is in the cache) means some closure is still holding on to value, so it should not be purged.
i haven't found anything useful in this direction. does the problem in general call for another solution?

Is it required to lock shared variables in perl for read access?

I am using shared variables on perl with use threads::shared.
That variables can we modified only from single thread, all other threads are only 'reading' that variables.
Is it required in the 'reading' threads to lock
{
lock $shared_var;
if ($shared_var > 0) .... ;
}
?
isn't it safe to simple verification without locking (in the 'reading' thread!), like
if ($shared_var > 0) ....
?
Locking is not required to maintain internal integrity when setting or fetching a scalar.
Whether it's needed or not in your particular case depends on the needs of the reader, the other readers and the writers. It rarely makes sense not to lock, but you haven't provided enough details for us to determine what your needs are.
For example, it might not be acceptable to use an old value after the writer has updated the shared variable. For starters, this can lead to a situation where one thread is still using the old value while the another thread is using the new value, a situation that can be undesirable if those two threads interact.
It depends on whether it's meaningful to test the condition just at some point in time or other. The problem however is that in a vast majority of cases, that Boolean test means other things, which might have already changed by the time you're done reading the condition that says it represents a previous state.
Think about it. If it's an insignificant test, then it means little--and you have to question why you are making it. If it's a significant test, then it is telltale of a coherent state that may or may not exist anymore--you won't know for sure, unless you lock it.
A lot of times, say in real-time reporting, you don't really care which snapshot the database hands you, you just want a relatively current one. But, as part of its transaction logic, it keeps a complete picture of how things are prior to a commit. I don't think you're likely to find this in code, where the current state is the current state--and even a state of being in a provisional state is a definite state.
I guess one of the times this can be different is a cyclical access of a queue. If one consumer doesn't get the head record this time around, then one of them will the next time around. You can probably save some processing time, asynchronously accessing the queue counter. But here's a case where it means little in context of just one iteration.
In the case above, you would just want to put some locked-level instructions afterward that expected that the queue might actually be empty even if your test suggested it had data. So, if it is just a preliminary test, you would have to have logic that treated the test as unreliable as it actually is.

Thread locking / exclusive access improvements

I have 2 threaded methods running in 2 separate places but sharing access at the same time to a list array object (lets call it PriceArray), the first thread Adds and Removes items from PriceArray when necessary (the content of the array gets updated from a third party data provider) and the average update rate is between 0.5 and 1 second.
The second thread only reads -for now- the content of the array every 3 seconds using a foreach loop (takes most items but not all of them).
To ensure avoiding the nasty Collection was modified; enumeration operation may not execute exception when the second thread loops through the array I have wrapped the add and remove operation in the first thread with lock(PriceArray) to ensure exclusive access and prevent that exception from occurring. The problem is I have noticed a performance issue when the second method tries to loop through the array items as most of the time the array is locked by the add/remove thread.
Having the scenario running this way, do you have any suggestions how to improve the performance using other thread-safety/exclusive access tactics in C# 4.0?
Thanks.
Yes, there are many alternatives.
The best/easiest would be to switch to using an appropriate collection in System.Collections.Concurrent. These are all thread-safe collections, and will allow you to use them without managing your own locks. They are typically either lock-free or use very fine grained locking, so will likely dramatically improve the performance impacts you're getting from the synchronization.
Another option would be to use ReaderWriterLockSlim to allow your readers to not block each other. Since a third party library is writing this array, this may be a more appropriate solution. It would allow you to completely block during writing, but the readers would not need to block each other during reads.
My suggestion is that ArrayList.Remove() takes most of the time, because in order to perform deletion it performs two costly things:
linear search: just takes elements one by one and compares with element being removed
when index of the element being removed is found - it shifts everything below it by one position to the left.
Thus every deletion takes time proportionally to count of elements currently in the collection.
So you should try to replace ArrayList with more appropriate structure for this task. I need more information about your case to suggest which one to choose.

Can I use [self retain] to hold the object itself in objective-c?

I'm using [self retain] to hold an object itself, and [self release] to free it elsewhere. This is very convenient sometimes. But this is actually a reference-loop, or dead-lock, which most garbage-collection systems target to solve. I wonder if objective-c's autorelease pool may find the loops and give me surprises by release the object before reaching [self release]. Is my way encouraged or not? How can I ensure that the garbage-collection, if there, won't be too smart?
This way of working is very discouraged. It looks like you need some pointers on memory management.
Theoretically, an object should live as long as it is useful. Useful objects can easily be spotted: they are directly referenced somewhere on a thread stack, or, if you made a graph of all your objects, reachable through some path linked to an object referenced somewhere on a thread stack. Objects that live "by themselves", without being referenced, cannot be useful, since no thread can reach to them to make them perform something.
This is how a garbage collector works: it traverses your object graph and collects every unreferenced object. Mind you, Objective-C is not always garbage-collected, so some rules had to be established. These are the memory management guidelines for Cocoa.
In short, it is based over the concept of 'ownership'. When you look at the reference count of an object, you immediately know how many other objects depend on it. If an object has a reference count of 3, it means that three other objects need it to work properly (and thus own it). Every time you keep a reference to an object (except in rare conditions), you should call its retain method. And before you drop the reference, you should call its release method.
There are some other importants rule regarding the creation of objects. When you call alloc, copy or mutableCopy, the object you get already has a refcount of 1. In this case, it means the calling code is responsible for releasing the object once it's not required. This can be problematic when you return references to objects: once you return it, in theory, you don't need it anymore, but if you call release on it, it'll be destroyed right away! This is where NSAutoreleasePool objects come in. By calling autorelease on an object, you give up ownership on it (as if you called release), except that the reference is not immediately revoked: instead, it is transferred to the NSAutoreleasePool, that will release it once it receives the release message itself. (Whenever some of your code is called back by the Cocoa framework, you can be assured that an autorelease pool already exists.)
It also means that you do not own objects if you did not call alloc, copy or mutableCopy on them; in other words, if you obtain a reference to such an object otherwise, you don't need to call release on it. If you need to keep around such an object, as usual, call retain on it, and then release when you're done.
Now, if we try to apply this logic to your use case, it stands out as odd. An object cannot logically own itself, as it would mean that it can exist, standalone in memory, without being referenced by a thread. Obviously, if you have the occasion to call release on yourself, it means that one of your methods is being executed; therefore, there's gotta be a reference around for you, so you shouldn't need to retain yourself in the first place. I can't really say with the few details you've given, but you probably need to look into NSAutoreleasePool objects.
If you're using the retain/release memory model, it shouldn't be a problem. Nothing will go looking for your [self retain] and subvert it. That may not be the case, however, if you ever switch over to using garbage collection, where -retain and -release are no-ops.
Here's another thread on SO on the same topic.
I'd reiterate the answer that includes the phrase "overwhelming sense of ickyness." It's not illegal, but it feels like a poor plan unless there's a pretty strong reason. If nothing else, it seems sneaky, and that's never good in code. Do heed the warning in that thread to use -autorelease instead of -release.

Is reading data in one thread while it is written to in another dangerous for the OS?

There is nothing in the way the program uses this data which will cause the program to crash if it reads the old value rather than the new value. It will get the new value at some point.
However, I am wondering if reading and writing at the same time from multiple threads can cause problems for the OS?
I am yet to see them if it does. The program is developed in Linux using pthreads.
I am not interested in being told how to use mutexs/semaphores/locks/etc edit: so my program is only getting the new values, that is not what I'm asking.
No.. the OS should not have any problem. The tipical problem is the that you dont want to read the old values or a value that is half way updated, and thus not valid (and may crash your app or if the next value depends on the former, then you can get a corrupted value and keep generating wrong values all the itme), but if you dont care about that, the OS wont either.
Are the kernel/drivers reading that data for any reason (eg. it contains structures passed in to kernel APIs)? If no, then there isn't any issue with it, since the OS will never ever look at your hot memory.
Your own reads must ensure they are consistent so you don't read half of a value pre-update and half post-update and end up with a value that is neither pre neither post update.
There is no danger for the OS. Only your program's data integrity is at risk.
Imagine you data to consist of a set (structure) of values, which cannot be updated in an atomic operation. The reading thread is bound to read inconsistent data at some point (data consisting of a mixture of old and new values). But you did not want to hear about mutexes...
Problems arise when multiple threads share access to data when accessing that data is not atomic. For example, imagine a struct with 10 interdependent fields. If one thread is writing and one is reading, the reading thread is likely to see a struct that is halfway between one state and another (for example, half of it's members have been set).
If on the other hand the data can be read and written to with a single atomic operation, you will be fine. For example, imagine if there is a global variable that contains a count... One thread is incrementing it on some condition, and another is reading it and taking some action... In this case, there is really no intermediate inconsistent state. It's either got the new value, or it has the old value.
Logically, you can think of locking as a tool that lets you make arbitrary blocks of code atomic, at least as far as the other threads of execution are concerned.

Resources