I have a shared boolean array whose size equals the number of threads. Each thread takes a copy of the corresponding element of the array based on its index. All threads except one (which negates the values of the shared boolean array) go into a while loop comparing their local copy with the shared one. Whenever there is a change, a thread is supposed to exit the loop. But all threads remain stuck in the loop forever. I made sure that the shared values are updated correctly. I even tried declaring the shared array as volatile, but that didn't work out. What am I missing here?
I had another strange problem that I have already solved, but I'm not sure whether I just fixed it by luck or really understand what's going on. Basically, I perform a query on my facts via:
DATA_OBJECT decay_tree_fact_list;
std::stringstream clips_query;
clips_query << "(find-all-facts ((?f DecayTree)) TRUE)";
EnvEval(clips_environment_, clips_query.str().c_str(), &decay_tree_fact_list);
Then I go through the list of facts and retrieve the needed information. There I also make another "subquery" for each of the facts found above, in the following way:
DATA_OBJECT spin_quantum_number_fact_list;
std::stringstream clips_query;
clips_query << "(find-fact ((?f SpinQuantumNumber)) (= ?f:unique_id "
<< spin_quantum_number_unique_id << "))";
EnvEval(clips_environment_, clips_query.str().c_str(),
&spin_quantum_number_fact_list);
This all works fine for the first DecayTree fact, no matter at which position I start, but for the next one it crashes, because the fact address is bogus. I traced the problem down to the subquery I make. So what I did to solve the problem was to save all the DecayTree fact addresses in a vector and then process that. Since I could not find any information about my theory so far I wanted to ask here.
So my question is quite simple, and would be: If I perform two queries, after each other, does the retrieved information of the first query get invalidated as soon as I call the second query?
The EnvEval function should be marked in the documentation as triggering garbage collection, but it is not. Internally, CLIPS represents strings, integers, floats, and other primitives similarly to languages (such as Java) that provide classes like String, Integer, and Float. As these values are dynamically created, they need to be subject to garbage collection when they are no longer used. Internally CLIPS uses reference counts to determine whether these values are referenced, but when a value is returned to a user's code it is not possible to know whether it is still referenced without some action from the user's code.
When you call EnvEval, the value it returns is exempt from garbage collection, but it is no longer exempt the next time EnvEval is called. So if you immediately process the value returned or save it (i.e. allocate storage for a string and copy the value from CLIPS, or save the fact addresses from a multifield in an array), then you don't need to worry about the value returned by CLIPS being garbage collected by a subsequent EnvEval call.
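For the queries from the question, saving the fact addresses could look roughly like the following sketch. It uses the DATA_OBJECT access macros from the CLIPS C API (GetValue, GetDOBegin, GetDOEnd, GetMFType, GetMFValue); the environment variable and vector names are taken from or invented for the question, so treat this as an illustration of the "copy the addresses first" idea rather than drop-in code:

```cpp
// Query once, then copy the fact addresses out of the multifield
// before issuing any further EnvEval calls.
DATA_OBJECT decay_tree_fact_list;
EnvEval(clips_environment_, "(find-all-facts ((?f DecayTree)) TRUE)",
        &decay_tree_fact_list);

std::vector<void*> fact_addresses;
void* multifield = GetValue(decay_tree_fact_list);
for (long i = GetDOBegin(decay_tree_fact_list);
     i <= GetDOEnd(decay_tree_fact_list); ++i) {
    if (GetMFType(multifield, i) == FACT_ADDRESS)
        fact_addresses.push_back(GetMFValue(multifield, i));
}
// The subqueries can now run against the saved addresses without the
// first result set being invalidated by a later EnvEval call.
```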
If you want to execute a series of EnvEval calls (or other CLIPS functions which may trigger garbage collection) without having to worry about values being garbage collected, wrap the calls within EnvIncrementGCLocks/EnvDecrementGCLocks:
EnvIncrementGCLocks(theEnv);
... Your Calls ...
EnvDecrementGCLocks(theEnv);
Garbage collection for all the values returned to your code will be temporarily disabled while you make the calls; when you finish by calling EnvDecrementGCLocks, the values become subject to garbage collection again.
There's some additional information on garbage collection in section 1.4 of the Advanced Programming Guide.
I am embedding Python in C++.
I have a working C++ Python extension object.
The only thing wrong is that if I set tp_dealloc to a custom function it never gets called.
I would have thought Py_Finalize() would trigger this, or maybe terminating the program. But no.
Could anyone suggest why tp_dealloc isn't getting hit?
I believe the problem here was one of reference counting.
PyType_Ready() fills various tp_* fields depending on the bases of your type.
One of these is tp_alloc, which I have set to 0.
Its doc says the refcount is set to 1 and the memory block is zeroed.
For every instance of this type that Python creates, a new PyObject gets added to the appropriate Python dictionary. If it is a module-level variable, this is the module's dictionary.
When the dictionary is destroyed, it DECREF-s contained objects. Now the refcount will be 0, and tp_dealloc will get run.
It appears that in my code I was performing an extra INCREF somewhere and the object was never getting garbage collected.
It seems that (unless you compile with a special flag, Py_TRACE_REFS) Python keeps no linked list that would allow it to track all of its objects. So we can't assume that Py_Finalize() will clean up. It won't!
Instead, every object is held in the dictionary for its containing scope, and so on back to the module dictionary. When this module dictionary is destroyed, the destruction will creep outwards through all the branches.
I have a subroutine which is called quite a lot during the program run. I try to use as many allocatable arrays as possible. The subroutine is called several times without any problem, but at some point it terminates with:
malloc.c:3790: _int_malloc: Assertion `(unsigned long)(size) >= (unsigned long)(nb)' failed.
This happens at the beginning of the subroutine, when the first array is being allocated.
Using non-allocatable arrays instead, the subroutine survives several more calls but terminates again, now with:
wait: 28674: Memory fault(coredump)
I assume that it terminates on the call itself, because I write out some values right after the declaration of the variables, before any computation.
The calling code:
do k=1, kreise
write(*,*)k
call rundheit(n(k),kreis(k,1:n(k),3),kreis(k,1:n(k),2),outrnd)
end do
where 'kreise' may have values of up to 1500. I printed out and checked the values of the parameters passed: before the call, in the subroutine, and after the call.
Limiting 'kreise' does avoid the problem, but limiting is not a practical solution. I need all the data to be evaluated, not a fraction of it.
Some notes on my environment:
My program is a subroutine compiled by FEM simulation software using the Intel Fortran compiler. As far as I know, I have no way to alter the compiler options, and I cannot compile my code on its own because it has too many dependencies on the subroutines deployed by the FEM software.
I developed and ran this exact subroutine on another, much smaller and simpler simulation without any problems. The actual, 'bigger' simulation also runs without any problems as long as I don't use this particular subroutine. (The difference is mostly the node density and thus the amount of data being considered during the computation.) Other user subroutines work without problems. All the subroutine does is fetch the results between some of the increments, do some analyses, and write some reports, without altering the simulation.
I guess that the problem has something to do with memory handling, something I have no experience with.
Thanks.
UPDATE
I compiled the subroutine using -check all and found that the error occurs well before the blamed subroutine. Two arrays, one of them n(), go out of bounds on several occasions, but the error somehow becomes (more) critical during the call. The strange part is that the failure occurs some iterations beyond the bound: here both arrays have the size (1:72), yet the calling breaks somewhere between k=135 and k=267 (the lowest and highest values I found during several runs).
The problem is the integer kreise, whose value is set during a loop:
...
allocate(n(l))
allocate(pos(l))
...
do kreise = 1,l
pos(kreise)=minvalX+(Kreise-1)*IncX
if(pos(kreise).gt.maxvalX) exit
end do
where kreise always becomes l+1. Why?
NOTE: pos(kreise).gt.maxvalX should never be true. Its becoming true wouldn't be a problem in itself, although it would suggest that l was computed wrong (too big). This exit would then only save computation time later, by reducing the iterations of several loops.
The program may be writing to memory that it shouldn't write to and corrupting the structures of the memory management of malloc that is used by Fortran allocate. I suggest using the Fortran option for run-time subscript checking. With ifort, try -check all or -check bounds.
I have two threaded methods running in two separate places but sharing access to a list array object (let's call it PriceArray). The first thread adds and removes items from PriceArray when necessary (the content of the array gets updated from a third-party data provider), and the average update rate is between 0.5 and 1 second.
The second thread only reads (for now) the content of the array every 3 seconds using a foreach loop (it takes most items, but not all of them).
To avoid the nasty "Collection was modified; enumeration operation may not execute" exception when the second thread loops through the array, I have wrapped the add and remove operations in the first thread with lock(PriceArray) to ensure exclusive access and prevent that exception from occurring. The problem is that I have noticed a performance issue when the second method tries to loop through the array items, as most of the time the array is locked by the add/remove thread.
Having the scenario running this way, do you have any suggestions how to improve the performance using other thread-safety/exclusive access tactics in C# 4.0?
Thanks.
Yes, there are many alternatives.
The best/easiest would be to switch to using an appropriate collection in System.Collections.Concurrent. These are all thread-safe collections, and will allow you to use them without managing your own locks. They are typically either lock-free or use very fine grained locking, so will likely dramatically improve the performance impacts you're getting from the synchronization.
Another option would be to use ReaderWriterLockSlim to allow your readers to not block each other. Since a third party library is writing this array, this may be a more appropriate solution. It would allow you to completely block during writing, but the readers would not need to block each other during reads.
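The question is about C# 4.0, where this means ReaderWriterLockSlim with EnterWriteLock/EnterReadLock; the same one-writer/many-readers pattern is sketched here in C++ (std::shared_mutex, C++17) purely as an illustration, with invented names:

```cpp
#include <mutex>
#include <shared_mutex>
#include <vector>

std::shared_mutex price_mutex;
std::vector<double> price_array;

// The writer takes an exclusive lock: readers and other writers wait.
void add_price(double p) {
    std::unique_lock<std::shared_mutex> lock(price_mutex);
    price_array.push_back(p);
}

// Readers take a shared lock: they block writers, but not each other.
double sum_prices() {
    std::shared_lock<std::shared_mutex> lock(price_mutex);
    double sum = 0.0;
    for (double p : price_array) sum += p;
    return sum;
}
```

Because reads vastly outnumber the brief writes here, letting readers overlap is where the performance win comes from.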
My guess is that ArrayList.Remove() takes most of the time, because in order to perform a deletion it does two costly things:
a linear search: it takes the elements one by one and compares them with the element being removed;
when the index of the element being removed is found, it shifts everything after it by one position to the left.
Thus every deletion takes time proportional to the number of elements currently in the collection.
So you should try to replace ArrayList with more appropriate structure for this task. I need more information about your case to suggest which one to choose.
There is nothing in the way the program uses this data that will cause it to crash if it reads the old value rather than the new value. It will get the new value at some point.
However, I am wondering if reading and writing at the same time from multiple threads can cause problems for the OS?
I have yet to see any such problems. The program is developed on Linux using pthreads.
I am not interested in being told how to use mutexes/semaphores/locks/etc. Edit: my program is only getting the new values; that is not what I'm asking.
No, the OS should not have any problem. The typical concern is that you don't want to read the old values, or a value that is halfway updated and thus not valid (which may crash your app or, if the next value depends on the former, give you a corrupted value and keep generating wrong values all the time). But if you don't care about that, the OS won't either.
Are the kernel/drivers reading that data for any reason (e.g. does it contain structures passed into kernel APIs)? If not, then there isn't any issue, since the OS will never look at your memory on its own.
Your own reads must ensure they are consistent, so that you don't read half of a value pre-update and half post-update and end up with a value that is neither the pre- nor the post-update one.
There is no danger for the OS. Only your program's data integrity is at risk.
Imagine your data consists of a set (structure) of values which cannot be updated in one atomic operation. The reading thread is bound to read inconsistent data at some point (data consisting of a mixture of old and new values). But you did not want to hear about mutexes...
Problems arise when multiple threads share access to data and accessing that data is not atomic. For example, imagine a struct with 10 interdependent fields. If one thread is writing and one is reading, the reading thread is likely to see a struct that is halfway between one state and another (for example, with half of its members set).
If, on the other hand, the data can be read and written with a single atomic operation, you will be fine. For example, imagine a global variable that contains a count: one thread increments it on some condition, and another reads it and takes some action. In this case there really is no intermediate inconsistent state; it either has the new value or the old value.
Logically, you can think of locking as a tool that lets you make arbitrary blocks of code atomic, at least as far as the other threads of execution are concerned.