Thread locking / exclusive access improvements - multithreading

I have two threaded methods running in two separate places that share access to a list object (let's call it PriceArray). The first thread adds and removes items from PriceArray when necessary (the content is updated from a third-party data provider), and the average update interval is between 0.5 and 1 second.
The second thread only reads the content of the array (for now), every 3 seconds, using a foreach loop (it takes most items but not all of them).
To avoid the nasty "Collection was modified; enumeration operation may not execute" exception when the second thread loops through the array, I have wrapped the add and remove operations in the first thread with lock(PriceArray) to ensure exclusive access. The problem is that I have noticed a performance issue when the second method tries to loop through the array items, as most of the time the array is locked by the add/remove thread.
Given this scenario, do you have any suggestions for improving performance using other thread-safety/exclusive-access tactics in C# 4.0?
Thanks.

Yes, there are many alternatives.
The best/easiest would be to switch to an appropriate collection in System.Collections.Concurrent. These are all thread-safe collections, so you can use them without managing your own locks. They are typically either lock-free or use very fine-grained locking, so they will likely reduce the performance cost you're paying for synchronization dramatically.
Another option would be to use ReaderWriterLockSlim to allow your readers to not block each other. Since a third party library is writing this array, this may be a more appropriate solution. It would allow you to completely block during writing, but the readers would not need to block each other during reads.

My guess is that ArrayList.Remove() takes most of the time, because to perform a deletion it does two costly things:
a linear search: it takes the elements one by one and compares each with the element being removed;
once the index of the element being removed is found, it shifts every element after it one position to the left.
Thus every deletion takes time proportional to the number of elements currently in the collection.
So you should try replacing ArrayList with a structure more appropriate for this task. I need more information about your case to suggest which one to choose.


Clips multiple EnvEval queries invalidate previous result objects?

I had another strange problem that I have already solved, but I'm not sure whether I just got lucky or whether I really understand what's going on. Basically, I perform a query on my facts via:
DATA_OBJECT decay_tree_fact_list;
std::stringstream clips_query;
clips_query << "(find-all-facts ((?f DecayTree)) TRUE)";
EnvEval(clips_environment_, clips_query.str().c_str(), &decay_tree_fact_list);
Then I go through the list of facts and retrieve the needed information. There I also make another "subquery" for each of the facts found above, in the following way:
DATA_OBJECT spin_quantum_number_fact_list;
std::stringstream clips_query;
clips_query << "(find-fact ((?f SpinQuantumNumber)) (= ?f:unique_id "
<< spin_quantum_number_unique_id << "))";
EnvEval(clips_environment_, clips_query.str().c_str(),
&spin_quantum_number_fact_list);
This all works fine for the first DecayTree fact, no matter at which position I start, but for the next one it crashes because the fact address is bogus. I traced the problem down to the subquery I make. What I did to solve the problem was to save all the DecayTree fact addresses in a vector and then process that. Since I could not find any information confirming my theory so far, I wanted to ask here.
So my question is quite simple: if I perform two queries, one after the other, does the information retrieved by the first query get invalidated as soon as I call the second query?
The EnvEval function should be marked in the documentation as triggering garbage collection, but it is not. CLIPS internally represents strings, integers, floats, and other primitives similarly to other languages (such as Java) that allow instances of classes such as String, Integer, and Float. As these values are dynamically created, they need to be subject to garbage collection when they are no longer used. Internally CLIPS uses reference counts to determine whether these values are referenced, but when these values are returned to a user's code it is not possible to know whether they are still referenced without some action from the user's code.
When you call EnvEval, the value it returns is exempt from garbage collection. It is not exempt the next time EnvEval is called. So if you immediately process the value returned or save it (i.e. allocate storage for a string and copy the value from CLIPS or save the fact addresses from a multifield in an array), then you don't need to worry about the value returned by CLIPS being garbage collected by a subsequent EnvEval call.
If you want to execute a series of EnvEval calls (or other CLIPS functions that may trigger garbage collection) without having to worry about values being garbage collected, wrap the calls within EnvIncrementGCLocks/EnvDecrementGCLocks:
EnvIncrementGCLocks(theEnv);
... Your Calls ...
EnvDecrementGCLocks(theEnv);
Garbage collection for all the values returned to your code will be temporarily disabled while you make the calls and then when you finish by calling EnvDecrementGCLocks the values will be garbage collected.
There's some additional information on garbage collection in section 1.4 of the Advanced Programming Guide.

What multithreading based data structure should I use?

I recently came across a question about multithreading. I was given a situation where a variable number of cars are constantly changing their locations. There are also multiple users posting requests to get the location of any car at any moment. What data structure would handle this situation, and why?
You could use a mutex (one per car).
Lock: before changing location of the associated car
Unlock: after changing location of the associated car
Lock: before getting location of the associated car
Unlock: after done doing work that relies on that location being up to date
I'd answer with:
Try to make threading an external concept to your system, yet make the system as modular and encapsulated as possible at the same time. That allows adding concurrency at a later phase at low cost, and if the solution happens to work nicely in a single thread (say, by making it event-loop-based), no time will have been wasted.
There are several ways to do this. Which way you choose depends a lot on the number of cars, the frequency of updates and position requests, the expected response time, and how accurate (up to date) you want the position reports to be.
The easiest way to handle this is with a simple mutex (lock) that allows only one thread at a time to access the data structure. Assuming you're using a dictionary or hash map, your code would look something like this:
Map Cars = new Map(...)
Mutex CarsMutex = new Mutex(...)

Location GetLocation(carKey)
{
    acquire mutex
    result = Cars[carKey].Location
    release mutex
    return result
}
You'd do that for Add, Remove, Update, etc. Any method that reads or updates the data structure would require that you acquire the mutex.
If the number of queries far outweighs the number of updates, then you can do better with a reader/writer lock instead of a mutex. With an RW lock, you can have an unlimited number of readers, OR you can have a single writer. With that, querying the data would be:
acquire reader lock
result = Cars[carKey].Location
release reader lock
return result
And Add, Update, and Remove would be:
acquire writer lock
do update
release writer lock
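The reader/writer pseudocode above could be sketched as follows. This is only an illustration in C++17 using std::shared_mutex; the CarRegistry and Location names are hypothetical and do not come from the answer, and the pseudocode itself is language-neutral.

```cpp
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical types for illustration only.
struct Location {
    double latitude;
    double longitude;
};

class CarRegistry {
public:
    // Writers take the exclusive lock: a single writer, no concurrent readers.
    void Update(const std::string& carKey, Location location) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        cars_[carKey] = location;
    }

    void Remove(const std::string& carKey) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        cars_.erase(carKey);
    }

    // Readers take the shared lock: any number of readers may hold it at once.
    std::optional<Location> GetLocation(const std::string& carKey) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = cars_.find(carKey);
        if (it == cars_.end()) {
            return std::nullopt;  // unknown car
        }
        return it->second;  // copied out while the lock is still held
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, Location> cars_;
};
```

Returning a copy of the location (rather than a reference into the map) means the caller never touches shared state after the lock is released.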
Many runtime libraries have a concurrent dictionary data structure already built in. .NET, for example, has ConcurrentDictionary. With those, you don't have to worry about explicitly synchronizing access with a Mutex or RW lock; the data structure handles synchronization for you, either with a technique similar to that shown above, or by implementing lock-free algorithms.
As mentioned in comments, a relational database can handle this type of thing quite easily and can scale to a very large number of requests. Modern relational databases, properly constructed and with sufficient hardware, are surprisingly fast and can handle huge amounts of data with very high throughput.
There are other, more involved, methods that can increase throughput in some situations depending on what you're trying to optimize. For example, if you're willing to have some latency in reported position, then you could have position requests served from a list that's updated once per minute (or once every five minutes). So position requests are fulfilled immediately with no lock required from a static copy of the list that's updated once per minute. Updates are queued and once per minute a new list is created by applying the updates to the old list, and the new list is made available for requests.
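A rough sketch of that "serve from a periodically rebuilt copy" idea, in C++ and under several assumptions: PositionBoard, QueueUpdate, Publish and Current are hypothetical names, positions are plain strings, and some timer (not shown) calls Publish once per period. Unlike the description above, this sketch still takes a very brief lock to copy a pointer; the lookup itself then runs on an immutable copy without any lock.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class PositionBoard {
public:
    using Table = std::map<std::string, std::string>;

    // Updates are only queued; they become visible at the next Publish().
    void QueueUpdate(std::string car, std::string position) {
        std::lock_guard<std::mutex> lock(pending_mutex_);
        pending_.emplace_back(std::move(car), std::move(position));
    }

    // Assumed to be called periodically (for example once per minute).
    // Builds a new table from the old one plus the queued updates.
    void Publish() {
        std::vector<std::pair<std::string, std::string>> updates;
        {
            std::lock_guard<std::mutex> lock(pending_mutex_);
            updates.swap(pending_);
        }
        auto next = std::make_shared<Table>(*Current());
        for (auto& update : updates) {
            (*next)[update.first] = std::move(update.second);
        }
        std::lock_guard<std::mutex> lock(current_mutex_);
        current_ = std::move(next);  // old table stays alive for its readers
    }

    // Readers copy the pointer under a short lock, then read lock-free.
    std::shared_ptr<const Table> Current() const {
        std::lock_guard<std::mutex> lock(current_mutex_);
        return current_;
    }

private:
    std::mutex pending_mutex_;
    std::vector<std::pair<std::string, std::string>> pending_;
    mutable std::mutex current_mutex_;
    std::shared_ptr<const Table> current_ = std::make_shared<Table>();
};
```

The trade-off is the one described above: requests never wait on an update in progress, but they may see positions that are up to one publishing period old.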
There are many different ways to solve your problem.

Concurrent saving from two different threads to Core Data persistent store with unique entity ID

I'm implementing a multithreaded Core Data downloader.
I have a problem with duplicate objects when saving objects that have a unique string attribute in the entity.
If two threads are downloading from the same URL simultaneously (e.g., the updater timer fires just as the application enters the foreground and the user calls the update method), I can't check whether an object with that unique attribute value already exists in the persistent store, so objects get duplicated.
How can I avoid duplicate objects, and what is the best solution in terms of performance?
Description (sorry, I can't post images yet):
http://i.stack.imgur.com/yMBgQ.png
Another approach would be to perform the download/save within an NSOperation, and prior to adding an operation to the queue, you could check to see if there was an existing operation to download that URL in the NSOperationQueue.
The advantage of this approach is that you don't download any more data than is necessary.
I've run into this before and it's a tricky problem.
I solved it by performing my downloads in separate background threads (the same as you are doing now), but all Core Data write operations happen on a global NSOperationQueue with maxConcurrentOperationCount set to 1. When each background download was complete, it created an NSOperation and put it onto that queue.
Good: Very simple thread safety - the NSOperationQueue ensured that only one thread was writing to CoreData at any one point.
Bad: Slight hit in terms of performance because the Core Data operations were working in series, not in parallel. This can be mitigated by doing any calculations needed on the data in the download background thread and doing as little as possible in the Core Data operation.

non-blocking producer and consumer using .NET 2.0

In our scenario:
the consumer takes at least half a second to complete one processing cycle (against a row in a data table);
the producer produces at least 8 items per second (no worries, we don't mind how long consuming takes);
the shared data is simply a data table;
we should never ask the producer to wait (it is a server and we don't want it to wait on this).
How can we achieve the above without locking the data table at all (as we don't want the producer to wait in any way)?
We cannot use .NET 4.0 yet in our org.
There is a great example of a producer/consumer queue using Monitors at this page under the "Producer/Consumer Queue" section. In order to synchronize access to the underlying data table, you can have a single consumer.
That page is probably the best resource for threading in .NET on the net.
Create a buffer that holds the data while it is being processed.
It takes you half a second to process, and you get 8 items a second... unless you have at least 4 processors working on it, you'll have a problem.
Just to be safe I'd use a buffer at least twice the size needed (16 rows), and make sure it's possible with the hardware.
There is no magic bullet that is going to let you access a DataTable from multiple threads without using a blocking synchronization mechanism. What I would do is to hold the lock for as short a duration as possible. Keep in mind that modifying any object in the data table's hierarchy will require locking the whole data table. This is because modifying a column value on a DataRow can change the internal indexing structures inside the parent DataTable.
So what I would do is this: in the producer, acquire a lock, add a new row, and release the lock. Then, in the consumer, acquire the same lock, copy the data contained in a DataRow into a separate data structure, and release the lock immediately. Now you can operate on the copied data without synchronization mechanisms, since it is isolated. After you have completed the operation, acquire the lock again, merge the changes back into the DataRow, release the lock, and start the process all over again.
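The "hold the lock only long enough to hand an item over" idea can be sketched as a small queue. This is an illustration in C++ rather than .NET 2.0, and WorkQueue is a hypothetical name; it is not the DataTable-based code described above, only the same locking pattern.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class WorkQueue {
public:
    // Producer side: the lock is held only while one item is appended,
    // so the producer never waits for the consumer's slow processing.
    void Enqueue(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }

    // Consumer side: waits until an item exists, moves it out, and
    // releases the lock before the caller starts processing it.
    T Dequeue() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};
```

Because the queue is unbounded, a producer that is consistently faster than the consumers will make it grow without limit, which is the capacity concern raised in the earlier answer.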

Threading and iterating through changing collections

In C# (console app) I want to hold a collection of objects. All objects are of same type.
I want to iterate through the collection calling a method on each object. And then repeat the process continuously.
However during iteration objects can be added or removed from the list. (The objects themselves will not be destroyed .. just removed from the list).
Not sure what would happen with a foreach loop .. or other similar method.
This has to have been done 1000 times before .. can you recommend a solid approach?
There is also a copy-based approach.
The algorithm goes like this:
take the lock on the shared collection
copy all items from the shared collection to a local collection
release the lock on the shared collection
iterate over the items in the local collection
The advantage of this approach is that you hold the lock on the shared collection only for a short time (assuming the shared collection is relatively small).
If the method you want to invoke on every item takes considerable time to complete, or can block, then iterating under the shared lock can block other threads that want to add or remove items.
However, if the method you want to invoke on every object is relatively fast, then iterating under the shared lock may be preferable.
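The copy-based steps listed above might look like this. It is a minimal sketch in C++ rather than C#, and SharedList and ForEachSnapshot are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <mutex>
#include <vector>

template <typename T>
class SharedList {
public:
    void Add(const T& item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(item);
    }

    void Remove(const T& item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.erase(std::remove(items_.begin(), items_.end(), item),
                     items_.end());
    }

    // Take the lock, copy every item, release the lock on return.
    std::vector<T> Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<T> items_;
};

// Iterates over a local copy, so the shared lock is not held while
// `action` runs and `action` may itself add or remove items.
template <typename T, typename Action>
void ForEachSnapshot(const SharedList<T>& list, Action action) {
    for (const T& item : list.Snapshot()) {
        action(item);
    }
}
```

Note that an item removed by another thread after the snapshot was taken will still be visited once, which is the price of not holding the lock during iteration.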
This is a classic case of synchronization in multithreading.
The only solid approach is to synchronize the loop with the addition/deletion of items from the list.
That means you should allow addition/deletion only between the end of one pass of the iterating loop and the start of the next.
Something like this:
ENTER SYNC_BLOCK
WAIT FOR SYNC_BLOCK to be available
LOOP for items/ call method on them.
LEAVE SYNC_BLOCK
ENTER SYNC_BLOCK
WAIT FOR SYNC_BLOCK to be available
Add/Delete items
LEAVE SYNC_BLOCK
What comes to mind when I read this example is that you could use a C5 TreeSet/TreeBag. It does require that there be a way to order your items, but the advantage of the Tree collections is that they offer a Snapshot method (A member of C5.IPersistentSorted) that allows you to make light-weight snapshots of the state of the collection without needing to make a full duplicate.
e.g.:
using (var copy = mySet.Snapshot()) {
    foreach (var item in copy) {
        item.DoSomething();
    }
}
C5 also offers a simple way to "apply to all" and is compatible with .NET 2.0:
using (var copy = mySet.Snapshot()) {
    copy.Apply(i => i.DoSomething());
}
It's important to note that the snapshot should be disposed or you will incur a small performance penalty on subsequent modifications to the base collection.
This example is from the very thorough C5 Book.
