Concurrent saving from two different threads to Core Data persistent store with unique entity ID - multithreading

I'm implementing a multithreaded Core Data downloader.
I have a problem with duplicate objects when saving objects that have a unique string attribute in the entity.
If two threads download from the same URL simultaneously (for example, the updater timer fires and the application enters the foreground, so the user calls the update method), neither thread can check whether an object with that unique attribute value already exists in the persistent store, so the objects are duplicated.
How can I avoid duplicate objects, and what is the best solution in terms of performance?
Description (sorry, I can't post images yet):
http://i.stack.imgur.com/yMBgQ.png

Another approach would be to perform the download/save within an NSOperation, and prior to adding an operation to the queue, you could check to see if there was an existing operation to download that URL in the NSOperationQueue.
The advantage of this approach is that you don't download any more data than is necessary.
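The check described above can be sketched as follows. This is an illustrative Python stand-in, not NSOperationQueue code: `DownloadScheduler`, `try_start`, `finish`, and the example URL are hypothetical names, and the set of in-flight URLs plays the role of inspecting the queue's pending operations.

```python
import threading

class DownloadScheduler:
    """Refuses to start a download when one for the same URL is already in flight."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()

    def try_start(self, url):
        """Return True if the caller should download, False if the URL is already queued."""
        with self._lock:
            if url in self._in_flight:
                return False
            self._in_flight.add(url)
            return True

    def finish(self, url):
        """Mark the download as done so a later update can fetch the URL again."""
        with self._lock:
            self._in_flight.discard(url)

scheduler = DownloadScheduler()
first = scheduler.try_start("http://example.com/feed")   # timer fires
second = scheduler.try_start("http://example.com/feed")  # app enters foreground
scheduler.finish("http://example.com/feed")
third = scheduler.try_start("http://example.com/feed")   # next update cycle
```

The check and the registration happen under one lock, so two callers cannot both decide that the URL is free.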

I've run into this before and it's a tricky problem.
I solved it by performing my downloads in separate background threads (the same as you are doing now), but all Core Data write operations happened on a global NSOperationQueue with maxConcurrentOperationCount set to 1. When each background download completed, it created an NSOperation and put it onto that queue.
Good: Very simple thread safety - the NSOperationQueue ensured that only one thread was writing to CoreData at any one point.
Bad: Slight hit in terms of performance because the Core Data operations were working in series, not in parallel. This can be mitigated by doing any calculations needed on the data in the download background thread and doing as little as possible in the Core Data operation.
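A minimal sketch of the single-writer idea, written in Python rather than with NSOperationQueue and Core Data. The `store` dictionary stands in for the persistent store, and `writer`, `download`, and the sample records are hypothetical. Because only one thread ever touches the store, the "does this unique ID already exist?" check cannot race with another insert.

```python
import queue
import threading

store = {}                   # stands in for the persistent store, keyed by unique ID
write_queue = queue.Queue()  # the only path by which records reach the store

def writer():
    """Single writer: check-then-insert is safe because nothing else touches `store`."""
    while True:
        batch = write_queue.get()
        if batch is None:        # sentinel: no more work
            break
        for unique_id, payload in batch:
            if unique_id not in store:
                store[unique_id] = payload

def download(records):
    """Runs on a background thread; prepares the batch there and hands it to the writer."""
    write_queue.put(list(records))

writer_thread = threading.Thread(target=writer)
writer_thread.start()

# Two downloaders fetch the same feed at the same time.
feed = [("a1", "first"), ("b2", "second")]
downloaders = [threading.Thread(target=download, args=(feed,)) for _ in range(2)]
for t in downloaders:
    t.start()
for t in downloaders:
    t.join()

write_queue.put(None)
writer_thread.join()
```

Both batches reach the writer, but the second one finds every ID already present and inserts nothing.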

Related

non-blocking producer and consumer using .NET 2.0

In our scenario:
The consumer takes at least half a second to complete one processing cycle (against a row in a data table).
The producer produces at least 8 items per second (we don't mind how long consuming takes).
The shared data is simply a data table.
We should never ask the producer to wait (it is a server and we don't want it to wait on this).
How can we achieve the above without locking the data table at all, since we don't want the producer to wait in any way?
We cannot use .NET 4.0 yet in our org.
There is a great example of a producer/consumer queue using Monitors at this page under the "Producer/Consumer Queue" section. In order to synchronize access to the underlying data table, you can have a single consumer.
That page is probably the best resource for threading in .NET on the net.
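For illustration, here is a rough Python analogue of a monitor-based producer/consumer queue with a single consumer. It is not the code from the linked page: `WorkQueue` and its methods are hypothetical names, and `threading.Condition` plays the role that `Monitor.Wait` and `Monitor.Pulse` play in .NET. The queue is unbounded, so the producer only ever waits for the brief moment the lock is held.

```python
import threading
from collections import deque

class WorkQueue:
    """Unbounded queue: enqueue never waits for the consumer to finish its work."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = deque()
        self._closed = False

    def enqueue(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()          # wake the consumer if it is waiting

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def dequeue(self):
        """Block until an item is available; return None once closed and drained."""
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            return self._items.popleft() if self._items else None

work = WorkQueue()
processed = []

def consumer():
    # The only thread that processes items, so `processed` needs no extra locking.
    while True:
        item = work.dequeue()
        if item is None:
            break
        processed.append(item * 10)

consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()
for i in range(1, 9):        # the producer adds eight items without waiting on the consumer
    work.enqueue(i)
work.close()
consumer_thread.join()
```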
Create a buffer that holds the data while it is being processed.
It takes you half a second to process, and you get 8 items a second... unless you have at least 4 processors working on it, you'll have a problem.
Just to be safe, I'd use a buffer at least twice the size needed (16 rows), and make sure it's possible with the hardware.
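The arithmetic behind that estimate can be checked with a short calculation. The rates come from the question; the steady-rate model and the `backlog_after` helper are simplifying assumptions made for illustration.

```python
import math

arrival_rate = 8        # items per second, from the question
service_seconds = 0.5   # time one consumer spends per item, from the question

# Minimum number of consumers working in parallel to keep up with the producer.
workers_needed = math.ceil(arrival_rate * service_seconds)

def backlog_after(seconds, workers):
    """Items left waiting after `seconds`, assuming steady rates and an empty start."""
    drained_per_second = workers / service_seconds
    return max(0, (arrival_rate - drained_per_second) * seconds)

single_consumer_backlog = backlog_after(10, workers=1)
```

With one consumer the buffer grows by six items every second, so no fixed-size buffer is large enough unless the work is spread over at least four consumers.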
There is no magic bullet that is going to let you access a DataTable from multiple threads without using a blocking synchronization mechanism. What I would do is to hold the lock for as short a duration as possible. Keep in mind that modifying any object in the data table's hierarchy will require locking the whole data table. This is because modifying a column value on a DataRow can change the internal indexing structures inside the parent DataTable.
So what I would do is have the producer acquire a lock, add a new row, and release the lock. Then in the consumer you acquire the same lock, copy the data contained in a DataRow into a separate data structure, and release the lock immediately. Now you can operate on the copied data without synchronization mechanisms, since it is isolated. After you have completed the operation on it, you again acquire the lock, merge the changes back into the DataRow, release the lock, and start the process all over again.
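The lock, copy, process, merge cycle described above might look like this. It is a Python sketch, not DataTable code: the list of dictionaries stands in for the data table, and `produce`, `consume`, and the squaring workload are hypothetical.

```python
import threading

table_lock = threading.Lock()
table = []  # stands in for the DataTable; each row is a dict

def produce(value):
    """Producer: hold the lock only long enough to append a row."""
    with table_lock:
        table.append({"value": value, "result": None})

def consume(index, work):
    """Consumer: copy under the lock, work without it, merge under the lock."""
    with table_lock:
        snapshot = dict(table[index])      # isolated copy of the row
    result = work(snapshot["value"])       # slow part runs with no lock held
    with table_lock:
        table[index]["result"] = result    # merge the change back

produce(3)
produce(4)
consume(0, lambda v: v * v)
consume(1, lambda v: v * v)
```

The producer can still be delayed, but only for the duration of a row copy or a single assignment, never for the half second of processing.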

Creating Dependencies Within An NSOperation

I have a fairly involved download process I want to perform in a background thread. There are some natural dependencies between steps in this process. For example, I need to complete the downloads of both Table A and Table B before setting the relationships between them (I'm using Core Data).
I first thought of putting each dependent step in its own NSOperation, then creating a dependency between the two operations (i.e. download the two tables in one operation, then set the relationship between them in the next, dependent operation). However, each NSOperation requires its own NSManagedObjectContext, so this is no good. I don't want to save the background context until both tables have been downloaded and their relationships set.
I've therefore concluded this should all occur inside one NSOperation, and that I should use notifications or some other mechanism to call the dependent method when all the conditions for running it have been met.
I'm an iOS beginner, however, so before I venture down this path, I wouldn't mind advice on whether I've reached the right conclusion.
Given your validation requirements, I think it will be easiest inside of one operation, although this could turn into a bit of a hairball as far as code structure goes.
You'll essentially want to make two wire fetches to get the entire dataset you require, then combine the data and parse it at one time into Core Data.
If you're going to use the asynchronous APIs, this essentially means structuring a class that waits for both operations to complete and then launches another NSOperation or block which does the parse and relationship construction.
Imagine this order of events:
User performs some action (button tap, etc.)
Selector for that action fires two network requests
When both requests have finished (they both notify a common delegate) launch the parse operation
Might look something like this in code:
- (IBAction)someAction:(id)sender {
    //fire both network requests
    request1.delegate = aDelegate;
    request2.delegate = aDelegate;
}

//later, inside the implementation of aDelegate
- (void)requestDidComplete... {
    if (request1Finished && request2Finished) {
        NSOperation *parse = //init with fetched data
        //launch on queue etc.
    }
}
There are two major pitfalls that this solution is prone to:
It keeps the entire data set around in memory until both requests are finished
You will have to constantly switch on the specific request that's calling your delegate (for error handling, success, etc.)
Basically, you're implementing operation dependencies on your own, although there might not be a good way around that because of the structure of NSURLConnection.
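The "wait for both requests, then parse" pattern can be sketched outside Cocoa as follows. This Python version is only an analogue of the delegate approach: `ParseWhenAllDone`, `request_did_complete`, the table names, and the toy parse function are all hypothetical. The point is that the completion check runs under a lock, so exactly one callback launches the parse.

```python
import threading

class ParseWhenAllDone:
    """Common delegate: collects results and launches the parse once all have arrived."""

    def __init__(self, expected, parse):
        self._lock = threading.Lock()
        self._expected = set(expected)
        self._results = {}
        self._parse = parse
        self.parsed = None

    def request_did_complete(self, name, data):
        with self._lock:
            self._results[name] = data
            ready = set(self._results) == self._expected
            snapshot = dict(self._results) if ready else None
        if ready:
            # Only the callback that completes the set reaches this point.
            self.parsed = self._parse(snapshot)

def combine(results):
    """Toy parse step: pair rows from the two tables to stand in for relationship setup."""
    return list(zip(results["tableA"], results["tableB"]))

joiner = ParseWhenAllDone({"tableA", "tableB"}, combine)
requests = [
    threading.Thread(target=joiner.request_did_complete, args=("tableA", [1, 2])),
    threading.Thread(target=joiner.request_did_complete, args=("tableB", ["x", "y"])),
]
for r in requests:
    r.start()
for r in requests:
    r.join()
```

As noted above, both result sets stay in memory until the last request finishes.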

Silverlight Multithreading; Need to Synchronize?

I have a Silverlight app where I've implemented the M-V-VM pattern, so my actual UI elements (Views) are separated from the data (Models). At one point, after the user has made some selections and possibly other input, I'd like to asynchronously go through the model, scan it, and compile a list of options that the user has changed (different from the default), and eventually update that on the UI as a summary, but that would be a final step.
My question is that if I use a background worker to do this, up until I actually want to do the UI updates, I just want to read current values in one of my models, I don't have to synchronize access to the model right? I'm not modifying data just reading current values...
There are Lists (ObservableCollections), so I will have to call methods of those collections like "_ABCCollection.GetSelectedItems()" but again I'm just reading, I'm not making changes. Since they are not primitives, will I have to synchronize access to them for just reads, or does that not matter?
I assume I'll have to synchronize my final step, as it will cause PropertyChanged events to fire and eventually the Views will request the new data through the bindings...
Thanks in advance for any and all advice.
You are correct. You can read from your Model objects and ObservableCollections on a worker thread without having a cross-thread violation. Getting or setting the value of a property on a UI element (more specifically, an object that derives from DispatcherObject) must be done on the UI thread (more specifically, the thread on which the DispatcherObject subclass instance was created). For more info about this, see here.

Resolving an NSManagedObject conflict with multiple threads, relationships, and pointers

I'm having a conflict when saving a bunch of NSManagedObjects via an outside thread. For starters, I can tell you the following:
I'm using a separate MOC for each thread.
The MOCs share the same persistent store coordinator.
It's likely that an outside thread is modifying one or many of the records that I'm saving.
OK, so with that out of the way, here's what I'm doing.
In my outside thread, I'm doing some computation and updating a single value in a bunch of managed objects. I do this by looking up the object in the persistent store by my primary key, modifying the single decimal property, and then calling save on the bunch all at once.
In the meantime, I believe the main thread is doing some updating of its own.
When my outside thread does its big save on its managed object context, I get an exception thrown stating a large number of conflicts. All of the conflicts seem to be centered around a single relationship on each record. Though the managed object in the persistent store and my outside thread share the same ObjectID for this relationship, they don't share the same pointer. Based on what I see, that's the only thing that's different between the objects in my NSMergeConflict debug output.
It makes sense to me why the two objects have relationships with different pointers -- they're in different threads. However, as I understand it from Apple's documentation, the only thing cached when an object is first retrieved from the persistent store are the global IDs. So, one would think that when I run save on the outside thread MOC, it compares the ObjectIDs, sees they're the same, and lets it all through.
So, can anyone tell me why I'm getting a conflict?
Per the documentation in the Concurrency with Core Data chapter of The Core Data Programming Guide, the recommended configuration is for the contexts to share the same persistent store coordinator, not just the same persistent store.
Also, the section Track Changes in Other Threads Using Notifications of the same chapter states that if you're tracking updates with NSManagedObjectContextDidSaveNotification, then you send -mergeChangesFromContextDidSaveNotification: to the main thread's context so it can merge the changes. But if you're tracking with NSManagedObjectContextObjectsDidChangeNotification, then the external thread should send the object IDs of the modified objects to the main thread, which will then send -refreshObject:mergeChanges: to its context for each modified object.
And really, you should know if the main thread is also performing updates through its controller, and propagate its changes in like manner but in the opposite direction.
You need to have all your contexts listening for NSManagedObjectContextDidSaveNotification from any context that makes changes. Otherwise, only the front context will be aware of changes made on the background threads but the background context won't be aware of changes on the front thread.
So, if you have three threads and three contexts, each of which makes changes, all three contexts must register for notifications from the other two.
Unfortunately, it seems as though this bug was actually being caused by something else -- I was calling the operation causing the error more than once at the same time when I shouldn't have been. Although this doesn't answer the initial question as to why pointers matter in conflicts, updating my code to prevent this situation has resolved my issue.

C++/CLI efficient multithreaded circular buffer

I have four threads in a C++/CLI GUI I'm developing:
Collects raw data
The GUI itself
A background processing thread which takes chunks of raw data and produces useful information
Acts as a controller which joins the other three threads
I've got the raw data collector working and posting results to the controller, but the next step is to store all of those results so that the GUI and background processor have access to them.
New raw data is fed in one result at a time at regular (frequent) intervals. The GUI will access each new item as it arrives (the controller announces new data and the GUI then accesses the shared buffer). The data processor will periodically read a chunk of the buffer (a second's worth, for example) and produce a new result. So effectively, there's one producer and two consumers which need access.
I've hunted around, but none of the CLI-supplied stuff sounds all that useful, so I'm considering rolling my own: a shared circular buffer which allows write locks for the collector and read locks for the GUI and data processor. This will allow multiple threads to read the data as long as those sections of the buffer are not being written to.
So my question is: Are there any simple solutions in the .net libraries which could achieve this? Am I mad for considering rolling my own? Is there a better way of doing this?
Is it possible to rephrase the problem so that:
The Collector collects a new data point ...
... which it passes to the Controller.
The Controller fires a GUI "NewDataPointEvent" ...
... and stores the data point in an array.
If the array is full (or otherwise ready for processing), the Controller sends the array to the Processor ...
... and starts a new array.
If the values passed between threads are not modified after they are shared, this might save you from needing the custom thread-safe collection class, and reduce the amount of locking required.
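A small sketch of that rephrased flow, in Python rather than C++/CLI. `Controller`, `add_point`, and the two callbacks are hypothetical names; the callbacks stand in for the GUI event and the hand-off to the Processor thread. Each full batch is passed on as an immutable tuple, so nothing modifies it after it has been shared.

```python
class Controller:
    """Receives points from the Collector, announces each one, and hands off full batches."""

    def __init__(self, batch_size, on_new_point, on_batch):
        self._batch_size = batch_size
        self._on_new_point = on_new_point   # e.g. raises the GUI "new data point" event
        self._on_batch = on_batch           # e.g. posts the batch to the Processor
        self._current = []

    def add_point(self, point):
        self._on_new_point(point)
        self._current.append(point)
        if len(self._current) == self._batch_size:
            full, self._current = self._current, []   # start a new array
            self._on_batch(tuple(full))               # immutable hand-off

gui_points = []
processor_batches = []
controller = Controller(2, gui_points.append, processor_batches.append)
for point in range(5):
    controller.add_point(point)
```

Only the Controller thread ever writes to the array that is being filled, so the array itself needs no lock; only the hand-off to the other threads has to be thread-safe.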

Resources