I have entities managed by Core Data, and in several cases a single method sets some attribute values that result in graph changes which Core Data will enforce, and then performs additional actions that (logically) depend on up-to-date state for the graph.
Is there any reason not to call processPendingChanges each time a relationship is set, to ensure that the graph is always fully up to date? Everything works as it should when I do this, but, clearly, it's a bit "noisy", and it breaks up processing that would otherwise be coalesced into a single round of notifications (e.g., fetched results controllers end up sending many controllerWillChangeContent/controllerDidChangeContent pairs to their delegates when only one would otherwise have been sent).
ADDITION:
Will the graph always be up-to-date after a return from any method that makes changes to an entity?
I found out the hard way that you should call processPendingChanges before inspecting deletedObjects of NSManagedObjectContext, at least if some relationships have deleteRule set to NSCascadeDeleteRule.
If you don't call processPendingChanges then deletedObjects may not contain objects that will be deleted by cascade at the end of current event.
processPendingChanges is most often used on iOS with multiple contexts operating on separate threads. It plays a bigger and more common role on macOS.
On iOS you usually don't have to call it. Doing so doesn't give you much of an advantage, and it can cause lag in the UI when executed on the main thread if you have a complex graph.
I wouldn't bother with it unless testing reveals you are losing graph integrity for some reason.
I was wondering if anyone has done any perf tests on the effect that calling EF Core's SaveChangesAsync() has on performance when there are no changes to be saved.
Essentially I am assuming it's basically nothing and therefore isn't a big deal to call it "just in case"?
(I am trying to track user activity in middleware in ASP.NET Core, and on the way out I want to make sure SaveChanges was called to persist the activity to the database. There is a chance that it has already been called on the context, depending on the user's operation, and in that case I don't want to incur the cost of a second operation when the activity could have been persisted as part of the normal transaction/round trip.)
As you can see in the implementation, if there are no changes, nothing will be done. How much impact it has on performance, I don't know. But of course, calling SaveChanges or SaveChangesAsync without any changes will still cost something compared to not calling them at all.
EF6 behaves the same way.
I apologize up front for this long post, but as you can probably see I have been thinking about this for quite some time, and I feel I need some input from other people before my head explodes :-)
I have been experimenting for some time now with various ways of building a game engine which satisfies all the following criteria:
Complete separation of object updating and object rendering
Full determinism
Updating and rendering at individual speeds
No blocking on shared resources
Complete separation of object updating and object rendering
Separation of object updating and object rendering seems to be vital to ensure optimal usage of resources while sending data to the graphics API and swapping buffers.
Even if you want full parallelism across multiple CPU cores, it seems that this separation must still be managed.
Full determinism
Many game types, and especially multiplayer versions, must ensure full determinism. Otherwise players will experience different states of the same game effectively breaking the game logic. Determinism is required for game replays as well. And it is useful for other purposes where it is important that each run of a simulation produces the same result every time given the same starting conditions and inputs.
Updating and rendering at individual speeds
This is really a prerequisite for full determinism as you cannot have the simulation depend on rendering speeds (ie the various monitor refresh rates, graphics adapter speed etc.). During optimal conditions the update speed should be set at a certain fixed interval (eg. 25 updates per second - maybe less depending on the update type), and the rendering speed should be whatever the client's monitor refresh rate / graphics adapter allows.
This implies that a rendering speed higher than the update speed should be allowed. And while that sounds like a waste, there are known tricks to ensure that the added rendering cycles are not wasted (interpolation / extrapolation), which means that faster monitors / adapters would be rewarded with a more visually pleasing experience, as they should.
Rendering speeds lower than update speed must also be allowed though, even if this does in fact result in wasted updating cycles - at least the added updating cycles are not all presented to the user. This is however necessary to ensure a smooth multiplayer experience even if the rendering in one of the clients slows to a sudden crawl for one reason or another.
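The decoupling described above is commonly implemented with a time accumulator. The following is a minimal sketch under stated assumptions, not a definitive implementation: the names LoopState and advance are hypothetical, time is kept in integer microseconds so the arithmetic is exact, and the 25 updates per second figure is taken from the text.

```python
from dataclasses import dataclass

UPDATE_DT_US = 40_000  # fixed simulation step: 25 updates per second, in microseconds


@dataclass
class LoopState:
    accumulator_us: int = 0  # elapsed time not yet consumed by fixed updates
    updates_run: int = 0     # total fixed updates performed so far


def advance(state: LoopState, frame_time_us: int) -> float:
    """Account for one rendered frame that took frame_time_us.

    Runs as many fixed updates as the elapsed time covers and returns
    alpha in [0, 1): how far the render moment lies past the last update,
    as a fraction of one update step.
    """
    state.accumulator_us += frame_time_us
    while state.accumulator_us >= UPDATE_DT_US:
        state.updates_run += 1  # a real engine would step the simulation here
        state.accumulator_us -= UPDATE_DT_US
    return state.accumulator_us / UPDATE_DT_US
```

A slow frame of 100 ms triggers two updates and leaves alpha at 0.5, while four fast frames of 10 ms each trigger a single update between them. The returned alpha is what an interpolating or extrapolating renderer would use to place objects between fixed updates.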
No blocking on shared resources
If the other criteria mentioned above are to be met, it follows that we cannot allow rendering to wait for updating or vice versa. Of course, when two threads share access to resources and one thread is updating some of them, it is impossible to guarantee that blocking will never take place. It is, however, possible to keep this blocking to an absolute minimum - for example when switching pointer references between a queue of updated objects and a queue of previously rendered objects.
So...
My question to all you skilled people in here is: Am I asking for too much?
I have been reading about ideas of these various topics on many sites. But always it seems that one part or the other is left out from the suggestions I've seen. And maybe the reason is that you cannot have it all without compromise.
I started this seemingly common quest a long time ago when I was putting my thoughts about it in this thread:
Thoughts about rendering loop strategies
Back then my first naive assumption was that it shouldn't matter if updating and reading happened simultaneously, since the variation in object state was so small that you shouldn't notice if one object was occasionally a step ahead of another.
Now I am somewhat wiser, but still confused at times.
The most promising and detailed description of a method that would allow all my wishes to come true was this:
http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part1/
A three-state model that will ensure that the renderer can always choose a new queue for rendering without any wait (except perhaps a microsecond while switching pointer references). At the same time the updater can always gain access to the 2 queues required for building the next state tree (1 queue for creating/updating the next state, and 1 queue for reading the previous one - which can be done even while the renderer reads it as well).
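To make the pointer-switching idea concrete, here is a minimal sketch of a classic triple buffer. It is an assumption about how such a model can be built, not the implementation from the linked blog post: the class name TripleBuffer and its methods are hypothetical, and the updater's read access to the previous state is left out for brevity. The lock is held only while slot indices are swapped, never while a state is being built or drawn.

```python
import threading


class TripleBuffer:
    """Three state slots: one being written, one published, one being rendered."""

    def __init__(self, make_state):
        self._slots = [make_state() for _ in range(3)]
        self._write = 0      # slot the updater is currently filling
        self._ready = 1      # most recently published slot
        self._read = 2       # slot the renderer currently holds
        self._fresh = False  # True when _ready holds a state the renderer has not seen
        self._lock = threading.Lock()  # held only while swapping indices

    def write_slot(self):
        """Updater: the slot that may be modified freely."""
        return self._slots[self._write]

    def publish(self):
        """Updater: hand over the finished slot and take a free one."""
        with self._lock:
            self._write, self._ready = self._ready, self._write
            self._fresh = True

    def acquire_latest(self):
        """Renderer: switch to the newest published slot, if there is one."""
        with self._lock:
            if self._fresh:
                self._read, self._ready = self._ready, self._read
                self._fresh = False
        return self._slots[self._read]
```

If the updater publishes twice before the renderer asks again, the renderer simply skips the intermediate state. If nothing new has been published, it keeps the slot it already holds.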
I recently found time to make a sample implementation of this, and it works very well, but for two issues.
One is a minor issue of having to deal with multiple references to all involved objects
The other is more serious (unless I'm just being too needy). And that is the fact that extrapolation - as opposed to interpolation - is used to maintain a visually pleasing representation of the states given a fast screen refresh rate. While both methods do the job of showing states deviating from the solidly calculated object states, extrapolation seems to me to produce much more visible artifacts when the predictions fail to represent reality. My position seems to be supported by this:
http://gafferongames.com/networked-physics/snapshots-and-interpolation/
And it is not possible to implement interpolation in the three-state design as far as I can tell, since it requires the renderer to have read-access to 2 queues at all times to calculate the intermediate state between two known states.
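The difference between the two methods can be illustrated with a small worked example. This is a sketch with made-up numbers, assuming one-dimensional positions, a time unit of one update step, and a hypothetical wall at x = 7.0 that the object bounces off between two updates.

```python
def interpolate(prev_pos, curr_pos, alpha):
    """Blend between two known states; needs read access to both of them."""
    return prev_pos + (curr_pos - prev_pos) * alpha


def extrapolate(curr_pos, velocity, alpha):
    """Project the newest known state forward, assuming velocity stays constant."""
    return curr_pos + velocity * alpha


# Hypothetical scenario: the object is at 5.0 moving at +5.0 per update step,
# hits the wall at 7.0 during the next step, and ends that step at 4.0.
WALL_X = 7.0
pos_before, velocity_before = 5.0, 5.0
pos_after = 4.0

halfway_extrapolated = extrapolate(pos_before, velocity_before, 0.5)
halfway_interpolated = interpolate(pos_before, pos_after, 0.5)
```

Here the extrapolated position is 7.5, which puts the object inside the wall until the next real state arrives and snaps it back. The interpolated position is 4.5, which always stays between two states that actually occurred, at the cost of needing both states and rendering up to one update step behind.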
So I was toying with extending the three-state model suggested on the slapware blog to use interpolation instead of extrapolation - and at the same time trying to simplify the multi-reference structure. While it seems to me to be possible, I am wondering if the price is too high. In order to meet all my goals I would need to have
2 queues (or states) exclusively held by the renderer (they could be used by another thread for read-only purposes, but never updated or switched during rendering)
1 queue (or state) with the newest updated state ready to switch over to the renderer, when it is done rendering the current scene
1 queue (or state) with the next frame being built/updated by the updater
1 queue (or state) containing a copy of the frame last built/updated. This is the same state as last sent to the renderer, so this queue/state should be accessible by both the updater for reading the previous state and the renderer for rendering the state.
So that would mean that I should keep at all times 4 copies of render states to be able to keep this design running smoothly, locklessly, deterministically.
I fear that I'm overthinking this. So if any of you have advice to pull me back down to earth, suggestions for what can be improved, critique of the design, or references to good resources explaining how these goals can be achieved, or why this is or isn't a good idea - please hit me with them :-)
I'm designing a large-scale project, and I think I see a way I could drastically improve performance by taking advantage of multiple cores. However, I have zero experience with multiprocessing, and I'm a little concerned that my ideas might not be good ones.
Idea
The program is a video game that procedurally generates massive amounts of content. Since there's far too much to generate all at once, the program instead tries to generate what it needs as or slightly before it needs it, and expends a large amount of effort trying to predict what it will need in the near future and how near that future is. The entire program, therefore, is built around a task scheduler, which gets passed function objects with bits of metadata attached to help determine what order they should be processed in and calls them in that order.
Motivation
It seems to me like it ought to be easy to make these functions execute concurrently in their own processes. But looking at the documentation for the multiprocessing module makes me reconsider: there doesn't seem to be any simple way to share large data structures between processes. I can't help but imagine this is intentional.
Questions
So I suppose the fundamental questions I need to know the answers to are thus:
Is there any practical way to allow multiple threads to access the same list/dict/etc... for both reading and writing at the same time? Can I just launch multiple instances of my star generator, give it access to the dict that holds all the stars, and have new objects appear to just pop into existence in the dict from the perspective of other threads (that is, I wouldn't have to explicitly grab the star from the process that made it; I'd just pull it out of the dict as if the main thread had put it there itself).
If not, is there any practical way to allow multiple threads to read the same data structure at the same time, but feed their resultant data back to a main thread to be rolled into that same data structure safely?
Would this design work even if I ensured that no two concurrent functions tried to access the same data structure at the same time, either for reading or for writing?
Can data structures be inherently shared between processes at all, or do I always explicitly have to send data from one process to another as I would with processes communicating over a TCP stream? I know there are objects that abstract away that sort of thing, but I'm asking if it can be done away with entirely; have the object each thread is looking at actually be the same block of memory.
How flexible are the objects that the modules provide to abstract away the communication between processes? Can I use them as a drop-in replacement for data structures used in existing code and not notice any differences? If I do such a thing, would it cause an unmanageable amount of overhead?
Sorry for my naivete, but I don't have a formal computer science education (at least, not yet) and I've never worked with concurrent systems before. Is the idea I'm trying to implement here even remotely practical, or would any solution that allows me to transparently execute arbitrary functions concurrently cause so much overhead that I'd be better off doing everything in one thread?
Example
For maximum clarity, here's an example of how I imagine the system would work:
The UI module has been instructed by the player to move the view over to a certain area of space. It informs the content management module of this, and asks it to make sure that all of the stars the player can currently click on are fully generated and ready to be clicked on.
The content management module checks and sees that a couple of the stars the UI is saying the player could potentially try to interact with have not, in fact, had the details that would show upon click generated yet. It produces a number of Task objects containing the methods of those stars that, when called, will generate the necessary data. It also adds some metadata to these task objects, assuming (possibly based on further information collected from the UI module) that it will be 0.1 seconds before the player tries to click anything, and that stars whose icons are closest to the cursor have the greatest chance of being clicked on and should therefore be requested for a time slightly sooner than the stars further from the cursor. It then adds these objects to the scheduler queue.
The scheduler quickly sorts its queue by how soon each task needs to be done, then pops the first task object off the queue, makes a new process from the function it contains, and then thinks no more about that process, instead just popping another task off the queue and stuffing it into a process too, then the next one, then the next one...
Meanwhile, the new process executes, stores the data it generates on the star object it is a method of, and terminates when it gets to the return statement.
The UI then registers that the player has indeed clicked on a star now, and looks up the data it needs to display on the star object whose representative sprite has been clicked. If the data is there, it displays it; if it isn't, the UI displays a message asking the player to wait and continues repeatedly trying to access the necessary attributes of the star object until it succeeds.
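One way to sketch the flow in this example is a deadline-ordered queue that hands tasks to an executor and merges the results back in the calling thread, so only one thread ever writes to the shared dict. This is an illustration under stated assumptions, not a recommendation: Scheduler, generate_star and demo are hypothetical names, tasks return their data instead of storing it on the star object, and the demo uses a thread pool purely to stay portable.

```python
import heapq
import itertools
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed


def generate_star(star_id):
    """Hypothetical stand-in for expensive procedural generation."""
    return star_id, {"name": f"Star-{star_id}", "planets": star_id % 5}


class Scheduler:
    def __init__(self, executor: Executor):
        self._executor = executor
        self._heap = []                    # entries: (deadline, sequence, func, args)
        self._counter = itertools.count()  # tie-breaker so functions are never compared

    def add_task(self, deadline, func, *args):
        heapq.heappush(self._heap, (deadline, next(self._counter), func, args))

    def run_all(self, results):
        """Submit tasks earliest-deadline-first, then merge results in the caller."""
        futures = []
        while self._heap:
            _deadline, _seq, func, args = heapq.heappop(self._heap)
            futures.append(self._executor.submit(func, *args))
        for future in as_completed(futures):
            key, value = future.result()
            results[key] = value           # only the caller writes to `results`


def demo():
    stars = {}
    with ThreadPoolExecutor(max_workers=2) as pool:
        scheduler = Scheduler(pool)
        scheduler.add_task(0.2, generate_star, 7)  # needed later
        scheduler.add_task(0.1, generate_star, 3)  # needed sooner, submitted first
        scheduler.run_all(stars)
    return stars
```

Because Scheduler only depends on the Executor interface, a concurrent.futures.ProcessPoolExecutor can be passed instead. In that case the task functions and their arguments and return values must be picklable, since results are copied back to the main process rather than appearing in shared memory.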
Even though your problem seems very complicated, there is a very easy solution. You can hide away all the complicated details of sharing your objects across processes by using a proxy.
The basic idea is that you create a manager that manages all the objects that should be shared across processes. This manager then starts its own process, where it waits for other processes to instruct it to change the objects. But enough said. It looks like this:
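import multiprocessing as m

if __name__ == "__main__":
    manager = m.Manager()       # starts the manager's own server process
    starsdict = manager.dict()  # proxy to a dict that lives in the manager process
    # yourfunction is your own worker function; it receives the proxy as its argument
    process = m.Process(target=yourfunction, args=(starsdict,))
    process.start()             # start() launches a new process; run() would execute in this one
    process.join()              # wait for the worker to finish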
The object stored in starsdict is not the real dict. Instead, it forwards every change and request you make to its manager. This is called a "proxy", and it has almost exactly the same API as the object it mimics. These proxies are picklable, so you can pass them as arguments to functions in new processes (as shown above) or send them through queues.
You can read more about this in the documentation.
I don't know how proxies react if two processes are accessing them simultaneously. Since they're made for parallelism I guess they should be safe, even though I heard they're not. It would be best if you test this yourself or look for it in the documentation.
I have a Silverlight app where I've implemented the M-V-VM pattern, so my actual UI elements (Views) are separated from the data (Models). Anyway, at some point after the user has made some selections and possibly other input, I'd like to asynchronously go through the model, scan it, and compile a list of options that the user has changed (different from the default), and eventually show that on the UI as a summary, but that would be a final step.
My question is that if I use a background worker to do this, up until I actually want to do the UI updates, I just want to read current values in one of my models, I don't have to synchronize access to the model right? I'm not modifying data just reading current values...
There are Lists (ObservableCollections), so I will have to call methods of those collections like "_ABCCollection.GetSelectedItems()" but again I'm just reading, I'm not making changes. Since they are not primitives, will I have to synchronize access to them for just reads, or does that not matter?
I assume I'll have to synchronize my final step, as it will cause PropertyChanged events to fire and eventually the Views will request the new data through the bindings...
Thanks in advance for any and all advice.
You are correct. You can read from your Model objects and ObservableCollections on a worker thread without having a cross-thread violation. Getting or setting the value of a property on a UI element (more specifically, an object that derives from DispatcherObject) must be done on the UI thread (more specifically, the thread on which the DispatcherObject subclass instance was created). For more info about this, see here.
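The overall shape of this approach can be sketched in a language-neutral way: scan on a worker thread, then hand the finished result to the UI side instead of touching UI objects from the worker. The sketch below uses Python rather than Silverlight/C#, and every name in it (DEFAULTS, changed_options, summarize_in_background, the queue standing in for the dispatcher) is hypothetical. It also assumes nothing modifies the model while the scan is running.

```python
import queue
import threading

# Hypothetical default option values.
DEFAULTS = {"color": "red", "size": 10, "gift_wrap": False}


def changed_options(model, defaults):
    """Return the options whose current value differs from the default."""
    return {key: value for key, value in model.items() if defaults.get(key) != value}


def summarize_in_background(model, defaults, ui_queue):
    """Scan the model on a worker thread and post the result for the UI side."""
    def work():
        # Read-only access to the model; the UI side drains the queue later.
        ui_queue.put(changed_options(model, defaults))

    worker = threading.Thread(target=work)
    worker.start()
    return worker
```

In Silverlight the role of the queue is played by marshalling the final update back to the UI thread, so that the PropertyChanged events fire there.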
I'm having a conflict when saving a bunch of NSManagedObjects via an outside thread. For starters, I can tell you the following:
I'm using a separate MOC for each thread.
The MOCs share the same persistent store coordinator.
It's likely that an outside thread is modifying one or many of the records that I'm saving.
OK, so with that out of the way, here's what I'm doing.
In my outside thread, I'm doing some computation and updating a single value in a bunch of managed objects. I do this by looking up the object in the persistent store by my primary key, modifying the single decimal property, and then calling save on the bunch all at once.
In the meantime, I believe the main thread is doing some updating of its own.
When my outside thread does its big save on its managed object context, I get an exception thrown stating a large number of conflicts. All of the conflicts seem to be centered around a single relationship on each record. Though the managed object in the persistent store and my outside thread share the same ObjectID for this relationship, they don't share the same pointer. Based on what I see, that's the only thing that's different between the objects in my NSMergeConflict debug output.
It makes sense to me why the two objects have relationships with different pointers -- they're in different threads. However, as I understand it from Apple's documentation, the only thing cached when an object is first retrieved from the persistent store are the global IDs. So, one would think that when I run save on the outside thread MOC, it compares the ObjectIDs, sees they're the same, and lets it all through.
So, can anyone tell me why I'm getting a conflict?
Per the documentation in the Concurrency with Core Data chapter of The Core Data Programming Guide, the recommended configuration is for the contexts to share the same persistent store coordinator, not just the same persistent store.
Also, the section Track Changes in Other Threads Using Notifications of the same chapter states if you're tracking updates with the NSManagedObjectContextDidSaveNotification then you send -mergeChangesFromContextDidSaveNotification to the main thread's context so it can merge the changes. But if you're tracking with NSManagedObjectContextDidChangeNotification then the external thread should send the object IDs of the modified objects to the main thread which will then send -refreshObject:mergeChanges: to its context for each modified object.
And really, you should know if the main thread is also performing updates through its controller, and propagate its changes in like manner but in the opposite direction.
You need to have all your contexts listening for NSManagedObjectContextDidSaveNotification from any context that makes changes. Otherwise, only the front context will be aware of changes made on the background threads but the background context won't be aware of changes on the front thread.
So, if you have three threads and three contexts, each of which makes changes, all three contexts must register for notifications from the other two.
Unfortunately, it seems as though this bug was actually being caused by something else -- I was calling the operation causing the error more than once at the same time when I shouldn't have been. Although this doesn't answer the initial question as to why pointers matter in conflicts, updating my code to prevent this situation has resolved my issue.