QAbstractItemModel Lazy Loading locks application - multithreading

I have implemented canFetchMore, hasChildren and fetchMore in order to allow my model to be lazy-loaded. It's very simple and based on Qt's simple tree model example: http://doc.qt.io/archives/qt-4.7/itemviews-simpletreemodel.html
My problem is that in my application fetching children is not a very quick operation, it involves a few seconds of delay on the server side while it figures out who the children actually are.
I'm unsure how to deal with that. I can't have my application locking up for several seconds every time someone expands a node, and I don't know how to make the fetch happen in the background. If I were to create a sub-process or thread to actually do the work of retrieving the children and updating the client-side data structure, how would I go about telling the model that this had completed successfully (so that the node can finally expand)?
Also, is there a way to show that the node is currently in the process of loading the data in the background?
Apologies if these are stupid questions; GUI programming is still a bit of a mystery to me, and I've never used Qt before.
For the record, I'm using Python, but if answers are given in C++ I can understand them.
Thanks

If I were to create a sub-process or thread to actually do the work of retrieving the children and updating the client-side data structure, how would I go about telling the model that this had completed successfully (so that the node can finally expand)?
You can use signals and slots. In the thread that retrieves the data, emit a custom signal such as someDataAvailable(YourdataType); in the GUI, handle it with a slot such as handleDataReadySignal(YourdataType). The signal passes along the object you give it when emitting. You then update the GUI and the underlying list inside the handleDataReadySignal slot. Of course, you need to connect the slot to the signal, preferably in the constructor of the window/dialog to which the view is attached.
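For concreteness, here is a minimal sketch of that pattern in Python with PyQt5; ChildFetcher, fetch_children and dataAvailable are invented names for illustration, not part of any real API:

import time
from PyQt5.QtCore import QThread, pyqtSignal

def fetch_children(node_id):
    # Stand-in for the slow server round-trip.
    time.sleep(3)
    return ["child-a", "child-b"]

class ChildFetcher(QThread):
    # Worker thread that fetches children without blocking the GUI.
    dataAvailable = pyqtSignal(object, list)  # (node_id, children)

    def __init__(self, node_id, parent=None):
        super().__init__(parent)
        self.node_id = node_id

    def run(self):
        # Runs in the worker thread; emitting the signal queues a call
        # to any connected slot back on the GUI thread.
        self.dataAvailable.emit(self.node_id, fetch_children(self.node_id))

In fetchMore you would create a ChildFetcher, connect dataAvailable to a slot that wraps the model update in beginInsertRows()/endInsertRows(), and call start(). Because the connection crosses threads, Qt delivers the signal as a queued call on the GUI thread. As for showing that a node is loading: one common trick is to insert a placeholder "Loading..." child before starting the thread and replace it when the real data arrives.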

Related

Designing concurrency in a Python program

I'm designing a large-scale project, and I think I see a way I could drastically improve performance by taking advantage of multiple cores. However, I have zero experience with multiprocessing, and I'm a little concerned that my ideas might not be good ones.
Idea
The program is a video game that procedurally generates massive amounts of content. Since there's far too much to generate all at once, the program instead tries to generate what it needs as or slightly before it needs it, and expends a large amount of effort trying to predict what it will need in the near future and how near that future is. The entire program, therefore, is built around a task scheduler, which gets passed function objects with bits of metadata attached to help determine what order they should be processed in and calls them in that order.
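Concretely, the scheduler I have in mind is something like this rough sketch (the names and timing details are made up):

import heapq
import itertools
import time

class TaskScheduler:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for equal deadlines

    def add(self, func, needed_at):
        # Tasks are ordered by the absolute time their result is needed.
        heapq.heappush(self._heap, (needed_at, next(self._counter), func))

    def run_due(self):
        # Called from the main loop; runs every task whose deadline has passed.
        while self._heap and self._heap[0][0] <= time.monotonic():
            _, _, func = heapq.heappop(self._heap)
            func()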
Motivation
It seems like it ought to be easy to make these functions execute concurrently in their own processes. But looking at the documentation for the multiprocessing module makes me reconsider: there doesn't seem to be any simple way to share large data structures between processes. I can't help but imagine this is intentional.
Questions
So I suppose the fundamental questions I need answered are these:
Is there any practical way to allow multiple processes to access the same list/dict/etc. for both reading and writing at the same time? Can I just launch multiple instances of my star generator, give each access to the dict that holds all the stars, and have new objects appear to just pop into existence in the dict from the perspective of the other processes (that is, I wouldn't have to explicitly grab the star from the process that made it; I'd just pull it out of the dict as if the main process had put it there itself)?
If not, is there any practical way to allow multiple processes to read the same data structure at the same time, but feed their resulting data back to a main process to be rolled into that same data structure safely?
Would this design work even if I ensured that no two concurrent functions tried to access the same data structure at the same time, either for reading or for writing?
Can data structures be inherently shared between processes at all, or do I always have to explicitly send data from one process to another, as I would with processes communicating over a TCP stream? I know there are objects that abstract that sort of thing away, but I'm asking whether it can be done away with entirely; can the object each process is looking at actually be the same block of memory?
How flexible are the objects that the multiprocessing module provides to abstract away communication between processes? Can I use them as drop-in replacements for the data structures in my existing code without noticing any differences? And if I do, would it cause an unmanageable amount of overhead?
Sorry for my naivete, but I don't have a formal computer science education (at least, not yet) and I've never worked with concurrent systems before. Is the idea I'm trying to implement here even remotely practical, or would any solution that allows me to transparently execute arbitrary functions concurrently cause so much overhead that I'd be better off doing everything in one thread?
Example
For maximum clarity, here's an example of how I imagine the system would work:
The UI module has been instructed by the player to move the view over to a certain area of space. It informs the content management module of this, and asks it to make sure that all of the stars the player can currently click on are fully generated and ready to be clicked on.
The content management module checks and sees that a couple of the stars the UI is saying the player could potentially try to interact with have not, in fact, had the details that would show upon click generated yet. It produces a number of Task objects containing the methods of those stars that, when called, will generate the necessary data. It also adds some metadata to these task objects, assuming (possibly based on further information collected from the UI module) that it will be 0.1 seconds before the player tries to click anything, and that stars whose icons are closest to the cursor have the greatest chance of being clicked on and should therefore be requested for a time slightly sooner than the stars further from the cursor. It then adds these objects to the scheduler queue.
The scheduler quickly sorts its queue by how soon each task needs to be done, then pops the first task object off the queue, makes a new process from the function it contains, and then thinks no more about that process, instead just popping another task off the queue and stuffing it into a process too, then the next one, then the next one...
Meanwhile, the new process executes, stores the data it generates on the star object it is a method of, and terminates when it gets to the return statement.
The UI then registers that the player has indeed clicked on a star now, and looks up the data it needs to display on the star object whose representative sprite has been clicked. If the data is there, it displays it; if it isn't, the UI displays a message asking the player to wait and continues repeatedly trying to access the necessary attributes of the star object until it succeeds.
Even though your problem seems very complicated, there is a very easy solution: you can hide away all the complicated business of sharing your objects across processes using a proxy.
The basic idea is that you create a manager that looks after all the objects that should be shared across processes. The manager creates its own process, where it waits for other processes to instruct it to change an object. But enough said; it looks like this:
import multiprocessing as m

manager = m.Manager()        # starts the manager's server process
starsdict = manager.dict()   # a proxy to a dict living in that process
process = m.Process(target=yourfunction, args=(starsdict,))
process.start()              # start(), not run(): run() would execute yourfunction in the current process
The object stored in starsdict is not the real dict; instead, it sends every change and request you make on it to its manager. This is called a "proxy", and it has almost exactly the same API as the object it mimics. Proxies are picklable, so you can pass them as arguments to functions in new processes (as shown above) or send them through queues.
You can read more about this in the documentation.
As for simultaneous access: each individual proxy operation is forwarded to, and serialized by, the manager's server process, so two processes can safely use the same proxy at once. Compound read-modify-write sequences, however, are not atomic, so guard those with a lock (e.g. one obtained from manager.Lock()).
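To make this concrete, here is a self-contained sketch of the same pattern; generate_star and the dictionary contents are invented for illustration:

import multiprocessing as m
import time

def generate_star(stars, name):
    # Pretend this is the expensive procedural generation step.
    time.sleep(1)
    stars[name] = {"mass": 1.0}   # mutates the shared dict through the proxy

if __name__ == "__main__":
    manager = m.Manager()
    stars = manager.dict()
    workers = [m.Process(target=generate_star, args=(stars, "star-%d" % i))
               for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(dict(stars))   # all four stars have "popped into existence"

From the main process's point of view, the stars appear in the dict just as if it had put them there itself, which is the behaviour asked about above.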

Good approaches for queuing simultaneous NodeJS processes

I am building a simple application to download a set of XML files and parse them into a database using the async module (https://npmjs.org/package/node-async) for flow control. The overall flow is as follows:
1. Download list of datasets from API (single Request call)
2. Download metadata for each dataset to get the link to the XML file (async.each)
3. Download XML for each dataset (async.parallel)
4. Parse XML for each dataset into JSON objects (async.parallel)
5. Save each JSON object to a database (async.each)
In effect, for each dataset there is a parent process (2) which sets off a series of asynchronous child processes (3, 4, 5). The challenge I am facing is that, because so many parent processes fire before all of the children of a particular parent are complete, child processes pile up in the event loop, and it takes a long time for all of the child processes of a given parent to resolve and allow garbage collection to clean everything up. The result is that, even though the program doesn't appear to have any memory leaks, memory usage is still too high, ultimately crashing the program.
One solution which worked was to make some of the child processes synchronous so that they can be grouped together in the event loop. However, I have also seen an alternative solution discussed here: https://groups.google.com/forum/#!topic/nodejs/Xp4htMTfvYY, which pushes parent processes into a queue and only allows a certain number to run at once. My question, then: does anyone know of a more robust module for handling this type of queueing, or any other viable alternative for handling this kind of flow control? I have been searching, but so far no luck.
Thanks.
I decided to post this as an answer:
Don't launch all of the processes at once; let the callback of one request launch the next one. The overall work is still asynchronous, but each request runs in series. You can then pool a certain number of connections to run simultaneously and maximize I/O throughput. Look at async.eachLimit and replace each of your async.each calls with it.
Your async.parallel calls may be causing issues as well.
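For what it's worth, the same bounded-concurrency idea can be sketched in Python (the language used elsewhere on this page), with a semaphore playing the role of eachLimit's limit argument; process_dataset and the numbers are made up:

import asyncio

async def process_dataset(sem, dataset_id):
    async with sem:                  # at most `limit` datasets in flight at once
        await asyncio.sleep(0.1)     # stand-in for download + parse + save
        print("finished dataset", dataset_id)

async def main(limit=5):
    sem = asyncio.Semaphore(limit)
    await asyncio.gather(*(process_dataset(sem, i) for i in range(20)))

asyncio.run(main())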

Unity3D: Smooth Interaction When Loading Asset Via Network?

I'm making a networked computer game using Unity3D version 3.x on Mac. I have a game client and a game server. Whenever data arrives at the client from the server, the client would freeze for a little bit (~0.5 seconds), render the new data, and then continue. Is there any way I can optimize my game so that incoming data does not affect user's interaction with the game client?
Here's what I'm doing now:
I created a new thread to pull data from the server.
When the data arrives I put it in a buffer which is protected by a mutual exclusion lock.
In the Unity thread, on every frame, I check whether the buffer is empty. If it is not, I wait on the mutual-exclusion lock for permission to process the data. Once I get permission, I parse the data and render it.
I'm doing this because Unity does not allow me to create new GameObjects in the network thread that I created. But I wonder if there's anything I can do to optimize user experience.
It's always best to profile first so you know the source of the freezing: is it waiting on the lock, parsing, GameObject instantiation, or all three?
Unity Profiler (Pro only)
General solutions:

If the pause is due to the lock, try splitting your buffering into different bins, so that the main thread can access the next non-empty bin without having to acquire the lock, or at least not such a coarse-grained one.

If parsing is the slowdown, you can still do all of it in a background thread. Perform as much work as you possibly can before you finally must instantiate GameObjects.

If it's the final step of instantiating GameObjects, you can try to pre-instantiate the objects you expect and simply reconfigure them upon new network data; or divide the instantiation process into separate, incremental phases with unnoticeably small pauses, e.g. a root node first, then its children in the next phase, and so on, until fully reconstructed.
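To illustrate the first suggestion, here is the buffer-swap idea sketched in Python (standing in for C#, so treat it as pseudocode for Unity); the lock is held only for the swap itself, never while parsing or instantiating:

import threading

class SwapBuffer:
    def __init__(self):
        self._lock = threading.Lock()
        self._incoming = []

    def push(self, item):
        # Called by the network thread for each arriving message.
        with self._lock:
            self._incoming.append(item)

    def drain(self):
        # Called once per frame on the main thread; the lock is held
        # only long enough to swap the list out.
        with self._lock:
            batch, self._incoming = self._incoming, []
        return batch   # parse/instantiate outside the lock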

Silverlight Multithreading; Need to Synchronize?

I have a Silverlight app where I've implemented the M-V-VM pattern, so my actual UI elements (Views) are separated from the data (Models). Anyway, at one point, after the user has made some selections and possibly other input, I'd like to asynchronously go through the model, scan it, and compile a list of the options the user has changed (those different from the defaults), and eventually show that on the UI as a summary; but that would be a final step.
My question is: if I use a background worker to do this, then up until I actually want to do the UI updates, I just want to read current values from one of my models; I don't have to synchronize access to the model, right? I'm not modifying data, just reading current values...
There are Lists (ObservableCollections) involved, so I will have to call methods of those collections, like _ABCCollection.GetSelectedItems(), but again I'm just reading; I'm not making changes. Since they are not primitives, will I have to synchronize access to them just for reads, or does that not matter?
I assume I'll have to synchronize my final step, as it will cause PropertyChanged events to fire, and eventually the Views will request the new data through the bindings...
Thanks in advance for any and all advice.
You are correct. You can read from your Model objects and ObservableCollections on a worker thread without having a cross-thread violation. Getting or setting the value of a property on a UI element (more specifically, an object that derives from DispatcherObject) must be done on the UI thread (more specifically, the thread on which the DispatcherObject subclass instance was created). For more info about this, see here.

Using Core Data with multithreading and notifications

Here's yet another question on Core Data and multithreading:
I'm writing an application for the iPhone that retrieves XML data from the internet, parses it in a background thread (using NSXMLParser) and saves the data in Core Data using its own NSManagedObjectContext. I have a class, let's call it DataRetriever, that does this for me.
There are different UIViewControllers that then retrieve the data to display it in their respective UITableViews; this, of course, happens on the main thread, using NSFetchedResultsControllers and a single managed object context that is used for reading.
I've read the answer to this question, which tells me that I need to register for NSManagedObjectContextDidSave notifications on the background thread (this would be done by the DataRetriever class, I suppose) and then call the mergeChangesFromContextDidSaveNotification method on the reading context from that class on the main thread. This, I think, is totally thread-unsafe. I might have interpreted it the wrong way, though.
I've also read this part of Apple's documentation on the subject (Track Changes in Other Threads Using Notifications), and it tells me to simply register for NSManagedObjectContextDidSave notifications coming from the reading context in the view controller on the main thread, which would then have to call mergeChangesFromContextDidSaveNotification to update its reading context.
I went with Apple's recommendation: I now have my view controllers register for NSManagedObjectContextDidSave notifications on the main thread, using the reading managed object context as the source of the notifications. Doing this with the writing context probably isn't thread-safe, and Apple's documentation isn't very specific on this point.
Result: No crashes, but I am not receiving any notifications either.
Side note: I've read in Apple's documentation that notifications don't automatically propagate to other threads and I might even be listening for notifications from the wrong context, but why is Apple telling me to do it this way, then?
Any help is greatly appreciated.
-- EDIT --
Just to be clear: I'm registering for notifications coming from a particular NSManagedObjectContext. Apple's documentation specifically states (here) that some system frameworks may use an instance of Core Data themselves, so I could be receiving notifications from contexts that don't concern me if I didn't specify a source. The documentation I referred to earlier doesn't say anything about this, though. Any comments on this design choice are welcome.
The UI runs on the main thread, so you want any intensive processing that might bog the UI down done on another thread. You have the context in the main thread listen for notifications because the main-thread context is usually the only one that needs to update itself in response to changes made by other contexts in other threads.
All this is thread-safe because data won't be deleted from the persistent store as long as one or more contexts are still using it. So if context A has an object with the data while context B deletes another object representing the same data, the object in context A remains alive until context A calls for a merge.
Basically, each context operates in its own little world until you call merge. The race conditions that normally bedevil thread-based data operations don't occur with Core Data.
