processing two of many concurrent results already, optimization - architecture? algorithm? - multithreading

I am downloading data from two times a dozen sources, via threading.Thread(). The special property of my case is that data processing can already start when certain two such sources are ready (Not any two, but predefined two, pairwise).
First approach is to download all sources, and
for all t in threads: t.join()
to wait until all downloads have finished, and then start the data processing, because then I can be sure to have everything I need, for the data processing.
It will already be fast, but then the data processing only starts after all sources have completed downloading. How to get the last bit of optimization now?
I am wondering if perhaps there is a canonic CS way to solve this question of starting tasks when the input collection is only partly done.
Thanks!

In general CS terms what you are looking for is language features that enable composition of tasks.
For example, in C# if you use Task instead of Thread you can use Task.WhenAll() to wait for each pair of subtasks to complete (as a Task itself) and then ContinueWith whatever you need to do with that pair of results.
See https://msdn.microsoft.com/en-us/library/hh194874(v=vs.110).aspx
Task a = ...
Task b = ...
Task ab = Task.WhenAll(a,b).ContinueWith(...);
...
var result = Task.WhenAll(ab, ab, bc, ...).Result;
You could also look into Reactive Extensions that are becoming available in many programming languages. These enable composition of observable sequences of results in a push model. For example zip in RxJS would allow you to trigger a next step whenever both of its input sequences have produced a result.

Related

Designing concurrency in a Python program

I'm designing a large-scale project, and I think I see a way I could drastically improve performance by taking advantage of multiple cores. However, I have zero experience with multiprocessing, and I'm a little concerned that my ideas might not be good ones.
Idea
The program is a video game that procedurally generates massive amounts of content. Since there's far too much to generate all at once, the program instead tries to generate what it needs as or slightly before it needs it, and expends a large amount of effort trying to predict what it will need in the near future and how near that future is. The entire program, therefore, is built around a task scheduler, which gets passed function objects with bits of metadata attached to help determine what order they should be processed in and calls them in that order.
Motivation
It seems to be like it ought to be easy to make these functions execute concurrently in their own processes. But looking at the documentation for the multiprocessing modules makes me reconsider- there doesn't seem to be any simple way to share large data structures between threads. I can't help but imagine this is intentional.
Questions
So I suppose the fundamental questions I need to know the answers to are thus:
Is there any practical way to allow multiple threads to access the same list/dict/etc... for both reading and writing at the same time? Can I just launch multiple instances of my star generator, give it access to the dict that holds all the stars, and have new objects appear to just pop into existence in the dict from the perspective of other threads (that is, I wouldn't have to explicitly grab the star from the process that made it; I'd just pull it out of the dict as if the main thread had put it there itself).
If not, is there any practical way to allow multiple threads to read the same data structure at the same time, but feed their resultant data back to a main thread to be rolled into that same data structure safely?
Would this design work even if I ensured that no two concurrent functions tried to access the same data structure at the same time, either for reading or for writing?
Can data structures be inherently shared between processes at all, or do I always explicitly have to send data from one process to another as I would with processes communicating over a TCP stream? I know there are objects that abstract away that sort of thing, but I'm asking if it can be done away with entirely; have the object each thread is looking at actually be the same block of memory.
How flexible are the objects that the modules provide to abstract away the communication between processes? Can I use them as a drop-in replacement for data structures used in existing code and not notice any differences? If I do such a thing, would it cause an unmanageable amount of overhead?
Sorry for my naivete, but I don't have a formal computer science education (at least, not yet) and I've never worked with concurrent systems before. Is the idea I'm trying to implement here even remotely practical, or would any solution that allows me to transparently execute arbitrary functions concurrently cause so much overhead that I'd be better off doing everything in one thread?
Example
For maximum clarity, here's an example of how I imagine the system would work:
The UI module has been instructed by the player to move the view over to a certain area of space. It informs the content management module of this, and asks it to make sure that all of the stars the player can currently click on are fully generated and ready to be clicked on.
The content management module checks and sees that a couple of the stars the UI is saying the player could potentially try to interact with have not, in fact, had the details that would show upon click generated yet. It produces a number of Task objects containing the methods of those stars that, when called, will generate the necessary data. It also adds some metadata to these task objects, assuming (possibly based on further information collected from the UI module) that it will be 0.1 seconds before the player tries to click anything, and that stars whose icons are closest to the cursor have the greatest chance of being clicked on and should therefore be requested for a time slightly sooner than the stars further from the cursor. It then adds these objects to the scheduler queue.
The scheduler quickly sorts its queue by how soon each task needs to be done, then pops the first task object off the queue, makes a new process from the function it contains, and then thinks no more about that process, instead just popping another task off the queue and stuffing it into a process too, then the next one, then the next one...
Meanwhile, the new process executes, stores the data it generates on the star object it is a method of, and terminates when it gets to the return statement.
The UI then registers that the player has indeed clicked on a star now, and looks up the data it needs to display on the star object whose representative sprite has been clicked. If the data is there, it displays it; if it isn't, the UI displays a message asking the player to wait and continues repeatedly trying to access the necessary attributes of the star object until it succeeds.
Even though your problem seems very complicated, there is a very easy solution. You can hide away all the complicated stuff of sharing you objects across processes using a proxy.
The basic idea is that you create some manager that manages all your objects that should be shared across processes. This manager then creates its own process where it waits that some other process instructs it to change the object. But enough said. It looks like this:
import multiprocessing as m
manager = m.Manager()
starsdict = manager.dict()
process = Process(target=yourfunction, args=(starsdict,))
process.run()
The object stored in starsdict is not the real dict. instead it sends all changes and requests, you do with it, to its manager. This is called a "proxy", it has almost exactly the same API as the object it mimics. These proxies are pickleable, so you can pass as arguments to functions in new processes (like shown above) or send them through queues.
You can read more about this in the documentation.
I don't know how proxies react if two processes are accessing them simultaneously. Since they're made for parallelism I guess they should be safe, even though I heard they're not. It would be best if you test this yourself or look for it in the documentation.

How to "join threads" with Lego Mindstorms NXT default "LabVIEW" code

Simply put, I want to manipulate two motors in parallel, then when both are ready, continue with a 3rd thread.
Below is image of what I have now. In two top threads, it sets motors B and C to "unlimited", then waits until both trigger the switches, then sets a separate boolean variable for both.
Then in 3rd thread, I poll these two variables with 1 second interval, until AND operation gives true to the loop termination condition.
This is embedded system and all, so it may be ok here, but in "PC programming", this kind of polling loop would be rather horrible thing to do.
Question: Can I do either of both of
wait for variable without this kind of polling loop?
wait for a thread to finish without using a variable at all?
Your question is a bit vague on what you actually want to achieve and using which language. As I understood you want to be able to implement a similar multithreaded motor control mechanism in Labview?
If so, then the answer to both of your questions is yes, you can implement the wait without an explicitly defined variable (other than the error cluster, which you probably would be passing around anyway). The easiest method is to pass an error cluster to both your loops and then use Merge errors to combine the generated errors once the loops are finished. Merge errors will wait until both inputs have data, merges the errors, and passes the merged error cluster on. By wiring the merged error cluster to your teardown function you effectively achieve the thread synchronization you described. If you require thread synchronization for the two control loops, you would however still have to use semaphores, rendezvous', notifiers, and other built-in synch methods.
In the image there's an init function that opens two serial devices (purple wire) and passes them to the control loops, which both runs until an error (yellow-black wire) occurs. The errors from both are merged and passed to the teardown function that releases the serial devices. Notice that in this particular example the synchronization would occur at the end of program as long as there's at least one wire coming from each loop to the teardown function.
Similar functionality in a text based programming language would necessitate the use of more elaborate mechanisms, though some specialised language for parallel programming might help here.

Displaying progress bar for long running process in Actionscript/Flash Builder without mixing logic

I'm working on an application that processes (possibly large reaching one or two million lines) text (in tab separated form) files containing detail of items and since the processing time can be long I want to update a progress bar so the user knows that the application didn't just hang, or better, to provide an idea of the remaining time.
I've already researched and I know how to update a simple progress bar but the examples tend to be simplistic as to call something like progressBar.setProgress(counter++, 100) using Timer, there are other examples where the logic is simple and written in the same class. I'm also new to the language having done mostly Java and some JavaScript in the past, among others.
I wrote the logic for processing the file (validation of input and creation of output files). But then, if I call the processing logic in the main class the update will be done at the end of processing (flying by so fast from 0 to 100) no matter if I update variables and try to dispatch events or things like that; the bar won't reflect the processing progress.
Would processing the input by chunks be a valid approach? And then, I'm not sure if the processing delay of one data chunk won't affect the processing of the next chunk and so on, because the timer tick is set to be 1 millisecond and the chunk processing time would be longer than that. Also, if the order of the input won't be affected or the result will get corrupted in some way. I've read multithreading is not supported in the language, so should that be a concern?
I already coded the logic described before and it seems to work:
// called by mouse click event
function processInput():void {
timer = new Timer(1);
timer.addEventListener(TimerEvent.TIMER, processChunk);
timer.start();
}
function processChunk(event:TimerEvent):void {
// code to calculate start and end index for the data chunk,
// everytime processChunk is executed these indexes are updated
var dataChunk:Array = wholeInputArray.splice(index0, index1);
processorObj.processChunk(dataChunk)
progressBar.setProgress(index0, wholeInputArray.length);
progressBar.label = index0 + " processed items";
if(no more data to process) { // if wholeInputArray.length == index1
timer.stop();
progressBar.setProgress(wholeInputArray.length, wholeInputArray.length);
progressBar.label = "Processing done";
// do post processing here: show results, etc.
}
}
The declaration for the progress bar is as follows:
<mx:ProgressBar id="progressBar" x="23" y="357" width="411" direction="right"
labelPlacement="center" mode="manual" indeterminate="false" />
I tested it with an input of 50000 lines and it seems to work generating the same result as the other approach that processes the input at once. But, would that be a valid approach or is there a better approach?
Thanks in advance.
your solution is good, i use it most of time.
But multithreading is now supported on AS3 (for desktop and web only for the moment).
Have a look at: Worker documentation and Worker exemple.
Hope that helps :)
may I ask if this Timer AS IS is the working Timer ??? because IF YES then you are in for a lot of trouble with your Application in the long run! - re loading & getting the Timer to stop, close etc. The EventListener would be incomplete and would give problems for sure!
I would like to recommend to get this right first before going further as I know from experience as in some of my own AIR Applications I need to have several hundred of them running one after another in modules as well as in some of my web Apps. not quiet so intense yet a few!
I'm sure a more smother execution will be the reward! regards aktell
Use Workers. Because splitting data into chunks and then processing it is a valid but quite cumbersome approach and with workers you can simply spawn a background worker, do all the parsing there and return a result, all without blocking GUI. Worker approach should require less time to do parsing, because there is no need to stop parser and wait for the next frame.
Workers would be an ideal solution, but quite complicated to set up. If you're not up to it right now, here's a PseudoThread solution I use in similar situations which you can probably get up and running in 5 minutes:
Pseudo Threads
It uses EnterFrame events for balancing between work and letting the UI does its thing and you can manually update the progress bar within your 'thread' code. I think it would be easily adapted for your needs since your data is easily sliced.
Without using Workers (which it seems you are not yet familiar with) AS3 will behave single threaded. Your timers will not overlap. If one of your chunks takes more than 1s to complete the next timer event will be processed when it can. It will not queue up further events if it takes more than your time period ( assuming your processing code is blocking).
The previous answers show the "correct" solution to this, but this might get you where you need to be faster.

Creating Dependencies Within An NSOperation

I have a fairly involved download process I want to perform in a background thread. There are some natural dependencies between steps in this process. For example, I need to complete the downloads of both Table A and Table B before setting the relationships between them (I'm using Core Data).
I thought first of putting each dependent step in its own NSOperation, then creating a dependency between the two operations (i.e. download the two tables in one operation, then set the relationship between them in the next, dependent operation). However, each NSOperation requires it's own NSManagedContext, so this is no good. I don't want to save the background context until both tables have been downloaded and their relationships set.
I've therefore concluded this should all occur inside one NSOperation, and that I should use notifications or some other mechanism to call the dependent method when all the conditions for running it have been met.
I'm an iOS beginner, however, so before I venture down this path, I wouldn't mind advice on whether I've reached the right conclusion.
Given your validation requirements, I think it will be easiest inside of one operation, although this could turn into a bit of a hairball as far as code structure goes.
You'll essentially want to make two wire fetches to get the entire dataset you require, then combine the data and parse it at one time into Core Data.
If you're going to use the asynchronous API's this essentially means structuring a class that waits for both operations to complete and then launches another NSOperation or block which does the parse and relationship construction.
Imagine this order of events:
User performs some action (button tap, etc.)
Selector for that action fires two network requests
When both requests have finished (they both notify a common delegate) launch the parse operation
Might look something like this in code:
- (IBAction)someAction:(id)sender {
//fire both network requests
request1.delegate = aDelegate;
request2.delegate = aDelegate;
}
//later, inside the implementation of aDelegate
- (void)requestDidComplete... {
if (request1Finished && request2Finished) {
NSOperation *parse = //init with fetched data
//launch on queue etc.
}
}
There's two major pitfalls that this solution is prone to:
It keeps the entire data set around in memory until both requests are finished
You will have to constantly switch on the specific request that's calling your delegate (for error handling, success, etc.)
Basically, you're implementing operation dependencies on your own, although there might not be a good way around that because of the structure of NSURLConnection.

Progress bar and multiple threads, decoupling GUI and logic - which design pattern would be the best?

I'm looking for a design pattern that would fit my application design.
My application processes large amounts of data and produces some graphs.
Data processing (fetching from files, CPU intensive calculations) and graph operations (drawing, updating) are done in seperate threads.
Graph can be scrolled - in this case new data portions need to be processed.
Because there can be several series on a graph, multiple threads can be spawned (two threads per serie, one for dataset update and one for graph update).
I don't want to create multiple progress bars. Instead, I'd like to have single progress bar that inform about global progress. At the moment I can think of MVC and Observer/Observable, but it's a little bit blurry :) Maybe somebody could point me in a right direction, thanks.
I once spent the best part of a week trying to make a smooth, non-hiccupy progress bar over a very complex algorithm.
The algorithm had 6 different steps. Each step had timing characteristics that were seriously dependent on A) the underlying data being processed, not just the "amount" of data but also the "type" of data and B) 2 of the steps scaled extremely well with increasing number of cpus, 2 steps ran in 2 threads and 2 steps were effectively single-threaded.
The mix of data effectively had a much larger impact on execution time of each step than number of cores.
The solution that finally cracked it was really quite simple. I made 6 functions that analyzed the data set and tried to predict the actual run-time of each analysis step. The heuristic in each function analyzed both the data sets under analysis and the number of cpus. Based on run-time data from my own 4 core machine, each function basically returned the number of milliseconds it was expected to take, on my machine.
f1(..) + f2(..) + f3(..) + f4(..) + f5(..) + f6(..) = total runtime in milliseconds
Now given this information, you can effectively know what percentage of the total execution time each step is supposed to take. Now if you say step1 is supposed to take 40% of the execution time, you basically need to find out how to emit 40 1% events from that algorithm. Say the for-loop is processing 100,000 items, you could probably do:
for (int i = 0; i < numItems; i++){
if (i % (numItems / percentageOfTotalForThisStep) == 0) emitProgressEvent();
.. do the actual processing ..
}
This algorithm gave us a silky smooth progress bar that performed flawlessly. Your implementation technology can have different forms of scaling and features available in the progress bar, but the basic way of thinking about the problem is the same.
And yes, it did not really matter that the heuristic reference numbers were worked out on my machine - the only real problem is if you want to change the numbers when running on a different machine. But you still know the ratio (which is the only really important thing here), so you can see how your local hardware runs differently from the one I had.
Now the average SO reader may wonder why on earth someone would spend a week making a smooth progress bar. The feature was requested by the head salesman, and I believe he used it in sales meetings to get contracts. Money talks ;)
In situations with threads or asynchronous processes/tasks like this, I find it helpful to have an abstract type or object in the main thread that represents (and ideally encapsulates) each process. So, for each worker thread, there will presumably be an object (let's call it Operation) in the main thread to manage that worker, and obviously there will be some kind of list-like data structure to hold these Operations.
Where applicable, each Operation provides the start/stop methods for its worker, and in some cases - such as yours - numeric properties representing the progress and expected total time or work of that particular Operation's task. The units don't necessarily need to be time-based, if you know you'll be performing 6,230 calculations, you can just think of these properties as calculation counts. Furthermore, each task will need to have some way of updating its owning Operation of its current progress in whatever mechanism is appropriate (callbacks, closures, event dispatching, or whatever mechanism your programming language/threading framework provides).
So while your actual work is being performed off in separate threads, a corresponding Operation object in the "main" thread is continually being updated/notified of its worker's progress. The progress bar can update itself accordingly, mapping the total of the Operations' "expected" times to its total, and the total of the Operations' "progress" times to its current progress, in whatever way makes sense for your progress bar framework.
Obviously there's a ton of other considerations/work that needs be done in actually implementing this, but I hope this gives you the gist of it.
Multiple progress bars aren't such a bad idea, mind you. Or maybe a complex progress bar that shows several threads running (like download manager programs sometimes have). As long as the UI is intuitive, your users will appreciate the extra data.
When I try to answer such design questions I first try to look at similar or analogous problems in other application, and how they're solved. So I would suggest you do some research by considering other applications that display complex progress (like the download manager example) and try to adapt an existing solution to your application.
Sorry I can't offer more specific design, this is just general advice. :)
Stick with Observer/Observable for this kind of thing. Some object observes the various series processing threads and reports status by updating the summary bar.

Resources