Design pattern for checking asynchronous task dependencies before execution [closed]

The Problem
Given a number of asynchronously loaded dependencies, I want to trigger some code only after all dependencies are finished loading. As a simple example, consider the following pseudo-code:
bool firstLoaded = false, secondLoaded = false, thirdLoaded = false;

function loadResourceOne() {
    // Asynchronously, or in a new thread:
    HTTPDownload("one.txt");
    firstLoaded = true;
    if (secondLoaded && thirdLoaded) {
        allLoaded();
    }
}

function loadResourceTwo() {
    // Asynchronously, or in a new thread:
    HTTPDownload("two.txt");
    secondLoaded = true;
    if (firstLoaded && thirdLoaded) {
        allLoaded();
    }
}

function loadResourceThree() {
    // Asynchronously, or in a new thread:
    HTTPDownload("three.txt");
    thirdLoaded = true;
    if (firstLoaded && secondLoaded) {
        allLoaded();
    }
}

function allLoaded() {
    Log("Done!");
}

/* async */ loadResourceOne();
/* async */ loadResourceTwo();
/* async */ loadResourceThree();
What I'm Looking For
This is a problem that I've found myself having to solve repeatedly, in different languages and in different contexts. Every time, however, I find myself using the tools provided by the language to hack together some simple solution, like returning each asynchronous resource as a Promise in JavaScript and then using Promise.all(), or loading each resource in its own thread in Python and then joining the threads.
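For concreteness, the same ad-hoc style rendered in Scala might look like this (a sketch, not a definitive solution; httpDownload is a hypothetical stand-in for the real fetch):

import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// Hypothetical blocking fetch, wrapped in a Future so it runs off the caller's thread.
def httpDownload(name: String): String = s"contents of $name"

val all: Future[List[String]] = Future.sequence(List(
  Future(httpDownload("one.txt")),
  Future(httpDownload("two.txt")),
  Future(httpDownload("three.txt"))
))

// Fires exactly once, as soon as the last download completes.
// (In a short script you would Await the result to keep the JVM alive.)
all.foreach(_ => println("Done!"))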
I'm trying to find a design pattern that solves this problem in the general case. The best solution should meet three criteria:
Can be applied to any language that supports asynchronous operations
Minimizes repetition of code (note that in my simple example the line allLoaded(); was repeated three times, the if statement preceding it was practically repeated, and neither would scale well if I needed a fourth or fifth dependency)
Runs the final callback as soon as possible once all resources are loaded -- this one is hopefully obvious, but solutions like "check whether everything is loaded every 5 seconds" aren't acceptable
I tried flipping through the index of the Gang of Four's Design Patterns, but the few pattern names that jumped out at me as possible leads turned out to be unrelated.

You're looking for the Fork-Join pattern.
In parallel computing, the fork–join model is a way of setting up and executing parallel programs, such that execution branches off in parallel at designated points in the program, to "join" (merge) at a subsequent point and resume sequential execution. Parallel sections may fork recursively until a certain task granularity is reached. Fork–join can be considered a parallel design pattern...
The implementation will be language dependent, but you can search for fork-join in combination with your language of choice. Note that you will not find asynchronous patterns in the Gang of Four. You would want a book specific to multithreading or parallel computing.
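As a minimal sketch of the pattern on the JVM (Scala here; download() is a made-up placeholder for the real blocking work):

import java.util.concurrent.{ForkJoinPool, RecursiveTask}

// Placeholder for the real blocking work.
def download(name: String): String = s"contents of $name"

class DownloadTask(name: String) extends RecursiveTask[String] {
  override def compute(): String = download(name)
}

val pool  = new ForkJoinPool()
val tasks = List("one.txt", "two.txt", "three.txt").map(new DownloadTask(_))

tasks.foreach(t => pool.execute(t)) // fork: execution branches off in parallel
val results = tasks.map(_.join())   // join: blocks until every branch is done
println("Done!")                    // sequential execution resumes here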

I tried flipping through the index of the Gang of Four's Design Patterns, but the few pattern names that jumped out at me as possible leads turned out to be unrelated.
This problem domain requires combining multiple design patterns rather than using a single one. Let's address the key requirements:
1. A task should be able to know when the tasks it depends on are complete, so that it can start executing immediately. This needs to be achieved without periodically polling those tasks.
2. Adding new dependencies to a task needs to be possible without adding new if-else style checks.
For point 1, I would suggest that you take a look at the Observer pattern. The primary advantage of this pattern in your case is that a task won't have to poll the tasks it depends on. Instead, each task that your task depends on will notify your task when it completes, by calling its update method. The update method can be implemented to check, every time it is called, against a pre-populated list of the tasks this task depends on. The moment every task in that pre-configured list has called update, the task can launch its worker (a thread, for example).
For point 2, I would suggest that you take a look at the Composite pattern. A Task holds an array of dependent Task instances and an array of Task instances it depends on. When a task finishes execution, it calls update on each of the tasks that depend on it. Conversely, for a task to start executing, each of the tasks it depends on will call its update method.
If I had to define the above approach in pseudo code, it would look something like the following:
Task structure:
    array of dependents:   [Task instances that depend on this Task]
    array of dependencies: [Task instances this Task depends on]

    function update(Task t):
        remove t from dependencies
        if dependencies is empty:
            start asynchronous activity (call executeAsynchronous)

    function executeAsynchronous():
        perform the asynchronous work
        on completion:
            call update on each Task in dependents, passing this Task

    function addDependent(Task t):
        add t to dependents

    function addDependency(Task t):
        add t to dependencies
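Making that concrete, here is a rough Scala rendering of the same Observer/Composite combination (just an illustration; the work function and the thread-per-task execution are placeholder choices, not a hardened implementation):

import scala.collection.mutable

class Task(val name: String, work: () => Unit) {
  private val dependents   = mutable.ListBuffer.empty[Task] // tasks that depend on this one
  private val dependencies = mutable.Set.empty[Task]        // tasks this one depends on

  def dependsOn(t: Task): Unit = {
    dependencies += t     // we observe t ...
    t.dependents += this  // ... and t will notify us via update()
  }

  // The Observer pattern's update(): called by a dependency when it finishes.
  def update(finished: Task): Unit = synchronized {
    dependencies -= finished
    if (dependencies.isEmpty) executeAsynchronously()
  }

  private def executeAsynchronously(): Unit =
    new Thread(() => {
      work()                             // perform the asynchronous work
      dependents.foreach(_.update(this)) // notify every dependent task
    }).start()

  // Entry point for root tasks that have no dependencies of their own.
  def startIfReady(): Unit = synchronized {
    if (dependencies.isEmpty) executeAsynchronously()
  }
}

To wire up the original example, you would create three loader tasks plus an allLoaded task, call allLoaded.dependsOn(...) once per loader, and then call startIfReady() on each loader.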
All said and done, don't go looking for a design pattern to solve your problem. Instead, come up with working code and work through it to improve its design.
Note: there is a small but significant difference between a framework and a design pattern. If the objective is to build a task-dependency framework using design patterns, you are definitely going to need more than one design pattern. The above answer explains how to do this using the Gang of Four patterns. If the objective is not to reinvent the wheel, one can look at frameworks that already solve this problem.
One such framework is Spring Batch, which allows you to define sequential flows and split flows that can be wired together into a job defining the end-to-end processing flow.

How about a latch, initialized with the number of dependencies, that each individual loader decrements when it finishes?
That way, as soon as the latch count reaches 0, we know all dependencies are loaded and can fire the callback / desired function.
For Java - https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CountDownLatch.html
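A minimal sketch of the latch approach on the JVM (Scala syntax, java.util.concurrent.CountDownLatch underneath; httpDownload is a placeholder):

import java.util.concurrent.CountDownLatch

def httpDownload(name: String): Unit = () // placeholder for the real download

val latch = new CountDownLatch(3) // one count per dependency

def load(name: String): Unit =
  new Thread(() => {
    httpDownload(name)
    latch.countDown() // this loader has finished
  }).start()

load("one.txt"); load("two.txt"); load("three.txt")

latch.await()     // unblocks the instant the count hits zero -- no polling
println("Done!")  // the allLoaded callback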

Related

I want to know about multithreading with Futures in Scala

I know a little about multithreading with Futures, such as:
for (i <- 1 to 5) yield future {
    println(i)
}
but here all the threads do the same work.
So, I want to know how to make two threads that do different work concurrently.
Also, is there any way to know when all the threads are complete?
Please, give me something simple.
First of all, chances are you might be happy with parallel collections, especially if all you need is to crunch some data in parallel using multiple threads:
val lines = Seq("foo", "bar", "baz")
lines.par.map(line => line.length)
While parallel collections are suitable for finite datasets, Futures are more oriented towards event-like processing. In fact, a Future defines a task, abstracting away from the execution details (one thread or multiple threads, how a particular task is pinned to a thread) -- all of this is controlled with an execution context. What you can do with a future, though, is add callbacks (on success, on failure, or both), compose it with other futures, or await its result. All these concepts are nicely explained in the official docs, which are worth reading.
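A toy sketch of those ideas -- composition via a for-comprehension plus non-blocking callbacks (the computed values are arbitrary):

import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

val f1 = Future { 21 * 2 }          // some asynchronous computation
val f2 = Future { "hello".length }  // another, running concurrently

// Compose the two futures into one.
val combined: Future[Int] = for {
  a <- f1
  b <- f2
} yield a + b

// Attach callbacks instead of blocking for the result.
combined.foreach(sum => println(s"sum = $sum"))
combined.failed.foreach(err => println(s"failed: $err"))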

Use of The Task Based Asynchronous Pattern regarding custom return objects

My goal (though perhaps I'm confused) is to return a complex class that has had a series of processes performed on it (a list of tasks not depicted here).
Can I pass my custom object to the actual "DoSomethingAsync" method that starts the Tasks, and can I manipulate the tasks' results INTO the object in order to return it to the caller SomethingAsync, so that its caller may then persist the data to disk (SQL or whatever) or move it to the next step in my process?
All the potential interrupts or interferences with the async steps of the Tasks have me worried; for example, while I harvest Task1's results into the object, they may conflict with Task2's processing.
My tests have shown Task2's results being processed before I can harvest Task1's completed work... but at the moment I'm concerned with correctly passing and populating a complex object, and this concern probably belongs to a different topic: how to correctly use multiple truly multithreaded tasks. I expect answers in the vein of "run the asyncs to fill their respective properties, and don't try to fill your custom object IN the async tasks", but I had to ask; assuredly I'm not the first to wonder about this.
I just reviewed the event-based and task-based patterns, which I am familiar with and can work with well. But this is day 5 of my TAP and TPL (Task Parallel Library) studies, and I may be confusing myself with all the odd details like CancellationToken and task-based combinators, as well as erroneously mixing TAP and TPL dynamics. Anxiously awaiting the many downvotes, thanks.
public async Task<MyCustomObject> SomethingAsync()
{
    // Pass an instance of the custom object into the method that runs the tasks.
    return await DoSomethingAsync(new MyCustomObject());
}

Scala - best API for doing work inside multiple threads

In Python, I am using a library called futures, which allows me to do my processing work with a pool of N worker processes, in a succinct and crystal-clear way:
schedulerQ = []
for ... in ...:
    workParam = ...  # arguments for call to processingFunction(workParam)
    schedulerQ.append(workParam)

with futures.ProcessPoolExecutor(max_workers=5) as executor:  # 5 CPUs
    for retValue in executor.map(processingFunction, schedulerQ):
        print "Received result", retValue
(The processingFunction is CPU bound, so there is no point for async machinery here - this is about plain old arithmetic calculations)
I am now looking for the closest possible way to do the same thing in Scala. Notice that in Python, to avoid the GIL issues, I was using processes (hence the use of ProcessPoolExecutor instead of ThreadPoolExecutor) - and the library automagically marshals the workParam argument to each process instance executing processingFunction(workParam) - and it marshals the result back to the main process, for the executor's map loop to consume.
Does this apply to Scala and the JVM? My processingFunction can, in principle, be executed from threads too (there's no global state at all) - but I'd be interested to see solutions for both multiprocessing and multithreading.
The key part of the question is whether there is anything in the world of the JVM with as clear an API as the Python futures you see above... I think this is one of the best SMP APIs I've ever seen - prepare a list with the function arguments of all invocations, and then just two lines: create the poolExecutor, and map the processing function, getting back your results as soon as they are produced by the workers. Results start coming in as soon as the first invocation of processingFunction returns and keep coming until they are all done - at which point the for loop ends.
You have way less boilerplate than that using parallel collections in Scala.
myParameters.par.map(x => f(x))
will do the trick if you want the default number of threads (same as number of cores).
If you insist on setting the number of workers, you can do so like this:
import scala.collection.parallel._
import scala.concurrent.forkjoin._
val temp = myParameters.par
temp.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(5))
temp.map(x => f(x))
The exact details of return timing are different, but you can put as much machinery as you want into f(x) (i.e. both compute and do something with the result), so this may satisfy your needs.
In general, simply having the results appear as completed is not enough; you then need to process them, maybe fork them, collect them, etc.. If you want to do this in general, Akka Streams (follow links from here) are nearing 1.0 and will facilitate the production of complex graphs of parallel processing.
There is both a Futures API that allows you to run work-units on a thread pool (docs: http://docs.scala-lang.org/overviews/core/futures.html) and a parallel collections API that you can use to perform parallel operations on collections: http://docs.scala-lang.org/overviews/parallel-collections/overview.html
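For comparison with the Python snippet, here is a sketch using the Futures API on a fixed pool of 5 workers (processingFunction and schedulerQ are arbitrary stand-ins echoing the question's names):

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration.Duration

implicit val ec: ExecutionContextExecutorService =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(5)) // 5 workers

def processingFunction(workParam: Int): Int = workParam * workParam // stand-in

val schedulerQ = (1 to 100).toList

// Fan the work out across the pool; results come back in input order.
val allResults: Future[List[Int]] =
  Future.traverse(schedulerQ)(p => Future(processingFunction(p)))

Await.result(allResults, Duration.Inf).foreach(r => println(s"Received result $r"))
ec.shutdown()

Note that, unlike the Python version, this prints results only after the whole batch has completed; for incremental handling you would attach callbacks to the individual futures instead.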

How do I Yield() to another thread in a Win8 C++/Xaml app?

Note: I'm using C++, not C#.
I have a bit of code that does some computation, and several bits of code that use the result. The bits that use the result are already in tasks, but the original computation is not -- it's actually in the callstack of the main thread's App::App() initialization.
Back in the olden days, I'd use:
while (!computationIsFinished())
    std::this_thread::yield(); // or the like, depending on API
Yet this doesn't seem to exist for Windows Store apps (aka WinRT, pka Metro-style). I can't use a continuation because the bits that use the results are unconnected to where the original computation takes place -- in addition to that computation not being a task anyway.
Searching found Concurrency::Context::Yield(), but Context appears not to exist for Windows Store apps.
So... say I'm in a task on the background thread. How do I yield? Especially, how do I yield in a while loop?
First of all, doing expensive computations in a constructor is not usually a good idea. Even less so when it's the "App" class. Also, doing heavy work in the main (ASTA) thread is pretty much forbidden in the WinRT model.
You can use concurrency::task_completion_event<T> to interface code that isn't task-oriented with other pieces of dependent work.
E.g. in the long serial piece of code:
...
task_completion_event<ComputationResult> tce;
task<ComputationResult> computationTask(tce);
// This task is now tied to the completion event.
// Pass it along to interested parties.

try
{
    auto result = DoExpensiveComputations();
    // Successfully complete the task.
    tce.set(result);
}
catch (...)
{
    // On failure, propagate the exception to continuations.
    tce.set_exception(std::current_exception());
}
...
Should work well, but again, I recommend breaking out the computation into a task of its own, and would probably start by not doing it during construction... surely an anti-pattern for a responsive UI. :)
Qt simply uses Sleep(0) in their WinRT yield implementation.

In NodeJS: is it possible for two callbacks to be executed exactly at the same time?

Let's say I have this code:
function fn(n)
{
    return function()
    {
        for (var k = 0; k <= 1000; ++k) {
            fs.writeSync(process.stdout.fd, n + "\n");
        }
    };
}
setTimeout(fn(1), 100);
setTimeout(fn(2), 100);
Is it possible that 1 and 2 will be printed to stdout interchangeably (e.g. 12121212121...)?
I've tested this and they did NOT appear interchangeably, i.e. I got 1111111...222222222..., but a few tests are far from a proof, and I'm worried that something like 111111211111...2222222... could happen.
In other words: when I register some callbacks and event handlers in Node can two callbacks be executed exactly at the same time?
(I know this would be possible by launching two processes, but then we would have two stdouts, the above code would be split into separate files, etc.)
Another question: forgetting Node and speaking generally, in any language running in a single process, is it possible for two functions to be executed at exactly the same time (i.e. in the same manner as above)?
No, every callback will be executed in its own "execution frame". In other languages, "parallel execution" -- and the potential conflicts, such as the need for locks, that come with it -- is possible when operations occur in different threads.
As long as the callback code is purely synchronous, no two functions can execute in parallel.
Start using something asynchronous inside, like fetching a network result or inserting into a database, and tadam: you will have concurrency issues.
