await Task.WhenAll() vs Task.WhenAll().Wait() - multithreading

I have a method that produces an array of tasks (See my previous post about threading) and at the end of this method I have the following options:
await Task.WhenAll(tasks); // done in a method marked with async
Task.WhenAll(tasks).Wait(); // done in any type of method
Task.WaitAll(tasks);
Basically, I want to know the difference between the two WhenAll calls: the first one doesn't seem to wait until the tasks are completed, whereas the second one does, but I don't want to use the second one if it's not asynchronous.
I have included the third option because I understand that it will block the current thread until all the tasks have finished processing (seemingly synchronously rather than asynchronously). Please correct me if I am wrong about this one.
Example function with await:
public async void RunSearchAsync()
{
    _tasks = new List<Task>();
    Task<List<SearchResult>> products = SearchProductsAsync(CoreCache.AllProducts);
    Task<List<SearchResult>> brochures = SearchProductsAsync(CoreCache.AllBrochures);
    _tasks.Add(products);
    _tasks.Add(brochures);
    await Task.WhenAll(_tasks.ToArray());
    //code here hit before all _tasks completed but if I take off the async and change the above line to:
    // Task.WhenAll(_tasks.ToArray()).Wait();
    // code here hit after _tasks are completed
}

await returns control to the caller and resumes the method when the awaited task completes.
WhenAll creates a task that completes when all of the given tasks are complete.
WaitAll blocks the calling thread (here, the main thread) until all the tasks are complete.

As for await Task.WhenAll(tasks) vs Task.WhenAll(tasks).Wait(): if execution is in an async context, always try to avoid .Wait() and .Result, because they break the async paradigm.
Both of them block the thread, so no other operation can use it. That may not be a big problem in small apps, but in high-demand services it is bad and can lead to thread starvation.
On the other hand, await waits for the task to complete in the background without blocking the thread, which leaves the framework free to use it for other work.
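The "returns to the caller, resumes later" behaviour is not specific to C#. As a rough analogy only, here is a minimal JavaScript sketch where Promise.all stands in for Task.WhenAll. The names runSearch, delay and log are made up for illustration and are not from the question:

```javascript
// Records the order in which things happen.
const log = [];

// Hypothetical stand-in for an async search: resolves with `value` after `ms`.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function runSearch() {
  log.push('start');
  // Control goes back to the caller here; the method resumes once both finish.
  const results = await Promise.all([
    delay(20, 'products'),
    delay(10, 'brochures'),
  ]);
  log.push('resumed');
  return results;
}

const pending = runSearch();
// This line runs before 'resumed' is logged, because the caller did not wait.
log.push('caller continues');

pending.then((results) => console.log(log, results));
```

If the caller needs the results, it has to await (or chain on) the returned promise, just as a C# caller has to await the returned Task. That is also why an async void method looks as if it "did not wait": the caller has nothing to await.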

Related

async nodejs execution order

When does processItem start executing? Does it start as soon as some items are pushed onto the queue? Or must the for loop finish before the first item on the queue starts executing?
var processItem = function (item, callback) {
    console.log(item)
    callback();
}
var myQueue = async.queue(processItem, 2)
for (index = 0; index < 1000; ++index) {
    myQueue.push(index)
}
The simple answer is that all of your tasks will be added to the queue, and then executed in a random (undefined) order.
background information, via https://github.com/caolan/async#queueworker-concurrency
queue(worker, concurrency)
Creates a queue object with the specified concurrency. Tasks added to the queue are processed in parallel (up to the concurrency limit). If all workers are in progress, the task is queued until one becomes available. Once a worker completes a task, that task's callback is called.
In other words, the order is undefined. If you require the tasks to be executed in a particular order, you should be using a different asynchronous primitive, such as series(tasks, [callback])
https://github.com/caolan/async#seriestasks-callback
The current call stack must resolve before any asynchronous tasks will run. This means that the current function must run to completion (plus any functions that called that function, etc.) before any asynchronous operations will run. To answer your question directly: all items will be fully queued before the first one runs.
It might be helpful for you to read more about the JavaScript event loop: asynchronous jobs sit in the event queue, where they are handled one by one. Each job in the event queue causes the creation of a call stack (where each item in the stack is a function call: the first function is on the bottom, a function called within that first function is next-to-bottom, and so on). When the stack is fully cleared, the event loop processes the next event.
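To see the "current call stack must finish first" rule in isolation, here is a small sketch. It does not use the async library at all; it uses Node's setImmediate as a stand-in for deferred work, and the order array is only there for illustration:

```javascript
// Records what happened, in order.
const order = [];

function processItem(item) {
  order.push('processed ' + item);
}

for (let index = 0; index < 3; index++) {
  // Defer the work; it cannot start until the current stack has unwound.
  setImmediate(processItem, index);
  order.push('queued ' + index);
}
order.push('loop finished');

// Runs after the deferred work above, so the full order is visible.
setImmediate(() => console.log(order.join('\n')));
```

All three "queued" entries and "loop finished" are recorded before any "processed" entry, which matches the answer above: the loop runs to completion before the first deferred item runs.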

Task.wait and continueWIth

I have a task like the one below.
var task = Task<string>.Factory.StartNew(() => longrunningmethod())
    .ContinueWith(a =>
        longrunningmethodcompleted((Task<string>)a),
        TaskScheduler.FromCurrentSynchronizationContext());
task.Wait();
My task will call the longrunningmethod and after completing it will call completed method.
Inside longrunningmethod I delay with Thread.Sleep(30000). When I use task.Wait(), the system hangs and never calls longrunningmethodcompleted. If I don't use task.Wait(), everything works fine.
I strongly suspect your context is a UI context.
In that case, you're causing the deadlock because you're telling longrunningmethodcompleted to execute on the current SynchronizationContext.
Then you're blocking that context by calling Wait. The continuation will never complete because it needs to execute on the thread that is blocked in Wait.
To fix this, you can't use Wait on a continuation running in that context. I'm assuming that longrunningmethodcompleted must run on the UI thread, so the solution is to replace the call to Wait with a call to ContinueWith:
var ui = TaskScheduler.FromCurrentSynchronizationContext();
var task = Task<string>.Factory.StartNew(() => longrunningmethod())
    .ContinueWith(a =>
        longrunningmethodcompleted((Task<string>)a),
        ui);
task.ContinueWith(..., ui);
Or, you can upgrade to VS2012 and use async/await, which lets you write much cleaner code like this:
var task = Task.Run(() => longrunningmethod());
await task;
longrunningmethodcompleted(task);
Well, it is hard to tell what is wrong with your code without seeing what the actual async actions are. All I know is that, according to MSDN, Wait waits for the task to complete. Is it possible that your actions block because you are trying to use the current SynchronizationContext?
The reason I am asking is because you:
1. Start the task
2. Wait for the task to complete (which is the ContinueWith task)
3. Task tries to continue with the current SynchronizationContext
4. Task tries to acquire the main thread
5. Task is scheduled to take the thread after the Wait is completed
6. But Wait is waiting on the current Task to complete (deadlock)
What I mean is that the reason your program works with Thread.Sleep(seconds) is because after the time limit is up the thread will continue.
Thread.Sleep(nnn) is blocking. Use Task.Delay(nnn) and await:
await Task.Delay(30000);
Edited to add: Just noted the tag says C# 4. This requires C# 5 and the new async await support. Seriously, if you're doing async and tasks, you need to upgrade.

How not to let application exit when tasks are running

Is there any framework support for the cases when application is going to close but there are some not completed tasks?
Tasks run in the context of background thread by default. Sometimes it's ok just to let the task complete:
Task.Factory.StartNew(() =>
{
    Thread.CurrentThread.IsBackground = false;
    Thread.Sleep(20000);
    Thread.CurrentThread.IsBackground = true;
});
But this doesn't work if the task has continuations that must also complete, especially if some of them will run on the UI thread.
It is possible to get all the "final" tasks and to postpone the application exit with WhenAll().
Yet I do not know how to do the same when async/await is used.
So I'd like to know if there's any support for such cases. Are there any guarantees which framework gives or ways to enforce them?
The only way I see now is setting flags as a sign that critical tasks are not completed yet.
As an aside, you should not set IsBackground on a thread pool thread.
For async methods, ensure your methods are returning Task and not void, and then you can await them all using WhenAll. You can have your event handlers capture the top-level Tasks rather than await them.
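The "capture the top-level tasks and wait for them all before exiting" idea is not tied to .NET. As an analogy only, here is a minimal JavaScript sketch of the same pattern, with Promise.allSettled playing the role of WhenAll. The names inFlight, track, waitForInFlight and wait are hypothetical helpers, not framework APIs:

```javascript
// Every top-level operation that must finish before exit is kept here.
const inFlight = new Set();

// Remember a promise until it settles (whether it succeeds or fails).
function track(promise) {
  inFlight.add(promise);
  const forget = () => inFlight.delete(promise);
  promise.then(forget, forget);
  return promise;
}

// Call this from the shutdown path instead of exiting immediately.
async function waitForInFlight() {
  await Promise.allSettled([...inFlight]);
}

// Hypothetical work items for the demo.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
track(wait(20));
track(wait(10));

const drained = waitForInFlight().then(() => inFlight.size);
drained.then((remaining) => console.log('still running:', remaining));
```

This only works if each operation hands back something that can be awaited, which is the same reason the answer insists on returning Task rather than void.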

How to have heavy processing operations done in node.js

I have a heavy data processing operation that I need to run for 10-12 simultaneous requests. I have read that Node.js is a good platform for higher levels of concurrency, and that it achieves this with a non-blocking event loop.
What I know is that for having things like querying a database, I can spawn off an event to a separate process (like mongod, mysqld) and then have a callback which will handle the result from that process. Fair enough.
But what if I want a heavy piece of computation to be done within a callback? Won't it block other requests until the code in that callback has finished executing? For example, I want to process a high-resolution image, and the code I have is in JavaScript itself (no separate process to do image processing).
The way I am thinking of implementing it is:
get_image_from_db(image_id, function (imageBitMap) {
    heavy_operation(imageBitMap); // Can take 5 seconds.
});
Will that heavy_operation stop Node from taking in any requests for those 5 seconds? Or am I thinking about this task the wrong way? Please guide me, I am a JS newbie.
UPDATE
Or could I process part of the image, let the event loop go back to take in other callbacks, and then return to processing the rest of the image (something like prioritising events)?
Yes, it will block it, as the callback functions are executed in the main loop. It is only the asynchronously called functions which do not block the loop. It is my understanding that if you want the image processing to execute asynchronously, you will have to use a separate process to do it.
Note that you can write your own asynchronous process to handle it. To start you could read the answers to How to write asynchronous functions for Node.js.
UPDATE
"How do I create a non-blocking asynchronous function in node.js?" may also be worth reading. This question is actually referenced in the other one I linked, but I thought I'd include it here too for simplicity.
Unfortunately, I don't yet have enough reputation points to comment on Nick's answer, but have you looked into Node's cluster API? It's currently still experimental, but it would allow you to spawn multiple worker processes.
When a heavy piece of computation is done in the callback, the event loop would be blocked until the computation is done. That means the callback will block the event loop for the 5 seconds.
My solution
It's possible to use a generator function to yield back control to the event loop. I will use a while loop that will run for 3 seconds to act as a long running callback.
Without a Generator function
let start = Date.now();
setInterval(() => console.log('resumed'), 500);
function loop() {
    while ((Date.now() - start) < 3000) { //while the difference between Date.now() and start is less than 3 seconds
        console.log('blocked')
    }
}
loop();
The output would be:
// blocked
// blocked
//
// ... would not return to the event loop while the loop is running
//
// blocked
//...when the loop is over then the setInterval kicks in
// resumed
// resumed
With a Generator function
let gen;
let start = Date.now();
setInterval(() => console.log('resumed'), 500);
function *loop() {
    while ((Date.now() - start) < 3000) { //while the difference between Date.now() and start is less than 3 seconds
        console.log(yield output())
    }
}
function output() {
    setTimeout(() => gen.next('blocked'), 500)
}
gen = loop();
gen.next();
The output is:
// resumed
// blocked
//...returns control back to the event loop even though the loop is still running
// resumed
// blocked
//...end of the loop
// resumed
// resumed
// resumed
Using JavaScript generators can help run heavy computational functions that yield control back to the event loop while they are still computing.
To learn more about generators, see:
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/function*
https://davidwalsh.name/es6-generators
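The same "give the event loop a turn" idea can also be sketched without generators, by splitting the work into slices and scheduling each next slice with setImmediate. This is only a sketch: sumInChunks is a made-up helper, and summing numbers stands in for the real heavy operation.

```javascript
// Hypothetical helper: sums a large array in slices so that timers and I/O
// callbacks can run between slices instead of waiting for the whole job.
function sumInChunks(values, chunkSize, onDone) {
  let index = 0;
  let total = 0;

  function step() {
    const end = Math.min(index + chunkSize, values.length);
    for (; index < end; index++) {
      total += values[index];
    }
    if (index < values.length) {
      setImmediate(step); // yield to the event loop, then continue
    } else {
      onDone(total);
    }
  }

  step(); // the first slice runs synchronously
}

const numbers = Array.from({ length: 1000 }, (_, i) => i + 1);
sumInChunks(numbers, 100, (total) => console.log('total:', total));
```

Each slice still blocks while it runs, so the chunk size is a trade-off between throughput and responsiveness. For truly CPU-bound work such as image processing, a separate process, as suggested above, is likely the more robust choice.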

GPars report status on large number of async functions and wait for completion

I have a parser, and after gathering the data for a row, I want to fire an async function and let it process the row while the main thread continues on and gets the next row.
I've seen this post: How do I execute two tasks simultaneously and wait for the results in Groovy? but I'm not sure it is the best solution for my situation.
What I want to do is, after all the rows are read, wait for all the async functions to finish before I go on. One concern with using a collection of Promises is that the list could be large (100,000+).
Also, I want to report status as we go. And finally, I'm not sure I want to automatically wait for a timeout (like on a get()), because the file could be huge, however, I do want to allow the user to kill the process for various reasons.
So what I've done for now is record the number of rows parsed (as they occur via rowsRead), then use a callback from the Promise to record another row being finished processing, like this:
def promise = processRow(row)
promise.whenBound {
rowsProcessed.incrementAndGet()
}
Where rowsProcessed is an AtomicInteger.
Then in the code invoked at the end of the sheet, after all parsing is done and I'm waiting for the processing to finish, I'm doing this:
boolean test = true
while (test) {
    Thread.sleep(1000) // No need to pound the CPU with this check
    println "read: ${sheet.rowsRead}, processed: ${sheet.rowsProcessed.get()}"
    if (sheet.rowsProcessed.get() == sheet.rowsRead) {
        test = false
    }
}
The nice thing is, I don't have an explosion of Promise objects here - just a simple count to check. But I'm not sure sleeping every so often is as efficient as checking the get() on each Promise() object.
So, my questions are:
1. If I used the collection of Promises instead, would a get() react and return if the thread executing the while loop above was interrupted with Thread.interrupt()?
2. Would using the collection of Promises and calling get() on each be more efficient than trying to sleep and check every so often?
3. Is there another, better approach that I haven't considered?
Thanks!
1. A call to allPromises*.get() will throw InterruptedException if the waiting (main) thread gets interrupted.
2. Yes, the promises have been created anyway, so grouping them in a list should not impose additional memory requirements, in my opinion.
3. The suggested solutions with a CountDownLatch or a Phaser are IMO much more suitable than using busy waiting.
An alternative to an AtomicInteger is to use a CountDownLatch. It avoids both the sleep and the large collection of Promise objects. You could use it like this:
latch = new CountDownLatch(sheet.rowsRead)
...
def promise = processRow(row)
promise.whenBound {
    latch.countDown()
}
...
while (!latch.await(1, TimeUnit.SECONDS)) {
    println "read: ${sheet.rowsRead}, processed: ${sheet.rowsRead - latch.count}"
}
