When should I split some task into asynchronous tinier tasks? - node.js

I'm writing a personal project in Node and I'm trying to figure out when a task should be split into asynchronous pieces. Let's say I have this four-step task. The steps are not very expensive: the most expensive one iterates over an array of objects, trying to match a URL against a RegExp, and the array probably won't have more than 20 or 30 objects.
part1().then(y => {
  return doTheSecondPart(y);
}).then(z => {
  return doTheThirdPart(z);
}).then(c => {
  return doTheFourthPart(c);
});
The other way would be to just execute the steps one after another, but then nothing else can progress until the task is done. With the approach above, other tasks can progress at least a little between each part.
Are there any criteria for when this approach should be preferred over a classic synchronous one?
Sorry for my bad English; it is not my native language.

All you've described is synchronous code that isn't very long to run. First off, there's no reason to even use promises for that type of code. Secondly, there's no reason to break it up into chunks. All you would be doing with either of those choices is making the code more complicated to write, more complicated to test and more complicated to understand and it would also run slower. All of those are undesirable.
If you force even synchronous code into a promise, then a .then() handler will give some other code a chance to run between .then() handlers, but only certain types of events can be run there because processing a resolved promise is one of the highest priority things to do in the event queue system. It won't, for example, allow another incoming http request arriving on your server to start to run.
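The ordering described above can be observed directly. This is only a minimal sketch (the function name observeOrdering is made up for illustration): a zero-delay timer is scheduled first, yet every .then() handler in the chain runs before it, because resolved-promise handlers are processed ahead of timers and I/O callbacks.

```javascript
// Records the order in which callbacks run; resolves once the timer has fired.
function observeOrdering() {
  const order = [];
  return new Promise((resolve) => {
    // Scheduled first, but timers are handled in a later phase of the event loop.
    setTimeout(() => {
      order.push('timeout');
      resolve(order);
    }, 0);

    // Handlers on an already-resolved promise run as soon as the current code finishes.
    Promise.resolve()
      .then(() => order.push('then 1'))
      .then(() => order.push('then 2'))
      .then(() => order.push('then 3'));

    order.push('sync code');
  });
}

observeOrdering().then((order) => console.log(order.join(' -> ')));
// prints: sync code -> then 1 -> then 2 -> then 3 -> timeout
```

An incoming HTTP request is delivered through the same later phases as the timer, which is why it would not get a turn between the .then() handlers either.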
If you truly wanted to allow other requests to run and so on, you would be better off just putting the code (without promises) into a worker thread (Node's worker_threads module) and letting it run there, then communicating the result back via messaging. If you wanted to keep it in the main thread but let any other code run, you'd probably have to use a short setTimeout() delay to truly let all possible other types of tasks run in between.
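If you did want to go the setTimeout() route, it could look roughly like the sketch below. The helper names (yieldToEventLoop, runInSlices, demoSlices) and the three toy steps are assumptions for illustration, not code from the question.

```javascript
// Resolves on a later turn of the event loop, so queued timers and I/O callbacks can run first.
const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

// Runs synchronous steps in order, passing each result on and yielding between steps.
async function runInSlices(steps, initial) {
  let value = initial;
  for (const step of steps) {
    value = step(value);
    await yieldToEventLoop();
  }
  return value;
}

// Demo: a timer callback (standing in for "other work") gets to run between the steps.
async function demoSlices() {
  const log = [];
  setTimeout(() => log.push('other work'), 0);
  const result = await runInSlices(
    [
      (n) => { log.push('step 1'); return n + 1; },
      (n) => { log.push('step 2'); return n * 2; },
      (n) => { log.push('step 3'); return n - 3; },
    ],
    1
  );
  return { result, log };
}

demoSlices().then(({ result, log }) => console.log(result, log));
```

Every yield costs at least one trip through the event loop, so for steps as cheap as the ones in the question this only adds latency.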
So, if this code doesn't take much time to run, there's really no reason to complicate it. Just let it run in the fastest and simplest way.
If you want more concrete advice, then please show some actual code and provide some timing information about how long it takes to run. Iterating through an array of 20-30 objects is nothing in the general scheme of things and is not a reason to rewrite it into timesliced pieces.
As for code that iterates over an array/list of items doing matching against some string, this is exactly what the Express web server framework does on every incoming URL to find the matching routes. That is not a slow thing to do in Javascript.

Asynchronous programming is a better fit for code that must respond to events – for example, any kind of graphical UI. An example of a situation where programmers use async but shouldn't is any code that can focus entirely on data processing and can accept a “stop-the-world” block while waiting for data to download.
I use it extensively with a REST API server, as we have no idea how long the server can take to respond to a request. So, in order not to "block the app" while waiting for the server response, async requests are most useful.
part1().then(y => {
  return doTheSecondPart(y);
}).then(z => {
  return doTheThirdPart(z);
}).then(c => {
  return doTheFourthPart(c);
});
What you have described in your sample is much more of a synchronous, procedural process, which would not necessarily allow your interface to keep working while your algorithm is busy.
In the case of a server call, then does not block while the response is pending, but any synchronous work inside a then handler still runs to completion on the main thread, and the app cannot handle other user interface events until that handler returns.
You should use async/await in cases where you are waiting for a user event or a server response but do not want your app to hang while waiting for the data:
// Simulates a server call that takes 2 seconds.
async function wait() {
  await new Promise(resolve => setTimeout(resolve, 2000));
  console.log("awaiting for server once !!");
  return 10;
}
// Simulates a second server call that takes 3 seconds.
async function wait2() {
  await new Promise(resolve => setTimeout(resolve, 3000));
  console.log("awaiting for server twice !!");
  return 10;
}
async function f() {
  let promise = new Promise((resolve, reject) => {
    setTimeout(() => resolve("done!"), 1000);
  });
  let result = await promise; // wait until the promise resolves
  console.log(result); // "done!"
  let first = await wait();   // resolves to 10 after 2 more seconds
  let second = await wait2(); // resolves to 10 after 3 more seconds
}
f();
This sample should help you gain a basic understanding of how async/await works. Here are a few resources for further reading:
Promises and Async
Mozilla References

Related

Does a backend endpoint with long awaits within it block other endpoints?

My backend has a few endpoints. Most of them return some JSON to the customer and are pretty fast; however, one of them takes a very long time to process.
It takes an image URL from the request body, manipulates that image to get a new one, and once the image is processed it uploads it to a server in order to get back a URL; only then can it use the URL to make an order.
Getting the enhanced image and uploading it to the server (to get back the URL) take a long time, a good 3 seconds each if not more. I don't want the "order" endpoint to block the other endpoints, if that is something that would happen.
Each order is independent of the previous and the next one, and I don't care how long it takes to process one, as long as it doesn't disrupt and block the event loop.
For now this is my code:
app.post("/order", async (req, res) => {
  // Declared with const so concurrent requests do not share implicit globals.
  const AIEnhancedImage = await enhance(req.body.image);
  const url = await uploadImageToServer(AIEnhancedImage);
  order(url);
});
app.get("/A", async (req, res) => {
  ...
});
app.get("/B", async (req, res) => {
  ...
});
app.get("/C", async (req, res) => {
  ...
});
My question is: if another endpoint is hit, will it be blocked by the "order" endpoint while an order is being processed?
If so, what is a better implementation to make sure the order endpoint is processed bit by bit instead of all at once?
This doubt probably arises from my lack of knowledge about the event loop. What I hope is that the code from the order endpoint will be added to the event loop but processed independently of, and at the same time as, requests to other endpoints. The blocking part would only be within that endpoint, so it wouldn't significantly affect the performance of other endpoints.
The answer is: it depends.
Is the code below CPU intensive or IO intensive?
const AIEnhancedImage = await enhance(req.body.image);
const url = await uploadImageToServer(AIEnhancedImage);
order(url);
Only one piece of user code can run inside an event loop callback at a time. So if you are doing some CPU-intensive task, nothing else can run until that task finishes.
Think of it like this: whatever custom code you write, only one thing can run at a time.
But if you are doing an I/O-based task, Node.js hands the waiting off to the operating system or to its internal worker pool. So while Node.js waits for the I/O, it will pick something else from the event loop and process it.
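A rough way to see the difference for yourself is to measure how late a short timer fires while the "endpoint" is either awaiting something or spinning the CPU. This is a sketch under assumptions: sleep stands in for I/O, busyWait stands in for CPU-intensive image processing, and all names are made up for the example.

```javascript
// Stand-in for I/O: the event loop is free while this promise is pending.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in for CPU-intensive work: a synchronous loop that holds the main thread.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Schedules a 10 ms timer, runs `work`, and reports how long the timer really took to fire.
async function timerDelayDuring(work) {
  const start = Date.now();
  const fired = new Promise((resolve) =>
    setTimeout(() => resolve(Date.now() - start), 10)
  );
  await work();
  return fired;
}

async function compare() {
  const duringIo = await timerDelayDuring(() => sleep(200));
  const duringCpu = await timerDelayDuring(() => busyWait(200));
  return { duringIo, duringCpu };
}

compare().then(({ duringIo, duringCpu }) => {
  console.log(`timer delay while awaiting: ${duringIo} ms`);
  console.log(`timer delay while CPU-bound: ${duringCpu} ms`);
});
```

While awaiting, the timer should fire close to its 10 ms target; during the busy loop it cannot fire until the loop ends, so its delay is at least 200 ms. Other endpoints behave like that timer.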
https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/

Should an AWS Lambda function instance in Node.js pick up another request during an async await?

Let's say I've got a queue of requests for my Lambda, and inside the lambda might be an external service call that takes 500ms, which is wrapped in async await like
async callSlowService(serializedObject: string): Promise<void> {
await slowServiceClient.post(serializedObject);
}
Should I expect that my Lambda instance will pick up another request off the queue while awaiting the slow call? I know it'll also spin up new Lambda instances, but that's not what I'm asking about; I'm talking about interleaving requests on a single instance.
I'm asking because I would think that it should do this, however I'm testing with a sleep function and a load generator and it's not happening. My code actually looks like this:
async someCoreFunction(): Promise<void> {
// Business logic
console.log("Before wait");
await sleep(2000);
console.log("After wait");
}
}
const sleep = (milliseconds) => {
return new Promise(resolve => setTimeout(resolve, milliseconds))
};
And while it definitely is taking 2 seconds between the "Before wait" and "After wait" statements, there's no new logs being written in that time.
No.
Lambda as a service is largely unaware of what your code is doing. It simply takes a request, invokes your code and then waits for it to return.
I would not expect AWS to implement a feature like interleaving any time soon. It would require the lambda runtime to have substantial knowledge of how your code behaves (for example, you may be awaiting two concurrent long asynchronous calls within one invocation- so simply interrupting when you hit your first await would be incorrect). It would also cause no end of issues for people using the shared scope outside of the handler for common setup/teardown.
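To illustrate the "two concurrent long asynchronous calls" case, here is a hypothetical handler (slowCall, the service names and the 150 ms delays are all made up; nothing here uses a real AWS API). Both calls are pending at the same time, so the invocation is not simply parked at a single await that a runtime could detect from the outside.

```javascript
// Stand-in for a slow external call (hypothetical; resolves after `ms` milliseconds).
const slowCall = (name, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(`${name} done`), ms));

// Hypothetical handler that has two calls in flight at once within one invocation.
async function handler(event) {
  const started = Date.now();
  const [a, b] = await Promise.all([
    slowCall('serviceA', 150),
    slowCall('serviceB', 150),
  ]);
  return { results: [a, b], elapsedMs: Date.now() - started };
}

handler({}).then((out) => console.log(out.results, `${out.elapsedMs} ms`));
```

The elapsed time should be close to the longer of the two calls rather than their sum, which is the concurrency you already get inside a single invocation.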
As you pay per invocation and time, I don't really see that there is much difference between interleaving and processing the queue in parallel (which lambda natively supports); considering that time spent awaiting still requires some compute. If interleaving ever happens I'd expect it to be a way for AWS to reduce the drain on their own resources.
n.b. If you are awaiting for a long time in a lambda function then there is probably a better way of doing things. For example, Step Functions provide a great way to kick off and poll long running tasks. Similarly, the pattern of using a session variable in your payload is a good way of allowing a long service to callback into lambda without having the lambda idling.

Using Promises to make In Memory Processing Concurrent

We have a project where we need to process ~5,000 objects, and each object takes 200-500 milliseconds to process. A developer on my team suggested using promises to try to process each object concurrently. So basically something like this:
let result = await Promise.all(objects.map(o => process(o)));
The process() code might look like this:
async function process(theObject) {
  return new Promise(resolve => {
    const sum = 1 + 1; // placeholder for the real per-object work
    resolve(sum);
  });
}
While it seems like a fair pattern, it seems like an anti-pattern, or a code smell. There also seems to be something about how Node/V8 handles promises that might introduce major issues later. Anyone have any thoughts on this pattern and whether it might be use-ful/less?
One caveat of using Promise.all() is how it handles errors. From the MDN:
It rejects with the reason of the first promise that rejects.
So if it is acceptable for a single processing error among the ~5,000 objects to stop the entire run, then it seems like a decent tool. I would recommend setting up a queue, both to separate the processing from the orchestration of the messages and to provide scalability advantages.
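To make the error-handling caveat concrete, here is a small sketch comparing Promise.all() with Promise.allSettled(), which is a swapped-in alternative the answer above does not mention. The processObject function and the three sample objects are invented for the example.

```javascript
// Hypothetical per-object processor: fails for one specific id.
async function processObject(obj) {
  if (obj.id === 2) throw new Error(`failed on object ${obj.id}`);
  return obj.id * 10;
}

const objects = [{ id: 1 }, { id: 2 }, { id: 3 }];

async function compareErrorHandling() {
  // Promise.all rejects as soon as any promise rejects; the other results are discarded.
  let allError = null;
  try {
    await Promise.all(objects.map((o) => processObject(o)));
  } catch (err) {
    allError = err.message;
  }

  // Promise.allSettled waits for every promise and reports each outcome separately.
  const settled = await Promise.allSettled(objects.map((o) => processObject(o)));
  const summary = settled.map((r) =>
    r.status === 'fulfilled' ? r.value : r.reason.message
  );
  return { allError, summary };
}

compareErrorHandling().then((out) => console.log(out));
// allError: 'failed on object 2', summary: [ 10, 'failed on object 2', 30 ]
```

Note that with Promise.all() the remaining objects still get processed; you just never see their results.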

Using caolan's async library, when should I use process.nextTick and when should I just invoke the callback?

I've been reading up on the differences, and it's hard to think about it when utilizing a library that helps with async methods.
I'm using https://github.com/caolan/async to write specific async calls. I'm having a hard time understanding the use of process.nextTick within some of these async methods, particularly async series methods, where async methods are basically performed synchronously.
So for example:
async.waterfall([
next => someAsyncMethod(next),
(res, next) => {
if (res === someCondition) {
return anotherAsyncMethod(next);
}
return process.nextTick(next); // vs just calling next()
},
], cb);
I've seen this done in code before, but I have no idea why. Just invoking next instead of process.nextTick gives me the same results.
Is there any reason for using process.nextTick in these scenarios, where there is an async method being controlled in a synchronous manner?
Also, what about in an async like the following?
async.map(someArray, (item, next) => {
if (item === someCondition) {
return anotherAsyncMethod(next);
}
return process.nextTick(next); // vs just calling next()
}, cb);
Thanks!
The code is happening in a SEQUENTIAL manner, not a synchronous manner. It's an important difference.
In your code, the async methods are called one after another, in sequence. HOWEVER, while that code is executing, node.js can still respond to other incoming requests because your code yields control.
Node is single-threaded. So if your code is doing something synchronously, node cannot accept new requests or perform actions until that code is finished. For instance, if you did a synchronous web request, node would stop doing ANYTHING ELSE until that request was finished.
What's really happening is this:
1. Start async action in background (yield control).
2. Node is available to handle other stuff.
3. Async action 1 completes. Node starts async action 2 and yields control.
4. Node can accept other requests/handle other stuff.
5. Async action 2 completes...
And so on. process.nextTick(next) tells Node: 'finish what you are doing right now, then call next'. The callback runs as soon as the current operation completes, before Node goes back to handling I/O such as other incoming HTTP requests; to let those run first you would use setImmediate() instead.
In your case the result is the same either way. What process.nextTick buys you is that next is always invoked asynchronously, whichever branch is taken, so code that runs right after the step never sees the callback fire early.
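One observable difference between the two variants is when the callback runs relative to the code that follows the call. A minimal sketch, with a made-up step helper standing in for the waterfall function from the question:

```javascript
// Hypothetical step: calls `next` either directly or via process.nextTick.
function step(defer, next) {
  if (defer) return process.nextTick(next);
  return next();
}

function observeCallbackTiming() {
  const order = [];
  return new Promise((resolve) => {
    step(false, () => order.push('direct: callback'));
    order.push('direct: code after step()');

    step(true, () => order.push('nextTick: callback'));
    order.push('nextTick: code after step()');

    // setImmediate callbacks run after the nextTick queue has been drained.
    setImmediate(() => resolve(order));
  });
}

observeCallbackTiming().then((order) => console.log(order));
// [ 'direct: callback', 'direct: code after step()',
//   'nextTick: code after step()', 'nextTick: callback' ]
```

With the direct call, the callback has already run by the time step() returns; with process.nextTick, the calling code finishes first.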
Feel free to ask questions.

Synchronous NodeJs (or other serverside JS) call

We are using Node for development, and 95% of the code is async and working fine.
For the other 5% (one small module), which is sync in nature [and depends on other third-party software], we are looking for:
1. Code that blocks until the callback has finished.
2. Only one instance of function1 plus its callback executing at a time.
PS 1: I completely agree that Node is for async work and that we should avoid this, but this is a separate, non-realtime process.
PS 2: If not with Node, is there any other server-side JS framework? The last option is to use another language like Python, but if anything in JS is possible, we are ready to give it a shot!
SEQ should solve your problem.
For an overview about sync modules please look at http://nodejsrocks.blogspot.de/2012/05/how-to-avoid-nodejs-spaghetti-code-with.html
Seq()
  .seq(function () {
    var next = this; // Seq passes the continuation as `this`; keep it for the inner callback
    mysql.query("select * from foo", [], function (err, rows, fields) {
      next(err, rows);
    });
  })
  .seq(function (mysqlResult) {
    console.log("mysql callback returns: " + mysqlResult);
  });
There are lots and lots of options, look at node-async, kaffeine async support, IcedCoffeescript, etc.
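For requirement 2 (only one instance running at a time), a plain-promise alternative to the libraries above is to chain every task onto the previous one. This is only a sketch of that idea, not what Seq does internally; createSerialQueue, demoSerialQueue and the timer-based tasks are hypothetical names for illustration.

```javascript
// Returns a function that runs async tasks strictly one at a time, in call order.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(() => task());
    // Keep the chain usable even if a task rejects.
    tail = result.catch(() => {});
    return result;
  };
}

// Demo: task b is queued immediately but only starts after task a has ended.
async function demoSerialQueue() {
  const enqueue = createSerialQueue();
  const log = [];
  const task = (name, ms) => () =>
    new Promise((resolve) => {
      log.push(`start ${name}`);
      setTimeout(() => {
        log.push(`end ${name}`);
        resolve(name);
      }, ms);
    });
  await Promise.all([enqueue(task('a', 30)), enqueue(task('b', 10))]);
  return log;
}

demoSerialQueue().then((log) => console.log(log));
// [ 'start a', 'end a', 'start b', 'end b' ]
```

This does not block the process the way requirement 1 asks for; it only serializes the tasks that go through the queue.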
I want to make a plug for IcedCoffeeScript since I'm its maintainer. You can get by with solutions like Seq, but in general you'll wind up encoding control flow with function calls. I find that approach difficult to write and maintain. IcedCoffeeScript makes simple sequential operations a breeze:
console.log "hello, just wait a sec"
await setTimeout defer(), 100
console.log "ok, what did you want"
But more important, it handles any combination of async code and standard control flow:
console.log "Let me check..."
if isRunningLate()
  console.log "Can't stop now, sorry!"
else
  await setTimeout defer(), 1000
  console.log "happy to wait, now what did you want?"
resumeWhatIWasDoingBefore()
Also loops work well, here is serial dispatch:
for i in [0...10]
  await launchRpc defer res[i]
done()
And here is parallel dispatch:
await
  for i in [0...10]
    launchRpc defer res[i]
done()
Not only does ICS make sequential chains of async code smoother, it also encourages you to do as much as possible in parallel. If you need to change your code or your concurrency requirements, the changes are minimal, not a complete rewrite (as it would be in standard JS/CS or with some concurrency libraries).
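For readers not using ICS, the same serial and parallel dispatch patterns can be sketched with JavaScript's built-in async/await. The launchRpc stub below is hypothetical: it resolves after a few milliseconds and counts how many calls overlap, so the difference between the two patterns is visible.

```javascript
// Hypothetical RPC stub: resolves after a short delay and tracks how many calls overlap.
function createRpc() {
  const stats = { inFlight: 0, maxInFlight: 0 };
  const launchRpc = (i) =>
    new Promise((resolve) => {
      stats.inFlight += 1;
      stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
      setTimeout(() => {
        stats.inFlight -= 1;
        resolve(i * 2);
      }, 5);
    });
  return { launchRpc, stats };
}

// Serial dispatch: each call starts only after the previous one has finished.
async function serialDispatch(n) {
  const { launchRpc, stats } = createRpc();
  const res = [];
  for (let i = 0; i < n; i++) {
    res[i] = await launchRpc(i);
  }
  return { res, maxInFlight: stats.maxInFlight };
}

// Parallel dispatch: all calls start immediately, then we wait for all of them.
async function parallelDispatch(n) {
  const { launchRpc, stats } = createRpc();
  const res = await Promise.all(Array.from({ length: n }, (_, i) => launchRpc(i)));
  return { res, maxInFlight: stats.maxInFlight };
}

serialDispatch(10).then((out) => console.log('serial max in flight:', out.maxInFlight));
parallelDispatch(10).then((out) => console.log('parallel max in flight:', out.maxInFlight));
```

Serial dispatch never has more than one call in flight, while parallel dispatch has all ten pending at once; both return the results in index order.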

Resources