Mongoose eachAsync with async function = memory leak? - node.js

I am not sure if it's a bug in mongoose or if I am doing something wrong. Once I start using async functions when iterating on a cursor with eachAsync, I experience memory leaks (memory usage quickly climbs to 4 GB and then the process crashes). After trying some things, I noticed that this doesn't happen if I don't use an async function as the callback.
No Memory leak:
const playerCursor: QueryCursor<IPlayerProfileModel> = PlayerProfile.find({}, projection).lean().cursor();
await playerCursor.eachAsync(
  (profile: IPlayerProfileModel) => {
    return;
  },
  { parallel: 50 }
);
Memory leak:
const playerCursor: QueryCursor<IPlayerProfileModel> = PlayerProfile.find({}, projection).lean().cursor();
await playerCursor.eachAsync(
  async (profile: IPlayerProfileModel) => {
    return;
  },
  { parallel: 50 }
);
Obviously the above code doesn't make any sense, but I need to perform an asynchronous operation within the function.
Question:
What is causing the memory leak / how can I avoid it?

It has to do with how async functions work.
Quoting the documentation:
When the async function returns a value, the Promise will be resolved
with the returned value.
Meaning, values returned by async functions will be automatically wrapped into a Promise.
In your first code sample, your code returns undefined whereas in the second code sample your code returns Promise.resolve(undefined).
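A minimal sketch of that difference, using two hypothetical callbacks (plainCallback and asyncCallback) that mirror the bodies of the two snippets in the question:

```javascript
// Hypothetical callbacks mirroring the two snippets from the question.
const plainCallback = () => { return; };
const asyncCallback = async () => { return; };

const plainResult = plainCallback();
const asyncResult = asyncCallback();

console.log(plainResult);                    // undefined
console.log(asyncResult instanceof Promise); // true

// The wrapped value only becomes available once the promise settles.
asyncResult.then((value) => console.log(value)); // undefined
```

The caller therefore sees two different things: a bare undefined in the first case and a promise it can wait on in the second.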
What is causing the memory leak?
I didn't take a look at mongoose code but the documentation states:
If fn returns a promise, will wait for the promise to resolve before iterating on to the next one.
Since your first example does not return a Promise, I am thinking your callback is executed on each result all at once rather than sequentially.
How can I avoid it?
I'd recommend using async/await as you used it in your second code sample.
After taking a look at the code (looking for an answer myself): if provided with a callback that doesn't return a promise, eachAsync will run the callback as many times and as fast as it can.
This line is where your callback is executed. The next line checks whether the result is a Promise; if it is not, it immediately invokes the internal callback, which in turn calls your eachAsync callback on the next result. If your callback starts any sort of async operation but returns right away, you end up with thousands and thousands of async operations running all at once.
On top of that, you set the parallel option to 50, so it executes the eachAsync callback fifty times in parallel.
This isn't a bug in mongoose, because there are cases where this behavior is wanted, and it does provide sequential processing through Promises. The documentation should mention the caveat of using a callback which doesn't return a Promise.
To go a little further, express uses next on middleware callbacks for the purpose of sequencing them.
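The effect described above can be reproduced with a simplified model. This is not Mongoose's actual implementation: eachSequential and fakeSave are hypothetical stand-ins, and the only assumption carried over is that the iterator waits only when the callback returns a promise.

```javascript
// Simplified stand-in for a cursor iterator: NOT Mongoose's real code.
// It waits for fn's result only when fn returns a promise (thenable).
async function eachSequential(items, fn) {
  for (const item of items) {
    const result = fn(item);
    if (result && typeof result.then === "function") {
      await result;
    }
  }
}

// Hypothetical async operation that tracks how many calls overlap.
let inFlight = 0;
let maxInFlight = 0;
function fakeSave() {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise((resolve) => {
    setTimeout(() => {
      inFlight -= 1;
      resolve();
    }, 2);
  });
}

async function demo() {
  const items = Array.from({ length: 20 }, (_, i) => i);

  // Callback returns the promise: one operation at a time.
  maxInFlight = 0;
  await eachSequential(items, (item) => fakeSave(item));
  const awaited = maxInFlight;

  // Callback starts the work but returns undefined: nothing throttles it.
  maxInFlight = 0;
  await eachSequential(items, (item) => { fakeSave(item); });
  const fireAndForget = maxInFlight;

  return { awaited, fireAndForget };
}

const demoResult = demo();
demoResult.then((counts) => console.log(counts));
// { awaited: 1, fireAndForget: 20 }
```

With a real cursor over a large collection, the second pattern is the one that lets pending operations pile up.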

Related

JS: Why Promise then() method executes synchronously?

I need to make part of my method's code asynchronous, so it will execute in a non-blocking manner. For this purpose I've tried to create a "dummy" Promise and put the specified code in then block. I have something like this:
public static foo(arg1, arg2) {
  const prom = new Promise((resolve, reject) => {
    if (notValid(arg1, arg2)) {
      reject();
    }
    resolve(true);
  });
  prom.then(() => {
    ...my code using arg1 and arg2...
  });
}
However, then block always executes synchronously and blocks whole app, even though each and every JS documentation tells that then always runs asynchronously. I've also tried to replace the Promise with this:
Promise.resolve().then(() => {
  ...my code using arg1 and arg2...
});
but got same result.
The only way I've managed to make then block work asynchronously is by using setTimeout:
const pro = new Promise(resolve => {
  setTimeout(resolve, 1);
});
pro.then(() => {
  ...my code using arg1 and arg2...
});
What can be the reason behind then block working synchronously? I don't want to proceed with using setTimeout, because it is kind of a "dirty" solution, and I think that there should be a way to make then run asynchronously.
Application is NodeJS with Express, written using Typescript.
I need to make part of my method's code asynchronous, so it will execute in a non-blocking manner. For this purpose I've tried to create a "dummy" Promise and put the specified code in then block.
Promises don't really make things asynchronous in and of themselves. What they do is wrap around something that's already asynchronous, and give a convenient way to tell when that thing is done. If you wrap a promise around something synchronous, your code is mostly still synchronous, just with a few details about when the .then callback executes.
even though each and every JS documentation tells that then always runs asynchronously.
By that, they mean that it waits for the current call stack to finish, and then runs the code in the .then callback. The technical term for what it's doing is a "microtask". This delay is done so that the order of operations of your code is the same whether the promise is already in a resolved state, or if some time needs to pass before it resolves.
But if your promise is already resolved (eg, because it's wrapped around synchronous code), the only thing your .then callback will be waiting for is the currently executing call stack. Once the current call stack synchronously finishes, your microtask runs synchronously to completion. The event loop will not be able to progress until you're done.
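A small sketch of that ordering (the labels pushed into the order array are only illustrative):

```javascript
const order = [];

setTimeout(() => order.push("timeout"), 0);            // runs on a later turn of the event loop
Promise.resolve().then(() => order.push("microtask")); // runs right after the current stack empties
order.push("sync");                                    // runs immediately

// Inspect the order once the zero-delay timer has fired.
const done = new Promise((resolve) => setTimeout(() => resolve(order), 10));
done.then((result) => console.log(result));
// [ 'sync', 'microtask', 'timeout' ]
```

The .then callback is deferred, but only until the synchronous code finishes; it still runs before any timer or I/O callback gets a chance.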
The only way I've managed to make then block work asynchronously is by using setTimeout
setTimeout will make things asynchronous*, yes. The code will be delayed until the timer goes off. If you want to wrap this in a promise you can, but the promise is not the part that makes it asynchronous, the setTimeout is.
I don't want to proceed with using setTimeout, because it is kind of a "dirty" solution
Ok, but it's the right tool for the job.
* When I say setTimeout makes it asynchronous, I just mean it delays the execution. This is good enough for many cases, but when your code eventually executes, it will tie up the thread until it's done. So if it takes a really long time, you may need to write the code so it does just a portion of the work and then sets another timeout to resume later.
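One way to split the work up like that, as a sketch: processInChunks, chunkSize and handle are hypothetical names, and the chunk size of 1000 is arbitrary.

```javascript
// Process `items` in slices, giving the event loop a turn between slices.
// `chunkSize` and `handle` are illustrative parameters.
function processInChunks(items, chunkSize, handle) {
  return new Promise((resolve, reject) => {
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index += 1) {
          handle(items[index]);
        }
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) {
        setTimeout(runChunk, 0); // resume later; timers and I/O can run in between
      } else {
        resolve();
      }
    }

    runChunk();
  });
}

// Usage: sum 10,000 numbers, 1,000 at a time.
let total = 0;
const numbers = Array.from({ length: 10000 }, (_, i) => i + 1);
const chunked = processInChunks(numbers, 1000, (n) => { total += n; });
chunked.then(() => console.log(total)); // 50005000
```

Each chunk still blocks while it runs, so the chunk size is a trade-off between throughput and how long other callbacks have to wait.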

Considering node.js asynchronous architecture, why doesn't push throw error when executed after array init function calls?

I'm fairly new to Node.js and learning about the history of callbacks, Promises, and async/await. So far I have written some code that I expected to throw an error, but it works fine. I suppose I should be happy with that, but I'd like to fully understand when async/await needs to be used.
Here is my code:
const rooms = {};
function initRoom(roomId) {
  if (!rooms[roomId]) {
    rooms[roomId] = {};
    rooms[roomId].users = [];
    rooms[roomId].messages = [];
  }
}
initRoom('A');
initRoom('B');
rooms.A.users.push(1);
rooms.A.users.push(2);
rooms.B.users.push(3);
console.log(rooms);
I expected the first push() to throw an error, assuming that it would execute in the stack before the initRoom() function calls completed. Why doesn't it throw an error?
There is nothing in initRoom() that is asynchronous. It's all synchronous code.
The "asynchronous architecture" you refer to has to do with specific library operations in nodejs that have an asynchronous design. These would be things such as disk operations or network operations. Those all have an underlying native code implementation that allows them to be asynchronous.
Nothing in your initRoom() function is asynchronous or calls anything asynchronous. Therefore, it is entirely synchronous and each line of code executes sequentially.
but I'd like to fully understand when async/await needs to be used.
You use promises and optionally async/await when you have operations that are actually asynchronous (they execute in the background and notify of completion later). You would not typically use them with purely synchronous code because they would only complicate the implementation and do not add any value if all the code is already synchronous.
I expected the first push() to throw an error, assuming that it would execute in the stack before the initRoom() function calls completed.
initRoom() is not asynchronous and contains no asynchronous code. When you call it, it runs each line in it sequentially before it returns and allows the next line of code after calling initRoom() to execute.
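For contrast, here is a sketch of a case where the error you expected does occur. initRoomLater is a hypothetical asynchronous variant that creates the room from a timer callback instead of immediately:

```javascript
const rooms = {};

// Hypothetical asynchronous variant: the room is created on a later tick.
function initRoomLater(roomId) {
  return new Promise((resolve) => {
    setTimeout(() => {
      rooms[roomId] = { users: [], messages: [] };
      resolve();
    }, 0);
  });
}

const pending = initRoomLater("A");

let earlyError = null;
try {
  rooms.A.users.push(1); // rooms.A is still undefined here
} catch (err) {
  earlyError = err;
}
console.log(earlyError instanceof TypeError); // true

// Waiting for the promise makes the push safe.
const ready = pending.then(() => {
  rooms.A.users.push(1);
  return rooms.A.users;
});
ready.then((users) => console.log(users)); // [ 1 ]
```

This is the situation where promises or async/await are needed: the result is not available by the time the next line runs.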

Does putting synchronous functions in an async function still block nodejs?

For example, require is synchronous.
If I put require in an async function and call that async function, does it block nodejs?
If I put require in an async function and call that async function, does it block nodejs?
Yes, it does. If the module you are using require() to load is not already cached, then it will block the interpreter to load the module from disk using synchronous file I/O. The fact that it's in an async function doesn't affect anything in this regard.
async functions don't change blocking, synchronous operations in any way. They provide automatic exception handling and allow the use of await and always return a promise. They have no magic powers to affect synchronous operations within the function.
FYI, in nearly all cases, modules you will need in your code should be loaded at module initialization. They can then be referenced later from other code without blocking the interpreter to load them.
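A rough way to observe this, assuming a busy-wait loop as a stand-in for require() or any other synchronous call (blockFor is a hypothetical helper):

```javascript
// An async function whose body is purely synchronous, blocking work.
async function blockFor(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // busy-wait: stands in for require() or any other synchronous call
  }
}

const startedAt = Date.now();

// This timer is due immediately, but it cannot fire while the thread is busy.
const timerDelay = new Promise((resolve) => {
  setTimeout(() => resolve(Date.now() - startedAt), 0);
});

blockFor(50); // returns a promise, but only after the loop has finished

timerDelay.then((elapsed) => console.log(`timer fired after ${elapsed} ms`));
```

The zero-delay timer reports at least 50 ms, because the async keyword does nothing to move the loop off the main thread.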
It will still block. This is true for whichever blocking code you would wrap in an async function. Also realise that using an async function is not useful unless you also use await in it.
You could for instance write the async function as follows:
async function work() {
  await null;
  synchronous_task();
}
work();
console.log("called work");
Here the call to work will return immediately, because of the await, but as soon as the code following that call has completed (until the call stack is empty), the execution will continue with what follows the await null, and will still block until that synchronous code has finished executing.

Using caolan's async library, when should I use process.nextTick and when should I just invoke the callback?

I've been reading up on the differences, and it's hard to think about it when utilizing a library that helps with async methods.
I'm using https://github.com/caolan/async to write specific async calls. I'm having a hard time understanding the use of process.nextTick within some of these async methods, particularly async series methods, where async methods are basically performed synchronously.
So for example:
async.waterfall([
  next => someAsyncMethod(next),
  (res, next) => {
    if (res === someCondition) {
      return anotherAsyncMethod(next);
    }
    return process.nextTick(next); // vs just calling next()
  },
], cb);
I've seen this done in code before. But I have no idea why? Just invoking next instead of process.nextTick gives me the same results?
Is there any reason for using process.nextTick in these scenarios, where there is an async method being controlled in a synchronous manner?
Also, what about in an async like the following?
async.map(someArray, (item, next) => {
  if (item === someCondition) {
    return anotherAsyncMethod(next);
  }
  return process.nextTick(next); // vs just calling next()
}, cb);
Thanks!
The code is happening in a SEQUENTIAL manner, not a synchronous manner. It's an important difference.
In your code, the async methods are called one after another, in sequence. HOWEVER, while that code is executing, node.js can still respond to other incoming requests because your code yields control.
Node is single-threaded. So if your code is doing something synchronously, node cannot accept new requests or perform actions until that code is finished. For instance, if you did a synchronous web request, node would stop doing ANYTHING ELSE until that request was finished.
What's really happening is this:
Start async action in background (yield control)
Node is available to handle other stuff
Async action 1 completes. Node starts async action 2 and yields control.
Node can accept other requests/handle other stuff.
Async action 2 completes...
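And so on. process.nextTick(next) tells Node: 'finish what you are doing right now, then call next.' The callback is queued and runs as soon as the current call stack has unwound, before Node moves on to pending I/O or timers. If the goal is to let Node handle other waiting events first, setImmediate(next) is the call that yields to the event loop.
In your case both versions produce the same results, because nothing in your code depends on whether next is invoked synchronously or on a later tick. The reason to defer is consistency: next is then always called asynchronously, whichever branch runs, and a long chain of synchronous callbacks cannot keep growing the call stack.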
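To see when each variant actually invokes its callback, here is a small sketch. callNow and callOnNextTick are hypothetical helpers standing in for calling next() directly and for process.nextTick(next); setImmediate is included only as a reference point for work that waits for the event loop.

```javascript
const order = [];

// Calls its callback synchronously (like calling next() directly).
function callNow(cb) {
  cb();
}

// Defers its callback with process.nextTick (like process.nextTick(next)).
function callOnNextTick(cb) {
  process.nextTick(cb);
}

setImmediate(() => order.push("setImmediate")); // waits for the event loop

callNow(() => order.push("direct call"));
callOnNextTick(() => order.push("nextTick"));
order.push("end of current function");

// Scheduled after the first setImmediate, so it runs after it.
const finished = new Promise((resolve) => setImmediate(() => resolve(order)));
finished.then((result) => console.log(result));
// [ 'direct call', 'end of current function', 'nextTick', 'setImmediate' ]
```

The direct call runs before the surrounding code has finished, while the nextTick callback runs after it but ahead of anything queued on the event loop.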
Feel free to ask questions.

Promise is neither resolved nor rejected. What could be the reason for this?

I am running into a strange problem. I am using a module to look up the geo location from the IP address. The lookup method by default is sync.
I converted the method to async using bluebird, but its promise never gets resolved or rejected!
Here is the snippet:
var Promise = require('bluebird');
var geoip = Promise.promisifyAll(require('geoip-lite'));
geoip.lookupAsync('52.39.138.72').then((r) => {
  console.log(r);
}).catch((err) => {
  console.log(err);
});
console.log(geoip.lookup('52.39.138.72').country + '^^^^');
In the above snippet, the last console.log always gets printed but neither of the statement inside then or catch gets executed. What could be the reason for this?
In the above snippet, the last console.log always gets printed but neither of the statement inside then or catch gets executed. What could be the reason for this?
The function you are trying to promisify does not follow the required asynchronous calling convention so promisifying it this way will not work.
For Bluebird's promisify to work properly, the function you promisify must follow the node.js async calling convention. That means the function must take a callback as its last argument and that callback must be called with two arguments err and result when the operation completes. If the function does not follow this convention, then promisifying it will not work.
And there is really no reason to take a synchronous operation and promisify it either. Promisifying it won't suddenly make its functionality asynchronous.
So, your promise is never getting resolved or rejected because the underlying function doesn't use a callback that gets called with the right calling convention.
So, if geoip.lookup('52.39.138.72') is completely synchronous (as it appears to be) and gets called this way, then the underlying operation isn't asynchronous so there is no reason to even try to promisify it.
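The calling convention can be illustrated without the geoip-lite module. The sketch below swaps in Node's built-in util.promisify for Bluebird (it relies on the same error-first callback convention) and uses a hypothetical lookupSync with placeholder data in place of geoip.lookup:

```javascript
const { promisify } = require("util");

// Hypothetical synchronous lookup, standing in for geoip.lookup().
// The returned data is a placeholder, not a real geolocation result.
function lookupSync(ip) {
  return { ip, country: "XX" };
}

// Node-style wrapper: callback is the last argument and receives (err, result).
function lookupWithCallback(ip, callback) {
  let result;
  try {
    result = lookupSync(ip);
  } catch (err) {
    callback(err);
    return;
  }
  callback(null, result);
}

const lookupAsync = promisify(lookupWithCallback);
const brokenLookupAsync = promisify(lookupSync); // lookupSync never calls the callback it is given

// Resolves to "timed out" if the promise has not settled within `ms`.
function settleOrTimeout(promise, ms) {
  const timeout = new Promise((resolve) => setTimeout(() => resolve("timed out"), ms));
  return Promise.race([promise, timeout]);
}

const good = settleOrTimeout(lookupAsync("52.39.138.72"), 20);
const broken = settleOrTimeout(brokenLookupAsync("52.39.138.72"), 20);

good.then((r) => console.log(r));   // { ip: '52.39.138.72', country: 'XX' }
broken.then((r) => console.log(r)); // timed out
```

Even the working version is still synchronous underneath; the wrapper only changes how the result is delivered.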
If you explain what problem you're actually trying to solve by promisifying it, we could likely offer another way (perhaps in a new question). One thing to keep in mind about Stack Overflow: if you describe your actual problem and show us the relevant code, rather than asking about issues with one attempted solution, we are much more likely to be able to help you and to offer you the best solution.
