Mapping large array is blocking my nodejs thread - node.js

I'm trying to map a large array (around 11k items). The actual mapping function is super simple, but the number of items in the array is just too large and it blocks everything.
What's the best approach to avoid this? I tried using an async map, but I'm getting the same problem.

You can change the synchronous (map) operation into an asynchronous one using a Promise or setTimeout. A recursive function can then process the items of the large array progressively.
For example:
const largeArrays = [];
const resultArrays = [];

function process(source, target, index) {
  if (index === source.length) {
    // the result array now holds all of the processed data
    return;
  }
  // dummy map action for the example; replace it with your own
  target.push(source[index] + 1);
  // defer the next item to a later tick so the event loop stays free
  setTimeout(() => { process(source, target, index + 1) }, 0);
}

process(largeArrays, resultArrays, 0);
You can wrap the above code in a Promise and resolve it instead of using the return statement above.
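For instance, a minimal sketch of the same idea wrapped in a Promise (the +1 map action is just a placeholder for your own):

function processAsync(source) {
  return new Promise((resolve) => {
    const target = [];
    function step(index) {
      if (index === source.length) {
        // resolve with the processed data instead of a bare return
        resolve(target);
        return;
      }
      target.push(source[index] + 1); // placeholder map action
      setTimeout(() => step(index + 1), 0);
    }
    step(0);
  });
}

// processAsync(largeArrays).then((result) => console.log(result.length));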
You don't need any fancy library, just native JavaScript. You can check out two of my blog posts illustrating ideas for these kinds of problems:
How to avoid Stack overflow error on recursion
How to make long running loop breakable?

I did not try this, but wouldn't it help to use an async function that handles the mapping part, and then call that function on every iteration with the necessary information (index, array item, etc.)?

Related

Nested loop Synchronous or Asynchronous?

I have an array called noNCreatedResources. I want to do some operation on each item of the array, push the item into the createdResources array, remove it from noNCreatedResources, and continue doing that until noNCreatedResources is empty. For this I've written a CreateResources function containing a while loop with a nested for loop. It works, but I realized it doesn't run synchronously. For example, the while loop should iterate twice, but it iterates 4 times and I don't know why.
I think I don't understand the async/await/non-blocking concepts of Node.js. Can anybody help me see what the problem is?
CreateResources = async () => {
  while (this.noNCreatedResources.length > 0) {
    for (let index = 0; index < this.noNCreatedResources.length; index++) {
      if (this.resourceHasCreatedDependencies(this.noNCreatedResources[index])) {
        const resourceModel = this.someOperation(this.noNCreatedResources[index]);
        this.createdResources.push(resourceModel);
        this.noNCreatedResources.splice(index, 1);
      }
    }
  }
}
First of all, you are not doing anything asynchronous in your function, so you can remove the async keyword. Since nothing asynchronous is happening, your problem is not related to async/await; it is more of an implementation problem IMO.
Your while loop is useless for what you are trying to achieve. Also, your logic is broken: splicing an array while iterating forward over it skips elements.
Example: the following code will output 1, 3, and 5.
let x = [1, 2, 3, 4, 5];
for (let i = 0; i < x.length; i++) {
  console.log(x[i]);
  x.splice(i, 1);
}
I do not think you need to remove items from the array to achieve your expected result. If you need to reset the array, you can just do x = [] at the end.
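If you do want to remove items while looping, a standard fix (a sketch of the usual pattern, not taken from the answer above) is to iterate backwards, so that splicing never shifts the items you have not visited yet (note that items then land in createdResources in reverse order):

for (let index = this.noNCreatedResources.length - 1; index >= 0; index--) {
  if (this.resourceHasCreatedDependencies(this.noNCreatedResources[index])) {
    const resourceModel = this.someOperation(this.noNCreatedResources[index]);
    this.createdResources.push(resourceModel);
    // safe: removing at `index` only shifts items we have already visited
    this.noNCreatedResources.splice(index, 1);
  }
}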
The problem you have is not due to async calls; your code is entirely synchronous. Take a look at where noNCreatedResources is created/updated. Async calls happen when you send an HTTP request, read a file, etc.; in other words, operations that don't happen inside your own code. They allow the code to go on without blocking the next function calls, and when the promise is fulfilled, the callback function is invoked.

Running knex queries synchronously

I have a complex solution and I just need to run knex queries synchronously; is that possible?
I have a scenario where a knex query is run inside Promise.mapSeries for an array with an unknown number of elements. For each element some knex query is called, including an insert query.
So this insert could affect the result for the next element of the array.
var descriptionSplitByCommas = desc.split(",");
Promise.mapSeries(descriptionSplitByCommas, function (name) {
  // knex.select
  // knex.insert if select doesn't return results
});
This was not my initial code, so maybe even Promise.mapSeries should be removed. But I need the elements of descriptionSplitByCommas to be processed sequentially.
Otherwise, while processing the next description in the array I often get an SQL error, because duplicate elements are inserted into a column with a unique index. This would not happen if the queries ran one after another.
I am using native promises, so I have no experience with mapSeries and cannot tell you exactly what is going on in your current code.
However, running several asynchronous operations in series instead of in parallel is quite common. There is one important thing you have to know: once you create a Promise, you have no control over how and when it resolves. So if you create 100 Promises, they all start resolving in parallel.
This is the reason there is no method like Promise.series for native promises: it is not possible.
What are your options? If you need to "create the promise in one place, but run it in another", then a factory function is your friend:
const runPromiseLater = () => Promise.resolve(25);
// some code
const myRealPromise = runPromiseLater();
myRealPromise.then(function (value) {
  // the promise only starts once the factory is called
});
Of course, you can create an array of these factory functions; the question then is how to run them in series.
If your Node version supports async/await, a plain for loop is good enough:
async function runInSeries(array) {
  for (let i = 0; i < array.length; i++) {
    // each element is a factory function returning a promise
    await array[i]();
    // or, if the array holds plain values, call your method with each one:
    // await myMethod(array[i])
  }
}
If you can't use that, then the async library is your friend: https://caolan.github.io/async/docs.html#series
If you need to use the value from previous calls, you can use async.waterfall.
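For the concrete select-then-insert flow in the question, a minimal async/await sketch (the table and column names are made up for illustration; adjust them to your schema):

async function insertMissing(descriptionSplitByCommas) {
  for (const name of descriptionSplitByCommas) {
    // hypothetical table/column names
    const rows = await knex('descriptions').select('id').where({ name: name });
    if (rows.length === 0) {
      // this insert finishes before the next name is processed,
      // so the duplicate insert on the unique column cannot happen
      await knex('descriptions').insert({ name: name });
    }
  }
}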

Node.js async recursive function with callback

Task: I need to recursively walk through a json object and make certain changes to the keys. I will be handling objects with varying depths, and varying sizes. When the function hits a key whose value is an object, it is called again on that object.
Problem 1: Doing this as a synchronous function, I noticed that large JSON objects were coming back incomplete. Using the async library, async.forEach solves the problem of handling long tasks and returning only when finished, but...
Problem 2: It seems that the async function loses concurrency (?) when it is called recursively (pointed out in the code snippet).
To test this idea, I removed the recursive function call and it worked (but without recursion), and I got a callback. When I add the function call back in, I get TypeError: results is not a function.
This leads me to think that async with callback and recursive functions don't mix in node. Is there a way to achieve both?
Possible Fix: I could run a separate function to count all the keys and use a counter as my control in a simple for loop, rather than letting forEach handle the control. That seems a bit inefficient, no?
Here's the code:
function fixJsonKeys(obj, results) {
  async.forEach(Object.keys(obj), function(key, callback) {
    if (typeof obj[key] == 'object') {
      // do stuff to json key, then call the function
      // on the nested object
      fixJsonKeys(obj[key]);
      callback(); // <-- how does this work with recursion??
    }
    else {
      // do stuff to json key
      callback();
    }
  }, function(err) {
    if (err) return next(err);
    // obj keys fixed, now return completed object
    results(obj);
  });
}
EDIT: hard to format in comments, so:
@Laksh and @Suhail: I tried your suggestions with the same outcomes. If I remove the callback from the if condition, async.forEach still seems to want confirmation that it has handled the top-level keys (I think).
For example, say I have 3 top level keys, and one of them has a nested object:
[key1] (no callback, do recursion)
--[subKey1]
--[subKey2]
[key2] (callback)
[key3] (callback)
async.forEach is still looking for a callback for the action taken on key1. Do I have this correct?
Since you are using a recursive function here, you don't need to call callback() inside the if condition.
This is because the recursive call will return to the existing callback() once the entire recursive stack has finished for that particular obj[key].
A recursive function returns only when the base condition is true; in your case, when the if condition fails, callback() is automatically called from the else block.
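For reference, one pattern that ties the recursion into async.forEach's completion tracking is to pass the per-item callback down as the recursive call's completion callback. A minimal sketch, assuming the key fixing itself is synchronous:

function fixJsonKeys(obj, done) {
  async.forEach(Object.keys(obj), function(key, callback) {
    if (typeof obj[key] === 'object' && obj[key] !== null) {
      // do stuff to the json key, then recurse; the nested call fires
      // `callback` only after the whole subtree has been processed
      fixJsonKeys(obj[key], callback);
    } else {
      // do stuff to the json key
      callback();
    }
  }, function(err) {
    // every key, including nested ones, has been handled at this point
    done(err, obj);
  });
}

// fixJsonKeys(myJson, function(err, fixedObj) { /* use fixedObj here */ });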

For loop in redis with nodejs asynchronous requests

I've got a problem with redis and Node.js. I have to loop through a list of phone numbers and check whether each number is present in my redis database. Here is my code:
function getContactList(contacts, callback) {
  var contactList = {};
  for (var i = 0; i < contacts.length; i++) {
    var phoneNumber = contacts[i];
    if (utils.isValidNumber(phoneNumber)) {
      db.client().get(phoneNumber).then(function(reply) {
        console.log("before");
        contactList[phoneNumber] = reply;
      });
    }
  }
  console.log("after");
  callback(contactList);
};
The "after" console log appears before the "before" console log, and the callback always return an empty contactList. This is because requests to redis are asynchronous if I understood well. But the thing is I don't know how to make it works.
How can I do ?
You have two main issues.
Your phoneNumber variable will not be what you want it to be by the time the callback runs. That can be fixed by changing to a .forEach() or .map() iteration of your array, because that creates a local function scope for the current value.
You have to create a way to know when all the async operations are done. There are lots of duplicate questions/answers that show how to do that; you probably want to use Promise.all().
I'd suggest this solution that leverages the promises you already have:
function getContactList(contacts) {
  var contactList = {};
  return Promise.all(contacts.filter(utils.isValidNumber).map(function(phoneNumber) {
    return db.client().get(phoneNumber).then(function(reply) {
      // build custom object
      contactList[phoneNumber] = reply;
    });
  })).then(function() {
    // make contactList be the resolved value
    return contactList;
  });
}

getContactList(contacts).then(function(contactList) {
  // use the contactList here
}, function(err) {
  // process errors here
});
Here's how this works:
1. Call contacts.filter(utils.isValidNumber) to filter the array to only valid numbers.
2. Call .map() to iterate through that filtered array.
3. Return db.client().get(phoneNumber) from the .map() callback to create an array of promises.
4. After getting the data for a phone number, add that data to your custom contactList object (this is essentially a side effect of the .map() loop).
5. Use Promise.all() on the returned array of promises to know when they are all done.
6. Make the contactList object we built up be the resolve value of the returned promise.
Then, to call it, just use the returned promise with .then() to get the final result. There is no need for a callback argument when you already have a promise you can return.
The simplest solution may be to use MGET with the list of phone numbers and build the result in the .then() handler.
You could also put the promises in an array and use Promise.all().
At some point you might want your function to return a promise rather than take a callback, just to stay consistent.
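A sketch of the MGET idea, assuming the client exposes a promise-returning mget just like the get used above:

function getContactList(contacts) {
  var validNumbers = contacts.filter(utils.isValidNumber);
  // one round trip for all keys instead of one GET per number
  return db.client().mget(validNumbers).then(function(replies) {
    var contactList = {};
    validNumbers.forEach(function(phoneNumber, i) {
      contactList[phoneNumber] = replies[i]; // replies align with the keys
    });
    return contactList;
  });
}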
Consider refactoring your NodeJS code to use Promises.
Bluebird is an excellent choice: http://bluebirdjs.com/docs/working-with-callbacks.html
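For example, a sketch using Bluebird's promisifyAll on a callback-style redis client (it adds Async-suffixed methods; adjust to however your db module is actually set up):

var Promise = require('bluebird');
var redis = Promise.promisifyAll(require('redis'));
var client = redis.createClient();

// getAsync returns a promise instead of taking a callback
client.getAsync(phoneNumber).then(function(reply) {
  // use reply here
});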
You put async code inside a for loop (a synchronous operation), so each iteration of the for loop does not wait for the db.client(...) call to finish.
Take a look at this Stack Overflow answer; it explains how to make async loops:
Here

How do I make a large but unknown number of REST http calls in nodejs?

I have an orientdb database. I want to use Node.js with RESTful calls to create a large number of records. I need to get the #rid of each record for some later processing.
My pseudocode is:
for each record
    write.to.db(record)
    when the async of write.to.db() finishes
        process based on #rid
carryon()
I have landed in serious callback hell with this. The version that came closest used tail recursion in the .then() function to write the next record to the db, but then I couldn't carry on with the rest of the processing.
A final constraint is that I am behind a corporate proxy and cannot use any other packages without going through the network administrator, so using the native Node.js modules is essential.
Any suggestions?
With a completion callback, the general design pattern for this type of problem uses a local function to do each write:
var records = ....; // array of records to write
var index = 0;
function writeNext(r) {
  write.to.db(r, function(err) {
    if (err) {
      // error handling
    } else {
      ++index;
      if (index < records.length) {
        writeNext(records[index]);
      }
    }
  });
}
writeNext(records[0]);
The key here is that you can't use synchronous iterators like .forEach(), because they won't wait for each async operation to complete before moving on to the next item. Instead, you do your own iteration.
If your write function returns a promise, you can use the .reduce() pattern that is common for iterating an array:
var records = ...; // some array of records to write
records.reduce(function(p, r) {
  return p.then(function() {
    return write.to.db(r);
  });
}, Promise.resolve()).then(function() {
  // all done here
}, function(err) {
  // error here
});
This solution chains promises together, waiting for each one to resolve before executing the next save.
It's kind of hard to tell which function would be best for your scenario without more detail, but I almost always use asyncjs for this kind of thing.
From what you say, one way to do it would be with async.map:
var recordsToCreate = [...];
function functionThatCallsTheApi(record, cb) {
  // do the api call, then call cb(null, rid)
}
async.map(recordsToCreate, functionThatCallsTheApi, function(err, results) {
  // here, err will be set if anything failed in any of the calls
  // results will be an array of the rids
});
You can also check out its other methods, such as async.mapLimit, to enable throttling, which is probably a good idea.
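For instance, a sketch of throttling with async.mapLimit (the limit of 5 concurrent calls is an arbitrary example):

async.mapLimit(recordsToCreate, 5, functionThatCallsTheApi, function(err, results) {
  // at most 5 API calls are in flight at any time;
  // results still holds the rids in input order
});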
