web3js subscribe logs to fast for Javascript to handle

web3js subscribe logs to fast for Javascript to handle - node.js

I am using web3js to subscribe to logs, I listening to swap events, the problem is that the .on(data) is so fast in giving data JavaScript can not keep up. lets say I add a variable let count = 0; each time I get a new log I increase the number ++count, sometimes the logs come so fast I get a double number.
The real problem is I need it to be in the exact order as it is coming in, that's why I give the number to each log, but that does not work.
How would I make sure that each data item I get from the log events that they are in order?
I tried to create a promise sequence
let sequence = Promise.resolve();
let count = 0;
web3.eth.subscribe('logs', {
fromBlock: block,
topics: [
[swapEvent]
]
}).on('data', (logData)=>{
sequence = sequence.then(()=>{
++count
processData(logData)
})
});
function processData(){
return new Promise(resolve=>{
// do some stuff
resolve();
})
};
In a simple test with a loop and random time to resolve this works fine, but in the actual code with socket it does not keep the order.
Anyone has some idea how I can make the socket data keep in order and process one by one?

Not sure why but my problem got solved with this.
sequence = sequence.then(()=>processData(logData))
before it was
sequence = sequence.then(()=>{
processData(logData)
})
Now its doing all in sequence.

Related

Querying DB2 every 15 seconds causing memory leak in NodeJS

I have an application which checks for new entries in DB2 every 15 seconds on the iSeries using IBM's idb-connector. I have async functions which return the result of the query to socket.io which emits an event with the data included to the front end. I've narrowed down the memory leak to the async functions. I've read multiple articles on common memory leak causes and how to diagnose them.
MDN: memory management
Rising Stack: garbage collection explained
Marmelab: Finding And Fixing Node.js Memory Leaks: A Practical Guide
But I'm still not seeing where the problem is. Also, I'm unable to get permission to install node-gyp on the system which means most memory management tools are off limits as memwatch, heapdump and the like need node-gyp to install. Here's an example of what the functions basic structure is.
const { dbconn, dbstmt } = require('idb-connector');// require idb-connector
async function queryDB() {
const sSql = `SELECT * FROM LIBNAME.TABLE LIMIT 500`;
// create new promise
let promise = new Promise ( function(resolve, reject) {
// create new connection
const connection = new dbconn();
connection.conn("*LOCAL");
const statement = new dbstmt(connection);
statement.exec(sSql, (rows, err) => {
if (err) {
throw err;
}
let ticks = rows;
statement.close();
connection.disconn();
connection.close();
resolve(ticks.length);// resolve promise with varying data
})
});
let result = await promise;// await promise
return result;
};
async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
setTimeout(getNewData, 2000);// check again in 2 seconds
};
Any ideas on where the leak is? Am i using async/await incorrectly? Or else am i creating/destroying DB connections improperly? Any help on figuring out why this code is leaky would be much appreciated!!
Edit: Forgot to mention that i have limited control on the backend processes as they are handled by another team. I'm only retrieving the data they populate the DB with and adding it to a web page.
Edit 2: I think I've narrowed it down to the DB connections not being cleaned up properly. But, as far as i can tell I've followed the instructions suggested on their github repo.

I don't know the answer to your specific question, but instead of issuing a query every 15 seconds, I might go about this in a different way. Reason being that I don't generally like fishing expeditions when the environment can tell me an event occurred.
So in that vein, you might want to try a database trigger that loads the key to the row into a data queue on add, or even change or delete if necessary. Then you can just put in an async call to wait for a record on the data queue. This is more real time, and the event handler is only called when a record shows up. The handler can get the specific record from the database since you know it's key. Data queues are much faster than database IO, and place little overhead on the trigger.
I see a couple of potential advantages with this method:
You aren't issuing dozens of queries that may or may not return data.
The event would fire the instant a record is added to the table, rather than 15 seconds later.
You don't have to code for the possibility of one or more new records, it will always be 1, the one mentioned in the data queue.

yes you have to close connection.
Don't make const data. you don't need promise by default statement.exec is async and handles it via return result;
keep setTimeout(getNewData, 2000);// check again in 2 seconds
line outside getNewData otherwise it becomes recursive infinite loop.
Sample code
const {dbconn, dbstmt} = require('idb-connector');
const sql = 'SELECT * FROM QIWS.QCUSTCDT';
const connection = new dbconn(); // Create a connection object.
connection.conn('*LOCAL'); // Connect to a database.
const statement = new dbstmt(dbconn); // Create a statement object of the connection.
statement.exec(sql, (result, error) => {
if (error) {
throw error;
}
console.log(`Result Set: ${JSON.stringify(result)}`);
statement.close(); // Clean up the statement object.
connection.disconn(); // Disconnect from the database.
connection.close(); // Clean up the connection object.
return result;
});
*async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
setTimeout(getNewData, 2000);// check again in 2 seconds
};*
change to
**async function getNewData() {
const data = await queryDB();// get new data
io.emit('newData', data)// push to front end
};
setTimeout(getNewData, 2000);// check again in 2 seconds**

First thing to notice is possible open database connection in case of an error.
if (err) {
throw err;
}
Also in case of success connection.disconn(); and connection.close(); return boolean values that tell is operation successful (according to documentation)
Always possible scenario is to pile up connection objects in 3rd party library.
I would check those.

This was confirmed to be a memory leak in the idb-connector library that i was using. Link to github issue Here. Basically there was a C++ array that never had it's memory deallocated. A new version was added and the commit can viewed Here.

Best way to query all documents from a mongodb collection in a reactive way w/out flooding RAM

I want to query all the documents in a collection in a reactive way. The collection.find() method of the mongodb nodejs driver returns a cursor that fires events for each document found in the collection. So I made this:
function giant_query = (db) => {
var req = db.collection('mycollection').find({});
return Rx.Observable.merge(Rx.Observable.fromEvent(req, 'data'),
Rx.Observable.fromEvent(req, 'end'),
Rx.Observable.fromEvent(req, 'close'),
Rx.Observable.fromEvent(req, 'readable'));
}
It will do what I want: fire for each document, so I can treat then in a reactive way, like this:
Rx.Observable.of('').flatMap(giant_query).do(some_function).subscribe()
I could query the documents in packets of tens, but then I'd have to keep track of an index number for each time the observable stream is fired, and I'd have to make an observable loop which I do not know if it's possible or the right way to do it.
The problem with this cursor is that I don't think it does things in packets. It'll probably fire all the events in a short period of time, therefore flooding my RAM. Even if I buffer some events in packets using Observable's buffer, the events and events data (the documents) are going to be waiting on RAM to be manipulated.
What's the best way to deal with it n a reactive way?

I'm not an expert on mongodb, but based on the examples I've seen, this is a pattern I would try.
I've omitted the events other than data, since throttling that one seems to be the main concern.
var cursor = db.collection('mycollection').find({});
const cursorNext = new Rx.BehaviourSubject('next'); // signal first batch then wait
const nextBatch = () => {
if(cursor.hasNext()) {
cursorNext.next('next');
}
});
cursorNext
.switchMap(() => // wait for cursorNext to signal
Rx.Observable.fromPromise(cursor.next()) // get a single doc
.repeat() // get another
.takeWhile(() => cursor.hasNext() ) // stop taking if out of data
.take(batchSize) // until full batch
.toArray() // combine into a single emit
)
.map(docsBatch => {
// do something with the batch
// return docsBatch or modified doscBatch
})
... // other operators?
.subscribe(x => {
...
nextBatch();
});
I'm trying to put together a test of this Rx flow without mongodb, in the meantime this might give you some ideas.

You also might wanna check my solution without using of rxJS:
Mongoose Cursor: http bulk request from collection

Socket.io loses data on server side

Yellow,
so, I'm making a multiplayer online game on node (for funzies) and I'm stuck on a problem for over a week now. Perhaps the solution is simple, but I'm oblivious to it.
Long story short:
Data gets sent from client to server, this emit happens every
16.66ms.
Server receives them correctly and we collect all the data (lots of
fireballs in this case). We save them in player.skills_to_execute
array.
Every 5 seconds, we copy the data to seperate array (player_information), because
we are gona clean the current one, so it can keep collecting new
data, and then we send all the collected data back to the client.
Problem is definitely on server side. Sometimes this works, and sometimes it doesn't.
player_information is the array that I'm sending back to front, but before I send it, I do check with console.log on server if it does actually contain the data, and it does! But somehow that data gets deleted/overwritten right before sending it and it sends empty array (cause I check on frontend and I receive empty).
Code is fairly more complex, but I've minimized it here so it's easier to understand it.
This code stays on client side, and works as it should:
// front.js
socket.on("update-player-information", function(player_data_from_server){
console.log( player_data_from_server.skills_to_execute );
});
socket.emit("update-player-information", {
skills_to_execute: "fireball"
});
This code stays on server side, and works as it should:
// server.js
socket.on("update-player-information", function(data){
// only update if there are actually skills received
// we dont want every request here to overwrite actual array with empty [] // data.skills_to_execute = this will usually be 1 to few skills that are in need to be executed on a single client cycle
// naturally, we receive multiple requests in these 5 seconds,
// so we save them all in player object, where it has an array for this
if ( data.skills_to_execute.length > 0 ) {
player.skills_to_execute.push( data.skills_to_execute );
}
});
Now this is the code, where shit hits the fan.
// server.js
// Update player information
setInterval(function(){
// for every cycle, reset the bulk data that we are gona send, just to be safe
var player_information = [];
// collect the data from player
player_information.push(
{
skills_to_execute: player.skills_to_execute
}
);
// we reset the collected actions here, cause they are now gona be sent to front.js
// and we want to keep collecting new skills_to_execute that come in
player.skills_to_execute = [];
socket.emit("update-player-information", player_information);
}, 5000);
Perhaps anybody has any ideas?

Copy the array by value instead of by reference.
Try this:
player_information.push(
{
skills_to_execute: player.skills_to_execute.slice()
}
);
Read more about copying arrays in JavaScript by value or by reference here: Copying array by value in JavaScript

Run NodeJS event loop / wait for child process to finish

I first tried a general description of the problem, then some more detail why the usual approaches don't work. If you would like to read these abstracted explanations go on. In the end I explain the greater problem and the specific application, so if you would rather read that, jump to "Actual application".
I am using a node.js child-process to do some computationally intensive work. The parent process does it's work but at some point in the execution it reaches a point where it must have the information from the child process before continuing. Therefore, I am looking for a way to wait for the child-process to finish.
My current setup looks somewhat like this:
importantDataCalculator = fork("./runtime");
importantDataCalculator.on("message", function (msg) {
if (msg.type === "result") {
importantData = msg.data;
} else if (msg.type === "error") {
importantData = null;
} else {
throw new Error("Unknown message from dataGenerator!");
}
});
and somewhere else
function getImportantData() {
while (importantData === undefined) {
// wait for the importantDataGenerator to finish
}
if (importantData === null) {
throw new Error("Data could not be generated.");
} else {
// we should have a proper data now
return importantData;
}
}
So when the parent process starts, it executes the first bit of code, spawning a child process to calculate the data and goes on doing it's own bit of work. When the time comes that it needs the result from the child process to continue it calls getImportantData(). So the idea is that getImportantData() blocks until the data is calculated.
However, the way I used doesn't work. I think this is due to me preventing the event loop from executing by using the while-loop. And since the Event-Loop does not execute no message from the child-process can be received and thus the condition of the while-loop can not change, making it an infinite loop.
Of course, I don't really want to use this kind of while-loop. What I would rather do is tell node.js "execute one iteration of the event loop, then get back to me". I would do this repeatedly, until the data I need was received and then continue the execution where I left of by returning from the getter.
I realize that his poses the danger of reentering the same function several times, but the module I want to use this in does almost nothing on the event loop except for waiting for this message from the child process and sending out other messages reporting it's progress, so that shouldn't be a problem.
Is there way to execute just one iteration of the event loop in Node.js? Or is there another way to achieve something similar? Or is there a completely different approach to achieve what I'm trying to do here?
The only solution I could think of so far is to change the calculation in such a way that I introduce yet another process. In this scenario, there would be the process calculating the important data, a process calculating the bits of data for which the important data is not needed and a parent process for these two, which just waits for data from the two child-processes and combines the pieces when they arrive. Since it does not have to do any computationally intensive work itself, it can just wait for events from the event loop (=messages) and react to them, forwarding the combined data as necessary and storing pieces of data that cannot be combined yet.
However this introduces yet another process and even more inter-process communication, which introduces more overhead, which I would like to avoid.
Edit
I see that more detail is needed.
The parent process (let's call it process 1) is itself a process spawned by another process (process 0) to do some computationally intensive work. Actually, it just executes some code over which I don't have control, so I cannot make it work asynchronously. What I can do (and have done) is make the code that is executed regularly call a function to report it's progress and provided partial results. This progress report is then send back to the original process via IPC.
But in rare cases the partial results are not correct, so they have to be modified. To do so I need some data I can calculate independently from the normal calculation. However, this calculation could take several seconds; thus, I start another process (process 2) to do this calculation and provide the result to process 1, via an IPC message. Now process 1 and 2 are happily calculating there stuff, and hopefully the corrective data calculated by process 2 is finished before process 1 needs it. But sometimes one of the early results of process 1 needs to be corrected and in that case I have to wait for process 2 to finish its calculation. Blocking the event loop of process 1 is theoretically not a problem, since the main process (process 0) would not be be affected by it. The only problem is, that by preventing the further execution of code in process 1 I am also blocking the event loop, which prevents it from ever receiving the result from process 2.
So I need to somehow pause the further execution of code in process 1 without blocking the event loop. I was hoping that there was a call like process.runEventLoopIteration that executes an iteration of the event loop and then returns.
I would then change the code like this:
function getImportantData() {
while (importantData === undefined) {
process.runEventLoopIteration();
}
if (importantData === null) {
throw new Error("Data could not be generated.");
} else {
// we should have a proper data now
return importantData;
}
}
thus executing the event loop until I have received the necessary data but NOT continuing the execution of the code that called getImportantData().
Basically what I'm doing in process 1 is this:
function callback(partialDataMessage) {
if (partialDataMessage.needsCorrection) {
getImportantData();
// use data to correct message
process.send(correctedMessage); // send corrected result to main process
} else {
process.send(partialDataMessage); // send unmodified result to main process
}
}
function executeCode(code) {
run(code, callback); // the callback will be called from time to time when the code produces new data
// this call is synchronous, run is blocking until the calculation is finished
// so if we reach this point we are done
// the only way to pause the execution of the code is to NOT return from the callback
}
Actual application/implementation/problem
I need this behaviour for the following application. If you have a better approach to achieve this feel free to propose it.
I want to execute arbitrary code and be notified about what variables it changes, what functions are called, what exceptions occur etc. I also need the location of these events in the code to be able to display the gathered information in the UI next to the original code.
To achieve this, I instrument the code and insert callbacks into it. I then execute the code, wrapping the execution in a try-catch block. Whenever the callback is called with some data about the execution (e.g. a variable change) I send a message to the main process telling it about the change. This way, the user is notified about the execution of the code, while it is running. The location information for the events generated by these callbacks is added to the callback call during the instrumentation, so that is not a problem.
The problem appears, when an exception occurs. I also want to notify the user about exceptions in the tested code. Therefore, I wrapped the execution of the code in a try-catch and any exceptions that get out of the execution are caught and send to the user interface. But the location of the errors is not correct. An Error object created by node.js has a complete call stack so it knows where it occurred. But this location if relative to the instrumented code, so I cannot use this location information as is, to display the error next to the original code. I need to transform this location in the instrumented code into a location in the original code. To do so, after instrumenting the code, I calculate a source map to map locations in the instrumented code to locations in the original code. However, this calculation might take several seconds. So, I figured, I would start a child process to calculate the source map, while the execution of the instrumented code is already started. Then, when an exception occurs, I check whether the source map has already been calculated, and if it hasn't I wait for the calculation to finish to be able to correct the location.
Since the code to be executed and watched can be completely arbitrary I cannot trivially rewrite it to be asynchronous. I only know that it calls the provided callback, because I instrumented the code to do so. I also cannot just store the message and return to continue the execution of the code, checking back during the next call whether the source map has been finished, because continuing the execution of the code would also block the event-loop, preventing the calculated source map from ever being received in the execution process. Or if it is received, then only after the code to execute has completely finished, which could be quite late or never (if the code to execute contains an infinite loop). But before I receive the sourceMap I cannot send further updates about the execution state. Combined, this means I would only be able to send the corrected progress messages after the code to execute has finished (which might be never) which completely defeats the purpose of the program (to enable the programmer to watch what the code does, while it executes).
Temporarily surrendering control to the event loop would solve this problem. However, that does not seem to be possible. The other idea I have is to introduce a third process which controls both the execution process and the sourceMapGeneration process. It receives progress messages from the execution process and if any of the messages needs correction it waits for the sourceMapGeneration process. Since the processes are independent, the controlling process can store the received messages and wait for the sourceMapGeneration process while the execution process continues executing, and as soon as it receives the source map, it corrects the messages and sends all of them off.
However, this would not only require yet another process (overhead) it also means I have to transfer the code once more between processes and since the code can have thousands of line that in itself can take some time, so I would like to move it around as little as possible.
I hope this explains, why I cannot and didn't use the usual "asynchronous callback" approach.

Adding a third ( :) ) solution to your problem after you clarified what behavior you seek I suggest using Fibers.
Fibers let you do co-routines in nodejs. Coroutines are functions that allow multiple entry/exit points. This means you will be able to yield control and resume it as you please.
Here is a sleep function from the official documentation that does exactly that, sleep for a given amount of time and perform actions.
function sleep(ms) {
var fiber = Fiber.current;
setTimeout(function() {
fiber.run();
}, ms);
Fiber.yield();
}
Fiber(function() {
console.log('wait... ' + new Date);
sleep(1000);
console.log('ok... ' + new Date);
}).run();
console.log('back in main');
You can place the code that does the waiting for the resource in a function, causing it to yield and then run again when the task is done.
For example, adapting your example from the question:
var pausedExecution, importantData;
function getImportantData() {
while (importantData === undefined) {
pausedExecution = Fiber.current;
Fiber.yield();
pausedExecution = undefined;
}
if (importantData === null) {
throw new Error("Data could not be generated.");
} else {
// we should have proper data now
return importantData;
}
}
function callback(partialDataMessage) {
if (partialDataMessage.needsCorrection) {
var theData = getImportantData();
// use data to correct message
process.send(correctedMessage); // send corrected result to main process
} else {
process.send(partialDataMessage); // send unmodified result to main process
}
}
function executeCode(code) {
// setup child process to calculate the data
importantDataCalculator = fork("./runtime");
importantDataCalculator.on("message", function (msg) {
if (msg.type === "result") {
importantData = msg.data;
} else if (msg.type === "error") {
importantData = null;
} else {
throw new Error("Unknown message from dataGenerator!");
}
if (pausedExecution) {
// execution is waiting for the data
pausedExecution.run();
}
});
// wrap the execution of the code in a Fiber, so it can be paused
Fiber(function () {
runCodeWithCallback(code, callback); // the callback will be called from time to time when the code produces new data
// this callback is synchronous and blocking,
// but it will yield control to the event loop if it has to wait for the child-process to finish
}).run();
}
Good luck! I always say it is better to solve one problem in 3 ways than solving 3 problems the same way. I'm glad we were able to work out something that worked for you. Admittingly, this was a pretty interesting question.

The rule of asynchronous programming is, once you've entered asynchronous code, you must continue to use asynchronous code. While you can continue to call the function over and over via setImmediate or something of the sort, you still have the issue that you're trying to return from an asynchronous process.
Without knowing more about your program, I can't tell you exactly how you should structure it, but by and large the way to "return" data from a process that involves asynchronous code is to pass in a callback; perhaps this will put you on the right track:
function getImportantData(callback) {
importantDataCalculator = fork("./runtime");
importantDataCalculator.on("message", function (msg) {
if (msg.type === "result") {
callback(null, msg.data);
} else if (msg.type === "error") {
callback(new Error("Data could not be generated."));
} else {
callback(new Error("Unknown message from sourceMapGenerator!"));
}
});
}
You would then use this function like this:
getImportantData(function(error, data) {
if (error) {
// handle the error somehow
} else {
// `data` is the data from the forked process
}
});
I talk about this in a bit more detail in one of my screencasts, Thinking Asynchronously.

What you are running into is a very common scenario that skilled programmers who are starting with nodejs often struggle with.
You're correct. You can't do this the way you are attempting (loop).
The main process in node.js is single threaded and you are blocking the event loop.
The simplest way to resolve this is something like:
function getImportantData() {
if(importantData === undefined){ // not set yet
setImmediate(getImportantData); // try again on the next event loop cycle
return; //stop this attempt
}
if (importantData === null) {
throw new Error("Data could not be generated.");
} else {
// we should have a proper data now
return importantData;
}
}
What we are doing, is that the function is re-attempting to process the data on the next iteration of the event loop using setImmediate.
This introduces a new problem though, your function returns a value. Since it will not be ready, the value you are returning is undefined. So you have to code reactively. You need to tell your code what to do when the data arrives.
This is typically done in node with a callback
function getImportantData(err,whenDone) {
if(importantData === undefined){ // not set yet
setImmediate(getImportantData.bind(null,whenDone)); // try again on the next event loop cycle
return; //stop this attempt
}
if (importantData === null) {
err("Data could not be generated.");
} else {
// we should have a proper data now
whenDone(importantData);
}
}
This can be used in the following way
getImportantData(function(err){
throw new Error(err); // error handling function callback
}, function(data){ //this is whenDone in our case
//perform actions on the important data
})

Your question (updated) is very interesting, it appears to be closely related to a problem I had with asynchronously catching exceptions. (Also Brandon and Ihad an interesting discussion with me about it! It's a small world)
See this question on how to catch exceptions asynchronously. The key concept is that you can use (assuming nodejs 0.8+) nodejs domains to constrain the scope of an exception.
This will allow you to easily get the location of the exception since you can surround asynchronous blocks with atry/catch. I think this should solve the bigger issue here.
You can find the relevant code in the linked question. The usage is something like:
atry(function() {
setTimeout(function(){
throw "something";
},1000);
}).catch(function(err){
console.log("caught "+err);
});
Since you have access to the scope of atry you can get the stack trace there which would let you skip the more complicated source-map usage.
Good luck!

How to populate mongoose with a large data set

I'm attempting to load a store catalog into MongoDb (2.2.2) using Node.js (0.8.18) and Mongoose (3.5.4) -- all on Windows 7 64bit. The data set contains roughly 12,500 records. Each data record is a JSON string.
My latest attempt looks like this:
var fs = require('fs');
var odir = process.cwd() + '/file_data/output_data/';
var mongoose = require('mongoose');
var Catalog = require('./models').Catalog;
var conn = mongoose.connect('mongodb://127.0.0.1:27017/sc_store');
exports.main = function(callback){
var catalogArray = fs.readFileSync(odir + 'pc-out.json','utf8').split('\n');
var i = 0;
Catalog.remove({}, function(err){
while(i < catalogArray.length){
new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
if(err){
console.log(err);
} else {
i++;
}
});
if(i === catalogArray.length -1) return callback('database populated');
}
});
};
I have had a lot of problems trying to populate the database. Under previous scenarios (and this one), node pegs the processor and eventually runs out of memory. Note that in this scenario, I'm trying to allow Mongoose to save a record, and then iterate to the next record once the record saves.
But the iterator inside of the Mongoose save function never gets incremented. In addition, it never throws any errors. But if I put the iterator (i) outside of the asynchronous call to Mongoose, it will work, provided the number of records that I try to load are not too big (I have successfully loaded 2,000 this way).
So my questions are: Why isn't the iterator inside of the Mongoose save call ever incremented? And, more importantly, what is the best way to load a large data set into MongoDb using Mongoose?
Rob

i is your index to where you're pulling input data from in catalogArray, but you're also trying to use it to keep track of how many have been saved which isn't possible. Try tracking them separately like this:
var i = 0;
var saved = 0;
Catalog.remove({}, function(err){
while(i < catalogArray.length){
new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
saved++;
if(err){
console.log(err);
} else {
if(saved === catalogArray.length) {
return callback('database populated');
}
}
});
i++;
}
});
UPDATE
If you want to add tighter flow control to the process, you can use the async module's forEachLimit function to limit the number of outstanding save operations to whatever you specify. For example, to limit it to one outstanding save at a time:
Catalog.remove({}, function(err){
async.forEachLimit(catalogArray, 1, function (catalog, cb) {
new Catalog(JSON.parse(catalog)).save(function (err, doc) {
if (err) {
console.log(err);
}
cb(err);
});
}, function (err) {
callback('database populated');
});
}

Rob,
The short answer:
You created an infinite loop. You're thinking synchronously and with blocking, Javascript functions asynchronously and without blocking. What you are trying to do is like trying to directly turn the feeling of hunger into a sandwich. You can't. The closest thing is you use the feeling of hunger to motivate you to go to the kitchen and make it. Don't try to make Javascript block. It won't work. Now, learn async.forEachLimit. It will work for what you want to do here.
You should probably review asynchronous design patterns and understand what it means on a deeper level. Callbacks are not simply an alternative to return values. They are fundamentally different in how and when they are executed. Here is a good primer: http://cs.brown.edu/courses/csci1680/f12/handouts/async.pdf
The long answer:
There is an underlying problem here, and that is your lack of understanding of what non-blocking IO and asynchronous means. Im not sure if you are breaking into node development, or this is just a one-off project, but if you do plan to continue using node (or any asynchronous language) then it is worth the time to understand the difference between synchronous and asynchronous design patterns, and what motivations there are for them. So, that is why you have a logic error putting the loop invariant increment inside an asynchronous callback which is creating an infinite loop.
In non-computer science, that means that your increment to i will never occur. The reason is because Javascript executes a single block of code to completion before any asynchronous callbacks are called. So in your code, your loop will run over and over, without i ever incrementing. And, in the background, you are storing the same document in mongo over and over. Each iteration of the loop starts sending document with index 0 to mongo, the callback can't fire until your loop ends, and all other code outside the loop runs to completion. So, the callback queues up. But, your loop runs again since i++ is never executed (remember, the callback is queued until your code finishes), inserting record 0 again, queueing another callback to execute AFTER your loop is complete. This goes on and on until your memory is filled with callbacks waiting to inform your infinite loop that document 0 has been inserted millions of times.
In general, there is no way to make Javascript block without doing something really really bad. For example, something paramount to setting your kitchen on fire to fry some eggs for that sandwich I talked about in the "short answer".
My advice is to take advantage of libs like async. https://github.com/caolan/async JohnnyHK mentioned it here, and he was correct for doing so.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string