Debug a stack overflow exception with nodejs

I'm parsing a large number of files using nodejs. In my process, I parse audio files, then video files, and then the rest.
The function that parses the remaining files looks like this:
/**
 * @param arr : array of file objects (path, ext, previous directory)
 * @param cb : the callback invoked once every object is parsed;
 *             the objects are then thrown into a database
 * @param others : the array being populated with matching objects
 **/
var parseOthers = function(arr, cb, others) {
    others = others === undefined ? [] : others;
    if (arr.length === 0)
        return cb(others); //should this be a nextTick?
    var e = arr.shift();
    //do some tests on the element and add it
    others.push(e);
    //I also tested setImmediate and nextTick here, per other
    //Stack Overflow questions, with no success
    return parseOthers(arr, cb, others);
};
Full code is here (beware, it's a mess).
Now, with about 3565 files (not that many), the script throws a "RangeError: Maximum call stack size exceeded" exception, with no trace.
What I have tried:
I tried to debug it with node-inspector and node debug script, but the error never shows up there, as if the script were running without debugging (does debugging increase the stack size?).
I tried process.on('uncaughtException') to catch the exception, with no success.
I have no memory leak.
How can I get a trace of the exception?
Edit 1
Increasing the --stack_size seems to work pretty well; about 1300 is enough here (e.g. node --stack-size=1300 script.js). Isn't there another way to prevent this?
Edit 2
According to:
$ node --v8-options | grep -B0 -A1 stack_size
The default stack size (in kBytes) is 984.
Edit 3
A few more explanations:
I never read the files themselves
I work on an array of paths here; I don't parse folders recursively
I look at each path and check whether it's already stored in the database
My guess is that the populated array becomes too big for nodejs, but memory looks fine and that's weird...

Most stack-overflow situations are not easy, or even possible, to debug. Even if you manage to debug the problem, you may not find the trigger.
But I can suggest a way to share the task load easily (including the queue management):
JXcore (a multithreaded fork of Node.JS) would suit your case better. Simply create a task pool and define a task method that handles one file at a time. It will work through your queue one item at a time, across multiple threads.
// the task method; each call handles a single file
var myTask = function (filePath) {
    // parsing logic for one file goes here
};

// files: the array of file paths from the question
for (var i = 0; i < files.length; i++)
    jxcore.tasks.addTask(myTask, files[i] /*, optional callback */);
Or, in case the logic doesn't fit into a single method:
var myTask = function (filePath) {
    require('./mytasketc.js').handleTask(filePath);
};

for (var i = 0; i < files.length; i++)
    jxcore.tasks.addTask(myTask, files[i] /*, optional callback */);
Remarks
Every single thread has its own V8 memory limit.
The contexts of the threads are separate.
Make sure the task method closes the file at the end.
Link
You can find more on multithreaded JavaScript tasks.

You are getting this error because of recursion. Rewrite your code so that it doesn't use recursion, especially because this piece of code really doesn't need it. Here is just an APPROXIMATE example, to show you a better way to do it:
var parseElems = function(arr, cb) {
    var result = [];
    arr.forEach(function (el) {
        //do some tests on the element (el)
        result.push(el);
    });
    cb(result);
};
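For what it's worth, the original recursive shape can also be made stack-safe by deferring each step with setImmediate, so the stack unwinds between elements. A minimal sketch, not the author's code (parseOthersFlat is a hypothetical name):

var parseOthersFlat = function (arr, cb, others) {
    others = others || [];
    if (arr.length === 0) return cb(others);
    var e = arr.shift();
    // do some tests on the element and add it
    others.push(e);
    // defer the next step; the current frame returns before it runs,
    // so the call depth stays constant regardless of arr.length
    setImmediate(function () {
        parseOthersFlat(arr, cb, others);
    });
};

This only helps if the recursive call itself is the deferred part, which may explain why the asker's earlier setImmediate/nextTick attempts failed.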

Related

how to set a timeout with node for redis request

I've written a simple service that uses redis to store data in memory, or fetch it from disk and then store it in memory. I'm now trying to handle rare cases where a redis fetch is slow. I've seen one example (https://gist.github.com/stockholmux/3a4b2d1480f27df8be67#file-timelimitedredis-js) which appears to solve this problem, but I've had trouble implementing it.
The linked implementation is:
/**
 * Returns a function that acts like the Redis command indicated by cmd,
 * except that it will time out after a given number of milliseconds.
 *
 * @param {string} cmd The redis command to execute ('get','hset','sort', etc.)
 * @param {integer} timeLimit The number of milliseconds to wait before returning an error to the callback.
 *
 */
function timeLimited(cmd, timeLimit) {
    return function() {
        var
            argsAsArr = Array.prototype.slice.call(arguments),
            cb = argsAsArr.pop(),
            timeoutHandler;

        timeoutHandler = setTimeout(function(){
            cb(new Error('Redis timed out'));
            cb = function() {};
        }, timeLimit);

        argsAsArr.push(function(err, values){
            clearTimeout(timeoutHandler);
            cb(err, values);
        });

        client[cmd].apply(client, argsAsArr);
    };
}
However, I don't understand how to implement this, because client is never defined and the redis key/value are never passed in. Could someone explain a little about how one could implement this example? I've been searching for more information or a working example, but haven't had any luck so far. Thank you.
This isn't very clearly written, but when you call it with cmd (e.g. SET, HSET, etc.) and a time limit, it returns a function. You then call that returned function with the values. I don't know where client comes from; I guess you need to have it in scope. This isn't very good code, so I would suggest posting what you've written and asking how to achieve what you want with that.
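For a concrete picture, here is a minimal usage sketch, assuming client is a callback-style client from the classic 'redis' package (v3 and earlier), created before timeLimited() is used; the key name is made up:

var redis = require('redis');
var client = redis.createClient(); // must be in scope for timeLimited()

// wrap the GET command with a 200 ms limit
var getWithTimeout = timeLimited('get', 200);

// call the returned function with the command's arguments plus a callback
getWithTimeout('some-key', function (err, value) {
    if (err) return console.error(err.message); // 'Redis timed out' or a redis error
    console.log('value:', value);
});

Note that if the timeout fires, the real reply is silently discarded by the cb = function() {}; swap, so the command may still have executed on the server.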

Node.js Spawning multiple threads within a class method

How can I run a single method multiple times, in parallel processes, when it is called as a method of a class?
At first I tried to use the cluster module, but I realized it just re-runs the whole process from the start, rightfully so.
How can I achieve something like what's outlined below?
I want a class's method to spawn n processes, and when the parallel tasks are completed, I can resolve a promise which the method returns.
The problem with the code below is that calling cluster.fork() will fork index.js process.
index.js
const Person = require('./Person.js');
var Mary = new Person('Mary');
Mary.run(5).then(() => {...});
console.log('I should only run once, but I am called 5 times too many');
Person.js
const cluster = require('cluster');

class Person {
    run(distance) {
        var completed = 0;
        return new Promise((resolve, reject) => {
            for (var i = 0; i < distance; i++) {
                // run a separate process for each step
                const worker = cluster.fork();
                worker.send(i);
                worker.on('message', message => {
                    if (message === 'completed') { ++completed; }
                    if (completed === distance) { resolve(); }
                });
            }
        });
    }
}
I think the short answer is that it's impossible. It's even worse: this has nothing to do with JS in particular. To multi-process (or multi-thread) in your specific problem, you would essentially need a copy of the object in every thread, since the method (maybe) needs access to its fields. In that case you would have to either initialize the object in every thread or share memory. The latter, I believe, is not provided by cluster, and it is not trivial in other languages for every use case either.
If the calculation is independent of the Person, I suggest you extract it and use the usual pattern (in index.js):
if (cluster.isWorker) {
    // use the i for the calculation
} else {
    // create the Person, then fork children in a for loop
}
You then collect the results and update the Person as needed. You will be re-running index.js in each worker, but this is standard, and you only run what you need; a fuller sketch follows below.
The problem is if the results depend on the Person. If those inputs are constant for all i, you can still send them to your forks independently. Otherwise, what you have is the only way to fork. In general, forking in cluster is not meant for methods but for the app itself, which is the standard forking behavior.
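To make the suggestion concrete, here is a minimal sketch of that isWorker pattern, assuming the calculation really is independent of Person; the squaring is a hypothetical stand-in for the real work:

const cluster = require('cluster');

if (cluster.isWorker) {
    // worker: receive i from the master, compute, report back, exit
    process.on('message', i => {
        const result = i * i; // hypothetical calculation using i
        process.send({ i, result }, () => process.exit(0));
    });
} else {
    // master: fork one worker per i and collect the results
    const distance = 5;
    const results = [];
    let completed = 0;
    for (let i = 0; i < distance; i++) {
        const worker = cluster.fork();
        worker.send(i);
        worker.on('message', msg => {
            results[msg.i] = msg.result;
            if (++completed === distance) {
                console.log('all results collected:', results);
            }
        });
    }
}

Each forked worker re-executes this same file, but the isWorker branch ensures it only runs the calculation.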
Another solution
Following your comment, I suggest you check out child_process.execFile or child_process.exec on the same file.
This way you can spawn a totally independent process on the fly. Instead of calling cluster.fork, you now call execFile. You can use either the exit code or stdout as the return value (stderr etc.). The Promise is now replaced with:
var child_process = require('child_process');
var results = [];
for (var i = 0; i < distance; i++) {
    // run a separate process for each value of i
    results.push(child_process.execFile('node', ['mymethod.js', i]));
}
//... catch the exit event from all results, or return a callback using results.
Inside mymethod.js, have your code take i and return what you want, either through the exit code or through stdout (both available on the returned child process). This is a bit un-node.js-y, since you're waiting on asynchronous calls, but your requirements are non-standard. Since I'm not sure how you use this, perhaps returning a callback with the array is a better idea.
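A minimal sketch of what the hypothetical mymethod.js could contain, reporting through stdout as described:

// mymethod.js: receives i as a command-line argument
var i = Number(process.argv[2]); // execFile('node', ['mymethod.js', i]) puts i at argv[2]
var result = i * i;              // hypothetical calculation
process.stdout.write(String(result)); // the parent reads this from the child's stdout
process.exit(0);                      // or encode a small status in the exit code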

Modifying current reference causes maximum stack size exceeded crash

In node.js, using version 4.1.0 of the 'firebase-admin' SDK, I have a listener which listens to a message queue reference in my database, processes messages, and thereafter tries to remove them from the queue reference.
When I have greater than a certain number of records (1354 on my machine) in the queue prior to starting the script, the script crashes with a maximum call stack exceeded error.
The strange thing is that this only occurs when I have 1354+ values in the queue prior to script start. Any lower than this and the problem vanishes.
I don't know why this is happening, but I know that it only occurs when I try to modify/remove the object at the snapshot reference.
Here is a self-contained mcve with the problem area marked in the comments:
var admin = require("firebase-admin");
var serviceAccount = require("<ADMIN JSON FILE PATH GOES HERE>");

admin.initializeApp({
    credential: admin.credential.cert(serviceAccount),
    databaseURL: "<FIREBASE URL GOES HERE>"
});

var ref = admin.database().ref();

// the number of messages to generate for the queue. when this is >= 1354 (on my
// machine) the program crashes; if it's less than that, it works perfectly fine.
// your tipping point may vary
var amount = 1354;

// message payload to deliver to the queue <amount> times
var payload = {};

// message generation loop
for (var i = 0; i < amount; i++) {
    var message = {msg: "hello"};
    payload['message-queue/' + ref.push().key] = message;
}

// add the generated messages simultaneously to message-queue
ref.update(payload).then(function () {
    // 'on child added' listener that causes the crash of the program when there
    // are 1354+ pre-existing messages in the queue prior to application start
    ref.child('message-queue').on('child_added', function(snapshot) {
        var msgKey = snapshot.key;
        var msgContents = snapshot.val().msg;
        // do something with msgContents (e.g. sanitize message and deliver to
        // some user's message-received node in the firebase)
        // ***THIS*** is what causes the crash. if you remove this line of code,
        // the program does not crash. it seems that any modification/removal
        // to/of the current <msgKey> node does the same
        ref.child('message-queue').child(msgKey).remove();
    });
});
And here is the stack trace of the crash:
FIREBASE WARNING: Exception was thrown by user callback. RangeError: Maximum call stack size exceeded
at RegExp.exec (native)
at RegExp.test (native)
at tc (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:63:86)
at ub (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:60:136)
at vb (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:43:1228)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:44)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:136)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:136)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:136)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:136)
<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:63
(d="0"+d),c+=d;return c.toLowerCase()}var zc=/^-?\d{1,10}$/;function tc(a){retur
n zc.test(a)&&(a=Number(a),-2147483648<=a&&2147483647>=a)?a:null}function Ac(a){
try{a()}catch(b){setTimeout(function(){N("Exception was thrown by user callback.
",b.stack||"");throw b;},Math.floor(0))}}function Bc(a,b,c){Object.definePropert
y(a,b,{get:c})}function Cc(a,b){var c=setTimeout(a,b);"object"===typeof c&&c.unr
ef&&c.unref();return c};function Dc(a){var b={},c={},d={},e="";try{var f=a.split
("."),b=bb(hc(f[0])||""),c=bb(hc(f[1])||""),e=f[2],d=c.d||{};delete c.d}catch(g)
{}return{wg:b,Ge:c,data:d,mg:e}}function Ec(a){a=Dc(a);var b=a.Ge;return!!a.mg&&
!!b&&"object"===typeof b&&b.hasOwnProperty("iat")}function Fc(a){a=Dc(a).Ge;retu
rn"object"===typeof a&&!0===y(a,"admin")};function Gc(a,b,c){this.type=Hc;this.s
ource=a;this.path=b;this.children=c}Gc.prototype.Jc=function(a){if(this.path.e()
)return a=this.children.sub
RangeError: Maximum call stack size exceeded
at RegExp.exec (native)
at RegExp.test (native)
at tc (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:63:86)
at ub (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:60:136)
at vb (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:43:1228)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:44)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:136)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:136)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:136)
at Xb.h.remove (<MY_PROJECT_PATH>\node_modules\firebase-admin\lib\database\database.js:52:136)
Even though you aren't processing it, the call to remove() is still async/promise-based and generates a context to run in. Promise contexts are fairly big, so it's no surprise you're running out of stack here. If you really need a pattern like this to work, you could batch the updates: have child_added insert the values into a "to be deleted" array, then process that array a batch of entries at a time, as a separate task, until it is empty (see the sketch below). There are plenty of helper methods for working with arrays and Promises in the BlueBird (http://bluebirdjs.com/) library that could help with this (e.g. map/mapSeries).
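Here is one way that batching idea could look; a sketch relying on the fact that a Firebase multi-path update with null values deletes the corresponding nodes (toDelete, BATCH_SIZE and the drain helpers are made-up names):

var toDelete = [];
var BATCH_SIZE = 100;
var draining = false;

ref.child('message-queue').on('child_added', function (snapshot) {
    toDelete.push(snapshot.key); // only record the key; no async work on this stack
    scheduleDrain();
});

function scheduleDrain() {
    if (draining) return;
    draining = true;
    setImmediate(drainBatch); // run as a separate task, off the event callback's stack
}

function drainBatch() {
    var updates = {};
    toDelete.splice(0, BATCH_SIZE).forEach(function (key) {
        updates['message-queue/' + key] = null; // null removes the node
    });
    ref.update(updates).then(function () {
        draining = false;
        if (toDelete.length) scheduleDrain(); // keep draining until the array is empty
    });
}

One ref.update() per batch replaces up to BATCH_SIZE individual remove() calls, so the per-call promise contexts never pile up on one stack.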
This isn't really a Firebase problem - every other VM (PHP, Java, etc.) has stack size limits to deal with as well. Like most others, V8's is tunable, and if you need to, you can query (and adjust) it using a command like:
node --v8-options | grep -B0 -A1 stack_size
But I believe your best approach is to structure your program to minimize your stack usage for this deletion pattern. Increasing stack size is always going to leave you open to the "is it big enough now?" question.
Minimize the memory allocated to the stack. For example, instead of a static array inside a function, use a dynamically allocated array.

fs.writeFile callback never gets called, same for WritableStream.write, etc

I am writing a small text file (~500 B) but, strangely, I get an empty file if I write using asynchronous methods such as fs.writeFile(..) (or a WritableStream's write/end methods).
This works:
var scanInfo = getScanInfo( core ); // returns several lines delimited by \r\n
fs.writeFileSync( filename, scanInfo, 'ascii' );
This creates an empty file, and the callback function never produces any output:
var scanInfo = getScanInfo( core );
scanInfo.push('');
scanInfo = scanInfo.join(DOS_CRLF);
fs.writeFile( filename, scanInfo, 'ascii', function ( err ) {
    if (err) { console.error('Failed'); console.error(err); }
    else { console.log('OK'); }
});
I looked for similar posts, but in the one I found the problem was something else (calling another function that returns the content), whereas my content is a text string (verified by debugging).
The similar post: fs.writeFile() doesn't return callback
Platform> Win8.1 x64
NodeJS> x64 0.12.0
P.S. The application using the function that actually writes the file was originally written in a "plain nodejs" style using callbacks, but as it got more complicated I rewrote the main processing flow using Q and Q-IO.
So now the processing starts like this (in the main module):
var qfs = require('q-io/fs');
...
qfs.read( configFile )
    .then( doSomeConfig )
    .then( function( config ) {
        var promise = qfs.read( config.inputFile, someOptions );
        return promise;
    })
    .then( processMyInputData /* (binaryData) returns {Core} */ )
    .then( writeMyOutputData  /* (core) returns {undefined} */ )
    .fail( reportSomeErrors   /* (reason) returns {undefined} */ )
    .done( reportFinished );
The point is that in this main flow the fail function never reports any problem either. reportFinished() reports that everything was OK, and there is no place to throw an exception, because the original snippet above (a function located in another module and called as part of writeMyOutputData(core)) never gets to call the callback, so no exception throwing or error processing is possible there.
However, after reading Joseph's comment that it works for him, I suspect there might be some interference between the standard fs module and q-io/fs.
OK, after careful debugging, the problem is identified. As Joseph mentioned, it is not related to fs.writeFile() at all.
In my application there are in fact two file writes running "concurrently". The one listed in my question and another one, writing data progressively as it calculates some averages.
The other, progressively writing function had a bug (a misspelled variable name), causing a ReferenceError to be thrown in the course of the action (in between successive writes). For some reason I do not quite understand, this exception did not appear anywhere in the chain. According to the Q documentation, Promise.done() should throw any unhandled exceptions, but this was not the case.
After I added several fail() handlers in the promise chain, I was able to locate the bug and achieve reasonable behavior of the whole application.
The error is therefore related to bad programming style (not handling exceptions properly) rather than to the fs module. Still, I find it hard to believe that an unhandled exception can get lost and never see daylight, and equally hard to believe that an exception in asynchronous operation B can affect an unrelated asynchronous operation A.
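For reference, a sketch of what the added handlers looked like, using the chain from above (the logging body is illustrative, not my exact code):

qfs.read( configFile )
    .then( doSomeConfig )
    .then( function( config ) {
        return qfs.read( config.inputFile, someOptions );
    })
    .then( processMyInputData )
    .fail( function( reason ) {
        // an intermediate handler like this one finally surfaced the ReferenceError
        console.error( reason.stack || reason );
        throw reason; // re-throw so the downstream handlers still see the failure
    })
    .then( writeMyOutputData )
    .fail( reportSomeErrors )
    .done( reportFinished );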
I had a similar issue with fs.stat.
The issue was that I was writing a grunt task, and the task didn't know it was asynchronous, so the synchronous code finished and simply terminated the application before the fs.stat callback could be called.
This is probably not your issue, but it might help others.
Making a grunt task can be done like this:
Wait async grunt task to finish
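A minimal sketch of such a task, assuming a standard Gruntfile.js (the task name and file path are made up):

module.exports = function (grunt) {
    grunt.registerTask('stat-example', function () {
        var done = this.async(); // tells grunt to wait until done() is called
        require('fs').stat('some-file.txt', function (err, stats) {
            if (err) {
                grunt.log.error(err.message);
                return done(false); // signal failure
            }
            grunt.log.writeln('size: ' + stats.size + ' bytes');
            done(); // signal completion so grunt does not exit early
        });
    });
};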

NodeJS fs API: Detect Asynchronous Completion

I have a NodeJS application which uses the fs API to read files from a directory tree. I'm using the fs-walk module to walk the tree. For every sub directory encountered, the same function executes again to handle it. (I don't think this is recursion; rather, the same function is bound to an event which is fired each time a directory is handled.) Files are handled by a different function, which does stuff to them.
I'd like to execute arbitrary code once all files have been read, without using synchronous or blocking code. I couldn't find any way to keep track of the number of files in a directory (to count down, for instance), nor could I find any attribute in fs.stat indicating that the entire operation has completed.
Has anyone found a way to do this yet? I could find nothing in the node docs or on Stack Overflow.
After reviewing the fs-walk library a little more closely, it looks like the third argument to the walk() method is actually a final callback. Internally they use the async library, specifically the async.whilst() and async.waterfall() methods, which execute the final callback when everything is complete.
I think the library creator's intention is for that final callback to run once all async actions have completed. If that isn't working, you may want to file an issue on GitHub for it:
According to the code, you should be able to do:
var walk = require('fs-walk');

walk('/some/dir', someFileOrDirHandler, function(err) {
    // This is the final callback; if the first argument is present,
    // then there was an error
    if (err) {
        /* handle it */
        return;
    }
    // Getting here indicates success
});
As a compromise in performance, I ended up doing a total file count first, using a recursive function that accesses the file system synchronously. Using the total, I then access all the files asynchronously, decrementing the total each time. Once the total reaches zero, I execute a function that handles all of the completed data.
var fs = require('fs'),
    walk = require('fs-walk');

var countAllFiles = new Promise(function (resolve, reject) {
    var total = 0,
        count = function (path) {
            var contents = fs.readdirSync(path), file, name;
            for (file in contents) {
                if (!contents.hasOwnProperty(file)) continue;
                name = path + '/' + contents[file];
                if (fs.statSync(name).isDirectory())
                    count(name);
                else
                    ++total;
            }
        };
    count('/path/to/tree/');
    resolve(total);
}).then(function (total) {
    walk.dirs('/path/to/tree/', handlerFunction, errorHandler);
    // for every file, decrement total. Then, if it's zero, execute the code that
    // depends on all the read/write operations being complete
});
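The comment in the then() handler above hints at the missing piece; here is a minimal sketch of how the decrement could look, assuming fs-walk's (basedir, filename, stat, next) file-handler signature (makeHandler and onAllDone are made-up names):

function makeHandler(total, onAllDone) {
    var remaining = total;
    return function (basedir, filename, stat, next) {
        // asynchronous per-file work goes here
        fs.readFile(basedir + '/' + filename, function (err, data) {
            // ...use data / handle err...
            if (--remaining === 0) onAllDone(); // every file has now been processed
        });
        next(); // let the walk continue without waiting for this file
    };
}

// usage with the total resolved by countAllFiles:
// walk.files('/path/to/tree/', makeHandler(total, finishedFn), errorHandler);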
