Bytes sent/received for Node.js HTTP request - node.js

Once an HTTP request has been served, I would like to log the number of bytes sent/received.
A simple source for this data is req.connection.bytesRead/.bytesWritten. However, this is problematic for HTTP 1.1 keep-alive connections, as the same socket can be used for multiple requests. I need to log per-request, not per-connection.
The solution must lie on the HTTP side of things, but I see no methods documented for getting the data I need.
What is the proper way to calculate bytes read/written for HTTP requests served by Node.js's http.Server?

Unfortunately, I never found a proper way to do this. I've resorted to some fairly terrible duck punching, but it works for my particular use case. In case anyone else stumbles upon this problem, you can start with this and refine from there.
Module #1: "Extra Events"
All this module does is make the response object emit a finishBeforeSocketDestroy event. Since I needed this event in a few places in my application, I effectively made a separate module just for this duck punch. app.use() it before Module #2.
module.exports = function (req, res, next) {
var end = res.end;
res.end = function () {
res.end = end;
res.emit('finishBeforeSocketDestroy');
res.end.apply(this, arguments);
}
next();
}
Module #2: "Stats"
This module creates a req.stats object, containing all sorts of useful goodies for tracking bandwidth usage during usage of the connection, and after it is finished.
var pollTime = 1000;
module.exports = function (req, res, next) {
var pollInterval;
function pollStats () {
if (typeof req.stats._lastMeasuredTime === 'object') {
var secondsSinceLastMeasurement = ((new Date() - req.stats._lastMeasuredTime) / 1000);
req.stats.averageRate = {
read: (req.socket.bytesRead - req.stats.bytesRead) / secondsSinceLastMeasurement,
write: (req.socket.bytesWritten - req.stats.bytesWritten) / secondsSinceLastMeasurement
};
}
req.stats._lastMeasuredTime = new Date();
req.stats.bytesRead = req.socket.bytesRead;
req.stats.bytesWritten = req.socket.bytesWritten;
}
req.stats = {
startTime: new Date(),
endTime: null,
averageRate: {read: null, write: null},
bytesRead: req.socket.bytesRead,
bytesWritten: req.socket.bytesWritten,
_lastMeasuredTime: new Date()
};
pollInterval = setInterval(pollStats, pollTime);
res.on('finishBeforeSocketDestroy', function () {
clearInterval(pollInterval);
pollStats();
req.stats.endTime = new Date();
});
next();
}
Like I said... messy. I'm only posting it as duck punching may be your only option. Also beware that socket may get re-used for multiple HTTP requests, which could cause you to double-count some bytes if you're not careful.

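Just store the traffic value after each response and calculate the difference in the 'finish' or 'end' handler. Keep the stored value on the socket rather than on req, because a new req object is created for every request on a keep-alive connection:
// server.onRequest:
...
req.socket._prevBytesWritten = req.socket._prevBytesWritten || 0;
// response.onFinish/onEnd:
...
responseLen = req.socket.bytesWritten - req.socket._prevBytesWritten;
req.socket._prevBytesWritten = req.socket.bytesWritten;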
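A minimal sketch of the delta approach, keeping the previous counter values per socket in a WeakMap so that nothing has to be attached to the socket object itself. The helper names markRequestStart and takeRequestDelta are made up for illustration; only socket.bytesRead and socket.bytesWritten come from Node. It assumes requests on one socket are handled one after another (no pipelining), otherwise bytes from neighbouring requests can bleed into each other.

```javascript
// Hypothetical helpers: per-request byte counts on a shared keep-alive socket.
const baselines = new WeakMap();

// Call when a request arrives. The first request on a socket starts from zero.
function markRequestStart(socket) {
  if (!baselines.has(socket)) {
    baselines.set(socket, { read: 0, written: 0 });
  }
}

// Call when the response has finished. Returns the bytes moved since the
// previous call for this socket and advances the baseline.
function takeRequestDelta(socket) {
  const base = baselines.get(socket) || { read: 0, written: 0 };
  const delta = {
    bytesRead: socket.bytesRead - base.read,
    bytesWritten: socket.bytesWritten - base.written,
  };
  baselines.set(socket, {
    read: socket.bytesRead,
    written: socket.bytesWritten,
  });
  return delta;
}
```

In a server you would call markRequestStart(req.socket) at the top of the request handler and takeRequestDelta(req.socket) inside res.on('finish', ...).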


Express.js - while loop before sending response

I'm trying to implement an existing solution in node.js, specifically, using the express.js framework. Now, the existing solution works as follows:
server exposes a GET service that clients can connect to
when a client calls the GET service, the client number increments (a global variable) and then the number of clients is checked;
if there are not at least 3 clients connected, the service is in endless loop, waiting for other clients to connect
if (or rather, when) the remaining two clients connect, the service sends a response to everyone that enough clients are connected (a 'true' value).
So what basically happens is, the client connects and the connection is active (in a loop) until enough clients connect, then and only then there is a response (to all clients at the same time).
Now I'm not expert in these architectures, but from what I think, this is not a correct or good solution. My initial thought was: this must be solved with sockets. However, since the existing solution works like that (it's not written in node.js), I tried to emulate such behaviour:
var number = (function(){
var count = 0;
return {
increase: function() {
count++;
},
get: function(){
return count;
}
};
})();
app.get('/test', function(req, res){
number.increase();
while (number.get() < 3) {
//hold it here, until enough clients connect
}
res.json(number.get());
});
Now while I think that this is not a correct solution, I have a couple of questions:
Is there any alternative to solving this issue, besides using sockets?
Why does this "logic" work in C#, but not in express.js? The code above hangs, no other request is processed.
I know node.js is single-threaded, but what if we have a more conventional service that responds immediately, and there are 20 requests all at the same time?
I would probably use an event emitter for this:
var EventEmitter = require('events').EventEmitter;
var emitter = new EventEmitter();
app.get('/', function(req, res) {
// Increase the number
number.increase();
// Get the current value
var current = number.get();
// If it's less than 3, wait for the event emitter to trigger.
if (current < 3) {
return emitter.once('got3', function() {
return res.json(number.get());
});
}
// If it's exactly 3, emit the event so we wake up other listeners.
if (current === 3) {
emitter.emit('got3');
}
// Fall through.
return res.json(current);
});
I would like to stress that @Plato is correct in stating that browsers may time out when a response takes too much time to complete.
EDIT: as an aside, some explanation on the return emitter.once(...).
The code above can be rewritten like so:
if (current < 3) {
emitter.once('got3', function() {
res.json(number.get());
});
} else if (current === 3) {
emitter.emit('got3');
res.json(number.get());
} else {
res.json(number.get());
}
But instead of using those if/else statements, I return from the request handler after creating the event listener. Since request handlers are asynchronous, their return value is discarded, so you can return anything (or nothing). As an alternative, I could also have used this:
if (current < 3) {
emitter.once(...);
return;
}
if (current === 3) {
...etc...
Also, even though you return from the request handler function, the event listener is still referencing the res variable, so the request handler scope is maintained by Node until res.json() in the event listener callback is called.
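To make the event-emitter idea easier to play with outside Express, here is a small sketch that wraps it in a reusable function. createBarrier and arrive are names invented for this example; the 'ready' event plays the role of 'got3' above. Each caller passes a callback instead of a res object.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: callers wait until `required` of them have arrived.
function createBarrier(required) {
  const emitter = new EventEmitter();
  let count = 0;

  return function arrive(onReady) {
    count++;
    if (count < required) {
      // Not enough callers yet: park this one until 'ready' fires.
      emitter.once('ready', function () {
        onReady(count);
      });
      return;
    }
    if (count === required) {
      // This caller completes the group: wake up everyone who is parked.
      emitter.emit('ready');
    }
    // The completing caller and any later ones are answered immediately.
    onReady(count);
  };
}
```

In a route handler this would look like arrive(function (n) { res.json(n); }). Nothing blocks the event loop while callers are parked.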
Your http approach should work
You are blocking the event loop so node refuses to do any other work while it is in the while loop
You're really close; you just need to check every now and then instead of constantly. I do this below with setImmediate(), but setTimeout() would also work:
var number = (function(){
var count = 0;
return {
increase: function() {
count++;
},
get: function(){
return count;
}
};
})();
function waitFor3(callback){
var n = number.get();
if(n < 3){
setImmediate(function(){
waitFor3(callback)
})
} else {
callback(n)
}
}
function bump(){
number.increase();
console.log('waiting');
waitFor3(function(){
console.log('done');
})
}
setInterval(bump, 2000);
/*
app.get('/test', function(req, res){
number.increase();
waitFor3(function(){
res.json(number.get());
})
});
*/

How to use filesystem's createReadStream with Meteor router(NodeJS)

I need to allow the user of my app to download a file with Meteor. Currently what I do is when the user requests to download a file I enter into a "fileRequests" collection in Mongo a document with the file location and a timestamp of the request and return the ID of the newly created request. When the client gets the new ID it immediately goes to mydomain.com/uploads/:id. I then use something like this to intercept the request before Meteor does:
var connect = Npm.require("connect");
var Fiber = Npm.require("fibers");
var path = Npm.require('path');
var fs = Npm.require("fs");
var mime = Npm.require("mime");
__meteor_bootstrap__.app
.use(connect.query())
.use(connect.bodyParser()) //I add this for file-uploading
.use(function (req, res, next) {
Fiber(function() {
if(req.method == "GET") {
// get the id here, and stream the file using fs.createReadStream();
}
next();
}).run();
});
I check to make sure the file request was made less than 5 seconds ago, and I immediately delete the request document after I've queried it.
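For readers who want to see the "short-lived, single-use request" check in isolation, here is a rough in-memory sketch. It is not the Mongo-backed code from the question: createRequestStore, add and redeem are invented names, a Map stands in for the "fileRequests" collection, and the clock is injectable only to make the behaviour easy to check.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for the "fileRequests" collection.
function createRequestStore(maxAgeMs, now = Date.now) {
  const requests = new Map();
  return {
    // Record a download request and hand back a hard-to-guess id.
    add(filePath) {
      const id = crypto.randomBytes(16).toString('hex');
      requests.set(id, { filePath, createdAt: now() });
      return id;
    },
    // Look the request up once. It is deleted whether or not it is still valid.
    redeem(id) {
      const entry = requests.get(id);
      requests.delete(id);
      if (!entry || now() - entry.createdAt >= maxAgeMs) {
        return null;
      }
      return entry.filePath;
    },
  };
}
```

With createRequestStore(5000), redeem(id) returns the file path only for the first lookup made within 5 seconds of add().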
This works, and is secure (enough) I think. No one can make a request without being logged in and 5 seconds is a pretty small window for someone to be able to hijack the created request URL, but I just don't feel right with my solution. It feels dirty!
So I attempted to use Meteor-Router to accomplish the same thing. That way I can check if they're logged in correctly without doing the 5 second open to the world trickery.
So here's the code I wrote for that:
Meteor.Router.add('/uploads/:id', function(id) {
var path = Npm.require('path');
var fs = Npm.require("fs");
var mime = Npm.require("mime");
var res = this.response;
var file = FileSystem.findOne({ _id: id });
if(typeof file !== "undefined") {
var filename = path.basename(file.filePath);
var filePath = '/var/MeteorDMS/uploads/' + filename;
var stat = fs.statSync(filePath);
res.setHeader('Content-Disposition', 'attachment; filename=' + filename);
res.setHeader('Content-Type', mime.lookup(filePath));
res.setHeader('Content-Length', stat.size);
var filestream = fs.createReadStream(filePath);
filestream.pipe(res);
return;
}
});
This looks great, fits right in with the rest of the code and is easy to read, no hacking involved, BUT! It doesn't work! The browser spins and spins and never quite knows what to do. I have ZERO error messages coming up. I can keep using the app on other tabs. I don't know what it's doing, it never stops "loading". If I restart the server, I get a 0 byte file with all the correct headers, but I don't get the data.
Any help is greatly appreciated!!
EDIT:
After digging around a bit more, I noticed that trying to turn the response object into a JSON object results in a circular structure error.
Now the interesting thing about this is that when I listen to the filestream for the "data" event, and attempt to stringify the response object I don't get that error. But if I attempt to do the same thing in my first solution(listen to "data" and stringify the response) I get the error again.
So using the Meteor-Router solution something is happening to the response object. I also noticed that on the "data" event response.finished is flagged as true.
filestream.on('data', function(data) {
fs.writeFile('/var/MeteorDMS/afterData', JSON.stringify(res));
});
The Meteor router installs a middleware to do the routing. All Connect middleware either MUST call next() (exactly once) to indicate that the response is not yet settled or MUST settle the response by calling res.end() or by piping to the response. It is not allowed to do both.
I studied the source code of the middleware (see below). We see that we can return false to tell the middleware to call next(). This means we declare that this route did not settle the response and we would like to let other middleware do their work.
Or we can return a template name, a text, an array [status, text] or an array [status, headers, text], and the middleware will settle the response on our behalf by calling res.end() using the data we returned.
However, by piping to the response, we already settled the response. The Meteor router should not call next() nor res.end().
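The "call next() or settle the response, never both" rule can be illustrated with a plain Connect-style middleware. This is only a sketch: createDownloadMiddleware and its lookup argument are hypothetical and have nothing to do with the Meteor router's internals.

```javascript
// Hypothetical middleware factory. `lookup(url)` returns a body string for
// URLs this middleware owns and undefined for everything else.
function createDownloadMiddleware(lookup) {
  return function downloadMiddleware(req, res, next) {
    const body = lookup(req.url);
    if (body === undefined) {
      // Not our route: leave the response untouched and hand over control.
      return next();
    }
    // Our route: settle the response here and do NOT call next().
    res.setHeader('Content-Length', Buffer.byteLength(body));
    res.end(body);
  };
}
```

Piping a file stream into res counts as settling the response in the same way as res.end(), so a handler that pipes must take the second branch and leave next() alone.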
We solved the problem by forking the Meteor router and making a small change. We replaced the else in line 87 (after if (output === false)) by:
else if (typeof(output)!="undefined") {
See the commit with sha 8d8fc23d9c in my fork.
This way return; in the route method will tell the router to do nothing. Of course you already settled the response by piping to it.
Source code of the middleware as in the commit with sha f910a090ae:
// hook up the serving
__meteor_bootstrap__.app
.use(connect.query()) // <- XXX: we can probably assume accounts did this
.use(this._config.requestParser(this._config.bodyParser))
.use(function(req, res, next) {
// need to wrap in a fiber in case they do something async
// (e.g. in the database)
if(typeof(Fiber)=="undefined") Fiber = Npm.require('fibers');
Fiber(function() {
var output = Meteor.Router.match(req, res);
if (output === false) {
return next();
} else {
// parse out the various type of response we can have
// array can be
// [content], [status, content], [status, headers, content]
if (_.isArray(output)) {
// copy the array so we aren't actually modifying it!
output = output.slice(0);
if (output.length === 3) {
var headers = output.splice(1, 1)[0];
_.each(headers, function(value, key) {
res.setHeader(key, value);
});
}
if (output.length === 2) {
res.statusCode = output.shift();
}
output = output[0];
}
if (_.isNumber(output)) {
res.statusCode = output;
output = '';
}
return res.end(output);
}
}).run();
});

fs.watch fired twice when I change the watched file

fs.watch( 'example.xml', function ( curr, prev ) {
// on file change we can read the new xml
fs.readFile( 'example.xml','utf8', function ( err, data ) {
if ( err ) throw err;
console.dir(data);
console.log('Done');
});
});
OUTPUT:
some data
Done X 1
some data
Done X 2
Is it my usage fault or ..?
The fs.watch api:
is unstable
has known "behaviour" with regards repeated notifications. Specifically, the windows case being a result of windows design, where a single file modification can be multiple calls to the windows API
I make allowance for this by doing the following:
var fsTimeout
fs.watch('file.js', function(e) {
if (!fsTimeout) {
console.log('file.js %s event', e)
fsTimeout = setTimeout(function() { fsTimeout=null }, 5000) // give 5 seconds for multiple events
}
})
I suggest to work with chokidar (https://github.com/paulmillr/chokidar) which is much better than fs.watch:
Quoting its README.md:
Node.js fs.watch:
Doesn't report filenames on OS X.
Doesn't report events at all when using editors like Sublime on OS X.
Often reports events twice.
Emits most changes as rename.
Has a lot of other issues
Does not provide an easy way to recursively watch file trees.
Node.js fs.watchFile:
Almost as bad at event handling.
Also does not provide any recursive watching.
Results in high CPU utilization.
If you need to watch your file for changes then you can check out my small library on-file-change. It checks file sha1 hash between fired change events.
Explanation of why we have multiple fired events:
You may notice in certain situations that a single creation event generates multiple Created events that are handled by your component. For example, if you use a FileSystemWatcher component to monitor the creation of new files in a directory, and then test it by using Notepad to create a file, you may see two Created events generated even though only a single file was created. This is because Notepad performs multiple file system actions during the writing process. Notepad writes to the disk in batches that create the content of the file and then the file attributes. Other applications may perform in the same manner. Because FileSystemWatcher monitors the operating system activities, all events that these applications fire will be picked up.
Source
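The hash-comparison idea mentioned above can be sketched in a few lines without any library. This is not the code of on-file-change, just an illustration of the technique; createChangeDetector and hasChanged are names made up for this example.

```javascript
const crypto = require('crypto');

// Hypothetical helper: remembers the SHA-1 of the last content it saw and
// reports whether new content differs from it.
function createChangeDetector() {
  let lastHash = null;
  return function hasChanged(content) {
    const hash = crypto.createHash('sha1').update(content).digest('hex');
    if (hash === lastHash) {
      return false; // duplicate event, same bytes as last time
    }
    lastHash = hash;
    return true;
  };
}
```

Inside an fs.watch callback you would read the file and only act when hasChanged(data) returns true. Hashing costs a full read of the file per event, so this suits small files best.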
My custom solution
I personally like using return to prevent a block of code to run when checking something, so, here is my method:
var watching = false;
fs.watch('./file.txt', () => {
if(watching) return;
watching = true;
// do something
// the timeout is to prevent the script to run twice with short functions
// the delay can be longer to disable the function for a set time
setTimeout(() => {
watching = false;
}, 100);
});
Feel free to use this example to simplify your code. It may NOT be better than using a module from others, but it works pretty well!
Similar/same problem. I needed to do some stuff with images when they were added to a directory. Here's how I dealt with the double firing:
var fs = require('fs');
var active = false; // true while an event is being handled
fs.watch('directory', function (event, filename) {
if (filename && event == 'change' && active == false) {
active = true;
//do stuff to the new file added
active = false;
}
});
It will ignore the second firing until it finishes what it has to do with the new file.
I'm dealing with this issue for the first time, so all of the answers so far are probably better than my solution, however none of them were 100% suitable for my case so I came up with something slightly different – I used a XOR operation to flip an integer between 0 and 1, effectively keeping track of and ignoring every second event on the file:
var targetFile = "./watchThis.txt";
var flippyBit = 0;
fs.watch(targetFile, {persistent: true}, function(event, filename) {
if (event == 'change'){
if (!flippyBit) {
fs.readFile(targetFile, "utf8", function(error, data) {
gotUpdate(data);
})
} else {
console.log("Doing nothing thanks to flippybit.");
}
flipBit(); // call flipBit() function
}
});
// Whatever we want to do when we see a change
function gotUpdate(data) {
console.log("Got some fresh data:");
console.log(data);
}
// Toggling this gives us the "every second update" functionality
function flipBit() {
flippyBit = flippyBit ^ 1;
}
I didn't want to use a time-related function (like jwymanm's answer) because the file I'm watching could hypothetically get legitimate updates very frequently. And I didn't want to use a list of watched files like Erik P suggests, because I'm only watching one file. Jan Święcki's solution seemed like overkill, as I'm working on extremely short and simple files in a low-power environment. Lastly, Bernado's answer made me a little nervous – it would only ignore the second update if it arrived before I'd finished processing the first, and I can't handle that kind of uncertainty. If anyone were to find themselves in this very specific scenario, there might be some merit to the approach I used? If there's anything massively wrong with it please do let me know/edit this answer, but so far it seems to work well?
NOTE: Obviously this strongly assumes that you'll get exactly 2 events per real change. I carefully tested this assumption, obviously, and learned its limitations. So far I've confirmed that:
Modifying a file in Atom editor and saving triggers 2 updates
touch triggers 2 updates
Output redirection via > (overwriting file contents) triggers 2 updates
Appending via >> sometimes triggers 1 update!*
I can think of perfectly good reasons for the differing behaviours but we don't need to know why something is happening to plan for it – I just wanted to stress that you'll want to check for yourself in your own environment and in the context of your own use cases (duh) and not trust a self-confessed idiot on the internet. That being said, with precautions taken I haven't had any weirdness so far.
* Full disclosure, I don't actually know why this is happening, but we're already dealing with unpredictable behaviour with the watch() function so what's a little more uncertainty? For anyone following along at home, more rapid appends to a file seem to cause it to stop double-updating but honestly, I don't really know, and I'm comfortable with the behaviour of this solution in the actual case it'll be used, which is a one-line file that will be updated (contents replaced) like twice per second at the fastest.
The first event is 'change' and the second is 'rename'.
We can tell them apart in the listener function:
function(event, filename) {
}
The listener callback gets two arguments (event, filename). event is either 'rename' or 'change', and filename is the name of the file which triggered the event.
// mv sourcefile targetfile
fs.watch(sourcefile_dir, function (event, targetfile) {
console.log(targetfile, 'is', event)
})
When sourcefile is renamed to targetfile, three events are in fact fired:
null is rename // sourcefile no longer exists
targetfile is rename
targetfile is change
Note that if you want to catch all three events, you have to watch the directory containing sourcefile.
I sometimes get multiple registrations of the watch event, causing it to fire several times.
I solved it by keeping a list of watched files and not registering the event again if the file is already in the list:
var watchlist = {};
function initwatch(fn, callback) {
if (!watchlist[fn]) { // register each file only once
watchlist[fn] = true;
fs.watch(fn).on('change', callback);
}
}
......
Like the other answers say, this causes a lot of trouble, but I deal with it this way:
var folder = "/folder/path/";
var active = true; // flag control
fs.watch(folder, function (event, filename) {
if(event === 'rename' && active) { //you can remove this "check" event
active = false;
// ... its just an example
for (var i = 0; i < 100; i++) {
console.log(i);
}
// ... other stuffs and delete the file
if(!active){
try {
fs.unlinkSync(folder + filename);
} catch(err) {
console.log(err);
}
active = true
}
}
});
Hope this helps...
Easiest solution:
const watch = (path, opt, fn) => {
var lock = false
fs.watch(path, opt, function () {
if (!lock) {
lock = true
fn()
setTimeout(() => lock = false, 1000)
}
})
}
watch('/path', { interval: 500 }, function () {
// ...
})
I was downloading a file with puppeteer and, once the file was saved, sending automatic emails. Because of the problem above, I noticed I was sending 2 emails. I solved it by stopping my application with process.exit() and restarting it automatically with pm2. Using flags in the code didn't save me.
If anyone has this problem in the future, this solution is an option as well: exit the program and let a monitoring tool restart it automatically.
Here's my simple solution. It works well every time.
// Update obj as file updates
obj = JSON.parse(fs.readFileSync('./file.json', 'utf-8'));
fs.watch('./file.json', () => {
const data = JSON.parse(fs.readFileSync('./file.json', 'utf-8') || '{}');
if(Object.entries(data).length > 0) { // This checks fs.watch() isn't false-firing
obj = data;
console.log('File actually changed: ', obj)
}
});
I came across the same issue. If you don't want to trigger multiple times, you can use a debounce function.
fs.watch( 'example.xml', _.debounce(function ( curr, prev ) {
// on file change we can read the new xml
fs.readFile( 'example.xml','utf8', function ( err, data ) {
if ( err ) throw err;
console.dir(data);
console.log('Done');
});
}, 100));
Debouncing The Observer
A solution I arrived at was that (a) there needs to be a workaround for the problem in question and, (b), there needs to be a solution to ensure multiple rapid Ctrl+s actions do not cause Race Conditions. Here's what I have...
./**/utilities.js (somewhere)
export default {
...
debounce(fn, delay) { // #thxRemySharp https://remysharp.com/2010/07/21/throttling-function-calls/
var timer = null;
return function execute(...args) {
var context = this;
clearTimeout(timer);
timer = setTimeout(fn.bind(context, ...args), delay);
};
},
...
};
./**/file.js (elsewhere)
import utilities from './**/utilities.js'; // somewhere
...
function watch(server) {
const debounced = utilities.debounce(observeFilesystem.bind(this, server), 1000 * 0.25);
const observers = new Set()
.add( fs.watch('./src', debounced) )
.add( fs.watch('./index.html', debounced) )
;
console.log(`watching... (${observers.size})`);
return observers;
}
function observeFilesystem(server, type, filename) {
if (!filename) console.warn(`Tranfer Dev Therver: filesystem observation made without filename for type ${type}`);
console.log(`Filesystem event occurred:`, type, filename);
server.close(handleClose);
}
...
This way, the observation handler that we pass into fs.watch is [in this case a bound function] which gets debounced if multiple calls are made less than 1000 * 0.25 milliseconds (250 ms) apart from one another.
It may be worth noting that I have also devised a pipeline of Promises to help avoid other types of Race Conditions as the code also leverages other callbacks. Please also note the attribution to Remy Sharp whose debounce function has repeatedly proven very useful over the years.
watcher = fs.watch( 'example.xml', function ( curr, prev ) {
watcher.close();
fs.readFile( 'example.xml','utf8', function ( err, data ) {
if ( err ) throw err;
console.dir(data);
console.log('Done');
});
});
I had a similar problem, but I was also reading the file in the callback, which caused a loop.
This is where I found how to close watcher:
How to close fs.watch listener for a folder
NodeJS does not fire multiple events for a single change, it is the editor you are using updating the file multiple times.
Editors use stream API for efficiency, they read and write data in chunks which causes multiple updates depending on the chunks size and the amount of content. Here is a snippet to test if fs.watch fires multiple events:
const http = require('http');
const fs = require('fs');
const path = require('path');
const host = 'localhost';
const port = 3000;
const file = path.join(__dirname, 'config.json');
const requestListener = function (req, res) {
const data = new Date().toString();
fs.writeFileSync(file, data, { encoding: 'utf-8' });
res.end(data);
};
const server = http.createServer(requestListener);
server.listen(port, host, () => {
fs.watch(file, (eventType, filename) => {
console.log({ eventType });
});
console.log(`Server is running on http://${host}:${port}`);
});
I believe a simple solution would be checking for the last modified timestamp:
const { stat } = require('fs/promises'); // promise-based stat used below
let lastModified;
fs.watch(file, (eventType, filename) => {
stat(file).then(({ mtimeMs }) => {
if (lastModified !== mtimeMs) {
lastModified = mtimeMs;
console.log({ eventType, filename });
}
});
});
Please note that you need to use either all-sync or all-async methods, otherwise you will have issues.
Update the file in an editor and you will see that only a single event is logged:
const http = require('http');
const host = 'localhost';
const port = 3000;
const fs = require('fs');
const path = require('path');
const file = path.join(__dirname, 'config.json');
let lastModified;
const requestListener = function (req, res) {
const data = Date.now().toString();
fs.writeFileSync(file, data, { encoding: 'utf-8' });
lastModified = fs.statSync(file).mtimeMs;
res.end(data);
};
const server = http.createServer(requestListener);
server.listen(port, host, () => {
fs.watch(file, (eventType, filename) => {
const mtimeMs = fs.statSync(file).mtimeMs;
if (lastModified !== mtimeMs) {
lastModified = mtimeMs;
console.log({ eventType });
}
});
console.log(`Server is running on http://${host}:${port}`);
});
Few notes on the alternative solutions: Storing files for comparison will be memory inefficient especially if you have large files, taking file hashes will be expensive, custom flags are hard to keep track of, especially if you are going to detect changes made by other applications, and lastly unsubscribing and re-subscribing requires unnecessary juggling.
If you don't need an instant result, you can use setTimeout to debounce successive events:
let timeoutId;
fs.watch(file, (eventType, filename) => {
clearTimeout(timeoutId);
timeoutId = setTimeout(() => {
console.log({ eventType });
}, 100);
});

http.ClientRequest.abort() ends program

I'm having some issues using Node.js as a http client against an existing long polling server. I'm using 'http' and 'events' as requires.
I've created a wrapper object that contains the logic for handling the http.clientrequest. Here's a simplified version of the code. It works exactly as expected. When I call EndMe it aborts the request as anticipated.
var http = require('http');
var events = require('events');
function lpTest(urlHost,urlPath){
this.options = {
host: urlHost,
port: 80,
path: urlPath,
method: 'GET'
};
var req = {};
events.EventEmitter.call(this);
}
lpTest.super_ = events.EventEmitter;
lpTest.prototype = Object.create(events.EventEmitter.prototype, {
constructor: {
value: lpTest,
enumerable: false
}
});
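lpTest.prototype.getData = function getData(){
var self = this; // keep a reference for use inside the callbacks
this.req = http.request(this.options, function(res){
var httpData = "";
res.on('data', function(chunk){
httpData += chunk;
});
res.on('end', function(){
self.emit('res_complete', httpData);
});
});
this.req.end(); // the request is not sent until end() is called
};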
lpTest.prototype.EndMe = function EndMe(){
this.req.abort();
}
module.exports = lpTest;
Now I want to create a bunch of these objects and use them to long poll a bunch of URL's. So I create an object to contain them all, generate each object individually, initiate it, then store it in my containing object. This works a treat, all of the stored long-polling objects fire events and return the data as expected.
var lpObject = require('./lpTest.js');
var objWatchers = {};
function DoSomething(hostURL, hostPath){
var tempLP = new lpObject(hostURL,hostPath);
tempLP.on('res_complete', function(httpData){
console.log(httpData);
this.getData();
});
objWatchers[hostURL + hostPath] = tempLP;
}
DoSomething('firsturl.com','firstpath');
DoSomething('secondurl.com','secondpath');
objWatchers['firsturl.com' + 'firstpath'].getData();
objWatchers['secondurl.com' + 'secondpath'].getData();
Now here's where it fails... I want to be able to stop a long-polling object while leaving the rest going. So naturally I try adding:
objWatchers['firsturl.com' + 'firstpath'].EndMe();
But this causes the entire node execution to cease and return me to the command line. All of the remaining long-polling objects, that are happily doing what they're supposed to do, suddenly stop.
Any ideas?
Could it have something to do with the fact that you are only calling getData() when the data is being returned?
Fixed code:
function DoSomething(hostURL, hostPath){
var tempLP = new lpObject(hostURL,hostPath);
tempLP.on('res_complete', function(httpData){
console.log(httpData);
});
tempLP.getData();
objWatchers[hostURL + hostPath] = tempLP;
}
I have seemingly solved this, although I'm not entirely happy with how it works:
var timeout = setTimeout(function(){
objWatchers['firsturl.com' + 'firstpath'].EndMe();
}, 100);
By calling the closing function on the object after a delay I seem to be able to preserve the program execution. Not exactly ideal, but I'll take it! If anyone can offer a better method please feel free to let me know :)

node.js process out of memory in http.request loop

In my node.js server I can't figure out why it runs out of memory. My node.js server makes a remote HTTP request for each HTTP request it receives, so I've tried to replicate the problem with the sample script below, which also runs out of memory.
This only happens if the iterations in the for loop are very high.
From my point of view, the problem is related to the fact that node.js is queueing the remote http requests. How to avoid this?
This is the sample script:
(function() {
var http, i, mypost, post_data;
http = require('http');
post_data = 'signature=XXX%7CPSFA%7Cxxxxx_value%7CMyclass%7CMysubclass%7CMxxxxx&schedule=schedule_name_6569&company=XXXX';
mypost = function(post_data, cb) {
var post_options, req;
post_options = {
host: 'myhost.com',
port: 8000,
path: '/set_xxxx',
method: 'POST',
headers: {
'Content-Length': post_data.length
}
};
req = http.request(post_options, function(res) {
var res_data;
res.setEncoding('utf-8');
res_data = '';
res.on('data', function(chunk) {
return res_data += chunk;
});
return res.on('end', function() {
return cb();
});
});
req.on('error', function(e) {
return console.debug('TM problem with request: ' + e.message);
});
req.write(post_data);
return req.end;
};
for (i = 1; i <= 1000000; i++) {
mypost(post_data, function() {});
}
}).call(this);
$ node -v
v0.4.9
$ node sample.js
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
Tks in advance
gulden PT
Constraining the flow of requests into the server
It's possible to prevent overload of the built-in Server and its HTTP/HTTPS variants by setting the maxConnections property on the instance. Setting this property will cause node to stop accept()ing connections and force the operating system to drop requests when the listen() backlog is full and the application is already handling maxConnections requests.
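For reference, setting the limit is a one-liner on the server instance. A minimal sketch follows; the value 100 and the port in the comment are arbitrary assumptions, not recommendations.

```javascript
const http = require('http');

const server = http.createServer(function (req, res) {
  res.end('ok');
});

// Cap the number of simultaneously open connections (arbitrary example value).
server.maxConnections = 100;

// server.listen(8000); // start listening once the limit is in place
```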
Throttling outgoing requests
Sometimes, it's necessary to throttle outgoing requests, as in the example script from the question.
Using node directly or using a generic pool
As the question demonstrates, unchecked use of the node network subsystem directly can result in out of memory errors. A library such as node-pool makes active pool management attractive, but it doesn't solve the fundamental problem of unconstrained queuing. The reason is that node-pool gives the caller no feedback about the state of the client pool, so the caller cannot tell when to stop submitting work.
UPDATE: As of v1.0.7 node-pool includes a patch inspired by this post to add a boolean return value to acquire(). The code in the following section is no longer necessary and the example with the streams pattern is working code with node-pool.
Cracking open the abstraction
As demonstrated by Andrey Sidorov, a solution can be reached by tracking the queue size explicitly and mingling the queuing code with the requesting code:
var useExplicitThrottling = function () {
  var active = 0
  var remaining = 10
  var queueRequests = function () {
    // never keep more than 2 requests in flight
    while(active < 2 && --remaining >= 0) {
      active++;
      pool.acquire(function (err, client) {
        if (err) {
          console.log("Error acquiring from pool")
          if (--active < 2) queueRequests()
          return
        }
        console.log("Handling request with client " + client)
        setTimeout(function () {
          pool.release(client)
          if(--active < 2) {
            queueRequests()
          }
        }, 1000)
      })
    }
  }
  queueRequests()
  console.log("Finished!")
}
Borrowing the streams pattern
The streams pattern is a solution which is idiomatic in node. Streams have a write operation which returns false when the stream cannot buffer more data. The same pattern can be applied to a pool object with acquire() returning false when the maximum number of clients have been acquired. A drain event is emitted when the number of active clients drops below the maximum. The pool abstraction is closed again and it's possible to omit explicit references to the pool size.
var useStreams = function () {
  var queueRequests = function (remaining) {
    var full = false
    pool.once('drain', function() {
      // remaining ends at -1 when the loop below runs dry, so test for > 0
      if (remaining > 0) queueRequests(remaining)
    })
    while(!full && --remaining >= 0) {
      console.log("Sending request...")
      full = !pool.acquire(function (err, client) {
        if (err) {
          console.log("Error acquiring from pool")
          return
        }
        console.log("Handling request with client " + client)
        setTimeout(pool.release, 1000, client)
      })
    }
  }
  queueRequests(10)
  console.log("Finished!")
}
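To experiment with this pattern without node-pool, the contract it relies on (acquire() returning false once the pool is saturated, and a drain event when capacity frees up) can be mimicked with a small stand-in. This is only a sketch: TinyPool, its fields, and the integer "clients" are hypothetical names of mine and say nothing about how node-pool is implemented.

```javascript
var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Hypothetical minimal pool, NOT node-pool's implementation. It hands out
// integer "clients" and mirrors the stream contract: acquire() returns false
// once the pool is saturated, and 'drain' fires when capacity frees up.
function TinyPool(max) {
  EventEmitter.call(this);
  this.max = max;
  this.active = 0;
  this.waiting = [];   // callbacks accepted while no client was free
  this.nextId = 0;
}
util.inherits(TinyPool, EventEmitter);

TinyPool.prototype.acquire = function (cb) {
  if (this.active < this.max) {
    this.active++;
    var client = this.nextId++;
    process.nextTick(function () { cb(null, client); });
  } else {
    // still accepted, like a stream buffering a write, but the caller
    // has been told to stop and should not normally get here
    this.waiting.push(cb);
  }
  // false tells the caller to stop submitting until 'drain'
  return this.active < this.max;
};

TinyPool.prototype.release = function (client) {
  var next = this.waiting.shift();
  if (next) {
    // hand the client straight to a queued caller; active count unchanged
    process.nextTick(function () { next(null, client); });
    return;
  }
  var wasFull = this.active >= this.max;
  this.active--;
  if (wasFull) this.emit('drain');
};
```

With `var pool = new TinyPool(2)` the loop above should run as written, with one caveat: release is a prototype method here, so it has to be passed to setTimeout as `pool.release.bind(pool)`.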
Fibers
An alternative solution can be obtained by providing a blocking abstraction on top of the queue. The fibers module exposes coroutines that are implemented in C++. By using fibers, it's possible to block an execution context without blocking the node event loop. While I find this approach to be quite elegant, it is often overlooked in the node community because of a curious aversion to all things synchronous-looking. Notice that, excluding the callcc utility, the actual loop logic is wonderfully concise.
var Fiber = require('fibers')

/* This is the call-with-current-continuation found in Scheme and other
 * Lisps. It captures the current call context and passes a callback to
 * resume it as an argument to the function. Here, I've modified it to fit
 * JavaScript and node.js paradigms by making it a method on Function
 * objects and using function (err, result) style callbacks.
 */
Function.prototype.callcc = function(context /* args... */) {
  var that = this,
      caller = Fiber.current,
      // Capture callcc's own extra arguments here; inside the fiber function
      // below, `arguments` would refer to that inner function instead.
      args = Array.prototype.slice.call(arguments, 1),
      fiber = Fiber(function () {
        that.apply(context, args.concat(
          function (err, result) {
            if (err)
              caller.throwInto(err)
            else
              caller.run(result)
          }
        ))
      })
  process.nextTick(fiber.run.bind(fiber))
  return Fiber.yield()
}
// Must be called from inside a fiber, e.g. Fiber(useFibers).run()
var useFibers = function () {
  var remaining = 10
  while(--remaining >= 0) {
    console.log("Sending request...")
    try {
      var client = pool.acquire.callcc(this)
      console.log("Handling request with client " + client);
      setTimeout(pool.release, 1000, client)
    } catch (x) {
      console.log("Error acquiring from pool")
    }
  }
  console.log("Finished!")
}
Conclusion
There are a number of correct ways to approach the problem. However, for library authors or applications that require a single pool to be shared in many contexts it is best to properly encapsulate the pool. Doing so helps prevent errors and produces cleaner, more modular code. Preventing unconstrained queuing then becomes an evented dance or a coroutine pattern. I hope this answer dispels a lot of FUD and confusion around blocking-style code and asynchronous behavior and encourages you to write code which makes you happy.
Yes, you are trying to queue 1,000,000 requests before any of them has even started. This version keeps only a limited number of requests (100) in flight:
function do_1000000_req( cb )
{
  var num_active = 0;
  var num_finished = 0;
  var num_scheduled = 0;
  function schedule()
  {
    while (num_active < 100 && num_scheduled < 1000000) {
      num_active++;
      num_scheduled++;
      // mypost and post_data are the ones defined in the question
      mypost(post_data, function() {
        num_active--;
        num_finished++;
        if (num_finished == 1000000)
        {
          cb();
          return;
        } else if (num_scheduled < 1000000)
          schedule();
      });
    }
  }
  schedule(); // kick off the first batch of 100
}
do_1000000_req( function() {
  console.log('done!');
});
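The same idea can be tried without a network. The sketch below is not part of the code above: runLimited is a hypothetical helper name, and the task is a stand-in that completes via setImmediate so the peak number of in-flight tasks can be observed. It assumes the task always calls done asynchronously, as a real HTTP request would.

```javascript
// Hypothetical helper: run `total` async tasks, at most `limit` at a time.
// `task(index, done)` stands in for mypost; `cb` fires once after the last one.
// Assumes `done` is always called asynchronously.
function runLimited(total, limit, task, cb) {
  var active = 0, started = 0, finished = 0;
  function schedule() {
    while (active < limit && started < total) {
      active++;
      task(started++, function () {
        active--;
        finished++;
        if (finished === total) return cb();
        schedule();   // a slot just freed up, so top the batch up again
      });
    }
  }
  if (total === 0) return cb();
  schedule();
}

// Stand-in task that completes asynchronously, recording peak concurrency.
var inFlight = 0, peak = 0;
runLimited(1000, 100, function (i, done) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  setImmediate(function () { inFlight--; done(); });
}, function () {
  console.log('done, peak in flight:', peak);  // prints: done, peak in flight: 100
});
```

Memory use stays bounded because only `limit` request objects exist at any moment, no matter how large `total` is.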
The node-pool module can help you. For more details, see this post (in French): http://blog.touv.fr/2011/08/http-request-loop-in-nodejs.html

Resources