Node.js API choking with concurrent connections

This is the first time I've used Node.js and Mongo, so please excuse any ignorance. I come from a PHP background. It was my understanding that Node.js scaled well because of the event-driven nature of it. As such, I built my API in node and have been testing it on a localhost. Today, I deployed it to my cloud server and everything works great, except...
As the requests start to pile up, they start to take a long time to fulfill. With just 2 clients connecting to the API, already I'm seeing 30sec+ page load times when both clients are trying to make several requests at once (which does sometimes happen).
Most of the work done by the API is either (a) reading/writing to MongoDB, which resides on a 2nd server on the cloud (b) making requests to other APIs, websites, etc. and returning the results. Both of these operations should not be blocking, but I can imagine the problem being something to do with a bottleneck either on the Mongo DB server (a) or to the external APIs (b).
Of course, I will have multiple application servers in the end, but I would expect each one to handle more than a couple concurrent clients without choking.
Some considerations:
1) I have some console.logs that I left in my node code, and I have an SSH client open to monitor the cloud server. I suspect that this could cause a slowdown.
2) I use express, mongoose, Q, request, and a handful of other modules
Thanks for taking the time to help a node newb ;)
Edit: added some pics of performance graphs after some responses below...
EDIT: here's a typical callback -- it is called by the express router, and it uses the Q module and OAuth to make a Post API call to Facebook:
post: function(req, links, images, callback)
{
    // removed some code that calculates the target (string) and params (obj) variables
    // the this.request function is a simple wrapper around the oauth.getProtectedResource function
    Q.ncall(this.request, this, target, 'POST', params)
        .then(function(res){
            callback(null, res);
        })
        .fail(callback).end();
},
EDIT: some "upsert" code
upsert: function(query, callback)
{
    var id = this.data._id,
        upsertData = this.data.toObject(),
        query = query || {'_id': id};
    delete upsertData._id;
    this.model.update(query, upsertData, {'upsert': true}, function(err, res, out){
        if(err)
        {
            if(callback) callback(new Errors.Database({'message':'the data could not be upserted','error':err, 'search': query}));
            return;
        }
        if(callback) callback(null);
    });
},
Admittedly, my knowledge of Q/promises is weak. But, I think I have consistently implemented them in a way that does not block...

Your question has provided half of the relevant data: the technology stack. However, when debugging performance issues, you also need the other half of the data: performance metrics.
You're running some "cloud servers", but it's not clear what these servers are actually doing. Are they spiked on CPU? on Memory? on IO?
There are lots of potential issues. Are you running Express in production mode? Are you taking up too much IO on your MongoDB server? Are you legitimately downloading too much data? Did you get caught in an infinite Node.JS loop? (it happens)
I would like to provide better advice, but without knowing the status of the servers involved it's really impossible to start picking at any specific underlying technology. You may be a "Node newb", but basic server monitoring is pretty standard across programming languages.
Thank you for the extra details. I will reiterate the most important part of my comments above: where are these servers blocked?
CPU? (clearly not from your graph)
Memory? (doesn't seem likely here)
IO? (where are the IO graphs, what is your DB server doing?)
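As a starting point for the CPU and memory questions above, here is a minimal sketch that logs a snapshot from inside the Node process using only built-in modules. The function name and field names are illustrative, not from the original post, and it does not cover IO: disk and network activity on the MongoDB server would still need OS-level tools (for example iostat or mongostat).

```javascript
const os = require('os');

// Take one snapshot of the CPU and memory indicators for this process.
// Field names are illustrative only.
function resourceSnapshot() {
  const mem = process.memoryUsage();
  const cpu = process.cpuUsage(); // CPU time used by this process so far, in microseconds
  return {
    loadAvg1m: os.loadavg()[0],              // 1-minute system load (always 0 on Windows)
    freeMemMb: os.freemem() / 1024 / 1024,   // free system memory
    rssMb: mem.rss / 1024 / 1024,            // resident memory of this process
    heapUsedMb: mem.heapUsed / 1024 / 1024,  // V8 heap currently in use
    cpuUserMs: cpu.user / 1000,
    cpuSystemMs: cpu.system / 1000
  };
}

console.log(resourceSnapshot());
```

Logging this periodically while the two clients hammer the API should at least show whether the application server itself is the part that is saturated.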

Related

Node one async call slows another in parallel

For over a year we've seen an interesting pattern that doesn't always show up but repeats on occasion. We've never been able to figure out why, and I'm hoping someone can make sense of it. It may be our approach, it may be the environment (node 8.x & koa), it may be a number of things.
We make two async calls in parallel to our dependencies using the request-promise module.
Simplified code of a single api dependency:
const httpRequest = require("request-promise");
const moment = require("moment");

module.exports = function (url) {
    const requestOptions = {
        uri: ...,
        json: true,
        resolveWithFullResponse: true
    };
    const tmStart = moment(); // assumed: start time captured just before the request
    return httpRequest(requestOptions)
        .then(response => {
            const status = response.statusCode;
            const tmDiff = moment().diff(tmStart); // elapsed milliseconds
            return createResponseObject({
                status,
                payload: response.body,
            });
        })
        .catch(err => { .... });
};
Parallel calls:
const apiResponses = yield {
    api1: foo(ctx, route),
    api2: bar(ctx)
};
Yet we've seen situations in our response time charts where, if one call is slow, the latency of the other, separate service seems to follow it. It doesn't matter which services they are; the pattern has been noticed across more than 5 services that may be called in parallel. Does anyone have any ideas what could be causing the apparent latency?

If the latency is caused by a temporarily slowed network connection, then it would be logical to expect both parallel requests to feel that same effect. Running ping or tracert during the slowdown might give you useful diagnostics to see if it's a general transport issue. If your node.js server (which runs JavaScript single-threaded) was momentarily busy doing something else with the CPU (serving another request, garbage collecting, etc.), then that would affect the apparent responsiveness of both API calls, simply because it took a moment for node.js to get around to processing the waiting responses.
There are tools that monitor the responsiveness of your own http server on a continual basis (you can set whatever monitoring interval you want). If you have a CPU-hog somewhere, those tools would show a similar lag in responsiveness. There are also tools that monitor the health of your network connection which would also show a connectivity glitch. These are the types of monitoring tools that people whose job it is to maintain a healthy server farm might use. I don't have a name handy for either one, but you can presumably find such tools by searching.
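To illustrate the CPU-hog case, here is a minimal sketch (not the poster's code) in which two timers stand in for two pending API responses and a busy loop stands in for CPU-heavy work. The helper names busyWait and measureTimerLag are made up for this example. Both timers report roughly the same lag, because neither callback can run until the single thread is free again.

```javascript
// Block the event loop for roughly `ms` milliseconds (stand-in for CPU-heavy work).
function busyWait(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) { /* spin */ }
  return Date.now() - start;
}

// Resolve with how many milliseconds late a timer fired compared with when it was due.
function measureTimerLag(dueInMs) {
  const scheduled = Date.now();
  return new Promise(resolve => {
    setTimeout(() => resolve(Date.now() - scheduled - dueInMs), dueInMs);
  });
}

// Two "parallel" timers stand in for two responses that are ready at the same time.
const lagA = measureTimerLag(10);
const lagB = measureTimerLag(10);

busyWait(200); // the single thread is busy, so neither callback can run yet

Promise.all([lagA, lagB]).then(([a, b]) => {
  console.log('lag A: ' + a + ' ms, lag B: ' + b + ' ms');
});
```

If timing code like the tmDiff measurement runs inside the response callback, this kind of delay would be counted as service latency for both calls, even though neither service was actually slow. Newer Node versions also ship perf_hooks.monitorEventLoopDelay for measuring this continuously.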

nodejs multithread for the same resource

I'm quite new to nodejs and I'm doing some experiments.
What I get from them (and I hope I'm wrong!) is that nodejs can't serve many concurrent requests for the same resource without putting them in sequence.
Consider the following code (I use the Express framework in this example):
var express = require('express');
var app = express();

app.get('/otherURL', function (req, res) {
    res.send('otherURL!');
});

app.get('/slowfasturl', function (req, res) {
    // Randomly pick 0 (slow path) or 1 (fast path).
    var test = Math.round(Math.random());
    if (test === 0) {
        // Reply after 10 seconds without blocking the event loop.
        setTimeout(function () {
            res.send('slow!');
        }, 10000);
    } else {
        res.send('fast!');
    }
});

app.listen(3000, function () {
    console.log('app listening on port 3000!');
});
The piece of code above exposes two endpoints:
http://127.0.0.1:3000/otherurl , which just replies with the plain text "otherURL!"
http://127.0.0.1:3000/slowfasturl , which randomly follows one of the two behaviors below:
scenario 1: reply immediately with the plain text "fast!"
or
scenario 2: reply after 10 seconds with the plain text "slow!"
My test:
I opened several Chrome windows calling the slowfasturl URL at the same time, and I noticed that the first request that falls into "scenario 2" blocks all the other requests fired subsequently (from the other Chrome windows), independently of whether they fall into "scenario 1" (and so return "fast!") or "scenario 2" (and so return "slow!"). The requests are blocked at least until the first one (the one that fell into "scenario 2") has completed.
How do you explain this behavior? Are all the requests made to the same resource served in sequence?
I see different behavior if, while the "scenario 2" request is waiting for its response, a second request is made to another resource (e.g. the otherurl URL described above). In this case the second request completes immediately without waiting for the first one.
thank you
Davide

As far as I remember, the requests are blocked browser side.
Your browser is preventing those parallel requests but your server can process them. Try in different browsers or using curl and it should work.

The behavior you observe can only be explained by sequencing done by the browser. Node does not service requests in sequence; instead it works on an event-driven model, leveraging the libuv library.
I ran your test case with a non-browser client and confirmed that the requests do not influence each other.
To gain further evidence, I suggest the following:
Isolate the problem scope. Remove express (the HTTP abstraction) and use either the http module (base HTTP implementation) or even the net (TCP) module.
Use a non-browser client. I suggest ab (if you are on Linux), the Apache benchmarking tool, which is designed for measuring web server performance.
I used
ab -t 60 -c 100 http://127.0.0.1:3000/slowfasturl
which collects data for 60 seconds with 100 concurrent clients.
Make the test more deterministic by replacing Math.random with a counter and toggling between a huge timeout and a small timeout.
Check the results to see the rate and pattern of slow and fast responses.
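The first and third suggestions can be sketched roughly as follows, using only the built-in http module and a counter instead of Math.random. The names makeDelayPicker and createToggleServer are invented for this sketch, and the 10000 ms / 0 ms values simply mirror the timings in the question.

```javascript
const http = require('http');

// Alternate deterministically between a slow and a fast delay.
function makeDelayPicker(slowMs, fastMs) {
  let count = 0;
  return () => (count++ % 2 === 0 ? slowMs : fastMs);
}

// Build (but do not start) a server whose responses alternate slow, fast, slow, ...
function createToggleServer(slowMs, fastMs) {
  const nextDelay = makeDelayPicker(slowMs, fastMs);
  return http.createServer((req, res) => {
    const delay = nextDelay();
    // The timer does not block the event loop, so other requests are served meanwhile.
    setTimeout(() => res.end(delay === slowMs ? 'slow!' : 'fast!'), delay);
  });
}

// To try it: createToggleServer(10000, 0).listen(3000);
```

With this running, the ab command above (or a few parallel curl calls) should show the fast responses returning while the slow ones are still pending.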
Hope this helps.

Davide: This question needs an elaboration, so adding as another answer rather than comment, which has space constraints.
If you are hinting at the current node model as a problem:
Traditional languages (and runtimes) run code in sequence. Threads were used to scale this, but they have side effects such as: i) shared data access needs sequencing, ii) I/O operations block. Node is the result of a careful integration between three entities, libuv (multiplexer), v8 (executor), and node (orchestrator), to address those issues. This model ensures improved performance and scalability under web and cloud deployments. So there is no problem with this approach.
If you are hinting at further improvements to manage stressful CPU-bound operations in node, where there will be a waiting period, then yes: leveraging multiple cores and introducing more threads to share the CPU-intensive workload would be the right way.
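As one possible illustration of moving CPU-bound work off the main thread, here is a small sketch using the built-in worker_threads module found in current Node releases. The naive Fibonacci function is only a stand-in for heavy work, and fibInWorker is a hypothetical helper name, not part of any library.

```javascript
const { Worker } = require('worker_threads');

// CPU-heavy stand-in task: naive recursive Fibonacci.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Run fib(n) in a separate thread so the main event loop stays responsive.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    // The worker source is built as a string and evaluated in the new thread.
    const source = `
      const { parentPort, workerData } = require('worker_threads');
      ${fib.toString()}
      parentPort.postMessage(fib(workerData));
    `;
    const worker = new Worker(source, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

fibInWorker(30).then(result => console.log('fib(30) =', result));
```

For a real service the worker code would normally live in its own file, and the cluster module is another option when the goal is simply one process per core.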
Hope this helps.

Why should I use Restify?

I had the requirement to build up a REST API in node.js and was looking for a more light-weight framework than express.js which probably avoids the unwanted features and would act like a custom-built framework for building REST APIs. Restify from its intro is recommended for the same case.
After reading "Why use restify and not express?", it seemed like restify was a good choice.
But the surprise came when I tried out both under load.
I made a sample REST API on Restify and flooded it with 1000 requests per second. To my surprise, the route stopped responding after a while. The same app built on express.js handled them all.
I am currently applying the load to API via
var http = require('http');
var querystring = require('querystring');

// target ({host, port, path}) and makeMsg are not shown here.
var FnPush = setInterval(function() {
    for (var i = 0; i < 1000; i++)
        SendMsg(makeMsg(i));
}, 1000);

function SendMsg(msg) {
    var post_data = querystring.stringify(msg);
    var post_options = {
        host: target.host,
        port: target.port,
        path: target.path,
        agent: false,
        method: 'POST',
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Content-Length': Buffer.byteLength(post_data),
            'Connection': 'close'
        }
    };
    var post_req = http.request(post_options, function(res) {
        res.resume(); // drain the response so the socket can be released
    });
    post_req.on('error', function(e) {
        // errors are ignored in this test
    });
    post_req.write(post_data);
    post_req.end();
}
Do the results I got seem sensible? And if so, is express more efficient than restify in this scenario? Or is there an error in the way I tested them?
Updated in response to comments
Behavior of restify:
When fed with a load of more than 1000 requests per second, it stopped processing within 1 second, having received 1015 requests, and then did nothing; i.e. the counter I implemented for counting incoming requests stopped incrementing after 1015.
When fed with a load of even 100 requests per second, it received up to 1015 and became unresponsive after that.

Corrigendum: this information is now wrong, keep scrolling!
There was an issue with the script that caused the Restify test to be conducted on an unintended route. This caused the connection to be kept alive, which improved performance due to reduced overhead.
This is 2015 and I think the situation has changed a lot since. Raygun.io has posted a recent benchmark comparing hapi, express and restify.
It says:
We also identified that Restify keeps connections alive which removes the overhead of creating a connection each time when getting called from the same client. To be fair, we have also tested Restify with the configuration flag of closing the connection. You’ll see a substantial decrease in throughput in that scenario for obvious reasons.
Looks like Restify is a winner here for easier service deployments. Especially if you’re building a service that receives lots of requests from the same clients and want to move quickly. You of course get a lot more bang for buck than naked Node since you have features like DTrace support baked in.

This is 2017, and the latest performance test by Raygun.io compares hapi, express, restify and Koa.
It shows that Koa is faster than the other frameworks, but as this question is about express and restify: Express is faster than restify.
And it is written in the post:
This shows that indeed Restify is slower than reported in my initial test.

According to the Node Knockout description:
restify is a node.js module purpose built to create REST web services in Node. restify makes lots of the hard problems of building such a service, like versioning, error handling and content-negotiation easier. It also provides built in DTrace probes that you get for free to quickly find out where your application’s performance problems lie. Lastly, it provides a robust client API that handles retry/backoff for you on failed connections, along with some other niceties.
Performance issues and bugs can probably be fixed. Maybe that description will be adequate motivation.

I ran into a similar problem benchmarking multiple frameworks on OS X via ab. Some of the stacks died, consistently, after around the 1000th request.
I bumped the maxfiles limit significantly, and the problem disappeared.
You can check what your maxfiles limit is with ulimit (or launchctl limit, OS X only) and see what the maximum is.
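On the client side, the load script in the question opens a fresh socket for every request (agent: false plus Connection: close), which may be what runs into the descriptor limit. A rough sketch of the alternative is to send requests through a shared http.Agent with a socket cap. The cap of 50 and the pooledOptions helper are illustrative choices, not something from the original posts.

```javascript
const http = require('http');

// Reuse connections and cap how many sockets are open at once.
// The cap of 50 is an arbitrary illustrative value.
const agent = new http.Agent({ keepAlive: true, maxSockets: 50 });

// Build request options that go through the shared agent.
// host, port and path are placeholders for the server under test.
function pooledOptions(host, port, path, bodyLength) {
  return {
    host: host,
    port: port,
    path: path,
    agent: agent,
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': bodyLength
    }
  };
}
```

Requests beyond the cap are queued by the agent rather than opening more sockets. Note that keep-alive changes what is being measured, so both frameworks should be tested with the same client settings.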
Hope that helps.

In 2021, benchmarking done by Fastify (https://www.fastify.io/benchmarks/) indicates that Restify is now slightly faster than Express.
The code used to run the benchmark can be found here https://github.com/fastify/benchmarks/.

I was torn between express, restify and perfectAPI, and even tried developing a module in each of them. The main requirement was to build a REST API. I finally ended up with express: I tested the requests per second handled by each framework myself, and express gave better results than the others. In some cases restify outshines express, but express seems to win the race, so it's a thumbs up for express from me. I also came across Locomotive.js, an MVC framework built on top of express. If anyone is looking for a complete MVC app using express and jade, go for Locomotive.

Using node ddp-client to insert into a meteor collection from Node

I'm trying to stream some syslog data into Meteor collections via node.js. It's working fine, but the Meteor client polling cycle of ~10sec is too long for my tastes - I'd like it to be ~1 second.
Client-side collection inserts via console are fast and all clients update instantly, as it's using DDP. But a direct MongoDB insert from the server side is subject to the polling cycle of the client(s).
So it appears that for now I'm relegated to using DDP to insert updates from my node daemon.
In the ddp-client package example, I'm able to see messages I've subscribed to, but I don't see how to actually send new messages into the Meteor collection via DDP and node.js, thereby updating all of the clients at once...
Any examples or guidance? I'd greatly appreciate it - as a newcomer to node and Meteor, I'm quickly hitting my limits.

Ok, I got it working after looking closely at some code and realizing I was totally over-thinking things. The protocol is actually pretty straightforward, RPC-ish stuff.
I'm happy to report that it absolutely worked around the server-side insert delay (manual Mongo inserts were taking several seconds to poll/update the clients).
If you go through DDP, you get all the real-time(ish) goodness that you've come to know and love with Meteor :)
For posterity and to hopefully help drive some other folks to interesting use cases, here's the setup.
Use Case
I am spooling some custom syslog data into a node.js daemon. This daemon then parses and inserts the data into Mongo. The idea was to come up with a real-timey browser based reporting project for my first Meteor experiment.
All of it worked well, but because I was inserting into Mongo outside of Meteor proper, the clients had to poll every ~10 seconds. In another SO post @TimDog suggested I look at DDP for this, and his suggestion looks to have worked perfectly.
I've tested it on my system, and I can now instantly update all Meteor clients via a node.js async application.
Setup
The basic idea here is to use the DDP "call" method. It takes a list of parameters. On the Meteor server side, you export a Meteor method to consume these and do your MongoDB inserts. It's actually really simple:
Step 1: npm install ddp
Step 2: Go to your Meteor server code and do something like this, inside of Meteor.methods:
Meteor.methods({
    'push': function(k, v) { // k,v will be passed in from the DDP client.
        console.log("got a push request");
        var d = {};
        d[k] = parseInt(v, 10);
        // Now, simply use your Collection object to insert. On the server, insert
        // without a callback blocks and returns the new _id, so it can be returned
        // to the DDP caller; a return inside a callback would be discarded.
        return Counts.insert(d);
    }
});
Now all we need to do is call this remote method from our node.js server, using the client library. Here's an example call, which is essentially a direct copy from the example.js calls, tweaked a bit to hook our new 'push' method that we've just exported:
ddpclient.call('push', ['hits', '1111'], function(err, result) {
    console.log('called function, result: ' + result);
});
Running this code inserts via the Meteor server, which in turn instantly updates the clients that are connected to us :)
I'm sure my code above isn't perfect, so please chime in with suggestions. I'm pretty new to this whole ecosystem, so there's a lot of opportunity to learn here. But I do hope that this helps save some folks a bit of time. Now, back to focusing on making my templates shine with all this real-time data :)

According to this screencast it's possible to simply call the Meteor methods declared by the collection. In your case the code would look like this:
ddpclient.call('/counts/insert', [{hits: 1111}], function(err, result) {
    console.log('called function, result: ' + result);
});

How to create a database and expose it over http in order to receive data from a sensor

First of all, I am very much a newbie to all the technologies mentioned in this post.
I am working on something where I have sensors that send their readings via HTTP POST. Each sensor sends the sensed value periodically over HTTP as XML.
I found a link here that explains how to create REST API.
Now in the link above it is quite clear up to the point where the author installs MongoDB. But after that point things get complex, and the author doesn't explain what is happening in the code that follows.
What I am not able to figure out is:
How to create a database in node.js using MongoDB and expose this database over HTTP, so the sensors can send their readings.
How can I access this database in my URIs?
How can I access the date and time the data was added to the database?
I would really appreciate any help.

I am no node.js expert but the same rules apply across the board for this kind of stuff.
You must understand that your database will not be accessible directly. Instead, it will be accessed from node.js. You can add a --rest option to MongoDB's startup, which will start a self-contained RESTlet within the mongod program, but this is probably not an awesome idea here.
As far as I can see, you're just confused about the layers, which is common in this scenario, so to explain:
Your sensors will POST data (I would probably change that to JSON format, it is more expressive and smaller than XML) out to your node.js server running on, e.g., 81.187.205.13
It will post to whatever destination your REST function for dealing with this data is running at, e.g. /someawesomecontroller/notsuchagoodfunction
That function (as described by the tutorial you linked) will then pick up this POST, parse it and use the default method within node.js (via the driver) to insert into MongoDB. You can see the author of that tutorial using the driver in the later parts, e.g.:
exports.findById = function(req, res) {
    var id = req.params.id;
    console.log('Retrieving wine: ' + id);
    db.collection('wines', function(err, collection) {
        collection.findOne({'_id': new BSON.ObjectID(id)}, function(err, item) {
            res.send(item);
        });
    });
};
So really now all you need are some tutorials on how the MongoDB driver in node.js works, here is a nice starting place: Do you know any tutorial for mongoDB in nodeJS?
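To make the layering concrete, here is a rough sketch of the receiving side using only the built-in http module. It assumes the sensors send JSON, and it uses an in-memory array as a stand-in for the MongoDB collection; the /readings path and the receivedAt field are made-up names for illustration.

```javascript
const http = require('http');

const readings = []; // in-memory stand-in for a MongoDB collection

// Attach the server-side receive time to a parsed sensor payload.
function buildReading(payload, now = new Date()) {
  return Object.assign({}, payload, { receivedAt: now.toISOString() });
}

// Build (but do not start) a server that accepts POST /readings with a JSON body.
function createSensorServer() {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/readings') {
      res.statusCode = 404;
      return res.end();
    }
    let body = '';
    req.on('data', chunk => { body += chunk; });
    req.on('end', () => {
      try {
        readings.push(buildReading(JSON.parse(body)));
        res.statusCode = 201; // stored
      } catch (err) {
        res.statusCode = 400; // body was not valid JSON
      }
      res.end();
    });
  });
}

// To try it: createSensorServer().listen(3000);
```

In a real setup the push into the array would be replaced by an insert through the MongoDB driver. Storing an explicit receive time like this answers the date/time question directly; MongoDB's default ObjectId also embeds a creation timestamp.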
Hope it helps,
