How does one handle slow client connections in node.js? - node.js

Node.js is single threaded. If a slow client is making a request, I imagine it could block the thread until it completes. Does that make sense?
How does one handle slow connections? Does it make sense to just terminate a connection if it takes too long? How does one determine this? How does one measure how long the request is taking, and terminate it if it is taking too long? I'm not referring to the duration it takes to send back a response. I'm just referring to the time it takes for node to receive all the data required to process a request. Is this a legitimate scenario?
I imagine there must be some way to do this, otherwise it would be really easy to DoS attack a node.js server...
EDIT: In a POST request, the data comes in chunks. So what if it just comes in slowly? I'm not sure how to simulate this. But if this is a problem in node, it could equally be a problem in PHP etc., because you would just need to spawn many connections, all of which are very slow, to attack a server.

It doesn't matter if client request data comes in slowly. I/O in node is asynchronous and non-blocking. So if a chunk of data isn't available on a socket for a long time, node can do other things in the meantime, such as receive chunks of data from other sockets.
You can set an inactivity timeout that fires when no data is seen on the socket for the desired length of time. For HTTP, you can set a global request timeout via server.setTimeout() that can automatically close the underlying socket or if you pass in a callback, you can handle the timeout however you want. For TCP, there is socket.setTimeout().
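A minimal sketch of both, with arbitrary 30-second values and port numbers:

var http = require('http');
var net = require('net');

var server = http.createServer(function (req, res) {
  // ...handle the request...
});

// close sockets that show no activity for 30 seconds; the callback receives
// the offending socket, so you decide what to do with it
server.setTimeout(30 * 1000, function (socket) {
  socket.destroy();
});
server.listen(8080);

// the TCP equivalent: an inactivity timer on the socket itself
var tcpServer = net.createServer(function (socket) {
  socket.setTimeout(30 * 1000);
  socket.on('timeout', function () {
    socket.end(); // node does not sever the connection for you
  });
});
tcpServer.listen(9090);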

Related

Faster HTTP scraping per POST request?

I'm writing an API that returns an array of redirects for any given page:
var express = require('express');
var request = require('request');
var router = express.Router();

// POST /api/trace -- the router is assumed to be mounted under /api
router.post('/trace', function (req, res) {
  if (!req.body.link)
    return res.status(405).send(""); // error: no link provided!
  console.log("\tapi/trace()", req.body.link);

  var redirects = [];

  function exit(goodbye) {
    if (goodbye)
      console.log(goodbye);
    res.status(200).send(JSON.stringify(redirects)); // end
  }

  // follow the redirect chain one hop at a time
  function getRedirect(link) {
    request({ url: link, followRedirect: false }, function (err, response, body) {
      if (err)
        exit(err);
      else if (response.headers.location) {
        redirects.push(response.headers.location);
        getRedirect(response.headers.location);
      }
      else
        exit(); // all done!
    });
  }

  getRedirect(req.body.link);
});
and here is the corresponding browser request:
$.post('/api/trace', { link: l }, cb);
A page will make about 1000 POST requests very quickly and then wait a very long time to get each response back.
The problem is that the response to the nth request is very slow. Each individual request takes about half a second, but as best I can tell the express server is processing each link sequentially. I want the server to make all the requests and respond as it receives each response.
Am I correct in assuming the express POST router is running requests sequentially? How do I get it to blast out all the requests and pass the responses back as it gets them?
My question is: why is it so slow? Is POST an async process on an "out of the box" express server?
You may be surprised to find out that this is probably first a browser issue, not a node.js issue.
A browser has a maximum number of simultaneous requests it will allow your JavaScript ajax code to make to the same host; it varies slightly from one browser to the next, but is around 6. So, if you're making 1000 requests, only around 6 are being sent at a time. The rest go into a queue in the browser, waiting for prior requests to finish. So, your node server likely isn't getting 1000 simultaneous requests. You should be able to confirm this by logging incoming requests in your node.js app. You will probably see a long delay before it receives the 1000th request (because it's queued by the browser).
Here's a run-down of how many simultaneous requests to a given host each of the major browsers supported (as of a couple of years ago): Max parallel http connections in a browser?
My first recommendation would be to package up an array of requests to make from the client to the server (perhaps 50 at a time) and then send that in one request. That will give your node.js server plenty to chew on and won't run afoul of the browser's connection limit to the same host.
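A rough sketch of that batching, under a few assumptions: a hypothetical /trace-batch route, the usual body-parser setup so req.body.links arrives as an array, and a traceRedirects helper that wraps the redirect-following logic shown above.

// Client side: split the links into chunks of 50 (an arbitrary size) and
// send each chunk as a single POST instead of 1000 separate requests.
for (var i = 0; i < links.length; i += 50) {
  $.post('/api/trace-batch', { links: links.slice(i, i + 50) }, cb);
}

// Server side: trace every link in the batch in parallel and answer once
// when they have all finished.
router.post('/trace-batch', function (req, res) {
  var links = req.body.links || [];
  var results = [];
  var pending = links.length;
  if (!pending) return res.json(results);
  links.forEach(function (link, i) {
    traceRedirects(link, function (redirects) {  // hypothetical wrapper around getRedirect
      results[i] = redirects;
      if (--pending === 0) res.json(results);    // respond when the last one is done
    });
  });
});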
As for the node.js server, it depends a lot on what you're doing. If most of what you're doing in the node.js server is just networking and not a lot of processing that requires CPU cycles, then node.js is very efficient at handling lots and lots of simultaneous requests. If you start engaging a bunch of CPU (processing or preparing results), then you may benefit from either adding worker processes or using node.js clustering. In your case, you may want to use worker processes. You can examine your CPU load when your node.js server is processing a bunch of work and see whether the one CPU that node.js is using is anywhere near 100%. If it isn't, then you don't need more node.js processes. If it is, then you do need to spread the work over more node.js processes to go faster.
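If the CPU ever does become the bottleneck, node's built-in cluster module is one way to spread the work. A minimal sketch, assuming the express app lives in ./app.js:

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  os.cpus().forEach(function () {
    cluster.fork();                 // one worker per CPU core
  });
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' died, restarting');
    cluster.fork();
  });
} else {
  // each worker runs the express app; the workers share the listening port automatically
  require('./app');
}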
In your specific case, it looks like you're really only doing networking to collect 302 redirect responses. Your single node.js process should be able to handle a lot of those requests very efficiently so probably the issue is just that your client is being throttled by the browser.
If you want to send a lot of requests to the server (so it can get to work on as many as feasible), but want to get results back immediately as they become available, that's a little more work.
One scheme that could work is to open a webSocket or socket.io connection. You can then send a giant array of URLs that you want the server to check for you in one message over the socket.io connection. Then, as the server gets a result, it can send back each individual result (tagged with the URL that it corresponds to). That way, you can somewhat get the best of both worlds with the server crunching on a long list of URLs, but able to send back individual responses as soon as it gets them.
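A rough sketch of that scheme, assuming an existing http server instance called server and the same hypothetical traceRedirects helper as above:

var io = require('socket.io')(server);

io.on('connection', function (socket) {
  socket.on('trace-list', function (links) {
    links.forEach(function (link) {
      traceRedirects(link, function (redirects) {
        // send each result back as soon as it is ready, tagged with its URL
        socket.emit('trace-result', { link: link, redirects: redirects });
      });
    });
  });
});

// Browser side:
var socket = io();
socket.emit('trace-list', links);              // one message carrying all the URLs
socket.on('trace-result', function (result) {
  console.log(result.link, result.redirects);  // results arrive one by one
});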
Note, you will probably find that there is an upper limit to how many outbound http requests you may want to run at the same time from your node.js server too. While modern versions of node.js don't throttle you like the browser does, you probably also don't want your node.js server attempting to run 10,000 simultaneous requests because you may exhaust some sort of network resource pool. So, once you get past the client bottleneck, you will want to test your server at different levels of simultaneous requests open to see where it performs best. This is both to optimize its performance, but also to protect your server against attempting to overextend its use of networking or memory resources and get into error conditions.
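One simple way to cap simultaneous outbound requests is a small hand-rolled limiter (libraries such as async's eachLimit do the same job). A sketch:

// each task is a function that calls done() when its request has finished
function runWithLimit(tasks, limit) {
  var active = 0, index = 0;
  function next() {
    while (active < limit && index < tasks.length) {
      active++;
      tasks[index++](function done() {
        active--;
        next();                     // start the next task when one finishes
      });
    }
  }
  next();
}

var tasks = links.map(function (link) {
  return function (done) {
    request({ url: link, followRedirect: false }, function () { done(); });
  };
});
runWithLimit(tasks, 100);           // try different limits and measure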

Will using Socket.io instead of normal ajax calls prevent a server from running out of TCP sockets?

I'm trying to set up a server that can handle a high, sustained number of simultaneous requests. I found that at a certain point, the server won't be able to recycle "old" TCP connections quickly enough to accommodate extreme amounts of requests.
Do websockets eliminate or decrease the amount of tcp connections that a server needs to handle, and are they a good alternative to "normal" requests?
Websockets are persistent connections, so it really depends on what you're talking about. The way socket.io uses XHR is different from a typical ajax call in that it hangs onto the request for as long as possible before sending a response. It's a technique called long-polling, and it's trying to simulate a persistent connection by never letting go of the request. When the request is about to time out, it sends a response and a new request is initiated immediately, which it hangs onto yet again, and the cycle continues.
So I guess if you're getting flooded with connections because of ajax calls then that's probably because your client code is polling the server at some sort of interval. This means that even idle clients will be hitting your server with fury because of this polling. If that's the case then yes, socket.io will reduce your number of connections because it tries to hang onto one single connection per client for as long as possible.
These days I recommend socket.io over doing plain ajax requests. Socket.io is designed to be performant with whatever transport it settles on. The way it gracefully degrades based on what connection is possible is great and means your server will be overloaded as little as possible while still reaching as wide an audience as it can.
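To make the contrast concrete, here is roughly the difference on the client (the route name, event name, and renderUpdates are placeholders):

// Interval polling: every client opens a fresh request every couple of
// seconds whether or not anything changed...
setInterval(function () {
  $.get('/api/updates', renderUpdates);
}, 2000);

// ...versus socket.io: each client holds a single connection and the server
// pushes only when it actually has something to say.
var socket = io();
socket.on('update', renderUpdates);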

Node.js options to push updates to some microcontrollers that have an HTTP 1.1 stack

The title pretty well says it.
I need the microcontrollers to stay connected to the server to receive updates within a couple of seconds and I'm not quite sure how to do this.
The client in this case is very limited, to say the least, and it seems like all the solutions I've found for polling, or something like socket.io, require dropping some significant JavaScript onto the client.
If I'm left having to reimplement one of those libraries in C on the micro I could definitely use some pointers on the leanest way to handle it.
I can't just pound the server with constant requests because this is going to increase to a fair number of connected micros.
Just use ordinary long polling: each controller initially makes an HTTP request and waits for a response, which happens when there's an update. Once the controller receives the response, it makes another request. Lather, rinse, repeat. This won't hammer the server because each controller makes only one request per update, and node's architecture is such that you can have lots of requests pending, since you aren't creating a new thread or process for each active connection.
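A minimal sketch of the server side under those assumptions: plain node http, a 30-second re-poll interval picked arbitrarily, and a pushUpdate function you would call whenever new data arrives.

var http = require('http');
var waiting = [];                         // responses for controllers currently waiting

http.createServer(function (req, res) {
  waiting.push(res);
  // give up after 30 seconds so the controller simply re-polls
  res.setTimeout(30 * 1000, function () {
    if (res.headersSent) return;          // already answered by pushUpdate
    waiting.splice(waiting.indexOf(res), 1);
    res.writeHead(204);
    res.end();
  });
}).listen(8080);

// call this whenever a new update arrives: answer everyone who is waiting
function pushUpdate(update) {
  var pending = waiting;
  waiting = [];
  pending.forEach(function (res) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(update));
  });
}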

Node.js App with API endpoints which take 20sec+ :: Connections Left Open :: How to Optimize?

I have a Node.js RESTful API returning JSON data. One of the API calls can (and frequently does) take 10 - 20 seconds to finish. This long RTT is due to connecting to external APIs, like DiffBot, MailChimp, Facebook, Twitter, etc. I wish I could make the API call shorter, but I cannot.
Of course, I've implemented the node code in a nice async way, but the problem is that the client's inbound connection (to the node app) is alive while it waits for the server to finish, and thus might be killing my performance. In fact, I'm currently guessing that this may explain my long-running timeout issue in node.
I've already increased maxSockets to a huge number...
require('http').globalAgent.maxSockets = 9999;
For the sake of interest, I'm printing out the active sockets each time a new connection is made (here's the code).
Which gives me output like this:
SOCKETS: {} { 'graph.facebook.com:443': 5, 'api.instagram.com:443': 1 }
Nothing too enlightening there. The max connections I ever see is around 20 or so, total, across all hosts. But this doesn't really tell me anything about incoming connections, or how to optimize them so that my server does not choke when there are many of them alive at once (which I suspect it is).
You should optimize your architecture, not just the code.
First, I would change the way the client and server interact with each other. The server should end the request upon receipt and notify the client once all the tasks for that request are truly complete.
There are different ways to achieve that. For example, the client can query the status of the request using AJAX (polling) every X seconds. Another example would be to use WebSocket.
If you're going with this approach, look into Socket.IO. It supports many transports with the same API, if WebSocket is available, it would use that, otherwise, it would fall back to other transports such as Flash Socket, long-polling, etc.
Second, you shouldn't use one process to do all this work. You should use a queue (preferably a messaging system that supports queues), then, run workers (separate processes) to do the "heavy lifting".
Personally, I use AMQP due to its features and portability (it's an open standard), but feel free to use any other queue system with a persistent backend.
That way, if one or more process(es) crash(es) and you use the right queue, you wouldn't lose any data (such as the API tasks you mentioned).
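A rough sketch of that split using the amqplib package (the queue name, route, and task shape are made up for the example, and the web process assumes an existing express app):

var amqp = require('amqplib/callback_api');
var channel;                              // set once the connection is up

amqp.connect('amqp://localhost', function (err, conn) {
  conn.createChannel(function (err, ch) {
    ch.assertQueue('api-tasks', { durable: true });
    channel = ch;
  });
});

// web process: enqueue the work and answer the client right away
app.post('/long-job', function (req, res) {
  channel.sendToQueue('api-tasks',
    Buffer.from(JSON.stringify({ userId: req.body.userId })),
    { persistent: true });
  res.status(202).send({ queued: true });
});

// separate worker process: consume the queue and talk to the slow APIs
amqp.connect('amqp://localhost', function (err, conn) {
  conn.createChannel(function (err, ch) {
    ch.assertQueue('api-tasks', { durable: true });
    ch.prefetch(5);                       // at most 5 jobs in flight per worker
    ch.consume('api-tasks', function (msg) {
      var task = JSON.parse(msg.content.toString());
      // ...call DiffBot, MailChimp, Facebook, Twitter, etc. here...
      ch.ack(msg);                        // ack only when done, so a crash doesn't lose the task
    }, { noAck: false });
  });
});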
Hope it helps.

Long polling in node.js - how to 'timeout' the pending requests if no data is available?

I'm trying to implement an http long polling server in Node.js, and have no idea how to close/shutdown pending requests if a timeout is reached.
Two possible solutions come to mind:
Store each pending request with a timestamp in a hash/object, then call setInterval so that every 1/2/x seconds the pending requests whose timestamps are too old are removed.
Set a timeout on the socket connection.
Neither solution seems very reasonable to me, so what would be the Node.js way to achieve something like this?
Why don't those sound reasonable? In particular, setting a timeout on the socket seems to make sense to me, as:
There is a built-in method for doing so
An event is fired when the connection times out, allowing you to do any necessary cleanup (e.g. calling end/destroy on the socket)
I would probably go this route so that Node handles the timeout behind the scenes; however, if it makes sense for your app, I don't see any harm in keeping a timestamp and expiring connections manually.
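Roughly what the socket-timeout route could look like (the 30-second value and the 204 reply are arbitrary choices for the sketch):

var http = require('http');

http.createServer(function (req, res) {
  // inactivity timeout on the underlying socket
  req.setTimeout(30 * 1000, function () {
    if (res.headersSent) return;   // already answered with real data
    res.writeHead(204);            // tell the client "nothing new, poll again"
    res.end();
  });
  // ...otherwise hold on to res and respond when data becomes available...
}).listen(8080);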
You may be interested in these articles, each of which handles expiring connections differently:
Long polling in Node.js
How to write a Long Polling Event Push Server with node.js
