How to manage node.js request connection pool? - node.js

I am using node.js Request module to make multiple post requests.
Does this module have a connection pool?
Can we manage this connection pool?
Can we close open connections?
How do we handle the socket hang up error?

Request does not have a connection pool. However, the http module (which request uses) does:
In node 0.5.3+ there is a new implementation of the HTTP Agent which is used for pooling sockets used in HTTP client requests.
By default, there is a limit of 5 concurrent connections per host. There is an issue with the current agent implementation that causes hang up errors to occur when you try to open too many connections.
You can either:
increase the maximum number of connections: http.globalAgent.maxSockets.
disable the agent entirely: pass {pool: false} to request.
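As a rough sketch of the first option, the limit can also be set on a dedicated agent rather than on http.globalAgent. This assumes a reasonably modern Node version where http.Agent accepts the keepAlive option; the names pooledAgent and fetchText are made up for illustration and are not part of request or http.

```javascript
const http = require('http');

// A dedicated agent: at most 20 sockets per host, reused via keep-alive.
// (Assumption: 20 is an arbitrary illustrative value, tune it per target host.)
const pooledAgent = new http.Agent({ keepAlive: true, maxSockets: 20 });

// Hypothetical helper: GET a URL through the pooled agent and return the body.
function fetchText(url) {
  return new Promise((resolve, reject) => {
    http.get(url, { agent: pooledAgent }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject); // connection errors such as "socket hang up" arrive here
  });
}

// When the pool is no longer needed, pooledAgent.destroy() closes its sockets.
```

Calling pooledAgent.destroy() is one way to close the connections the pool is still holding open, and the 'error' handler is where a socket hang up would be caught and, if appropriate, retried.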
There are several reasons for having an HTTP agent in the first place:
it prevents you from accidentally opening thousands of connections to a host (would be perceived as an attack).
connections in the pool will be kept opened for HTTP 1.1 keepalive.
most of the time, maxSockets really depends on the host you're targeting. node.js will be perfectly happy to open 1000 concurrent connections if the other host handles it.
The behavior of the agent is explained in the node.js doc:
The current HTTP Agent also defaults client requests to using Connection:keep-alive. If no pending HTTP requests are waiting on a socket to become free the socket is closed. This means that node's pool has the benefit of keep-alive when under load but still does not require developers to manually close the HTTP clients using keep-alive.
The asynchronous architecture of node.js is what makes it very cheap to open new connections.

Related

Frequent xhr request by socket.io

When I connect to the socket server from the client side, which is built with React, the socket client sends a repeated request every few seconds. The requests are generally GET requests, and most of the time they stay pending. Sometimes the result of requests is 2.
What do you think is the problem of sending repeated requests after connecting or doing anything with the socket?
UPDATE
This problem occurs when I use a namespace. I tried all the solutions but this problem was not solved.
This is expected behavior when the option used for transport is polling (long-polling).
What happens is, by default, the transport parameter is ["polling", "websocket"] (client, server), where the sequence of elements matters. So, the first connection attempt is made via polling (which is faster to start compared to websocket), and then (or in parallel, I don't know the working details) there is a connection attempt by websocket (this takes a little longer to establish but is faster for later communication).
If the websocket connection is successfully established, the communication will be carried in this way. But if an error occurs, or the connection takes a long time to be established, or this transport option is not present in the instance's parameters, then the communication will continue being carried out through polling, which are the various requests that remain pending. It is normal for them to remain pending, so they receive an update and are able to inform the requester immediately, without the need for several quick requests consulting the application's status.
Check the instance parameters you set for this connection to find out if transport via websocket is enabled. Be careful when using the socket server behind a reverse proxy, as this reverse proxy needs to be properly configured to accept websocket connections, otherwise it won't work.
You can check the websocket requests in the browser inspection, Network tab, by enabling the WS filter.
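To check which transport is actually in use, something like the following client-side sketch may help. It assumes socket.io-client v4 is installed; the URL and the /chat namespace are hypothetical, and the transport order shown (WebSocket first, long-polling as fallback) is just one possible configuration.

```javascript
// Assumption: these options are passed to the socket.io-client v4 `io()` call.
// Listing "websocket" first asks the client to try WebSocket before polling.
const clientOptions = {
  transports: ['websocket', 'polling'],
};

// Hypothetical helper: connect and log which transport ended up being used.
function connect(url) {
  // Required lazily so the options above can be inspected without the package.
  const { io } = require('socket.io-client');
  const socket = io(url, clientOptions);
  socket.on('connect', () => {
    console.log('transport:', socket.io.engine.transport.name);
  });
  socket.on('connect_error', (err) => {
    console.error('connect failed:', err.message);
  });
  return socket;
}

// Example call (hypothetical URL and namespace):
// connect('http://localhost:3000/chat');
```

If the logged transport stays "polling", the pending GET requests are expected, and the server options or the reverse proxy configuration are the next things to inspect.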
Here are some additional links for you to read more about:
https://socket.io/docs/v4/how-it-works/
https://socket.io/docs/v4/using-multiple-nodes/
https://socket.io/docs/v4/reverse-proxy/
https://ably.com/blog/websockets-vs-long-polling

What happens to nodejs server requests when the process is blocked

What happens to incoming requests when a nodejs server is blocked? There are times when the server will be blocked because it is chewing through something computationally expensive, or perhaps doing some synchronous IO (e.g. writing to a sqlite database). This is best described with an example:
given a server like this:
const { execSync } = require('child_process')
const express = require('express')
const app = express()
const port = 3000
// a synchronous (blocking) sleep function
function sleep(ms) {
  execSync(`sleep ${ms / 1000}`)
}
app.get('/block', (req, res) => {
  sleep(req.query.ms)
  res.send(`Process blocked for ${req.query.ms}ms.`)
})
app.get('/time', (req, res) => res.send(new Date()))
app.listen(port, () => console.log(`Example app listening at http://localhost:${port}`))
I can block the nodejs process like so:
# block the server for two seconds
curl http://localhost:3000/block\?ms\=2000
and while it is blocked attempt to make another request to the server:
curl http://localhost:3000/time
the second request will hang until the blocking call is completed, and then respond with the expected datetime. My question is, what specifically is happening to the request while the nodejs process is blocked?
Does node read in the request using some low level c++ and put it into a queue? Is backpressure involved here?
Is the unix kernel involved here? Does it know to put a request on some kind of queue while a server refuses to respond?
Is it just as simple as curl waiting on a response from a socket indefinitely?
What happens if the server is blocked and 10,000 new requests hit the server? Will they all be serviced as soon as the server becomes unblocked? (assuming there is no load balancer or other timeout mechanisms in between the client & server)
Finally, I understand that blocking nodejs is bad practice but I am not asking about best practices. I want to understand what nodejs does under stressful circumstances like those described here.
In the OS, the TCP stack has a queue for incoming data or connections that is waiting to be picked up by the appropriate host application if the host application is too busy to pick it up right now. Depending upon OS and configuration, that inbound queue will fill up at some point and clients attempting to connect would get an error. I'm not aware of any separate thread in nodejs that picks these up into its own queue and there probably isn't any reason for nodejs to do so as the TCP stack already implements an inbound connection queue on its own.
If you're blocking the nodejs process long enough for 10,000 incoming requests to arrive, you have much bigger problems and need to solve the blocking problem at its core. Nodejs has threads, child processes and clustering all of which can be employed as relief for a blocking calculation.
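As a minimal sketch of the thread option, the blocking work can be handed to a worker thread so the main event loop stays free to accept connections. This assumes Node 12 or later (worker_threads); runBlocking and the busy-wait loop are hypothetical stand-ins for a real CPU-heavy task.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: run a blocking task for `ms` milliseconds off the main thread.
function runBlocking(ms) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `
      const { parentPort, workerData } = require('worker_threads');
      const end = Date.now() + workerData;
      while (Date.now() < end) {} // busy-wait stands in for CPU-heavy work
      parentPort.postMessage(workerData);
      `,
      { eval: true, workerData: ms }
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

A request handler could await runBlocking(...) and respond when it resolves, while other requests such as /time continue to be served in the meantime.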
For data sent on an existing, already-opened TCP connection, there is back pressure (at the TCP level). For new incoming connections, there really isn't such a thing as back pressure. The new incoming connection is either accepted or it's not. This is one cause of what we sometimes observe as ERR_CONNECTION_REFUSED.
Some related discussion here: What can be the reason of connection refused errors.
Does node read in the request using some low level c++ and put it into a queue? Is backpressure involved here?
Node itself does not do this (that I'm aware of). The OS TCP stack has a queue for inbound data and incoming connection requests.
Is the unix kernel involved here? Does it know to put a request on some kind of queue while a server refuses to respond?
The TCP stack (in the OS) does have a queue for both incoming data arriving on an existing connection and for inbound connection requests. This queue is of a finite (and partially configurable) size.
Is it just as simple as curl waiting on a response from a socket indefinitely?
No. If the queue for inbound connection requests on the server is full, the connection request will be rejected. If the queue is not full, then it is just a matter of waiting long enough for it to succeed. Most client-side libraries will use some sort of timeout and give up after a time in case something happened that causes there to never be a response sent back.
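A client-side timeout along those lines might look like the sketch below. It uses Node's http module rather than curl; getWithTimeout is a hypothetical helper name, and the timeout here is an idle timeout on the socket, not a cap on total request time.

```javascript
const http = require('http');

// Hypothetical helper: GET a URL, giving up if the socket is idle for `ms` milliseconds.
function getWithTimeout(url, ms) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { timeout: ms }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    });
    // 'timeout' only signals inactivity; the request must be torn down explicitly.
    req.on('timeout', () => {
      req.destroy(new Error(`no response within ${ms}ms`));
    });
    req.on('error', reject);
  });
}
```

Against a blocked server, the promise rejects once the timeout elapses instead of waiting for the event loop to become free again.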
What happens if the server is blocked and 10,000 new requests hit the server? Will they all be serviced as soon as the server becomes unblocked? (assuming there is no load balancer or other timeout mechanisms in between the client & server)
The target host will queue the inbound connection requests up to some limit (which varies by OS and configuration) and then will reject ones that come after that.
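On the Node side, the size of that queue can be requested through the backlog option of server.listen. The sketch below is only an illustration: startServer is a hypothetical helper, and the value passed is a hint that the OS may cap (on Linux, through settings such as somaxconn).

```javascript
const http = require('http');

// Hypothetical helper: start a server asking for a given pending-connection backlog.
function startServer(backlog) {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => res.end('ok'));
    server.once('error', reject);
    // port 0 lets the OS pick a free port; backlog is passed to the listen call.
    server.listen({ port: 0, host: '127.0.0.1', backlog }, () => resolve(server));
  });
}
```

When no backlog is given, Node uses its own default (511 per the Node documentation), so raising it only matters if connections are being refused while the process is busy.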
Some other relevant articles:
How TCP backlog works in Linux
What is "backlog" in TCP connections?
TCP Connection Backlog and a Struggling Server
The more of these types of articles you read, the more you will also discover a tradeoff between quickly accepting lots of connections and defending against various types of DOS attacks. It seems a balance has to be drawn.

Node / HAPI: Too many simultaneous connections causes network connections to be reset

A Node server uses HAPI to create its REST api. A growing number of connections is targeting this system until the point where more & more connections are "reset", causing an error on the client side.
First attempt was to look for synchronous ("blocking"?) request handlers, but everything requiring filesystem / external systems access is already made asynchronous.
My best guess is that Node (or HAPI) is designed in a way that requests are assigned to a socket (?) and until the response is sent, that socket is "blocked". There is perhaps a maximum open socket count. I'm looking for clues... thanks in advance for your suggestions!

maximum reasonable timeout for a synchronous HTTP request

This applies to non-user facing backend applications communicating with each other through HTTP. I'm wondering if there is a guideline for a maximum timeout for a synchronous HTTP request. For example, let's say a request can take up to 10 minutes to complete. Can I simply create a worker thread on the client and, in the worker thread, invoke the request synchronously? Or should I implement the request asynchronously, to return HTTP 202 Accepted and spin off a worker thread on the server side to complete the request and figure out a way to send the results back, presumably through a messaging framework?
One of my concerns: is it safe to keep a socket open for an extended period of time?
How long a socket connection can remain open (without activity) depends on the (quality of the) network infrastructure.
A client HTTP request waiting for an answer from a server results in an open socket connection without any data going through that connection for a while. A proxy server might decide to close such inactive connections after 5 minutes. Similarly, a firewall can decide to close connections that are open for more than 30 minutes, active or not.
But since you are in the backend, these cases can be tested (just let the server thread handling the request sleep for a certain time before giving an answer). Once it is verified that socket connections are not closed by different network components, it is safe to rely on socket connections to remain open. Keep in mind though that network cables can be unplugged and servers can crash - you will always need a strategy to handle disruptions.
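Such a test server could be sketched as follows in Node. startSlowServer is a hypothetical helper; the delay is whatever idle period needs to be verified (for example 10 * 60 * 1000 for ten minutes), and the real client should reach it through the same proxies and firewalls as production traffic.

```javascript
const http = require('http');

// Hypothetical helper: a server that stays silent for `delayMs` before answering.
function startSlowServer(delayMs) {
  return new Promise((resolve) => {
    const server = http.createServer((req, res) => {
      // Nothing is written to the socket until the delay has elapsed.
      setTimeout(() => res.end(`answered after ${delayMs}ms`), delayMs);
    });
    // port 0 lets the OS pick a free port; use a fixed port for a real test.
    server.listen(0, '127.0.0.1', () => resolve(server));
  });
}
```

If the client receives the answer after the full delay, no component on the path closed the idle connection; if it sees a reset or timeout instead, that component's idle limit is the practical maximum.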
As for synchronous and asynchronous: both are feasible and both have advantages and disadvantages. But what is right for you depends on a whole lot more than just the reliability of socket connections.

Why is node.js only processing six requests at a time?

We have a node.js server which implements a REST API as a proxy to a central server which has a slightly different, and unfortunately asymmetric REST API.
Our client, which runs in various browsers, asks the node server to get the tasks from the central server. The node server gets a list of all the task ids from the central one and returns them to the client. The client then makes two REST API calls per id through the proxy.
As far as I can tell, this stuff is all done asynchronously. In the console log, it looks like this when I start the client:
Requested GET URL under /api/v1/tasks/*: /api/v1/tasks/
This takes a couple seconds to get the list from the central server. As soon as it gets the response, the server barfs this out very quickly:
Requested GET URL under /api/v1/tasks/id/:id :/api/v1/tasks/id/438
Requested GET URL under /api/v1/workflow/id/:id :/api/v1/workflow/id/438
Requested GET URL under /api/v1/tasks/id/:id :/api/v1/tasks/id/439
Requested GET URL under /api/v1/workflow/id/:id :/api/v1/workflow/id/439
Requested GET URL under /api/v1/tasks/id/:id :/api/v1/tasks/id/441
Requested GET URL under /api/v1/workflow/id/:id :/api/v1/workflow/id/441
Then, each time a pair of these requests gets a result from the central server, another two lines is barfed out very quickly.
So it seems our node.js server is only willing to have six requests out at a time.
There are no TCP connection limits imposed by Node itself. (The whole point is that it's highly concurrent and can handle thousands of simultaneous connections.) Your OS may limit TCP connections.
It's more likely that you're either hitting some kind of limitation of your backend server, or you're hitting the builtin HTTP library's connection limit, but it's hard to say without more details about that server or your Node implementation.
Node's built-in HTTP library (and obviously any libraries built on top of it, which are most) maintains a connection pool (via the Agent class) so that it can utilize HTTP keep-alives. This helps increase performance when you're running many requests to the same server: rather than opening a TCP connection, making a HTTP request, getting a response, closing the TCP connection, and repeating; new requests can be issued on reused TCP connections.
In node 0.10 and earlier, the HTTP Agent will only open 5 simultaneous connections to a single host by default. You can change this easily: (assuming you've required the HTTP module as http)
http.globalAgent.maxSockets = 20; // or whatever
node 0.12 sets the default maxSockets to Infinity.
You may want to keep some kind of connection limit in place. You don't want to completely overwhelm your backend server with hundreds of HTTP requests under a second – performance will most likely be worse than if you just let the Agent's connection pool do its thing, throttling requests so as to not overload your server. Your best bet will be to run some experiments to see what the optimal number of concurrent requests is in your situation.
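One way to run such an experiment is sketched below: a throwaway local server counts how many requests it sees in flight while a client sends a batch through an agent with a given maxSockets. The helper name measurePeak, the 20 ms artificial delay, and the local test server are all assumptions for illustration; a real experiment would target the actual backend and measure latency or throughput.

```javascript
const http = require('http');

// Hypothetical helper: send `total` requests through an agent capped at
// `maxSockets` and resolve with the peak number the server saw in flight.
function measurePeak(maxSockets, total) {
  return new Promise((resolve, reject) => {
    let inFlight = 0;
    let peak = 0;
    let done = 0;
    const server = http.createServer((req, res) => {
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      setTimeout(() => { inFlight -= 1; res.end('ok'); }, 20); // simulated work
    });
    const agent = new http.Agent({ keepAlive: true, maxSockets });
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      for (let i = 0; i < total; i += 1) {
        http.get({ host: '127.0.0.1', port, path: '/', agent }, (res) => {
          res.resume(); // discard the body
          res.on('end', () => {
            done += 1;
            if (done === total) {
              agent.destroy(); // close pooled sockets so the server can shut down
              server.close(() => resolve(peak));
            }
          });
        }).on('error', reject);
      }
    });
  });
}
```

Because the agent queues requests beyond maxSockets, the peak observed by the server should not exceed that value, which makes it easy to compare several settings side by side.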
However, if you really don't want connection pooling, you can simply bypass the pool entirely – set agent to false in the request options:
http.get({host:'localhost', port:80, path:'/', agent:false}, callback);
In this case, there will be absolutely no limit on concurrent HTTP requests.
It's the limit on number of concurrent connections in the browser:
How many concurrent AJAX (XmlHttpRequest) requests are allowed in popular browsers?
I have upvoted the other answers, as they helped me diagnose the problem. The clue was that node's socket limit was 5, and I was getting 6 at a time. 6 is the limit in Chrome, which is what I was using to test the server.
How are you getting data from the central server? "Node does not limit connections" is not entirely accurate when making HTTP requests with the http module. Client requests made in this way use the http.globalAgent instance of http.Agent, and each http.Agent has a setting called maxSockets which determines how many sockets the agent can have open to any given host; this defaults to 5.
So, if you're using http.request or http.get (or a library that relies on those methods) to get data from your central server, you might try changing the value of http.globalAgent.maxSockets (or modify that setting on whatever instance of http.Agent you're using).
See:
http.Agent documentation
agent.maxSockets documentation
http.globalAgent documentation
Options you can pass to http.request, including an agent parameter to specify your own agent
Node js can handle thousands of incoming requests - yes!
But when it comes down to outgoing requests, every request has to deal with a DNS lookup, and DNS lookups, disk reads, etc. are handled by the libuv thread pool (libuv is written in C). The default thread pool size for each node process is 4 threads.
If all 4 threads are busy with https requests (DNS lookups), other requests will be queued. That is why, no matter how brilliant your code might be, you sometimes get 6 or sometimes fewer concurrent outgoing requests per second completed.
Learn about DNS caching to reduce the number of DNS lookups, and increase the libuv thread pool size. If you use PM2 to manage your node processes, they have good documentation on their side about environment variables and how to inject them. What you are looking for is the environment variable UV_THREADPOOL_SIZE, which defaults to 4.
You can set the value anywhere between 1 and the maximum of 1024. But keep in mind that the libuv limit of 1024 is across all event loops.
I have seen the same problem in my server. It was only processing 4 requests.
As explained already, from 0.12 maxSockets defaults to Infinity. That easily overwhelms the server. Limiting the number of sockets, say to 20, with
http.globalAgent.maxSockets = 20;
solved my problem.
Are you sure it just returns the results to the client? Node processes everything in one thread. So if you do some fancy response parsing or anything else which doesn't yield, then it would block all your requests.
