How to use request.js (Node.js module) pools - node.js

Can someone explain how to use the request.js pool hash?
The GitHub notes say this about pools:
pool - A hash object containing the agents for these requests. If omitted this
request will use the global pool which is set to node's default maxSockets.
pool.maxSockets - Integer containing the maximum amount of sockets in the pool.
I have this code for writing to a CouchDB instance (note the question marks). Basically, any user who connects to my Node server writes to the DB independently of the others:
var request = require('request');
request({
    //pool:, // ??????????????????
    'pool.maxSockets' : 100, // ??????????????????
    'method' : 'PUT',
    'timeout' : 4000,
    'strictSSL' : true,
    'auth' : {
        'username' : myUsername,
        'password' : myPassword
    },
    'headers' : {
        'Content-Type': 'application/json;charset=utf-8',
        'Content-Length': myData.length
    },
    'json' : myData,
    'url': myURL
}, function (error, response, body) {
    if (error == null) {
        log('Success: ' + body);
    }
    else {
        log('Error: ' + error);
    }
});
What's best for high throughput/performance?
What are the drawbacks of a high 'maxSockets' number?
How do I create a separate pool to use instead of the global pool, and why would I want to create a separate pool?

The pool option in request uses an agent, which is the same as http.Agent from the standard http library. See the documentation for http.Agent and the agent option in http.request.
Usage
pool = new http.Agent(); //Your pool/agent
http.request({hostname:'localhost', port:80, path:'/', agent:pool});
request({url:"http://www.google.com", pool:pool });
If you are curious to see what it looks like, log it to the console:
{ domain: null,
  _events: { free: [Function] },
  _maxListeners: 10,
  options: {},
  requests: {},
  sockets: {},
  maxSockets: 5,
  createConnection: [Function] }
The maxSockets property determines how many concurrent sockets the agent can have open per host; it is present on an agent by default with a value of 5. Typically you would set it before making any requests. Passing pool.maxSockets explicitly overrides the maxSockets property of the pool, and this option only makes sense when the pool option is also passed.
So there are different ways to use it:
Don't pass the agent option at all: it will be undefined, and http.globalAgent will be used. This is the default case.
Pass it as false: pooling is disabled.
Provide your own agent, as in the example above and the sketch below.
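A minimal sketch of those three modes (the hostnames and the maxSockets value of 100 are just illustrative):
var http = require('http');
var request = require('request');

// 1. Default case: no agent/pool given, so http.globalAgent is used
http.request({hostname: 'localhost', port: 80, path: '/'}).end();

// 2. Pooling disabled: agent set to false gives this request a one-off agent
http.request({hostname: 'localhost', port: 80, path: '/', agent: false}).end();

// 3. Your own agent, shared between core http and request (as in the example above)
var pool = new http.Agent();
pool.maxSockets = 100; // size it before issuing requests
http.request({hostname: 'localhost', port: 80, path: '/', agent: pool}).end();
request({url: 'http://www.google.com', pool: pool});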
Answering your questions in reverse.
A pool is meant to keep a certain number of sockets to be used by the program. Firstly, the sockets are reused across different requests, which reduces the overhead of creating new sockets. Secondly, it uses fewer sockets for requests, but consistently; it will not take up all of the available sockets. Thirdly, it maintains a queue of requests, so some waiting time is implied.
A pool acts as both a cache and a throttle. The throttle effect is more visible when you have more requests and fewer sockets. When using the global pool, it may limit the functioning of two different clients, and there are no guarantees on waiting time. Having a separate pool for each of them is fairer to both (think of the case where one makes more requests than the other).
The maxSockets property gives the maximum concurrency possible, so raising it increases overall throughput/performance. The drawback is that the throttle effect is reduced and you cannot control peak overhead. Setting it to a very large number is like having no pooling at all; you could start getting errors such as "socket not available", and it cannot exceed the maximum limit set by the OS.
So what is best for high throughput/performance? There is a physical limit on throughput. Once you reach that limit, response time will increase with the number of connections. You can keep increasing maxSockets until then, but beyond that point increasing it will not help.

You should take a look at the forever-agent module, which is a wrapper around http.Agent.
Generally, the pool is a hash object that contains a number of http agents; it tries to reuse sockets created by "keep-alive" connections, per host:port. For example, if you perform several requests to www.domain1.com:80 and www.domain2.com:80, then for any response that does not contain the header Connection: close, the socket is put into the pool and handed to pending requests.
If no pending request needs the pooled socket, it is destroyed.
The maxSockets value means the maximum number of concurrent sockets for a single host:port; the default value is 5. I would suggest choosing this value with your scenario in mind:
For the hot sites your requests visit frequently, you had better create a separate pool so that new requests can pick up idle sockets very quickly. The point is to reduce the number of pending requests to those sites by increasing the maxSockets value of a pool. Note that it does not matter if you set maxSockets to a very high number when the connection is managed by the origin server via the response header Connection: close.
For the sites your requests rarely visit, use pool: false to disable pooling (a small sketch of this follows the example below).
You can specify a separate pool for your request this way:
// create a separate socket pool with 10 concurrent sockets as max value.
var separateReqPool = {maxSockets: 10};
var request = require('request');
request({url: 'http://localhost:8080/', pool: separateReqPool}, function(e, resp){
});
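And for the rarely visited sites mentioned above, a small sketch with pooling disabled (the URL is just a placeholder):
var request = require('request');

// disable pooling for a host you rarely hit, so no idle socket is kept around
request({url: 'http://rarely-visited.example/', pool: false}, function(e, resp){
});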

Related

Reuse socket after connect fails in node

I need to reuse a socket for two CONNECT calls made using http.request. I tried passing a custom agent limiting the number of sockets, but the first socket is removed before the second CONNECT call is made, by this code:
https://github.com/nodejs/node/blob/master/lib/_http_client.js#L438
mock code:
var options = {
    method: 'CONNECT',
    agent: new http.Agent({ keepAlive: true, maxSockets: 1 })
};

var request = this.httpModule.request(options);
request.on('connect', (res, sock, head) => {
    console.log(sock.address());
    // some processing...
    var request2 = this.httpModule.request(options);
    request2.on('connect', (res, sock, head) => {
        console.log(sock.address());
    });
    request2.end();
});
request.end();
Is there some way by which I can reuse the same socket for two connect calls?
The two unique sockets are required for this form of communication.
Each socket in this case represents a connection between a client and a server. There is no such socket that represents n clients and one server, so to speak. They also don't act like "threads" here, where one socket can perform work for many clients.
By setting the max sockets to 1, you've requested that only 1 client connection be active at any time. When you try to connect that second client, it kills the first one because the max is reached and we need room for a new connection!
If you want to recycle sockets -- for example, a client connects, refreshes the page after an hour, and the same client triggers another connection -- there is probably no way to do it this high in the technology stack, and it would be far more complicated and unnecessary than destroying the old socket to make way for a new one anyway. If you don't understand why you would or wouldn't need to do this, you don't need to do it.
If you want to send a message to many clients (and you wanted to accomplish it "under one socket" in your question), consider using the broadcast and emit methods.
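If the goal is simply to have both CONNECT tunnels open at the same time, a minimal sketch following the explanation above (the proxy host, port, and target are placeholders) is to give the agent room for two sockets:
var http = require('http');

// one shared agent with room for two sockets, since each CONNECT tunnel
// needs its own underlying socket
var agent = new http.Agent({ keepAlive: true, maxSockets: 2 });

var options = {
    host: 'proxy.example', // hypothetical proxy that accepts CONNECT
    port: 8080,
    method: 'CONNECT',
    path: 'target.example:80',
    agent: agent
};

var request = http.request(options);
request.on('connect', (res, sock, head) => {
    console.log('first tunnel:', sock.address());
    var request2 = http.request(options);
    request2.on('connect', (res2, sock2, head2) => {
        console.log('second tunnel:', sock2.address());
    });
    request2.end();
});
request.end();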

Nodejs Keep-alive with Unix Domain sockets

Keep-alive works just fine over TCP, but Unix domain sockets give me weird behavior. If I send a couple thousand requests like this:
request.post('http://unix:/tmp/http.sock:/check', {
    json: {
        ...
    },
    forever: true,
    pool: {maxSockets: 10},
    headers: {
        'Host': '',
        'Connection': 'keep-alive'
    }
})
a kernel trace will show 2000 sockets being created (and never closed), one for each request. I'd expect only 10 sockets to be created and reused as necessary.
Is there a way to set things up so unix sockets are kept alive and reused the same way TCP sockets are?
From the request documentation:
Note that if you are sending multiple requests in a loop and creating multiple new pool objects, maxSockets will not work as intended. To work around this, either use request.defaults with your pool options or create the pool object with the maxSockets property outside of the loop.
So it seems like you need to create the pool object outside the loop in order for sockets to be reused as you expect.
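For example, a minimal sketch of that workaround (the endpoint and payload below are just placeholders), creating the pool object once and baking it into request.defaults:
var request = require('request');

// one pool object, created once outside any loop, shared by every call
var pool = {maxSockets: 10};
var req = request.defaults({forever: true, pool: pool});

// placeholder endpoint and payload, mirroring the question's unix-socket URL
var urls = ['http://unix:/tmp/http.sock:/check'];
var payload = {check: true};

urls.forEach(function (url) {
    req.post(url, {json: payload}, function (err, resp, body) {
        // every request now reuses sockets from the same shared pool
    });
});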
This behavior was broken in Node prior to version v8.7.0. A commit by user bengl fixing keep-alive for Unix domain sockets went into the v8.7.0 build, which was released about 6 days ago.

How to get a count of the current open sockets in Node?

I am using the request module to crawl a list of URLs and would like to
limit the number of open sockets to 2:
var req = request.defaults({
    forever: true,
    pool: {maxSockets: 1}
});

req(options, function(error, response, body) {
    // ... code ...
    done();
});
However, when looping over an array of URLs and issuing a new request to each, that does not seem to work.
Is there a way to get the current number of open sockets so I can test this?
I believe that maxSockets maps to http.Agent.maxSockets, which limits the number of concurrent requests to the same origin (host:port).
This comment, from the developer of request, suggests the same:
actually, pooling controls the agent passed to core. each agent holds all hosts and throttles the maxSockets per host
In other words, you can't use it to limit the number of concurrent requests in general. For that, you need to use an external solution, for instance using limiter or async.queue.
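For example, a small sketch using async.queue with a concurrency of 2 (the URL list is just a placeholder):
var async = require('async');
var request = require('request');

var urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c'];

// a worker queue that runs at most 2 requests at any time, regardless of host
var queue = async.queue(function (url, done) {
    request(url, function (error, response, body) {
        // ... process the response ...
        done(error);
    });
}, 2);

urls.forEach(function (url) {
    queue.push(url);
});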

Throttling event-driven Nodejs HTTP requests

I have a Node net.Server that listens to a legacy system on a TCP socket. When a message is received, it sends an http request to another http server. Simplified, it looks like this:
var request = require('request-promise');
...
socket.on('readable', function () {
    var msg = parse(socket.read());
    var postOptions = {
        uri: 'http://example.com/go',
        method: 'POST',
        json: msg,
        headers: {
            'Content-Type': 'application/json'
        }
    };
    request(postOptions);
});
The problem is that the socket is readable about 1000 times per second. The requests then overload the http server. Almost immediately, we get multiple-second response times.
In running Apache benchmark, it's clear that the http server can handle well over 1000 requests per second in under 100ms response time - if we limit the number of concurrent requests to about 100.
So my question is, what is the best way to limit the concurrent requests outstanding using the request-promise (by extension, request, and core.http.request) library when each request is fired separately within an event callback?
Request's documentation says:
Note that if you are sending multiple requests in a loop and creating multiple new pool objects, maxSockets will not work as intended. To work around this, either use request.defaults with your pool options or create the pool object with the maxSockets property outside of the loop.
I'm pretty sure this paragraph is telling me the answer to my problem, but I can't make sense of it. I've tried using defaults to limit the number of open sockets:
var rp = require('request-promise');
var request = rp.defaults({pool: {maxSockets: 50}});
which doesn't help. My only thought at the moment is to manage a queue manually, but I expect that would be unnecessary if I only knew the conventional way to do it.
Well, you need to throttle your requests, right? I have worked around this in two ways, but let me show you one pattern I always use. I often use throttle-exec and Promise to make a wrapper for request. You can install it with npm install throttle-exec and use Promise natively or via a third-party library. Here is my gist for this wrapper: https://gist.github.com/ans-4175/d7faec67dc6374803bbc
How do you use it? It's simple, just like an ordinary request:
var Request = require("./Request")

Request({
    url: url_endpoint,
    json: param,
    method: 'POST'
})
.then(function(result){
    console.log(result)
})
.catch(reject)
Tell me how it goes after you implement it. Either way, I have another wrapper :)

nodejs, control over concurrent connections

I am using the Node.js async module for concurrent connections. My backend server can only handle 1000 connections at a time, so I am using async.mapLimit to limit the connections. Each job in async.mapLimit makes multiple connections, and when I send the same request (which runs async.mapLimit) from multiple browsers at the same time, I get an EMFILE error on the server side:
([Error: connect EMFILE] code: 'EMFILE', errno: 'EMFILE', syscall: 'connect'),
My code looks somewhat like this:
async.mapLimit(jobList, 200, jobCallback, function(error, data) {
});

function jobCallback(job, callback) {
    /* Make multiple connections to the backend server; this number is
       dynamic, and here I also use async.mapLimit */
}
Now I want to implement some wrapper function on top of this mapLimit (or anything else) so that, irrespective of the number of parallel requests and the number of client calls, the concurrent connections are limited. It may be slower, but I do not mind.
How I can achieve this?
I am using the restler library. I have tried to set
proto.globalAgent.maxSockets = 1000
to allow 1000 concurrent connections at a time, but it does not seem to be working.
Please advise.
-M-
You will have to handle the throttling yourself, as that async call won't know whether calls from other users are adding to the 1000 limit.
In REST services, a service typically sends an HTTP 429 response when such a limit is triggered, allowing your app to identify the bottleneck scenario and trigger a throttling mechanism.
A common way to do that is via exponential backoff:
https://developers.google.com/api-client-library/java/google-http-java-client/backoff
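A minimal sketch of that idea (the URL, retry count, and delays are just illustrative), retrying when the server answers HTTP 429 and doubling the wait each time:
var request = require('request');

function requestWithBackoff(options, retriesLeft, delayMs, callback) {
    request(options, function (error, response, body) {
        var throttled = !error && response && response.statusCode === 429;
        if (throttled && retriesLeft > 0) {
            // wait, then retry with twice the delay
            setTimeout(function () {
                requestWithBackoff(options, retriesLeft - 1, delayMs * 2, callback);
            }, delayMs);
            return;
        }
        callback(error, response, body);
    });
}

// start with 5 retries and a 100 ms initial delay
requestWithBackoff({url: 'http://backend.example/job'}, 5, 100, function (error, response, body) {
    // final result (or error) after backoff
});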
I use the following line of code to manage the limit globally:
require('events').EventEmitter.prototype._maxListeners = 1000;
Thanks
