How to optimize HTTP polling in NodeJS?

I'm developing a NodeJS service that continuously polls a REST API to get state changes of a large number of resources in near real-time. No other protocol is available.
The polling interval is 5 seconds. The number of resources is usually between 100 and 500, so in practice:
It usually makes about 50 HTTP requests per second
Responses often take more than 5 seconds, so requests for the same resource may overlap
The load average often goes above 8.00 (on an 8-core VM) and/or the app crashes
I want to keep resource usage as low as possible while handling this workload.
Here's what I've done:
HTTP2 is available (but unsupported by axios?)
Set process.env.UV_THREADPOOL_SIZE = 8
Use the axios library to make async requests
Reuse the same axios instance for all requests
Use an HTTPS agent with keep-alive
The relevant code:
'use strict'
process.env.UV_THREADPOOL_SIZE = 8

const HTTPS = require('https')
const axios = require('axios').create({
  baseURL: 'https://api.example.com',
  httpsAgent: new HTTPS.Agent({
    keepAlive: true,
    //maxSockets: 256,
    //maxFreeSockets: 128,
    scheduling: 'fifo',
    timeout: 1000 * 15
  }),
  timeout: 1000 * 15,
})

function loadResource(id){
  return axios.request({
    method: 'GET',
    url: `/resource/${id}`
  })
  .then((res) => {
    handle(res)
  })
  .catch((err) => {
    process.logger.error('Error with request (...)')
  })
}

for(let id of [1,2,3,4,(...)]){
  setInterval(() => loadResource(id), 5000)
}

function handle(res){...}
What I'm considering:
Waiting for a response before making the next request (and looking for the best way to do this)
What else can be done to handle the requests in an optimal way?
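The first idea under consideration, waiting for a response before issuing the next request, can be sketched as a self-rescheduling timer per resource. This is a minimal sketch: `fetchResource` and `onResult` are hypothetical stand-ins for the axios call and `handle` above.

```javascript
// Overlap-free polling: re-arm the timer only after the previous request
// for that resource has settled, instead of using a fixed setInterval.
const POLL_INTERVAL_MS = 5000;

function pollResource(id, fetchResource, onResult) {
  let stopped = false;
  async function tick() {
    if (stopped) return;
    try {
      const res = await fetchResource(id); // e.g. axios.get(`/resource/${id}`)
      onResult(res);
    } catch (err) {
      // log and keep polling; one failed request should not kill the loop
    }
    if (!stopped) setTimeout(tick, POLL_INTERVAL_MS);
  }
  tick();
  return () => { stopped = true; }; // returns a stop function
}
```

With this shape, a slow response simply delays the next poll for that one resource instead of piling up overlapping requests.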

Related

Continuously hitting the GitHub secondary rate limit even after following the best practices?

In my application I am making authenticated requests to the GitHub search API with a token. I am making a request every 2 s to stay within the primary rate limit of 30 requests per minute (so not concurrently), and I am also validating every request with the GitHub rate-limit API before I make the actual search API call.
Even in the rare case of accidental concurrent requests, they are not likely to be for the same token.
I seem to be following all the rules mentioned in the primary and secondary best-practices documentation. Despite this, my application keeps getting secondary rate limited, and I have no idea why. Could anyone help me with why this may be happening?
EDIT:
Sample code:
const rp = require('request-promise'); // inferred from the 'Request-Promise' User-Agent below
const search = async function(query, token) {
  var limitResponse;
  try {
    limitResponse = JSON.parse(await rp({
      uri: "https://api.github.com/rate_limit",
      headers: {
        'User-Agent': 'Request-Promise',
        'Authorization': 'token ' + token
      },
      timeout: 20000
    }));
  } catch (e) {
    logger.error("error while fetching rate limit from github", token);
    throw new Error(Codes.INTERNAL_SERVER_ERROR);
  }
  if (limitResponse.resources.search.remaining === 0) {
    logger.error("github rate limit reached zero");
    throw new Error(Codes.INTERNAL_SERVER_ERROR);
  }
  try {
    var result = JSON.parse(await rp({
      uri: "https://api.github.com/search/code",
      qs: {
        q: query,
        page: 1,
        per_page: 50
      },
      headers: {
        'User-Agent': 'Request-Promise',
        'Authorization': 'token ' + token
      },
      timeout: 20000
    }));
    logger.info("successfully fetched data from github", token);
    // process response
  } catch (e) {
    logger.error("error while fetching data from github", token);
    throw new Error(Codes.INTERNAL_SERVER_ERROR);
  }
};
Sample Architecture:
A query string (from a list of query strings) and the appropriate token to make the API call with are inserted into a RabbitMQ x-delayed queue, with a delay of index*2000 ms per message (hence they are spaced out by 2 s), and the function above is the consumer for that queue. When the consumer throws an error, the message is nack'd and sent to a dead-letter queue.
const { delayBetweenMessages } = require('../rmq/queue_registry').GITHUB_SEARCH;
await __.asyncForEach(queries, async (query, index) => {
  await rmqManager.publish(require('../rmq/queue_registry').GITHUB_SEARCH, query, {
    headers: { 'x-delay': index * delayBetweenMessages }
  })
})
It looks like there is no issue in your code. I was just browsing GitHub in my browser and hit the secondary rate limit merely by using the search bar. So it looks like the search API internally uses concurrency, and this might be GitHub's own bug.
You hardcoded a sleep time of 2 s, but according to the documentation, when you trigger the secondary rate limit you have to wait for the amount of time indicated in the Retry-After attribute of the response headers.
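The Retry-After advice above can be sketched as a small retry wrapper. This is an illustrative sketch, not the poster's code: `makeRequest` is a hypothetical helper assumed to resolve with `{ status, headers }`.

```javascript
// Honor the Retry-After header GitHub sends with secondary-rate-limit
// responses (403/429) before retrying, instead of a fixed sleep.
async function requestWithRetryAfter(makeRequest, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await makeRequest();
    if (res.status !== 403 && res.status !== 429) return res;
    // Retry-After is in seconds; fall back to a conservative 60 s if absent.
    const retryAfterSec = Number(res.headers['retry-after'] || 60);
    await new Promise((resolve) => setTimeout(resolve, retryAfterSec * 1000));
  }
  throw new Error('Retries exhausted while rate limited');
}
```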

Performance of Cloud Functions for Firebase with outbound networking

When I run the following code in Cloud Functions, it takes more than 2 seconds.
When I run it locally, it takes about 600 milliseconds.
What are the possible causes?
import * as functions from 'firebase-functions'
import axios from 'axios'

const headers = { 'accept': 'application/json', 'x-access-key': '...', 'x-access-secret': '...' }

exports.functionName = functions.https.onRequest(async (req, res) => {
  try {
    console.log('request 1 start')
    const response1 = await axios.get(`https://api.sample.com/users/${req.body.userId}`, { headers })
    console.log('request 1 completed')
    const response2 = await axios.post(`https://api.sample.com/contents1/${response1.data.id}`, {}, { headers })
    console.log('request 2 completed')
    const response3 = await axios.post(`https://api.sample.com/contents2/${response2.data.id}`, {}, { headers })
    console.log('request 3 completed')
    res.send(response3)
  } catch (error) {
    res.send(error)
  }
})
Metrics
In Cloud, each asynchronous request (axios.get/post) is taking up to almost 1 second.
Hypothesis
It is inevitable that Cloud Functions with outbound networking take a long time
Cold start is not the cause (as the execution time does not decrease after the second execution).
What I tried
I think I tried all the methods described in the official Firebase documentation:
Minimum number of instances: I set it to 2 in the GCP console, but no improvement
Increase memory allocated: I increased it to 1 GB, but no improvement
Use global variables to reuse objects: in the above code, the headers object is one
HTTP keep-alive: I wrote the following code, but no improvement
const httpAgent = new http.Agent({ keepAlive: true })
await axios.get(`https://api.sample.com/users/${req.body.userId}`, { headers, httpAgent })
Important things to consider:
Where is your API located?
Where are you located?
Where are the functions located?
They could be deployed on the other side of the planet from you and your API.
You can control where your functions are deployed by specifying the datacenter:
https://firebase.google.com/docs/functions/locations
All our functions code looks like this:
const DEFAULT_FUNCTIONS_LOCATION = "us-east4";

const runtimeOpts: Record<string, "512MB" | any> = {
  timeoutSeconds: 120,
  memory: "512MB",
};

const getCustomAnalysis = functions
  .region(DEFAULT_FUNCTIONS_LOCATION)
  .runWith(runtimeOpts)
You can have functions deployed in different datacenters within one project; note that you will need to specify the region when you call them.
We do this in one of our projects. The whole project is deployed in the EU (for legal reasons). One function is in the US, calling a US API. Once the data are in the GCP function and travel to another GCP datacenter, they use Google's premium-tier networking. But if you are not constrained, just deploy your whole project closest to your users and your API.
Also, is your GCP project clean? You don't use any VPC networking?
Networking can be messy; there could also be issues between the GCP datacenter and your API's datacenter.
As a tip for more testing: try out different URLs and measure the speed. But I don't think this is a general Cloud Functions issue.
By the way, I have run into a similar issue retrieving larger amounts of data from Firestore. However, we noticed a difference when we sped up the functions (more memory gives you more CPU).

Cannot subscribe to more than 6 ipfs pubsub channels with ipfs-http-client

I'm currently building a node application with ipfs-http-client.
I need to subscribe to several pubsub channels (~20).
When I connect to the channels, I receive a 200 response for each subscribe, but only the first 6 subscriptions receive messages.
I isolated the problem in a little snippet that only does the following:
Connect to the node (ipfs 0.4.23, and I also tried another one on 0.8)
Subscribe to 20 channels (with different names, or the same channel with different handlers)
I always reproduce the problem (only the first 6 subscriptions receive messages)
I'm running my tests with Node 14.16.0
When I look into the ipfs-http-client package, I can see that I actually get no response from the HTTP request after the first 6. Still, no error is reported.
achingbrain answered this question: https://github.com/ipfs/js-ipfs/issues/3741#issuecomment-898344489
In Node, the default agent used by the HTTP client limits connections to 6, so it's consistent with the browser. You can configure your own agent to change this:
const { create } = require('ipfs-http-client')
const http = require('http')

async function echo(msg) {
  console.log(`TopicID: ${msg.topicIDs[0]}, Msg: ${new TextDecoder().decode(msg.data)}`)
}

async function run() {
  // connect to the default API address http://localhost:5001
  const client = create({
    agent: new http.Agent({ keepAlive: true, maxSockets: Infinity })
  })
  console.log(await client.version())
  for (let i = 0; i < 20; i++) {
    await client.pubsub.subscribe(String(i), echo)
  }
}

run()

What is the default timeout for NPM request module (REST client)?

The following is my node.js call to retrieve some data, which takes more than 1 minute. It times out at 1 minute (60 seconds). I also log the latency. I have configured the timeout as 120 seconds, but it is not taking effect. I know the default Node.js server timeout is 120 seconds, but I still get the 60-second timeout from the request module for this call. Please share your insights on this.
var options = {
  method: 'post',
  url: url,
  timeout: 120000,
  json: true,
  headers: {
    "Content-Type": "application/json",
    "X-Authorization": "abc",
    "Accept-Encoding": "gzip"
  }
}
var startTime = new Date();
request(options, function(e, r, body) {
  var endTime = new Date();
  var latencyTime = endTime - startTime;
  console.log("Ended. latencyTime:" + latencyTime / 1000);
  res.status(200).send(body);
});
From the request options docs, scrolling down to the timeout entry:
timeout - Integer containing the number of milliseconds to wait for a server to send response headers (and start the response body) before aborting the request. Note that if the underlying TCP connection cannot be established, the OS-wide TCP connection timeout will overrule the timeout option (the default in Linux can be anywhere from 20-120 seconds).
Note the last part "if the underlying TCP connection cannot be established, the OS-wide TCP connection timeout will overrule the timeout option".
There is also an entire section on timeouts. Based on that and your code sample, we can modify the request sample as such:
request(options, function(e, r, body) {
  if (e && e.code === 'ETIMEDOUT' && e.connect === true) {
    // when there's a timeout and connect is true, we're meeting the
    // conditions described for the timeout option where the OS governs
    console.log('bummer');
  }
});
If this is true, you'll need to decide if changing OS settings is possible and acceptable (this is beyond the scope of this answer and such a question would be better on Server Fault).

Node request queue backed up

TL;DR - Are there any best practices when configuring the globalAgent that allow for high throughput with a high volume of concurrent requests?
Here's our issue:
As far as I can tell, connection pools in Node are managed by the http module, which queues requests in a globalAgent object, which is global to the Node process. The number of requests pulled from the globalAgent queue at any given time is determined by the number of open socket connections, which is determined by the maxSockets property of globalAgent (defaults to 5).
When using "keep-alive" connections, I would expect that as soon as a request is resolved, the connection that handled the request would be available and can handle the next request in the globalAgent's queue.
Instead, however, it appears that each connection up to the max number is resolved before any additional queued requests are handled.
When watching network traffic between components, we see that if maxSockets is 10, then 10 requests resolve successfully. Then there is a 3-5 second pause (presumably while new TCP connections are established), then 10 more requests resolve, then another pause, etc.
This seems wrong. Node is supposed to excel at handling a high volume of concurrent requests. If, even with 1000 available socket connections, request 1000 cannot be handled until requests 1-999 resolve, you'd hit a bottleneck. Yet I can't figure out what we're doing incorrectly.
Update
Here's an example of how we're making requests -- though it's worth noting that this behavior occurs whenever a node process makes an http request, including when that request is initiated by widely-used third-party libs. I don't believe it is specific to our implementation. Nevertheless...
class Client
  constructor: (@endpoint, @options = {}) ->
    @endpoint = @_cleanEndpoint(@endpoint)
    throw new Error("Endpoint required") unless @endpoint && @endpoint.length > 0
    _.defaults @options,
      maxCacheItems: 1000
      maxTokenCache: 60 * 10
      clientId: null
      bearerToken: null # If present will be added to the request header
      headers: {}
    @cache = {}
    @cards = new CardMethods @
    @lifeStreams = new LifeStreamMethods @
    @actions = new ActionsMethods @

  _cleanEndpoint: (endpoint) =>
    return null unless endpoint
    endpoint.replace /\/+$/, ""

  _handleResult: (res, bodyBeforeJson, callback) =>
    return callback new Error("Forbidden") if res.statusCode is 401 or res.statusCode is 403
    body = null
    if bodyBeforeJson and bodyBeforeJson.length > 0
      try
        body = JSON.parse(bodyBeforeJson)
      catch e
        return callback(new Error("Invalid Body Content"), bodyBeforeJson, res.statusCode)
    return callback(new Error(if body then body.message else "Request failed.")) unless res.statusCode >= 200 && res.statusCode < 300
    callback null, body, res.statusCode

  _reqWithData: (method, path, params, data, headers = {}, actor, callback) =>
    headers['Content-Type'] = 'application/json' if data
    headers['Accept'] = 'application/json'
    headers['authorization'] = "Bearer #{@options.bearerToken}" if @options.bearerToken
    headers['X-ClientId'] = @options.clientId if @options.clientId
    # Use method override (AWS ELB problems) unless told not to do so
    if (not config.get('clients:useRealHTTPMethods')) and method not in ['POST', 'PUT']
      headers['x-http-method-override'] = method
      method = 'POST'
    _.extend headers, @options.headers
    uri = "#{@endpoint}#{path}"
    #console.log "making #{method} request to #{uri} with headers", headers
    request
      uri: uri
      headers: headers
      body: if data then JSON.stringify data else null
      method: method
      timeout: 30*60*1000
    , (err, res, body) =>
      if err
        err.status = if res && res.statusCode then res.statusCode else 503
        return callback(err)
      @_handleResult res, body, callback
To be honest, CoffeeScript isn't my strong point, so I can't really comment on the code.
However, I can give you some thoughts: in what we are working on, we use nano to connect to Cloudant, and we're seeing up to 200 requests/s into Cloudant from a micro AWS instance. So you are right, Node should be up to it.
Try using request (https://github.com/mikeal/request) if you're not already. (I don't think it will make a difference, but it's nevertheless worth a try, as that is what nano uses.)
These are the areas I would look into:
The server doesn't deal well with multiple requests and throttles them. Have you run any performance tests against your server? If it can't handle the load for some reason, or your requests are throttled in the OS, then it doesn't matter what your client does.
Your client code has a long-running function somewhere that prevents Node from processing any responses you get back from the server. Perhaps one specific response causes a response callback to spend far too long.
Are the endpoints all different servers/hosts?
