Node.js 10000 concurrent HTTP requests every 10 seconds - node.js

I have a use case where I need to make more than 10,000 external HTTP requests (to one API) in an infinite loop every 10 seconds. This API takes anywhere from 200 to 800 ms to respond.
Goals:
Call an external API more than 10,000 times in 10 seconds
Have a failproof polling system that can run for months at a time without failing
Attempts:
I have attempted to use the Async library and limit requests to 300 concurrent calls (higher numbers fail quickly), but after about 300,000 requests I run into errors where I receive a connection refused (I am calling a local Node server on another port). I am running the local server to mimic the real API because our application has not scaled to more than 10,000 users yet, and the external API requires one unique user per request. The local response server is a very simple Node server with one route that waits between 200 and 800 ms to respond.
Am I getting this error because I am calling a server running locally on my machine or is it because Node is having an issue handling this many requests? Any insight into the best way to perform this type of polling would be appreciated.
Edit: My question is about how to create a client that can send more than 10,000 requests in a 10 second interval.
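A rough back-of-the-envelope check with the question's own numbers (10,000 requests per 10 s window, 200-800 ms per response, so about 500 ms on average, assuming latencies are spread evenly over that range): by Little's law, the average number of requests in flight equals the request rate times the latency. The two helper names below are illustrative, not part of any library.

```javascript
// Average number of requests that must be in flight to finish `requests`
// calls within `windowMs`, given an average latency of `avgLatencyMs`
// (Little's law: in flight = arrival rate x time each request takes).
function requiredConcurrency(requests, windowMs, avgLatencyMs) {
  return Math.ceil((requests * avgLatencyMs) / windowMs);
}

// Rough time to drain `requests` calls with at most `limit` in flight,
// assuming every slot stays busy and latency averages `avgLatencyMs`.
function estimatedBatchMs(requests, limit, avgLatencyMs) {
  return Math.round((requests * avgLatencyMs) / limit);
}

console.log(requiredConcurrency(10000, 10000, 500)); // 500
console.log(estimatedBatchMs(10000, 300, 500)); // 16667
```

Under these assumptions a limit of 300 needs roughly 16-17 s per round of 10,000 requests, longer than the 10 s setInterval period, so consecutive batches overlap. Fitting 10,000 calls into 10 s needs about 500 concurrent requests on average, and up to about 800 if every response took the full 800 ms.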
Edit2:
Client:
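const async = require('async');
const axios = require('axios');

// arr contains 10,000 tokens
const makeRequests = arr => {
  // Every 10 s, start a batch with at most 300 requests in flight.
  // Note: setInterval does not wait for the previous batch to finish.
  setInterval(() => {
    async.eachLimit(arr, 300, (token, cb) => {
      axios.get(`http://localhost:3001/tokens/${token}`)
        .then(res => {
          // do something
          cb();
        })
        .catch(err => {
          // handle error
          cb();
        });
    });
  }, 10000);
};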
Dummy Server:
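const express = require('express');
const app = express();

// Random delay in [min, max) milliseconds
const getRandomArbitrary = (min, max) => {
  return Math.random() * (max - min) + min;
};

// Matches /tokens/<token>, the path the client requests
app.get('/tokens/:token', (req, res) => {
  setTimeout(() => {
    res.send('OK');
  }, getRandomArbitrary(200, 800));
});

app.listen(3001);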
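Because setInterval starts a new batch every 10 s whether or not the previous one has finished, slow batches overlap and the number of open connections can exceed the 300 passed to eachLimit. A minimal sketch of a loop that avoids this, using only plain promises (no Async library): each round runs with a bounded number of requests in flight, and the next round is scheduled only after the current one completes. The names `runBounded` and `poll` and their options are illustrative, and `worker` stands in for the `axios.get(...)` call.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run worker(item) for every item with at most `limit` calls in flight.
// Failures are collected rather than stopping the round.
async function runBounded(items, limit, worker) {
  let next = 0;
  const errors = [];
  async function lane() {
    while (next < items.length) {
      const item = items[next++]; // claim the next item before awaiting
      try {
        await worker(item);
      } catch (err) {
        errors.push({ item, err });
      }
    }
  }
  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return errors;
}

// Poll in rounds. A round starts no sooner than `periodMs` after the previous
// round started and never before the previous round has finished, so rounds
// cannot overlap the way setInterval batches can.
async function poll(items, { limit, periodMs, worker, rounds = Infinity, onRound }) {
  for (let round = 0; round < rounds; round++) {
    const started = Date.now();
    const errors = await runBounded(items, limit, worker);
    const elapsed = Date.now() - started;
    if (onRound) onRound({ round, elapsed, failed: errors.length });
    if (round + 1 < rounds) {
      // If the round overran the period, the next one starts right away.
      await sleep(Math.max(0, periodMs - elapsed));
    }
  }
}

// Hypothetical usage with the client from the question (the limit of 500
// is an illustrative value, not a tested one):
// poll(tokens, {
//   limit: 500,
//   periodMs: 10000,
//   worker: (token) => axios.get(`http://localhost:3001/tokens/${token}`),
//   onRound: (stats) => console.log(stats),
// });
```

Collecting errors instead of throwing keeps one failed request from stopping the loop, which matters for a poller meant to run for months; what to do with the failures (retry, log, alert) is left open here.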

Related

How to space out (rate-limiting) outgoing axios requests originating from an express app handling requests received from a webhook?

Here is my issue:
I built a Node Express app that handles incoming requests from a webhook that sometimes sends dozens of packages in one second. Once the request has been processed, I need to make an API POST request using Axios with the transformed data.
Unfortunately, this API has a rate limit of 2 requests per second.
I am looking for a way to build some kind of queuing system that accepts every incoming request and sends the outgoing requests at a limited rate of 2 requests per second.
I tried adding a delay with setTimeout, but it only delayed the load: when hundreds of requests were received, the handling of each of them was delayed by 10 seconds, but they were still sent out at nearly the same time, just 10 seconds later.
I was thinking of logging the time of each outgoing request and only sending a new outgoing request if (time - timeOfLastRequest > 500ms), but I'm pretty sure this is not the right way to handle this.
Here is a very basic version of the code to illustrate my issue:
// API1 SOMETIMES RECEIVES DOZENS OF API CALLS PER SECOND
app.post("/api1", async (req, res) => {
  const data = req.body.object;
  const transformedData = await transformData(data);
  // API2 ONLY ACCEPTS 2 REQUESTS PER SECOND
  const resp = await sendToApi2WithAxios(transformedData);
});
Save this code as data.js.
You can replace the GET call with your POST call.
import axios from 'axios';

const delay = time => new Promise(res => setTimeout(res, time));

const delayCall = async () => {
  try {
    let [seconds, nanoseconds] = process.hrtime();
    console.log('Time is: ' + (seconds + nanoseconds / 1000000000) + ' secs');
    let resp = await axios.get(
      'https://worldtimeapi.org/api/timezone/Europe/Paris'
    );
    console.log(JSON.stringify(resp.data.datetime, null, 4));
    await delay(501);
    [seconds, nanoseconds] = process.hrtime();
    console.log('Time is: ' + (seconds + nanoseconds / 1000000000) + ' secs');
    resp = await axios.get(
      'https://worldtimeapi.org/api/timezone/Europe/Paris'
    );
    console.log(JSON.stringify(resp.data.datetime, null, 4));
  } catch (err) {
    console.error(err);
  }
};

delayCall();
In package.json:
{
  "dependencies": {
    "axios": "^1.2.1"
  },
  "type": "module"
}
Install the dependency and run it from a terminal:
npm install axios
node data.js
Result: there are always at least 501 ms between the two API calls.
$ node data.js
Time is: 3074.690104402 secs
"2022-12-07T18:21:41.093291+01:00"
Time is: 3077.166384501 secs
"2022-12-07T18:21:43.411450+01:00"
So your code becomes:
const delay = time => new Promise(res => setTimeout(res, time));

// API1 SOMETIMES RECEIVES DOZENS OF API CALLS PER SECOND
app.post("/api1", async (req, res) => {
  const data = req.body.object;
  const transformedData = await transformData(data);
  await delay(501);
  // API2 ONLY ACCEPTS 2 REQUESTS PER SECOND
  const resp = await sendToApi2WithAxios(transformedData);
});
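Note that `await delay(501)` in the handler above runs independently for each incoming request, so, as the question describes, a burst of requests that arrive together is still forwarded together, just 501 ms later. One way to space the outgoing calls is a single queue shared by all handlers that starts at most one call every `intervalMs`. This is a sketch only: `createRateLimiter` is a hypothetical helper, while `transformData` and `sendToApi2WithAxios` are the question's own functions.

```javascript
// Minimal shared rate limiter. schedule(fn) queues fn and returns a promise
// for its result; queued calls start in arrival order, at least `intervalMs`
// apart, no matter how many requests arrive at once.
function createRateLimiter(intervalMs) {
  const queue = [];
  let lastStart = -Infinity;
  let timer = null;

  function drain() {
    timer = null;
    if (queue.length === 0) return;
    const wait = lastStart + intervalMs - Date.now();
    if (wait > 0) {
      timer = setTimeout(drain, wait); // too early: try again later
      return;
    }
    const { fn, resolve, reject } = queue.shift();
    lastStart = Date.now();
    Promise.resolve().then(fn).then(resolve, reject);
    drain(); // arm the timer for the next queued call, if any
  }

  return function schedule(fn) {
    return new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      if (timer === null) drain();
    });
  };
}

// Hypothetical usage: ONE limiter shared by all incoming requests,
// 501 ms apart as in the answer above (about 2 calls per second).
// const limitApi2 = createRateLimiter(501);
// app.post("/api1", async (req, res) => {
//   const transformedData = await transformData(req.body.object);
//   const resp = await limitApi2(() => sendToApi2WithAxios(transformedData));
//   res.sendStatus(200);
// });
```

The queue lives in memory, so queued calls are lost if the process restarts. If the backlog can grow long, it may also be preferable to answer the webhook immediately and let the queued call run in the background.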

First time fetching API call is really slow - Node JS, React, Express

I have a website that is making a GET call to my backend database. The first time the website calls the API it takes 11 seconds, clearly, this is too long. But if I refresh the page and give it another go, it is super fast in less than a second.
I tried to find some solutions, so I opened DevTools and found this:
For some reason, the TTFB (Time to First Byte) takes 10 seconds!
How can I reduce the TTFB the first time the website calls the Rest API?
Here is my React code which is using Axios to fetch the rest API:
axios
  .get("https://MY.API", {
    headers,
  })
  .then((response) => {
    this.setState({
      response: response.data,
    });
  })
  .catch((err) => {
    console.error(err);
  });
Here is my backend code using Express, Mongoose, and MongoDB
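router.get("/", async (req, res) => {
  try {
    const response = await Model.find();
    res.json(response);
  } catch (err) {
    // Error objects serialize to {} in JSON, so send the message with a 500 status
    res.status(500).json({ message: err.message });
  }
});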
I would say that this is a pretty standard piece of code. I don't know why the TTFB is so long.
What can I do to reduce the initial wait time? This is annoying for my users.
Thanks!
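One way to narrow down where the 10 seconds go (a diagnostic step, not something the question confirms as the cause) is to log how long Express itself spends on each request. If most of the time is spent inside the handler, the first `Model.find()` may be waiting, for example because Mongoose buffers model queries until its MongoDB connection is open; if the server-side time is short while the browser still reports about 10 s of TTFB, the delay happens before the request reaches Express (for example, a host waking up from idle). `logTiming` is a hypothetical name; `process.hrtime.bigint()` and the response `finish` event are standard Node.js APIs.

```javascript
// Hypothetical timing middleware: logs the method, URL, status code and the
// time from Express receiving the request until the response has been sent.
function logTiming(log = console.log) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    res.on("finish", () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      log(`${req.method} ${req.originalUrl} ${res.statusCode} ${ms.toFixed(1)} ms`);
    });
    next();
  };
}

// Usage: register it before the routes, e.g.
// app.use(logTiming());
// app.use("/", router);
```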

Apache Benchmark - real concurrency level - issue

Recently I created a simple Node.js script to validate the concurrency of some long-lasting DB operations. The basic idea is to receive a web request, then wait around 10 seconds and return the result (code below).
/*jshint esversion: 6 */
const express = require('express');
const app = express();
var counter = 0;

app.get('/test', (req, res) => {
  counter++;
  console.log(counter);
  // simulate long db query
  setTimeout(() => {
    res.send();
  }, 10000);
});

app.listen(80, () => console.log('Example app listening on port 80!'));
The Apache Benchmark tool (ab) was used for the initial test, with the following command:
ab -c 20 -n 200 -v 3 http://localhost/test
The problem is that the script shows only one (!) connection being sent by Apache Benchmark.
After deeper investigation, I observed that the first request sent by Apache Benchmark seems to be a kind of service status check and is not executed concurrently. Request concurrency is enabled only after the first request. To illustrate this, I prepared a slightly modified version of the code, which simply responds immediately to the first request.
app.get('/test', (req, res) => {
  counter++;
  console.log(counter);
  if (counter === 1) {
    res.send();
    return;
  }
  // simulate long db query
  setTimeout(() => {
    res.send();
  }, 10000);
});
The modified version shows the Apache Benchmark concurrency level to be as expected. Of course, this testing approach is rather counterproductive. Do you know how to disable this unexpected behaviour of Apache Benchmark?
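Note that `counter` in the script counts every request received, not how many are pending at the same moment. A minimal sketch of a middleware (the name `trackInFlight` is hypothetical) that records the number of currently open requests and the maximum seen so far, which measures the concurrency level actually reached more directly. It listens for both the `finish` and `close` events of the response and decrements only once, since depending on the Node.js version and on whether the client disconnects, one or both may fire.

```javascript
// Hypothetical middleware: tracks how many requests are open right now
// and the highest number seen at the same time.
function trackInFlight(log = console.log) {
  const stats = { current: 0, max: 0 };
  function middleware(req, res, next) {
    stats.current++;
    stats.max = Math.max(stats.max, stats.current);
    log(`in flight: ${stats.current} (max ${stats.max})`);
    let done = false;
    const finish = () => {
      if (done) return; // 'finish' and 'close' can both fire: count once
      done = true;
      stats.current--;
    };
    res.on("finish", finish);
    res.on("close", finish);
    next();
  }
  middleware.stats = stats;
  return middleware;
}

// Usage: register it before the route, e.g.
// app.use(trackInFlight());
// app.get('/test', ...);
```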

model.find({}) is not responding when I set the timeout to 5 minutes, but it responds when the timeout is 1 minute

I am using Mongoose to list all the data from a collection in MongoDB, but I want the response to come after 5 minutes. It does not respond when the timeout value is 5 minutes, but it does respond when the timeout is 1 minute.
router.get(routeIdentifier + '/list/:id', function (req, res, next) {
  model.find({}, function (err, objects) {
    setTimeout(function () {
      if (err) return res.send(err);
      objects.push({ id: req.params.id });
      return res.json(objects);
    }, 300000);
  });
});
As far as I can tell, this is mainly due to the HTTP server timeout: the server times out before the response from the database model is sent.
The default timeout is 2 minutes, as per the documentation for the Node.js version linked here (in Node.js 13 and later the default is 0, i.e. no timeout): https://nodejs.org/dist/latest-v9.x/docs/api/http.html#http_server_settimeout_msecs_callback
Please note that Express sits on top of the built-in Node.js HTTP server.
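Try the following:
const http = require('http');

// expressApp is your Express application
const server = http.createServer(expressApp);
// The timeout counts socket inactivity since the request arrived, so it must
// be longer than the 5-minute delay in the route, not equal to it
server.setTimeout(360000);
server.listen(3000); // example port
This sets the HTTP server's socket timeout to 6 minutes, slightly longer than the 5-minute delay, so the socket is not closed just before the response is sent.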

Nodejs: Async request with a list of URL

I am working on a crawler. I have a list of URLs that need to be requested. Because each request is asynchronous, several hundred requests are in flight at the same time. I am afraid that this would overwhelm my bandwidth or produce too much network traffic to the target website. What should I do?
Here is what I am doing:
urlList.forEach((url, index) => {
  console.log('Fetching ' + url);
  request(url, function (error, response, body) {
    // do sth for body
  });
});
I want each request to be made only after the previous one has completed.
You can use a promise library such as Bluebird, for example:
const Promise = require("bluebird");
const axios = require("axios");

// Axios wrapper for error handling: always resolves, never rejects,
// so one failed URL does not abort the whole Promise.map
const axios_wrapper = (options) => {
  return axios(options)
    .then((r) => ({
      data: r.data,
      error: null,
    }))
    .catch((e) => ({
      data: null,
      error: e.response ? e.response.data : e,
    }));
};

Promise.map(
  urlList,
  (url) => {
    return axios_wrapper({
      method: "GET",
      url: url,
    });
  },
  { concurrency: 1 } // how many requests to run in parallel
)
  .then((r) => {
    // r is an array of objects like { data: ..., error: null }: if a request
    // succeeded, data is set; otherwise error is non-null
    console.log(r);
  })
  .catch((e) => {
    console.error(e);
  });
The things you need to watch for are:
Whether the target site has rate limiting, so that you may be blocked if you request too much too fast.
How many simultaneous requests the target site can handle without degrading its performance.
How much bandwidth your server has on its end.
How many simultaneous requests your own server can have in flight and process without causing excess memory usage or a pegged CPU.
In general, the scheme for managing all this is to create a way to tune how many requests you launch. There are many different ways to control this by number of simultaneous requests, number of requests per second, amount of data used, etc...
The simplest way to start would be to just control how many simultaneous requests you make. That can be done like this:
function runRequests(arrayOfData, maxInFlight, fn) {
  return new Promise((resolve, reject) => {
    let index = 0;
    let inFlight = 0;

    function next() {
      while (inFlight < maxInFlight && index < arrayOfData.length) {
        ++inFlight;
        fn(arrayOfData[index++]).then(result => {
          --inFlight;
          next();
        }).catch(err => {
          --inFlight;
          console.log(err);
          // purposely eat the error and let the rest of the processing continue
          // if you want to stop further processing, you can call reject() here
          next();
        });
      }
      if (inFlight === 0) {
        // all done
        resolve();
      }
    }

    next();
  });
}
And, then you would use that like this:
// request-promise (like the request library it wraps) is deprecated;
// any function that returns a promise works here
const rp = require('request-promise');

// run the whole urlList, no more than 10 at a time
runRequests(urlList, 10, function (url) {
  return rp(url).then(function (data) {
    // process fetched data here for one url
  }).catch(function (err) {
    console.log(url, err);
  });
}).then(function () {
  // all requests done here
});
This can be made as sophisticated as you want by adding a time element to it (no more than N requests per second) or even a bandwidth element to it.
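A sketch of adding such a time element, written with async/await rather than the callback-style loop of `runRequests`: at most `maxInFlight` requests run at once, and consecutive starts are kept at least `minGapMs` apart (so `minGapMs = 1000 / N` allows roughly N requests per second). `runRequestsPaced` and its parameters are illustrative names; like the Bluebird wrapper above, each result is reported as `{ data, error }` instead of rejecting.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// At most `maxInFlight` calls of fn run at once, and consecutive calls start
// at least `minGapMs` apart. Resolves to [{ data, error }] in input order.
async function runRequestsPaced(items, maxInFlight, minGapMs, fn) {
  let index = 0;
  let nextStart = 0; // earliest Date.now() at which another call may start
  const results = new Array(items.length);

  // Wait until a start is allowed, then claim it. The check and the update
  // run with no await in between, so two lanes cannot claim the same slot.
  async function waitForSlot() {
    for (;;) {
      const wait = nextStart - Date.now();
      if (wait <= 0) {
        nextStart = Date.now() + minGapMs;
        return;
      }
      await sleep(wait);
    }
  }

  async function lane() {
    while (index < items.length) {
      const i = index++;
      await waitForSlot();
      try {
        results[i] = { data: await fn(items[i]), error: null };
      } catch (error) {
        results[i] = { data: null, error };
      }
    }
  }

  const laneCount = Math.min(maxInFlight, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Hypothetical usage: at most 10 in flight and at most ~5 starts per second.
// runRequestsPaced(urlList, 10, 200, (url) => axios.get(url).then((r) => r.data))
//   .then((results) => console.log(results));
```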
I want each request to be made only after the previous one has completed.
That's a very slow way to do things. If you really want that, then you can just pass a 1 for the maxInFlight parameter to the above function, but typically, things would work a lot faster and not cause problems by allowing somewhere between 5 and 50 simultaneous requests. Only testing would tell you where the sweet spot is for your particular target sites and your particular server infrastructure and amount of processing you need to do on the results.
You can use the setTimeout function to schedule all the requests within the loop. For that, you must know the maximum time it takes to process a request.
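A sketch of the setTimeout approach from this answer, under its stated assumption that every request finishes within a known maximum time (`maxRequestMs`, a hypothetical parameter): request i is scheduled i × maxRequestMs after the start, so if the assumption holds, each request begins after the previous one has finished. If any request takes longer than that guess, requests will overlap, which the promise-based approaches above do not depend on.

```javascript
// Schedule fn(url) for each url, spaced `maxRequestMs` apart.
// Returns the timer handles so the schedule can be cancelled with clearTimeout.
function staggerRequests(urlList, maxRequestMs, fn) {
  return urlList.map((url, index) =>
    setTimeout(() => {
      // Errors are logged per URL so one failure does not affect the others.
      Promise.resolve()
        .then(() => fn(url))
        .catch((err) => console.error(url, err));
    }, index * maxRequestMs)
  );
}

// Hypothetical usage, assuming no request takes longer than 2 seconds:
// staggerRequests(urlList, 2000, (url) => axios.get(url).then((r) => { /* ... */ }));
```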
