How to improve async node fetch with a synchronous internet connection? - node.js

I have a fast synchronous fibre connection at home. The speed is great for streams and large downloads. However, multiple async node-fetch requests are very slow due to the connection overhead: I never see more than 2 async fetches in flight from my localhost, and with the connection overhead this takes roughly 1 second for every two fetches. I need more than a minute to process about 100 async fetches.
Via my 4G phone as a hotspot this takes less than 2 seconds.
Is there any way to bundle fetches for synchronous internet connections?
I run this test case with node 14.
const fetch = require('node-fetch')

const promises = []

for (let i = 0; i < 100; i++) {
  promises.push(fetch('https://geolytix.github.io/public/geolytix.svg'))
}

console.time('promise all')

Promise
  .all(promises)
  .then(arr => {
    console.log(arr.length)
    console.timeEnd('promise all')
  })
  .catch(error => {
    console.error(error)
  })
10 fetches over 4G take 0.2 seconds, 100 take 1 second.
Over my gigabit line 10 fetch requests take 4 seconds, and 100 take 50 seconds.
The behaviour with axios.get() is exactly the same.

I was able to resolve this by passing a custom https.Agent (not a user agent) to node-fetch. The agent keeps connections alive and is limited to maxSockets: 1. Increasing maxSockets degrades performance again on my synchronous internet connection.
const https = require('https');
const fetch = require('node-fetch');

// Keep-alive agent limited to one socket: every request reuses the same
// TLS connection instead of paying the handshake overhead each time.
const httpsAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 1
});

const options = {
  agent: httpsAgent
};

const getPromise = () =>
  fetch('https://geolytix.github.io/public/geolytix.svg', options)
    .then(response => response.text())
    //.then(text => console.log(text))
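For completeness, here is a sketch of how the factory above could be driven; the sequential loop and the timer labels are my own assumption, as the original answer only shows the factory:

// Hypothetical driver: run 100 requests back to back over the single
// kept-alive socket and time the whole batch.
const run = async () => {
  console.time('keep-alive fetches')
  for (let i = 0; i < 100; i++) {
    await getPromise()
  }
  console.timeEnd('keep-alive fetches')
}

run().catch(console.error)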

Related

How to space out (rate-limiting) outgoing axios requests originating from an express app handling requests received from a webhook?

Here is my issue:
I built a Node express app that handles incoming requests from a webhook that sometimes sends dozens of payloads in one second. Once a request has been processed I need to make an API POST request using Axios with the transformed data.
Unfortunately this API has a rate limit of 2 requests per second.
I am looking for a way to build some kind of queuing system that accepts every incoming request and sends the outgoing requests at a limited rate of 2 requests per second.
I tried adding a delay with setTimeout, but it only delayed the load: when hundreds of requests were received, the handling of each of them was delayed by 10 seconds, but they were still being sent out at nearly the same time, just 10 seconds later.
I was thinking of trying to log the time of each outgoing request, and only send a new outgoing request if (time - timeOfLastRequest > 500ms), but I'm pretty sure this is not the right way to handle this.
Here is a very basic version of the code to illustrate my issue:
// API1 SOMETIMES RECEIVES DOZENS OF API CALLS PER SECOND
app.post("/api1", async (req, res) => {
  const data = req.body.object;
  const transformedData = await transformData(data);
  // API2 ONLY ACCEPTS 2 REQUESTS PER SECOND
  const resp = await sendToApi2WithAxios(transformedData);
})
Save this code as a data.js file; you can replace the get call with your post call.
import axios from 'axios'

const delay = time => new Promise(res => setTimeout(res, time));

const delayCall = async () => {
  try {
    let [seconds, nanoseconds] = process.hrtime()
    console.log('Time is: ' + (seconds + nanoseconds / 1000000000) + ' secs')
    let resp = await axios.get(
      'https://worldtimeapi.org/api/timezone/Europe/Paris'
    );
    console.log(JSON.stringify(resp.data.datetime, null, 4));
    await delay(501);
    [seconds, nanoseconds] = process.hrtime()
    console.log('Time is: ' + (seconds + nanoseconds / 1000000000) + ' secs')
    resp = await axios.get(
      'https://worldtimeapi.org/api/timezone/Europe/Paris'
    );
    console.log(JSON.stringify(resp.data.datetime, null, 4))
  } catch (err) {
    console.error(err)
  }
};

delayCall()
In package.json:
{
  "dependencies": {
    "axios": "^1.2.1"
  },
  "type": "module"
}
Install and run it from the terminal:
npm install axios
node data.js
Result: it guarantees more than 501 ms between the API calls.
$ node data.js
Time is: 3074.690104402 secs
"2022-12-07T18:21:41.093291+01:00"
Time is: 3077.166384501 secs
"2022-12-07T18:21:43.411450+01:00"
So your code becomes:
const delay = time => new Promise(res => setTimeout(res, time));

// API1 SOMETIMES RECEIVES DOZENS OF API CALLS PER SECOND
app.post("/api1", async (req, res) => {
  const data = req.body.object;
  const transformedData = await transformData(data);
  await delay(501);
  // API2 ONLY ACCEPTS 2 REQUESTS PER SECOND
  const resp = await sendToApi2WithAxios(transformedData);
})
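Note that a per-request delay like this staggers each handler only relative to itself; if dozens of webhook calls arrive together, they will still fire at nearly the same time 501 ms later. As a sketch of the shared queuing idea the question asks about (the queue helper and its names below are my own illustration, not part of the original answer):

// Hypothetical shared FIFO queue: outgoing calls from ALL handlers are
// drained one at a time with at least 501 ms between them (~2/second).
const queue = [];
let draining = false;

const drain = async () => {
  if (draining) return;
  draining = true;
  while (queue.length > 0) {
    const job = queue.shift();
    await job().catch(console.error);
    await delay(501); // global spacing across every incoming request
  }
  draining = false;
};

app.post("/api1", async (req, res) => {
  const transformedData = await transformData(req.body.object);
  queue.push(() => sendToApi2WithAxios(transformedData));
  drain();
  res.sendStatus(202); // accepted; the outgoing call happens later
});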

Increase Headers Timeout in express

In order to access a Swagger UI based API I wrote some code.
app.get('/getData', async (req, res) => {
  const token = await getToken()

  async function getData() {
    return fetch(dataurl, {
      method: 'GET',
      headers: {
        accept: 'application/json;charset=UTF-8',
        authorization: 'Bearer ' + token.access_token
      }
    })
      .then(res => res.json())
      .catch(error => console.error('Error:', error));
  }

  const result = await getData()
  res.json(result)
})
The issue I have is that some requests will take about 10 minutes to finish since the data that gets accessed is very large and it just takes that time. I can't change that.
But after exactly 300 seconds I get "Headers Timeout Error" (UND_ERR_HEADERS_TIMEOUT).
I'm not sure where the 300 seconds come from. On the Swagger UI API the time is set to 600 seconds.
I think it's the standard timeout from express / NodeJS.
const port = 3000
const server = app.listen(port,()=>{ console.log('Server started')})
server.requestTimeout = 610000
server.headersTimeout = 610000
server.keepAliveTimeout = 600000
server.timeout = 600000
As you can see, I tried to increase all timeouts for express to about 600 seconds, but nothing changes.
I also changed the network.http.response.timeout in Firefox to 600 seconds.
But still after 300 seconds I get "Headers Timeout Error".
Can anybody help me where and how I can increase the timeout for the request to go through?
Have you tried using the connect-timeout library?
npm install connect-timeout
//...
var timeout = require('connect-timeout');
app.use(timeout('600s'));
Read more here: https://www.npmjs.com/package/connect-timeout#examples
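A hedged side note, not from the original answer: the error code UND_ERR_HEADERS_TIMEOUT comes from undici, the HTTP client behind Node's built-in fetch, whose client-side headersTimeout defaults to 300 seconds independently of the express server settings above. If that is what is firing, one possibility is to raise it via an undici dispatcher (a sketch, assuming the undici package is available):

const { fetch, Agent } = require('undici');

// Hypothetical: raise undici's client-side timeouts (default 300 s)
// so the fetch can wait the ~10 minutes the slow endpoint needs.
const dispatcher = new Agent({
  headersTimeout: 610000, // ms to wait for the response headers
  bodyTimeout: 610000     // ms allowed between body chunks
});

async function getDataSlow(dataurl, token) {
  const res = await fetch(dataurl, {
    method: 'GET',
    headers: { authorization: 'Bearer ' + token.access_token },
    dispatcher
  });
  return res.json();
}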

Firebase-admin nodejs SDK doesn't connect

I'm attempting to connect to firebase/firestore using the nodejs SDK, however it doesn't connect. I've attempted to connect multiple times using setInterval, but nothing works.
First I initialize firebase with the credentials and the databaseURL, after this I get the database ref, and in the end I attempt to write to the database.
I've checked .info/connected on a setInterval with a timeout of 1000 ms and the mocha --timeout flag set to 5000 ms, and it always reports offline.
I've checked the credentials: with a wrong credential or config json they give a JSON parse error message (I have several storage instances, each connected according to a flag set at execution time).
I'm using the TDD approach on my application, so I have to mock the entire database and check against the resulting values of each operation. I've written a controller for the task of handling the firebase/firestore work, but if I can't connect it has no use.
The code goes here:
const analyticsFirebaseMock = admin.initializeApp({
  credentials: admin.credential.cert(analyticsCredentials),
  databaseURL: process.env.ANALYTICS_FIREBASE_URL
}, 'analyticsMock')

const analyticsDbRef = analyticsFirebaseMock.database()

beforeEach(() => {})
afterEach(() => sinon.restore())

describe('POST - /analytics', () => {
  it('should save the analytics data for new year', async (done) => {
    const itens = 1
    const price = 599.00

    setInterval(() => {
      clockAnalyticsDbRef.ref(`.info/connected`).once('value', (value) => {
        if (value.val() === true) console.log('connected')
        else console.error('offline')
      })
    }, 1000)

    await analytics.updateAnalytics(user, itens, price)

    await analyticsDbRef.ref(`${user}`).once('value', (value) => {
      expect(R.view(userLens, value)).to.be.equals(user)
      done()
    })
  })
})
In the above code I use async/await on analyticsDbRef because of the asynchronous nature of JS: call the controller, await the query result, conclude with done. The test fails with a timeout, expecting done to be called.
What could I be doing wrong?
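Two details stand out in the snippet; these are my own observations, since the thread carries no accepted answer. First, firebase-admin's initializeApp() expects the option key credential (singular), while the snippet passes credentials, which the SDK ignores, leaving the app without valid credentials. Second, the interval reads from clockAnalyticsDbRef while only analyticsDbRef is defined. A sketch of the corrected initialization:

const admin = require('firebase-admin')

// "credential" (singular) is the key firebase-admin actually reads.
const analyticsFirebaseMock = admin.initializeApp({
  credential: admin.credential.cert(analyticsCredentials),
  databaseURL: process.env.ANALYTICS_FIREBASE_URL
}, 'analyticsMock')

const analyticsDbRef = analyticsFirebaseMock.database()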

Node.js 10000 concurrent HTTP requests every 10 seconds

I have a use case where I need to make more than 10,000 external HTTP requests (to one API) in an infinite loop every 10 seconds. This API takes anywhere from 200-800 ms to respond.
Goals:
Call an external API more than 10,000 times in 10 seconds
Have a failproof polling system that can run for months at a time without failing
Attempts:
I have attempted to use the Async library and limit requests to 300 concurrent calls (higher numbers fail quickly), but after about 300,000 requests I run into errors where I receive a connection refused (I am calling a local Node server on another port). I am running the local server to mimic the real API because our application has not scaled to more than 10,000 users yet and the external API requires 1 unique user per request. The local response server is a very simple Node server with one route that waits between 200-800 ms to respond.
Am I getting this error because I am calling a server running locally on my machine or is it because Node is having an issue handling this many requests? Any insight into the best way to perform this type of polling would be appreciated.
Edit: My question is about how to create a client that can send more than 10,000 requests in a 10 second interval.
Edit2:
Client:
// arr contains 10,000 tokens
const makeRequests = arr => {
  setInterval(() => {
    async.eachLimit(arr, 300, (token, cb) => {
      axios.get(`http://localhost:3001/tokens/${token}`)
        .then(res => {
          // do something
          cb();
        })
        .catch(err => {
          // handle error
          cb();
        })
    })
  }, 10000);
}
Dummy Server:
const getRandomArbitrary = (min, max) => {
  return Math.random() * (max - min) + min;
}

// the route needs a slash before the parameter to match the client URL
app.get('/tokens/:token', (req, res) => {
  setTimeout(() => {
    res.send('OK');
  }, getRandomArbitrary(200, 800))
});
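One thing worth ruling out, as a suggestion of my own rather than an answer from the thread: opening a fresh TCP connection for every request can exhaust the client's ephemeral ports, which tends to surface as refused connections after a few hundred thousand local requests. A sketch of socket reuse with a keep-alive agent in axios:

const http = require('http');
const axios = require('axios');

// Hypothetical: one keep-alive agent shared by all requests, so sockets
// are reused instead of a new TCP connection being opened per request.
const httpAgent = new http.Agent({
  keepAlive: true,
  maxSockets: 300 // matches the eachLimit concurrency above
});

const client = axios.create({ httpAgent });

// then, inside the polling loop:
// client.get(`http://localhost:3001/tokens/${token}`) ...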

Nodejs: Async requests with a list of URLs

I am working on a crawler. I have a list of URLs that need to be requested. Several hundred requests fire at the same time if I don't throttle them, and I am afraid that would saturate my bandwidth or produce too much network access to the target website. What should I do?
Here is what I am doing:
urlList.forEach((url, index) => {
  console.log('Fetching ' + url);
  request(url, function(error, response, body) {
    // do something with body
  });
});
I want each request to be made only after the previous one has completed.
You can use something like the bluebird Promise library, e.g. this snippet:
const Promise = require("bluebird");
const axios = require("axios");

// Axios wrapper for error handling
const axios_wrapper = (options) => {
  return axios(options)
    .then((r) => {
      return Promise.resolve({
        data: r.data,
        error: null,
      });
    })
    .catch((e) => {
      return Promise.resolve({
        data: null,
        error: e.response ? e.response.data : e,
      });
    });
};
Promise.map(
  urls,
  (k) => {
    return axios_wrapper({
      method: "GET",
      url: k,
    });
  },
  { concurrency: 1 } // how many requests you want to run in parallel
)
  .then((r) => {
    console.log(r);
    // r is an array of objects like {data: ..., error: null}: if the
    // request succeeded data is present, otherwise error is non-null
  })
  .catch((e) => {
    console.error(e);
  });
The things you need to watch for are:
Whether the target site has rate limiting and you may be blocked from access if you try to request too much too fast?
How many simultaneous requests the target site can handle without degrading its performance?
How much bandwidth your server has on its end of things?
How many simultaneous requests your own server can have in flight and process without causing excess memory usage or a pegged CPU.
In general, the scheme for managing all this is to create a way to tune how many requests you launch. There are many different ways to control this by number of simultaneous requests, number of requests per second, amount of data used, etc...
The simplest way to start would be to just control how many simultaneous requests you make. That can be done like this:
function runRequests(arrayOfData, maxInFlight, fn) {
  return new Promise((resolve, reject) => {
    let index = 0;
    let inFlight = 0;

    function next() {
      while (inFlight < maxInFlight && index < arrayOfData.length) {
        ++inFlight;
        fn(arrayOfData[index++]).then(result => {
          --inFlight;
          next();
        }).catch(err => {
          --inFlight;
          console.log(err);
          // purposely eat the error and let the rest of the processing continue
          // if you want to stop further processing, you can call reject() here
          next();
        });
      }
      if (inFlight === 0) {
        // all done
        resolve();
      }
    }
    next();
  });
}
And then you would use that like this:
const rp = require('request-promise');

// run the whole urlList, no more than 10 at a time
runRequests(urlList, 10, function(url) {
  return rp(url).then(function(data) {
    // process fetched data here for one url
  }).catch(function(err) {
    console.log(url, err);
  });
}).then(function() {
  // all requests done here
});
This can be made as sophisticated as you want by adding a time element (no more than N requests per second; see the sketch at the end of this answer) or even a bandwidth element.
I want each request to be made only after the previous one has completed.
That's a very slow way to do things. If you really want that, then you can just pass a 1 for the maxInFlight parameter to the above function, but typically, things would work a lot faster and not cause problems by allowing somewhere between 5 and 50 simultaneous requests. Only testing would tell you where the sweet spot is for your particular target sites and your particular server infrastructure and amount of processing you need to do on the results.
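For illustration, here is a minimal sketch of that time element, layered on the runRequests() helper above; the wrapper and its names are my own addition:

// Hypothetical wrapper: space out launches so that at most reqPerSec
// requests are started in any one second, reusing runRequests().
function runRequestsPerSecond(arrayOfData, maxInFlight, reqPerSec, fn) {
  const gap = 1000 / reqPerSec; // minimum ms between launches
  let last = 0;

  return runRequests(arrayOfData, maxInFlight, (item) => {
    const now = Date.now();
    const wait = Math.max(0, last + gap - now);
    last = now + wait; // reserve the next launch slot
    return new Promise(resolve => setTimeout(resolve, wait))
      .then(() => fn(item));
  });
}

// e.g. no more than 10 in flight and no more than 5 started per second:
// runRequestsPerSecond(urlList, 10, 5, url => rp(url));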
You can also use the setTimeout function to process all requests within the loop; for that you must know the maximum time needed to process a request.
