Express Node Request For Loop Issue [duplicate] - node.js

With node.js I want to http.get a number of remote urls in a way that only 10 (or n) run at a time.
I also want to retry a request if an exception occurs locally (m times), but when the status code returns an error (5XX, 4XX, etc.) the request counts as valid.
This is really hard for me to wrap my head around.
Problems:
Cannot try-catch http.get as it is async.
Need a way to retry a request on failure.
I need some kind of semaphore that keeps track of the currently active request count.
When all requests have finished I want to get a list of all request URLs and response status codes that I can sort/group/manipulate, so I need to wait for all requests to finish.
It seems like promises are recommended for every async problem, but I end up nesting too many promises and it quickly becomes indecipherable.

There are lots of ways to approach the 10 requests running at a time.
Async Library - Use the async library with the .parallelLimit() method where you can specify the number of requests you want running at one time.
Bluebird Promise Library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise and then use Promise.map() with a concurrency option set to 10.
Manually coded - Code your requests manually to start up 10 and then each time one completes, start another one.
In all cases, you will have to write some retry logic yourself, and as with all retry code you will have to decide carefully which types of errors you retry, how soon you retry them, how much backoff you apply between retry attempts, and when you eventually give up (all things you have not specified).
Other related answers:
How to make millions of parallel http requests from nodejs app?
Million requests, 10 at a time - manually coded example
My preferred method is with Bluebird and promises. Including retry and result collection in order, that could look something like this:
const request = require('request');
const Promise = require('bluebird');
const get = Promise.promisify(request.get);

let remoteUrls = [...]; // large array of URLs
const maxRetryCnt = 3;
const retryDelay = 500;

Promise.map(remoteUrls, function(url) {
    let retryCnt = 0;
    function run() {
        return get(url).then(function(result) {
            // do whatever you want with the result here
            return result;
        }).catch(function(err) {
            // decide what your retry strategy is here
            // catch all errors here so other URLs continue to execute
            // (isRetryable() is a placeholder predicate you supply)
            if (isRetryable(err) && retryCnt < maxRetryCnt) {
                ++retryCnt;
                // try again after a short delay
                // chain onto previous promise so Promise.map() is still
                // respecting our concurrency value
                return Promise.delay(retryDelay).then(run);
            }
            // make value be null if no retries succeeded
            return null;
        });
    }
    return run();
}, {concurrency: 10}).then(function(allResults) {
    // everything done here and allResults contains results with null for err URLs
});

The simple way is to use the async library; its .parallelLimit() method does exactly what you need.
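For completeness, a minimal sketch of that approach, assuming a remoteUrls array like in the example above and that you only care about the status code (errors are folded into the result so one bad URL doesn't abort the rest):
const async = require('async');
const https = require('https'); // swap in http if your URLs are plain http

// one task per URL; each task reports its outcome through the callback
const tasks = remoteUrls.map((url) => (callback) => {
    https.get(url, (res) => {
        res.resume(); // drain the body; we only need the status code
        callback(null, { url, statusCode: res.statusCode });
    }).on('error', (err) => callback(null, { url, error: err.message }));
});

async.parallelLimit(tasks, 10, (err, results) => {
    // results is in the same order as remoteUrls
    console.log(results);
});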

Related

Is there a way to intercept multiple requests linked by an "OR" conditional operator?

I am a beginner with cypress. I've been looking for a way to intercept API calls to at least one of multiple URLs.
Let's say a button is clicked and something like this code is executed to check if a list of requests were called:
cy.get('@request1').should('have.been.called').log(`Request was made to 'REQUEST1_URL'`)
OR
cy.get('@request2').should('have.been.called').log(`Request was made to 'REQUEST2_URL'`)
I want to check if a request was sent to one url or the other, or both.
Has anyone encountered this problem before? Any contribution is appreciated.
Thanks.
The URL you use in the intercept should be general enough to catch both calls.
For example if the calls have /api/ in common, this catches both
cy.intercept('**/api/*') // note wildcards in the URL
  .as('apiRequest')

cy.visit('/')
cy.wait('@apiRequest')
If you have more paths in the URL than you need to catch, for example /api/dogs/, /api/cats/ and /api/pigs/, then use a function to weed out the ones you want
cy.intercept('**/api/*', (req) => {
    if (req.url.includes('dogs') || req.url.includes('cats')) { // no pigs
        req.alias = 'dogsOrCats' // set alias
    }
})
cy.visit('/')
cy.wait('@dogsOrCats')
Catching 0, 1, or 2 URLs
This is a bit tricky; if the number of calls isn't known, then you have to know within what time frame they would be made.
To catch requests which are fired fairly quickly by the app:
let count = 0;
cy.intercept('**/api/*', (req) => {
    count = count + 1;
})
cy.visit('/')
cy.wait(3000) // wait to see if calls are fired
cy.then(() => {
    cy.wrap(count).should('be.gt', 0) // 0 calls fails, 1 or 2 passes
})

Many requests in getStaticProps freeze my application build and return a server error

I'm trying to list a bunch of products, and I wanted to request the data on the Node side and build the page statically, so the homepage would be faster.
The problem appears when I make over 80 requests in getStaticProps.
The following code, with 80 items, does work:
const urlList = [];
for (let i = 1; i <= 80; i++) {
    const url = `myApiUrl`;
    urlList.push(url);
}

const promises = urlList.map(url => axios.get(url));
const responses = await Promise.all(promises);
return responses;
The following code, with 880 items, does not work
(note that it does work outside of getStaticProps):
const urlList = [];
for (let i = 1; i <= 880; i++) {
    const url = `myApiUrl`;
    urlList.push(url);
}

const promises = urlList.map(url => axios.get(url));
const responses = await Promise.all(promises);
return responses;
Error on the console:
Uncaught at TLSWrap.onStreamRead (internal/stream_base_commons.js:209:20)
Webpage error:
Server Error
Error
This error happened while generating the page. Any console logs will be displayed in the terminal window.
TLSWrap.onStreamRead
internal/stream_base_commons.js (209:20)
Is there a way to handle a large number of requests like that?
I'm new to HTTP requests; is there a way for me to optimize that?
There are limits to how many connections you can create to fetch content. What you're seeing is that a method like Promise.all() isn't "smart" enough to avoid running into such limits.
Basically, when you call Promise.all() you tell the computer "do all these things simultaneously, the order does not matter, and give me all the output when done. And by the way, if a single one of those operations fails, stop everything and throw away all other results". It's very useful in many contexts, but perhaps not when trying to fetch over 800 things from the net.
So yes, unless you can tweak the requirements, like the number of allowed simultaneous connections or the memory the script gets to use, you'll likely have to do this in batches. Perhaps one Promise.all() for a slice of 100 jobs at a time, then the next slice. You could look at using the async library and its mapLimit method, or roll your own way to slice the list of jobs into batches.
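A rough sketch of that batching idea, reusing the urlList from the question inside the async getStaticProps:
const batchSize = 100;
const responses = [];
for (let i = 0; i < urlList.length; i += batchSize) {
    // fire off at most batchSize requests, wait for them, then move on to the next slice
    const batch = urlList.slice(i, i + batchSize).map(url => axios.get(url));
    responses.push(...(await Promise.all(batch)));
}
return responses;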
This could be a problem related to the Node version being used, but for...of with await could also be an option for you...
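A minimal sequential sketch of that, reusing the urlList from the question:
const responses = [];
for (const url of urlList) {
    responses.push(await axios.get(url)); // one request at a time, no connection burst
}
return responses;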
You can leverage axios.all instead of Promise.all.
const urlList = [];
for (let i = 1; i <= 80; i++) {
    const url = `myApiUrl`;
    urlList.push(url);
}

const promises = urlList.map(url => axios.get(url));
const responses = await axios.all(promises);
return responses;
https://codesandbox.io/s/multiple-requests-axios-forked-nx1z9?file=/src/index.js
As a first step, for debugging purposes I would use Promise.allSettled instead of Promise.all. This should help you understand what error is returned by the HTTP socket. If you don't control the external API, it is likely that a firewall is blocking you, treating this burst as a "DDoS" attack.
As you said, batching the calls doesn't solve the issue (if you queue 80 requests followed by another 80, etc., you may hit the rate limit in any case).
You should check for throttling issues, and use a module such as throttle-debounce to rate-limit your HTTP calls.
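A small sketch of the allSettled debugging step, reusing the urlList from the question:
const results = await Promise.allSettled(urlList.map(url => axios.get(url)));

// inspect what actually failed instead of letting one rejection abort everything
results
    .filter(r => r.status === 'rejected')
    .forEach(r => console.log(r.reason.message));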

How to use/implement a custom Node.js circuit breaker based on the number of requests?

I'm trying to figure out the best way to implement a circuit breaker based on the number of requests being served in a TypeScript/Express application, instead of a failure percentage.
Since the application is meant to be used by a large number of users and under heavy load, I'm trying to customize the response code in order to trigger a horizontal scaling event with k8s/istio.
The first thing I want to start with is getting the number of requests currently in the Node.js event loop, even if there is some async work in progress, because a big part of my requests are executed asynchronously using async/await.
BTW:
I have seen these Libs
https://github.com/bennadel/Node-Circuit-Breaker
https://github.com/nodeshift/opossum
Is there any good idea/path I can start with in order to make this possible?
I can't tell for sure from your question, but if what you're trying to do is to just keep track of how many requests are in progress and then do something in particular if that number exceeds a particular value, then you can use this middleware:
function requestCntr() {
    let inProgress = 0;
    const events = ['finish', 'end', 'error', 'close'];
    return function(req, res, next) {
        function done() {
            // request finished, so decrement the inProgress counter
            --inProgress;
            // unhook all our event handlers so we don't count it more than once
            events.forEach(event => res.off(event, done));
        }
        // increment counter for requests in progress
        ++inProgress;
        const maxRequests = 10;
        if (inProgress > maxRequests) {
            console.log('more than 10 requests in flight at the same time');
            // do whatever you want to here
        }
        events.forEach(event => res.on(event, done));
        next();
    }
}
app.use(requestCntr());
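Since the goal in the question is to trigger a scaling event, one possible variation (my own sketch, not part of the answer above) is to short-circuit with a 503 once the in-flight count passes a threshold, so an istio/k8s rule keyed on 5xx or upstream pressure can react:
const express = require('express');
const app = express();

function overloadGuard(maxInFlight) {
    let inProgress = 0;
    return (req, res, next) => {
        if (inProgress >= maxInFlight) {
            // reject immediately; the caller can retry after a short delay
            res.set('Retry-After', '1');
            return res.status(503).send('Too many requests in flight');
        }
        ++inProgress;
        res.once('close', () => { --inProgress; }); // 'close' fires once the response is done
        next();
    };
}

app.use(overloadGuard(10));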

Issues performing network I/O in a NodeJS worker thread

I have a script that will download thousands of files from a server, perform some CPU-intensive calculations on those files, and then upload the results somewhere. As an added level of complexity, I want to limit the number of concurrent connections to the server where I'm downloading the files.
To get the CPU-intensive calculations off the event thread, I leveraged workerpool by josdejong. I also figured I could take advantage of the fact that only a limited number of threads will be spun up at any given time to limit the number of concurrent connections to my server, so I tried putting the network I/O in the worker process like so (TypeScript):
import Axios from "axios";
import workerpool from "workerpool";
import { IncomingMessage } from "http";

const pool = workerpool.pool({
    minWorkers: "max",
});

async function processData(file: string) {
    console.log("Downloading " + file);
    const csv = await Axios.request<IncomingMessage>({
        method: "GET",
        url: file,
        responseType: "stream"
    });
    console.log(csv);
    // TODO: Will process the file here
}

export default async function (files: string[]) {
    const promiseArray: workerpool.Promise<Promise<void>>[] = [];
    // Only processing the first file for now during testing
    files.slice(0, 1).forEach((file) => {
        promiseArray.push(pool.exec(processData, [file]));
    });
    await Promise.allSettled(promiseArray);
    await pool.terminate();
}
When I compile and run this code I see the message "Downloading test.txt", but after that I don't see the following log statement (console.log(csv)).
I've tried various modifications to this code, including removing the responseType, removing await and just inspecting the Promise that's returned by Axios, making the function non-async, etc. No matter what, it always seems to crash on the Axios.request line.
Are worker threads not able to open HTTP connections or something? Or am I just making a silly mistake?
If it is not getting to this line of code:
console.log(csv);
Then, either the Axios.request() is never fulfilling its promise or that promise is rejecting. You have no error handling at all in any of these functions so if it was rejecting, you wouldn't know and wouldn't be logging the problem. As a starter, I would suggest you instrument your code so you can log any rejections:
async function processData(file: string) {
    try {
        console.log("Downloading " + file);
        const csv = await Axios.request<IncomingMessage>({
            method: "GET",
            url: file,
            responseType: "stream"
        });
        console.log(csv);
    } catch(e) {
        console.log(e); // log an error
        throw e;        // propagate rejection/error
    }
}
As a general point of code design, you should be catching and logging any possible promise rejection at some level. You don't have to catch them all at the lowest calling level as they will propagate up through returned promises, but you do need to catch any possible rejection somewhere and, for your own development sanity, you will want to log it so you can see when it happens and what the error is.
You can't execute TypeScript in a worker thread. The pool.exec method accepts either a static JavaScript function or a path to a JavaScript file with the same function.
Here is a quote from the workerpool readme:
Note that both function and arguments must be static and stringifiable, as they need to be sent to the worker in a serialized form. In case of large functions or function arguments, the overhead of sending the data to the worker can be significant.
If you're trying to make this work with TypeScript, possible ways to resolve this are:
write a worker function in TypeScript, compile it to a separate bundle with any bundler, and then pass the path to the compiled file to pool.exec (see the sketch after this list). I managed to make this work, but the one thing I'm not satisfied with is that with this solution you can't use nodemon (if you use it).
use a JS wrapper that compiles the TS source code and executes it using ts-node, then pass the path to that wrapper to the pool.exec function. This solution won't work with bundlers.
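A minimal sketch of the first option, assuming the compiled worker lands next to the main file as worker.js (the file names and the returned value are assumptions):
// --- worker.js (compiled output of your TypeScript worker source) ---
const workerpool = require("workerpool");
const Axios = require("axios");

async function processData(file) {
    const res = await Axios.request({ method: "GET", url: file, responseType: "stream" });
    // return something serializable; the stream itself can't cross the worker boundary
    return res.status;
}

workerpool.worker({ processData });

// --- main side: point the pool at the worker file instead of passing a closure ---
const pool = require("workerpool").pool(__dirname + "/worker.js", { minWorkers: "max" });

async function run(files) {
    await Promise.allSettled(files.map((file) => pool.exec("processData", [file])));
    await pool.terminate();
}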

Request rate is large

I'm using Azure DocumentDB and accessing it through Node.js on an Express server. When I query in a loop at low volume (a few hundred), there is no issue.
But when I query in a loop at a slightly larger volume, say around a thousand plus:
I get partial results (inconsistent; every time I run it the result values are not the same, maybe because of the asynchronous nature of Node.js),
and after a few results it crashes with this error:
body: '{"code":"429","message":"Message: {\"Errors\":[\"Request rate is large\"]}\r\nActivityId: 1fecee65-0bb7-4991-a984-292c0d06693d, Request URI: /apps/cce94097-e5b2-42ab-9232-6abd12f53528/services/70926718-b021-45ee-ba2f-46c4669d952e/partitions/dd46d670-ab6f-4dca-bbbb-937647b03d97/replicas/130845018837894542p"}' }
Does this mean DocumentDB fails to handle 1000+ requests per second?
Altogether this gives me a bad impression of NoSQL techniques. Is it a shortcoming of DocumentDB?
As Gaurav suggests, you may be able to avoid the problem by bumping up the pricing tier, but even if you go to the highest tier, you should be able to handle 429 errors. When you get a 429 error, the response will include a 'x-ms-retry-after-ms' header. This will contain a number representing the number of milliseconds that you should wait before retrying the request that caused the error.
I wrote logic to handle this in my documentdb-utils node.js package. You can either try to use documentdb-utils or you can duplicate it yourself. Here is a snippet example.
createDocument = function() {
    client.createDocument(colLink, document, function(err, response, header) {
        if (err != null) {
            if (err.code === 429) {
                var retryAfterHeader = header['x-ms-retry-after-ms'] || 1;
                var retryAfter = Number(retryAfterHeader);
                // retry the same operation after the server-specified delay
                return setTimeout(createDocument, retryAfter);
            } else {
                throw new Error(JSON.stringify(err));
            }
        } else {
            log('document saved successfully');
        }
    });
};
Note, in the above example document is within the scope of createDocument. This makes the retry logic a bit simpler, but if you don't like using widely scoped variables, then you can pass document into createDocument and then pass it into a lambda function in the setTimeout call.
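A short sketch of that parameterized variant, assuming client and colLink are set up as in the snippet above:
function createDocument(colLink, document) {
    client.createDocument(colLink, document, function(err, response, header) {
        if (err && err.code === 429) {
            var retryAfter = Number(header['x-ms-retry-after-ms'] || 1);
            // the lambda captures the same colLink/document for the retry
            return setTimeout(function() { createDocument(colLink, document); }, retryAfter);
        }
        if (err) {
            throw new Error(JSON.stringify(err));
        }
        log('document saved successfully');
    });
}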
