How to retry an HTTP request until a condition is met using RxJS - node.js

I want to retry an HTTP request until some data exists, up to 10 times, with a delay of 2 seconds between each retry.
const $metrics = from(axios(this.getMetrics(session._id, sessionRequest._id, side)));
const res = $metrics.pipe(
  map((val: any) => {
    console.log("VALUE:", val.data.metrics.length);
    if (val.data.metrics.length === 0) {
      throw val;
    }
    return val;
  }),
  retryWhen((errors) => errors.pipe(delay(2000), take(10))),
).subscribe();
I am trying to follow the example in the documentation: https://www.learnrxjs.io/operators/error_handling/retry.html
I create the $metrics observable from an axios HTTP promise.
I use the map operator to check whether the response of the HTTP request matches my retry condition, val.data.metrics.length === 0. If it does, it throws an error.
I retry the HTTP request up to 10 times with a 2-second delay.
I expect the metrics array to have data after 3-4 retries, but when I log the response in my console I get the following:
VALUE: 0
I'm not sure whether this is even making multiple HTTP requests, because the console log only shows one output instead of 10.
UPDATE
I've updated the code to use retryWhen instead of retry; it delays 2 seconds and takes only 10 errors before stopping.
Now I believe the problem is that it only makes one HTTP request, because the console log only shows a single output.

Try using defer():
const $metrics = defer(() => from(axios(this.getMetrics(session._id, sessionRequest._id, side))));
One thing to point out: you should inspect your network tab and see whether the request is actually made on each retry. Your console.log is in the map() operator, which will be skipped when the error is thrown, which could be why you don't see the log. You can try out the example below.
import { from } from 'rxjs';
import { tap, retryWhen, delay, take } from 'rxjs/operators';

const source = from(fetch('http://kaksfk')).pipe(
  tap(val => console.log(`fetching, you won't see this`)),
);
const example = source.pipe(
  retryWhen(errors =>
    errors.pipe(
      // log on each retry
      tap(val => console.log(`retrying`)),
      // retry after 2 seconds
      delay(2000),
      // give up after 5 attempts
      take(5),
    ),
  ),
);
example.subscribe();
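Putting both pieces together for your case, a minimal sketch could look like this (the getMetrics call and surrounding identifiers are taken from your question; the key changes are wrapping the axios promise in defer() so every retry fires a fresh request, and throwing from map() while the data is still missing):

import { defer, from } from 'rxjs';
import { map, retryWhen, delay, take } from 'rxjs/operators';

// defer() creates a fresh axios promise on every subscription,
// so each retry performs a new HTTP request instead of replaying
// the already-settled promise.
const $metrics = defer(() =>
  from(axios(this.getMetrics(session._id, sessionRequest._id, side))),
);

$metrics.pipe(
  map((val: any) => {
    if (val.data.metrics.length === 0) {
      // No data yet: throw so retryWhen re-subscribes to $metrics.
      throw new Error('metrics not ready yet');
    }
    return val;
  }),
  // Wait 2 seconds between attempts; stop after 10 failed attempts.
  retryWhen((errors) => errors.pipe(delay(2000), take(10))),
).subscribe((val) => console.log('VALUE:', val.data.metrics.length));

Note that take(10) makes the stream complete silently once the attempts are exhausted; if you want an error surfaced instead, append a throwError to the notifier.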


Retry HTTP request with backoff (NestJS - axios - rxjs) throws socket hang up

[second-Update]: I solved the issue by implementing retry with promises and try & catch.
[first-Update]:
I tried the retry mechanism with an HTTP request with content-type: application/json and it works! But my issue is with content-type form-data.
I guess it's similar to this problem: Axios interceptor to retry sending FormData
Services architecture
I'm trying to make an HTTP request to service-a from a NestJS app,
and I want to implement retry-with-backoff logic.
To reproduce a service-a failure, I restart its docker container and make the HTTP request.
The retry logic is implemented as 3 retries.
The first time, while service-a is restarting, it throws a 405 service not available error and triggers a retry.
All 3 retries fail with a socket hang up error.
HTTP request code using the axios NestJS wrapper lib
retryWithBackOff rxjs operator implementation
The first call throws a 405 Service Unavailable error.
Then the application starts retrying.
The first retry fires after service-a has started, but fails with a socket hang up error.
The first, second, and third retries all fail with socket hang up.
3 socket hang up errors
My expected behavior is:
when service-a has started and the first retry fires, it should succeed with a successful response.
Notice that the 3 retries don't log anything to the Nginx server!
While your solution probably works, it could be improved in terms of single responsibility, which RxJS can help with. I use an adapted version of a code snippet I found once on the web (I can't find the original source any more).
interface GenericRetryStrategy {
  getAttempt?(): number;
  maxRetryAttempts?: number;
  scalingDuration?: number;
  maxDuration?: number;
  retryFormula?: RetryFormula;
  excludedStatusCodes?: number[]; // All errors with these codes will circumvent retry and just return the error
}

const genericRetryStrategy$ =
  ({
    getAttempt,
    maxRetryAttempts = 3,
    scalingDuration = 1000,
    maxDuration = 64000,
    retryFormula = 'constant', // time-to-retry-count interpolation
    excludedStatusCodes = [], // All errors with these codes will circumvent retry and just return the error
  }: GenericRetryStrategy = {}) =>
  (error$: Observable<unknown>): Observable<number> =>
    error$.pipe(
      switchMap((error, i) => {
        const retryAttempt = getAttempt ? getAttempt() : i + 1;
        // if the maximum number of retries has been met,
        // or the response is an error code we don't wish to retry, throw the error
        if (
          retryAttempt > maxRetryAttempts ||
          excludedStatusCodes.find(e => e === error.code)
        ) {
          return throwError(error);
        }
        const retryDuration = getRetryCurve(retryFormula, retryAttempt);
        const waitDuration = Math.min(
          maxDuration,
          retryDuration * scalingDuration,
        );
        // retry after 1000ms, 2000ms, etc.
        return timer(waitDuration);
      }),
    );
You would then call it like this:

const retryThreeTimes$ = genericRetryStrategy$({
  maxRetryAttempts: 3,
  excludedStatusCodes: [HttpStatus.PayloadTooLarge, HttpStatus.NotFound], // This will throw the error straight away
});

this.setupUploadAttachements(url, clientApiKey, files, toPoTenantId)
  .pipe(retryWhen(retryThreeTimes$));
This function/operator can now be re-used for all kinds of requests. It is very flexible, and it makes your operator logic more readable, since the complex retry logic sits somewhere else and does not "pollute" your pipe.
You might have to do some adjustment, since axios seems to return a different error payload (at least judging from your code examples). Also, if I understood your code correctly, you actually don't want to throw an error when the above error codes apply. In that case, you could add another catchError after the retryWhen and filter these codes while returning of([]).
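A minimal sketch of that last suggestion (assuming, as in the strategy above, that the error exposes its status on error.code; adjust to the actual axios error payload):

import { of, throwError } from 'rxjs';
import { catchError, retryWhen } from 'rxjs/operators';

this.setupUploadAttachements(url, clientApiKey, files, toPoTenantId).pipe(
  retryWhen(retryThreeTimes$),
  // Swallow the excluded status codes and fall back to an empty result,
  // instead of letting the error propagate.
  catchError((error: any) =>
    [HttpStatus.PayloadTooLarge, HttpStatus.NotFound].includes(error.code)
      ? of([])
      : throwError(error),
  ),
);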

How do I fail a specific SQS message in a batch from a Lambda?

I have a Lambda with an SQS trigger. When it gets hit, a batch of records from SQS comes in (usually about 10 at a time, I think). If I return a failed status code from the handler, all 10 messages will be retried. If I return a success code, they'll all be removed from the queue. What if 1 out of those 10 messages fails and I want to retry just that one?
exports.handler = async (event) => {
  for (const record of event.Records) {
    try {
      let body = JSON.parse(record.body);
      // do things
    }
    catch (err) {
      // one message failed, I want it to be retried
    }
  }
  // returning this causes ALL messages in
  // this batch to be removed from the queue
  return {
    statusCode: 200,
    body: 'Finished.'
  };
};
Do I have to manually re-add that one message back to the queue? Or can I return a status from my handler indicating that one message failed and should be retried?
As per the AWS documentation, SQS event source mappings now support handling of partial failures out of the box. The gist of the linked article is as follows:
Include ReportBatchItemFailures in your EventSourceMapping configuration.
The response syntax in case of failures has to be modified to:
{
  "batchItemFailures": [
    { "itemIdentifier": "id2" },
    { "itemIdentifier": "id4" }
  ]
}
where id2 and id4 are the messageIds of the failed messages in the batch.
Quoting the documentation as is:
Lambda treats a batch as a complete success if your function returns any of the following:
An empty batchItemFailures list
A null batchItemFailures list
An empty EventResponse
A null EventResponse
Lambda treats a batch as a complete failure if your function returns any of the following:
An invalid JSON response
An empty string itemIdentifier
A null itemIdentifier
An itemIdentifier with a bad key name
An itemIdentifier value with a message ID that doesn't exist
SAM support is not yet available for this feature, per the documentation, but one of the AWS Labs examples points to its usage in SAM, and it worked for me when I tested it.
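For illustration, a minimal handler using this feature might look like the following sketch (assumes the @types/aws-lambda typings; processRecord stands in for your own processing logic):

import { SQSBatchResponse, SQSEvent } from 'aws-lambda';

export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const batchItemFailures: SQSBatchResponse['batchItemFailures'] = [];
  for (const record of event.Records) {
    try {
      await processRecord(JSON.parse(record.body)); // hypothetical processing function
    } catch {
      // Report only this message as failed; the rest of the batch is deleted.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
};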
Yes, you have to manually re-add the failed messages back to the queue.
What I suggest is keeping a fail count: if all messages failed, simply return a failed status for the whole batch; otherwise, if the fail count is < 10, individually send the failed messages back to the queue, as in the sketch below.
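A rough sketch of that idea (assumes the AWS SDK v3 SQS client; the QUEUE_URL environment variable is a placeholder):

import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});

export const handler = async (event: any) => {
  const failed: any[] = [];
  for (const record of event.Records) {
    try {
      // process the record here
    } catch {
      failed.push(record);
    }
  }
  // If everything failed, fail the whole batch and let SQS retry it.
  if (failed.length === event.Records.length) {
    throw new Error('Whole batch failed');
  }
  // Otherwise re-enqueue only the failures, then report success so the
  // original batch is removed from the queue.
  for (const record of failed) {
    await sqs.send(new SendMessageCommand({
      QueueUrl: process.env.QUEUE_URL!,
      MessageBody: record.body,
    }));
  }
  return { statusCode: 200, body: 'Finished.' };
};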
You have to programmatically delete each message from the queue after processing it successfully.
So you can have a flag that is set to true if any of the messages failed, and depending on it you can raise an error after processing all the messages in the batch; successful messages will then be deleted and the other messages will be reprocessed based on the retry policy.
With the logic below, only failed and unprocessed messages get retried.
import boto3

sqs = boto3.client("sqs")

def handler(event, context):
    for message in event['Records']:
        queue_url = "your queue url; recommended to set it as an env variable"
        message_body = message["body"]
        print("do some processing :)")
        message_receipt_handle = message["receiptHandle"]
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message_receipt_handle
        )
There is also another way: save the successfully processed message IDs into a variable and perform a batch delete operation based on them:
response = client.delete_message_batch(
    QueueUrl='string',
    Entries=[
        {
            'Id': 'string',
            'ReceiptHandle': 'string'
        },
    ]
)
You need to design your app in a different way. Here are a few ideas; not the best, but they will solve your problem.
Solution 1:
Create an SQS delivery queue - sq1
Create a delay queue as per the delay requirement - sq2
Create a dead-letter queue - sdl
Now, inside the Lambda function, if a message fails in sq1, delete it from sq1 and drop it onto sq2 for retry (see the sketch after the links below). Any Lambda function invoked asynchronously is retried twice before the event is discarded.
If it fails again after the given retries, move it into the dead-letter queue sdl.
AWS Lambda - processing messages in Batches
https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
Note: when an SQS event source mapping is initially created and enabled, or first appears after a period with no traffic, the Lambda service will begin polling the SQS queue using five parallel long-polling connections. Per the AWS documentation, the default duration for a long poll from AWS Lambda to SQS is 20 seconds.
https://docs.aws.amazon.com/lambda/latest/dg/lambda-services.html#supported-event-source-sqs
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
https://nordcloud.com/amazon-sqs-as-a-lambda-event-source/
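A rough sketch of the sq1 → sq2 hand-off described above (assumes the AWS SDK v3 SQS client; the SQ1_URL/SQ2_URL environment variables and the 60-second delay are placeholders):

import { SQSClient, SendMessageCommand, DeleteMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});

// Called when processing a message from the delivery queue sq1 fails.
async function moveToRetryQueue(record: { body: string; receiptHandle: string }) {
  // Drop the message onto the delay queue sq2 for a later retry.
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.SQ2_URL!,
    MessageBody: record.body,
    DelaySeconds: 60, // or rely on the queue's default delivery delay
  }));
  // Delete it from sq1 so it is not redelivered there.
  await sqs.send(new DeleteMessageCommand({
    QueueUrl: process.env.SQ1_URL!,
    ReceiptHandle: record.receiptHandle,
  }));
}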
Solution 2:
Use AWS Step Functions
https://aws.amazon.com/step-functions/
Step Functions will call the Lambda and handle the retry logic on failure, with configurable exponential back-off if needed.
https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html
https://cloudacademy.com/blog/aws-step-functions-a-serverless-orchestrator/
Solution 3:
Use a CloudWatch scheduled event to trigger a Lambda function that polls for FAILED messages.
Error handling for a given event source depends on how Lambda is invoked. Amazon CloudWatch Events invokes your Lambda function asynchronously.
https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
https://engineering.opsgenie.com/aws-lambda-performance-series-part-2-an-analysis-on-async-lambda-fail-retry-behaviour-and-dead-b84620af406
https://dzone.com/articles/asynchronous-retries-with-aws-sqs
https://medium.com/@ron_73212/how-to-handle-aws-lambda-errors-like-a-pro-e5455b013d10
AWS supports partial batch responses. Here is an example in TypeScript:
type Result = {
  itemIdentifier: string
  status: 'failed' | 'success'
}

const isFulfilled = <T>(
  result: PromiseFulfilledResult<T> | PromiseRejectedResult
): result is PromiseFulfilledResult<T> => result.status === 'fulfilled'

const isFailed = (
  result: PromiseFulfilledResult<Result>
): result is PromiseFulfilledResult<
  Omit<Result, 'status'> & { status: 'failed' }
> => result.value.status === 'failed'

const results = await Promise.allSettled(
  sqsEvent.Records.map(async (record): Promise<Result> => {
    try {
      // process the record here
      return { status: 'success', itemIdentifier: record.messageId }
    } catch (e) {
      console.error(e)
      return { status: 'failed', itemIdentifier: record.messageId }
    }
  })
)

// Report only the failed message IDs back to Lambda, in the
// { batchItemFailures: [...] } shape the feature expects.
return {
  batchItemFailures: results
    .filter(isFulfilled)
    .filter(isFailed)
    .map((result) => ({
      itemIdentifier: result.value.itemIdentifier,
    })),
}

How to specify HTTP timeout for DownloadURL() in Akavache?

I am developing an application targeting mobile devices, so I have to consider bad network connectivity. In one use case, I need to reduce the timeout for a request, because if no network is available, that's okay, and I'd fall back to default data immediately without having the user wait for the HTTP response.
I found that HttpMixin.MakeWebRequest() has a timeout parameter (with default = null), but DownloadUrl() never makes use of it, so the aforementioned function always waits for up to 15 seconds:

request.Timeout(timeout ?? TimeSpan.FromSeconds(15),
    BlobCache.TaskpoolScheduler).Retry(retries);

So I actually do not have the option to use a different timeout, or am I missing something?
Thanks in advance for any helpful response.
After looking at the signature for DownloadUrl in HttpMixin.cs, I saw what you are talking about. I am not sure why it is there, but it looks like the timeout is related to building the request, not a timeout for the request itself.
That being said, in order to set a timeout for a download, you have a couple of options that should work.
Via TPL aka Async Await

var timeout = 1000;
var task = BlobCache.LocalMachine.DownloadUrl("http://stackoverflow.com").FirstAsync().ToTask();
if (await Task.WhenAny(task, Task.Delay(timeout)) == task) {
    // task completed within timeout
    // Do stuff with your byte data here
    //var result = task.Result;
} else {
    // timeout logic
}
Via Rx Observables

var obs = BlobCache.LocalMachine
    .DownloadUrl("http://stackoverflow.com")
    .Timeout(TimeSpan.FromSeconds(5))
    .Retry(retryCount: 2);

var result = obs.Subscribe((byteData) =>
{
    // Do stuff with your byte data here
    Debug.WriteLine("Byte Data Length " + byteData.Length);
}, (ex) => {
    Debug.WriteLine("Handle your exceptions here. " + ex.Message);
});

Express Node Request For Loop Issue [duplicate]

With node.js I want to http.get a number of remote URLs in a way that only 10 (or n) run at a time.
I also want to retry a request if an exception occurs locally (m times), but when the status code returns an error (5XX, 4XX, etc.) the request counts as valid.
This is really hard for me to wrap my head around.
Problems:
Cannot try-catch http.get as it is async.
Need a way to retry a request on failure.
I need some kind of semaphore that keeps track of the currently active request count.
When all requests have finished, I want the list of all request URLs and response status codes so I can sort/group/manipulate them, which means I need to wait for all requests to finish.
Promises seem to be the recommended answer to every async problem, but I end up nesting too many of them and it quickly becomes undecipherable.
There are lots of ways to approach running 10 requests at a time.
Async library - Use the async library with the .parallelLimit() method, where you can specify the number of requests you want running at one time.
Bluebird promise library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise, and then use Promise.map() with the concurrency option set to 10.
Manually coded - Code your requests manually to start 10 and then, each time one completes, start another one.
In all cases, you will have to write some retry code manually, and as with all retry code, you will have to decide very carefully which types of errors you retry, how soon you retry them, how much you back off between retry attempts, and when you eventually give up (all things you have not specified).
Other related answers:
How to make millions of parallel http requests from nodejs app?
Million requests, 10 at a time - manually coded example
My preferred method is with Bluebird and promises. Including retry and in-order result collection, that could look something like this:
const request = require('request');
const Promise = require('bluebird');
const get = Promise.promisify(request.get);

let remoteUrls = [...]; // large array of URLs
const maxRetryCnt = 3;
const retryDelay = 500;

Promise.map(remoteUrls, function(url) {
    let retryCnt = 0;
    function run() {
        return get(url).then(function(result) {
            // do whatever you want with the result here
            return result;
        }).catch(function(err) {
            // decide what your retry strategy is here
            // catch all errors here so other URLs continue to execute
            // (isRetryableError() is your own classification of err)
            if (isRetryableError(err) && retryCnt < maxRetryCnt) {
                ++retryCnt;
                // try again after a short delay
                // chain onto previous promise so Promise.map() is still
                // respecting our concurrency value
                return Promise.delay(retryDelay).then(run);
            }
            // make value be null if no retries succeeded
            return null;
        });
    }
    return run();
}, {concurrency: 10}).then(function(allResults) {
    // everything done here; allResults contains results, with null for failed URLs
});
The simple way is to use the async library; it has a .parallelLimit() method that does exactly what you need. A sketch follows.
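For illustration, a minimal sketch of that approach (the URL list is a placeholder, and the retry handling discussed above is omitted for brevity):

import * as async from 'async';
import * as http from 'http';

const urls: string[] = ['http://example.com/a', 'http://example.com/b']; // placeholder URLs

// One task per URL; parallelLimit runs at most 10 of them at a time.
const tasks = urls.map((url) =>
  (done: (err: Error | null, statusCode?: number) => void) => {
    http.get(url, (res) => {
      res.resume(); // drain the response body
      done(null, res.statusCode); // any HTTP status counts as a valid result
    }).on('error', (err) => done(err)); // local exception
  });

async.parallelLimit(tasks, 10, (err, statusCodes) => {
  // statusCodes is in the same order as urls
  console.log(err ?? statusCodes);
});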

Azure Search .NET SDK - How to use "FindFailedActionsToRetry"?

Using the Azure Search .NET SDK, when you try to index documents you might get an IndexBatchException.
From the documentation here:
try
{
    var batch = IndexBatch.Upload(documents);
    indexClient.Documents.Index(batch);
}
catch (IndexBatchException e)
{
    // Sometimes when your Search service is under load, indexing will fail for some of the documents in
    // the batch. Depending on your application, you can take compensating actions like delaying and
    // retrying. For this simple demo, we just log the failed document keys and continue.
    Console.WriteLine(
        "Failed to index some of the documents: {0}",
        String.Join(", ", e.IndexingResults.Where(r => !r.Succeeded).Select(r => r.Key)));
}
How should e.FindFailedActionsToRetry be used to create a new batch to retry the indexing of failed actions?
I've created a function like this:
public void UploadDocuments<T>(SearchIndexClient searchIndexClient, IndexBatch<T> batch, int count) where T : class, IMyAppSearchDocument
{
    try
    {
        searchIndexClient.Documents.Index(batch);
    }
    catch (IndexBatchException e)
    {
        if (count == 5) // we will try to index 5 times and give up if it still doesn't work
        {
            throw new Exception("IndexBatchException: Indexing Failed for some documents.");
        }

        Thread.Sleep(5000); // we got an error, wait 5 seconds and try again (in case it's an intermittent or network issue)

        var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
        UploadDocuments(searchIndexClient, retryBatch, count + 1);
    }
}
But I think this part is wrong:
var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
The second parameter to FindFailedActionsToRetry, named keySelector, is a function that should return whatever property on your model type represents your document key. In your example, your model type is not known at compile time inside UploadDocuments, so you'll need to change UploadDocuments to also take the keySelector parameter and pass it through to FindFailedActionsToRetry. The caller of UploadDocuments would then specify a lambda specific to type T. For example, if T is the sample Hotel class from the sample code in this article, the lambda must be hotel => hotel.HotelId, since HotelId is the property of Hotel that is used as the document key.
Incidentally, the wait inside your catch block should not be a constant amount of time. If your search service is under heavy load, a constant delay won't really give it time to recover. Instead, we recommend exponentially backing off (e.g., the first delay is 2 seconds, then 4 seconds, then 8 seconds, then 16 seconds, up to some maximum).
I've taken Bruce's recommendations from his answer and comment and implemented them using Polly:
Exponential backoff up to one minute, after which it retries every minute.
Retry as long as there is progress; time out after 5 requests without any progress.
IndexBatchException is also thrown for unknown documents. I chose to ignore such non-transient failures, since they are likely indicative of requests which are no longer relevant (e.g., a document removed by a separate request).
int curActionCount = work.Actions.Count();
int noProgressCount = 0;

await Polly.Policy
    .Handle<IndexBatchException>() // One or more of the actions has failed.
    .WaitAndRetryForeverAsync(
        // Exponential backoff (2s, 4s, 8s, 16s, ...) and constant delay after 1 minute.
        retryAttempt => TimeSpan.FromSeconds( Math.Min( Math.Pow( 2, retryAttempt ), 60 ) ),
        (ex, _) =>
        {
            var batchEx = ex as IndexBatchException;
            work = batchEx.FindFailedActionsToRetry( work, d => d.Id );

            // Verify whether any progress was made.
            int remainingActionCount = work.Actions.Count();
            if ( remainingActionCount == curActionCount ) ++noProgressCount;
            curActionCount = remainingActionCount;
        } )
    .ExecuteAsync( async () =>
    {
        // Limit retries if no progress is made after multiple requests.
        if ( noProgressCount > 5 )
        {
            throw new TimeoutException( "Updating Azure search index timed out." );
        }

        // Only retry if the error is transient (determined by FindFailedActionsToRetry).
        // IndexBatchException is also thrown for unknown document IDs;
        // consider them outdated requests and ignore.
        if ( curActionCount > 0 )
        {
            await _search.Documents.IndexAsync( work );
        }
    } );
