Can aiobotocore be configured to automatically handle rate limiting? - python-3.x

I'm using boto3 to write a lot of objects to S3. I keep hitting an error like:
botocore.exceptions.ClientError: An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 0): Please reduce your request rate.
This is clearly the server asking my code to slow down the requests it's making. My question is whether there's a way to get aiobotocore to handle this automatically with its retry logic.
It should be theoretically possible for the response to be handled automatically, including a wait. The issue with doing this in my own code is that there are many tasks all hitting the same bucket, so negotiating rate limiting between them will be very complex indeed.
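For what it's worth, botocore's retry configuration can generally be passed through to aiobotocore, and its "adaptive" retry mode adds client-side rate limiting on top of exponential backoff. A minimal sketch, with placeholder bucket and key names (I haven't verified that adaptive mode covers S3's SlowDown response in every case):

    from aiobotocore.session import get_session
    from botocore.config import Config

    # "adaptive" adds client-side rate limiting on top of retries;
    # max_attempts raises the retry cap above the default.
    config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

    async def upload(body: bytes) -> None:
        session = get_session()
        async with session.create_client("s3", config=config) as s3:
            await s3.put_object(Bucket="my-bucket", Key="my-key", Body=body)

Because the rate limiter lives inside the client, tasks that share one client should be throttled together, which would sidestep the cross-task negotiation problem mentioned above.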

Related

Timeout error when creating ServiceBusMessageBatch in Azure.Messaging.ServiceBus

I have the following code where I start getting an error during long-running tests on the same Service Bus Client.
ServiceBusMessageBatch batch = this._serviceBusSender.CreateMessageBatchAsync().GetAwaiter().GetResult();
The error is,
Azure.Messaging.ServiceBus.ServiceBusException: 'The operation did not complete within the allocated time 00:01:00 for object request42. (ServiceTimeout)'
Why is this statement throwing this error? Is the creation of a batch object such a heavy operation that it can even time out? If so, should I switch to the overload that takes a List of ServiceBusMessage instead of this batch mode?
My understanding is that this way of creating a batch can protect me from building a batch that the queue may not allow. I am finding it difficult to understand why it times out after 1 minute.
In order for a batch to be able to enforce limits on the size, it has to establish an AMQP link to the entity that you'll be sending to and read the maximum allowable message size from the service. This results in a network operation that, in this case, timed out. This overhead is performed only in the case that there is not an existing AMQP link already established - typically on the first call that requires a network operation.
What jumps out at me from your code is the use of GetAwaiter().GetResult() to perform sync-over-async. This is really not a good idea and is very likely to cause contention in the thread pool that prevents continuations from being scheduled in a timely manner. Because network operations in Service Bus are asynchronous - including establishing the AMQP link - delays in scheduling continuations would certainly increase the chance of timeouts.
I'd strongly advise refactoring your sync-over-async code paths and shifting to an asynchronous approach. In those scenarios where it's not possible to go full async, limiting sync-over-async to the outermost layer of your code would be the next best thing.
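As a rough sketch of what the fully asynchronous shape looks like - here using the azure-servicebus Python package, since the pattern is the same across the SDKs; the connection string and queue name are placeholders:

    from azure.servicebus import ServiceBusMessage
    from azure.servicebus.aio import ServiceBusClient

    async def send_batch(conn_str, queue_name, payloads):
        async with ServiceBusClient.from_connection_string(conn_str) as client:
            async with client.get_queue_sender(queue_name) as sender:
                # The first call may establish the AMQP link, i.e. a network hop.
                batch = await sender.create_message_batch()
                for payload in payloads:
                    batch.add_message(ServiceBusMessage(payload))
                await sender.send_messages(batch)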

Limiting the number of requests in Cassandra without starting the timeout ticking

Version 4 of the DataStax Cassandra driver has a request-throttling feature.
The documentation states:
Similarly, the request timeout encompasses throttling: the timeout starts ticking before the
throttler has started processing the request; a request may time out while it is still in the
throttler's queue, before the driver has even tried to send it to a node.
Great. However, let's say I have a dynamic list of ids and I want to execute select requests against Cassandra in parallel (using executeAsync()) for all ids in the list. If the list is too large, I will eventually face timeouts because requests sit in the throttler's queue for too long.
How can I overcome this issue? Is there any built-in rate-limiting technique so that I don't have to care how many requests I can execute in parallel, but can just throw all of them at Cassandra and wait until they have all completed?
Update: I am not interested in custom code solutions; of course we are capable of implementing our own rate-limiting. I am asking specifically about the driver's built-in mechanisms for achieving this.
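For reference, the built-in throttler the question refers to is configured through the driver's application.conf. A sketch with illustrative values (note that, as quoted above, the request timeout keeps ticking while a request waits in the throttler's queue, so a large queue generally also calls for a larger basic.request.timeout):

    datastax-java-driver {
      advanced.throttler {
        class = RateLimitingRequestThrottler
        max-requests-per-second = 500
        max-queue-size = 10000
        drain-interval = 10 milliseconds
      }
      # Give queued requests a chance to drain before they time out.
      basic.request.timeout = 10 seconds
    }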

How to set intervals between multiple requests to an AWS Lambda API

I have created an API using an AWS Lambda function (in Python). My React code hits this API whenever an event fires, so a user can call the API as many times as events fire. The problem is that we are not getting responses from the Lambda API sequentially: sometimes the response to our last request arrives faster than the response to an earlier request.
So we need the Lambda function to handle requests sequentially, perhaps by adding some delay between two requests or by implementing throttling. How can I do that?
Did you check the concurrency setting on Lambda? You can throttle the lambda there.
But if you throttle the Lambda and the requests being sent are not received, the application sending them might get an error unless you store the requests somewhere on AWS to be processed later.
I think putting SQS in front of the Lambda might help. You hit API Gateway, the requests get sent to SQS, Lambda polls the queue (you can control the concurrency) and then sends the response back.
You can use SQS FIFO Queue as a trigger on the Lambda function, set Batch size to 1, and the Reserved Concurrency on the Function to 1. The messages will always be processed in order and will not concurrently poll the next message until the previous one is complete.
SQS triggers do not support a Batch Window, which would 'wait' before polling the next message; that is a feature of stream-based Lambda triggers (Kinesis and DynamoDB Streams).
If you want a more streamlined process, Step Functions will let you manage state using state machines and supports automatic retries based on the outputs of individual states.
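A sketch of wiring up the FIFO-trigger-plus-reserved-concurrency approach above with boto3 (function and queue names are hypothetical; the same settings can of course be applied from the console or infrastructure-as-code):

    import boto3

    lambda_client = boto3.client("lambda")

    # Allow only one concurrent execution, so messages are processed one at a time.
    lambda_client.put_function_concurrency(
        FunctionName="my-function",
        ReservedConcurrentExecutions=1,
    )

    # Use the FIFO queue as the trigger, one message per invocation.
    lambda_client.create_event_source_mapping(
        EventSourceArn="arn:aws:sqs:us-east-1:123456789012:my-queue.fifo",
        FunctionName="my-function",
        BatchSize=1,
    )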
As a previous response said, potentially what could help is to put an SQS in front of the Lambda - if order of processing is important, you could also look at setting the SQS queue up as a FIFO queue, which preserves order:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
As the other comment said, the other option is to limit concurrency, but even then you're probably best off putting SQS in front as you're then limiting your throughput.

AWS SDK calls from a Lambda take longer than 30 seconds

I have a Node.js Lambda function in AWS which needs to read some data. As a data source we've tried two options - S3 and DynamoDB. Both of them have the same issue: when we conduct load testing (10 req/sec for 100 sec), some requests to S3/DynamoDB fail to complete within 30 seconds, which is our Lambda timeout. The requests themselves are very light - for S3 it is a 1 KB file and for DynamoDB it is a table with only one record in it. On average those requests take less than 100 ms, but sometimes we get these very long peaks I'm talking about.
The rate of such long requests is quite small - less than 1% - but this is still not acceptable for us. Moreover, I don't see any reason why we get such long responses.
Another thing we've noticed is that those 30sec+ requests usually happen after long periods (4h or more) of not calling those S3/DynamoDB resources.
The only reason I can think of is that after long inactivity periods the AWS infrastructure is unable to create the required number of ENIs fast enough. ENIs are needed because both S3 and DynamoDB are called over HTTP by the aws-sdk. But this is just a guess which I don't know how to validate.
Currently, I'm thinking of warming up the ENIs by making periodic requests to S3/DynamoDB, but I haven't tried it yet.
If anybody has had similar issues I would appreciate any suggestions on how to fix the issue.
P.S. Increasing the Lambda timeout is not an option for us; 30 seconds is more than enough to make such simple calls.
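If anyone wants to try the warm-up idea, it could be as simple as a scheduled Lambda making cheap reads on an interval. A speculative sketch in Python (the diagnosis itself is a guess, and all names are placeholders):

    import boto3

    s3 = boto3.client("s3")
    dynamodb = boto3.client("dynamodb")

    def handler(event, context):
        # Cheap reads, triggered every few minutes by a schedule, to keep
        # the path to S3/DynamoDB warm between real requests.
        s3.head_object(Bucket="my-bucket", Key="warmup-object")
        dynamodb.get_item(TableName="my-table", Key={"pk": {"S": "warmup"}})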

What should be done when the provisioned throughput is exceeded?

I'm using AWS SDK for Javascript (Node.js) to read data from a DynamoDB table. The auto scaling feature does a great job during most of the time and the consumed Read Capacity Units (RCU) are really low most part of the day. However, there's a programmed job that is executed around midnight which consumes about 10x the provisioned RCU and since the auto scaling takes some time to adjust the capacity, there are a lot of throttled read requests. Furthermore, I suspect my requests are not being completed (though I can't find any exceptions in my error log).
In order to handle this situation, I've considered increasing the provisioned RCU using the AWS API (updateTable) but calculating the number of RCU my application needs may not be straightforward.
So my second guess was to retry failed requests and simply wait for auto scaling to increase the provisioned RCU. As pointed out by the AWS docs and some Stack Overflow answers (particularly about ProvisionedThroughputExceededException):
The AWS SDKs for Amazon DynamoDB automatically retry requests that receive this exception. So, your request is eventually successful, unless the request is too large or your retry queue is too large to finish.
I've read similar questions (this one, this one and this one) but I'm still confused: is this exception raised when the request is too large or the retry queue is too large to finish (that is, after the automatic retries), or actually before the retries?
Most important: is that the exception I should be expecting in my context? (so I can catch it and retry until auto scale increases the RCU?)
Yes.
Every time your application sends a request that exceeds your capacity, you get a ProvisionedThroughputExceededException from DynamoDB. However, your SDK handles this for you and retries. The default retry delay starts at 50ms, the default number of retries is 10, and the backoff is exponential.
This means you get retries at:
50ms
100ms
200ms
400ms
800ms
1.6s
3.2s
6.4s
12.8s
25.6s
If after the 10th retry your request has still not succeeded, the SDK passes the ProvisionedThroughputExceededException back to your application and you can handle it how you like.
You could handle it by increasing the provisioned throughput, but another option would be to change the default retry behaviour when you create the DynamoDB connection. For example:
new AWS.DynamoDB({maxRetries: 13, retryDelayOptions: {base: 200}});
This would mean you retry 13 times with an initial delay of 200ms, so the backoff before the final retry grows to 819.2s rather than the 25.6s you get with the defaults.
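For anyone on the Python SDK, a rough equivalent (boto3 exposes max_attempts but, as far as I know, nothing like the JS SDK's retryDelayOptions; the number simply mirrors the example above):

    import boto3
    from botocore.config import Config

    # More retry attempts than the default; backoff stays exponential.
    config = Config(retries={"max_attempts": 13, "mode": "standard"})
    dynamodb = boto3.client("dynamodb", config=config)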
