AWS SDK calls from a Lambda take longer than 30 seconds - node.js

I have a Node.js Lambda function in AWS which needs to read some data. As a data source we've tried two options - S3 and DynamoDB. Both of them have the same issue: when we run load tests (10 req/sec for 100 seconds), some requests to S3/DynamoDB fail to complete within 30 seconds, which is our Lambda timeout. The requests themselves are very light - for S3 it is a 1 KB file and for DynamoDB it is a table with only one record in it. On average those requests take less than 100 ms, but sometimes we get these very long peaks I'm talking about.
The rate of such long requests is quite small - less than 1% - but this is still not acceptable for us. Moreover, I don't see any reason why the responses should take so long.
Another thing we've noticed is that those 30sec+ requests usually happen after long periods (4h or more) of not calling those S3/DynamoDB resources.
The only explanation I can think of is that after long inactivity periods the AWS infrastructure is unable to create the required number of ENIs fast enough. ENIs are needed because both S3 and DynamoDB are called over HTTP by the aws-sdk. But this is just a guess which I don't know how to validate.
Currently, I'm thinking of warming up the ENIs by periodically making requests to S3/DynamoDB, but I haven't tried it yet.
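What I have in mind is roughly the sketch below: a separate handler on a CloudWatch Events/EventBridge schedule (say rate(5 minutes)) that touches both resources. The bucket, key and table names are placeholders, and the tight httpOptions timeouts plus small maxRetries are just assumptions to make a slow connection fail fast instead of eating the whole 30 seconds.

    // Hypothetical warm-up handler, triggered by a scheduled EventBridge rule.
    // Bucket, key and table names are placeholders.
    const AWS = require('aws-sdk');

    const s3 = new AWS.S3({
      httpOptions: { connectTimeout: 1000, timeout: 3000 }, // fail fast instead of hanging
      maxRetries: 2,
    });
    const dynamo = new AWS.DynamoDB.DocumentClient({
      httpOptions: { connectTimeout: 1000, timeout: 3000 },
      maxRetries: 2,
    });

    exports.handler = async () => {
      // Lightweight calls whose only purpose is to keep connections/ENIs warm.
      await Promise.all([
        s3.headObject({ Bucket: 'my-bucket', Key: 'warmup.txt' }).promise(),
        dynamo.get({ TableName: 'my-table', Key: { id: 'warmup' } }).promise(),
      ]);
      return 'warm';
    };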
If anybody has had similar issues I would appreciate any suggestions on how to fix the issue.
P.S. Increasing the Lambda timeout is not an option for us. 30 seconds is more than enough to make such simple calls.

Related

What is causing intermittent long execution times for DynamoDB operations

We are experiencing intermittent long DynamoDB execution times, consistently taking just over 10 seconds. This is happening roughly once every one to three hours.
The DynamoDB operations are called from a Node.js Lambda (either getItem or updateItem)
The AWS SDK DynamoDB instance instantiation happens once outside of the Lambda handler
The Lambda and DynamoDB are in the same VPC with a DynamoDB endpoint enabled
The Lambda is triggered by a DynamoDB stream from another DynamoDB table which can happen multiple times a second
The tables are set to PAY_PER_REQUEST
The Partition Key is unique and we are getting concurrency of 4 Lambdas
We have given the Lambda 2560MB of memory
We have tried setting the BATCH_SIZE of the stream from 1 to 10 but it never prevents the intermittent long execution time
We have enabled AWS SDK logging and can see on the long spike times that the DynamoDB call has been retried once - the duration on these occasions is consistently just over 10 seconds
We have set the AWS SDK timeout from 5 secs to 2 secs and still see the 10 second delay
We have maxRetries on the table set to 10
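For reference, the client setup described above corresponds roughly to the sketch below (the table name, key shape and connect timeout are placeholders):

    // Rough sketch of our client configuration; table name and key shape are placeholders.
    const AWS = require('aws-sdk');

    // Instantiated once, outside the Lambda handler.
    const dynamo = new AWS.DynamoDB.DocumentClient({
      maxRetries: 10,                                       // retries show up in the SDK logs
      httpOptions: { connectTimeout: 1000, timeout: 2000 }, // 2 s per-attempt timeout
    });

    exports.handler = async (event) => {
      // getItem / updateItem calls triggered by the DynamoDB stream, e.g.:
      return dynamo.get({ TableName: 'my-table', Key: { pk: 'some-id' } }).promise();
    };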
The cause of the consistent 10-second timing of the long spikes is hard to figure out. If anyone has any other ideas to try or advice, it would be much appreciated. We are also liaising with the DynamoDB, Lambda and Networking AWS support teams, but so far they haven't figured out what is going on.
Much thanks,
Sam

How to handle multiple API Requests

I am working with the Google Admin SDK to create Google Groups for my organization. I can't add members to a group while creating it; ideally, when I create a new group I'd like to add roughly 60 members. In addition, the ability to add members in bulk after the group is created (a batch request) was deprecated in August 2020. Right now, after I create a group I need to make a web request to the API to add each member of the group individually (which will be about 60 members).
I am using Node.js and Express. Is there a good way to handle 60 web requests to an API? I don't know how taxing this will be on my server. If anyone has any resources to share where I can learn about the impact this would have on a Node.js server, that would be great.
Fortunately, these groups aren't created often, maybe 15 a month.
One idea I have is to offload the work to something like a cloud function, so my Node server makes one request to the cloud function, and then the cloud function makes all the additional requests to add members to the group. I'm not 100% sure if this is necessary and I'm curious about other approaches.
Limits and Quotas
Note that adding group members may take 10 minutes to propagate.
The rate limit for the Directory API is 3000 queries per 100 seconds per IP address, which works out to around 30 per second. 60 requests is not a large number, but if you try to send them all within a few milliseconds the system may extrapolate the rate and deem it over the limit. I wouldn't expect that, though it's probably best to test it on your end with your own system and connection.
Exponential Backoff
If you do need to make many requests, this is the method Google recommends. It involves retrying a failed request while exponentially increasing the wait time between attempts, up to 16 seconds. You can always implement a longer wait before retrying. It's only 60 requests, after all.
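As a rough sketch (not a drop-in implementation), retry with backoff around the member insert could look like this; it assumes the googleapis Node.js client, credentials with the admin.directory.group scope already set up, and treats 403/429/503 as the retryable errors:

    // Rough sketch of exponential backoff around Directory API member inserts.
    // Assumes the googleapis Node.js client and appropriate admin credentials.
    const { google } = require('googleapis');

    const auth = new google.auth.GoogleAuth({
      scopes: ['https://www.googleapis.com/auth/admin.directory.group'],
    });
    const admin = google.admin({ version: 'directory_v1', auth });

    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    async function insertMemberWithBackoff(groupKey, email, maxAttempts = 5) {
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
          return await admin.members.insert({
            groupKey,
            requestBody: { email, role: 'MEMBER' },
          });
        } catch (err) {
          // Retry only rate-limit style errors; wait 1s, 2s, 4s, 8s, 16s plus jitter.
          if (attempt === maxAttempts - 1 || ![403, 429, 503].includes(err.code)) throw err;
          await sleep(Math.min(2 ** attempt * 1000, 16000) + Math.random() * 250);
        }
      }
    }

    // Add ~60 members sequentially to stay comfortably under the quota.
    async function addMembers(groupKey, emails) {
      for (const email of emails) {
        await insertMemberWithBackoff(groupKey, email);
      }
    }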
Batch Requests
The previously mentioned methods should work without issue for you; since there are only 60 requests to make, they won't put any "stress" on the system as such. That said, the most performant way to handle many requests to the Directory API is a batch request, which lets you bundle up to 1000 calls into a single request. This will also give you a nice cushion in case you need to increase your request volume in the future!
EDIT - I am sorry, I missed that you mentioned that batching is deprecated. Only global batching is deprecated. If you send a batch request to a specific API's own batch endpoint, batching is still supported. What you can no longer do is send a single batch request that spans different APIs, like Directory and Sheets in one.
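To illustrate, a per-API batch call could look something like the sketch below. It assumes the Directory API's dedicated batch endpoint (https://www.googleapis.com/batch/admin/directory_v1), a pre-obtained OAuth access token, and Node 18+ for the built-in fetch; the group and member addresses are placeholders.

    // Rough sketch of a multipart/mixed batch request to the Directory API's
    // own batch endpoint. Endpoint, token handling and addresses are assumptions.
    async function addMembersBatch(accessToken, groupKey, emails) {
      const boundary = 'batch_add_members';

      const parts = emails.map((email, i) => [
        `--${boundary}`,
        'Content-Type: application/http',
        `Content-ID: <item-${i}>`,
        '',
        `POST /admin/directory/v1/groups/${encodeURIComponent(groupKey)}/members HTTP/1.1`,
        'Content-Type: application/json',
        '',
        JSON.stringify({ email, role: 'MEMBER' }),
        '',
      ].join('\r\n')).join('\r\n');

      const res = await fetch('https://www.googleapis.com/batch/admin/directory_v1', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${accessToken}`,
          'Content-Type': `multipart/mixed; boundary=${boundary}`,
        },
        body: `${parts}\r\n--${boundary}--`,
      });
      return res.text(); // the response is itself multipart/mixed, one part per call
    }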
References
Limits and Quotas
Exponential Backoff
Batch Requests

Handling time consuming operations using Nodejs and AWS

The current setup of the project I am working on is based on Node.js/Express and AWS. An AWS Lambda is triggered on a daily basis and calls an API endpoint which is expected to fire a varying number of emails via SendGrid (hundreds to thousands). With a lower number of emails it worked fine, but when the number of emails was around 1000 the Lambda timed out and the API crashed.
The timeout on the Lambda was 1 minute. Raising it to 5 minutes might let this case of 1000 emails pass, but it might fail again when the number is several thousand. Apart from that, we would like to avoid keeping the server busy for several minutes, which is why it was set to 1 minute initially.
We are now looking for a better solution to this specific situation. What would be a better approach: is it an option to use an SNS queue, or to go serverless and move all the code that sends the emails into Lambda?
Thanks in advance for any input, and if more information is required please let me know.
Lambdas are not designed for long-running operations. You can use Elastic Beanstalk worker environments: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
Briefly, the Lambda publishes the task to an SQS queue and an Elastic Beanstalk worker app handles it.
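A minimal sketch of the publishing side, assuming the aws-sdk v2 SQS client; the queue URL environment variable, batch size and message shape are placeholders, and loadRecipients stands in for however the recipient list is currently built:

    // Hypothetical Lambda that hands the heavy e-mail work off to an SQS queue,
    // which an Elastic Beanstalk worker environment then consumes.
    const AWS = require('aws-sdk');
    const sqs = new AWS.SQS();

    exports.handler = async () => {
      const recipients = await loadRecipients(); // stand-in for your existing logic

      // One message per slice of recipients keeps each worker request short.
      for (let i = 0; i < recipients.length; i += 100) {
        await sqs.sendMessage({
          QueueUrl: process.env.EMAIL_QUEUE_URL,
          MessageBody: JSON.stringify({
            template: 'daily-digest',
            recipients: recipients.slice(i, i + 100),
          }),
        }).promise();
      }
    };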

Amazon Cloudfront timeout error

I am working on a Node project which generates data using the mongodb dataset-generator, and I've added my data-generation server code to an AWS Lambda which I've exposed through AWS API Gateway.
So now the issue is that CloudFront times out the request after 30 seconds. And the problem is that the computation I am doing cannot be broken into multiple API hits. So can anyone from the community help me out here, or point me to some alternative which allows me to make this request without it timing out?
I believe I originally misinterpreted the nature of the problem you are experiencing.
So now the issue is that CloudFront timeout the request after 30 seconds
I assumed, since you mentioned CloudFront, that you had explicitly configured CloudFront in front of your API Gateway endpoint.
It may be true that you didn't, since API Gateway implicitly uses services from "the AWS Edge Network" (a.k.a. CloudFront) to provide a portion of its service.
My assumption was that API Gateway's "hidden" CloudFront distributions had different behavior than a standard CloudFront distribution, but apparently that is not the case to any extent that is relevant, here.
In fact, API Gateway also has a 30 second response timeout, and the answer to "Can Be Increased?" in the limits documentation is No. So the "CloudFront" timeout is essentially the same timeout as the one imposed by API Gateway.
This, of course, takes precedence over any longer timeout on your Lambda function.
There isn't a simple and obvious workaround. This seems like a task that is outside the scope of the design of API Gateway.
One option -- which I personally tend to dislike when APIs force it on me -- is to require pagination. I really hate that... just give me the data, I can handle it... but it has its practical applications. If the request is for 1000000 rows, return rows 1 through 1000 and return a next_url that will fetch rows 1001 through 2000.
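As a sketch only (the offset/limit parameters, the fetchRows helper and the response shape are all made up), a paginated handler might look like:

    // Hypothetical paginated handler; fetchRows stands in for the real data generation.
    exports.handler = async (event) => {
      const offset = Number((event.queryStringParameters || {}).offset || 0);
      const limit = 1000;
      const rows = await fetchRows(offset, limit);
      return {
        statusCode: 200,
        body: JSON.stringify({
          rows,
          next_url: rows.length === limit ? `?offset=${offset + limit}&limit=${limit}` : null,
        }),
      };
    };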
Another option is for the initial function to submit the request to a second Lambda function, using asynchronous invocation, for processing, and return a redirect that will send the user to a new URL where the data can be fetched. Now, stick with me, because this solution sounds really horrible, but it's theoretically viable. The asynchronous function would do the work in the background and store the response in S3. The URL where the data is fetched would be a third Lambda function that would poll the key in the S3 bucket where the data is to be stored, say once per second for 20 seconds. If the file shows up, it would pre-sign a URL for that location and issue a final redirect to the browser with the signed URL as the Location. If the file does not show up, it would redirect the browser back to itself so that polling continues until the file shows up or the browser gets tired of the redirect loop.
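A minimal sketch of just the first step (the asynchronous hand-off and redirect), assuming the aws-sdk v2 Lambda client; the worker function name and the /results path are placeholders, and the worker is assumed to write its output to S3 under the same jobId:

    // Hypothetical front-end handler: kicks off the work asynchronously and
    // redirects the client to a polling URL. Names and paths are placeholders.
    const AWS = require('aws-sdk');
    const crypto = require('crypto');
    const lambda = new AWS.Lambda();

    exports.handler = async (event) => {
      const jobId = crypto.randomUUID();

      // InvocationType 'Event' returns immediately; the worker runs in the background
      // and is expected to write its output to s3://<results-bucket>/<jobId>.json.
      await lambda.invoke({
        FunctionName: 'my-worker-function',
        InvocationType: 'Event',
        Payload: JSON.stringify({ jobId, request: event.queryStringParameters }),
      }).promise();

      return {
        statusCode: 303,
        headers: { Location: `/results/${jobId}` }, // served by the polling function
        body: '',
      };
    };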
Sketchy? Yes. Viable? Probably. Good idea? That's debatable... but it seems as if you are doing something that really is outside the fundamental design parameters of API Gateway, so either a fairly complex workaround is needed, or you'll want to implement this somewhere other than API Gateway.
Of course, you could write your own "API Gateway" that runs on EC2 and invokes Lambda functions directly through the Lambda API and returns the results to the caller -- so Lambda still handles the work and the scaling, but you avoid the 30 second timeout. 30 seconds is a long time to wait for a web response.
I see that this is an old question, but I need to say that starting from March 2017 it is possible to change the origin response timeout and keep-alive timeout.
https://aws.amazon.com/about-aws/whats-new/2017/03/announcing-configure-read-timeout-and-keep-alive-timeout-values-for-your-amazon-cloudfront-custom-origins/
The maximum value for the origin response timeout is 60 seconds, but if needed AWS can increase it to 180 seconds (with a support request).

How to optimize AWS Lambda?

I'm currently building a web API using AWS Lambda with the Serverless Framework.
In my Lambda functions, each of them connects to Redis (ElastiCache) and an RDB (Aurora, RDS) or DynamoDB to retrieve data or write new data.
And all my Lambda functions are running in my VPC.
Everything works fine except that when a Lambda function is first executed, or executed a while after its last execution, it takes quite a long time (1-3 seconds) to run, or sometimes it even responds with a gateway timeout error (around 30 seconds), even though my Lambda functions are configured with a 60-second timeout.
As stated here, I assume the 1-3 seconds is for initializing a new container. However, I wonder if there is a way to reduce this time, because 1-3 seconds or a gateway timeout is not really ideal for production use.
You've got two issues:
The 1-3 second delay. This is expected and well-documented when using Lambda. As @Nick mentioned in the comments, the only way to prevent your container from going to sleep is to use it. You can use Lambda Scheduled Events to execute your function as often as every minute using the rate expression rate(1 minute). If you add a parameter to your function that helps you distinguish between a real request and one of these ping requests, you can return immediately on the ping requests, and then you've worked around the problem. It will cost you more, but we're probably talking pennies per month, if anything. Lambda has a generous free tier.
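A minimal sketch of that pattern, assuming the scheduled rule passes a custom field in its input (here warmup, a made-up name) and that handleRealRequest stands in for your existing logic:

    // Hypothetical handler that short-circuits scheduled "ping" invocations.
    exports.handler = async (event) => {
      if (event && event.warmup === true) {
        return 'pinged'; // keep the container warm, skip the real work
      }
      // Normal request handling (Redis / RDS / DynamoDB calls) goes here.
      return handleRealRequest(event);
    };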
The 30 second delay is unusual. I would definitely check your CloudWatch logs. If you see logs from when your function is working normally but no logs from when you see the 30 second timeout, then I would assume the problem is with API Gateway and not with Lambda. If you do see logs, then maybe they can help you troubleshoot. Another place to check is the AWS Status Page. I've sometimes seen Lambda functions time out and respond intermittently, and I've pulled my hair out only to realize there's a problem on Amazon's end and they're working on it.
Here's a blog post with additional information on Lambda Container Reuse that, while a little old, still has some good information.