If there is a Node.js Lambda function that creates a connection to a RDS Postgres database, is this connection automatically destroyed (i.e. without configuring any idle timeout setting) when the Lambda container is destroyed?
(I've seen mixed responses regarding this.)
When Lambda runs your function, it creates an Execution Context, which would be your "Container", although it's more like a Micro-VM for better isolation (as we learned at re:Invent 2018).
To quote from the documentation (emphasis mine):
After a Lambda function is executed, AWS Lambda maintains the
execution context for some time in anticipation of another Lambda
function invocation. In effect, the service freezes the execution
context after a Lambda function completes, and thaws the context for
reuse, if AWS Lambda chooses to reuse the context when the Lambda
function is invoked again. This execution context reuse approach has
the following implications:
Any declarations in your Lambda function code (outside the handler
code, see Programming Model) remains initialized, providing additional
optimization when the function is invoked again. For example, if your
Lambda function establishes a database connection, instead of
reestablishing the connection, the original connection is used in
subsequent invocations. We suggest adding logic in your code to check
if a connection exists before creating one.
[...]
When you write your Lambda function code, do not assume that AWS
Lambda automatically reuses the execution context for subsequent
function invocations. Other factors may dictate a need for AWS Lambda
to create a new execution context, which can lead to unexpected
results, such as database connection failures.
With regards to your question I'd conclude the following:
If Lambda has a frozen context stored away, it may use it, but there are no guarantees.
Lambda may keep your context for a while after the execution and thereby keep the database connection open, provided the connection was established and stored outside of your handler function.
Lambda decides if and when to remove any frozen contexts and makes no guarantees about when it will do that.
If you need to ensure that your connection gets terminated at the end of the execution, you have to close it yourself or keep it within the handler function. You can't rely on Lambda to do that for you.
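The two options above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: openConnection is a hypothetical stand-in for a real Postgres client, and the names cachedConnection, getConnection, reusingHandler and closingHandler are made up for this example.

```javascript
// Hypothetical stand-in for a real database client (e.g. a Postgres driver).
// It only counts opened connections so the sketch runs without a database.
let connectionsOpened = 0;
async function openConnection() {
  connectionsOpened += 1;
  return {
    id: connectionsOpened,
    closed: false,
    async query(sql) { return `result of ${sql} on connection ${this.id}`; },
    async end() { this.closed = true; },
  };
}

// Module scope: kept between invocations for as long as Lambda reuses the context.
let cachedConnection = null;

async function getConnection() {
  // Check whether a usable connection exists before creating one.
  if (cachedConnection === null || cachedConnection.closed) {
    cachedConnection = await openConnection();
  }
  return cachedConnection;
}

// Option A: reuse the connection across warm invocations.
async function reusingHandler(event) {
  const conn = await getConnection();
  return conn.query('SELECT 1');
}

// Option B: open and close the connection inside the handler.
async function closingHandler(event) {
  const conn = await openConnection();
  try {
    return await conn.query('SELECT 1');
  } finally {
    await conn.end(); // explicit cleanup, nothing is left for Lambda to tear down
  }
}
```

With option A the connection stays open until the context is discarded or the database times it out, so an idle timeout on the database side is still worth considering.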
I'm developing on AWS Lambda with Serverless for the first time.
I know that my NodeJS code is non-blocking, so a NodeJS server can handle several requests simultaneously.
My question: does Lambda create an instance for each call? For example, if there are 10 simultaneous connections, will Lambda create 10 instances of NodeJS?
Currently, in my tests, I have the impression that Lambda creates an instance for each call, because at each call my code creates a new connection to my database, while locally my code keeps the connection to my database in memory.
Yes, this is a fundamental feature of AWS Lambda (and "serverless" functions in general). Each instance handles one request at a time, so every concurrent request gets its own instance.
If you have multiple parallel executions, all will be separate instances (and thus, each would use its own connection to the DB).
Now, if you are invoking the Lambda function multiple times one after another, that's a bit different. It is possible that subsequent invocations of the Lambda function reuse the context. That means there is a possibility of reusing some things, like a DB connection, in subsequent calls to the Lambda function.
There is no exact information about how long a Lambda function keeps the previous context alive. Also, in order to reuse things like a DB connection, you must define and obtain the connection outside of your handler function. If you put it in the handler function, it will certainly not be reused.
When the context is reused, you have something called a "warm" start, and the Lambda function starts faster. If some time has passed and the context cannot be reused anymore, you have a "cold" start, meaning the Lambda function will take more time to start its execution (it needs to pull all the dependencies when doing the cold start).
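One way to observe warm and cold starts yourself is to keep a flag in module scope, which is only initialized when a new context is created. This is a small sketch under that assumption; the names isColdStart, invocationCount and startTypeHandler are hypothetical.

```javascript
// Module scope runs once per execution context, i.e. on a cold start.
let isColdStart = true;
let invocationCount = 0;

async function startTypeHandler(event) {
  const startType = isColdStart ? 'cold' : 'warm';
  isColdStart = false;   // every later call in this context is a warm start
  invocationCount += 1;  // per-context counter, not a global request count
  return { startType, invocationCount };
}
```

Logging the returned startType from a deployed function should show "cold" once per context and "warm" for the invocations that reuse it.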
So, according to the docs here https://cloud.google.com/functions/docs/writing/http
Terminating HTTP functions
If a function creates background tasks (such as threads, futures, Node.js Promise objects, callbacks, or system processes), you must terminate or otherwise resolve these tasks before returning an HTTP response. Any tasks not terminated prior to an HTTP response may not be completed, and may also cause undefined behavior.
So, if one needs to launch a long-running background task from within an HTTP function, but still return from the function fast, there is no straightforward way.
I have tried the PubSub approach (calling await topic.publishJSON(pars)), but it looks like publishing to a topic is quite a time-consuming operation: it takes 2-3 secs.
The PubSub-triggered function itself probably runs fine, but this 2-3 second delay makes the approach useless.
P.S.: starting a Promise from inside the function without waiting for it actually works, but it sounds error-prone since it goes against the docs.
If you need a quick answer, you have 2 types of solutions.
Async
With Cloud Functions, you invoke (via an HTTP call) another function (or a Cloud Run or App Engine service) without waiting for the answer, and then respond to the requester. The call that you made will run in the background and answer a Cloud Function that is no longer listening!
With PubSub, it's similar. Instead of invoking a Cloud Function (or Cloud Run or App Engine), you publish a message to a PubSub topic. Then you create a subscription that calls your long-running process.
Same idea with Cloud Tasks, but you create a task in a queue.
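The queue-based variants share one pattern: await only the hand-off, then respond. Here is a rough sketch of that pattern. The enqueueJob function is a hypothetical wrapper (it could publish to PubSub or create a Cloud Tasks task), and makeEnqueueHandler is a made-up name; the req/res objects follow the Express-style interface that HTTP functions receive.

```javascript
// `enqueueJob` is a hypothetical async function (payload) => jobId.
// In practice it might wrap a PubSub publish or a Cloud Tasks "create task" call.
function makeEnqueueHandler(enqueueJob) {
  return async function handler(req, res) {
    try {
      // Await only the hand-off, not the long-running work itself.
      const jobId = await enqueueJob(req.body);
      // Respond once the hand-off is confirmed, so no task is left unresolved.
      res.status(202).json({ jobId });
    } catch (err) {
      res.status(500).json({ error: String((err && err.message) || err) });
    }
  };
}
```

Because the publish is awaited before the response is sent, this stays within the documented rule about background tasks, at the cost of the hand-off latency.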
Sync
If you use Cloud Run instead of Cloud Functions, you are able to send partial responses to the requester. That way, you can immediately answer the requester with a partial response that says "OK", continue the process in the request context, and send another partial response whenever you want, or at the end of the long-running process to tell the user that it has finished.
On AWS, is it possible to have one HTTP request execute a Lambda, which in turn triggers a cascade of Lambdas running in serial, where the final Lambda returns the result to the user?
I know one way to achieve this is for the initial Lambda to "stay running" and orchestrate the other Lambdas, but I'd be paying for that orchestration Lambda to effectively do nothing most of the time, i.e. paying for the time it's waiting on the others. If it were non-lambda code, that would be like blocking (and paying for) an entire thread while the other threads do their work.
Unless AWS stops the billing clock while async Lambdas are "sleeping"/waiting on network IO?
Unfortunately, as you've found, the request can only invoke a single Lambda function, so that function becomes the orchestrator.
This is not ideal, but it is unavoidable if you want multiple Lambda functions to serve one HTTP request. The first Lambda can either call the other Lambdas itself or start a Step Function that orchestrates the individual steps. You would still need the Lambda to start the Step Function and then poll its status before returning the results.
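The "Lambda calls the other Lambdas" option boils down to a sequential loop. The sketch below assumes a hypothetical invoke function, for example a thin wrapper around the AWS SDK's Lambda Invoke call, and made-up function names.

```javascript
// `invoke` is a hypothetical async function (functionName, payload) => result.
async function orchestrate(invoke, functionNames, initialPayload) {
  let payload = initialPayload;
  for (const name of functionNames) {
    // Each step waits for the previous one; the orchestrator is billed while it waits.
    payload = await invoke(name, payload);
  }
  return payload; // the orchestrating Lambda returns this to the HTTP caller
}
```

For example, orchestrate(invoke, ['step-one', 'step-two'], input) feeds the output of step-one into step-two and resolves with the final result.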
Here is my use case:
I have a scheduler lambda and an executor lambda.
In the scheduler lambda, I receive a list of (time, message) tuples, each indicating that at the given time I would like to invoke the executor lambda with the given message as the event.
Here is what I have tried
In the scheduler lambda, first clear all triggers from the executor lambda. Then create an EventBridge scheduled event for each (time, message) tuple. This has a few drawbacks...
It's quite difficult to remove all triggers from a lambda, as the Lambda API doesn't let you do that (I believe I have to do it through the EventBridge API with proper tagging)
Adding and removing ~100 triggers every day seems uneconomical and is not the intended use case of event bridge
Running a dedicated EC2 instance to call the lambda function
I'm cheap and I don't want to pay for an instance that will lay idle for ~99.9% of the time.
Not serverless
Is there a serverless way to trigger a lambda in a non-periodic fashion?
A bit of a departure, but could you use DynamoDB with a TTL? The scheduler could simply write each message to the table and set the TTL column so that the item expires at the time from the tuple.
You could subscribe the executor lambda to the DynamoDB stream events and only respond to events for removed items. If you use new and old images, you can retrieve the message from the old image (otherwise I believe it's empty when the item is deleted).
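A rough sketch of the executor side follows. It assumes the stream is configured to include old images and that each item stores its payload in a string attribute called message (a hypothetical attribute name). The function names are made up as well.

```javascript
// Pull the messages out of stream records that describe TTL expirations.
function extractExpiredMessages(event) {
  const messages = [];
  for (const record of event.Records ?? []) {
    if (record.eventName !== 'REMOVE') continue; // ignore INSERT and MODIFY
    // TTL deletions are performed by the DynamoDB service principal;
    // deletes issued by your own code do not carry this identity.
    const identity = record.userIdentity;
    const deletedByTtl =
      identity?.type === 'Service' &&
      identity?.principalId === 'dynamodb.amazonaws.com';
    if (!deletedByTtl) continue;
    // The deleted item is only available in the old image.
    const oldImage = record.dynamodb?.OldImage;
    if (oldImage?.message?.S !== undefined) {
      messages.push(oldImage.message.S);
    }
  }
  return messages;
}

async function executorHandler(event) {
  for (const message of extractExpiredMessages(event)) {
    console.log('executing with message:', message);
  }
}
```

Filtering on the service identity keeps manual deletes, such as cancelling a scheduled message, from triggering the executor.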
I'm trying to wrap my head around Cloud Functions instances and how they work.
I'm asking about an example of an HTTP function, but I think the concept applies to any kind of function.
Let's say I have this cloud function that handles SSR for my app, named ssrApp.
And let's assume that it takes 1 second to complete every time it gets a request.
When Cloud Functions receives the 1st request, it will spin up an instance to respond to it.
QUESTION
How does that instance behave when multiple requests are coming?
From: https://cloud.google.com/functions/docs/concepts/exec
Each instance of a function handles only one concurrent request at a time. This means that while your code is processing one request, there is no possibility of a second request being routed to the same instance. Thus the original request can use the full amount of resources (CPU and memory) that you requested.
Does it mean that during that 1 second when my ssrApp function is running, if somebody hits my app URL, it is guaranteed that Cloud Function will spin up another instance for that second request? Does it matter if the function does only sync calls or some async calls in its execution? What I mean is, could an async call free the instance to respond to another request in parallel?
Does it mean that during that 1 second when my ssrApp function is running, if somebody hits my app URL, it is guaranteed that Cloud Function will spin up another instance for that second request?
That's the general behavior, although there are no guarantees around scheduling.
Does it matter if the function does only sync calls or some async calls in its execution? What I mean is, could an async call free the instance to respond to another request in parallel?
No, that makes no difference. If the container is waiting for an async call, it is still considered to be in use.
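To get a feel for what one request per instance implies, here is a small sketch that estimates how many instances are busy at the same time for a given set of request arrival times. It is an idealized model that assumes a fixed request duration and perfect reuse of freed instances; the real scheduler gives no such guarantees. The name peakInstances is made up for this example.

```javascript
// Estimate how many instances are busy at once when each instance serves
// exactly one request at a time. Times are in milliseconds.
function peakInstances(arrivalTimes, durationMs) {
  const events = [];
  for (const t of arrivalTimes) {
    events.push([t, 1]);               // request starts and occupies an instance
    events.push([t + durationMs, -1]); // request ends and frees the instance
  }
  // At equal timestamps, process ends before starts so a freed instance can be reused.
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  let busy = 0;
  let peak = 0;
  for (const [, delta] of events) {
    busy += delta;
    peak = Math.max(peak, busy);
  }
  return peak;
}
```

For the 1-second ssrApp example, peakInstances([0, 200, 500], 1000) returns 3, because all three requests overlap, while peakInstances([0, 1000, 2000], 1000) returns 1.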
2022 Update
For future searchers, Cloud Functions Gen2 now supports concurrency: https://cloud.google.com/functions/docs/2nd-gen/configuration-settings#concurrency