AWS Lambda Node and concurrency - node.js

I'm developing on AWS Lambda with Serverless for the first time.
I know that my Node.js code is non-blocking, so a Node.js server can handle several requests simultaneously.
My question: does Lambda create an instance for each call? For example, if there are 10 simultaneous connections, will Lambda create 10 instances of Node.js?
Currently, in my tests, I have the impression that Lambda creates an instance for each call, because on each call my code creates a new connection to my database, whereas locally my code keeps the database connection in memory.

Yes, this is a fundamental feature of AWS Lambda (and "serverless" functions in general). A new instance is created for each concurrent request.

If you have multiple parallel executions, each will be a separate instance (and thus each will use its own connection to the DB).
Now, if you are invoking multiple Lambda functions one after another, that's a bit different. It is possible that subsequent invocations of the Lambda function reuse the context, which means some things, like the DB connection, can be reused in subsequent calls to the Lambda function.
There is no exact information about how long a Lambda function keeps the previous context alive. Also, in order to reuse things like a DB connection, you must define and obtain the connection outside of your handler function. If you create it inside the handler function, it will certainly not be reused.
When the context is reused, you get what is called a "warm" start: the Lambda function starts more quickly. If some time has passed and the context can no longer be reused, you get a "cold" start, meaning the Lambda function will take more time to start executing (it needs to pull in all the dependencies during a cold start).

Is it possible for multiple AWS Lambdas to service a single HTTP request?

On AWS, is it possible to have one HTTP request execute a Lambda, which in turn triggers a cascade of Lambdas running in serial, where the final Lambda returns the result to the user?
I know one way to achieve this is for the initial Lambda to "stay running" and orchestrate the other Lambdas, but I'd be paying for that orchestration Lambda to effectively do nothing most of the time, i.e. paying for the time it's waiting on the others. If it were non-lambda code, that would be like blocking (and paying for) an entire thread while the other threads do their work.
Unless AWS stops the billing clock while async Lambdas are "sleeping"/waiting on network IO?
Unfortunately, as you've found, only a single Lambda function can be invoked to serve the request, so it becomes an orchestrator.
This is not ideal, but it has to be the case if you want to use multiple Lambda functions while serving an HTTP request. You can either use that Lambda to call a number of other Lambdas, or instead create a Step Function which can orchestrate the individual steps. You would still need the Lambda to start the Step Function, and then poll its status before returning the results.

How does a Cloud Function instance handle multiple requests?

I'm trying to wrap my head around Cloud Function instances and how they work.
I'm asking about an example of an HTTP function, but I think the concept applies to any kind of function.
Let's say I have this cloud function that handles SSR for my app, named ssrApp.
And let's assume that it takes 1 second to complete every time it gets a request.
When Cloud Functions receives the 1st request, it will spin up an instance to respond to it.
QUESTION
How does that instance behave when multiple requests are coming?
From: https://cloud.google.com/functions/docs/concepts/exec
Each instance of a function handles only one concurrent request at a time. This means that while your code is processing one request, there is no possibility of a second request being routed to the same instance. Thus the original request can use the full amount of resources (CPU and memory) that you requested.
Does it mean that during that 1 second when my ssrApp function is running, if somebody hits my app URL, it is guaranteed that Cloud Function will spin up another instance for that second request? Does it matter if the function does only sync calls or some async calls in its execution? What I mean is, could an async call free the instance to respond to another request in parallel?
Does it mean that during that 1 second when my ssrApp function is running, if somebody hits my app URL, it is guaranteed that Cloud Function will spin up another instance for that second request?
That's the general behavior, although there are no guarantees around scheduling.
Does it matter if the function does only sync calls or some async calls in its execution? What I mean is, could an async call free the instance to respond to another request in parallel?
No, that makes no difference. If the container is waiting for an async call, it is still considered to be in use.
2022 Update
For future searchers, Cloud Functions Gen2 now supports concurrency: https://cloud.google.com/functions/docs/2nd-gen/configuration-settings#concurrency

Are Postgres connections automatically destroyed with Lambda container?

If there is a Node.js Lambda function that creates a connection to a RDS Postgres database, is this connection automatically destroyed (i.e. without configuring any idle timeout setting) when the Lambda container is destroyed?
(I've seen mixed responses regarding this.)
When Lambda runs your function, it creates an Execution Context, which would be your "container", although it's more like a micro-VM for better isolation (as we learned at re:Invent 2018).
To quote from the documentation (emphasis mine).
After a Lambda function is executed, AWS Lambda maintains the execution context for some time in anticipation of another Lambda function invocation. In effect, the service freezes the execution context after a Lambda function completes, and thaws the context for reuse, if AWS Lambda chooses to reuse the context when the Lambda function is invoked again. This execution context reuse approach has the following implications:
Any declarations in your Lambda function code (outside the handler code, see Programming Model) remains initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We suggest adding logic in your code to check if a connection exists before creating one.
[...]
When you write your Lambda function code, do not assume that AWS Lambda automatically reuses the execution context for subsequent function invocations. Other factors may dictate a need for AWS Lambda to create a new execution context, which can lead to unexpected results, such as database connection failures.
With regards to your question I'd conclude the following:
If Lambda has a frozen context stored away, it may use it, but there are no guarantees.
Lambda may keep your context around for a while after the execution, thereby keeping the database connection alive, if it has been established/stored outside of your handler function.
Lambda decides if and when to remove any frozen contexts and makes no guarantees about when it will do that.
If you need to ensure that your connection gets terminated at the end of the execution, you have to do so yourself or keep it within the handler function; you can't rely on Lambda to do that for you.

Connecting from AWS Lambda to MongoDB

I'm working on a Node.js project using a pretty common AWS setup, it seems. My API Gateway receives a call and triggers Lambda A; Lambda A then triggers other Lambdas, say B or C, depending on params passed from API Gateway.
Lambda A needs to access MongoDB, and to avoid the hassle of running MongoDB myself I decided to use mLab. At the moment Lambda A accesses MongoDB using the Node.js driver.
Now, to avoid starting a connection on every Lambda A execution, I use a connection pool. Again, inside Lambda A's code, outside of the handler, I keep a connection pool that allows me to reuse connections when Lambda A is invoked multiple times.
This seems to work fine.
However, I'm not sure how to deal with connections when Lambda A is invoking Lambda B and Lambda B needs to access mLab's MongoDB database.
Is it possible to pass the connection pool somehow, or would Lambda B have to keep its own connection pool?
I was thinking of using mLab's Data API, which exposes most of the operations of the MongoDB driver, so I could use HTTP calls, e.g. GET and POST, to run commands against the database. It seems similar to RESTHeart.
I'm leaning towards option 2, but mLab's Data API documentation clearly states to avoid using the REST API unless you cannot connect using the MongoDB driver directly:
The first method—the one we strongly recommend whenever possible for added performance and functionality—is to connect using one of the available MongoDB drivers. You do not need to use our API if you use the driver. The second method, documented in this article, is to connect via mLab's RESTful Data API. Use this method only if you cannot connect using a MongoDB driver.
Given all this how would it be best to approach it? 1 or 2 or is there any other option I should consider?
Unfortunately you won't be able to 'share' a Mongo connection across Lambdas, because ultimately there's a 'physical' socket behind the connection which is specific to that instance.
I think both of your solutions are good depending on usage.
If you tend to have steady average concurrency on both Lambda A and B across an hour period (a rough rule of thumb for how long AWS keeps a Lambda instance alive), then having each of them own its own static connection is a good solution, because chances are that a request will reach an already started and connected Lambda. I would also guess that Node drivers for 'vanilla' Mongo are more mature than those for the RESTful Data API.
However, if you get spikey or uneven load, then you might use the RESTful Data API, because you'll be centralising the responsibility for managing the number of open connections into a single point. Under these conditions you're less likely to open unneeded connections, or to use all of your current capacity and have to wait for a new connection to be established.
Ultimately it's a game of probabilistic load balancing: either you 'pool' all your connections in a central place (the Data API) and become less affected by the usage of a single function at the expense of greater latency on individual operations, or you pool at a function level but are more exposed to cold starts opening connections under uneven concurrency.

How to share database connection between different lambda functions

I went through some articles about taking advantage of Lambda's container reuse and sharing things like a database connection between multiple invocations. However, what if I have multiple Lambda functions accessing the database and I want them to share the same connection? These functions call each other: for example, an API Gateway call triggers the authenticator Lambda function, which then calls the insert-user function. Both of these functions make calls to the database, so is it possible for them to share the same connection?
I'm using NodeJS but I can use a different language if it would support that.
You can't share connections between instances. Concurrent invocations do not use the same instance.
You can, however, share connections between invocations (which might be executed on the same container/instance). There you have to check whether your connection is still open, in which case you can reuse it; otherwise, open a new one.
If you are worried about too many connections to your DB, just close the connections when you exit your Lambda and instantiate new ones every time. You may also need to think about concurrency if that is a problem. A few weeks ago AWS added the possibility to control concurrency on a per-function basis, which is neat.
