AWS Lambda does not run independently - node.js

I am using Node.js with AWS Lambda.
As far as I know, each Lambda invocation is handled in an independent, parallel process.
However, the following example shows a different result than I expected.
// test.js
const now = new Date();

module.exports = () => {
  console.log(now);
};

// handler.js
const test = require('./test');

module.exports.hello = async (event, context) => {
  test();
  return {
    statusCode: 200,
    body: null
  };
};
RESULT (log from the hello handler):
As I understood it, each invocation executes independently, so the value of console.log(now) should always reflect the moment it was executed.
However, in the actual log, the value of now kept showing the timestamp of the very first execution rather than that of each invocation.
The logged value after 5 minutes was still the same.
The value did change after 12 hours, but after that it showed the same behaviour again.
This result raises serious questions about how to manage DB connections.
There are two assumptions, one for each case of Lambda's recycling behaviour:
If Lambda recycles module state as test.js suggests, it is better to use a connection pool, and an ORM that requires initialization, such as Sequelize, is also a good fit.
If not, it is better to use simple, short-lived connections and plain queries so that connections are released quickly.
How can we get maximum performance out of Lambda?
How should we interpret the test results above?

AWS Lambda creates and reuses containers, so you need to understand the impact of this practice on the programming model.
The first time a function executes, a new container will be created to execute it.
Let's say your function finishes, and some time passes, then you call it again. Lambda may create a new container all over again. However, if you haven't changed the Lambda function code and not too much time has gone by, Lambda may reuse the previous container. This offers performance advantages: Lambda gets to skip the Node.js runtime initialization, and you get to skip initialization in your code (so you can reuse DB connections, for example); files that you wrote to /tmp last time around will still be there if the container gets reused; and anything you initialized globally outside of the Lambda function handler persists.
For more see Understanding Container Reuse in AWS Lambda.
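To take advantage of container reuse for database connections, the common pattern is to create the client once in module scope, outside the handler. A minimal sketch, assuming node-postgres (pg) with connection settings supplied via environment variables; the query is purely illustrative:

// db-handler.js
const { Pool } = require('pg');

// Created once per container; warm invocations reuse the same pool
// and skip the TCP/TLS/auth handshake.
const pool = new Pool({ max: 1 }); // one connection per container is usually enough

module.exports.hello = async (event, context) => {
  const { rows } = await pool.query('SELECT now() AS ts'); // illustrative query
  return {
    statusCode: 200,
    body: JSON.stringify(rows[0])
  };
};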

The behavior that you have described is a result of AWS optimizations. It looks like your Lambda is very fast, so it is more efficient for AWS to use only one unit of execution (process/container/instance). Try simulating a long-running process and you will see that the timestamps differ in that case.
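For instance, a sketch of such a simulation; the 5-second delay is an arbitrary choice, just long enough that overlapping requests cannot be served by a single container:

// handler.js
const test = require('./test');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

module.exports.hello = async (event, context) => {
  test();            // logs this container's module-load timestamp
  await sleep(5000); // keep the instance busy so parallel invokes need new containers
  return {
    statusCode: 200,
    body: null
  };
};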

Related

Does AWS lambda (nodejs) share memory across different executions?

I have a Lambda function that responds to an HTTP GET event; each request carries a pair of headers, headerA and headerB.
The Lambda is written in Node.js / TypeScript.
In the implementation I have a global variable defined as a collection of objects:
let storage: { [headerA: string]: storageDrivers } = {};
When I get a request, I store a new instance of a class of mine into the above global variable, at the intended index, if it is not yet present:
if (!storage[headerA]) {
  storage[headerA] = new MyStorageDriver(headerB);
}
What I am experiencing is that, considering two distinct requests close to each other:
1st request: headerA1, headerB1
2nd request: headerA1, headerB2
After the 2nd request, storage[headerA1] contains an instance of MyStorageDriver(headerB1), not MyStorageDriver(headerB2).
It is as if, across different request executions, the global-scope memory is shared or reused.
Is this expected behaviour with AWS Lambda, or is something else leading me to this unexpected behaviour?
My current solution (also a way to double-check this behaviour) is to change the global variable like this:
let storage: { [headerB: string]: { [headerA: string]: storageDrivers } } = {};
and then assign like this on each request:
if (!storage[headerB]) {
  storage[headerB] = {};
}
if (!storage[headerB][headerA]) {
  storage[headerB][headerA] = new MyStorageDriver(headerB);
}
Yes it does happen and no, you shouldn't rely on it.
What happens is that each time there is a request for a Lambda to act, the AWS backend looks to see what's available. If no running container exists for that Lambda, it spins one up, creates the container and the Lambda, and then uses it. Calling the Lambda is known as an 'Invoke'. The process of starting up a container and initializing the Lambda is called a cold start, and it is a bit of a problem: it can take upwards of 15-30 seconds depending on the complexity of your Lambda and its layers.
The next time AWS goes to invoke that Lambda, if an existing container is still running, it will attempt to reuse it. This is known as 'keeping the Lambda warm', so that it can be invoked without the cold start, resulting in a much faster response time.
If it needs to respond to multiple requests at once (i.e. scale), it will spin up additional containers. These are known as concurrent executions; if you view the metrics for your Lambda, you may find thousands of invokes but only 3 or 4 concurrent executions.
The logic AWS uses to decide when to spin up and when to shut down concurrent executions is based on a predictive understanding of your traffic and needs, plus whatever settings you have for provisioned capacity.
Across any invoke that uses an existing container, global variables are maintained, because it is the same environment. This is why you should not use global variables for anything that may change during the execution of the Lambda. You cannot rely on an execution landing in the same container as a previous one, and you cannot rely on it not doing so either.
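A flatter variant of the asker's workaround, as a sketch: key the cache by both headers, so an entry created for one headerB can never be returned for a request carrying a different one. The composite-key format and the getDriver helper are my own illustration:

// Stand-in for the MyStorageDriver class from the question.
class MyStorageDriver {
  constructor(headerB) {
    this.headerB = headerB;
  }
}

// Module scope: survives warm invocations within one container.
const storage = {};

function getDriver(headerA, headerB) {
  const key = `${headerA}:${headerB}`; // hypothetical composite key
  if (!storage[key]) {
    storage[key] = new MyStorageDriver(headerB);
  }
  return storage[key];
}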

Are there conditions under which variables in an AWS Node Lambda persist between invocations?

I wouldn't think so, but I don't have another good explanation for what I observed. Here's a rough version of the relevant code, which is inside the handler function (i.e., it would not be expected to persist between invocations):
const res = await graphqlClient.query({ /* my query */ });
const items = res.data.items;
console.log(items); // <- this is the line that logs the output below
items.push({ id: 'some-id' });
const itemResults = await Promise.all(items.map((item) => /* etc etc */ item));
Over successive invocations from my client, spaced less than ten seconds apart, some-id was repeatedly added to items. On the first invocation, this is what was logged in CloudWatch after const items = res.data.items:
[
  {
    anotherId: 'foo',
    id: 'bar',
  }
]
The 2nd time it was invoked, a few seconds later, this was written to the logs before the call to items.push():
[
  {
    anotherId: 'foo',
    id: 'bar',
  },
  { id: 'some-id' }
]
The 3rd time, this was again written to the logs before the call to items.push():
[
  {
    anotherId: 'foo',
    id: 'bar',
  },
  { id: 'some-id' },
  { id: 'some-id' }
]
items is never written to persistent storage. items is only modified in two places: when it's set to the value returned by the GraphQL query, and when I manually push another value onto the array. I can prevent this bug by checking whether some-id is already in the array, so I'm unblocked for now, but how could it persist over successive runs? I never would've expected a Lambda to behave that way! I thought each invocation was stateless.
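For reference, the de-duplication guard described above might look like this (a sketch, not the asker's actual code):

// Only push the sentinel item if it is not already present.
if (!items.some((item) => item.id === 'some-id')) {
  items.push({ id: 'some-id' });
}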
AWS Lambda is kind of stateless, but not fully; you have to take care of statelessness yourself. Since your example code above is missing a handler function, I assume you didn't provide the full code and that you have defined const items outside of your handler function. A rough explanation based on my assumption:
Everything outside of your handler function is initialized once, when your Lambda function starts for the first time (i.e. a 'cold start'). You can initialize variables, database connections, etc. and reuse them in every invocation as long as the instance of your Lambda function stays alive.
Then, your handler function is invoked after the initialization steps, and again for each future invocation. If you change values/objects outside of your handler function, they'll survive the current invocation and you can use them in your next invocation. This way, you can cache some expensive data or do other optimizations. For example:
const items = [];

exports.handler = function (event, context) {
  // ...
  items.push(/* ... */);
  // ...
};
This is also true for Java and Python Lambda functions, and I believe for most other runtimes as well. This is probably the explanation for what you observe: in one invocation you push something to items, and in the next invocation the previous data has survived, because it was stored outside of the handler function.
Suggestion in your case: if you want fully stateless functions, don't modify data outside of your handler function; only store values inside it. But be aware that this can slow down your Lambda functions if you need to initialize the data in each invocation.
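A minimal sketch of that stateless variant, assuming the array is only needed within a single invocation:

exports.handler = async function (event, context) {
  // Declared inside the handler, so every invocation starts with a
  // fresh array even when the container is reused.
  const items = [];
  items.push({ id: 'some-id' });
  return { count: items.length };
};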
Since this behavior of AWS Lambda is often used for caching data, there are a few blog posts covering this topic as well and how the code is handling it. They usually provide more visual explanations and example code:
Caching in AWS Lambda (note: my own blog post)
Leveraging Lambda Cache for Serverless Cost-Efficiency
All you need to know about caching for serverless applications (this covers much more about caching, but part of it also considers caching inside a Lambda function)
There's much more happening behind the scenes, of course. If you are interested in how this whole process works, I recommend taking a look at the Execution Environment Details. That article focuses on background for building extensions and on how the process outside of your code works, but it might help you understand what's happening behind the scenes.

How are Firebase Cloud Functions invocations counted?

I have plenty of functions in my program using Firebase Cloud Functions, but I'm not sure how the invocations are counted. I'm asking so that I can minimise my invocations as much as possible.
To start off, I have a function:
export const functionOne = functions.auth.user().onCreate((user) => {
  console.log('1 invocation used!!');
});
When a new user joins, this function is executed and of course it consumes 1 invocation.
Before asking the question here, I did a couple of experiments and found that if I define another function, functionTwo, and call it from the first one like this:
export const functionOne = functions.auth.user().onCreate((user) => {
  console.log('1 invocation used!!');
  functionTwo();
});

function functionTwo() {
  console.log('Second function worked. . .');
}
This time I was expecting my invocation count to increase by 2, but it increased only by 1. This is fine.
But let's say I use a function tied to the Firebase Realtime Database, which saves or retrieves data using .onCreate(), .onDelete() and so on, and it is set up like this:
export const functionOne = functions.auth.user().onCreate((user) => {
  console.log('1 invocation used!!');
});

export const functionOnCreate = functions.database
  .ref('/path')
  .onCreate((snapshot, context) => {
    // Do anything in the function
    console.log(`functionOnCreate executed.`);
    return <something>;
  });
Now let's be clear: functionOnCreate() is executed when functionOne() creates nodes under that path in the database, via the .onCreate() trigger.
BUT this time it cost me 2 invocations.
Okay, I understand that it's because functionOnCreate() is being called [or rather, invoked] by Firebase Functions itself, which might be the reason for it.
But if I manage to perform the database tasks by manually calling plain functions, the way functionTwo() was called, will it SAVE me some extra function invocations?
You are charged for the number of times one of your Cloud Functions is invoked. In your second code snippet, you have only one Cloud Function. Calls inside this Cloud Function to other regular JavaScript functions are not charged operations.
You are charged for the amount of time that your code runs, and for the memory it has allocated during that time. Calling multiple regular functions will increase the time the code runs, and may increase the memory required.
Performing the additional database writes in your functions.auth.user().onCreate-triggered Cloud Function will indeed prevent a charge for the functions.database.ref('/path').onCreate-triggered Cloud Function. I recommend that you do some calculations on the number of invocations though, as invocation count is seldom a major factor in the cost.
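As a hedged sketch of that approach, assuming the Firebase Admin SDK is available; '/path' comes from the question, while the payload written is my own illustration:

import * as functions from 'firebase-functions';
import * as admin from 'firebase-admin';

admin.initializeApp();

// One billed invocation: the auth trigger writes to the database directly,
// so no separate database-triggered Cloud Function has to run.
export const functionOne = functions.auth.user().onCreate(async (user) => {
  console.log('1 invocation used!!');
  await admin.database().ref('/path').set({ uid: user.uid }); // payload is an assumption
});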

AWS Lambda times out after running successfully

I created a Node.js Lambda function with the Serverless Framework that increments different counters in a Postgres database based on event parameters. The function runs without any errors when invoked with serverless invoke local and works as expected; however, when invoked from Java, it should finish and return, but it simply times out.
I've tried several things, including waiting for the Postgres pool to close, increasing the timeout, returning via the callback function (which I think is good practice anyway, as it makes it clearer where the function ends), and using promise chains instead of async/await, with no luck. The real question is whether this is just how it works and I always have to set context.callbackWaitsForEmptyEventLoop = false, or whether there is a more elegant solution. I even tried the why-is-node-running package, and it says that 4 handles are keeping the process running: a TCPWRAP, a Timeout, and two TickObjects. I'm almost sure that node-postgres is causing this, as I created multiple Lambda functions suffering from the same issue.
// These are the last lines of the handler function
const insertQueries = [
  // Multiple queries using a node-postgres pool, e.g.
  // pool.query(...)
];
try {
  await Promise.all(insertQueries);
} catch (err) {
  return callback('Couldn\'t insert API stats: ' + err);
}
return callback(null, 'API stats inserted successfully!');
The AWS Java SDK only prints a debug message telling me that the task timed out after 10.01 seconds (serverless.yml sets the timeout to 10 seconds).
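For reference, the workaround mentioned above is a flag on the context object rather than a function call; a minimal sketch of where it would sit in this handler (the handler signature is assumed):

module.exports.handler = (event, context, callback) => {
  // Return as soon as the callback fires, even if node-postgres still has
  // open sockets or timers keeping the event loop non-empty.
  context.callbackWaitsForEmptyEventLoop = false;

  // ... run the insert queries as above, then:
  callback(null, 'API stats inserted successfully!');
};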

Should an AWS Lambda function instance in Node.js pick up another request during an async await?

Let's say I've got a queue of requests for my Lambda, and inside the Lambda there might be an external service call that takes 500 ms, wrapped in async/await like:
async callSlowService(serializedObject: string): Promise<void> {
  await slowServiceClient.post(serializedObject);
}
Should I expect that my Lambda instance will pick up another request off the queue while awaiting the slow call? I know it'll also spin up new Lambda instances, but that's not what I'm asking about; I mean interleaving requests on a single instance.
I'm asking because I would think that it should do this; however, when testing with a sleep function and a load generator, it's not happening. My code actually looks like this:
async someCoreFunction(): Promise<void> {
  // Business logic
  console.log("Before wait");
  await sleep(2000);
  console.log("After wait");
}

const sleep = (milliseconds) => {
  return new Promise(resolve => setTimeout(resolve, milliseconds));
};
And while it definitely takes 2 seconds between the "Before wait" and "After wait" statements, no new logs are written in that time.
No.
Lambda as a service is largely unaware of what your code is doing. It simply takes a request, invokes your code, and then waits for it to return.
I would not expect AWS to implement a feature like interleaving any time soon. It would require the Lambda runtime to have substantial knowledge of how your code behaves (for example, you may be awaiting two concurrent long asynchronous calls within one invocation, so simply interleaving when you hit your first await would be incorrect). It would also cause no end of issues for people using the shared scope outside of the handler for common setup/teardown.
As you pay per invocation and per unit of time, I don't really see much difference between interleaving and processing the queue in parallel (which Lambda natively supports), considering that time spent awaiting still requires some compute. If interleaving ever happens, I'd expect it to be a way for AWS to reduce the drain on their own resources.
N.B. If you are awaiting for a long time in a Lambda function, there is probably a better way of doing things. For example, Step Functions provide a great way to kick off and poll long-running tasks. Similarly, the pattern of passing a session variable in your payload is a good way of allowing a long-running service to call back into Lambda without the Lambda idling.
