AWS Lambda times out after running successfully - node.js

I created a Node.js Lambda function for AWS using the Serverless Framework that increments different counters in a Postgres database based on event parameters. The function runs without errors and works as expected when invoked with serverless invoke local. However, when invoked from Java, it simply times out instead of finishing and returning.
I've tried several things, including waiting for the Postgres pool to close, increasing the timeout, returning with the callback function (which I think is good practice anyway, as it makes it clearer that the function ends there), and using promise chains instead of async-await, with no luck. The real question is whether this is just how it works, so that I always have to set context.callbackWaitsForEmptyEventLoop = false, or whether there is a more elegant solution. I even tried the why-is-node-running package, and it says that 4 handles are keeping the process running: a TCPWRAP, a Timeout, and two TickObjects. I'm almost sure that node-postgres is causing this, as I have created multiple Lambda functions that suffer from the same issue.
// These are the last lines of the handler function
const insertQueries = [
  // Multiple queries using a node-postgres pool, e.g.
  // pool.query(...);
];
try {
  await Promise.all(insertQueries);
} catch (err) {
  return callback('Couldn\'t insert API stats: ' + err);
}
return callback(null, 'API stats inserted successfully!');
The AWS Java SDK only prints a debug message telling me that task timed out after 10.01 seconds (serverless.yml has 10 seconds set).

Related

Should an AWS Lambda function instance in Node.js pick up another request during an async await?

Let's say I've got a queue of requests for my Lambda, and inside the lambda might be an external service call that takes 500ms, which is wrapped in async await like
async callSlowService(serializedObject: string): Promise<void> {
  await slowServiceClient.post(serializedObject);
}
Should I expect that my Lambda instance will pick up another request off the queue while awaiting the slow call? I know it'll also spin up new Lambda instances, but that's not what I'm asking about; I mean interleaving requests on a single instance.
I'm asking because I would think that it should do this, however I'm testing with a sleep function and a load generator and it's not happening. My code actually looks like this:
async someCoreFunction(): Promise<void> {
  // Business logic
  console.log("Before wait");
  await sleep(2000);
  console.log("After wait");
}
}
const sleep = (milliseconds) => {
return new Promise(resolve => setTimeout(resolve, milliseconds))
};
And while it definitely takes 2 seconds between the "Before wait" and "After wait" statements, no new logs are written in that time.
No.
Lambda as a service is largely unaware of what your code is doing. It simply takes a request, invokes your code and then waits for it to return.
I would not expect AWS to implement a feature like interleaving any time soon. It would require the Lambda runtime to have substantial knowledge of how your code behaves (for example, you may be awaiting two concurrent long asynchronous calls within one invocation, so simply interrupting when you hit your first await would be incorrect). It would also cause no end of issues for people using the shared scope outside of the handler for common setup/teardown.
As you pay per invocation and time, I don't really see that there is much difference between interleaving and processing the queue in parallel (which lambda natively supports); considering that time spent awaiting still requires some compute. If interleaving ever happens I'd expect it to be a way for AWS to reduce the drain on their own resources.
n.b. If you are awaiting for a long time in a Lambda function, then there is probably a better way of doing things. For example, Step Functions provide a great way to kick off and poll long-running tasks. Similarly, the pattern of using a session variable in your payload is a good way of allowing a long-running service to call back into Lambda without having the Lambda idling.

lambda trigger callback vs context.done

I was following the guide here for setting up a presignup trigger.
However, when I used callback(null, event) my lambda function would never actually return and I would end up getting an error
{ code: 'UnexpectedLambdaException',
name: 'UnexpectedLambdaException',
message: 'arn:aws:lambda:us-east-2:642684845958:function:proj-dev-confirm-1OP5DB3KK5WTA failed with error Socket timeout while invoking Lambda function.' }
I found a similar link here that says to use context.done().
After switching it works perfectly fine.
What's the difference?
exports.confirm = (event, context, callback) => {
  event.response.autoConfirmUser = true;
  context.done(null, event);
  // callback(null, event); does not work
}
Back in the original Lambda runtime environment for Node.js 0.10, Lambda provided helper functions in the context object: context.done(err, res), context.succeed(res), and context.fail(err).
This was formerly documented, but has been removed.
Using the Earlier Node.js Runtime v0.10.42 is an archived copy of a page that no longer exists in the Lambda documentation, that explains how these methods were used.
When the Node.js 4.3 runtime for Lambda was launched, these remained for backwards compatibility (and remain available but undocumented), and callback(err, res) was introduced.
Here's the nature of your problem, and why the two solutions you found actually seem to solve it.
Context.succeed, context.done, and context.fail however, are more than just bookkeeping – they cause the request to return after the current task completes and freeze the process immediately, even if other tasks remain in the Node.js event loop. Generally that’s not what you want if those tasks represent incomplete callbacks.
https://aws.amazon.com/blogs/compute/node-js-4-3-2-runtime-now-available-on-lambda/
So with callback, Lambda functions now behave in a more paradigmatically correct way, but this is a problem if you intend for certain objects to remain on the event loop during the freeze that occurs between invocations -- unlike the old (deprecated) done, fail, and succeed methods, using the callback doesn't suspend things immediately. Instead, it waits for the event loop to be empty.
context.callbackWaitsForEmptyEventLoop -- default true -- was introduced so that you can set it to false for those cases where you want the Lambda function to return immediately after you call the callback, regardless of what's happening in the event loop. The default is true because false can mask bugs in your function and can cause very erratic/unexpected behavior if you fail to consider the implications of container reuse -- so you shouldn't set this to false unless and until you understand why it is needed.
A common reason false is needed would be a database connection made by your function. If you create a database connection object in a global variable, it will have an open socket, and potentially other things like timers, sitting on the event loop. This prevents the callback from causing Lambda to return a response, until these operations are also finished or the invocation timeout timer fires.
Identify why you need to set this to false, and if it's a valid reason, then it is correct to use it.
Otherwise, your code may have a bug that you need to understand and fix, such as leaving requests in flight or other work unfinished, when calling the callback.
So, how do we parse the Cognito error? At first, it seemed pretty unusual, but now it's clear that it is not.
When executing a function, Lambda will throw an error that the task timed out after the configured number of seconds. You should find this to be what happens when you test your function in the Lambda console.
Unfortunately, Cognito appears to have taken an internal design shortcut when invoking a Lambda function: instead of waiting for Lambda to time out the invocation (which could tie up resources inside Cognito) or imposing its own explicit timer on the maximum duration Cognito will wait for a Lambda response, it relies on a lower-layer socket timer to constrain this wait... thus an "unexpected" error is thrown while invoking the function.
Further complicating interpreting the error message, there are missing quotes in the error, where the lower layer exception is interpolated.
To me, the problem would be much more clear if the error read like this:
'arn:aws:lambda:...' failed with error 'Socket timeout' while invoking Lambda function
This format would more clearly indicate that while Cognito was invoking the function, it threw an internal Socket timeout error (as opposed to Lambda encountering an unexpected internal error, which was my original -- and incorrect -- assumption).
It's quite reasonable for Cognito to impose some kind of response time limit on the Lambda function, but I don't see this documented. I suspect a short timeout on your Lambda function itself (making it fail more promptly) would cause Cognito to throw a somewhat more useful error, but in my mind, Cognito should have been designed to include logic to make this an expected, defined error, rather than categorizing it as "unexpected."
As an update, the Node.js 10.x runtime handler supports an async function that uses return and throw statements to return success or error responses, respectively. Additionally, if your function performs asynchronous tasks, you can return a Promise and use resolve or reject to return a success or error, respectively. Either approach simplifies things by not requiring context or callback to signal completion to the invoker, so your Lambda function could look something like this:
exports.handler = async (event) => {
  // perform tasking... (doStuffWith is a placeholder for your own logic)
  const data = doStuffWith(event)
  // if you encounter an error situation, throw to signal failure
  if (!data) {
    throw new Error('tell invoker you encountered an error')
  }
  // finished tasking with no errors
  return { data }
}
Of course you can still use context, but it's not required to signal completion.

AWS Lambda does not run independently

I am using Node.js with AWS Lambda.
As far as I know, each Lambda invocation is handled in an independent, parallel process.
However, the following example shows a different result than I expected.
// test.js
const now = new Date();
module.exports = () => {
  console.log(now);
};
// handler.js
const test = require('./test');
module.exports.hello = async (event, context) => {
  test();
  return {
    statusCode: 200,
    body: null
  };
};
RESULT: [screenshot: hello handler log]
I expected each invocation to execute independently, so the value logged by console.log(now) should always be the time at which that invocation ran.
However, in the actual log, the value of now is always the time of the very first execution rather than the time of each invocation.
The logged value was still the same after 5 minutes.
The value did change after 12 hours, but after that the same problem appeared again.
This result makes us seriously reconsider how to manage the DB connection.
There are two assumptions, one for each case of Lambda's recycling:
If Lambda recycles modules like test.js,
it is better to use a connection pool;
it is also advisable to use an ORM such as Sequelize, which requires initialization.
If not,
it is better to use simple connections and regular queries so that connections are consumed quickly.
How can we get maximum performance out of Lambda?
How should we interpret the test results above?
AWS Lambda creates and reuses the containers, so you need to understand the impact of this practice on the programming model.
The first time a function executes, a new container will be created to execute it.
Let’s say your function finishes, and some time passes, then you call it again. Lambda may create a new container all over again. However, if you haven’t changed the Lambda function code and not too much time has gone by, Lambda may reuse the previous container. This offers performance advantages: Lambda gets to skip the nodejs language initialization, and you get to skip initialization in your code (so you can reuse DB connections, for example); files that you wrote to /tmp last time around will still be there if the container gets reused; anything you initialized globally outside of the Lambda function handler persists.
For more see Understanding Container Reuse in AWS Lambda.
The behavior that you have described is a result of AWS optimizations. It looks like your Lambda is very fast, so it is more efficient for AWS to use only one unit of execution (process/container/instance). Try simulating a long-running process and you should see that the actual timestamps differ in that case.

Lambda Timing out after calling callback

I'm using two Lambda functions with the Node.js 4.3 runtime. I run the first and it calls the second synchronously (sync is the intent). The problem is that the second one times out (at 60 seconds) even though it actually reaches a successful finish after only 22 seconds.
Here's the flow between the two Lambda functions:
I am no longer getting CloudWatch logs for Lambda function A, but the real problem (I think) is function B, which times out for no reason.
Here's some CloudWatch logs to illustrate this:
The code at the end of Function B, which includes the "Success" log statement seen in the picture above, is included below:
Originally I only had the callback(null, 'successful ...') line and not the nodejs 0.10.x way where you called succeed() off of context. In desperation I added both but the result is the same.
Anyone have an idea what's going on? Any way in which I can debug this?
In case the invocation logic between A and B makes a difference in the state that B starts in, here's the invocation:
As Michael - sqlbot said, the issue seems to be that as long as there is an open connection, the event loop is non-empty, so calling the callback doesn't terminate the function. I had the same problem with an open Redis connection; the solution, as stated, is context.callbackWaitsForEmptyEventLoop = false;
At least for Redis connections, it also helps to quit the connection to Redis so that Lambda can finish the job properly.

Asynchronous calls using postgres as an example in NodeJS

When implementing this code (example taken directly from https://github.com/brianc/node-postgres):
var pg = require('pg');
var conString = "tcp://postgres:1234@localhost/postgres";
pg.connect(conString, function(err, client) {
  client.query("SELECT NOW() as when", function(err, result) {
    console.log("Row count: %d", result.rows.length); // 1
    console.log("Current year: %d", result.rows[0].when.getFullYear());
    // Code halts here
  });
});
After the last console.log, node hangs. I think this is because of the asynchronous nature of the code, and I suspect that at this point one should call a callback function.
I have two questions:
Is my thinking correct?
If my thinking is correct, then how do the mechanics work? I know Node.js uses an event loop, but what is making this event loop halt at this point?
It appears to hang because the connection to Postgres is still open. Until it's closed, or "ended"...
client.end(); // Code halts here
Node will continue to wait in idle for another event to be added to the queue.
Not quite. This is a detail of node-postgres and its dependencies, not of Node or of its "asynchronous nature" in general.
The idling is due to and documented for the generic-pool module that node-postgres uses:
If you are shutting down a long-lived process, you may notice that node fails to exit for 30 seconds or so. This is a side effect of the idleTimeoutMillis behavior -- the pool has a setTimeout() call registered that is in the event loop queue, so node won't terminate until all resources have timed out, and the pool stops trying to manage them.
And, as it explains under Draining:
If you know would like to terminate all the resources in your pool before their timeouts have been reached, you can use destroyAllNow() in conjunction with drain():
pool.drain(function() {
  pool.destroyAllNow();
});
One side-effect of calling drain() is that subsequent calls to acquire() will throw an Error.
Which is what pg.end() does and can certainly be done if your intention is to exit at the end of a serial application, such as unit testing or your given snippet.
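To make the mechanism concrete, here is a toy pool written for illustration. It is not generic-pool's actual implementation: ToyPool and its internals are assumptions, and only the method names drain, destroyAllNow, acquire, and idleTimeoutMillis are borrowed from the quoted documentation.

```javascript
// Toy pool: each released resource gets an idle timer, and that pending
// setTimeout is what keeps node from exiting.
class ToyPool {
  constructor(idleTimeoutMillis) {
    this.idleTimeoutMillis = idleTimeoutMillis;
    this.idle = new Map(); // resource -> idle timer
    this.draining = false;
  }
  acquire() {
    if (this.draining) throw new Error('pool is draining');
    for (const [resource, timer] of this.idle) {
      clearTimeout(timer);
      this.idle.delete(resource);
      return resource; // reuse an idle resource
    }
    return { createdAt: Date.now() };
  }
  release(resource) {
    const timer = setTimeout(() => this.idle.delete(resource), this.idleTimeoutMillis);
    this.idle.set(resource, timer);
  }
  drain(cb) {
    this.draining = true; // later acquire() calls throw
    setImmediate(cb);
  }
  destroyAllNow() {
    for (const timer of this.idle.values()) clearTimeout(timer);
    this.idle.clear();
  }
}

const pool = new ToyPool(30000);
pool.release(pool.acquire());
// Without the next line, node would linger for about 30 seconds.
pool.drain(() => pool.destroyAllNow());
```

Once the idle timers are cleared, nothing remains on the event loop and the process exits promptly.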

Resources