I've been messing around with serverless and PostgreSQL. It seems that connection pooling is possible, but when I declared a connection pool to my PostgreSQL instance outside the handler:
var pool = new pg.Pool(config);
Not calling pool.end() at the end of my request handlers seems to cause lambda-local not to terminate when I call it.
If I call pool.end() lambda-local does terminate, but I wonder if this means that the function will stop working?
If I don't call pool.end(), will the function run forever on AWS, costing me a lot of money?
This happens because, by default, the Lambda callback waits for the event loop to be empty before "freezing the process" (see the docs).
You can change this behavior by setting context.callbackWaitsForEmptyEventLoop to false. On subsequent invocations, in the case of a "warm start", your Lambda should be able to reuse the pool.
You can use the middy middleware or a Serverless plugin to warm up your Lambda and reduce cold starts.
Also, Lambdas never run forever: the maximum execution duration per request is 300 seconds (see the docs), and of course you can set your own (lower) timeout.
That being said, it's a risky path and should be used with caution.
Related
Given a Parse Server that has a timeout set to 10 s:
const httpServer = require('http').createServer(app);
httpServer.timeout = 10 * 1000;
And a Cloud Code function is called that takes 20 s to execute. Node.js destroys the socket after 10 s, but the Cloud Code function keeps executing. Such Cloud Code functions are just wasting resources and should be aborted in a controlled manner, e.g. at defined checkpoints.
I could add a callback for when the timeout fires:
httpServer.setTimeout(10 * 1000, (socket) => {
  // Terminate Cloud Code, but how?
});
Is there a way to terminate the Cloud Code execution on timeout?
Update
A feature request has been opened with Parse Server, since there doesn't seem to be a solution.
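Absent built-in support, one workaround is to race the function body against a timer with Promise.race. Note this only stops you from waiting: the underlying work keeps running unless it cooperatively checks for cancellation, which is exactly why the feature request exists. `longJob` below is a hypothetical stand-in for a slow Cloud Code body (200 ms here, 20 s in the question).

```javascript
// Reject after `ms` milliseconds unless `promise` settles first
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timed out')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical slow Cloud Code body
const longJob = new Promise((resolve) => setTimeout(resolve, 200, 'done'));

withTimeout(longJob, 50)
  .then((result) => console.log(result))
  .catch((err) => console.log(err.message)); // logs "timed out" after ~50 ms
```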
I have an AWS Lambda application built upon an external library that contains an EventEmitter. On a certain event, I need to make an HTTP request. So I was using this code (simplified):
myEmitter.on("myEvent", async () => {
  setup();
  await doRequest();
  finishingWork();
});
What I understand happens is this:
My handler is called, but as soon as the doRequest function is called, a Promise is returned and the EventEmitter continues with the next handlers. When all that is done, the work of the handler can continue (finishingWork).
This works locally, because my Node.js process keeps running and any remaining events on the event loop are handled. The strange thing is that this doesn't seem to work on AWS Lambda, even if context.callbackWaitsForEmptyEventLoop is set to true.
In my logging I can see my handler enters the doRequest function, but nothing after I call the library to make the HTTP call (request-promise which uses request). And the code doesn't continue when I make another request (which I would expect if callbackWaitsForEmptyEventLoop is set to false, which it isn't).
Has anyone experienced something similar, and do you know how to perform an asynchronous HTTP request in the handler of a Node.js event emitter on AWS Lambda?
I had a similar issue as well: my event emitter logs all events normally until it runs into an async function. It works fine in ECS but not in Lambda, since the event emitter calls its listeners synchronously while Lambda exits once the response is returned.
In the end, I used await-event-emitter to solve the problem:
await emitter.emit('onUpdate', ...);
If you know how to solve this, feel free to add another answer. For now, the "solution" for us was to put the event-handler code elsewhere in our codebase; this way, it is executed asynchronously.
We were able to do that because there is only one place where the event is emitted, but the event-handler approach would have been cleaner. Unfortunately, it doesn't seem to be possible.
I'm trying to return a 200 OK response to a request before the work is done; however, the work I need to do takes longer than the 3 seconds I have to make the response in.
I'm working in AWS Lambda, and the way I approached this was through threading:
t = threading.Thread(target=worker, args=(xml,))
t.start()
# So that you can return before worker is done
return response(200)
However, even when I threaded the work to be done in the background, it seems that AWS Lambda won't finish it. As soon as the response is made, Lambda just shuts down. For example, if the work takes 2 seconds, the following will not work:
t = threading.Thread(target=worker, args=(xml,))
t.start()
# So that you can return before worker is done
return response(200)
but if we sleep for 2 seconds, the work will be done:
t = threading.Thread(target=worker, args=(xml,))
t.start()
time.sleep(2)
# So that you can return before worker is done
return response(200)
Given that, what can I do to return a 200 OK response to the request with AWS Lambda, while also having the work done in the same Lambda function?
A threading solution won't work, because once you send a response your Lambda environment will be shut down. You need to trigger the AWS Lambda function asynchronously, i.e. with the Event invocation type, instead of the default request-response mode. See this guide for one possible solution.
Alternatively, you could have your current Lambda function simply call another Lambda function with the Event (async) invocation type to do all the actual work, and then return the 200 response code.
I will assume you're using AWS API Gateway to call your lambda function (hence the 200 status code return).
There are two ways to call a Lambda: the synchronous (RequestResponse) way and the asynchronous (Event) way. API Gateway uses the first by default unless you explicitly tell it otherwise. In order to make your Lambda integration asynchronous, you should add a custom header:
X-Amz-Invocation-Type: Event
in your API Gateway console. You can also add an
InvocationType: Event
header in your client calls.
My Scenario
I'm trying to utilize the global scope of Node.js to initialize a database connection once, and use the initialized connection when the lambda function is invoked.
This can save a lot of resources and time, as opening a DB connection is a lengthy process:
// Global scope: runs only once per container
const redis = require('redis');
const client = redis.createClient({ <HOST>, <PORT> });

// Function scope: runs per invocation
exports.handler = (event, context, callback) => {
  // do something with redis
};
My Problem
Several common connection errors may occur:
Uninitialized connection: since Node.js is asynchronous, the function may start executing code before redis.createClient has actually established the connection, hence using an uninitialized connection.
Timeout: If the connection attempt times out for some reason, the function will have an erroneous handler.
Runtime error: If a connection error happens during code execution, the following invocations will have an erroneous handler.
My Question
What's the proper way to overcome errors (initialization, timeout and runtime) of a global Redis connection used by an AWS Lambda function?
Lambda functions were designed to be stateless, so I don't know if there is one best answer to this. There's a really helpful GitHub comment about Lambda and RDS, and it mostly applies here. It mentions that the answer depends on how many requests you'll be making.
Regardless, this SO answer is more or less how I would do it, though I prefer a Promise-based API for the Redis library. The author handles the uninitialized-connection issue by using callbacks to wait until the connection is open before trying to use it. The other two issues you raise are also handled in that SO answer; basically: if (err) callback(err).
That said, given that the GitHub comment is from support at AWS and says you need to make a connection inside the handler, you may as well do it only there until you're sure you need the performance boost.
I realize this doesn't exactly answer the question, but the question has been open for a few days now and I'm curious. And there's nothing like being wrong on the internet to find out the right answer...
I'd like to use PostgreSQL as the database for my AWS Lambda functions, but I'm worried about performance.
My worry is that Lambdas are stateless and only exist while they're executing, so I imagine every time a Lambda is triggered it'll try to initiate a brand-new PG connection.
I'm not sure whether this decreases performance or somehow causes issues with stale connections. Does anyone know more about this?
I know DynamoDB is more in line with Lambda, but I really need a relational database while keeping Lambda's scalability.
You can make use of the container execution model of AWS Lambda. When a Lambda is invoked, AWS spins up a container to run the code inside the handler function. So if you define the PG connection outside the handler function, it will be shared among invocations of the Lambda function, as the documentation explains:
Any declarations in your Lambda function code (outside the handler code, see Programming Model) remains initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. You can add logic in your code to check if a connection already exists before creating one.
const pg = require('pg');
const client = new pg.Client(<connection_string>);
client.connect(); // Connect once, in the global scope

exports.handler = (event, context, cb) => {
  client.query('SELECT * FROM users WHERE <condition>', (err, users) => {
    // Do stuff with users
    cb(null); // Finish the function cleanly
  });
};
Refer to this blog post.
But there is a caveat.
When you write your Lambda function code, do not assume that AWS Lambda always reuses the container because AWS Lambda may choose not to reuse the container. Depending on various other factors, AWS Lambda may simply create a new container instead of reusing an existing container.
Additionally, you can create a scheduled job to warm up the Lambda function (e.g. one that runs every 5 minutes).