Using AWS SQS to handle a long query - node.js

I have a NodeJS endpoint that receives requests to gather data from a reporting engine.
To keep the request endpoint light and because some of the reports generated have a few steps (Gather data -> assemble report -> convert to PDF -> Email to relevant person) I want to separate the inbound request from the job itself.
Using AWS.SQS I can accept the request, put the variables into SQS and then respond with a 200 / 201.
What are some of the better practices around picking this job up on the other end?
If I were to trigger a Lambda function, would I have to wait for that function to complete before the 200 / 201 can be sent? Or can I:
Accept Request ->
Job to SQS ->
Initiate Lambda function ->
200 Response.
Alternatively what other options would be available to decouple the inbound request from the processing itself?

Here are a few options:
1. Insert the request in your SQS queue and return a 200 response immediately. Have a process on an EC2 server polling the SQS queue and performing the query when it gets a message out of SQS.
2. Invoke a Lambda function asynchronously, passing it the properties needed to perform the query, and return a 200 response immediately. Since you invoked the Lambda function asynchronously, your NodeJS code that invoked the Lambda function doesn't wait for the function to complete.
3. An alternative to #2 is to send the request to an SNS topic, and have the SNS topic configured to invoke the Lambda function. This is probably the best method if you are using Lambda, because SNS will retry if the Lambda function fails for some reason.
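As a rough sketch of the first two options, the endpoint only has to hand the job off and respond. The helper names (buildQueueMessage, buildAsyncInvoke, acceptReportRequest) and the injected dispatch function are hypothetical, and the 202 status is an assumption; a 200 or 201 would work the same way.

```javascript
// Hypothetical helper: SQS SendMessage parameters for a report job.
function buildQueueMessage(queueUrl, job) {
  return { QueueUrl: queueUrl, MessageBody: JSON.stringify(job) };
}

// Hypothetical helper: Lambda Invoke parameters for a fire-and-forget call.
// InvocationType 'Event' makes the invocation asynchronous.
function buildAsyncInvoke(functionName, job) {
  return {
    FunctionName: functionName,
    InvocationType: 'Event',
    Payload: JSON.stringify(job),
  };
}

// Hands the job off through `dispatch`, then responds without waiting
// for the report itself to be generated.
async function acceptReportRequest(dispatch, job) {
  try {
    await dispatch(job);
    return { statusCode: 202, body: { accepted: true } };
  } catch (err) {
    console.error('Could not hand off report job', err);
    return { statusCode: 500, body: { accepted: false } };
  }
}
```

With the AWS SDK for JavaScript v2, dispatch could be (job) => sqs.sendMessage(buildQueueMessage(queueUrl, job)).promise() or (job) => lambda.invoke(buildAsyncInvoke(functionName, job)).promise(). Injecting it keeps the endpoint logic testable without AWS credentials.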
SNS integrates very well with Lambda. SQS historically could not trigger Lambda directly, although Lambda now supports SQS queues as an event source, so that combination is also workable.
Also, you need to make sure each Lambda function invocation can complete within Lambda's maximum execution time, which is currently 15 minutes. If you need individual steps to run for longer than that, you will need to use EC2 or ECS.
I think AWS Step Functions may be a good fit for your use case.

Related

AWS SQS FIFO message doesn't seem to be retrying

I've set up a function to hit an API endpoint for a newly created entity that isn't immediately available. If the endpoint returns a status of "pending", the function throws an error. If the endpoint returns a status of "active", the function then deletes the SQS message and triggers several other microservices to do their things using SNS. The SQS queue that triggers the function has a visibility timeout of 2 minutes, and the function itself has a 1 minute timeout.
What I'm expecting to happen is that if the endpoint returns a "pending" status, and the function throws an error, then after the 2 minute visibility timeout, the message would trigger the function again. This should happen every 2 minutes until the api call returns an "active" status and the message is deleted, or until the message retention period is surpassed (currently 1 hour). This seemed like a nice serverless way to poll my newly created entity to check if it was ready for other post-processing.
What's actually happening after adding a message to the SQS queue is that the CloudWatch logs show the function throwing an error like I'd expect, but the function is only being triggered one time. I can't tell if the message is just not visible for some reason, or if it was somehow deleted. I am new to using SQS as a Lambda trigger; am I thinking about this wrong?
A few possible causes here:
your Lambda function handler did not actually throw an exception to the Lambda runtime environment, so Lambda thought the function had successfully processed the message and the Lambda service then deleted the message from the queue (so that it would not get processed again)
your SQS queue has a configured DLQ with maximum receives set to 1, so the message is delivered once, the Lambda function fails, and the message is subsequently moved to the DLQ
the SQS message was re-delivered to the Lambda function and was logged but the logs were made to an earlier log stream (because this invocation was warm) and so it wasn't obvious that the Lambda function had actually been invoked multiple times with the same failed message
To verify this all works normally, I set up a simple test with both FIFO and non-FIFO queues and configured the queues to trigger a Lambda function that simply logged the SQS message and then threw an exception. As expected, I saw the same SQS message delivered to the Lambda function every 2 minutes (which is the queue's message visibility timeout). That continued until it hit the max receive count on the SQS redrive policy (defaults to 10 attempts) at which point the failed message was correctly moved to the associated DLQ.
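A minimal sketch of a handler that fails in a way the Lambda runtime actually sees, relating to the first cause above. makeHandler, getStatus, onActive and the entityId field are hypothetical names standing in for the API call and the SNS fan-out; the sketch also assumes one message per batch, since a thrown error fails the whole batch.

```javascript
// Builds an SQS-triggered handler from two injected (hypothetical) functions.
function makeHandler(getStatus, onActive) {
  return async function handler(event) {
    for (const record of event.Records) {
      const { entityId } = JSON.parse(record.body);
      const status = await getStatus(entityId);
      if (status !== 'active') {
        // Throwing inside an async handler rejects its promise, which Lambda
        // reports as a failed invocation. The message is then not deleted and
        // becomes visible again after the visibility timeout.
        throw new Error(`Entity ${entityId} is still ${status}`);
      }
      await onActive(entityId);
    }
    // Returning normally tells Lambda the batch succeeded, so it deletes
    // the messages from the queue.
  };
}
```

Catching the error and only logging it, or throwing inside a callback that nothing awaits, would make the invocation look successful and the message would be deleted after the first delivery.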

How to increase the AWS lambda to lambda connection timeout or keep the connection alive?

I am using boto3 lambda client to invoke a lambda_S from a lambda_M. My code looks something like
import json

import boto3
import botocore.config

# Disable SDK retries and allow long waits on the synchronous invocation.
cfg = botocore.config.Config(
    retries={'max_attempts': 0},
    read_timeout=840,
    connect_timeout=600,
)  # also tried adding region_name="us-east-1"
lambda_client = boto3.client('lambda', config=cfg)  # even tried without config
invoke_response = lambda_client.invoke(
    FunctionName=lambda_name,
    InvocationType='RequestResponse',
    Payload=json.dumps(request),
)
Lambda_S is supposed to run for like 6 minutes and I want lambda_M to be still alive to get the response back from lambda_S but lambda_M is timing out, after giving a CW message like
"Failed to connect to proxy URL: http://aws-proxy..."
I searched and found advice along the lines of "configure your HTTP client, SDK, firewall, proxy or operating system to allow for long connections with timeout or keep-alive settings". But the issue is I have no idea how to do any of these with Lambda. Any help is highly appreciated.
I would approach this a bit differently. Lambda bills you for execution time, so in general you should avoid waiting in a function. One way you can do that is to create an SNS topic and use it as the messenger to trigger another Lambda.
Workflow goes like this.
SNS-A -> triggers Lambda-A
SNS-B -> triggers lambda-B
So if your lambda-B wants to send something to lambda-A to process and needs the results back, lambda-B sends a message to the SNS-A topic and exits.
SNS-A triggers lambda-A, which does its work and at the end sends a message to SNS-B.
SNS-B triggers lambda-B.
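A minimal sketch of what lambda-A could look like in this workflow. parseSnsEvent, makeLambdaA, doWork and the injected publish function are hypothetical names; the sketch assumes each SNS message body is JSON.

```javascript
// Extracts the JSON payloads from an SNS-triggered Lambda event.
function parseSnsEvent(event) {
  return event.Records.map((record) => JSON.parse(record.Sns.Message));
}

// lambda-A: does its work, then publishes the result to SNS-B instead of
// returning it to a caller that would otherwise have to stay alive and wait.
function makeLambdaA(publish, doWork, replyTopicArn) {
  return async function handler(event) {
    for (const request of parseSnsEvent(event)) {
      const result = await doWork(request);
      await publish({
        TopicArn: replyTopicArn,
        Message: JSON.stringify(result),
      });
    }
  };
}
```

With the AWS SDK for JavaScript v2, publish could be (params) => sns.publish(params).promise(). lambda-B would then be subscribed to SNS-B and read the result with the same parseSnsEvent helper.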
AWS has example documentation on what policies you should put in place, here is one.
I don't know how you are automating the deployment of resources like SNS and Lambda, but assuming you use CloudFormation:
You create your AWS::Lambda::Function.
You create an AWS::SNS::Topic and, in its definition, add the Subscription property and point it at your Lambda.
So in our example, SNS-A will have a subscription defined for lambda-A.
Lastly, you grant SNS permission to trigger the Lambda with AWS::Lambda::Permission.
When these 3 are in place, you are all set to send messages to the SNS topic, which will then trigger the Lambda.
You will find SO answers to questions on how to do this in CloudFormation (example), but you can also read up on the AWS CloudFormation documentation.
If you are not worried about automating this and just want to test it manually, then aws-cli is your friend.

Troubleshoot DynamoDB to Elastic Search

Let's suppose I have a database on DynamoDB, and I am currently using streams and lambda functions to send that data to Elasticsearch.
Here's the thing, supposing the data is saved successfully on DynamoDB, is there a way for me to be 100% sure that the data has been saved on Elasticsearch as well?
Considering I have a function to save that data on DDB, is there a way for me to communicate with the Lambda function triggered by DDB before returning a status code, so I can receive confirmation before returning?
I want to do that in order to return ok both from my function and the lambda function at the same time.
This doesn't look like the correct approach for this problem. We generally use DynamoDB Streams + Lambda for operations that are async in nature and when we don't have to communicate the status of this Lambda execution to the client.
So I suggest the following two approaches that are the closest to what you are trying to achieve -
Make the operation completely synchronous. i.e., do the DynamoDB insert and ElasticSearch insert in the same call (without any Ddb Stream and Lambda triggers). This will ensure that you return the correct status of both writes to the client. Also, in case the ES insert fails, you have an option to revert the Ddb write and then return the complete status as failed.
The first approach obviously adds to the latency of the function. So you can continue with your original approach, but let the client know that the write is still in progress. It will work as follows -
Client calls your API.
API inserts record into Ddb and returns to the client.
The client receives the status and displays a message to the user that their request is being processed.
The client then starts polling for the status of the ES insert via another API.
Meanwhile, the Ddb stream triggers the ES insert Lambda fn and completes the ES write.
The poller on the client comes to know about the successful insert into ES and displays a final success message to the user.
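The client-side poller in the steps above could be sketched roughly like this. pollUntilIndexed, checkStatus and the 'indexed' status value are hypothetical; checkStatus stands in for a call to the second status API.

```javascript
// Polls `checkStatus` until it reports 'indexed' or the attempts run out.
async function pollUntilIndexed(checkStatus, { intervalMs = 1000, maxAttempts = 10 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if ((await checkStatus()) === 'indexed') {
      return { indexed: true, attempts: attempt };
    }
    if (attempt < maxAttempts) {
      // Wait before asking again so the status API is not hammered.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return { indexed: false, attempts: maxAttempts };
}
```

Capping the number of attempts matters here: if the ES insert Lambda keeps failing, the client should eventually stop polling and show an error rather than wait forever.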

Disable lambda retries on Kinesis EventSourceMapping

I simply want to disable Lambda retries when the function is launched by a Kinesis trigger. If the Lambda fails or exits, I don't want it to retry.
From AWS Lambda Retry Behavior - AWS Lambda:
Poll-based (or pull model) event sources that are stream-based: These consist of Kinesis Data Streams or DynamoDB. When a Lambda function invocation fails, AWS Lambda attempts to process the erring batch of records until the time the data expires, which can be up to seven days.
The exception is treated as blocking, and AWS Lambda will not read any new records from the shard until the failed batch of records either expires or is processed successfully. This ensures that AWS Lambda processes the stream events in order.
There do not appear to be any configuration options to change this behaviour.
How about handling your error properly so that the invocation will still succeed and Lambda will not retry it anymore?
In NodeJS, it would be something like this...
export const handler = (event, context) => {
  return doWhateverAsync()
    .then(() => someSuccessfulValue)
    .catch((err) => {
      // Log the error at least.
      console.error(err)
      // But still return something so Lambda won't retry.
      return someSuccessfulValue
    })
}
If you are using a Lambda event source mapping to trigger your Lambda with a batch of records from a Kinesis stream shard, then you can configure the maximum number of retries that the event source mapping will make.
Another option is to configure the maximum age of a record that is sent to the function.
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
A good practice is to configure a failure destination, usually an SQS queue or SNS topic. Details of the batch that caused the invocation to fail are stored there.
See https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-errors for more info.
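As an illustration, these settings map onto parameters of Lambda's UpdateEventSourceMapping API. The helper name buildNoRetryMapping and the one-hour record age are assumptions for this sketch; the mapping UUID and destination ARN have to come from your own setup.

```javascript
// Builds parameters for Lambda's UpdateEventSourceMapping call.
function buildNoRetryMapping(mappingUuid, failureDestinationArn) {
  return {
    UUID: mappingUuid,
    // Do not retry a batch after the function returns an error.
    MaximumRetryAttempts: 0,
    // Discard records older than one hour (illustrative value).
    MaximumRecordAgeInSeconds: 3600,
    // Send details of failed batches to an SQS queue or SNS topic.
    DestinationConfig: {
      OnFailure: { Destination: failureDestinationArn },
    },
  };
}
```

With the AWS SDK for JavaScript v2 this object could be passed to lambda.updateEventSourceMapping(params).promise(). The same settings can also be changed on the trigger in the Lambda console.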

lambdas fail to log to CloudWatch

Situation - I have a lambda that:
is built with Node.js v8
has console.log() statements
is triggered by SQS events
works properly (the downstream system receives all messages, AWS X-Ray can see those executions)
Problem:
this lambda does not log anything!
But if the same lambda is called manually (using "Test" button) - all logging statements are visible in CloudWatch.
My lambda is based on this tutorial: https://www.jeremydaly.com/serverless-consumers-with-lambda-and-sqs-triggers/
A very similar situation occurs if the lambda is called from within another lambda (recursion). Only the first lambda (started manually) logs anything; every subsequent lambda in the recursion chain logs nothing.
An example can be found here:
https://theburningmonk.com/2016/04/aws-lambda-use-recursive-function-to-process-sqs-messages-part-1/
Any idea how to tackle this problem would be highly appreciated.
