Troubleshoot DynamoDB to Elasticsearch - node.js

Let's suppose I have a database on DynamoDB, and I am currently using streams and lambda functions to send that data to Elasticsearch.
Here's the thing, supposing the data is saved successfully on DynamoDB, is there a way for me to be 100% sure that the data has been saved on Elasticsearch as well?
Considering I have a function to save that data on DDB, is there a way for me to communicate with the Lambda function triggered by DDB before returning a status code, so I can receive confirmation before returning?
I want to do that in order to return OK from both my function and the Lambda function at the same time.

This doesn't look like the correct approach for this problem. We generally use DynamoDB Streams + Lambda for operations that are async in nature and when we don't have to communicate the status of this Lambda execution to the client.
So I suggest the following two approaches that are the closest to what you are trying to achieve -
Make the operation completely synchronous, i.e., do the DynamoDB insert and Elasticsearch insert in the same call (without any DDB Stream and Lambda triggers). This will ensure that you return the correct status of both writes to the client. Also, in case the ES insert fails, you have the option to revert the DDB write and then return the complete status as failed.
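A minimal sketch of this synchronous dual write, assuming AWS SDK v3 and the @elastic/elasticsearch v8 client; the table name, index name, id attribute, and ES endpoint are placeholders:

const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, PutCommand, DeleteCommand } = require('@aws-sdk/lib-dynamodb');
const { Client } = require('@elastic/elasticsearch');

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const es = new Client({ node: process.env.ES_ENDPOINT }); // hypothetical endpoint

async function saveItem(item) {
  // 1. Write to DynamoDB first.
  await ddb.send(new PutCommand({ TableName: 'MyTable', Item: item }));
  try {
    // 2. Index the same document into Elasticsearch in the same call path.
    await es.index({ index: 'my-index', id: item.id, document: item });
  } catch (err) {
    // 3. If the ES insert fails, revert the DDB write and report a failure.
    await ddb.send(new DeleteCommand({ TableName: 'MyTable', Key: { id: item.id } }));
    return { statusCode: 500, body: 'Write failed' };
  }
  return { statusCode: 200, body: 'Both writes succeeded' };
}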
The first approach obviously adds to the latency of the function. So you can continue with your original approach, but let the client know about it. It will work as follows -
Client calls your API.
API inserts record into Ddb and returns to the client.
The client receives the status and displays a message to the user that their request is being processed.
The client then starts polling for the status of the ES insert via another API.
Meanwhile, the Ddb stream triggers the ES insert Lambda fn and completes the ES write.
The poller on the client comes to know about the successful insert into ES and displays a final success message to the user.
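A minimal sketch of the client-side poller for this flow, assuming Node 18+'s global fetch and a hypothetical GET /items/{id}/status endpoint that reports whether the stream-triggered Lambda has finished the ES write:

async function waitForEsInsert(itemId, { retries = 10, delayMs = 1000 } = {}) {
  for (let i = 0; i < retries; i++) {
    const res = await fetch(`https://api.example.com/items/${itemId}/status`); // hypothetical status API
    const { indexed } = await res.json();
    if (indexed) return true;                          // ES write confirmed, show success message
    await new Promise((r) => setTimeout(r, delayMs));  // wait before polling again
  }
  return false; // still not indexed after all retries
}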

Related

Why is the userIdentity property always empty in AWS’ Kinesis DataStream?

I have enabled Kinesis DataStream in DynamoDB and have configured a Delivery Stream to store the stream as audit logs into an s3 bucket.
I then query the s3 bucket from Amazon Athena.
Everything seems to be working, but the userIdentity property is always empty (null) which seems pointless to me to have an audit if I cannot capture who did the transaction. Is this property only populated when a record is deleted from DynamoDB and TTL is enabled?
Questions:
How do I capture the user id / name of the user responsible for adding, updating, or deleting a record via the application or directly via DynamoDB in AWS console?
(Less important question) How do I format the stream before it hits the s3 bucket so I can include the record id being updated?
Also please note that I have a Lambda function that I use from the Delivery Stream that simply adds a new line to each record as a delimiter. If I wanted to do more processing/formatting of the stream, should I be executing this Lambda when the stream hits the Delivery Stream? Or should I be executing this as a trigger on the DynamoDB table itself, before it hits the Delivery Stream?
DynamoDB does not include the user details in the Data Stream. This needs to be implemented by the application; then you can get the values from the NewImage, if provided by the stream.
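A minimal sketch of that application-level workaround, storing the caller's identity on the item itself so it appears in the stream record's NewImage (the attribute names and the way you obtain the user id are assumptions):

const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, PutCommand } = require('@aws-sdk/lib-dynamodb');

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function saveWithAudit(item, userId) {
  await ddb.send(new PutCommand({
    TableName: 'MyTable', // placeholder table name
    Item: {
      ...item,
      updatedBy: userId,                   // who made the change
      updatedAt: new Date().toISOString(), // when the change was made
    },
  }));
}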

How to build search functionality with Elasticsearch and a Lambda function into your existing project

I have a Node + Express application running on an EC2 server and am trying to add a new search feature to it. I am thinking about using a Lambda function and Elasticsearch. When the client fires a request to update a table in DynamoDB, the Lambda function will react to this event and update the Elasticsearch index.
I know Lambda runs serverless whereas my original application runs within a server. Can anybody give me some hints about how to do it, or let me know if it's even possible?
The link between a DynamoDB update and a Lambda is "DynamoDB Streams".
The documentation says, in part,
Amazon DynamoDB is integrated with AWS Lambda so that you can create triggers—pieces of code that automatically respond to events in DynamoDB Streams. With triggers, you can build applications that react to data modifications in DynamoDB tables.
If you enable DynamoDB Streams on a table, you can associate the stream Amazon Resource Name (ARN) with an AWS Lambda function that you write. Immediately after an item in the table is modified, a new record appears in the table's stream. AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records.
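A minimal sketch of such a stream-triggered Lambda, assuming the @elastic/elasticsearch v8 client, an ES_ENDPOINT environment variable, an index named 'my-index', an 'id' key attribute, and a stream view type that includes NewImage (all assumptions):

const { Client } = require('@elastic/elasticsearch');
const { unmarshall } = require('@aws-sdk/util-dynamodb');

const es = new Client({ node: process.env.ES_ENDPOINT });

exports.handler = async (event) => {
  for (const record of event.Records) {
    const keys = unmarshall(record.dynamodb.Keys);
    if (record.eventName === 'REMOVE') {
      // Item deleted in DynamoDB: remove it from the index.
      await es.delete({ index: 'my-index', id: keys.id });
    } else {
      // INSERT and MODIFY both carry the full item in NewImage.
      const item = unmarshall(record.dynamodb.NewImage);
      await es.index({ index: 'my-index', id: keys.id, document: item });
    }
  }
};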

How to get results of AWS Glue Job when executing via API?

I executed an AWS Glue job via API Gateway to start the job run. The job run is successful, but the result of the script (a printed value) does not come back from the execution; only the job run ID comes as the response. Is there any way to get the result of the job through an API?
For Glue, anything you print or log goes into CloudWatch.
You have the option of adding a handler to your logger that writes to a stream and pushes that stream to a file in S3. Or better yet, create a StringIO object, store your result in it, and then send that to S3.
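On the caller side, one way to then get the result through an API is to poll the run and read whatever result object the Glue script wrote to S3. A minimal sketch assuming AWS SDK v3; the bucket name and the results/<runId>.json key layout are assumptions:

const { GlueClient, GetJobRunCommand } = require('@aws-sdk/client-glue');
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');

const glue = new GlueClient({});
const s3 = new S3Client({});

async function getJobResult(jobName, runId) {
  // Poll until the job run has finished.
  let state;
  do {
    const { JobRun } = await glue.send(new GetJobRunCommand({ JobName: jobName, RunId: runId }));
    state = JobRun.JobRunState;
    if (state === 'STARTING' || state === 'RUNNING') {
      await new Promise((r) => setTimeout(r, 5000)); // wait before checking again
    }
  } while (state === 'STARTING' || state === 'RUNNING');

  if (state !== 'SUCCEEDED') throw new Error(`Job run ended in state ${state}`);

  // Read the result object the Glue script is assumed to have written to S3.
  const obj = await s3.send(new GetObjectCommand({ Bucket: 'my-results-bucket', Key: `results/${runId}.json` }));
  return JSON.parse(await obj.Body.transformToString());
}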

Disable lambda retries on Kinesis EventSourceMapping

I want simply to disable lambda retries when it's launched by a kinesis trigger. If the lambda fails or exit, I don't want it to retry.
From AWS Lambda Retry Behavior - AWS Lambda:
Poll-based (or pull model) event sources that are stream-based: These consist of Kinesis Data Streams or DynamoDB. When a Lambda function invocation fails, AWS Lambda attempts to process the erring batch of records until the time the data expires, which can be up to seven days.
The exception is treated as blocking, and AWS Lambda will not read any new records from the shard until the failed batch of records either expires or is processed successfully. This ensures that AWS Lambda processes the stream events in order.
There does not appear to be any configuration options to change this behaviour.
How about handling your error properly so that the invocation will still succeed and Lambda will not retry it anymore?
In NodeJS, it would be something like this...
export const handler = (event, context) => {
  return doWhateverAsync()
    .then(() => someSuccessfulValue)
    .catch((err) => {
      // Log the error at least.
      console.log(err)
      // But still return something so Lambda won't retry.
      return someSuccessfulValue
    })
}
If you are using a Lambda event source mapping to trigger your Lambda with a batch of records from a Kinesis stream shard, then you can configure the maximum number of retries that will be made by the event source mapping.
Another option is to configure the maximum age of the record which is sent to the function.
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
A good practice is to configure a failure destination. This is usually an SQS queue or SNS topic; details of the batch that caused the invocation to fail are stored there.
See https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-errors for more info.
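A minimal sketch of setting those limits on an existing event source mapping with AWS SDK v3; the mapping UUID and the SQS queue ARN are placeholders:

const { LambdaClient, UpdateEventSourceMappingCommand } = require('@aws-sdk/client-lambda');

const lambda = new LambdaClient({});

async function disableKinesisRetries(mappingUuid) {
  await lambda.send(new UpdateEventSourceMappingCommand({
    UUID: mappingUuid,
    MaximumRetryAttempts: 0,        // do not retry failed batches
    MaximumRecordAgeInSeconds: 60,  // drop records older than one minute
    DestinationConfig: {
      // Details of failed batches go to this (hypothetical) SQS queue.
      OnFailure: { Destination: 'arn:aws:sqs:us-east-1:123456789012:failed-kinesis-records' },
    },
  }));
}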

Using AWS SQS to handle a long query

I have a NodeJS endpoint that receives requests to gather data from a reporting engine.
To keep the request endpoint light and because some of the reports generated have a few steps (Gather data -> assemble report -> convert to PDF -> Email to relevant person) I want to separate the inbound request from the job itself.
Using AWS.SQS I can accept the request, put the variables into SQS, and then respond with a 200 / 201.
What are some of the better practices around picking this job up on the other end?
If I were to trigger a Lambda function, would I have to wait for that function to complete before the 200 / 201 can be sent? Or can I:
Accept Request ->
Job to SQS ->
Initiate Lambda function ->
200 Response.
Alternatively what other options would be available to decouple the inbound request from the processing itself?
Here are a few options:
Insert the request in your SQS queue and return a 200 response immediately. Have a process on an EC2 server polling the SQS queue and performing the query when it gets a message out of SQS.
Invoke a Lambda function asynchronously, passing it the properties needed to perform the query, and return a 200 response immediately. Since the invocation is asynchronous, your NodeJS code doesn't wait for the Lambda function to complete (see the sketch after these options).
An alternative to #2 is to send the request to an SNS topic, and have the SNS topic configured to invoke the Lambda function. This is probably the best method if you are using Lambda, because SNS will retry if the Lambda function fails for some reason.
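A minimal sketch of option #2, assuming an Express route and AWS SDK v3; the worker function name and the payload shape are placeholders:

const express = require('express');
const { LambdaClient, InvokeCommand } = require('@aws-sdk/client-lambda');

const app = express();
app.use(express.json());
const lambda = new LambdaClient({});

app.post('/reports', async (req, res) => {
  await lambda.send(new InvokeCommand({
    FunctionName: 'generate-report',  // hypothetical worker function
    InvocationType: 'Event',          // asynchronous: don't wait for the result
    Payload: JSON.stringify(req.body),
  }));
  res.status(200).json({ status: 'accepted' }); // respond immediately
});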
I don't recommend combining SQS with Lambda because those two services don't integrate very well. SNS on the other hand does integrate very well with Lambda.
Also, you need to make sure your Lambda function invocations can be completed in under 5 minutes since that's currently the maximum time a Lambda function can execute. If you need individual steps to run for longer than 5 minutes you will need to use EC2 or ECS.
I think AWS Step Functions may be a good fit for your use case.