Combine SQS messages that arrive within milliseconds of each other - node.js

I am faced with a situation that I am not quite sure how to solve. Basically, my system receives data from a third-party source via API Gateway and publishes it to an SNS topic, which triggers a Lambda function. Based on the message parameters, the Lambda function pushes the message to one of three different SQS queues. These queues trigger one of three Lambda functions, which respectively create, update, or delete items in another third-party system through its API endpoints.
The usual flow is to first create an entity on the destination system; every subsequent action should then update or delete that entity. The problem is that I sometimes receive data for the same entity from the source within milliseconds, so my system cannot finish creating the entity on the destination in time, because their API needs at least 300-400 ms to do so. When my system then tries to update the entity, it does not exist yet, so my system creates it. But since a create action is already executing, this produces a duplicate entry on the destination.
So my question is, what is the best practice to consolidate messages for the same entity that arrive within less than a second of each other?
My thoughts so far:
I am thinking of using Redis to consolidate messages for the same entity before pushing them to the SNS topic, but I was hoping there would be a more straightforward approach, as I don't want to introduce another layer of logic.
Any help would be much appreciated. Thank you.

The best option would be to use an Amazon SQS FIFO queue, with each message using a Message Group ID that is set to the unique ID of the item that is being created.
In a FIFO queue, SQS will ensure that messages are processed in-order, and will only allow one message per Message Group ID to be received at a time. Thus, any subsequent messages for the same Message Group ID will wait until an existing message has been fully processed.
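As a sketch, sending a message to a FIFO queue with the entity's ID as the Message Group ID might look like this in Node.js with the AWS SDK for JavaScript (v3); the queue URL and payload shape are placeholders:

// Sketch: publish to an SQS FIFO queue, grouping by entity ID so SQS
// serializes processing per entity. Queue URL and payload are placeholders.
const { SQSClient, SendMessageCommand } = require("@aws-sdk/client-sqs");

const sqs = new SQSClient({ region: "us-east-1" });

async function enqueueAction(entityId, action, payload) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/entity-actions.fifo",
    MessageBody: JSON.stringify({ entityId, action, payload }),
    MessageGroupId: entityId, // serializes messages per entity
    // Required unless content-based deduplication is enabled on the queue:
    MessageDeduplicationId: `${entityId}-${action}-${Date.now()}`,
  }));
}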
If this is not acceptable, then AWS Lambda now supports batch windows of up to 5 minutes for functions with Amazon SQS as an event source:
AWS Lambda now allows customers using Amazon Simple Queue Service (Amazon SQS) as an event source to define a wait period, called MaximumBatchingWindowInSeconds, to allow messages to accumulate in their SQS queue before invoking a Lambda function. In addition to Batch Size, this is a second option to send records in batches, to reduce the number of Lambda invokes. This option is ideal for workloads that are not time-sensitive, and can choose to wait to optimize cost.
Previously, Lambda functions polling from an SQS queue would send messages in batches of up to 10 before invoking the function. Now, customers can also define a time window that Lambda should wait to poll messages from their SQS queue before invoking their function. Lambda will wait for up to 300 seconds to poll messages from the SQS queue. When a batch window is defined, Lambda will also allow customers to define a batch size of up to 10,000 messages.
To get started, when creating a new Lambda function or updating an existing function with SQS as an event source, customers can set the MaximumBatchingWindowInSeconds field to any value between 0 and 300 seconds on the AWS Management Console, the AWS CLI, AWS SAM or AWS SDK for Lambda. This feature is available in all AWS Regions where AWS Lambda and Amazon SQS are available, and requires no additional charge to use.
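As a sketch, setting the batching window on an existing SQS event source mapping with the AWS SDK for JavaScript (v3) might look like this; the mapping UUID is a placeholder:

// Sketch: set a 30-second batch window and a larger batch size on an
// existing SQS event source mapping. The mapping UUID is a placeholder.
const { LambdaClient, UpdateEventSourceMappingCommand } = require("@aws-sdk/client-lambda");

const lambda = new LambdaClient({ region: "us-east-1" });

async function setBatchWindow() {
  await lambda.send(new UpdateEventSourceMappingCommand({
    UUID: "00000000-0000-0000-0000-000000000000",
    BatchSize: 100, // up to 10,000 when a batch window is set
    MaximumBatchingWindowInSeconds: 30, // 0-300 seconds
  }));
}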

the lambda function pushes the message to one of three different SQS queues
...
When my system then tries to update the entity, it does not exist yet, so my system creates it. But since a create action is already executing, this produces a duplicate entry on my destination.
By using multiple queues you have created a race condition for yourself, and now you are trying to patch it.
Based on the provided information and context - as already answered - a single FIFO queue with a message group ID could be more appropriate (do you really need three queues?).
If latency is critical, then streaming could be a solution as well.
As you described your issue, I think you don't need to combine the messages (though you could use Redis, AWS Kinesis Analytics, DynamoDB, etc.), but rather avoid creating the issue in the first place.
Options
having a single FIFO queue
having an idempotent and thread-safe backend service able to handle concurrent updates (transactions, atomic updates, ...)
Also, if you can create "duplicate" entries, it means unique indexes are not being enforced. They exist exactly for that reason.
You did not specify the backend service (RDBMS, DynamoDB, MongoDB, something else?); each has a way to handle the problem.
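As an illustration of the second option: the backend wasn't specified, but if it were DynamoDB, a conditional write would make the create idempotent. A minimal sketch, with hypothetical table and attribute names:

// Sketch: idempotent "create" via a DynamoDB conditional write. If the
// entity already exists, the put fails instead of inserting a duplicate.
// Table name and attribute names are hypothetical.
const { DynamoDBClient, PutItemCommand } = require("@aws-sdk/client-dynamodb");

const ddb = new DynamoDBClient({ region: "us-east-1" });

async function createEntityOnce(entityId, attrs) {
  try {
    await ddb.send(new PutItemCommand({
      TableName: "entities",
      Item: {
        entityId: { S: entityId },
        payload: { S: JSON.stringify(attrs) },
      },
      ConditionExpression: "attribute_not_exists(entityId)", // reject duplicates
    }));
    return true; // created
  } catch (err) {
    if (err.name === "ConditionalCheckFailedException") {
      return false; // already exists - caller should update instead
    }
    throw err;
  }
}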

Related

AWS: inconsistency between SQS and lambda

I want to trigger a Lambda with a WebSocket. I have deployed an EC2 instance running a WebSocket producer, which pushes all its data through an SQS FIFO queue, and SQS triggers the Lambda with the same messageGroupId. But sometimes the Lambda executes concurrently, while I expect it to execute sequentially, because the data arrives in a queue. Since it is a cryptocurrency exchange WebSocket, the data frequency is really high. And I checked that one message from the WebSocket takes 3 ms to process in the Lambda.
I was expecting the Lambda to run only one process at a time, not concurrently (which is causing wrong data calculations). Can anyone tell me what queue config I should set, or whether there is another method to achieve this goal?
Thanks
Edit: Attaching config for fifo
There are two types of Amazon SQS queues: first-in, first-out (FIFO) and standard queues.
In FIFO queues, messages remain in the same order in which the original messages were sent and received. FIFO queues support up to 300 send, receive, or delete operations per second.
Standard queues attempt to keep messages in the same order in which they were originally sent, but processing requirements may change the original order or sequence. For example, standard queues can be used to batch messages for future processing or allocate tasks to multiple worker nodes.
Delivery guarantees also differ between standard and FIFO queues: FIFO messages are delivered exactly once, while in standard queues, messages are delivered at least once.
Suggestion: check your queue type and change it to FIFO.
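Note that an existing standard queue cannot be converted in place; you create a new queue whose name ends in .fifo. A minimal sketch with the AWS SDK for JavaScript (v3), using a placeholder queue name:

// Sketch: create a FIFO queue. FIFO queue names must end in ".fifo".
const { SQSClient, CreateQueueCommand } = require("@aws-sdk/client-sqs");

const sqs = new SQSClient({ region: "us-east-1" });

async function createFifoQueue() {
  const { QueueUrl } = await sqs.send(new CreateQueueCommand({
    QueueName: "market-data.fifo",
    Attributes: {
      FifoQueue: "true",
      ContentBasedDeduplication: "true", // dedupe identical bodies within 5 minutes
    },
  }));
  return QueueUrl;
}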
You need to set the maximum lambda concurrency to 1.
https://aws.amazon.com/about-aws/whats-new/2017/11/set-concurrency-limits-on-individual-aws-lambda-functions/
To process each message sequentially:
When sending a message to the Amazon SQS queue, specify the same MessageGroupId for each message
In the Lambda function, configure the SQS Trigger to have a Batch size of 1
When using a FIFO queue, if a message is 'in-flight' then SQS will not permit another message with the same MessageGroupId to be processed. It will, however, allow multiple messages with the same MessageGroupId to be sent to a Lambda function, which is why you should set the Batch Size to 1.
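A sketch of wiring this up with the AWS SDK for JavaScript (v3); the queue ARN and function name are placeholders:

// Sketch: SQS FIFO trigger with Batch size 1, plus reserved concurrency 1
// on the function so only one invocation runs at a time.
const {
  LambdaClient,
  CreateEventSourceMappingCommand,
  PutFunctionConcurrencyCommand,
} = require("@aws-sdk/client-lambda");

const lambda = new LambdaClient({ region: "us-east-1" });

async function wireUpTrigger() {
  await lambda.send(new CreateEventSourceMappingCommand({
    EventSourceArn: "arn:aws:sqs:us-east-1:123456789012:market-data.fifo",
    FunctionName: "process-market-data",
    BatchSize: 1, // one message per invocation
  }));

  await lambda.send(new PutFunctionConcurrencyCommand({
    FunctionName: "process-market-data",
    ReservedConcurrentExecutions: 1, // no concurrent invocations
  }));
}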
See:
Using the Amazon SQS message group ID - Amazon Simple Queue Service
New for AWS Lambda – SQS FIFO as an event source | AWS Compute Blog

Publishing over 6000 messages to SNS from a lambda

I have a script that we wanted to run in AWS as a Lambda, written in Python. It will run once every 24 hours. It describes all the EC2 instances in an environment, and I'm checking some details for compliance. After that, we want to send all the instance details to an SNS topic that other similar compliance scripts are using; from there they are sent to a Lambda and then into Sumo Logic. I'm currently testing with just one environment, but there are over 1,000 instances, and so over 1,000 messages to be published to SNS, one by one. The Lambda times out long before it can send all of these. Once all environments and their instances are checked, it could be close to 6,000 messages to publish to SNS.
I need some advice on how to architect this to work with all environments. I was thinking of putting all the records into an S3 bucket from my Lambda, then creating another Lambda that would read records from the bucket, say 50 at a time, and push them one by one into the SNS topic. I'm not sure how that would work, though.
Any ideas appreciated!
I would recommend using Step Functions for this problem. Even if you fix the issue for now, sooner or later, with a rising number of instances or additional steps you want to perform, the 900-second maximum runtime of a single Lambda won't be sufficient anymore.
A simple approach with Step Functions could be:
Step 1: Create a list of EC2 instances to "inspect" with this first Lambda. Could be all instances or just instances with a certain tag etc. You can be as creative as you want.
Step 2: Process this list of instances with a parallel step that calls one Lambda per instance id.
Step 3: The triggered Lambda reads the details of the provided EC2 instance and then publishes the result to SNS; a sketch of this Lambda follows below. There might already be a pre-defined step for the publication to SNS, so you don't need to program it yourself.
With the new Workflow Studio this should be relatively easy to implement.
Note: This might not be faster than a single Lambda, but it will scale better with more EC2 instances to scan. The only bottleneck might be the Lambda in Step 1: if that Lambda needs more than 15 minutes to "find" all the EC2 instances to "scan", you need to get a bit creative. But this is solvable.
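As referenced in Step 3, here is a minimal sketch of that Lambda, assuming Step Functions passes { instanceId } to each branch; the topic ARN is a placeholder:

// Sketch of the Step 3 Lambda: read one EC2 instance's details and
// publish the result to SNS. Topic ARN and event shape are assumptions.
const { EC2Client, DescribeInstancesCommand } = require("@aws-sdk/client-ec2");
const { SNSClient, PublishCommand } = require("@aws-sdk/client-sns");

const ec2 = new EC2Client();
const sns = new SNSClient();

exports.handler = async (event) => {
  const { Reservations } = await ec2.send(new DescribeInstancesCommand({
    InstanceIds: [event.instanceId],
  }));
  const instance = Reservations[0].Instances[0];

  await sns.send(new PublishCommand({
    TopicArn: "arn:aws:sns:us-east-1:123456789012:compliance-results",
    Message: JSON.stringify({
      instanceId: instance.InstanceId,
      type: instance.InstanceType,
      tags: instance.Tags,
    }),
  }));
};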
I think you could use an SQS queue to solve your problem.
From SNS, send the messages to an SQS queue. Then your Lambda can poll the queue for messages (the default batch size is 10, but you can decrease it through the CLI).
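For example, subscribing a queue to the topic might look like this (the ARNs are placeholders, and the queue's access policy must also allow the topic to send to it):

// Sketch: subscribe an SQS queue to an SNS topic so messages buffer in
// the queue instead of invoking the Lambda directly. ARNs are placeholders.
const { SNSClient, SubscribeCommand } = require("@aws-sdk/client-sns");

const sns = new SNSClient({ region: "us-east-1" });

async function subscribeQueueToTopic() {
  await sns.send(new SubscribeCommand({
    TopicArn: "arn:aws:sns:us-east-1:123456789012:compliance-results",
    Protocol: "sqs",
    Endpoint: "arn:aws:sqs:us-east-1:123456789012:compliance-buffer",
  }));
}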

How to set intervals between multiple requests AWS Lambda API

I have created an API using an AWS Lambda function (using Python). My React JS code hits this API whenever an event fires, so a user can request the API as many times as events fire. The problem is that we are not getting the responses from the Lambda API sequentially. Sometimes we get the response to our last request faster than the response to a previous request.
So we need to handle our responses in the Lambda function sequentially, maybe by adding some delay between two requests, or maybe by implementing throttling. How can I do that?
Did you check the concurrency setting on the Lambda? You can throttle the Lambda there.
But if you throttle the Lambda and the requests being sent are not received, the application sending them might receive an error, unless you are storing the requests somewhere on AWS to be processed later.
I think putting an SQS queue in front of the Lambda might help. You would hit API Gateway, the requests would be sent to SQS, the Lambda would poll the requests (you can control the concurrency) and then send the response back.
You can use an SQS FIFO queue as a trigger on the Lambda function, set the Batch size to 1, and the Reserved Concurrency on the function to 1. The messages will always be processed in order, and the next message will not be polled until the previous one is complete.
Note that SQS triggers now also support a batch window (MaximumBatchingWindowInSeconds), which 'waits' to accumulate messages before invoking the function; this was originally a feature only of stream-based Lambda triggers (Kinesis and DynamoDB Streams).
If you want a more streamlined process, Step Functions will let you manage states using state machines and supports automatic retries based on the outputs of individual states.
As a previous response said, putting an SQS queue in front of the Lambda could potentially help; if the order of processing is important, you could also look at setting the SQS queue up as a FIFO queue, which preserves order:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
As the other comment said, the other option is to limit concurrency, but even then you're probably best off putting SQS in front, since limiting concurrency on its own also limits your throughput.

Which Amazon service should I use to implement a time-based queue dispatcher (serverless application)?

A user submits a CSV file which contains a time (interval) along with a message. I want to submit that message to a chat API at the time specified with it. I am using DynamoDB to store the messages, and a Lambda function which reads the messages from DynamoDB and, one at a time, uses the setTimeout function to publish each message to the chat. I am using Node.js to implement this functionality. I also created an Amazon API to trigger that Lambda function.
But this approach is not working. Can anyone suggest which other service I should use to do the same? Is there an Amazon queue service for that?
From the context of your question, what I understand is that you basically need to create a futuristic timer: a system that can inform you at some time in the future, with some metadata, to take an action.
If this is the case, off the top of my head I think you can use one of the solutions below to achieve your goal.
Pre-requisites: I assume you are already using DynamoDB (aka DDB) as the primary store, so all CSV data is persisted in Dynamo, and you are using a Dynamo stream to read the inserted and updated records that trigger your Lambda function (let's call this Lambda function Proxy_Lambda).
Create another Lambda function that processes records and sends messages to your chat system (let's call this Lambda function Processor_Lambda).
Option 1: AWS SQS
Proxy_Lambda reads records from the DDB stream and, based on the future timestamp attribute present in the record, publishes a message to an SQS queue with an initial visibility timeout equal to that timestamp. Sample example: Link. Remember, these messages will not be visible to any consumer until the visibility timeout expires.
Add a trigger for Processor_Lambda to start polling from this SQS queue.
Once the message becomes visible in the queue (after the initial timeout), Processor_Lambda consumes it and sends the chat events.
Result: you will be able to create a futuristic timer using the SQS visibility timeout feature. The con here is that you will not be able to view the in-flight message content until its visibility timeout occurs.
Note: the maximum visibility timeout is 12 hours. So if your use case demands a timer of more than 12 hours, you need to add logic in Processor_Lambda to send the message back to the queue with a new visibility timeout.
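A minimal sketch of that re-queue logic, assuming the queue URL and a { fireAt } timestamp in the message body (both are assumptions), and that the event source mapping reports batch item failures:

// Sketch: if a message's target time is still in the future, hide it again
// with ChangeMessageVisibility (capped at the 12-hour maximum) and report
// it as a failure so Lambda does not delete it from the queue.
const { SQSClient, ChangeMessageVisibilityCommand } = require("@aws-sdk/client-sqs");

const sqs = new SQSClient();
const QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/timer-queue"; // placeholder

exports.handler = async (event) => {
  const batchItemFailures = [];

  for (const record of event.Records) {
    const { fireAt } = JSON.parse(record.body); // assumed message shape
    const remainingSec = Math.ceil((new Date(fireAt) - Date.now()) / 1000);

    if (remainingSec > 0) {
      await sqs.send(new ChangeMessageVisibilityCommand({
        QueueUrl: QUEUE_URL,
        ReceiptHandle: record.receiptHandle,
        VisibilityTimeout: Math.min(remainingSec, 43200), // 12-hour cap
      }));
      batchItemFailures.push({ itemIdentifier: record.messageId });
      continue;
    }

    // Timer expired: send the chat event here.
  }

  // Requires ReportBatchItemFailures on the event source mapping.
  return { batchItemFailures };
};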
Option 2: AWS Step Functions (my preferred approach ;) )
Create a state machine in AWS Step Functions to generate task timers (let's call it Timer_Function). These task timers keep the execution in a wait state until the timer expires. The timer window is provided as input to the step function.
Link Timer_Function to trigger Processor_Lambda once the task timer expires; basically, that will be the next step after the timer step.
Connect Proxy_Lambda with Timer_Function, i.e. Proxy_Lambda reads records from the DDB stream and invokes Timer_Function with the message interval attribute present in the DynamoDB record and the necessary payload.
Result: a Timer_Function that waits until the time window (message interval) expires, which in turn gives you a mechanism to trigger Processor_Lambda in the future (i.e., at the end of the timer window).
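A sketch of how Proxy_Lambda might start the timer, assuming the state machine reads $.waitSeconds in its Wait state (the ARN and input shape are assumptions):

// Sketch: start a Step Functions execution whose Wait state reads
// $.waitSeconds from the input. The state machine ARN is a placeholder.
const { SFNClient, StartExecutionCommand } = require("@aws-sdk/client-sfn");

const sfn = new SFNClient();

async function startTimer(waitSeconds, payload) {
  await sfn.send(new StartExecutionCommand({
    stateMachineArn: "arn:aws:states:us-east-1:123456789012:stateMachine:Timer_Function",
    input: JSON.stringify({ waitSeconds, payload }),
  }));
}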
Having said that, I will now leave it up to you to choose the right solution based on your use case and business requirements.

Schedule a task to run at some point in the future (architecture)

So we have a Python Flask app running, making use of Celery and AWS SQS for our async task needs.
One tricky problem that we've been facing recently is creating a task to run in x days, or in 3 hours for example. We've had several needs for something like this.
For now, we create events in the database with timestamps that store the time they should be triggered. Then we use Celery Beat to run a scheduled task every second that checks whether there are any events to process (based on the trigger timestamp) and processes them. However, this queries the database every second for events, which we feel could be improved.
We looked into using the eta parameter in Celery (http://docs.celeryproject.org/en/latest/userguide/calling.html), which lets you schedule a task to run in x amount of time. However, it seems to be bad practice to have large etas, and AWS SQS has a visibility timeout of about two hours, so anything more than that would cause a conflict.
I'm scratching my head right now. On the one hand this works, and pretty decently, in that things have been separated out with SNS, SQS, etc. to ensure scaling tolerance. However, it just doesn't feel right to query the database every second for events to process. Surely there's an easier way, or a service provided by Google/AWS, to schedule some event (pub/sub) to occur at some point in the future (x hours, minutes, etc.).
Any ideas?
Have you taken a look at AWS Step Functions, specifically the Wait state? You might be able to put together a couple of Lambda functions, with the first one returning a timestamp or the number of seconds to wait to the Wait state, and the last one adding the message to SQS after the Wait returns.
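A minimal state machine definition along those lines, expressed here as a JavaScript object so it can be stringified for CreateStateMachine (the Lambda ARN is a placeholder):

// Sketch: wait for $.waitSeconds from the execution input, then invoke a
// Lambda that enqueues the SQS message. The Lambda ARN is a placeholder.
const definition = {
  StartAt: "WaitForDueTime",
  States: {
    WaitForDueTime: {
      Type: "Wait",
      SecondsPath: "$.waitSeconds", // or TimestampPath: "$.fireAt"
      Next: "EnqueueMessage",
    },
    EnqueueMessage: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:enqueue-to-sqs",
      End: true,
    },
  },
};

console.log(JSON.stringify(definition, null, 2)); // pass this to CreateStateMachine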
Amazon's scheduling solution is the use of CloudWatch to trigger events. Those events can be placing a message in an SQS/SNS endpoint, triggering an ECS task, running a Lambda, etc. A lot of folks use the trick of executing a Lambda that then does something else to trigger something in your system. For example, you could trigger a Lambda that pushes a job onto Redis for a Celery worker to pick up.
When creating a CloudWatch rule, you can specify either a "Rate" (e.g., every 5 minutes) or an arbitrary time in cron syntax.
So my suggestion for your use case would be to create a CloudWatch rule that fires at the time your job needs to kick off (or a minute before, depending on how time-sensitive you are). That rule would then interact with your application to kick off your job. You only pay for the resources when CloudWatch triggers them.
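A minimal sketch of creating such a rule and target with the AWS SDK for JavaScript (v3); the rule name, schedule, and ARN are placeholders, and the Lambda also needs a resource-based permission allowing events.amazonaws.com to invoke it:

// Sketch: a one-shot CloudWatch Events rule in cron syntax, pointed at a
// Lambda target. Names, schedule, and ARN are placeholders.
const {
  CloudWatchEventsClient,
  PutRuleCommand,
  PutTargetsCommand,
} = require("@aws-sdk/client-cloudwatch-events");

const events = new CloudWatchEventsClient({ region: "us-east-1" });

async function scheduleJob() {
  await events.send(new PutRuleCommand({
    Name: "kick-off-job",
    ScheduleExpression: "cron(30 14 25 12 ? 2025)", // 14:30 UTC on 2025-12-25
  }));

  await events.send(new PutTargetsCommand({
    Rule: "kick-off-job",
    Targets: [{
      Id: "job-lambda",
      Arn: "arn:aws:lambda:us-east-1:123456789012:function:run-job",
    }],
  }));
}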
Have you looked into Amazon Simple Notification Service? It sounds like it would serve your needs...
https://aws.amazon.com/sns/
From that page:
Amazon SNS is a fully managed pub/sub messaging service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. With SNS, you can use topics to decouple message publishers from subscribers, fan-out messages to multiple recipients at once, and eliminate polling in your applications. SNS supports a variety of subscription types, allowing you to push messages directly to Amazon Simple Queue Service (SQS) queues, AWS Lambda functions, and HTTP endpoints. AWS services, such as Amazon EC2, Amazon S3 and Amazon CloudWatch, can publish messages to your SNS topics to trigger event-driven computing and workflows. SNS works with SQS to provide a powerful messaging solution for building cloud applications that are fault tolerant and easy to scale.
You could start the job with apply_async, and then use a countdown, like:
xxx.apply_async(..., countdown=TTT)
It is not guaranteed that the job starts exactly at that time, depending on how busy the queue is, but that does not seem to be an issue in your use case.
