Multiple workers consuming messages from the same queue - node.js

I have a task to migrate keys (S3 objects) from one S3 account to another S3 account as quickly as possible. I would like to publish the keys to a single queue, but I want multiple workers (running on different EC2 instances) to consume keys from that same queue in parallel.
Let's say I am continuously publishing 1000 keys to the queue and there are 5 workers consuming from it. I want the keys to be spread across the different workers so that they all process keys in parallel.
I am not sure how to do this, or how to distinguish which keys a worker has already picked up and which are still waiting to be picked.

You should create:
An Amazon SQS queue
An AWS Lambda function that is configured to trigger when a message is sent to the SQS queue, and which will process the keys mentioned in the message(s)
Something that will 'push' the keys as messages into the SQS queue
The Amazon SQS queue will automatically trigger the AWS Lambda function when a message is available. It can pass up to 10 messages to each Lambda invocation, and the default limit is a maximum of 1000 concurrently running Lambda functions.
Each 'message' could contain a single key or multiple keys -- that is up to you to decide. The Lambda function simply needs to know how to process the message that you have sent.
The Lambda function will receive the message(s) in the event['Records'] list (array). It should process those messages and then exit the function. This will cause those messages to be deleted from the queue. If the function does not exit successfully (e.g. if an error is thrown), the messages will automatically reappear on the queue for re-processing.
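Here is a minimal sketch of such a handler in Node.js. The message body format (a JSON object with a keys array) and the migrateKey() helper are assumptions purely for illustration -- use whatever format and copy logic you decide on:

    // Lambda handler triggered by the SQS queue
    exports.handler = async (event) => {
      for (const record of event.Records) {
        const body = JSON.parse(record.body);   // e.g. { "keys": ["photos/a.jpg", "photos/b.jpg"] }
        for (const key of body.keys) {
          await migrateKey(key);                // your copy logic, e.g. S3 CopyObject to the target account
        }
      }
      // Returning normally tells Lambda the batch succeeded, so the messages are deleted.
      // Throwing an error makes the messages reappear on the queue after the visibility timeout.
    };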

If you are running the workers on Amazon EC2 instances, then each worker should do the following:
Loop forever, doing these steps:
Call receiveMessage() with WaitTimeSeconds = 20 -- this will wait up to 20 seconds before returning an empty response, which reduces the number of times the worker will call the queue
Loop through the Messages array that is returned
For each message in the array, process the message and then call deleteMessage(), passing the ReceiptHandle of the message
Each worker can request up to 10 messages per receiveMessage() call.
It is up to you what to put in the message -- it might be a single key or it might be multiple keys. You should code your worker to know how to interpret the message.
So, if you require that each worker gets 1000 keys, then you should send 1000 keys in a single message and configure each worker to use MaxNumberOfMessages = 1 when calling receiveMessage(). However, I would recommend that you use smaller quantities to distribute the work more evenly amongst the workers.
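A sketch of such a worker loop in Node.js with the AWS SDK (the region, queue URL, and processKeys() function are placeholders for your own values and migration logic):

    const AWS = require('aws-sdk');
    const sqs = new AWS.SQS({ region: 'us-east-1' });   // assumed region
    const QUEUE_URL = process.env.QUEUE_URL;            // your queue URL

    async function pollForever() {
      while (true) {
        const { Messages = [] } = await sqs.receiveMessage({
          QueueUrl: QUEUE_URL,
          MaxNumberOfMessages: 10,   // up to 10 messages per call
          WaitTimeSeconds: 20        // long polling to reduce empty responses
        }).promise();

        for (const msg of Messages) {
          await processKeys(JSON.parse(msg.Body));       // your migration logic
          await sqs.deleteMessage({                      // delete only after successful processing
            QueueUrl: QUEUE_URL,
            ReceiptHandle: msg.ReceiptHandle
          }).promise();
        }
      }
    }

    pollForever().catch(console.error);

Because SQS hides a received message from other consumers until its visibility timeout expires, the five workers will naturally divide the keys between them without any extra coordination.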

Related

AWS: inconsistency between SQS and lambda

I want to trigger a Lambda with a websocket. I have deployed an EC2 instance running a websocket producer, which pushes all of its data through an SQS FIFO queue, and SQS triggers the Lambda with the same MessageGroupId. But sometimes the Lambda executes concurrently, and I am expecting it to execute sequentially, because the data is coming through a queue. Since it is a cryptocurrency exchange websocket, the data frequency is very high, and I checked that one message from the websocket takes 3 ms to be processed in the Lambda.
I was expecting the Lambda to run only one invocation at a time, not concurrently (which is causing wrong data calculations). Can anyone tell me what queue configuration I should use, or is there any other method to achieve this goal?
Thanks
Edit: Attaching config for fifo
There are two types of Amazon SQS queues: first-in, first-out (FIFO) and standard queues.
In FIFO queues, messages remain in the same order in which the original messages were sent and received. FIFO queues support up to 300 send, receive or delete operations per second.
Standard queues attempt to keep messages in the same order in which they were originally sent, but processing requirements may change the original order or sequence of messages. For example, standard queues can be used to batch messages for future processing or allocate tasks to multiple worker nodes.
The frequency of message delivery differs between standard and FIFO queues, as FIFO messages are delivered exactly once, while in standard queues, messages are delivered at least once.
Suggestion: check your queue type and change it to FIFO.
You need to set the maximum lambda concurrency to 1.
https://aws.amazon.com/about-aws/whats-new/2017/11/set-concurrency-limits-on-individual-aws-lambda-functions/
To process each message sequentially:
When sending a message to the Amazon SQS queue, specify the same MessageGroupId for each message
In the Lambda function, configure the SQS Trigger to have a Batch size of 1
When using a FIFO queue, if a message is 'in-flight' then SQS will not permit another message with the same MessageGroupId to be processed. It will, however, allow multiple messages with the same MessageGroupId to be delivered together in the same batch to a Lambda function, which is why you should set the Batch size to 1.
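As a sketch, the producer side might look like this in Node.js (assuming sqs is an AWS.SQS client; the queue URL, group ID value, and tick payload are placeholders, and a FIFO queue URL must end in .fifo):

    await sqs.sendMessage({
      QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/ticks.fifo',  // placeholder
      MessageBody: JSON.stringify(tick),
      MessageGroupId: 'exchange-feed',   // the same group ID for every message => strict ordering
      MessageDeduplicationId: tick.id    // or enable content-based deduplication on the queue
    }).promise();

With every message in a single message group and the Lambda trigger's Batch size set to 1, only one message is in flight at a time, so the function effectively runs sequentially.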
See:
Using the Amazon SQS message group ID - Amazon Simple Queue Service
New for AWS Lambda – SQS FIFO as an event source | AWS Compute Blog

Combine SQS messages that arrive within milliseconds of each other

I am faced with a situation that I am not quite sure how to solve. Basically my system receives data from a third-party source via API gateway, publishes this data to an SNS topic which triggers a lambda function. Based on the message parameters, the lambda function pushes the message to one of three different SQS queues. These queues trigger one of three lambda functions which perform one of three possible actions - create, update or delete items in that order in another third-party system through their API endpoints.
The usual flow would be to first create an entity on the destination system, and then each subsequent action should update or delete this entity. The problem is that sometimes I receive data for the same entity from the source within milliseconds, so my system is unable to create the entity on the destination in time, because their API requires at least 300-400 ms to do so. So when my system tries to update the entity, it does not exist yet, and my system therefore creates it. But since a create action is already in the process of executing, this produces a duplicate entry on my destination.
So my question is, what is the best practice to consolidate messages for the same entity that arrive within less than a second of each other?
My Thoughts so far:
I am thinking of using Redis to consolidate messages that are for the same entity before pushing them to the SNS topic, but I was hoping there would be a more straightforward approach, as I don't want to introduce another layer of logic.
Any help would be much appreciated. Thank you.
The best option would be to use an Amazon SQS FIFO queue, with each message using a Message Group ID that is set to the unique ID of the item that is being created.
In a FIFO queue, SQS will ensure that messages are processed in-order, and will only allow one message per Message Group ID to be received at a time. Thus, any subsequent messages for the same Message Group ID will wait until an existing message has been fully processed.
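For example, in Node.js (assuming sqs is an AWS.SQS client, and that the incoming change object carries an entityId and a unique eventId -- both names are placeholders):

    await sqs.sendMessage({
      QueueUrl: FIFO_QUEUE_URL,                 // must end in .fifo
      MessageBody: JSON.stringify(change),
      MessageGroupId: String(change.entityId),  // serialize processing per entity
      MessageDeduplicationId: change.eventId    // assumed unique per event; or use content-based deduplication
    }).promise();

Messages for different entities are still processed in parallel; only messages that share an entity ID wait for each other.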
If this is not acceptable, then AWS Lambda now supports batch windows of up to 5 minutes for functions with Amazon SQS as an event source:
AWS Lambda now allows customers using Amazon Simple Queue Service (Amazon SQS) as an event source to define a wait period, called MaximumBatchingWindowInSeconds, to allow messages to accumulate in their SQS queue before invoking a Lambda function. In addition to Batch Size, this is a second option to send records in batches, to reduce the number of Lambda invokes. This option is ideal for workloads that are not time-sensitive, and can choose to wait to optimize cost.
Previously, Lambda functions polling from an SQS queue would send messages in batches of up to 10 before invoking the function. Now, customers can also define a time window that Lambda should wait to poll messages from their SQS queue before invoking their function. Lambda will wait for up to 300 seconds to poll messages from the SQS queue. When a batch window is defined, Lambda will also allow customers to define a batch size of up to 10,000 messages.
To get started, when creating a new Lambda function or updating an existing function with SQS as an event source, customers can set the MaximumBatchingWindowInSeconds field to any value between 0 and 300 seconds on the AWS Management Console, the AWS CLI, AWS SAM or AWS SDK for Lambda. This feature is available in all AWS Regions where AWS Lambda and Amazon SQS are available, and requires no additional charge to use.
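As a sketch, the batching window can be set on the queue trigger when creating the event source mapping, for example with the AWS SDK for Node.js (the queue ARN and function name are placeholders):

    const AWS = require('aws-sdk');
    const lambda = new AWS.Lambda();

    await lambda.createEventSourceMapping({
      EventSourceArn: 'arn:aws:sqs:us-east-1:123456789012:my-queue',  // placeholder
      FunctionName: 'my-consumer-function',                           // placeholder
      BatchSize: 100,
      MaximumBatchingWindowInSeconds: 60   // wait up to 60s (max 300) to accumulate messages
    }).promise();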
the lambda function pushes the message to one of three different SQS queues
...
So when my system tries to update the entity, it does not exist yet, and my system therefore creates it. But since a create action is already in the process of executing, this produces a duplicate entry on my destination
By using multiple queues you have created a race condition for yourself, and now you are trying to patch it.
Based on the provided information and context - as already answered - a single FIFO queue with a context ID could be more appropriate (do you really need 3 queues?).
If latency is critical, then streaming could be a solution as well.
As you described your issue, I think you don't need to combine the messages (indeed you could use Redis, AWS Kinesis Analytics, DynamoDB, ...), but rather should avoid creating the issue in the first place.
Options
having a single fifo queue
having an idempotent and thread-safe backend service able to handle concurrent updates (transactions, atomic updates, ...)
Also, if you can create "duplicate" entries, it means that unique indexes are not enforced. They exist exactly for that reason.
You did not specify the backend service (RDBMS, DynamoDB, MongoDB, other?); each has an option to handle this problem somehow.
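If DynamoDB happened to be the store, for instance, a conditional write is one way to make the "create" idempotent (the table name, key attribute, and updateEntity() fallback are assumptions purely for illustration):

    const AWS = require('aws-sdk');
    const ddb = new AWS.DynamoDB.DocumentClient();

    async function createEntity(entity) {
      try {
        await ddb.put({
          TableName: 'entities',                           // hypothetical table
          Item: entity,                                     // must contain the partition key, here 'id'
          ConditionExpression: 'attribute_not_exists(id)'   // reject a second create for the same id
        }).promise();
      } catch (err) {
        if (err.code !== 'ConditionalCheckFailedException') throw err;
        // The entity already exists, so treat this "create" as an update instead.
        return updateEntity(entity);                        // hypothetical update path
      }
    }

The same idea applies to an RDBMS (a unique index plus an upsert) or MongoDB (a unique index plus updateOne with upsert: true).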

Spawning hundreds of Lambda processes but waiting for them all to finish

I'm currently using AWS Step Functions in a "queue watcher" setup.
I have an initial Lambda that generates hundreds of IDs which are added to an SQS queue, which is then consumed by a "Worker" Lambda. When the "Worker" Lambda has consumed the queue, I need to run a "Logout" Lambda to expire a ticket.
The problem I'm having is sometimes the logout happens before the queue is empty.
Is there a better solution to this? I've looked into callbacks, but they don't seem usable in this scenario. Passing the payload through Step Functions instead of SQS isn't possible either, due to payload limits.
Thanks,
Step Function:
Flow Chart of Lambdas:

SQS Lambda - retry logic?

A message is added to an SQS queue, and the queue is configured to trigger a Lambda function (Node.js).
When the Lambda function is triggered, I may want to retry the same message again after 5 minutes without deleting it from the queue. The reason I want to do this is that if the Lambda could not connect to an external host (e.g. an API), I would like to try again after 5 minutes, for 3 attempts only.
How can that be written in Node.js?
For example, in Laravel we can specify max job attempts -- the number of times the job may be attempted -- using public $tries = 5;
Source: https://laravel.com/docs/5.7/queues#max-job-attempts-and-timeout
How can we do something similar in Node.js?
I am thinking of adding the message to another queue (for retry). A Lambda function would read all the messages from that queue after 5 minutes and send them back to the main queue, where they would trigger the Lambda function again.
Retries and the retry "timeout" can all be configured directly on the SQS queue.
When you create a queue, set up the following attributes:
The Default Visibility Timeout will be the time that the message will be hidden once it has been received by your application. If the message fails during the lambda run and an exception is thrown, lambda will not delete any of the messages in the batch and all of them will eventually re-appear in the queue.
If you only want to try 3 times, you must set the SQS re-drive policy (AKA Dead Letter Queue)
The re-drive policy will enable your queue to redirect messages to a Dead Letter Queue (DLQ) after the message has re-appeared in the queue N number of times, where N is a number between 1 and 1000.
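For example, a redrive policy with a maximum of 3 receives can be attached to an existing queue from Node.js (assuming sqs is an AWS.SQS client; the queue URL and DLQ ARN are placeholders):

    await sqs.setQueueAttributes({
      QueueUrl: QUEUE_URL,
      Attributes: {
        RedrivePolicy: JSON.stringify({
          deadLetterTargetArn: 'arn:aws:sqs:us-east-1:123456789012:my-dlq',  // placeholder
          maxReceiveCount: '3'   // after the 3rd failed receive the message moves to the DLQ
        })
      }
    }).promise();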
It is essential to understand that lambda will continue to process a failed message (a message that generates an exception in the code) until:
It is processed without any errors (lambda deletes the message)
The Message Retention Period expires (SQS deletes the message)
It is sent to the DLQ set in the SQS queue re-drive policy (SQS "moves" the message to the DLQ)
You delete the message from the queue directly in your code (User deletes the message)
Lambda will not dispose of this bad message otherwise.
Important observations
Lambda will not deal with failed messages
Based on several experiments I ran to understand the behavior of the SQS integration (the documentation on retries can be ambiguous): Lambda will not delete failed messages and will continue to retry them. Even if you have a Lambda DLQ set up, failed messages will not be sent to the Lambda DLQ. Lambda fully relies on the configuration of the SQS queue for this purpose, as stated in the Lambda DLQ documentation.
Recommendation:
Always use a re-drive policy in your SQS queue.
Exceptions will fail a whole batch of messages
As I stated earlier, if there is an exception in your code while processing a message, the whole batch of messages is retried; it doesn't matter if some of the messages were processed correctly. If for some reason a downstream service is failing, you may end up with successfully processed messages in the DLQ.
Recommendation:
Manually delete messages that have been processed correctly
Ensure that your lambda function can process the same message more than once
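A sketch of both recommendations in a Node.js handler (the queue URL comes from an environment variable here, and processRecord() is your own logic, which must be idempotent):

    const AWS = require('aws-sdk');
    const sqs = new AWS.SQS();

    exports.handler = async (event) => {
      let anyFailed = false;
      for (const record of event.Records) {
        try {
          await processRecord(record);               // must tolerate being called twice for the same message
          await sqs.deleteMessage({                  // delete successes yourself so they are not retried
            QueueUrl: process.env.QUEUE_URL,
            ReceiptHandle: record.receiptHandle
          }).promise();
        } catch (err) {
          anyFailed = true;                          // leave the failed message in the queue
        }
      }
      if (anyFailed) {
        throw new Error('Some messages failed');     // only the messages not yet deleted will reappear
      }
    };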
Lambda concurrency limits and SQS side effects
The blog post "Lambda Concurrency Limits and SQS Triggers Don’t Mix Well (Sometimes)" describes how, if your concurrency limit is set too low, Lambda may cause batches of messages to be throttled and their receive count to be incremented without the messages ever being processed.
Recommendation:
The post and Amazon's recommendations are:
Set the queue’s visibility timeout to at least 6 times the timeout that you configure on your function.
The extra time allows for Lambda to retry if your function execution is throttled while your function is processing a previous batch.
Set the maxReceiveCount on the queue’s re-drive policy to at least 5. This will help avoid sending messages to the dead-letter queue due to throttling.
Configure the dead-letter queue to retain failed messages long enough that you can move them back later to be reprocessed
Here is how I did it.
Create Normal Queues (Immediate Delivery), Q1
Create Delay Queues (5 mins delay), Q2
Create DLQ (After retries), DLQ1
(Q1/Q2) SQS Trigger --> Lambda L1 (if failed, delete on Q1/Q2, drop it on Q2) --> on failure, DLQ
When a message arrives on Q1 it triggers Lambda L1; if it succeeds, you are done. If it fails, drop the message onto Q2 (which is the delay queue). Every message that arrives on Q2 will have a delay of 5 minutes.
If your initial message can tolerate a 5-minute delay, then you might not need two queues; one queue should be enough. If an initial delay is not acceptable, then you need two queues. Another reason to have two queues is that new messages always have a path into the pipeline.
If you have a code failure while handling Q1/Q2, the AWS infrastructure will retry immediately, 3 times, before it sends the message to DLQ1. If you handle the error in your code, then you can get the pipeline to work with the timings you mentioned.
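A sketch of that failure path inside Lambda L1 (sqs is assumed to be an AWS.SQS client, RETRY_QUEUE_URL points at Q2, which is assumed to have a queue-level delay of 300 seconds, attempts is a counter carried in the message body, and callExternalApi() is your own work):

    exports.handler = async (event) => {
      for (const record of event.Records) {
        const body = JSON.parse(record.body);
        try {
          await callExternalApi(body);                   // your actual work
        } catch (err) {
          const attempts = (body.attempts || 0) + 1;
          if (attempts <= 3) {
            await sqs.sendMessage({                      // re-queue onto the delay queue Q2
              QueueUrl: process.env.RETRY_QUEUE_URL,
              MessageBody: JSON.stringify({ ...body, attempts })
            }).promise();
          }
          // Swallow the error so the original message is deleted from Q1/Q2;
          // after 3 attempts the message is dropped (or send it to DLQ1 here instead).
        }
      }
    };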
SQS Delay Queues:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
SQS Lambda Architecture:
https://nordcloud.com/amazon-sqs-as-a-lambda-event-source/
Hope it helps.
Fairly simple (if you execute the Lambda asynchronously) and without the need to do any coding. First of all: if your code throws an error, AWS Lambda will automatically retry the execution (two more times by default). In this case, if the external API was not accessible, there is a good chance that by the final retry the API will be working again. Plus, there is a delay between the retries.
If the worst happens and the external API is still not up, you can take advantage of the dead-letter queue (DLQ) feature that each Lambda has. It will push a message to SQS saying what went wrong, so you can take additional action -- in this case, keep retrying until you make it.
You can read more here: https://docs.aws.amazon.com/lambda/latest/dg/dlq.html
According to this blog:
https://www.lucidchart.com/blog/cloud/5-reasons-why-sqs-lambda-triggers-are-a-big-deal
Leverage existing retry logic and dead letter queues. If the Lambda function does not return success, the message will not be deleted from the queue and will reappear after the visibility timeout has expired.

RabbitMQ - Single concurrent worker per routing key

Quite new to RabbitMQ and I'm trying to see if I can achieve what I need with it.
I am looking for the Worker Queues pattern but with one caveat. I want to have only a single worker running concurrently per routing key.
An example for clarification:
If I send the following messages with routing keys, in order: a, a, b, c, I want to have only 3 workers running concurrently. When the first a message is received, a worker picks it up and handles it.
When the next a message is received and the previous a message is still being handled (not acknowledged), the new a message should wait in the queue. When the b and c messages are received, they each get a worker handling them. When the first a message is acknowledged, any worker can pick up the next a message.
Would that pattern be possible using RabbitMQ in a natural way (without writing any application code on my side to handle the locking and so on)?
Edit:
Another clarification: all workers can and should handle all messages, and I don't want to have a queue per worker, as I want to share the load between them and the publisher doesn't know which worker should process a message. But I do want to make sure that no two workers are working on messages sharing the same key at the same time.
For example, if I have a Publisher publishing messages with a userId field, I want to make sure no 2 Workers are handling messages with the same userId at the same time.
Edit 2
Expanding on the userId example: let's say I have a single publisher and 3 workers. The publisher publishes messages like { userId: 1, text: 'Hello' }, with varying userIds. My 3 workers all do the same thing with these messages, so any of them can handle the messages coming in. But what I'm trying to achieve is to have only a single worker processing messages from a certain user at any one time. If a worker has received a message with userId 1 and is still processing it, and another message with userId 1 arrives, I want to make sure no other worker picks up that message. But messages coming in with different userIds should be processed by the other available workers.
The userIds are not known beforehand, and the publisher doesn't know how many workers there are or anything specific about them; it just wants to schedule the messages for processing.
What you're asking is not possible with routing keys, but it is built into queues with a few settings.
If you define "queue_a" for a messages, "queue_b" for b messages, etc., you can then have as many consumers connect to them as you want.
RabbitMQ will only deliver a given message to a single consumer of a given queue.
The way it works with multiple consumers on a single queue is basic round-robin dispatch of the messages. That is, the first message will be delivered to one of the consumers, and the next message (assuming the first consumer is still busy) will be delivered to the next consumer.
So, that should satisfy the need to deliver the message to any given consumer of the queue.
To ensure your messages have an equal chance of reaching any of the consumers (and are not all delivered to the same consumer all the time), there are a few other settings you should put in place.
First, make sure to set the consumer's "no ack" setting to false (sometimes called "auto ack"). This will force you to ack the message from your code.
Lastly, set the "consumer prefetch" limit of the consumer to 1.
With this combination of settings, a single consumer will retrieve a single message and begin working on it. While that consumer is working, any message waiting in the queue will be delivered to other consumers if any are available. If there are none available, the message will wait in the queue until a consumer is available.
With this, you should be able to achieve the behavior you are wanting, on a given queue.
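A minimal consumer along these lines using amqplib in Node.js (the broker URL, queue name, and handleMessage() function are placeholders):

    const amqp = require('amqplib');

    async function startWorker(queueName) {
      const conn = await amqp.connect('amqp://localhost');   // assumed broker URL
      const ch = await conn.createChannel();
      await ch.assertQueue(queueName, { durable: true });
      await ch.prefetch(1);                                  // at most one unacknowledged message per worker
      await ch.consume(queueName, async (msg) => {
        if (msg === null) return;                            // consumer was cancelled
        try {
          await handleMessage(JSON.parse(msg.content.toString()));  // your processing logic
          ch.ack(msg);                                       // manual ack once the work is done
        } catch (err) {
          ch.nack(msg, false, true);                         // put the message back for another worker
        }
      }, { noAck: false });                                  // disable auto-ack
    }

    startWorker('queue_a').catch(console.error);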
...
Keep in mind this only applies to queues, though. Routing keys cannot be managed this way: all matching routing keys from an exchange will cause a copy of the message to be sent to the destination queue.

Resources