terratest to test SQS - terraform

I am using Terratest to test my Terraform code.
My Terraform code sets up an SQS queue connected to a Lambda function, and the Lambda is the consumer of the SQS messages.
I would like to test the whole flow, so I thought of sending a dummy message to SQS and then reading it back with Terratest. However, I can't read the message because the Lambda has already consumed it. The CloudWatch logs for the Lambda confirm that it consumed the message.
Can anyone suggest how to test this complete flow with Terratest?
The test code for SQS looks like this:
ack_queue_url := terraform.Output(t, terraformOptions, "acknowledgment_queue_url")
time_out_sec := 120
test_message := fmt.Sprintf("terratest-test-message-%s", uniqueId)
aws.SendMessageToQueue(t, awsRegion, ack_queue_url, test_message)
response := aws.WaitForQueueMessage(t, awsRegion, ack_queue_url, time_out_sec)
assert.NoError(t, response.Error)
fmt.Println("###Message Body####:", response.MessageBody)
aws.DeleteMessageFromQueue(t, awsRegion, ack_queue_url, response.ReceiptHandle)
delete_response := aws.WaitForQueueMessage(t, awsRegion, ack_queue_url, time_out_sec)
assert.Error(t, delete_response.Error, aws.ReceiveMessageTimeout{QueueUrl: ack_queue_url, TimeoutSec: time_out_sec})
The output looks like this:
logger.go:66: "https://sqs.us-east-1.amazonaws.com/1234567/tst-queue"
sqs.go:150: Sending message terratest-test-message-DkKAvt to queue https://sqs.us-east-1.amazonaws.com/1234567/tst-queue
sqs.go:170: Message id b9b0a000-1d71-4821-8659-21aebe33cdc0 sent to queue https://sqs.us-east-1.amazonaws.com/1234567/tst-queue
sqs.go:234: Waiting for message on https://sqs.us-east-1.amazonaws.com/1234567/tst-queue (0s)
sqs.go:234: Waiting for message on https://sqs.us-east-1.amazonaws.com/1234567/tst-queue(20s)
sqs.go:234: Waiting for message on https://sqs.us-east-1.amazonaws.com/1234567/tst-queue (40s)
sqs.go:234: Waiting for message on https://sqs.us-east-1.amazonaws.com/1234567/tst-queue (60s)
sqs.go:234: Waiting for message on https://sqs.us-east-1.amazonaws.com/1234567/tst-queue (80s)
sqs.go:234: Waiting for message on https://sqs.us-east-1.amazonaws.com/1234567/tst-queue (100s)
Error Trace: /Users/xxxx/Projects/dummy_test/terratest/complete_test.go:83
Error: Received unexpected error:
Failed to receive messages on https://sqs.us-east-1.amazonaws.com/1234567/tst-queue within 120 seconds
Test: complete_test
###Message Body####:
sqs.go:125: Deleting message from queue https://sqs.us-east-1.amazonaws.com/1234567/tst-queue()
sqs.go:119: MissingParameter: The request must contain the parameter ReceiptHandle.
status code: 400, request id: 4421cdfc-4326-5755-9623-91ca3414ed6f

Related

Camel error handling fixed redelivery delay for azure storage queue not working correctly

I have an Azure App Service that reads from an Azure Storage queue through a Camel route (Camel version 3.14.0). Below is my code.
Queue code:
QueueServiceClient client = new QueueServiceClientBuilder()
    .connectionString(storageAccountConnectionString)
    .buildClient();
getContext().getRegistry().bind("client", client);

errorHandler(deadLetterChannel(SEND_TO_POISON_QUEUE)
    .useOriginalBody()
    .log("Message sent to poison queue for handling")
    .retryWhile(method(new RetryRuleset(), "shouldRetry"))
    .maximumRedeliveries(24)
    .asyncDelayedRedelivery()
    .redeliveryDelay(3600 * 1000L) // initial delay
);

// Route to retrieve a message from storage queue.
from("azure-storage-queue:" + storageAccountName + "/" + QUEUE_NAME + "?serviceClient=#client&maxMessages=1&visibilityTimeout=P2D")
    .id(QUEUE_ROUTE_CONSUMER)
    .log("Message received from queue with messageId: ${headers.CamelAzureStorageQueueMessageId} and ${headers.CamelAzureStorageQueueInsertionTime} in UTC")
    .bean(cliFacilityService, "processMessage(${body}, ${headers.CamelAzureStorageQueueInsertionTime})")
    .end();
RetryRuleset code:
public boolean shouldRetry(@Header(QueueConstants.MESSAGE_ID) String messageId,
                           @Header(Exchange.REDELIVERY_COUNTER) Integer counter,
                           @Header(QueueConstants.INSERTION_TIME) OffsetDateTime insertionTime) {
    OffsetDateTime futureRetryOffsetDateTime = OffsetDateTime.now(Clock.systemUTC()).plusHours(1); // because redelivery delay is 1hr
    OffsetDateTime insertionTimePlus24hrs = insertionTime.plusHours(24);
    if (futureRetryOffsetDateTime.isAfter(insertionTimePlus24hrs)) {
        log.info("Facility queue message: {} done retrying because next time to retry {}. Redelivery count: {}, enqueue time: {}",
                messageId, futureRetryOffsetDateTime, counter, insertionTime);
        return false;
    }
    return true;
}
The redeliveryDelay is 1 hour and maximumRedeliveries is 24 because I want to retry once an hour for about 24 hours. It does not need to be exactly 24 times, just as many as fit within 24 hours. Once 24 hours have passed, the message should go to the poison queue (this check is in the retry ruleset).
The problem is that the app service retries normally, once an hour, for the first 2 to 5 attempts, but the next retry then happens 2 days later. By then the message has expired, so the ruleset rejects the retry and the message is sent to the poison queue. Sometimes the app service does the first read from the queue and the very next retry is 2 days later, so the behavior is very unstable. In total it retries at most 1 to 10 times, and the last retry is always 2 days after the first one, at the same time of day.
Is there anything I am doing wrong?
Thank you for your help!
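For reference, the time check in RetryRuleset can be restated as a small standalone sketch. It is written in Python rather than Java, the name should_retry and its parameters are illustrative and not part of Camel, and it models only the date arithmetic, not how Camel schedules redeliveries.

```python
from datetime import datetime, timedelta, timezone

REDELIVERY_DELAY = timedelta(hours=1)  # mirrors redeliveryDelay(3600 * 1000L)
RETRY_WINDOW = timedelta(hours=24)     # stop retrying 24 hours after insertion


def should_retry(insertion_time: datetime, now: datetime) -> bool:
    """Return True while the next attempt would still fall inside the window."""
    next_attempt = now + REDELIVERY_DELAY
    # Same comparison as futureRetryOffsetDateTime.isAfter(insertionTimePlus24hrs).
    return not next_attempt > insertion_time + RETRY_WINDOW


inserted = datetime(2022, 1, 1, 0, 0, tzinfo=timezone.utc)
print(should_retry(inserted, inserted + timedelta(hours=5)))  # True
print(should_retry(inserted, inserted + timedelta(days=2)))   # False
```

Under this check, a message that is next seen two days after insertion is rejected immediately, which matches the behavior described above.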

AWS SQS messages do not become available again after visibility timeout

This is most likely something really simple, but for some reason my SQS messages do not become available again after the visibility timeout. At least this is what I figured, since the consuming Lambda has no log entries indicating that any retries have been triggered.
My use case is that another Lambda feeds the SQS queue with JSON entities that then need to be sent forward. Sometimes there is so much data to send that the receiving end responds with HTTP 429.
My sending Lambda (JSON body over HTTPS) deletes messages from the queue only when the service responds with HTTP 200, otherwise I do nothing with the receiptHandle which I think should then keep the message in the queue.
However, when a request is rejected by the service, the message does not become available again, so it is never retried and is lost forever.
Incoming SQS has been setup as follows:
Visibility timeout: 3 min
Delivery delay: 0 s
Receive message wait time: 1 s
Message retention period: 1 day
Maximum message size: 256 KB
Associated DLQ has Maximum receives of 100
The consuming Lambda is configured as follows:
Memory: 128 MB
Timeout: 10 s
Triggers: The source SQS queue, Batch size: 10, Batch window: None
The actual logic in my Lambda is quite simple. The event it receives contains the Records from the queue. The Lambda might get more than one record at a time, but all records are handled separately.
console.log('Response', response);
if (response.status === 'CREATED') {
  /* some code here */
  const deleteParams = {
    QueueUrl: queueUrl, /* required */
    ReceiptHandle: receiptHandle /* required */
  };
  console.log('Removing record from ', queueUrl, 'with params', deleteParams);
  await sqs.deleteMessage(deleteParams).promise();
} else {
  /* any record that ends up here, are never seen again :( */
  console.log('Keeping record in', eventSourceARN);
}
What do :( ?!?!11
"otherwise I do nothing with the receiptHandle which I think should then keep the message in the queue"
That's not how it works:
Lambda polls the queue and invokes your Lambda function synchronously with an event that contains queue messages. Lambda reads messages in batches and invokes your function once for each batch. When your function successfully processes a batch, Lambda deletes its messages from the queue.
When an AWS Lambda function is triggered from an Amazon SQS queue, all activities related to SQS are handled by the Lambda service. Your code should not call any Amazon SQS functions.
The messages will be provided to the AWS Lambda function via the event parameter. When the function successfully exits, the Lambda service will delete the messages from the queue.
Your code should not be calling DeleteMessage().
If you wish to signal that some of the messages were not successfully processed, you can use a partial batch response to indicate which messages were successfully processed. The AWS Lambda service will then make the unsuccessful messages available on the queue again.
Thanks to everyone who answered. I got this "problem" solved just by going through the documentation.
I'll provide a more detailed answer to my own question here in case someone besides me didn't get it on the first go :)
The function should return a batchItemFailures object containing the message IDs of the failures.
For example, one can write the Lambda handler as:
/**
 * Handler
 *
 * @param {*} event SQS event
 * @returns {Object} batch item failures
 */
exports.handler = async (event) => {
  console.log('Starting ' + process.env.AWS_LAMBDA_FUNCTION_NAME);
  console.log('Received event', event);
  event = typeof event === 'object'
    ? event
    : JSON.parse(event);
  const batchItemFailures = await execute(event.Records);
  if (batchItemFailures.length > 0) {
    console.log('Failures', batchItemFailures);
  } else {
    console.log('No failures');
  }
  return {
    batchItemFailures: batchItemFailures
  }
}
and an execute function that handles the messages:
/**
 * Execute
 *
 * @param {Array} records SQS records
 * @returns {Promise<*[]>} batch item failures
 */
async function execute (records) {
  let batchItemFailures = [];
  for (let index = 0; index < records.length; index++) {
    const record = records[index];
    // ...some async stuff here
    if (someSuccessCondition) {
      console.log('Life is good');
    } else {
      // Report this record so that only it is made available on the queue again.
      batchItemFailures.push({
        itemIdentifier: record.messageId
      });
    }
  }
  return batchItemFailures;
}

How do I fail a specific SQS message in a batch from a Lambda?

I have a Lambda with an SQS trigger. When it gets hit, a batch of records from SQS comes in (usually about 10 at a time, I think). If I return a failed status code from the handler, all 10 messages will be retried. If I return a success code, they'll all be removed from the queue. What if 1 out of those 10 messages failed and I want to retry just that one?
exports.handler = async (event) => {
  for(const e of event.Records){
    try {
      let body = JSON.parse(e.body);
      // do things
    }
    catch(e){
      // one message failed, i want it to be retried
    }
  }
  // returning this causes ALL messages in
  // this batch to be removed from the queue
  return {
    statusCode: 200,
    body: 'Finished.'
  };
};
Do I have to manually re-add that one message back to the queue? Or can I return a status from my handler that indicates that one message failed and should be retried?
As per the AWS documentation, the SQS event source mapping now supports handling of partial failures out of the box. The gist of the linked article is as follows:
Include ReportBatchItemFailures in your EventSourceMapping configuration (in its FunctionResponseTypes list).
The response syntax in case of failures has to be modified to:
{
  "batchItemFailures": [
    { "itemIdentifier": "id2" },
    { "itemIdentifier": "id4" }
  ]
}
where id2 and id4 are the messageIds of the failed messages in the batch.
Quoting the documentation as is:
Lambda treats a batch as a complete success if your function returns
any of the following
An empty batchItemFailure list
A null batchItemFailure list
An empty EventResponse
A null EventResponse
Lambda treats a batch as a complete failure if your function returns
any of the following:
An invalid JSON response
An empty string itemIdentifier
A null itemIdentifier
An itemIdentifier with a bad key name
An itemIdentifier value with a message ID that doesn't exist
According to the documentation, SAM support is not yet available for this feature. However, one of the AWS Labs examples shows its usage in SAM, and it worked for me when tested.
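The quoted success and failure rules can be read as a small decision procedure. The sketch below is only an illustrative model of those rules, not Lambda's actual implementation; the function name classify_batch_response and the "success" / "failure" / "partial" labels are made up for this example.

```python
def classify_batch_response(response, batch_message_ids):
    """Classify a handler response according to the rules quoted above."""
    # An empty or null response counts as a complete success.
    if not response:
        return "success"
    # Anything that is not a JSON object is treated here as an invalid response.
    if not isinstance(response, dict):
        return "failure"
    failures = response.get("batchItemFailures")
    # An empty or null failure list also counts as a complete success.
    if not failures:
        return "success"
    if not isinstance(failures, list):
        return "failure"
    for item in failures:
        # A bad key name, or an empty or null identifier, fails the whole batch.
        identifier = item.get("itemIdentifier") if isinstance(item, dict) else None
        if not identifier:
            return "failure"
        # So does a message ID that is not part of the batch.
        if identifier not in batch_message_ids:
            return "failure"
    return "partial"


ids = {"id1", "id2", "id3", "id4"}
print(classify_batch_response({"batchItemFailures": [{"itemIdentifier": "id2"}]}, ids))  # partial
print(classify_batch_response({"batchItemFailures": []}, ids))                           # success
print(classify_batch_response({"batchItemFailures": [{"itemId": "id2"}]}, ids))          # failure
```

Only the "partial" case leads to a selective retry: the listed messages become visible on the queue again while the rest of the batch is deleted.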
Yes, you have to manually re-add the failed messages to the queue.
What I suggest is keeping a fail count: if all messages failed, you can simply return a failed status for the whole batch; otherwise, if the fail count is less than 10, you can send the failed messages back to the queue individually.
You have to programmatically delete each message from the queue after processing it successfully.
You can set a flag to true if any of the messages failed and, depending on it, raise an error after processing all the messages in the batch. The successful messages will already have been deleted, and the other messages will be reprocessed according to the retry policy.
With the logic below, only failed and unprocessed messages get retried.
import boto3

sqs = boto3.client("sqs")

def handler(event, context):
    # Recommended: read the queue URL from an environment variable.
    queue_url = "form queue url recommended to set it as env variable"
    for message in event["Records"]:  # the SQS event key is "Records"
        message_body = message["body"]
        print("do some processing :)")
        # Delete the message only after it has been processed successfully.
        message_receipt_handle = message["receiptHandle"]
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message_receipt_handle
        )
Another way is to collect the successfully processed messages in a variable and delete them in one batch operation, using their message IDs and receipt handles:
response = sqs.delete_message_batch(
    QueueUrl='string',
    Entries=[
        {
            'Id': 'string',
            'ReceiptHandle': 'string'
        },
    ]
)
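As a minimal sketch of how the Entries list could be assembled, the helper below turns successfully processed records into batches of at most 10 entries, which is the per-request limit of DeleteMessageBatch. The helper name build_delete_batches is hypothetical; the messageId and receiptHandle keys are the ones found on each record of a Lambda SQS event.

```python
def build_delete_batches(records, batch_size=10):
    """Group processed records into Entries lists for delete_message_batch."""
    entries = [
        {"Id": record["messageId"], "ReceiptHandle": record["receiptHandle"]}
        for record in records
    ]
    # Split into chunks because one request accepts at most 10 entries.
    return [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]


# Stand-in records; real ones come from event["Records"].
records = [{"messageId": f"m{i}", "receiptHandle": f"rh{i}"} for i in range(12)]
batches = build_delete_batches(records)
print([len(batch) for batch in batches])  # [10, 2]
```

Each returned batch could then be passed as Entries to sqs.delete_message_batch together with the queue URL.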
You need to design your app in a different way. Here are a few ideas; they are not the best, but they will solve your problem.
Solution 1:
Create an SQS delivery queue: sq1
Create delay queues as per your delay requirement: sq2
Create a dead-letter queue: sdl
Now, inside the Lambda function, if a message from sq1 fails, delete it from sq1 and put it on sq2 for retry. Any Lambda function invoked asynchronously is retried twice before the event is discarded.
If the message fails again after the given retries, move it to the dead-letter queue sdl.
AWS Lambda - processing messages in Batches
https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
Note: When an SQS event source mapping is initially created and enabled, or when messages first appear after a period with no traffic, the Lambda service begins polling the SQS queue using five parallel long-polling connections. As per the AWS documentation, the default duration for a long poll from AWS Lambda to SQS is 20 seconds.
https://docs.aws.amazon.com/lambda/latest/dg/lambda-services.html#supported-event-source-sqs
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
https://nordcloud.com/amazon-sqs-as-a-lambda-event-source/
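The routing decision in Solution 1 might look roughly like the sketch below. It assumes the retry count travels with the message in a hypothetical retry_count field and that two retries are allowed; neither detail is specified above, and the actual SQS send and delete calls are left out.

```python
MAX_RETRIES = 2  # assumed limit; adjust to your own retry policy


def route_failed_message(message):
    """Return (target_queue, message_to_send) for a message that failed on sq1."""
    retries = message.get("retry_count", 0)
    if retries < MAX_RETRIES:
        # Still within budget: put it on the delay queue with an updated counter.
        return "sq2", {**message, "retry_count": retries + 1}
    # Retries exhausted: hand the message over to the dead-letter queue.
    return "sdl", message


print(route_failed_message({"body": "order-1"}))
print(route_failed_message({"body": "order-1", "retry_count": 2}))
```

Keeping the counter on the message itself means the Lambda needs no external state to decide between another retry and the dead-letter queue.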
Solution 2:
Use AWS StepFunction
https://aws.amazon.com/step-functions/
StepFunction will call lambda and handle the retry logic on failure with configurable exponential back-off if needed.
https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html
https://cloudacademy.com/blog/aws-step-functions-a-serverless-orchestrator/
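To give a feel for the exponential back-off mentioned in Solution 2, the sketch below computes the wait before each retry from the three fields a Step Functions Retry block exposes (IntervalSeconds, MaxAttempts, BackoffRate). The function name retry_delays is illustrative, and the sketch ignores optional settings such as a maximum delay or jitter.

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0):
    """Seconds waited before each retry: interval * backoff_rate ** retry_index."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


print(retry_delays(interval_seconds=2, max_attempts=4, backoff_rate=2.0))
# [2.0, 4.0, 8.0, 16.0]
```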
Solution 3:
CloudWatch scheduled event to trigger a Lambda function that polls for FAILED.
Error handling for a given event source depends on how Lambda is invoked. Amazon CloudWatch Events invokes your Lambda function asynchronously.
https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
https://engineering.opsgenie.com/aws-lambda-performance-series-part-2-an-analysis-on-async-lambda-fail-retry-behaviour-and-dead-b84620af406
https://dzone.com/articles/asynchronous-retries-with-aws-sqs
https://medium.com/@ron_73212/how-to-handle-aws-lambda-errors-like-a-pro-e5455b013d10
AWS supports partial batch responses. Here is an example in TypeScript:
type Result = {
  itemIdentifier: string
  status: 'failed' | 'success'
}

const isFulfilled = <T>(
  result: PromiseFulfilledResult<T> | PromiseRejectedResult
): result is PromiseFulfilledResult<T> => result.status === 'fulfilled'

const isFailed = (
  result: PromiseFulfilledResult<Result>
): result is PromiseFulfilledResult<
  Omit<Result, 'status'> & { status: 'failed' }
> => result.value.status === 'failed'

const results = await Promise.allSettled(
  sqsEvent.Records.map(async (record) => {
    try {
      return { status: 'success', itemIdentifier: record.messageId }
    } catch(e) {
      console.error(e);
      return { status: 'failed', itemIdentifier: record.messageId }
    }
  })
)

return results
  .filter(isFulfilled)
  .filter(isFailed)
  .map((result) => ({
    itemIdentifier: result.value.itemIdentifier,
  }))

ScriptHost error occurred despite catching an exception

I have a simple function that takes a message from a queue and saves it to a storage table. I expect that in some cases a table entity with the same data already exists. Because of that, I added exception handling to skip this situation and mark the queue message as processed. Even though the exception is now handled, the script host reports an error and the message is still in the queue.
I suppose this is caused by the fact that I'm using a table binding, which sits on the boundary between the host and my code. Am I right? Should I use a table client within my code instead of the binding? Is there a different approach?
Sample code to generate this situation:
[FunctionName("MyFunction")]
public static async Task Run([QueueTrigger("myqueue", Connection = "Conn")]string msg, [Table("mytable", Connection = "Conn")] IAsyncCollector<DataEntity> dataEntity, TraceWriter log)
{
    try
    {
        await dataEntity.AddAsync(new DataEntity()
        {
            PartitionKey = "1",
            RowKey = "1",
            Data = msg
        });
        await dataEntity.FlushAsync();
    }
    catch (StorageException e)
    {
        // when it is an exception that informs "entity already exists" skip it
    }
}
When a queue trigger function fails, Azure Functions retries the function up to five times for a given queue message, including the first try.
If all five attempts fail, the functions runtime adds a message to a queue named <originalqueuename>-poison.
You can write a function to process messages from the poison queue by logging them or sending a notification that manual attention is needed.
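The retry rule described above can be modelled in a few lines. This is only a sketch of the rule as stated, not the Functions runtime itself; the function name queue_after_failure is made up, and dequeue_count counts attempts including the first try.

```python
def queue_after_failure(queue_name, dequeue_count, max_dequeue_count=5):
    """Return the queue a message sits on after a failed attempt."""
    if dequeue_count >= max_dequeue_count:
        # All allowed attempts failed: the message moves to the poison queue.
        return f"{queue_name}-poison"
    # Otherwise it stays on the original queue and is retried.
    return queue_name


print(queue_after_failure("myqueue", dequeue_count=4))  # myqueue
print(queue_after_failure("myqueue", dequeue_count=5))  # myqueue-poison
```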
The host.json file contains settings that control queue trigger behavior:
{
  "queues": {
    "maxPollingInterval": 2000,
    "visibilityTimeout": "00:00:30",
    "batchSize": 16,
    "maxDequeueCount": 1,
    "newBatchThreshold": 8
  }
}
Note: maxDequeueCount is the number of times to try processing a message before moving it to the poison queue; the default is 5. For your needs, you could set "maxDequeueCount": 1.
Also, these settings are host-wide and apply to all functions. You currently can't control them per function.

Increase deployVerticle Timeout

Using Vert.x, I have a verticle with a very slow startup because it depends on several slow HTTP requests.
It is completely async, but I still receive the following error because of the deployVerticle timeout.
(TIMEOUT,-1) Timed out after waiting 30000(ms) for a reply. address: d5c134e0-53dc-4d4f-b854-1c40a7905914, repliedAddress: my.dummy.project
I am deploying the verticle as
def name = "groovy:my.dummy.verticle"
def opts = new DeploymentOptions().setConfig(config());
vertx.deployVerticle(name, opts, { res ->
    if(res.failed()){
        log.error("Failed to deploy verticle " + name)
    }
    else {
        log.info("Deployed verticle " + name)
    }
})
How can I increase those 30000ms to something more suitable for me? I know that the requests will take more than a minute.
The message you're seeing is not directly related to the deployment. It comes from the event bus, which did not receive a reply to a sent message within 30 seconds.
You can increase that timeout using DeliveryOptions (its setSendTimeout method): http://vertx.io/docs/apidocs/io/vertx/core/eventbus/DeliveryOptions.html
