AlertManager queue is dropping messages - prometheus-alertmanager

I have been using Alertmanager to send alerts to PagerDuty, and after a while I started getting the following error:
level=warn ts=2021-04-02T10:43:01.239Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4102 limit=4096
This happens every 15 minutes.
Is there a way to:
increase the queue size?
connect to the queue and check what makes it stuck?
see some kind of monitoring for the queue?
I tried searching for anything that would let me connect to it, but have not found anything.

Related

Messages going to dead letter rather than active queue

I have configured service bus and I am sending messages to a topic in it.
I am observing a strange behavior that my messages are going to dead letter queue and not the active queue.
I have checked the properties of my topic, like auto delete on idle and default time to live, but I am not able to figure out the reason.
I tried turning off my listener on this topic, suspecting some code failure was causing the messages to go to the dead letter queue, but I am still not able to figure out the reason.
Inspect the queue's MaxDeliveryCount. If a message's delivery count exceeds that value, it's an indication your code was failing to process the message, and it was dead-lettered for that reason. The reason is stated in the DeadLetterReason header. If that's the case, as suggested in the comments, log the failure reason in your code to understand what's happening.
An additional angle is to check whether your message is getting aborted. This could happen when you use some library or abstraction on top of the Azure Service Bus client. If it is, the message will eventually get dead-lettered as well. Just like in the first scenario, you'll need some logs to understand why this is happening.
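For illustration, here is a minimal sketch of reading the DeadLetterReason header with the current Azure.Messaging.ServiceBus SDK (the connection string, topic, and subscription names are placeholders, and your code may be using an older client):

using System;
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient("<connection-string>");

// Open a receiver directly against the subscription's dead-letter sub-queue.
ServiceBusReceiver dlqReceiver = client.CreateReceiver(
    "my-topic", "my-subscription",
    new ServiceBusReceiverOptions { SubQueue = SubQueue.DeadLetter });

// Peek a batch of dead-lettered messages without locking or removing them.
foreach (ServiceBusReceivedMessage message in await dlqReceiver.PeekMessagesAsync(10))
{
    Console.WriteLine(
        $"{message.MessageId}: {message.DeadLetterReason} - {message.DeadLetterErrorDescription}");
}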

Identify if a message is already read in Azure Service Bus Topic

I have 65k records in an Azure Service Bus topic. While testing, whenever my test application is started, it reads all 65k records. Can you please help me with how to avoid reading messages that have already been read, or how to read only the messages that are sent after the test application starts?
From the question, it's unclear what exactly you're after. Here are a few things for consideration.
Queues/subscriptions are intended to be read by consumers, not to store messages for conditional access. To avoid re-reading messages, you should remove them as you consume them, either by using the ReceiveAndDelete receive mode, or by using PeekLock and completing the received messages.
If these messages are test messages and are not intended for the production, do not mix the environments and use different namespaces.
Alternatively, set a short TimeToLive on your test messages to get rid of them. You could also drop the entity and recreate it, but I would avoid this if you're testing quite often.
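A minimal sketch of both options with the current Azure.Messaging.ServiceBus SDK (the entity names, payload, and five-minute TTL are placeholder choices):

using System;
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient("<connection-string>");

// Option 1: ReceiveAndDelete removes each message from the subscription as
// soon as it is handed to the receiver, so a restarted test won't see it again.
ServiceBusReceiver receiver = client.CreateReceiver(
    "my-topic", "my-subscription",
    new ServiceBusReceiverOptions { ReceiveMode = ServiceBusReceiveMode.ReceiveAndDelete });

// Option 2: give test messages a short TimeToLive so the broker drops them
// on its own if nothing consumes them quickly.
ServiceBusSender sender = client.CreateSender("my-topic");
await sender.SendMessageAsync(new ServiceBusMessage("test payload")
{
    TimeToLive = TimeSpan.FromMinutes(5)
});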

Azure Service Bus: Duplicate messages are processing in message queue

I am working on Azure Service Bus. My Service Bus queue is processing one message 3 times. My message lock duration is 5 minutes. Every message takes at most 2 minutes to process, but I don't know why the queue is picking up the same message and sending it for processing again; the duplicates are picked up only after 5 minutes.
How can I resolve this?
With Azure Service Bus, a message will be re-processed when it is not actioned by the receiving party. An action would be completing, deferring, or dead-lettering. If you don't perform any of those, the message will be re-delivered once the LockDuration on the broker side expires. An additional situation where a message is re-delivered without waiting for the LockDuration to expire is abandoning the message; in that case it is picked up right away by the next request for new messages.
You should share your code to provide enough context. Messages can be received manually using MessageReceiver.ReceiveAsync() or via the user-callback API. For the first option, you have to action the messages yourself (complete them, for example). For the other option, there's a configuration API where you can opt out of auto-completion, in which case you're required to manually complete the message passed into the user callback.
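A minimal sketch of the callback-based option, using the current Azure.Messaging.ServiceBus SDK (the older MessageReceiver API works analogously; the queue name and connection string are placeholders):

using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient("<connection-string>");

ServiceBusProcessor processor = client.CreateProcessor("my-queue",
    new ServiceBusProcessorOptions
    {
        // Opt out of auto-completion: the message is settled only when we
        // complete it explicitly below. If processing neither completes nor
        // abandons it, the lock expires and the message is redelivered.
        AutoCompleteMessages = false,
        MaxConcurrentCalls = 1
    });

processor.ProcessMessageAsync += async args =>
{
    // ... do the actual work here, well within the lock duration ...
    await args.CompleteMessageAsync(args.Message); // settle, so no redelivery
};
processor.ProcessErrorAsync += args =>
{
    Console.WriteLine(args.Exception);
    return Task.CompletedTask;
};

await processor.StartProcessingAsync();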

Azure Request-Response Session Timeout handling

We are using the azure service bus to facilitate the parallel processing of messages through workers listening to a queue.
First an aggregated message is received, and then this message is split into thousands of individual messages, which are posted through a request-response pattern, since we need to know when all messages have been completed in order to run a separate process.
Our issue is that the request-response method has a timeout which is causing the following issue:
Let's say we post 1000 messages to be processed and there is only one worker listening. Messages left in the queue after the timeout expires are discarded, which is something we do not want. If we set the expiry time to a large value that guarantees all messages will be processed, then we run the risk of a message failing and having to wait out the timeout to learn that something has gone wrong.
Is there a way to dynamically change the expiration of a single message in a request-response scenario or any other pattern that we should consider?
Thanks!
You've got this slightly wrong. The TimeToLive of an Azure Service Bus message (https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.brokeredmessage.timetolive.aspx) is the time the message will remain on the queue, whether it is consumed or not.
It is not a timeout: if you post a message with a large TimeToLive, the message will stay on the queue for a long time, but if you fail to consume it, you should warn the other end that you failed to consume this message.
You can do this with another queue, putting a message on it that carries the id of the failed message and the error.
This is an asynchronous process, so you should not be holding requests based on it; instead, work with the asynchronous nature of the problem.
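A minimal sketch of that pattern with the current Azure.Messaging.ServiceBus SDK (the "work" and "work-errors" queue names, the ids, and the 12-hour TTL are assumptions for illustration):

using System;
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient("<connection-string>");

// Give each split message a generous TimeToLive so it survives a slow worker.
ServiceBusSender workSender = client.CreateSender("work");
await workSender.SendMessageAsync(new ServiceBusMessage("item payload")
{
    MessageId = "item-42",
    TimeToLive = TimeSpan.FromHours(12)
});

// On the worker side, report a processing failure explicitly instead of
// letting the aggregator wait out a timeout: post the failed message's id
// and the error to a second queue the aggregator listens on.
ServiceBusSender errorSender = client.CreateSender("work-errors");
await errorSender.SendMessageAsync(new ServiceBusMessage("Processing failed: <details>")
{
    CorrelationId = "item-42"
});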

NServiceBus MessageForwardingInCaseOfFaultConfig not working as expected

I've setup NServiceBus to forward failed messages to an error queue which is monitored by ServiceControl.
Here's my config:
<section name="MessageForwardingInCaseOfFaultConfig" type="NServiceBus.Config.MessageForwardingInCaseOfFaultConfig, NServiceBus.Core" />
<MessageForwardingInCaseOfFaultConfig ErrorQueue="error" />
When I send a message that fails to be processed, it's sent to the DLQ. However, I can't find a copy of this message in the error or error.log queue. When I look at the message details in AMS, the DeliveryCount is set to 7, but when I check the NSB logs, I can only find the exception once. Also, I'm a bit confused as to why this exception is logged as "INFO". It makes it a lot harder to detect that way, but that's a separate concern.
Note: I'm running on Azure Service Bus Transport.
Anyone an idea of what I'm missing here?
Thanks in advance!
When a handler tries to process a message and fails, the message becomes visible again and is retried. If the delivery count set on the queue is low, the message will exhaust its attempts and ASB will natively dead-letter it. That's why the message ends up in the ASB DLQ and not in NSB's configured error queue.
The information you see on your dead-lettered message confirms that. The default MaxDeliveryCount in NSB.ASB v5 is set to 6, so ASB will dead-letter your message the moment it has been attempted more times than that.
This is due to NSB having its own (per-instance) retry counter rather than using the native DeliveryCount provided by ASB. If your endpoint is scaled out, you'll need to adjust the MaxDeliveryCount, since each instance of the role can grab a message and attempt to process it, and each instance keeps its own retry counter. As a result, every instance's counter can be below 6 while the message's DeliveryCount exceeds it.
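If you need to raise the broker-side limit, here is a minimal sketch with the Azure.Messaging.ServiceBus.Administration client (the question's stack is the older NSB.ASB v5, which configured this differently; the queue name and the value 20 are placeholders):

using Azure.Messaging.ServiceBus.Administration;

var admin = new ServiceBusAdministrationClient("<connection-string>");

// Raise MaxDeliveryCount above NSB's combined per-instance retry budget so
// the broker doesn't natively dead-letter a message before NSB gets the
// chance to forward it to the configured "error" queue.
QueueProperties queue = await admin.GetQueueAsync("my-endpoint");
queue.MaxDeliveryCount = 20;
await admin.UpdateQueueAsync(queue);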
