NServiceBus - messages going to error queue but no exception logged? - log4net

I've got some messages that are intermittently failing in production and ending up in the error queue. Each time I run ReturnToSourceQueue, a proportion succeed and a proportion fail again (so if I keep running ReturnToSourceQueue, the error queue eventually empties).
What I can't figure out is why I'm not seeing any exceptions in the logs. For the messages that succeed, I see a "Received message..." entry in the log4net logs, and the message appears in the audit queue. The messages that fail, however, go to the error queue with no log entry at all.
The process that's receiving these messages is an ASP.NET application if that makes any difference.
Does anyone have any pointers?
Thanks

Do you have a try-catch inside your message handler? This may be masking the error.
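NServiceBus handlers are C#, but the anti-pattern is language-agnostic; as a purely hypothetical sketch (in Python, matching the snippets further down), a catch-all block that swallows the failure means the framework and its log4net appenders never observe an exception, even though the work was not done:

import logging

logger = logging.getLogger(__name__)

def process(message):
    # stand-in for the real handler body; assume it can raise
    raise RuntimeError("boom")

def handle_swallowing(message):
    try:
        process(message)
    except Exception:
        pass  # swallowed: nothing is logged and the framework never sees a failure

def handle_reporting(message):
    try:
        process(message)
    except Exception:
        logger.exception("Failed to process message %r", message)
        raise  # re-raise so the infrastructure can log it, retry, or forward to the error queue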

Related

Python ServiceBus Message Lock Expiry Issue

I'm running into an issue with message lock expiry in Service Bus using the Python SDK.
I've created a simple tool that clears dead-lettered messages in queues and subscriptions. We have a number of queues and subscriptions, and the tool acts as a clean-up job which I'm just running locally.
The tool works fine; however, because it's a "clean-up" tool, we might have messages banked in the DLQ which have been there for months.
I'm running into an issue where I'm trying to complete these messages, but it throws an azure.servicebus.exceptions.ServiceBusError: "The lock on the message has expired" exception.
I thought I'd resolved this by using AutoLockRenewer and actually renewing the lock on the message before completing it; however, the exception still seems to be getting thrown.
It's strange: when the exception gets thrown the tool stops running, and once I re-run the tool it's able to complete the messages it previously couldn't, but it eventually finds a message in another queue/subscription where it breaks on the lock again. So after each re-run it's able to clear the DLQ in more and more queues/subscriptions; it doesn't break at the same point as the previous run.
This is a snippet of my code:
from azure.servicebus import AutoLockRenewer, ServiceBusClient

renewer = AutoLockRenewer()
with ServiceBusClient.from_connection_string(shared_access_key["primaryConnectionString"]) as client:
    for queue in queues:
        if queue["countDetails"]["deadLetterMessageCount"] > 0:
            with client.get_queue_receiver(queue_name=queue["name"], sub_queue="deadletter") as receiver:
                while len(receiver.receive_messages(max_wait_time=60)) > 0:
                    messages = receiver.receive_messages(max_message_count=50, max_wait_time=60)
                    for message in messages:
                        renewer.register(receiver, message, max_lock_renewal_duration=300)
                        receiver.complete_message(message)

Messages going to dead letter rather than active queue

I have configured service bus and I am sending messages to a topic in it.
I am observing a strange behavior: my messages are going to the dead-letter queue and not the active queue.
I have checked the properties of my topic, like auto delete on idle and default time to live, but I'm not able to figure out the reason.
I tried turning off my listener on this topic, hoping that some code failure was causing the messages to go to dead letter, but I'm still not able to figure out the reason.
Inspect the queue's MaxDeliveryCount. If dead-lettered messages have exceeded that value, it's an indication your code was failing to process the messages and they were dead-lettered for that reason. The reason is stated in the DeadLetterReason header. If that's the case, as suggested in the comments, log the failure reason in your code to understand what's happening.
An additional angle to check is whether your message is getting aborted. This can happen when you use a library or abstraction on top of the Azure Service Bus client. If it is, it will eventually get dead-lettered as well. Just like in the first scenario, you'll need some logs to understand why this is happening.
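As a rough illustration of reading that header (using the Python SDK from the question above; the connection string, topic and subscription names are placeholders), you can peek at the dead-letter sub-queue without locking or removing anything and print the DeadLetterReason:

from azure.servicebus import ServiceBusClient

CONN_STR = "<connection-string>"      # placeholder
TOPIC = "my-topic"                    # placeholder
SUBSCRIPTION = "my-subscription"      # placeholder

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    with client.get_subscription_receiver(
        topic_name=TOPIC, subscription_name=SUBSCRIPTION, sub_queue="deadletter"
    ) as receiver:
        # peek_messages inspects messages without locking or consuming them
        for msg in receiver.peek_messages(max_message_count=10):
            print(msg.message_id, msg.dead_letter_reason, msg.dead_letter_error_description)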

Azure service bus dead letter

I am trying to resubmit all the dead-letter messages back to their original queue. When I resubmit a message, it moves to the dead-letter queue again.
This time I thought there might be some problem with the message, but when I debugged it, I found no problem with it.
Can anyone help me out?
Possible scenarios in which your messages end up in the DLQ are:
1. Too slow processing: the message LockDuration expires and the message is retried again until the DeliveryCount is exhausted and the message is DLQ-ed.
2. You have an aggressive PrefetchCount. Messages that are prefetched and not processed within the LockDuration are subject to a DeliveryCount increase (see #1).
3. Too short a LockDuration, causing messages to still be in processing while they re-appear on the queue and are picked up by other processing instances (or if you use the OnMessage API with concurrency > 1).
4. Processing constantly failing, causing the message to eventually end up in the DLQ.
I suspect you have #4. I'm not sure how you re-submit, but you have to clone the message and send it back. There was a similar question here, have a look.
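As an illustration of the clone-and-resend idea (sketched here with the Python SDK; the queue name and connection string are placeholders, and the same pattern applies to the .NET client), build a fresh message from the dead-lettered one, send it to the active queue, and only complete the DLQ copy once the send has succeeded:

from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<connection-string>"   # placeholder
QUEUE = "my-queue"                 # placeholder

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    with client.get_queue_receiver(queue_name=QUEUE, sub_queue="deadletter") as receiver, \
         client.get_queue_sender(queue_name=QUEUE) as sender:
        for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            # clone: copy the body (assumes a plain data body) and the application properties
            clone = ServiceBusMessage(
                body=b"".join(msg.body),
                application_properties=dict(msg.application_properties or {}),
            )
            sender.send_messages(clone)     # back onto the active queue
            receiver.complete_message(msg)  # remove the DLQ copy only after the send succeeds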

NServiceBus MessageForwardingInCaseOfFaultConfig not working as expected

I've setup NServiceBus to forward failed messages to an error queue which is monitored by ServiceControl.
Here's my config:
<section name="MessageForwardingInCaseOfFaultConfig" type="NServiceBus.Config.MessageForwardingInCaseOfFaultConfig, NServiceBus.Core" />
<MessageForwardingInCaseOfFaultConfig ErrorQueue="error" />
When I send a message that fails to be processed, it's sent to the DLQ. However, I can't find a copy of this message in the error or error.log queue. When I look at the message details in AMS, the DeliveryCount is set to 7, but when I check the NSB logs, I can only find the exception once. Also, I'm a bit confused as to why this exception is logged as "INFO". It makes it a lot harder to detect that way, but that's a separate concern.
Note: I'm running on Azure Service Bus Transport.
Does anyone have an idea of what I'm missing here?
Thanks in advance!
When a handler tries to process a message and fails, the message becomes visible again and is retried. If the MaxDeliveryCount set on the queue is low, the message exhausts its delivery attempts before NSB gives up on it, and ASB will natively DLQ it. That's why the message ends up in the ASB DLQ and not in NSB's configured error queue.
The information you see on your DLQ-ed message confirms that. The default MaxDeliveryCount in NSB.ASB v5 is set to 6, so ASB will DLQ your message the moment it has been attempted more times than that.
This is due to NSB having its own (per-instance) retry counter and not using the native DeliveryCount provided by ASB. If you have your endpoint scaled out, you'll need to adjust MaxDeliveryCount, since each role instance can grab the message and attempt to process it, and each instance keeps its own retry counter. As a result, an instance's counter can be below 6 while the message's DeliveryCount exceeds it.
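Raising MaxDeliveryCount on the underlying entity can be done in the portal or programmatically; as one illustration (queue name and connection string are placeholders), the Python administration client can read and update it:

from azure.servicebus.management import ServiceBusAdministrationClient

CONN_STR = "<connection-string>"   # placeholder
QUEUE = "my-endpoint-queue"        # placeholder

admin = ServiceBusAdministrationClient.from_connection_string(CONN_STR)
props = admin.get_queue(QUEUE)
print("current MaxDeliveryCount:", props.max_delivery_count)
# raise it above (number of instances x NSB retries) so NSB gets the chance to
# forward the failed message to its error queue before ASB dead-letters it natively
props.max_delivery_count = 20
admin.update_queue(props)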

Automatic reboot whenever there's an uncaught exception in a continuous WebJob

I'm currently creating a continuous WebJob that polls an API and then forwards messages to an Azure Service Bus. I've managed to get this to work just fine, but I have one problem: what if my app crashes for whatever reason? What if there's an uncaught exception, or something goes wrong, and the app stops running? How do I get it to run again?
I created a test app which sends a message to the Service Bus every second, and on the 11th message it crashes due to an intentionally placed NullReferenceException. I did this in order to investigate the behaviour when/if the app crashes.
What happens is that the app runs just fine for the first 10 seconds (as expected). Messages are being sent, and everything looks good. Then after the 10th second, when the exception occurs, nothing happens. There's no log in Azure saying there was an exception, no reboot - nothing. It just sits there as "running", but messages are no longer being sent.
How do I deal with this? It's essential that the application is able to reboot if it fails. Are there any standard ways to do this? Best practices?
Any help would be appreciated :)
It is always good to handle most of the failure scenarios in the system ourselves rather than leaving it to the hosting environment to react to the failures.
My suggestion would be to catch exceptions in your code, for example with a try-catch block around your executable's main loop, so you can handle the different kinds of failure scenarios; instead of letting the exceptions escape, log them yourself or retry the operation if required.
For example, say you receive some junk data to process and it fails. You could retry the operation, say, 3 times, and then finally push the input to a dead-letter store along with a log entry so that such junk inputs can be dealt with manually. Don't let the flow be stopped by a thrown exception; handle it yourself by logging a message that flags the need for manual intervention.
In a GUI or web application, if there is an exception, the flow is re-initiated by a user click and the system responds again. But since this is a background processor, it is best to avoid anything that halts the control flow like that.
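The WebJob itself would be .NET, but the catch-retry-log pattern is language-agnostic; here is a minimal hypothetical sketch (in Python, with poll_once standing in for the API call and Service Bus send) of a polling loop that retries a few times, then logs and moves on instead of crashing:

import logging
import time

logger = logging.getLogger("poller")

def poll_once():
    # stand-in for "call the API and forward the result to Service Bus"
    ...

def run(poll_interval_seconds=1, max_attempts=3):
    while True:
        for attempt in range(1, max_attempts + 1):
            try:
                poll_once()
                break                  # success, stop retrying this cycle
            except Exception:
                logger.exception("poll attempt %d/%d failed", attempt, max_attempts)
        else:
            # every attempt failed: record it for manual follow-up, but keep the job alive
            logger.error("giving up on this cycle after %d attempts", max_attempts)
        time.sleep(poll_interval_seconds)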
Hope this would help.
