Azure Service Bus queues close after 24 hours automatically

Problem
We are developing an Azure Service Bus based Cloud Service, but after 24 hours the queue clients seem to get closed automatically.
Can someone confirm this behavior or give advice on how to fix it?
At the moment we close the clients after 24 hours manually and recreate them to avoid this effect, but this can't be the only solution.

Sessions dropping intermittently is a normal occurrence. The AMQP protocol and stack in the client is newer and generally more resilient against this. The only reason not to use AMQP is if you are using transactions. Also, unless you have a good reason to run your own receive loop, use OnMessage.
You get an OperationCanceledException when the link fails for any reason; any in-flight requests fail with this exception. However, this is transient, so you should be able to reuse the same QueueClient to issue receives, and those should (eventually) work as the client recovers. OnMessage will hide all of that from you.
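The advice to reuse the same QueueClient can be sketched as a small retry wrapper. This is only a minimal sketch, not the SDK's own recovery logic: ReceiveWithRetryAsync, maxAttempts, and delay are hypothetical names, and the code assumes that an OperationCanceledException from a receive is transient, as described above.

```csharp
using System;
using System.Threading.Tasks;

public static class TransientRetry
{
    // Calls `receive` again when it fails with OperationCanceledException,
    // up to `maxAttempts` attempts in total, pausing `delay` between attempts.
    // The last failure is rethrown to the caller.
    public static async Task<T> ReceiveWithRetryAsync<T>(
        Func<Task<T>> receive, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await receive();
            }
            catch (OperationCanceledException) when (attempt < maxAttempts)
            {
                // Treated as transient (an assumption): keep the same client and try again.
                await Task.Delay(delay);
            }
        }
    }
}
```

A call might look like `await TransientRetry.ReceiveWithRetryAsync(() => queueClient.ReceiveAsync(), 5, TimeSpan.FromSeconds(2))`, where queueClient is the existing client rather than a newly created one. If you use OnMessage, none of this should be needed.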

Related

Azure Service Bus ReceiveMessages with Sub processes

I thought my question was related to the post "Azure Service Bus: How to Renew Lock?" but I have already tried RenewLockAsync.
Here is the concern: I am receiving messages from Service Bus with sessions enabled, so I get the session and then receive messages. All good; here's the rub.
There are TWO ADDITIONAL processes to complete per message: a manual transform / harvest of the message into some other object, which is then sent out to a Kafka topic (stream). Note it's all async on top of this craziness. My team lead is insistent that the two sub processes can just be added INTO the receive process (ReceiveAsync), with session.CompleteAsync() finally called AFTER the OTHER two processes complete.
Well, needless to say, I'm consistently erroring with "The session lock has expired on the MessageSession. Accept a new MessageSession." with that architecture. I haven't even fleshed out the send-to-Kafka part; it's just mocked, so it's going to take longer once fleshed out.
Is it even remotely plausible to call session.CompleteAsync() AFTER the sub processes, or shouldn't that be done when the message is successfully received, before moving on to other processing? I thought separate tasks would be more appropriate, but again he didn't dig that idea.
I appreciate all insight and opinions, thank you!
"The session lock has expired on the MessageSession. Accept a new MessageSession." indicates one of 2 things:
- The lock has been held for too long, in which case calling "RenewLockAsync" before it expires would help.
- The message lock has been explicitly released through a call to CompleteAsync, AbandonAsync, DeadLetterAsync, etc. That would indicate a bug, since the lock cannot be used after it has been released.
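For the first case, one way to keep the lock alive while the transform and Kafka steps run is to renew it on a timer in the background. The following is a minimal sketch under stated assumptions, not a definitive implementation: LockKeepAlive and RunWithRenewalAsync are hypothetical names, the renewal is passed in as a delegate (for example one that calls RenewLockAsync on the session), and the interval must be chosen shorter than the lock duration configured on the entity.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LockKeepAlive
{
    // Runs `work` while calling `renewLock` every `interval` in the background.
    // Renewal stops as soon as `work` finishes, whether it succeeds or throws.
    public static async Task RunWithRenewalAsync(
        Func<Task> work, Func<Task> renewLock, TimeSpan interval)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task renewal = RenewLoopAsync(renewLock, interval, cts.Token);
            try
            {
                await work();
            }
            finally
            {
                cts.Cancel();   // stop renewing before the caller completes the message
                await renewal;  // rethrows a renewal failure, if there was one
            }
        }
    }

    private static async Task RenewLoopAsync(
        Func<Task> renewLock, TimeSpan interval, CancellationToken token)
    {
        try
        {
            while (true)
            {
                await Task.Delay(interval, token);
                await renewLock();
            }
        }
        catch (OperationCanceledException)
        {
            // Expected once the work has finished and the token is cancelled.
        }
    }
}
```

The caller would wrap the two sub processes in `work`, and call session.CompleteAsync() only after RunWithRenewalAsync returns, so the lock is never renewed after it has been released. Use a positive interval; a zero interval would make the loop spin.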

Azure Function - Event Hub Trigger stopped

I've got an Azure Function app in production on an event hub trigger, it's low throughput with the function typically only being triggered once daily. It's running on an S1 plan at the moment and has a few other functions such as timer triggered and HTTP triggered.
It's been running fine but today it stopped being triggered by new messages until I restarted the app. All other functions were working just fine and responding to their associated triggers.
I've looked through App Insights and there are no reported errors or issues; it's just not doing anything.
Has anyone else had this issue or know of what may be causing it?
First of all, does your App Service have Always On enabled?
Second, have you tried testing your trigger locally, so you can be sure that there are no issues with your Event Hub?
Personally, I faced such issues when the EventProcessorHost used by EventHubTrigger was losing a lease because an additional processor had been introduced. It is also possible that, given the low throughput, it lost a lease and for some reason was not able to renew it:
As an instance of EventProcessorHost starts it will acquire as many
leases as possible and begin reading events. As the leases draw near
expiration EventProcessorHost will attempt to renew them by placing a
reservation. If the lease is available for renewal the processor
continues reading, but if it is not the reader is closed and
CloseAsync is called - this is a good time to perform any final
cleanup for that partition.
https://blogs.msdn.microsoft.com/servicebus/2015/01/21/event-processor-host-best-practices-part-2/
Nonetheless, it is worth contacting support to make sure there were no other issues.

Using MSMQ Across a Network with Multiple Users vs One User Locally

I recently created an error manager to take logged errors from clients on our network and put them into an MSMQ for processing. I have a separate Windows Service running on the server to pick items off the queue and push them into a database.
When I wrote it and tested it everything worked great; however, I neglected to consider that at deploy-time, having 100 clients all sending to a public queue might not be performant at best, and at worst there could be all kinds of collisions, it seems to me.
My thought right now is to front the MSMQ with a WCF service and make everyone go through that. The logic being that at that point I could employ some locking, etc. If I went with a service I think I could employ a private queue instead of a public one, which would be tons faster, as well.
What I'm not sure about is whether I'm overthinking it. MSMQ is pretty robust and the methods, I think, are thread-safe. Should I just leave it alone and see what happens? If I do put in the service, how much management would I need to have in place?
I recently created an error manager to take logged errors from clients
on our network and put them into an MSMQ for processing
I assume you're using System.Messaging for this? If so there is nothing at all wrong with your approach.
having 100 clients all sending to a public queue might not be
performant
MSMQ was designed from the bottom up to handle high load. Depending on the size of the individual messages and the storage threshold of the machine, a queue can hold tens of thousands of messages without any noticeable performance impact.
Because a "send" in MSMQ involves the queue manager on each machine writing messages locally before transmission (in a store and forward messaging pattern), there is almost no chance of "collisions" or any other forms of contention happening; if the sender is unable to transmit the message it simply "sends" it to a temporary local queue and then the actual transmission happens in the background and is mediated by the fault tolerant and very reliable msmq protocol.
My thought right now is to front the MSMQ with a WCF service and make
everyone go through that
This would be a valid choice if you were starting from nothing. As another poster has stated, WCF does shield you from some of the msmq-voodoo by removing the necessity to use System.Messaging. However, you've already written the code, so I see little benefit in exposing a netMsmqBinding endpoint.
If I went with a service I think I could employ a private queue
instead of a public one
As far as I understand it from your description, there's nothing to stop you using a private queue in your current scenario. In fact I'd recommend always using private queues as they're much simpler.
If I do put in the service, how much management would I need to have
in place?
You will have more management overhead with a wcf service. Because you're wrapping each end of a send-receive with the WCF stack, there is more code to spin up and therefore potentially fail. WCF stack exceptions are famously difficult to troubleshoot without full service logging enabled.
EDIT - in response to comments
I think for a private queue you have to actually be writing FROM the
machine the queue sits on, which would not work in a networked
environment
Untrue. MSMQ supports transactional reads from and writes to any private queue, regardless of whether the queue is local or remote.
This is because any time a message is sent from one machine to another in msmq, regardless of the queue address, the following happens:
Queue manager on sending machine writes the message to a temporary local "outbound" queue.
Queue manager on sending machine contacts queue manager on receiving machine and transmits the message.
Queue manager on receiving machine places the message into the destination queue.
If you are using transactions, the above steps will comprise 3 distinct transactions.
Something to remember: the safest paradigm in exchanging messages between queues on different machines is send remote, read local.
So this means when you send a message, you're instructing msmq to send to a remote queue address. However, when someone sends something to you, they must do the same. So you end up reading only from local queues, and sending only to remote queues.
This way you get the most reliable messaging setup, because when reading, a local queue will always be available.
Try it! I've been using msmq for cross machine communication for nearly 10 years and I've never used a public queue. I don't even know what they're for!
I would expose a WCF "IsOneWay" method.
And then host your WCF in IIS.
The IsOneWay will wire up to MSMQ.
This way...you have the robustness of IIS hosting. You can expose any endpoint you want.
But eventually the request makes it to MSMQ.
One of the reasons is the ease of using msmq with wcf. Having written and used msmq "pre-wcf", I found the code (pulling messages off the queue and error handling) to be difficult and problematic. That alone would push me to WCF hosting.
And as you mention, the security around a local-queue is much easier to deal with.
Bottom line, let WCF handle the msmq-voodoo for you.
Simple example below.
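using System.ServiceModel;

[ServiceContract]
public interface IMyControllerController
{
    // One-way: the client does not wait for a reply, so the call can be queued.
    [OperationContract(IsOneWay = true)]
    void SubmitRequest(MyObject obj);
}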
http://msdn.microsoft.com/en-us/library/ms733035%28v=vs.110%29.aspx
http://msdn.microsoft.com/en-us/library/system.servicemodel.operationcontractattribute.isoneway%28v=vs.110%29.aspx
What happens in WCF to methods with IsOneWay=true at application termination
http://blogs.msdn.com/b/tomholl/archive/2008/07/12/msmq-wcf-and-iis-getting-them-to-play-nice-part-1.aspx

Properly handle Azure MessagingCommunicationException?

I've got several long-running processes that listen on the same Azure Service Bus topic. After an extended time of running (usually a few days), I get one of these exceptions in one of the processes (and they all seem to stop working). The message itself and the documentation suggest that the answer is to retry the connection. At first I was just trying to create a new TopicClient, but then found out the actual connection was held by the MessagingFactory. I have now tried creating a whole new MessagingFactory as well, but that doesn't seem to be working either.
What is the proper way to handle this exception? An example (even pseudocode) would be great.

Windows Azure Service Bus Queues: Throttling and TOPAZ

Today at a customer we analysed the logs of the previous weeks and we found the following issue regarding Windows Azure Service Bus Queues:
The request was terminated because the entity is being throttled.
Please wait 10 seconds and try again.
After verifying the code I told them to use the Transient Fault Handling Application Block (TOPAZ) to implement a retry policy like this one:
var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));
var retryPolicy = new RetryPolicy<ServiceBusTransientErrorDetectionStrategy>(retryStrategy);
The customer answered:
"Ah that's great, so it will also handle the fact that it should wait
for 10 seconds when throttled."
Come to think about it, I never verified if this was the case or not. I always assumed this was the case. In the Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling assembly I looked for code that would wait for 10 seconds in case of throttling but didn't find anything.
Does this mean that TOPAZ isn't sufficient to create resilient applications? Should this be combined with some custom code to handle throttling (ie: wait 10 seconds in case of a specific exception)?
As far as throttling is concerned, Topaz provides a set of built-in retry strategies, including:
- Fixed interval
- Incremental intervals
- Random exponential back-off intervals
You can also write your own custom retry strategy and plug it into Topaz.
Also, as Brent indicated, the 10-second wait is not mandatory. In many cases, retrying immediately may succeed without the need to wait. By default, Topaz performs the first retry immediately before using the retry intervals defined by the strategy.
For more info, see Ch.6 of the "Building Elastic and Resilient Cloud Apps" Developer's Guide, also available as epub/mobi/pdf from here.
If you have suggestions/feature requests for Topaz, please submit them via the uservoice.
As I recall, the "10 second" wait isn't a requirement. Additionally, I believe TOPAZ also has backoff capabilities which would help you overcome this.
On a personal note, I'd argue that simply utilizing something like TOPAZ is not sufficient to create a truly resilient solution. Resiliency goes beyond just throttling on a single connection point; you'll also need to be able to handle failover to a redundant endpoint, which TOPAZ won't do.
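To see how an incremental strategy relates to the 10-second hint, it can help to write out the waits. The sketch below is library-free and does not call Topaz: RetrySchedule, Incremental, and WithThrottleFloor are hypothetical names, and it assumes each wait is computed as the initial interval plus the increment times the retry index, with the first retry made immediately. The optional floor illustrates the kind of custom code the question asks about; per the answers above, it is not required.

```csharp
using System;
using System.Collections.Generic;

public static class RetrySchedule
{
    // Waits for an incremental strategy: initial + increment * n for n = 0, 1, ...
    // When fastFirstRetry is true, the first retry is made with no wait.
    // The formula is an assumption for illustration, not taken from Topaz.
    public static List<TimeSpan> Incremental(
        int retryCount, TimeSpan initial, TimeSpan increment, bool fastFirstRetry)
    {
        var delays = new List<TimeSpan>();
        for (int n = 0; n < retryCount; n++)
        {
            TimeSpan delay = initial + TimeSpan.FromTicks(increment.Ticks * n);
            delays.Add(n == 0 && fastFirstRetry ? TimeSpan.Zero : delay);
        }
        return delays;
    }

    // Raises a wait to at least `floor` when the failure was a throttling error.
    public static TimeSpan WithThrottleFloor(TimeSpan delay, bool throttled, TimeSpan floor)
    {
        return throttled && delay < floor ? floor : delay;
    }
}
```

With the values from the question (5 retries, 1 second initial, 2 second increment) and an immediate first retry, this sketch yields waits of 0, 3, 5, 7, and 9 seconds, so no single wait reaches 10 seconds unless a floor is applied.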

Resources