I'm new to Azure Service Bus and I'm trying to establish a transactional strategy for queuing messages. Since SQL Azure doesn't support MSDTC and, therefore, with a distributed TransactionScope, I can't make use of it. So, I have a Unit of Work that can handle my database transactions manually.
The problem is that I can only find people using TransactionScope to handle both database and Azure Service Bus operations. Is there any other magical way to achieve transactions on Service Bus without using TransactionScope?
Thanks.
If you will be using cloud hosting or service like azure service bus, you should start considering to give up on two phase commits (2PC) or distributed transactions (DTC).
Instead, use a per-resource transactions (i.e. a transaction for a SQL command or a transaction for a Service Bus operation) carefully. And avoid transactions that cross that resource boundary.
You can then knit those resources/components operations together using reliable messaging and patterns like sagas, etc. for workflow management and error compensation. And scale out from there.
2PC in the cloud is hard for all sorts of reasons (but not impossible, you still can use IaaS).
2PC, as implemented by DTC, effectively depends on the coordinator and its log and connectivity to the coordinator to be very highly available. It also depends on all parties cooperating on a positive outcome in an expedient fashion. To that end, you need to run DTC in a failover cluster, because it’s the Achilles heel of the whole system and any transaction depends on DTC clearing it.
I'll quote a great example here of how to think of a 2PC transaction as a series of messages/actions and compensations
The grand canonical example for 2PC transactions is a bank account transfer. You debit one account and credit another.
These two operations need to succeed or fail together because otherwise you are either creating or destroying money (which is illegal, by the way). So that’s the example that’s very commonly used to illustrate 2PC transactions.
The catch is – that’s not how it really works, at all. Getting money from one bank account to another bank account is a fairly complicated affair that touches a ton of other accounts. More importantly, it’s not a synchronous fail-together/success-together scenario.
Instead, principles of accounting apply (surprise!). When a transfer is initiated, let’s say in online banking, the transfer is recorded in form of a message for submission into the accounting system and the debit is recorded in the account as a ‘pending’ transaction that affects the displayed balance.
From the user’s perspective, the transaction is ’done’, but factually nothing has happened, yet. Eventually, the accounting system will get the message and start performing the transfer, which often causes a cascade of operations, many of them yielding further messages, including booking into clearing accounts and notifying the other bank of the transfer.
The principle here is that all progress is forward. If an operation doesn’t work for some technical reason it can be retried once the technical reason is resolved.
If operation fails for a business reason, the operation can be aborted – but not by annihilating previous work, but by doing the inverse of previous work. If an account was credited, that credit is annulled with a debit of the same amount.
For some types of failed transactions, the ‘inverse’ operation may not be fully symmetric but may result in extra actions like imposing penalty fees.
In fact, in accounting, annihilating any work is illegal – ‘delete’ and ‘update’ are a great way to end up in prison.
You can use TransactionScope. It works pretty well. Even though you have sent a message, but something other (database update) fails, then the message will be not published to the queue.
using var scope = new TransactionScope(TransactionScopeOption.RequiresNew, TransactionScopeAsyncFlowOption.Enabled);
try
{
var messageBody = "This is my message";
var message = new Message(Encoding.UTF8.GetBytes(messageBody));
await _queueClient.SendAsync(message);
await _myOtherService.FailPotentially(); // if this method fails then message will ne rolled back
scope.Complete();
}
catch (Exception exception)
{
scope.Dispose();
_logger.LogError(" ... ", exception);
throw;
}
Based on ServiceBus documentation you must use standard pricing instad of basic
Related
I have a serverless function that receives orders, about ~30 per day. This function is depending on a third-party API to perform some additional lookups and checks. However, this external endpoint isn't 100% reliable and I need to be able to store order requests if the other API isn't available for a couple of hours (or more..).
My initial thought was to split the function into two, the first part would receive orders, do some initial checks such as validating the order, then post the request into a message queue or pub/sub system. On the other side, there's a consumer that reads orders and tries to perform the API requests, if the API isn't available the orders get posted back into the queue.
However, someone suggested to me to simply use an Azure Durable Function for the requests, and store the current backlog in the function state, using the Aggregator Pattern (especially since the API will be working find 99.99..% of the time). This would make the architecture a lot simpler.
What are the advantages/disadvantages of using one over the other, am I missing any important considerations?
I would appreciate any insight or other suggestions you have. Let me know if additional information is needed.
You could solve this problem with Durable Task Framework or Azure Storage or Service Bus Queues, but at your transaction volume, I think that's overcomplicating the solution.
If you're dealing with ~30 orders per day, consider one of the simpler solutions:
Use Polly, a well-supported resilience and fault-tolerance framework.
Write request information to your database. Have an Azure Function Timer Trigger read occasionally and finish processing orders that aren't marked as complete.
Durable Task Framework is great when you get into serious volume. But there's a non-trivial learning curve for the framework.
I'm thinking how can I handle sending events when suddenly message broker go down. Please take a look at this code
using (var uow = uowProvider.Create())
{
...
...
var policy = offer.Buy(customer);
uow.Policies.Add(policy);
// DB changes are saved here! but what would happen if...
await uow.CommitChanges();
// ...eventPublisher throw an exception?
await eventPublisher.PublishMessage(PolicyCreated(policy));
return true;
}
IMHO if eventPublisher throw exception the event PolicyCreated won't be published. I don't know how to deal with this situation. The event must be published in system. I suppose that only good solution will be creating some kind of retry mechanism but I'm not sure...
I would like to elaborate a bit on the answers provided by both #Imran Arshad and #VoiceOfUnreason which are, of course, correct.
There are basically 3 patterns when it comes to publishing messages:
exactly once delivery (requires distributed transactions)
at most once delivery (no distributed transaction but may miss messages - like the actor model)
at least once delivery (no distributed transaction but may have duplicate messages)
The following is all in terms of your example.
For exactly once delivery both the database and the queue would need to provide the ability to enlist in distributed transactions. Some queues do not proivde this functionality out-of-the-box (like RabbitMQ) and even though it may be possible to roll your own it may not be the best option. Distributed transactions are typically quite slow.
For at most once delivery we have to accept that we may miss messages and I'm guessing that in most use-cases this is quite troublesome. You would get around this by tracking the progress and picking up the missed messages and resending them if required.
For at least once delivery we would need to ensure that the messages are idempotent. When we get a duplicate messages (usually quite an edge case) they should be ignored or their outcome should be the same as the initial message processed.
Now, there are a couple of ways around your issue. You could start a database transaction and make your database changes. Before you comit you perform the message sending. Should that fail then your transaction would be rolled back. That works fine for sending a single message but in your case some subscribers may have received a message. This complicates matters as all your subscribers need to receive the message or none of them get to receive it.
You could have your subscriber check whether the state is indeed true and whether it should continue processing. This places a burden on the subscriber and introduces some coupling. It could either postpone the action should the state not allow processing, or ignore it.
Another option is that instead of publishing the event you send yourself a command that indicates completion of the step. The command handler would perform the publishing and retry until all subscriber queues receive the message. This would require the relevant subscribers to ignore those messages that they had already processed (idempotence).
The outbox is a store-and-forward approach and will eventually send the message to all subscribers. You could have your outbox perhaps be included in the database transaction. In my Shuttle.Esb service bus one of the folks that used it came across a weird side-effect that I had not planned. He used a sql-based queue as an outbox and the queue connection was to the same database. It was therefore included in the database transasction and would roll back with all the other changes if not committed. Apologies for promoting my own product but I'm sure other service bus offerings may have the same functionality.
There are therefore quite a few things to consider and various techniques to mitigate the risk of a queue outage. I would, however, move the queue interaction to before the database commit.
For reliable system you need to save events locally. If your broker is down you have to retry and publish event.
There are many ways to achieve this but most common is outbox pattern. Just like your mail box your event/message stays locally and you keep retrying until it's sent and you mark the message published in your local DB.
you can read more about here Publish Events
You'll want to review Udi Dahan's discussion of Reliable Messaging without Distributed Transactions.
But very roughly, the PolicyCreated event becomes part of the unit of work; either because it is saved in the Policy representation itself, or because it is saved in an EventRepository that participates in the same transaction as the Policies repository.
Once you've captured the information in your database, retry the publish is relatively straight forward - read the events from the database, publish, optionally mark the events in the database as successfully published so that they can be cleaned up.
The Service Bus documentation states that "the At-Most-Once semantic can be supported by using session state to store the application state and by using transactions to atomically receive messages and update the session state." "Session" here appears to refer to Service Bus' messaging sessions, which include the ability to store arbitrary state. This mechanism lets you enroll state updates in transactions along with operations on messages.
I see how this can be used to reliably maintain the state of an application that is using message sessions. If you can update application state and complete a message in the same transaction, a properly-implemented app could potentially die anywhere in execution, and on resume would be guaranteed to inherit a state that results in successful, in-order continued session processing (sample code is here, though strangely it doesn't actually use transactions, although I see how it could and what that would accomplish).
What I don't see is how any of this translates to "at-most-once" delivery. Nothing about Service Bus, including updates to session state, can be enrolled in a distributed transaction. So what exactly does "at-most-once" mean, and what does it accomplish? And what distinguishing feature of Service Bus allows it to support "at-most-once" delivery when Azure Storage queues do not?
After looking at your post and reading through the doc, I realized it wasn't really explaining at-most-once.
So I reached out to the concerned team and confirmed that it is indeed incorrect. A PR has been raised to fix the doc accordingly.
Instead, sessions and transactions together provide a higher level of consistency which is commonly referred to as exactly-once processing (which can't really be achieved just by the message broker itself but along with a receiver capable of deduplication).
PS: at-most-once is indeed possible by simply using the ReceiveAndDelete mode
Having just read Vaughn Vernon's effective aggregate design, I'm wondering about failures related to event publishing.
In the given example at page 9 (page 3 of the PDF), we call DomainEventPublisher.publish(). The event being published allows other aggregates to execute their behaviours.
What I'm wondering is: What happens if DomainEventPublisher.publish() fails ? What happens if DomainEventPublisher.publish() succeeds, but the transaction fails ?
How implementations handle these two cases ?
DomainEventPublisher.publish() is synchronous. You'd setup a generic handler (handles all events) which stores the events in the same database transaction as the business process, which means your event storage must have the ability to be transactionnal with whatever other storage mechanism you rely on to store the state of your aggregates.
Once events have been written on disk transactionnaly, you can then put them on a message queue for asynchronous delivery.
Are there other known ways to do it?
Well, rather than using a static DomainEventPublisher you could record events in a collection on the AR, just like in event sourcing and then implement a centralised mechanism to store them (e.g. transaction hooks, using aspects, etc.).
What happens if DomainEventPublisher.publish() succeeds, but the
transaction fails?
In this case I am against Vernon approach. I prefer to return the events to the application service. This way I can persist the changes performed by the aggregate using a transaction (if needed) and, if everything is Ok, I will publish the event. This also helps to keep the business layer entirely clean and pure.
In a few words; if the transaction fails then no event is raised.
What happens if DomainEventPublisher.publish() fails?
A domain event never fails, by business rules, because it's a notification of things that happened. If an aggregate said Yes to the operation and return a event expressing the business changes; then nothing in the world should say that this operation can not be done or has to be undone.
If the event fails by infrastructure then you need to have the tools to re-raise it (automatically or manually) when the outage is fixed and eventually archive the consistency in your system. Take a look at NServiceBus. It provides retries, error queues, logs and so on to never loose the events.
If the message system is down you have at least event logs that you can use to re-rise them into the message system.
I want to use event sourcing and CQRS, and so I need projections (I hope I use the proper term) to update my query databases. How can I handle database errors?
For example one of my query cache databases is not available, but I already updated the others. So the not-available database won't be in snyc with the others when it comes back to business. How will it know that it have to run for instance the last 10 domain events from the event storage? I guess I have to store information about the current state of the databases, but what if that database state storage fails? Any ideas, best practices how to solve this kind of problems?
In either case, you must tell your messaging bus that the processing failed and it should redeliver the event later, in the hope that the database will be back online then. This is essentially why we are using message bus systems with an "at least once"-delivery guarantee.
For transactional query databases, you should also rollback the transaction, of course. If your query database(s) do not support transactions, you must make sure on the application side that updates are idempotent - i.e., if your event arrives on the next delivery attempt, your projection code and/or database must be designed such that the repeated processing of the event does not harm the state of the database. This is sometimes trivial to achieve (e.g., when the event leads to a changed person's name in the projection), but often not-so-trivial (e.g., when the projection simply increments view counts). But this is what you pay for when you are using non-transactional databases.