Transaction handling with RabbitMQ and Spring AMQP - spring-transactions

I am trying to understand a few things here. My requirement is: I want to store a record in the DB and send a message to a queue, and if the same method then throws an exception, I want neither the message to be sent nor the DB transaction to commit.
I thought of using Spring transactions, but since there are two different resources I considered JTA with something like Atomikos to synchronize them - but then I read that RabbitMQ does not support 2PC, XA, etc.
Anyway, I went ahead and tried it first without Atomikos. All I did was make sure my channel was transacted and let the @Transactional annotation take care of the rest - see the sample code below. I didn't add anything special to the pom.
Now my question is: how is this working, how is it different from 2PC, what can go wrong with this approach, and what situations can break eventual consistency using this method? And, surprisingly, why didn't I have to use a third-party JTA implementation? If all is good with this, it looks to me like an eventual-consistency guarantee when using RabbitMQ and a DB with Spring goodies - great for microservices :)
If this is not a good solution, what are the alternatives? I would like to avoid a worker process etc. if possible for eventual consistency.
@Bean
public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
    RabbitTemplate rabbitTemplate = new RabbitTemplate(connectionFactory);
    rabbitTemplate.setChannelTransacted(true);
    return rabbitTemplate;
}
@GetMapping
@Transactional
public void sampleEndpoint(@RequestParam boolean throwException) {
    Customer a = new Customer();
    a.setCustomerName("XYZ");
    customerRepository.save(a);
    rabbitTemplate.convertAndSend("txtest", "Test");
    if (throwException)
        throw new RuntimeException();
}
I used the postgres dependency for the above example, with Spring Boot 1.5.7.

I suggest you read Dave Syer's article: Distributed transactions in Spring, with and without XA.
You need to start the Rabbit transaction before the database transaction, so that the Rabbit transaction is synchronized with the DB transaction; it commits very soon after the DB tx, and rolls back if the DB tx rolls back.
There is a small possibility that the DB tx commits successfully but the Rabbit tx rolls back. This is called "Best Effort 1PC" in the article. You need to deal with the small possibility of duplicate messages.
You don't show all your configuration but it appears your Rabbit tx will commit before the DB, which is probably not what you want.
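Since Best Effort 1PC leaves a small window for duplicate deliveries, the receiving side usually deduplicates. Here is a minimal sketch of that idea, assuming a hypothetical processed_messages table and a populated messageId property (none of this is from the original post):

import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class DedupingListener {

    private final JdbcTemplate jdbc;

    public DedupingListener(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Transactional
    @RabbitListener(queues = "txtest")
    public void handle(Message message) {
        String messageId = message.getMessageProperties().getMessageId();
        // Postgres syntax; the insert is skipped if this id was seen before.
        int inserted = jdbc.update(
                "INSERT INTO processed_messages (message_id) VALUES (?) ON CONFLICT DO NOTHING",
                messageId);
        if (inserted == 0) {
            return; // duplicate delivery, already processed - skip
        }
        // ... business logic runs here in the same DB transaction,
        // so the dedup record and the side effects commit together ...
    }
}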

Regarding the "how is this working" question, this quote from the Spring AMQP documentation clarifies it:
If there is already a transaction in progress when the framework is sending or receiving a message, and the channelTransacted flag is true, then the commit or rollback of the messaging transaction will be deferred until the end of the current transaction. If the channelTransacted flag is false, then no transaction semantics apply to the messaging operation (it is auto-acked).
My understanding is that, for your use case, you do not even need to configure a ChainedTransactionManager in order to implement Best Effort 1PC. @Transactional will be enough, and the rabbit tx will commit right after the DB tx.

Related

How to handle publishing event when message broker is out?

I'm thinking about how I can handle sending events when the message broker suddenly goes down. Please take a look at this code:
using (var uow = uowProvider.Create())
{
    ...
    ...
    var policy = offer.Buy(customer);
    uow.Policies.Add(policy);

    // DB changes are saved here! But what would happen if...
    await uow.CommitChanges();
    // ...eventPublisher throws an exception?
    await eventPublisher.PublishMessage(PolicyCreated(policy));

    return true;
}
IMHO if eventPublisher throws an exception, the PolicyCreated event won't be published. I don't know how to deal with this situation. The event must be published in the system. I suppose the only good solution would be creating some kind of retry mechanism, but I'm not sure...
I would like to elaborate a bit on the answers provided by both @Imran Arshad and @VoiceOfUnreason, which are, of course, correct.
There are basically 3 patterns when it comes to publishing messages:
exactly once delivery (requires distributed transactions)
at most once delivery (no distributed transaction but may miss messages - like the actor model)
at least once delivery (no distributed transaction but may have duplicate messages)
The following is all in terms of your example.
For exactly once delivery both the database and the queue would need to provide the ability to enlist in distributed transactions. Some queues do not provide this functionality out of the box (RabbitMQ among them), and even though it may be possible to roll your own, it may not be the best option. Distributed transactions are typically quite slow.
For at most once delivery we have to accept that we may miss messages, and I'm guessing that in most use cases this is quite troublesome. You would get around it by tracking progress and picking up missed messages to resend if required.
For at least once delivery we would need to ensure that the messages are idempotent. When we get a duplicate message (usually quite an edge case) it should be ignored, or its outcome should be the same as that of the initial message processed.
Now, there are a couple of ways around your issue. You could start a database transaction and make your database changes; before you commit, you perform the message sending. Should the send fail, your transaction would be rolled back. That works fine for sending a single message, but in your case some subscribers may already have received a message. This complicates matters, as either all your subscribers must receive the message or none of them should.
You could have your subscriber check whether the state is indeed true and whether it should continue processing. This places a burden on the subscriber and introduces some coupling. It could either postpone the action should the state not allow processing, or ignore it.
Another option is that instead of publishing the event you send yourself a command that indicates completion of the step. The command handler would perform the publishing and retry until all subscriber queues receive the message. This would require the relevant subscribers to ignore those messages that they had already processed (idempotence).
The outbox is a store-and-forward approach and will eventually send the message to all subscribers. You could perhaps have your outbox included in the database transaction. In my Shuttle.Esb service bus, one of the folks using it came across a weird side-effect that I had not planned: he used a SQL-based queue as an outbox, and the queue connection was to the same database. It was therefore included in the database transaction and would roll back with all the other changes if not committed. Apologies for promoting my own product, but I'm sure other service bus offerings may have the same functionality.
There are therefore quite a few things to consider and various techniques to mitigate the risk of a queue outage. I would, however, move the queue interaction to before the database commit.
For a reliable system you need to save events locally. If your broker is down, you have to retry and publish the event.
There are many ways to achieve this, but the most common is the outbox pattern. Just like your mailbox, your event/message stays local and you keep retrying until it's sent, at which point you mark the message as published in your local DB.
You can read more about it here: Publish Events
You'll want to review Udi Dahan's discussion of Reliable Messaging without Distributed Transactions.
But very roughly, the PolicyCreated event becomes part of the unit of work; either because it is saved in the Policy representation itself, or because it is saved in an EventRepository that participates in the same transaction as the Policies repository.
Once you've captured the information in your database, retrying the publish is relatively straightforward: read the events from the database, publish, and optionally mark the events in the database as successfully published so that they can be cleaned up.
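As a rough illustration of that capture-then-publish flow, here is a minimal Java/Spring sketch of an outbox; the table names, polling interval, and relay class are all invented for the example, not taken from the answer:

import java.util.List;
import java.util.Map;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class OutboxRelay {

    private final JdbcTemplate jdbc;
    private final RabbitTemplate rabbitTemplate;

    public OutboxRelay(JdbcTemplate jdbc, RabbitTemplate rabbitTemplate) {
        this.jdbc = jdbc;
        this.rabbitTemplate = rabbitTemplate;
    }

    // The business change and the event row commit or roll back together.
    @Transactional
    public void savePolicyWithEvent(String policyJson, String eventJson) {
        jdbc.update("INSERT INTO policies (payload) VALUES (?)", policyJson);
        jdbc.update("INSERT INTO outbox (payload, published) VALUES (?, false)", eventJson);
    }

    // Store-and-forward: publish pending events, then mark them published.
    // If publishing throws, the rows stay pending and are retried on the next
    // run, so subscribers see at-least-once delivery (duplicates possible).
    @Scheduled(fixedDelay = 5000)
    public void forwardPending() {
        List<Map<String, Object>> rows = jdbc.queryForList(
                "SELECT id, payload FROM outbox WHERE published = false ORDER BY id");
        for (Map<String, Object> row : rows) {
            rabbitTemplate.convertAndSend("events", (String) row.get("payload"));
            jdbc.update("UPDATE outbox SET published = true WHERE id = ?", row.get("id"));
        }
    }
}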

How to control idempotency of messages in an event-driven architecture?

I'm working on a project where DynamoDB is used as the database, and every use case of the application is triggered by a message published after an item has been created/updated in the DB. Currently the code follows this approach:
repository.save(entity);
messagePublisher.publish(event);
Udi Dahan has a video called Reliable Messaging Without Distributed Transactions where he talks about a solution for situations where a system can fail right after saving to the DB but before publishing the message, since messages are not part of a transaction. But in his solution I think he assumes a SQL database, as the process involves saving, as part of the transaction, the correlationId of the message being processed, the entity modification, and the messages that are to be published. Using a NoSQL DB, I cannot think of a clean way to store the information about the messages.
One solution would be using DynamoDB streams and subscribing to the published events, using either a Lambda or another service to transform them into domain-specific events. My problem with this is that I wouldn't be able to send the messages from the domain logic; the logic would be spread across the service processing the message and the Lambda/service reacting to changes, and the solution would be platform-specific.
Is there any other way to handle this?
I can't offer a specific solution based on DynamoDB since I've never used that engine, but I've built an event-driven system on top of MongoDB, so I can share my learnings; you might find them useful for your case.
You can have different approaches:
1) Based on an event sourcing approach, you can just save the events/messages your use case produces within a transaction. In Mongo, when you are just inserting/appending new items to the same collection, you can ensure atomicity. Anyway, even if the engine does not provide that capability, the write operation is so centralized that you reduce the possibility of an error to a minimum.
Once all the events are stored, you can then consume them, project them to a given state, and persist the updated state in another transaction.
Here you have to deal with eventual consistency, as data will be stale in your read model until you have projected the events.
2) Another approach is applying the UnitOfWork pattern, where you cache all the write operations (insert/update/delete) needed to save both the events and the state. Once your use case finishes, you execute all the cached operations against the database (flush). This way, although the operations are not atomic, you are again centralizing them enough to minimize errors.
Of course, the best option is to use an ACID database if you require that capability; any other approach will be a workaround to get close to it.
About publishing the events: I don't know if you mean they are published to a messaging transport such as RabbitMQ, Kafka, etc., but that should be a background process in which you fetch the events from the DB and publish them, in order to avoid a two-phase commit spanning the DB and the broker.
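To make the second approach concrete, here is a bare-bones Java sketch of such a unit of work (the class and method names are mine, not from the answer): write operations are buffered and executed in one place rather than scattered across the use case.

import java.util.ArrayList;
import java.util.List;

public class UnitOfWork {

    // Write operations (insert/update/delete) registered by the use case.
    private final List<Runnable> pendingWrites = new ArrayList<>();

    public void register(Runnable write) {
        pendingWrites.add(write);
    }

    // Flush: not atomic, but all writes are centralized here, which
    // narrows the window in which a partial failure can occur.
    public void commit() {
        for (Runnable write : pendingWrites) {
            write.run();
        }
        pendingWrites.clear();
    }
}

A use case would then call something like uow.register(() -> events.insertOne(eventDoc)) for both the events and the updated state, and uow.commit() at the very end.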

Transaction Synchronization in Spring Kafka

I want to synchronize a kafka transaction with a repository transaction:
@Transactional
public void syncTransaction() {
    myRepository.save(someObject);
    kafkaTemplate.send(someEvent);
}
Since this merge (https://github.com/spring-projects/spring-kafka/issues/373), and according to the docs, this is possible. Nevertheless, I have problems understanding and implementing that feature.
Looking at the example in https://docs.spring.io/spring-kafka/reference/html/#transaction-synchronization I have to create a MessageListenerContainer to listen to my own events.
Do I still have to send my events using the KafkaTemplate?
Does the MessageListenerContainer prohibit the sending to the broker?
And if I understand correctly, the kafkaTemplate and the kafkaTransactionManager have to use the same producerFactory, in which I have to enable transactions by setting a transactionIdPrefix. And in my example I have to set the TransactionManager of the messageListenerContainer to the DataSourceTransactionManager. Is that correct?
From my perspective it looks weird that I send an event via the kafkaTemplate, listen to my own event, and forward the event using the kafkaTemplate again.
It would really help me to get an example of a simple synchronization of a Kafka transaction with a repository transaction, along with an explanation.
If the listener container is provisioned with a KafkaTransactionManager, the container will create a producer which will be used by any downstream kafka template and the container will send the offsets to the transaction for you.
If the container has some other transaction manager, the container can't send the offsets since it doesn't have access to the producer (or template).
Another solution is to annotate your method with @Transactional (with the datasource TM) and configure the container with a Kafka TM.
That way, your DB tx will commit just before the thread returns to the container, which will then send the offsets to the Kafka transaction and commit it.
See the framework test cases for examples.
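A rough sketch of that last arrangement (Kafka TM on the container, datasource TM on the @Transactional method). The bean and topic names are illustrative, and the exact setter on ContainerProperties varies a bit between spring-kafka versions:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.transaction.KafkaTransactionManager;

@Configuration
public class KafkaTxConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory,
            KafkaTransactionManager<String, String> kafkaTransactionManager) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // The container runs the listener inside a Kafka transaction and
        // sends the consumed offsets to that transaction.
        factory.getContainerProperties().setTransactionManager(kafkaTransactionManager);
        return factory;
    }
}

// In some @Component: the DB tx (datasource TM) commits when the method
// returns; the container then sends the offsets to the Kafka tx and commits it.
@KafkaListener(topics = "someTopic")
@Transactional("dataSourceTransactionManager")
public void onEvent(String event) {
    myRepository.save(event);
    kafkaTemplate.send("downstreamTopic", event);
}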
@Eike Behrends, to have a DB + Kafka transaction you can use a ChainedTransactionManager and define it this way:
@Bean
public KafkaTransactionManager kafkaTransactionManager() {
    KafkaTransactionManager ktm = new KafkaTransactionManager(producerFactory());
    ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
    return ktm;
}

@Bean
@Primary
public JpaTransactionManager transactionManager(EntityManagerFactory em) {
    return new JpaTransactionManager(em);
}

@Bean(name = "chainedTransactionManager")
public ChainedTransactionManager chainedTransactionManager(JpaTransactionManager jpaTransactionManager,
        KafkaTransactionManager kafkaTransactionManager) {
    return new ChainedTransactionManager(kafkaTransactionManager, jpaTransactionManager);
}
You need to annotate your transactional db+kafka methods with @Transactional("chainedTransactionManager"); see the usage sketch below.
(you can see the issue on spring-kafka project : https://github.com/spring-projects/spring-kafka/issues/433 )
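For example, the annotated method could look like this (the repository, template, and topic names are made up):

// Both resources enlist; ChainedTransactionManager commits in reverse
// declaration order, so here the JPA tx commits first, then the Kafka tx.
@Transactional("chainedTransactionManager")
public void saveAndPublish(MyEntity entity) {
    myRepository.save(entity);
    kafkaTemplate.send("someTopic", entity.toEvent());
}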
You say :
From my perspective it looks weird that I send an event via
kafkaTemplate, listen to my own event and forward the event using the
kafkaTemplate again.
Have you tried this? If so, can you provide an example, please?
For achieving your target you should use a different "eventually consistent" approach like CDC (Change Data Capture). There are no atomic transactions between Kafka writes and any other system (e.g. a database) - aka XA transactions. It is a complete paradigm shift when you have distributed services (some call them microservices) that, in your case, probably communicate by producing to / consuming from Kafka topics.
TL;DR: just use upsert / merge.
I accidentally came across this old topic, and after so many years people still struggle with it.
I just want to share the simplest and most native approach to dealing with systems like Kafka.
The real issue that brings people here for an answer is the old approach of distributed transactions. Most want to synchronize non-transactional Kafka (Kafka names its functionality "transactions", but it is actually "special") with some ACID database.
If your service works within an idempotent environment, everything downstream should be idempotent too.
Just make sure your operations against the underlying storage are idempotent; the simplest approach is upsert / merge (depending on the storage), as sketched below.
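For example, with a Postgres-style upsert (table and column names invented for illustration), replaying the same event leaves the row in the same state:

// Idempotent projection of an event into storage: running this twice with
// the same arguments yields the same row, so duplicates are harmless.
// 'jdbc' is a Spring JdbcTemplate; the schema is hypothetical.
public void apply(String eventId, String accountId, long balance) {
    jdbc.update(
            "INSERT INTO account_balance (account_id, balance, last_event_id) " +
            "VALUES (?, ?, ?) " +
            "ON CONFLICT (account_id) DO UPDATE " +
            "SET balance = EXCLUDED.balance, last_event_id = EXCLUDED.last_event_id",
            accountId, balance, eventId);
}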
P.s. CDC is a thing, but it requires much more labor and is unnecessary in most typical cases.
MORE:
If you want to dig into why Kafka "transactions" are special, here are good starting points (explained in terms of EOS):
for newer versions: https://www.youtube.com/watch?v=j0l_zUhQaTc
for older: https://www.youtube.com/watch?v=zm5A7z95pdE
EDIT
It's very interesting that this answer got downvoted... Just check this issue and its comments/related issues: https://github.com/spring-projects/spring-data-commons/issues/2232 - that's why one would not want to use ChainedTransactionManager for business-critical transactions (it can't act as a real 2PC by design).

C# MassTransit: how to handle an exception when the queue is not available or down

I am using MassTransit with RabbitMQ to consume messages from a queue. Can anyone tell me how to handle the exception when the queue is down or not available to get the message? The following is my setup:
var busControl = Bus.Factory.CreateUsingRabbitMq(cfg =>
{
    var host = cfg.Host(new Uri(configManager.RabbitMqUrl), h =>
    {
        h.Username(configManager.RabbitMqUserName);
        h.Password(configManager.RabbitMqPassword);
    });

    cfg.ReceiveEndpoint(host, RabbitMqConstants.Change, e =>
    {
        e.UseRetry(Retry.Immediate(configManager.ProcessorRetryNumber));
        e.Handler<ChangeDetected>(context =>
        {
            var task = Task.Run(() => consumer.Consume(context));
            return task;
        });
    });
});
Thanks
In our in-house RabbitMQ messaging implementation, we approached/solved the publishing side of this (broker not available when you want to publish) in two ways:
[1] We use Polly to asynchronously orchestrate a limited number of publishing retries (with a delay between tries). This overcomes situations where loss of connectivity to the broker is a minor network blip.
[2] If all publish retries fail, we use a 'message hospital' concept: we store enough detail about the failed-to-publish message in an alternative store (database, with additional failover to a local file store) so that we can republish the failed messages later, if desired. A variant on 'store and forward' (we can republish in bulk, but we also allow manual intervention to choose whether to republish).
It all depends on how important 'never losing a message' is to you. Some redundancy of RabbitMQ brokers (clustering, as Chris Patterson suggests, or federation) is also an obvious step. Clustering/federation gives you protection if you lose, or want to do maintenance on, one or some of your brokers. The resilience strategies [1] and [2] above give you protection if for some reason the message publisher can't see any RabbitMQ broker (for example, a network fault nearer the publisher).
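The original implementation uses Polly in C#; purely to illustrate the shape of the idea, here is a minimal Java sketch of retry-then-hospital, with all names hypothetical:

// Try to publish a few times with a growing delay; if every attempt fails,
// store the message in a 'hospital' table for later republishing.
// rabbitTemplate and jdbc are injected Spring beans; the schema is invented.
public void publishWithFallback(String payload) throws InterruptedException {
    for (int attempt = 1; attempt <= 3; attempt++) {
        try {
            rabbitTemplate.convertAndSend("events", payload);
            return; // published successfully
        } catch (org.springframework.amqp.AmqpException e) {
            Thread.sleep(500L * attempt); // simple backoff between tries
        }
    }
    jdbc.update("INSERT INTO message_hospital (payload) VALUES (?)", payload);
}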
For receiving messages, MassTransit will automatically reconnect to the broker (RabbitMQ) when it comes back online. For sending messages, if your application is unable to connect to the broker to send, that's another problem entirely.
When using messaging in applications, it often becomes the single most important aspect of your infrastructure. So if you need high availability, then a cluster setup may be in your future (there are articles on clustering RabbitMQ out there).
MassTransit does not have any store-and-forward concepts in it, the broker needs to be available. While a few options have been discussed, nothing is concrete at this point nor generally available.
After reading its documentation, I realized that MassTransit just does not handle situations where the producer fails to send/publish to the MQ or the consumer fails to send back the ACK.
So I went with another tool, CAP, which implements a local transaction table. You can put the send-message action within the same local DB transaction as your business code. The con is that CAP does not have sagas implemented yet.
Otherwise, you have to implement a durable outbox pattern with a local transaction table yourself.

Service Fabric actors that receive events from other actors

I'm trying to model a news post that contains information about the user who posted it. I believe the best way is to send user summary information along with the message that creates a news post, but I'm a little confused about how to update that summary information if the underlying user information changes. Right now I have the following NewsPostActor and UserActor:
public interface INewsPostActor : IActor
{
    Task SetInfoAndCommitAsync(NewsPostSummary summary, UserSummary postedBy);
    Task AddCommentAsync(string content, UserSummary postedBy);
}

public interface IUserActor : IActor, IActorEventPublisher<IUserActorEvents>
{
    Task UpdateAsync(UserSummary summary);
}

public interface IUserActorEvents : IActorEvents
{
    void UserInfoChanged();
}
Where I'm getting stuck is how to have the INewsPostActor implementation subscribe to events published by IUserActor. I've seen the SubscribeAsync method in the sample code at https://github.com/Azure/servicefabric-samples/blob/master/samples/Actors/VS2015/VoiceMailBoxAdvanced/VoicemailBoxAdvanced.Client/Program.cs#L45 but is it appropriate to use this inside the NewsPostActor implementation? Will that keep an actor alive for any reason?
Additionally, I have the ability to add comments to news posts, so should the NewsPostActor also keep a subscription to each IUserActor for each unique user who comments?
Events may not be what you want to use for this. From the documentation on events (https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-events/):
Actor events provide a way to send best effort notifications from the
Actor to the clients. Actor events are designed for Actor-Client
communication and should NOT be used for Actor-to-Actor communication.
It's worth considering notifying the relevant actors directly, or having an actor/service that manages this communication.
Service Fabric Actors do not yet support a Publish/Subscribe architecture. (see Azure Feedback topic for current status.)
As already answered by charisk, Actor-Events are also not the way to go because they do not have any delivery guarantees.
This means the UserActor has to initiate a request when a name changes. I can think of multiple options:
From within IUserAccount.ChangeNameAsync() you can send requests directly to all NewsPostActors (assuming the UserAccount holds a list of its posts). However, this would introduce additional latency, since the client has to wait until all posts have been updated.
You can send the requests asynchronously. An easy way to do this would be to set a "NameChanged" property on your actor state to true within ChangeNameAsync() and have a timer that regularly checks this property. If it is true, it sends requests to all NewsPostActors and sets the property to false afterwards. This would be an improvement on the previous version, but it still implies a very strong connection between UserAccounts and NewsPosts.
A more scalable solution would be to introduce the "Message Router" pattern. You can read more about this pattern in Vaughn Vernon's excellent book "Reactive Messaging Patterns with the Actor Model". This way you can basically set up your own pub/sub model by sending a "NameChanged" message to your router. NewsPostActors can - depending on your scalability needs - subscribe to that message either directly or through some indirection (maybe a NewsPostCoordinator). And also depending on your scalability needs, the router can forward the messages either directly or asynchronously (by storing them in a queue first).
