Transaction Synchronization in Spring Kafka - spring-transactions

I want to synchronize a kafka transaction with a repository transaction:
@Transactional
public void syncTransaction() {
    myRepository.save(someObject);
    kafkaTemplate.send(someEvent);
}
Since the merge of https://github.com/spring-projects/spring-kafka/issues/373, and according to the docs, this is possible. Nevertheless, I have trouble understanding and implementing that feature.
Looking at the example in https://docs.spring.io/spring-kafka/reference/html/#transaction-synchronization I have to create a MessageListenerContainer to listen to my own events.
Do I still have to send my events using the KafkaTemplate?
Does the MessageListenerContainer prohibit the sending to the broker?
And if I understand correctly, the kafkaTemplate and the kafkaTransactionManager have to use the same producerFactory, on which I have to enable transactions by setting a transactionIdPrefix. And in my example I have to set the transaction manager of the messageListenerContainer to the DataSourceTransactionManager. Is that correct?
From my perspective it looks weird that I send an event via kafkaTemplate, listen to my own event and forward the event using the kafkaTemplate again.
It would really help me if I could get an example of a simple synchronization of a Kafka transaction with a repository transaction, together with an explanation.

If the listener container is provisioned with a KafkaTransactionManager, the container will create a producer which will be used by any downstream kafka template and the container will send the offsets to the transaction for you.
If the container has some other transaction manager, the container can't send the offsets since it doesn't have access to the producer (or template).
Another solution is to annotate your method with @Transactional (with the datasource TM) and configure the container with a Kafka TM.
That way, your DB tx will commit just before the thread returns to the container which will then send the offsets to the kafka transaction and commit it.
See the framework test cases for examples.
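A rough, untested sketch of that second arrangement (exact setter names depend on your spring-kafka version; bean and topic names such as "someTopic" are placeholders, and myRepository/kafkaTemplate are assumed to be wired elsewhere):
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.transaction.KafkaTransactionManager;
import org.springframework.transaction.annotation.Transactional;

@Bean
public ProducerFactory<String, String> producerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    DefaultKafkaProducerFactory<String, String> pf = new DefaultKafkaProducerFactory<>(props);
    pf.setTransactionIdPrefix("tx-"); // enables Kafka transactions; shared by template and TM
    return pf;
}

@Bean
public KafkaTransactionManager<String, String> kafkaTransactionManager(
        ProducerFactory<String, String> pf) {
    return new KafkaTransactionManager<>(pf);
}

// The container gets the Kafka TM, so it opens the Kafka transaction
// and sends the consumed offsets to it for you.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory,
        KafkaTransactionManager<String, String> ktm) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.getContainerProperties().setTransactionManager(ktm);
    return factory;
}

// The listener joins a DB transaction via @Transactional (a DataSource/JPA TM
// bean assumed to be named "transactionManager"); the DB tx commits when this
// method returns, just before the container commits the Kafka transaction.
@Transactional("transactionManager")
@KafkaListener(topics = "someTopic")
public void listen(String event) {
    myRepository.save(someObject);           // DB work
    kafkaTemplate.send("outTopic", event);   // participates in the container's Kafka tx
}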

@Eike Behrends to have a db + kafka transaction, you can use ChainedTransactionManager and define it this way:
@Bean
public KafkaTransactionManager kafkaTransactionManager() {
    KafkaTransactionManager ktm = new KafkaTransactionManager(producerFactory());
    ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
    return ktm;
}

@Bean
@Primary
public JpaTransactionManager transactionManager(EntityManagerFactory em) {
    return new JpaTransactionManager(em);
}

@Bean(name = "chainedTransactionManager")
public ChainedTransactionManager chainedTransactionManager(JpaTransactionManager jpaTransactionManager,
        KafkaTransactionManager kafkaTransactionManager) {
    return new ChainedTransactionManager(kafkaTransactionManager, jpaTransactionManager);
}
You need to annotate your transactional db + kafka methods with @Transactional("chainedTransactionManager").
(You can see the issue on the spring-kafka project: https://github.com/spring-projects/spring-kafka/issues/433)
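For example, a method spanning both resources could look like this (repository, template, and topic names are placeholders):
@Transactional("chainedTransactionManager")
public void saveAndPublish(SomeObject someObject, SomeEvent someEvent) {
    myRepository.save(someObject);               // joins the JPA transaction
    kafkaTemplate.send("someTopic", someEvent);  // joins the Kafka transaction
    // on return, the chained TM commits in reverse start order:
    // the JPA tx first, then the Kafka tx
}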
You say :
From my perspective it looks weird that I send an event via kafkaTemplate, listen to my own event and forward the event using the kafkaTemplate again.
Have you tried this? If so, can you provide an example, please?

To achieve your goal you should use a different, "eventually consistent" approach such as CDC (Change Data Capture). There are no atomic transactions (aka XA transactions) spanning Kafka writes and any other system, e.g. a database. It is a complete paradigm shift when you have distributed services (some call them microservices) that, in your case, probably communicate by producing/consuming to/from Kafka topics.

TL;DR: just use upsert / merge.
I accidentally came across this old topic, and after so many years people still struggle with this.
I just want to share the simplest and most native approach to dealing with systems such as Kafka.
The real issue that brings people here for an answer is the old approach of distributed transactions. And most of them want to synchronize non-transactional Kafka (Kafka names its functionality "transactions", but they are actually "special") with some ACID database.
If your service is working within an idempotent environment, everything downstream should be idempotent too.
Just make sure your operations on the underlying storage are idempotent; the simplest approach is upsert / merge (depending on the storage).
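As a minimal illustration, assuming PostgreSQL and Spring's JdbcTemplate (the table and columns are made up):
import org.springframework.jdbc.core.JdbcTemplate;

// Re-processing the same Kafka record just overwrites the same row,
// so the write is safe to repeat.
public void upsertCustomer(JdbcTemplate jdbc, String id, String name) {
    jdbc.update(
        "INSERT INTO customers (id, name) VALUES (?, ?) " +
        "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name",
        id, name);
}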
P.S. CDC is a thing, but it requires much more labor and is unnecessary in most typical cases.
MORE:
If you want to dig into why Kafka "transactions" are special, here are good starting points (explained in the context of exactly-once semantics):
for newer versions: https://www.youtube.com/watch?v=j0l_zUhQaTc
for older: https://www.youtube.com/watch?v=zm5A7z95pdE
EDIT
It is very interesting why this answer got downvotes... Just check this issue and its comments/related issues: https://github.com/spring-projects/spring-data-commons/issues/2232 - that's why one would not want to use ChainedTransactionManager for business-critical transactions (it cannot act as a real 2PC by design).

Related

Selecting one producer for multiple consumers

In a Producer-Consumer case with multiple app instances, I know I am supposed to have some type of queue for the distribution of events to the consumers. But how do I deal with the producer?
I must query a database for objects with an expired deadline every minute. That will push work to a message queue, so distribution is not a problem. My concern is that if I have multiple instances of the app, I have to make sure that only one is producing work.
Am I supposed to solve this by electing a cluster leader? Is there a common algorithm or library in NodeJS for this? My guess is that I will have to reach for some magic Redis command and make my instances aware of each other.
There are always many different ways to achieve things, but my suggestion is to create an idempotent outbox table in your database, where the multiple producers write the records to be published to the message queue.
Then you can deploy a tool like Debezium that does transaction-log tailing (it reads the database transaction log) and pushes each record to whatever message-queue technology you're using.
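A minimal sketch of the write side, assuming a relational database and Spring (the tables, columns, and the toJson helper are invented for illustration):
import java.util.UUID;
import org.springframework.transaction.annotation.Transactional;

// The business row and the outgoing event are written in ONE local DB
// transaction; Debezium (or a poller) later ships the outbox row to the queue.
@Transactional
public void createOrder(Order order) {
    jdbcTemplate.update("INSERT INTO orders (id, total) VALUES (?, ?)",
            order.getId(), order.getTotal());
    jdbcTemplate.update(
            "INSERT INTO outbox (id, aggregate_id, type, payload) VALUES (?, ?, ?, ?)",
            UUID.randomUUID().toString(), order.getId(), "OrderCreated",
            toJson(order)); // toJson: hypothetical serializer helper
}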
Please note that it's also a good practice to implement the idempotency check on your consumers to make sure they don't process the same message twice.
Wix - How We Implemented Idempotency in a Billing System at Scale

How to control idempotency of messages in an event-driven architecture?

I'm working on a project where DynamoDB is used as the database, and every use case of the application is triggered by a message published after an item has been created/updated in the DB. Currently the code follows this approach:
repository.save(entity);
messagePublisher.publish(event);
Udi Dahan has a video called Reliable Messaging Without Distributed Transactions where he talks about a solution for situations where a system can fail right after saving to the DB but before publishing the message, since messages are not part of a transaction. But in his solution I think he assumes a SQL database, as the process involves saving, as part of the transaction, the correlationId of the message being processed, the entity modification, and the messages that are to be published. Using a NoSQL DB, I cannot think of a clean way to store the information about the messages.
A solution would be using DynamoDB Streams and subscribing to the published events, either with a Lambda or another service, to transform them into domain-specific events. My problem with this is that I wouldn't be able to send the messages from the domain logic; the logic would be spread across the service processing the message and the Lambda/service reacting to changes, and the solution would be platform-specific.
Is there any other way to handle this?
I can't offer a specific solution based on DynamoDB since I've never used that engine. But I've built an event-driven system on top of MongoDB, so I can share my learnings, which you might find useful for your case.
You can have different approaches:
1) Based on an event-sourcing approach, you can just save the events/messages your use case produces within a transaction. In Mongo, when you are just inserting/appending new items to the same collection, you can ensure atomicity. Anyway, even if the engine does not provide that capability, the write operation is so centralized that you reduce the possibility of an error to a minimum.
Once all the events are stored, you can then consume them and project them to a given state and then persist the updated state in another transaction.
Here you have to deal with eventual consistency as data will be stale in your read model until you have projected the events.
2) Another approach is applying the Unit of Work pattern, where you cache all the write operations (insert/update/delete) for both the events and the state. Once your use case finishes, you execute all the cached operations against the database (flush). This way, although the operations are not atomic, you are again centralizing them enough to minimize errors; see the sketch below.
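A bare-bones, framework-free sketch of that unit-of-work idea (all names invented for illustration):
import java.util.ArrayList;
import java.util.List;

// Collects write operations during the use case and executes them together
// at the end, so a failure before flush() touches nothing in the database.
public class UnitOfWork {

    private final List<Runnable> pendingWrites = new ArrayList<>();

    public void register(Runnable write) { // e.g. () -> collection.insertOne(doc)
        pendingWrites.add(write);
    }

    public void flush() {
        // not atomic, but every write is centralized in this one spot
        for (Runnable write : pendingWrites) {
            write.run();
        }
        pendingWrites.clear();
    }
}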
Of course, the best option is to use an ACID database if you require that capability; any other approach will be a workaround to get close to it.
Regarding publishing the events: I don't know if you mean publishing to a messaging transport such as RabbitMQ, Kafka, etc., but that should be a background process in which you fetch the events from the DB and publish them, so that you avoid a two-phase commit within the same transaction.
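For instance, a scheduled poller along these lines (Spring scheduling assumed; eventStore, the Event type, and the topic name are hypothetical):
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;

// Fetches events not yet published and pushes them to the broker; marking
// them as published only after a successful send gives at-least-once
// delivery, so consumers must deduplicate.
@Scheduled(fixedDelay = 1000)
public void publishPendingEvents() {
    List<Event> pending = eventStore.findUnpublished(100); // hypothetical repository
    for (Event e : pending) {
        kafkaTemplate.send("domain-events", e.getAggregateId(), e.getPayload());
        eventStore.markPublished(e.getId());
    }
}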

Transaction handling Rabbit MQ and Spring AMQP

I am trying to understand a few things here. My requirement is: I want to store a record in the DB and send a message to a queue; and if, say, the same method then throws an exception, I don't want to send the message or commit the DB transaction.
Now, I thought of using Spring transactions, but since there are two different resources, I thought of JTA with something like Atomikos to synchronize the resources - but then I read that RMQ does not support 2PC, XA, etc.
Anyway, I went ahead and tried it first without adding Atomikos; all I did was make sure that my channel is transacted, and the @Transactional annotation took care of the rest - see the sample code below. I didn't add anything special to the pom.
Now my questions are: how is this working, how is it different from 2PC, what can go wrong with this approach, and what situations can break eventual consistency with this method? And surprisingly, why didn't I have to use a third-party JTA? If all is good with this, it seems to give an eventual-consistency guarantee when we use RMQ and a DB with the Spring goodies - for microservices :)
If this is not a good solution, what are the alternatives? I would like to avoid a worker process, if possible, for eventual consistency.
@Bean
public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
    RabbitTemplate rabbitTemplate = new RabbitTemplate(connectionFactory);
    rabbitTemplate.setChannelTransacted(true);
    return rabbitTemplate;
}

@GetMapping
@Transactional
public void sampleEndpoint(@RequestParam boolean throwException) {
    Customer a = new Customer();
    a.setCustomerName("XYZ");
    customerRepository.save(a);
    rabbitTemplate.convertAndSend("txtest", "Test");
    if (throwException)
        throw new RuntimeException();
}
I used the postgres dependency for the above example, with Spring Boot 1.5.7.
I suggest you read Dave Syer's article: Distributed transactions in Spring, with and without XA.
You need to start the Rabbit transaction before the database transaction, so that the Rabbit transaction is synchronized with the DB transaction; it commits very soon after the DB tx and rolls back if the DB tx rolls back.
There is a small possibility that the DB tx commits successfully but the Rabbit tx rolls back. This is called "Best Effort 1PC" in the article. You need to deal with the small possibility of duplicate messages.
You don't show all of your configuration, but it appears your Rabbit tx will commit before the DB tx, which is probably not what you want.
Regarding the "How is it working" question, this quote from spring-amqp documentation clarifies:
If there is already a transaction in progress when the framework is sending or receiving a message, and the channelTransacted flag is true, then the commit or rollback of the messaging transaction will be deferred until the end of the current transaction. If the channelTransacted flag is false, then no transaction semantics apply to the messaging operation (it is auto-acked).
My understanding is that, for your use case, you do not even need to configure a ChainedTransactionManager in order to implement Best Effort 1PC. @Transactional will be enough, and the Rabbit tx will commit right after the DB tx.
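One common way to deal with the duplicate-message possibility mentioned above is an idempotent receiver; a rough sketch, assuming Postgres and a jdbcTemplate wired elsewhere (the dedup table is invented):
import org.springframework.transaction.annotation.Transactional;

// Skips messages whose unique id was already processed; the dedup insert
// and the business write share one DB transaction.
@Transactional
public void handleMessage(String messageId, String payload) {
    int inserted = jdbcTemplate.update(
            "INSERT INTO processed_messages (id) VALUES (?) ON CONFLICT DO NOTHING",
            messageId);
    if (inserted == 0) {
        return; // duplicate delivery, already handled
    }
    // ... business logic for the message goes here ...
}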

Detect a new record was added to cassandra table

I have a requirement: when a new comment is posted, I want to get all previous comments' owner IDs and send them a notification.
The problem here is: how will I know that a new comment was added to the Cassandra table? What is the solution for this kind of requirement?
If you want to use only cassandra, without changes, it's impossible.
With changes, you have three options:
You can use Cassandra as an embedded service in Java. Here is a simple and short how-to: http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
You can create a wrapper for your Cassandra connection: an application which handles the Cassandra connection and is available via an API to your other applications.
Cassandra has trigger functionality. (I have never used it, and I have never heard of anyone using it.)
I prefer the second solution. Here are the reasons why:
It's simpler to create.
You can handle all your views in this application.
You can validate the input, resolve relations, log data, etc.
You can simply push the new added comment to kafka or another message queue.
This could be a setup:
Create a new comment -> call a backend API -> call the Cassandra database interface -> push a new message to Kafka -> send the data to all Kafka consumers
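A condensed sketch of that wrapper idea (plain Java with the DataStax driver and a Kafka producer; table, topic, and class names are invented):
import com.datastax.driver.core.Session;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// API-layer wrapper: writes the comment to Cassandra, then announces it on
// a Kafka topic so downstream consumers can react (e.g. send notifications).
public class CommentService {

    private final Session session;                    // Cassandra connection
    private final Producer<String, String> producer;  // Kafka producer

    public CommentService(Session session, Producer<String, String> producer) {
        this.session = session;
        this.producer = producer;
    }

    public void addComment(String postId, String userId, String text) {
        session.execute(
                "INSERT INTO comments (post_id, user_id, text) VALUES (?, ?, ?)",
                postId, userId, text);
        producer.send(new ProducerRecord<>("new-comments", postId, userId));
    }
}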

Service Fabric actors that receive events from other actors

I'm trying to model a news post that contains information about the user that posted it. I believe the best way is to send user summary information along with the message to create a news post, but I'm a little confused how to update that summary information if the underlying user information changes. Right now I have the following NewsPostActor and UserActor
public interface INewsPostActor : IActor
{
    Task SetInfoAndCommitAsync(NewsPostSummary summary, UserSummary postedBy);
    Task AddCommentAsync(string content, UserSummary postedBy);
}

public interface IUserActor : IActor, IActorEventPublisher<IUserActorEvents>
{
    Task UpdateAsync(UserSummary summary);
}

public interface IUserActorEvents : IActorEvents
{
    void UserInfoChanged();
}
Where I'm getting stuck is how to have the INewsPostActor implementation subscribe to events published by IUserActor. I've seen the SubscribeAsync method in the sample code at https://github.com/Azure/servicefabric-samples/blob/master/samples/Actors/VS2015/VoiceMailBoxAdvanced/VoicemailBoxAdvanced.Client/Program.cs#L45 but is it appropriate to use this inside the NewsPostActor implementation? Will that keep an actor alive for any reason?
Additionally, I have the ability to add comments to news posts, so should the NewsPostActor also keep a subscription to each IUserActor for each unique user who comments?
Events may not be what you want to be using for this. From the documentation on events (https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-events/)
Actor events provide a way to send best effort notifications from the Actor to the clients. Actor events are designed for Actor-Client communication and should NOT be used for Actor-to-Actor communication.
It is worth considering notifying the relevant actors directly, or having an actor/service that manages this communication.
Service Fabric Actors do not yet support a Publish/Subscribe architecture. (see Azure Feedback topic for current status.)
As already answered by charisk, Actor events are also not the way to go because they do not have any delivery guarantees.
This means the UserActor has to initiate a request when a name changes. I can think of multiple options:
From within IUserAccount.ChangeNameAsync() you can send requests directly to all NewsPostActors (assuming the UserAccount holds a list of its posts). However, this would introduce additional latency, since the client has to wait until all posts have been updated.
You can send the requests asynchronously. An easy way to do this would be to set a "NameChanged" property in your actor state to true within ChangeNameAsync() and have a timer that regularly checks this property. If it is true, the timer sends requests to all NewsPostActors and sets the property to false afterwards. This would be an improvement over the previous version; however, it still implies very strong coupling between UserAccounts and NewsPosts.
A more scalable solution would be to introduce the "Message Router" pattern. You can read more about this pattern in Vaughn Vernon's excellent book "Reactive Messaging Patterns with the Actor Model". This way you can basically set up your own pub/sub model by sending a "NameChanged" message to your router. NewsPostActors can - depending on your scalability needs - subscribe to that message either directly or through some indirection (maybe a NewsPostCoordinator). And also depending on your scalability needs, the router can forward the messages either directly or asynchronously (by storing them in a queue first); a sketch of the basic idea follows.
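A framework-agnostic sketch of the router idea in plain Java (not Service Fabric API; all names invented), just to show the shape of the pattern:
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Subscribers register per message type; the router forwards each message to
// all of them, decoupling the publisher (UserActor) from the consumers
// (NewsPostActors or a NewsPostCoordinator).
public class MessageRouter {

    private final Map<String, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String messageType, Consumer<Object> handler) {
        subscribers.computeIfAbsent(messageType, t -> new CopyOnWriteArrayList<>())
                   .add(handler);
    }

    public void publish(String messageType, Object message) {
        subscribers.getOrDefault(messageType, List.of())
                   .forEach(handler -> handler.accept(message));
    }
}
In a durable setup the publish step would enqueue the message first instead of forwarding it synchronously.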
