Detect when a new record is added to a Cassandra table - cassandra

I have a requirement: when a new comment is posted, I want to get the owner IDs of all previous comments and send them a notification.
The problem is: how will I know that a new comment was added to the Cassandra table? What is the solution for this kind of requirement?

If you want to use only Cassandra, without changes, it's impossible.
With changes, you have three options:
1) You can use Cassandra as an embedded service in Java. Here is a short how-to: http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
2) You can create a wrapper for your Cassandra connection: an application that handles the Cassandra connection and is available via an API to your other applications.
3) Cassandra has trigger functionality. (I have never used it and have never heard of anyone using it.)
I prefer the second solution. Here are the reasons why:
It's simpler to create.
You can handle all your views in this application.
You can validate the input, resolve relations, log data, etc.
You can simply push the newly added comment to Kafka or another message queue.
This could be a setup:
Create a new comment -> call a backend API -> call the Cassandra database interface -> push a new message to Kafka -> send the data to all Kafka consumers
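A minimal sketch of that wrapper idea, assuming an in-memory list stands in for the Cassandra table and a plain callback stands in for the Kafka producer. The class names (Comment, InMemoryCommentRepository, CommentService) are hypothetical, and skipping the author of the new comment is an assumption. Because every write goes through the service, it always knows when a comment is added and can compute who to notify.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class CommentNotificationDemo {
    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        CommentService service = new CommentService(new InMemoryCommentRepository(), sent::add);
        service.addComment(new Comment("post-1", "alice", "first"));
        service.addComment(new Comment("post-1", "bob", "second"));
        Set<String> notified = service.addComment(new Comment("post-1", "carol", "third"));
        System.out.println(notified); // [alice, bob]
        sent.forEach(System.out::println);
    }
}

// Hypothetical comment row; in Cassandra this could be partitioned by postId.
final class Comment {
    final String postId;
    final String ownerId;
    final String text;

    Comment(String postId, String ownerId, String text) {
        this.postId = postId;
        this.ownerId = ownerId;
        this.text = text;
    }
}

// In-memory stand-in for the Cassandra table (a real one would use a driver session).
final class InMemoryCommentRepository {
    private final List<Comment> rows = new ArrayList<>();

    void save(Comment comment) {
        rows.add(comment);
    }

    List<Comment> findByPost(String postId) {
        List<Comment> result = new ArrayList<>();
        for (Comment c : rows) {
            if (c.postId.equals(postId)) {
                result.add(c);
            }
        }
        return result;
    }
}

// The wrapper: every write goes through here, so it always knows when a comment is added.
final class CommentService {
    private final InMemoryCommentRepository repository;
    private final Consumer<String> publisher; // stand-in for a Kafka/message-queue producer

    CommentService(InMemoryCommentRepository repository, Consumer<String> publisher) {
        this.repository = repository;
        this.publisher = publisher;
    }

    // Collects the owners of earlier comments, saves the new one, then publishes one
    // notification message per owner. Returns the set of notified owner ids.
    Set<String> addComment(Comment comment) {
        Set<String> previousOwners = new LinkedHashSet<>();
        for (Comment earlier : repository.findByPost(comment.postId)) {
            if (!earlier.ownerId.equals(comment.ownerId)) {
                previousOwners.add(earlier.ownerId);
            }
        }
        repository.save(comment);
        for (String owner : previousOwners) {
            publisher.accept("notify " + owner + ": new comment on " + comment.postId);
        }
        return previousOwners;
    }
}
```

Note that the save and the publish are two separate steps here; if the process dies between them, the notification is lost.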

Related

MongoDB change stream with load balancing in production env

Can anyone help with how we can use MongoDB change streams with load-balanced servers?
We are working on a microservice architecture and facing an issue in production: with load balancing, the same code is deployed on 4 servers, and when I perform any operation on a single server, the change stream trigger is fired from all 4 servers.
What I want is for it to be triggered only from the server where the operation was performed.
Thanks in advance
A change stream is a database-level concept. Data is inserted into, updated in, or deleted from the database, and this produces change events. Any number of subscribers can subscribe to the change events and do whatever they want with the changes. Each subscriber is notified of every change event.
A change stream is not meant to inform the application that originated a change about that change. This is redundant: the application already knows what it did.
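A toy model of the fan-out described above, to show why all 4 servers fire. This is NOT the MongoDB driver API; ToyChangeStream and its methods are hypothetical. Each server instance opens its own subscription, and the stream delivers every change event to every subscriber.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChangeStreamDemo {
    public static void main(String[] args) {
        ToyChangeStream stream = new ToyChangeStream();
        List<String> received = new ArrayList<>();
        // Four instances of the same service, each opening its own change stream.
        for (int i = 1; i <= 4; i++) {
            String server = "server-" + i;
            stream.subscribe(event -> received.add(server + " saw " + event));
        }
        // The database emits ONE change event for one insert...
        stream.publish("insert comment#42");
        // ...and every subscriber receives it, regardless of which server did the write.
        received.forEach(System.out::println);
    }
}

// Hypothetical stand-in for a database change stream.
final class ToyChangeStream {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Every change event is fanned out to every subscriber.
    void publish(String event) {
        for (Consumer<String> s : subscribers) {
            s.accept(event);
        }
    }
}
```

The fan-out is by design; the server that performed the write already knows about it and can act on it directly instead of waiting for the change event.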
Consider rephrasing your question to explain what you are trying to accomplish better.

How to control idempotency of messages in an event-driven architecture?

I'm working on a project where DynamoDB is being used as database and every use case of the application is triggered by a message published after an item has been created/updated in DB. Currently the code follows this approach:
repository.save(entity);
messagePublisher.publish(event);
Udi Dahan has a video called Reliable Messaging Without Distributed Transactions, where he talks about a solution for situations where a system can fail right after saving to the DB but before publishing the message, since messages are not part of a transaction. But in his solution I think he assumes a SQL database, as the process involves saving, as part of the transaction, the correlationId of the message being processed, the entity modification, and the messages that are to be published. Using a NoSQL DB, I cannot think of a clean way to store the information about the messages.
A solution would be to use DynamoDB streams and subscribe to the published events, using either a Lambda or another service to transform them into domain-specific events. My problem with this is that I wouldn't be able to send the messages from the domain logic, the logic would be spread across the service processing the message and the Lambda/service reacting to changes, and the solution would be platform-specific.
Is there any other way to handle this?
I can't suggest a specific solution for DynamoDB since I've never used this engine. But I've built an event-driven system on top of MongoDB, so I can share some learnings you might find useful for your case.
You can take different approaches:
1) Following an event sourcing approach, you can just save the events/messages your use case produces within a transaction. In Mongo, when you are just inserting/appending new items to the same collection, you can ensure atomicity. In any case, even if the engine does not provide that capability, the write operation is so centralized that you reduce the possibility of an error to a minimum.
Once all the events are stored, you can consume them, project them to a given state, and then persist the updated state in another transaction.
Here you have to deal with eventual consistency, as data will be stale in your read model until you have projected the events.
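A minimal in-memory sketch of approach 1, assuming a synchronized batch append stands in for a single atomic write of all events to one collection. Event, EventLog and Projector are hypothetical names, and the "signed amount per account" event is an illustrative assumption. The use case only appends events; a separate projection step derives the read model later.

```java
import java.util.ArrayList;
import java.util.List;

class EventSourcingDemo {
    public static void main(String[] args) {
        EventLog log = new EventLog();
        // The use case only appends the events it produced, as a single batch.
        log.appendAll(List.of(new Event("acc-1", 100), new Event("acc-1", -30)));
        log.appendAll(List.of(new Event("acc-1", 5)));
        // Later, and possibly asynchronously, a projector folds the events into the read model.
        System.out.println(Projector.balance(log.readAll(), "acc-1")); // 75
    }
}

// Hypothetical event: a signed amount applied to some aggregate.
final class Event {
    final String aggregateId;
    final int amount;

    Event(String aggregateId, int amount) {
        this.aggregateId = aggregateId;
        this.amount = amount;
    }
}

// Append-only store. The synchronized batch append stands in for a single atomic
// write of all events into one collection.
final class EventLog {
    private final List<Event> events = new ArrayList<>();

    synchronized void appendAll(List<Event> batch) {
        events.addAll(batch);
    }

    synchronized List<Event> readAll() {
        return List.copyOf(events);
    }
}

// Projection: derives current state (the read model) from the stored events.
final class Projector {
    static int balance(List<Event> events, String aggregateId) {
        int total = 0;
        for (Event e : events) {
            if (e.aggregateId.equals(aggregateId)) {
                total += e.amount;
            }
        }
        return total;
    }
}
```

Until the projector runs, any read model built from earlier events is stale, which is the eventual consistency trade-off mentioned above.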
2) Another approach is applying the Unit of Work pattern, where you cache all the write operations (insert/update/delete) that save both the events and the state. Once your use case finishes, you execute all the cached operations against the database (flush). This way, although the operations are not atomic, you are again centralizing them enough to minimize errors.
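A minimal sketch of the Unit of Work idea, assuming each database write can be wrapped as a Runnable and a List stands in for the database. UnitOfWork and its methods are hypothetical. Operations are only recorded during the use case and then executed back-to-back by flush().

```java
import java.util.ArrayList;
import java.util.List;

class UnitOfWorkDemo {
    public static void main(String[] args) {
        List<String> db = new ArrayList<>(); // stand-in for the database
        UnitOfWork uow = new UnitOfWork();
        // Inside the use case: the state change and the event it produced are only registered.
        uow.register(() -> db.add("UPDATE order-7 status=PAID"));
        uow.register(() -> db.add("INSERT event OrderPaid(order-7)"));
        System.out.println(db.size()); // 0: nothing written yet
        uow.flush(); // end of the use case: run all cached writes together
        System.out.println(db);
    }
}

// Hypothetical minimal Unit of Work.
final class UnitOfWork {
    private final List<Runnable> pending = new ArrayList<>();

    // Records a write without executing it.
    void register(Runnable operation) {
        pending.add(operation);
    }

    // Executes all cached writes in registration order, then clears the cache.
    // Not atomic: if one operation throws, earlier ones have already run.
    void flush() {
        for (Runnable operation : pending) {
            operation.run();
        }
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }
}
```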
Of course, the best option is to use an ACID database if you require that capability; any other approach will be a workaround to get close to it.
As for publishing the events, I don't know if you mean they are published to a messaging transport such as RabbitMQ, Kafka, etc. But that must be a background process that fetches the events from the DB and publishes them, in order to avoid a two-phase commit within the same transaction.
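A minimal sketch of such a background publisher, assuming each stored event carries a "published" flag persisted alongside it; a List stands in for the events collection and a callback for the broker. StoredEvent and EventRelay are hypothetical names. Because the flag is set only after a successful send, a crash between the two steps re-sends the event on the next pass, so delivery is at-least-once and consumers should be idempotent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class OutboxRelayDemo {
    public static void main(String[] args) {
        List<StoredEvent> store = new ArrayList<>(); // stand-in for the events collection
        store.add(new StoredEvent(1, "OrderPlaced#1"));
        store.add(new StoredEvent(2, "OrderPaid#1"));
        EventRelay relay = new EventRelay(store, payload -> System.out.println("published " + payload));
        System.out.println(relay.relayOnce()); // 2
        System.out.println(relay.relayOnce()); // 0: nothing new since the last pass
    }
}

// Hypothetical stored event with a persisted "published" flag.
final class StoredEvent {
    final long id;
    final String payload;
    boolean published;

    StoredEvent(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

// Background relay: fetches unpublished events from the store and pushes them to the broker.
final class EventRelay {
    private final List<StoredEvent> store;
    private final Consumer<String> broker; // stand-in for a RabbitMQ/Kafka producer

    EventRelay(List<StoredEvent> store, Consumer<String> broker) {
        this.store = store;
        this.broker = broker;
    }

    // One polling pass; a real relay would run this on a schedule.
    int relayOnce() {
        int sent = 0;
        for (StoredEvent e : store) {
            if (e.published) {
                continue;
            }
            broker.accept(e.payload); // a crash right after this line means a re-send later
            e.published = true;       // marked only after the send succeeded
            sent++;
        }
        return sent;
    }
}
```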

Transaction Synchronization in Spring Kafka

I want to synchronize a kafka transaction with a repository transaction:
@Transactional
public void syncTransaction() {
    myRepository.save(someObject);
    kafkaTemplate.send(someEvent);
}
Since the merge (https://github.com/spring-projects/spring-kafka/issues/373), and according to the docs, this is possible. Nevertheless, I have trouble understanding and implementing that feature.
Looking at the example at https://docs.spring.io/spring-kafka/reference/html/#transaction-synchronization, it seems I have to create a MessageListenerContainer to listen to my own events.
Do I still have to send my events using the KafkaTemplate?
Does the MessageListenerContainer prohibit the sending to the broker?
And if I understand correctly, the kafkaTemplate and the kafkaTransactionManager have to use the same producerFactory, in which I have to enable transactions by setting a transactionIdPrefix. And in my example I have to set the transaction manager of the messageListenerContainer to the DataSourceTransactionManager. Is that correct?
From my perspective, it looks weird that I send an event via the kafkaTemplate, listen to my own event, and forward the event using the kafkaTemplate again.
It would really help me if I could get an example of a simple synchronization of a Kafka transaction with a repository transaction, along with an explanation.
If the listener container is provisioned with a KafkaTransactionManager, the container will create a producer which will be used by any downstream kafka template and the container will send the offsets to the transaction for you.
If the container has some other transaction manager, the container can't send the offsets since it doesn't have access to the producer (or template).
Another solution is to annotate your method with @Transactional (with the datasource TM) and configure the container with a Kafka TM.
That way, your DB tx will commit just before the thread returns to the container, which will then send the offsets to the Kafka transaction and commit it.
See the framework test cases for examples.
@Eike Behrends, to have a DB + Kafka transaction, you can use ChainedTransactionManager and define it this way:
@Bean
public KafkaTransactionManager kafkaTransactionManager() {
    KafkaTransactionManager ktm = new KafkaTransactionManager(producerFactory());
    ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
    return ktm;
}
@Bean
@Primary
public JpaTransactionManager transactionManager(EntityManagerFactory em) {
    return new JpaTransactionManager(em);
}
// Starts transactions in the given order and commits/rolls back in reverse order:
// the Kafka tx begins first, the JPA tx commits first, then the Kafka tx commits.
@Bean(name = "chainedTransactionManager")
public ChainedTransactionManager chainedTransactionManager(JpaTransactionManager jpaTransactionManager,
        KafkaTransactionManager kafkaTransactionManager) {
    return new ChainedTransactionManager(kafkaTransactionManager, jpaTransactionManager);
}
You need to annotate your transactional DB + Kafka methods with @Transactional("chainedTransactionManager")
(see the issue on the spring-kafka project: https://github.com/spring-projects/spring-kafka/issues/433)
You say:
From my perspective it looks weird that I send an event via kafkaTemplate, listen to my own event and forward the event using the kafkaTemplate again.
Have you tried this? If so, can you provide an example, please?
To achieve your goal, you should use a different, "eventually consistent" approach such as CDC (Change Data Capture). There are no atomic transactions spanning Kafka writes and any other system (e.g. a database), i.e. no XA transactions. It is a complete paradigm shift when you have distributed services (some call them microservices) that, in your case, probably communicate by producing to and consuming from Kafka topics.
TL;DR: just use upsert / merge.
I accidentally came across this old topic, and after so many years people still struggle with it.
I just want to share the simplest and most native approach to dealing with systems such as Kafka.
The real reason people come here for an answer is the old approach of distributed transactions: most want to synchronize Kafka, which is non-transactional in the classic sense (Kafka calls its functionality "transactions", but they are actually "special"), with some ACID database.
If your service works within an idempotent environment, everything downstream should be idempotent too.
Just make sure your operations on the underlying storage are idempotent; the simplest approach is upsert / merge (depending on the storage).
P.S. CDC is a thing, but it requires much more labor and is unnecessary in most typical cases.
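A minimal sketch of the upsert idea, assuming a HashMap keyed by a business id stands in for the database table; IdempotentConsumer is a hypothetical name. Processing the same message twice (e.g. after a redelivery) leaves the same final state, whereas a plain insert would create a duplicate row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UpsertDemo {
    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        // The same message delivered twice (e.g. after a failed offset commit).
        consumer.handle("order-7", "PAID");
        consumer.handle("order-7", "PAID");
        System.out.println(consumer.snapshot()); // {order-7=PAID}

        // For contrast: a plain insert is not idempotent and ends up with a duplicate row.
        List<String> insertOnly = new ArrayList<>();
        insertOnly.add("order-7:PAID");
        insertOnly.add("order-7:PAID");
        System.out.println(insertOnly.size()); // 2
    }
}

// Hypothetical consumer whose storage write is an upsert keyed by the business id.
final class IdempotentConsumer {
    private final Map<String, String> table = new HashMap<>(); // stand-in for a DB table

    // Insert if absent, overwrite if present, like SQL MERGE or
    // PostgreSQL's INSERT ... ON CONFLICT ... DO UPDATE.
    void handle(String orderId, String status) {
        table.put(orderId, status);
    }

    Map<String, String> snapshot() {
        return Map.copyOf(table);
    }
}
```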
MORE:
If you want to dig into why Kafka "transactions" are special, here are good starting points (explained in the context of EOS, exactly-once semantics):
For newer versions: https://www.youtube.com/watch?v=j0l_zUhQaTc
For older versions: https://www.youtube.com/watch?v=zm5A7z95pdE
EDIT
I'm curious why this answer got downvotes. Just check this issue, its comments, and related issues: https://github.com/spring-projects/spring-data-commons/issues/2232. That's why one would not want to use ChainedTransactionManager for business-critical transactions: by design, it can't act as a real 2PC.

Spring Integration Feed Inbound Channel Adapter duplicate entries

I am using Spring Integration to consume RSS feeds using its inbound channel adapter and writing the feeds to a database table.
To prevent duplicate entries when the process is stopped/started, I have enabled the PropertiesPersistingMetadataStore. As a secondary measure, on the database table, I also have a unique constraint across the feed id/feed entry link columns.
This seems to be working fine, but I have noticed on some restarts (not every time) that I get DB exceptions because it tries to insert the same RSS feed item again.
Under what conditions would I be getting these duplicate errors, and is there any way I can get around them?
The PropertiesPersistingMetadataStore only persists its state on a normal application shutdown (when the bean is destroy()ed by the application context).
However, it implements Flushable, so you can call flush() on it in your flow after persisting.
Alternatively, you could use transaction synchronization to flush the store after the DB transaction commits, with the after-commit expression @metadataStore.flush().
Or, you could use a more robust persistent store, such as Redis, which persists on each update.
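To illustrate why restarts sometimes re-insert items, here is a minimal sketch of a properties-file metadata store that, like the behavior described above, keeps updates in memory and writes them only on flush(). FileMetadataStore is a hypothetical class in the spirit of, but NOT the source of, Spring Integration's PropertiesPersistingMetadataStore, and the key/value are illustrative assumptions.

```java
import java.io.Flushable;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class MetadataStoreDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("feed-metadata", ".properties");
        FileMetadataStore store = new FileMetadataStore(file);
        store.put("feed-1.lastEntry", "1700000000000"); // entry already inserted into the DB

        // "Restart" without a flush (e.g. the process was killed): the marker is gone,
        // so already-processed entries would be emitted again and hit the unique constraint.
        System.out.println(new FileMetadataStore(file).get("feed-1.lastEntry")); // null

        store.flush(); // e.g. right after the DB transaction commits
        System.out.println(new FileMetadataStore(file).get("feed-1.lastEntry")); // 1700000000000
        Files.deleteIfExists(file);
    }
}

// Hypothetical store: updates live in memory and only reach the file on flush().
final class FileMetadataStore implements Flushable {
    private final Path file;
    private final Properties props = new Properties();

    FileMetadataStore(Path file) {
        this.file = file;
        if (Files.exists(file)) {
            try (Reader reader = Files.newBufferedReader(file)) {
                props.load(reader);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    void put(String key, String value) {
        props.setProperty(key, value);
    }

    String get(String key) {
        return props.getProperty(key);
    }

    @Override
    public void flush() {
        try (Writer writer = Files.newBufferedWriter(file)) {
            props.store(writer, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```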

Custom Logging mechanism: Master Operation with n-Operation Details or Child operations

I'm trying to implement a logging mechanism in a service/workflow hybrid application. The logging requirement is that, instead of independent log actions, each log must be treated as a detail operation and placed under a parent/master operation. So it's a parent-child relationship, and it goes into database table(s). This is the primary reason NLog did not work for us.
To help you understand better, I'll go into some generic detail. This is how the application flow goes:
The main entry point of the application (normally called Program.cs) is Platform. It initializes an engine capable of listening to incoming calls from ISDN lines, VoIP, or web services. The interface is generic, so any call that reaches the Platform triggers OnConnecting(). OnConnecting() is a thread-safe event and can be triggered as many times as the system requires.
Within OnConnecting(), a new instance of our custom Workflow manager is launched and the context is a custom object called ProcessingInfo:
new WorkflowManager<ZeProcessingInfo>();
where ZeProcessingInfo is:
var ZeProcessingInfo = new ProcessingInfo(this, new LogMaster());
As you can see, ProcessingInfo is composed of the Platform itself and a new instance of LogMaster. LogMaster is defined in a separate assembly.
This LogMaster is available throughout the WorkflowManager, all the workflows it launches, and all the activities within any running workflow, and it is passed on to external code called from within any activity. When a new LogMaster is initialized, a Master Operation entry is created in the database, and this LogMaster object lives until the call ends, after a series of very serious roller-coaster rides through different workflows. Every call to OnConnecting() creates and maintains a new Master Operation.
LogMaster provides an AddDetail() method that adds a new child detail under the internally stored Master Operation (identified by a Guid primary key). LogMaster is built on Entity Framework.
I'm able to log under the same Master Operation as many times as I need. But the application requirements are changing, and there is now a need to log from other assemblies. There is a Platform Server assembly, which is a Windows Service that acts as a server listening for web-service-based calls; once a client calls a method, OnConnecting() in Platform is triggered.
I need a mechanism to somehow retrieve the related LogMaster object so that I can add details to the same Master Operation. But Platform Server is the one triggering OnConnecting() on the Platform, and thus instantiating LogMaster. This creates a chicken-and-egg problem.
Failure scenarios are also being considered. If LogMaster fails, we need to fall back from database logging to event logging. If event logging fails (or is not allowed by the unified configuration), we need to fall back to file-based (XML) logging.
I hope I have given a rough idea. I don't expect code, but I need a strategy for a seamless, pluggable, configurable logging mechanism that supports master-child operations.
Thanks for reading. Any help would be much appreciated.
I've read this question a number of times, and it was pretty hard to figure out what was going on. I don't think your diagram helps at all. If your question is about retrieving the master log record when writing child log records, then I would forget about trying to create normalised data in the log tables. You will just slow down the transactional system by doing so. You want the log/audit records to be written as fast as possible, and you can aggregate them later when you want to read them.
Create a denormalised table for the log entries and use a single Guid in that table to track the session/parent log master. Yes, this will be a big table, but it will write fast.
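A minimal sketch of the flat, denormalised log, written in Java (the question's code is C#, so treat this as language-neutral pseudocode for the idea). It assumes an in-memory list stands in for the big append-only table and that one Guid (here java.util.UUID) is created once per incoming call and passed along with the call context; FlatLog and LogEntry are hypothetical names. Writes never look anything up; the "master operation" is reassembled at read time by filtering on the Guid.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

class FlatLogDemo {
    public static void main(String[] args) {
        FlatLog log = new FlatLog();
        // One Guid per incoming call, created once and passed along with the call context.
        UUID callA = UUID.randomUUID();
        UUID callB = UUID.randomUUID();
        log.append(callA, "OnConnecting");
        log.append(callB, "OnConnecting");
        log.append(callA, "Workflow X started");
        log.append(callA, "Call ended");
        // Aggregate at read time instead of maintaining master/detail rows at write time.
        System.out.println(log.messagesFor(callA)); // [OnConnecting, Workflow X started, Call ended]
    }
}

// One flat row per log entry: no parent table, no lookups when writing.
final class LogEntry {
    final UUID operationId; // the single Guid shared by all entries of one call/session
    final Instant at;
    final String message;

    LogEntry(UUID operationId, Instant at, String message) {
        this.operationId = operationId;
        this.at = at;
        this.message = message;
    }
}

final class FlatLog {
    private final List<LogEntry> rows = new ArrayList<>(); // stand-in for the append-only table

    synchronized void append(UUID operationId, String message) {
        rows.add(new LogEntry(operationId, Instant.now(), message));
    }

    // Read side: reassemble one operation's entries, in insertion order.
    synchronized List<String> messagesFor(UUID operationId) {
        return rows.stream()
                .filter(e -> e.operationId.equals(operationId))
                .map(e -> e.message)
                .collect(Collectors.toList());
    }
}
```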
As for guaranteed delivery of log messages to a destination, I would avoid creating multiple destinations, as combining them later will be a nightmare. Instead, use something like MSMQ to emit the audit logs as fast as possible, and have another service pick them up and process them with guaranteed delivery. ETW (Event Logging) is not guaranteed under load, and you will not know that it has failed.
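A minimal sketch of the "emit fast, process elsewhere" shape, assuming an in-process LinkedBlockingQueue as a stand-in for MSMQ and a List as a stand-in for the single log destination; AsyncAuditLog is a hypothetical name. Unlike MSMQ, this in-memory queue is not durable, so it shows only the decoupling (callers never wait on the database), not the guaranteed delivery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncAuditLogDemo {
    public static void main(String[] args) {
        AsyncAuditLog audit = new AsyncAuditLog();
        audit.log("call 1: started");        // returns immediately
        audit.log("call 1: step A done");
        audit.close();                       // waits for the writer to drain the queue
        System.out.println(audit.written()); // [call 1: started, call 1: step A done]
    }
}

final class AsyncAuditLog implements AutoCloseable {
    // Optional.empty() is used as the shutdown signal.
    private final BlockingQueue<Optional<String>> queue = new LinkedBlockingQueue<>();
    private final List<String> destination = new ArrayList<>(); // written only by the worker
    private final Thread worker = new Thread(this::drain, "audit-writer");

    AsyncAuditLog() {
        worker.start();
    }

    // Hot path: just enqueue, never touch the destination.
    void log(String message) {
        queue.add(Optional.of(message));
    }

    // Worker loop: takes messages in order and writes them to the single destination.
    private void drain() {
        try {
            while (true) {
                Optional<String> next = queue.take();
                if (next.isEmpty()) {
                    return;
                }
                destination.add(next.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Signals shutdown and waits until everything queued before it has been written.
    @Override
    public void close() {
        queue.add(Optional.empty());
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Safe to call after close(): join() makes the worker's writes visible.
    List<String> written() {
        return List.copyOf(destination);
    }
}
```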
