I am just looking for a way to track changes in a Cassandra table. I don't want to use a trigger. If any changes are made, I will immediately update my data source.
Do you have any idea how to implement this feature using Java?
Also, is it possible to create a plugin for Cassandra? I did not find any good resource on creating a plugin for Cassandra.
Thanks.
I believe that what you are looking for is Change Data Capture (CDC).
You can read more on CDC in Apache Cassandra.
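For context, CDC is enabled per table (for example, `ALTER TABLE ks.tbl WITH cdc = true;`, with `cdc_enabled: true` set in cassandra.yaml), after which Cassandra moves flushed commit log segments containing changes for that table into the node's `cdc_raw` directory. Below is a rough sketch of a consumer built on Cassandra's internal `CommitLogReader`/`CommitLogReadHandler` API (3.11-era). These are internal classes whose names and signatures vary across versions, so treat this as an assumption to verify against your release:

```java
import java.io.File;
import java.io.IOException;
import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.db.commitlog.CommitLogDescriptor;
import org.apache.cassandra.db.commitlog.CommitLogReadHandler;
import org.apache.cassandra.db.commitlog.CommitLogReader;

// Sketch of a CDC consumer: replays one segment from cdc_raw and reacts
// to each mutation. Deserializing mutations requires Cassandra's config
// and schema to be initialized, which toolInitialization() only partly covers.
public class CdcConsumerSketch implements CommitLogReadHandler {

    @Override
    public void handleMutation(Mutation mutation, int size, int entryLocation,
                               CommitLogDescriptor descriptor) {
        // React to the change here, e.g. update your external data source.
        System.out.println("Change in keyspace: " + mutation.getKeyspaceName());
    }

    @Override
    public boolean shouldSkipSegmentOnError(CommitLogReadException exception) {
        return false; // fail fast in this sketch
    }

    @Override
    public void handleUnrecoverableError(CommitLogReadException exception) throws IOException {
        throw exception;
    }

    public static void main(String[] args) throws IOException {
        DatabaseDescriptor.toolInitialization();
        File segment = new File(args[0]); // a segment file from the cdc_raw directory
        new CommitLogReader().readCommitLogSegment(new CdcConsumerSketch(), segment, false);
    }
}
```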
Let me describe the architecture of my system before diving into the heart of the problem.
I have a stream of data that comes from Kafka, and my company uses a distributed cache (Hazelcast, precisely) that makes the data available to be requested through web services that we expose. We also want to persist the cached data to Cassandra so it is durable. I have two ideas for how to get the data into Hazelcast, and I would like your suggestions (maybe there is another way of doing it) and your view on which is the best solution and why.
1/ Use a Kafka-Hazelcast connector to send data directly from Kafka to Hazelcast, and then persist the data to Cassandra using write-behind and MapStores (see the sketch after this list). There are two main drawbacks with this solution: first, we have to serialize/deserialize each time we store data to Cassandra (significant CPU usage), and second, we put all the data into the cache even when it is not needed by users (we have lots of evictions happening).
2/ Use a Kafka-Cassandra connector to write data directly to Cassandra, and then find a means (how complex do you think this part could be?) to notify Hazelcast to update/evict the data if it is already in the cache. The pros of this solution are that we get rid of the serialization/deserialization needed by the MapStores, and we load only data that was queried before, where the key is already in the cache.
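For reference, the write-behind persistence in option 1 boils down to a Hazelcast MapStore backed by Cassandra. A minimal sketch, assuming Hazelcast 4/5 (`com.hazelcast.map.MapStore`) and the DataStax Java driver 4.x; the `ks.kv` table and String key/value types are hypothetical placeholders:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;
import com.hazelcast.map.MapStore;

// Write-behind MapStore sketch: Hazelcast calls store()/delete() after the
// configured write-delay, and load() on cache misses.
public class CassandraMapStore implements MapStore<String, String> {

    private final CqlSession session = CqlSession.builder().build();
    private final PreparedStatement insert =
            session.prepare("INSERT INTO ks.kv (key, value) VALUES (?, ?)");
    private final PreparedStatement select =
            session.prepare("SELECT value FROM ks.kv WHERE key = ?");
    private final PreparedStatement remove =
            session.prepare("DELETE FROM ks.kv WHERE key = ?");

    @Override
    public void store(String key, String value) {
        session.execute(insert.bind(key, value));
    }

    @Override
    public void storeAll(Map<String, String> entries) {
        entries.forEach(this::store);
    }

    @Override
    public void delete(String key) {
        session.execute(remove.bind(key));
    }

    @Override
    public void deleteAll(Collection<String> keys) {
        keys.forEach(this::delete);
    }

    @Override
    public String load(String key) {
        Row row = session.execute(select.bind(key)).one();
        return row == null ? null : row.getString("value");
    }

    @Override
    public Map<String, String> loadAll(Collection<String> keys) {
        Map<String, String> result = new HashMap<>();
        keys.forEach(k -> {
            String v = load(k);
            if (v != null) result.put(k, v);
        });
        return result;
    }

    @Override
    public Iterable<String> loadAllKeys() {
        return null; // null disables eager pre-loading of the map
    }
}
```

Write-behind kicks in when the map's MapStore is configured with a non-zero write-delay-seconds. Note that the serialization cost you mention is inherent to this path: Hazelcast has to deserialize entries to hand them to store().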
Which one of the two solutions do you prefer, and why?
What is the best means of notifying Hazelcast in the second solution, in your point of view?
Thank you in advance for your suggestions/answers.
I hope I was concise and clear!
As the title suggests, I need to insert 50,000+ records into Azure Cosmos DB running in SQL mode and read them again later programmatically with Node. C# has the BulkExecutor, but what's the best and fastest way in Node?
In the short term, you could implement some of the same logic that is in the Bulk Executor library in Node.js. The biggest challenge is having to understand our underlying physical partitioning layer, which isn't easy.
In the mid-to-long term, we'll be adding bulk operation support to Node.js/Python. For now, you might save yourself some time by making do with the Java/.NET library until then.
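Until the SDK catches up, the heart of that logic is just issuing writes concurrently with a bounded in-flight count so provisioned throughput isn't overwhelmed. Here is a language-agnostic sketch in Java; `insertDocument` is a hypothetical stand-in for whatever SDK write call you use, not a real Cosmos DB API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Bounded-concurrency bulk insert sketch: fan writes out over a thread pool
// while a semaphore caps the in-flight requests so throttled responses
// stay manageable.
public class BulkInsertSketch {

    // Hypothetical stand-in for the SDK call that writes one record.
    static void insertDocument(String json) {
        /* call your database SDK here */
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> records = List.of(/* 50,000+ JSON documents */);
        int maxInFlight = 50; // tune against your provisioned RU/s
        Semaphore permits = new Semaphore(maxInFlight);
        ExecutorService pool = Executors.newFixedThreadPool(16);

        for (String record : records) {
            permits.acquire(); // block while too many writes are in flight
            pool.submit(() -> {
                try {
                    insertDocument(record);
                } finally {
                    permits.release();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```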
I wonder if it is possible to add a listener to Cassandra that reports the table and the primary key of changed entries? It would be great to have such a mechanism.
Checking the Cassandra documentation, I only found the option of adding StateListener(s) to the Cluster instance.
Does anyone know how to do this without hacking Cassandra's data store, or encapsulating the driver and doing something on my own?
Check out this JIRA ticket for the upcoming CDC feature:
https://issues.apache.org/jira/browse/CASSANDRA-8844
If you like it, vote for it :)
CDC
"In databases, change data capture (CDC) is a set of software design
patterns used to determine (and track) the data that has changed so
that action can be taken using the changed data. Also, Change data
capture (CDC) is an approach to data integration that is based on the
identification, capture and delivery of the changes made to enterprise
data sources."
-Wikipedia
As Cassandra is increasingly being used as the Source of Record (SoR) for mission-critical data in large enterprises, it is increasingly being called upon to act as the central hub of traffic and data flow to other systems. In order to try to address the general need, we propose implementing a simple data logging mechanism to enable per-table CDC patterns.
If clients need to know about changes, the world has mostly gone to the message broker model: a middleman which connects producers and consumers of arbitrary data. You can read about Kafka, RabbitMQ, and NATS here. There is an older DZone article here. In your case, the client writing to the database would also send out a change message. What's nice about this model is that you can then pull whatever you need from the database.
Kafka is interesting because it can also store data. In some cases, you might be able to dispose of the database altogether.
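To make the dual-write concrete, here is a minimal sketch of a client publishing a change message after a successful database write, using the standard Kafka Java producer (the topic name and message format are invented for illustration):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

// Dual-write sketch: after the application writes a row to the database,
// it publishes a small change event so downstream consumers can react.
public class ChangePublisher {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // ... perform the database write first ...
            // Then announce the change; keying by primary key keeps updates
            // for one row in order for consumers.
            producer.send(new ProducerRecord<>("table-changes", "row-42",
                    "{\"table\":\"users\",\"pk\":\"row-42\",\"op\":\"UPDATE\"}"));
        }
    }
}
```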
Are you looking for something like triggers?
https://github.com/apache/cassandra/tree/trunk/examples/triggers
A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record (representing a new worker) is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries.
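The linked examples implement Cassandra's `ITrigger` interface. A stripped-down sketch against the 3.x-era interface (the internal `Partition` API differs between versions, so verify against your release):

```java
import java.util.Collection;
import java.util.Collections;
import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.db.partitions.Partition;
import org.apache.cassandra.triggers.ITrigger;

// Trigger sketch: augment() runs on the coordinator for every write to the
// table the trigger is attached to. Returning extra Mutations writes to other
// tables atomically with the original; an empty list just observes the write.
public class LoggingTrigger implements ITrigger {

    @Override
    public Collection<Mutation> augment(Partition update) {
        System.out.println("Write to " + update.metadata()
                + ", partition key " + update.partitionKey());
        return Collections.emptyList();
    }
}
```

The compiled jar goes into the node's triggers directory, and the trigger is attached with something like `CREATE TRIGGER log_trigger ON ks.tbl USING 'LoggingTrigger';` (names here are illustrative).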
From the beginning of an application, you plan ahead and denormalize data at write-time for faster queries at read-time. Using Cassandra "BATCH" commands, you can ensure atomic updates across multiple tables.
But, what about when you add a new feature, and need a new denormalized table? Do you need to run a temporary script to populate this new table with data? Is this how people normally do it? Is there a feature in Cassandra that will do this for me?
I can't comment yet, hence the new answer. The answer is yes, you'd have to write a migration script and run it when you deploy your software upgrade with the new feature. That's a fairly typical DevOps release process, in my experience.
I've not seen anything like Code First Migrations (for MS SQL Server & Entity Framework) for Cassandra, which would generate the migration script automatically for you.
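Such a backfill script is usually just a full scan of the source table feeding writes to the new denormalized table. A minimal sketch with the DataStax Java driver 4.x; the keyspace, table, and column names are invented for illustration:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;

// One-off backfill sketch: scan the existing table and populate the new
// denormalized table keyed for the new query pattern. The driver pages
// through the full scan automatically.
public class BackfillUsersByEmail {

    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            PreparedStatement insert = session.prepare(
                "INSERT INTO app.users_by_email (email, user_id, name) VALUES (?, ?, ?)");

            for (Row row : session.execute("SELECT user_id, email, name FROM app.users")) {
                session.execute(insert.bind(
                        row.getString("email"),
                        row.getUuid("user_id"),
                        row.getString("name")));
            }
        }
    }
}
```

For large tables you would run this token-range by token-range (or via Spark) instead of as one sequential scan, but the shape of the script is the same.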
I would greatly appreciate it if someone could share whether it is possible to build a near-real-time Oracle database sync application using Spring Integration. It's a lightweight requirement where only certain data fields across a couple of tables are to be copied over as soon as they change in the source database. Any thoughts on what architecture could be used would greatly help. Also, is there any Oracle utility that can be leveraged along with SI?
I'd say that an Oracle trigger is for you. When the main data is changed, you should use a trigger to copy those changes to another (sync) table in the same DB.
From SI you should use <int-jdbc:inbound-channel-adapter> to read and remove data from that sync table. Within the same transaction you have to use <int-jdbc:outbound-channel-adapter> to move the data to the other DB.
The main feature here should be an XA transaction, because you are using two DBs; conveniently, they are both Oracle.
Of course, you can try to use the 1PC approach instead, but it will require more work.
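A rough sketch of that flow with the Spring Integration Java DSL (5.x). The table and column names are invented, the transaction manager injected into the poller would need to be an XA/JTA one spanning both DataSources, and the `:payload[...]` parameter mapping is from memory, so check it against the reference docs:

```java
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.jdbc.JdbcMessageHandler;
import org.springframework.integration.jdbc.JdbcPollingChannelAdapter;
import org.springframework.transaction.PlatformTransactionManager;

// Poll the trigger-fed sync table, delete what was read within the same
// transaction, split the result list into rows, and insert into the target DB.
@Configuration
public class OracleSyncFlow {

    @Bean
    public IntegrationFlow syncFlow(DataSource sourceDs, DataSource targetDs,
                                    PlatformTransactionManager xaTxManager) {
        // Java equivalent of <int-jdbc:inbound-channel-adapter query="..." update="..."/>
        JdbcPollingChannelAdapter inbound = new JdbcPollingChannelAdapter(
                sourceDs, "SELECT id, field1, field2 FROM sync_table");
        inbound.setUpdateSql("DELETE FROM sync_table WHERE id IN (:id)");

        // Java equivalent of <int-jdbc:outbound-channel-adapter/>
        JdbcMessageHandler outbound = new JdbcMessageHandler(targetDs,
                "INSERT INTO target_table (id, field1, field2)"
                        + " VALUES (:payload[id], :payload[field1], :payload[field2])");

        return IntegrationFlows
                .from(inbound, e -> e.poller(
                        Pollers.fixedDelay(1000).transactional(xaTxManager)))
                .split() // the poll returns a List of row maps; handle one row at a time
                .handle(outbound)
                .get();
    }
}
```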