I have 2 repositories and I want a piece of information to be added in both of them. If the 2nd one fails then I want to roll back the operation in the first repository. Is there a way to do operations atomically in Cassandra in code, not cql?
I suppose I could delete the entry in the first repository if the operation in the second repository fails, but I'd prefer a way in which Cassandra does it, so that I don't have to maintain that logic in my application.
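For reference, the application-side fallback mentioned above (delete from the first repository when the second write fails) could look roughly like this. It is only a sketch: `Repository` is a hypothetical in-memory stand-in, not a Cassandra driver class, and `add_to_both` is an illustrative name.

```python
class Repository:
    """Hypothetical in-memory stand-in for a Cassandra-backed repository."""

    def __init__(self, fail_on_add=False):
        self.rows = {}
        self.fail_on_add = fail_on_add  # lets us simulate a failed write

    def add(self, key, value):
        if self.fail_on_add:
            raise RuntimeError("simulated write failure")
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)


def add_to_both(first, second, key, value):
    """Write to both repositories; undo the first write if the second fails."""
    first.add(key, value)
    try:
        second.add(key, value)
    except Exception:
        # Compensating delete. This is not a true rollback: other readers
        # may have seen the row in `first` before it was removed.
        first.delete(key)
        raise
```

Note that the compensating delete can itself fail, which is one reason this logic is unpleasant to maintain in the application.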
I'm supplying client drivers to a database I am maintaining. The DB has lots of tables with well defined schemas. (Cassandra in this case)
From time to time there will be breaking changes (stemming from product and system requirements), and the clients will "break" in the sense that the queries they have been performing will no longer be correct with regard to the newer schemas.
I'm curious to know if there is a good clean way to "version" the clients to work with the corresponding tables?
For instance, a naive implementation could append a version number to the name of every table in the DB.
The clients would always query tables that match this naming convention. Newer breaking versions would change the table name to match the newer version and clients would be upgraded accordingly.
Is there a better way to handle this?
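The naive naming convention described in the question could be sketched like this. The `_v<N>` suffix and the helper names `versioned_table` and `select_all` are assumptions made for illustration; the question does not specify a format.

```python
import re


def versioned_table(base_name, schema_version):
    """Return the table name for a given schema version, e.g. 'users_v3'.

    The '<name>_v<version>' format is an assumed convention.
    """
    if not re.fullmatch(r"[a-z][a-z0-9_]*", base_name):
        raise ValueError(f"unexpected table name: {base_name!r}")
    if schema_version < 1:
        raise ValueError("schema_version must be >= 1")
    return f"{base_name}_v{schema_version}"


def select_all(base_name, schema_version):
    """Build a query string against the versioned table."""
    return f"SELECT * FROM {versioned_table(base_name, schema_version)}"
```

A client built for version 3 would then always call `select_all("users", 3)` and keep working while newer clients move on to `users_v4`.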
It's also possible to keep one version number for your DB and one version number stored in your client; when a breaking change is made, you update the database version.
When the client starts, a version check is performed, and if the versions mismatch, an automatic upgrade can be done.
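A minimal sketch of such a startup check follows. How the database version is read and how the upgrade is carried out are not described in the answer, so `db_version` is passed in directly and `upgrade` is a hypothetical callback.

```python
CLIENT_SCHEMA_VERSION = 2  # hypothetical version compiled into the client


def check_schema_version(db_version, client_version=CLIENT_SCHEMA_VERSION,
                         upgrade=None):
    """Compare versions at startup; upgrade or fail on a mismatch.

    `upgrade` is a hypothetical callback that brings the client to the
    given DB version and returns the version it ended up at.
    """
    if db_version == client_version:
        return client_version
    if upgrade is None:
        raise RuntimeError(
            f"schema version mismatch: db={db_version}, client={client_version}"
        )
    new_version = upgrade(db_version)
    if new_version != db_version:
        raise RuntimeError("upgrade did not reach the database version")
    return new_version
```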
I came across the same problem a few months ago. We had to load the schema according to the version that our client should support. The solution we found is as follows:
Along with the schema, one more table is created with the following fields: version_no, ks_name, table_name, column_name, add/drop, is_loaded, primary key(version_no,(ks_name, table_name, column_name)). Note: if you have a single keyspace, you can remove the ks_name column, or write the table name itself as ks_name.table_name.
Then, whenever we want to load a new version, we log the changes in that table. When we load the previous schema again, the script reverses the logged alterations so that the schema rolls back to that previous version. Make sure you update the is_loaded field, as it is the only way to tell whether a schema is half loaded or the script failed, and so to avoid raising further errors. Hope it helps!
I need to continuously monitor a Cassandra table for two column values, and if column1 == true and column2 == true, I need to perform some operation.
Does the cassandra-driver library have the ability to do this? Or is there any other library that can?
As far as I know, the Cassandra driver doesn't have such an ability.
But Cassandra has a triggers feature. The trigger code must be placed on the Cassandra nodes and must be written in a JVM language:
the trigger can be written in any Java (JVM) language and exists outside the database. You place the trigger code in a lib/triggers subdirectory of the Cassandra installation directory, it loads during cluster startup, and exists on every node that participates in a cluster. The trigger defined on a table fires before a requested DML statement occurs, which ensures the atomicity of the transaction.
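Since the driver offers no change notification, the usual client-side alternative is to poll the table. The loop below is only a sketch of that idea: `fetch_row` is a hypothetical callable that would run the actual query and return the row as a dict (or `None`), and `on_match` is the operation to perform.

```python
import time


def wait_for_flags(fetch_row, on_match, interval_s=1.0, max_polls=None,
                   sleep=time.sleep):
    """Poll until column1 and column2 are both True, then call on_match.

    `fetch_row` is a hypothetical callable that queries the table and
    returns a dict such as {"column1": True, "column2": False}, or None.
    Returns True if the condition was met, False if max_polls ran out.
    """
    polls = 0
    while True:
        row = fetch_row()
        polls += 1
        if (row is not None and row.get("column1") is True
                and row.get("column2") is True):
            on_match(row)
            return True
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(interval_s)
```

Polling adds read load proportional to the polling rate, so the interval is a trade-off between latency and cost.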
I am interested in how the Cassandra production DBA's processes change when using Cassandra and performing many releases over a year. During the releases, columns in tables would change frequently and so would the number of Cassandra tables, as new features and queries are supported.
In the relational DB, in production, you create the 'view' and BOOM you get the data already there - loaded from the view's query.
With Cassandra, does the DBA have to create a new Cassandra table AND have to write/run a script to copy all the required data into that table? Can a production level Cassandra DBA provide some pointers on their processes?
We run a small shop, so I can tell you how I manage table/keyspace changes, and that may differ from how others get it done. First, I keep a text .cql file in our (private) Git repository that has all of our tables and keyspaces in their current formats. When changes are made, I update that file. This lets other developers know what the current tables look like, without having to use SSH or DevCenter. This also has the added advantage of giving us a file that allows us to restore our schema with a single command.
If it's a small change (like adding a new column) I'll try to get that out there just prior to deploying our application code. If it's a new table, I may create that earlier, as a new table without code to use it really doesn't hurt anything.
However, if it is a significant change...such as updating/removing an existing column or changing a key...I will create it as a new table. That way, we can deploy our code to use the new table(s), and nobody ever knows that we switched something behind the scenes. Obviously, if the table needs to have data in it, I'll have export/import scripts ready ahead of time and run those right after we deploy.
Larger corporations with enterprise deployments use tools like Chef to manage their schema deployments. When you have a large number of nodes or clusters, an automated deployment tool is really the best way to go.
I am using Hazelcast 3.2.2 community edition.
I am performing various tests with Hazelcast. I have two separate VMs, each running a Hazelcast instance as a Linux service, and together they form a single cluster. I will refer to them as HAZ-A and HAZ-B in this context.
Here is the test flow ("link" means the physical link in this context):
1) HAZ-A is up, HAZ-B is up.
2) The link of HAZ-A goes down; the link of HAZ-B is up.
Perform some operations, say change the password of a user, so HAZ-B will have two versions of the user object (one is the backup of HAZ-A, say version 1; the other is the updated copy, say version 2).
3) The link of HAZ-B goes down; the link of HAZ-A is already down. Hence the links of both HAZ-A and HAZ-B are down.
4) Restore the link of HAZ-A. The link of HAZ-B is still down.
Perform some operations, say change the password of a user. At this point I get stale data, since HAZ-A never got a chance to sync with HAZ-B.
So the point here is:
Can we implement/inject any kind of listener which will detect interface up/down or link up/down, so that upon detection we can simply re-sync data from the DB?
From the documentation, it seems both HAZ-A and HAZ-B will load the values from the DB, and when they eventually see each other, they will merge.
From Chapter 18:
If a MapStore was in use, those lost partitions would be reloaded from some database, making each mini-cluster complete. Each mini-cluster will then recreate the missing primary partitions and continue to store data in them, including backups on the other nodes.
The attributes for the <jdbc:inbound-channel-adapter> component in Spring Integration include data-source, sql and update. These allow for separate SELECT and UPDATE statements to be run against tables in the specified database. Both sql statements will be part of the same transaction.
The limitation here is that both the SELECT and UPDATE will be performed against the same data source. Is there a workaround for the case when the UPDATE will be on a table in a different data source (not just separate databases on the same server)?
Our specific requirement is to select rows in a table which have a timestamp prior to a specific time. That time is stored in a table in a separate data source. (It could also be stored in a file). If both sql statements used the same database, the <jdbc:inbound-channel-adapter> would work well for us out of the box. In that case, the SELECT could use the time stored, say, in table A as part of the WHERE clause in the query run against table B. The time in table A would then be updated to the current time, and all this would be part of one transaction.
One idea I had was, within the sql and update attributes of the adapter, to use SpEL to call methods in a bean. The method defined for sql would look up a time stored in a file, and then return the full SELECT statement. The method defined for update would update the time in the same file and return an empty string. However, I don't think such an approach is failsafe, because the reading and writing of the file would not be part of the same transaction that the data source is using.
If, however, the update was guaranteed to fire only upon commit of the data source transaction, that would work for us. In the event of a failure, the database transaction would commit, but the file would not be updated. We would then get duplicate rows, but should be able to handle that. The issue would be if the file was updated and the database transaction failed. That would mean lost messages, which we could not handle.
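The ordering described here (commit the database transaction first, update the file only afterwards) can be illustrated outside Spring Integration with a small sketch. Everything in it is an assumption for illustration: it uses Python's `sqlite3` in place of the real data source, hypothetical `events` and `delivered` tables, and a `(since, now]` time window as the selection rule.

```python
import os
import sqlite3
import tempfile


def read_watermark(path):
    """Read the stored time from the file."""
    with open(path) as f:
        return float(f.read().strip())


def write_watermark(path, value):
    """Replace the file atomically via a temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(repr(value))
    os.replace(tmp, path)


def poll_once(conn, watermark_path, now):
    """Deliver new rows, commit, and only then advance the file."""
    since = read_watermark(watermark_path)
    try:
        rows = conn.execute(
            "SELECT id FROM events WHERE ts > ? AND ts <= ? ORDER BY id",
            (since, now),
        ).fetchall()
        # Stand-in for downstream processing inside the same transaction.
        conn.executemany("INSERT INTO delivered (event_id) VALUES (?)", rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise  # the file is untouched, so no rows are lost
    # If this write fails, the next poll re-reads the same rows:
    # duplicates, but no lost messages.
    write_watermark(watermark_path, now)
    return [row[0] for row in rows]
```

The only failure window left is between the commit and the file write, which produces the duplicates the question says can be tolerated.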
If anyone has any insights as to how to approach this scenario it is greatly appreciated.
Use two different channel adapters with a pub-sub channel, or an outbound gateway followed by an outbound channel adapter.
If necessary, start the transaction(s) upstream of both; if you want true atomicity you would need to use an XA transaction manager and XA datasources. Or, you can get close by synchronizing the two transactions so they get committed very close together.
See Dave Syer's article "Distributed transactions in Spring, with and without XA" and specifically the section on Best Efforts 1PC.
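To make the "committed very close together" idea concrete, here is a rough sketch of the Best Efforts 1PC pattern. It is not Spring code: it uses two `sqlite3` connections as stand-ins for the two data sources, and `best_efforts_1pc`, `work_a`, and `work_b` are hypothetical names.

```python
import sqlite3


def best_efforts_1pc(conn_a, conn_b, work_a, work_b):
    """Run work on both connections, then commit them back to back.

    All work is done before either commit, so most failures roll both
    back. This is not XA: if the second commit fails, the first one has
    already been committed and cannot be undone.
    """
    try:
        work_a(conn_a)
        work_b(conn_b)
    except Exception:
        conn_a.rollback()
        conn_b.rollback()
        raise
    conn_a.commit()  # the small remaining failure window starts here
    conn_b.commit()
```

Choosing which resource commits first decides what a failure in that window looks like, so it is usually ordered so that the result is a duplicate rather than a loss.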