What is the solution for multi-table ACID transactions in Cassandra?

I was following this link to run a batch transaction without using the BATCH keyword.
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .build();
Session session = cluster.newSession();
// Save off the prepared statement you're going to use
PreparedStatement statement = session.prepare("INSERT INTO tester.users (userID, firstName, lastName) VALUES (?,?,?)");
//
List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
for (int i = 0; i < 1000; i++) {
    // please bind with whatever actually useful data you're importing
    BoundStatement bind = statement.bind(i, "John", "Tester");
    ResultSetFuture resultSetFuture = session.executeAsync(bind);
    futures.add(resultSetFuture);
}
// not returning anything useful but makes sure everything has completed before you exit the thread.
for (ResultSetFuture future : futures) {
    future.getUninterruptibly();
}
cluster.close();
My question: with this approach, is it possible to INSERT, UPDATE or DELETE data in different tables so that if any one statement fails they all fail, while keeping the same performance (as described in the link)?
When I tried this approach to insert and delete data in different tables, one query failed, but all the previous queries had already executed and updated the database.
With BATCH I can see that if any statement fails, all statements fail. But using BATCH across different tables is an anti-pattern, so what is the solution?

With BATCH I can see that if any statement fails, all statements fail.
Wrong. The guarantee of a LOGGED BATCH is that if some statements in the batch fail, they will be retried until they succeed.
But using BATCH across different tables is an anti-pattern, so what is the solution?
ACID transactions are not possible with Cassandra: they would require some sort of global lock or global coordination, which would be prohibitive performance-wise.
However, if you don't care about the performance cost, you can implement a global lock/lease system yourself using Lightweight Transaction primitives, as described here.
But be ready to face poor performance.
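To make the lock/lease idea concrete, here is a minimal sketch of the acquire/release protocol only. It does not talk to Cassandra: a ConcurrentHashMap stands in for a hypothetical `locks` table, and the CQL in the comments is an assumption about how the two conditional statements might look. The class and method names are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a hypothetical Cassandra table `locks (name, owner)`.
class LeaseLockSketch {
    private final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();

    // Stand-in for: INSERT INTO locks (name, owner) VALUES (?, ?) IF NOT EXISTS
    boolean tryAcquire(String lockName, String owner) {
        return owners.putIfAbsent(lockName, owner) == null;
    }

    // Stand-in for: DELETE FROM locks WHERE name = ? IF owner = ?
    boolean release(String lockName, String owner) {
        return owners.remove(lockName, owner);
    }

    // Runs the work only while holding the lock; returns false if the lock was busy.
    boolean runExclusively(String lockName, String owner, Runnable work) {
        if (!tryAcquire(lockName, owner)) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            release(lockName, owner);
        }
    }

    public static void main(String[] args) {
        LeaseLockSketch locks = new LeaseLockSketch();
        boolean ran = locks.runExclusively("multi-table-write", "client-A",
                () -> System.out.println("writing to several tables under the lock"));
        System.out.println("ran=" + ran);
    }
}
```

In a real cluster the insert would presumably also carry a TTL so that a crashed owner cannot hold the lease forever, and every acquire and release would pay the latency of a lightweight transaction, which is where the poor performance comes from.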

Related

Will Cassandra fail on two parallel CREATE KEYSPACE commands executed simultaneously?

We have found that if we roll out DDL CQL scripts that alter an existing table in parallel, there is a substantial chance of corrupting the keyspace to the point that we need to recreate it.
We have now serialized this process, including the creation of that keyspace. Now there is a heated discussion about whether Cassandra explicitly supports creating different keyspaces in parallel.
I suppose that this is OK, but since the cluster is large, we would like a second opinion, so I am asking here:
Can we safely assume that parallel creation of different keyspaces is safe in Cassandra?
In current versions of Cassandra it's not possible: you need to wait for schema agreement after each DDL statement, including the creation of other keyspaces. Drivers usually wait for some time (10 seconds by default) for confirmation that all nodes in the cluster have the same schema version. Depending on the driver, you can explicitly check for schema agreement, either in the result set returned after executing the statement or via the cluster metadata. For example, in Java it could look like the following:
Metadata metadata = cluster.getMetadata();
for (int i = 0; i < commands.length; i++) {
    System.out.println("Executing '" + commands[i] + "'");
    ResultSet rs = session.execute(commands[i]);
    if (!rs.getExecutionInfo().isSchemaInAgreement()) {
        while (!metadata.checkSchemaAgreement()) {
            System.out.println("Schema isn't in agreement, sleep 1 second...");
            Thread.sleep(1000);
        }
    }
}
New versions of Cassandra will have improvements in this area, for example via CASSANDRA-13426 (committed into 4.0) and CASSANDRA-10699 (not yet done).
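The wait loop in the answer spins forever if the nodes never agree. A bounded variant might look like the sketch below. It is plain Java with no driver dependency: the BooleanSupplier is a stand-in for a call such as metadata.checkSchemaAgreement(), and the class name, method name and timeout values are assumptions chosen for illustration.

```java
import java.util.function.BooleanSupplier;

class AgreementWaiter {
    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition became true, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical condition: pretend agreement is reached after about 50 ms.
        boolean agreed = waitUntil(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println("agreed=" + agreed);
    }
}
```

With the driver, the call would presumably be wrapped as `waitUntil(metadata::checkSchemaAgreement, 30000, 1000)` after each DDL statement, and the next statement would only run if it returned true.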

Cassandra. Not enough replica available - Java driver behaviour different from CQL console

I have a very simple cluster with 2 nodes.
I have created a keyspace with SimpleStrategy replication and a replication factor of 2.
For reads and writes I always use the default data consistency level of ONE.
If I take down one of the two nodes, then with the DataStax Java driver I can still read data, but when I try to write I get "Not enough replica available for query at consistency ONE (1 required but only 0 alive)".
Strangely, if I execute exactly the same insert statement in the CQL console, it works without any problem. Even in the CQL console the consistency level was ONE.
Am I missing something?
TIA
Update
I have done some more tests and the problem appears only when I use a BatchStatement. If I execute the prepared statement directly, it works. Any ideas?
Here is the code:
Cluster cluster = Cluster.builder()
        .addContactPoint("192.168.1.10")
        .addContactPoint("192.168.1.12")
        .build();
Session session = cluster.connect();
session.execute("use giotest");
BatchStatement batch = new BatchStatement();
PreparedStatement statement = session.prepare("INSERT INTO hourly(series_id, timestamp, value) VALUES (?, ?, ?)");
for (int i = 0; i < 50; i++) {
    // Long.valueOf replaces the deprecated new Long(i) constructor
    batch.add(statement.bind(Long.valueOf(i), new Date(), 2345.5));
}
session.execute(batch);
batch.clear();
session.close();
cluster.close();
Batches are atomic by default: if the coordinator fails mid-batch, Cassandra will make sure other nodes replay the remaining requests. It uses a distributed batch log for that (see this post for more details).
This batch log must be replicated to at least one replica other than the coordinator, otherwise that would defeat the above mechanism.
In your case, there is no other replica, only the coordinator. So Cassandra is telling you that it cannot provide the guarantees of an atomic batch. See also the discussion on CASSANDRA-7870.
If you haven't already, make sure you have specified both hosts at the driver level.

Azure Table Storage batch insert with potentially pre-existing rowkeys

I'm trying to send a simple batch of Insert operations to Azure Table Storage, but it seems that the whole batch transaction is invalidated: using the managed Azure storage client (version 2.0), the ExecuteBatch method itself throws an exception if a single Insert in the batch targets a pre-existing record:
public class SampleEntity : TableEntity
{
    public SampleEntity(string partKey, string rowKey)
    {
        this.PartitionKey = partKey;
        this.RowKey = rowKey;
    }
}
var acct = CloudStorageAccount.DevelopmentStorageAccount;
var client = acct.CreateCloudTableClient();
var table = client.GetTableReference("SampleEntities");
var foo = new SampleEntity("partition1", "preexistingKey");
var bar = new SampleEntity("partition1", "newKey");
var batchOp = new TableBatchOperation();
batchOp.Add(TableOperation.Insert(foo));
batchOp.Add(TableOperation.Insert(bar));
var result = table.ExecuteBatch(batchOp); // throws exception: "0:The specified entity already exists."
The batch-level exception is avoided by using InsertOrMerge, but then every individual operation response returns a 204, whether that particular operation inserted or merged the entity. So it seems it's impossible for the client application to know whether it, or another node in the cluster, inserted the record. Unfortunately, in my current case, this knowledge is necessary for some downstream synchronization.
Is there some configuration or technique to allow the batch of inserts to proceed and return the particular response code per-item without throwing a blanket exception?
As you already know, since a batch is a transactional operation, you get an all-or-nothing deal. One interesting thing about batch transactions is that you get the index of the first failed entity in the batch. So if you're trying to insert 100 entities in a batch and the 50th entity is already present in the table, the batch operation will give you the index of the failed entity (49 in this case).
Is there some configuration or technique to allow the batch of inserts
to proceed and return the particular response code per-item without
throwing a blanket exception?
I don't think so. The transaction would fail as soon as the first entity fails. It will not even attempt to process other entities.
Possible Solutions (Just thinking out loud :))
If I understand correctly, your key requirement is to identify if an entity was inserted or merged (or replaced). For this the approach would be to separate out failed entities from a batch and process them separately. Based on this, I can think of two approaches:
What you could possibly do in this case is split that batch into 3 batches: 1st batch will contain 49 entities, 2nd batch will contain just 1 entity (which failed) and the 3rd batch will contain 50 entities. You could now insert all entities in the 1st batch, decide what you want to do with that failed entity and try to insert the 3rd batch. You would need to repeat the process over and over again till the time this operation is complete.
Another idea would be to remove the failed entity from the batch and retry that batch. So in the example above, in your 1st attempt you'll try with 100 entities, in your 2nd attempt you'll try with 99 entities and so on and so forth keeping track of failed entities all the while (with the reason as to why they failed). Once the batch operation is successfully completed, you can work with all the failed entities.
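The second approach (drop the failed entity and retry) can be sketched as a small loop. The sketch below is in Java rather than C# and does not use the Azure client: FakeTable is an in-memory stand-in for one table partition with all-or-nothing inserts, and BatchFailedException is a hypothetical exception carrying the index of the first failed operation, mirroring the "0:The specified entity already exists." message from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BatchRetrySketch {
    // Hypothetical failure type: carries the index of the first failed operation.
    static class BatchFailedException extends RuntimeException {
        final int failedIndex;
        BatchFailedException(int failedIndex) {
            super(failedIndex + ":The specified entity already exists.");
            this.failedIndex = failedIndex;
        }
    }

    // In-memory stand-in for a table partition with transactional batch inserts.
    static class FakeTable {
        final Set<String> rowKeys = new HashSet<>();

        void executeBatchInsert(List<String> keys) {
            for (int i = 0; i < keys.size(); i++) {
                if (rowKeys.contains(keys.get(i))) {
                    throw new BatchFailedException(i); // nothing is written
                }
            }
            rowKeys.addAll(keys);
        }
    }

    // Retries the batch, removing the failed key each time.
    // Returns the keys that were rejected because they already existed.
    static List<String> insertSkippingExisting(FakeTable table, List<String> keys) {
        List<String> pending = new ArrayList<>(keys);
        List<String> rejected = new ArrayList<>();
        while (!pending.isEmpty()) {
            try {
                table.executeBatchInsert(pending);
                break;
            } catch (BatchFailedException e) {
                rejected.add(pending.remove(e.failedIndex));
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        table.rowKeys.add("preexistingKey");
        List<String> rejected = insertSkippingExisting(table, Arrays.asList("preexistingKey", "newKey"));
        System.out.println("already existed: " + rejected);
    }
}
```

Every key left in the successful final batch is known to have been inserted by this client, and every key in the rejected list was inserted by someone else, which is the distinction the question needs. The cost is one extra round trip per pre-existing entity.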

JDBC program running for a long time has a performance issue

My program has an issue with Oracle query performance. I believe the SQL performs well, because it returns quickly in SQL*Plus.
But when my program has been running for a long time, like 1 week, the SQL query (using JDBC) becomes slower (in my logs, the query time is much longer than when I originally started the program). When I restart my program, the query performance returns to normal.
I think something could be wrong with the way I use the PreparedStatement, because the SQL I'm using does not use "?" placeholders at all. It is just a complex select query.
The query is run by a util class. Here is the pertinent code building the query:
public List<String[]> query(String sql, String[] args) {
    Connection conn = null;
    conn = openConnection();
    conn.setAutoCommit(true);
    ....
    PreparedStatement preStatm = null;
    ResultSet rs = null;
    ....//set preparedstatment arg code
    rs = preStatm.executeQuery();
    ....
    finally{
        //close rs
        //close prestatm
        //close connection
    }
}
In my case, args is always null, so it just passes the SQL text to this query method. Could this slow down the DB query after the program has been running for a long time? Should I use Statement instead, or pass args with "?" in the SQL? How can I find the root cause of my issue? Thanks.
Maybe the problem is in the JDBC cache (see the Oracle spec).
Try turning it off,
or try reinitializing the driver from time to time (for example, once per day).
You first need to look at data that will help you see where you are spending most of your time; guessing is not an option when performance tuning.
So I would recommend getting solid data that pinpoints the layer presenting the issue (Java or DB).
For this I would suggest looking at AWR and ASH reports when the problem is most noticeable. Also collect data on the JVM (you can use JConsole and/or JVisualVM).
When first diagnosing bad performance I always use the "USE" method: Utilization, Saturation and Errors.
So first, look for Errors in the logs.
Then look for any resource becoming Saturated (CPUs, memory, etc.).
Finally, look at the Utilization of each resource. Having a client-server layout will make this easier; if this is not the case, you will need to drill down to the process level to know whether it's Java or the DB.
Once you have collected this data you can direct your tuning efforts accordingly. Going by guesswork will only waste your time and can sometimes even mask problems or induce new ones.
You can come back later with this data and we can take a look!
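Besides JConsole and JVisualVM, the program can log a few JVM figures next to each query time, which makes a week-long drift visible in the logs. The following is a minimal sketch using only the standard java.lang.management API. The JvmSnapshot class and its field names are made up for illustration, and the empty lambda marks where the real JDBC call would go.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class JvmSnapshot {
    final long heapUsedBytes;
    final long heapCommittedBytes;
    final int liveThreads;

    JvmSnapshot(long heapUsedBytes, long heapCommittedBytes, int liveThreads) {
        this.heapUsedBytes = heapUsedBytes;
        this.heapCommittedBytes = heapCommittedBytes;
        this.liveThreads = liveThreads;
    }

    // Reads current heap usage and live thread count from the platform MXBeans.
    static JvmSnapshot take() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return new JvmSnapshot(heap.getUsed(), heap.getCommitted(), threads);
    }

    // Measures the wall-clock time of a piece of work in milliseconds.
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> { /* hypothetical: run the JDBC query here */ });
        JvmSnapshot s = take();
        System.out.println("queryMs=" + elapsed
                + " heapUsed=" + s.heapUsedBytes
                + " heapCommitted=" + s.heapCommittedBytes
                + " threads=" + s.liveThreads);
    }
}
```

If query time grows together with heap usage or thread count, the Java side is the first suspect; if the JVM figures stay flat while query time grows, the AWR and ASH reports are the better place to look.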

Dealing with deadlocks in long-running Hibernate transactions

I have a Hibernate application that may produce concurrent inserts and updates (via Session.saveOrUpdate) to records with the same primary key, which is assigned. These transactions are somewhat long-running, perhaps 15 seconds on average (since data is collected from remote sources and persisted as it comes in). My DB isolation level is set to Read Committed, and I'm using MySQL and InnoDB.
The problem is that this scenario creates excessive lock waits that time out, either as a result of a deadlock or because of the long transactions. This leads me to a few questions:
Does the database engine only release its locks when the transaction is committed?
If this is the case, should I seek to shorten my transactions?
If so, would it be good practice to use separate read and write transactions, where the write transaction could be made short and only take place after all of my data is gathered (the bulk of my transaction length involves collecting remote data)?
Edit:
Here's a simple test that approximates what I believe is happening. Since I'm dealing with long running transactions, commit takes place long after the first flush. So just to illustrate my situation I left commit out of the test:
@Entity
static class Person {
    @Id
    Long id = Long.valueOf(1);
    @Version
    private int version;
}
@Test
public void updateTest() {
    for (int i = 0; i < 5; i++) {
        new Thread() {
            public void run() {
                Session s = sf.openSession();
                Transaction t = s.beginTransaction();
                Person p = new Person();
                s.saveOrUpdate(p);
                s.flush(); // Waits...
            }
        }.run();
    }
}
And the queries that this produces, as expected, waiting on the second insert:
select id, version from person where id=?
insert into person (version, id) values (?, ?)
select id, version from person where id=?
insert into person (version, id) values (?, ?)
That's correct, the database releases locks only when the transaction is committed. Since you're using Hibernate, you can use optimistic locking, which does not lock the database for long periods of time. Essentially, Hibernate does what you suggest, separating the reading and writing portions into separate transactions. On write, it checks that the data in memory has not been changed concurrently in the database.
Hibernate Reference - Optimistic Transactions
Optimistic locking:
Base assumption: update conflicts seldom occur.
Mechanism:
1. Read the dataset with its version field.
2. Change the dataset.
3. Update the dataset:
3.1. Read the dataset with the current version field and key.
If you get it, nobody has changed the record. Apply the next version field value and update the record.
If you do not get it, the record has been changed; return an appropriate message to the caller and you are done.
Inserts are not affected: you either have a separate primary key anyway, or you accept multiple records with identical values.
Therefore the example given above is not a case for optimistic locking.
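The version check described above can be illustrated without a database. The sketch below is an in-memory stand-in, not Hibernate's implementation: the Store class, its method names and the SQL in the comment are assumptions, and the synchronized update plays the role of a single atomic UPDATE statement.

```java
import java.util.HashMap;
import java.util.Map;

class OptimisticLockSketch {
    // A row snapshot: the value plus the version it was read at.
    static final class Row {
        final String value;
        final int version;
        Row(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    static final class Store {
        private final Map<Long, Row> rows = new HashMap<>();

        synchronized void insert(long id, String value) {
            rows.put(id, new Row(value, 0));
        }

        synchronized Row read(long id) {
            return rows.get(id);
        }

        // Stand-in for: UPDATE t SET value = ?, version = version + 1
        //               WHERE id = ? AND version = ?
        // Returns false when the row changed since it was read (zero rows matched).
        synchronized boolean update(long id, int expectedVersion, String newValue) {
            Row current = rows.get(id);
            if (current == null || current.version != expectedVersion) {
                return false;
            }
            rows.put(id, new Row(newValue, expectedVersion + 1));
            return true;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.insert(1L, "initial");
        Row readByA = store.read(1L);
        Row readByB = store.read(1L);
        System.out.println("A updates: " + store.update(1L, readByA.version, "from A"));
        System.out.println("B updates: " + store.update(1L, readByB.version, "from B"));
    }
}
```

No lock is held between the read and the update, so a long data-gathering phase blocks nobody. The second writer simply learns at write time that its snapshot is stale and has to re-read and retry or report the conflict.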

Resources