Cassandra DataStax BatchStatement with Lightweight Transactions

I'm having difficulty combining BatchStatement and Lightweight Transactions using the Datastax java driver.
Consider the following:
String batch =
    "BEGIN BATCH "
    + "Update mykeyspace.mytable set record_version = 102 where id = '" + id + "' if record_version = 101; "
    // + <additional batched statements>
    + "APPLY BATCH";
Row row = session.execute(batch).one();
if (!row.getBool("[applied]")) {
    throw new RuntimeException("Optimistic Lock Failure!");
}
This functions as expected and indicates whether my lightweight transaction succeeded and my batch was applied. All is good. However, if I try the same thing using a BatchStatement, I run into a couple of problems:
-- My lightweight transaction "if" clause is ignored and the update is always applied
-- The "Row" result is null making it impossible to execute the final row.getBool("[applied]") check.
String update = "Update mykeyspace.mytable set record_version = ? where id = ? if record_version = ?";
PreparedStatement pStmt = getSession().prepare(update);
BatchStatement batch = new BatchStatement();
batch.add(new BoundStatement(pStmt).bind(newVersion, id, oldVersion));
Row row = session.execute(batch).one(); // <------ Row is null!
if (!row.getBool("[applied]")) {
    throw new RuntimeException("Optimistic Lock Failure!");
}
Am I doing this wrong? Or is this a limitation of the DataStax BatchStatement?

I am encountering this same issue. I opened a ticket with DataStax support yesterday and received the following answer:
Currently Lightweight Transactions as PreparedStatements within a BATCH are not supported. This is why you are encountering this issue.
There is nothing on the immediate roadmap to include this feature in Cassandra.
That suggests eliminating the PreparedStatement will work around the issue. I'm going to try that myself, but haven't yet.
[Update]
I've been trying to work around this issue. Based on the earlier feedback, I assumed the restriction was on using a PreparedStatement for the conditional update.
I tried changing my code to not use a PreparedStatement, but that still didn't work when using a BatchStatement that contained a RegularStatement instead of a PreparedStatement.
BatchStatement batchStatement = new BatchStatement();
batchStatement.add(conditionalRegularStatement);
session.execute(batchStatement);
The only thing that seems to work is to execute a raw query string that includes the batch:
session.executeQuery("BEGIN BATCH " + conditionalRegularStatement.getQueryString() + " APPLY BATCH");

Update:
This issue is resolved in CASSANDRA-7337, which was fixed in C* 2.0.9.
The latest DataStax Enterprise is now on 2.0.11. If you are seeing this issue ==> Upgrade!

The second code snippet doesn't look right to me (there's no constructor taking a string, or you've missed the prepared statement).
Can you try instead the following:
String update = "Update mykeyspace.mytable set record_version = ? where id = ? if record_version = ?";
PreparedStatement pStmt = session.prepare(update);
BatchStatement batch = new BatchStatement();
batch.add(pStmt.bind(newVersion, id, oldVersion));
Row row = session.execute(batch).one();
if (!row.getBool("[applied]")) {
    throw new RuntimeException("Optimistic Lock Failure!");
}
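As an aside, newer versions of the Java driver also expose a convenience method for this check; a minimal sketch, assuming a driver version that includes ResultSet#wasApplied():

// Same check via the driver's convenience method (newer 2.x/3.x drivers).
if (!session.execute(batch).wasApplied()) {
    throw new RuntimeException("Optimistic Lock Failure!");
}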

Related

Cassandra update with Lightweight Transactions

I want to implement a lightweight transaction for an update statement with Spring Boot Cassandra.
Ex:
UPDATE cyclist_id SET id = 15a116fc-b833-4da6-ab9a-4a3775750239 where lastname = 'WELTEN' and firstname = 'Bram' IF age = 18;
Is there any API available with CassandraTemplate to achieve the above?
I tried using:
UpdateOptions updateOptions = UpdateOptions.builder().consistencyLevel(ConsistencyLevel.QUORUM).build();
but I am not sure how to set the IF condition with a value (like age = 18).
I figured out how to implement the lightweight transaction with Spring Boot Cassandra:
Create a CqlSession using the com.datastax.oss.driver.api.core.CqlSession builder.
Use a PreparedStatement for the query, include the IF condition in the query, and set the parameters with a BoundStatement.
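A minimal sketch of that approach with the 4.x OSS driver; the contact point, local datacenter name, and keyspace name are assumptions:

import java.net.InetSocketAddress;
import java.util.UUID;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;

public class LwtUpdateSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")  // assumption: default DC name
                .withKeyspace("cycling")             // assumption: keyspace name
                .build()) {
            // The IF clause makes this a lightweight transaction (Paxos round).
            PreparedStatement ps = session.prepare(
                "UPDATE cyclist_id SET id = ? WHERE lastname = ? AND firstname = ? IF age = ?");
            BoundStatement bound = ps.bind(
                UUID.fromString("15a116fc-b833-4da6-ab9a-4a3775750239"), "WELTEN", "Bram", 18);
            ResultSet rs = session.execute(bound);
            // wasApplied() reports whether the condition held and the update was applied.
            System.out.println("Applied: " + rs.wasApplied());
        }
    }
}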

NoNodeAvailableException after some insert request to cassandra

I am trying to insert data into a local Cassandra cluster using async execution and version 4 of the driver (the same version as my Cassandra instance).
I have instantiated the CqlSession in this way:
CqlSession cqlSession = CqlSession.builder()
    .addContactEndPoint(new DefaultEndPoint(
        InetSocketAddress.createUnresolved("localhost", 9042)))
    .build();
Then I create a statement in an async way:
return session.prepareAsync(
        "insert into table (p1, p2, p3, p4) values (?, ?, ?, ?)")
    .thenComposeAsync(
        (ps) -> {
            CompletableFuture<AsyncResultSet>[] result = data.stream()
                .map((d) -> session.executeAsync(ps.bind(d.p1, d.p2, d.p3, d.p4))
                    .toCompletableFuture())
                .toArray(CompletableFuture[]::new);
            return CompletableFuture.allOf(result);
        }
    );
data is a dynamic list filled with user data.
When I execute the code, I get the following exception:
Caused by: com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was available to execute the query
at com.datastax.oss.driver.api.core.AllNodesFailedException.fromErrors(AllNodesFailedException.java:53)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.sendRequest(CqlPrepareHandler.java:210)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.onThrottleReady(CqlPrepareHandler.java:167)
at com.datastax.oss.driver.internal.core.session.throttling.PassThroughRequestThrottler.register(PassThroughRequestThrottler.java:52)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.<init>(CqlPrepareHandler.java:153)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareAsyncProcessor.process(CqlPrepareAsyncProcessor.java:66)
at com.datastax.oss.driver.internal.core.cql.CqlPrepareAsyncProcessor.process(CqlPrepareAsyncProcessor.java:33)
at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:210)
at com.datastax.oss.driver.api.core.cql.AsyncCqlSession.prepareAsync(AsyncCqlSession.java:90)
The node is active and some data is inserted before the exception is raised. I have also tried to set a datacenter name on the session builder, without any result.
Why is this exception raised if the node is up and running? I only have one local node; could that be the problem?
The biggest thing that I don't see is a way to limit the number of concurrent async requests.
Basically, if that (mapped) data stream gets hit hard enough, it will keep creating new in-flight requests that it's awaiting. If the writes coming from those requests create more back-pressure than the node can keep up with, the node becomes overwhelmed and stops accepting requests.
Take a look at this post by Ryan Svihla of DataStax:
Cassandra: Batch Loading Without the Batch — The Nuanced Edition
Its code is from the 3.x version of the driver, but the concepts are the same. Basically, provide some way to throttle the writes, or limit the number of "in-flight" requests running at any given time, and that should help greatly; one way to do that is sketched below.
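A hedged sketch of such a throttle: cap the in-flight writes with a java.util.concurrent.Semaphore. The session, ps, and data names are assumed from the question's code, and the permit count is a tunable guess:

import java.util.List;
import java.util.concurrent.Semaphore;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

// Caps the number of outstanding async inserts at `permits`.
static void throttledInsert(CqlSession session, PreparedStatement ps,
                            List<Data> data, int permits) throws InterruptedException {
    Semaphore inFlight = new Semaphore(permits);
    for (Data d : data) {
        inFlight.acquire(); // blocks while `permits` requests are outstanding
        session.executeAsync(ps.bind(d.p1, d.p2, d.p3, d.p4))
            .whenComplete((rs, err) -> inFlight.release());
    }
}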
Finally, I found a solution using BatchStatement and a little custom code to create a chunked list.
// Compute the chunk size as data.size() / 100 (rounded up), so the data is
// split into roughly 100 batches.
int chunks = 0;
if (data.size() % 100 == 0) {
    chunks = data.size() / 100;
} else {
    chunks = (data.size() / 100) + 1;
}
final int finalChunks = chunks;
return session.prepareAsync(
        "insert into table (p1, p2, p3, p4) values (?, ?, ?, ?)")
    .thenComposeAsync(
        (ps) -> {
            AtomicInteger counter = new AtomicInteger();
            // Bind every row, group consecutive statements into chunks, and
            // execute each chunk as one logged batch.
            final List<CompletionStage<AsyncResultSet>> batchInsert = data.stream()
                .map((d) -> ps.bind(d.p1, d.p2, d.p3, d.p4))
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / finalChunks))
                .values().stream()
                .map(boundStatements -> BatchStatement.newInstance(
                    BatchType.LOGGED, boundStatements.toArray(new BatchableStatement[0])))
                .map(session::executeAsync)
                .collect(Collectors.toList());
            return CompletableFutures.allSuccessful(batchInsert);
        }
    );

How do I retrieve table names in Cassandra using Java?

If I use this command in a CQL shell, I get all the names of the table(s) in that keyspace.
DESCRIBE TABLES;
I want to retrieve the same data using a ResultSet. Below is my Java code:
String query = "DESCRIBE TABLES;";
ResultSet rs = session.execute(query);
for (Row row : rs) {
    System.out.println(row);
}
The session and cluster have been initialized earlier as:
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("keyspace_name");
Alternatively, I would like to know the Java code to retrieve the table names in a keyspace.
The schema of the system tables changes quite a bit between versions, and DESCRIBE is a command implemented by the cqlsh shell rather than part of CQL, so the driver cannot execute it. It is best to rely on the driver's Metadata, which has version-specific parsing built in. From the Java driver, use:
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Collection<TableMetadata> tables = cluster.getMetadata()
    .getKeyspace("keyspace_name")
    .getTables(); // TableMetadata has the name in getName(), along with lots of other info

// To convert to a list of the names:
List<String> tableNames = tables.stream()
    .map(tm -> tm.getName())
    .collect(Collectors.toList());
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Metadata metadata = cluster.getMetadata();
Iterator<TableMetadata> tm = metadata.getKeyspace("Your Keyspace").getTables().iterator();
while (tm.hasNext()) {
    TableMetadata t = tm.next();
    System.out.println(t.getName());
}
The above code will give you the table names in the passed keyspace, irrespective of the Cassandra version used.
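For completeness, a hedged sketch of the same lookup with the 4.x OSS driver, where Cluster was replaced by CqlSession; "keyspace_name" is a placeholder:

import com.datastax.oss.driver.api.core.CqlSession;

// Driver 4.x: keyspace and table metadata hang off the session.
try (CqlSession session = CqlSession.builder().build()) {
    session.getMetadata()
        .getKeyspace("keyspace_name")              // returns Optional<KeyspaceMetadata>
        .ifPresent(ks -> ks.getTables().values()
            .forEach(t -> System.out.println(t.getName())));
}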

Insert is 10 times faster than Update in Cassandra. Is it normal?

In my Java application accessing Cassandra, it can insert 500 rows per second, but only update 50 rows per second (and the updated rows didn't previously exist).
Updating one hundred fields is as fast as updating one field.
I just use CQL statements in the Java application.
Is this situation normal? How can I improve my application?
public void InsertSome(List<Data> data) {
    String insertQuery = "INSERT INTO Data (E,D,A,S,C,......) values(?,?,?,?,?,.............); ";
    if (prepared == null)
        prepared = getSession().prepare(insertQuery);
    count += data.size();
    for (int i = 0; i < data.size(); i++) {
        List<Object> objs = getFiledValues(data.get(i));
        BoundStatement bs = prepared.bind(objs.toArray());
        getSession().execute(bs);
    }
}

public void UpdateOneField(Data data) {
    String updateQuery = "UPDATE Data set C=? where E=? and D=? and A=? and S=?; ";
    if (prepared == null)
        prepared = getSession().prepare(updateQuery);
    BoundStatement bs = prepared.bind(data.getC(), data.getE(),
        data.getD(), data.getA(), data.getS());
    getSession().execute(bs);
}

public void UpdateOne(Data data) {
    String updateQuery = "UPDATE Data set C=?,U=?,F........where E=? and D=? and A=? and S=? and D=?; ";
    if (prepared == null)
        prepared = getSession().prepare(updateQuery);
    ......
    BoundStatement bs = prepared.bind(objs2.toArray());
    getSession().execute(bs);
}
Schema:
Create Table Data (
    E,
    D,
    A,
    S,
    D,
    C,
    U,
    S,
    ...
    PRIMARY KEY ((E, D), A, S)
) WITH compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 }
  AND compaction = { 'class' : 'LeveledCompactionStrategy' };
Another scenario:
I used the same application to access another Cassandra cluster. The result was different: UPDATE was as fast as INSERT, but it could only INSERT/UPDATE 5 rows per second. That cluster is DataStax Enterprise running on GCE (I used the default DataStax Enterprise image from Google Cloud Launcher).
So I think some configuration is probably the reason, but I don't know what it is.
Conceptually, UPDATE and INSERT are the same, so I would expect similar performance. UPDATE doesn't check whether the data already exists (unless you are doing a lightweight transaction with IF EXISTS).
I noticed that each of your methods prepares the statement if prepared is null. Is it possible the statement is being re-prepared each time? That would add a round trip for every method invocation. I also noticed that InsertSome does multiple inserts per invocation, whereas UpdateOne / UpdateOneField execute one statement each. So if the statement were prepared every time, that's an extra round trip per update, whereas for inserts it happens only once per list. A sketch of preparing each statement once is below.
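A minimal sketch of preparing each statement once up front, assuming the question's 3.x Session and queries; the DataDao class name and the trimmed column list are hypothetical:

// Hypothetical DAO that prepares each statement exactly once, at construction time.
public class DataDao {
    private final Session session;
    private final PreparedStatement insertPs;
    private final PreparedStatement updatePs;

    public DataDao(Session session) {
        this.session = session;
        // One prepare round trip per statement for the lifetime of the application.
        this.insertPs = session.prepare(
            "INSERT INTO Data (E, D, A, S, C) VALUES (?, ?, ?, ?, ?)");
        this.updatePs = session.prepare(
            "UPDATE Data SET C = ? WHERE E = ? AND D = ? AND A = ? AND S = ?");
    }

    public void updateOneField(Data data) {
        session.execute(updatePs.bind(
            data.getC(), data.getE(), data.getD(), data.getA(), data.getS()));
    }
}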
Cassandra uses log-structured merge trees for its on-disk format, meaning all writes are done sequentially (the database is an append-only log). That implies a lower write latency.
At the cluster level, Cassandra also achieves greater write scalability by partitioning the key space so that each machine is responsible for only a portion of the keys. That implies a higher write throughput, as more writes can be done in parallel.

GridGain join query with both objects returned

Is it possible to return both objects of a join as the result of a GridGain cache query?
We can get either one of the sides of the join or fields from both (and then use these to retrieve each object separately), but looking at the examples and documentation there seems to be no way to get both objects.
Thanks!
@dejan: In GridGain and Apache Ignite, you can use the _key and _val fields with SqlFieldsQuery in order to return whole objects. For example:
SqlFieldsQuery sql = new SqlFieldsQuery(
    "select a._key, a._val, b._val from SomeTypeA a, SomeTypeB b " +
    "where a.id = b.otherId");

try (QueryCursor<List<?>> cursor = cache.query(sql)) {
    for (List<?> row : cursor)
        System.out.println("Row: " + row);
}
Note that in this case the objects will be returned in serialized form.
First of all, GridGain Open Source edition is now Apache Ignite, so I would recommend switching.
In Ignite, you can return exactly the fields you need from a query using SqlFieldsQuery, like so:
SqlFieldsQuery sql = new SqlFieldsQuery(
    "select fieldA1, fieldA2, fieldB3 from SomeTypeA a, SomeTypeB b " +
    "where a.id = b.otherId");

try (QueryCursor<List<?>> cursor = cache.query(sql)) {
    for (List<?> row : cursor)
        System.out.println("Row: " + row);
}
In the GridGain open source edition, you can use the GridGain fields query APIs as well.
