Cassandra Statement set KeySpace

Using Cassandra 2.2.8 with 3.0 Connector.
I am trying to create a Statement with QueryBuilder. When I execute the Statement, it complains that no keyspace is defined. The only way I know to set the keyspace is shown below (there is no setKeyspace method on Statement). However, when I call getKeyspace I actually get null:
Statement s = QueryBuilder.select().all()
    .from("test.tests");
System.out.println("getKeyspace:" + s.getKeyspace()); // >> null
Am I doing something wrong? Is there another (more reliable) way to set the keyspace?
Thanks

from(String) expects a table name. While what you are doing is technically valid and Cassandra will interpret it correctly, the driver is not able to derive the keyspace name this way.
Instead you could use from(String, String) which takes the first parameter as the keyspace.
Statement s = QueryBuilder.select().all()
    .from("test", "tests");
System.out.println("getKeyspace:" + s.getKeyspace()); // >> test


Getting SyntaxException programmatically creating a table with the Cassandra Python driver

Error:
cassandra.protocol.SyntaxException: \
<Error from server: code=2000 [Syntax error in CQL query] \
message="line 1:36 no viable alternative at input '(' \
(CREATE TABLE master_table(dict_keys[(]...)">
Code:
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect('firstkey')
ColName = {"qty_dot_url": "int",
           "qty_hyphen_url": "int",
           "qty_underline_url": "int",
           "qty_slash_url": "int"}
columns = ColName.keys()
values = ColName.values()
session.execute('CREATE TABLE master_table({ColName} {dataType}),PRIMARY KEY(qty_dot_url)'.format(ColName=columns, dataType=values))
How to resolve above mentioned error?
So I replaced the session.execute with a print, and it produced this:
CREATE TABLE master_table(dict_keys(['qty_dot_url', 'qty_hyphen_url', 'qty_underline_url', 'qty_slash_url']) dict_values(['int', 'int', 'int', 'int'])),PRIMARY KEY(qty_dot_url)
That is not valid CQL. It needs to look like this:
CREATE TABLE master_table(qty_dot_url int, qty_hyphen_url int,
qty_underline_url int, qty_slash_url int, PRIMARY KEY(qty_dot_url))
I was able to create that by making these adjustments to your code:
createTableCQL = "CREATE TABLE master_table("
for key, value in ColName.items():
    createTableCQL += key + " " + value + ", "
createTableCQL += "PRIMARY KEY(qty_dot_url))"
You could then follow that with a session.execute(createTableCQL).
Notes:
The PRIMARY KEY definition must be inside the paren list.
Creating schema from inside application code is often problematic, and can create a schema disagreement in the cluster. It's almost always better to create tables outside of code.
The syntax exception is a result of your Python code generating invalid CQL, as Aaron pointed out in his response.
To add to his answer, you need to take additional steps whenever you are programmatically making schema changes. In particular, you need to make sure that you check for schema agreement (i.e. that the schema change has been propagated to all nodes) before moving on to the next bit of your code.
You will need to modify your code to save the result from the schema change, for example:
resultset = session.execute(SimpleStatement("CREATE TABLE ..."))
then call this in your code:
resultset.response_future.is_schema_agreed
You'll need to loop through this check until True is returned. Depending on how long you want to wait (the default max_schema_agreement_wait is 10 seconds), you'll need to implement some handling for the case where schema agreement is not achieved (because a node is down, for example); that situation requires manual intervention from an operator to investigate the cluster.
As Aaron already said, performing schema changes programmatically is very problematic, and we discourage doing this unless you fully understand the pitfalls and know how to handle failures. Cheers!

Is there any way to find out which node has been used by SELECT statement in Cassandra?

I have written a custom LoadBalancerPolicy for spark-cassandra-connector and now I want to ensure that it really works!
I have a Cassandra cluster with 3 nodes and a keyspace with a replication factor of 2, so when we retrieve a record, only two nodes in the cluster hold that data.
The thing is that I want to ensure the spark-cassandra-connector (with my load-balancer policy) is still token-aware and will choose the right node as coordinator for each "SELECT" statement.
Now I'm thinking we could write a trigger on SELECT for each node: if a node that does not hold the data receives the query, the trigger creates a log entry and I know the load-balancer policy is not working properly. How can we write a trigger on SELECT in Cassandra? Is there a better way to accomplish this?
I already checked the documentation for creating triggers, and it is too limited:
Official documentation
Documentation at DataStax
Example implementation in official repo
You can do it from the program side: get the routing key for your bound statement (you must use prepared statements), find the replicas for it via the Metadata class, and then check whether the host reported in the ExecutionInfo of the ResultSet is one of those replicas.
According to what Alex said, we can do it as below:
After creating the SparkSession, we create a connector:
import com.datastax.spark.connector.cql.CassandraConnector
val connector = CassandraConnector.apply(sparkSession.sparkContext.getConf)
Now we can define a preparedStatement and do the rest:
connector.withSessionDo(session => {
  val selectQuery = "select * from test where id=?"
  val prepareStatement = session.prepare(selectQuery)
  val protocolVersion = session.getCluster.getConfiguration.getProtocolOptions.getProtocolVersion
  // We have to explicitly bind all of the parameters that the partition key is based on,
  // otherwise the routing key will be null.
  val boundStatement = prepareStatement.bind(s"$id")
  val routingKey = boundStatement.getRoutingKey(protocolVersion, null)
  // All of the nodes that contain the row
  val replicas = session.getCluster.getMetadata.getReplicas("test", routingKey)
  val resultSet = session.execute(boundStatement)
  // The node that actually served the query
  val host = resultSet.getExecutionInfo.getQueriedHost
  // Final step: check whether the replicas contain that host
  if (replicas.contains(host)) println("It works!")
})
The important thing is that we have to explicitly bind all of the parameters that the partition key is based on (i.e. we cannot hard-code them in the SELECT statement); otherwise the routing key will be null.

Cassandra Prepared Statement - Binding Parameters Twice

I have a CQL query I want to perform. The CQL string looks like this:
SELECT * FROM :columnFamilyName WHERE <some_column_name> = :name AND <some_id> = :id;
My application has two layers of abstraction above the datastax driver. In one layer I want to bind the first two parameters and in another layer I'd like to bind the last parameter.
The problem is, if I bind the first two parameters, I get a BoundStatement to which I cannot bind another parameter. Am I missing something? Can it be done?
We're using datastax driver version 2.0.3.
Thanks,
Anatoly.
You should be able to bind any number of parameters to your BoundStatement using boundStatement.setXXX(index, value), as follows:
BoundStatement statement = new BoundStatement(query);
statement.setString(0, "value");
statement.setInt(1, 1);
statement.setDate(2, new Date());
ResultSet results = session.execute(statement);
The problem, though, is that you're trying to treat the column family (table) name itself as a bind parameter, so it changes along with the values you want to bind.
As far as I know, this is not allowed, so you should instead prepare one statement per table and then use the appropriate bound statement.
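To make that concrete, here is a minimal sketch of the per-table approach with a 2.x Java driver; the table names, column names, and values are illustrative, and an already-connected Session is assumed. It also shows that the two binding steps can happen in different layers, since setXXX calls can be made on the same BoundStatement at any point before execution:
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;

// Sketch only: "users", "events" and the column names are made-up examples;
// 'session' is an existing connected Session.
Map<String, PreparedStatement> statementsByTable = new HashMap<>();
for (String table : Arrays.asList("users", "events")) {
    statementsByTable.put(table,
        session.prepare("SELECT * FROM " + table + " WHERE name = ? AND id = ?"));
}

// Layer 1: pick the table and bind the first parameter.
BoundStatement bound = statementsByTable.get("users").bind();
bound.setString(0, "someName");

// Layer 2 (elsewhere in the application): bind the remaining parameter and execute.
bound.setInt(1, 42);
ResultSet results = session.execute(bound);
Keeping the prepared statements in a map also avoids re-preparing the same query repeatedly, which the driver will warn about.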

Cassandra 2.0, CQL 3.x Update ...USING TIMESTAMP

I am planning to use the UPDATE ... USING TIMESTAMP ... statement to make sure that I do not overwrite fresh data with stale data, while avoiding having to do at least LOCAL_QUORUM writes.
Here is my table structure.
Table: DocumentStore
DocumentID (primary key, bigint)
Document (text)
Version (int)
If the service receives 2 write requests with Version=1 and Version=2, regardless of the order of arrival, the business requirement is that we end up with Version=2 in the database.
Can I use the following CQL Statement?
Update DocumentStore using <versionValue>
SET Document=<documentValue>,
Version=<versionValue>
where DocumentID=<documentIDValue>;
Has anybody used something like this? If so was the behavior as expected?
Yes, this is a known technique, although it should be:
UPDATE "DocumentStore" USING TIMESTAMP <versionValue>
SET "Document" = <documentValue>,
"Version" = <versionValue>
WHERE "DocumentID" = <documentIDValue>;
You missed the TIMESTAMP keyword, and since you are using case-sensitive names, you should also enclose them in double quotes.
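For completeness, here is a minimal sketch of issuing that statement through the Java driver; it assumes an already-connected Session, illustrative values, and a Cassandra version that accepts a bind marker after USING TIMESTAMP (2.0 should):
import com.datastax.driver.core.PreparedStatement;

// Sketch only: the values below are illustrative; 'session' is an existing connected Session.
PreparedStatement ps = session.prepare(
    "UPDATE \"DocumentStore\" USING TIMESTAMP ? "
  + "SET \"Document\" = ?, \"Version\" = ? "
  + "WHERE \"DocumentID\" = ?");

long version = 2L;                        // doubles as the write timestamp
String document = "updated document body";
long documentId = 1001L;

// Bind markers are positional: the timestamp first, then the SET values, then the key.
session.execute(ps.bind(version, document, (int) version, documentId));
With this in place, a Version=1 write that arrives after the Version=2 write loses regardless of arrival order, because its timestamp is lower.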

How to check if a Cassandra table exists

Is there an easy way to check if table (column family) is defined in Cassandra using CQL (or API perhaps, using com.datastax.driver)?
Right now I am leaning towards executing SELECT 1 FROM table and checking for an exception, but maybe there is a better way?
As of 1.1 you should be able to query the system keyspace's schema_columnfamilies column family. If you know which keyspace you want to check, this CQL should list all column families in that keyspace:
SELECT columnfamily_name
FROM schema_columnfamilies WHERE keyspace_name='myKeyspaceName';
The ticket describing this functionality is here: https://issues.apache.org/jira/browse/CASSANDRA-2477
They do note, though, that some of the system column names changed between 1.1 and 1.2, so you might have to adjust the query a little to get your desired results.
Edit 20160523 - Cassandra 3.x Update:
Note that for Cassandra 3.0 and up, you'll need to make a few adjustments to the above query:
SELECT table_name
FROM system_schema.tables WHERE keyspace_name='myKeyspaceName';
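If you would rather run that check from application code than from cqlsh, a minimal sketch with the 3.x Java driver, assuming an already-connected session and placeholder keyspace/table names, could be:
import com.datastax.driver.core.Row;

// Sketch only: 'myKeyspaceName' and 'myTable' are placeholders; 'session' is an existing connected Session.
Row row = session.execute(
    "SELECT table_name FROM system_schema.tables "
  + "WHERE keyspace_name = 'myKeyspaceName' AND table_name = 'myTable'").one();
boolean tableExists = (row != null);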
The Java driver (since you mentioned it in your question) also maintains a local representation of the schema.
Driver 3.x and below:
KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("myKeyspace");
TableMetadata table = ks.getTable("myTable");
boolean tableExists = (table != null);
Driver 4.x and above:
Metadata metadata = session.getMetadata();
boolean tableExists =
    metadata.getKeyspace("myKeyspace")
        .flatMap(ks -> ks.getTable("myTable"))
        .isPresent();
I just needed to manually check for the existence of a table using cqlsh.
Possibly useful general info.
describe keyspace_name.table_name
If it doesn't exist you'll get 'table_name' not found in keyspace 'keyspace'
If it does exist you'll get a description of the table.
For the .NET driver CassandraCSharpDriver version 3.17.1 the following code creates a table if it doesn't exist yet:
var ks = _cassandraSession.Cluster.Metadata.GetKeyspace(keyspaceName);
var tableNames = ks.GetTablesNames();
if (!tableNames.Contains(tableName.ToLowerInvariant()))
{
    var stmt = new SimpleStatement($"CREATE TABLE {tableName} (id text PRIMARY KEY, name text, price decimal, volume int, time timestamp)");
    _cassandraSession.Execute(stmt);
}
You will need to adapt the list of table columns to your needs. This can also be awaited by using await _cassandraSession.ExecuteAsync(stmt).ConfigureAwait(false) in an async method.
Also, I want to mention that I'm using Cassandra version 4.0.1.
