How to fetch Primary Key/Clustering column names for a particular table using CQL statements? - cassandra

I am trying to fetch the Primary Key/Clustering Key names for a particular table/entity and implement the same query in my JPA interface (which extends CassandraRepository).
I am not sure whether something like:
#Query("DESCRIBE TABLE <table_name>)
public Object describeTbl();
would work here as describe isn't a valid CQL statement and in case it would, what would be the type of the Object?
Suggestions?

One thing you could try, would be to query the system_schema.columns table. It is keyed by keyspace_name and table_name, and might be what you're looking for here:
> SELECT column_name,kind FROM system_schema.columns
WHERE keyspace_name='spaceflight_data'
AND table_name='astronauts_by_group';
column_name | kind
-------------------+---------------
flights | regular
group | partition_key
name | clustering
spaceflight_hours | clustering
(4 rows)

DESCRIBE TABLE is supported only in Cassandra 4 that includes fix for CASSANDRA-14825. But it may not help you much because it just returns the text string representing the CREATE TABLE statement, and you'll need to parse text to extract primary key definition - it's doable but could be tricky, depending on the structure of the primary key.
Or you can obtain underlying Session object and via getMetadata function get access to actual metadata object that allows to obtain information about keyspaces & tables, including the information about schema.

Related

Howto expose a native SQL function as a predicate

I have a table in my database which stores a list of string values as a jsonb field.
create table rabbits_json (
rabbit_id bigserial primary key,
name text,
info jsonb not null
);
insert into rabbits_json (name, info) values
('Henry','["lettuce","carrots"]'),
('Herald','["carrots","zucchini"]'),
('Helen','["lettuce","cheese"]');
I want to filter my rows checking if info contains a given value.
In SQL, I would use ? operator:
select * from rabbits_json where info ? 'carrots';
If my googling skills are fine today, I believe that this is not implemented yet in JOOQ:
https://github.com/jOOQ/jOOQ/issues/9997
How can I use a native predicate in my query to write an equivalent query in JOOQ?
For anything that's not supported natively in jOOQ, you should use plain SQL templating, e.g.
Condition condition = DSL.condition("{0} ? {1}", RABBITS_JSON.INFO, DSL.val("carrots"));
Unfortunately, in this specific case, you will run into this issue here. With JDBC PreparedStatement, you still cannot use ? for other usages than bind variables. As a workaround, you can:
Use Settings.statementType == STATIC_STATEMENT to prevent using a PreparedStatement in this case
Use the jsonb_exists_any function (not indexable) instead of ?, see https://stackoverflow.com/a/38370973/521799

Update a collection type of a custom type in cassandra

How can I append a new element to a set which is in a custom type in Cassandra.
custom_type is :
CREATE TYPE custom_type (
normal_type TEXT,
set_type Set<TEXT>
);
and the table to be updated is :
CREATE TABLE test_table (
id TEXT,
my_type FROZEN<custom_type>,
clustering_key TEXT,
PRIMARY KEY ((id),clustering_key)
);
Tried below query but did not work.
#Query("update test_table set my_type.set_type = my_type.set_type + {'newelement'} where id=?1 and clustering_key=?2")
Any Idea on how to do that?
Using [cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4
When you say frozen, then the whole value is treated as one piece (blob), so you can't update parts of this field. Official documentation states:
When using the frozen keyword, you cannot update parts of a user-defined type value. The entire value must be overwritten. Cassandra treats the value of a frozen, user-defined type like a blob.

Composite key in Cassandra with Pig and where_clause for part of the key in the where clause

I basically have the same problem as the following Composite key in Cassandra with Pig. The only difference is I try to query for a part of the composite key within the where_clause of pig.
The data structure is similar to the earlier mentioned issue, I'll copy some code/context to minimize the reading of that issue.
We have a CQL table that looks something like this:
CREATE table data (
occurday text,
seqnumber int,
occurtimems bigint,
unique bigint,
fields map<text, text>,
primary key ((occurday, seqnumber), occurtimems, unique)
)
Instead of querying for both the seqnumber and the occurday (as was the issue in previously mentioned issue) I try to query one of the keys.
If I execute this query as part of a LOAD from within Pig, however, things don't work.
-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=occurday%3D%272013-10-01%27' USING CqlStorage();
gives
java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: InvalidRequestException(why:occurday cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:51017)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:50994)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:50933)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1756)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1742)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:605)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
... 7 more
Basically my question is, what am I doing wrong or what don't I understand?
As I understand from CqlPagingRecorderReader Used when Partition Key Is Explicitly Stated
I should be able to query with just part of the partition key?
Also while reading
Add CqlRecordReader to take advantage of native CQL pagination
I get the impression this should be possible, but I am swimming around with (in my opinion) no clear direction on how to accomplish this.
Any help is very very welcome at this point.
Regards,
Lennart Weijl
PS.
I am running on Cassandra 2.0.9 with Pig 0.13.0
According to CASSANDRA-6311, I believe you need to apply the 6331-v2-2.0-branch.txt patch, recompile pig, and then update your LOAD statement to:
data = LOAD 'cql://ks/data?where_clause=occurday%3D%272013-10-01%27' USING CqlInputFormat();
The key change being USING CqlInputFormat() which triggers the use of the new CqlRecordReader that was released in Cassandra 2.0.7.
Edit: Note that the exception is thrown from CqlPagingRecordReader which means you're still using the old record reader.

Astyanax Composite Keys in Cassandra

Im trying to create a schema that will enable me access rows with only part of the row_key.
For example the key is of the form user_id:machine_os:machine_arch
An example of a row key: 12242:"windows2000":"x86"
From the documentation I could not understand whether this will enable me to query all rows that have userid=12242 or query all rows that have "windows2000"
Is there any feasible way to achieve this ?
Thanks,
Yadid
Alright, here is what is happening: based on your schema, you are effectively creating a column family with a composite primary key or a composite rowkey. What this means is, you will need to restrict each component of the composite key except the last one with a strict equality relation. The last component of the composite key can use inequality and the IN relation, but not the 1st and 2nd components.
Additionally, you must specify all three parts if you want to utilize any kind of filtering. This is necessary because without all parts of the partition key, the coordinator node will have no idea on which node in the cluster the data exists (remember, Cassandra uses the partition key to determine replicas and data placement).
Effectively, this means you can't do any of these:
select * from datacf where user_id = 100012; # missing 2nd and 3rd key components
select * from datacf where user_id = 100012; and machine_arch = 'x86'; # missing 3rd key component
select * from datacf where machine_arch = 'x86'; # you have to specify the 1st
select * from datacf where user_id = 100012 and machine_arch in ('x86', 'x64'); # nope, still want 3rd
However, you will be able to run queries like this:
select * from datacf where user_id = 100012 and machine_arch = 'x86'
and machine_os = "windows2000"; # yes! all 3 parts are there
select * from datacf where user_id = 100012 and machine_os = "windows2000"
and machine_arch in ('x86', 'x64'); # the last part of the key can use the 'IN' or other equality relations
To answer your initial question, with you existing data model, you will neither be able to query data with userid = 12242 or query all rows that have "windows2000" as the machine_os.
If you can tell me exactly what kind of query you will be running, I can probably help in trying to design the table accordingly. Cassandra data models usually work better when looked at from the data retrieval perspective. Long story short- use only user_id as your primary key and use secondary indexes on other columns you want to query on.

Composite key in Cassandra with Pig

We have a CQL table that looks something like this:
CREATE table data (
occurday text,
seqnumber int,
occurtimems bigint,
unique bigint,
fields map<text, text>,
primary key ((occurday, seqnumber), occurtimems, unique)
)
I can query this table from cqlsh like this:
select * from data where seqnumber = 10 AND occurday = '2013-10-01';
This query works and returns the expected data.
If I execute this query as part of a LOAD from within Pig, however, things don't work.
-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=seqnumber%3D10%20AND%20occurday%3D%272013-10-01%27' USING CqlStorage();
gives
InvalidRequestException(why:seqnumber cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:39567)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1625)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1611)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:591)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:621)
Shouldn't these behave the same? Why is the version through Pig failing where the straight cqlsh command works?
Hadoop is using CqlPagingRecordReader to try to load your data. This is leading to queries that are not identical to what you have entered. The paging record reader is trying to obtain small slices of Cassandra data at a time to avoid timeouts.
This means that your query is executed as
SELECT * FROM "data" WHERE token("occurday","seqnumber") > ? AND
token("occurday","seqnumber") <= ? AND occurday='A Great Day'
AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
And this is why you are seeing your repeated key error. I'll submit a bug to the Cassandra Project.
Jira:
https://issues.apache.org/jira/browse/CASSANDRA-6151

Resources