Following this documentation, I tried a SELECT query with the token() function in it, but it gives unexpected results.
I am using the Cassandra version below:
[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
I was trying a token query on the table below:
CREATE TABLE price_key_test (
objectid int,
createdOn bigint,
price int,
foo text,
PRIMARY KEY ((objectid, createdOn), price));
Inserted data --
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,1000,100,'x');
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,2000,200,'x');
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,3000,300,'x');
Data in table --
objectid | createdon | price | foo
----------+-----------+-------+-----
1 | 3000 | 300 | x
1 | 2000 | 200 | x
1 | 1000 | 100 | x
Select query is --
select * from nasa.price_key_test where token(objectid,createdOn) > token(1,1000) and token(objectid,createdOn) < token(1,3000)
This query is supposed to return the row with createdOn 2000, but it returns zero rows.
objectid | createdon | price | foo
----------+-----------+-------+-----
(0 rows)
According to my understanding, token(objectid,createdOn) > token(1,1000) and token(objectid,createdOn) < token(1,3000) should select the row whose partition key has the values 1 and 2000.
Is my understanding correct?
Try flipping your greater/less-than signs around:
aploetz#cqlsh:stackoverflow> SELECT * FROM price_key_test
WHERE token(objectid,createdOn) < token(1,1000)
AND token(objectid,createdOn) > token(1,3000) ;
objectid | createdon | price | foo
----------+-----------+-------+-----
1 | 2000 | 200 | x
(1 rows)
Adding the token() function to your SELECT should help you to understand why:
aploetz#cqlsh:stackoverflow> SELECT objectid, createdon, token(objectid,createdon),
price, foo FROM price_key_test ;
objectid | createdon | system.token(objectid, createdon) | price | foo
----------+-----------+-----------------------------------+-------+-----
1 | 3000 | -8449493444802114536 | 300 | x
1 | 2000 | -2885017981309686341 | 200 | x
1 | 1000 | -1219246892563628877 | 100 | x
(3 rows)
The hashed token values generated are not necessarily proportional to their original numeric values. In your case, token(1,3000) generated a hash that was the smallest of the three, and not the largest.
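The effect is easy to reproduce outside Cassandra. The sketch below uses Python with MD5 purely as a stand-in hash (Cassandra's Murmur3Partitioner is a different function, but any hash partitioner shares the property shown): sorting values by their hash produces an order unrelated to their numeric order.

```python
import hashlib

# Illustrative only: Cassandra uses Murmur3, not MD5, but every hash
# partitioner has the same property -- hash order is unrelated to the
# numeric order of the original values.
values = list(range(100))
by_value = sorted(values)
by_hash = sorted(values, key=lambda v: hashlib.md5(str(v).encode()).digest())

print(by_value[:5])   # [0, 1, 2, 3, 4]
print(by_hash[:5])    # some scrambled order
print(by_value == by_hash)
```

This is why a token range like `token(1,1000) < t < token(1,3000)` can be empty: the hashes of 1000, 2000, and 3000 land wherever the hash function puts them.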
Related
I need to make a selection by the value of the remainder of the division:
cqlsh> SELECT * FROM table WHERE key%10=1;
Invalid syntax at line 1, char 39
SELECT * FROM table WHERE key%10=1;
^
Does CQL allow such queries?
CQL does not support modulo (or any other arithmetic) operations on the partition key in a WHERE clause. You can only filter on the actual value of the partition key. Cheers!
So I went to try this out with a simple table:
CREATE TABLE stackoverflow.keys (
month int,
id uuid,
key int,
PRIMARY KEY (month, id));
I was able to get this to work:
> SELECT month,month%10,id,key,key%10 AS "key mod 10"
FROM keys WHERE month=202208;
month | month % 10 | id | key | key mod 10
--------+------------+--------------------------------------+------+------------
202208 | 8 | 2fe7e98f-d1e2-45df-91f6-fa1430995fdc | 12 | 2
202208 | 8 | 59d04401-d11f-472d-a606-a33d380dc017 | 800 | 0
202208 | 8 | 92d3fa01-3b1e-4649-9280-786d75e2b9dc | 1157 | 7
202208 | 8 | 02612042-a7de-49ce-b958-ee60853ba51c | 2660 | 0
However, I was not able to get the modulus operator to work in the WHERE clause.
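A common workaround is to do the modulo filtering client-side: restrict the query by partition key as usual, then filter the result set in application code. A minimal sketch (the `rows` list stands in for a driver result such as `session.execute("SELECT month, key FROM keys WHERE month = 202208")`; the sample values are hypothetical):

```python
# Client-side workaround: CQL cannot evaluate key % 10 in a WHERE clause,
# so fetch the partition and filter in application code.
rows = [
    {"month": 202208, "key": 12},
    {"month": 202208, "key": 800},
    {"month": 202208, "key": 1157},
    {"month": 202208, "key": 31},
]

# Keep only rows whose key leaves remainder 1 when divided by 10.
matching = [r for r in rows if r["key"] % 10 == 1]
print(matching)   # [{'month': 202208, 'key': 31}]
```

This trades network transfer for flexibility: the whole partition comes back, and the application discards what it does not need.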
[Question posted by a user on YugabyteDB Community Slack]
I'm trying to upsert, delete, and upsert the same record using the USING TIMESTAMP syntax. The first upsert and the delete are successful. After I delete the record, if I upsert the same record again, the update status is true, but the SELECT statement does not show the row.
ycqlsh:test> CREATE TABLE todo ( id int, seq int, task text, status boolean, primary key (id, seq) );
ycqlsh:test> insert into todo(id, seq, task, status) values(1, 1, 'sample', false);
ycqlsh:test> insert into todo(id, seq, task, status) values(1, 2, 'sample2', false);
ycqlsh:test> select * from todo;
id | seq | task | status
----+-----+---------+--------
1 | 1 | sample | False
1 | 2 | sample2 | False
(2 rows)
ycqlsh:test> UPDATE todo using timestamp 1000 SET status = false, task='sample3' WHERE id=1 and seq=3 returns status as row;
[applied] | [message] | id | seq | task | status
-----------+-----------+------+------+------+--------
True | null | null | null | null | null
ycqlsh:test> select * from todo;
id | seq | task | status
----+-----+---------+--------
1 | 1 | sample | False
1 | 2 | sample2 | False
1 | 3 | sample3 | False
(3 rows)
ycqlsh:test> delete from todo WHERE id=1 and seq=3;
ycqlsh:test> select * from todo;
id | seq | task | status
----+-----+---------+--------
1 | 1 | sample | False
1 | 2 | sample2 | False
(2 rows)
ycqlsh:test> UPDATE todo using timestamp 2000 SET status = false, task='sample3' WHERE id=1 and seq=3 returns status as row;
[applied] | [message] | id | seq | task | status
-----------+-----------+------+------+------+--------
True | null | null | null | null | null
ycqlsh:test> select * from todo;
id | seq | task | status
----+-----+---------+--------
1 | 1 | sample | False
1 | 2 | sample2 | False
This is because the DELETE ran without USING TIMESTAMP. A DELETE without an explicit timestamp gets the current wall-clock time in microseconds, which is far greater than 2000, so the tombstone it writes masks the later UPDATE ... USING TIMESTAMP 2000. To reintroduce the row into the table, use UPDATE without USING TIMESTAMP.
The correct usage should have been:
DELETE FROM todo USING TIMESTAMP 1000 WHERE id=1 AND seq=3;
If you want to use DELETE without USING TIMESTAMP, then your explicit write timestamps should be values close to the actual physical time in microseconds since the epoch, rather than small numbers like 1000 or 2000.
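A sketch of generating such a timestamp on the client (the statement string is illustrative; it simply mirrors the UPDATE from the question):

```python
import time

# YCQL/CQL write timestamps are microseconds since the Unix epoch.
# A value like this stays comparable with server-assigned timestamps,
# unlike small literals such as 1000 or 2000.
ts = int(time.time() * 1_000_000)

# Illustrative use in a statement string:
stmt = f"UPDATE todo USING TIMESTAMP {ts} SET task = 'sample3' WHERE id = 1 AND seq = 3"
print(ts)
print(stmt)
```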
I'm selecting data from a Cassandra database using a query. It works fine, but how do I get the rows back in the same order as the IDs I listed in the IN clause?
I have created a table with this data:
id | n | p | q
----+---+---+------
5 | 1 | 2 | 4
10 | 2 | 4 | 3
11 | 1 | 2 | null
I am trying to select the data using:
SELECT *
FROM malleshdmy
WHERE id IN ( 11,10,5)
But it returns the rows in the same order as they are stored:
id | n | p | q
----+---+---+------
5 | 1 | 2 | 4
10 | 2 | 4 | 3
11 | 1 | 2 | null
Please help me with this issue. I want the rows in the order 11, 10, and 5.
If id is the partition key, then it's impossible: rows are sorted by clustering columns only within a partition, and rows for different partition keys are returned in token order, not in the order of your IN list. You need to sort the data yourself on the client side.
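Re-imposing the IN-clause order client-side is a one-liner once you build a rank map. A minimal sketch (the `rows` list stands in for the driver's result set; the values match the table in the question):

```python
# Client-side reordering: preserve the order of the IDs in the IN clause.
rows = [
    {"id": 5,  "n": 1, "p": 2, "q": 4},
    {"id": 10, "n": 2, "p": 4, "q": 3},
    {"id": 11, "n": 1, "p": 2, "q": None},
]

wanted = [11, 10, 5]                          # same order as the IN clause
rank = {id_: i for i, id_ in enumerate(wanted)}
rows.sort(key=lambda r: rank[r["id"]])        # sort by position in `wanted`

print([r["id"] for r in rows])                # [11, 10, 5]
```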
Since id is your partition key, your data is actually being sorted by the token of id, not the values themselves:
cqlsh:testid> SELECT id,n,p,q,token(id) FROM malleshdmy;
id | n | p | q | system.token(id)
----+---+---+------+----------------------
5 | 1 | 2 | 4 | -7509452495886106294
10 | 2 | 4 | 3 | -6715243485458697746
11 | 1 | 2 | null | -4156302194539278891
Because of this, you don't have any control over how the partition key is sorted.
In order to sort your data by id, you need to make id a clustering column rather than a partition key. Your data will still need a partition key, however, and this will always be sorted by token.
If you decide to make id a clustering column, you will need to specify a descending clustering order in the table definition:
CREATE TABLE clusterTable (
... partition type, //partition key with a type to be specified
... id INT,
... n INT,
... p INT,
... q INT,
... PRIMARY KEY((partition),id))
... WITH CLUSTERING ORDER BY (id DESC);
This link is very helpful in discussing how ordering works in Cassandra: https://www.datastax.com/dev/blog/we-shall-have-order
I learn Cassandra through its documentation. Now I'm learning about batch and static fields.
In their example at the end of the page, they somehow managed to make balance have two different values (-200, -208) even though it's a static field.
Could someone explain to me how this is possible? I've read the whole page but I did not catch on.
In Cassandra, a static column is static within a partition: all rows of the same partition share its value.
Example: let's define a table:
CREATE TABLE static_test (
pk int,
ck int,
d int,
s int static,
PRIMARY KEY (pk, ck)
);
Here pk is the partition key and ck is the clustering key.
Let's insert some data:
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 1, 10, 100, 1000);
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 2, 20, 200, 2000);
If we select the data
pk | ck | s | d
----+----+------+-----
1 | 10 | 1000 | 100
2 | 20 | 2000 | 200
Here, for partition key pk = 1 the static field s has the value 1000, and for partition key pk = 2 it has the value 2000.
If we insert/update the static field s for partition key pk = 1:
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 1, 11, 101, 1001);
then the static field s changes for all rows of partition pk = 1:
pk | ck | s | d
----+----+------+-----
1 | 10 | 1001 | 100
1 | 11 | 1001 | 101
2 | 20 | 2000 | 200
In a table that uses clustering columns, non-clustering columns can be declared static in the table definition. Static columns are only static within a given partition.
Example:
CREATE TABLE test (
partition_column text,
static_column text STATIC,
clustering_column int,
PRIMARY KEY (partition_column , clustering_column)
);
INSERT INTO test (partition_column, static_column, clustering_column) VALUES ('key1', 'A', 0);
INSERT INTO test (partition_column, clustering_column) VALUES ('key1', 1);
SELECT * FROM test;
Results:
 partition_column | clustering_column | static_column
------------------+-------------------+---------------
             key1 |                 0 |             A
             key1 |                 1 |             A
Observation:
Once declared static, the column holds a single value per partition; the second row, inserted without a static_column value, still reads 'A'.
Now, let's insert another record:
INSERT INTO test (partition_column, static_column, clustering_column) VALUES ('key1', 'C', 2);
SELECT * FROM test;
Results:
 partition_column | clustering_column | static_column
------------------+-------------------+---------------
             key1 |                 0 |             C
             key1 |                 1 |             C
             key1 |                 2 |             C
Observation:
If you update the static column, or insert another record with a new static column value, the new value is reflected across all rows of the partition: static column values are constant across a given partition.
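A conceptual model may help: think of the static column as stored once per partition rather than once per row. The Python sketch below is purely illustrative (it is not driver code, and the `upsert` helper is a made-up name), but it mirrors the INSERTs above.

```python
# Conceptual model only: a static column occupies one slot per partition,
# so every row in the partition "sees" the same value.
partitions = {}  # partition_key -> {"static": value, "rows": {clustering_key: data}}

def upsert(pk, ck, static=None, **data):
    part = partitions.setdefault(pk, {"static": None, "rows": {}})
    if static is not None:
        part["static"] = static          # overwrite the single per-partition slot
    part["rows"][ck] = data

upsert("key1", 0, static="A")
upsert("key1", 1)                        # no static value given
print(partitions["key1"]["static"])      # A  (row 1 still sees "A")

upsert("key1", 2, static="C")
print(partitions["key1"]["static"])      # C  (all rows now see "C")
```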
Restriction (from the DataStax reference documentation below):
A table that does not define any clustering columns cannot have a static column. The table having no clustering columns has a one-row partition in which every column is inherently static.
A table defined with the COMPACT STORAGE directive cannot have a static column.
A column designated to be the partition key cannot be static.
Reference : DataStax Reference
In the example on the page you've linked, they don't have two different values at the same point in time.
They first have the static balance field set to -208 for the whole user1 partition:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | -208 | 8 | burrito | False
user1 | 2 | -208 | 200 | hotel room | False
Then they apply a batch update statement that sets the balance value to -200:
BEGIN BATCH
UPDATE purchases SET balance=-200 WHERE user='user1' IF balance=-208;
UPDATE purchases SET paid=true WHERE user='user1' AND expense_id=1 IF paid=false;
APPLY BATCH;
This updates the balance field for the whole user1 partition to -200:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | -200 | 8 | burrito | True
user1 | 2 | -200 | 200 | hotel room | False
The point of a static field is that you can update/change its value for the whole partition at once. So if I were to execute the following statement:
UPDATE purchases SET balance=42 WHERE user='user1'
I would get the following result:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | 42 | 8 | burrito | True
user1 | 2 | 42 | 200 | hotel room | False
I'm working on smart parking data stored in a Cassandra database, and I'm trying to get the last status of each device.
I'm working with a self-made dataset.
here's the description of the table.
table description
select * from parking.meters
Need help, please!
trying to get the last status of each device
In Cassandra, you need to design your tables according to your query patterns. Building a table, filling it with data, and then trying to fulfill a query requirement is a very backward approach. The point, is that if you really need to satisfy that query, then your table should have been designed to serve that query from the beginning.
That being said, there may still be a way to make this work. You haven't mentioned which version of Cassandra you are using, but if you are on 3.6+, you can use the PER PARTITION LIMIT clause on your SELECT.
If I build your table structure and INSERT some of your rows:
aploetz#cqlsh:stackoverflow> SELECT * FROM meters ;
parking_id | device_id | date | status
------------+-----------+----------------------+--------
1 | 20 | 2017-01-12T12:14:58Z | False
1 | 20 | 2017-01-10T09:11:51Z | True
1 | 20 | 2017-01-01T13:51:50Z | False
1 | 7 | 2017-01-13T01:20:02Z | False
1 | 7 | 2016-12-02T16:50:04Z | True
1 | 7 | 2016-11-24T23:38:31Z | False
1 | 19 | 2016-12-14T11:36:26Z | True
1 | 19 | 2016-11-22T15:15:23Z | False
(8 rows)
And I consider your PRIMARY KEY and CLUSTERING ORDER definitions:
PRIMARY KEY ((parking_id, device_id), date, status)
) WITH CLUSTERING ORDER BY (date DESC, status ASC);
You are at least clustering by date (which should be an actual date/timestamp type, not text), so that will order your rows in a way that helps you here:
aploetz#cqlsh:stackoverflow> SELECT * FROM meters PER PARTITION LIMIT 1;
parking_id | device_id | date | status
------------+-----------+----------------------+--------
1 | 20 | 2017-01-12T12:14:58Z | False
1 | 7 | 2017-01-13T01:20:02Z | False
1 | 19 | 2016-12-14T11:36:26Z | True
(3 rows)