I'm learning Cassandra through its documentation, and right now I'm reading about batches and static fields.
In the example at the end of the page, they somehow end up with two different values for balance (-200 and -208) even though it's a static field.
Could someone explain how this is possible? I've read the whole page but I didn't catch on.
In Cassandra, a static field is static within a partition.
Example: Let's define a table
CREATE TABLE static_test (
pk int,
ck int,
d int,
s int static,
PRIMARY KEY (pk, ck)
);
Here pk is the partition key and ck is the clustering key.
Let's insert some data:
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 1, 10, 100, 1000);
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 2, 20, 200, 2000);
If we select the data:
pk | ck | s | d
----+----+------+-----
1 | 10 | 1000 | 100
2 | 20 | 2000 | 200
Here, for partition key pk = 1 the static field s has value 1000, and for partition key pk = 2 it has value 2000.
If we insert/update the static field s for partition key pk = 1:
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 1, 11, 101, 1001);
then the static field s value changes for all rows of partition key pk = 1:
pk | ck | s | d
----+----+------+-----
1 | 10 | 1001 | 100
1 | 11 | 1001 | 101
2 | 20 | 2000 | 200
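One way to see why every row in the pk = 1 partition reports the same s is to model the storage: conceptually, Cassandra keeps a static column once per partition, not once per row. Below is a minimal Python sketch of that layout (the dict structure and function names are purely illustrative, not Cassandra's actual storage format or API):

```python
# Model a table with a static column: the static value lives at the
# partition level, while regular columns live in per-row maps.
partitions = {}

def insert(pk, ck, d, s=None):
    part = partitions.setdefault(pk, {"s": None, "rows": {}})
    if s is not None:
        part["s"] = s          # one slot per partition, overwritten on each write
    part["rows"][ck] = {"d": d}

def select_all():
    return [(pk, ck, part["s"], row["d"])
            for pk, part in partitions.items()
            for ck, row in sorted(part["rows"].items())]

# Re-play the inserts from the example above.
insert(1, 10, 100, 1000)
insert(2, 20, 200, 2000)
insert(1, 11, 101, 1001)   # updates s for the whole pk=1 partition

for row in select_all():
    print(row)
```

Because the pk = 1 partition has only one slot for s, the third insert overwrites it to 1001 and both rows of that partition report the new value.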
In a table that uses clustering columns, non-clustering columns can be declared static in the table definition. Static columns are only static within a given partition.
Example:
CREATE TABLE test (
partition_column text,
static_column text STATIC,
clustering_column int,
PRIMARY KEY (partition_column , clustering_column)
);
INSERT INTO test (partition_column, static_column, clustering_column) VALUES ('key1', 'A', 0);
INSERT INTO test (partition_column, clustering_column) VALUES ('key1', 1);
SELECT * FROM test;
Results:
partition_column | clustering_column | static_column
----------------+-------------------+--------------
key1 | 0 | A
key1 | 1 | A
Observation:
Once a column is declared static, every row in a given partition shares its value.
Now, let's insert another record:
INSERT INTO test (partition_column, static_column, clustering_column) VALUES ('key1', 'C', 2);
SELECT * FROM test;
Results:
partition_column | clustering_column | static_column
----------------+-------------------+--------------
key1 | 0 | C
key1 | 1 | C
key1 | 2 | C
Observation:
If you update the static column, or insert another record with a new static column value, the change is reflected across all rows ==> static column values are constant across a given partition.
Restrictions (from the DataStax reference documentation below):
A table that does not define any clustering columns cannot have a static column. The table having no clustering columns has a one-row partition in which every column is inherently static.
A table defined with the COMPACT STORAGE directive cannot have a static column.
A column designated to be the partition key cannot be static.
Reference : DataStax Reference
In the example on the page you've linked, they don't have two different values at the same point in time.
They first have the static balance field set to -208 for the whole user1 partition:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | -208 | 8 | burrito | False
user1 | 2 | -208 | 200 | hotel room | False
Then they apply a batch update statement that sets the balance value to -200:
BEGIN BATCH
UPDATE purchases SET balance=-200 WHERE user='user1' IF balance=-208;
UPDATE purchases SET paid=true WHERE user='user1' AND expense_id=1 IF paid=false;
APPLY BATCH;
This updates the balance field for the whole user1 partition to -200:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | -200 | 8 | burrito | True
user1 | 2 | -200 | 200 | hotel room | False
The point of a static field is that you can update/change its value for the whole partition at once. So if I were to execute the following statement:
UPDATE purchases SET balance=42 WHERE user='user1'
I would get the following result:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | 42 | 8 | burrito | True
user1 | 2 | 42 | 200 | hotel room | False
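The batch in the example is also a conditional (lightweight transaction) update: the IF balance=-208 condition is checked against the partition's single static value, and both updates apply only if every condition holds. Here's a hedged Python sketch of those all-or-nothing semantics (the dict layout and function name are mine, not a Cassandra API):

```python
# Model the user1 partition: one static balance, per-row regular columns.
partition = {
    "balance": -208,
    "rows": {1: {"amount": 8, "description": "burrito", "paid": False},
             2: {"amount": 200, "description": "hotel room", "paid": False}},
}

def apply_batch(part, expected_balance, new_balance, expense_id, expected_paid):
    """All-or-nothing conditional batch, like BEGIN BATCH ... IF ... APPLY BATCH."""
    if part["balance"] != expected_balance:
        return False                      # condition failed: nothing is applied
    if part["rows"][expense_id]["paid"] != expected_paid:
        return False
    part["balance"] = new_balance         # static column: whole partition at once
    part["rows"][expense_id]["paid"] = True
    return True

applied = apply_batch(partition, expected_balance=-208, new_balance=-200,
                      expense_id=1, expected_paid=False)
print(applied, partition["balance"])      # True -200
```

Note how a second run with expected_balance=-208 would now return False, since the static balance has already moved to -200.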
Related
Hi all, I have a Cassandra table with Hash as the primary key and another column containing a List. I want to add a column named zipcode so that I can query Cassandra by either zipcode alone, or by zipcode and hash:
Hash | List | zipcode
select * from table where zip_code = '12345';
select * from table where zip_code = '12345' && hash='abcd';
Is there any way that I could do this?
The recommendation in Cassandra is to design your tables based on your access patterns. In your case you want results by zipcode, and by zipcode and hash, so ideally you would have two tables like this:
CREATE TABLE keyspace.table1 (
zipcode text,
field1 text,
field2 text,
PRIMARY KEY (zipcode));
and
CREATE TABLE keyspace.table2 (
hashcode text,
zipcode text,
field1 text,
field2 text,
PRIMARY KEY ((hashcode,zipcode)));
You may then need to redesign your tables based on your data. I recommend understanding data model design in Cassandra before proceeding further.
The ALLOW FILTERING construct can be used, but whether it is advisable depends on how big your data is. If you have very large data, avoid it: it requires a complete scan of the database, which is expensive in both resources and time.
It is possible to design a single table that will satisfy both app queries.
In this example schema, the table is partitioned by zip code with hash as the clustering key:
CREATE TABLE table_by_zipcode (
zipcode int,
hash text,
...
PRIMARY KEY(zipcode, hash)
)
With this design, each zip code can have one or more rows of hash. Here's the table with some test data in it:
zipcode | hash | intcol | textcol
---------+------+--------+---------
123 | abc | 1 | alice
123 | def | 2 | bob
123 | ghi | 3 | charli
456 | tuv | 5 | banana
456 | xyz | 4 | apple
The table contains two partitions zipcode = 123 and zipcode = 456. The first zip code has three rows (abc, def, ghi) and the second has two rows (tuv, xyz).
You can query the table using just the partition key (zipcode), for example:
cqlsh> SELECT * FROM table_by_zipcode WHERE zipcode = 123;
zipcode | hash | intcol | textcol
---------+------+--------+---------
123 | abc | 1 | alice
123 | def | 2 | bob
123 | ghi | 3 | charli
It is also possible to query the table with the partition key zipcode and clustering key hash, for example:
cqlsh> SELECT * FROM table_by_zipcode WHERE zipcode = 123 AND hash = 'abc';
zipcode | hash | intcol | textcol
---------+------+--------+---------
123 | abc | 1 | alice
Cheers!
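The single-table design works for both queries because the clustering key only narrows results within a partition. A small Python sketch of that layout, with a dict of dicts standing in for partitions and clustered rows (illustrative only, using the test data above):

```python
# zipcode -> { hash -> row }: one partition per zipcode, one clustered row per hash.
table_by_zipcode = {
    123: {"abc": {"intcol": 1, "textcol": "alice"},
          "def": {"intcol": 2, "textcol": "bob"},
          "ghi": {"intcol": 3, "textcol": "charli"}},
    456: {"tuv": {"intcol": 5, "textcol": "banana"},
          "xyz": {"intcol": 4, "textcol": "apple"}},
}

# WHERE zipcode = 123  -> the whole partition
whole_partition = table_by_zipcode[123]
print(sorted(whole_partition))            # ['abc', 'def', 'ghi']

# WHERE zipcode = 123 AND hash = 'abc'  -> a single clustered row
single_row = table_by_zipcode[123]["abc"]
print(single_row["textcol"])              # alice
```

Querying by hash alone, however, would mean scanning every partition, which is why the two-table alternative exists when that access pattern is also needed.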
I want to combine two columns from a table into one column, and then get the actual value of the foreign keys. I can do these things individually, but not together.
Following the answer below, I was able to combine the two columns into one using the first SQL statement below.
How to combine 2 columns into a new one in sqlite
The combining process is shown below:
+---+---+
|HT | AT|
+---+---+
|1 | 2 |
|5 | 7 |
|9 | 5 |
+---+---+
into one column as shown:
+---+
|HT |
+---+
| 1 |
| 5 |
| 9 |
| 2 |
| 7 |
| 5 |
+---+
The second SQL statement shows the actual value of each foreign key corresponding to each foreign key id. The foreign key table:
+-----+------------------------+
|T_id | TN |
+-----+------------------------+
| 1   | 'Dallas Cowboys'       |
| 2 | 'Chicago Bears' |
| 5 | 'New England Patriots' |
| 7 | 'New York Giants' |
| 9 | 'New York Jets' |
+-----+------------------------+
sql = "SELECT * FROM (SELECT Match.HT FROM Match UNION SELECT Match.AT FROM Match) t"
The second sql statement lets me get the foreign key values for each value in M.HT.
sql = "SELECT Match.HT, T.TN FROM Match INNER JOIN T ON Match.HT = T.Tid WHERE strftime('%Y-%m-%d', Match.ST) BETWEEN '2015-08-01' AND '2016-06-30' AND Match.Comp = 6 ORDER BY Match.ST"
Result of second SQL statement:
+-----+------------------------+
| HT | TN |
+-----+------------------------+
| 1   | 'Dallas Cowboys'       |
| 5 | 'New England Patriots' |
| 9 | 'New York Jets' |
+-----+------------------------+
But try as I might I have not been able to combine these queries!
I believe the following will work (assuming that the tables are Match and T, and barring the WHERE and ORDER BY clauses for brevity/ease) :-
SELECT DISTINCT(m.ht), t.tn
FROM
(SELECT Match.HT FROM Match UNION SELECT Match.AT FROM Match) AS m
JOIN T ON t.tid = m.ht
JOIN Match ON (m.ht = Match.ht OR m.ht = Match.at)
/* WHERE and ORDER BY clauses using Match as m only has columns ht and at */
WHERE strftime('%Y-%m-%d', Match.ST)
BETWEEN '2015-08-01' AND '2016-06-30' AND Match.Comp = 6
ORDER BY Match.ST
;
Note: only tested without the WHERE and ORDER BY clauses.
That is, using :-
DROP TABLE IF EXISTS Match;
DROP TABLE IF EXISTS T;
CREATE TABLE IF NOT EXISTS Match (ht INTEGER, at INTEGER, st TEXT DEFAULT (datetime('now')));
CREATE TABLE IF NOT EXISTS t (tid INTEGER PRIMARY KEY, tn TEXT);
INSERT INTO T (tn) VALUES('Cows'),('Bears'),('a'),('b'),('Pats'),('c'),('Giants'),('d'),('Jets');
INSERT INTO Match (ht,at) VALUES (1,2),(5,7),(9,5);
/* Directly without the Common Table Expression */
SELECT
DISTINCT(m.ht), t.tn,
Match.st /*<<<<< Added to show results of obtaining other values from Matches >>>>> */
FROM
(SELECT Match.HT FROM Match UNION SELECT Match.AT FROM Match) AS m
JOIN T ON t.tid = m.ht
JOIN Match ON (m.ht = Match.ht OR m.ht = Match.at)
/* WHERE and ORDER BY clauses here using Match */
;
Noting that limited data (just the one extra column) was used for brevity.
Results in :- (result screenshot not reproduced)
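Since this is SQLite, the combined query can be checked end-to-end with Python's built-in sqlite3 module. The sketch below renames Match to Matches (to sidestep SQLite's MATCH keyword) and drops the WHERE/ORDER BY clauses, as in the test setup above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE Matches (ht INTEGER, at INTEGER, st TEXT DEFAULT (datetime('now')));
    CREATE TABLE T (tid INTEGER PRIMARY KEY, tn TEXT);
    INSERT INTO T (tn) VALUES ('Cows'),('Bears'),('a'),('b'),('Pats'),
                              ('c'),('Giants'),('d'),('Jets');
    INSERT INTO Matches (ht, at) VALUES (1,2),(5,7),(9,5);
""")

# UNION the home and away columns into one, then join to resolve each id to a name.
rows = cur.execute("""
    SELECT DISTINCT m.ht, t.tn
    FROM (SELECT ht FROM Matches UNION SELECT at FROM Matches) AS m
    JOIN T ON t.tid = m.ht
    ORDER BY m.ht
""").fetchall()
print(rows)   # [(1, 'Cows'), (2, 'Bears'), (5, 'Pats'), (7, 'Giants'), (9, 'Jets')]
```

The UNION in the derived table already deduplicates the combined ids, and the join then resolves each id to its team name in a single statement.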
I'm selecting data from a Cassandra database using a query. It works fine, but how do I get the data back in the same order as given in the IN clause?
I have created table like this:
id | n | p | q
----+---+---+------
5 | 1 | 2 | 4
10 | 2 | 4 | 3
11 | 1 | 2 | null
I am trying to select data using:
SELECT *
FROM malleshdmy
WHERE id IN ( 11,10,5)
But it returns the rows in the same order as they are stored:
id | n | p | q
----+---+---+------
5 | 1 | 2 | 4
10 | 2 | 4 | 3
11 | 1 | 2 | null
Please help me with this issue. I want the data in the order 11, 10, 5.
If id is the partition key, then it's impossible: data is sorted only by clustering columns, and rows for different partition keys can be returned in arbitrary order (though sorted within each partition).
You need to sort data yourself.
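Sorting client-side is straightforward: index the IN list, then sort the fetched rows by that index. A minimal Python sketch using the rows from the question:

```python
# Rows as returned by the driver (token order, not the order of the IN list).
rows = [
    (5, 1, 2, 4),
    (10, 2, 4, 3),
    (11, 1, 2, None),
]

wanted = [11, 10, 5]                       # the order used in the IN clause
rank = {id_: i for i, id_ in enumerate(wanted)}
rows.sort(key=lambda r: rank[r[0]])        # re-order by position in the IN list

print([r[0] for r in rows])                # [11, 10, 5]
```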
Since id is your partition key, your data is actually being sorted by the token of id, not the values themselves:
cqlsh:testid> SELECT id,n,p,q,token(id) FROM table;
id | n | p | q | system.token(id)
----+---+---+------+----------------------
5 | 1 | 2 | 4 | -7509452495886106294
10 | 2 | 4 | 3 | -6715243485458697746
11 | 1 | 2 | null | -4156302194539278891
Because of this, you don't have any control over how the partition key is sorted.
In order to sort your data by id, you need to make id a clustering column rather than a partition key. Your data will still need a partition key, however, and this will always be sorted by token.
If you decide to make id a clustering column, you will need to specify that you want descending order in the CLUSTERING ORDER BY clause:
CREATE TABLE clusterTable (
    partition type,    -- partition key with a type to be specified
    id INT,
    n INT,
    p INT,
    q INT,
    PRIMARY KEY ((partition), id))
WITH CLUSTERING ORDER BY (id DESC);
This link is very helpful in discussing how ordering works in Cassandra: https://www.datastax.com/dev/blog/we-shall-have-order
According to this documentation, I was trying a SELECT query with the token() function in it, but it gives wrong results.
I am using the Cassandra version below:
[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
I was trying token query for below table -
CREATE TABLE price_key_test (
objectid int,
createdOn bigint,
price int,
foo text,
PRIMARY KEY ((objectid, createdOn), price));
Inserted data --
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,1000,100,'x');
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,2000,200,'x');
insert into nasa.price_key_test (objectid,createdOn,price,foo) values (1,3000,300,'x');
Data in table --
objectid | createdon | price | foo
----------+-----------+-------+-----
1 | 3000 | 300 | x
1 | 2000 | 200 | x
1 | 1000 | 100 | x
Select query is --
select * from nasa.price_key_test where token(objectid,createdOn) > token(1,1000) and token(objectid,createdOn) < token(1,3000)
This query is supposed to return the row with createdOn 2000, but it returns zero rows.
objectid | createdon | price | foo
----------+-----------+-------+-----
(0 rows)
According to my understanding, token(objectid,createdOn) > token(1,1000) and token(objectid,createdOn) < token(1,3000) should select row with partition key with value 1 and 2000.
Is my understanding correct?
Try flipping your greater/less-than signs around:
aploetz@cqlsh:stackoverflow> SELECT * FROM price_key_test
WHERE token(objectid,createdOn) < token(1,1000)
AND token(objectid,createdOn) > token(1,3000) ;
objectid | createdon | price | foo
----------+-----------+-------+-----
1 | 2000 | 200 | x
(1 rows)
Adding the token() function to your SELECT should help you to understand why:
aploetz@cqlsh:stackoverflow> SELECT objectid, createdon, token(objectid,createdon),
price, foo FROM price_key_test ;
objectid | createdon | system.token(objectid, createdon) | price | foo
----------+-----------+-----------------------------------+-------+-----
1 | 3000 | -8449493444802114536 | 300 | x
1 | 2000 | -2885017981309686341 | 200 | x
1 | 1000 | -1219246892563628877 | 100 | x
(3 rows)
The hashed token values generated are not necessarily proportional to their original numeric values. In your case, token(1,3000) generated a hash that was the smallest of the three, and not the largest.
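You can reproduce the effect with the three token values from the output above: the hash for (1,3000) is the smallest, so only the middle row falls between token(1,3000) and token(1,1000). A quick Python check (token values copied from the cqlsh output, not recomputed):

```python
# Murmur3 tokens as printed by cqlsh for each (objectid, createdOn) pair.
tokens = {3000: -8449493444802114536,
          2000: -2885017981309686341,
          1000: -1219246892563628877}

# token(...) < token(1,1000) AND token(...) > token(1,3000)
selected = [created for created, tok in tokens.items()
            if tokens[3000] < tok < tokens[1000]]
print(selected)   # [2000]
```

With the original (unflipped) comparison operators, the range tokens[1000] < tok < tokens[3000] is empty, which is exactly why the query returned zero rows.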
I'm working on smart parking data stored in a Cassandra database, and I'm trying to get the last status of each device.
I'm working on a self-made dataset.
Here's the description of the table (screenshot: "table description"):
select * from parking.meters
Need help please!
trying to get the last status of each device
In Cassandra, you need to design your tables according to your query patterns. Building a table, filling it with data, and then trying to fulfill a query requirement is a backward approach. The point is that if you really need to satisfy that query, your table should have been designed to serve it from the beginning.
That being said, there may still be a way to make this work. You haven't mentioned which version of Cassandra you are using, but if you are on 3.6+, you can use the PER PARTITION LIMIT clause on your SELECT.
If I build your table structure and INSERT some of your rows:
aploetz@cqlsh:stackoverflow> SELECT * FROM meters ;
parking_id | device_id | date | status
------------+-----------+----------------------+--------
1 | 20 | 2017-01-12T12:14:58Z | False
1 | 20 | 2017-01-10T09:11:51Z | True
1 | 20 | 2017-01-01T13:51:50Z | False
1 | 7 | 2017-01-13T01:20:02Z | False
1 | 7 | 2016-12-02T16:50:04Z | True
1 | 7 | 2016-11-24T23:38:31Z | False
1 | 19 | 2016-12-14T11:36:26Z | True
1 | 19 | 2016-11-22T15:15:23Z | False
(8 rows)
And I consider your PRIMARY KEY and CLUSTERING ORDER definitions:
PRIMARY KEY ((parking_id, device_id), date, status)
) WITH CLUSTERING ORDER BY (date DESC, status ASC);
You are at least clustering by date (which should be an actual date or timestamp type, not text), so the rows are ordered in a way that helps you here:
aploetz@cqlsh:stackoverflow> SELECT * FROM meters PER PARTITION LIMIT 1;
parking_id | device_id | date | status
------------+-----------+----------------------+--------
1 | 20 | 2017-01-12T12:14:58Z | False
1 | 7 | 2017-01-13T01:20:02Z | False
1 | 19 | 2016-12-14T11:36:26Z | True
(3 rows)
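Since rows within each partition are already in clustering order (date DESC), PER PARTITION LIMIT 1 effectively keeps the first row seen per partition key. A Python sketch of that behavior over the sample rows above (a dict stands in for the table; not how Cassandra executes it internally):

```python
# Rows in clustering order (date DESC) within each (parking_id, device_id) partition.
rows = [
    (1, 20, "2017-01-12T12:14:58Z", False),
    (1, 20, "2017-01-10T09:11:51Z", True),
    (1, 20, "2017-01-01T13:51:50Z", False),
    (1, 7,  "2017-01-13T01:20:02Z", False),
    (1, 7,  "2016-12-02T16:50:04Z", True),
    (1, 7,  "2016-11-24T23:38:31Z", False),
    (1, 19, "2016-12-14T11:36:26Z", True),
    (1, 19, "2016-11-22T15:15:23Z", False),
]

# PER PARTITION LIMIT 1: keep only the first (latest) row per partition key.
latest = {}
for parking_id, device_id, date, status in rows:
    latest.setdefault((parking_id, device_id), (date, status))

for key, value in latest.items():
    print(key, value)
```

This yields one row per device, matching the three-row result above, and it only works because the clustering order already puts the newest reading first in each partition.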