I'm trying to model some time series data in Cassandra which I had been able to do with the older thrift client but CQL seems to be throwing me off.
I want to add a NEW column to my row IF a specific column value matches.
My table definition is:
CREATE TABLE TestTable (
key int,
base uuid,
ts int, // Timestamp (column name)
val text, // Timestamp value (column value)
PRIMARY KEY (key, ts)
) WITH CLUSTERING ORDER BY (ts DESC);
What I'm guessing it'd look like is:
Row | UUID | TS | TS | TS
--- | ---- | --- | ---| ---
1 | id1 | 1 | 2 | 3
--- | --- | --- | ---| ---
2 | id2 | 1 | 5 | 6
So essentially, I can have a bunch of Timestamps for a given row and a SINGLE UUID for a row.
The UUID needs to be updated for each new insert of a TS column.
So inserts in a row work just fine:
insert into TestTable(key, base, ts, val) values (1, dfb63886-91a4-11e6-ae22-56b6b6499611, 50, 'one')
But I'm failing to figure out a way, using CQL, to INSERT a new column in a row using Cassandra transactions (CAS).
This one fails:
insert into TestTable(key, base, ts, val) values (1, dfb63886-91a4-11e6-ae22-56b6b6499611, 70, 'four') if base = dfb63886-91a4-11e6-ae22-56b6b6499611;
with the error:
SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:106 mismatched input 'base' expecting K_NOT (..., 70, 'four') if [base] =...)">
And the query:
update TestTable set val = 'four', ts=70 where key = 1 if base = dfb63886-91a4-11e6-ae22-56b6b6499611;
fails with the error:
InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY part ts found in SET part"
I'm trying to figure out how to model the data properly so that I only have one UUID per row and can have multiple columns without having to explicitly define them during table creation, since it can vary quite a bit.
IIRC, it was easy doing this with the thrift client but using that isn't an option =/
There is a nice tutorial regarding data series here
In a nutshell, your composite key will be your unique identifier (like the UUID that you were proposing) and a timestamp, so you will be able to add as many events/values associated to a UUID
CREATE TABLE IF NOT EXISTS TestTable (
base uuid,
ts timestamp, // Timestamp (column name)
value text, // Timestamp value (column value)
PRIMARY KEY (base, ts)
) WITH CLUSTERING ORDER BY (ts DESC);
Adding values will have the same UUID with different times:
INSERT INTO TestTable (base, ts, value)
VALUES (467286c5-7d13-40c2-92d0-73434ee8970c, dateof(now()), 'abc');
INSERT INTO TestTable (base, ts, value)
VALUES (467286c5-7d13-40c2-92d0-73434ee8970c, dateof(now()), 'def');
cqlsh:test> SELECT * FROM TestTable WHERE base = 467286c5-7d13-40c2-92d0-73434ee8970c;
base | ts | value
--------------------------------------+---------------------------------+-------
467286c5-7d13-40c2-92d0-73434ee8970c | 2016-10-14 04:13:42.779000+0000 | def
467286c5-7d13-40c2-92d0-73434ee8970c | 2016-10-14 04:12:50.551000+0000 | abc
(2 rows)
Updating can be done in any of the columns, except the ones used as keys, the errors displayed in the update statement was caused by the "IF" statement and because it was tried to update ts which is part of the composite key.
INSERT INTO TestTable (base, ts, value)
VALUES (ffb0bb8e-3d67-4203-8c53-046a21992e52, dateof(now()), 'bananas');
SELECT * FROM TestTable WHERE base = ffb0bb8e-3d67-4203-8c53-046a21992e52 AND ts < dateof(now());
base | ts | value
--------------------------------------+---------------------------------+---------
ffb0bb8e-3d67-4203-8c53-046a21992e52 | 2016-10-14 04:17:26.421000+0000 | apples
(1 rows)
UPDATE TestTable SET value = 'apples' WHERE base = ffb0bb8e-3d67-4203-8c53-046a21992e52;
SELECT * FROM TestTable WHERE base = ffb0bb8e-3d67-4203-8c53-046a21992e52 AND ts < dateof(now());
base | ts | value
--------------------------------------+---------------------------------+---------
ffb0bb8e-3d67-4203-8c53-046a21992e52 | 2016-10-14 04:17:26.421000+0000 | bananas
(1 rows)
Related
I have a table that looks like this:
(id, a, b, mapValue)
I want to update if exist or insert if not VALUES(id,a,b,mapValue). Where mapValue is the combination of the old mapValue with the new one, replacing the values of each key that was already there.
For example if the old mapValue was {1:c, 2:d} and the new one is {2:e, 3:f} the result would be {1:c, 2:e, 3:f}.
I want to do this in a query that also updates/inserts id,a,b in VALUES(id,a,b,mapValue).
How can I achieve this?
I've found this guide about updating maps but it doesn't say anything about updating them while dealing with other values in the table. I need to do this at the same time.
In Cassandra, there is no difference between INSERT & UPDATE - everything is UPSERT, so when you do UPDATE and data doesn't exist, it's inserted. Or if you do INSERT and data already exist, it will be updated.
Regarding map update, you can use + and - operations on the corresponding column when doing UPDATE. For example, I have a table:
CREATE TABLE test.m1 (
id int PRIMARY KEY,
i int,
m map<int, text>
);
and I can do following to update existing row:
cqlsh:test> insert into test.m1 (id, i, m) values (1, 1, {1:'t1'});
cqlsh:test> select * from test.m1;
id | i | m
----+---+-----------
1 | 1 | {1: 't1'}
(1 rows)
cqlsh:test> update test.m1 set m = m + {2:'t2'}, i = 4 where id = 1;
cqlsh:test> select * from test.m1;
id | i | m
----+---+--------------------
1 | 4 | {1: 't1', 2: 't2'}
(1 rows)
and I can use the similar UPDATE command to insert completely new data:
cqlsh:test> update test.m1 set m = m + {6:'t6'}, i = 6 where id = 6;
cqlsh:test> select * from test.m1;
id | i | m
----+---+--------------------
1 | 4 | {1: 't1', 2: 't2'}
6 | 6 | {6: 't6'}
(2 rows)
Usually, if you know that no data existed before for given primary key, then UPDATE with + is better way to insert data into set or map because it doesn't generate a tombstone that is generated when you do INSERT or UPDATE without + on the collection column.
P.S. You can find more information on using collections in the following document.
I have a cassandra table with data in it.
I add three new columns country as text, lat and long as double.
When these columns are added null values are inserted against the already present rows in the table. However, null is inserted as text in country column and null as value in lat and long columns.
Is this something that is the default behavior and can I add null as value under the newly created text columns?
Cassandra uses null to show that value is missing, not that this is explicitly inserted. In your case, when you add new columns - they are just added to table's specification stored in Cassandra itself - existing data (stored in SSTables) is not modified, so when Cassandra reads old data it doesn't find values for that columns in SSTable, and output null instead.
But you can have the same behavior without adding new columns - just don't insert value for specific regular column (you must have non-null values for columns of primary key!). For example:
cqlsh> create table test.abc (id int primary key, t1 text, t2 text);
cqlsh> insert into test.abc (id, t1, t2) values (1, 't1-1', 't2-1');
cqlsh> insert into test.abc (id, t1) values (2, 't1-2');
cqlsh> insert into test.abc (id, t2) values (3, 't3-3');
cqlsh> SELECT * from test.abc;
id | t1 | t2
----+------+------
1 | t1-1 | t2-1
2 | t1-2 | null
3 | null | t3-3
(3 rows)
Hi I have created a table for storing data of like this
CREATE TABLE keyspace.test (
name text,
date text,
time double,
entry text,
details text,
PRIMARY KEY ((name, date), time)
) WITH CLUSTERING ORDER BY (time DESC);
And inserted data into the table.But a query like this gives an unordered result.
SELECT * FROM keyspace.test where device_id name ='anand' and date in ('2017-04-01','2017-04-02','2017-04-03','2017-04-05') ;
Is there any problem with my table design.
I think you are misunderstanding cassandra clustering key order. Cassandra Sort data with cluster key within a single partition.
That is for your case cassandra sort data with clustering key time within a single name and date.
Example : Let's insert some data
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 1, 'a');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 2, 'b');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 3, 'c');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-02', 0, 'nil');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-02', 4, 'd');
If we select data with your query :
SELECT * FROM test where name ='anand' and date in ('2017-04-01','2017-04-02','2017-04-03','2017-04-05') ;
Output :
name | date | time | details | entry
-------+------------+------+---------+-------
anand | 2017-04-01 | 3 | null | c
anand | 2017-04-01 | 2 | null | b
anand | 2017-04-01 | 1 | null | a
anand | 2017-04-02 | 4 | null | d
anand | 2017-04-02 | 0 | null | nil
You can see that time 3,2,1 are within a single partition anand:2017-04-01 are sorted in desc And time 4,0 are within single partition anand:2017-04-02 are sorted in desc. Cassandra will not take care of sorting between different partition.
Here is the doc :
In the table definition, a clustering column is a column that is part of the compound primary key definition, but not the first column, which is the position reserved for the partition key. Columns are clustered in multiple rows within a single partition. The clustering order is determined by the position of columns in the compound primary key definition.
Source : http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html
By the way why is your data field is text type and time field is double type ?
You can use date field as date type and time as timestamp type.
The query that you are using is o.k. but it probably doesn't behave as you are expecting it to because coordinator will not sort the results based on partitions. I also run into this problem couple of times.
The solution to it is very simple, basically It's far better to execute the 4 separate queries that you need on the client and then merge the results there. In short IN operator puts a lot of pressure to the coordinator node in the cluster, there's a nice read on this subject:
https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
Want to write System.currentMiliseconds in the cassandta table for each column by cassandra. For example
writeToCassandra(name, email)
in cassandra table:
--------------------------------
name | email| currentMiliseconds
Can cassandra prepare currentMiliseconds column automatically like auto increment ?
BR!
Cassandra has some sort of columnar database taste inside. So if you read docs how the columns are stored inside SSTable, you'll notice that each column has a personal write timestamp appended (used for conflict resolution, like last-write-wins strategy). You can query for that timestamp using writetime() function:
cqlsh:so> create table ticks ( id text primary key, value int);
cqlsh:so> insert into ticks (id, value) values ('foo', 1);
cqlsh:so> insert into ticks (id, value) values ('bar', 2);
cqlsh:so> insert into ticks (id, value) values ('baz', 3);
cqlsh:so> select id, value from ticks;
id | value
-----+-------
bar | 2
foo | 1
baz | 3
(3 rows)
cqlsh:so> select id, writetime(value) from ticks;
id | writetime(value)
-----+------------------
bar | 1448282940862913
foo | 1448282937031542
baz | 1448282945591607
(3 rows)
As you requested, I've not explicitly inserted write timestamp to DB, but able to query it. Note you cannot use writetime() function for PK.
You can try with: dateof(now())
e.g.
INSERT INTO YOUR_TABLE (NAME, EMAIL, DATE)
VALUES ('NAME', 'EMAIL', dateof(now()));
I have created a KEYSPACE and a TABLE with a uuid column as primary key and a timestamp column using an index. All this succeeded like the following picture showed:
cassandra#cqlsh:my_keyspace> insert into my_test ( id, insert_time, value ) values ( uuid(), '2015-03-12 09:10:30', '111' );
cassandra#cqlsh:my_keyspace> insert into my_test ( id, insert_time, value ) values ( uuid(), '2015-03-12 09:20:30', '222' );
cassandra#cqlsh:my_keyspace> select * from my_test;
id | insert_time | value
--------------------------------------+--------------------------+-------
9d7f88bc-5cb9-463f-b679-fd66e6469eb5 | 2015-03-12 09:20:30+0000 | 222
69579f6f-bf88-493b-a1d6-2f89fac25650 | 2015-03-12 09:10:30+0000 | 111
(2 rows)
and now query
cassandra#cqlsh:my_keyspace> select * from my_test where insert_time = '2015-03-12 09:20:30';
id | insert_time | value
--------------------------------------+--------------------------+-------
9d7f88bc-5cb9-463f-b679-fd66e6469eb5 | 2015-03-12 09:20:30+0000 | 222
(1 rows)
and now query with less than:
cassandra#cqlsh:my_keyspace> select * from my_test where insert_time < '2015-03-12 09:20:30';
InvalidRequest: code=2200 [Invalid query] message="No secondary indexes on the restricted columns support the provided operators: 'insert_time < <value>'"
while the first query is successful, why this happened? How should I make the second query successful since that's just what I want?
You can test all this on your own machine. Thanks
CREATE TABLE my_test (
id uuid PRIMARY KEY,
insert_time timestamp,
value text
) ;
CREATE INDEX my_test_insert_time_idx ON my_keyspace.my_test (insert_time);
Cassandra range queries are quite limited. It goes down to performance, and data storage mechanics. A range query must have the following:
Hit a (or few with IN) partition key, and include exact matches on all consecutive clustering keys except the last one in the query, which you can do a range query on.
Say your PK is (a, b, c, d), then the following are allowed:
where a=a1 and b < b1
where a=a1 and b=b1 and c < c1
The following is not:
where a=a1 and c < 1
[I won't go into Allow Filtering here...avoid it.]
Secondary indexes must be exact matches. You can't have range queries on them.