Timestamp with auto increment in Cassandra

I want Cassandra to write System.currentTimeMillis() into the table for each entry on its own. For example, calling
writeToCassandra(name, email)
should result in a row in the Cassandra table:
--------------------------------
name | email | currentMilliseconds
Can Cassandra populate the currentMilliseconds column automatically, like an auto-increment column?
BR!

Cassandra has a columnar-database flavor under the hood. If you read the docs on how columns are stored inside an SSTable, you'll notice that each column has its own write timestamp attached (used for conflict resolution, i.e. the last-write-wins strategy). You can query that timestamp with the writetime() function:
cqlsh:so> create table ticks ( id text primary key, value int);
cqlsh:so> insert into ticks (id, value) values ('foo', 1);
cqlsh:so> insert into ticks (id, value) values ('bar', 2);
cqlsh:so> insert into ticks (id, value) values ('baz', 3);
cqlsh:so> select id, value from ticks;
id | value
-----+-------
bar | 2
foo | 1
baz | 3
(3 rows)
cqlsh:so> select id, writetime(value) from ticks;
id | writetime(value)
-----+------------------
bar | 1448282940862913
foo | 1448282937031542
baz | 1448282945591607
(3 rows)
As you requested, I haven't explicitly inserted a write timestamp into the DB, yet I am still able to query it. Note that you cannot use the writetime() function on primary key columns.

You can try dateof(now()), e.g.:
INSERT INTO YOUR_TABLE (NAME, EMAIL, DATE)
VALUES ('NAME', 'EMAIL', dateof(now()));
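For completeness, here is a minimal sketch of that second approach with an explicit timestamp column; the table and column names are made up for illustration, and on Cassandra 2.2+ toTimestamp(now()) is the newer spelling of dateof(now()):
CREATE TABLE users_by_email (
    name text,
    email text PRIMARY KEY,
    created_at timestamp   -- populated with toTimestamp(now()) on every insert
);
INSERT INTO users_by_email (name, email, created_at)
VALUES ('alice', 'alice@example.com', toTimestamp(now()));
Either way there is no real auto-increment involved; the value is simply generated server-side when the write executes.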

Related

How to update and replace in Cassandra a UDT field value?

Does Cassandra support updating a UDT field value, something like replacing it with a new value?
I have a user_fav_payment_method UDT and I need to replace cash with debit card:
update user_ratings set
user_fav_payment_method{'cash'} = {'debit cards'}
where rating_id = 66;
This code is wrong, but I need to do something similar to it. How can I do it?
Per documentation:
In Cassandra 3.6 and later, user-defined types that include only non-collection fields can update individual field values. Update an individual field in user-defined type data using the UPDATE command. The desired key-value pair are defined in the command. In order to update, the UDT must be defined in the CREATE TABLE command as an unfrozen data type.
You can use dot notation to update individual fields of a non-frozen UDT, like this:
cqlsh> use test;
cqlsh:test> create type payment_method ( method text, data text);
cqlsh:test> create table users (id int primary key, pay_method payment_method);
cqlsh:test> insert into users (id, pay_method) values (1, {method: 'cash', data: 'usd'});
cqlsh:test> select * from users;
id | pay_method
----+-------------------------------
1 | {method: 'cash', data: 'usd'}
(1 rows)
cqlsh:test> update users set pay_method.method = 'card' where id = 1;
cqlsh:test> select * from users;
id | pay_method
----+-------------------------------
1 | {method: 'card', data: 'usd'}
(1 rows)
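One caveat worth noting: the dot notation above only works on non-frozen UDT columns. If the column were declared as frozen<payment_method> (which is required, for example, when the UDT is nested inside a collection), individual fields cannot be updated and the whole value has to be rewritten, along these lines:
cqlsh:test> update users set pay_method = {method: 'debit card', data: 'usd'} where id = 1;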

Cassandra altering table to add new columns adds null as text

I have a Cassandra table with data in it.
I add three new columns: country as text, lat and long as double.
When these columns are added, null values appear against the already present rows in the table. However, null appears to be inserted as text in the country column, and as a value in the lat and long columns.
Is this the default behavior, and can I add null as a value under the newly created text columns?
Cassandra uses null to show that a value is missing, not that it was explicitly inserted. In your case, when you add new columns, they are only added to the table's specification stored in Cassandra itself; the existing data (stored in SSTables) is not modified. So when Cassandra reads the old data, it doesn't find values for those columns in the SSTable and outputs null instead.
But you can get the same behavior without adding new columns, simply by not inserting a value for a specific regular column (you must have non-null values for the primary key columns!). For example:
cqlsh> create table test.abc (id int primary key, t1 text, t2 text);
cqlsh> insert into test.abc (id, t1, t2) values (1, 't1-1', 't2-1');
cqlsh> insert into test.abc (id, t1) values (2, 't1-2');
cqlsh> insert into test.abc (id, t2) values (3, 't3-3');
cqlsh> SELECT * from test.abc;
id | t1 | t2
----+------+------
1 | t1-1 | t2-1
2 | t1-2 | null
3 | null | t3-3
(3 rows)
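A related caveat: explicitly inserting null is not quite the same as omitting the column. Writing null stores a tombstone for that cell, while leaving the column out writes nothing at all. For example, using the same table:
cqlsh> insert into test.abc (id, t1, t2) values (4, 't1-4', null);
A SELECT will show null for t2 just as it does for row 2 above, but this null is backed by a tombstone on disk, so omitting the column is usually the better habit.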

Cassandra migrate int to bigint

What would be the easiest way to migrate an int to a bigint in Cassandra? I thought of creating a new column of type bigint and then running a script to basically set the value of that column = the value of the int column for all rows, and then dropping the original column and renaming the new column. However, I'd like to know if someone has a better alternative, because this approach just doesn't sit quite right with me.
You could ALTER your table and change your int column to a varint type. Check the documentation about ALTER TABLE, and the data types compatibility matrix.
The only other alternative is what you said: add a new column and populate it row by row. Dropping the old column is entirely optional: if you simply stop assigning values to it on insert, everything stays as it is and new records won't consume space for it.
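A rough sketch of that manual path follows; the table (my_table), the existing int column (value) and the new column (value_big) are hypothetical names, and the per-row UPDATE statements would normally be generated by a small script iterating over the existing rows:
ALTER TABLE my_table ADD value_big bigint;
-- backfill: one UPDATE per existing row, generated by an external script
UPDATE my_table SET value_big = 123 WHERE id = 1;
-- optional: once readers use value_big, the old int column can be dropped
ALTER TABLE my_table DROP value;
Note that regular (non-primary-key) columns cannot be renamed with ALTER TABLE ... RENAME, so the final "rename the new column" step from the question would not work anyway; the new column simply keeps its name.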
You can ALTER your table so the column can hold bigint-sized (and larger) values by switching it to varint. See the example:
cassandra@cqlsh:demo> CREATE TABLE int_test (id int, name text, PRIMARY KEY (id));
cassandra@cqlsh:demo> SELECT * FROM int_test;
 id | name
----+------
(0 rows)
cassandra@cqlsh:demo> INSERT INTO int_test (id, name) VALUES (215478936, 'abc');
cassandra@cqlsh:demo> SELECT * FROM int_test;
 id        | name
-----------+------
 215478936 |  abc
(1 rows)
cassandra@cqlsh:demo> ALTER TABLE demo.int_test ALTER id TYPE varint;
cassandra@cqlsh:demo> INSERT INTO int_test (id, name) VALUES (9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, 'abcd');
cassandra@cqlsh:demo> SELECT * FROM int_test;
 id                                                                                                                           | name
------------------------------------------------------------------------------------------------------------------------------+------
                                                                                                                     215478936 |  abc
 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 | abcd
(2 rows)
cassandra@cqlsh:demo>

How to delete a record in Cassandra?

I have a table like this:
CREATE TABLE mytable (
    user_id int,
    device_id ascii,
    record_time timestamp,
    timestamp timeuuid,
    info_1 text,
    info_2 int,
    PRIMARY KEY (user_id, device_id, record_time, timestamp)
);
When I ask Cassandra to delete a record (an entry in the column family) like this:
DELETE from mytable where user_id = X and device_id = Y and record_time = Z and timestamp = XX;
it returns without an error, but when I query again the record is still there. If I instead delete a whole row like this:
DELETE from mytable where user_id = X
it works and removes the whole row, and querying again immediately returns no more data from that row.
What am I doing wrong? How can you remove a single record in Cassandra?
Thanks
Ok, here is my theory as to what is going on. You have to be careful with timestamps, because they will store data down to the millisecond. But, they will only display data to the second. Take this sample table for example:
aploetz@cqlsh:stackoverflow> SELECT id, datetime FROM data;
id | datetime
--------+--------------------------
B25881 | 2015-02-16 12:00:03-0600
B26354 | 2015-02-16 12:00:03-0600
(2 rows)
The datetimes (of type timestamp) are equal, right? Nope:
aploetz@cqlsh:stackoverflow> SELECT id, blobAsBigint(timestampAsBlob(datetime)),
datetime FROM data;
id | blobAsBigint(timestampAsBlob(datetime)) | datetime
--------+-----------------------------------------+--------------------------
B25881 | 1424109603000 | 2015-02-16 12:00:03-0600
B26354 | 1424109603234 | 2015-02-16 12:00:03-0600
(2 rows)
As you are finding out, this becomes problematic when you use timestamps as part of your PRIMARY KEY. It is possible that your timestamp is storing more precision than it is showing you, and you will need to supply that hidden precision to successfully delete that single row.
Anyway, you have a couple of options here. One, find a way to ensure that you are not entering more precision than necessary into your record_time. Or, you could define record_time as a timeuuid.
Again, it's a theory. I could be totally wrong, but I have seen people do this a few times. Usually it happens when they insert timestamp data using dateof(now()), like this:
INSERT INTO table (key, time, data) VALUES (1, dateof(now()), 'blah blah');

CREATE TABLE worker_login_table (
    worker_id text,
    logged_in_time timestamp,
    PRIMARY KEY (worker_id, logged_in_time)
);
INSERT INTO worker_login_table (worker_id, logged_in_time)
VALUES ('worker_1', toTimestamp(now()));
After one hour, execute the above insert statement once again, then:
select * from worker_login_table;
 worker_id | logged_in_time
-----------+--------------------------
  worker_1 | 2019-10-23 12:00:03+0000
  worker_1 | 2019-10-23 13:00:03+0000
(2 rows)
Query the table to get the absolute timestamps:
select worker_id, blobAsBigint(timestampAsBlob(logged_in_time)), logged_in_time from worker_login_table;
 worker_id | blobAsBigint(timestampAsBlob(logged_in_time)) | logged_in_time
-----------+------------------------------------------------+--------------------------
  worker_1 |                                  1571832003000 | 2019-10-23 12:00:03+0000
  worker_1 |                                  1571835603234 | 2019-10-23 13:00:03+0000
(2 rows)
The command below will not delete the entry, because the stored timestamp carries millisecond precision that the quoted value does not:
DELETE from worker_login_table where worker_id='worker_1' and logged_in_time ='2019-10-23 13:00:03+0000';
Using the exact millisecond value obtained from the blob, the entry can be deleted:
DELETE from worker_login_table where worker_id='worker_1' and logged_in_time = 1571835603234;
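As for the timeuuid alternative mentioned in the first answer, here is a rough sketch (the table below is hypothetical): the clustering value becomes an opaque id that you read back and reuse verbatim, so there is no hidden precision to guess at:
CREATE TABLE worker_login_by_uuid (
    worker_id text,
    logged_in timeuuid,
    PRIMARY KEY (worker_id, logged_in)
);
INSERT INTO worker_login_by_uuid (worker_id, logged_in)
VALUES ('worker_1', now());
-- select the generated timeuuid first, then delete using that exact value
-- (the uuid below is only an example of what now() might have produced)
DELETE FROM worker_login_by_uuid
WHERE worker_id = 'worker_1'
  AND logged_in = 8aa57e30-f56e-11e9-b5f8-93b286d4c94e;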

Are there any performance penalties when using a TEXT as a Primary Key?

If yes, what would the data model look like if I want to have a unique TEXT field?
No. Regardless of the data type used, Cassandra stores all data on disk (including primary key values) as byte arrays. In terms of performance, the data type of the primary key really doesn't matter.
The only case where it would matter is in token/node distribution. This is because the token generated for "12345" as text will be different from the token generated for 12345 as a bigint:
aploetz@cqlsh:stackoverflow> CREATE TABLE textaskey (key text PRIMARY KEY, value text);
aploetz@cqlsh:stackoverflow> CREATE TABLE longaskey (key bigint PRIMARY KEY, value text);
aploetz@cqlsh:stackoverflow> INSERT INTO textaskey (key, value) VALUES ('12345','12345');
aploetz@cqlsh:stackoverflow> INSERT INTO longaskey (key, value) VALUES (12345,'12345');
aploetz@cqlsh:stackoverflow> SELECT token(key),value FROM textaskey;
token(key) | value
---------------------+-------
2375712675693977547 | 12345
(1 rows)
aploetz@cqlsh:stackoverflow> SELECT token(key),value FROM longaskey;
token(key) | value
---------------------+-------
3741197147323682197 | 12345
(1 rows)
But even in this example, one shouldn't perform faster or differently than the other.
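As for the second part of the question (a unique TEXT field), a common pattern is to make the text value the partition key and guard inserts with a lightweight transaction; the table below is a hypothetical sketch, not something from the original answer:
CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    user_id uuid
);
-- IF NOT EXISTS turns the insert into a lightweight transaction (Paxos),
-- so a second insert with the same username comes back with [applied] = False
INSERT INTO users_by_username (username, user_id)
VALUES ('jsmith', uuid())
IF NOT EXISTS;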
