Delete Rows using timestamp column cassandra - cassandra

I want to delete data between timestamp from my table.
CREATE TABLE propatterns_test.test (
clientId text,
meterId text,
meterreading text,
date timestamp,
PRIMARY KEY (meterId, date) );
My delete query is:
DELETE FROM test WHERE meterid = 'M5' AND date > '2016-12-27 10:00:00+0000';
Which returned this error :
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Invalid operator < for PRIMARY KEY part date"
After that I tried to delete a specific row :
DELETE FROM test WHERE meterid = 'M5' AND date = '2016-12-27 09:42:30+0000';
Actually the table contains the same record, but it was not deleted.
This is what my data looks like:
meterid | date | clientid | meterreading
---------+--------------------------+----------+--------------
M5 | 2016-12-27 09:42:30+0000 | RDS | 35417.8
M5 | 2016-12-27 09:42:44+0000 | RDS | 35417.8
M5 | 2016-12-27 09:47:20+0000 | RDS | 35417.8
M5 | 2016-12-27 09:47:33+0000 | RDS | 35417.8
Nothing is deleting from table. So how can I delete data between timestamp dates which is part of the primary key?

I see a couple of things happening here. First of all, like iconnj mentioned, range deletes are not possible in versions prior to Cassandra 3.0.
Secondly, your single-row delete attempt is failing (I believe) due to the fact that you are not accounting for the milliseconds present on the timestamp. You can see this if you nest your date column inside the timestsampasblob and blobasbigint functions:
aploetz#cqlsh:stackoverflow> SELECT meterid,date,blobAsBigint(timestampAsBlob(date))
FROM propatterns WHERE meterid='M5';
meterid | date | system.blobasbigint(system.timestampasblob(date))
---------+--------------------------+---------------------------------------------------
M5 | 2016-12-27 09:42:30+0000 | 1482831750000
M5 | 2016-12-30 17:31:53+0000 | 1483119113231
M5 | 2016-12-30 17:32:08+0000 | 1483119128812
(3 rows)
Note the zeros on the end of the 2016-12-27 09:42:30+0000 row, that I explicitly INSERTed from your example. Note that the two rows I INSERTed using the dateof(now()) nested functions actually has the milliseconds as the last three digits on the timestamps.
Watch what happens when I take those three digits and add them as milliseconds when I delete one of the rows:
aploetz#cqlsh:stackoverflow> DELETE FROM propatterns WHERE meterid='M5'
AND date='2016-12-30 17:32:08.812+0000';
aploetz#cqlsh:stackoverflow> SELECT meterid,date,blobAsBigint(timestampAsBlob(date))
FROM propatterns WHERE meterid='M5';
meterid | date | system.blobasbigint(system.timestampasblob(date))
---------+--------------------------+---------------------------------------------------
M5 | 2016-12-27 09:42:30+0000 | 1482831750000
M5 | 2016-12-30 17:31:53+0000 | 1483119113231
(2 rows)
In summary:
You cannot perform range deletes prior to Cassandra 3.0.
You cannot delete individual rows keyed by timestamps without specifying milliseconds, if milliseconds are indeed present.

Delete with range clause is possible in C* 3.0 onwards. Looking at the error you got I think you are on a pre 3.0 version in which case you won't be able to do this via CQL

In Cassandra 3 you can use the "...from Y using timestamp XXX where ..." command:
create table mytime (
location_id text,
tour_id text,
mytime timestamp,
PRIMARY KEY (location_id, tour_id));
INSERT INTO mytime (location_id, tour_id, mytime) values ('location1', '1', toTimeStamp(now()));
INSERT INTO mytime (location_id, tour_id, mytime) values ('location1', '2', toTimeStamp(now()));
Be aware: the value you need to use for the timestamp is nanoseconds not miliseconds:
select location_id, mytime, blobAsBigint(mytime), WRITETIME(mytime) from mytime;
location_id |mytime |system.blobasbigint(mytime) |writetime(mytime) |
------------|------------------------|----------------------------|------------------|
location1 |2018-11-28-09.53.52.110 |1543395232110 |1543395232109517 |
location1 |2018-11-28-09.53.52.742 |1543395232742 |1543395232740055 |
So now you can do
delete from mytime using timestamp 1543395232109517 where location_id = 'location1';
Which correctly deletes the entry <= 1543395232109517:
select location_id, mytime, blobAsBigint(mytime), WRITETIME(mytime) from mytime;
location_id |mytime |system.blobasbigint(mytime) |writetime(mytime) |
------------|------------------------|----------------------------|------------------|
location1 |2018-11-28-09.53.52.742 |1543395232742 |1543395232740055 |

Related

Cassandra: get last with a non-null value in a column

I have a Cassandra table where each column can contain a value or a NULL. But if it contains a NULL, I know that all the next values in that column are also NULL.
Something like this:
+------------+---------+---------+---------+
| date | column1 | column2 | column3 |
+------------+---------+---------+---------+
| 2017-01-01 | 1 | 'a' | NULL |
| 2017-01-02 | 2 | 'b' | NULL |
| 2017-01-03 | 3 | NULL | NULL |
| 2017-01-04 | 4 | NULL | NULL |
| 2017-01-05 | NULL | NULL | NULL |
+------------+---------+---------+---------+
I need a query that, for a given column, returns the date of the last column with a non-null value. In this case:
For column1, '2017-01-04'
For column2, '2017-01-02'
For column3, no result returned.
In SQL it would be something like this:
SELECT date
FROM my_table
WHERE column1 IS NOT NULL
ORDER BY date DESC LIMIT 1
Is it possible in any way, or should I break the table into one table for each column to avoid the NULL situation at all?
tldr; Create a new table that tracks this separately.
This would only be possible if 'column 1' was part of the primary key, with secondary indexes or with a materialized view.
You don't want your primary key to have nulls. As an aside make sure you're writing 'UNSET' inplace of null to the rest of your table. This should be handled by the driver but some drivers are not terribly mature. Writing nulls is effectively a delete operation and will cause tombstones.
Secondary indexes come with performance problems as potentially they hit the entire cluster and don't scale very well beyond a certain point.
Materialized views are being deprecated, so probably avoid those.
You are likely better served by creating a separate table that tracks this exact functionality. This would mean multiple writes and multiple reads but would avoid large table scans and secondary indexes.
I'm going to assume your partition isn't by date and that you've got wide rows because it makes this simpler but this is what that would look like.
CREATE TABLE my_table (
partition bigint,
date text,
column1 bigint,
column2 text,
column3 text,
PRIMARY KEY(partition, date);
CREATE TABLE offset_tracker(
partition bigint,
date text,
PRIMARY KEY(partition);
Here you would do a select date FROM offset_tracker WHERE partition=x to get your 'largest date with values'.

Cassandra CLUSTERING ORDER BY is not working and showing in correct results

Hi I have created a table for storing data of like this
CREATE TABLE keyspace.test (
name text,
date text,
time double,
entry text,
details text,
PRIMARY KEY ((name, date), time)
) WITH CLUSTERING ORDER BY (time DESC);
And inserted data into the table.But a query like this gives an unordered result.
SELECT * FROM keyspace.test where device_id name ='anand' and date in ('2017-04-01','2017-04-02','2017-04-03','2017-04-05') ;
Is there any problem with my table design.
I think you are misunderstanding cassandra clustering key order. Cassandra Sort data with cluster key within a single partition.
That is for your case cassandra sort data with clustering key time within a single name and date.
Example : Let's insert some data
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 1, 'a');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 2, 'b');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 3, 'c');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-02', 0, 'nil');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-02', 4, 'd');
If we select data with your query :
SELECT * FROM test where name ='anand' and date in ('2017-04-01','2017-04-02','2017-04-03','2017-04-05') ;
Output :
name | date | time | details | entry
-------+------------+------+---------+-------
anand | 2017-04-01 | 3 | null | c
anand | 2017-04-01 | 2 | null | b
anand | 2017-04-01 | 1 | null | a
anand | 2017-04-02 | 4 | null | d
anand | 2017-04-02 | 0 | null | nil
You can see that time 3,2,1 are within a single partition anand:2017-04-01 are sorted in desc And time 4,0 are within single partition anand:2017-04-02 are sorted in desc. Cassandra will not take care of sorting between different partition.
Here is the doc :
In the table definition, a clustering column is a column that is part of the compound primary key definition, but not the first column, which is the position reserved for the partition key. Columns are clustered in multiple rows within a single partition. The clustering order is determined by the position of columns in the compound primary key definition.
Source : http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html
By the way why is your data field is text type and time field is double type ?
You can use date field as date type and time as timestamp type.
The query that you are using is o.k. but it probably doesn't behave as you are expecting it to because coordinator will not sort the results based on partitions. I also run into this problem couple of times.
The solution to it is very simple, basically It's far better to execute the 4 separate queries that you need on the client and then merge the results there. In short IN operator puts a lot of pressure to the coordinator node in the cluster, there's a nice read on this subject:
https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

How to delete a record in Cassandra?

I have a table like this:
CREATE TABLE mytable (
user_id int,
device_id ascii,
record_time timestamp,
timestamp timeuuid,
info_1 text,
info_2 int,
PRIMARY KEY (user_id, device_id, record_time, timestamp)
);
When I ask Cassandra to delete a record (an entry in the columnfamily) like this:
DELETE from my_table where user_id = X and device_id = Y and record_time = Z and timestamp = XX;
it returns without an error, but when I query again the record is still there. Now if I try to delete a whole row like this:
DELETE from my_table where user_id = X
It works and removes the whole row, and querying again immediately doesn't return any more data from that row.
What I am doing wrong? How you can remove a record in Cassandra?
Thanks
Ok, here is my theory as to what is going on. You have to be careful with timestamps, because they will store data down to the millisecond. But, they will only display data to the second. Take this sample table for example:
aploetz#cqlsh:stackoverflow> SELECT id, datetime FROM data;
id | datetime
--------+--------------------------
B25881 | 2015-02-16 12:00:03-0600
B26354 | 2015-02-16 12:00:03-0600
(2 rows)
The datetimes (of type timestamp) are equal, right? Nope:
aploetz#cqlsh:stackoverflow> SELECT id, blobAsBigint(timestampAsBlob(datetime)),
datetime FROM data;
id | blobAsBigint(timestampAsBlob(datetime)) | datetime
--------+-----------------------------------------+--------------------------
B25881 | 1424109603000 | 2015-02-16 12:00:03-0600
B26354 | 1424109603234 | 2015-02-16 12:00:03-0600
(2 rows)
As you are finding out, this becomes problematic when you use timestamps as part of your PRIMARY KEY. It is possible that your timestamp is storing more precision than it is showing you. And thus, you will need to provide that hidden precision if you will be successful in deleting that single row.
Anyway, you have a couple of options here. One, find a way to ensure that you are not entering more precision than necessary into your record_time. Or, you could define record_time as a timeuuid.
Again, it's a theory. I could be totally wrong, but I have seen people do this a few times. Usually it happens when they insert timestamp data using dateof(now()) like this:
INSERT INTO table (key, time, data) VALUES (1,dateof(now()),'blah blah');
CREATE TABLE worker_login_table (
worker_id text,
logged_in_time timestamp,
PRIMARY KEY (worker_id, logged_in_time)
);
INSERT INTO worker_login_table (worker_id, logged_in_time)
VALUES ("worker_1",toTimestamp(now()));
after 1 hour executed the above insert statement once again
select * from worker_login_table;
worker_id| logged_in_time
----------+--------------------------
worker_1 | 2019-10-23 12:00:03+0000
worker_1 | 2015-10-23 13:00:03+0000
(2 rows)
Query the table to get absolute timestamp
select worker_id, blobAsBigint(timestampAsBlob(logged_in_time )), logged_in_time from worker_login_table;
worker_id | blobAsBigint(timestampAsBlob(logged_in_time)) | logged_in_time
--------+-----------------------------------------+--------------------------
worker_1 | 1524109603000 | 2019-10-23 12:00:03+0000
worker_1 | 1524209403234 | 2019-10-23 13:00:03+0000
(2 rows)
The below command will not delete the entry from Cassandra as the precise value of timestamp is required to delete the entry
DELETE from worker_login_table where worker_id='worker_1' and logged_in_time ='2019-10-23 12:00:03+0000';
By using the timestamp from blob we can delete the entry from Cassandra
DELETE from worker_login_table where worker_id='worker_1' and logged_in_time ='1524209403234';

Cassandra - Delete Time Series Rows in cqlsh

Running Cassandra 2.0.11 and I'm having difficulty deleting a row of time series data in CQLSH. Since I'm unable to use > or < in the WHERE clause of a DELETE statement, I'm assuming I need the exact time
Schema:
CREATE TABLE account_data_by_user (
user_id int,
time timestamp,
account_id int,
account_desc text,
...
PRIMARY KEY ((user_id), time, account_id)
Row in question:
user_id | time | account_id | account_desc |
--------+--------------------------+-----------------+------------------+-
1 | 2015-02-20 08:51:55-0600 | 1 | null |
Attempting:
DELETE
FROM account_data_by_user
WHERE user_id = 1 and time = '2015-02-20 08:51:55-0600' and account_id = 1
The above executes successfully, but the row is still there. I'm assuming the cqlsh output [time] is the problem.
I should note that I can delete a row like this through cqlengine.Model.delete, but I'm not sure what it's executing to accomplish the delete.
So after much google, I've discovered the blob conversion functions from this JIRA issue: https://issues.apache.org/jira/browse/CASSANDRA-5870
Query:
SELECT user_id, host_account_id, blobasbigint(timestampasblob(time))
FROM account_data_by_user where user_id
Returns:
user_id | account_id | blobasbigint(timestampasblob(time))
---------+-----------------+-------------------------------------
1 | 1 | 1424458973126
1 | 184531 | 1423738054142
DELETE
FROM account_data_by_user
WHERE user_id = 1 and time = 1424458973126 and host_account_id = 1;
This successfully removed the desired row.

Cassandra: can you add dynamic columns within existing column clustering?

I'm using Cassandra 1.2.12 with CQL 3, and am having trouble modeling my column family.
I currently store snapshots of customer data at particular times. Works great:
CREATE TABLE data (
cust_id varchar,
time timeuuid,
data_text text,
PRIMARY KEY (cust_id, time)
);
The cust_id is the partition key and time is the clustering id, so, as I understand it, I can think of each row in the table like:
| cust_id | timeuuid1 : data_text | timeuuid2 : data_text |
| CUST1 | data at this time | data at this time |
Now I'd like to store another group of metrics for each snapshot - but the name of each of these columns isn't fixed. So something like:
| cust_id | timeuuid1 : data_text | timeuuid1 : dynamicCol1 | timeuuid1 : dynamicCol2 | timeuuid1 : dynamicColN |
| CUST1 | data |{some value} |{some value} |{some value} |
I've achieved dynamic columns for timestamp by using a composite primary key, but I can't see how to achieve this within each cluster of columns, if you see what I mean.
If I add, say, "dynamicColumnName" to the existing composite key, I'll end up with customer data stored for each dynamic column, which is not what I want.
Is this possible, without using a Map column? Hope you can help, thanks!
I am not a CQL user... With the thrift API you dynamically add a column to a column family by inserting/updating a record with a value for a column with name X. The column X will start to exist right there and then for that record.
Have you tried an INSERT statement specifying a column that you have not explicitly defined? I would expect that to have the same effect (column is created).

Resources