Remove trailing zeroes of time datatype in CQL - cassandra

I'm trying to remove the trailing zeroes of arrival_time; the column's data type is time. I ran:
SELECT * FROM TABLE
And I got this:
station_name | arrival_time
--------------+--------------------
Wellington | 06:05:00.000000000
and I need the result to look like this:
station_name | arrival_time
--------------+--------------------
Wellington | 06:05:00
I'm new to CQL. Thanks in advance.

You can't actually do that in Cassandra with the time type. You can, however, do it with a timestamp.
cassdba@cqlsh:stackoverflow> CREATE TABLE arrival_time2 (station_name TEXT PRIMARY KEY,
arrival_time time, arrival_timestamp timestamp);
cassdba@cqlsh:stackoverflow> INSERT INTO arrival_time2 (station_name, arrival_time, arrival_timestamp)
VALUES ('Wellington','06:05:00','2018-03-22 06:05:00');
cassdba@cqlsh:stackoverflow> SELECT * FROM arrival_time2;
station_name | arrival_time | arrival_timestamp
--------------+--------------------+---------------------------------
Wellington | 06:05:00.000000000 | 2018-03-22 11:05:00.000000+0000
(1 rows)
Of course, this isn't what you want either, really. So next you need to set a time_format in the [ui] section of your ~/.cassandra/cqlshrc.
[ui]
time_format = %Y-%m-%d %H:%M:%S
Restart cqlsh, and this should work:
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassdba@cqlsh> SELECT station_name, arrival_timestamp
FROM stackoverflow.arrival_time2;
station_name | arrival_timestamp
--------------+---------------------
Wellington | 2018-03-22 11:05:00
(1 rows)
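Alternatively, you can keep the time type and just trim the display on the client side. Here's a minimal sketch assuming the DataStax Python driver (cassandra-driver), which returns CQL time columns as cassandra.util.Time objects with hour/minute/second properties; the contact point and keyspace are placeholders:

from cassandra.cluster import Cluster

# Placeholder contact point and keyspace; adjust for your cluster.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('stackoverflow')

rows = session.execute("SELECT station_name, arrival_time FROM arrival_time2")
for row in rows:
    t = row.arrival_time  # a cassandra.util.Time object
    # Print only hours, minutes and seconds, dropping the nanoseconds.
    print(row.station_name, '%02d:%02d:%02d' % (t.hour, t.minute, t.second))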

Note: this answer uses SQL Server (T-SQL), not CQL. Convert the time value to a string and keep the first eight characters:
select station_name, SUBSTRING(CONVERT(varchar(20), arrival_time), 0, 9) AS arrival_time
from [Table]
This used the following table and data:
CREATE TABLE [dbo].[ArrivalStation](
[station_name] [varchar](500) NULL,
[arrival_time] [Time](7) NULL
) ON [PRIMARY]
INSERT [dbo].[ArrivalStation] ([station_name], [arrival_time]) VALUES (N'Wellington ', N'06:05:00.0000000')
INSERT [dbo].[ArrivalStation] ([station_name], [arrival_time]) VALUES (N'Singapore', N'12:35:29.1234567')

Related

How can I filter for a specific date on a CQL timestamp column?

I have a table defined as:
CREATE TABLE downtime(
asset_code text,
down_start timestamp,
down_end timestamp,
down_duration duration,
down_type text,
down_reason text,
PRIMARY KEY ((asset_code, down_start), down_end)
);
I'd like to get downtime on a particular day, such as:
SELECT * FROM downtime \
WHERE asset_code = 'CA-PU-03-LB' \
AND todate(down_start) = '2022-12-11';
I got a syntax error:
SyntaxException: line 1:66 no viable alternative at input '(' (...where asset_code = 'CA-PU-03-LB' and [todate](...)
If a function is not allowed on a partition key in the WHERE clause, how can I get the rows where down_start falls on a particular day?
You don't need the TODATE() function to filter for a specific date. You can simply specify the date, such as '2022-12-11', when filtering on a CQL timestamp column.
The catch is that you cannot use the equality operator (=): the CQL timestamp type is encoded as the number of milliseconds since the Unix epoch (Jan 1, 1970 00:00 GMT), so you need to be precise when working with timestamps.
Let me illustrate using this example table:
CREATE TABLE tstamps (
id int,
tstamp timestamp,
colour text,
PRIMARY KEY (id, tstamp)
)
My table contains the following sample data:
cqlsh> SELECT * FROM tstamps ;
id | tstamp | colour
----+---------------------------------+--------
1 | 2022-12-05 11:25:01.000000+0000 | red
1 | 2022-12-06 02:45:04.564000+0000 | yellow
1 | 2022-12-06 11:06:48.119000+0000 | orange
1 | 2022-12-06 19:02:52.192000+0000 | green
1 | 2022-12-07 01:48:07.870000+0000 | blue
1 | 2022-12-07 03:13:27.313000+0000 | indigo
The cqlsh client formats the tstamp column into a human-readable date in UTC. But really, the tstamp values are stored as integers:
cqlsh> SELECT tstamp, TOUNIXTIMESTAMP(tstamp) FROM tstamps ;
tstamp | system.tounixtimestamp(tstamp)
---------------------------------+--------------------------------
2022-12-05 11:25:01.000000+0000 | 1670239501000
2022-12-06 02:45:04.564000+0000 | 1670294704564
2022-12-06 11:06:48.119000+0000 | 1670324808119
2022-12-06 19:02:52.192000+0000 | 1670353372192
2022-12-07 01:48:07.870000+0000 | 1670377687870
2022-12-07 03:13:27.313000+0000 | 1670382807313
To retrieve the rows for a specific date, you need to specify the range of timestamps that fall on that date. For example, the timestamps for 6 Dec 2022 UTC range from 1670284800000 (2022-12-06 00:00:00.000 UTC) to 1670371199999 (2022-12-06 23:59:59.999 UTC).
This means if we want to query for December 6, we need to filter using a range query:
SELECT * FROM tstamps \
WHERE id = 1 \
AND tstamp >= '2022-12-06' \
AND tstamp < '2022-12-07';
and we get:
id | tstamp | colour
----+---------------------------------+--------
1 | 2022-12-06 02:45:04.564000+0000 | yellow
1 | 2022-12-06 11:06:48.119000+0000 | orange
1 | 2022-12-06 19:02:52.192000+0000 | green
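The same range query is easy to build from application code. Here's a sketch assuming the DataStax Python driver and the example table above (the keyspace name is a placeholder); Python datetime objects bind directly to CQL timestamp columns:

from datetime import datetime, timedelta, timezone
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('stackoverflow')  # placeholder keyspace

# Half-open bounds covering 6 Dec 2022 UTC: [midnight, next midnight).
day_start = datetime(2022, 12, 6, tzinfo=timezone.utc)
day_end = day_start + timedelta(days=1)

query = ("SELECT id, tstamp, colour FROM tstamps "
         "WHERE id = %s AND tstamp >= %s AND tstamp < %s")
for row in session.execute(query, (1, day_start, day_end)):
    print(row.id, row.tstamp, row.colour)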
WARNING - In your case where the timestamp column is part of the partition key, performing a range query is dangerous because it results in a multi-partition query -- there are 86M possible values between 1670284800000 and 1670371199999. For this reason, timestamps are not a good choice for partition keys. Cheers!

Cassandra: cannot restrict 2 columns using clustering key(version 2.1.9)

I have a schema pretty similar to this:
create table x(id int, start_date timestamp, end_date timestamp,
primary key((id), start_date, end_date))
with clustering order by (start_date desc, end_date desc);
Now I am stuck with a problem where I have to query between a start date and an end date, something like this:
select count(*) from x where id=2 and start_date > 'date' and end_date < 'date' ;
But it gives me an error similar to the following:
InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column "end_date"
cannot be restricted (preceding column "start_date" is restricted
by a non-EQ relation)"
I am new to Cassandra; any and all suggestions are welcome, even if it requires a schema change. :)
You don't say which version of Cassandra you are running, but in 2.2 and later you can use multi-column slice restrictions on clustering columns, which can get close to what you want. The syntax in CQL is a little ugly: you have to specify the start of the range with all the clustering columns spelled out, like a compound key. It's important to remember that rows are sorted first by the first clustering column, and only within that by the second.
So assume we have this data:
SELECT * from x;
id | start_date | end_date
----+--------------------------+--------------------------
2 | 2015-09-01 09:16:47+0000 | 2015-11-01 09:16:47+0000
2 | 2015-08-01 09:16:47+0000 | 2015-10-01 09:16:47+0000
2 | 2015-07-01 09:16:47+0000 | 2015-09-01 09:16:47+0000
2 | 2015-06-01 09:16:47+0000 | 2015-10-01 09:16:47+0000
Now let's select based on both dates:
SELECT * from x where id=2
and (start_date,end_date) >= ('2015-07-01 09:16:47+0000','2015-07-01 09:16:47+0000')
and (start_date,end_date) <= ('2015-09-01 09:16:47+0000','2015-09-01 09:16:47+0000');
id | start_date | end_date
----+--------------------------+--------------------------
2 | 2015-08-01 09:16:47+0000 | 2015-10-01 09:16:47+0000
2 | 2015-07-01 09:16:47+0000 | 2015-09-01 09:16:47+0000
Now you'll notice that one of those end dates is later than our restriction, yet the row is still returned. Since rows are sorted by start_date first, every end_date under a matching start_date falls inside the compound range. To get rid of rows like that, you'll need to do a little filtering on the client side:
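Here's a sketch of that client-side filter, assuming the DataStax Python driver and the x table above (the keyspace name is a placeholder). The compound-slice query runs server-side; the extra end_date check drops the stragglers:

from datetime import datetime, timezone
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')  # placeholder keyspace

lower = datetime(2015, 7, 1, 9, 16, 47, tzinfo=timezone.utc)
upper = datetime(2015, 9, 1, 9, 16, 47, tzinfo=timezone.utc)

query = ("SELECT id, start_date, end_date FROM x WHERE id = %s "
         "AND (start_date, end_date) >= (%s, %s) "
         "AND (start_date, end_date) <= (%s, %s)")
rows = session.execute(query, (2, lower, lower, upper, upper))

# The slice is over the compound clustering key, so rows whose start_date
# matches can carry an end_date past the bound; filter those out here.
for row in rows:
    if row.end_date <= upper:
        print(row.id, row.start_date, row.end_date)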
For more information, see the documentation section "Multi-column slice restrictions".

Paging Resultsets in Cassandra with compound primary keys - Missing out on rows

So, my original problem was using the token() function to page through a large data set in Cassandra 1.2.9, as explained and answered here: Paging large resultsets in Cassandra with CQL3 with varchar keys
The accepted answer got the select working with tokens and chunk size, but another problem manifested itself.
My table looks like this in cqlsh:
key | column1 | value
---------------+-----------------------+-------
85.166.4.140 | county_finnmark | 4
85.166.4.140 | county_id_20020 | 4
85.166.4.140 | municipality_alta | 2
85.166.4.140 | municipality_id_20441 | 2
93.89.124.241 | county_hedmark | 24
93.89.124.241 | county_id_20005 | 24
The primary key is a composite of key and column1. In CLI, the same data looks like this:
get ip['85.166.4.140'];
=> (counter=county_finnmark, value=4)
=> (counter=county_id_20020, value=4)
=> (counter=municipality_alta, value=2)
=> (counter=municipality_id_20441, value=2)
Returned 4 results.
The problem
When using CQL with a LIMIT of, say, 100, the returned results may stop in the middle of a record, like this:
key | column1 | value
---------------+-----------------------+-------
85.166.4.140 | county_finnmark | 4
85.166.4.140 | county_id_20020 | 4
leaving these two "rows" (columns) out:
85.166.4.140 | municipality_alta | 2
85.166.4.140 | municipality_id_20441 | 2
Now, when I use the token() function for the next page like this, those two rows are skipped:
select * from ip where token(key) > token('85.166.4.140') limit 10;
Result:
key | column1 | value
---------------+------------------------+-------
93.89.124.241 | county_hedmark | 24
93.89.124.241 | county_id_20005 | 24
95.169.53.204 | county_id_20006 | 2
95.169.53.204 | county_oppland | 2
So, no trace of the last two results from the previous IP address.
Question
How can I use token() for paging without skipping over cql rows? Something like:
select * from ip where token(key) > token(key:column1) limit 10;
Ok, so I used the info in this post to work out a solution:
http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive
(section "CQL3 pagination").
First, I execute this cql:
select * from ip limit 5000;
From the last row in the resultset, I get the key (i.e. '85.166.4.140') and the value from column1 (i.e. 'county_id_20020').
Then I create a prepared statement evaluating to
select * from ip where token(key) = token('85.166.4.140') and column1 > 'county_id_20020' ALLOW FILTERING;
(I'm guessing it would also work without the token() function, since the check is now an equality:)
select * from ip where key = '85.166.4.140' and column1 > 'county_id_20020' ALLOW FILTERING;
The resultset now contains the remaining X rows (columns) for this IP. The method then returns all the rows, and the next call to the method includes the last used key ('85.166.4.140'). With this key, I can execute the following select:
select * from ip where token(key) > token('85.166.4.140') limit 5000;
which gives me the next 5000 rows from (and including) the first IP after '85.166.4.140'.
Now, no columns are lost in the paging.
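Put together, the loop looks roughly like this. This is a sketch written against the modern DataStax Python driver for readability (a Cassandra 1.2-era client would differ); handle() is a hypothetical per-row callback and the keyspace is a placeholder:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')  # placeholder keyspace

PAGE_SIZE = 5000

def handle(row):  # hypothetical per-row processing
    print(row.key, row.column1, row.value)

rows = list(session.execute(
    "SELECT key, column1, value FROM ip LIMIT %s", (PAGE_SIZE,)))
while rows:
    for row in rows:
        handle(row)
    last = rows[-1]
    # Step 1: drain the rest of the partition the page stopped inside.
    for row in session.execute(
            "SELECT key, column1, value FROM ip "
            "WHERE key = %s AND column1 > %s", (last.key, last.column1)):
        handle(row)
    # Step 2: continue with the next partitions, ordered by token.
    rows = list(session.execute(
        "SELECT key, column1, value FROM ip "
        "WHERE token(key) > token(%s) LIMIT %s", (last.key, PAGE_SIZE)))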
UPDATE
Cassandra 2.0 introduced automatic paging, handled by the client.
More info here: http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
(note that setFetchSize is optional and not necessary for paging to work)
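With automatic paging, the whole traversal collapses into one statement. A sketch with the DataStax Python driver, whose counterpart of the Java driver's setFetchSize is the fetch_size argument (keyspace is a placeholder):

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')  # placeholder keyspace

stmt = SimpleStatement("SELECT key, column1, value FROM ip", fetch_size=5000)
for row in session.execute(stmt):
    # The driver fetches the next page transparently while you iterate,
    # so no rows are lost at page boundaries.
    print(row.key, row.column1, row.value)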

Cassandra Delete by Secondary Index or By Allowing Filtering

I'm trying to delete rows by a secondary index or a non-partition-key column in a table. I'm not concerned about performance, as this will be an unusual query. I'm not sure if it's possible. E.g.:
CREATE TABLE user_range (
id int,
name text,
end int,
start int,
PRIMARY KEY (id, name)
)
cqlsh> select * from dat.user_range where id=774516966;
id | name | end | start
-----------+-----------+-----+-------
774516966 | 0 - 499 | 499 | 0
774516966 | 500 - 999 | 999 | 500
I can:
cqlsh> select * from dat.user_range where name='1000 - 1999' allow filtering;
id | name | end | start
-------------+-------------+------+-------
-285617516 | 1000 - 1999 | 1999 | 1000
-175835205 | 1000 - 1999 | 1999 | 1000
-1314399347 | 1000 - 1999 | 1999 | 1000
-1618174196 | 1000 - 1999 | 1999 | 1000
Blah blah…
But I can’t delete:
cqlsh> delete from dat.user_range where name='1000 - 1999' allow filtering;
Bad Request: line 1:52 missing EOF at 'allow'
cqlsh> delete from dat.user_range where name='1000 - 1999';
Bad Request: Missing mandatory PRIMARY KEY part id
Even if I create an index:
cqlsh> create index on dat.user_range (start);
cqlsh> delete from dat.user_range where start=1000;
Bad Request: Non PRIMARY KEY start found in where clause
Is it possible to delete without first knowing the primary key?
No, deleting by using a secondary index is not supported: CASSANDRA-5527
With the secondary index in place, you can select the matching rows from it; each result carries its full primary key, which you can then use to delete the row.
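A sketch of that two-step delete, assuming the DataStax Python driver and the user_range table above, with the index on start already created:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('dat')

# Step 1: look up the full primary keys through the secondary index.
rows = session.execute(
    "SELECT id, name FROM user_range WHERE start = %s", (1000,))

# Step 2: delete each row by its complete primary key.
for row in rows:
    session.execute(
        "DELETE FROM user_range WHERE id = %s AND name = %s",
        (row.id, row.name))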
I came here looking for a solution to delete rows from a Cassandra column family.
I ended up doing the INSERT with a TTL (time to live) set, so that I don't have to worry about deleting at all.
Putting it out there, might help someone.
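For completeness, a sketch of the TTL approach (again with the Python driver; the one-day TTL is an arbitrary example value):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('dat')

# USING TTL makes the row expire on its own (86400 s = one day),
# so no explicit DELETE is ever needed.
session.execute(
    "INSERT INTO user_range (id, name, end, start) "
    "VALUES (%s, %s, %s, %s) USING TTL 86400",
    (774516966, '1000 - 1999', 1999, 1000))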

Is there a way to make clustering order by data type and not string in Cassandra?

I created a table in CQL3 using cqlsh with the following CQL:
CREATE TABLE test (
locationid int,
pulseid int,
name text, PRIMARY KEY(locationid, pulseid)
) WITH CLUSTERING ORDER BY (locationid ASC, pulseid DESC);
Note that locationid is an integer.
However, after I inserted data and ran a select, I noticed that locationid's ascending sort seems to be based on strings, not integers.
cqlsh:citypulse> select * from test;
locationid | pulseid | name
------------+---------+------
0 | 3 | test
0 | 2 | test
0 | 1 | test
0 | 0 | test
10 | 3 | test
5 | 3 | test
Note the order: 0, 10, 5. Is there a way to make it sort by its actual data type?
Thanks,
Allison
In Cassandra, the first part of the primary key is the partition key. That key is used to distribute data around the cluster, and it does so by hashing the key to achieve an even distribution. This means you cannot order by the first part of your primary key: partitions come back in token (hash) order, not in the order of the key values themselves.
What version of Cassandra are you on? In the most recent release of 1.2 (1.2.2), the CREATE statement you used as an example is invalid: CLUSTERING ORDER BY may only name clustering columns (here pulseid), not the partition key.
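The valid form names only the clustering column. A sketch of the corrected DDL, issued through the Python driver to match the other examples (the table name test2 is a placeholder); rows within each locationid partition then sort by pulseid, while the partitions themselves still come back in token order:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('citypulse')

# Only clustering columns may appear in CLUSTERING ORDER BY;
# locationid is the partition key, so it cannot be ordered.
session.execute("""
    CREATE TABLE test2 (
        locationid int,
        pulseid int,
        name text,
        PRIMARY KEY (locationid, pulseid)
    ) WITH CLUSTERING ORDER BY (pulseid DESC)
""")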
