Is there a way to write a FlexibleSearch query that fetches one-day-old records?
Something like:
select * from {table} where {conditions}
where {conditions} matches one-day-old records?
Every itemtype has an attribute called creationtime which stores the timestamp when a record of this itemtype is created in the database. Similarly, there is another attribute called modifiedtime which stores the timestamp when the record is modified in the database. You can use one of these attributes as per your requirement e.g.
The query to find the products which are older than 1 day:
SELECT {pk} FROM {Product} WHERE {creationtime} < DATE_SUB(NOW(), INTERVAL 1 DAY)
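The query to find the products created during the previous calendar day (comparing with = against DATE_SUB(NOW(), INTERVAL 1 DAY) would match only records created at exactly that second):
SELECT {pk} FROM {Product} WHERE {creationtime} >= DATE_SUB(CURDATE(), INTERVAL 1 DAY) AND {creationtime} < CURDATE()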
Check hybris ▸ bin ▸ platform ▸ ext ▸ core ▸ resources ▸ core-items.xml to know more about all the attributes of the itemtype code="Item" which is the supertype of all the itemtypes and hence every itemtype inherits all of its attributes by default.
Something like this should work:
SELECT * FROM {Product} WHERE {creationTime} < NOW() - INTERVAL 1 DAY
Instead of creationTime, you could also use modifiedTime, depending on what you want.
Instead of relying on DB functions (DATE_SUB, INTERVAL), I would rather calculate the date in Java and pass the Date object to the FlexibleSearch query, like this:
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

final FlexibleSearchQuery flexibleSearchQuery = new FlexibleSearchQuery(
        "SELECT {t.pk} FROM {Product AS t} WHERE {t.creationtime} < ?inputDate");
flexibleSearchQuery.addQueryParameter("inputDate",
        Date.from(Instant.now().minus(1, ChronoUnit.DAYS)));
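If the goal is records created during the previous calendar day rather than anything older than 24 hours, the same client-side approach needs two bounds. The following is a minimal sketch in plain Java, assuming the query uses two named parameters; the class name YesterdayWindow and the parameter names ?from and ?to are hypothetical and not part of any hybris API.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

public class YesterdayWindow {
    // Midnight at the start of the given day in the given zone, as a java.util.Date.
    static Date startOfDay(LocalDate day, ZoneId zone) {
        return Date.from(day.atStartOfDay(zone).toInstant());
    }

    // Returns {start of yesterday, start of today}: a half-open window [from, to).
    static Date[] yesterday(LocalDate today, ZoneId zone) {
        return new Date[] {
            startOfDay(today.minusDays(1), zone),
            startOfDay(today, zone)
        };
    }

    public static void main(String[] args) {
        Date[] window = yesterday(LocalDate.now(), ZoneId.systemDefault());
        // Hypothetical usage with a query such as:
        //   ... WHERE {creationtime} >= ?from AND {creationtime} < ?to
        System.out.println("from = " + window[0]);
        System.out.println("to   = " + window[1]);
    }
}
```

The half-open window avoids counting a record created exactly at midnight twice when the query is run on consecutive days.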
The condition could be on either of two columns:
{modifiedtime}
{creationtime}
FlexibleSearch queries:
SELECT * FROM {product} WHERE {creationtime} < current_date - interval 1 day
SELECT * FROM {product} WHERE {modifiedtime} < current_date - interval 1 day
For more than 1 day (here, 10 days):
SELECT * FROM {product} WHERE {modifiedtime} < current_date - interval 10 day
I have a Cassandra column of type date that holds values in timestamp format, like the one below. How can I filter for rows where this column holds a date later than today's date?
Example:
Type: date
Timestamp: 2021-06-29 11:53:52 +00:00
TTL: null
Value: 2021-03-16T00:00:00.000+0000
I was able to filter rows using columnname <= '2021-09-25', which gives ten rows, some of them having dates on Sep 23 and 24. When I filter using columnname < '2021-09-24', I get an error like the one below:
An error occurred on line 1 (use Ctrl-L to toggle line numbers):
Cassandra failure during read query at consistency ONE (1 responses were required but only 0 replica responded, 1 failed)
The CQL timestamp data type is encoded as the number of milliseconds since Unix epoch (Jan 1, 1970 00:00 GMT) so you need to be precise when you're working with timestamps.
Depending on where you're running the query, the filter could be translated in the local timezone. Let me illustrate with this example table:
CREATE TABLE community.tstamptbl (
id int,
tstamp timestamp,
PRIMARY KEY (id, tstamp)
)
These 2 statements may appear similar but translate to 2 different entries:
INSERT INTO tstamptbl (id, tstamp) VALUES (5, '2021-08-09');
INSERT INTO tstamptbl (id, tstamp) VALUES (5, '2021-08-09 +0000');
The first statement creates an entry with a timestamp in my local timezone (Melbourne, Australia) while the second statement creates an entry with a timestamp in UTC (+0000):
cqlsh:community> SELECT * FROM tstamptbl WHERE id = 5;
id | tstamp
----+---------------------------------
5 | 2021-08-08 14:00:00.000000+0000
5 | 2021-08-09 00:00:00.000000+0000
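The ten-hour gap between the two rows above can be reproduced outside cqlsh. This is a small sketch in plain Java (no Cassandra driver involved) that converts the same calendar date to an absolute instant in two zones; the class name TimestampZones is made up for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampZones {
    // Midnight of the given date in the given zone, as an absolute instant.
    static Instant midnight(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone).toInstant();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2021, 8, 9);
        Instant melbourne = midnight(date, ZoneId.of("Australia/Melbourne"));
        Instant utc = midnight(date, ZoneOffset.UTC);
        // Same date string, two different millisecond values since the epoch.
        System.out.println(melbourne + " -> " + melbourne.toEpochMilli());
        System.out.println(utc + " -> " + utc.toEpochMilli());
    }
}
```

Because the stored value is just a millisecond count, a literal without an explicit offset is only as unambiguous as the timezone of the client that interprets it.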
Similarly, you need to be precise when reading the data. You need to specify the timezone to remove ambiguity. Here are some examples:
SELECT * FROM tstamptbl WHERE id = 5 AND tstamp < '2021-08-09 +0000';
SELECT * FROM tstamptbl WHERE id = 1 AND tstamp < '2021-08-10 12:00+0000';
SELECT * FROM tstamptbl WHERE id = 1 AND tstamp < '2021-08-10 12:34:56+0000';
In the second part of your question, the error isn't directly related to your filter. The problem is that the replica(s) failed to respond for whatever reason (e.g. unresponsive/overloaded, down, etc). You need to investigate that issue separately. Cheers!
I have a table with the following (with other fields removed)
CREATE TABLE if NOT EXISTS request_audit (
user_id text,
request_body text,
lookup_timestamp TIMESTAMP,
PRIMARY KEY ((user_id), lookup_timestamp)
) WITH CLUSTERING ORDER BY ( lookup_timestamp DESC);
I create a record with the following
INSERT INTO request_audit (user_id, request_body, lookup_timestamp) VALUES (?, ?, toTimestamp(now()))
I am trying to retrieve all rows from the last 24 hours, but I am having trouble with the timestamp.
I have tried
SELECT * from request_audit WHERE user_id = '1234' AND lookup_timestamp > toTimestamp(now() - "1 day" )
and various other ways of trying to take a day away from the query.
Cassandra has very limited support for date operations. What you need is a custom function to do the date arithmetic.
This is inspired by the question:
How to get Last 6 Month data comparing with timestamp column using cassandra query?
You can write a UDF (user-defined function) to do the date operation.
CREATE FUNCTION dateAdd(date timestamp, day int)
CALLED ON NULL INPUT
RETURNS timestamp
LANGUAGE java
AS
$$java.util.Calendar c = java.util.Calendar.getInstance();
c.setTime(date);
c.add(java.util.Calendar.DAY_OF_MONTH, day);
return c.getTime();$$ ;
Remember that you have to enable UDFs in the configuration file, cassandra.yaml, provided that is possible in your environment:
enable_user_defined_functions: true
Once that is done, this query works:
SELECT * from request_audit WHERE user_id = '1234' AND lookup_timestamp > dateAdd(dateof(now()), -1)
You can't do it directly in CQL, as it doesn't support this kind of expression. If you're running this query from cqlsh, you can compute the desired date in the shell with something like this:
date --date='-1 day' '+%F %T%z'
and substitute the result into the query before executing it.
If you're invoking this from your program, just use the corresponding date/time library to compute the date one day back; the details depend on the language that you're using.
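As an illustration of the client-side approach, here is a minimal Java sketch that computes the instant 24 hours back and renders it in the 'yyyy-MM-dd HH:mm:ss+0000' form used in CQL timestamp literals. The class and method names are hypothetical. With a driver you would typically bind the Instant as a query parameter instead of building a string; that part is an assumption about your setup and is not shown.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class LastDayCutoff {
    // Renders an instant in UTC, e.g. 2021-08-09 12:34:56+0000.
    private static final DateTimeFormatter CQL_TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssZ").withZone(ZoneOffset.UTC);

    // The instant exactly 24 hours before the given one.
    static Instant cutoff(Instant now) {
        return now.minus(Duration.ofDays(1));
    }

    static String cqlLiteral(Instant instant) {
        return CQL_TIMESTAMP.format(instant);
    }

    public static void main(String[] args) {
        String literal = cqlLiteral(cutoff(Instant.now()));
        // The user id '1234' is taken from the question; adjust as needed.
        System.out.println("SELECT * FROM request_audit WHERE user_id = '1234' "
                + "AND lookup_timestamp > '" + literal + "'");
    }
}
```

Formatting with an explicit UTC offset keeps the literal unambiguous regardless of the timezone of the machine running the query.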
I have the requirement to forward data at certain intervals from my system to an external system. To do this, I already stored all rows in a table. Already forwarded data should not be exported again.
The idea is to memorize the last export time on client side and export the following records the next time. Old rows are deleted after a successful export.
CREATE TABLE export(
id int,
import_date_time timestamp,
data text,
PRIMARY KEY (id, import_date_time)
) WITH CLUSTERING ORDER BY (import_date_time DESC)
insert into export(id, import_date_time, data) values (1, toTimestamp(now()), 'content')
select * from export where id = 1 and import_date_time > '2017-03-30 16:22:37'
delete from export where id = 1 and import_date_time <= '2017-03-30 16:22:37'
Has anyone already implemented something similar, or do you have a different solution?
If possible, I would prefer not to need an id in the request, because I want to export all data.
If you use a fixed partition key value (id = 1), then every insert, select and delete will hit the same node (if RF=1) over and over. Also, for every delete Cassandra creates a tombstone entry, and when you execute a select query Cassandra needs to merge each entry, so your select performance will degrade.
So instead of a fixed value, use a dynamic value like the one below:
CREATE TABLE export(
hour int,
day int,
month int,
year int,
import_date_time timestamp,
data text,
PRIMARY KEY ((hour, day, month, year), import_date_time)
) WITH CLUSTERING ORDER BY (import_date_time DESC);
Here you can insert the values of hour, day, month and year extracted from import_date_time.
You need to take care of two cases when selecting data:
The previous export time and the current export time are both in the same hour.
The two times are not in the same hour.
For case one you need only one query; for case two you have to execute one query per hour bucket (two when the hours are adjacent).
Example query:
SELECT * FROM export WHERE hour = 16 AND day = 30 AND month = 3 AND year = 2017 AND import_date_time > '2017-03-30 16:22:37';
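The two cases above can be handled uniformly by enumerating every hour bucket between the previous and the current export time and issuing one query per bucket. The following Java sketch only builds the query strings; the class HourBuckets and its method are hypothetical helpers, and the timestamps are assumed to be in the same timezone that was used when the partition columns were written.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class HourBuckets {
    // One {hour, day, month, year} entry per hour touched by [from, to].
    static List<int[]> between(LocalDateTime from, LocalDateTime to) {
        List<int[]> buckets = new ArrayList<>();
        LocalDateTime cursor = from.truncatedTo(ChronoUnit.HOURS);
        while (!cursor.isAfter(to)) {
            buckets.add(new int[] {
                cursor.getHour(), cursor.getDayOfMonth(),
                cursor.getMonthValue(), cursor.getYear()
            });
            cursor = cursor.plusHours(1);
        }
        return buckets;
    }

    public static void main(String[] args) {
        LocalDateTime previousExport = LocalDateTime.of(2017, 3, 30, 16, 22, 37);
        LocalDateTime currentExport = LocalDateTime.of(2017, 3, 30, 17, 5, 0);
        for (int[] b : between(previousExport, currentExport)) {
            // Every bucket uses the same lower bound on the clustering column.
            System.out.println(String.format(
                "SELECT * FROM export WHERE hour = %d AND day = %d AND month = %d "
                + "AND year = %d AND import_date_time > '2017-03-30 16:22:37';",
                b[0], b[1], b[2], b[3]));
        }
    }
}
```

If exports can be skipped for long periods, the number of buckets (and queries) grows by one per elapsed hour, which is worth keeping in mind when choosing the bucket size.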
I have a table called Price in MYSQL which looks like this :
+---------+-------------+------+-----+-------------------+-----------------------------+
| Field   | Type        | Null | Key | Default           | Extra                       |
+---------+-------------+------+-----+-------------------+-----------------------------+
| Current | float(20,3) | YES  |     | NULL              |                             |
| Time    | timestamp   | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+---------+-------------+------+-----+-------------------+-----------------------------+
My application requires me to sum and retrieve results from the last 1 hour, 2 hours, and so on up to the last week. I am trying to move to Cassandra and want to build a suitable model for my data. Currently I have built a table in Cassandra which looks something like this:
CREATE TABLE IF NOT EXISTS HAS.Price (
ID INT,
Current float,
Time timestamp,
Time_uuid timeuuid,
PRIMARY KEY (ID, Time_uuid)
);
This is not logical, as it just creates one big table, and I don't think this will distribute data to other nodes. I am using a fixed id of 1 here. I believe in my case the logical partition key would be "hour", so that, for example, I can sum all the current values from the last hour, the last 2 hours and so on. In this case I am referring to this post. If I create hour as a partition key, all the data for, let's say, the 15th hour of the day will go in this row:
2015-08-06 15:00:00
and the data for the next hour will go to 2015-08-06 16:00:00. However, let's say the current time is 2015-08-06 16:12:43 and I want to select records from the last hour. What will my query look like, given that part of the data is in 2015-08-06 15:00:00, which has a different partition key?
Try the following option. (I have corrected the answer.)
Design for your queries. Here are the possible queries I can see, other than up-to-the-minute ones:
Get sum for day
Get sum for hour
Get sum for last hour (any time on the hour)
CREATE TABLE mykeyspace.price (
day text,
hour text,
inserttime timeuuid,
current float,
PRIMARY KEY ((day, hour), inserttime)
) WITH CLUSTERING ORDER BY (inserttime DESC)
Make 2 inserts for every transaction, like below:
insert into price (day, hour , inserttime , current ) VALUES ('20150813','',now(),2.00)
insert into price (day, hour , inserttime , current ) VALUES ('','2015081317',now(),2.00)
Where
day is YYYYMMDD
hour is YYYYMMDDhh (e.g. 2015081317)
Select query to get the last hour at any minute: use minTimeuuid and maxTimeuuid
select day, hour, dateOf(inserttime) from price where day = '' and hour IN ('2015081317', '2015081316') and inserttime > maxTimeuuid('2015-08-13 16:20:00-0500') and inserttime < minTimeuuid('2015-08-13 17:20:00-0500');
Note: A range query is not allowed on a partition key. The documentation says you could use the token function, but the results are not predictable.
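The partition key values used in this answer can be derived from the current time on the client. Here is a minimal Java sketch, assuming the YYYYMMDD and YYYYMMDDhh text formats described above; the class PriceKeys and its methods are hypothetical helpers, and the times are assumed to be in whatever timezone was used when the rows were written.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

public class PriceKeys {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyyMMddHH");

    static String dayKey(LocalDateTime t) {
        return DAY.format(t);
    }

    static String hourKey(LocalDateTime t) {
        return HOUR.format(t);
    }

    // The two hour partitions that can hold rows from the 60 minutes before `now`.
    static List<String> lastHourKeys(LocalDateTime now) {
        return Arrays.asList(hourKey(now.minusHours(1)), hourKey(now));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2015, 8, 13, 17, 20, 0);
        System.out.println("day key:   " + dayKey(now));
        System.out.println("hour keys: " + lastHourKeys(now));
    }
}
```

The two hour keys go into the IN clause, while the timeuuid bounds on inserttime trim the result to the exact 60-minute window.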
This is not logical as it just creates one big table and i dont think this will distribute data to other nodes.
Yes, this won't distribute data across your nodes.
Here is what I think the solution should be:
CREATE TABLE IF NOT EXISTS HAS.Price (
Time_uuid timeuuid,
Current float,
PRIMARY KEY (Time_uuid)
);
Then simply find the time_uuid for the start of the hour and for the end of the hour, and write a query like:
SELECT * FROM HAS.Price WHERE time_uuid>=cdb36860-4444-11e5-8080-808080808080 AND time_uuid<=f784b8ef-450d-11e5-7f7f-7f7f7f7f7f7f
CQL Execution [returns instantly, assuming uses clustering key index]:
cqlsh:stats> select count(*) from events where month='2015-04' and day = '2015-04-02';
count
-------
5447
Presto Execution [takes around 8secs]:
presto:default> select count(*) as c from cassandra.stats.events where month = '2015-04' and day = timestamp '2015-04-02';
c
------
5447
(1 row)
Query 20150228_171912_00102_cxzfb, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:08 [147K rows, 144KB] [17.6K rows/s, 17.2KB/s]
Why does Presto have to process 147K rows when Cassandra itself responds with just 5447 rows for the same query? (I tried select * too.)
Why is Presto not able to use the clustering key optimization?
I tried all possible values, such as timestamp, date, and different date formats, but I could not see any effect on the number of rows being fetched.
CF Reference:
CREATE TABLE events (
month text,
day timestamp,
test_data text,
some_random_column text,
event_time timestamp,
PRIMARY KEY (month, day, event_time)
) WITH comment='Test Data'
AND read_repair_chance = 1.0;
I also added event_time as a constraint, in response to Dain's answer:
presto:default> select count(*) from cassandra.stats.events where month = '2015-04' and day = timestamp '2015-04-02 00:00:00+0000' and event_time = timestamp '2015-04-02 00:00:34+0000';
_col0
-------
1
(1 row)
Query 20150301_071417_00009_cxzfb, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:07 [147K rows, 144KB] [21.3K rows/s, 20.8KB/s]
The Presto engine will push down simple WHERE clauses like this to a connector (you can see this in the Hive connector), so the question is: why does the Cassandra connector not take advantage of this? To see why, we'll have to look at the code.
The pushdown system first interacts with connectors in the ConnectorSplitManager.getPartitions(ConnectorTableHandle, TupleDomain) method, so looking at the CassandraSplitManager, I see it is delegating the logic to getPartitionKeysSet. This method looks for a range constraint (e.g., x=33 or x BETWEEN 1 AND 10) for every column in the primary key, so in your case, you would need to add a constraint on event_time.
I don't know why the code insists on having a constraint on every column in the primary key, but I'd guess that it is a bug. It should be easy to tweak this code to remove that constraint.