I have a timestamp column in a Cassandra table. How do I get the day of the week from the timestamp column using CQL?
There isn't out-of-the-box support for this, but if using CQL is a must, you can have a look at user-defined functions:
http://cassandra.apache.org/doc/latest/cql/functions.html
http://www.datastax.com/dev/blog/user-defined-functions-in-cassandra-3-0
http://docs.datastax.com/en//cql/latest/cql/cql_using/useCreateUDF.html
Then you could use something as simple as:
How to determine day of week by passing specific date?
or even something like
Aggregation with Group By date in Spark SQL
That way you have a UDF that gives you the day of the week when you are working with dates.
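For example, a minimal sketch of such a UDF (the function name is mine; it assumes Java UDFs are enabled and Java 8, where a CQL timestamp argument arrives as a java.util.Date):
CREATE OR REPLACE FUNCTION day_of_week(ts timestamp)
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java AS
'
// ISO numbering: 1 = Monday ... 7 = Sunday
return ts.toInstant()
    .atZone(java.time.ZoneId.of("UTC"))
    .getDayOfWeek()
    .getValue();
';
You could then call it as SELECT day_of_week(your_timestamp_column) FROM your_table (placeholder names).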
Maybe this answer will be helpful for someone still looking for an answer in 2022.
You can create a user-defined function:
CREATE OR REPLACE FUNCTION DOW(
input_date_string varchar,
date_pattern varchar
)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java AS
'
int ret = -1;
try {
ret = java.time.LocalDate.parse(input_date_string, java.time.format.DateTimeFormatter.ofPattern(date_pattern))
.getDayOfWeek()
.getValue();
} catch (java.lang.Exception ex) {
// error, do nothing here and -1 will be returned
}
return ret;
';
Test
cqlsh:store> create table testdate(key int PRIMARY KEY , date_string varchar );
... insert some date_strings ...
INSERT INTO testdate (key , date_string ) VALUES ( 9, '2022-11-22');
...
cqlsh:store> select date_string, dow(date_string, 'yyyy-MM-dd') from testdate;
date_string | store.dow(date_string, 'yyyy-MM-dd')
-------------+--------------------------------------
50/11/2022 | -1
2022-11-23 | 3
19/11/2024 | -1
2022-11-21 | 1
19/11/2023 | -1
19/11/20249 | -1
2022-11-20 | 7
50/aa/2022 | -1
2022-11-22 | 2
19/11/2024 | -1
A similar function with a timestamp argument:
CREATE OR REPLACE FUNCTION DOW_TS(
input_date_time timestamp,
zone_id varchar
)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java AS
'
int ret = -1;
try {
ret = input_date_time.toInstant().atZone(java.time.ZoneId.of(zone_id)).toOffsetDateTime()
.getDayOfWeek()
.getValue();
} catch (java.lang.Exception ex) {
// error, do nothing here and -1 will be returned
}
return ret;
';
Test
cqlsh:store> select id, dt, dow_ts(dt, 'UTC'), dow_ts(dt,'WHAT') from testdt;
id | dt | store.dow_ts(dt, 'UTC') | store.dow_ts(dt, 'WHAT')
----+---------------------------------+-------------------------+--------------------------
1 | 2022-11-19 14:30:47.420000+0000 | 6 | -1
The functions above were tested with the following Cassandra setup:
INFO [main] 2022-11-19 12:25:47,004 CassandraDaemon.java:632 - JVM vendor/version: OpenJDK 64-Bit Server VM/11.0.17
INFO [main] 2022-11-19 12:25:50,737 StorageService.java:736 - Cassandra version: 4.0.7
INFO [main] 2022-11-19 12:25:50,738 StorageService.java:737 - CQL version: 3.4.5
References:
https://docs.datastax.com/en/dse/6.8/cql/cql/cql_using/useCreateUDF.html
https://cassandra.apache.org/_/quickstart.html
Hint: make sure to set "enable_user_defined_functions: true" in /etc/cassandra/cassandra.yaml.
With the Docker option above (https://cassandra.apache.org/_/quickstart.html), you can do a quick hack as below:
$ docker run --rm -d --name cassandra --hostname cassandra --network cassandra cassandra
$ docker cp cassandra:/etc/cassandra/cassandra.yaml .
Use your favorite editor to change "enable_user_defined_functions: false" to "enable_user_defined_functions: true" in "$(pwd)"/cassandra.yaml, then stop the first container (docker stop cassandra, since the new one reuses the same name) and start it again with the edited file mounted:
$ docker run --rm -d --name cassandra --hostname cassandra --network cassandra --mount type=bind,source="$(pwd)"/cassandra.yaml,target=/etc/cassandra/cassandra.yaml cassandra
If you have a very old Cassandra version that does not support Java 8, then maybe the alternative below would work (see https://en.wikipedia.org/wiki/Determination_of_the_day_of_the_week):
CREATE OR REPLACE FUNCTION DOW_Tomohiko_Sakamoto(
input_date_time timestamp
)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java AS
'
// the CQL timestamp argument arrives as a java.util.Date, hence the legacy accessors
int y = input_date_time.getYear() + 1900;
int m = input_date_time.getMonth() + 1;
int d = input_date_time.getDate();
// Sakamoto month-offset table
int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
if (m < 3) {
y -= 1;
}
int ret = (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7; // 0 = Sunday ... 6 = Saturday
if (ret == 0) {
ret = 7; // remap so that Monday = 1 ... Sunday = 7, matching getDayOfWeek().getValue()
}
return ret;
';
Test
cqlsh:store> insert into data(id, dt ) VALUES (2, '2022-11-19 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (3, '2022-11-21 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (4, '2022-11-23 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (5, '2022-11-24 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (7, '2022-11-25 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (8, '2022-11-26 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (9, '2022-11-27 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (10, '2022-11-28 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (11, '2020-02-29 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (12, '2020-02-30 00:00:00+0000');
cqlsh:store> insert into data(id, dt ) VALUES (13, '2020-02-31 00:00:00+0000');
cqlsh:store> select id, dt, dow_ts(dt,'UTC'), DOW_Tomohiko_Sakamoto(dt) from data;
id | dt | store.dow_ts(dt, 'UTC') | store.dow_tomohiko_sakamoto(dt)
----+---------------------------------+-------------------------+---------------------------------
5 | 2022-11-24 00:00:00.000000+0000 | 4 | 4
10 | 2022-11-28 00:00:00.000000+0000 | 1 | 1
13 | 2020-02-29 00:00:00.000000+0000 | 6 | 6
11 | 2020-02-29 00:00:00.000000+0000 | 6 | 6
1 | 2022-11-20 17:43:28.568000+0000 | 7 | 7
8 | 2022-11-26 00:00:00.000000+0000 | 6 | 6
2 | 2022-11-19 00:00:00.000000+0000 | 6 | 6
4 | 2022-11-23 00:00:00.000000+0000 | 3 | 3
7 | 2022-11-25 00:00:00.000000+0000 | 5 | 5
9 | 2022-11-27 00:00:00.000000+0000 | 7 | 7
12 | 2020-02-29 00:00:00.000000+0000 | 6 | 6
3 | 2022-11-21 00:00:00.000000+0000 | 1 | 1
Related
I have a table defined as:
CREATE TABLE downtime(
asset_code text,
down_start timestamp,
down_end timestamp,
down_duration duration,
down_type text,
down_reason text,
PRIMARY KEY ((asset_code, down_start), down_end)
);
I'd like to get downtime on a particular day, such as:
SELECT * FROM downtime \
WHERE asset_code = 'CA-PU-03-LB' \
AND todate(down_start) = '2022-12-11';
I got a syntax error:
SyntaxException: line 1:66 no viable alternative at input '(' (...where asset_code = 'CA-PU-03-LB' and [todate](...)
If a function is not allowed on a partition key in the WHERE clause, how can I get the data where "down_start" falls on a particular day?
You don't need to use the TODATE() function to filter for a specific date. You can simply specify the date as '2022-12-11' when applying a filter on a CQL timestamp column.
The difference is that you cannot use the equality operator (=), because the CQL timestamp data type is encoded as the number of milliseconds since the Unix epoch (Jan 1, 1970 00:00 GMT), so you need to be precise when working with timestamps. For example, a filter like down_start = '2022-12-11' would only match rows whose timestamp is exactly midnight UTC on that date.
Let me illustrate using this example table:
CREATE TABLE tstamps (
id int,
tstamp timestamp,
colour text,
PRIMARY KEY (id, tstamp)
)
My table contains the following sample data:
cqlsh> SELECT * FROM tstamps ;
id | tstamp | colour
----+---------------------------------+--------
1 | 2022-12-05 11:25:01.000000+0000 | red
1 | 2022-12-06 02:45:04.564000+0000 | yellow
1 | 2022-12-06 11:06:48.119000+0000 | orange
1 | 2022-12-06 19:02:52.192000+0000 | green
1 | 2022-12-07 01:48:07.870000+0000 | blue
1 | 2022-12-07 03:13:27.313000+0000 | indigo
The cqlsh client formats the tstamp column into a human-readable date in UTC. But really, the tstamp values are stored as integers:
cqlsh> SELECT tstamp, TOUNIXTIMESTAMP(tstamp) FROM tstamps ;
tstamp | system.tounixtimestamp(tstamp)
---------------------------------+--------------------------------
2022-12-05 11:25:01.000000+0000 | 1670239501000
2022-12-06 02:45:04.564000+0000 | 1670294704564
2022-12-06 11:06:48.119000+0000 | 1670324808119
2022-12-06 19:02:52.192000+0000 | 1670353372192
2022-12-07 01:48:07.870000+0000 | 1670377687870
2022-12-07 03:13:27.313000+0000 | 1670382807313
To retrieve the rows for a specific date, you need to specify the range of timestamps which fall on that date. For example, the timestamps for 6 Dec 2022 UTC range from 1670284800000 (2022-12-06 00:00:00.000 UTC) to 1670371199999 (2022-12-06 23:59:59.999 UTC).
This means if we want to query for December 6, we need to filter using a range query:
SELECT * FROM tstamps \
WHERE id = 1 \
AND tstamp >= '2022-12-06' \
AND tstamp < '2022-12-07';
and we get:
id | tstamp | colour
----+---------------------------------+--------
1 | 2022-12-06 02:45:04.564000+0000 | yellow
1 | 2022-12-06 11:06:48.119000+0000 | orange
1 | 2022-12-06 19:02:52.192000+0000 | green
WARNING - In your case, where the timestamp column is part of the partition key, performing a range query is dangerous because it results in a multi-partition query -- there are roughly 86 million possible millisecond values between 1670284800000 and 1670371199999. For this reason, timestamps are not a good choice for partition keys.
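If changing the data model is an option, a common alternative (just a sketch; the table name and the down_date bucket column are illustrative) is to partition by asset and calendar day and keep the full timestamp as a clustering column, so each day for an asset is a single partition:
CREATE TABLE downtime_by_day (
    asset_code text,
    down_date date,          -- day bucket, e.g. 2022-12-11
    down_start timestamp,
    down_end timestamp,
    down_duration duration,
    down_type text,
    down_reason text,
    PRIMARY KEY ((asset_code, down_date), down_start)
);
-- single-partition query for one asset on one day
SELECT * FROM downtime_by_day
WHERE asset_code = 'CA-PU-03-LB'
  AND down_date = '2022-12-11';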
I have the following script:
let StartTime = datetime(2022-02-18 10:10:00 AM);
let EndTime = datetime(2022-02-18 10:15:00 AM);
MachineEvents
| where Timestamp between (StartTime .. EndTime)
| where Id == "00112233" and Name == "Higher"
| top 2 by Timestamp
| project Timestamp, Value
I got the following result (a figure in the original post, showing the two most recent rows with Values 15457.083 and 15451.433).
What I am trying to achieve after that is to check whether the last Value received (in this example, 15451.433) is less than 30,000. If that condition is true, I should then check the difference between the last two consecutive values (in this case: 15451.433 - 15457.083). If the difference is < 0, the result should be true; otherwise it should be false. In other words, the query should return a boolean value instead of a double.
datatable(Timestamp:datetime, Value:double)
[
datetime(2022-02-18 10:15:00 AM), 15457.083,
datetime(2022-02-18 10:14:00 AM), 15451.433,
datetime(2022-02-18 10:13:00 AM), 15433.333,
datetime(2022-02-18 10:12:00 AM), 15411.111
]
| top 2 by Timestamp
| project Timestamp, Value
| extend nextValue=next(Value)
| extend finalResult = iff(Value < 30000, nextValue - Value < 0, false)
| top 1 by Timestamp
| project finalResult
Output:
finalResult
1
You can use the prev() function (or next()) to process the values in the other rows.
...
| extend previous = prev(Value)
| extend diff = Value - previous
| extend isPositive = diff > 0
You might need to use serialize if you don't have something like top in the query that already serializes the rows for you.
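For example, a self-contained sketch built from the question's sample values (here order by serializes the rows so that prev() is allowed; with top in the pipeline that is already the case):
datatable(Timestamp:datetime, Value:double)
[
    datetime(2022-02-18 10:15:00 AM), 15457.083,
    datetime(2022-02-18 10:14:00 AM), 15451.433
]
| order by Timestamp asc        // serializes the rows so prev() is allowed
| extend previous = prev(Value)
| extend diff = Value - previous
| extend isPositive = diff > 0
The 10:15 row then ends up with previous = 15451.433, diff = 5.65 and isPositive = true.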
I want to use the ROUNDUP formula from Excel in my Oracle procedure, but when I use it I get the error:
ROUNDUP is Invalid Identifier.
Below is my code
SELECT ROUNDUP(15/30) FROM DUAL;
Please suggest how I can do this.
You cannot; ROUNDUP is not an Oracle function (which is why you get the invalid identifier error).
You could instead use CEIL.
SELECT CEIL(15/30) FROM DUAL;
| CEIL(15/30) |
| ----------: |
| 1 |
If you want to round up to a given precision then you could create a user-defined function:
CREATE FUNCTION roundup(
value IN NUMBER,
precision IN PLS_INTEGER DEFAULT 0
) RETURN NUMBER DETERMINISTIC
IS
BEGIN
IF precision = 0 THEN
RETURN CEIL( value );
ELSE
RETURN CEIL( value * POWER( 10, precision ) ) / POWER( 10, precision );
END IF;
END;
/
Then:
SELECT ROUNDUP(0.56789),
ROUNDUP(0.56789, 1),
ROUNDUP(0.56789, 2),
ROUNDUP(0.56789, -1)
FROM DUAL;
Outputs:
ROUNDUP(0.56789) | ROUNDUP(0.56789,1) | ROUNDUP(0.56789,2) | ROUNDUP(0.56789,-1)
---------------: | -----------------: | -----------------: | ------------------:
1 | .6 | .57 | 10
Let's say my data is like below:
Acct_id | amount
--------|-------
10001   | 6.00
20000   | 5.00
32356   | 1.00
10001   | 2.00
45000   | 1.50
45000   | 10.00
My expected result should be like this:
acct_id | count
--------|------
10001   | 2
45000   | 2
How do I get it in Cassandra?
If you're using Cassandra 2.2.x or 3.x you can create a user-defined aggregate:
CREATE FUNCTION counByAccId(state map<int, int>, acctid int)
RETURNS NULL ON NULL INPUT
RETURNS map<int, int>
LANGUAGE java
AS '
if(state.containsKey(acctid)) {
Integer currentCount = (Integer)state.get(acctid);
state.put(acctid, currentCount + 1);
} else {
state.put(acctid, 1);
}
return state;
';
CREATE AGGREGATE groupByAcctIdAndCount(int)
SFUNC counByAccId
STYPE map<int, int>
INITCOND {};
SELECT groupByAcctIdAndCount(acct_id) FROM myTable WHERE partition_key = xxx;
Example data set:
select * from agg;
partition_key | acct_id | val
---------------+---------+-----
5 | 45000 | 1.5
1 | 10001 | 6
2 | 20000 | 5
4 | 10001 | 2
6 | 45000 | 10
3 | 32356 | 1
select groupByAcctIdAndCount(acct_id) FROM agg;
music.groupbyacctidandcount(acct_id)
------------------------------------------
{10001: 2, 20000: 1, 32356: 1, 45000: 2}
WARNING: be sure to read my blog about UDAs and the performance implications of scanning a full table: http://www.doanduyhai.com/blog/?p=2015
I have a field in a table that can be populated with different kinds of values.
Examples:
Row 1 - (2012,2013)
Row 2 - 8871
Row 3 - 01/04/2012
Row 4 - 'NULL'
I have to identify the rows that contain a string matching the date mask 'dd/mm/yyyy', like Row 3, so that I can apply a TO_DATE function to it.
Any idea how I can search for such a mask within the field?
Thanks a lot
Sounds like a data model problem (storing a date in a string).
But, since it happens and we sometimes can't control or change things, I usually keep a function around like this one:
CREATE OR REPLACE FUNCTION safe_to_date (p_string IN VARCHAR2,
p_format_mask IN VARCHAR2,
p_error_date IN DATE DEFAULT NULL)
RETURN DATE
DETERMINISTIC IS
x_date DATE;
BEGIN
BEGIN
x_date := TO_DATE (p_string, p_format_mask);
RETURN x_date; -- Only gets here if conversion was successful
EXCEPTION
WHEN OTHERS THEN
RETURN p_error_date;
END;
END safe_to_date;
Then use it like this:
WITH d AS
(SELECT 'X' string_field FROM DUAL
UNION ALL
SELECT '11/15/2012' FROM DUAL
UNION ALL
SELECT '155' FROM DUAL)
SELECT safe_to_date (d.string_field, 'MM/DD/YYYY')
FROM d;
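Applied to the question, you could then keep just the rows whose string really parses with the 'DD/MM/YYYY' mask (the table and column names below are placeholders):
SELECT t.*,
       safe_to_date(t.the_field, 'DD/MM/YYYY') AS parsed_date
FROM   the_table t
WHERE  safe_to_date(t.the_field, 'DD/MM/YYYY') IS NOT NULL;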
Oracle 11g R2 Schema Setup:
CREATE TABLE Test ( id, VALUE ) AS
SELECT 'Row 1', '(2012,2013)' FROM DUAL
UNION ALL SELECT 'Row 2', '8871' FROM DUAL
UNION ALL SELECT 'Row 3', '01/04/2012' FROM DUAL
UNION ALL SELECT 'Row 4', NULL FROM DUAL
UNION ALL SELECT 'Row 5', '99,99,2015' FROM DUAL
UNION ALL SELECT 'Row 6', '32/12/2015' FROM DUAL
UNION ALL SELECT 'Row 7', '29/02/2015' FROM DUAL
UNION ALL SELECT 'Row 8', '29/02/2016' FROM DUAL
/
Query 1 - You can check with a regular expression:
SELECT *
FROM TEST
WHERE REGEXP_LIKE( VALUE, '^\d{2}/\d{2}/\d{4}$' )
Results:
| ID | VALUE |
|-------|------------|
| Row 3 | 01/04/2012 |
| Row 6 | 32/12/2015 |
| Row 7 | 29/02/2015 |
| Row 8 | 29/02/2016 |
Query 2 - You can make the regular expression more complicated to catch more invalid dates:
SELECT *
FROM TEST
WHERE REGEXP_LIKE( VALUE, '^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$' )
Results:
| ID | VALUE |
|-------|------------|
| Row 3 | 01/04/2012 |
| Row 7 | 29/02/2015 |
| Row 8 | 29/02/2016 |
Query 3 - But the best way is to try and convert the value to a date and see if there is an exception:
CREATE OR REPLACE FUNCTION is_Valid_Date(
datestr VARCHAR2,
format VARCHAR2 DEFAULT 'DD/MM/YYYY'
) RETURN NUMBER DETERMINISTIC
AS
x DATE;
BEGIN
IF datestr IS NULL THEN
RETURN 0;
END IF;
x := TO_DATE( datestr, format );
RETURN 1;
EXCEPTION
WHEN OTHERS THEN
RETURN 0;
END;
/
SELECT *
FROM TEST
WHERE is_Valid_Date( VALUE ) = 1
Results:
| ID | VALUE |
|-------|------------|
| Row 3 | 01/04/2012 |
| Row 8 | 29/02/2016 |
You can use the LIKE operator to match the pattern (note that, unlike the regular expressions above, this does not check that the characters are digits):
where possible_date_field like '__/__/____';