I am using the following code in Azure SQL Data Warehouse:
SELECT CAST(DATEDIFF(ms, CAST(Start AS datetime2), CAST(EndTime AS datetime2)) AS float) AS [total] -- difference to be calculated in milliseconds
FROM systable
but I am coming across this error:
"The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart."
My requirement is to have the difference in milliseconds, and if that is changed it will affect other results.
Could you please provide some help?
This happens because the DATEDIFF() function returns an integer, and an integer only allows values up to 2,147,483,647. In this case, there are more than ~2.1 billion milliseconds between the two dates, which overflows the data type. You would ideally use the DATEDIFF_BIG() function, which returns a bigint and allows values up to 9,223,372,036,854,775,807 (~9.2 quintillion), but DATEDIFF_BIG() isn't supported in SQL Data Warehouse / Azure Synapse Analytics (as of January 2020).
You can vote for the feature here: (https://feedback.azure.com/forums/307516/suggestions/14781627)
Testing DATEDIFF(), you can see that you get roughly 24 days and 20 hours of difference between dates before you run out of integers at millisecond precision. Some sample code is below.
DECLARE @startdate DATETIME2 = '01/01/2020 00:00:00.0000';
DECLARE @enddate DATETIME2 = '01/01/2020 00:00:02.0000';
-- Maximum range before the INT overflows:
-- MILLISECOND: ~24 days 20 hours
-- MICROSECOND: ~35 minutes
-- NANOSECOND: ~2 seconds
SELECT
DATEDIFF(DAY, @startdate, @enddate) [day]
, DATEDIFF(HOUR, @startdate, @enddate) [hour]
, DATEDIFF(MINUTE, @startdate, @enddate) [minute]
, DATEDIFF(SECOND, @startdate, @enddate) [second]
, DATEDIFF(MILLISECOND, @startdate, @enddate) [millisecond]
, DATEDIFF(MICROSECOND, @startdate, @enddate) [microsecond]
, DATEDIFF(NANOSECOND, @startdate, @enddate) [nanosecond]
In the interim, you could calculate the ticks since the beginning of time for each value and then take the difference. For a DATETIME2, you can calculate ticks like this:
CREATE FUNCTION dbo.DATEDIFF_TICKS(@date DATETIME2)
RETURNS BIGINT
AS
BEGIN
RETURN
(DATEDIFF(DAY, '01/01/0001', CAST(@date AS DATE)) * 864000000000.0)
+ (DATEDIFF(SECOND, '00:00', CAST(@date AS TIME(7))) * 10000000.0)
+ (DATEPART(NANOSECOND, @date) / 100.0);
END
GO
You can then run the function for each value and take the difference between the ticks.
DECLARE @startdate DATETIME2 = '01/01/2020 00:00:00.0000';
DECLARE @enddate DATETIME2 = '01/30/2020 00:00:00.0000';
SELECT
dbo.DATEDIFF_TICKS(@startdate) [start_ticks],
dbo.DATEDIFF_TICKS(@enddate) [end_ticks],
dbo.DATEDIFF_TICKS(@enddate) - dbo.DATEDIFF_TICKS(@startdate) [diff];
Here is a sample spanning 500 years of difference:
DECLARE @startdate DATETIME2 = '01/01/2000 00:00:00.0000';
DECLARE @enddate DATETIME2 = '01/01/2500 00:00:00.0000';
SELECT
dbo.DATEDIFF_TICKS(@startdate) [start_ticks],
dbo.DATEDIFF_TICKS(@enddate) [end_ticks],
dbo.DATEDIFF_TICKS(@enddate) - dbo.DATEDIFF_TICKS(@startdate) [diff];
The results:
start_ticks          end_ticks            diff
-------------------- -------------------- --------------------
630822816000000000   788608224000000000   157785408000000000
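Since one tick is 100 nanoseconds, 10,000 ticks make one millisecond, so you can convert the tick difference back into the milliseconds your original query asked for. A minimal sketch against the question's systable (dividing by 10000.0 to keep a float result):
-- 1 millisecond = 10,000 ticks
SELECT
(dbo.DATEDIFF_TICKS(CAST(EndTime AS DATETIME2))
 - dbo.DATEDIFF_TICKS(CAST(Start AS DATETIME2))) / 10000.0 AS [total]
FROM systable;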
I have a table containing the market data of 5,000 unique stocks. Each stock has 24 records a day and each record has 1,000 fields (factors). I want to pivot the table for cross-sectional analysis. You can find my script below.
I have two questions: (1) The current script is a bit complex. Is there a simpler implementation? (2) The execution takes 521 seconds. Any way to make it faster?
1. Create the table
CREATE TABLE tb
(
tradeTime DateTime,
symbol String,
factor String,
value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(tradeTime)
ORDER BY (symbol, tradeTime)
SETTINGS index_granularity = 8192
2. Insert test data
INSERT INTO tb SELECT
tradetime,
symbol,
untuple(factor)
FROM
(
SELECT
tradetime,
symbol
FROM
(
WITH toDateTime('2022-01-01 00:00:00') AS start
SELECT arrayJoin(timeSlots(start, toUInt32((22 * 23) * 3600), 3600)) AS tradetime
)
ARRAY JOIN arrayMap(x -> concat('symbol', toString(x)), range(0, 5000)) AS symbol
)
ARRAY JOIN arrayMap(x -> (concat('factor', toString(x)), toFloat64(x) + toFloat64(0.1)), range(1, 1001)) AS factor
3. Finally, send the query
SELECT
tradeTime,
sumIf(value, factor = 'factor1') AS factor1,
sumIf(value, factor = 'factor2') AS factor2,
sumIf(value, factor = 'factor3') AS factor3,
sumIf(value, factor = 'factor4') AS factor4,
-- ... so many factors to list out
sumIf(value, factor = 'factor1000') AS factor1000
FROM tb
GROUP BY tradeTime,symbol
ORDER BY tradeTime,symbol ASC
Have you considered building a materialized view that inserts into a SummingMergeTree to solve this?
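A minimal sketch of what that could look like, using the tb table from your script and showing only two of the 1,000 factor columns (the table and view names here are illustrative):
-- Target table with one column per factor; SummingMergeTree collapses rows sharing the same ORDER BY key
CREATE TABLE tb_pivot
(
    tradeTime DateTime,
    symbol String,
    factor1 Float64,
    factor2 Float64
    -- ... one column per remaining factor
)
ENGINE = SummingMergeTree
PARTITION BY toYYYYMMDD(tradeTime)
ORDER BY (symbol, tradeTime);

-- Materialized view that pivots new rows from tb into tb_pivot at insert time
CREATE MATERIALIZED VIEW tb_pivot_mv TO tb_pivot AS
SELECT
    tradeTime,
    symbol,
    sumIf(value, factor = 'factor1') AS factor1,
    sumIf(value, factor = 'factor2') AS factor2
    -- ... and so on for the remaining factors
FROM tb
GROUP BY tradeTime, symbol;
With that in place, the expensive GROUP BY runs incrementally as data arrives, and the final query becomes close to a plain SELECT from tb_pivot (you may still need a final aggregation, since SummingMergeTree merges rows in the background).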
I am using ScyllaDB open-source version 4.4.
I am trying to figure out how to write a query which I would have done with a window function or a UNION set operator if this was a traditional relational database with a full SQL.
A simplified table schema:
CREATE TABLE mykeyspace.mytable (
name text ,
timestamp_utc_nanoseconds bigint ,
value bigint ,
PRIMARY KEY( (name),timestamp_utc_nanoseconds )
) WITH CLUSTERING ORDER BY (timestamp_utc_nanoseconds DESC);
My query needs to calculate and return 6 values, each of them is an average of "value" over one of the previous minutes.
In pseudo-code:
SELECT AVG(value) AS one_min_avg_6_min_ago FROM mykeyspace.mytable WHERE name = 'some_name' AND timestamp_utc_nanoseconds >= [*6 minutes ago*] AND timestamp_utc_nanoseconds < [*5 minutes ago*];
SELECT AVG(value) AS one_min_avg_5_min_ago FROM mykeyspace.mytable WHERE name = 'some_name' AND timestamp_utc_nanoseconds >= [*5 minutes ago*] AND timestamp_utc_nanoseconds < [*4 minutes ago*];
SELECT AVG(value) AS one_min_avg_4_min_ago FROM mykeyspace.mytable WHERE name = 'some_name' AND timestamp_utc_nanoseconds >= [*4 minutes ago*] AND timestamp_utc_nanoseconds < [*3 minutes ago*];
SELECT AVG(value) AS one_min_avg_3_min_ago FROM mykeyspace.mytable WHERE name = 'some_name' AND timestamp_utc_nanoseconds >= [*3 minutes ago*] AND timestamp_utc_nanoseconds < [*2 minutes ago*];
SELECT AVG(value) AS one_min_avg_2_min_ago FROM mykeyspace.mytable WHERE name = 'some_name' AND timestamp_utc_nanoseconds >= [*2 minutes ago*] AND timestamp_utc_nanoseconds < [*1 minute ago*];
SELECT AVG(value) AS one_min_avg_1_min_ago FROM mykeyspace.mytable WHERE name = 'some_name' AND timestamp_utc_nanoseconds >= [*1 minute ago*];
My client-side is C# .NET 5. I can easily do pretty much anything on the client side. But the latency in this case is going to be a big problem.
My question is:
How can I combine these 6 queries into one result set on the server side (not on the client app side)?
Ideally you would use a UDA (user-defined aggregate function); however, support for these is not yet complete. In the meantime, you can execute these 6 queries in parallel, which might even be preferable.
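For the parallel route, a single prepared statement is enough; the sketch below (the bind-marker names are illustrative) is bound six times with different [lower, upper) nanosecond bounds and the executions are issued concurrently from the client:
-- Prepared once, executed six times with different nanosecond bounds
SELECT AVG(value) AS one_min_avg
FROM mykeyspace.mytable
WHERE name = :name
  AND timestamp_utc_nanoseconds >= :lower
  AND timestamp_utc_nanoseconds < :upper;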
I am new to Cognos 10.2 Report Studio at the moment.
I need to declare a placeholder datetime in my SQL in order to make my union query work.
I've tested a few datetime declarations, but none of them seem to work and I keep getting "the server returned an unrecognizable query framework response".
I've tried some of the code I found on a Cognos forum, as shown below.
Code that I've tried:
1. '1970-01-01T00:00:00.000000000' as invdate
2. todate(null) as invdate
/********** This below is my code ***********/
select
'fstgld' as wso,
0 as pono,
'nosh' as shpm,
'gld' as DocType,
0 as DocNo,
'gl' as item,
trim(tffst305.dim2) as ItemGroup,
tffst305.year as fy,
tffst305.perd as period,
'fst' as slsordtype,
'finbg' as finbg,
0 as Qty,
tffst305.leac as leac,
0 as Sales,
tffst305.fdah-tffst305.fcah as Cost,
current_date as invdate -- <--- this is the part where I keep getting the error, as I need to declare a datetime here
From tffst305
WHERE
tffst305.ptyp = 1 and
tffst305.budg ='ACT' and
tffst305.company_nr = 810
union all
select
cisli310.orno as wso,
cisli310.pono as pono,
cisli310.shpm as shpm,
cisli310.tran as DocType,
cisli310.idoc as DocNo,
cisli310.item as item,
tdsls411.citg as ItemGroup,
tfgld018.year as fy,
tfgld018.vprd as period,
cisli310.sotp as slsordtype,
tccom112.cfcg as finbg,
cisli310.dqua as Qty,
'inv' as leac,
cisli310.amth(1) as Sales,
0 as Cost,
cisli305.idat as invdate -- <--- extracted from the table field
From cisli310
RIGHT OUTER JOIN cisli305 ON cisli310.tran = cisli305.tran and
cisli310.idoc = cisli305.idoc
LEFT OUTER JOIN tdsls411 ON cisli310.orno=tdsls411.orno and
cisli310.pono=tdsls411.pono
LEFT OUTER JOIN tccom112 ON cisli305.ofbp = tccom112.itbp
inner join tfgld018 on cisli310.tran = tfgld018.ttyp and cisli310.idoc =
tfgld018.docn
WHERE
cisli310.sotp in ('SSP', 'SPL', 'SWK') and cisli310.amth(1) <>0 and
cisli305.company_nr=810 and
cisli310.company_nr=810 and
tdsls411.company_nr=810 and
tfgld018.company_nr=810 and
tccom112.company_nr=810
The field of the record is a datetime datatype such as 2009-07-03 03:08:03pm
Try replacing current_date with # timestampMask ( $current_timestamp , 'yyyy-mm-dd' ) #. You can add other date or time portions as needed.
timestampMask ( string_expression1 , string_expression2 )
Returns "string_expression1", representing a timestamp with time zone, trimmed to the format specified in "string_expression2".
The format in "string_expression2" must be one of the following: 'yyyy', 'mm', 'dd', 'yyyy-mm', 'yyyymm', 'yyyy-mm-dd', 'yyyymmdd', 'yyyy-mm-dd hh:mm:ss', 'yyyy-mm-dd hh:mm:ss+hh:mm', 'yyyy-mm-dd hh:mm:ss.ff3', 'yyyy-mm-dd hh:mm:ss.ff3+hh:mm', 'yyyy-mm-ddThh:mm:ss', 'yyyy-mm-ddThh:mm:ss+hh:mm', or 'yyyy-mm-ddThh:mm:ss.ff3+hh:mm'.
The macro functions that return a string representation of a timestamp with time zone show a precision of 9 digits for the fractional part of the seconds by default. The format options allow this to be trimmed down to a precision of 3 or 0.
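For example, the first branch of the union could return a full timestamp-shaped value in place of current_date so that its datatype lines up with cisli305.idat in the second branch (a sketch; depending on the database you may need to wrap the macro in sq() so the substituted value is quoted):
# timestampMask($current_timestamp, 'yyyy-mm-dd hh:mm:ss') # as invdate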
I need to replicate the Linux command date +%s%3N in SQL Developer. I have tried the code sample below, but it returns a different value. I have also searched Google extensively with no luck.
select to_char((extract(day from (systimestamp - timestamp '1970-01-01 00:00:00')) * 86400000
+ extract(hour from (systimestamp - timestamp '1970-01-01 00:00:00')) * 3600000
+ extract(minute from (systimestamp - timestamp '1970-01-01 00:00:00')) * 60000
+ extract(second from (systimestamp - timestamp '1970-01-01 00:00:00')) * 1000) * 1000) unix_time
from dual;
The date +%s%3N command returns something like:
1475615656692870653
Whereas the above code sample returns something like:
1475594089419116
The date command returns a longer and larger number than the code sample even though it was run before the code sample. The ultimate solution would be a direct utility in Oracle if possible. If not, possibly invoking the date command within Oracle would work.
Try this one:
CREATE OR REPLACE FUNCTION GetUnixTime RETURN INTEGER IS
dsInt INTERVAL DAY(9) TO SECOND;
res NUMBER;
BEGIN
dsInt := CURRENT_TIMESTAMP - TIMESTAMP '1970-01-01 00:00:00 UTC';
res:= EXTRACT(DAY FROM dsInt)*24*60*60
+ EXTRACT(HOUR FROM dsInt)*60*60
+ EXTRACT(MINUTE FROM dsInt)*60
+ EXTRACT(SECOND FROM dsInt);
RETURN ROUND(1000*res);
END GetUnixTime;
ROUND(1000*res) returns Unix time in milliseconds. From your question it is not clear whether you want milliseconds, microseconds, or even nanoseconds, but it is straightforward to scale the result to the desired unit.
This function takes into account your local time zone and the time zone of the Unix epoch (which is always UTC).
If you don't like a function, you can of course write it as a plain query:
SELECT
ROUND((EXTRACT(DAY FROM CURRENT_TIMESTAMP - TIMESTAMP '1970-01-01 00:00:00 UTC')*24*60*60
+ EXTRACT(HOUR FROM CURRENT_TIMESTAMP - TIMESTAMP '1970-01-01 00:00:00 UTC')*60*60
+ EXTRACT(MINUTE FROM CURRENT_TIMESTAMP - TIMESTAMP '1970-01-01 00:00:00 UTC')*60
+ EXTRACT(SECOND FROM CURRENT_TIMESTAMP - TIMESTAMP '1970-01-01 00:00:00 UTC'))
* 1000) AS unix_time
FROM dual;
I ended up using OS commands via the method described here: http://www.orafaq.com/scripts/plsql/oscmd.txt. The other solutions were a step in the right direction; however, other parts of the script we were writing ran into issues, and using OS commands solved all of them. With the method from the link, I was simply able to type
l_time := oscomm('/bin/date +%s%3N');
to get the correct number.
We have a PostgreSQL connection pool used by a multithreaded application that permanently inserts records into a big table. So, let's say we have 10 database connections executing the same function, which inserts a record.
The trouble is, we end up with 10 records inserted, whereas only 2-3 records should have been inserted, if only the transactions could see each other's records (our function decides whether to insert a record based on the date of the last record found).
We cannot afford to lock the table for the duration of the function's execution.
We have tried different techniques to make the database apply our rules to new records immediately, despite the fact that they are created in parallel transactions, but we haven't succeeded yet.
So, I would be very grateful for any help or ideas!
To be more specific, here is the code:
schm.events ( evtime TIMESTAMP, ref_id INTEGER, param INTEGER, type INTEGER);
record filter rule:
BEGIN
select count(*) into nCnt
from events e
where e.ref_id = ref_id and e.param = param and e.type = type
and e.evtime between (evtime - interval '10 seconds') and (evtime + interval '10 seconds');
if nCnt = 0 then
insert into schm.events values (evtime, ref_id, param, type);
end if;
END;
UPDATE (the comment length limit is not enough, unfortunately)
I've applied the unique index solution to production. The results are quite acceptable, but the initial goal has not been achieved.
The issue is that with the unique hash I cannot control the interval between two records with sequential hash codes.
Here is the code:
CREATE TABLE schm.events_hash (
hash_code bigint NOT NULL
);
CREATE UNIQUE INDEX ui_events_hash_hash_code ON schm.events_hash
USING btree (hash_code);
--generate the hash code data by partitioning (splitting) evtime into 10-second intervals:
INSERT into schm.events_hash
select distinct ( cast( trunc( extract(epoch from evtime) / 10 ) || cast( ref_id as TEXT) || cast( type as TEXT ) || cast( param as TEXT ) as bigint) )
from schm.events;
--and then in a concurrently executed function I insert sequentially:
begin
INSERT into schm.events_hash values ( cast( trunc( extract(epoch from evtime) / 10 ) || cast( ref_id as TEXT) || cast( type as TEXT ) || cast( param as TEXT ) as bigint) );
insert into schm.events values (evtime, ref_id, param, type);
end;
In that case, if evtime lies within a hash-determined interval, only one record is inserted.
The problem is that records which fall into different intervals but are close to each other (less than 60 seconds apart) are not filtered out.
insert into schm.events values ( '2013-07-22 19:32:37', '123', '10', '20' ); --inserted, test ok, (trunc( extract(epoch from cast('2013-07-22 19:32:37' as timestamp)) / 10 ) = 137450715 )
insert into schm.events values ( '2013-07-22 19:32:39', '123', '10', '20' ); --filtered out, test ok, (trunc( extract(epoch from cast('2013-07-22 19:32:39' as timestamp)) / 10 ) = 137450715 )
insert into schm.events values ( '2013-07-22 19:32:41', '123', '10', '20' ); --inserted, test fail, (trunc( extract(epoch from cast('2013-07-22 19:32:41' as timestamp)) / 10 ) = 137450716 )
I think there must be a way to modify the hash function to achieve the initial goal, but I haven't found it yet. Maybe there are table constraint expressions that are evaluated by PostgreSQL itself, outside of the transaction?
About your only options are:
Using a unique index with a hack to collapse 20-second ranges to a single value;
Using advisory locking to control communication; or
SERIALIZABLE isolation and intentionally creating a mutual dependency between sessions. Not 100% sure this will be practical in your case.
What you really want is a dirty read, but PostgreSQL does not support dirty reads, so you're kind of stuck there.
You might land up needing a co-ordinator outside the database to manage your requirements.
Unique index
You can truncate your timestamps for the purpose of uniqueness checking, rounding them to regular boundaries so they jump in 20-second chunks. Then add them to a unique index on (chunk_time_seconds(evtime, 20), ref_id, param, type).
Only one insert will succeed and the rest will fail with an error. You can trap the error in a BEGIN ... EXCEPTION block in PL/PgSQL, or preferably just handle it in the application.
I think a reasonable definition of chunk_time_seconds might be:
CREATE OR REPLACE FUNCTION chunk_time_seconds(t timestamptz, round_seconds integer)
RETURNS bigint
AS $$
SELECT (floor(extract(epoch from t) / round_seconds) * round_seconds)::bigint;
$$ LANGUAGE sql IMMUTABLE;
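Putting it together, a sketch of the unique index and the insert-and-trap pattern described above (this assumes evtime is a timestamptz, or is cast appropriately, since expression indexes require immutable functions; the p_* parameter names are placeholders):
CREATE UNIQUE INDEX ui_events_chunk
    ON schm.events (chunk_time_seconds(evtime, 20), ref_id, param, type);
-- Inside your PL/pgSQL insert function:
BEGIN
    INSERT INTO schm.events (evtime, ref_id, param, type)
    VALUES (p_evtime, p_ref_id, p_param, p_type);
EXCEPTION WHEN unique_violation THEN
    -- another transaction already inserted a row for this 20-second chunk; do nothing
    NULL;
END;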
A starting point for advisory locking:
Advisory locks can be taken on a single bigint or a pair of 32-bit integers. Your key is bigger than that, it's three integers, so you can't directly use the simplest approach of:
IF pg_try_advisory_lock(ref_id, param) THEN
... do insert ...
END IF;
then, after 10 seconds, on the same connection (but not necessarily in the same transaction), issue pg_advisory_unlock(ref_id, param).
It won't work because you must also filter on type and there's no three-integer-argument form of pg_advisory_lock. If you can turn param and type into smallints you could:
IF pg_try_advisory_lock(ref_id, (param << 16) + type) THEN
but otherwise you're in a bit of a pickle. You could hash the values, of course, but then you run the (small) risk of incorrectly skipping an insert that should not be skipped in the case of a hash collision. There's no way to trigger a recheck because the conflicting rows aren't visible, so you can't use the usual solution of just comparing rows.
So ... if you can fit the key into 64 bits and your application can deal with the need to hold the lock for 10-20s before releasing it in the same connection, advisory locks will work for you and will be very low overhead.