JPQL Query average per X minutes

I'm trying to build a JPQL query that gives me the average value per interval of X minutes, given a start date and an end date.
Is there a way of doing this in JPQL, or do I have to calculate it in Java?
I don't want to write a native SQL query.
This is the pseudo-query to show what I'm trying to accomplish:
SELECT 'timestamp_x', avg(d.value) FROM Data d WHERE d.startDate > :startDate AND d.endDate < :endDate 'SOMETHING to group per X minutes';

No, there is no way to do such a grouping in JPQL, because it lacks the supporting constructs. There is no way to extract the minutes from a date, you cannot do any kind of arithmetic on dates, and the date functions are limited to CURRENT_DATE, CURRENT_TIMESTAMP and CURRENT_TIME. You will have to fetch the rows and do the grouping in Java.
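A minimal client-side sketch of that grouping, assuming a hypothetical Data entity with timestamp and value fields (the entity, field names, and query text are illustrative, not from the original question): fetch the rows in the range with JPQL, then bucket and average per X minutes in Java.
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.persistence.EntityManager;

public class MinuteBucketAverage {
    // Average Data.value per X-minute bucket between start (inclusive) and end (exclusive).
    public static Map<Long, Double> average(EntityManager em, Date start, Date end, int minutes) {
        List<Data> rows = em.createQuery(
                "SELECT d FROM Data d WHERE d.timestamp >= :start AND d.timestamp < :end",
                Data.class)
            .setParameter("start", start)
            .setParameter("end", end)
            .getResultList();
        long bucketMillis = minutes * 60_000L;
        // Integer-divide each timestamp by the bucket size to get its bucket index,
        // then average the values that fall into the same bucket.
        return rows.stream().collect(Collectors.groupingBy(
                d -> d.getTimestamp().getTime() / bucketMillis,
                Collectors.averagingDouble(Data::getValue)));
    }
}
A bucket index can be turned back into a timestamp with new Date(bucket * bucketMillis).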

Related

Hive: WHERE filter by date for x days back (string format)?

Our DBAs set up our Hive table with the date column as the partition column, but as a string in YYYYMMDD format.
How can I WHERE-filter this "date" column for something like the last 30 days?
Use date_format to format the system date minus 30 days into YYYYMMDD and then compare it with your partition column. Note: use the partition column as-is so Hive can choose the correct partitions.
When you want to pick the data of exactly the 30th day back:
select *
from mytable
where partition_col = date_format( current_date() - interval '30' days, 'yyyyMMdd')
If you want all the data from the last 30 days:
select *
from mytable
where cast(partition_col as INT) >= cast(date_format( current_date() - interval '30' days, 'yyyyMMdd') as INT)
Casting shouldn't impact the partition benefits, but you should check the performance before using it. Please get back to us in that scenario.

How to determine time stamps for Cassandra queries

One of the values inserted into the table is the current time, which I compute using toTimestamp(now()). Now I want to compute the current time minus 90 days, and the current time minus 15 days.
My question is: how do I compute the current time minus the nth day?
Query for the current timestamp:
INSERT INTO TABLE_NAME (col_1, col_2, col_3) VALUES ('val_1', toTimestamp(now()), val_3);
In the above query, the second column (col_2) receives the current timestamp, which is determined by
toTimestamp(now())
How do I compute the current time minus 90 days, or the current time minus 2 weeks?
This functionality is not built into CQL.
If you are able to use UDFs, you can do the following (building on the example given in "How to get Last 6 Month data comparing with timestamp column using cassandra query?"):
Enable UDFs if needed by adding this line to cassandra.yaml, or changing it to true:
enable_user_defined_functions: true
Then add two user-defined functions like this:
CREATE FUNCTION dateadd(date timestamp, daydiff int)
CALLED ON NULL INPUT
RETURNS timestamp
LANGUAGE java
AS $$
    java.util.Calendar c = java.util.Calendar.getInstance();
    c.setTime(date);
    c.add(java.util.Calendar.DATE, daydiff);
    return c.getTime();
$$;
CREATE FUNCTION weekadd(date timestamp, weekdiff int)
CALLED ON NULL INPUT
RETURNS timestamp
LANGUAGE java
AS $$
    java.util.Calendar c = java.util.Calendar.getInstance();
    c.setTime(date);
    c.add(java.util.Calendar.DATE, weekdiff * 7);
    return c.getTime();
$$;
Select the data from your table like this:
select dateadd(col_2,-90) from TABLE_NAME;
select weekadd(col_2,-2) from TABLE_NAME;
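If UDFs are not an option, the offset can also be computed client-side and bound into the query as a plain timestamp; a minimal Java sketch of that alternative (not from the original answer):
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

// Compute "now minus 90 days" and "now minus 2 weeks" on the client.
// Instant supports ChronoUnit.DAYS (treated as exact 24-hour days), so
// 2 weeks is expressed as 14 days; bind the resulting Date as the
// timestamp value in your INSERT, or as a bound parameter in a SELECT.
Date ninetyDaysAgo = Date.from(Instant.now().minus(90, ChronoUnit.DAYS));
Date twoWeeksAgo = Date.from(Instant.now().minus(14, ChronoUnit.DAYS));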

What's the alternative for the Redshift EXTRACT function?

What is the alternative for EXTRACT in Azure Data Warehouse? We are using DATEPART right now, but it does not work with FROM, so what would be a straightforward alternative for EXTRACT?
Yes, the equivalent of Redshift's EXTRACT is DATEPART, as listed in the supported T-SQL functions of Azure DWH:
DATEPART ( datepart , date )
e.g. the Redshift query
select salesid, extract(week from saletime) as weeknum
from sales
where pricepaid > 9999
order by 2;
has the T-SQL equivalent:
select salesid, DATEPART(ww, saletime) as weeknum
from sales
where pricepaid > 9999
order by 2;
i.e. DATEPART does not use FROM; instead it is a function with two parameters, the first being the part and the second the date/time to which to apply the function.

How to refer to a single value in a Calculated Column in a measure (DAX)

I have a table (Data_all) with a calculated column, daycount_ytd.
[Date] is in date format.
[Fiscal Year] is just the year, e.g. 2016.
Calculated Column
daycount_ytd=DATEDIFF("01/01/"&[Fiscal Year],Data_all[Date],day)+1
I'm trying to create a measure that refers to this calculated column.
Measure:
Amt_X Yield %:=[Amt X]/([Amt Y]/365* (Data_all[DayCount_YTD]))
I get the error that Data_all[DayCount_YTD] refers to a list of values.
How do I filter the expression to get a single value without using an aggregation function (e.g. SUM, MEDIAN)?
Or perhaps, is there another way to achieve the same calculation?
You've arrived at a fundamental concept in DAX, and once you've worked out how to deal with it, the solution generalises to loads of scenarios.
Basically, you can't just pass columns into a DAX measure without wrapping them in something else: generally some kind of mathematical operation, or VALUES(), depending on exactly what you are trying to do.
This measure will work OK if you use it in a PIVOT with the date as a row label:
=
SUM ( data_all[Amt X] )
/ (
SUM ( data_all[Amt Y] ) / 365
* MAX ( data_all[daycount_ytd] )
)
However, you will see that it gives an incorrect grand total, because it uses the latest daycount_ytd for the entire period. What you need is a version that iterates over the rows and then SUMs or AVERAGEs the per-item results. There is a whole class of DAX functions dedicated to this, such as SUMX, AVERAGEX etc. You can read more about them here.
It's not totally clear to me what the maths behind your 'total' should be, but the following measure calculates the value for each day and sums them together:
=
SUMX(
VALUES(data_all[date]),
SUM(data_all[Amt X]) /
(SUM(data_all[Amt Y]) / 365 * MAX(data_all[daycount_ytd]))
)

Fetching timeseries/range data in Cassandra

I am new to Cassandra and trying to see if it fits my data-query needs. I am populating test data in a table and fetching it using a CQL client in Golang.
I am storing time series data in Cassandra, sorted by timestamp. I store data on a per-minute basis.
Schema is like this:
parent: string
child: string
bytes: int
val2: int
timestamp: date/time
I need to answer queries where a timestamp range and a child name are given. The result needs to be the bytes value in that time range (a single value, not a series). I made the primary key (child, timestamp). I followed this approach rather than the column-family, comparator-type approach with timeuuid, since that is not supported in CQL.
Since the value stored at every timestamp (every minute) is an accumulated value, when I get a range query for time t1 to t2, I need to find the bytes value at t2 and the bytes value at t1, and subtract the two before returning. This works fine if t1 and t2 actually have entries in the table. If they do not, I need to find the times between (t1, t2) that do have data and return the difference.
One approach I can think of is to "select * from tablename WHERE timestamp <= t2 AND timestamp >= t1;" and then find the difference between the first and last entries in the returned rows. Is this the best way to do it? Since MIN and MAX queries are not supported, is there a way to find the maximum timestamp in the table less than a given value? Thanks for your time.
Are you storing each entry as a new row with a different partition key (the first column in the primary key)? If so, select * from x where f < a and f > b is a cluster-wide query, which will cause you problems. Consider adding a "fake" partition key, or use a partition key per date / week / month etc., so that your queries hit a single partition; a sketch of that idea follows.
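A minimal Java sketch of the per-day bucketing idea (the yyyyMMdd bucket format and the eventTimeMillis variable are assumptions; any per-day/week/month value works, as long as reads and writes derive it the same way):
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Derive a per-day bucket string from the event timestamp and use it as the
// partition key, with the timestamp as the clustering column. A range query
// within one day then hits a single partition; a multi-day range becomes one
// query per day bucket, issued from the client.
DateTimeFormatter DAY_BUCKET =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);
long eventTimeMillis = System.currentTimeMillis(); // hypothetical event time
String bucket = DAY_BUCKET.format(Instant.ofEpochMilli(eventTimeMillis));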
Also, your queries in Cassandra are >= and <= even if you specify > and <. If you need strictly greater-than or less-than, you'll need to filter client-side.
