azure datafactory dataflow variable alternativ - azure

I need your help to solve a technical problem in ADF Dataflow.
data
my_dataflow
I need to merge rows where the dates cross or touch each other. The problem is that I can't manipulate a variable to increment in dataflow to get the following result (in yellow).
This result would allow me to group the rows together.
I don't know how implement it into dataflow
The logic will be :
CASE(nrow == 1, my_variable = 1,
endDate > startDate-1 || endDate == startDate-1,my_variable,
my_variable + 1
)
Do you have a solution to work around this problem?

Related

Timeseries differencing - ArangoDB (AQL or Python)

I have a collection which holds documents, with each document having a data observation and the time that the data was captured.
e.g.
{
_key:....,
"data":26,
"timecaptured":1643488638.946702
}
where timecaptured for now is a utc timestamp.
What I want to do is get the duration between consecutive observations, with SQL I could do this with LAG for example, but with ArangoDB and AQL I am struggling to see how to do this at the database. So effectively the difference in timestamps between two documents in time order. I have a lot of data and I don't really want to pull it all into pandas.
Any help really appreciated.
Although the solution provided by CodeManX works, I prefer a different one:
FOR d IN docs
SORT d.timecaptured
WINDOW { preceding: 1 } AGGREGATE s = SUM(d.timecaptured), cnt = COUNT(1)
LET timediff = cnt == 1 ? null : d.timecaptured - (s - d.timecaptured)
RETURN timediff
We simply calculate the sum of the previous and the current document, and by subtracting the current document's timecaptured we can therefore calculate the timecaptured of the previous document. So now we can easily calculate the requested difference.
I only use the COUNT to return null for the first document (which has no predecessor). If you are fine with having a difference of zero for the first document, you can simply remove it.
However, neither approach is very straight forward or obvious. I put on my TODO list to add an APPEND aggregate function that could be used in WINDOW and COLLECT operations.
The WINDOW function doesn't give you direct access to the data in the sliding window but here is a rather clever workaround:
FOR doc IN collection
SORT doc.timecaptured
WINDOW { preceding: 1 }
AGGREGATE d = UNIQUE(KEEP(doc, "_key", "timecaptured"))
LET timediff = doc.timecaptured - d[0].timecaptured
RETURN MERGE(doc, {timediff})
The UNIQUE() function is available for window aggregations and can be used to get at the desired data (previous document). Aggregating full documents might be inefficient, so a projection should do, but remember that UNIQUE() will remove duplicate values. A document _key is unique within a collection, so we can add it to the projection to make sure that UNIQUE() doesn't remove anything.
The time difference is calculated by subtracting the previous' documents timecaptured value from the current document's one. In the case of the first record, d[0] is actually equal to the current document and the difference ends up being 0, which I think is sensible. You could also write d[-1].timecaptured - d[0].timecaptured to achieve the same. d[1].timecaptured - d[0].timecaptured on the other hand will give you the inverted timestamp for the first record because d[1] is null (no previous document) and evaluates to 0.
There is one risk: UNIQUE() may alter the order of the documents. You could use a subquery to sort by timecaptured again:
LET timediff = doc.timecaptured - (
FOR dd IN d SORT dd.timecaptured LIMIT 1 RETURN dd.timecaptured
)[0]
But it's not great for performance to use a subquery. Instead, you can use the aggregation variable d to access both documents and calculate the absolute value of the subtraction so that the order doesn't matter:
LET timediff = ABS(d[-1].timecaptured - d[0].timecaptured)

How to Calculate date difference in hours in logic app

I have azure table with following properties
event_name (string)
event_date_time (dateTime)
status (string)
From logic app I need to fetch all records which satisfies following condition
current time <= event_date_time <= current_date_time + 24 hours
I am using logic Get Entities connector . I am not sure how to implement above logic .
Is there any date difference function which will return difference in hours ?
Similar solution with your another post, just use "Filter Query" like below:

How truncate time while querying documents for date comparison in Cosmos Db

I have document contains properties like this
{
"id":"1bd13f8f-b56a-48cb-9b49-7fc4d88beeac",
"name":"Sam",
"createdOnDateTime": "2018-07-23T12:47:42.6407069Z"
}
I want to query a document on basis of createdOnDateTime which is stored as string.
query e.g. -
SELECT * FROM c where c.createdOnDateTime>='2018-07-23' AND c.createdOnDateTime<='2018-07-23'
This will return all documents which are created on that day.
I am providing date value from date selector which gives only date without time so, it gives me problem while comparing date.
Is there any way to remove time from createdOnDateTime property or is there any other way to achieve this?
CosmosDB clients are storing timestamps in ISO8601 format and one of the good reasons to do so is that its lexicographical order matches the flow of time. Meaning - you can sort and compare those strings and get them ordered by time they represent.
So in this case you don't need to remove time components just modify the passed in parameters to get the result you need. If you want all entries from entire date of 2018-07-23 then you can use query:
SELECT * FROM c
WHERE c.createdOnDateTime >= '2018-07-23'
AND c.createdOnDateTime < '2018-07-24'
Please note that this query can use a RANGE index on createdOnDateTime.
Please use User Defined Function to implement your requirement, no need to update createdOnDateTime property.
UDF:
function con(date){
var myDate = new Date(date);
var month = myDate.getMonth()+1;
if(month<10){
month = "0"+month;
}
return myDate.getFullYear()+"-"+month+"-"+myDate.getDate();
}
SQL:
SELECT c.id,c.createdOnDateTime FROM c where udf.con(c.createdOnDateTime)>='2018-07-23' AND udf.con(c.createdOnDateTime)<='2018-07-23'
Output :
Hope it helps you.

SSAS How to scope a calculated member measure to a number of specific members?

I'm trying to create a calculated member measure for a subset of a group of locations. All other members should be null. I can limit the scope but then in the client (excel in this case) the measure does not present the grand total ([Group].[Group].[All]).
CREATE MEMBER CURRENTCUBE.[Measures].[Calculated Measure]
AS (
Null
),
FORMAT_STRING = "$#,##0.00;-$#,##0.00",
NON_EMPTY_BEHAVIOR = { [Measures].[Places] }
,VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Locations';
-----------------------------------------------------------------------------------------
SCOPE ({([Group].[Group].&[location 1]),
([Group].[Group].&[location 2]),
([Group].[Group].&[location 3]),
([Group].[Group].&[location 4]),
([Group].[Group].&[location 5])
}, [Measures].[Calculated Measure]);
// Location Calculations
THIS = (
[Measures].[Adjusted Dollars] - [Measures].[Adjusted Dollars by Component] + [Measures].[Adjusted OS Dollars]
);
END SCOPE;
It's as though the [Group].[Group].[All] member is outside of the scope so it won't aggregate the rest of the members. Any help would be appreciated.
Your calculation is applied after all calculations already happened. You can get around this by adding Root([Time]) to the scope, assuming your time dimension is named [Time]. And if you want to aggregate across more dimensions, you would have to add them all to the SCOPE.
In most cases when you have a calculation that you want too do before aggregation it is more easy to define the calculation e. g. in the DSV, e. g. with an expression like
CASE WHEN group_location in(1, 2, 3, 4) THEN
Adjusted_dollars - adjusted_dollars_by_comp + adjusted_os_dollars
ELSE NULL
END
and just make a standard aggregatable measure from it.
I've searched high and low for this answer. After reading the above suggestion, I came up with this:
Calculate the measure in a column in the source view table
(isnull(a,0) - isnull(b,0)) + isnull(c,0) = x
Add the unfiltered calculated column (x) to the dsv
Create a named calculation in the dsv that uses a case statement to filter the original calc measure CASE WHEN location IN ( 1,2,3)THEN xELSE NULLEND
Add the named calculation as measure
I choose to do it this way to capture the measure unfiltered first then, if another filter needs to be added or one needs to be taken off, I can do so without messing with the views again. I just add the new member to filter by to my named calculation case statement. I tried to insert the calculation directly into a named calculation in the dsv that filtered it as well but the calculation produced the incorrect results.

Query WadPerformanceCountersTable in Increments?

I am trying to query the WadPerformanceCountersTable generated by Azure Diagnostics which has a PartitionKey based on tick marks accurate up to the minute. This PartitionKey is stored as a string (which I do not have any control over).
I want to be able to query against this table to get data points for every minute, every hour, every day, etc. so I don't have to pull all of the data (I just want a sampling to approximate it). I was hoping to using the modulus operator to do this, but since the PartitionKey is stored as a string and this is an Azure Table, I am having issues.
Is there any way to do this?
Non-working example:
var query =
(from entity in ServiceContext.CreateQuery<PerformanceCountersEntity>("WADPerformanceCountersTable")
where
long.Parse(entity.PartitionKey) % interval == 0 && //bad for a variety of reasons
String.Compare(entity.PartitionKey, partitionKeyEnd, StringComparison.Ordinal) < 0 &&
String.Compare(entity.PartitionKey, partitionKeyStart, StringComparison.Ordinal) > 0
select entity)
.AsTableServiceQuery();
If you just want to get a single row based on two different time interval (now and N time back) you can use the following query which returns the single row as described here:
// 10 minutes span Partition Key
DateTime now = DateTime.UtcNow;
// Current Partition Key
string partitionKeyNow = string.Format("0{0}", now.Ticks.ToString());
DateTime tenMinutesSpan = now.AddMinutes(-10);
string partitionKeyTenMinutesBack = string.Format("0{0}", tenMinutesSpan.Ticks.ToString());
//Get single row sample created last 10 mminutes
CloudTableQuery<WadPerformanceCountersTable> cloudTableQuery =
(
from entity in ServiceContext.CreateQuery<PerformanceCountersEntity>("WADPerformanceCountersTable")
where
entity.PartitionKey.CompareTo(partitionKeyNow) < 0 &&
entity.PartitionKey.CompareTo(partitionKeyTenMinutesBack) > 0
select entity
).Take(1).AsTableServiceQuery();
The only way I can see to do this would be to create a process to keep the Azure table in sync with another version of itself. In this table, I would store the PartitionKey as a number instead of a string. Once done, I could use a method similar to what I wrote in my question to query the data.
However, this is a waste of resources, so I don't recommend it. (I'm not implementing it myself, either.)

Resources