Azure Data Factory syntax for scheduled slice activity - azure

I'm using Azure Data Factory (V2) to schedule a copy pipeline activity. The requirement is that the job runs every day and selects everything from a table for the last 5 days. I have scheduled the copy and tried the following syntax in the source dataset:
select * from [dbo].[aTable] where [aDate] >= '@{formatDateTime(adddays(pipeline().parameters.windowStart, 'yyyy-MM-dd HH:mm' ),-5)}'
But this doesn't work. I'm getting an error stating that adddays expects an int for its second parameter but is receiving a string.
Can anybody advise on the proper way to nest this?
Thanks

I can't test this right now, so I'll risk a possible answer just by looking at your query. The format string belongs to formatDateTime and the day offset belongs to adddays, so I think it should be like this:
select * from [dbo].[aTable] where [aDate] >= '@{formatDateTime(adddays(pipeline().parameters.windowStart, -5), 'yyyy-MM-dd HH:mm')}'
Hope this helps!
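To sanity-check the nesting, the value the corrected expression should produce can be reproduced outside Data Factory. This is only a minimal Python sketch that mimics adddays followed by formatDateTime; it is not Data Factory code. The function name lookback_start is hypothetical, and it assumes windowStart arrives as an ISO 8601 string:

```python
from datetime import datetime, timedelta


def lookback_start(window_start: str, days: int = 5) -> str:
    """Mimic formatDateTime(adddays(windowStart, -days), 'yyyy-MM-dd HH:mm')."""
    # Inner call first: shift the timestamp by a whole number of days.
    shifted = datetime.fromisoformat(window_start) - timedelta(days=days)
    # Outer call second: turn the shifted timestamp into the string for the query.
    return shifted.strftime("%Y-%m-%d %H:%M")


print(lookback_start("2018-03-10T00:00:00"))  # 2018-03-05 00:00
```

The order mirrors the fix: the integer offset goes to the inner date arithmetic, the format string goes to the outer formatting call.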

Related

Cassandra - SyntaxException: line ... no viable alternative at input in select statement

I am trying to select data from a Cassandra database using the query below, but it fails:
SELECT id from keyspace.table where code=123 and toTimestamp(now()) >= some_date;
Error: SyntaxException: line 1:103 no viable alternative at input '(' (...table where code=123 and [toTimestamp](...)
Looks like toTimestamp(now()) is causing the issue.
Can someone please explain what the issue is and how to solve it?
Thanks.
You can't use functions in the WHERE clause. The only workaround is to get the current time inside your application and pass it to the query. This request is tracked as CASSANDRA-8488.
But in reality, your query should have condition on some column, not on the calculated value.
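The workaround can be sketched as follows: compute the timestamp in the application and bind it as a parameter. This is a minimal Python sketch under assumptions. The helper build_query is hypothetical, the %s placeholders follow the style the DataStax Python driver uses for non-prepared statements, and the table and column names are taken from the question. Whether a range condition on some_date is accepted still depends on the table's primary key.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def build_query(code: int, now: Optional[datetime] = None) -> Tuple[str, tuple]:
    """Return a CQL statement and its bound parameters (hypothetical helper)."""
    # Compute "now" in the application instead of calling toTimestamp(now()) in WHERE.
    if now is None:
        now = datetime.now(timezone.utc)
    # Same comparison as in the question, written with the column on the left.
    cql = "SELECT id FROM keyspace.table WHERE code = %s AND some_date <= %s"
    return cql, (code, now)


cql, params = build_query(123)
print(cql)
print(params)
```

With a driver session, the pair would be passed along the lines of session.execute(cql, params).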

Stream Analytics UDF works in Test but not in Job

I need to parse JSON data in Stream Analytics.
Below is the sample I am using:
SELECT
UDF.parseData(GetRecordPropertyValue(GetArrayElement(A.message,0), 'raw')).intent as 'rawData'
FROM
AppInsightMessages A
I am able to parse the intent from the field when testing. This is required for custom logging.
However, it does not work in the Stream Analytics job.
I am getting an error like:
Stream Analytics job has validation errors: Query compilation error: Expression is not supported: 'udf . parseData
I also tried CASTing the string to a record, with no luck.
What am I doing wrong?
Thanks in advance.
Usually, this is due to trying to merge multiple stages into a single expression.
Please try splitting the processing to several steps:
With UDFStep AS (
SELECT
UDF.parseData(GetRecordPropertyValue(GetArrayElement(A.message,0), 'raw')) AS parsed
FROM
AppInsightMessages A
)
SELECT parsed.intent AS rawData
FROM UDFStep
BTW, you don't need to quote the 'rawData'.

Data factory SAP BW

I'm trying to use the Data Factory in Azure to export data from a SAP BW. The connection is working, and I'm able to get data. The problem is how I'm getting the data. The picture describes the issue pretty well.
Has anyone encountered something similar? Any tips on how to approach this issue? Any help is greatly appreciated!
Query like:
SELECT
[Measures].<<Measure>> ON COLUMNS,
NON EMPTY
{<<Dimension>>.MEMBERS,
<<Dimension>>.MEMBERS} ON ROWS
FROM <<Cube>>
Picture:
https://i.stack.imgur.com/9Gxfh.png
Best regards,
This is what your query should look like:
select Measures.Value on columns,
nonempty
(
DimPlan.Plan.Plan,
DimCategory.Category.Category,
DimProduct.Product.Product
)
on rows
from YourCube
Looks like you are getting the ALL members of each hierarchy coming through into the results.
Very similar to MoazRub's answer, but without needing the NonEmpty function: you can simply cross-join the hierarchies via the * operator:
SELECT
Measures.Value ON 0,
DimPlan.Plan.Plan.MEMBERS *
DimCategory.Category.Category.MEMBERS *
DimProduct.Product.Product.MEMBERS
ON 1
FROM YourCube;

USQL nested query performance

I have a USQL query that runs fine on its own against 400M records in a managed table.
But during development, I don't want to run it against all records all the time, so I pop a where clause in, run it for a tiny subsection of data, and it completes in around 2 minutes (at 5 AUs), writing out results to a tsv in my data lake.
Happy with that.
However, I now want to use it as the source for a second query and further processing.
So I create a view with the original USQL (minus the where clause).
Then to test, a new script :
'Select * from MyView WHERE <my original test filter>'.
Now I was expecting that to execute in around the same time as the original raw query. But instead I got to 4 minutes, only 10% through the plan, and cancelled - something is not right.
No expert at reading Job Graphs, but ...
The original script kicks off with 2 x 'Extract Combine partition', both reading a couple of hundred MB, whereas my select on the saved view is reading over 100 GB!
So it is not taking the where clause into account at all at this stage.
Obviously this shows how little I understand yet about how DLA works behind the scenes!
Would someone please help me understand (a) what is going on and (b) a path forward to get the behavior I need?
I'm currently having a play with stored procedures to store the first result in a table and then run the second query against that, but it just seems overkill compared with 'traditional' SQL Server.
All pointers & hints appreciated !
Many Thanks
Original Base Query:
CREATE VIEW IF NOT EXISTS Play.[M3_CycleStartPoints]
AS
//@BASE =
SELECT ROW_NUMBER() OVER (PARTITION BY A.[CTNNumber] ORDER BY A.[SeqNo]) AS [CTNCycleNo], A.[CTNNumber], A.[SeqNo], A.[BizstepDescription], A.[ContainerStatus], A.[FillStatus]
FROM
[Play].[RawData] AS A
LEFT OUTER JOIN
(
SELECT [CTNNumber],[SeqNo]+1 AS [SeqNo],[FillStatus],[ContainerStatus],[BizstepDescription]
FROM [Play].[RawData]
WHERE [FillStatus] == "EMPTY" AND [AssetUsage] == "CYLINDER"
) AS B
ON A.[CTNNumber] == B.[CTNNumber] AND A.[SeqNo] == B.[SeqNo]
WHERE (
(A.[FillStatus] == "FULL" AND
A.[AssetUsage] == "CYLINDER" AND
B.[CTNNumber] == A.[CTNNumber]
) OR (
A.[SeqNo] == 1
)
);
//AND A.[CTNNumber] == "BE52XH7";
//Only used to test when running script as stand-alone & output to tsv
Second Query
SELECT *
FROM [Play].[M3_CycleStartPoints]
WHERE [CTNNumber] == "BE52XH7";
Ok, I think I've got this, or at least in part: table-valued functions
http://www.sqlservercentral.com/articles/U-SQL/146839/
allow passing an argument to a view and returning the result.
I'd still be interested in finding some reading material on this subject, though.
Coming from a T-SQL world, it seems there are some fundamental differences I'm still tripping over.

Alternative to over(partition...) function, because it is not supported

The solution to this question might be simple, but I can't translate other posts about this topic into my own script.
I'm looking for a query that selects the latest delivery time for each consignment number. A consignment can have more than one delivery time because it can contain more than one parcel.
I came up with this query, and it works fine when I'm using SQL server.
select
DELIVERYTIME
from (
select
h_parcel.CONSIGNMENT, S_PARCEL.DELIVERYTIME,
(row_number() over(partition by h_parcel.CONSIGNMENT order by S_PARCEL.DELIVERYTIME desc)) as rn
from
S_PARCEL
inner join
h_parcel on h_parcel.h_parcel = s_parcel.h_parcel) as t
where
t.rn = 1
This code is used to fill a column in an ETL process, which is built in Visual Studio. Visual Studio does not support over(partition by ...), so the query has to be rewritten without the partition function. Can someone please help me?
Thanks.
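One possible direction, not confirmed in the thread: since only the latest delivery time per consignment is needed, a plain GROUP BY with MAX returns the same values as the rn = 1 filter without any window function. Below is a minimal sketch run against an in-memory SQLite database from Python. The table and column names come from the question; the sample rows and column types are made up for illustration, and the ETL tool's own SQL dialect may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: consignment C1 has two parcels, C2 has one.
conn.executescript("""
    CREATE TABLE h_parcel (h_parcel INTEGER PRIMARY KEY, CONSIGNMENT TEXT);
    CREATE TABLE s_parcel (h_parcel INTEGER, DELIVERYTIME TEXT);
    INSERT INTO h_parcel VALUES (1, 'C1'), (2, 'C1'), (3, 'C2');
    INSERT INTO s_parcel VALUES
        (1, '2020-01-01 10:00'), (2, '2020-01-02 09:00'), (3, '2020-01-01 08:00');
""")

# MAX per group replaces row_number() ... desc with rn = 1.
query = """
    SELECT h.CONSIGNMENT, MAX(s.DELIVERYTIME) AS DELIVERYTIME
    FROM s_parcel AS s
    INNER JOIN h_parcel AS h ON h.h_parcel = s.h_parcel
    GROUP BY h.CONSIGNMENT
    ORDER BY h.CONSIGNMENT
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('C1', '2020-01-02 09:00'), ('C2', '2020-01-01 08:00')]
```

This only covers the case where the maximum value itself is wanted. If other columns from the top-ranked row were needed, the aggregated result would have to be joined back to the parcel tables.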

Resources