Azure Data Factory Lookup activity fails to execute CosmosDb query 'Cross partition query only supports 'VALUE <AggregateFunc>' for aggregates.'

I am trying to configure an Azure Data Factory Lookup activity to get the MAX value of a datetime field from a CosmosDb container. Unfortunately, even the simplest query does not work. The query is:
SELECT max(members.lastModifiedOn) as dt FROM members
In CosmosDb control panel we see results
[
{
"dt": "2020-09-01T07:32:03.6733333"
}
]
But in the Azure Data Factory preview we see nothing but an error:
One or more errors occurred.
Message: {"Errors":["Cross partition query only supports 'VALUE <AggregateFunc>' for aggregates."]}

The only workaround I have found is to write the Lookup activity query in this odd form:
SELECT VALUE r FROM (SELECT MAX(m.lastModifiedOn) as lastModifiedOn FROM m) as r
It seems the Azure Data Factory Lookup activity expects the query to return an array of objects.

Related

In Azure Data Factory, how do I pass the Index of a ForEach as a parameter properly

Sorry if this is a bit vague or rambly, I'm still getting to grips with Data Factory and a lot of it seems a bit obtuse...
What I want to do is query my Cosmos Database for a list of Ids of records that need to be updated. For each of these records, I want to call a REST API using the Id (i.e. /Record/{Id}/Details)
I've created a Data Flow that took a string as a parameter and then called the REST API fine.
I then made a pipeline using a Lookup with a query (select c.RecordId from c where...) and passed that into a ForEach with items set to @activity('Lookup1').output.value
I then set up the activity of the ForEach to run my Data Flow. From research, I think I'm supposed to set the parameter value to "@item().RecordId", but that gives the error "parameter [name] does not match parameter type 'string'".
I can change the type of the parameter to any (and use toString([parameter]) to cast it), and then when I debug, it passes the parameter in, but it gives the error "Job failed due to reason: at (Line 2/Col 14): Datatype any not found".
I'm not sure what the solution is. Is there a way to cast the result of the lookup to an integer or string? Is there a way to narrow an any down? Is there a better way than toString() that would work? Is there a better way than ForEach?
I tried to reproduce a scenario similar to yours.
My sample data in Cosmos DB:
The goal is to query the Cosmos database for a list of ids and call a REST API with the id for each of these records.
First, I added a Lookup activity in Data Factory and selected the ids where last_name is Bluth.
Its output and settings are as below:
Then I passed the output of the Lookup activity to a ForEach activity.
Inside the ForEach activity I created a Data Flow activity whose source is the REST API. The REST API call for a specific user is https://reqres.in/api/users/2, so I set the base URL to https://reqres.in/api/users.
Then I created a parameter called demoId with data type string, and in the relative URL I used the dynamic value @dataset().demoId
After this I set the source parameter value to @item().id, because only the id needs to follow https://reqres.in/api/users to get the data. In your case you can try Record/@{item().id}/Details.
For each id, the id is passed to the REST API successfully and the data is fetched:
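To make the per-item substitution concrete, here is a minimal Python sketch of what the ForEach plus @item().id combination does. It assumes the Lookup output has the usual shape with the rows under "value"; the sample ids and the build_urls helper are hypothetical, and no HTTP request is actually sent.

```python
# Hypothetical Lookup output: with "First row only" unchecked,
# the rows are returned under the "value" key.
lookup_output = {"count": 2, "value": [{"id": "1"}, {"id": "2"}]}

BASE_URL = "https://reqres.in/api/users"  # base URL used in the answer above


def build_urls(output, base_url=BASE_URL):
    """Return one request URL per looked-up row (ForEach + @item().id)."""
    # Each row plays the role of item(); its id becomes the relative URL.
    return [f"{base_url}/{row['id']}" for row in output["value"]]


urls = build_urls(lookup_output)
for url in urls:
    print(url)
```

For the Record/{Id}/Details case from the question, the same idea applies with the id placed in the middle of the relative URL instead of at the end.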

Azure Data Factory Pipeline - Store single-value source query output as a variable to then use in Copy Data activity

I am looking to implement an incremental table loading pipeline in ADF. I want to execute a query to get the latest timestamp from the table in an Azure SQL database. Then, store this value as a variable in ADF so I can then reference it in the "Source" query of a Copy Data activity.
The goal is to only request data from an API with a timestamp greater than the latest timestamp in the SQL table.
Is this functionality possible within ADF pipelines, or do I need to look to Azure Functions or Data Flows?
This is definitely possible with Data Factory. You could use the Lookup Activity or a Stored Procedure, but the team just released the new Script Activity:
This will return results like so:
{
"resultSetCount": 1,
"recordsAffected": 0,
"resultSets": [
{
"rowCount": 1,
"rows": [
{
"MaxDate": "2018-03-20"
}
]
...
}
Here is the expression to read this into a variable:
@activity('Script1').output.resultSets[0].rows[0].MaxDate
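The path in that expression can be traced step by step. The Python sketch below walks the same path over a dictionary that copies only the fields visible in the sample result above; the first_scalar helper is a hypothetical name used for illustration.

```python
# Output shape copied from the Script activity result shown above
# (only the fields visible there are included).
script_output = {
    "resultSetCount": 1,
    "recordsAffected": 0,
    "resultSets": [
        {"rowCount": 1, "rows": [{"MaxDate": "2018-03-20"}]},
    ],
}


def first_scalar(output, column):
    """Mirror the path output.resultSets[0].rows[0].<column>."""
    first_result_set = output["resultSets"][0]  # first statement's result
    first_row = first_result_set["rows"][0]     # first row of that result
    return first_row[column]                    # the single column we want


max_date = first_scalar(script_output, "MaxDate")
print(max_date)
```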

Azure Data Factory Error: "incorrect syntax near"

I'm trying to do a simple incremental update from an on-prem database (source) to an Azure SQL database, based on a varchar column called "RP" in the on-prem database that contains "date+staticdescription", for example: "20210314MetroFactory"
1- I've created a Lookup activity called Lookup1 using a table created in Azure SQL Database, with this query:
"Select RP from SubsetwatermarkTable"
2- I've created a Copy data activity where the source settings have this query:
"Select * from SourceDevSubsetTable WHERE RP NOT IN '@{activity('Lookup1').output.value}'"
When debugging -- I'm getting the error:
Failure type: User configuration issue
Details: Failure happened on 'Source' side.
'Type=System.Data.SqlClient.SqlException,Message=Incorrect syntax near
'[{"RP":"20210307_1Plant
1KAO"},{"RP":"20210314MetroFactory"},{"RP":"20210312MetroFactory"},{"RP":"20210312MetroFactory"},{"RP":"2'.,Source=.Net
SqlClient Data
Provider,SqlErrorNumber=102,Class=15,ErrorCode=-2146232060,State=1,Errors=[{Class=15,Number=102,State=1,Message=Incorrect
syntax near
'[{"RP":"20210311MetroFactory"},{"RP":"20210311MetroFactory"},{"RP":"202103140MetroFactory"},{"RP":"20210308MetroFactory"},{"RP":"2'.,},],'
Can anyone tell me what I am doing wrong and how to fix it even if it requires creating more activities.
Note: There is no LastModifiedDate column in the table. Also I haven't yet created the StoredProcedure that will update the Lookup table when it is done with the incremental copy.
Steve is right as to why it is failing and about the query you need in the Copy Data activity.
As he says, you want a comma-separated list of quoted values to use in your IN clause.
You can get this more easily, though, directly from your Lookup using this query:
select stuff(
(
select ','''+rp+''''
from subsetwatermarktable
for xml path('')
)
, 1, 1, ''
) as in_clause
The sub-query builds the comma-separated list with quotes around each rp value, but leaves a spurious comma at the start; the outer query uses stuff to remove it.
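If the two stages are hard to picture, the following Python sketch imitates them. The stuff function is a hand-written stand-in for T-SQL STUFF (1-based start position), and the two RP values are samples taken from the error message in the question.

```python
def stuff(text, start, length, replacement):
    """Imitate T-SQL STUFF: remove `length` characters beginning at the
    1-based position `start`, then insert `replacement` there."""
    i = start - 1
    return text[:i] + replacement + text[i + length:]


# Sample values taken from the error message in the question.
rp_values = ["20210314MetroFactory", "20210312MetroFactory"]

# Inner query: every row contributes ,'value' and the pieces are concatenated.
joined = "".join(",'" + rp + "'" for rp in rp_values)

# Outer query: stuff(..., 1, 1, '') drops the leading comma.
in_clause = stuff(joined, 1, 1, "")
print(in_clause)
```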
Now tick the First Row Only box on the Lookup and change your Copy Data source query to:
select *
from SourceDevSubsetTable
where rp not in (@{activity('lookup').output.firstRow.in_clause})
The result of @activity('Lookup1').output.value is an array, as your error shows:
[{"RP":"20210307_1Plant
1KAO"},{"RP":"20210314MetroFactory"},{"RP":"20210312MetroFactory"},{"RP":"20210312MetroFactory"},{"RP":"2'.,Source=.Net
SqlClient Data
Provider,SqlErrorNumber=102,Class=15,ErrorCode=-2146232060,State=1,Errors=[{Class=15,Number=102,State=1,Message=Incorrect
syntax near
'[{"RP":"20210311MetroFactory"},{"RP":"20210311MetroFactory"},{"RP":"202103140MetroFactory"},{"RP":"20210308MetroFactory"},{"RP":"2'.,},]
However, your SQL should look like this: Select * from SourceDevSubsetTable WHERE RP NOT IN ('20210307_1Plant 1KAO','20210314MetroFactory',...).
To achieve this in ADF, you can do something like this:
1. Create three variables, as in the following screenshot:
2. Loop over the result of @activity('Lookup1').output.value and append item().RP to arrayvalues:
expression: @activity('Lookup1').output.value
expression: @concat(variables('apostrophe'),item().RP,variables('apostrophe'))
3. Cast arrayvalues to a string and add parentheses with a Set variable activity:
expression: @concat('(',join(variables('arrayvalues'),','),')')
4. Copy to your Azure SQL database:
expression: Select * from SourceDevSubsetTable WHERE RP NOT IN @{variables('stringvalues')}
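The steps above can be sketched in Python to check the string they produce. The variable names follow the ADF variables in the steps; the sample rows come from the error message, and the doubling of embedded single quotes in quote() is my own addition, not part of the ADF expressions.

```python
# Sample Lookup rows, shaped like @activity('Lookup1').output.value.
lookup_value = [{"RP": "20210307_1Plant 1KAO"}, {"RP": "20210314MetroFactory"}]

APOSTROPHE = "'"  # plays the role of variables('apostrophe')


def quote(value):
    """Wrap a value in apostrophes, like the concat() in step 2.
    Doubling embedded quotes is an extra safeguard not in the original."""
    return APOSTROPHE + value.replace("'", "''") + APOSTROPHE


# Step 2: ForEach + Append variable builds the array of quoted values.
arrayvalues = [quote(item["RP"]) for item in lookup_value]

# Step 3: join the array with commas and add parentheses.
stringvalues = "(" + ",".join(arrayvalues) + ")"

# Step 4: the source query used by the Copy activity.
query = f"Select * from SourceDevSubsetTable WHERE RP NOT IN {stringvalues}"
print(query)
```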

Unable to get scalar value of a query on cosmos db in azure data factory

I am trying to get the count of all records in Cosmos DB in a Lookup activity in Azure Data Factory. I need this value to compare it with the outputs of other activities.
The query I used is SELECT VALUE count(1) from c
When I try to preview the data after entering this query, I get an error saying
One or more errors occurred. Unable to cast object of type
'Newtonsoft.Json.Linq.JValue' to type 'Newtonsoft.Json.Linq.JObject'
as shown in the below image:
snapshot of my azure lookup activity settings
Could someone help me resolve this error? If this is a limitation of Azure Data Factory, how else can I get the count of all the documents in the Cosmos DB container from inside Azure Data Factory?
I reproduced your issue exactly on my side.
I think the count result can't be mapped as a normal JSON object. As a workaround, you could use an Azure Function activity (inside the Azure Function you can use the SDK to execute any SQL you want) to output the desired result, {"number":10}, and then connect the Azure Function activity to the other activities in ADF.
Here is the contradiction right now:
The SQL query outputs an array of scalars, not something like a JSON object or even a JSON string.
However, the ADF Lookup activity only accepts a JObject, not a JValue. I can't use any built-in conversion function here, because the SQL query still has to be written with correct syntax. I have already submitted a ticket to the MS support team, but had no luck with this limitation.
I also tried select count(1) as num from c, which works in the Cosmos DB portal. But it still hits a limitation because the query crosses partitions.
So all I can do here is explain the root cause of the issue; I can't change the product behaviour.
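The difference between the two result shapes can be shown with a small Python model. This is only an illustration of the behaviour described above, not ADF's actual implementation; the is_object_rows helper is hypothetical, and the value 10 is borrowed from the {"number":10} example.

```python
def is_object_rows(rows):
    """True when every row is a JSON object (a dict here), which is the
    shape the Lookup activity is described as requiring."""
    return all(isinstance(row, dict) for row in rows)


# Shape produced by SELECT VALUE count(1) from c: an array of scalars.
scalar_result = [10]

# Shape the Lookup activity can work with: an array of objects.
object_result = [{"number": 10}]

print(is_object_rows(scalar_result))
print(is_object_rows(object_result))
```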
Two rough ideas:
1. Try a non-partitioned collection to execute the above SQL and produce JSON output.
2. If the count is not large, query the columns from the database and loop over the result with a ForEach activity.
You can use:
select top 1 column from c order by column desc

How to get the Source Data When Ingestion Failure in KUSTO ADX

I have a base table in ADX Kusto DB.
.create table base (info:dynamic)
I have written a function that parses the dynamic column of the base table, extracts a few columns, and stores them in another table whenever the base table receives data (from EventHub). The function and its update policy are below:
.create function extractBase()
{
base
| evaluate bag_unpack(info)
| project tostring(column1), toreal(column2), toint(column3), todynamic(column4)
}
.alter table target_table policy update
@'[{"IsEnabled": true, "Source": "base", "Query": "extractBase()", "IsTransactional": false, "PropagateIngestionProperties": true}]'
Suppose the base table does not contain an expected column and an ingestion error happens. How do I get the source row for the failure?
When I use .show ingestion failures, it displays the failure message. There is a column called IngestionSourcePath, but when I browse to the URL I get a Resource Not Found exception.
If an ingestion failure happens, I need to store the particular row of the base table in an IngestionFailure table for further investigation.
In this case, your source data cannot "not have" a column defined by its schema.
If no value was ingested for some column in some row, a null value will be present there and the update policy will not fail.
Here the update policy will break if the original table row does not contain enough columns. Currently the source data for such errors is not emitted as part of the failure message.
In general, the source URI is only useful when you are ingesting data from blobs. In other cases the URI shown in the failed ingestion info is a URI on an internal blob that was created on the fly and no one has access to.
However, there is a command that is missing from documentation (we will make sure to update it) that allows you to duplicate (dump to storage container you provide) the source data for the next failed ingestion into a specific table.
The syntax is:
.dup-next-failed-ingest into TableName to h@'Path to Azure blob container'
Here the path to Azure Blob container must include a writeable SAS.
The required permission to run this command is DB admin.
