Get Nested Output Dynamically in Azure Data Factory

I want to build an expression in ADF via concatenation, then evaluate the nested expression.
Basically, I have a Web activity that returns JSON output. I need to access an element of the output that can have multiple possible keys and can be nested at multiple levels. I want to use pipeline parameters to access the desired element regardless of the key or the level it resides at.
Here is a sample input:
{
    "status": "OK",
    "code": 200,
    "timestamp": "2020-11-02T15:22:59Z",
    "messages": [],
    "result": {},
    "paging": {"total_count": 1000}
}
I can grab the desired output statically like this:
@{activity('callAPI').output['paging']['total_count']}
I can also generate the above expression dynamically like this:
@{concat('activity(''callAPI'').output', pipeline().parameters.myPipelineParam)}
However, once I create the expression via concatenation, I can't figure out how to also evaluate it in the same expression.
Any ideas on how to do this, or perhaps a better method I'm not seeing?

Related

Azure Data Factory REST API return invalid JSON file with pagination

I'm building a pipeline which copies a response from an API into a file in my storage account. There is also an element of pagination; however, that works like a charm and I get all my data from all the pages.
My result is something like this:
{"data": {
"id": "Something",
"value": "Some other thing"
}}
The problem is that the copy function just appends each response to the file, thereby making it invalid JSON, which is a big problem further down the line. The final output would look like:
{"data": {
"id": "22222",
"value": "Some other thing"
}}
{"data": {
"id": "33333",
"value": "Some other thing"
}}
I have tried everything I could think of or Google my way to, but nothing changes how the data is appended to the file, and I'm stuck with an invalid JSON file :(
As a backup plan, I'll just make a loop and create a JSON file for each page. But that seems a bit janky and really slow.
Anyone got an idea or a solution for my problem?
When you copy data from a REST API to Blob storage, it is written as a set of objects by default.
Example:
sample data
{ "time": "2015-04-29T07:12:20.9100000Z", "callingimsi": "466920403025604"}
sink data
{"time":"2015-04-29T07:12:20.9100000Z","callingimsi":"466920403025604"}
{"time":"2015-04-29T07:13:21.0220000Z","callingimsi":"466922202613463"}
{"time":"2015-04-29T07:13:21.4370000Z","callingimsi":"466923101048691"}
This is invalid JSON.
To work around this, set the file pattern in the sink settings to "Array of objects"; this writes all the objects as a single JSON array, as sketched below.
Output:
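The screenshot is not reproduced here, but with "Array of objects" selected the sink file would look roughly like this for the sample rows above:
[
    {"time":"2015-04-29T07:12:20.9100000Z","callingimsi":"466920403025604"},
    {"time":"2015-04-29T07:13:21.0220000Z","callingimsi":"466922202613463"},
    {"time":"2015-04-29T07:13:21.4370000Z","callingimsi":"466923101048691"}
]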

Dynamic REST calls in Azure Synapse Pipeline

I am making a call to a REST API with Azure Synapse and the return dataset looks something like this:
{
    "links": [
        {
            "rel": "next",
            "href": "[myRESTendpoint]?limit=1000&offset=1000"
        },
        {
            "rel": "last",
            "href": "[myRESTendpoint]?limit=1000&offset=60000"
        },
        {
            "rel": "self",
            "href": "[myRESTendpoint]"
        }
    ],
    "count": 1000,
    "hasMore": true,
    "items": [
        {
            "links": [],
            "closedate": "6/16/2014",
            "id": "16917",
            "number": "62000",
            "status": "H",
            "tranid": "0062000"
        },...
    ],
    "offset": 0,
    "totalResults": 60316
}
I am familiar with making a REST call to a single endpoint that returns all the data in a single call using a Synapse pipeline, but this particular REST endpoint has a hard limit of 1000 records per call, though it does return a property named "hasMore".
Is there a way to make REST calls repeatedly in a Synapse pipeline until the "hasMore" property equals false?
The end goal of this is to sink data to either a dedicated SQL pool or into ADLS2 and transform from there.
I have tried to achieve the same scenario using Azure Data Factory, which seems more appropriate and makes it easier to achieve the goal: "The end goal of this is to sink data to either a dedicated SQL pool or into ADLS2 and transform from there".
Since you have to hit the endpoint repeatedly to fetch 1000 records at a time, you can set up pagination in the following fashion if the response header or response body contains the URL for the next page.
You won't be able to use that functionality if the next-page link or query parameter isn't included in the response headers or body.
Alternatively, you can use loop logic around the Copy activity.
Create two parameters in the REST connector:
Fill in the parameters in the REST connector's relative URL.
Using a Set Variable activity, increase the value of this variable in a loop so that the URL for the Copy activity is set dynamically on each cycle. If you want to loop or iterate, you can use the Until activity; a rough sketch of the expressions involved follows.
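For illustration only, assuming the page is fetched by a Web activity named GetPage whose JSON output exposes hasMore, and a string variable named offset holds the current offset (both names are hypothetical, not from the original answer):
Until activity expression:
@equals(activity('GetPage').output.hasMore, false)
Relative URL used by the request inside the loop:
@concat('[myRESTendpoint]?limit=1000&offset=', variables('offset'))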
Alternative:
In my experience, the REST connector's pagination is quite rigid, so I usually put the Copy activity inside a ForEach loop to have more control.
For those following the thread, I used IpsitaDash-MT's suggestion of using the ForEach loop. In the case of this API, when a call is made I get a property returned at the end of the call named "totalResults". Here are the steps I used to achieve what I was looking to do:
Make a dummy call to the API to get the "totalResults" property. This is just a call to return the number of results I am looking to get. In the case of this API, the body of the request is a SQL statement, so when the dummy request is made I only ask for the IDs of the results I am looking to get.
SQL statement example
I then take the "totalResults" property from that request and set a dynamic value in the "Items" field of the ForEach loop like this:
@range(0,add(div(sub(int(activity('Get Pages Customers').output.totalResults),mod(int(activity('Get Pages Customers').output.totalResults),1000)),1000),1))
NOTE: The API only allows pages of 1000 results, so I do some math to get a range of page numbers. I also have to add 1 to the final result to include the last page.
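Worked through with the totalResults value of 60316 from the sample response above: mod(60316, 1000) = 316, subtracting that gives 60000, dividing by 1000 gives 60, and adding 1 gives 61, so the expression evaluates to range(0, 61), i.e. page indices 0 through 60, enough to cover all 60,316 records at 1000 per page.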
ForEach Loop Settings
In the API I have two parameters that can be passed, "limit" and "offset". Since I want all of the data, there is no reason to set limit to anything other than 1000 (the maximum allowable number). The offset parameter can be set to any number greater than or equal to 0 and less than or equal to "totalResults" - "limit". So I use the range established in step 2 and multiply it by 1000 to set the offset parameter in the URL.
Setting the offset parameter in the copy data activity
Dynamic value of the Relative URL in the REST connector
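The screenshot is not reproduced here, but the dynamic value is presumably along these lines, with the limit fixed at 1000 and the offset derived from the current ForEach item (the endpoint placeholder matches the sample above):
[myRESTendpoint]?limit=1000&offset=@{mul(item(), 1000)}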
NOTE: I found it better to sink the data as JSON into ADLS2 first rather than into a dedicated SQL pool due to the Lookup feature.
Since Synapse does not allow nested ForEach loops, I run the data through a data flow to format it and check for duplicates and updates.
When the data flow completes, it kicks off a Lookup activity to get the data that was just processed and passes it into a new pipeline, which uses another ForEach loop to get the child data for each parent ID.
Data Flow and Lookup for child data pipeline

How to use values passed in HTTP Request in Logic Apps / Assign Values to Logic App Parameters Dynamically

I am trying to make a generic Logic App (LA) to do some processing on some files. I am calling the Logic App from ADF and am able to pass the correct file names. However, I am not able to use/assign the values passed to the Logic App to the parameters defined in the LA. What am I missing? Please see the screenshot.
-Thanks
Sample Execution to show the names are passed properly.
As far as I know, we can't assign PRM_FileName from the body of the request to a parameter directly, but we can use an expression to get the value of PRM_FileName.
The expression should be triggerBody()?['PRM_FileName']. You can also assign PRM_FileName to a variable (for example, var1) and then use var1 in your next actions instead of repeating the expression (shown in the screenshot below).
============================Update===========================
Below is my logic app:
I did everything you mentioned in your 3 steps, except that I put PRM_FileName in the body of the request instead of appending it to the end of the URL.
============================Update 2===========================
Please use the same schema as mine:
{
    "type": "object",
    "properties": {
        "PRM_FileName": {
            "type": "string"
        }
    }
}
And then select PRM_FileName into the variable directly (shown in the screenshot below).
The expression should be triggerBody()?['PRM_FileName'], but in your screenshot the expression is triggerOutputs()['queries']['PRM_FileName'].

Azure Resource Manager template chained functions

I am trying to remove the trailing / from the URL using ARM template functions before assigning it to the output variable value:
"webappStorageUri":{
"type": "string",
"value": "[take(reference(resourceId('Microsoft.Storage/storageAccounts', variables('webappStorageName'))).primaryEndpoints.web, length(reference(resourceId('Microsoft.Storage/storageAccounts', variables('webappStorageName'))).primaryEndpoints.web)-1]"
}
The value returned by the length function should be the second argument to the take function. This is not working; I get the following error on deployment, and I can't make anything out of the error message. Does Azure support chained function execution? Is this the right approach to remove the / from the URL?
Error message
[error]Deployment template language expression evaluation failed: 'Unable to parse language expression 'take(reference(resourceId('Microsoft.Storage/storageAccounts', variables('webappStorageName'))).primaryEndpoints.web, length(reference(resourceId('Microsoft.Storage/storageAccounts', variables('webappStorageName'))).primaryEndpoints.web)-1': expected token 'RightParenthesis' and actual 'Integer'.'. Please see https://aka.ms/arm-template-expressions for usage details.
I'm not sure what you are trying to achieve, but your function has issues with brackets, and you cannot subtract by appending -1 in a random place.
"[take(reference(variables('webappStorageName')).primaryEndpoints.web,
sub(length(reference(variables('webappStorageName')).primaryEndpoints.web), 1))]"
line breaks for readability only
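For reference, take(s, sub(length(s), 1)) returns the string minus its last character, so a primary web endpoint value ending in a trailing / comes back with that slash stripped off.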

Kudu nested field

I have questions about Kudu with nested fields.
I have JSON from Kafka like this:
{
    "ts": 32,
    "status": "success",
    "uid": "3232",
    "url": "http://some_url",
    "syncpixel": "http://some_url",
    "dfp": {
        "DFP_UABrowser": "Chrome 61",
        "DFP_UAOperatingSystem": "Windows 7 ver.7.0",
        "JavascriptDisplayData_Screen_W_x_H": "1440 x 900",
        "Native_client": true
    }
}
The dfp field contains a nested object, and I want to insert this object into Kudu through Flume.
I know that Kudu does not support nested fields, but it does support binary columns.
What do I need to do?
Should I convert the dfp field to binary format and read it back with, for example, Scala Spark?
Or should I flatten the JSON (which in many cases is not ideal, for example for streaming events such as a product purchase with product ID, name, and other attributes, or product views on a page)?
If you use Spark/Scala streaming, this will not be an issue when you have a properly set up cluster.
Read the entire JSON through Spark and use the explode function to flatten it.
This will make life easier.
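A minimal Scala Spark sketch of that flattening step, assuming the Kafka payload has been landed as JSON files; the path and output column names are illustrative, not from the original post. Note that dfp in the sample is a struct, so its fields can be pulled up with a nested-column select, while explode is the equivalent step when the nested field is an array of objects:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FlattenDfp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("flatten-dfp").getOrCreate()

    // Each input file holds JSON records shaped like the sample above.
    val df = spark.read.json("/landing/events/*.json")

    // Promote the nested dfp fields to top-level columns so the row shape
    // matches a flat Kudu table.
    val flat = df.select(
      col("ts"),
      col("status"),
      col("uid"),
      col("url"),
      col("dfp.DFP_UABrowser").alias("dfp_ua_browser"),
      col("dfp.DFP_UAOperatingSystem").alias("dfp_ua_os"),
      col("dfp.Native_client").alias("dfp_native_client")
    )

    flat.show(truncate = false)
  }
}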
