I'm using the Get Metadata activity in my pipelines to get all the folders with their child items and item types. But this activity gives its output in JSON format, and I'm unable to store the values in a variable so that I can iterate through them. I need to store the folders' metadata in a SQL table.
A sample output of the Get Metadata activity looks like this:
{
  "itemName": "ParentFolder",
  "itemType": "Folder",
  "childItems": [
    {
      "name": "ChildFolder1",
      "type": "Folder"
    },
    {
      "name": "ChildFolder2",
      "type": "Folder"
    },
    {
      "name": "ChildFolder3",
      "type": "Folder"
    }
  ],
  "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (North Europe)",
  "executionDuration": 187
}
Can someone help me store the above JSON output of the Get Metadata activity in a SQL table?
The easiest way to do this is to pass the Get Metadata output as a string to a stored procedure and parse it in your SQL database using OPENJSON.
This is how to convert the output to a string:
@string(activity('Get Metadata').output)
Now you just pass that to a stored proc and then use OPENJSON to parse it.
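A minimal sketch of the SQL side, assuming a target table dbo.FolderMetadata and a procedure name dbo.usp_StoreFolderMetadata (both placeholder names):

CREATE TABLE dbo.FolderMetadata (
    ParentFolder NVARCHAR(200),
    ChildName    NVARCHAR(200),
    ChildType    NVARCHAR(50)
);
GO

CREATE PROCEDURE dbo.usp_StoreFolderMetadata
    @json NVARCHAR(MAX)
AS
BEGIN
    -- JSON_VALUE reads the scalar parent folder name;
    -- OPENJSON ... WITH shreds each childItems element into a row
    INSERT INTO dbo.FolderMetadata (ParentFolder, ChildName, ChildType)
    SELECT
        JSON_VALUE(@json, '$.itemName'),
        c.name,
        c.type
    FROM OPENJSON(@json, '$.childItems')
         WITH (name NVARCHAR(200) '$.name',
               type NVARCHAR(50)  '$.type') AS c;
END

In the pipeline, a Stored Procedure activity then passes the stringified Get Metadata output as the @json parameter.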
I have seen many others do this using an ADF ForEach; however, if you have thousands of files/folders you will end up paying a lot for this method over time (each loop iteration counts as an activity run).
Related
I want to filter the entries in my Azure Storage Table, whose structure looks like the following. I want to filter the entries based on a given Id, for example JD.98755. How can we achieve this?
{
  "items": [
    {
      "selectionId": {
        "Id": "JD.98755",
        "status": 0
      },
      "Consortium": "xxxxxx",
      "CreatedTime": "2019-09-06T09:34:07.551260+00:00",
      "RowKey": "yyyyyy",
      "PartitionKey": "zzzzzz-zzzzz-zz-zzzzz-zz",
      "Timestamp": "2019-09-06T09:41:34.660306+00:00",
      "etag": "W/\"datetime'2019-09-06T09%3A41%3A34.6603060Z'\""
    }
  ],
  "nextMarker": {}
}
I can filter on other elements, like Consortium, using the below query, but not on the Id:
az storage entity query -t test --account-name zuhdefault --filter "Consortium eq 'test'"
I tried something like the following to filter based on the given Id, but it returned no results:
az storage entity query -t test --account-name zuhdefault --filter "Id eq 'JD.98755'"
{
  "items": [],
  "nextMarker": {}
}
I do agree with @Gaurav Mantri, and I guess another approach you can use is the following.
I have reproduced this in my environment and got the expected results.
Firstly, you need to store the output of the command in a variable; I have stored the output in $x.
Then you can convert the output from JSON into objects:
$r = $x | ConvertFrom-Json
Then you can read the items' Id values from $r and filter for the items with Id JD.98755, as in the sketch below.
If you have more data, then store the first output in a variable, convert it into objects using ConvertFrom-Json, and repeat the steps above.
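Putting the steps together, a minimal PowerShell sketch (using the table and account names from the question):

# Capture the raw JSON that the CLI returns
$x = az storage entity query -t test --account-name zuhdefault

# Parse the JSON text into PowerShell objects
$r = $x | ConvertFrom-Json

# Filter the items on the nested selectionId.Id property
$r.items | Where-Object { $_.selectionId.Id -eq 'JD.98755' }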
The reason you are not getting any data back is that Azure Table Storage is a simple key/value pair store, and you are storing JSON there (in all likelihood, the SDK serialized the JSON data and stored it as a string in Table Storage).
Considering there is no key named Id, you will not be able to search for it.
If you need to store JSON documents, one option is to make use of Cosmos DB (with SQL API) instead of Table Storage. The other option would be to flatten your JSON so that you store it as key/value pairs. In this scenario, your data would look something like:
{
  "selectionId_Id": "JD.98755",
  "selectionId_status": 0,
  "Consortium": "xxxxxx",
  "CreatedTime": "2019-09-06T09:34:07.551260+00:00",
  "RowKey": "yyyyyy",
  "PartitionKey": "zzzzzz-zzzzz-zz-zzzzz-zz",
  "Timestamp": "2019-09-06T09:41:34.660306+00:00",
  "etag": "W/\"datetime'2019-09-06T09%3A41%3A34.6603060Z'\""
}
Then you should be able to filter by selectionId_Id.
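Reusing the query pattern from the question, the filter would look something like:

az storage entity query -t test --account-name zuhdefault --filter "selectionId_Id eq 'JD.98755'"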
I am pulling a list of pipelines using the REST API in ADF. The API returns nested JSON, and I am using a ForEach iterator and a stored procedure to pull the data into a SQL table.
The sample JSON looks like this:
{
  "value": [
    {
      "id": "ADF/pipelines/v_my_data_ww",
      "name": "v_my_data_ww",
      "type": "MS.DF/factories/pipelines",
      "properties": {
        "activities": [
          {
            "name": "Loaddata_v_my_data_ww",
            "type": "Copy",
            "dependsOn": [],
            "policy": {
              "timeout": "7.00:00:00",
              "retry": 0,
              "retryIntervalInSeconds": 30,
              "secureOutput": false,
              "secureInput": false
            }
          }
        ],
        "folder": {
          "name": "myData"
        },
        "annotations": [],
        "lastPublishTime": "2021-04-01T22:09:20Z"
      },
      "etag": "iaui7881919"
    }
  ]
}
So, using the ForEach iterator over @activity('Web1').output.value, I was able to pull in the main keys like id, name, and type. However, I also need to pull in the name/type from within the properties/activities node, and I am not sure how to do it. I was trying the following.
When I run the pipeline, I get the following error:
The expression 'item().properties.activities.name' cannot be evaluated because property 'name' cannot be selected. Array elements can only be selected using an integer index.
Any help in this regards will be immensely appreciated.
The error is because you are trying to access an array (activities) directly with a key. We need to access array elements with an integer index, such as activities[0].
The following is a demonstration of how you can solve this. I got the same error when I used @item().properties.activities.name.
To demonstrate how to solve this, I have taken the given sample JSON as a pipeline parameter and am passing @pipeline().parameters.js.value to the ForEach activity.
Now, I have used a Set Variable activity to show how I retrieved the name and type present in the activities property. The following dynamic content achieves this:
@item().properties.activities[0].name
So, modify your prop_name and prop_type to the following dynamic content:
@item().properties.activities[0].name
@item().properties.activities[0].type
UPDATE:
If there are multiple objects inside the activities array, then you can follow the procedure below:
Create an Execute Pipeline activity to execute a pipeline called loop_activities. Create 4 parameters in the loop_activities pipeline: name, id, tp, and activities. Pass the dynamic content as shown below:
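The Execute Pipeline activity's parameter values, filled in from the outer ForEach item, would presumably be:

name: @item().name
id: @item().id
tp: @item().type
activities: @item().properties.activities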
In the loop_activities pipeline, use a ForEach to iterate through the activities array. The dynamic content for its items field is @pipeline().parameters.activities.
Inside this ForEach, you can access each required element with the following dynamic content:
For name:
@pipeline().parameters.name
For id:
@pipeline().parameters.id
For type:
@pipeline().parameters.tp
For prop_name:
@item().name
For prop_type:
@item().type
The following is the debug output of loop_activities (only for prop_name and prop_type) when I run pipeline1; for the sample JSON above, prop_name resolves to Loaddata_v_my_data_ww and prop_type to Copy.
I'm using Azure Data Flow, and I'm using a Union to combine two sources, so the union contains JSON documents. Is there a way to convert these JSON documents into an array of documents?
Union contains:
{"key":1,"value":"test8"}
{"key":2,"value":"test6"}
{"key":3,"value":"test3"}
What I'm looking for is a way to get a format like this:
[
  {
    "key": 1,
    "value": "test8"
  },
  {
    "key": 2,
    "value": "test6"
  },
  {
    "key": 3,
    "value": "test3"
  }
]
Thanks for your help.
You could use an Aggregate transformation with the collect expression to combine all the JSON documents and pass the result to a sink with a JSON dataset. But this will not output exactly the result you are looking for; it nests the documents under the aggregated column name in the output, as shown below.
Aggregate:
Column1: collect(@(key=key,value=value))
Data flow Output:
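With the aggregate above, the sink file would look roughly like this, with the collected documents nested under the Column1 name:

{"Column1":[{"key":1,"value":"test8"},{"key":2,"value":"test6"},{"key":3,"value":"test3"}]}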
As an alternative, you can copy the unioned JSON documents to storage and then use a Copy data activity to convert the JSON documents to an array of documents.
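The sink side of that Copy activity is what produces the array shape. A sketch of the relevant sink settings, assuming a JSON sink dataset:

"sink": {
  "type": "JsonSink",
  "formatSettings": {
    "type": "JsonWriteSettings",
    "filePattern": "arrayOfObjects"
  }
}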
Output: the documents are written to the sink file as a single JSON array, matching the format requested above.
I'm trying to archive old data from Cosmos DB into Azure Tables, but I'm very new to Azure Data Factory and I'm not sure what would be a good approach. At first I thought this could be done with a Copy activity, but because the properties of my documents stored in the Cosmos DB source vary, I'm getting mapping issues. Any idea on what would be a good approach to tackle this archiving process?
Basically, the way I want to store the data is to copy the document root properties as they are, and store the nested JSON as a serialized string.
For example, if I wanted to archive these 2 documents :
[
  {
    "identifier": "1st Guid here",
    "Contact": {
      "Name": "John Doe",
      "Age": 99
    }
  },
  {
    "identifier": "2nd Guid here",
    "Distributor": {
      "Name": "Jane Doe",
      "Phone": {
        "Number": "12345",
        "IsVerified": true
      }
    }
  }
]
I'd like these documents to be stored in Azure Table like this:
identifier      | Contact                                   | Distributor
"1st Guid here" | "{ \"Name\": \"John Doe\", \"Age\": 99 }" | null
"2nd Guid here" | null                                      | "{\"Name\":\"Jane Doe\",\"Phone\":{\"Number\":\"12345\",\"IsVerified\":true}}"
Is this possible with the Copy Activity?
I tried using the Mapping tab inside the Copy activity, but when I try to run it I get an error saying that the data type of one of the nested JSON columns that is not present in the first row cannot be inferred.
Please follow my configuration in the Mapping tab.
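For reference, a sketch of what that mapping amounts to in the Copy activity JSON (the three columns mirror the table above; that mapping an object node to a string sink column serializes the nested JSON is my assumption here):

"translator": {
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "path": "$['identifier']" }, "sink": { "name": "identifier" } },
    { "source": { "path": "$['Contact']" }, "sink": { "name": "Contact" } },
    { "source": { "path": "$['Distributor']" }, "sink": { "name": "Distributor" } }
  ]
}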
Test output with your sample data matches the table above.
I have a JSON file with the below format.
{
  "results": [
    {
      "product": {
        "code": "104AB001",
        "names": [
          {
            "lang_code": "fr_CM",
            "name": "BANOLE"
          },
          {
            "lang_code": "f_CM",
            "name": "BANOLE"
          }
        ]
      }
    },
    {
      "product": {
        "code": "104AB002",
        "names": [
          {
            "lang_code": "fr_CM",
            "name": "BANOLE"
          },
          {
            "lang_code": "f_CM",
            "name": "BANOLE"
          }
        ]
      }
    }
  ]
}
I am using a Copy activity with the following settings:
"jsonNodeReference": "$.['results'][*].['product'].['names']",
"jsonPathDefinition": {
"product__code": "$.['results'][*].['product'].['code']",
"product__names__lang_code": "['lang_code']",
"product__names__name": "['name']"
}
The expected output is
product__code    product__names__lang_code    product__names__name
104AB001         fr_CM                        BANOLE
104AB001         f_CM                         BANOLE
104AB002         fr_CM                        BANOLE
104AB002         f_CM                         BANOLE
But the output I am getting from Azure Data Factory is different.
When I searched Stack Overflow and Google, I found information suggesting that it is not possible in Azure Data Factory. Below are the links:
https://social.msdn.microsoft.com/Forums/en-US/5ebcef1f-5817-434c-9426-a83e9df35965/jsonnodereference-and-jsonpathdefinition-for-multiple-child-nodes?forum=AzureDataFactory
https://medium.com/@gary.strange/flattening-json-in-azure-data-factory-2f2130794258
My question is: if it is not possible in Azure Data Factory, what could be another solution to achieve this?
Only one array can be flattened in a schema. Multiple arrays can be referenced and returned as one row containing all of the elements in the array; however, only one array can have each of its elements returned as individual rows. This is the current limitation with jsonPath.
However, you can first convert the JSON file with nested objects into a CSV file using a Logic App, and then use that CSV file as input for Azure Data Factory. Please refer to the URL below to understand how a Logic App can be used to convert nested objects in a JSON file to CSV:
https://adatis.co.uk/converting-json-with-nested-arrays-into-csv-in-azure-logic-apps-by-using-array-variable/
Thanks