I'm using Azure Data Flow, and I'm using a Union transformation to combine two sources, so the union contains JSON documents. Is there a way to convert these JSON documents to an array of documents?
Union contains:
{"key":1,"value":"test8"}
{"key":2,"value":"test6"}
{"key":3,"value":"test3"}
What I'm looking for is a way to get this format:
[
{
"key": 1,
"value": "test8"
},
{
"key": 2,
"value": "test6"
},
{
"key": 3,
"value": "test3"
}
]
Thanks for your help
You could use an Aggregate transformation with the collect() expression to combine all the JSON documents and pass the result to a sink with a JSON dataset. However, this will not output exactly the result you are looking for; it wraps everything in the aggregated column name, as shown below.
Aggregate:
Column1: collect(@(key=key,value=value))
Data flow Output:
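A sketch of what the sink then produces (note the wrapping Column1 property, which is why it does not match the desired format exactly):
{
    "Column1": [
        { "key": 1, "value": "test8" },
        { "key": 2, "value": "test6" },
        { "key": 3, "value": "test3" }
    ]
}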
As an alternative, you can copy the union JSON documents to storage and use a Copy Data activity to convert the JSON documents to an array of documents.
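The relevant sink setting is the JSON file pattern: choosing "Array of objects" wraps the written documents in [ ... ]. A minimal sketch of the copy activity sink block, assuming the current JSON connector's format settings:
"sink": {
    "type": "JsonSink",
    "formatSettings": {
        "type": "JsonWriteSettings",
        "filePattern": "arrayOfObjects"
    }
}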
The output is then a single array of documents, like the format shown in the question.
I am pulling a list of pipelines using the REST API in ADF. The API returns nested JSON, and I am using a ForEach iterator with a stored procedure to load the values into a SQL table.
The sample JSON looks like this:
{
    "value": [
        {
            "id": "ADF/pipelines/v_my_data_ww",
            "name": "v_my_data_ww",
            "type": "MS.DF/factories/pipelines",
            "properties": {
                "activities": [
                    {
                        "name": "Loaddata_v_my_data_ww",
                        "type": "Copy",
                        "dependsOn": [],
                        "policy": {
                            "timeout": "7.00:00:00",
                            "retry": 0,
                            "retryIntervalInSeconds": 30,
                            "secureOutput": false,
                            "secureInput": false
                        }
                    }
                ],
                "folder": {
                    "name": "myData"
                },
                "annotations": [],
                "lastPublishTime": "2021-04-01T22:09:20Z"
            },
            "etag": "iaui7881919"
        }
    ]
}
So using the ForEach iterator over @activity('Web1').output.value, I was able to pull in the main keys, like id, name, and type. However, I also need to pull in the name/type from within the properties/activities tag, and I am not sure how to do it. I was trying @item().properties.activities.name.
When I run the pipeline, I get the following error:
The expression 'item().properties.activities.name' cannot be evaluated because property 'name' cannot be selected. Array elements can only be selected using an integer index.
Any help in this regard will be immensely appreciated.
The error occurs because you are trying to access an array (activities) directly with a key. Array elements can only be accessed with an index, such as activities[0].
The following is a demonstration of how you can solve this. I got the same error when I used @item().properties.activities.name.
To demonstrate how to solve this, I have taken the given sample JSON as a pipeline parameter (js) and am passing @pipeline().parameters.js.value to the ForEach activity.
Now, I have used a Set Variable activity to show how I retrieved the name and type present in the activities property. The following dynamic content achieves this:
@item().properties.activities[0].name
So, modify your prop_name and prop_type to the following dynamic content:
@item().properties.activities[0].name
@item().properties.activities[0].type
UPDATE:
If there are multiple objects inside the activities array, then you can follow the procedure below:
Create an Execute Pipeline activity to run a pipeline called loop_activities. Create 4 parameters in the loop_activities pipeline: name, id, tp, and activities. Pass the dynamic content as shown below:
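For example, the four parameter values could be passed from the outer ForEach like this (a sketch; the exact values depend on your pipeline, but they follow from the item shape in the question):
name: @item().name
id: @item().id
tp: @item().type
activities: @item().properties.activities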
In the loop_activities pipeline, use a ForEach to iterate through the activities array. The dynamic content value for the items field is @pipeline().parameters.activities.
Inside the ForEach, you can access each required element with the following dynamic content:
For name:
@pipeline().parameters.name
For id:
@pipeline().parameters.id
For type:
@pipeline().parameters.tp
For prop_name:
@item().name
For prop_type:
@item().type
The following is the debug output of loop_activities (only for prop_name and prop_type) when I run pipeline1.
I am using Alteryx to take an Excel file and convert it to JSON. The JSON output I'm getting looks different from what I expected, and the object starts with "JSON":, which I don't want. I would also like to know which components I would use to map fields to specific JSON fields, instead of key/value pairs, if I need to later in the flow.
I have attached my sample workflow and Excel file, which are:
Excel screenshot
Alteryx test flow
JSON output I am seeing:
[
{
"JSON": "{\"email\":\"test123#test.com\",\"startdate\":\"2020-12-01\",\"isEnabled\":\"0\",\"status\":\"active\"}"
},
{
"JSON": "{\"email\":\"myemail#emails.com\",\"startdate\":\"2020-12-02\",\"isEnabled\":\"1\",\"status\":\"active\"}"
}
]
What I expected:
[{
"email": "test123#test.com",
"startdate": "2020-12-01",
"isEnabled": "0",
"status": "active"
},
{
"email": "myemail#emails.com",
"startdate": "2020-12-02",
"isEnabled": "1",
"status": "active"
}
]
Also, what component would I use if I wanted to map the structure above to another JSON structure similar to this one:
[{
    "name": "MyName",
    "accountType": "array",
    "contactDetails": {
        "email": "test123@test.com",
        "startDate": "2020-12-01"
    }
}]
Thanks
In the workflow you have built, you are essentially creating the JSON twice. The JSON Build tool creates the JSON structure, so if you then want to output it, select your output file and change the dropdown to CSV with delimiter \0 and no headers.
Alternatively, try putting an Output Tool straight after your Excel file and outputting to JSON; the Output Tool will build the JSON for you.
In answer to your second question, build the JSON for the contact details first as a field (remember to rename JSON to contactDetails), then build from there with one of the above options.
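For example, after the first JSON Build with the output field renamed, each record would carry a contactDetails field holding a JSON fragment roughly like this (a sketch, not exact Alteryx output):
contactDetails: {"email":"test123@test.com","startDate":"2020-12-01"}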
I have a JSON source document that will be uploaded to Azure Blob Storage regularly. The customer wants this input written to Azure SQL Database using Azure Data Factory. The JSON is, however, complex with many nested arrays, and so far I have not been able to find a way to flatten the document. Perhaps this is not supported/possible?
[
{
"ActivityId": 1,
"Header": {},
"Body": [{
"1stSubArray": [{
"Id": 456,
"2ndSubArray": [{
"Id": "abc",
"Descript": "text",
"3rdSubArray": [{
"Id": "def",
"morefields": "text"
},
{
"Id": "ghi",
"morefields": "sample"
}]
}]
}]
}]
}
]
I need to flatten it:
ActivityId, Id, Id, Descript, Id, morefields
1, 456, abc, text1, def, text
1, 456, abc, text2, ghi, sample
1, 456, xyz, text3, jkl, textother
1, 456, xyz, text4, mno, moretext
There could be 8+ flat records per ActivityId. Has anyone out there seen this and found a way to resolve it using Azure Data Factory Copy Data?
Azure SQL Database has some capable JSON-shredding abilities, including OPENJSON, which shreds JSON, and JSON_VALUE, which returns scalar values from JSON. Since you already have Azure SQL DB in your architecture, it makes sense to use it rather than add additional components.
So why not adopt an ELT pattern where you use Data Factory to insert the JSON into a table in Azure SQL DB and then call a stored procedure task to shred it? Some sample SQL based on your example:
DECLARE @json NVARCHAR(MAX) = '[
{
"ActivityId": 1,
"Header": {},
"Body": [
{
"1stSubArray": [
{
"Id": 456,
"2ndSubArray": [
{
"Id": "abc",
"Descript": "text",
"3rdSubArray": [
{
"Id": "def",
"morefields": "text"
},
{
"Id": "ghi",
"morefields": "sample"
}
]
},
{
"Id": "xyz",
"Descript": "text",
"3rdSubArray": [
{
"Id": "jkl",
"morefields": "textother"
},
{
"Id": "mno",
"morefields": "moretext"
}
]
}
]
}
]
}
]
}
]'
--SELECT @json j
-- INSERT INTO yourTable ( ...
SELECT
JSON_VALUE ( j.[value], '$.ActivityId' ) AS ActivityId,
JSON_VALUE ( a1.[value], '$.Id' ) AS Id1,
JSON_VALUE ( a2.[value], '$.Id' ) AS Id2,
JSON_VALUE ( a2.[value], '$.Descript' ) AS Descript,
JSON_VALUE ( a3.[value], '$.Id' ) AS Id3,
JSON_VALUE ( a3.[value], '$.morefields' ) AS morefields
FROM OPENJSON( @json ) j
CROSS APPLY OPENJSON ( j.[value], '$."Body"' ) AS m
CROSS APPLY OPENJSON ( m.[value], '$."1stSubArray"' ) AS a1
CROSS APPLY OPENJSON ( a1.[value], '$."2ndSubArray"' ) AS a2
CROSS APPLY OPENJSON ( a2.[value], '$."3rdSubArray"' ) AS a3;
As you can see, I've used CROSS APPLY to navigate multiple levels. My results:
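In a full ELT pipeline, this SELECT would live inside a stored procedure that a Data Factory Stored Procedure activity calls after the raw JSON has been landed. A minimal sketch, assuming a single-column staging table (dbo.JsonStaging and dbo.ActivityFlat are hypothetical names):
CREATE PROCEDURE dbo.ShredActivityJson
AS
BEGIN
    DECLARE @json NVARCHAR(MAX);

    -- assumes the raw document was landed as one row in a staging table
    SELECT @json = RawJson FROM dbo.JsonStaging;

    INSERT INTO dbo.ActivityFlat ( ActivityId, Id1, Id2, Descript, Id3, morefields )
    SELECT
        JSON_VALUE ( j.[value], '$.ActivityId' ),
        JSON_VALUE ( a1.[value], '$.Id' ),
        JSON_VALUE ( a2.[value], '$.Id' ),
        JSON_VALUE ( a2.[value], '$.Descript' ),
        JSON_VALUE ( a3.[value], '$.Id' ),
        JSON_VALUE ( a3.[value], '$.morefields' )
    FROM OPENJSON ( @json ) j
        CROSS APPLY OPENJSON ( j.[value], '$."Body"' ) AS m
        CROSS APPLY OPENJSON ( m.[value], '$."1stSubArray"' ) AS a1
        CROSS APPLY OPENJSON ( a1.[value], '$."2ndSubArray"' ) AS a2
        CROSS APPLY OPENJSON ( a2.[value], '$."3rdSubArray"' ) AS a3;
END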
In the past, you could follow this blog and my previous case (Losing data from Source to Sink in Copy Data) to set the Cross-apply nested JSON array option in the Blob Storage dataset. However, that option has since disappeared.
Instead, Collection Reference is now used for array-item schema mapping in the Copy activity.
But based on my test, only one array can be flattened in a schema. Multiple arrays can be referenced, returned as one row containing all of the elements in the array, but only one array can have each of its elements returned as individual rows. This is the current limitation of the jsonPath settings.
As a workaround, you can first convert the JSON file with nested objects into a CSV file using a Logic App, and then use the CSV file as input for Azure Data Factory. Please refer to this doc to understand how a Logic App can be used to convert nested objects in a JSON file to CSV. You could also make some effort on the SQL Database side, such as the stored procedure mentioned in the comment by @GregGalloway.
To summarize: unfortunately, Collection Reference only works one level down in the array structure, which was not suitable for @Emrikol. In the end, @Emrikol abandoned Data Factory and built an app to do the work.
I have a JSON file in the below format:
{
"results": [
{
"product": {
"code": "104AB001",
"names": [
{
"lang_code": "fr_CM",
"name": "BANOLE"
},
{
"lang_code": "f_CM",
"name": "BANOLE"
}
]
}
},
{
"product": {
"code": "104AB002",
"names": [
{
"lang_code": "fr_CM",
"name": "BANOLE"
},
{
"lang_code": "f_CM",
"name": "BANOLE"
}
]
}
}
]
}
I am using a Copy activity with:
"jsonNodeReference": "$.['results'][*].['product'].['names']",
"jsonPathDefinition": {
"product__code": "$.['results'][*].['product'].['code']",
"product__names__lang_code": "['lang_code']",
"product__names__name": "['name']"
}
The expected output is
product__code product__names__lang_code product__names__name
104AB001 fr_CM BANOLE
104AB001 f_CM BANOLE
104AB002 fr_CM BANOLE
104AB002 f_CM BANOLE
But I am getting the following Azure Data Factory output:
When I searched Stack Overflow and Google, I found some information suggesting it is not possible in Azure Data Factory. Below are the links:
https://social.msdn.microsoft.com/Forums/en-US/5ebcef1f-5817-434c-9426-a83e9df35965/jsonnodereference-and-jsonpathdefinition-for-multiple-child-nodes?forum=AzureDataFactory
https://medium.com/@gary.strange/flattening-json-in-azure-data-factory-2f2130794258
My question is: if it is not possible in Azure Data Factory, what could be another solution to achieve this?
Only one array can be flattened in a schema. Multiple arrays can be referenced, returned as one row containing all of the elements in the array, but only one array can have each of its elements returned as individual rows. This is the current limitation of jsonPath.
However, you can first convert the JSON file with nested objects into a CSV file using a Logic App, and then use the CSV file as input for Azure Data Factory. Please refer to the URL below to understand how a Logic App can be used to convert nested objects in a JSON file to CSV:
https://adatis.co.uk/converting-json-with-nested-arrays-into-csv-in-azure-logic-apps-by-using-array-variable/
Thanks
I'm using the Get Metadata activity in my pipelines to get all the folders, child items, and item types. But this activity gives output in JSON format, and I'm unable to store the values in a variable so that I can iterate through them. I need to store the folder metadata in a SQL table.
A sample output of the Get Metadata activity looks like this:
{
"itemName": "ParentFolder",
"itemType": "Folder",
"childItems": [
{
"name": "ChildFolder1",
"type": "Folder"
},
{
"name": "ChildFolder2",
"type": "Folder"
},
{
"name": "ChildFolder3",
"type": "Folder"
}
],
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (North Europe)",
"executionDuration": 187
}
Can someone help me store the above JSON output of the Get Metadata activity in a SQL table like below?
The easiest way to do this is to pass the Get Metadata output as a string to a stored procedure and parse it in your SQL DB using OPENJSON.
This is how to convert the output to a string:
@string(activity('Get Metadata').output)
Now you just pass that to a stored proc and then use OPENJSON to parse it.
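A minimal sketch of such a procedure, assuming hypothetical table and parameter names:
CREATE PROCEDURE dbo.SaveFolderMetadata
    @MetadataJson NVARCHAR(MAX)
AS
BEGIN
    -- one row per child item, with the parent folder name repeated
    INSERT INTO dbo.FolderMetadata ( ParentFolder, ChildName, ChildType )
    SELECT
        JSON_VALUE ( @MetadataJson, '$.itemName' ),
        JSON_VALUE ( c.[value], '$.name' ),
        JSON_VALUE ( c.[value], '$.type' )
    FROM OPENJSON ( @MetadataJson, '$.childItems' ) AS c;
END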
I have seen many others do this using an ADF ForEach; however, if you have thousands of files/folders, you will end up paying a lot for that method over time (each loop iteration counts as an activity).