I am doing a POC on ingesting JSON through Event Hub, processing it with a Stream Analytics job, and pushing it into an Azure SQL DW.
I have worked with JSON ingestion before, but the difficulty I face now is with the naming structure used in this JSON.
Here is the sample:
{
  "1-1": [{
    "Details": [{
      "FirstName": "Super",
      "LastName": "Man"
    }]
  }]
}
The root element name contains a hyphen (-), and I am having a tough time parsing this element to access the relevant items.
I have tried the following queries, and I get NULLs in the SQL table they output to:
--#1
SELECT
["2-1"].Details.FirstName AS First_Name
,["2-1"].Details.LastName AS Last_Name
INTO
[SA-OUTPUT]
FROM
[SA-INPUT]
--#2
SELECT
[2-1].Details.FirstName AS First_Name
,[2-1].Details.LastName AS Last_Name
INTO
[SA-OUTPUT]
FROM
[SA-INPUT]
--#3
SELECT
2-1.Details.FirstName AS First_Name
,2-1.Details.LastName AS Last_Name
INTO
[SA-OUTPUT]
FROM
[SA-INPUT]
--#4
SELECT
SA-INPUT.["2-1"].Details.FirstName AS First_Name
,SA-INPUT.["2-1"].Details.LastName AS Last_Name
INTO
[SA-OUTPUT]
FROM
[SA-INPUT]
I would appreciate guidance on the correct way to do this.
Thanks in advance.
Your JSON schema is nested but also has some arrays. In order to read the data you will need to use the GetArrayElement function.
Here's a query that will read your sample data:
WITH Step1 AS (
    SELECT GetArrayElement([1-1], 0) AS FirstLevel
    FROM iothub
),
Step2 AS (
    SELECT GetArrayElement(FirstLevel.Details, 0) AS SecondLevel
    FROM Step1
)
SELECT SecondLevel.FirstName, SecondLevel.LastName FROM Step2
For more info, you can have a look at our page Work with complex Data Types in JSON and AVRO.
Let me know if you have any questions.
Thanks,
JS (ASA team)
I tried it and it worked beautifully. If, let's say, I have to generate data from two separate array elements, I would have to create two separate CTEs.
{
  "1-1": [{
    "Details": [{
      "FirstName": "Super",
      "LastName": "Man"
    }]
  }]
},
{
  "2-1": [{
    "Address": [{
      "Street": "Main",
      "Lane": "Second"
    }]
  }]
}
How do I merge elements from two CTEs into one output query? I can only reference a CTE in the step that follows it.
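One way to combine the two CTEs, sketched under the assumption that each event carries only one of the two keys and that the columns can be aligned by position, is to UNION the two branches in the final step; the step names and the Field1/Field2 aliases below are illustrative only, and if the goal is instead a single row combining a name and an address from two different events, a time-bounded JOIN (using DATEDIFF) between the two steps would be the route to look at.
WITH Names AS (
    -- Sketch only: first element of the "1-1" array, for events that have it
    SELECT GetArrayElement([1-1], 0) AS FirstLevel
    FROM [SA-INPUT]
    WHERE [1-1] IS NOT NULL
),
NameFields AS (
    SELECT GetArrayElement(FirstLevel.Details, 0) AS D
    FROM Names
),
Addresses AS (
    -- Sketch only: same pattern for the "2-1" array
    SELECT GetArrayElement([2-1], 0) AS FirstLevel
    FROM [SA-INPUT]
    WHERE [2-1] IS NOT NULL
),
AddressFields AS (
    SELECT GetArrayElement(FirstLevel.Address, 0) AS A
    FROM Addresses
)
SELECT D.FirstName AS Field1, D.LastName AS Field2 FROM NameFields
UNION
SELECT A.Street AS Field1, A.Lane AS Field2 FROM AddressFields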
I'm using Azure Data Flow with a Union transformation to combine two sources, so the union contains JSON documents. Is there a way to convert these JSON documents to an array of documents?
Union contains:
{"key":1,"value":"test8"}
{"key":2,"value":"test6"}
{"key":3,"value":"test3"}
What I'm looking for is a way to get this format:
[
{
"key": 1,
"value": "test8"
},
{
"key": 2,
"value": "test6"
},
{
"key": 3,
"value": "test3"
}
]
Thanks for your help.
You could use an Aggregate transformation with the collect expression to combine all the JSON documents and pass the result to a sink with a JSON dataset. However, this will not output exactly the result you are looking for; it gives an aggregated column name in the output, as shown below.
Aggregate:
Column1: collect(#(key=key,value=value))
Data flow output: (screenshot not included)
As an alternative, you can copy the unioned JSON documents to storage and then use a Copy Data activity to convert the JSON documents to an array of documents.
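For the Copy Data route, the piece of the sink that controls whether the output is written as a single JSON array rather than one object per line is the JSON write settings; a rough sketch of the relevant sink fragment follows (this is only an illustration, not the full activity definition, and the store settings type depends on your actual sink):
"sink": {
    "type": "JsonSink",
    "storeSettings": {
        "type": "AzureBlobStorageWriteSettings"
    },
    "formatSettings": {
        "type": "JsonWriteSettings",
        "filePattern": "arrayOfObjects"
    }
}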
Output: (screenshot not included)
I'm trying to archive old data from CosmosDB into Azure Tables but I'm very new to Azure Data Factory and I'm not sure what would be a good approach to do this. At first, I thought that this could be done with a Copy Activity but because the properties from my documents stored in the CosmosDB source vary, I'm getting mapping issues. Any idea on what would be a good approach to tackle this archiving process?
Basically, the way I want to store the data is to copy the document root properties as they are, and store the nested JSON as a serialized string.
For example, if I wanted to archive these 2 documents:
[
{
"identifier": "1st Guid here",
"Contact": {
"Name": "John Doe",
"Age": 99
}
},
{
"identifier": "2nd Guid here",
"Distributor": {
"Name": "Jane Doe",
"Phone": {
"Number": "12345",
"IsVerified": true
}
}
}
]
I'd like these documents to be stored in Azure Table like this:
identifier | Contact | Distributor
"Ist Guid here" | "{ \"Name\": \"John Doe\", \"Age\": 99 }" | null
"2nd Guid here" | null | "{\"Name\":\"Jane Doe\",\"Phone\":{\"Number\":\"12345\",\"IsVerified\":true}}"
Is this possible with the Copy Activity?
I tried using the mapping tab inside the Copy Activity, but when I try to run it I get an error saying that the data type of one of the nested JSON columns that is not present in the first row cannot be inferred.
Please follow my configuration in the Mapping tab (screenshot not included).
Test output with your sample data: (screenshot not included)
I have a JSON source document that will be uploaded to Azure blob storage regularly. The customer wants this input written to Azure SQL Database using Azure Data Factory. The JSON is however complex, with many nested arrays, and so far I have not been able to find a way to flatten the document. Perhaps this is not supported/possible?
[
{
"ActivityId": 1,
"Header": {},
"Body": [{
"1stSubArray": [{
"Id": 456,
"2ndSubArray": [{
"Id": "abc",
"Descript": "text",
"3rdSubArray": [{
"Id": "def",
"morefields": "text"
},
{
"Id": "ghi",
"morefields": "sample"
}]
}]
}]
}]
}
]
I need to flatten it:
ActivityId, Id, Id, Descript, Id, morefields
1, 456, abc, text1, def, text
1, 456, abc, text2, ghi, sample
1, 456, xyz, text3, jkl, textother
1, 456, xyz, text4, mno, moretext
There could be 8+ flat records per ActivityId. Has anyone out there seen this and found a way to resolve it using Azure Data Factory Copy Data?
Azure SQL Database has some capable JSON shredding abilities, including OPENJSON, which shreds JSON, and JSON_VALUE, which returns scalar values from JSON. Since you already have Azure SQL DB in your architecture, it would make sense to use it rather than add additional components.
So why not adopt an ELT pattern where you use Data Factory to insert the JSON into a table in Azure SQL DB and then call a Stored Procedure activity to shred it (a sketch of such a procedure follows the query and results below)? Some sample SQL based on your example:
DECLARE @json NVARCHAR(MAX) = N'[
{
"ActivityId": 1,
"Header": {},
"Body": [
{
"1stSubArray": [
{
"Id": 456,
"2ndSubArray": [
{
"Id": "abc",
"Descript": "text",
"3rdSubArray": [
{
"Id": "def",
"morefields": "text"
},
{
"Id": "ghi",
"morefields": "sample"
}
]
},
{
"Id": "xyz",
"Descript": "text",
"3rdSubArray": [
{
"Id": "jkl",
"morefields": "textother"
},
{
"Id": "mno",
"morefields": "moretext"
}
]
}
]
}
]
}
]
}
]'
--SELECT @json j
-- INSERT INTO yourTable ( ...
SELECT
JSON_VALUE ( j.[value], '$.ActivityId' ) AS ActivityId,
JSON_VALUE ( a1.[value], '$.Id' ) AS Id1,
JSON_VALUE ( a2.[value], '$.Id' ) AS Id2,
JSON_VALUE ( a2.[value], '$.Descript' ) AS Descript,
JSON_VALUE ( a3.[value], '$.Id' ) AS Id3,
JSON_VALUE ( a3.[value], '$.morefields' ) AS morefields
FROM OPENJSON( @json ) j
CROSS APPLY OPENJSON ( j.[value], '$."Body"' ) AS m
CROSS APPLY OPENJSON ( m.[value], '$."1stSubArray"' ) AS a1
CROSS APPLY OPENJSON ( a1.[value], '$."2ndSubArray"' ) AS a2
CROSS APPLY OPENJSON ( a2.[value], '$."3rdSubArray"' ) AS a3;
As you can see, I've used CROSS APPLY to navigate multiple levels. My results (screenshot not included) contain one row per 3rdSubArray element, i.e. four rows for the sample above.
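To follow the ELT pattern end to end, the same shredding query can live in a stored procedure that a Data Factory Stored Procedure activity calls after the load. A minimal sketch, assuming a staging table dbo.StagedJson with an NVARCHAR(MAX) column JsonPayload and a target table dbo.ActivityFlat (all of these names are assumptions, not part of the answer above):
-- Sketch only: dbo.StagedJson, JsonPayload, and dbo.ActivityFlat are assumed names.
CREATE OR ALTER PROCEDURE dbo.usp_ShredActivityJson
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.ActivityFlat ( ActivityId, Id1, Id2, Descript, Id3, morefields )
    SELECT
        JSON_VALUE ( j.[value], '$.ActivityId' ),
        JSON_VALUE ( a1.[value], '$.Id' ),
        JSON_VALUE ( a2.[value], '$.Id' ),
        JSON_VALUE ( a2.[value], '$.Descript' ),
        JSON_VALUE ( a3.[value], '$.Id' ),
        JSON_VALUE ( a3.[value], '$.morefields' )
    FROM dbo.StagedJson s
    CROSS APPLY OPENJSON ( s.JsonPayload ) j
    CROSS APPLY OPENJSON ( j.[value], '$."Body"' ) m
    CROSS APPLY OPENJSON ( m.[value], '$."1stSubArray"' ) a1
    CROSS APPLY OPENJSON ( a1.[value], '$."2ndSubArray"' ) a2
    CROSS APPLY OPENJSON ( a2.[value], '$."3rdSubArray"' ) a3;
END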
In the past, you could follow this blog and my previous case (Loosing data from Source to Sink in Copy Data) to set the Cross-apply nested JSON array option in the Blob Storage dataset. However, that option has since disappeared.
Instead, Collection Reference is now used for array-item schema mapping in the Copy activity.
Based on my test, only one array can be flattened in a schema. Multiple arrays can be referenced and returned as one row containing all of the elements in the array; however, only one array can have each of its elements returned as individual rows. This is the current limitation of the jsonPath settings.
As a workaround, you can first convert the JSON file with nested objects into a CSV file using a Logic App, and then use the CSV file as input for Azure Data Factory. Please refer to this doc to understand how a Logic App can be used to convert nested objects in a JSON file to CSV. You could also do the work on the SQL database side, such as the stored procedure mentioned in the comment by @GregGalloway.
To summarize: unfortunately, "Collection reference" only works one level down in the array structure, which did not suit @Emrikol. In the end, @Emrikol abandoned Data Factory and built an app to do the work.
I have the following type of documents:
{
  "_id": "0710b1dd6cc2cdc9c2ffa099c8000f7b",
  "_rev": "1-93687d40f54ff6ca72e66ca7fc99caff",
  "date": "2018-06-04T07:46:08.848Z",
  "topic": "some topic"
}
The collection is not very large. Only 20k documents.
However, the following query is very slow; it takes about 5 seconds!
{
  selector: {
    topic: 'some topic'
  },
  sort: ['date']
}
I tried various indexes, e.g.
index: {
fields: ['topic', 'date']
}
but nothing really worked well.
What am I missing here?
When sorting in a Mango query, you need to ensure that the sort order you are asking for matches the index that you are using.
If you are indexing the data set in topic, date order, then you can use the following query on "topic" to get the data out in date order using the index:
{
"selector": {
"topic": "some topic"
},
"sort": [
"topic",
"date"
]
}
Because the sort matches the form of the data in the index, the index is used to answer the query which should speed up your query time considerably.
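For reference, a matching index can be created by POSTing a definition like the following to the database's /_index endpoint; the index name here is only an illustration:
{
  "index": {
    "fields": ["topic", "date"]
  },
  "name": "topic-date-index",
  "type": "json"
}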
I have the following problem: I have a field mapping update to an index. The payload is complex; I have:
{
"type": "abc",
"Party": [{
"Type": "abc",
"Id": "123",
"Name": "manasa",
"Phone": [{
"Type": "Office",
"Number": "12345"
}]
}]
}
Now I want to create a field for the index. The field name is phonenumber, of type Collection(Edm.String), where the mapping is:
{
"sourceFieldName" : "/Party/Phone/Number",
"targetFieldName" : "phonenumber",
"mappingFunction" : { "name" : "jsonArrayToStringCollection" }
}
This is in the HTTP POST body. But after indexing I still get a null result for the phone number, which means the mapping went wrong. If you look at the phone number in the source JSON, it is inside a JSON array and is itself an array, and the result needs to be stored in a collection of strings. Is this possible, and how can I achieve it?
If this is not possible, I at least want a field mapping down to the Phone array, i.e. /Party/Phone/.
If I index the complete Party array as text, I get an error while running the indexer saying:
"Field 'partydetails' contains a term that is too large to process. The max length for UTF-8 encoded terms is 32766 bytes. The most likely cause of this error is that filtering, sorting, and/or faceting are enabled on this field, which causes the entire field value to be indexed as a single term. Please avoid the use of these options for large fields."
Can someone please help!
If Party had been a JSON object rather than an array, and Phone had been just a string array, for example:
{
  "type": "abc",
  "Party": {
    "Type": "abc",
    "Id": "123",
    "Name": "manasa",
    "Phone": [
      "12345",
      "23463"
    ]
  }
}
Then I could have mapped
{
"sourceFieldName" : "Party/Phonenumber",
"targetFieldName" : "phonenumbers",
"mappingFunction" : { "name" : "jsonArrayToStringCollection" }
}
That would map as a collection of OData type Edm.String.
So, to put this in a better and more straightforward way, either:
Transform your JSON into something flatter (the example that I gave above), or
Use the proper index into the array, if you know the positions beforehand, as @Luis Cabrera said:
"sourceFieldName": "/Party/0/Phone/0/Type"
It is a limitation on the Azure Search side.
Note that Party and Phone are arrays, so the field mapping you mention won't work.
You will need to index into the specific element. For example:
{
"sourceFieldName": "/Party/0/Phone/0/Type",
"targetFieldName": "firstPhoneNumberTypeOfFirstParty"
}
You may want to give that a shot.
Thanks!
Luis Cabrera | Program Manager | Azure Search