How to compare 2 JSON files in Azure Data Factory

I'm new to Azure Data Factory. I want to compare two JSON files through Azure Data Factory: we need to get the list of IDs that are in the current JSON file but not in the previous JSON file. Below are the two sample JSON files.
Previous JSON file :
{
    "count": 2,
    "values": [
        {
            "id": "4e10aa02d0b945ae9dcf5cb9ded9a083"
        },
        {
            "id": "cbc414db-4d08-48f2-8fb7-748c5da45ca9"
        }
    ]
}
Current JSON file:
{
    "count": 3,
    "values": [
        {
            "id": "4e10aa02d0b945ae9dcf5cb9ded9a083"
        },
        {
            "id": "cbc414db-4d08-48f2-8fb7-748c5da45ca9"
        },
        {
            "id": "5ea951e3-88d7-40b4-9e3f-d787b94a43c8"
        }
    ]
}
New IDs have to go through one activity and old IDs have to go through another activity.
We are running out of time, so please help me out.
Thanks in advance!

You can simply use an If Condition activity.
If expression:
@equals(activity('Lookup1').output.value, activity('Lookup2').output.value)
Additionally, I have used a Fail activity on the False branch for better visibility.
--
Lookup1 Activity --> Json1.json
Lookup2 Activity --> Json2.json
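For reference, a minimal sketch of how this could look in the pipeline JSON, with the two Lookup activities named as above (the activities inside the True/False branches are omitted):

{
    "name": "If Condition1",
    "type": "IfCondition",
    "dependsOn": [
        { "activity": "Lookup1", "dependencyConditions": [ "Succeeded" ] },
        { "activity": "Lookup2", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "expression": {
            "value": "@equals(activity('Lookup1').output.value, activity('Lookup2').output.value)",
            "type": "Expression"
        },
        "ifTrueActivities": [],
        "ifFalseActivities": []
    }
}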

This can be done using a single Filter Activity.
I have assigned two parameters, "Old_json" and "New_json", for your previous and current JSON files respectively.
In the settings of the Filter activity:
Items: @pipeline().parameters.New_json.values
Condition: @not(contains(pipeline().parameters.Old_json.values, item()))
So this Filter activity goes through each item in the new JSON and checks whether it is present in the old JSON. If an item is not present, it is returned in the output.
Output of the filter activity
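Since the screenshot is not reproduced here: with the two sample files above, the Filter output would look roughly like this (a sketch based on the standard Filter activity output shape):

{
    "ItemsCount": 3,
    "FilteredItemsCount": 1,
    "Value": [
        {
            "id": "5ea951e3-88d7-40b4-9e3f-d787b94a43c8"
        }
    ]
}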

Thanks @KarthikBhyresh-MT for a helpful answer.
Just to add, if (like me) you want to compare two files (or in my case, a file with the output of a SQL query), but don't care about the order of the records, you can do this using a ForEach activity. This also has the benefit of allowing a more specific error message in the case of a difference between the files.
My first If Condition checks the two files have the same row count, with the expression:
@equals(activity('Select from SQL').output.count, activity('Lookup from CSV').output.count)
The False branch leads to a Fail activity with message:
@concat(pipeline().parameters.TestName, ': CSV has ', string(activity('Lookup from CSV').output.count), ' records but SQL query returned ', string(activity('Select from SQL').output.count))
If this succeeds, flow passes to a ForEach, iterating through items:
@activity('Lookup from CSV').output.value
... which contains an If Condition with expression:
@contains(string(activity('Select from SQL').output.value), string(item()))
The False branch for that If Condition contains an Append variable activity, which appends to a variable I've added to the pipeline called MismatchedRecords. The Value appended is:
@item()
Following the ForEach, a final If Condition then checks whether MismatchedRecords contains any items:
@equals(length(variables('MismatchedRecords')), 0)
... and the False branch contains another Fail activity, with message:
@concat(string(length(variables('MismatchedRecords'))), ' records from CSV not found in SQL. Missing records: ', string(variables('MismatchedRecords')), ' SQL output: ', string(activity('Select from SQL').output.value))
The message contains specific information about the records which could not be matched, to allow further investigation.
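For completeness, the MismatchedRecords variable referenced above is declared on the pipeline as an Array type; a minimal sketch of that fragment of the pipeline JSON:

"variables": {
    "MismatchedRecords": {
        "type": "Array"
    }
}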

Related

How to send the output values of a Lookup activity in an email in Data Factory?

I'm trying to send a Lookup activity's output values as part of a body parameter in a POST request to a Logic App, which uses three parameters: "to", "email_body" and "subject".
The Lookup activity depends on a query and may return from 2 up to 10 rows.
According to Azure, the output of the activity should look like this:
{
    "count": 2,
    "value": [
        {
            "column1": value1,
            "column2": value2,
            "column3": value3
        },
        {
            "column1": value4,
            "column2": value5,
            "column3": value6
        }
    ]
}
In this case, the query returned 2 rows, but how can I attach every output value to the POST body without having to use @activity('lookup_act').output.value[0].column1 and so on for every value?
The POST body is the following:
{
    "email_body": "Hi, the following tables have been updated:
    @{activity('lookup_act').output.value[0].column1}
    @{activity('lookup_act').output.value[1].column1}",
    "subject": "Update on tables",
    "to": "email@domain.com"
}
I've tried using @activity('lookup_act').output.value to bring every value, but it won't work.
Is there a way to call every single output value? If so, how can it be done and paste into a table?
Thanks beforehand.
There are two ways to get all the values in the mail:
1. Get the whole Lookup output array in the mail.
First get the results from the Lookup activity, then pass the output of this activity converted to a string; otherwise you will get an error regarding deserialization.
{"message":"#string(activity('Lookup1').output.value)",
"dataFactoryName":"#{pipeline().DataFactory}",
"pipelineName":"#{pipeline().Pipeline}",
"receiver":"#{pipeline().parameters.receiver}"}
OUTPUT
2. Get all the respective values column-wise.
First get the results from the Lookup activity, then use a ForEach loop and create an Append variable activity for every column, so that each column's values are stored in their own array.
ForEach activity setting:
I used an Append variable activity and created an Idarray variable, giving item().id as the value to store all id values in a single array.
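For illustration, a minimal sketch of that Append variable activity as it might appear inside the ForEach (the other arrays used below, Namearray and ProfessionArray, would be appended the same way from their respective columns):

{
    "name": "Append id",
    "type": "AppendVariable",
    "typeProperties": {
        "variableName": "Idarray",
        "value": {
            "value": "@item().id",
            "type": "Expression"
        }
    }
}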
Then, in the Web activity, I passed the body below to get all the arrays.
{"message":"#{string(variables('Idarray'))} as Id, #{string(variables('Namearray'))} as Name, #{string(variables('ProfessionArray'))} as Profession",
"dataFactoryName":"#{pipeline().DataFactory}",
"pipelineName":"#{pipeline().Pipeline}",
"receiver":"#{pipeline().parameters.receiver}"}
OUTPUT

How to Create Azure Resource Graph Explorer Scheduled Reports and Email Alerts

I have a Kusto query taken from this example that looks like this:
Resources
| where type =~ 'microsoft.compute/virtualmachines'
| extend vmPowerState = tostring(properties.extended.instanceView.powerState.code)
| summarize count() by vmPowerState
I would like to create a weekly alert that sends the result through an e-mail as a CSV file attachment.
The Logic App is organized in 5 steps:
One:
Two:
With
URL: https://management.azure.com/providers/Microsoft.ResourceGraph/resources
Body:
{
    "query": "Resources | where type =~ 'microsoft.compute/virtualmachines' | extend vmPowerState = tostring(properties.extended.instanceView.powerState.code) | summarize count() by vmPowerState"
}
Three:
Where I parse the Body and provide an extract of the JSON schema:
{
    "count": 3,
    "data": [
        {
            "count_": 3,
            "vmPowerState": "PowerState/stopped"
        },
        {
            "count_": 29,
            "vmPowerState": "PowerState/deallocated"
        },
        {
            "count_": 118,
            "vmPowerState": "PowerState/running"
        }
    ],
    "skip_token": null,
    "total_records": 3
}
Here I have a few doubts, because I found a guide that says I should use an array formula instead. I'm not very sure about that because I cannot see the details in the example. Anyway, this is what I do:
Four:
Five:
Where I create the attachment from the CSV
The e-mail arrives in the end, but the attachment is not a CSV, it's a JSON file:
What the heck am I doing wrong?
If you want to use "Create CSV table" with Columns set to "Automatic", do pass the "Body" of "Parse JSON".
You don't need to use the array variable, but whatever you use needs to return an array like this:
The body of the JSON parser in your example has many other JSON nodes enveloping that. You should have the option "data", as there is an array in there called "data".
If you want to cut it short, try "data".
You can change Columns to "custom". That allows you to remove redundant data or format data (like the "PowerState" in "PowerState/stopped"):
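For example, in custom mode a column value can be set with an expression like the one below, which keeps only the part after the slash (a sketch; the field name follows the sample payload above):

@last(split(item()?['vmPowerState'], '/'))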
You can also add the .csv extension to the file name:
The above worked for me, but it can be enhanced.
The suggestion posted by @BrunoLucasAzure really helped me understand how Logic Apps works.
However, I would like to reply to my own question with the right solution: I had to paste a sample of the JSON output after pressing the button Use sample payload to generate schema.
Then follow the workflow and everything will be fine.
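For reference, pasting the sample payload above into Use sample payload to generate schema produces a schema roughly like this (a sketch; the exact generated output may differ slightly):

{
    "type": "object",
    "properties": {
        "count": { "type": "integer" },
        "data": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "count_": { "type": "integer" },
                    "vmPowerState": { "type": "string" }
                },
                "required": [ "count_", "vmPowerState" ]
            }
        },
        "skip_token": {},
        "total_records": { "type": "integer" }
    }
}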
The next problem I need to fix is pagination but apparently there is a solution for that too: https://techcommunity.microsoft.com/t5/integrations-on-azure-blog/logic-app-http-pagination-deeper-look-build-custom-paging/ba-p/2907605

ADF/Synapse: iterate all objects and remove the underscore

I want to iterate over the list of objects/tables, but one object is not getting picked up because there is an underscore between the words ("Admin_process"). The expectation is to get it as "Adminprocess" in ADF/Synapse by removing the underscore, so that all objects are passed to the copy operation.
Objects/Tables list
AdminUser
Admin_process
TempUser
Currently the list is as above; however, the object "Admin_process" is not being read because of the underscore.
Could someone please tell me how to handle this case?
Thank you.
You can use the replace function in ADF dynamic content.
Please follow the demonstration below.
Here I am using an array parameter with keys and the above list of tables as values.
[
    {
        "Objectname": "AdminUser"
    },
    {
        "Objectname": "Admin_process"
    },
    {
        "Objectname": "TempUser"
    }
]
Parameter array to ForEach activity:
To use the replace function, create a Set variable activity inside the ForEach and give it the expression below.
@replace(item().Objectname, '_', '')
Output with the required result (underscore removed):
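The screenshot is not reproduced here; applying the expression to the parameter above would give these values across the three iterations:

AdminUser
Adminprocess
TempUser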
Now you can pass this value to a copy activity inside the same ForEach activity.
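For instance, the cleaned value could be passed to the copy activity through a dataset parameter (a sketch; the variable name table_name, the dataset name SourceDataset and the parameter name TableName are hypothetical):

"inputs": [
    {
        "referenceName": "SourceDataset",
        "type": "DatasetReference",
        "parameters": {
            "TableName": {
                "value": "@variables('table_name')",
                "type": "Expression"
            }
        }
    }
]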

Creating JSON Array in Azure Data Factory with multiple Copy Activities output objects

Is it possible to embed the output of a copy activity in Azure Data Factory within an array that is meant to be iterated over in a subsequent ForEach?
My goal is to create an array with the output of several copy activities and then, in a ForEach, access the properties of those copy activities with dot notation (e.g. item().rowsRead).
Specifically, I have 7 copy activities whose output JSON object (described here) would be stored in an array that I then iterate over. In the ForEach I would be checking the properties on each of the copy activities (rowsRead, rowsCopied, etc.) for validation purposes.
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-monitoring
I think we can embed the output of a copy activity in Azure Data Factory within an array. I've created a test to save the output of 2 Copy activities into an array. We need to concat to a string and then convert it to JSON. Please see my step 2.
We can declare an array type variable named CopyInfo to store the output. Another array type variable named JsonArray is used to see the test result in debug mode.
In the Append variable1 activity, I use @json(concat('{"activityName":"Copy1","activityObject":',activity('Copy data1').output,'}')) to save the output of the Copy data1 activity and convert it from String type to JSON type.
In the Append variable2 activity, I use @json(concat('{"activityName":"Copy2","activityObject":',activity('Copy data2').output,'}')) to save the output of the Copy data2 activity and convert it from String type to JSON type.
Then I assign the value of the CopyInfo variable to the JsonArray variable.
In the end, we can see the JSON array looking like this:
"name": "JsonArray",
"value": [
    {
        "activityName": "Copy1",
        "activityObject": {
            "dataRead": 643,
            "dataWritten": 643,
            "filesRead": 1,
            "filesWritten": 1,
            ...
        }
    },
    {
        "activityName": "Copy2",
        "activityObject": {
            "dataRead": 643,
            "dataWritten": 643,
            "filesRead": 1,
            "filesWritten": 1,
            ...
        }
    }
]
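Back to the original goal: the JsonArray (or CopyInfo) variable can then be fed to a ForEach, and each copy activity's properties accessed with dot notation. A minimal sketch (property names such as rowsRead and rowsCopied come from the copy activity output documented in the link above):

ForEach Items: @variables('JsonArray')
Inside the ForEach, for example in an If Condition or Set variable activity:
@item().activityObject.rowsRead
@item().activityObject.rowsCopied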

Azure Stream Processing upsert to DocumentDB with array

I'm using Azure Stream Analytics to copy my JSON over to DocumentDB, using upsert to overwrite the document with the latest data. This is great for my base data, but I would love to be able to append the list data, as unfortunately I can only send one list item at a time.
In the example below, the document is matched on id, and all items are updated, but I would like the "myList" array to keep growing with the "myList" data from each document (with the same id). Is this possible? Is there any other way to use Stream Analytics to update this list in the document?
I'd rather steer clear of using a tumbling window if possible, but is that an option that would work?
Sample documents:
{
    "id": "1234",
    "otherData": "example",
    "myList": [ { "listitem": 1 } ]
}
{
    "id": "1234",
    "otherData": "example 2",
    "myList": [ { "listitem": 2 } ]
}
Desired output:
{
    "id": "1234",
    "otherData": "example 2",
    "myList": [ { "listitem": 1 }, { "listitem": 2 } ]
}
My current query:
SELECT id, otherData, myList INTO [myoutput] FROM [myinput]
Currently, arrays are not merged; this is the existing behavior of the DocumentDB output from ASA, as also mentioned in this article. I doubt using a tumbling window would help here.
Note that changes in the values of array properties in your JSON document result in the entire array getting overwritten, i.e. the array is not merged.
You could transform the input array (myList) into individual rows using the GetArrayElements function.
Your query might look something like --
SELECT i.id , i.otherData, listItemFromArray
INTO myoutput
FROM myinput i
CROSS APPLY GetArrayElements(i.myList) AS listItemFromArray
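If you want only the element value rather than the whole array-element record, ArrayValue can be selected explicitly; a sketch, assuming the standard GetArrayElements output fields (ArrayIndex, ArrayValue):

SELECT i.id, i.otherData, listItemFromArray.ArrayValue.listitem AS listitem
INTO myoutput
FROM myinput i
CROSS APPLY GetArrayElements(i.myList) AS listItemFromArray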
cheers!
