Send one or more events to Event Hub in Azure

I am using a Logic App (LA) on Azure to query my DB every 3 minutes.
Then the LA uses an Event Hub connector to send my query result, the table, to Azure Stream Analytics (ASA).
Normally the result table has around 100 rows, and definitely many more at peak times.
I thought sending one Event Hub message per row would incur a lot of calls and hence perhaps delay ASA's logic(?)
My questions are:
How do I send multiple messages through the LA's Event Hub action connector?
I see there's one option, Send one or more events to Event Hub, but I wasn't able to figure out what to put in the content. I tried passing the table (the array), and the following request body works.
e.g. body:
[
{
"ContentData": "dHhuX2FnZV9yZXN1bHQ=",
"Properties": {
"tti_IngestTime": "2018-09-26T20:10:55.4480047+00:00",
"tti_SLAThresholdMins": 330,
"MinsPastSla": -6
}
},
{
"ContentData": "AhuBA2FnZV9yZXN1bHQ=",
"Properties": {
"tti_IngestTime": "2018-09-26T20:10:55.4480047+00:00",
"tti_SLAThresholdMins": 230,
"MinsPastSla": -5
}
}
]
Is there any performance concern with sending 100 events one by one to ASA?
Thank you!

I seem to have found the answer.
(1) The JSON I am sending looks correct, and the POST request to Event Hub is successful.
The POST body is [{}, {}, {}], which is the correct format.
(2) ASA couldn't read the stream, most likely because it was not able to deserialize the messages from Event Hub.
I happened to change how I encode the base64 string for the "ContentData" sent to the Event Hub. The message sent to the EH looks like
{
"ContentData": "some base64() string",
"Properties": {}
},
and the base64() needs to encode the "Properties" value, not anything else, for ASA to be able to deserialize the message.
It didn't work before because I had encoded a random string instead of the value of "Properties".
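For reference, here is a rough C# sketch (purely illustrative, not the Logic App itself; the row values are made up) of building that request body, base64-encoding each row's Properties JSON into ContentData:

using System;
using System.Linq;
using System.Text;
using Newtonsoft.Json;

var rows = new[]
{
    new { tti_IngestTime = "2018-09-26T20:10:55.4480047+00:00", tti_SLAThresholdMins = 330, MinsPastSla = -6 },
    new { tti_IngestTime = "2018-09-26T20:10:55.4480047+00:00", tti_SLAThresholdMins = 230, MinsPastSla = -5 }
};

var events = rows.Select(row => new
{
    // base64 of the serialized row (the "Properties" value), so ASA can deserialize the event body
    ContentData = Convert.ToBase64String(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(row))),
    Properties = row
});

string body = JsonConvert.SerializeObject(events, Formatting.Indented);
Console.WriteLine(body);   // [ { "ContentData": "...", "Properties": { ... } }, ... ]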

Related

Azure Data Factory REST API returns invalid JSON file with pagination

I'm building a pipeline which copies a response from an API into a file in my storage account. There is also an element of pagination. However, that works like a charm and I get all my data from all the pages.
My result is something like this:
{"data": {
"id": "Something",
"value": "Some other thing"
}}
The problem is that the copy function just appends each response to the file, thereby making it invalid JSON, which is a big problem further down the line. The final output looks like:
{"data": {
"id": "22222",
"value": "Some other thing"
}}
{"data": {
"id": "33333",
"value": "Some other thing"
}}
I have tried everything I could think of and could Google my way to, but nothing changes how the data is appended to the file, and I'm stuck with an invalid JSON file :(
As a backup plan, I'll just make a loop and create a JSON file for each page. But that seems a bit janky and really slow.
Anyone got an idea or have a solution for my problem?
When you copy data from a REST API to blob storage, it copies the data in the form of a set of objects by default.
Example:
sample data
{ "time": "2015-04-29T07:12:20.9100000Z", "callingimsi": "466920403025604"}
sink data
{"time":"2015-04-29T07:12:20.9100000Z","callingimsi":"466920403025604"}
{"time":"2015-04-29T07:13:21.0220000Z","callingimsi":"466922202613463"}
{"time":"2015-04-29T07:13:21.4370000Z","callingimsi":"466923101048691"}
This is an invalid format of JSON (one object per line rather than a single document).
To work around this, set the file pattern in the sink settings to Array of objects; this will write a single array containing all the objects.
Output:
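To illustrate the difference, here is a small C# sketch (the file contents are assumed, just for illustration): the default set-of-objects output cannot be parsed as one JSON document, while the array-of-objects form can.

using System;
using Newtonsoft.Json.Linq;

string setOfObjects   = "{\"data\":{\"id\":\"22222\"}}\n{\"data\":{\"id\":\"33333\"}}";
string arrayOfObjects = "[{\"data\":{\"id\":\"22222\"}},{\"data\":{\"id\":\"33333\"}}]";

// JToken.Parse(setOfObjects);              // throws JsonReaderException: additional text after the first object
JArray items = JArray.Parse(arrayOfObjects); // parses fine
Console.WriteLine(items.Count);              // 2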

Getting the files from Azure Blob and sending them in one email

I have a setup that exports all the JSON files generated from an API and sends them to an email address upon a request sent to a shared mailbox, but the thing is that currently the Logic App sends out separate emails, one JSON per email, so it's 7 emails in my case.
My final solution would be sending all the JSON files in one email. I have tried to figure out the connector methods, but it seems I cannot find a way. I tried to Google it of course, but no luck.
Would really appreciate any help!
Current setup looks like this:
Azure Logic App 1:
Azure Logic App 2:
You need to build an array of all of the attachments outside of the loop.
This is the flow I tested with ...
... the two important points are:
Construct an Attachments Array
As you can see, I've declared a variable named Attachments of type Array at the top.
Get your blobs and then loop over each one of them.
Within the loop, get the contents of the blob and then add an object to the array that looks like the following JSON structure ...
This is the Peek Code of the array item, noting that I am encoding the content as base64 ...
{
"inputs": {
"name": "Attachments",
"value": {
"ContentBytes": "#{base64(body('Get_blob_content_(V2)'))}",
"Name": "#{items('For_Each_Blob')?['DisplayName']}"
}
}
}
Send the Email
Now, when you send the email, refer to the array as the contents of the Attachments parameter.
That should get the job done for you, it worked for me.
Have you tried adding the output of the GetBlob Content to an array or adding it into a string? https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-create-variables-store-values#initialize-variable and then using this variable to create the email body?

Dynamic REST calls in Azure Synapse Pipeline

I am making a call to a REST API with Azure Synapse and the return dataset looks something like this:
{
"links": [
{
"rel": "next",
"href": "[myRESTendpoint]?limit=1000&offset=1000"
},
{
"rel": "last",
"href": "[myRESTendpoint]?limit=1000&offset=60000"
},
{
"rel": "self",
"href": "[myRESTendpoint]"
}
],
"count": 1000,
"hasMore": true,
"items": [
{
"links": [],
"closedate": "6/16/2014",
"id": "16917",
"number": "62000",
"status": "H",
"tranid": "0062000"
},...
],
"offset": 0,
"totalResults": 60316
}
I am familiar with making a REST call to a single endpoint that can return all the data with a single call using a Synapse pipeline, but this particular REST endpoint has a hard limit of returning only 1000 records; however, it does give a property named "hasMore".
Is there a way to recursively make rest calls in a Synapse pipeline until the "hasMore" property equals false?
The end goal of this is to sink data to either a dedicated SQL pool or into ADLS2 and transform from there.
I have tried to achieve the same scenario using Azure Data Factory, which seems more appropriate and makes it easier to achieve the goal: "The end goal of this is to sink data to either a dedicated SQL pool or into ADLS2 and transform from there".
As you have to hit the endpoint repeatedly to fetch 1000 records at a time, you might set it up in the following fashion if the response header/response body contains the URL for the next page.
You are unlikely to be able to use this functionality if the next-page link or query parameter isn't included in the response headers/body.
Alternatively, you may use loop logic and run the Copy Activity inside it.
Create two parameters in the Rest Connector:
Fill in the parameters for the RestConnector's relative URL.
Using the Set Variable activity, the value of this variable is increased in a loop; for each cycle, the URL for the Copy Activity is set dynamically. If you want to loop or iterate, you may use the Until activity.
Alternative:
In my experience, the REST connector's pagination is quite rigid, so I usually put the activity within a loop to have more control.
FOREACH Loop, here
For those following the thread, I used IpsitaDash-MT's suggestion of using the ForEach loop. In the case of this API, when a call is made I get a property returned at the end of the call named "totalResults". Here are the steps I used to achieve what I was looking to do:
Make a dummy call to the API to get the "totalResults" parameter. This is just a call to return the number of results I am looking to get. In the case of this API, the body of the request is a SQL statement, so when the dummy request is made I am only asking for the IDs of the results I am looking to get.
SQL statement example
I then take the property "totalResults" from that request and set a dynamic value in the "Items" field of the ForEach loop like this:
#range(0,add(div(sub(int(activity('Get Pages Customers').output.totalResults),mod(int(activity('Get Pages Customers').output.totalResults),1000)),1000),1))
NOTE: The API only allows pages of 1000 results, so I do some math to get a range of page numbers. I also have to add 1 to the final result to include the last page.
ForEach Loop Settings
In the API I have two parameters that can be passed, "limit" and "offset". Since I want all of the data, there is no reason to set limit to anything other than 1000 (the maximum allowable number). The offset parameter can be set to any number less than or equal to "totalResults" - "limit" and greater than or equal to 0. So I use the range established in step 2 and multiply it by 1000 to set the offset parameter in the URL.
Setting the offset parameter in the copy data activity
Dynamic value of the Relative URL in the REST connector
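For anyone who wants to check the math, here is the same paging logic sketched in plain C# (the totalResults value is just an example):

using System;

int totalResults = 60316;   // from the dummy "Get Pages Customers" call
int limit = 1000;           // hard page-size limit of the API

// same as: add(div(sub(totalResults, mod(totalResults, 1000)), 1000), 1)
int pageCount = (totalResults - totalResults % limit) / limit + 1;

for (int page = 0; page < pageCount; page++)
{
    int offset = page * limit;  // offset passed to the relative URL of each copy
    Console.WriteLine($"?limit={limit}&offset={offset}");
}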
NOTE: I found it better to sink the data as JSON into ADLS2 first rather than into a dedicated SQL pool due to the Lookup feature.
Since Synapse does not allow nested ForEach loops, I run the data through a data flow to format the data and check for duplicates and updates.
When the data flow is completed, it kicks off a Lookup activity to get the data that was just processed and passes it into a new pipeline, which uses another ForEach loop to get the child data for each ID of the parent data.
Data Flow and Lookup for child data pipeline

Get value from json in LogicApp

Rephrasing the question entirely, as the first attempt was unclear.
In my logic app I am reading a .json from blob which contains:
{
"alpha": {
"url": "https://linktoalpha.com",
"meta": "This logic app does job aaaa"
},
"beta": {
"url": "https://linktobeta.com",
"meta": "This logic app does job beta"
},
"theta": {
"url": "https://linktotheta.com",
"meta": "This logic app does job theta"
}
}
I'm triggering the Logic App with an HTTP POST which contains in the body:
{ "logicappname": "beta" }
But the value for 'logicappname' could be alpha, beta or theta. I now need to set a variable which contains the url value for 'beta'. How can this be achieved without jsonpath support?
I am already JSON-parsing the file contents from the blob and this IS giving me the tokens... but I cannot see how to select the value I need. I would appreciate any assistance, thank you.
For your requirement, I think you can just use the "Parse JSON" action. Please refer to the steps below:
1. I uploaded a file testJson.json to my blob storage, then got it and parsed it in my Logic App.
2. We can see there are three url tokens in the screenshot below. As you want to get the url value for beta, which is the second one, we can choose the second one.
If you want to get the url value by the param logicappname from the "When a HTTP request is received" trigger, you can use an expression when you create the result variable.
In my screenshot, the expression is:
body('Parse_JSON')?[triggerBody()?['logicappname']]?['url']
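As a rough equivalent in C# (purely illustrative, using Newtonsoft.Json and hard-coded sample values), the lookup is just indexing by the trigger's key and then by "url":

using System;
using Newtonsoft.Json.Linq;

string blobJson = @"{ ""beta"": { ""url"": ""https://linktobeta.com"", ""meta"": ""This logic app does job beta"" } }";
string logicAppName = "beta";                       // value from the HTTP trigger body

JObject parsed = JObject.Parse(blobJson);
string url = (string)parsed[logicAppName]?["url"];  // -> https://linktobeta.com
Console.WriteLine(url);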
As the description of your question is a little unclear, I'm confused about the meaning of "I am already json parsing the file contents from the blob and this IS giving me the tokens": why are "tokens" involved in it? And in the original question it seems you wanted to do it with JSONPath, but in the latest description you said without JSONPath? So if I have misunderstood your question, please let me know. Thanks.
I'm not sure if I understand your question, but I believe you can use the Parse JSON action after the HTTP trigger.
With this you will get control over the incoming JSON message and you can choose the 'url' value as dynamic content in the subsequent actions.
Let me know if my understanding about your question is wrong.

Stream Analytics to Event Hub - Unexpectedly concatenating events

I have a stream analytics job that is consuming an Event Hub of avro messages (we'll call this RawEvents), transforming/flattening the messages and firing them into a separate Event Hub (we'll call this FormattedEvents).
Each EventData instance in RawEvents consists of a single top level json object that has an array of more detailed events. This is a contrived example:
[{ "Events": [{ "dataOne": 123.0, "dataTwo": 234.0,
"subEventCode": 3, "dateTimeLocal": 1482170771, "dateTimeUTC":
1482192371 }, { "dataOne": 456.0, "dataTwo": 789.0,
"subEventCode": 20, "dateTimeLocal": 1482170771, "dateTimeUTC":
1482192371 }], "messageType": "myDeviceType-Events", "deviceID":
"myDevice", }]
The Stream Analytics job flattens the results and unpacks subEventCode, which is a bitmask. The results look something like this:
{"messagetype":"myDeviceType-Event","deviceid":"myDevice",eventid:1,"dataone":123,"datatwo":234,"subeventcode":6,"flag1":0,"flag2":1,"flag3":1,"flag4":0,"flag5":0,"flag6":0,"flag7":0,"flag8":0,"flag9":0,"flag10":0,"flag11":0,"flag12":0,"flag13":0,"flag14":0,"flag15":0,"flag16":0,"eventepochlocal":"2016-12-06T17:33:11.0000000Z","eventepochutc":"2016-12-06T23:33:11.0000000Z"} {"messagetype":"myDeviceType-Event","deviceid":"myDevice",eventid:2,"dataone":456,"datatwo":789,"subeventcode":8,"flag1":0,"flag2":0,"flag3":0,"flag4":1,"flag5":0,"flag6":0,"flag7":0,"flag8":0,"flag9":0,"flag10":0,"flag11":0,"flag12":0,"flag13":0,"flag14":0,"flag15":0,"flag16":0,"eventepochlocal":"2016-12-06T17:33:11.0000000Z","eventepochutc":"2016-12-06T23:33:11.0000000Z"}
I'm expecting to see two EventData instances when I pull messages from the FormattedEvents Event Hub. What I'm getting is a single EventData with both "flattened" events in the same message. This is expected behavior when targeting blob storage or Data Lake, but surprising when targeting an Event Hub. My expectation was for behavior similar to a Service Bus.
Is this expected behavior? Is there a configuration option to force the behavior if so?
Yes, this is expected behavior currently. The intent was to improve throughput by trying to send as many events as possible in one Event Hub message (EventData).
Unfortunately, there is no config option to override this behavior as of today. One possible way that may be worth trying is to set the output partition key to something super unique (i.e. add this column to your query: GetMetadataPropertyValue(ehInput, "EventId") as outputpk). Then specify "outputpk" as the PartitionKey in your output Event Hub's ASA settings.
Let me know if that helps.
cheers
Chetan
I faced the same problem. Thanks for the answers about manually formatting the input message. My colleague and I solved it with a few lines of code which remove the line feeds and carriage returns, then replace "}{" with "},{" and make it an array by adding "[" and "]" to both ends.
string modifiedMessage = myEventHubMessage.Replace("\n","").Replace("\r","");
modifiedMessage = "[" + modifiedMessage.Replace("}{","},{") + "]";
And then deserializing the input into a list of objects according to its data structure:
List<TelemetryDataPoint> newDataPoints = new List<TelemetryDataPoint>();
try
{
newDataPoints = Newtonsoft.Json.JsonConvert.DeserializeObject<List<TelemetryDataPoint>>(modifiedMessage);
....
....
