Stream Analytics to Event Hub - Unexpectedly concatenating events - azure

I have a stream analytics job that is consuming an Event Hub of avro messages (we'll call this RawEvents), transforming/flattening the messages and firing them into a separate Event Hub (we'll call this FormattedEvents).
Each EventData instance in RawEvents consists of a single top level json object that has an array of more detailed events. This is a contrived example:
[{ "Events": [{ "dataOne": 123.0, "dataTwo": 234.0,
"subEventCode": 3, "dateTimeLocal": 1482170771, "dateTimeUTC":
1482192371 }, { "dataOne": 456.0, "dataTwo": 789.0,
"subEventCode": 20, "dateTimeLocal": 1482170771, "dateTimeUTC":
1482192371 }], "messageType": "myDeviceType-Events", "deviceID":
"myDevice", }]
The Stream Analytics job flattens the results and unpacks subEventCode, which is a bitmask. The results look something like this:
{"messagetype":"myDeviceType-Event","deviceid":"myDevice",eventid:1,"dataone":123,"datatwo":234,"subeventcode":6,"flag1":0,"flag2":1,"flag3":1,"flag4":0,"flag5":0,"flag6":0,"flag7":0,"flag8":0,"flag9":0,"flag10":0,"flag11":0,"flag12":0,"flag13":0,"flag14":0,"flag15":0,"flag16":0,"eventepochlocal":"2016-12-06T17:33:11.0000000Z","eventepochutc":"2016-12-06T23:33:11.0000000Z"} {"messagetype":"myDeviceType-Event","deviceid":"myDevice",eventid:2,"dataone":456,"datatwo":789,"subeventcode":8,"flag1":0,"flag2":0,"flag3":0,"flag4":1,"flag5":0,"flag6":0,"flag7":0,"flag8":0,"flag9":0,"flag10":0,"flag11":0,"flag12":0,"flag13":0,"flag14":0,"flag15":0,"flag16":0,"eventepochlocal":"2016-12-06T17:33:11.0000000Z","eventepochutc":"2016-12-06T23:33:11.0000000Z"}
I'm expecting to see two EventData instances when I pull messages from the FormattedEvents Event Hub. What I'm getting is a single EventData with both "flattened" events in the same message. This is expected behavior when targeting blob storage or Data Lake, but surprising when targeting an Event Hub. My expectation was for behavior similar to a Service Bus.
Is this expected behavior? If so, is there a configuration option to force one EventData per output event?

Yes, this is currently expected behavior. The intent is to improve throughput by packing as many events as possible into a single Event Hub message (EventData).
Unfortunately, there is no config option to override this behavior as of today. One approach that may be worth trying is to set the output partition key to something highly unique (i.e. add this column to your query -- GetMetadataPropertyValue(ehInput, "EventId") AS outputpk). Then specify that "outputpk" as the Partition Key in your output Event Hub's ASA settings.
Let me know if that helps.
cheers
Chetan

I faced the same problem. Thanks for the answers suggesting manually formatting the input message. My colleague and I solved it with a few lines of code that remove line feeds and carriage returns, replace "}{" with "},{", and wrap the result in "[" and "]" to make it an array.
// Remove line feeds and carriage returns, then turn the concatenated objects into a JSON array.
string modifiedMessage = myEventHubMessage.Replace("\n", "").Replace("\r", "");
modifiedMessage = "[" + modifiedMessage.Replace("}{", "},{") + "]";
Then deserialize the input into a list of objects matching its data structure:
List<TelemetryDataPoint> newDataPoints = new List<TelemetryDataPoint>();
try
{
    newDataPoints = Newtonsoft.Json.JsonConvert.DeserializeObject<List<TelemetryDataPoint>>(modifiedMessage);
    // ... process newDataPoints ...
}
catch (Exception)
{
    // ... handle payloads that fail to deserialize ...
}
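For context, here is a minimal sketch of what a TelemetryDataPoint class could look like for the flattened events shown earlier. The property names are assumptions based on the sample output, not the poster's actual type; Json.NET matches them to the lowercase JSON keys case-insensitively by default.
using System;

// Hypothetical shape for one flattened event (flag1..flag16 omitted for brevity).
public class TelemetryDataPoint
{
    public string MessageType { get; set; }
    public string DeviceId { get; set; }
    public int EventId { get; set; }
    public double DataOne { get; set; }
    public double DataTwo { get; set; }
    public int SubEventCode { get; set; }
    public DateTime EventEpochLocal { get; set; }
    public DateTime EventEpochUtc { get; set; }
}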

Related

How to Separate Data from Multiple Devices on Microsoft Azure Stream Analytics

I am currently trying to connect 2 different devices to the IoT Hub, and I need to separate the data from each device. In order to do so, I tried configuring my stream analytics query like this:
SELECT
deviceId, temperature, humidity, CAST(iothub.EnqueuedTime AS datetime) AS event_date
INTO
NodeMCUOutput
FROM
iothubevents
WHERE
deviceId = "NodeMCU1"
However, for some reason, no output is shown if the WHERE statement is in the query (output is shown without it, but the data is not filtered). I need the WHERE statement in order to filter the data the way I want. Am I missing something? Are there any solutions to this? Thanks a lot. Cheers!
The device ID and other properties that are not in the message itself are included as metadata on the message. You can read that metadata using the GetMetadataPropertyValue() function. This should work for you:
SELECT
GetMetadataPropertyValue(iothubevents, 'IoTHub.ConnectionDeviceId') as deviceId,
temperature,
humidity,
CAST(GetMetadataPropertyValue(iothubevents, 'IoTHub.EnqueuedTime') AS datetime) AS event_date
INTO
NodeMCUOutput
FROM
iothubevents
WHERE
GetMetadataPropertyValue(iothubevents, 'IoTHub.ConnectionDeviceId') = 'NodeMCU1'
I noticed you use double quotes in the WHERE clause.
You need single quotes to get a match on strings. In this case it will be
WHERE deviceId = 'NodeMCU1'
If the deviceId is the one from the IoT Hub metadata, Matthijs' answer will help you retrieve it.

How to avoid race condition when updating Azure Table Storage record

Azure Function utilising Azure Table Storage
I have an Azure Function which is triggered from Azure Service Bus topic subscription, let's call it "Process File Info" function.
The message on the subscription contains file information to be processed. Something similar to this:
{
"uniqueFileId": "adjsdakajksajkskjdasd",
"fileName":"mydocument.docx",
"sourceSystemRef":"System1",
"sizeBytes": 1024,
... and other data
}
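For reference, a hypothetical C# contract for this message; the property names simply mirror the sample above, and the fields behind "... and other data" are left out:
// Illustrative only - the real message likely carries more fields.
public class FileInfoMessage
{
    public string UniqueFileId { get; set; }
    public string FileName { get; set; }
    public string SourceSystemRef { get; set; }
    public long SizeBytes { get; set; }
}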
The function carries out the following two operations:
Check the individual file storage table for the existence of the file. If it exists, update that file; if it's new, add the file to the storage table (stored on a per system | per fileId basis).
Capture metrics on the file size in bytes and store them in a second storage table, called metrics (constantly incrementing the bytes, stored on a per system | per year/month basis).
The following diagram gives a brief summary of my approach:
The difference between the individualFileInfo table and the fileMetric table is that the individual table has one record per file, whereas the metric table stores one record per month that is constantly updated (incremented), gathering the total bytes passed through the function.
Data in the fileMetrics table is stored as follows:
The issue...
Azure Functions are brilliant at scaling; in my setup I have a max of 6 of these functions running at any one time. Presuming each file message being processed is unique, updating (or inserting) the record in the individualFileInfo table works fine as there are no race conditions.
However, updating the fileMetric table is proving problematic: if all 6 functions fire at once, they all try to update the metrics table at the same time (constantly incrementing the new file counter or incrementing the existing file counter).
I have tried using the ETag for optimistic updates, along with a little bit of recursion to retry should a 412 response come back from the storage update (code sample below). But I can't seem to avoid this race condition. Has anyone any suggestions on how to work around this constraint, or has anyone come up against something similar before?
Sample code that is executed in the function for storing the fileMetric update:
internal static async Task UpdateMetricEntry(IAzureTableStorageService auditTableService,
    string sourceSystemReference, long addNewBytes, long addIncrementBytes, int retryDepth = 0)
{
    const int maxRetryDepth = 3; // only recursively attempt max 3 times
    var todayYearMonth = DateTime.Now.ToString("yyyyMM");
    try
    {
        // Attempt to get existing record from table storage.
        var result = await auditTableService.GetRecord<VolumeMetric>("VolumeMetrics", sourceSystemReference, todayYearMonth);
        // If the volume metrics table exists in storage - add or edit the records as required.
        if (result.TableExists)
        {
            VolumeMetric volumeMetric = result.RecordExists ?
                // Existing metric record.
                (VolumeMetric)result.Record.Clone()
                :
                // Brand new metrics record.
                new VolumeMetric
                {
                    PartitionKey = sourceSystemReference,
                    RowKey = todayYearMonth,
                    SourceSystemReference = sourceSystemReference,
                    BillingMonth = DateTime.Now.Month,
                    BillingYear = DateTime.Now.Year,
                    ETag = "*"
                };
            volumeMetric.NewVolumeBytes += addNewBytes;
            volumeMetric.IncrementalVolumeBytes += addIncrementBytes;
            await auditTableService.InsertOrReplace("VolumeMetrics", volumeMetric);
        }
    }
    catch (StorageException ex)
    {
        if (ex.RequestInformation.HttpStatusCode == 412)
        {
            // Retry to update the volume metrics.
            if (retryDepth < maxRetryDepth)
                await UpdateMetricEntry(auditTableService, sourceSystemReference, addNewBytes, addIncrementBytes, retryDepth + 1);
        }
        else
            throw;
    }
}
The ETag keeps track of conflicts, and if this code gets a 412 HTTP response it will retry, up to a max of 3 times (an attempt to mitigate the issue). My issue is that I cannot guarantee the updates to table storage across all instances of the function.
Thanks for any tips in advance!!
You can put the second part of the work into a second queue and function, maybe even put a trigger on the file updates.
Since the other operation sounds like it might take most of the time anyway, it could also remove some of the heat from the second step.
You can then solve any remaining race conditions by focusing only on that function. You can use sessions to limit the concurrency: in your case, the system id could be a possible session key. If you use that, only one Azure Function instance will process data from a given system at any one time, which resolves your race condition.
https://dev.to/azure/ordered-queue-processing-in-azure-functions-4h6c
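Here is a minimal sketch of what a session-enabled trigger could look like, assuming the Service Bus subscription has sessions enabled and the sender sets SessionId to the source system id; the topic, subscription and connection names are placeholders, not taken from the question.
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessFileMetrics
{
    [FunctionName("ProcessFileMetrics")]
    public static Task Run(
        [ServiceBusTrigger("file-info-topic", "metrics-subscription",
            Connection = "ServiceBusConnection", IsSessionsEnabled = true)] string message,
        ILogger log)
    {
        // With sessions, only one invocation at a time handles messages for a given
        // SessionId (the source system), so the monthly metrics row for that system
        // is updated serially and the 412 retries should largely disappear.
        log.LogInformation("Processing metrics message: {message}", message);
        return Task.CompletedTask;
    }
}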
Edit: If you can't use Sessions to logically lock the resource, you can use locks via blob storage:
https://www.azurefromthetrenches.com/acquiring-locks-on-table-storage/
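For the blob-lock route, a rough sketch with the Azure.Storage.Blobs v12 client is below. It assumes a pre-created "locks" container holding one (possibly empty) blob per source system, and it omits retry handling for the 409 returned while another instance holds the lease; the container and blob names are illustrative.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

public static class MetricsLock
{
    public static async Task WithLockAsync(string connectionString, string sourceSystem, Func<Task> update)
    {
        BlobClient lockBlob = new BlobContainerClient(connectionString, "locks").GetBlobClient(sourceSystem);
        BlobLeaseClient lease = lockBlob.GetBlobLeaseClient();

        // Lease durations must be 15-60 seconds (or infinite); AcquireAsync throws
        // if another function instance currently holds the lease.
        await lease.AcquireAsync(TimeSpan.FromSeconds(30));
        try
        {
            await update(); // e.g. the read-modify-write of the fileMetrics row
        }
        finally
        {
            await lease.ReleaseAsync();
        }
    }
}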

Send one or more events to EventHub Azure

I am using a Logic App (LA) on Azure to query my db every 3 mins.
The LA then uses an Event Hub connector to send my query result, the table, to Azure Stream Analytics (ASA).
Normally the result table has around 100 rows, and definitely many more at peak times.
I thought sending one Event Hub message per row would incur many calls, and hence perhaps delay the ASA logic(?)
My questions are:
How do I send multiple messages through the LA's Event Hub Action Connector?
I see there's one option, Send one or more events to Event Hub, but I wasn't initially able to figure out what to put in the content. I tried putting in the table (the array). The following request body works.
e.g. body:
[
  {
    "ContentData": "dHhuX2FnZV9yZXN1bHQ=",
    "Properties": {
      "tti_IngestTime": "2018-09-26T20:10:55.4480047+00:00",
      "tti_SLAThresholdMins": 330,
      "MinsPastSla": -6
    }
  },
  {
    "ContentData": "AhuBA2FnZV9yZXN1bHQ=",
    "Properties": {
      "tti_IngestTime": "2018-09-26T20:10:55.4480047+00:00",
      "tti_SLAThresholdMins": 230,
      "MinsPastSla": -5
    }
  }
]
Is there any performance concern with sending 100 events one by one to ASA?
Thank you!
I seem to have found the answer.
(1) The JSON I am sending looks correct, and the POST request to the Event Hub is successful.
The post body is [{}, {}, {}], which is the correct format.
(2) ASA not being able to read the stream is likely due to it failing to deserialize the messages from the Event Hub.
I happened to change how I encode the base64 string for the "ContentData" sent to the Event Hub. The message sent to the EH looks like:
{
  "ContentData": "some base64() string",
  "Properties": {}
},
The base64() needs to encode the "Properties" value, and nothing else, for ASA to be able to deserialize the message.
It didn't work before because I had encoded a random string instead of the value of "Properties".
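To illustrate, here is a small hedged sketch of building such a ContentData value in C#; the payload object and field values are made up to mirror the Properties above, and the exact shape ASA expects should be verified against your input configuration.
using System;
using System.Text;
using Newtonsoft.Json;

// Serialize the values ASA should see, then base64-encode them for ContentData.
var payload = new { tti_IngestTime = DateTime.UtcNow, tti_SLAThresholdMins = 330, MinsPastSla = -6 };
string json = JsonConvert.SerializeObject(payload);
string contentData = Convert.ToBase64String(Encoding.UTF8.GetBytes(json));
Console.WriteLine(contentData);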

Not able to log lengthy messages using application insights trackEvent() method in Node.js

We are trying to log some lengthy messages using the AppInsights trackEvent() method, but they are not logged in AppInsights and no error is given.
Please help me with logging a lengthy string.
Please let us know the max limit for trackEvent().
If you want to log messages, you should be using the trackTrace methods of the AI SDK, not trackEvent. trackTrace is intended for long messages and has a huge limit (32k!). See https://github.com/Microsoft/ApplicationInsights-dotnet/blob/develop/Schema/PublicSchema/MessageData.bond#L13
trackEvent is intended for named "events" like "opened file" or "clicked retry" or "canceled frobulating", where you might want to make charts and track usage of a thing over time.
You can attach custom properties (string key, string value) and custom metrics (string key, double value) to anything. And if you set the operationId field on things in the SDK, anything with the same operationId can easily be found together via queries or visualized in the Azure Portal or in Visual Studio.
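As a hedged illustration (using the .NET SDK that the links in these answers refer to; the Node.js SDK exposes equivalent trackEvent options), attaching properties, metrics and an operation id might look like this; all names and values here are invented:
using System.Collections.Generic;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

var client = new TelemetryClient(TelemetryConfiguration.CreateDefault());
client.Context.Operation.Id = "import-job-42"; // items sharing this id can be found together

client.TrackEvent(
    "opened file",
    properties: new Dictionary<string, string> { ["fileName"] = "mydocument.docx" },
    metrics: new Dictionary<string, double> { ["sizeBytes"] = 1024 });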
There are indeed limitations regarding the length. For example, the limit on the Name property of an event is 512 characters. See https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/src/Core/Managed/Shared/Extensibility/Implementation/Property.cs#L23
You can split it into substrings and put them in the Properties collection; each property value is limited to 8 * 1024 characters. I got this as a tip when I asked about it. See https://social.msdn.microsoft.com/Forums/en-US/84bd5ade-0b21-47cc-9b39-c6c7a292d87e/dependencytelemetry-sql-command-gets-truncated?forum=ApplicationInsights. Never tried it myself though.
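A hedged sketch of that splitting approach with the .NET SDK (the Node.js SDK's trackTrace takes an analogous properties object); the 8 * 1024 chunk size simply follows the tip above and the part-naming scheme is invented:
using System;
using System.Collections.Generic;
using Microsoft.ApplicationInsights;

public static class LongMessageLogger
{
    public static void TrackLongMessage(TelemetryClient client, string message)
    {
        const int chunkSize = 8 * 1024; // per-property value limit mentioned above
        var properties = new Dictionary<string, string>();
        for (int i = 0, part = 0; i < message.Length; i += chunkSize, part++)
        {
            properties[$"part{part:D3}"] = message.Substring(i, Math.Min(chunkSize, message.Length - i));
        }
        client.TrackTrace("Long message split across custom properties", properties);
    }
}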

Is it possible to generate a unique BlobOutput name from an Azure WebJobs QueueInput item?

I have a continuous Azure WebJob that is running off of a QueueInput, generating a report, and outputting a file to a BlobOutput. This job will run for differing sets of data, each requiring a unique output file. (The number of inputs is guaranteed to scale significantly over time, so I cannot write a single job per input.) I would like to be able to run this off of a QueueInput, but I cannot find a way to set the output based on the QueueInput value, or any value except for a blob input name.
As an example, this is basically what I want to do, though it is invalid code and will fail.
public static void Job([QueueInput("inputqueue")] InputItem input, [BlobOutput("fileoutput/{input.Name}")] Stream output)
{
//job work here
}
I know I could do something similar if I used BlobInput instead of QueueInput, but I would prefer to use a queue for this job. Am I missing something or is generating a unique output from a QueueInput just not possible?
There are two alternatives:
Use IBinder to generate the blob name, as shown in these samples (a minimal sketch follows below).
Have an autogenerated property in the queue message object and bind the blob name to that property. See here (the BlobNameFromQueueMessage method) how to bind a queue message property to a blob name.
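For the first alternative, here is a rough sketch of runtime binding with IBinder, written against the released WebJobs SDK attribute names (QueueTrigger, BlobAttribute) rather than the pre-release QueueInput/BlobOutput names used in the question; the InputItem type and the queue/container names are assumptions:
using System.IO;
using Microsoft.Azure.WebJobs;

public class InputItem
{
    public string Name { get; set; }
}

public static class ReportJobs
{
    public static void Job([QueueTrigger("inputqueue")] InputItem input, IBinder binder)
    {
        // Build the blob name at runtime from the queue message, then bind the output stream.
        using (TextWriter output = binder.Bind<TextWriter>(
            new BlobAttribute($"fileoutput/{input.Name}", FileAccess.Write)))
        {
            output.WriteLine("report contents here");
        }
    }
}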
Found the solution at Advanced bindings with the Windows Azure Web Jobs SDK via Curah's Complete List of Web Jobs Tutorials and Videos.
Quote for posterity:
One approach is to use the IBinder interface to bind the output blob and specify the name that equals the order id. The better and simpler approach (SimpleBatch) is to bind the blob name placeholder to the queue message properties:
public static void ProcessOrder(
    [QueueInput("orders")] Order newOrder,
    [BlobOutput("invoices/{OrderId}")] TextWriter invoice)
{
    // Code that creates the invoice
}
The {OrderId} placeholder in the blob name gets its value from the OrderId property of the newOrder object. For example, if newOrder is (JSON) {"CustomerName":"Victor","OrderId":"abc42"}, then the output blob name is "invoices/abc42". The placeholder is case-sensitive.
So, you can reference individual properties from the QueueInput object in the BlobOutput string and they will be populated correctly.
