Partial Data Being Ingested To Azure Data Explorer From Event Hub

I currently have Azure Data Explorer set up to ingest data from Event Hub. For some reason unknown to me, my ingestion table is only seeing about 45% of events. I am testing this by sending 100 events to Event Hub, one at a time. I know my Event Hub is receiving these events because I set up a SQL table to also ingest them (under a separate consumer group), and that table is receiving 100% of them. My assumption is that I have set up my Azure Data Explorer table incorrectly.
I have a very basic object that I am sending:
public class TestDocument
{
    [JsonProperty("DocumentId")]
    public string DocumentId { get; set; }

    [JsonProperty("Title")]
    public string Title { get; set; }
}
I have enabled streaming ingestion in Azure
Azure Data Explorer > Configurations > Streaming ingestion (ON)
I have enabled streaming ingestion in my table
.alter table TestTable policy streamingingestion enable
My table mapping is as follows:
.alter table TestTable ingestion json mapping "TestTable_mapping" '[{"column":"DocumentId","datatype":"string","Path":"$[\'DocumentId\']"},{"column":"Title","datatype":"string","Path":"$[\'Title\']"}]'
My data connection settings:
Consumer group: Its own group
Event system properties: 0
Table name: TestTable
Data format: JSON
Mapping name: TestTable_mapping
Is there something I am missing here? Consistently, out of 100 events sent, I only see about 45-48 get ingested in my table.
EDIT:
JSON payload of TestDocument:
{"DocumentId":"10","Title":"TEST"}

I found out what is happening: I was adding a BOM (byte order mark) to my serialized object, and it looks like ADX has issues with it. When I serialized my object without a BOM, I was able to see all data flow from Event Hub to ADX.
Here's a sample of how I am doing it:
private static readonly JsonSerializer Serializer;

static SerializationHelper()
{
    // SerializationSettings and DefaultStreamBufferSize are defined elsewhere in the helper.
    Serializer = JsonSerializer.Create(SerializationSettings);
}

public static void Serialize(Stream stream, object toSerialize)
{
    // Encoding.UTF8 includes a BOM preamble, which the StreamWriter emits into the stream.
    using var streamWriter = new StreamWriter(stream, Encoding.UTF8, DefaultStreamBufferSize, true);
    using var jsonWriter = new JsonTextWriter(streamWriter);
    Serializer.Serialize(jsonWriter, toSerialize);
}
What fixed it:
public static void Serialize(Stream stream, object toSerialize)
{
    // new UTF8Encoding(false) suppresses the BOM.
    using var streamWriter = new StreamWriter(stream, new UTF8Encoding(false), DefaultStreamBufferSize, true);
    using var jsonWriter = new JsonTextWriter(streamWriter);
    Serializer.Serialize(jsonWriter, toSerialize);
}
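In case it helps anyone debugging a similar symptom, below is a minimal sketch (not part of the original helper; the class and method names are just illustrative) that checks whether a serialized payload starts with the UTF-8 BOM bytes (0xEF 0xBB 0xBF) before it is handed to Event Hub:
using System.IO;

public static class BomCheck
{
    public static bool StartsWithUtf8Bom(Stream stream)
    {
        long originalPosition = stream.Position;
        stream.Seek(0, SeekOrigin.Begin);

        var buffer = new byte[3];
        int bytesRead = stream.Read(buffer, 0, buffer.Length);
        stream.Position = originalPosition;

        return bytesRead == 3 && buffer[0] == 0xEF && buffer[1] == 0xBB && buffer[2] == 0xBF;
    }
}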

Related

Kusto data ingestion from an Azure Function App ends with a 403

I am trying to ingest data from an Azure Function app into an ADX database. I followed the instructions found in the article here.
The difference is that I'd like to insert data into the table. I am struggling with a 403 error: "Principal 'aadapp=;' is not authorized to access table".
What I did:
I have created an AAD app with the following API permissions:
[screenshot: the AAD app's configured permissions]
I configured the database via Kusto Explorer:
.add database myDB ingestors ('aadapp=;') 'theAADAppname'
.add table PressureRecords ingestors ('aadapp=;') 'theAADAppname'
.add table TemperatureRecords ingestors ('aadapp=;') 'theAADAppname'
My code:
var kcsbDM = new KustoConnectionStringBuilder($"https://ingest-{serviceNameAndRegion}.kusto.windows.net:443/")
    .WithAadApplicationKeyAuthentication(
        applicationClientId: "<my AD app Id>",
        applicationKey: "<my App Secret from Certificates & secrets>",
        authority: "<my tenant Id>");

using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(kcsbDM))
{
    var ingestProps = new KustoQueuedIngestionProperties(databaseName, tableName);
    ingestProps.ReportLevel = IngestionReportLevel.FailuresAndSuccesses;
    ingestProps.ReportMethod = IngestionReportMethod.Queue;
    ingestProps.JSONMappingReference = mappingName;
    ingestProps.Format = DataSourceFormat.json;

    using (var memStream = new MemoryStream())
    using (var writer = new StreamWriter(memStream))
    {
        var messageString = JsonConvert.SerializeObject(myObject); // maps to the table / mapping
        writer.WriteLine(messageString);
        writer.Flush();
        memStream.Seek(0, SeekOrigin.Begin);

        // Post ingestion message
        ingestClient.IngestFromStream(memStream, ingestProps, leaveOpen: true);
    }
}
The issue is that the mapping you are using in this ingestion command does not match the existing table schema (it has additional columns). In such cases Azure Data Explorer (Kusto) attempts to add the additional columns it finds in the mapping. Since the app only has the 'ingestor' permission, it cannot modify the table structure, and so the ingestion fails.
In your specific case, your table has a column written with a specific casing, and in the ingestion mapping the same column has a different casing (for one character), so it is treated as a new column.
We will look into providing a better error message in this case.
Update: the issue is fixed in the system and now it works as expected.
Avnera, thanks for your hint; potentially it is an issue because of the real vs. double translation. In one of my first tries I used double in the table and that worked. That is no longer possible; it looks like the supported data types have changed.
My current configuration:
.create table PressureRecords ( Timestamp:datetime, DeviceId:guid, Pressure:real )
.create-or-alter table PressureRecords ingestion json mapping "PressureRecords"
'['
'{"column":"TimeStamp","path":"$.DateTime","datatype":"datetime","transform":null},'
'{"column":"DeviceId","path":"$.DeviceId","datatype":"guid","transform":null},'
'{"column":"Pressure","path":"$.Pressure","datatype":"real","transform":null}'
']'
public class PressureRecord
{
    [JsonProperty(PropertyName = "Pressure")]
    public double Pressure { get; set; }

    [JsonProperty(PropertyName = "DateTime")]
    public DateTime DateTime { get; set; } = DateTime.Now;

    [JsonProperty(PropertyName = "DeviceId")]
    [Key]
    public Guid DeviceId { get; set; }
}

Logging long JSON gets trimmed in azure application insights

My goal is to log the users' requests using Azure Application Insights; the requests are converted into JSON format and then saved.
Sometimes the user request can be very long, and it gets trimmed in the Azure Application Insights view, which results in invalid JSON.
Underneath CustomDimensions it looks like this: [screenshot of the trimmed value under CustomDimensions]
I'm using the Microsoft.ApplicationInsights.TelemetryClient namespace.
This is my code:
var properties = new Dictionary<string, string>
{
    { "RequestJSON", requestJSON }
};
TelemetryClientInstance.TrackTrace("some description", SeverityLevel.Verbose, properties);
I'm referring to this overload:
public void TrackTrace(string message, SeverityLevel severityLevel, IDictionary<string, string> properties);
As per Trace telemetry: Application Insights data model, the max length of a custom property value is 8192 characters.
In your case, the value exceeds that limit.
I can think of 2 solutions:
1. Write the requestJSON into the message field when using the TrackTrace method. The trace message max length is 32768 characters, which may meet your need.
2. Split the requestJSON into more than one custom property when its length is larger than 8192. For example, if the length of the requestJSON is 2*8192, you can add 2 custom properties: RequestJSON_1 stores the first 8192 characters and RequestJSON_2 stores the remaining 8192 characters (see the sketch below).
When using solution 2, you can easily use a Kusto query to join RequestJSON_1 and RequestJSON_2 together, so you get the complete, valid JSON data.
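If you go with solution 2, a rough sketch of the splitting could look like the following (the chunking helper and the RequestJSON_ naming are just illustrative, not an official API):
using System;
using System.Collections.Generic;

public static class TracePropertySplitter
{
    // Application Insights custom property value limit.
    private const int MaxPropertyLength = 8192;

    public static Dictionary<string, string> Split(string requestJson)
    {
        var properties = new Dictionary<string, string>();
        for (int i = 0, part = 1; i < requestJson.Length; i += MaxPropertyLength, part++)
        {
            int length = Math.Min(MaxPropertyLength, requestJson.Length - i);
            properties[$"RequestJSON_{part}"] = requestJson.Substring(i, length);
        }
        return properties;
    }
}
The result can then be passed straight to TrackTrace, e.g. TelemetryClientInstance.TrackTrace("some description", SeverityLevel.Verbose, TracePropertySplitter.Split(requestJSON));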

How to get Azure EventHub Depth

My Event Hub has millions of messages ingested every day. I'm processing those messages from an Azure Function and printing the offset and sequence number values in the logs.
public static async Task Run([EventHubTrigger("%EventHub%", Connection = "EventHubConnection", ConsumerGroup = "%EventHubConsumerGroup%")] EventData eventMessage,
    [Inject] ITsfService tsfService, [Inject] ILog log)
{
    log.Info($"PartitionKey {eventMessage.PartitionKey}, Offset {eventMessage.Offset} and SequenceNumber {eventMessage.SequenceNumber}");
}
Log output
PartitionKey , Offset 78048157161248 and SequenceNumber 442995283
Questions
Why is the PartitionKey value blank? I have 2 partitions in that Event Hub.
Is there any way to check the backlog? At some point in time I want to know how many messages my function still needs to process.
Yes, you can include the PartitionContext object as part of the signature, which will give you some additional information:
public static async Task Run([EventHubTrigger("HubName",
Connection = "EventHubConnectionStringSettingName",
ConsumerGroup = "Consumer-Group-If-Applicable")] EventData[] messageBatch, PartitionContext partitionContext, ILogger log)
Edit your host.json and set enableReceiverRuntimeMetric to true, e.g.
"version": "2.0",
"extensions": {
"eventHubs": {
"batchCheckpointFrequency": 100,
"eventProcessorOptions": {
"maxBatchSize": 256,
"prefetchCount": 512,
"enableReceiverRuntimeMetric": true
}
}
}
You now get access to RuntimeInformation on the PartitionContext, which has information about the LastSequenceNumber, and your current message has its own sequence number, so you could use the difference between the two to calculate a metric, e.g. something like:
public class EventStreamBacklogTracing
{
    private static readonly Metric PartitionSequenceMetric =
        InsightsClient.Instance.GetMetric("PartitionSequenceDifference", "PartitionId", "ConsumerGroupName", "EventHubPath");

    public static void LogSequenceDifference(EventData message, PartitionContext context)
    {
        var messageSequence = message.SystemProperties.SequenceNumber;
        var lastEnqueuedSequence = context.RuntimeInformation.LastSequenceNumber;
        var sequenceDifference = lastEnqueuedSequence - messageSequence;

        PartitionSequenceMetric.TrackValue(sequenceDifference, context.PartitionId, context.ConsumerGroupName,
            context.EventHubPath);
    }
}
I wrote an article on Medium that goes into a bit more detail and shows how you might consume the data in Grafana:
https://medium.com/@dylanm_asos/azure-functions-event-hub-processing-8a3f39d2cd0f
Why is the PartitionKey value blank? I have 2 partitions in that Event Hub.
The partition key is not the same as the partition id. When you publish an event to Event Hubs, you can set the partition key. If that partition key is not set, it will be null when you go to consume the event.
The partition key is for scenarios where you don't care which partition an event ends up in, only that events with the same key end up in the same partition.
An example would be if you had hundreds of IoT devices transmitting telemetry data. You don't care what partition these IoT devices publish their data to, as long as it always ends up in the same partition. You may set the partition key to the serial number of the IoT device.
When that device publishes its event data with that key, the Event Hubs service will calculate a hash for that partition key, map it to a specific Event Hub partition, and will route any events with that key to the same partition.
The documentation from "Event Hubs Features: Publishing an Event" depicts it pretty well.
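For completeness, here is a minimal sketch (not from the question; it assumes the Microsoft.Azure.EventHubs client, and the connection string and serial number are placeholders) of publishing an event with an explicit partition key:
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

public static class DeviceTelemetryPublisher
{
    public static async Task SendAsync(string eventHubConnectionString, string deviceSerialNumber, string payloadJson)
    {
        // The connection string is assumed to include the hub name (EntityPath).
        var client = EventHubClient.CreateFromConnectionString(eventHubConnectionString);
        var eventData = new EventData(Encoding.UTF8.GetBytes(payloadJson));

        // Events published with the same partition key are hashed to the same partition.
        await client.SendAsync(eventData, deviceSerialNumber);
        await client.CloseAsync();
    }
}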

SQLInjection against CosmosDB in an Azure function

I have implemented an Azure Function that is triggered by an HttpRequest. A parameter called name is passed as part of the HttpRequest. In the Integration section, I have used the following query to retrieve data from Cosmos DB (as an input binding):
SELECT * FROM c.my_collection pm
WHERE
Contains(pm.first_name,{name})
As you can see, I am sending the 'name' without sanitizing it. Is there any SQL injection concern here?
I searched and noticed that parameterization is available, but that is not something I can do anything about here.
When the binding occurs (the data from the HTTP trigger gets sent to the Cosmos DB input binding), it is passed through a SqlParameterCollection that handles sanitization.
Please view this article:
Parameterized SQL provides robust handling and escaping of user input, preventing accidental exposure of data through “SQL injection”
This will cover any attempt to inject SQL through the name property.
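For reference, a sketch of how the same parameterized binding can be expressed with the C# CosmosDB input binding attribute (this assumes the Microsoft.Azure.WebJobs.Extensions.CosmosDB v3 package; the database, collection and route names are illustrative, not taken from the question):
using System.Collections.Generic;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class GetPeopleByName
{
    [FunctionName("GetPeopleByName")]
    public static IActionResult Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "people/{name}")] HttpRequest req,
        [CosmosDB(
            databaseName: "my_database",
            collectionName: "my_collection",
            ConnectionStringSetting = "CosmosDBConnection",
            SqlQuery = "SELECT * FROM pm WHERE CONTAINS(pm.first_name, {name})")] IEnumerable<dynamic> results,
        ILogger log)
    {
        // {name} is bound from the route and sent to Cosmos DB as a SQL parameter,
        // not concatenated into the query text.
        return new OkObjectResult(results);
    }
}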
If you're using Microsoft.Azure.Cosmos instead of Microsoft.Azure.Documents:
public class MyContainerDbService : IMyContainerDbService
{
    private Container _container;

    public MyContainerDbService(CosmosClient dbClient)
    {
        this._container = dbClient.GetContainer("MyDatabaseId", "MyContainerId");
    }

    public async Task<IEnumerable<MyEntry>> GetMyEntriesAsync(string queryString, Dictionary<string, object> parameters)
    {
        if ((parameters?.Count ?? 0) < 1)
        {
            throw new ArgumentException("Parameters are required to prevent SQL injection.");
        }

        var queryDef = new QueryDefinition(queryString);
        foreach (var parm in parameters)
        {
            queryDef.WithParameter(parm.Key, parm.Value);
        }

        var query = this._container.GetItemQueryIterator<MyEntry>(queryDef);
        List<MyEntry> results = new List<MyEntry>();
        while (query.HasMoreResults)
        {
            var response = await query.ReadNextAsync();
            results.AddRange(response.ToList());
        }
        return results;
    }
}
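A hypothetical call site for the service above could look like this, with the user-supplied value passed as a query parameter rather than concatenated into the SQL string:
// 'myContainerDbService' and 'name' are assumed to be in scope.
var entries = await myContainerDbService.GetMyEntriesAsync(
    "SELECT * FROM c WHERE CONTAINS(c.first_name, @name)",
    new Dictionary<string, object> { { "@name", name } });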

OMS log search does not display all the columns present in the WADETWEventTable Azure diagnostics table

I have a custom event source which has special properties like message, componentName, and priority.
[The custom ETW event source properties get converted into columns of the Azure WADETWEventTable table.]
My idea is to view the logs stored in Azure tables by using Microsoft Operations Management Suite (OMS). I can see the logs, but not all the columns are displayed.
OMS does not display these columns. I am using the code/configuration below:
[EventSource(Name = "CustomEtw.OperationTrace")]
public sealed class CustomEventSource : EventSource
{
    public static CustomEventSource log = new CustomEventSource();

    #region [Custom Event Source]

    [Event(1, Level = EventLevel.Informational)]
    public void Info(string message, string componentName, bool priority)
    {
        WriteEvent(1, message, componentName, priority);
    }

    [Event(2, Level = EventLevel.Warning)]
    public void Warning(string warningData)
    {
        WriteEvent(2, warningData);
    }

    #endregion
}
The custom event source above logs data onto the ETW stream, and the same data is visible in the Azure diagnostics table, i.e. WADETWEventTable. This Azure table has data in the message, ComponentName and Priority columns as well, but OMS doesn't display these columns when we search through log search.
Please help; am I missing any configuration that needs to be done on the OMS side?
Why does OMS display only a few columns?
