How to get Azure EventHub Depth - azure

My Event Hub ingests millions of messages every day. I'm processing those messages from an Azure Function and printing the offset and sequence number values in the logs.
public static async Task Run([EventHubTrigger("%EventHub%", Connection = "EventHubConnection", ConsumerGroup = "%EventHubConsumerGroup%")]EventData eventMessage,
    [Inject]ITsfService tsfService, [Inject]ILog log)
{
    log.Info($"PartitionKey {eventMessage.PartitionKey}, Offset {eventMessage.Offset} and SequenceNumber {eventMessage.SequenceNumber}");
}
Log output
PartitionKey , Offset 78048157161248 and SequenceNumber 442995283
Questions
Why is the PartitionKey value blank? I have 2 partitions in that Event Hub.
Is there any way to check the backlog? At some point I want to know how many messages my function still needs to process.

Yes, you can include the PartitionContext object as part of the signature, which will give you some additional information:
public static async Task Run([EventHubTrigger("HubName",
    Connection = "EventHubConnectionStringSettingName",
    ConsumerGroup = "Consumer-Group-If-Applicable")] EventData[] messageBatch, PartitionContext partitionContext, ILogger log)
Edit your host.json and set enableReceiverRuntimeMetric to true, e.g.
"version": "2.0",
"extensions": {
"eventHubs": {
"batchCheckpointFrequency": 100,
"eventProcessorOptions": {
"maxBatchSize": 256,
"prefetchCount": 512,
"enableReceiverRuntimeMetric": true
}
}
}
You now get access to RuntimeInformation on the PartitionContext, which includes the LastSequenceNumber of the partition, and your current message has its own sequence number, so you can use the difference between the two to calculate a backlog metric, e.g. something like:
public class EventStreamBacklogTracing
{
    private static readonly Metric PartitionSequenceMetric =
        InsightsClient.Instance.GetMetric("PartitionSequenceDifference", "PartitionId", "ConsumerGroupName", "EventHubPath");

    public static void LogSequenceDifference(EventData message, PartitionContext context)
    {
        // Difference between the last enqueued sequence number on the partition
        // and the sequence number currently being processed = backlog for that partition.
        var messageSequence = message.SystemProperties.SequenceNumber;
        var lastEnqueuedSequence = context.RuntimeInformation.LastSequenceNumber;
        var sequenceDifference = lastEnqueuedSequence - messageSequence;

        PartitionSequenceMetric.TrackValue(sequenceDifference, context.PartitionId, context.ConsumerGroupName,
            context.EventHubPath);
    }
}
I wrote an article on Medium that goes into a bit more detail and shows how you might consume the data in Grafana:
https://medium.com/@dylanm_asos/azure-functions-event-hub-processing-8a3f39d2cd0f

Why is the PartitionKey value blank? I have 2 partitions in that Event Hub.
The partition key is not the same as the partition IDs. When you publish an event to Event Hubs, you can set the partition key; if it is not set, it will be null when you consume the event.
The partition key is for scenarios where you don't care which partition an event ends up in, only that events with the same key always land in the same partition.
An example would be if you had hundreds of IoT devices transmitting telemetry data. You don't care what partition these IoT devices publish their data to, as long as it always ends up in the same partition. You may set the partition key to the serial number of the IoT device.
When that device publishes its event data with that key, the Event Hubs service will calculate a hash for that partition key, map it to a specific Event Hub partition, and will route any events with that key to the same partition.
The documentation from "Event Hubs Features: Publishing an Event" depicts it pretty well.
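For illustration, here is a minimal sketch of setting a partition key at publish time, using the newer Azure.Messaging.EventHubs producer (the connection string, hub name and key values are placeholders, not taken from the question):
using System.Text;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

// Placeholder connection string and hub name; run inside an async method.
await using var producer = new EventHubProducerClient(connectionString, "device-telemetry");

// All events sent with the same partition key are hashed to the same partition.
var options = new SendEventOptions { PartitionKey = "device-serial-0042" };
await producer.SendAsync(
    new[] { new EventData(Encoding.UTF8.GetBytes("{\"temperature\":21.5}")) },
    options);
On the consuming side, that value is what shows up in eventMessage.PartitionKey; if the publisher never sets it, the property stays null.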

Related

Partial Data Being Ingested To Azure Data Explorer From Event Hub

I currently have Azure Data Explorer set up to ingest data from Event Hub. For some reason unknown to me, my ingestion table is only seeing about 45% of events. I am testing this by sending 100 events to the event hub one at a time. I know my event hub is receiving these events because I set up a SQL table to also ingest them, and that table is receiving 100% of them (under a separate consumer group). My assumption is that I have set up my Azure Data Explorer table incorrectly.
I have a very basic object I am sending:
public class TestDocument
{
    [JsonProperty("DocumentId")]
    public string DocumentId { get; set; }

    [JsonProperty("Title")]
    public string Title { get; set; }
}
I have enabled streaming ingestion in Azure
Azure Data Explorer > Configurations > Streaming ingestion (ON)
I have enabled streaming ingestion in my table
.alter table TestTable policy streamingingestion enable
My Table mapping is as follows
.alter table TestTable ingestion json mapping "TestTable_mapping" '[{"column":"DocumentId","datatype":"string","Path":"$[\'DocumentId\']"},{"column":"Title","datatype":"string","Path":"$[\'Title\']"}]'
My data connection settings
Consumer group: Its own group
Event system properties: 0
Table name: TestTable
Data format: JSON
Mapping name: TestTable_mapping
Is there something I am missing here? Consistently, out of 100 events sent, I only see about 45-48 get ingested in my table.
EDIT:
Json payload of TestDocument
{"DocumentId":"10","Title":"TEST"}
Found out what was happening: I was adding a BOM to my serialized object, and it looks like ADX has issues with it. When I serialized my object without a BOM, all of the data flowed from Event Hub to ADX.
Here's a sample of how I am doing it:
private static readonly JsonSerializer Serializer;

static SerializationHelper()
{
    Serializer = JsonSerializer.Create(SerializationSettings);
}

public static void Serialize(Stream stream, object toSerialize)
{
    // Encoding.UTF8 writes a UTF-8 byte order mark (BOM) at the start of the stream.
    using var streamWriter = new StreamWriter(stream, Encoding.UTF8, DefaultStreamBufferSize, true);
    using var jsonWriter = new JsonTextWriter(streamWriter);
    Serializer.Serialize(jsonWriter, toSerialize);
}
What fixed it:
public static void Serialize(Stream stream, object toSerialize)
{
    // new UTF8Encoding(false) suppresses the BOM.
    using var streamWriter = new StreamWriter(stream, new UTF8Encoding(false), DefaultStreamBufferSize, true);
    using var jsonWriter = new JsonTextWriter(streamWriter);
    Serializer.Serialize(jsonWriter, toSerialize);
}

Logging long JSON gets trimmed in azure application insights

My goal is to log the users' requests using Azure Application Insights; the requests are converted into JSON format and then saved.
Sometimes a user request can be very long, and it gets trimmed in the Azure Application Insights view, which results in invalid JSON.
Under customDimensions, the logged RequestJSON value appears cut off.
I'm using the Microsoft.ApplicationInsights.TelemetryClient namespace.
This is my code:
var properties = new Dictionary<string, string>
{
    { "RequestJSON", requestJSON }
};

TelemetryClientInstance.TrackTrace("some description", SeverityLevel.Verbose, properties);
I'm referring to this overload:
public void TrackTrace(string message, SeverityLevel severityLevel, IDictionary<string, string> properties);
As per Trace telemetry: Application Insights data model, the max value length for custom properties is 8,192 characters.
In your case, the value exceeds that limit.
I can think of 2 solutions:
1. Write the requestJSON into the message field when calling TrackTrace. The trace message max length is 32,768 characters, which may meet your need.
2. Split the requestJSON into more than one custom property when its length is larger than 8,192. For example, if the length of the requestJSON is 2 * 8192, you can add 2 custom properties: RequestJSON_1 stores the first 8,192 characters and RequestJSON_2 stores the remaining 8,192 characters (see the sketch below).
With solution 2, you can easily use a Kusto query to join RequestJSON_1 and RequestJSON_2 back together, giving you the complete, valid JSON.
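A minimal sketch of solution 2, reusing the names from the question (the chunking loop and the RequestJSON_N property names are only illustrative):
const int MaxPropertyLength = 8192;

var properties = new Dictionary<string, string>();
for (int i = 0, part = 1; i < requestJSON.Length; i += MaxPropertyLength, part++)
{
    // Each chunk stays within the 8,192-character custom property limit.
    int length = Math.Min(MaxPropertyLength, requestJSON.Length - i);
    properties[$"RequestJSON_{part}"] = requestJSON.Substring(i, length);
}

TelemetryClientInstance.TrackTrace("some description", SeverityLevel.Verbose, properties);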

Azure service bus implementation

Basically, my 'Maintopic' topic receives three types of XML files ('TEST', 'DEV', 'PROD').
'MainSubscription' subscribes to that topic, and based on the XML file type I need to route the XML files to the respective child topics.
See the message flow below:
Maintopic --> MainSubscription (filter on XML node type) --> child topic 1 (XML node type = 'TEST')
                                                             child topic 2 (XML node type = 'DEV')
                                                             child topic 3 (XML node type = 'PROD')
I can add a subscription to the 'Maintopic', but where can I define all the filter logic for routing the files?
I am new to the Azure cloud; how can I do this? I don't even know where to start.
Service Bus supports three filter conditions:
Boolean filters - The TrueFilter and FalseFilter either cause all arriving messages (true) or none of the arriving messages (false) to be selected for the subscription.
SQL Filters - A SqlFilter holds a SQL-like conditional expression that is evaluated in the broker against the arriving messages' user-defined properties and system properties.
Correlation Filters - A CorrelationFilter holds a set of conditions that are matched against one or more of an arriving message's user and system properties.
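For example, a correlation filter created with the same (older) Microsoft.ServiceBus NamespaceManager approach used below might look like this; the subscription name and Label value are only illustrative:
if (!namespaceManager.SubscriptionExists(topicName, "HighPrioritySub"))
{
    // Matches messages whose Label system property equals "high-priority".
    namespaceManager.CreateSubscription(topicName, "HighPrioritySub",
        new CorrelationFilter { Label = "high-priority" });
}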
You must create a filtered subscription that will only receive the messages you are interested in.
A filter can be based on any property of the BrokeredMessage except the body, since that would require every message to be deserialized in order to be handed to the correct subscriptions. You can use a SQL filter.
An example of a SQL filter is below:
if (!namespaceManager.SubscriptionExists(topicName, filteredSubName1))
{
    namespaceManager.CreateSubscription(topicName, filteredSubName1, new SqlFilter("From LIKE '%Smith'"));
}
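For the scenario in the question, a rough sketch along the same lines: this assumes the sender stamps each message with a user-defined property such as "FileType" (a filter cannot inspect the XML body), and the subscription names are placeholders:
foreach (var fileType in new[] { "TEST", "DEV", "PROD" })
{
    // One filtered subscription per file type, e.g. "Sub-TEST", "Sub-DEV", "Sub-PROD".
    var subName = "Sub-" + fileType;
    if (!namespaceManager.SubscriptionExists(topicName, subName))
    {
        namespaceManager.CreateSubscription(topicName, subName,
            new SqlFilter("FileType = '" + fileType + "'"));
    }
}
The sender would then set message.Properties["FileType"] = "TEST" (or "DEV"/"PROD") before calling client.Send, just like the "From" property in the example below.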
You don't send your messages directly to a subscription; you send them to the topic, and the topic forwards them to all relevant subscriptions based on their filters. Below is an example:
var message1 = new BrokeredMessage("Second message");
message1.Properties["From"] = "Alan Smith";
var client = TopicClient.CreateFromConnectionString(connectionString, topicName);
client.Send(message1);
Below is how you receive messages:
var subClient = SubscriptionClient.CreateFromConnectionString(connectionString, topicName, subscriptionName);
var received = subClient.ReceiveBatch(10, TimeSpan.FromSeconds(5));

foreach (var message in received)
{
    Console.WriteLine("{0} '{1}' Label: '{2}' From: '{3}'",
        subscriptionName,
        message.GetBody<string>(),
        message.Label,
        message.Properties["From"]);
}

Spring Aggregation Group

I created an aggregator service as below:
@EnableBinding(Processor.class)
class Configuration {

    @Autowired
    Processor processor;

    @ServiceActivator(inputChannel = Processor.INPUT)
    @Bean
    public MessageHandler aggregator() {
        AggregatingMessageHandler aggregatingMessageHandler =
                new AggregatingMessageHandler(new DefaultAggregatingMessageGroupProcessor(),
                        new SimpleMessageStore(10));

        //AggregatorFactoryBean aggregatorFactoryBean = new AggregatorFactoryBean();
        //aggregatorFactoryBean.setMessageStore();
        aggregatingMessageHandler.setOutputChannel(processor.output());
        //aggregatorFactoryBean.setDiscardChannel(processor.output());
        aggregatingMessageHandler.setSendPartialResultOnExpiry(true);
        aggregatingMessageHandler.setSendTimeout(1000L);
        aggregatingMessageHandler.setCorrelationStrategy(new ExpressionEvaluatingCorrelationStrategy("requestType"));
        aggregatingMessageHandler.setReleaseStrategy(new MessageCountReleaseStrategy(3)); //ExpressionEvaluatingReleaseStrategy("size() == 5")
        aggregatingMessageHandler.setExpireGroupsUponCompletion(true);
        aggregatingMessageHandler.setGroupTimeoutExpression(new ValueExpression<>(3000L)); //size() ge 2 ? 5000 : -1
        aggregatingMessageHandler.setExpireGroupsUponTimeout(true);

        return aggregatingMessageHandler;
    }
}
Now I want to release the group as soon as a new group is created, so that I only have one group at a time.
To be more specific, I receive two types of requests, 'PUT' and 'DEL'. I want to keep aggregating per the above rules, but as soon as I receive a request type other than the one I am aggregating, I want to release the current group and start aggregating the new type.
The reason I want to do this is that these requests are sent to another party that doesn't support having PUT and DEL requests at the same time, and I can't delay any DEL request because the sequence between PUT and DEL is important.
I understand that I need to create a custom release POJO, but will I be able to check the current groups?
For example, if I receive 6 messages like below:
PUT PUT PUT DEL DEL PUT
they should be aggregated as below:
3 PUT
2 DEL
1 PUT
OK. Thank you for sharing more info.
Yes, your custom ReleaseStrategy can check that message type and return true to trigger group completion.
As long as you have only a static correlation key, there is only one group in the store. When your message reaches the ReleaseStrategy, there is no magic beyond checking the current group for a completion signal. Since there are no other groups in the store, no complex release logic is needed.
You should add expireGroupsUponCompletion = true to let the group be removed after completion, so the next message forms a new group for the same correlation key.
UPDATE
Thank you for further info!
So, yes, your original PoC is good. And even a static correlation key is fine, since you are just going to collect incoming messages into batches.
Your custom ReleaseStrategy should analyze the MessageGroup for a message with a different key and return true in that case.
The custom MessageGroupProcessor should filter the message with the different key out of the output List and send that message back to the aggregator, to let it form a new group in a sequence for its own key.
I ended up implementing the ReleaseStrategy below, as I found it simpler than removing the message and queuing it again.
class MessageCountAndOnlyOneGroupReleaseStrategy implements org.springframework.integration.aggregator.ReleaseStrategy {

    private final int threshold;
    private final MessageGroupProcessor messageGroupProcessor;

    private MessageGroup currentGroup;

    public MessageCountAndOnlyOneGroupReleaseStrategy(int threshold, MessageGroupProcessor messageGroupProcessor) {
        super();
        this.threshold = threshold;
        this.messageGroupProcessor = messageGroupProcessor;
    }

    @Override
    public boolean canRelease(MessageGroup group) {
        if (currentGroup == null)
            currentGroup = group;

        // A message with a new correlation key arrived: flush the current group first.
        if (!group.getGroupId().equals(currentGroup.getGroupId())) {
            messageGroupProcessor.processMessageGroup(currentGroup);
            currentGroup = group;
            return false;
        }

        return group.size() >= this.threshold;
    }
}
Note that I used new HeaderAttributeCorrelationStrategy("request_type") instead of just FOO for the CorrelationStrategy.

How can I get information about who is the primary node of the distributed queue and who is the backup node?

I added some data to a distributed queue and I am wondering how I can find out which node is the primary node for the queue and which is the backup node.
Thanks,
Bill
A GridGain cache queue is distributed, i.e. different elements can be stored on different nodes. If your cache has backups, then each element will be duplicated on two or more nodes. So there is no way to determine the primary or backup node for a non-collocated queue.
If the queue is collocated, all its items will be stored on one node (this can be used if you have many small queues instead of one large queue). In this case, you can get the primary and backup nodes for the queue by passing the queue name to the cache affinity, like this:
// Create collocated queue (3rd parameter is true).
GridCacheQueue<String> queue = cache.dataStructures().queue("MyQueue", 100, true, true);
// Get primary and backup nodes using cache affinity.
Iterator<GridNode> nodes = cache.affinity().mapKeyToPrimaryAndBackups("MyQueue").iterator();
// First element in collection is always the primary node.
GridNode primary = nodes.next();
// Other nodes in collection are backup nodes.
GridNode backup1 = nodes.next();
GridNode backup2 = nodes.next();
You don't see anything during iteration over the cache because queue elements are internal entries, so they are accessible only via the GridCacheQueue API, not via the GridCache API. Here is an example:
// Create or get a queue.
GridCacheQueue<String> queue = cache.dataStructures().queue("MyQueue", 100, false, true);
for (String item : queue)
System.out.println(item);
As far as I know, the distributed queue is based on a GridGain cache. However, when I run the following code I get an empty cache.
GridCache<Object, Object> cache = grid.cache("partitioned_tx");
GridCacheDataStructures dataStruct = cache.dataStructures();
GridCacheQueue<String> queue = dataStruct.queue("myQueueName", 0, false, true);

for (int i = 0; i < 20; i++) {
    queue.add("Value-" + i);
}

GridCacheAffinity<Object> affinity = cache.affinity();
int part;
Collection<GridNode> nodes;

for (Object key : cache.keySet()) {
    System.out.println("key=" + key.toString());
    part = affinity.partition(key);
    nodes = affinity.mapPartitionToPrimaryAndBackups(part);
    for (GridNode node : nodes) {
        System.out.println("key of " + key.toString() + " is primary: " + affinity.isPrimary(node, key));
        System.out.println("key of " + key.toString() + " is backup: " + affinity.isBackup(node, key));
    }
}
