Azure EventHub Message Ordering

I have an event hub trigger that I've configured to listen to an event hub for messages. On the sending side, in my JavaScript script, I'm initializing a client as follows:
const client = new EventHubProducerClient(hubConnectionString, hubName);
and initializing a batch as follows:
const batchOptions = {
  partitionKey: sessionId,
};
const batch = await client.createBatch(batchOptions);
and submitting the batch using the same options:
await client.sendBatch(batch, batchOptions);
In my host.json I've configured the eventHub maxBatchSize as follows:
"eventHub": {
"maxBatchSize": 1
},
I'm finding that I can't get the messages in order in the EventHubTrigger implemented in Python. While processing each event in the List[func.EventHubEvent], I'm logging the partition key as follows:
def main(events: List[func.EventHubEvent]):
    for event in events:
        logging.info(f"PartitionId: ${event.partition_key}")
I always seem to get PartitionId: $None, which seems to indicate that the partition key is not being set, and is potentially why the messages are out of order.

It looks like I wasn't passing the value down to the method, so once I fixed that it appeared to be working.
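For reference, here is a minimal sketch of what the fixed sender might look like, assuming sessionId and the connection details from the question are in scope (sendInOrder is a hypothetical helper name). Note that for a batch, the partition key is fixed at createBatch time, so it does not need to be passed again to sendBatch:

const { EventHubProducerClient } = require("@azure/event-hubs");

// Hypothetical helper: threads the session ID down as the partition key
// so all events in the batch land on the same partition, in order.
async function sendInOrder(hubConnectionString, hubName, sessionId, messages) {
  const client = new EventHubProducerClient(hubConnectionString, hubName);
  try {
    const batch = await client.createBatch({ partitionKey: sessionId });
    for (const body of messages) {
      if (!batch.tryAdd({ body })) {
        throw new Error("Event too large to fit in the batch");
      }
    }
    await client.sendBatch(batch);
  } finally {
    await client.close();
  }
}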

Related

Unable to configure Azure Event Hub Producer

I am trying sample code for the Azure Event Hub producer, sending some messages to an Azure event hub.
The event hub and its policy are correctly configured for sending and listening to messages. I am using a .NET Core 3.1 console application. However, the code doesn't move beyond the CreateBatchAsync() call. I tried debugging, and the breakpoint doesn't advance to the next line. I tried try-catch-finally and still no progress. Please guide me on what I am doing wrong here. The event hub on Azure shows some successful incoming requests.
class Program
{
    private const string connectionString = "<event_hub_connection_string>";
    private const string eventHubName = "<event_hub_name>";

    static async Task Main()
    {
        // Create a producer client that you can use to send events to an event hub
        await using (var producerClient = new EventHubProducerClient(connectionString, eventHubName))
        {
            // Create a batch of events
            using EventDataBatch eventBatch = await producerClient.CreateBatchAsync();

            // Add events to the batch. An event is represented by a collection of bytes and metadata.
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("First event")));
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Second event")));
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Third event")));

            // Use the producer client to send the batch of events to the event hub
            await producerClient.SendAsync(eventBatch);
            Console.WriteLine("A batch of 3 events has been published.");
        }
    }
}
The call to CreateBatchAsync is the first operation that needs to create a connection to Event Hubs, so the hang indicates that you're likely experiencing a connectivity or authorization issue.
In the default configuration you're using, the network operation timeout is 60 seconds and up to 3 retries are possible, with some back-off between them.
Because of this, a failure to connect or authorize may take up to roughly 5 minutes (four 60-second attempts plus back-off) before it manifests. That said, the majority of connection errors are not eligible for retries, so the failure would normally surface after roughly 1 minute.
To aid in your debugging, I'd suggest tweaking the default retry policy to speed things up and surface an exception more quickly so that you have the information needed to troubleshoot and make adjustments. The options to do so are discussed in this sample and would look something like:
var connectionString = "<< CONNECTION STRING FOR THE EVENT HUBS NAMESPACE >>";
var eventHubName = "<< NAME OF THE EVENT HUB >>";

var options = new EventHubProducerClientOptions
{
    RetryOptions = new EventHubsRetryOptions
    {
        // Allow the network operation only 15 seconds to complete.
        TryTimeout = TimeSpan.FromSeconds(15),

        // Turn off retries.
        MaximumRetries = 0,
        Mode = EventHubsRetryMode.Fixed,
        Delay = TimeSpan.FromMilliseconds(10),
        MaximumDelay = TimeSpan.FromSeconds(1)
    }
};

await using var producer = new EventHubProducerClient(
    connectionString,
    eventHubName,
    options);
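With retries disabled like this, a connectivity or authorization failure should surface as an exception within the 15-second try timeout rather than after several minutes, and the exception details should point at the root cause.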

correlationid for azure events

What is the correct way to add a correlation-id to Azure events?
Right now, I send the events as follows:
const { EventHubProducerClient } = require("@azure/event-hubs");

const producer = new EventHubProducerClient(connectionString, eventHubName);
const batch = await producer.createBatch();

batch.tryAdd({
  body: {
    foo: "bar"
  }
});

await producer.sendBatch(batch);
Of course, as a workaround, I could just add my own field to the body. However, I suspect that there is a built-in mechanism or default approach to do this.
The latest release exposes a correlationId property on EventData, which corresponds to the correlation-id field of the message properties section of the underlying AMQP message.
One important call-out is that the correlationId is intended to enable tracing of data within an application, such as an event's path from producer to consumer. It has no meaning to the Event Hubs service or within a distributed tracing/AppInsights/OpenTelemetry context.
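To illustrate, here is a minimal sketch of setting it when publishing, under the question's assumption that connectionString and eventHubName are defined; the correlation value is a placeholder:

const { EventHubProducerClient } = require("@azure/event-hubs");

async function main() {
  const producer = new EventHubProducerClient(connectionString, eventHubName);
  const batch = await producer.createBatch();

  // correlationId sits alongside the body and maps to the AMQP
  // correlation-id field, so the body itself stays unchanged.
  batch.tryAdd({
    body: { foo: "bar" },
    correlationId: "my-correlation-id" // placeholder value
  });

  await producer.sendBatch(batch);
  await producer.close();
}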

EventHub with NodeJS SDK - All consumers in ConsumerGroup getting the message

I hope someone can clarify this for me:
I have 2 consumers in the same consumer group. It is my understanding that they should coordinate between them, but I am seeing that both consumers get all the messages. My code is pretty simple:
const connectionString = "...";
const eventHubName = "my-hub-dev";
const consumerGroup = "processor";

async function main() {
  const consumerClient = new EventHubConsumerClient(consumerGroup, connectionString, eventHubName);

  const subscription = consumerClient.subscribe({
    processEvents: async (events, context) => {
      for (const event of events) {
        console.log(`Received event...`, event);
      }
    },
  });
}

main().catch(console.error);
If I run two instances of this consumer code and publish an event, both instances will receive the event.
So my questions are:
Am I correct in my understanding that only 1 consumer should receive the message?
Is there anything I am missing here?
The EventHubConsumerClient requires a CheckpointStore that facilitates coordination between multiple clients. You can pass this to the EventHubConsumerClient constructor when you instantiate it.
The @azure/eventhubs-checkpointstore-blob package uses Azure Storage Blob to store the metadata required to coordinate multiple consumers using the same consumer group. It also stores checkpoint data: you can call context.updateCheckpoint with an event, and if you stop and start a new receiver, it will continue from the last checkpointed event in the partition that event was associated with.
There's a full sample using @azure/eventhubs-checkpointstore-blob here: https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/eventhub/eventhubs-checkpointstore-blob/samples/javascript/receiveEventsUsingCheckpointStore.js
Clarification: The Event Hubs service doesn't enforce a single owner for a partition when reading from a consumer group unless the client has specified an ownerLevel. The highest ownerLevel "wins". You can set this in the options bag you pass to subscribe, but if you want the CheckpointStore to handle coordination for you it's best not to set it.
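Putting that together, here is a sketch of the question's consumer with a blob checkpoint store wired in. The storage connection string and container name are placeholders, and the container is assumed to already exist:

const { EventHubConsumerClient } = require("@azure/event-hubs");
const { ContainerClient } = require("@azure/storage-blob");
const { BlobCheckpointStore } = require("@azure/eventhubs-checkpointstore-blob");

async function main() {
  const containerClient = new ContainerClient("<storage-connection-string>", "<container-name>");
  const checkpointStore = new BlobCheckpointStore(containerClient);

  // Passing the checkpoint store as the fourth argument enables partition
  // load balancing across clients in the same consumer group.
  const consumerClient = new EventHubConsumerClient(
    consumerGroup,
    connectionString,
    eventHubName,
    checkpointStore
  );

  consumerClient.subscribe({
    processEvents: async (events, context) => {
      for (const event of events) {
        console.log(`Received event...`, event);
      }
      if (events.length > 0) {
        // Checkpoint the last event so a restarted receiver resumes from here.
        await context.updateCheckpoint(events[events.length - 1]);
      }
    },
    processError: async (err, context) => {
      console.error(err);
    }
  });
}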

Posting events to Azure Event Hubs using Policy defined on Event Hub Namespace

I have created an Event Hub namespace and 2 event hubs. I defined a Shared Access Policy (SAP) on the Event Hub namespace. However, when I use the connection string defined on the namespace, I am able to send events to only one of the hubs, even though I create the client using the correct event hub name.
async Task SendEvent(string connectionString, string eventHubName)
{
    await using (var producerClient = new EventHubProducerClient(connectionString, eventHubName))
    {
        // Create a batch of events
        using EventDataBatch eventBatch = await producerClient.CreateBatchAsync();

        // entity and entityName come from the caller's context.
        var payload = GetEventModel(entity, entityName);

        // Add the event to the batch. An event is represented by a collection of bytes and metadata.
        eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes(payload.ToString())));

        // Use the producer client to send the batch of events to the event hub
        await producerClient.SendAsync(eventBatch);
        System.Diagnostics.Debug.WriteLine($"Event for {entity} sent to Hub {eventHubName}");
    }
}
The above code is called for sending events to Hub1 and Hub2. When I use the connection string from the SAP defined on the namespace, I can only send events to Hub1 or Hub2, whichever happens to be called first. I am specifying the eventHubName as Hub1 or Hub2 as appropriate.
I call the function SendEvent in my calling code.
The only way I can send to both hubs is to define a SAP on each hub and use that connection string when creating the EventHubProducerClient.
Am I missing something or is this by design?
I did a quick test on my side, and it works well.
Please try the code below, and let me know if it does not meet your needs:
class Program
{
    // The namespace-level SAS.
    private const string connectionString = "Endpoint=sb://yyeventhubns.servicebus.windows.net/;SharedAccessKeyName=mysas;SharedAccessKey=xxxx";

    // I try to send data to the following 2 event hub instances.
    private const string hub1 = "yyeventhub1";
    private const string hub2 = "yyeventhub2";

    static async Task Main()
    {
        await SendEvent(connectionString, hub1);
        await SendEvent(connectionString, hub2);

        Console.WriteLine("**completed**");
        Console.ReadLine();
    }

    // Use async Task rather than async void so the sends can be awaited.
    private static async Task SendEvent(string connectionString, string eventHubName)
    {
        // Create a producer client that you can use to send events to an event hub
        await using (var producerClient = new EventHubProducerClient(connectionString, eventHubName))
        {
            // Create a batch of events
            using EventDataBatch eventBatch = await producerClient.CreateBatchAsync();

            // Add events to the batch. An event is represented by a collection of bytes and metadata.
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("First event: " + eventHubName)));
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Second event: " + eventHubName)));
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Third event: " + eventHubName)));
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Fourth event: " + eventHubName)));
            eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes("Fifth event: " + eventHubName)));

            // Use the producer client to send the batch of events to the event hub
            await producerClient.SendAsync(eventBatch);
            Console.WriteLine("A batch of 5 events has been published to: " + eventHubName);
        }
    }
}
After running the code, I can see the data is sent to both event hub instances.

Proper way of reading messages from Kafka topic and then closing

I am building a simple node.js API using express and kafka-node that returns unread messages from a requested Kafka topic and consumer group when an HTTP request is received, and then closes the connection. I don't need or want the consumer to keep waiting for new messages.
In kafka-node, what is the proper way of checking if the end of the topic has been reached and, if yes, closing the connection to the broker and exiting the application in order to prevent new messages being read?
Here's my consumer.js. It's pretty much the same as the example given in the kafka-node documentation.
"use strict";
const kafka = require("kafka-node");
let topicName = "testTopic-01",
groupName = "testGroup-01",
consumerOptions = {
kafkaHost: "localhost: 9092",
groupId: groupName,
sessionTimeout: 15000,
protocol: ["roundrobin"],
fromOffset: "earliest",
encoding: "utf8"
};
const consumerGroup = new kafka.ConsumerGroup(consumerOptions, topicName);
consumerGroup.on("message", message => {
console.log(`Message: ${message.value}`);
});
consumerGroup.on("error", error => {
console.error(error);
});
console.log(`Consumer started on topic ${topicName} on group ${groupName}`);
You can fetch the latest offset of a topic partition by using kafka-node's Offset class. By comparing that fetched offset with the offset of the message you just consumed, you then know which message is the last one in the corresponding topic partition.
Keep in mind that, if you have multiple consumers running in parallel, you should keep track of the topic partitions that your consumer inside the consumer group was assigned to (see Offset#fetchCommits).
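As a sketch of one way to do this without a separate Offset lookup: kafka-node also exposes a highWaterOffset on each delivered message, so the handler from the question could close once it has caught up. Whether highWaterOffset is populated can depend on the broker and library version, so treat this as an assumption to verify:

consumerGroup.on("message", message => {
  console.log(`Message: ${message.value}`);

  // Offsets are zero-based, so the last message in the partition has
  // offset === highWaterOffset - 1. Close and exit once we reach it.
  if (message.offset + 1 >= message.highWaterOffset) {
    consumerGroup.close(true, () => process.exit(0));
  }
});

With a multi-partition topic this exits as soon as the first partition catches up; you would need to track per-partition completion before closing.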
