I am building a simple Node.js API using express and kafka-node that returns unread messages from the requested Kafka topic and consumer group when an HTTP request is received, and then closes the connection. I don't need or want the consumer to keep waiting for new messages.
In kafka-node, what is the proper way of checking if the end of the topic has been reached and, if so, closing the connection to the broker and exiting the application in order to prevent new messages from being read?
Here's my consumer.js. It's pretty much the same as the example given in the kafka-node documentation.
"use strict";
const kafka = require("kafka-node");
let topicName = "testTopic-01",
  groupName = "testGroup-01",
  consumerOptions = {
    kafkaHost: "localhost:9092",
    groupId: groupName,
    sessionTimeout: 15000,
    protocol: ["roundrobin"],
    fromOffset: "earliest",
    encoding: "utf8"
  };
const consumerGroup = new kafka.ConsumerGroup(consumerOptions, topicName);
consumerGroup.on("message", message => {
  console.log(`Message: ${message.value}`);
});

consumerGroup.on("error", error => {
  console.error(error);
});
console.log(`Consumer started on topic ${topicName} on group ${groupName}`);
You can fetch the latest offset of a topic partition by using the Offset class. By comparing that fetched offset with the offset of the message your consumer has just read, you know whether you have reached the last message in the corresponding topic partition.
Keep in mind that if you have multiple consumers running in parallel, you should keep track of which topic partition your consumer inside the consumer group was assigned to (see Offset#fetchCommits).
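For illustration, a minimal sketch of that approach, assuming a single-partition topic and the topic/group names from the question: the latest offset is fetched first with kafka-node's Offset.fetchLatestOffsets, and the consumer closes itself and exits once it has read up to that offset.

"use strict";
const kafka = require("kafka-node");

const topicName = "testTopic-01";
const kafkaHost = "localhost:9092";

// A separate client is used only to ask the broker for the latest offset.
const client = new kafka.KafkaClient({ kafkaHost });
const offset = new kafka.Offset(client);

offset.fetchLatestOffsets([topicName], (err, offsets) => {
  if (err) return console.error(err);
  // fetchLatestOffsets returns the offset of the *next* message to be written,
  // so the last existing message sits at latest - 1 (partition 0 assumed here).
  const latest = offsets[topicName][0];

  const consumerGroup = new kafka.ConsumerGroup({
    kafkaHost,
    groupId: "testGroup-01",
    sessionTimeout: 15000,
    protocol: ["roundrobin"],
    fromOffset: "earliest",
    encoding: "utf8"
  }, topicName);

  consumerGroup.on("message", message => {
    console.log(`Message: ${message.value}`);
    if (message.offset >= latest - 1) {
      // Caught up with the end of the partition: close the group and exit
      // so no further messages are read.
      consumerGroup.close(true, () => process.exit(0));
    }
  });

  consumerGroup.on("error", error => console.error(error));
});

Note that if the topic is empty, no message event fires; a production version would need to handle that case (and multiple partitions) explicitly.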
I have an Event Hub trigger that I've configured to listen to an event hub for messages. In the JavaScript script on the sending side, I'm initializing a client as follows:
const client = new EventHubProducerClient(hubConnectionString, hubName);
and initializing a batch as follows:
const batchOptions = {
  partitionKey: sessionId,
};
const batch = await client.createBatch(batchOptions);
and submitting the batch using the same options:
await client.sendBatch(batch, batchOptions);
In my host.json I've configured the eventHub maxBatchSize as follows:
"eventHub": {
  "maxBatchSize": 1
},
I'm finding that I can't get the messages in order in the Event Hub trigger implemented in Python. While processing each event in the List[func.EventHubEvent] and logging the partition key as follows:
def main(events: List[func.EventHubEvent]):
    for event in events:
        logging.info(f"PartitionId: ${event.partition_key}")
I always seem to get PartitionId: $None, which seems to indicate that the partition key is not being set, and is potentially why the events are out of order.
It looks like I wasn't passing the value down to the method; once I fixed that, it appeared to be working.
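For illustration, a minimal sketch of that kind of fix; the helper name and structure below are made up, but the point is simply that the sessionId has to reach createBatch, since the partition key of an EventDataBatch is fixed when the batch is created.

const { EventHubProducerClient } = require("@azure/event-hubs");

// Hypothetical helper: the sessionId is passed all the way down to createBatch.
async function sendWithSessionKey(connectionString, hubName, sessionId, bodies) {
  const client = new EventHubProducerClient(connectionString, hubName);

  // The partition key must be supplied here; it cannot be changed afterwards.
  const batch = await client.createBatch({ partitionKey: sessionId });

  for (const body of bodies) {
    batch.tryAdd({ body });
  }

  await client.sendBatch(batch);
  await client.close();
}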
I am able to publish and subscribe to a topic, but the unsubscribe function doesn't seem to be working.
var mqtt = require('mqtt');
var client = mqtt.connect('mqtt://localhost:1883');
var topic = 'home/machine1/lightSensor';
client.on('connect', () => {
  client.unsubscribe(topic, console.log);
});
It's returning null.
From the mqttjs docs:
mqtt.Client#unsubscribe(topic/topic array, [options], [callback])
Unsubscribe from a topic or topics
topic is a String topic or an array of topics to unsubscribe from
options: options of unsubscribe.
properties: object
userProperties: The User Property is allowed to appear multiple times to represent multiple name, value pairs object
callback - function (err), fired on unsuback. An error occurs if client is disconnecting.
The callback is called with an error argument, so logging null is the correct response when the unsubscribe succeeds.
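A minimal sketch making that explicit, with an error check in the callback instead of passing console.log directly (broker URL and topic taken from the question):

const mqtt = require('mqtt');

const client = mqtt.connect('mqtt://localhost:1883');
const topic = 'home/machine1/lightSensor';

client.on('connect', () => {
  client.subscribe(topic, () => {
    client.unsubscribe(topic, (err) => {
      if (err) {
        console.error('unsubscribe failed:', err);
      } else {
        // err is null here, i.e. the unsubscribe succeeded
        console.log(`unsubscribed from ${topic}`);
      }
      client.end();
    });
  });
});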
What is the correct way to add a correlation-id to Azure events?
Right now, I send the events as follows:
const { EventHubProducerClient } = require('@azure/event-hubs');

const producer = new EventHubProducerClient(connectionString, eventHubName);
const batch = await producer.createBatch();
batch.tryAdd({
  body: {
    foo: "bar"
  }
});
await producer.sendBatch(batch);
Of course as a workaround I could just add my own field to the body. However, I suspect that there is a built-in mechanism or default approach to do this.
The latest release exposes a correlationId property on EventData, which corresponds to the correlation-id field of the message properties section of the underlying AMQP message.
One important call-out is that the correlationId is intended to enable tracing of data within an application, such as an event's path from producer to consumer. It has no meaning to the Event Hubs service or within a distributed tracing/AppInsights/OpenTelemetry context.
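A minimal sketch of what that looks like on the producer side, assuming a recent @azure/event-hubs release that exposes correlationId on EventData (the id value itself is just a placeholder):

const { EventHubProducerClient } = require("@azure/event-hubs");

async function sendWithCorrelationId(connectionString, eventHubName) {
  const producer = new EventHubProducerClient(connectionString, eventHubName);
  const batch = await producer.createBatch();

  batch.tryAdd({
    body: { foo: "bar" },
    correlationId: "my-correlation-id" // maps to the AMQP correlation-id property
  });

  await producer.sendBatch(batch);
  await producer.close();
}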
I hope someone can clarify this for me:
I have 2 consumers in the same consumer group; it is my understanding that they should coordinate between themselves, but I am having the issue that both consumers are getting all the messages. My code is pretty simple:
const { EventHubConsumerClient } = require("@azure/event-hubs");

const connectionString = "...";
const eventHubName = "my-hub-dev";
const consumerGroup = "processor";

async function main() {
  const consumerClient = new EventHubConsumerClient(consumerGroup, connectionString, eventHubName);

  const subscription = consumerClient.subscribe({
    processEvents: async (events, context) => {
      for (const event of events) {
        console.log(`Received event...`, event);
      }
    },
    processError: async (err, context) => {
      console.error(err);
    },
  });
}

main();
If I run two instances of this consumer code and publish an event, both instances will receive the event.
So my questions are:
Am I correct in my understanding that only 1 consumer should receive the message?
Is there anything I am missing here?
The EventHubConsumerClient requires a CheckpointStore that facilitates coordination between multiple clients. You can pass this to the EventHubConsumerClient constructor when you instantiate it.
The @azure/eventhubs-checkpointstore-blob package uses Azure Storage Blob to store the metadata required to coordinate multiple consumers using the same consumer group. It also stores checkpoint data: you can call context.updateCheckpoint with an event, and if you stop and start a new receiver, it will continue from the last checkpointed event in the partition that event was associated with.
There's a full sample using @azure/eventhubs-checkpointstore-blob here: https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/eventhub/eventhubs-checkpointstore-blob/samples/javascript/receiveEventsUsingCheckpointStore.js
Clarification: The Event Hubs service doesn't enforce a single owner for a partition when reading from a consumer group unless the client has specified an ownerLevel. The highest ownerLevel "wins". You can set this in the options bag you pass to subscribe, but if you want the CheckpointStore to handle coordination for you it's best not to set it.
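A condensed sketch of what that looks like; the storage connection string and container name are placeholders, and the full sample linked above has the complete version.

const { EventHubConsumerClient } = require("@azure/event-hubs");
const { ContainerClient } = require("@azure/storage-blob");
const { BlobCheckpointStore } = require("@azure/eventhubs-checkpointstore-blob");

const consumerGroup = "processor";
const connectionString = "...";        // Event Hubs connection string
const eventHubName = "my-hub-dev";
const storageConnectionString = "...";  // Azure Storage connection string (placeholder)
const containerName = "checkpoints";    // existing blob container (placeholder)

async function main() {
  const containerClient = new ContainerClient(storageConnectionString, containerName);
  const checkpointStore = new BlobCheckpointStore(containerClient);

  // Passing the checkpoint store lets multiple clients in the same consumer
  // group split the partitions between them instead of all reading everything.
  const consumerClient = new EventHubConsumerClient(
    consumerGroup, connectionString, eventHubName, checkpointStore
  );

  consumerClient.subscribe({
    processEvents: async (events, context) => {
      for (const event of events) {
        console.log(`Received event...`, event);
      }
      if (events.length > 0) {
        // Checkpoint so a restarted receiver resumes after the last processed event.
        await context.updateCheckpoint(events[events.length - 1]);
      }
    },
    processError: async (err, context) => {
      console.error(err);
    },
  });
}

main();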
I have a question about managing multiple consumer groups. I created three consumer groups; every CG has its own Kafka service, group ID, and topic.
Now I'm receiving messages as expected, but I'm wondering if the following scenario is possible:
create three consumer groups, but receive messages from only one and put the others on pause/hold for now; if its Kafka service goes down, consume messages from the next consumer group, and the same with the third.
Here's an example of my code:
function createConsumerGroup(topics){
  const ConsumerGroup = kafka.ConsumerGroup;

  //CREATE CONSUMER GROUPS FOR EVERY SERVICE
  for(let i = 0; i < config.kafka_service.length; i++){ //3
    const options = {
      groupId: config.kafka_service[i]['groupId'],
      host: config.kafka_service[i]['zookeeperHost'],
      kafkaHost: config.kafka_service[i]['kafkaHost'],
      sessionTimeout: 15000,
      protocol: ['roundrobin'],
      fromOffset: 'latest'
    }

    //assign all services CG names and create [i] consumer groups!
    let customConsumerGroupName = config.kafka_service[i]['consumerGroupName'];
    customConsumerGroupName = new ConsumerGroup(options, topics);

    customConsumerGroupName.on('connect', (resp) => {
      console.log(`${config.kafka_service[i]['consumerGroupName']} is connected!`);
    });

    if(i > 0){
      //pause consumers except FIRST
      customConsumerGroupName.pause();
    }

    customConsumerGroupName.on('message', (message) => {
      console.log(message);
    });

    customConsumerGroupName.on('error', (error) => {
      console.log('consumer group error: ', error);
      //HERE I NEED TO CALL SECOND CONSUMER TO STEP UP
      //MAYBE consumerGroup.resume(); ???
    });
  }
}
Hope it sounds understandable, thanks :)
So it looks like the confusion arises because of the name of the Node package's 'ConsumerGroup'. In Kafka terms, the consumer group is controlled solely by the groupId used by each consumer. Consumers with the same groupId will not be given duplicate messages; each topic message is only read by a single consumer. If a consumer goes down, Kafka detects this and gives its partitions to a separate consumer.
The Node 'ConsumerGroup' is really just another Kafka consumer (the new consumer, with groups managed by Kafka rather than ZooKeeper, as of Kafka 0.9+).
So the way to leverage a kafka consumer group with the Node ConsumerGroup would be as follows:
function createConsumerGroup(topics){
  const ConsumerGroup = kafka.ConsumerGroup;

  //CREATE CONSUMER GROUPS FOR EVERY SERVICE
  for(let i = 0; i < config.kafka_service.length; i++){ //3
    const options = {
      groupId: 'SOME_GROUP_NAME',
      host: config.kafka_service[i]['zookeeperHost'],
      kafkaHost: config.kafka_service[i]['kafkaHost'],
      sessionTimeout: 15000,
      protocol: ['roundrobin'],
      fromOffset: 'latest'
    }

    //assign all services CG names and create [i] consumer groups!
    let customConsumerGroupName = config.kafka_service[i]['consumerGroupName'];
    customConsumerGroupName = new ConsumerGroup(options, topics);

    customConsumerGroupName.on('connect', (resp) => {
      console.log(`${config.kafka_service[i]['consumerGroupName']} is connected!`);
    });

    customConsumerGroupName.on('message', (message) => {
      console.log(message);
    });

    customConsumerGroupName.on('error', (error) => {
      console.log('consumer group error: ', error);
      //Error handling logic here, restart the consumer that failed perhaps?
      //Depends on how you want to manage failed consumers.
    });
  }
}
Each instance of Node's ConsumerGroup will be a member of the group 'SOME_GROUP_NAME', and any other consumers created with that same groupId will also act as members of the same Kafka consumer group, regardless of which server they run on, etc.
Consumer groups solve two central scenarios:
1. Scaling
You can increase the number of consumers in a group to handle an increasing rate of messages being produced in the topic(s) the group is consuming (scaling out).
2. Failover
By having a group of consumers reading the same topic(s), they will automatically handle the situation where one or more consumer(s) go down.
So, instead of having "stand-by" consumer groups, where you have to handle which ones are active yourself, you just rely on Kafka's built-in failover. Consumers can run in several different containers (even in different data centers), and Kafka will automatically make sure that messages are delivered to the individual consumers, no matter where they are or how many of them are running at any given time.
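As a concrete illustration of that built-in failover, here is a minimal sketch; the topic name, group ID, and broker address are made up for the example. Run the same script in two or more processes: because they share a groupId, Kafka splits the partitions between them, and if one process dies the remaining members are rebalanced onto its partitions.

const kafka = require('kafka-node');

// All instances share the same groupId, so Kafka treats them as one consumer group.
const consumer = new kafka.ConsumerGroup(
  {
    kafkaHost: 'localhost:9092',
    groupId: 'orders-processing-group',
    sessionTimeout: 15000,
    protocol: ['roundrobin'],
    fromOffset: 'latest'
  },
  'orders' // example topic with multiple partitions
);

consumer.on('message', (message) => {
  console.log(`partition ${message.partition}: ${message.value}`);
});

consumer.on('error', (error) => {
  console.log('consumer group error: ', error);
});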