I have a question about managing multiple consumer groups. I created three consumer groups; each one has its own Kafka service, group ID, and topic.
I'm now receiving messages as expected, but I'm wondering whether the following scenario is possible:
create three consumer groups, but receive messages from only one and put the others on pause/hold for now; if its Kafka service goes down, consume messages from the next consumer group, and likewise with the third.
Here's an example of my code:
function createConsumerGroup(topics){
    const ConsumerGroup = kafka.ConsumerGroup;
    //CREATE CONSUMER GROUPS FOR EVERY SERVICE
    for(let i = 0; i < config.kafka_service.length; i++){ //3
        const options = {
            groupId: config.kafka_service[i]['groupId'],
            host: config.kafka_service[i]['zookeeperHost'],
            kafkaHost: config.kafka_service[i]['kafkaHost'],
            sessionTimeout: 15000,
            protocol: ['roundrobin'],
            fromOffset: 'latest'
        };
        //assign all services CG names and create [i] consumer groups!
        let customConsumerGroupName = config.kafka_service[i]['consumerGroupName'];
        customConsumerGroupName = new ConsumerGroup(options, topics);
        customConsumerGroupName.on('connect', (resp) => {
            console.log(`${config.kafka_service[i]['consumerGroupName']} is connected!`);
        });
        if(i > 0){
            //pause all consumers except the FIRST
            customConsumerGroupName.pause();
        }
        customConsumerGroupName.on('message', (message) => {
            console.log(message);
        });
        customConsumerGroupName.on('error', (error) => {
            console.log('consumer group error: ', error);
            //HERE I NEED TO CALL THE SECOND CONSUMER TO STEP UP
            //MAYBE consumerGroup.resume(); ???
        });
    }
}
Hope this is understandable, thanks :)
So it looks like the confusion arises from the name of the Node package's 'ConsumerGroup'. In Kafka terms, the consumer group is controlled solely by the groupId used by each consumer. Consumers with the same groupId will not be given duplicate messages; each topic message is only read by a single consumer. If a consumer goes down, Kafka detects this and gives its partitions to a separate consumer.
The Node 'ConsumerGroup' is really just another Kafka consumer (the new consumer with groups managed by Kafka rather than ZooKeeper, as of Kafka 0.9+).
So the way to leverage a Kafka consumer group with the Node ConsumerGroup would be as follows:
function createConsumerGroup(topics){
    const ConsumerGroup = kafka.ConsumerGroup;
    //CREATE CONSUMER GROUPS FOR EVERY SERVICE
    for(let i = 0; i < config.kafka_service.length; i++){ //3
        const options = {
            groupId: 'SOME_GROUP_NAME',
            host: config.kafka_service[i]['zookeeperHost'],
            kafkaHost: config.kafka_service[i]['kafkaHost'],
            sessionTimeout: 15000,
            protocol: ['roundrobin'],
            fromOffset: 'latest'
        };
        //assign all services CG names and create [i] consumer groups!
        let customConsumerGroupName = config.kafka_service[i]['consumerGroupName'];
        customConsumerGroupName = new ConsumerGroup(options, topics);
        customConsumerGroupName.on('connect', (resp) => {
            console.log(`${config.kafka_service[i]['consumerGroupName']} is connected!`);
        });
        customConsumerGroupName.on('message', (message) => {
            console.log(message);
        });
        customConsumerGroupName.on('error', (error) => {
            console.log('consumer group error: ', error);
            //Error handling logic here, restart the consumer that failed perhaps?
            //Depends on how you want to manage failed consumers.
        });
    }
}
Each instance of Node's ConsumerGroup will be a member of the group 'SOME_GROUP_NAME', and any other consumers created with that same groupId will also act as members of the same Kafka consumer group, regardless of which server they run on, etc.
Consumer groups solve two central scenarios:
1. Scaling
You can increase the number of consumers in a group to handle an increasing rate of messages being produced in the topic(s) the group is consuming (scaling out).
2. Failover
With a group of consumers reading the same topic(s), the group automatically handles the situation where one or more consumers go down.
So, instead of having "stand-by" consumer groups, where you have to handle which ones are active yourself, you just rely on Kafka's built-in failover. Consumers can run in several different containers (even in different data centers), and Kafka will automatically make sure that messages are delivered to the individual consumers, no matter where they are or how many of them are running at any given time.
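As a minimal sketch of this idea (the broker address, topic name, and member labels below are hypothetical): starting the same kafka-node consumer twice with the same groupId, whether in one process, two processes, or two machines, makes both instances members of one consumer group; Kafka splits the topic's partitions between them and rebalances if either dies.
const kafka = require('kafka-node');

function startGroupMember(name) {
    const consumer = new kafka.ConsumerGroup({
        kafkaHost: 'localhost:9092',   // hypothetical broker address
        groupId: 'SOME_GROUP_NAME',    // same groupId => same Kafka consumer group
        sessionTimeout: 15000,
        protocol: ['roundrobin'],
        fromOffset: 'latest'
    }, ['my-topic']);                  // hypothetical topic

    consumer.on('message', (message) => {
        console.log(`${name} got partition ${message.partition}, offset ${message.offset}`);
    });
    consumer.on('error', (error) => console.error(`${name} error:`, error));
}

startGroupMember('member-1');
startGroupMember('member-2'); // stop either member and Kafka reassigns its partitions to the other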
I hope someone can clarify this for me:
I have 2 consumers in the same consumer group; it is my understanding that they should coordinate between themselves, but I am having the issue that both consumers are getting all the messages. My code is pretty simple:
const connectionString = "...";
const eventHubName = "my-hub-dev";
const consumerGroup = "processor";
async function main() {
    const consumerClient = new EventHubConsumerClient(consumerGroup, connectionString, eventHubName);
    const subscription = consumerClient.subscribe({
        processEvents: async (events, context) => {
            for (const event of events) {
                console.log(`Received event...`, event)
            }
        },
    });
}
If I run two instances of this consumer code and publish an event, both instances will receive the event.
So my questions are:
Am I correct in my understanding that only 1 consumer should receive the message?
Is there anything I am missing here?
The EventHubConsumerClient requires a CheckpointStore that facilitates coordination between multiple clients. You can pass this to the EventHubConsumerClient constructor when you instantiate it.
The @azure/eventhubs-checkpointstore-blob package uses Azure Storage Blob to store the metadata required to coordinate multiple consumers using the same consumer group. It also stores checkpoint data: you can call context.updateCheckpoint with an event, and if you stop and start a new receiver, it will continue from the last checkpointed event in the partition that event was associated with.
There's a full sample using @azure/eventhubs-checkpointstore-blob here: https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/eventhub/eventhubs-checkpointstore-blob/samples/javascript/receiveEventsUsingCheckpointStore.js
Clarification: The Event Hubs service doesn't enforce a single owner for a partition when reading from a consumer group unless the client has specified an ownerLevel. The highest ownerLevel "wins". You can set this in the options bag you pass to subscribe, but if you want the CheckpointStore to handle coordination for you it's best not to set it.
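For reference, here is a minimal sketch of wiring up the blob checkpoint store with the question's client (the storage connection string and container name are placeholders you'd supply yourself):
const { EventHubConsumerClient } = require("@azure/event-hubs");
const { ContainerClient } = require("@azure/storage-blob");
const { BlobCheckpointStore } = require("@azure/eventhubs-checkpointstore-blob");

const connectionString = "...";           // Event Hubs connection string
const eventHubName = "my-hub-dev";
const consumerGroup = "processor";
const storageConnectionString = "...";    // placeholder: Azure Storage connection string
const containerName = "checkpoints";      // placeholder: existing blob container

async function main() {
    const containerClient = new ContainerClient(storageConnectionString, containerName);
    const checkpointStore = new BlobCheckpointStore(containerClient);

    // Passing the checkpoint store lets multiple clients in the same consumer
    // group coordinate partition ownership instead of all reading every event.
    const consumerClient = new EventHubConsumerClient(consumerGroup, connectionString, eventHubName, checkpointStore);

    consumerClient.subscribe({
        processEvents: async (events, context) => {
            for (const event of events) {
                console.log("Received event...", event);
            }
            if (events.length > 0) {
                // Record progress so a restarted receiver resumes from here.
                await context.updateCheckpoint(events[events.length - 1]);
            }
        },
        processError: async (err, context) => {
            console.error("Error on partition", context.partitionId, err);
        }
    });
}

main().catch(console.error);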
I am building a simple Node.js API using Express and kafka-node that returns unread messages from the requested Kafka topic and consumer group when an HTTP request is received, and then closes the connection. I don't need or want the consumer to keep waiting for new messages.
In kafka-node, what is the proper way of checking whether the end of the topic has been reached and, if so, closing the connection to the broker and exiting the application in order to prevent new messages being read?
Here's my consumer.js. It's pretty much the same as the example given in the kafka-node documentation.
"use strict";
const kafka = require("kafka-node");
let topicName = "testTopic-01",
groupName = "testGroup-01",
consumerOptions = {
kafkaHost: "localhost: 9092",
groupId: groupName,
sessionTimeout: 15000,
protocol: ["roundrobin"],
fromOffset: "earliest",
encoding: "utf8"
};
const consumerGroup = new kafka.ConsumerGroup(consumerOptions, topicName);
consumerGroup.on("message", message => {
console.log(`Message: ${message.value}`);
});
consumerGroup.on("error", error => {
console.error(error);
});
console.log(`Consumer started on topic ${topicName} on group ${groupName}`);
You can fetch the latest offset of a topic partition by using #Offset (kafka-node's Offset API). By comparing that fetched offset with the offset of the message you just consumed from your assigned topic partition, you know when you have reached the last message in the corresponding topic partition.
Keep in mind that, if you have multiple consumers running in parallel, you should keep track of the topic partitions that your consumer inside the consumer group was assigned to (#fetchCommits).
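A rough sketch of that idea, assuming a single consumer in the group, a local broker, and the topic/group names from the question (fetchLatestOffsets and close are kafka-node APIs; the remaining names are made up):
"use strict";
const kafka = require("kafka-node");

const topicName = "testTopic-01",
    groupName = "testGroup-01";

const consumerGroup = new kafka.ConsumerGroup({
    kafkaHost: "localhost:9092",
    groupId: groupName,
    fromOffset: "earliest",
    encoding: "utf8"
}, topicName);

const offsetClient = new kafka.KafkaClient({ kafkaHost: "localhost:9092" });
const offsetApi = new kafka.Offset(offsetClient);

// Ask the broker for the end-of-log offsets before consuming, then exit once
// every partition has been read up to that point.
offsetApi.fetchLatestOffsets([topicName], (err, offsets) => {
    if (err) return console.error(err);
    const latest = offsets[topicName];          // e.g. { '0': 42, '1': 17 }
    const drained = new Set();

    consumerGroup.on("message", message => {
        console.log(`Message: ${message.value}`);
        // The last existing message in a partition sits at (latest offset - 1).
        if (message.offset >= latest[message.partition] - 1) {
            drained.add(message.partition);
            if (drained.size === Object.keys(latest).length) {
                // close(true, ...) commits the current offsets before closing.
                consumerGroup.close(true, () => process.exit(0));
            }
        }
    });

    consumerGroup.on("error", error => console.error(error));
});

// Note: partitions that are already empty (or fully consumed) never emit a
// message, so a timeout fallback is still needed for a completely drained topic.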
I created a Service Bus queue following the tutorial in the Microsoft documentation. I can send and receive messages; however, only half of my messages make it through. Literally half, only the even ones.
I tried changing the message frequency, but it doesn't change anything. It doesn't matter whether I send a message every 3 seconds or 3 messages per second; I only get half of them on the other end.
I have run the example code in all the possible languages and I have tried using the REST API and batch messaging but no dice.
I also tried using Azure Functions with the specific trigger for Service Bus Queues.
This is the receiving function code:
module.exports = async function(context, mySbMsg) {
    context.log('JavaScript ServiceBus queue trigger function processed message', mySbMsg);
    context.done();
};
And this is the send function code:
module.exports = async function (context, req) {
    context.log('JavaScript HTTP trigger function processed a request.');
    var azure = require('azure-sb');
    var idx = 0;
    function sendMessages(sbService, queueName) {
        var msg = 'Message # ' + (++idx);
        sbService.sendQueueMessage(queueName, msg, function (err) {
            if (err) {
                console.log('Failed Tx: ', err);
            } else {
                console.log('Sent ' + msg);
            }
        });
    }
    var connStr = 'Endpoint=sb://<sbnamespace>.servicebus.windows.net/;SharedAccessKeyName=<keyname>;SharedAccessKey=<key>';
    var queueName = 'MessageQueue';
    context.log('Connecting to ' + connStr + ' queue ' + queueName);
    var sbService = azure.createServiceBusService(connStr);
    sbService.createQueueIfNotExists(queueName, function (err) {
        if (err) {
            console.log('Failed to create queue: ', err);
        } else {
            setInterval(sendMessages.bind(null, sbService, queueName), 2000);
        }
    });
};
I expect to receive most of the sent messages (especially under these conditions of no load at all), but instead I only receive 50%.
My guess is that the reason is that you are only listening to one of two subscriptions on the topic, and it is set up to split the messages between subscriptions. This functionality is used to split workload across multiple services. You can read about topics here: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-messaging-overview
and
https://learn.microsoft.com/en-us/azure/service-bus-messaging/topic-filters
Here is a short description from the above links:
"Partitioning uses filters to distribute messages across several existing topic subscriptions in a predictable and mutually exclusive manner. The partitioning pattern is used when a system is scaled out to handle many different contexts in functionally identical compartments that each hold a subset of the overall data; for example, customer profile information. With partitioning, a publisher submits the message into a topic without requiring any knowledge of the partitioning model. The message then is moved to the correct subscription from which it can then be retrieved by the partition's message handler."
To check this, you can see whether your Service Bus has partitioning turned on, or any other filters. Turning partitioning off should do the trick in your case, I think.
I have this code that loads when my Node back end starts. I just changed it from using a static list of topics for the Kafka consumer to reading an array of topics from the database. What I don't know is how to get the Kafka consumer to reload the list of topics from the database when a user adds one using the React front end. I can make an API call from the front end to the back end, but once I'm in the back end I don't know what to call so that the Kafka consumer re-reads the list from the DB and starts consuming the list of topics, which may have changed because the user added or removed one.
This code uses the node-rdkafka package. I did see a consumer.disconnect function, but I wasn't sure whether that is the best approach here or whether I can just reload my init function.
async init(success, error, services) {
    const client = new pg.Client(config.connectionString);
    var topicsList;
    await consumer.connect();
    await client.connect()
    await client.query('SELECT topic FROM public.job;').then(topics => {
        topicsList = Array.from(Object.keys(topics.rows), k => topics.rows[k].topic)
    });
    await client.end();
    consumer.on('ready', function() {
        //consumer.subscribe(['some_Data', 'other_Data', 'and_more_Data', 'deez_Data', 'etc_Data', 'hooha_Data', 'mine_Data']);
        consumer.subscribe(topicsList);
        consumer.consume();
    })
    consumer.on('data', function(data) {
        const { container } = services;
        success(data, services, 7) //write data to DB
    });
}
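One possible approach, sketched below with no claim that it is the best one: keep a reference to the connected consumer, re-run the database query when the front end calls the API, and swap the subscription with node-rdkafka's unsubscribe()/subscribe(). The refreshTopics function and the Express-style route are hypothetical names, and the behaviour of re-subscribing on an already-consuming consumer is an assumption worth testing.
async function refreshTopics() {
    // Re-read the topic list from Postgres, same query as in init().
    const client = new pg.Client(config.connectionString);
    await client.connect();
    const topics = await client.query('SELECT topic FROM public.job;');
    await client.end();
    const topicsList = topics.rows.map(row => row.topic);

    // Swap the subscription on the already-connected consumer; consume() was
    // already called in init(), so the flowing consumer just keeps running.
    consumer.unsubscribe();
    consumer.subscribe(topicsList);
}

// Called from the back-end API route the React front end hits after a user
// adds or removes a job (route name is made up).
app.post('/api/topics/refresh', async (req, res) => {
    await refreshTopics();
    res.sendStatus(204);
});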
I'm iterating over 400,000 JSON messages that need to be sent from my Node.js Azure Function into Azure Service Bus. The Function is able to create the topic and start publishing messages.
It starts to go through the loop and publish messages. I see a couple of thousand land in the queue before the publish fails with the following error:
{
Error: getaddrinfo ENOTFOUND ABC.servicebus.windows.net ABC.servicebus.windows.net:443 at errnoException (dns.js:53:10) at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:95:26)
code: 'ENOTFOUND',
errno: 'ENOTFOUND',
syscall: 'getaddrinfo',
hostname: 'ABC.servicebus.windows.net',
host: 'ABC.servicebus.windows.net',
port: '443'
}
My code to publish the messages composes a message, and pushes it through the JS API. The body of the message is a small JSON object:
{
    "adult": false,
    "id": 511351,
    "original_title": "Nossa Carne de Carnaval",
    "popularity": 0,
    "video": false
}
The method in my Azure Function that is pushing this message into the Service Bus is as follows:
function publishNewMovies(context, movies) {
    var azure = require('azure');
    var moment = require('moment');
    var topic = process.env["NewMovieTopic"];
    var connectionString = process.env["AZURE_SERVICEBUS_CONNECTION_STRING"];
    context.log("Configuring Service Bus.");
    return new Promise((resolve, reject) => {
        var serviceBusService = azure.createServiceBusService(connectionString);
        serviceBusService.createTopicIfNotExists(topic, function(error) {
            if (error) {
                context.log(`Failed to get the Service Bus topic`);
                reject(error);
            }
            context.log("Service Bus setup.");
            // Delay the receiving of these messages by 5 minutes on any subscriber.
            var scheduledDate = moment.utc();
            scheduledDate.add('5', 'minutes');
            context.log("Sending new movie messages.");
            var message = {
                body: '',
                customProperties: {
                    messageNumber: 0
                },
                brokerProperties: {
                    ScheduledEnqueueTimeUtc: scheduledDate.toString()
                }
            }
            for(index = 0; index < movies.length; index += 40) {
                message.brokerProperties.ScheduledEnqueueTimeUtc = scheduledDate.add('11', 'seconds').toString();
                for(batchIndex = 0; batchIndex < 40; batchIndex++) {
                    var currentIndex = index + batchIndex;
                    if (currentIndex >= movies.length) {
                        break;
                    }
                    message.customProperties.messageNumber = currentIndex;
                    message.body = JSON.stringify(movies[currentIndex]);
                    serviceBusService.sendTopicMessage(topic, message, function(error) {
                        if (error) {
                            context.log(`Failed to send topic message for ${message.body}: ${error}`);
                            reject(error);
                        }
                    })
                }
            }
        });
    });
}
This creates a message that is visible starting 5 minutes from the first Service Bus push. Then I batch send 40 messages for that scheduled time. Once the first batch is done, I schedule another 40 messages 11 seconds into the future. This is because there is another Azure Function that will be written that is going to listen to this Service Bus topic and make 3rd party API requests. I'm rate limited at 40 messages every 10 seconds.
I've read through the Azure Functions documentation on quotas and it looks like I might be hitting the 10,000 topic/queue limit. With this limit, how is someone supposed to push out large quantities of messages into the bus? Setting up multiple namespaces to get around this seems backwards when I'm sending the same message, just with different content - it belongs in the same namespace. The error that I'm receiving doesn't indicate I'm hitting a limit or anything. It sounds like it's failing to find the actual service-bus end-point for some reason.
Is the answer to this to handle partitioning? I can't find documentation on how to handle partitioning with Node.js. Most of their API documentation is in C#, which doesn't translate to Node.js well for me.
Edit to show bus metrics:
[Bus Metrics and Topic Metrics screenshots]
Can you elaborate on why you are actually creating so many topics?
This:
var topic = process.env["NewMovieTopic"];
You can have one topic that receives millions of messages, which are then routed to the individual subscriptions you would add filter criteria to. So there should be no need for so many topics.
Usually topics, subscriptions and queues are created in the management plane (portal, ARM, PowerShell or CLI), while runtime or data operations come from Functions, cloud apps, or VMs. Service Bus can likely handle your volume easily, unless you have a very specific reason for creating this many topics.
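As a hedged sketch of that layout using the same azure-sb package as the send function above (the subscription name, rule name, and SQL filter are made up, and it assumes azure-sb's createRule/deleteRule with a sqlExpressionFilter option):
var azure = require('azure-sb');

var connStr = process.env["AZURE_SERVICEBUS_CONNECTION_STRING"];
var topic = 'NewMovieTopic';            // a single topic for every movie message
var subscription = 'popular-movies';    // hypothetical subscription name
var sbService = azure.createServiceBusService(connStr);

sbService.createTopicIfNotExists(topic, function (err) {
    if (err) return console.log('Failed to create topic: ', err);

    sbService.createSubscription(topic, subscription, function (subErr) {
        if (subErr) console.log('Subscription may already exist: ', subErr);

        // Every new subscription starts with a '$Default' rule that matches all
        // messages; replacing it with a SQL filter means this subscription only
        // receives a subset (the filter expression here is made up for illustration).
        sbService.deleteRule(topic, subscription, '$Default', function () {
            sbService.createRule(topic, subscription, 'popularOnly', {
                sqlExpressionFilter: 'popularity > 0'
            }, function (ruleErr) {
                if (ruleErr) console.log('Failed to create rule: ', ruleErr);
            });
        });
    });

    // Publishing still targets the one topic, exactly like sendTopicMessage above.
    sbService.sendTopicMessage(topic, {
        body: JSON.stringify({ id: 511351, popularity: 7 }),
        customProperties: { popularity: 7 }
    }, function (sendErr) {
        if (sendErr) console.log('Failed Tx: ', sendErr);
    });
});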