Kafka-node suddenly consumes from offset 0

Kafka-node suddenly consumes from offset 0 - node.js

Sometimes, kafka-node consumer starts consuming from offset 0, while its default behavior it to consume only newer messages. Then it will not switch back to its default behavior. Do you know how to solve this and what happens and its behavior suddenly changes? The code is very simple and this happens without altering the code.
var kafka = require("kafka-node") ;
Consumer = kafka.Consumer;
client = new kafka.KafkaClient();
consumer = new Consumer(client, [{ topic: "Topic_23", partition: 0}
]);
consumer.on("message", function(message) {
console.log(message)
});
The only solution I have found so far is to change the kafka topic. Then everything works fine again. Any ideas ?

In Kafka, offsets are not associated to specific consumers but instead, they are linked to the Consumer Groups. In your code, you don't provide the Consumer Group therefore, every time you fire up the consumer, it is being assigned to a different Consumer Group and thus, the offset starts from 0.
The following should do the trick (obviously the first time you are going to read all the messages):
var kafka = require("kafka-node") ;
Consumer = kafka.Consumer;
client = new kafka.KafkaClient();
payload = [{
topic: "Topic_23",
partition: 0
}]
var options = {
groupId: 'test-consumer-group',
fromOffset: 'latest'
};
consumer = new Consumer(client, payload, options);
consumer.on("message", function(message) {
console.log(message)
});

Related

Waiting for leadership elections in KafkaJS

The Situation
I am using kafkajs to write to some dynamically generated kafka topics.
I am finding writing to those topics immediately after registering my producer will regularly cause an error: There is no leader for this topic-partition as we are in the middle of a leadership election.
The full error is:
{"level":"ERROR","timestamp":"2020-08-24T17:48:40.201Z","logger":"kafkajs","message":"[Connection] Response Metadata(key: 3, version: 5)","broker":"localhost:9092","clientId":"tv-kitchen","error":"There is no leader for this topic-partition as we are in the middle of a leadership election","correlationId":1,"size":146}
The Code
Here is the code that is causing the problem:
import kafka from 'myConfiguredKafkaJs'
const run = async () => {
const producer = kafka.producer()
await producer.connect()
producer.send({
topic: 'myRandomTopicString',
messages: [{
value: 'yolo',
}],
})
}
run()
The Question
Two questions:
Is there anything special I should be doing when connecting to the producer (or sending) in order to ensure that logic blocks until the producer is truly ready to send data to a kafka topic?
Is there anything special I should be doing when sending data to the producer in order to ensure that messages are not dropped?

The Solution
Kafkajs offers a createTopics method through the admin client which has an optional waitForLeaders flag:
admin.createTopics({
waitForLeaders: true,
topics: [
{ topic: 'myRandomTopicString123' },
],
}
Using this resolves the problem.
import kafka from 'myConfiguredKafkaJs'
const run = async () => {
const producer = kafka.producer()
const admin = kafka.admin()
await admin.connect()
await producer.connect()
await admin.createTopics({
waitForLeaders: true,
topics: [
{ topic: 'myRandomTopicString123' },
],
})
producer.send({
topic: 'myRandomTopicString',
messages: [{
value: 'yolo',
}],
})
}
run()
Unfortunately this will result in a different error if the topic already existed, but that's a separate question and I suspect that error is more informational than breaking.
{"level":"ERROR","timestamp":"2020-08-24T18:19:48.465Z","logger":"kafkajs","message":"[Connection] Response CreateTopics(key: 19, version: 2)","broker":"localhost:9092","clientId":"tv-kitchen","error":"Topic with this name already exists","correlationId":2,"size":86}
EDIT: the above settings do require that your Kafka instance is properly configured. It is possible to have leadership elections never resolve, in which case KafkaJS will still complain about leadership elections!
In my experience this has been due to situations where a kafka broker was stopped without being de-registered from zookeeper, meaning zookeeper is waiting for a response from something that no longer exists.

How to create kafka topic with partitions in nodejs?

I am using kafka-node link api for creating kafka topics. I did not find how to create a kafka topic with partitions.
var kafka = require('kafka-node'),
Producer = kafka.Producer,
client = new kafka.Client(),
producer = new Producer(client);
// Create topics sync
producer.createTopics(['t','t1'], false, function (err, data) {
console.log(data);
});
// Create topics async
producer.createTopics(['t'], true, function (err, data) {});
producer.createTopics(['t'], function (err, data) {});// Simply omit 2nd arg
how to create kafka topic with partitions in nodejs.

From your node.js app execute the shell script $KAFKA_HOME/bin/kafka-topics.sh —create —topic topicname —partitions 8 —replication-factor 1 —zookeeper localhost:2181
Where $KAFKA_HOME is the location where you installed Kafka

As documentation describes, this method works only when auto.create.topics.enable is set to true:
This method is used to create topics on the Kafka server. It only works when auto.create.topics.enable, on the Kafka server, is set to true. Our client simply sends a metadata request to the server which will auto create topics. When async is set to false, this method does not return until all topics are created, otherwise it returns immediately.
This means that any operation on unknown topic will lead to its creation with default number of partitions configured by num.partitions parameter.
I'm not sure, but maybe one of the node-rdkafka implementations could allow you to call corresponding librdkafka method to create topic?

I am not that sure but I think as per your requirement the code has updated here:- https://github.com/SOHU-Co/kafka-node#createtopicstopics-cb, adding a parameter "replicaAssignment".
// Optional explicit partition / replica assignment
// When this property exists, partitions and replicationFactor properties are ignored
replicaAssignment: [
{
partition: 0,
replicas: [3, 4]
},
{
partition: 1,
replicas: [2, 1]
}
]

The Producer.createTopics takes a partitons option. See https://www.npmjs.com/package/kafka-node#createtopicstopics-cb
Pass an object, rather than a string
producer.createTopics(['t', 't1'], true, function (err, data) {});
becomes
producer.createTopics(
[
{ topic: 't', paritions: 5 },
{ topic: 't1', partitions: 23 },
],
true,
function (err, data) {}
);

can I limit consumption of kafka-node consumer?

It seems like my kafka node consumer:
var kafka = require('kafka-node');
var consumer = new Consumer(client, [], {
...
});
is fetching way too many messages than I can handle in certain cases.
Is there a way to limit it (for example accept no more than 1000 messages per second, possibly using the pause api?)
I'm using kafka-node, which seems to have a limited api comparing to the Java version

In Kafka, poll and process should happen in a coordinated/synchronized way. Ie, after each poll, you should process all received data first, before you do the next poll. This pattern will automatically throttle the number of messages to the max throughput your client can handle.
Something like this (pseudo-code):
while(isRunning) {
messages = poll(...)
for(m : messages) {
process(m);
}
}
(That is the reason, why there is not parameter "fetch.max.messages" -- you just do not need it.)

I had a similar situation where I was consuming messages from Kafka and had to throttle the consumption because my consumer service was dependent on a third party API which had its own constraints.
I used async/queue along with a wrapper of async/cargo called asyncTimedCargo for batching purpose.
The cargo gets all the messages from the kafka-consumer and sends it to queue upon reaching a size limit batch_config.batch_size or timeout batch_config.batch_timeout.
async/queue provides saturated and unsaturated callbacks which you can use to stop the consumption if your queue task workers are busy. This would stop the cargo from filling up and your app would not run out of memory. The consumption would resume upon unsaturation.
//cargo-service.js
module.exports = function(key){
return new asyncTimedCargo(function(tasks, callback) {
var length = tasks.length;
var postBody = [];
for(var i=0;i<length;i++){
var message ={};
var task = JSON.parse(tasks[i].value);
message = task;
postBody.push(message);
}
var postJson = {
"json": {"request":postBody}
};
sms_queue.push(postJson);
callback();
}, batch_config.batch_size, batch_config.batch_timeout)
};
//kafka-consumer.js
cargo = cargo-service()
consumer.on('message', function (message) {
if(message && message.value && utils.isValidJsonString(message.value)) {
var msgObject = JSON.parse(message.value);
cargo.push(message);
}
else {
logger.error('Invalid JSON Message');
}
});
// sms-queue.js
var sms_queue = queue(
retryable({
times: queue_config.num_retries,
errorFilter: function (err) {
logger.info("inside retry");
console.log(err);
if (err) {
return true;
}
else {
return false;
}
}
}, function (task, callback) {
// your worker task for queue
callback()
}), queue_config.queue_worker_threads);
sms_queue.saturated = function() {
consumer.pause();
logger.warn('Queue saturated Consumption paused: ' + sms_queue.running());
};
sms_queue.unsaturated = function() {
consumer.resume();
logger.info('Queue unsaturated Consumption resumed: ' + sms_queue.running());
};

From FAQ in the README
Create a async.queue with message processor and concurrency of one (the message processor itself is wrapped with setImmediate function so it will not freeze up the event loop)
Set the queue.drain to resume() the consumer
The handler for consumer's message event to pause() the consumer and pushes the message to the queue.

As far as I know the API does not have any kind of throttling. But both consumers (Consumer and HighLevelConsumer) have a 'pause()' function. So you could stop consuming if you get to much messages. Maybe that already offers what you need.
Please keep in mind what's happening. You send a fetch request to the broker and get a batch of message back. You can configure the min and max size of the messages (according to the documentation not the number of messages) you want to fetch:
{
....
// This is the minimum number of bytes of messages that must be available to give a response, default 1 byte
fetchMinBytes: 1,
// The maximum bytes to include in the message set for this partition. This helps bound the size of the response.
fetchMaxBytes: 1024 * 1024,
}

I was facing the same issue, initially fetchMaxBytes value was
fetchMaxBytes: 1024 * 1024 * 10 // 10MB
I just chanbed it to
fetchMaxBytes: 1024
It worked very smoothly after the change.

node-kafka pause() method on consumer. Any working version?

I can't make following code to work:
"use strict";
let kafka = require('kafka-node');
var conf = require('./providers/Config');
let client = new kafka.Client(conf.kafka.connectionString, conf.kafka.clientName);
let consumer = new kafka.HighLevelConsumer(client, [ { topic: conf.kafka.readTopic } ], { groupId: conf.kafka.clientName, paused: true });
let threads = 0;
consumer.on('message', function(message) {
threads++;
if (threads > 10) consumer.pause();
if (threads > 50) process.exit(1);
console.log(threads + " >>> " + message.value);
});
consumer.resume();
I see 50 messages in console and process exits by termination statement.
What I'm trying to understand, is that is it my code broken or package broken? Or maybe I'm just doing something wrong? Does anyone was able to make kafka consumer work with pause/resume? I tried several versions of kafka-node, but all of them behave same way. Thanks!

You are already using pause and resume in your code, so obviously they work. ;)
It's because pause doesn't pause the consumption of messages. It pauses the fetching of messages. I'm guessing you already fetched the first 50 in one throw before you receive the first message and call pause.
For kicks, I just tested pause() and resume() in the Node REPL and they work as expected:
var kafka = require('kafka-node');
var client = new kafka.Client('localhost:2181');
var consumer = new kafka.HighLevelConsumer(client, [{topic: 'MyTest'}]);
consumer.on('message', (msg) => { console.log(JSON.stringify(msg)) });
Then I go into another window and run:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic MyTest
And type some stuff, and it shows up in the first window. Then in the first window, I type: consumer.pause(); And type some more in the second window. Nothing appears in the first window. Then I run consumer.resume()in the first window, and the delayed messages appear.
BTW, you should be able to play with the Kafka config property fetch.message.max.bytes and control how many messages can be fetched at one time. For example, if you had fixed-width messages of 500 bytes, set fetch.message.max.bytes to something less than 1000 (but greater than 500!) to only receive a single message per fetch. But note that this might not fix the problem entirely -- I am fairly new to Node, but it is asynchronous, and I suspect a second fetch could get kicked off before you processed the first fetch completely (or at all).

RabbitMQ / AMQP: single queue, multiple consumers for same message?

I am just starting to use RabbitMQ and AMQP in general.
I have a queue of messages
I have multiple consumers, which I would like to do different things with the same message.
Most of the RabbitMQ documentation seems to be focused on round-robin, ie where a single message is consumed by a single consumer, with the load being spread between each consumer. This is indeed the behavior I witness.
An example: the producer has a single queue, and send messages every 2 sec:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });
var count = 1;
connection.on('ready', function () {
var sendMessage = function(connection, queue_name, payload) {
var encoded_payload = JSON.stringify(payload);
connection.publish(queue_name, encoded_payload);
}
setInterval( function() {
var test_message = 'TEST '+count
sendMessage(connection, "my_queue_name", test_message)
count += 1;
}, 2000)
})
And here's a consumer:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });
connection.on('ready', function () {
connection.queue("my_queue_name", function(queue){
queue.bind('#');
queue.subscribe(function (message) {
var encoded_payload = unescape(message.data)
var payload = JSON.parse(encoded_payload)
console.log('Recieved a message:')
console.log(payload)
})
})
})
If I start the consumer twice, I can see that each consumer is consuming alternate messages in round-robin behavior. Eg, I'll see messages 1, 3, 5 in one terminal, 2, 4, 6 in the other.
My question is:
Can I have each consumer receive the same messages? Ie, both consumers get message 1, 2, 3, 4, 5, 6? What is this called in AMQP/RabbitMQ speak? How is it normally configured?
Is this commonly done? Should I just have the exchange route the message into two separate queues, with a single consumer, instead?

Can I have each consumer receive the same messages? Ie, both consumers get message 1, 2, 3, 4, 5, 6? What is this called in AMQP/RabbitMQ speak? How is it normally configured?
No, not if the consumers are on the same queue. From RabbitMQ's AMQP Concepts guide:
it is important to understand that, in AMQP 0-9-1, messages are load balanced between consumers.
This seems to imply that round-robin behavior within a queue is a given, and not configurable. Ie, separate queues are required in order to have the same message ID be handled by multiple consumers.
Is this commonly done? Should I just have the exchange route the message into two separate queues, with a single consumer, instead?
No it's not, single queue/multiple consumers with each consumer handling the same message ID isn't possible. Having the exchange route the message onto into two separate queues is indeed better.
As I don't require too complex routing, a fanout exchange will handle this nicely. I didn't focus too much on Exchanges earlier as node-amqp has the concept of a 'default exchange' allowing you to publish messages to a connection directly, however most AMQP messages are published to a specific exchange.
Here's my fanout exchange, both sending and receiving:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });
var count = 1;
connection.on('ready', function () {
connection.exchange("my_exchange", options={type:'fanout'}, function(exchange) {
var sendMessage = function(exchange, payload) {
console.log('about to publish')
var encoded_payload = JSON.stringify(payload);
exchange.publish('', encoded_payload, {})
}
// Recieve messages
connection.queue("my_queue_name", function(queue){
console.log('Created queue')
queue.bind(exchange, '');
queue.subscribe(function (message) {
console.log('subscribed to queue')
var encoded_payload = unescape(message.data)
var payload = JSON.parse(encoded_payload)
console.log('Recieved a message:')
console.log(payload)
})
})
setInterval( function() {
var test_message = 'TEST '+count
sendMessage(exchange, test_message)
count += 1;
}, 2000)
})
})

The last couple of answers are almost correct - I have tons of apps that generate messages that need to end up with different consumers so the process is very simple.
If you want multiple consumers to the same message, do the following procedure.
Create multiple queues, one for each app that is to receive the message, in each queue properties, "bind" a routing tag with the amq.direct exchange. Change you publishing app to send to amq.direct and use the routing-tag (not a queue). AMQP will then copy the message into each queue with the same binding. Works like a charm :)
Example: Lets say I have a JSON string I generate, I publish it to the "amq.direct" exchange using the routing tag "new-sales-order", I have a queue for my order_printer app that prints order, I have a queue for my billing system that will send a copy of the order and invoice the client and I have a web archive system where I archive orders for historic/compliance reasons and I have a client web interface where orders are tracked as other info comes in about an order.
So my queues are: order_printer, order_billing, order_archive and order_tracking
All have the binding tag "new-sales-order" bound to them, all 4 will get the JSON data.
This is an ideal way to send data without the publishing app knowing or caring about the receiving apps.

Just read the rabbitmq tutorial. You publish message to exchange, not to queue; it is then routed to appropriate queues. In your case, you should bind separate queue for each consumer. That way, they can consume messages completely independently.

Yes each consumer can receive the same messages. have a look at
http://www.rabbitmq.com/tutorials/tutorial-three-python.html
http://www.rabbitmq.com/tutorials/tutorial-four-python.html
http://www.rabbitmq.com/tutorials/tutorial-five-python.html
for different ways to route messages. I know they are for python and java but its good to understand the principles, decide what you are doing and then find how to do it in JS. Its sounds like you want to do a simple fanout (tutorial 3), which sends the messages to all queues connected to the exchange.
The difference with what you are doing and what you want to do is basically that you are going to set up and exchange or type fanout. Fanout excahnges send all messages to all connected queues. Each queue will have a consumer that will have access to all the messages separately.
Yes this is commonly done, it is one of the features of AMPQ.

The send pattern is a one-to-one relationship. If you want to "send" to more than one receiver you should be using the pub/sub pattern. See http://www.rabbitmq.com/tutorials/tutorial-three-python.html for more details.

RabbitMQ / AMQP: single queue, multiple consumers for same message and page refresh.
rabbit.on('ready', function () { });
sockjs_chat.on('connection', function (conn) {
conn.on('data', function (message) {
try {
var obj = JSON.parse(message.replace(/\r/g, '').replace(/\n/g, ''));
if (obj.header == "register") {
// Connect to RabbitMQ
try {
conn.exchange = rabbit.exchange(exchange, { type: 'topic',
autoDelete: false,
durable: false,
exclusive: false,
confirm: true
});
conn.q = rabbit.queue('my-queue-'+obj.agentID, {
durable: false,
autoDelete: false,
exclusive: false
}, function () {
conn.channel = 'my-queue-'+obj.agentID;
conn.q.bind(conn.exchange, conn.channel);
conn.q.subscribe(function (message) {
console.log("[MSG] ---> " + JSON.stringify(message));
conn.write(JSON.stringify(message) + "\n");
}).addCallback(function(ok) {
ctag[conn.channel] = ok.consumerTag; });
});
} catch (err) {
console.log("Could not create connection to RabbitMQ. \nStack trace -->" + err.stack);
}
} else if (obj.header == "typing") {
var reply = {
type: 'chatMsg',
msg: utils.escp(obj.msga),
visitorNick: obj.channel,
customField1: '',
time: utils.getDateTime(),
channel: obj.channel
};
conn.exchange.publish('my-queue-'+obj.agentID, reply);
}
} catch (err) {
console.log("ERROR ----> " + err.stack);
}
});
// When the visitor closes or reloads a page we need to unbind from RabbitMQ?
conn.on('close', function () {
try {
// Close the socket
conn.close();
// Close RabbitMQ
conn.q.unsubscribe(ctag[conn.channel]);
} catch (er) {
console.log(":::::::: EXCEPTION SOCKJS (ON-CLOSE) ::::::::>>>>>>> " + er.stack);
}
});
});

As I assess your case is:
I have a queue of messages (your source for receiving messages, lets name it q111)
I have multiple consumers, which I would like to do different things with the same message.
Your problem here is while 3 messages are received by this queue, message 1 is consumed by a consumer A, other consumers B and C consumes message 2 and 3. Where as you are in need of a setup where rabbitmq passes on the same copies of all these three messages(1,2,3) to all three connected consumers (A,B,C) simultaneously.
While many configurations can be made to achieve this, a simple way is to use the following two step concept:
Use a dynamic rabbitmq-shovel to pickup messages from the desired queue(q111) and publish to a fanout exchange (exchange exclusively created and dedicated for this purpose).
Now re-configure your consumers A,B & C (who were listening to queue(q111)) to listen from this Fanout exchange directly using a exclusive & anonymous queue for each consumer.
Note: While using this concept don't consume directly from the source queue(q111), as messages already consumed wont be shovelled to your Fanout exchange.
If you think this does not satisfies your exact requirement... feel free to post your suggestions :-)

I think you should check sending your messages using the fan-out exchanger. That way you willl receiving the same message for differents consumers, under the table RabbitMQ is creating differents queues for each one of this new consumers/subscribers.
This is the link for see the tutorial example in javascript
https://www.rabbitmq.com/tutorials/tutorial-one-javascript.html

To get the behavior you want, simply have each consumer consume from its own queue. You'll have to use a non-direct exchange type (topic, header, fanout) in order to get the message to all of the queues at once.

If you happen to be using the amqplib library as I am, they have a handy example of an implementation of the Publish/Subscribe RabbitMQ tutorial which you might find handy.

There is one interesting option in this scenario I haven`t found in answers here.
You can Nack messages with "requeue" feature in one consumer to process them in another.
Generally speaking it is not a right way, but maybe it will be good enough for someone.
https://www.rabbitmq.com/nack.html
And beware of loops (when all concumers nack+requeue message)!

Fan out was clearly what you wanted. fanout
read rabbitMQ tutorial:
https://www.rabbitmq.com/tutorials/tutorial-three-javascript.html
here's my example:
Publisher.js:
amqp.connect('amqp://<user>:<pass>#<host>:<port>', async (error0, connection) => {
if (error0) {
throw error0;
}
console.log('RabbitMQ connected')
try {
// Create exchange for queues
channel = await connection.createChannel()
await channel.assertExchange(process.env.EXCHANGE_NAME, 'fanout', { durable: false });
await channel.publish(process.env.EXCHANGE_NAME, '', Buffer.from('msg'))
} catch(error) {
console.error(error)
}
})
Subscriber.js:
amqp.connect('amqp://<user>:<pass>#<host>:<port>', async (error0, connection) => {
if (error0) {
throw error0;
}
console.log('RabbitMQ connected')
try {
// Create/Bind a consumer queue for an exchange broker
channel = await connection.createChannel()
await channel.assertExchange(process.env.EXCHANGE_NAME, 'fanout', { durable: false });
const queue = await channel.assertQueue('', {exclusive: true})
channel.bindQueue(queue.queue, process.env.EXCHANGE_NAME, '')
console.log(" [*] Waiting for messages in %s. To exit press CTRL+C");
channel.consume('', consumeMessage, {noAck: true});
} catch(error) {
console.error(error)
}
});
here is an example i found in the internet. maybe can also help.
https://www.codota.com/code/javascript/functions/amqplib/Channel/assertExchange

You just need to assign different groups to the consumers.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Kafka-node suddenly consumes from offset 0 - node.js

Related

Waiting for leadership elections in KafkaJS

How to create kafka topic with partitions in nodejs?

can I limit consumption of kafka-node consumer?

node-kafka pause() method on consumer. Any working version?

RabbitMQ / AMQP: single queue, multiple consumers for same message?

Categories

Resources