How to create a Kafka topic with partitions in Node.js?

I am using the kafka-node API to create Kafka topics, but I did not find how to create a topic with a specific number of partitions.
var kafka = require('kafka-node'),
    Producer = kafka.Producer,
    client = new kafka.Client(),
    producer = new Producer(client);

// Create topics sync
producer.createTopics(['t', 't1'], false, function (err, data) {
    console.log(data);
});

// Create topics async
producer.createTopics(['t'], true, function (err, data) {});
producer.createTopics(['t'], function (err, data) {}); // Simply omit 2nd arg
How do I create a Kafka topic with partitions in Node.js?

From your Node.js app, execute the shell script:
$KAFKA_HOME/bin/kafka-topics.sh --create --topic topicname --partitions 8 --replication-factor 1 --zookeeper localhost:2181
where $KAFKA_HOME is the directory where you installed Kafka.
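If you go that route, a minimal sketch of shelling out from Node.js with child_process might look like this (the topic name, partition count, and ZooKeeper address are placeholders from the command above; adjust them to your setup):

// Hedged sketch: invoke kafka-topics.sh from Node.js.
// Assumes KAFKA_HOME is set in the environment and ZooKeeper runs on localhost:2181.
var execFile = require('child_process').execFile;

var script = process.env.KAFKA_HOME + '/bin/kafka-topics.sh';
execFile(script, [
    '--create',
    '--topic', 'topicname',
    '--partitions', '8',
    '--replication-factor', '1',
    '--zookeeper', 'localhost:2181'
], function (err, stdout, stderr) {
    if (err) {
        return console.error('Topic creation failed:', stderr);
    }
    console.log(stdout);
});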

As the documentation describes, this method only works when auto.create.topics.enable is set to true on the Kafka server:
This method is used to create topics on the Kafka server. It only works when auto.create.topics.enable, on the Kafka server, is set to true. Our client simply sends a metadata request to the server which will auto create topics. When async is set to false, this method does not return until all topics are created, otherwise it returns immediately.
This means that any operation on an unknown topic will lead to its creation with the default number of partitions, configured by the num.partitions parameter.
I'm not sure, but maybe one of the node-rdkafka implementations would allow you to call the corresponding librdkafka method to create a topic?
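For what it's worth, recent versions of node-rdkafka do ship an AdminClient with a createTopic method; a minimal sketch, assuming a broker on localhost:9092 (verify against the node-rdkafka README for your installed version):

// Hedged sketch: create a topic via node-rdkafka's AdminClient.
// The broker address, client id, and topic settings here are assumptions.
const Kafka = require('node-rdkafka');

const admin = Kafka.AdminClient.create({
    'client.id': 'topic-admin',
    'metadata.broker.list': 'localhost:9092'
});

admin.createTopic({
    topic: 'my-topic',
    num_partitions: 8,
    replication_factor: 1
}, (err) => {
    if (err) {
        console.error(err);
    }
    admin.disconnect();
});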

I am not sure, but I think the library has been updated to support your requirement: https://github.com/SOHU-Co/kafka-node#createtopicstopics-cb now documents a "replicaAssignment" parameter.
// Optional explicit partition / replica assignment
// When this property exists, partitions and replicationFactor properties are ignored
replicaAssignment: [
    {
        partition: 0,
        replicas: [3, 4]
    },
    {
        partition: 1,
        replicas: [2, 1]
    }
]
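Putting that fragment into a complete call, a sketch using kafka-node's Admin client might look like the following (the broker address and replica assignments are assumptions; this requires a kafka-node version new enough to document the Admin client):

// Hedged sketch: create a topic with an explicit replica assignment
// through kafka-node's Admin client.
var kafka = require('kafka-node');
var client = new kafka.KafkaClient({ kafkaHost: 'localhost:9092' });
var admin = new kafka.Admin(client);

admin.createTopics([
    {
        topic: 'assigned-topic',
        // partitions and replicationFactor are ignored when
        // replicaAssignment is present
        replicaAssignment: [
            { partition: 0, replicas: [3, 4] },
            { partition: 1, replicas: [2, 1] }
        ]
    }
], function (err, result) {
    console.log(err || result);
});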

The Producer.createTopics method takes a partitions option. See https://www.npmjs.com/package/kafka-node#createtopicstopics-cb
Pass an object rather than a string:
producer.createTopics(['t', 't1'], true, function (err, data) {});
becomes
producer.createTopics(
    [
        { topic: 't', partitions: 5 },
        { topic: 't1', partitions: 23 },
    ],
    true,
    function (err, data) {}
);

Related

Waiting for leadership elections in KafkaJS

The Situation
I am using kafkajs to write to some dynamically generated Kafka topics.
I find that writing to those topics immediately after connecting my producer regularly causes an error: There is no leader for this topic-partition as we are in the middle of a leadership election.
The full error is:
{"level":"ERROR","timestamp":"2020-08-24T17:48:40.201Z","logger":"kafkajs","message":"[Connection] Response Metadata(key: 3, version: 5)","broker":"localhost:9092","clientId":"tv-kitchen","error":"There is no leader for this topic-partition as we are in the middle of a leadership election","correlationId":1,"size":146}
The Code
Here is the code that is causing the problem:
import kafka from 'myConfiguredKafkaJs'

const run = async () => {
    const producer = kafka.producer()
    await producer.connect()

    producer.send({
        topic: 'myRandomTopicString',
        messages: [{
            value: 'yolo',
        }],
    })
}

run()
The Question
Two questions:
Is there anything special I should be doing when connecting to the producer (or sending) in order to ensure that logic blocks until the producer is truly ready to send data to a kafka topic?
Is there anything special I should be doing when sending data to the producer in order to ensure that messages are not dropped?
The Solution
KafkaJS offers a createTopics method through the admin client, which has an optional waitForLeaders flag:
admin.createTopics({
    waitForLeaders: true,
    topics: [
        { topic: 'myRandomTopicString123' },
    ],
})
Using this resolves the problem.
import kafka from 'myConfiguredKafkaJs'

const run = async () => {
    const producer = kafka.producer()
    const admin = kafka.admin()
    await admin.connect()
    await producer.connect()

    await admin.createTopics({
        waitForLeaders: true,
        topics: [
            { topic: 'myRandomTopicString' },
        ],
    })

    await producer.send({
        topic: 'myRandomTopicString',
        messages: [{
            value: 'yolo',
        }],
    })
}

run()
Unfortunately this will result in a different error if the topic already exists, but that's a separate question, and I suspect that error is informational rather than breaking.
{"level":"ERROR","timestamp":"2020-08-24T18:19:48.465Z","logger":"kafkajs","message":"[Connection] Response CreateTopics(key: 19, version: 2)","broker":"localhost:9092","clientId":"tv-kitchen","error":"Topic with this name already exists","correlationId":2,"size":86}
EDIT: the above settings do require that your Kafka instance is properly configured. It is possible for leadership elections to never resolve, in which case KafkaJS will still complain about leadership elections!
In my experience this has been due to situations where a Kafka broker was stopped without being de-registered from ZooKeeper, meaning ZooKeeper is waiting for a response from something that no longer exists.

Kafka-node suddenly consumes from offset 0

Sometimes the kafka-node consumer starts consuming from offset 0, while its default behavior is to consume only newer messages, and it does not switch back to its default behavior afterwards. Do you know what happens when its behavior suddenly changes, and how to solve this? The code is very simple, and this happens without altering the code.
var kafka = require("kafka-node") ;
Consumer = kafka.Consumer;
client = new kafka.KafkaClient();
consumer = new Consumer(client, [{ topic: "Topic_23", partition: 0}
]);
consumer.on("message", function(message) {
console.log(message)
});
The only solution I have found so far is to change the Kafka topic; then everything works fine again. Any ideas?
In Kafka, offsets are not associated with specific consumers; instead, they are linked to consumer groups. In your code you don't provide the consumer group, so every time you fire up the consumer it is assigned to a different consumer group and the offset starts from 0.
The following should do the trick (obviously, the first time you are going to read all the messages):
var kafka = require("kafka-node") ;
Consumer = kafka.Consumer;
client = new kafka.KafkaClient();
payload = [{
topic: "Topic_23",
partition: 0
}]
var options = {
groupId: 'test-consumer-group',
fromOffset: 'latest'
};
consumer = new Consumer(client, payload, options);
consumer.on("message", function(message) {
console.log(message)
});

Kafka to Elasticsearch consumption with node.js

I know there are quite a few Node.js modules that implement a Kafka consumer that gets messages and writes to Elasticsearch. But I only need some of the fields from each message, not all of them. Is there an existing solution I don't know about?
The question asks for an example in Node.js. The kafka-node module provides a very nice mechanism for getting a Consumer, which you can combine with the elasticsearch-js module:
// configure Elasticsearch client
var elasticsearch = require('elasticsearch');
var esClient = new elasticsearch.Client({
    // ... connection details ...
});

// configure Kafka Consumer
var kafka = require('kafka-node');
var Consumer = kafka.Consumer;
var client = new kafka.Client();
var consumer = new Consumer(
    client,
    [
        // ... topics / partitions ...
    ],
    { autoCommit: false }
);

consumer.on('message', function (message) {
    if (message.some_special_field === "drop") {
        return; // skip it
    }

    // drop fields (you can use delete message['field1'] syntax if you need
    // to parse a more dynamic structure)
    delete message.field1;
    delete message.field2;
    delete message.field3;

    esClient.index({
        index: 'index-name',
        type: 'type-name',
        id: message.id_field, // ID will be auto generated if none/unset
        body: message
    }, function (err, res) {
        if (err) {
            throw err;
        }
    });
});

consumer.on('error', function (err) {
    console.log(err);
});
NOTE: Using the Index API is not a good practice when you have tons of messages being sent through, because it requires Elasticsearch to create a thread per operation, which is wasteful and will eventually lead to rejected requests if the thread pool is exhausted. In any bulk ingestion situation, a better solution is to look into something like Elasticsearch Streams (or Elasticsearch Bulk Index Stream, which builds on top of it), which in turn builds on the official elasticsearch-js client. However, I've never used those client extensions, so I don't really know how well they do or do not work, but usage would simply replace the part where I show the indexing happening.
I'm not convinced that the node.js approach is actually better than the Logstash one below in terms of maintenance and complexity, so I've left both here for reference.
The better approach is probably to consume Kafka from Logstash, then ship it off to Elasticsearch.
You should be able to use Logstash to do this in a straightforward way using the Kafka input and Elasticsearch output.
Each document in the Logstash pipeline is called an "event". The Kafka input assumes that it will receive JSON coming in (configurable by its codec), which will populate a single event with all of the fields from that message.
You can then drop the fields that you have no interest in handling, or conditionally drop the entire event.
input {
    # Receive from Kafka
    kafka {
        # ...
    }
}

filter {
    if [some_special_field] == "drop" {
        drop { } # skip the entire event
    }

    # drop specific fields
    mutate {
        remove_field => [
            "field1", "field2", ...
        ]
    }
}

output {
    # send to Elasticsearch
    elasticsearch {
        # ...
    }
}
Naturally, you'll need to configure the Kafka input (per the first link) and the Elasticsearch output (per the second link).
The previous answer is not scalable for production.
You will have to use the Elasticsearch bulk API. You can use this NPM package: https://www.npmjs.com/package/elasticsearch-kafka-connect It allows you to send data from Kafka to ES (the duplex connection from ES to Kafka was still in development as of May 2019).
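For a rough idea of what the bulk API looks like from the elasticsearch-js client shown earlier, here is a minimal sketch (the batch size, index name, and message shape are assumptions; production code would also flush on a timer and inspect per-item errors in the response):

// Hedged sketch: buffer Kafka messages and index them in batches with
// the bulk API instead of one index call per message.
var buffer = [];
var BATCH_SIZE = 500; // assumption: tune for your throughput

consumer.on('message', function (message) {
    // each document is an action line followed by the document body
    buffer.push({ index: { _index: 'index-name', _type: 'type-name' } });
    buffer.push(message);

    if (buffer.length >= BATCH_SIZE * 2) {
        var body = buffer;
        buffer = [];
        esClient.bulk({ body: body }, function (err, res) {
            if (err) {
                console.log(err);
            }
        });
    }
});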

kafka.common.OffsetOutOfRangeException - for no valid reason

I am using the kafka-node library to consume Kafka messages from Node.js. This is the simple code I use to consume:
consumer = new Consumer(
    client,
    [
        { topic: 't', partition: 0 }, { topic: 't1', partition: 1 }
    ],
    {
        autoCommit: false
    }
);

consumer.on('message', function (message) {
    console.log(message);
});
This was working perfectly for some time, but now I keep getting this error:
kafka.common.OffsetOutOfRangeException: Request for offset 19 but we only have log segments in the range 0 to 0.
I don't understand what makes Kafka throw this error, for no apparent reason, for the same code that was working before; I never set the offset to 19 anywhere. I tried deleting the topics, logs, etc., and I keep getting this error. Can anyone please guide me on how to handle this?
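For reference, kafka-node emits an offsetOutOfRange event when this happens, and its README sketches a recovery along these lines (resetting to the earliest available offset is an assumption here; you may prefer the latest):

// Hedged sketch: recover from OffsetOutOfRange by resetting the
// consumer to an offset that actually exists in the log.
var offset = new kafka.Offset(client);

consumer.on('offsetOutOfRange', function (topic) {
    topic.maxNum = 2;
    offset.fetch([topic], function (err, offsets) {
        if (err) {
            return console.log(err);
        }
        // offsets[topic][partition] holds the available boundaries;
        // take the smallest (earliest) one
        var min = Math.min.apply(null, offsets[topic.topic][topic.partition]);
        consumer.setOffset(topic.topic, topic.partition, min);
    });
});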

RabbitMQ / AMQP: single queue, multiple consumers for same message?

I am just starting to use RabbitMQ and AMQP in general.
I have a queue of messages
I have multiple consumers, which I would like to do different things with the same message.
Most of the RabbitMQ documentation seems to be focused on round-robin, i.e. where a single message is consumed by a single consumer, with the load being spread between the consumers. This is indeed the behavior I witness.
An example: the producer has a single queue, and sends a message every 2 seconds:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });
var count = 1;

connection.on('ready', function () {
    var sendMessage = function (connection, queue_name, payload) {
        var encoded_payload = JSON.stringify(payload);
        connection.publish(queue_name, encoded_payload);
    };

    setInterval(function () {
        var test_message = 'TEST ' + count;
        sendMessage(connection, "my_queue_name", test_message);
        count += 1;
    }, 2000);
});
And here's a consumer:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });

connection.on('ready', function () {
    connection.queue("my_queue_name", function (queue) {
        queue.bind('#');
        queue.subscribe(function (message) {
            var encoded_payload = unescape(message.data);
            var payload = JSON.parse(encoded_payload);
            console.log('Received a message:');
            console.log(payload);
        });
    });
});
If I start the consumer twice, I can see that each consumer consumes alternate messages in round-robin fashion. E.g., I'll see messages 1, 3, 5 in one terminal and 2, 4, 6 in the other.
My question is:
Can I have each consumer receive the same messages? I.e., both consumers get messages 1, 2, 3, 4, 5, 6? What is this called in AMQP/RabbitMQ speak? How is it normally configured?
Is this commonly done? Should I just have the exchange route the message into two separate queues, with a single consumer, instead?
Can I have each consumer receive the same messages? I.e., both consumers get messages 1, 2, 3, 4, 5, 6? What is this called in AMQP/RabbitMQ speak? How is it normally configured?
No, not if the consumers are on the same queue. From RabbitMQ's AMQP Concepts guide:
it is important to understand that, in AMQP 0-9-1, messages are load balanced between consumers.
This seems to imply that round-robin behavior within a queue is a given and not configurable; i.e., separate queues are required in order to have the same message ID be handled by multiple consumers.
Is this commonly done? Should I just have the exchange route the message into two separate queues, with a single consumer, instead?
No, it's not; single queue/multiple consumers with each consumer handling the same message ID isn't possible. Having the exchange route the message into two separate queues is indeed better.
As I don't require too complex routing, a fanout exchange will handle this nicely. I didn't focus much on exchanges earlier, as node-amqp has the concept of a 'default exchange' allowing you to publish messages to a connection directly; however, most AMQP messages are published to a specific exchange.
Here's my fanout exchange, both sending and receiving:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });
var count = 1;

connection.on('ready', function () {
    connection.exchange("my_exchange", options = { type: 'fanout' }, function (exchange) {
        var sendMessage = function (exchange, payload) {
            console.log('about to publish');
            var encoded_payload = JSON.stringify(payload);
            exchange.publish('', encoded_payload, {});
        };

        // Receive messages
        connection.queue("my_queue_name", function (queue) {
            console.log('Created queue');
            queue.bind(exchange, '');
            queue.subscribe(function (message) {
                console.log('subscribed to queue');
                var encoded_payload = unescape(message.data);
                var payload = JSON.parse(encoded_payload);
                console.log('Received a message:');
                console.log(payload);
            });
        });

        setInterval(function () {
            var test_message = 'TEST ' + count;
            sendMessage(exchange, test_message);
            count += 1;
        }, 2000);
    });
});
The last couple of answers are almost correct. I have tons of apps that generate messages that need to end up with different consumers, and the process is very simple.
If you want multiple consumers to receive the same message, use the following procedure.
Create multiple queues, one for each app that is to receive the message. In each queue's properties, bind a routing tag to the amq.direct exchange. Then change your publishing app to send to amq.direct using the routing tag (not a queue name). AMQP will then copy the message into each queue with the same binding. Works like a charm :)
Example: let's say I have a JSON string I generate, and I publish it to the amq.direct exchange using the routing tag "new-sales-order". I have a queue for my order_printer app that prints orders, a queue for my billing system that will send a copy of the order and invoice the client, a web archive system where I archive orders for historic/compliance reasons, and a client web interface where orders are tracked as other info comes in about an order.
So my queues are: order_printer, order_billing, order_archive and order_tracking
All have the binding tag "new-sales-order" bound to them, and all four will get the JSON data.
This is an ideal way to send data without the publishing app knowing or caring about the receiving apps.
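As a rough sketch of that setup using the amqplib package (the package choice, queue options, and payload are assumptions, not part of the description above):

// Hedged sketch: four queues bound to amq.direct with the same routing
// tag; one publish puts a copy of the message into each of them.
const amqp = require('amqplib');

(async () => {
    const connection = await amqp.connect('amqp://localhost');
    const channel = await connection.createChannel();

    const queues = ['order_printer', 'order_billing', 'order_archive', 'order_tracking'];
    for (const q of queues) {
        await channel.assertQueue(q, { durable: true });
        await channel.bindQueue(q, 'amq.direct', 'new-sales-order');
    }

    // publish to the exchange with the routing tag, not to a queue
    channel.publish('amq.direct', 'new-sales-order',
        Buffer.from(JSON.stringify({ order: 1234 })));
})();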
Just read the RabbitMQ tutorial. You publish messages to an exchange, not to a queue; they are then routed to the appropriate queues. In your case, you should bind a separate queue for each consumer. That way, they can consume messages completely independently.
Yes, each consumer can receive the same messages. Have a look at:
http://www.rabbitmq.com/tutorials/tutorial-three-python.html
http://www.rabbitmq.com/tutorials/tutorial-four-python.html
http://www.rabbitmq.com/tutorials/tutorial-five-python.html
for different ways to route messages. I know they are for Python and Java, but it's good to understand the principles, decide what you are doing, and then find out how to do it in JS. It sounds like you want to do a simple fanout (tutorial 3), which sends the messages to all queues connected to the exchange.
The difference between what you are doing and what you want to do is basically that you are going to set up an exchange of type fanout. Fanout exchanges send all messages to all connected queues. Each queue will have a consumer that has access to all the messages separately.
Yes, this is commonly done; it is one of the features of AMQP.
The send pattern is a one-to-one relationship. If you want to "send" to more than one receiver you should be using the pub/sub pattern. See http://www.rabbitmq.com/tutorials/tutorial-three-python.html for more details.
RabbitMQ / AMQP: single queue, multiple consumers for the same message, and page refresh.
rabbit.on('ready', function () { });

sockjs_chat.on('connection', function (conn) {
    conn.on('data', function (message) {
        try {
            var obj = JSON.parse(message.replace(/\r/g, '').replace(/\n/g, ''));

            if (obj.header == "register") {
                // Connect to RabbitMQ
                try {
                    conn.exchange = rabbit.exchange(exchange, {
                        type: 'topic',
                        autoDelete: false,
                        durable: false,
                        exclusive: false,
                        confirm: true
                    });

                    conn.q = rabbit.queue('my-queue-' + obj.agentID, {
                        durable: false,
                        autoDelete: false,
                        exclusive: false
                    }, function () {
                        conn.channel = 'my-queue-' + obj.agentID;
                        conn.q.bind(conn.exchange, conn.channel);
                        conn.q.subscribe(function (message) {
                            console.log("[MSG] ---> " + JSON.stringify(message));
                            conn.write(JSON.stringify(message) + "\n");
                        }).addCallback(function (ok) {
                            ctag[conn.channel] = ok.consumerTag;
                        });
                    });
                } catch (err) {
                    console.log("Could not create connection to RabbitMQ. \nStack trace -->" + err.stack);
                }
            } else if (obj.header == "typing") {
                var reply = {
                    type: 'chatMsg',
                    msg: utils.escp(obj.msga),
                    visitorNick: obj.channel,
                    customField1: '',
                    time: utils.getDateTime(),
                    channel: obj.channel
                };
                conn.exchange.publish('my-queue-' + obj.agentID, reply);
            }
        } catch (err) {
            console.log("ERROR ----> " + err.stack);
        }
    });

    // When the visitor closes or reloads a page we need to unbind from RabbitMQ?
    conn.on('close', function () {
        try {
            // Close the socket
            conn.close();
            // Close RabbitMQ
            conn.q.unsubscribe(ctag[conn.channel]);
        } catch (er) {
            console.log(":::::::: EXCEPTION SOCKJS (ON-CLOSE) ::::::::>>>>>>> " + er.stack);
        }
    });
});
As I assess it, your case is:
I have a queue of messages (your source for receiving messages; let's name it q111)
I have multiple consumers, which I would like to do different things with the same message.
Your problem here is that while 3 messages are received by this queue, message 1 is consumed by consumer A, and the other consumers B and C consume messages 2 and 3, whereas you need a setup where RabbitMQ passes the same copies of all three messages (1, 2, 3) to all three connected consumers (A, B, C) simultaneously.
While many configurations can be made to achieve this, a simple way is to use the following two-step concept:
Use a dynamic rabbitmq-shovel to pick up messages from the desired queue (q111) and publish them to a fanout exchange, created and dedicated exclusively for this purpose.
Now re-configure your consumers A, B & C (which were listening to queue q111) to listen to this fanout exchange directly, using an exclusive & anonymous queue for each consumer; a consumer-side sketch follows below.
Note: while using this concept, don't consume directly from the source queue (q111), as messages already consumed won't be shovelled to your fanout exchange.
If you think this does not satisfy your exact requirement... feel free to post your suggestions :-)
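A minimal consumer-side sketch for step 2, using amqplib (the exchange name q111-fanout is hypothetical; the shovel from step 1 must already be publishing q111's messages to it):

// Hedged sketch: each consumer binds its own exclusive, server-named
// queue to the dedicated fanout exchange fed by the shovel.
const amqp = require('amqplib');

(async () => {
    const connection = await amqp.connect('amqp://localhost');
    const channel = await connection.createChannel();

    await channel.assertExchange('q111-fanout', 'fanout', { durable: false }); // hypothetical name
    const { queue } = await channel.assertQueue('', { exclusive: true });
    await channel.bindQueue(queue, 'q111-fanout', '');

    channel.consume(queue, (msg) => {
        console.log(msg.content.toString());
    }, { noAck: true });
})();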
I think you should look at sending your messages using a fanout exchange. That way you will receive the same message for different consumers; under the hood, RabbitMQ creates a different queue for each of these new consumers/subscribers.
This is the link to the tutorial example in JavaScript:
https://www.rabbitmq.com/tutorials/tutorial-one-javascript.html
To get the behavior you want, simply have each consumer consume from its own queue. You'll have to use a non-direct exchange type (topic, header, fanout) in order to get the message to all of the queues at once.
If you happen to be using the amqplib library as I am, they have a handy example of an implementation of the Publish/Subscribe RabbitMQ tutorial which you might find handy.
There is one interesting option in this scenario I haven't found in the answers here.
You can nack messages with the "requeue" feature in one consumer to process them in another.
Generally speaking it is not the right way, but maybe it will be good enough for someone.
https://www.rabbitmq.com/nack.html
And beware of loops (when all consumers nack+requeue the message)!
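For illustration, a nack with requeue in amqplib looks something like this (the queue name and the predicate deciding whether to requeue are hypothetical):

// Hedged sketch: reject-and-requeue so another consumer can pick the
// message up. If every consumer requeues, the message loops forever.
const amqp = require('amqplib');

(async () => {
    const connection = await amqp.connect('amqp://localhost');
    const channel = await connection.createChannel();
    await channel.assertQueue('some-queue', { durable: false });

    channel.consume('some-queue', (msg) => {
        const mine = msg.content.toString().startsWith('A:'); // hypothetical routing rule
        if (mine) {
            // ... process the message, then acknowledge it ...
            channel.ack(msg);
        } else {
            // nack(message, allUpTo = false, requeue = true)
            channel.nack(msg, false, true);
        }
    });
})();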
Fanout is clearly what you want here.
Read the RabbitMQ tutorial:
https://www.rabbitmq.com/tutorials/tutorial-three-javascript.html
Here's my example:
Publisher.js:
const amqp = require('amqplib/callback_api');

amqp.connect('amqp://<user>:<pass>@<host>:<port>', (error0, connection) => {
    if (error0) {
        throw error0;
    }
    console.log('RabbitMQ connected');
    // Create exchange for queues
    connection.createChannel((error1, channel) => {
        if (error1) {
            return console.error(error1);
        }
        channel.assertExchange(process.env.EXCHANGE_NAME, 'fanout', { durable: false });
        channel.publish(process.env.EXCHANGE_NAME, '', Buffer.from('msg'));
    });
});
Subscriber.js:
const amqp = require('amqplib/callback_api');

amqp.connect('amqp://<user>:<pass>@<host>:<port>', (error0, connection) => {
    if (error0) {
        throw error0;
    }
    console.log('RabbitMQ connected');
    // Create/bind a consumer queue for the exchange
    connection.createChannel((error1, channel) => {
        if (error1) {
            return console.error(error1);
        }
        channel.assertExchange(process.env.EXCHANGE_NAME, 'fanout', { durable: false });
        channel.assertQueue('', { exclusive: true }, (error2, queue) => {
            if (error2) {
                return console.error(error2);
            }
            channel.bindQueue(queue.queue, process.env.EXCHANGE_NAME, '');
            console.log(" [*] Waiting for messages in %s. To exit press CTRL+C", queue.queue);
            channel.consume(queue.queue, consumeMessage, { noAck: true });
        });
    });
});

// Handle each delivered message
function consumeMessage(msg) {
    console.log(msg.content.toString());
}
Here is an example I found on the internet that may also help:
https://www.codota.com/code/javascript/functions/amqplib/Channel/assertExchange
You just need to assign different groups to the consumers.
