Kafka to Elasticsearch consumption with node.js

Kafka to Elasticsearch consumption with node.js - node.js

I know there are quite a few node.js modules that implement a Kafka consumer that gets msgs and writes to elastic. But I only need some of the fields from each msg and not all of them. Is there an existing solution I don't know about?

The question is asking for an example from node.js. The kafka-node module provides a very nice mechanism for getting a Consumer, which you can combine with the elasticsearch-js module:
// configure Elasticsearch client
var elasticsearch = require('elasticsearch');
var esClient = new elasticsearch.Client({
// ... connection details ...
});
// configure Kafka Consumer
var kafka = require('kafka-node');
var Consumer = kafka.Consumer;
var client = new kafka.Client();
var consumer = new Consumer(
client,
[
// ... topics / partitions ...
],
{ autoCommit: false }
);
consumer.on('message', function(message) {
if (message.some_special_field === "drop") {
return; // skip it
}
// drop fields (you can use delete message['field1'] syntax if you need
// to parse a more dynamic structure)
delete message.field1;
delete message.field2;
delete message.field3;
esClient.index({
index: 'index-name',
type: 'type-name',
id: message.id_field, // ID will be auto generated if none/unset
body: message
}, function(err, res) {
if (err) {
throw err;
}
});
});
consumer.on('error', function(err) {
console.log(err);
});
NOTE: Using the Index API is not a good practice when you have tons of messages being sent through because it requires that Elasticsearch create a thread per operation, which is obviously wasteful and it will eventually lead to rejected requests if the thread pool is exhausted as a result. In any bulk ingestion situation, a better solution is to look into using something like Elasticsearch Streams (or Elasticsearch Bulk Index Stream that builds on top of it), which builds on top of the official elasticsearch-js client. However, I've never used those client extensions so I don't really know how well they do or do not work, but usage would simply replace the part where I am showing the indexing happening.
I'm not convinced that the node.js approach is actually better than the Logstash one below in terms of maintenance and complexity, so I've left both here for reference.
The better approach is probably to consume Kafka from Logstash, then ship it off to Elasticsearch.
You should be able to use Logstash to do this in a straight forward way using the Kafka input and Elasticsearch output.
Each document in the Logstash pipeline is called an "event". The Kafka input assumes that it will receive JSON coming in (configurable by its codec), which will populate a single event with all of the fields from that message.
You can then drop those fields that you have no interest in handling, or conditionally the entire event.
input {
# Receive from Kafka
kafka {
# ...
}
}
filter {
if [some_special_field] == "drop" {
drop { } # skip the entire event
}
# drop specific fields
mutate {
remove_field => [
"field1", "field2", ...
]
}
}
output {
# send to Elasticsearch
elasticsearch {
# ...
}
}
Naturally, you'll need to configure the Kafka input (from the first link) and the Elasticsearch output (and the second link).

The previous answer is not scaleable for production.
You will have to use ElasticSearch bulk API. You can use this NPM package https://www.npmjs.com/package/elasticsearch-kafka-connect It allows you to send data from Kafka to ES (duplex connection ES to kafka is still in development as per May 2019)

Related

Nodejs Cluster Architecture reading from single REDIS instance

I'm using Nodejs cluster module to have multiple workers running.
I created a basic Architecture where there will be a single MASTER process which is basically an express server handling multiple requests and the main task of MASTER would be writing incoming data from requests into a REDIS instance. Other workers(numOfCPUs - 1) will be non-master i.e. they won't be handling any request as they are just the consumers. I have two features namely ABC and DEF. I distributed the non-master workers evenly across features via assigning them type.
For eg: on a 8-core machine:
1 will be MASTER instance handling request via express server
Remaining (8 - 1 = 7) will be distributed evenly. 4 to feature:ABD and 3 to fetaure:DEF.
non-master workers are basically consumers i.e. they read from REDIS in which only MASTER worker can write data.
Here's the code for the same:
if (cluster.isMaster) {
// Fork workers.
for (let i = 0; i < numCPUs - 1; i++) {
ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
}
cluster.on('exit', function(worker) {
console.log(`Worker ${worker.process.pid}::type(${worker.type}) died`);
ClusteringUtil.removeWorkerFromList(worker.type);
ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
});
// Start consuming on server-start
ABCConsumer.start();
DEFConsumer.start();
console.log(`Master running with process-id: ${process.pid}`);
} else {
console.log('CLUSTER type', cluster.worker.process.env.type, 'running on', process.pid);
if (
cluster.worker.process.env &&
cluster.worker.process.env.type &&
cluster.worker.process.env.type === ServerTypeEnum.EXPRESS
) {
// worker for handling requests
app.use(express.json());
...
}
{
Everything works fine except consumers reading from REDIS.
Since there are multiple consumers of a particular feature, each one reads the same message and start processing individually, which is what I don't want. If there are 4 consumers, 1 is marked as busy and can not consumer until free, 3 are available. Once the message for that particular feature is written in REDIS by MASTER, the problem is all 3 available consumers of that feature start consuming. This means that the for a single message, the job is done based on number of available consumers.
const stringifedData = JSON.stringify(req.body);
const key = uuidv1();
const asyncHsetRes = await asyncHset(type, key, stringifedData);
if (asyncHsetRes) {
await asyncRpush(FeatureKeyEnum.REDIS.ABC_MESSAGE_QUEUE, key);
res.send({ status: 'success', message: 'Added to processing queue' });
} else {
res.send({ error: 'failure', message: 'Something went wrong in adding to queue' });
}
Consumer simply accepts messages and stop when it is busy
module.exports.startHeartbeat = startHeartbeat = async function(config = {}) {
if (!config || !config.type || !config.listKey) {
return;
}
heartbeatIntervalObj[config.type] = setInterval(async () => {
await asyncLindex(config.listKey, -1).then(async res => {
if (res) {
await getFreeWorkerAndDoJob(res, config);
stopHeartbeat(config);
}
});
}, HEARTBEAT_INTERVAL);
};
Ideally, a message should be read by only one consumer of that particular feature. After consuming, it is marked as busy so it won't consume further until free(I have handled this). Next message could only be processed by only one consumer out of other available consumers.
Please help me in tacking this problem. Again, I want one message to be read by only one free consumer and rest free consumers should wait for new message.
Thanks

I'm not sure I fully get your Redis consumers architecture, but I feel like it contradicts with the use case of Redis itself. What you're trying to achieve is essentially a queue based messaging with an ability to commit a message once its done.
Redis has its own pub/sub feature, but it is built on fire and forget principle. It doesn't distinguish between consumers - it just sends the data to all of them, assuming that its their logic to handle the incoming data.
I recommend to you use Queue Servers like RabbitMQ. You can achieve your goal with some features that AMQP 0-9-1 supports: message acknowledgment, consumer's prefetch count and so on. You can set up your cluster with very agile configs like ok, I want to have X consumers, and each can handle 1 unique (!) message at a time and they will receive new ones only after they let the server (rabbitmq) know that they successfully finished message processing. This is highly configurable and robust.
However, if you want to go serverless with some fully managed service so that you don't provision like virtual machines or anything else to run a message queue server of your choice, you can use AWS SQS. It has pretty much similar API and features list.
Hope it helps!

Best way to query all documents from a mongodb collection in a reactive way w/out flooding RAM

I want to query all the documents in a collection in a reactive way. The collection.find() method of the mongodb nodejs driver returns a cursor that fires events for each document found in the collection. So I made this:
function giant_query = (db) => {
var req = db.collection('mycollection').find({});
return Rx.Observable.merge(Rx.Observable.fromEvent(req, 'data'),
Rx.Observable.fromEvent(req, 'end'),
Rx.Observable.fromEvent(req, 'close'),
Rx.Observable.fromEvent(req, 'readable'));
}
It will do what I want: fire for each document, so I can treat then in a reactive way, like this:
Rx.Observable.of('').flatMap(giant_query).do(some_function).subscribe()
I could query the documents in packets of tens, but then I'd have to keep track of an index number for each time the observable stream is fired, and I'd have to make an observable loop which I do not know if it's possible or the right way to do it.
The problem with this cursor is that I don't think it does things in packets. It'll probably fire all the events in a short period of time, therefore flooding my RAM. Even if I buffer some events in packets using Observable's buffer, the events and events data (the documents) are going to be waiting on RAM to be manipulated.
What's the best way to deal with it n a reactive way?

I'm not an expert on mongodb, but based on the examples I've seen, this is a pattern I would try.
I've omitted the events other than data, since throttling that one seems to be the main concern.
var cursor = db.collection('mycollection').find({});
const cursorNext = new Rx.BehaviourSubject('next'); // signal first batch then wait
const nextBatch = () => {
if(cursor.hasNext()) {
cursorNext.next('next');
}
});
cursorNext
.switchMap(() => // wait for cursorNext to signal
Rx.Observable.fromPromise(cursor.next()) // get a single doc
.repeat() // get another
.takeWhile(() => cursor.hasNext() ) // stop taking if out of data
.take(batchSize) // until full batch
.toArray() // combine into a single emit
)
.map(docsBatch => {
// do something with the batch
// return docsBatch or modified doscBatch
})
... // other operators?
.subscribe(x => {
...
nextBatch();
});
I'm trying to put together a test of this Rx flow without mongodb, in the meantime this might give you some ideas.

You also might wanna check my solution without using of rxJS:
Mongoose Cursor: http bulk request from collection

High performance on Nodejs RabbitMQ server

I'm building an analysis system with a million users online in the same time. I use RabbitMQ such as message broker to reduce capacity for server
Here is my diagram
My system include 3 components.
Publisher server : ( Producer )
This system was built on nodejs. The purpose of this system to publish the messages into queue
RabbitMQ queue : This system stored the messages that publisher server sent to. After that, one connect is opened to send message from queue for subscriber server.
Subscriber server ( Consumer ) : This system receive the messages from queue
Publisher server source code
var amqp = require('amqplib/callback_api');
amqp.connect("amqp://localhost", function(error, connect) {
if (error) {
return callback(-1, null);
} else {
connect.createChannel(function(error, channel) {
if (error) {
return callback(-3, null);
} else {
var q = 'logs';
var msg = data; // object
// convert msg object to buffer
var new_msg = Buffer.from(JSON.stringify(msg), 'binary');
channel.assertExchange(q, 'fanout', { durable: false });
channel.publish(q, 'message_queues', new Buffer(new_msg));
console.log(" [x] Sent %s", new_msg);
return callback(null, msg);
}
});
}
});
create exclusively exchange "message_queues" with "fanout" to send
broadcast to all consumer
Subscriber server source code
var amqp = require('amqplib/callback_api');
amqp.connect("amqp://localhost", function(error, connect) {
if (error) {
console.log('111');
} else {
connect.createChannel(function(error, channel) {
if (error) {
console.log('1');
} else {
var ex = 'logs';
channel.assertExchange(ex, 'fanout', { durable: false });
channel.assertQueue('message_queues', { exclusive: true }, function(err, q) {
if (err) {
console.log('123');
} else {
console.log(" [*] Waiting for messages in %s. To exit press CTRL+C", q.queue);
channel.bindQueue(q.queue, ex, 'message_queues');
channel.consume(q.queue, function(msg) {
console.log(" [x] %s", msg.content.toString());
}, { noAck: true });
}
});
}
});
}
});
receive messge from "message_queues" exchange
When I implement send a message. The system work well, however I tried benchmark test performance of this system (with ~ 1000 users sent request per second ) then the system has some issue. The system seem as overload / buffer overflow ( or some thing don't work well ).
I just only read about rabbitmq 2 days ago. I know its tutorials is basic example, so I need help to build systems in real world than .. Any
solution & suggestion
Hope that my question make a sense

Your question is general. Probably you should provide more details to help to identify the bottleneck and help you out.
So, first of all I think you should check the rabbit mq - whether its a bottleneck or not.
There are many things that can go wrong:
The number of consumers that can consume the message is too low (I assume you use a pool of consumers)
The network is too slow
The queues and messages are replicated between too many nodes of Rabbit MQ and go do disk (its possible to use rabbit mq like this)
The consumer can't really handle a message and it gets constantly re-queued
So, in general during your tests you should check rabbit mq and see what happens there.
The message once arrives into queue is in Ready State once this happens, it will be there till one of consumers connected to queue won't attempt to take the the message for handling
When one of consumers (rabbit does round-robin between them) picks the message for processing it's state will turn to Unacknowledged
if consumer fails to handle the message, it will be re-queued by rabbit so that another consumer would have a chance to handle the message.
Of course, if consumer handles the message successfully, the message disappears from rabbit mq server.
Assuming you've installed rabbit mq web ui (I highly recommend it especially for beginners) - you can visually see what happens in your queue - you'll see how many messages are in ready state, and how many are unacknowledged.
This will help to identify a bottleneck.
For example - if you see that only one message is usually in unacknowledged state, this can mean that the consumer can't handle the message and sends it back to rabbit. On the other hand new messages always arrive from producer, so the number of ready messages will increase very fast
It also can point on the fact that you use only one consumer that can handle only one message at a time. So you can consider paralleling here, by running many consumers in different threads or even clustering your application (in rabbit consumers can reside in different machines)
Hope this helps in general, of course, as I've said before if you have more specific questions - please provide more information about what exactly happens during the test

How do you implement AWS Elasticache auto discovery for node.js

I'm a node noob and trying to understand how one would implement auto discovery in a node.js application. I'm going to use the cluster module and want each worker process to be kept up to date (and persistently connected to) the elasticache nodes.
Since there is no concept of shared memory (like PHP APC) would you have to have code that runs in each worker, that wakes up every X seconds and somehow updates the list of IP's and re-connects the memcache client?
How do people solve this today? Example code would be much appreciated.

Note that at this time, Auto Discovery is only available for cache clusters running the memcached engine.
For Cache Engine Version 1.4.14 or Higher you need to create a TCP/IP socket to the Cache Cluster Configuration Endpoint (or any Cache Node Endpoint) and send this command:
config get cluster
With Node.js you can use the net.Socket class to to that.
The reply consists of two lines:
The version number of the configuration information. Each time a node is added or removed from the cache cluster, the version number increases by one.
A list of cache nodes. Each node in the list is represented by a hostname|ip-address|port group, and each node is delimited by a space.
A carriage return and a linefeed character (CR + LF) appears at the end of each line.
Here you can find a more thorough description of how to add Auto Discovery to your client library.
Using the cluster module you need to store the same information in each process (i.e. child) and I would use "setInterval" per child to periodically check (e.g. every 60 seconds) the list of nodes and re-connect only if the list has changed (this should not happen very often).
You can optionally update the list on the master only and use "worker.send" to update the workers. This could keep all the processes running in a single server more in sync, but it would not help in a multi server architecture, so it is very important to use consistent hashing in order to be able to change the list of nodes and loose the "minimum" amount of keys stored in the memcached cluster.
I would use a global variable to store this kind of configuration.
Thinking twice you can use the AWS SDK for Node.js to get the list of ElastiCache Nodes (and that works for the Redis engine as well).
In that case the code would be something like:
var util = require('util'),
AWS = require('aws-sdk'),
Memcached = require('memcached');
global.AWS_REGION = 'eu-west-1'; // Just as a sample I'm using the EU West region
global.CACHE_CLUSTER_ID = 'test';
global.CACHE_ENDPOINTS = [];
global.MEMCACHED = null;
function init() {
AWS.config.update({
region: global.AWS_REGION
});
elasticache = new AWS.ElastiCache();
function getElastiCacheEndpoints() {
function sameEndpoints(list1, list2) {
if (list1.length != list2.length)
return false;
return list1.every(
function(e) {
return list2.indexOf(e) > -1;
});
}
function logElastiCacheEndpoints() {
global.CACHE_ENDPOINTS.forEach(
function(e) {
util.log('Memcached Endpoint: ' + e);
});
}
elasticache.describeCacheClusters({
CacheClusterId: global.CACHE_CLUSTER_ID,
ShowCacheNodeInfo: true
},
function(err, data) {
if (!err) {
util.log('Describe Cache Cluster Id:' + global.CACHE_CLUSTER_ID);
if (data.CacheClusters[0].CacheClusterStatus == 'available') {
var endpoints = [];
data.CacheClusters[0].CacheNodes.forEach(
function(n) {
var e = n.Endpoint.Address + ':' + n.Endpoint.Port;
endpoints.push(e);
});
if (!sameEndpoints(endpoints, global.CACHE_ENDPOINTS)) {
util.log('Memached Endpoints changed');
global.CACHE_ENDPOINTS = endpoints;
if (global.MEMCACHED)
global.MEMCACHED.end();
global.MEMCACHED = new Memcached(global.CACHE_ENDPOINTS);
process.nextTick(logElastiCacheEndpoints);
setInterval(getElastiCacheEndpoints, 60000); // From now on, update every 60 seconds
}
} else {
setTimeout(getElastiCacheEndpoints, 10000); // Try again after 10 seconds until 'available'
}
} else {
util.log('Error describing Cache Cluster:' + err);
}
});
}
getElastiCacheEndpoints();
}
init();

RabbitMQ / AMQP: single queue, multiple consumers for same message?

I am just starting to use RabbitMQ and AMQP in general.
I have a queue of messages
I have multiple consumers, which I would like to do different things with the same message.
Most of the RabbitMQ documentation seems to be focused on round-robin, ie where a single message is consumed by a single consumer, with the load being spread between each consumer. This is indeed the behavior I witness.
An example: the producer has a single queue, and send messages every 2 sec:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });
var count = 1;
connection.on('ready', function () {
var sendMessage = function(connection, queue_name, payload) {
var encoded_payload = JSON.stringify(payload);
connection.publish(queue_name, encoded_payload);
}
setInterval( function() {
var test_message = 'TEST '+count
sendMessage(connection, "my_queue_name", test_message)
count += 1;
}, 2000)
})
And here's a consumer:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });
connection.on('ready', function () {
connection.queue("my_queue_name", function(queue){
queue.bind('#');
queue.subscribe(function (message) {
var encoded_payload = unescape(message.data)
var payload = JSON.parse(encoded_payload)
console.log('Recieved a message:')
console.log(payload)
})
})
})
If I start the consumer twice, I can see that each consumer is consuming alternate messages in round-robin behavior. Eg, I'll see messages 1, 3, 5 in one terminal, 2, 4, 6 in the other.
My question is:
Can I have each consumer receive the same messages? Ie, both consumers get message 1, 2, 3, 4, 5, 6? What is this called in AMQP/RabbitMQ speak? How is it normally configured?
Is this commonly done? Should I just have the exchange route the message into two separate queues, with a single consumer, instead?

Can I have each consumer receive the same messages? Ie, both consumers get message 1, 2, 3, 4, 5, 6? What is this called in AMQP/RabbitMQ speak? How is it normally configured?
No, not if the consumers are on the same queue. From RabbitMQ's AMQP Concepts guide:
it is important to understand that, in AMQP 0-9-1, messages are load balanced between consumers.
This seems to imply that round-robin behavior within a queue is a given, and not configurable. Ie, separate queues are required in order to have the same message ID be handled by multiple consumers.
Is this commonly done? Should I just have the exchange route the message into two separate queues, with a single consumer, instead?
No it's not, single queue/multiple consumers with each consumer handling the same message ID isn't possible. Having the exchange route the message onto into two separate queues is indeed better.
As I don't require too complex routing, a fanout exchange will handle this nicely. I didn't focus too much on Exchanges earlier as node-amqp has the concept of a 'default exchange' allowing you to publish messages to a connection directly, however most AMQP messages are published to a specific exchange.
Here's my fanout exchange, both sending and receiving:
var amqp = require('amqp');
var connection = amqp.createConnection({ host: "localhost", port: 5672 });
var count = 1;
connection.on('ready', function () {
connection.exchange("my_exchange", options={type:'fanout'}, function(exchange) {
var sendMessage = function(exchange, payload) {
console.log('about to publish')
var encoded_payload = JSON.stringify(payload);
exchange.publish('', encoded_payload, {})
}
// Recieve messages
connection.queue("my_queue_name", function(queue){
console.log('Created queue')
queue.bind(exchange, '');
queue.subscribe(function (message) {
console.log('subscribed to queue')
var encoded_payload = unescape(message.data)
var payload = JSON.parse(encoded_payload)
console.log('Recieved a message:')
console.log(payload)
})
})
setInterval( function() {
var test_message = 'TEST '+count
sendMessage(exchange, test_message)
count += 1;
}, 2000)
})
})

The last couple of answers are almost correct - I have tons of apps that generate messages that need to end up with different consumers so the process is very simple.
If you want multiple consumers to the same message, do the following procedure.
Create multiple queues, one for each app that is to receive the message, in each queue properties, "bind" a routing tag with the amq.direct exchange. Change you publishing app to send to amq.direct and use the routing-tag (not a queue). AMQP will then copy the message into each queue with the same binding. Works like a charm :)
Example: Lets say I have a JSON string I generate, I publish it to the "amq.direct" exchange using the routing tag "new-sales-order", I have a queue for my order_printer app that prints order, I have a queue for my billing system that will send a copy of the order and invoice the client and I have a web archive system where I archive orders for historic/compliance reasons and I have a client web interface where orders are tracked as other info comes in about an order.
So my queues are: order_printer, order_billing, order_archive and order_tracking
All have the binding tag "new-sales-order" bound to them, all 4 will get the JSON data.
This is an ideal way to send data without the publishing app knowing or caring about the receiving apps.

Just read the rabbitmq tutorial. You publish message to exchange, not to queue; it is then routed to appropriate queues. In your case, you should bind separate queue for each consumer. That way, they can consume messages completely independently.

Yes each consumer can receive the same messages. have a look at
http://www.rabbitmq.com/tutorials/tutorial-three-python.html
http://www.rabbitmq.com/tutorials/tutorial-four-python.html
http://www.rabbitmq.com/tutorials/tutorial-five-python.html
for different ways to route messages. I know they are for python and java but its good to understand the principles, decide what you are doing and then find how to do it in JS. Its sounds like you want to do a simple fanout (tutorial 3), which sends the messages to all queues connected to the exchange.
The difference with what you are doing and what you want to do is basically that you are going to set up and exchange or type fanout. Fanout excahnges send all messages to all connected queues. Each queue will have a consumer that will have access to all the messages separately.
Yes this is commonly done, it is one of the features of AMPQ.

The send pattern is a one-to-one relationship. If you want to "send" to more than one receiver you should be using the pub/sub pattern. See http://www.rabbitmq.com/tutorials/tutorial-three-python.html for more details.

RabbitMQ / AMQP: single queue, multiple consumers for same message and page refresh.
rabbit.on('ready', function () { });
sockjs_chat.on('connection', function (conn) {
conn.on('data', function (message) {
try {
var obj = JSON.parse(message.replace(/\r/g, '').replace(/\n/g, ''));
if (obj.header == "register") {
// Connect to RabbitMQ
try {
conn.exchange = rabbit.exchange(exchange, { type: 'topic',
autoDelete: false,
durable: false,
exclusive: false,
confirm: true
});
conn.q = rabbit.queue('my-queue-'+obj.agentID, {
durable: false,
autoDelete: false,
exclusive: false
}, function () {
conn.channel = 'my-queue-'+obj.agentID;
conn.q.bind(conn.exchange, conn.channel);
conn.q.subscribe(function (message) {
console.log("[MSG] ---> " + JSON.stringify(message));
conn.write(JSON.stringify(message) + "\n");
}).addCallback(function(ok) {
ctag[conn.channel] = ok.consumerTag; });
});
} catch (err) {
console.log("Could not create connection to RabbitMQ. \nStack trace -->" + err.stack);
}
} else if (obj.header == "typing") {
var reply = {
type: 'chatMsg',
msg: utils.escp(obj.msga),
visitorNick: obj.channel,
customField1: '',
time: utils.getDateTime(),
channel: obj.channel
};
conn.exchange.publish('my-queue-'+obj.agentID, reply);
}
} catch (err) {
console.log("ERROR ----> " + err.stack);
}
});
// When the visitor closes or reloads a page we need to unbind from RabbitMQ?
conn.on('close', function () {
try {
// Close the socket
conn.close();
// Close RabbitMQ
conn.q.unsubscribe(ctag[conn.channel]);
} catch (er) {
console.log(":::::::: EXCEPTION SOCKJS (ON-CLOSE) ::::::::>>>>>>> " + er.stack);
}
});
});

As I assess your case is:
I have a queue of messages (your source for receiving messages, lets name it q111)
I have multiple consumers, which I would like to do different things with the same message.
Your problem here is while 3 messages are received by this queue, message 1 is consumed by a consumer A, other consumers B and C consumes message 2 and 3. Where as you are in need of a setup where rabbitmq passes on the same copies of all these three messages(1,2,3) to all three connected consumers (A,B,C) simultaneously.
While many configurations can be made to achieve this, a simple way is to use the following two step concept:
Use a dynamic rabbitmq-shovel to pickup messages from the desired queue(q111) and publish to a fanout exchange (exchange exclusively created and dedicated for this purpose).
Now re-configure your consumers A,B & C (who were listening to queue(q111)) to listen from this Fanout exchange directly using a exclusive & anonymous queue for each consumer.
Note: While using this concept don't consume directly from the source queue(q111), as messages already consumed wont be shovelled to your Fanout exchange.
If you think this does not satisfies your exact requirement... feel free to post your suggestions :-)

I think you should check sending your messages using the fan-out exchanger. That way you willl receiving the same message for differents consumers, under the table RabbitMQ is creating differents queues for each one of this new consumers/subscribers.
This is the link for see the tutorial example in javascript
https://www.rabbitmq.com/tutorials/tutorial-one-javascript.html

To get the behavior you want, simply have each consumer consume from its own queue. You'll have to use a non-direct exchange type (topic, header, fanout) in order to get the message to all of the queues at once.

If you happen to be using the amqplib library as I am, they have a handy example of an implementation of the Publish/Subscribe RabbitMQ tutorial which you might find handy.

There is one interesting option in this scenario I haven`t found in answers here.
You can Nack messages with "requeue" feature in one consumer to process them in another.
Generally speaking it is not a right way, but maybe it will be good enough for someone.
https://www.rabbitmq.com/nack.html
And beware of loops (when all concumers nack+requeue message)!

Fan out was clearly what you wanted. fanout
read rabbitMQ tutorial:
https://www.rabbitmq.com/tutorials/tutorial-three-javascript.html
here's my example:
Publisher.js:
amqp.connect('amqp://<user>:<pass>#<host>:<port>', async (error0, connection) => {
if (error0) {
throw error0;
}
console.log('RabbitMQ connected')
try {
// Create exchange for queues
channel = await connection.createChannel()
await channel.assertExchange(process.env.EXCHANGE_NAME, 'fanout', { durable: false });
await channel.publish(process.env.EXCHANGE_NAME, '', Buffer.from('msg'))
} catch(error) {
console.error(error)
}
})
Subscriber.js:
amqp.connect('amqp://<user>:<pass>#<host>:<port>', async (error0, connection) => {
if (error0) {
throw error0;
}
console.log('RabbitMQ connected')
try {
// Create/Bind a consumer queue for an exchange broker
channel = await connection.createChannel()
await channel.assertExchange(process.env.EXCHANGE_NAME, 'fanout', { durable: false });
const queue = await channel.assertQueue('', {exclusive: true})
channel.bindQueue(queue.queue, process.env.EXCHANGE_NAME, '')
console.log(" [*] Waiting for messages in %s. To exit press CTRL+C");
channel.consume('', consumeMessage, {noAck: true});
} catch(error) {
console.error(error)
}
});
here is an example i found in the internet. maybe can also help.
https://www.codota.com/code/javascript/functions/amqplib/Channel/assertExchange

You just need to assign different groups to the consumers.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Kafka to Elasticsearch consumption with node.js - node.js

I know there are quite a few node.js modules that implement a Kafka consumer that gets msgs and writes to elastic. But I only need some of the fields from each msg and not all of them. Is there an existing solution I don't know about?

The previous answer is not scaleable for production. You will have to use ElasticSearch bulk API. You can use this NPM package https://www.npmjs.com/package/elasticsearch-kafka-connect It allows you to send data from Kafka to ES (duplex connection ES to kafka is still in development as per May 2019)

Related

Nodejs Cluster Architecture reading from single REDIS instance

Best way to query all documents from a mongodb collection in a reactive way w/out flooding RAM

High performance on Nodejs RabbitMQ server

How do you implement AWS Elasticache auto discovery for node.js

RabbitMQ / AMQP: single queue, multiple consumers for same message?

Categories

Resources