AMQPlib nodejs consumer task concurrency - node.js

I'm building a background task management system with RabbitMQ and Node.js using the amqplib module.
Some of the tasks are really CPU-consuming, so if I'm launching a lot of them and I have only a few workers up, my server can get killed (using too much CPU).
I'm wondering if there is a way to create an amqp queue so that my consumers will only consume one task of this queue at a time (i.e. Before an ack or a reject, do not send a task of this kind to this consumer).
Or should I handle this myself in the code (maybe keeping a reference in my worker that I'm handling a task of this queue and rejecting all tasks of this queue while I'm executing the task ?).
Here is my sample code :
I'm creating the amqp connection like that
const amqpConn = require('amqplib').connect('amqp://localhost');
My queue name is tasks :
amqpConn.then((conn) => {
  return conn.createChannel();
}).then((ch) => {
  return ch.assertQueue('tasks').then((ok) => {
    // `i` is the task index from the surrounding code (not shown)
    ch.sendToQueue('tasks', Buffer.from(`something to do ${i}`));
  });
}).catch(console.warn);
And here is my consumer (I guess this is where I should do the work to limit only one concurrent task of this queue) :
amqpConn.then((conn) => {
return conn.createChannel();
}).then((ch) => {
return ch.assertQueue('tasks').then((ok) => {
return ch.consume('tasks', (msg) => {
if (msg !== null) {
console.log(msg.content.toString());
ch.ack(msg);
}
});
});
}).catch(console.warn);
Thanks a lot !

I'm wondering if there is a way to create an amqp queue so that my consumers will only consume one task of this queue at a time
If this is what you really need, then yes: simply have exactly one consumer and declare the queue exclusive. That way, only one task is consumed at a time.

I think I got it working by:
creating a channel per queue
using the channel's prefetch count (prefetch_count) to limit concurrency on a per-consumer basis
https://www.rabbitmq.com/consumer-prefetch.html
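A minimal sketch of what that could look like with amqplib, assuming `ch` is a channel obtained from `conn.createChannel()`. The names `consumeOneAtATime` and `handleTask` are made up for illustration; only `assertQueue`, `prefetch`, `consume`, `ack` and `nack` are amqplib channel methods.

```javascript
// Sketch: limit a consumer to one unacknowledged task at a time.
// `ch` is assumed to be an amqplib channel; `handleTask` is a hypothetical
// async function that does the CPU-heavy work for one message.
async function consumeOneAtATime(ch, queue, handleTask) {
  await ch.assertQueue(queue);
  // With a prefetch of 1, the broker will not push another message to this
  // consumer until the current one has been acked or rejected.
  await ch.prefetch(1);
  return ch.consume(queue, async (msg) => {
    if (msg === null) return; // the consumer was cancelled by the broker
    try {
      await handleTask(msg.content.toString());
      ch.ack(msg);
    } catch (err) {
      // requeue=false: the message is dropped (or dead-lettered if the queue
      // is configured for it) instead of being retried in a tight loop.
      ch.nack(msg, false, false);
    }
  });
}

// Possible usage with the connection from the question:
// amqpConn
//   .then((conn) => conn.createChannel())
//   .then((ch) => consumeOneAtATime(ch, 'tasks', doHeavyWork))
//   .catch(console.warn);
```

Note that the message is acked only after the work has finished. Acking first, as in the question's consumer, would let the broker deliver the next task immediately.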

Related

worker thread won't respond after first message?

I'm making a server script and, to make it easier for both hosts and clients to do what they want, I made a customizable server script that runs using nw.js(with a visual interface). Said script was made using web workers since nw.js was having problems with support to worker threads.
Now that NW.js has fixed its problems with worker threads, I've been trying to move everything that was inside the web workers to worker threads, but there's a problem: when the main thread receives the answer from the second thread, the latter stops responding to any subsequent message.
For example, running the following code with either NW.js or Node.js itself will return "pong" only once
const { Worker } = require('worker_threads');
const worker = new Worker('const { parentPort } = require("worker_threads");parentPort.once("message",message => parentPort.postMessage({ pong: message })); ', { eval: true });
worker.on('message', message => console.log(message));
worker.postMessage('ping');
worker.postMessage('ping');
How do I configure the worker so it will keep responding to whatever message it receives after the first one?
Because you use the EventEmitter.once() method. According to the documentation, this method does the following:
Adds a one-time listener function for the event named eventName. The
next time eventName is triggered, this listener is removed and then
invoked.
If you need your worker to process more than one event, use EventEmitter.on() instead:
const worker = new Worker('const { parentPort } = require("worker_threads");' +
'parentPort.on("message",message => parentPort.postMessage({ pong: message }));',
{ eval: true });

Correct way to process batches using receiveMessages

We are using the @azure/service-bus package to process message batches from multiple topics.
The code we use to take 20 messages from the topic every 2 seconds looks like this.
let isProcessing: boolean = false;
setInterval(async () => {
if (isProcessing === false) {
isProcessing = true;
try {
const messages: Array<ServiceBusMessage>
= await receiver.receiveMessages(Configuration.SB.batchSize as number);
if (messages.length > 0) {
this.logger.info(`[SB] ${topic} - ${messages.length} require processing`);
await Promise.all([
...messages.map(message => this.handleMsg(receiver, message, topic, moduleRef, handler))
]).catch(error => {
this.logger.error(error.message, error);
});
}
isProcessing = false;
} catch (error) {
this.logger.error(error.message, error);
isProcessing = false;
}
}
}, Configuration.SB.tickInterval as number);
My question is: is this the best way to do this? Is there a better way? It works and is fairly performant, BUT I think we are losing receiveAndDelete messages sometimes and I am trying to work out if it's our implementation.
Thanks for any help
It works and is fairly performant, BUT I think we are losing receiveAndDelete messages sometimes and I am trying to work out if it's our implementation.
There are two modes to receive messages
Unsafe with ReceiveAndDelete
Safe with PeekLock
When ReceiveAndDelete mode is used, the moment messages are received by the client, they are automatically deleted from the server. So this is at-most-once delivery.
With PeekLock, a message is "leased" to the client for a maximum of 5 minutes. The client has to either acknowledge successful processing by requesting message completion, or abandon or dead-letter the message if it can't handle it. If none of these operations takes place within the defined lease time (which doesn't have to be 5 minutes and could be less), the message is retried until the maximum number of delivery attempts (MaxDeliveryCount) is exceeded, at which point it is dead-lettered. Note that the message is never lost, even if it failed to process and was dead-lettered. This is therefore at-least-once delivery, which could be more suitable for your scenario. It will have a slight impact on how you code your client, but it is not a drastic change.
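A rough sketch of explicit settlement in PeekLock mode, assuming version 7 of @azure/service-bus, where `receiveMessages`, `completeMessage` and `abandonMessage` are methods on the receiver (older versions of the package settled messages differently). `processBatch` and `handle` are hypothetical names, and the 2000 ms wait time is an arbitrary choice.

```javascript
// Sketch: settle each message explicitly in PeekLock mode.
// `receiver` is assumed to be a @azure/service-bus (v7) receiver created with
// receiveMode "peekLock"; `handle` is a hypothetical async message handler.
async function processBatch(receiver, handle, batchSize) {
  const messages = await receiver.receiveMessages(batchSize, { maxWaitTimeInMs: 2000 });
  // allSettled lets one failing message fail on its own without hiding the
  // outcome of the others in the batch.
  const results = await Promise.allSettled(messages.map((m) => handle(m)));
  let completed = 0;
  for (let i = 0; i < messages.length; i++) {
    if (results[i].status === 'fulfilled') {
      // Success: tell the broker the message can be removed.
      await receiver.completeMessage(messages[i]);
      completed++;
    } else {
      // Failure: release the lock so the message is redelivered, and
      // eventually dead-lettered once MaxDeliveryCount is exceeded.
      await receiver.abandonMessage(messages[i]);
    }
  }
  return { received: messages.length, completed };
}
```

Because a message is only removed after `completeMessage`, a crash in the middle of a batch leads to redelivery rather than loss, so the handler should tolerate seeing the same message twice.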

Nodejs Cluster Architecture reading from single REDIS instance

I'm using Nodejs cluster module to have multiple workers running.
I created a basic Architecture where there will be a single MASTER process which is basically an express server handling multiple requests and the main task of MASTER would be writing incoming data from requests into a REDIS instance. Other workers(numOfCPUs - 1) will be non-master i.e. they won't be handling any request as they are just the consumers. I have two features namely ABC and DEF. I distributed the non-master workers evenly across features via assigning them type.
For eg: on a 8-core machine:
1 will be MASTER instance handling request via express server
The remaining (8 - 1 = 7) will be distributed evenly: 4 to feature:ABC and 3 to feature:DEF.
non-master workers are basically consumers i.e. they read from REDIS in which only MASTER worker can write data.
Here's the code for the same:
if (cluster.isMaster) {
// Fork workers.
for (let i = 0; i < numCPUs - 1; i++) {
ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
}
cluster.on('exit', function(worker) {
console.log(`Worker ${worker.process.pid}::type(${worker.type}) died`);
ClusteringUtil.removeWorkerFromList(worker.type);
ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
});
// Start consuming on server-start
ABCConsumer.start();
DEFConsumer.start();
console.log(`Master running with process-id: ${process.pid}`);
} else {
console.log('CLUSTER type', cluster.worker.process.env.type, 'running on', process.pid);
if (
cluster.worker.process.env &&
cluster.worker.process.env.type &&
cluster.worker.process.env.type === ServerTypeEnum.EXPRESS
) {
// worker for handling requests
app.use(express.json());
...
}
}
Everything works fine except consumers reading from REDIS.
Since there are multiple consumers for a particular feature, each one reads the same message and starts processing it individually, which is not what I want. Say there are 4 consumers: 1 is marked as busy and cannot consume until it is free, and 3 are available. Once the MASTER writes a message for that feature to REDIS, all 3 available consumers of that feature start consuming it. This means that a single message is processed as many times as there are available consumers.
const stringifedData = JSON.stringify(req.body);
const key = uuidv1();
const asyncHsetRes = await asyncHset(type, key, stringifedData);
if (asyncHsetRes) {
await asyncRpush(FeatureKeyEnum.REDIS.ABC_MESSAGE_QUEUE, key);
res.send({ status: 'success', message: 'Added to processing queue' });
} else {
res.send({ error: 'failure', message: 'Something went wrong in adding to queue' });
}
The consumer simply accepts messages and stops when it is busy:
module.exports.startHeartbeat = startHeartbeat = async function(config = {}) {
if (!config || !config.type || !config.listKey) {
return;
}
heartbeatIntervalObj[config.type] = setInterval(async () => {
await asyncLindex(config.listKey, -1).then(async res => {
if (res) {
await getFreeWorkerAndDoJob(res, config);
stopHeartbeat(config);
}
});
}, HEARTBEAT_INTERVAL);
};
Ideally, a message should be read by only one consumer of that particular feature. After consuming, the consumer is marked as busy so it won't consume further until it is free (I have handled this). The next message should then be processed by only one of the other available consumers.
Please help me tackle this problem. Again, I want each message to be read by only one free consumer, while the remaining free consumers wait for a new message.
Thanks
I'm not sure I fully get your Redis consumer architecture, but I feel like it goes against the use case of Redis itself. What you're trying to achieve is essentially queue-based messaging with the ability to commit a message once it's done.
Redis has its own pub/sub feature, but it is built on the fire-and-forget principle. It doesn't distinguish between consumers: it just sends the data to all of them, assuming that it's up to their logic to handle the incoming data.
I recommend that you use a queue server like RabbitMQ. You can achieve your goal with features that AMQP 0-9-1 supports: message acknowledgment, consumer prefetch count and so on. You can set up your cluster with very flexible configuration, such as: I want X consumers, each handling 1 unique (!) message at a time, and each receiving a new one only after it has told the server (RabbitMQ) that it finished processing the previous message. This is highly configurable and robust.
However, if you want to go serverless with a fully managed service, so that you don't have to provision virtual machines or anything else to run a message queue server of your choice, you can use AWS SQS. It has a fairly similar API and feature list.
Hope it helps!
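If moving to a broker is not an option, one possible adjustment to the question's own setup is to take keys with an atomic pop instead of peeking with LINDEX, which leaves the key in the list for every other consumer to see. The sketch below assumes a hypothetical `asyncLpop` helper, promisified in the same style as the question's `asyncLindex` and `asyncRpush`; `takeNextJob` and `drainQueue` are illustrative names.

```javascript
// Sketch: take work with an atomic pop instead of peeking with LINDEX.
// `asyncLpop` is a hypothetical promisified wrapper around Redis LPOP, in the
// same style as the question's asyncLindex/asyncRpush helpers.
async function takeNextJob(asyncLpop, listKey) {
  // LPOP removes and returns the head of the list in one atomic step, so two
  // workers calling this at the same time never receive the same key.
  const key = await asyncLpop(listKey);
  return key === null || key === undefined ? null : key;
}

// Pop keys until the list is empty, handing each one to `doJob` exactly once.
async function drainQueue(asyncLpop, listKey, doJob) {
  let processed = 0;
  let key = await takeNextJob(asyncLpop, listKey);
  while (key !== null) {
    await doJob(key);
    processed++;
    key = await takeNextJob(asyncLpop, listKey);
  }
  return processed;
}
```

Unlike an acknowledged delivery in RabbitMQ, a popped key is gone even if the worker crashes before finishing the job, which is the trade-off the answer above is pointing at.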

High performance on Nodejs RabbitMQ server

I'm building an analysis system with a million users online at the same time. I use RabbitMQ as a message broker to reduce the load on the server.
Here is my diagram
My system includes 3 components.
Publisher server (producer):
This system is built on Node.js. Its purpose is to publish messages to the queue.
RabbitMQ queue: This system stores the messages sent by the publisher server. A connection is then opened to deliver messages from the queue to the subscriber server.
Subscriber server (consumer): This system receives the messages from the queue.
Publisher server source code
var amqp = require('amqplib/callback_api');
amqp.connect("amqp://localhost", function(error, connect) {
  if (error) {
    return callback(-1, null);
  } else {
    connect.createChannel(function(error, channel) {
      if (error) {
        return callback(-3, null);
      } else {
        var ex = 'logs'; // exchange name
        var msg = data; // object
        // convert msg object to a buffer (UTF-8 encoded JSON)
        var new_msg = Buffer.from(JSON.stringify(msg));
        channel.assertExchange(ex, 'fanout', { durable: false });
        // a fanout exchange ignores the routing key
        channel.publish(ex, 'message_queues', new_msg);
        console.log(" [x] Sent %s", new_msg);
        return callback(null, msg);
      }
    });
  }
});
This declares a "fanout" exchange and publishes with the routing key "message_queues" to broadcast to all consumers.
Subscriber server source code
var amqp = require('amqplib/callback_api');
amqp.connect("amqp://localhost", function(error, connect) {
if (error) {
console.log('111');
} else {
connect.createChannel(function(error, channel) {
if (error) {
console.log('1');
} else {
var ex = 'logs';
channel.assertExchange(ex, 'fanout', { durable: false });
channel.assertQueue('message_queues', { exclusive: true }, function(err, q) {
if (err) {
console.log('123');
} else {
console.log(" [*] Waiting for messages in %s. To exit press CTRL+C", q.queue);
channel.bindQueue(q.queue, ex, 'message_queues');
channel.consume(q.queue, function(msg) {
console.log(" [x] %s", msg.content.toString());
}, { noAck: true });
}
});
}
});
}
});
This receives messages from the "message_queues" queue.
When I send a single message, the system works well. However, when I benchmarked it (with ~1000 users sending requests per second), the system ran into problems. It seems to be overloaded or hitting a buffer overflow (or something else is not working well).
I only started reading about RabbitMQ 2 days ago. I know its tutorials are basic examples, so I need help building a real-world system. Any solutions and suggestions are welcome.
I hope my question makes sense.
Your question is quite general. You should probably provide more details to help identify the bottleneck.
First of all, I think you should check whether RabbitMQ itself is the bottleneck.
There are many things that can go wrong:
The number of consumers that can consume the messages is too low (I assume you use a pool of consumers)
The network is too slow
The queues and messages are replicated between too many RabbitMQ nodes and go to disk (it's possible to use RabbitMQ like this)
The consumer can't really handle a message, so it gets constantly re-queued
So, in general, during your tests you should check RabbitMQ and see what happens there.
Once a message arrives in the queue, it is in the Ready state. It stays there until one of the consumers connected to the queue takes it for handling.
When one of the consumers (RabbitMQ does round-robin between them) picks the message for processing, its state turns to Unacknowledged.
If the consumer fails to handle the message, RabbitMQ re-queues it so that another consumer gets a chance to handle it.
Of course, if the consumer handles the message successfully, the message disappears from the RabbitMQ server.
Assuming you've installed the RabbitMQ web UI (I highly recommend it, especially for beginners), you can visually see what happens in your queue: how many messages are in the Ready state and how many are Unacknowledged.
This will help to identify a bottleneck.
For example, if you see that only one message is usually in the Unacknowledged state, this can mean that the consumer can't handle the message and sends it back to RabbitMQ. Meanwhile, new messages keep arriving from the producer, so the number of Ready messages will increase very fast.
It can also point to the fact that you use only one consumer that can handle only one message at a time. In that case, consider parallelizing by running many consumers in different threads, or even clustering your application (in RabbitMQ, consumers can reside on different machines).
Hope this helps in general. Of course, as I've said before, if you have more specific questions, please provide more information about what exactly happens during the test.
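The Ready count can also be sampled from code. The sketch below assumes an amqplib promise-API channel (the question uses the callback API, where `checkQueue` takes a callback instead), and `sampleBacklog` is a hypothetical helper name. For an exclusive queue like the one in the question, the check would have to run on the connection that declared the queue.

```javascript
// Sketch: sample the number of ready messages to spot a growing backlog.
// `ch` is assumed to be an amqplib promise-API channel; checkQueue resolves
// with { queue, messageCount, consumerCount } for an existing queue.
async function sampleBacklog(ch, queue, samples, delayMs) {
  const counts = [];
  for (let i = 0; i < samples; i++) {
    const { messageCount, consumerCount } = await ch.checkQueue(queue);
    counts.push({ ready: messageCount, consumers: consumerCount });
    if (i < samples - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // A backlog that rises on every sample suggests consumers cannot keep up.
  const growing = counts.length > 1 &&
    counts.every((c, i) => i === 0 || c.ready > counts[i - 1].ready);
  return { counts, growing };
}
```

A steadily growing Ready count together with a low consumer count would match the "too few consumers" case from the list above.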

can I limit consumption of kafka-node consumer?

It seems like my kafka node consumer:
var kafka = require('kafka-node');
var consumer = new kafka.Consumer(client, [], {
...
});
is fetching far more messages than I can handle in certain cases.
Is there a way to limit it (for example, accept no more than 1000 messages per second, possibly using the pause API)?
I'm using kafka-node, which seems to have a limited API compared to the Java version.
In Kafka, poll and process should happen in a coordinated/synchronized way. That is, after each poll, you should process all received data before you do the next poll. This pattern automatically throttles the number of messages to the maximum throughput your client can handle.
Something like this (pseudo-code):
while(isRunning) {
messages = poll(...)
for(m : messages) {
process(m);
}
}
(That is the reason why there is no "fetch.max.messages" parameter: you just do not need it.)
I had a similar situation where I was consuming messages from Kafka and had to throttle the consumption because my consumer service was dependent on a third party API which had its own constraints.
I used async/queue along with a wrapper of async/cargo called asyncTimedCargo for batching purpose.
The cargo gets all the messages from the kafka-consumer and sends it to queue upon reaching a size limit batch_config.batch_size or timeout batch_config.batch_timeout.
async/queue provides saturated and unsaturated callbacks which you can use to stop the consumption if your queue task workers are busy. This would stop the cargo from filling up and your app would not run out of memory. The consumption would resume upon unsaturation.
//cargo-service.js
module.exports = function(key){
return new asyncTimedCargo(function(tasks, callback) {
var length = tasks.length;
var postBody = [];
for(var i=0;i<length;i++){
var message ={};
var task = JSON.parse(tasks[i].value);
message = task;
postBody.push(message);
}
var postJson = {
"json": {"request":postBody}
};
sms_queue.push(postJson);
callback();
}, batch_config.batch_size, batch_config.batch_timeout)
};
//kafka-consumer.js
var cargo = require('./cargo-service')();
consumer.on('message', function (message) {
if(message && message.value && utils.isValidJsonString(message.value)) {
var msgObject = JSON.parse(message.value);
cargo.push(message);
}
else {
logger.error('Invalid JSON Message');
}
});
// sms-queue.js
var sms_queue = queue(
retryable({
times: queue_config.num_retries,
errorFilter: function (err) {
logger.info("inside retry");
console.log(err);
if (err) {
return true;
}
else {
return false;
}
}
}, function (task, callback) {
// your worker task for queue
callback()
}), queue_config.queue_worker_threads);
sms_queue.saturated = function() {
consumer.pause();
logger.warn('Queue saturated Consumption paused: ' + sms_queue.running());
};
sms_queue.unsaturated = function() {
consumer.resume();
logger.info('Queue unsaturated Consumption resumed: ' + sms_queue.running());
};
From FAQ in the README
Create an async.queue with the message processor and a concurrency of one (the message processor itself is wrapped with setImmediate so it will not freeze up the event loop).
Set queue.drain to resume() the consumer.
In the handler for the consumer's message event, pause() the consumer and push the message to the queue.
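The steps above can be sketched roughly as follows. To keep the example free of dependencies, it swaps async.queue for a plain in-memory array processed one item at a time, so it illustrates the pattern rather than the README's exact code. `attachBackpressure` and `processMessage` are hypothetical names; the consumer is only assumed to expose `on('message')`, `pause()` and `resume()`.

```javascript
// Sketch of the pause/process/resume pattern using a plain in-memory array
// instead of async.queue. `consumer` is assumed to expose on('message'),
// pause() and resume(); `processMessage` is a hypothetical async handler.
function attachBackpressure(consumer, processMessage) {
  const pending = [];
  let running = false;

  // Process buffered messages one at a time (concurrency of one).
  async function drain() {
    running = true;
    while (pending.length > 0) {
      const message = pending.shift();
      try {
        await processMessage(message);
      } catch (err) {
        console.error('processing failed', err);
      }
    }
    running = false;
    consumer.resume(); // buffer is empty: ask for more messages
  }

  consumer.on('message', (message) => {
    consumer.pause(); // stop fetching while work is outstanding
    pending.push(message);
    if (!running) drain();
  });
}
```

Messages from a batch that was already fetched can still arrive after `pause()`, which is why they are buffered rather than dropped.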
As far as I know, the API does not have any kind of throttling. But both consumers (Consumer and HighLevelConsumer) have a pause() function, so you could stop consuming if you get too many messages. Maybe that already offers what you need.
Please keep in mind what's happening. You send a fetch request to the broker and get a batch of message back. You can configure the min and max size of the messages (according to the documentation not the number of messages) you want to fetch:
{
....
// This is the minimum number of bytes of messages that must be available to give a response, default 1 byte
fetchMinBytes: 1,
// The maximum bytes to include in the message set for this partition. This helps bound the size of the response.
fetchMaxBytes: 1024 * 1024,
}
I was facing the same issue, initially fetchMaxBytes value was
fetchMaxBytes: 1024 * 1024 * 10 // 10MB
I just changed it to
fetchMaxBytes: 1024
It worked very smoothly after the change.
