Correct way to process batches using receiveMessages - node.js

We are using the @azure/service-bus package to process message batches from multiple topics.
The code we use to take 20 messages from the topic every 2 seconds looks like this.
let isProcessing: boolean = false;
setInterval(async () => {
if (isProcessing === false) {
isProcessing = true;
try {
const messages: Array<ServiceBusMessage>
= await receiver.receiveMessages(Configuration.SB.batchSize as number);
if (messages.length > 0) {
this.logger.info(`[SB] ${topic} - ${messages.length} require processing`);
await Promise.all([
...messages.map(message => this.handleMsg(receiver, message, topic, moduleRef, handler))
]).catch(error => {
this.logger.error(error.message, error);
});
}
isProcessing = false;
} catch (error) {
this.logger.error(error.message, error);
isProcessing = false;
}
}
}, Configuration.SB.tickInterval as number);
My question is: is this the best way to do this, or is there a better way? It works and is fairly performant, BUT I think we are sometimes losing receiveAndDelete messages, and I am trying to work out whether it's our implementation.
Thanks for any help

It works and is fairly performant, BUT I think we are sometimes losing receiveAndDelete messages, and I am trying to work out whether it's our implementation.
There are two modes to receive messages
Unsafe with ReceiveAndDelete
Safe with PeekLock
When ReceiveAndDelete mode is used, the moment messages are received by the client, they are automatically deleted from the server. So this is at-most-once delivery.
With PeekLock, a message is "leased" (locked) to the client for up to 5 minutes. The client has to either acknowledge successful processing by completing the message, or abandon or dead-letter it if it can't handle it. If none of these operations takes place within the lock duration (which doesn't have to be 5 minutes and could be less), the message is redelivered until the maximum number of delivery attempts (MaxDeliveryCount) is exceeded, at which point it is dead-lettered. Note that the message is never lost, even if it failed to process and was dead-lettered. This is therefore at-least-once delivery, which could be more suitable for your scenario. It will have a slight impact on how you code your client, but not a drastic one.
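As a rough sketch of what that client change might look like, assuming a receiver from @azure/service-bus v7 created in peek-lock mode: each message is completed after its handler succeeds and abandoned if the handler throws. The names `processBatch` and `handler`, and the 2000 ms wait, are illustrative assumptions and not part of the original code.

```javascript
// Sketch: settle every message explicitly when receiving in peek-lock mode.
// Assumes `receiver` is a ServiceBusReceiver (@azure/service-bus v7) created
// with receiveMode "peekLock"; `handler` is a hypothetical async function.
async function processBatch(receiver, handler, batchSize) {
  const messages = await receiver.receiveMessages(batchSize, { maxWaitTimeInMs: 2000 });
  const results = await Promise.allSettled(
    messages.map(async (message) => {
      try {
        await handler(message);
        // Success: tell the broker the message can be removed.
        await receiver.completeMessage(message);
      } catch (error) {
        // Failure: release the lock so the broker can redeliver the message
        // (it is dead-lettered once MaxDeliveryCount is exceeded).
        await receiver.abandonMessage(message);
        throw error;
      }
    })
  );
  return {
    completed: results.filter((r) => r.status === 'fulfilled').length,
    failed: results.filter((r) => r.status === 'rejected').length,
  };
}
```

Because one failing handler no longer rejects the whole batch, the other messages in the batch are still settled individually.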

Related

Any suggestions about how to publish a huge amount of messages within one round of request / response?

If I publish 50K messages using Promise.all like below:
const pubsub = new PubSub({ projectId: PUBSUB_PROJECT_ID });
const topic = pubsub.topic(topicName, {
batching: {
maxMessages: 1000,
maxMilliseconds: 100,
},
});
const n = 50 * 1000;
const dataBufs: Buffer[] = [];
for (let i = 0; i < n; i++) {
const data = `message payload ${i}`;
const dataBuffer = Buffer.from(data);
dataBufs.push(dataBuffer);
}
const tasks = dataBufs.map((d, idx) =>
topic.publish(d).then((messageId) => {
console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${idx}`);
})
);
// publish messages concurrently
await Promise.all(tasks);
// send response to front-end
res.json(data);
I will hit this issue: pubsub-emulator throw error and publisher throw "Retry total timeout exceeded before any response was received" when publish 50k messages
If I use for loop and async/await. The issue is gone.
const n = 50 * 1000;
for (let i = 0; i < n; i++) {
const data = `message payload ${i}`;
const dataBuffer = Buffer.from(data);
const messageId = await topic.publish(dataBuffer)
console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${i}`)
}
// some logic ...
// send response to front-end
res.json(data);
But it will block the execution of subsequent logic because of async/await until all messages have been published. It takes a long time to post 50k messages.
Any suggestions about how to publish a huge amount of messages(about 50k) without blocking the execution of subsequent logic? Do I need to use child_process or some queue like bull to publish the huge amount of messages in the background without blocking request/response workflow of the API? This means I need to respond to the front-end as soon as possible, the 50k messages should be the background tasks.
It seems there is an in-memory queue inside the @google-cloud/pubsub library. I am not sure if I should add another queue like bull on top of it.
The time it will take to publish large amounts of data depends on a lot of factors:
Message size. The larger the messages, the longer it takes to send them.
Network capacity (both of the connection between wherever the publisher is running and Google Cloud and, if relevant, of the virtual machine itself). This puts an upper bound on the amount of data that can be transmitted. It is not atypical to see smaller virtual machines with limits in the 40MB/s range. Note that if you are testing via Wifi, the limits could be even lower than this.
Number of threads and number of CPU cores. When having to run a lot of asynchronous callbacks, the ability to schedule them to run can be limited by the parallel capacity of the machine or runtime environment.
Typically, it is not good to try to send 50,000 publishes simultaneously from one instance of a publisher. It is likely that the above factors will cause the client to get overloaded and result in deadline exceeded errors. The best way to prevent this is to limit the number of messages that can be outstanding for publish at one time. Some of the libraries like Java support this natively. The Node.js library does not yet support this feature, but likely will in the future.
In the meantime, you'd want to keep a counter of the number of messages outstanding and limit it to whatever the client seems to be able to handle. Start with 1000 and work up or down from there based on the results. A semaphore would be a pretty standard way to achieve this behavior. In your case the code would look something like this:
var sem = require('semaphore')(1000);
// Each task resolves once its publish has finished; at most 1000 run at a time.
const tasks = dataBufs.map((d, idx) =>
  new Promise((resolve, reject) => {
    sem.take(() => {
      topic.publish(d)
        .then((messageId) => {
          console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${idx}`);
          resolve(messageId);
        })
        .catch(reject)
        // Release the slot on success and on failure.
        .finally(() => sem.leave());
    });
  })
);
// Await the actual publishes
await Promise.all(tasks);
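If you would rather not add a dependency, the same cap on outstanding publishes can be sketched with a small worker pool. This is only an illustration: `publishWithLimit` and `publishOne` are hypothetical names, and `publishOne` stands in for something like `(buf) => topic.publish(buf)`.

```javascript
// Sketch: publish many items with at most `limit` publishes in flight.
// `publishOne` is a hypothetical async function supplied by the caller.
async function publishWithLimit(items, publishOne, limit) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to publish

  // Each worker pulls the next unpublished item until none remain.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await publishOne(items[index], index);
    }
  }

  // Start up to `limit` workers; the returned promise rejects on the first
  // failed publish.
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

To respond to the front end without waiting, one option is to call `publishWithLimit(dataBufs, (d) => topic.publish(d), 1000).catch(logError)` without `await` and then send the response (`logError` being whatever error handler you use). The publishes then continue in the background of the same process, so they are lost if the process exits before they finish.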

Async base-local with MQTT

I need to synchronize a base and a local client with MQTT. If client publishes then the other one will get the message.
If my MQTT broker is down, I need to stop sending messages, save the messages somewhere, wait for a connection, then continue sending.
If my local or base client is down for a second, I need to save the message which I didn't send, then send it when I turn on my base/local.
I'm working with Node.js and can't figure out how to implement this.
This is my handler when I connect or disconnect with my MQTT server.
client.on('connect',()=>{
store.state = true;
run(store).then((value)=>console.log('stop run'));
});
client.on('offline',()=>{
store.state = false;
console.log('offline');
});
This is my run function. I use store.state to decide if I should stop this interval. But this code does not seem to be a good way to implement my concept.
function run(store) {
return new Promise((resolve,reject)=>{
let interval = setInterval(()=>{
if (!store.state) {
clearInterval(interval);
resolve(true);
}
else if (store.queue.length > 0) {
let data = store.queue.pop();
let res = client.publish('push',JSON.stringify(data),{qos:2});
}
},300)
});
}
What should I do to implement a function which always sends, stop upon 'disconnect', then continues sending when connected?
I don't think using setInterval with 300 ms is a good approach.
If you want something that "always runs", at set intervals and in spite of any errors inside the loop, setInterval() makes sense. You are right that queued messages can be cleared faster than "once every 300 ms".
Since MQTT.js has a built-in queue, you could simplify a lot by using it. However, your messages are published to a target called "push", so I guess you want them delivered in the order of the queue. This answer keeps the queue and focuses on sending the next message as soon as the last one is confirmed.
What if res=client.publish(..) false ?
Good point! If you want to make sure it arrives, better to remove it once the publish has succeeded. For this, you need to retrieve the value without removing it, and use the callback argument to find out what happened (publish() is asynchronous). If that was the only change, it might look like:
let data = store.queue[store.queue.length - 1];
client.publish('push', JSON.stringify(data), {qos:2}, (err) => {
if(!err) {
store.queue.pop();
}
// Ready for next publish; call this function again
});
Extending that to a promise-based run:
function publishFromQueue(data) {
return new Promise((resolve,reject)=>{
let res = client.publish('push', JSON.stringify(data), {qos:2}, (err) => {
resolve(!err);
});
});
}
async function run(store) {
while (store.queue.length > 0 && store.state) {
let data = store.queue[store.queue.length - 1];
let res = await publishFromQueue(data);
if(res) {
store.queue.pop();
}
}
}
This should deliver all the queued messages in order as soon as possible, without blocking. The only drawback is that it does not run constantly. You have two options:
Recur at set intervals, as you have already done. Slower, though you could set a shorter interval.
Only run() when needed, like:
let isRunning = false; // Global for tracking whether run() is in progress
function queueMessage(data) {
  store.queue.push(data);
  if (!isRunning) {
    isRunning = true;
    // run() is async, so clear the flag only once it has finished
    run(store).finally(() => {
      isRunning = false;
    });
  }
}
As long as you can call this function instead of pushing to the queue directly, the code should come out at a similar length while being more immediate and efficient.
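Putting the pieces together, a minimal sketch of a queue that drains while connected and pauses while offline might look like the following. It assumes `client` behaves like an MQTT.js client (emits 'connect' and 'offline', and reports publish errors through the callback). Unlike the snippets above, it takes messages from the front of the queue with `queue[0]` and `shift()` so they go out oldest first, which is an assumption about the ordering you want. `createOutbox`, `send`, and `pending` are hypothetical names.

```javascript
// Sketch: FIFO outbox that drains while connected and pauses while offline.
// Assumes an MQTT.js-like `client`; all helper names here are hypothetical.
function createOutbox(client, topic) {
  const queue = [];
  let online = false;
  let draining = false;

  function publishOne(data) {
    return new Promise((resolve) => {
      client.publish(topic, JSON.stringify(data), { qos: 2 }, (err) => resolve(!err));
    });
  }

  async function drain() {
    if (draining) return; // only one drain loop at a time
    draining = true;
    try {
      while (online && queue.length > 0) {
        const ok = await publishOne(queue[0]); // peek at the oldest message
        if (!ok) break; // keep the message; retry on the next drain
        queue.shift(); // remove only after a confirmed publish
      }
    } finally {
      draining = false;
    }
  }

  client.on('connect', () => { online = true; drain(); });
  client.on('offline', () => { online = false; });

  return {
    send(data) { queue.push(data); return drain(); },
    pending: () => queue.length,
  };
}
```

Calling `outbox.send(data)` replaces pushing to `store.queue`: while offline the message just waits in the queue, and the 'connect' handler restarts the drain loop.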

High performance on Nodejs RabbitMQ server

I'm building an analysis system with a million users online at the same time. I use RabbitMQ as a message broker to reduce the load on the server.
Here is my diagram
My system includes 3 components.
Publisher server (producer):
This system was built on Node.js. Its purpose is to publish messages into the queue.
RabbitMQ queue: This system stores the messages that the publisher server sends. After that, a connection is opened to send messages from the queue to the subscriber server.
Subscriber server (consumer): This system receives the messages from the queue.
Publisher server source code
var amqp = require('amqplib/callback_api');
amqp.connect("amqp://localhost", function(error, connect) {
if (error) {
return callback(-1, null);
} else {
connect.createChannel(function(error, channel) {
if (error) {
return callback(-3, null);
} else {
var q = 'logs';
var msg = data; // object
// convert msg object to buffer
var new_msg = Buffer.from(JSON.stringify(msg), 'binary');
channel.assertExchange(q, 'fanout', { durable: false });
channel.publish(q, 'message_queues', new Buffer(new_msg));
console.log(" [x] Sent %s", new_msg);
return callback(null, msg);
}
});
}
});
This creates the exchange "message_queues" of type "fanout" to broadcast to all consumers.
Subscriber server source code
var amqp = require('amqplib/callback_api');
amqp.connect("amqp://localhost", function(error, connect) {
if (error) {
console.log('111');
} else {
connect.createChannel(function(error, channel) {
if (error) {
console.log('1');
} else {
var ex = 'logs';
channel.assertExchange(ex, 'fanout', { durable: false });
channel.assertQueue('message_queues', { exclusive: true }, function(err, q) {
if (err) {
console.log('123');
} else {
console.log(" [*] Waiting for messages in %s. To exit press CTRL+C", q.queue);
channel.bindQueue(q.queue, ex, 'message_queues');
channel.consume(q.queue, function(msg) {
console.log(" [x] %s", msg.content.toString());
}, { noAck: true });
}
});
}
});
}
});
This receives messages from the "message_queues" exchange.
When I send a single message, the system works well. However, when I benchmarked it (with ~1000 users sending requests per second), the system had issues. It seems to be overloaded or hitting a buffer overflow (or something else is not working well).
I only started reading about RabbitMQ 2 days ago. I know its tutorials are basic examples, so I need help building a real-world system. Any solutions or suggestions?
I hope my question makes sense.
Your question is general. Probably you should provide more details to help to identify the bottleneck and help you out.
So, first of all I think you should check the rabbit mq - whether its a bottleneck or not.
There are many things that can go wrong:
The number of consumers that can consume the message is too low (I assume you use a pool of consumers)
The network is too slow
The queues and messages are replicated between too many RabbitMQ nodes and go to disk (it's possible to use RabbitMQ like this)
The consumer can't really handle a message and it gets constantly re-queued
So, in general during your tests you should check rabbit mq and see what happens there.
Once a message arrives in the queue, it is in the Ready state. It stays there until one of the consumers connected to the queue takes it for handling.
When one of the consumers (Rabbit does round-robin between them) picks the message for processing, its state turns to Unacknowledged.
If the consumer fails to handle the message, Rabbit re-queues it so that another consumer has a chance to handle it.
Of course, if the consumer handles the message successfully, the message disappears from the RabbitMQ server.
Assuming you've installed rabbit mq web ui (I highly recommend it especially for beginners) - you can visually see what happens in your queue - you'll see how many messages are in ready state, and how many are unacknowledged.
This will help to identify a bottleneck.
For example, if you see that usually only one message is in the Unacknowledged state, this can mean that the consumer can't handle the message and sends it back to Rabbit. Meanwhile, new messages keep arriving from the producer, so the number of Ready messages will increase very fast.
It can also point to the fact that you use only one consumer that can handle only one message at a time. In that case, consider parallelizing by running many consumers in different threads, or even clustering your application (in Rabbit, consumers can reside on different machines).
Hope this helps in general, of course, as I've said before if you have more specific questions - please provide more information about what exactly happens during the test
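To make the Ready / Unacknowledged states concrete, here is a minimal sketch of a consumer that acknowledges manually and limits how many messages it holds at once. It assumes an amqplib channel and uses its `prefetch`, `consume`, `ack`, and `nack` calls. `startConsumer` and `handler` are hypothetical names, and rejecting without requeueing is one possible policy, not the only one.

```javascript
// Sketch: consume with manual acknowledgements and a prefetch limit.
// Assumes `channel` is an amqplib channel; `handler` is a hypothetical
// function that returns a promise and rejects when processing fails.
function startConsumer(channel, queueName, handler, prefetchCount) {
  // At most `prefetchCount` messages may be Unacknowledged on this channel.
  channel.prefetch(prefetchCount);
  channel.consume(queueName, (msg) => {
    if (msg === null) return; // the consumer was cancelled by the broker
    Promise.resolve()
      .then(() => handler(msg))
      .then(
        // Handled: the message is removed from the queue.
        () => channel.ack(msg),
        // Failed: reject without requeueing, so a message that can never be
        // processed does not bounce between Ready and Unacknowledged forever.
        () => channel.nack(msg, false, false)
      );
  }, { noAck: false });
}
```

With `noAck: true`, as in the subscriber code from the question, the broker treats a message as done the moment it is delivered, so the management UI will not show a meaningful Unacknowledged count.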

Amazon SQS with aws-sdk receiveMessage Stall

I'm using the aws-sdk node module with the (as far as I can tell) approved way to poll for messages.
Which basically sums up to:
sqs.receiveMessage({
QueueUrl: queueUrl,
MaxNumberOfMessages: 10,
WaitTimeSeconds: 20
}, function(err, data) {
if (err) {
logger.fatal('Error on Message Recieve');
logger.fatal(err);
} else {
// all good
if (undefined === data.Messages) {
logger.info('No Messages Object');
} else if (data.Messages.length > 0) {
logger.info('Messages Count: ' + data.Messages.length);
var delete_batch = new Array();
for (var x=0;x<data.Messages.length;x++) {
// process
receiveMessage(data.Messages[x]);
// flag to delete
var pck = new Array();
pck['Id'] = data.Messages[x].MessageId;
pck['ReceiptHandle'] = data.Messages[x].ReceiptHandle;
delete_batch.push(pck);
}
if (delete_batch.length > 0) {
logger.info('Calling Delete');
sqs.deleteMessageBatch({
Entries: delete_batch,
QueueUrl: queueUrl
}, function(err, data) {
if (err) {
logger.fatal('Failed to delete messages');
logger.fatal(err);
} else {
logger.debug('Deleted recieved ok');
}
});
}
} else {
logger.info('No Messages Count');
}
}
});
receiveMessage is my "do stuff with collected messages if I have enough collected messages" function
Occasionally, my script is stalling because I don't get a response for Amazon at all, say for example there are no messages in the queue to consume and instead of hitting the WaitTimeSeconds and sending a "no messages object", the callback isn't called.
(I'm writing this up to Amazon Weirdness)
What I'm asking is whats the best way to detect and deal with this, as I have some code in place to stop concurrent calls to receiveMessage.
The suggested answer here: Nodejs sqs queue processor also has code that prevents concurrent message request queries (granted it's only fetching one message a time)
I do have the whole thing wrapped in
var running = false;
runMonitorJob = setInterval(function() {
if (running) {
} else {
running = true;
// call SQS.receive
}
}, 500);
(With a running = false after the delete loop (not in it's callback))
My solution would be
watchdogTimeout = setTimeout(function() {
running = false;
}, 30000);
But surely this would leave a pile of floating sqs.receive's lurking about and thus much memory over time?
(This job runs all the time, and I left it running on Friday, it stalled Saturday morning and hung till I manually restarted the job this morning)
Edit: I have seen cases where it hangs for ~5 minutes and then suddenly gets messages BUT with a wait time of 20 seconds it should throw a "no messages" after 20 seconds. So a WatchDog of ~10 minutes might be more practical (depending on the rest of ones business logic)
Edit: Yes Long Polling is already configured Queue Side.
Edit: This is under (latest) v2.3.9 of aws-sdk and NodeJS v4.4.4
I've been chasing this (or a similar) issue for a few days now and here's what I've noticed:
The receiveMessage call does eventually return although only after 120 seconds
Concurrent calls to receiveMessage are serialised by the AWS SDK library, so making multiple calls in parallel has no effect.
The receiveMessage callback does not error - in fact after the 120 seconds have passed, it may contain messages.
What can be done about this? This sort of thing can happen for a number of reasons and some/many of these things can't necessarily be fixed. The answer is to run multiple services each calling receiveMessage and processing the messages as they come - SQS supports this. At any time, one of these services may hit this 120 second lag but the other services should be able to continue on as normal.
My particular problem is that I have some critical singleton services that can't afford 120 seconds of down time. For this I will look into either 1) use HTTP instead of SQS to push messages into my service or 2) spawn slave processes around each of the singletons to fetch the messages from SQS and push them into the service.
I also ran into this issue, but not when calling receiveMessage but sendMessage. I also saw hangups of exactly 120 seconds. I also saw it with a few other services, like Firehose.
That led me to this line in the AWS SDK:
SQS Constructor
httpOptions:
timeout [Integer] — Sets the socket to timeout after timeout milliseconds of inactivity on the socket. Defaults to two minutes (120000).
To implement a fix, I override the timeout for the SQS client that performs sendMessage so that it times out after 10 seconds, and use another client with a 25-second timeout for receiving (where I long poll for 20 seconds):
var sendClient = new AWS.SQS({httpOptions:{timeout:10*1000}});
var receiveClient = new AWS.SQS({httpOptions:{timeout:25*1000}});
I've had this out in production for a week now and I've noticed that all of my SQS stalling issues have been eliminated.
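With a socket timeout in place, a stalled call fails instead of hanging, so the setInterval-plus-flag wrapper from the question could be replaced by a loop that starts the next receive only after the previous one has settled. The following is a sketch under that assumption: `pollLoop`, `receive`, `handle`, and `shouldStop` are hypothetical names, and with aws-sdk v2 `receive` could be `() => receiveClient.receiveMessage(params).promise()`.

```javascript
// Sketch: a long-poll loop with no timer and no "running" flag.
// `receive`, `handle`, and `shouldStop` are hypothetical callbacks.
async function pollLoop(receive, handle, shouldStop, retryDelayMs = 1000) {
  while (!shouldStop()) {
    try {
      const data = await receive();
      for (const message of data.Messages || []) {
        await handle(message);
      }
    } catch (err) {
      // A failed receive (for example a socket timeout) or a handler error
      // lands here; wait briefly and poll again instead of stalling.
      console.error('poll failed, retrying:', err.message);
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
}
```

Only one receive is ever outstanding, so there is nothing to leak if a single call is slow.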

nodejs http response.write: is it possible out-of-memory?

If I have the following code to send data repeatedly to the client every 10 ms:
setInterval(function() {
res.write(somedata);
}, 10); // 10 ms
What would happen if the client is very slow to receive the data?
Will server get out-of-memory error?
Edit:
Actually, the connection is kept alive and the server sends JPEG data endlessly (HTTP multipart/x-mixed-replace header + body + header + body...).
Because Node.js response.write is asynchronous, some users guess that it may store data in an internal buffer and wait until the lower layer signals that it can send, so the internal buffer will grow. Am I right?
If I am right, how do I resolve this?
The problem is that Node.js does not notify me when the data for a single write call has been sent.
In other words, I cannot tell users that this approach is theoretically free of "out of memory" risk, or how to fix it.
Update:
Following the keyword "drain" event given by user568109, I studied the Node.js source and came to this conclusion:
it really can cause an "out-of-memory" error. I should check whether response.write(...) returns false and then handle the "drain" event of the response.
http.js:
OutgoingMessage.prototype._buffer = function(data, encoding) {
this.output.push(data); //-------------No check here, will cause "out-of-memory"
this.outputEncodings.push(encoding);
return false;
};
OutgoingMessage.prototype._writeRaw = function(data, encoding) { //this will be called by response.write
if (data.length === 0) {
return true;
}
if (this.connection &&
this.connection._httpMessage === this &&
this.connection.writable &&
!this.connection.destroyed) {
// There might be pending data in the this.output buffer.
while (this.output.length) {
if (!this.connection.writable) { //when not ready to send
this._buffer(data, encoding); //----------> save data into internal buffer
return false;
}
var c = this.output.shift();
var e = this.outputEncodings.shift();
this.connection.write(c, e);
}
// Directly write to socket.
return this.connection.write(data, encoding);
} else if (this.connection && this.connection.destroyed) {
// The socket was destroyed. If we're still trying to write to it,
// then we haven't gotten the 'close' event yet.
return false;
} else {
// buffer, as long as we're not destroyed.
this._buffer(data, encoding);
return false;
}
};
Some gotchas:
If you are sending over HTTP, this is not a good idea. The browser may treat the request as timed out if it is not finished within a specified amount of time. The server, too, will close a connection that is idle for too long. If the client cannot keep up, a timeout is almost certain.
setInterval with 10 ms is also subject to some restrictions. It does not mean the callback repeats exactly every 10 ms; 10 ms is the minimum it will wait before repeating, so the actual rate will be slower than the interval you set.
Let's say you chance to overload the response with data, then at some point the server will end connection and respond by 413 Request Entity Too Large depending on what the limit is set.
Node.js has single threaded architecture with a max memory limitation of around 1.7 GB. If you set your above server limits to too high and have many incoming connections you will get process out of memory error.
So with appropriate limits it will either give timeout or be request too large. (And there are no other errors in your program.)
Update
You need to use the drain event. The HTTP response is a writable stream with its own internal buffer. When that buffer has been emptied, the drain event is triggered. It is worth learning more about streams as you go deeper; this will help you well beyond HTTP. You can find several resources about streams on the web.
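A minimal sketch of that pattern, assuming any Node.js Writable (an http.ServerResponse is one): check the return value of `write()` and wait for 'drain' before producing more data. `writeChunk` is a hypothetical helper name.

```javascript
const { once } = require('events');

// Sketch: write one chunk and wait for 'drain' when the stream's internal
// buffer is full. `stream` can be any Writable, such as an HTTP response.
async function writeChunk(stream, chunk) {
  const canContinue = stream.write(chunk);
  if (!canContinue) {
    // write() returned false: the buffered data has reached highWaterMark,
    // so pause until it has been flushed before producing more data.
    await once(stream, 'drain');
  }
}
```

For the endless JPEG stream, this suggests replacing the fixed setInterval with a loop that awaits `writeChunk(res, frame)` before preparing the next frame, so a slow client slows the producer down instead of growing the buffer.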
