Channels keep increasing for every exchange.publish() in RabbitMQ with node-amqp library - node.js

I'm using the node-amqp library for my Node.js project. I also posted the issue to its GitHub project page.
It keeps creating new channels and they stay idle forever. After an hour there were ~12000 channels. I checked the options for exchange and publish, but so far I'm not even close to a solution.
What's wrong with the code, and/or are there any options/settings on the RabbitMQ server side for this issue?
Here is the sample code:
connection.exchange("brcks-wfa", {type: 'direct', durable: true}, function(exchange) {
  setInterval(function() {
    ...
    awS.forEach(function(wc) {
      ...
      nstbs.forEach(function(br) {
        ...
        BUpdate(brnewinfo, function(st) {
          if (st) {
            exchange.publish(route, brnewinfo, {contentType: "application/json"});
          }
        });
      });
      ...
    });
  }, 4000);
});

There is a bug in node-amqp where channels are not closed. The RabbitMQ team no longer recommends using this library; instead they recommend amqp.node (the amqplib package), which is a bit more low-level and lets/requires you to handle channels manually.
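For illustration, here is a minimal sketch of the same publish loop using amqplib's promise API, with a single connection and a single channel created once and reused (the routing key and payload are placeholders for the values built in the original loop):

import * as amqp from 'amqplib';

// A single connection and a single channel, created once and reused for every
// publish, so no new channels pile up over time.
async function startPublisher(url: string): Promise<void> {
  const conn = await amqp.connect(url);
  const ch = await conn.createChannel();                 // one channel, kept open
  await ch.assertExchange('brcks-wfa', 'direct', { durable: true });

  setInterval(() => {
    const route = 'some.routing.key';                    // placeholder routing key
    const brnewinfo = { updatedAt: Date.now() };         // placeholder payload
    ch.publish('brcks-wfa', route, Buffer.from(JSON.stringify(brnewinfo)), {
      contentType: 'application/json',
    });
  }, 4000);
}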

Related

Google Pub/Sub with distributed subscribers in Node.js

We are attempting to migrate a message processing app from Kafka to Google Pub/Sub and it's just not working as expected.
We are running in Kubernetes (Google Cloud) where there may be multiple pods processing messages on the same subscription. Topics and subscriptions are all created using terraform and are more or less permanent. They are not created/destroyed on the fly by the application.
In our development environment, where message throughput is rather low, everything works just fine. But when we scale up to production levels, everything seems to fall apart. We get big backlogs of unacked messages, and yet some pods are not receiving any messages at all. And then, all of a sudden, the backlog will just go away, but then climb again.
We are using the Node.js client library provided by Google: @google-cloud/pubsub:3.1.0
Each instance of the application subscribes to the same named subscription, and according to the documentation, messages should be distributed to each subscriber. But that is not happening. Some pods will be consuming messages rapidly, while others sit idle.
Every message is processed in a try/catch block and we are not observing any errors being thrown. So, as far as we know, every received message is getting acked.
I suspect that, as pods are terminated by autoscaling or updated deployments, we are not properly closing subscriptions, but there are no examples addressing a distributed environment and I have not found any document that specifically addresses how to properly manage resources. It is also worth mentioning that the app has multiple subscriptions to different topics.
When a pod shuts down, what actions should be taken on the Subscription object and the PubSub client object? Maybe that's not even the issue, but it seems like a reasonable place to start.
When we start a subscription we do something like this:
private exampleSubscribe(): Subscription {
  // one suggestion for having multiple subscriptions in the same app
  // was to use separate clients for each
  const pubSubClient = new PubSub({
    // use a regional endpoint for message ordering
    apiEndpoint: 'us-central1-pubsub.googleapis.com:443',
  });
  pubSubClient.projectId = 'my-project-id';
  const sub = pubSubClient.subscription('my-subscription-name', {
    // have tried various values for maxMessages, from 5 up to the default of 1000
    flowControl: { maxMessages: 250, allowExcessMessages: false },
    ackDeadline: 30,
  });
  sub.on('message', async (message) => {
    await this.exampleMessageProcessing(message);
  });
  return sub;
}
private async exampleMessageProcessing(message: Message): Promise<void> {
  try {
    // do some cool stuff
  } catch (error) {
    // log the error
  } finally {
    message.ack();
  }
}
Upon termination of a pod, we do this:
private async exampleCloseSub(sub: Subscription) {
  try {
    sub.removeAllListeners('message');
    await sub.close();
    // note that we do nothing with the PubSub
    // client object -- should it also be closed?
  } catch (error) {
    // ignore error, we are shutting down
  }
}
When running with Kafka, we can easily keep up with the message pace, usually with no more than 2 pods. So I know we are not running into issues of it simply taking too long to process each message.
Why are messages being left unacked? Why are pods not receiving messages when there is clearly a large backlog? What is the correct way to shut down one subscriber on a shared subscription?
It turns out that the issue was an improper implementation of message ordering.
The official docs for message ordering in Pub/Sub are rather brief:
https://cloud.google.com/pubsub/docs/ordering
Not much there regarding how to implement an ordering key or the implications of message ordering on horizontal scaling.
Though they do link to some external resources, one of which is this blog post:
https://medium.com/google-cloud/google-cloud-pub-sub-ordered-delivery-1e4181f60bc8
In our case, we did not have enough distinct ordering keys to allow for proper distribution of messages across subscribers/pods.
So this was definitely an RTFM situation, or more accurately: Read The Fine Blog Post Referred To By The Manual. I would have much preferred that the important details were actually in the official documentation. Is that too much to ask for?
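For context, here is a minimal sketch of what ordering-key publishing looks like with @google-cloud/pubsub (the topic name and key choice are hypothetical); the point is that ordered messages are only spread across subscribers at the granularity of distinct ordering keys, so too few keys means only a few busy pods:

import { PubSub } from '@google-cloud/pubsub';

const pubSubClient = new PubSub();

// messageOrdering must be enabled on the publisher side (the subscription must
// also have message ordering enabled, e.g. via terraform)...
const topic = pubSubClient.topic('my-topic', { messageOrdering: true });

async function publishOrdered(event: { accountId: string; payload: object }): Promise<void> {
  await topic.publishMessage({
    data: Buffer.from(JSON.stringify(event.payload)),
    // ...and every message with the same ordering key goes to the same subscriber,
    // in order. Too few distinct keys => poor distribution across pods.
    orderingKey: event.accountId,
  });
}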

How should one properly handle opening and closing of Azure Service Bus/Topic Clients in an Azure Function?

I'm not sure of the proper way to manage the lifespans of the various clients necessary to interact with Azure Service Bus. From my understanding there are three different but similar clients to manage: a ServiceBusClient, a Topic/Queue/Subscription client, and then a Sender of some sort. In my case, it's a TopicClient and a Sender. Should I close the sender after every message? After a certain amount of downtime? And the same with all the others? I feel like I should keep the ServiceBusClient open until the function is entirely complete, so that probably carries over to the TopicClient as well. There are just so many ways to skin this one, I'm not sure where to start drawing the line. I'm pretty sure it's not this extreme:
async function sendMessage(message: SendableMessageInfo): Promise<void> {
  const client = ServiceBusClient.createFromConnectionString(connectionString);
  const tClient = client.createTopicClient(topicName);
  const sender = tClient.createSender();
  await sender.send(message);
  await sender.close();
  await tClient.close();
  await client.close();
}
But leaving everything open all the time seems like a memory leak waiting to happen. Should I handle this all through error handling? Try-catch, then close everything in a finally block?
I could also just use the Azure Function binding, correct me if I'm wrong:
const productChanges: AzureFunction = async function (context: Context, products: product[]): Promise<void> {
  context.bindings.product_changes = [];
  for (const product of products) {
    if (product.updated) {
      const message = this.createMessage(product);
      context.bindings.product_changes.push(message);
    }
  }
  context.done();
}
I can't work out from the docs or source which would be better (both in terms of performance and finances) for an extremely high throughput Topic (at surge, ~100,000 requests/sec).
Any advice would be appreciated!
In my opinion, you should either use the Azure binding or make the client static, rather than creating a new client every time. If you use the Azure binding, you don't need to worry about closing the sender; a static client works fine as well. Both solutions have good performance and there is no cost difference between the two (you can refer to the Service Bus pricing page: https://azure.microsoft.com/en-us/pricing/details/service-bus/). Hope this is helpful for your question.
I know this is a late reply, but I'll try to explain the concepts behind the clients below in case someone lands here looking for answers.
Version 1
  ServiceBusClient (maintains the connection)
   |_ TopicClient
       |_ Sender (sender link)

Version 7
  ServiceBusClient (maintains the connection)
   |_ ServiceBusSender (sender link)
In both version 1 and version 7 of the @azure/service-bus SDK, when you use the sendMessages method (or the equivalent send method) for the first time, a connection is created on the ServiceBusClient if there was none, and a new sender link is created.
The sender link remains active for a while and is cleared on its own (by the SDK) if there is no activity. Even if it has been closed due to inactivity, a subsequent send call, even after a long wait, will work just fine, since it creates a new sender link.
Once you're done using the ServiceBusClient, you can close the client, and all the internal senders and receivers are closed along with it if they have not already been closed individually.
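To make that concrete, here is a minimal sketch of the v7 pattern under the assumption of a module-scoped client that is reused across invocations and closed only at shutdown (the connection string, topic name, and function names are placeholders):

import { ServiceBusClient, ServiceBusSender } from "@azure/service-bus";

// Created once at module scope and reused; the client maintains the AMQP connection
// and the sender holds the sender link.
const sbClient = new ServiceBusClient(process.env.SERVICE_BUS_CONNECTION_STRING!);
const sender: ServiceBusSender = sbClient.createSender("my-topic");

export async function sendProductChange(payload: object): Promise<void> {
  // The first send creates the connection/link; later sends reuse them.
  await sender.sendMessages({ body: payload, contentType: "application/json" });
}

export async function shutdown(): Promise<void> {
  // Closing the client also closes any senders/receivers created from it.
  await sbClient.close();
}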
The latest version 7.0.0 of @azure/service-bus has been released recently.
@azure/service-bus - 7.0.0
Samples for 7.0.0
Guide to migrate from @azure/service-bus v1 to v7

First call to Microsoft.Azure.ServiceBus.Core.MessageSender.SendAsync times out, subsequent calls don't

I have some code written to communicate with an Azure Service Bus. It sends messages to a queue. It's in a project targeting .NET Standard 2.0.
When I run it from a .NET Core console app it runs fine. But when the same code is called from a .NET Framework 4.7.2 project, the first attempt to send a message results in the following exception after 30 to 90 seconds:
"The remote party closed the WebSocket connection without completing the close handshake."
But any further messages will be sent without problem.
// This is using Microsoft.Azure.ServiceBus, if that makes any difference...
MessageSender messageSender = new MessageSender(ConnectionString, SendQueueName);

try
{
    await messageSender.SendAsync(new Message(Encoding.UTF8.GetBytes("Test that won't work")));
}
catch (Exception e)
{
    // Error will be caught here:
    // "The remote party closed the WebSocket connection without completing the close handshake."
}

await messageSender.SendAsync(new Message(Encoding.UTF8.GetBytes("Test that will work")));
Does anybody know why the first call fails? And how to make it not fail? Or fail quicker? I've tried changing the OperationTimeout and RetryPolicy but they don't seem to have any effect.
These first connections are made via ports 5671/5672, which Trend antivirus intercepts. Once these have timed out, the framework falls back to using port 443, which works fine.
We tried turning Trend off and testing the connection, and it's pretty much instantaneous.

TypeScript: Large memory consumption while using ZeroMQ ROUTER/DEALER

We have recently started working with TypeScript for an application where queued communication is expected between a server and one or more clients.
To achieve the queued communication, we are trying to use the ZeroMQ library version 4.6.0 as an npm package: npm install -g zeromq and npm install -g @types/zeromq.
The exact scenario :
The client is going to send thousands of messages to the server over ZeroMQ. The server in turn responds with an acknowledgement message for each incoming message from the client. Based on the acknowledgement message, the client sends the next message.
ZeroMQ pattern used :
The ROUTER/DEALER pattern (we cannot use any other pattern).
Client side code :
import Zmq = require('zeromq');

let clientSocket: Zmq.Socket;
let messageQueue = [];

export class ZmqCommunicator
{
  constructor(connString: string)
  {
    clientSocket = Zmq.socket('dealer');
    clientSocket.connect(connString);
    clientSocket.on('message', this.ReceiveMessage);
  }

  public ReceiveMessage = (msg) => {
    const json = JSON.parse(msg.toString('utf8'));
    if (json.type != 'error' && json.type == 'ack') {
      if (messageQueue.length > 0) {
        this.Dispatch(messageQueue.splice(0, 1)[0]);
      }
    }
  }

  public Dispatch(message) {
    clientSocket.send(JSON.stringify(message));
  }

  public SendMessage(msg: Message, isHandshakeMessage: boolean) {
    // The if branch runs only once, for the first handshake message.
    // Every other message goes through the else branch.
    if (isHandshakeMessage == true) {
      clientSocket.send(JSON.stringify(msg));
    } else {
      messageQueue.push(msg);
    }
  }
}
On the server side, we already have a ROUTER socket configured.
The above code is pretty straightforward. The SendMessage() function is essentially called for thousands of messages, and the code works, but with very high memory consumption.
Problem :
Because the behavior of ZeroMQ is asynchronous, the client has to wait for the ReceiveMessage() callback before it can send the next message to the ZeroMQ ROUTER (which is evident from the flow into the Dispatch method).
Based on our limited knowledge of TypeScript and of using ZeroMQ with it, the problem is this: the default thread running the TypeScript code (which creates the 1000+ messages and hands them to SendMessage()) keeps executing, creating and submitting more messages, after the first (handshake) message is sent. SendMessage() is not actually sending the data but queueing it, because we want to interpret the acknowledgement sent by the ROUTER socket and only then send the next message. As a result, the ReceiveMessage() callback is not invoked until the default thread has finished creating and queueing all 1000+ messages and has nothing further left to do.
Because ZeroMQ does not provide any synchronous mechanism for sending/receiving data with ROUTER/DEALER, we had to use a queue, as in the code above, via the messageQueue object.
This mechanism loads a huge messageQueue (1000+ messages) into memory and only dequeues once the default thread finally reaches the ReceiveMessage() callback at the end. The situation only gets worse if we have 10000+ or even more messages to send.
Questions :
We have validated this behavior, so we are sure of the understanding explained above. Is there any gap in our understanding of TypeScript and/or of our ZeroMQ usage?
Is there any concept like a blocking queue / bounded array in TypeScript that would accept a limited number of entries and block any new additions until the existing ones are dequeued (which essentially means the default thread pauses its processing until the ReceiveMessage() callback is called and de-queues entries from the queue)?
Is there any synchronous ZeroMQ methodology? (We have used one in a similar setup for C#, where we poll on ZeroMQ and receive the data synchronously.)
Any leads on using multi-threading for such a scenario? We are not sure whether TypeScript supports multi-threading to a good extent.
Note: We have searched many forums and have not got any leads anywhere. The above description may contain multiple questions in one (against the rules of the Stack Overflow forum), but for us all of these questions are interlinked parts of using ZeroMQ effectively in TypeScript.
Looking forward to getting some leads from the community.
Welcome to ZeroMQ
If this is your first read about ZeroMQ, feel free to first take a five-second read about the main conceptual differences in the [ ZeroMQ hierarchy in less than a five seconds ] section.
1 ) ... Is there any gap in our understanding of either/or TypeScript or ZeroMQ usage ?
Whereas I cannot speak for the TypeScript part, let me mention a few details that may help you move forward. While ZeroMQ is principally a broker-less, asynchronous signalling/messaging framework, it has many flavours of use, and there are tools to enforce both synchronous and asynchronous cooperation between the application code and the ZeroMQ Context()-instance, which is the cornerstone of the whole service design.
The native API provides means to define whether a respective call ought to block until a message has been passed across the Context()-instance's boundary, or, on the very contrary, whether the call ought to obey the ZMQ_DONTWAIT flag and asynchronously return control to the caller, irrespective of the operation's (in-)completion.
As an additional trick, one may opt to configure ZMQ_SNDHWM + ZMQ_RCVHWM and other related .setsockopt() options, so as to get a specific blocking / silent-dropping behaviour.
Because ZeroMQ does not provide any synchronous mechanism of sending/receiving data
Well, the ZeroMQ API does provide means for a synchronous call to the .send()/.recv() methods, where the caller is blocked until a message can be delivered into / received from the Context()-engine's domain of control.
Obviously, the TypeScript language binding/wrapper is responsible for exposing these native API services to your hands.
3 ) Is there any synchronous ZeroMQ methodology (We have used it in a similar setup for C#, where we poll on ZeroMQ and receive the data synchronously) ?
Yes, there are several such :
- the native API, if not instructed by a ZMQ_DONTWAIT flag, blocks until a message can get served
- the native API provides a Poller()-object, whose .poll() method, if given -1 as the duration specifier, waits for the sought-for events, blocking the caller until any such event arrives at the Poller()-instance.
Again, the TypeScript language binding/wrapper is responsible for exposing these native API services to your hands.
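For illustration, here is a minimal sketch of a send-one-then-wait-for-the-ack loop, assuming the project can move to the newer zeromq v6+ Node.js binding, which exposes promise-based send()/receive(); with this shape the client never has to buffer thousands of messages in a messageQueue (the message and ack shapes are placeholders):

import { Dealer } from 'zeromq';

async function sendAll(connString: string, messages: object[]): Promise<void> {
  const socket = new Dealer();
  socket.connect(connString);

  for (const msg of messages) {
    await socket.send(JSON.stringify(msg));     // hand one message to the DEALER socket
    const [reply] = await socket.receive();     // suspend here until the ROUTER's reply arrives
    const ack = JSON.parse(reply.toString());
    if (ack.type !== 'ack') {
      break;                                    // stop on error instead of queueing more work
    }
  }

  socket.close();
}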
... Large memory consumption ...
Well, this may signal poor resource management. ZeroMQ messages, once allocated, ought to be freed where appropriate. Check your TypeScript code and the TypeScript language binding/wrapper sources to see whether the resources systematically get disposed of and freed from memory.

amqplib - how to safely check if a queue exists

I'm using the amqplib library for Node.js to work with RabbitMQ. I'm trying to check whether a queue exists by using the checkQueue function:
mychannel.checkQueue('xxx', function (err, ok) {
  console.log(err);
  console.log(ok);
});
But if the queue does not exist, it not only throws an error but also closes the channel. How can I safely check whether the queue exists?
You can't without risking destroying the channel. The workaround is to create a temporary channel which you can use to do the check.
A comment from amqp.node dev:
(https://github.com/squaremo/amqp.node/issues/280)
The behaviour of checkQueue is dictated by the protocol, but it can be
worked around. One tactic is to create a "sacrificial" extra channel with
which to test whether the queue exists. Once you have the answer, you can
throw the extra channel away, or keep it around for more tests.
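For illustration, a minimal sketch of that sacrificial-channel tactic, assuming the amqplib callback API (typings from @types/amqplib), an already-open connection, and a hypothetical helper name queueExists:

import { Connection } from 'amqplib/callback_api';

function queueExists(conn: Connection, queueName: string, cb: (exists: boolean) => void): void {
  // Open a throw-away channel just for the check, so an error here
  // cannot take down the channel you actually publish/consume on.
  conn.createChannel((chErr, ch) => {
    if (chErr) return cb(false);
    ch.on('error', () => { /* expected when the queue does not exist */ });
    ch.checkQueue(queueName, (err) => {
      if (err) return cb(false);      // queue missing; the server already closed this channel
      ch.close(() => cb(true));       // queue exists; throw the extra channel away
    });
  });
}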
