Realtime messaging with NodeJS across multiple processes - node.js

I'm trying to implement an API that interacts with a NodeJS server for realtime messaging. Now when that NodeJS app is deployed to a scalable environment like Heroku, multiple instances of this app may be running.
Is it possible to design the node app so that all clients subscribed to a "message channel" will receive this message, although multiple node instances are running - and therefore multiple copies of this channel?

Check out zeromq; it should provide some simple, high-performance IPC abstractions to do what you want. In particular, the pub/sub example will be useful.
The main challenge as I imagine it, without knowing anything about how Heroku spawns multiple server instances, will be the logic to determine who is the publisher (the rest of the instances will be subscribers). So let's say, for argument's sake, that your hosting provider gives you an environment variable called INSTANCE_NUM which is an integer in [0,1024] indicating the instance number of the process; so we'll say that instance zero is the message publisher.
var zmq = require('zeromq');

if (process.env['INSTANCE_NUM'] === '0') { // I'm the publisher.
  var emitter = getEventEmitter(); // e.g. an HttpServer.
  var pub = zmq.createSocket('pub');
  pub.bindSync('tcp://*:5555');
  emitter.on('someEvent', function(data) {
    pub.send(data);
  });
} else { // I'm a subscriber.
  var sub = zmq.createSocket('sub');
  sub.subscribe('');
  sub.on('message', function(data) {
    // Handle the event data...
  });
  sub.connect('tcp://localhost:5555');
}
Note that I'm new to zeromq and the above code is totally untested, just for demonstration.
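For what it's worth, if you're on a current version of the zeromq package, the legacy createSocket/callback API used above was replaced in v6 by socket classes and promises. The same idea would look roughly like the following, equally untested:

const zmq = require('zeromq'); // assumes the v6+ API

async function runPublisher(emitter) {
  const pub = new zmq.Publisher();
  await pub.bind('tcp://*:5555');
  emitter.on('someEvent', (data) => {
    pub.send(data).catch(console.error); // send() returns a promise in v6
  });
}

async function runSubscriber() {
  const sub = new zmq.Subscriber();
  sub.connect('tcp://localhost:5555');
  sub.subscribe(''); // empty prefix: receive every published message
  for await (const [msg] of sub) {
    // Handle the event data...
    console.log('received:', msg.toString());
  }
}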

Related

Firebase Functions: How to maintain 'app-global' API client?

How can I achieve an 'app-wide' global variable that is shared across Cloud Function instances and function invocations? I want to create a truly 'global' object that is initialized only once for the lifetime of all my functions.
Context:
My app's entire backend is Firestore + Firebase Cloud Functions. That is, I use a mix of background (Firestore) triggers and HTTP functions to implement backend logic. Additionally, I rely on a 3rd-party location service to continually listen to location updates from sensors. I want just a single instance of the client on which to subscribe to these updates.
The problem is that Firebase/Google Cloud Functions are stateless, meaning that function instances don't share memory/objects/state. If I call functionA, functionB, and functionC, there are going to be at least 3 instances of locationService clients created, each listening separately to the 3rd-party service, so we end up with duplicate invocations of the location API callback.
Sample code:
// index.js
const functions = require("firebase-functions");
exports.locationService = require('./location_service');
this.locationService.initClient();
// define callable/HTTP functions & Firestore triggers
...
and
// location_service.js
var tracker = require("third-party-tracker-js");

const self = (module.exports = {
  initClient: function () {
    tracker.initialize('apiKey')
      .then((client) => {
        client.setCallback(async function (payload) {
          console.log("received location update: ", payload);
          // process the payload ...
          // with multiple function instances running at once, we receive as many callbacks for each location update
        });
        client.subscribeProject()
          .then((subscription) => {
            subscription.subscribe()
              .then((subscribeMsg) => {
                console.log("subscribed to project with message: ", subscribeMsg); // success
              });
            // subscription.unsubscribe(); // ??? at what point should we unsubscribe?
          })
          .catch((err) => {
            throw (err);
          });
      })
      .catch((err) => {
        throw (err);
      });
  },
});
I realize what I'm trying to do is roughly equivalent to implementing a daemon in a single-process environment, and it appears that serverless environments like Firebase/Google Cloud Functions aren't designed to support this need because each instance runs as its own process. But I'd love to hear any contrary ideas and possible workarounds.
Another idea...
Inspired by this related SO post and the official GCF docs on stateless functions, I thought about using Firestore to persist a tracker value that allows us to conditionally initialize the API client. Roughly like this:
// read value from db; only initialize the client if there's no valid subscription
let locSubscriberActive = await getSubscribeStatusFromDb();
if (!locSubscriberActive) {
  this.locationService.initClient();
}
// in `location_service.js`, do setSubscribeStatusToDb(); // set flag to true when we call subscribe(). reset when we get terminated
The problem I face: at what point do I unset/reset that value? Intuitively, I would do so the moment the function instance that initialized the client gets recycled/killed. However, it appears there is no way to know when a Cloud Functions instance is terminated; I searched everywhere but couldn't find docs on how to detect such an event.
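For concreteness, here is a rough, untested sketch of that flag idea using a Firestore transaction, so that two instances can't both claim the subscription (the subscriptionStatus/tracker document and its field names are made up for illustration):

const admin = require("firebase-admin");
admin.initializeApp();
const db = admin.firestore();

// Hypothetical helper: atomically claim the subscription flag.
// Resolves true if this instance won the claim and should call initClient().
async function tryClaimSubscription() {
  const ref = db.collection("subscriptionStatus").doc("tracker"); // made-up doc
  return db.runTransaction(async (tx) => {
    const snap = await tx.get(ref);
    if (snap.exists && snap.data().active) {
      return false; // another instance already holds the subscription
    }
    tx.set(ref, { active: true, claimedAt: admin.firestore.FieldValue.serverTimestamp() });
    return true;
  });
}

But even with the claim made atomic, the reset problem above remains: nothing tells me when the claiming instance dies.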
What you're trying to do is not at all supported in Cloud Functions. It's important to realize that there may be any number of server instances allocated for each deployed function. That's how Cloud Functions scales up and down to match the load on the function in a cost-effective way. These instances might be terminated at any time for any reason. You have no indication when an instance terminates.
Also, instances are not capable of performing any computation when they are idle. CPU resources are clamped down after a function terminates, and are spun up again when the next function is invoked on that instance. You can't have any "daemon" code running when a function is not actively being invoked. I don't know what your locationService does, but it is certainly doing nothing at all after a function terminates, regardless of how it terminated.
For any sort of long-running or daemon-like code, Cloud Functions is not a suitable product. You should instead consider also using another product that lets you run code 24/7 without disruptions. App Engine and Compute Engine are viable alternatives, and you will have to think carefully about if and how you want their server instances to scale with load.
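To make the contrast concrete, on a Compute Engine VM (or App Engine flexible) the daemon is just an ordinary long-lived Node process. A rough sketch, reusing the question's hypothetical tracker client:

// daemon.js - run under systemd/pm2 on a VM, not in Cloud Functions
const tracker = require("third-party-tracker-js"); // the question's 3rd-party client

tracker.initialize("apiKey").then((client) => {
  client.setCallback((payload) => {
    console.log("received location update: ", payload);
    // process the payload, write results to Firestore, etc.
  });
  client.subscribeProject().then((subscription) => subscription.subscribe());
});

Because the process never stops between callbacks, there is exactly one subscription and no duplicate invocations.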

Why am I receiving this error on Azure when using eventhubs?

I started using Azure recently and it has been an overwhelming experience. I started experimenting with eventhubs and I'm basically following the official tutorials on how to send and receive messages from eventhubs using nodejs.
Everything worked perfectly so I built a small web app (static frontend app) and I connected it with a node backend, where the communication with eventhubs occurs. So basically my app is built like this:
frontend <----> node server <-----> eventhubs
As you can see it is very simple. The node server fetches data from eventhubs and forwards it to the frontend, where the values are shown. It was a cool experience and I was enjoying MS Azure until this error occurred:
azure.eventhub.common.EventHubError: ErrorCodes.ResourceLimitExceeded: Exceeded the maximum number of allowed receivers per partition in a consumer group which is 5. List of connected receivers - nil, nil, nil, nil, nil.
This error is really confusing. I'm using the default consumer group and only one app, and I never tried to access this consumer group from another app. It says the limit is 5; I'm using only one app, so it should be fine, or am I missing something? I can't work out what is happening here.
I wasted too much time googling and researching this, but I didn't get it. In the end, I thought that maybe every time I deploy the app (my frontend and my node server) to azure, it counts as one consumer, and since I deployed the app more than 5 times, this error shows up. Am I right, or is this nonsense?
Edit
I'm using websockets as the communication protocol between my app (frontend) and my node server (backend). The node server uses the default consumer group (I didn't change anything); I just followed this official example from Microsoft. I'm basically using the code from the MS docs, which is why I didn't post any code snippet from my node server, and since the error happens in the backend and not the frontend, posting frontend code wouldn't help either.
So to wrap up, I'm using websocket to connect front & backend. It works perfectly for a day or two and then this error starts to happen. Sometimes I open more than one client (for example a client from the browser and client from my smartphone).
I think I don't understand the concept of the consumer group. Is every client a consumer? So if I open my app (the same app) in 5 different browser tabs, do I then have 5 consumers?
I didn't quite understand the answer below and what is meant by a "pooling client", so I will post code examples here to show what I'm trying to do.
Code snippets
Here is the function I'm using on the server side to communicate with eventhubs and receive/consume a message
async function receiveEventhubMessage(socket, eventHubName, connectionString) {
  const consumerClient = new EventHubConsumerClient(consumerGroup, connectionString, eventHubName);

  const subscription = consumerClient.subscribe({
    processEvents: async (events, context) => {
      for (const event of events) {
        console.log("[ consumer ] Message received : " + event.body);
        io.emit('msg-received', event.body);
      }
    },
    processError: async (err, context) => {
      console.log(`Error : ${err}`);
    }
  });
}
If you notice, I'm giving the eventhub name and connection string as arguments so that they can be changed. Now in the frontend, I have a list of multiple topics, and each topic has its own EventHubName, but they share the same eventhub namespace.
Here is an example of two eventhubnames that I have:
[
  { "EventHubName": "eh-test-command" },
  { "EventHubName": "eh-test-telemetry" }
]
If the user chooses to send a command (from the frontend, I just have a list of buttons that the user can click to fire an event over websockets) then the CommandEventHubName will be sent from the frontend to the node server. The server will receive that eventhubname and switch the consumerClient in the function I posted above.
Here is the code where I'm calling that:
// io is a socket.io object
io.on('connection', socket => {
  socket.on('onUserChoice', choice => {
    // choice is an object sent from the frontend based on what the user chose,
    // e.g. if the user chose command then choice = {"EventhubName": "eh-test-command", "payload": "whatever"}
    receiveEventhubMessage(socket, choice.EventHubName, choice.EventHubNameSpace)
      .catch(err => console.log(`[ consumerClient ] Error while receiving eventhub messages: ${err}`));
  });
});
The app I'm building will be extended in the future to a real use case in the automotive field, which is why this is important to me. So I'm trying to figure out how I can switch between event hubs without creating a new consumerClient each time the EventHubName changes.
I must say that I didn't understand the example with the "pooling client". I'm seeking more elaboration or, ideally, a minimal example to put me on the right track.
Based on the conversation in the issue, it would seem that the root cause of this is that your backend is creating a new EventHubConsumerClient for each request coming from your frontend. Because each client will open a dedicated connection to the service, if you have more than 5 requests for the same Event Hub instance using the same consumer group, you'll exceed the quota.
To get around this, you'll want to consider pooling your EventHubConsumerClient instances so that you're starting with one per Event Hub instance. You can safely use the pooled client to handle a request for your frontend by calling subscribe. This will allow you to share the connection amongst multiple frontend requests.
The key idea is that your consumerClient is not created for every request, but shared among requests. Using your snippet to illustrate the simplest approach, you'd end up hoisting your client creation outside the function that receives. It may look something like:
const consumerClient = new EventHubConsumerClient(consumerGroup, connectionString, eventHubName);

async function receiveEventhubMessage(socket, eventHubName, connectionString) {
  const subscription = consumerClient.subscribe({
    processEvents: async (events, context) => {
      for (const event of events) {
        console.log("[ consumer ] Message received : " + event.body);
        io.emit('msg-received', event.body);
      }
    },
    processError: async (err, context) => {
      console.log(`Error : ${err}`);
    }
  });
}
That said, the above may not be adequate for your environment, depending on the architecture of the application. If whatever is hosting receiveEventhubMessage is created dynamically for each request, nothing changes. In that case, you'd want to consider something like a singleton or dependency injection to help extend the client's lifespan.
If you end up having issues scaling to meet your requests, you can consider increasing the number of clients for each Event Hub and/or spreading requests out to different consumer groups.
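To make the pooling idea concrete, here is an untested sketch that keeps one client per event hub in a Map. The clientPool and getConsumerClient names are made up for illustration; consumerGroup and io come from the question's snippets:

const { EventHubConsumerClient } = require("@azure/event-hubs");

// One client per event hub name, shared across all frontend requests.
const clientPool = new Map();

function getConsumerClient(consumerGroup, connectionString, eventHubName) {
  if (!clientPool.has(eventHubName)) {
    clientPool.set(eventHubName,
      new EventHubConsumerClient(consumerGroup, connectionString, eventHubName));
  }
  return clientPool.get(eventHubName);
}

async function receiveEventhubMessage(socket, eventHubName, connectionString) {
  // Reuses the pooled client instead of opening a new connection per request.
  const consumerClient = getConsumerClient(consumerGroup, connectionString, eventHubName);
  const subscription = consumerClient.subscribe({
    processEvents: async (events, context) => {
      for (const event of events) {
        io.emit('msg-received', event.body);
      }
    },
    processError: async (err, context) => {
      console.log(`Error : ${err}`);
    }
  });
}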

How to implement rabbitMQ into node.js microservice app right way?

Greetings Stackoverflow.
I've been using stackoverflow for years to find answers, and this is my first attempt at asking a question myself, so feel free to tell me if I'm doing it the wrong way.
Currently I'm developing a data analytical system based on microservice architecture.
It is assumed that this system will consist of a dozen self-sufficient microservices communicating with each other via RabbitMQ. Each of them is encapsulated in a docker container, and the whole system is powered by docker-swarm in production.
In particular, each microservice is a node.js application with a related database, connected through some ORM interface. Its task is to manage and serve data in a CRUD manner, and to provide the results of some prepared queries based on the contained data. Nothing extraordinary.
To provide microservice-to-microservice communication I plan to use amqplib, but how best to use it is still uncertain.
My current question is how to make use of amqplib in an OOP manner to link the inter-microservice communication network with the application's object-related functionality. By OOP manner, I mean the possibility of replacing amqplib (and RabbitMQ itself) in the long run without having to change the data-related logic.
What I'm really searching for is an example of a currently working microservice application utilizing AMQP. I'd pretty much appreciate it if somebody could give a link to one.
And the second part of my question.
Does it make sense to build a microservice application on event-driven principles, and just pass messages from RabbitMQ into the application's main event queue? That way each procedure would be invoked the same way, regardless of whether the triggering event is internal or external.
As an abstract example for a single microservice:
Let's say I have an event service and a listener connected to this service:
const EventEmitter = require('events');

class UserManager {
  constructor(eventService) {
    this.eventService = eventService;
    this.eventService.on("users.user.create-request", (payload) => {
      User.create(payload); // User interface is omitted in this example
    });
  }
}

const eventService = new EventEmitter();
const userManager = new UserManager(eventService);
On the other hand, I've got a RabbitMQ connection that is waiting for messages:
const amqp = require('amqplib');

amqp.connect('amqp-service-in-docker').then(connection => {
  connection.createChannel().then(channel => {
    // Here we use the topic exchange type to be able to filter only related messages
    channel.assertExchange('some-exchange', 'topic');
    channel.assertQueue('').then(queue => {
      // And here we wait for only the related messages
      channel.bindQueue(queue.queue, 'some-exchange', 'users.*');
      channel.consume(queue.queue, message => {
        // And here is the crucial part
      });
    });
  });
});
What I'm currently thinking of is to just parse this message, forward it to eventService, and use its routing key as the name of the event, like this:
channel.consume(queue.queue, message => {
  const eventName = message.fields.routingKey;
  const eventPayload = JSON.parse(message.content.toString());
  eventService.emit(eventName, eventPayload);
});
But how about RPCs? Should I make another exchange, or even another channel, for them with a different approach? Something like:
// In the RPC channel
channel.consume(queue.queue, message => {
  eventService.once('users.user.create-response', response => {
    const recipient = message.properties.replyTo;
    const correlationId = message.properties.correlationId;
    // Send the response to the specified recipient
    channel.sendToQueue(
      recipient,
      Buffer.from(JSON.stringify(response)),
      {
        correlationId: correlationId
      }
    );
    channel.ack(message);
  });
  // Same as before
  const eventName = message.fields.routingKey;
  const eventPayload = JSON.parse(message.content.toString());
  eventService.emit(eventName, eventPayload);
});
And then my User class would have to fire a 'users.user.create-response' event every time it creates a new user. Isn't this a kludge?
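To illustrate what that would mean, a sketch of the User class side (the ORM call is a placeholder):

class User {
  static async create(payload) {
    const user = await orm.users.insert(payload); // placeholder ORM call
    eventService.emit('users.user.create-response', user);
    return user;
  }
}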

NodeJS and AWS SQS

Folks,
I would like to set up a message queue between our Java API and NodeJS API.
After reading several examples of using aws-sdk, I am not sure how to make the service watch the queue.
For instance, this article Using SQS with Node: Receiving Messages Example Code tells me to use the sqs.receiveMessage() to receive and sqs.deleteMessage() to delete a message.
What I am not clear about, is how to wrap this into a service that runs continuously, which constantly takes the messages off the sqs queue, passes them to the model, stores them in mongo, etc.
Hope my question is not entirely vague. My experience with Node lies primarily with Express.js.
Is the answer as simple as using something like sqs-poller? How would I implement the same into an already running NodeJS Express app? Quite possibly I should look into SNS to not have any delay in message transfers.
Thanks!
For a start, Amazon SQS is a pseudo queue that guarantees availability of messages but not their delivery in FIFO order. You have to implement sequencing logic in your app if you want it to work that way.
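A toy illustration of such sequencing logic, assuming producers stamp each message with a monotonically increasing seq attribute (hypothetical):

// Reordering buffer: release messages strictly in seq order.
const pending = new Map();
let nextSeq = 0;

const deliver = (msg) => console.log('delivering', msg.seq); // stand-in handler

function handleInOrder(msg) {
  pending.set(msg.seq, msg);
  while (pending.has(nextSeq)) {
    deliver(pending.get(nextSeq));
    pending.delete(nextSeq);
    nextSeq += 1;
  }
}

// e.g. messages arriving out of order:
[{ seq: 1 }, { seq: 0 }, { seq: 2 }].forEach(handleInOrder);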
Coming back to your question, SQS has to be polled within your app to check if new messages are available. I implemented this in an app using setInterval(): I would poll the queue for items; if none were found, I would delay the next call, and if some were found, the next call fired immediately, bypassing the setInterval(). This is obviously a very raw implementation, and you can look into alternatives. How about a child process on your server that pings your NodeJS app when a new item is found in SQS? I think you could implement the child process as a watcher in BASH without using NodeJS. You can also look into npm modules in case one already exists for this.
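A bare-bones sketch of that polling loop using the AWS SDK v2 (the region, queue URL, and processMessage() handler are placeholders):

const AWS = require('aws-sdk');

const sqs = new AWS.SQS({ region: 'us-east-1' }); // assumed region
const queueUrl = 'https://sqs.us-east-1.amazonaws.com/account-id/queue-name'; // placeholder

async function processMessage(message) {
  // Stand-in for the real work: parse, hand to the model, store in mongo, etc.
  console.log('Processing message:', message.Body);
}

async function pollForever() {
  while (true) {
    const data = await sqs.receiveMessage({
      QueueUrl: queueUrl,
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 20 // long polling: the call blocks up to 20s on an empty queue
    }).promise();

    for (const message of data.Messages || []) {
      await processMessage(message);
      await sqs.deleteMessage({
        QueueUrl: queueUrl,
        ReceiptHandle: message.ReceiptHandle
      }).promise();
    }
  }
}

pollForever().catch(console.error);

With WaitTimeSeconds set, an explicit delay between polls matters less, since an empty queue simply holds the call open instead of returning immediately.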
In short, there are many ways you can poll but polling has to be done one way or the other if you are working with Amazon SQS.
I am not sure about this but if you want to be notified of items, you might want to look into Amazon SNS.
When writing applications to consume messages from SQS I use sqs-consumer:
const Consumer = require('sqs-consumer');

const app = Consumer.create({
  queueUrl: 'https://sqs.eu-west-1.amazonaws.com/account-id/queue-name',
  handleMessage: (message, done) => {
    console.log('Processing message: ', message);
    done();
  }
});

app.on('error', (err) => {
  console.log(err.message);
});

app.start();
See the docs for more information (well documented):
https://github.com/bbc/sqs-consumer

why is performance of redis+socket.io better than just socket.io?

I earlier had all my code in a socket.io+node.js server. I recently converted everything to redis+socket.io+node.js after noticing slow performance when too many users were sending messages across the server.
So, why was socket.io alone slow? Because it is not multi-threaded: it handles one request or emit at a time.
What redis does is distribute these requests or emits across channels. Clients subscribe to different channels, and when a message is published on a channel, all the client subscribed to it receive the message. It does it via this piece of code:
sub.on("message", function (channel, message) {
client.emit("message",message);
});
The client.on('emit',function(){}) takes it from here to publish messages to different channels.
Here is a brief code explaining what i am doing with redis:
io.sockets.on('connection', function (client) {
  var pub = redis.createClient();
  var sub = redis.createClient();

  sub.on("message", function (channel, message) {
    client.emit('message', message);
  });

  client.on("message", function (msg) {
    if (msg.type == "chat") {
      pub.publish("channel." + msg.tousername, msg.message);
      pub.publish("channel." + msg.user, msg.message);
    }
    else if (msg.type == "setUsername") {
      sub.subscribe("channel." + msg.user);
    }
  });
});
As redis stores the channel information, we can have different servers publish to the same channel.
So, what I don't understand is: if sub.on("message") gets called every time a request or emit is sent, why is redis supposed to give better performance? I suppose even the sub.on("message") handler is not multi-threaded.
As you might know, Redis allows you to scale across multiple node instances, so the performance actually comes after the fact. Utilizing the Pub/Sub method is not faster; it's technically slower, because you have to communicate with Redis for every published message. The "giving better performance" part is only really true when you start to scale out horizontally.
For example, say you have one node instance (a simple chat room) that can handle a maximum of 200 active users. You are not using Redis yet because there is no need. Now, what if you want 400 active users? Using your example above, you can now reach that 400-user mark, which is a "performance increase" in the sense that you can handle more users, but not really a speed increase, if that makes sense. Hope this helps!
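As an aside, if the goal is just fanning socket.io events out across multiple server instances, socket.io has a ready-made Redis adapter that implements this pub/sub plumbing for you. A minimal sketch (package name and defaults from the socket.io-redis era; newer releases ship it as @socket.io/redis-adapter):

const io = require('socket.io')(3000);
const redisAdapter = require('socket.io-redis');

// Every socket.io instance pointed at the same Redis shares broadcasts,
// so io.emit() and rooms work across processes and machines.
io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));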
