Node.js primus websocket clustering

After trying most of the websocket engines, I've concluded that the best approach is to use Primus (a universal wrapper for real-time frameworks), so that I can test any websocket framework that comes along without changing my functionality.
Although Primus does what it says, I've found myself in a situation where I want to scale.
Primus has many plugins, and two of them are primus-cluster and primus-redis-rooms.
These are the two that use Redis pub/sub to scale across many node processes. The problem I faced with both plugins is that I cannot send a message to an individual socket (spark). That is, sparks are not stored in or passed to Redis, so each process does not know which sparks exist in total.
Does anyone have an idea on how to implement this?

The problem with primus-redis and primus-redis-rooms is that they only implement broadcasting, not messaging from one server to a spark on a different server.
The rooms hack that you suggest is an "ok" alternative, but it's definitely a hack and adds a lot of overhead. I don't think it's that hard to create a plugin which:
adds the spark.id to Redis (spark.id -> server address) for each connection that it accepts.
removes the spark.id from Redis when the connection disconnects.
adds a pub/sub channel (named after the server address) for the server so it can receive messages.
makes this channel listen for messages carrying a spark.id, finds that spark on the Primus server, and writes the message to it.
provides a method that looks up the spark.id in Redis to get the server address, then does a PUBLISH to that server's channel with the message to be written together with the spark.id.
is published to npm as a module, earning you a lot of free beer ;-)
It might take a bit longer to write than the hack you suggested, but it would probably be worth the effort.
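The plugin steps listed above can be sketched roughly as follows. This is only an illustration of the routing idea, not a real Primus plugin: `FakeRedis` is an in-memory stand-in for a Redis client (a key/value map plus pub/sub channels), and `SparkRouter`, `accept`, `disconnect` and `sendToSpark` are hypothetical names. In a real plugin the spark objects would come from Primus connection and disconnection events; here a spark is assumed to be any object with an `id` and a `write()` method.

```javascript
// In-memory stand-in for Redis: key/value storage plus pub/sub channels.
// (Assumption: a real plugin would use a Redis client here.)
class FakeRedis {
  constructor() {
    this.keys = new Map();      // spark.id -> server address
    this.channels = new Map();  // channel name -> array of listeners
  }
  set(key, value) { this.keys.set(key, value); }
  get(key) { return this.keys.get(key); }
  del(key) { this.keys.delete(key); }
  subscribe(channel, listener) {
    if (!this.channels.has(channel)) this.channels.set(channel, []);
    this.channels.get(channel).push(listener);
  }
  publish(channel, message) {
    for (const listener of this.channels.get(channel) || []) listener(message);
  }
}

// One router per server process (hypothetical name).
class SparkRouter {
  constructor(redis, address) {
    this.redis = redis;
    this.address = address;
    this.sparks = new Map(); // sparks connected to this server only
    // Listen on this server's own channel and write to the local spark.
    redis.subscribe(address, (raw) => {
      const { sparkId, data } = JSON.parse(raw);
      const spark = this.sparks.get(sparkId);
      if (spark) spark.write(data);
    });
  }
  accept(spark) {
    this.sparks.set(spark.id, spark);
    this.redis.set(spark.id, this.address);
  }
  disconnect(spark) {
    this.sparks.delete(spark.id);
    this.redis.del(spark.id);
  }
  // Look up which server owns the spark, then publish to that server's channel.
  sendToSpark(sparkId, data) {
    const address = this.redis.get(sparkId);
    if (!address) return false; // unknown or disconnected spark
    this.redis.publish(address, JSON.stringify({ sparkId, data }));
    return true;
  }
}

// Usage: two "servers" sharing one store; server A writes to a spark on B.
const redis = new FakeRedis();
const serverA = new SparkRouter(redis, "server-a");
const serverB = new SparkRouter(redis, "server-b");
const received = [];
const sparkOnB = { id: "spark-1", write: (data) => received.push(data) };
serverB.accept(sparkOnB);
const delivered = serverA.sendToSpark("spark-1", "hello");
```

Only the server that owns the spark receives the published message, which is what distinguishes this from the broadcast-only plugins.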

Related

Working with WebSockets and NodeJs clusters

I currently have a Node server running that works with MongoDB. It handles some HTTP requests, but it largely uses WebSockets. Basically, the server connects multiple users to rooms with WebSockets.
My server currently has around 12k WebSockets open, which is almost crippling my single-threaded server, and I'm not sure how to convert it to a clustered setup.
The server holds HashMap variables for the connected users and rooms. When a user performs an action, the server often references those HashMap variables, so I'm not sure how to use clusters here. I thought about creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and such a thread would not be able to access the HashMaps for the other users.
Does anyone have any ideas on what to do?
Thank you.

You can look at the socket.io-redis adapter for architectural ideas or you can just decide to use socket.io and the Redis adapter.
They move the equivalent of your hashmap to a separate process (the Redis in-memory database) so that all clustered processes can access it.
The socket.io-redis adapter also supports higher-level functions so that you can emit to every socket in a room with one call and the adapter finds where everyone in the room is connected, contacts that specific cluster server, and has it send the message to them.
"I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users"
Threads in node.js are not lightweight (each has its own V8 instance), so you will not want a node.js thread for every WebSocket connection. You could group a certain number of WebSocket connections on a worker thread, but at that point it is likely easier to use clustering, because node.js will distribute incoming connections across the cluster workers for you automatically, whereas you would have to do that yourself for your own worker pool.
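To illustrate the idea of moving the HashMaps out of the server process, here is a minimal sketch. `SharedStore` is a hypothetical in-memory stand-in for Redis sets, and `ClusterProcess`, `join`, `leave` and `membersOf` are illustrative names, not part of socket.io or its Redis adapter. The point is only that each clustered process keeps its own sockets locally while room membership lives in one shared place.

```javascript
// Hypothetical stand-in for a shared Redis instance holding room membership.
class SharedStore {
  constructor() { this.sets = new Map(); } // room name -> Set of user ids
  add(room, userId) {
    if (!this.sets.has(room)) this.sets.set(room, new Set());
    this.sets.get(room).add(userId);
  }
  remove(room, userId) {
    const members = this.sets.get(room);
    if (members) members.delete(userId);
  }
  members(room) { return [...(this.sets.get(room) || [])]; }
}

// Represents one clustered process. It owns only its own sockets.
class ClusterProcess {
  constructor(store) {
    this.store = store;
    this.localSockets = new Map(); // user id -> socket, local to this process
  }
  join(room, userId, socket) {
    this.localSockets.set(userId, socket);
    this.store.add(room, userId);
  }
  leave(room, userId) {
    this.localSockets.delete(userId);
    this.store.remove(room, userId);
  }
  membersOf(room) { return this.store.members(room); }
}

// Usage: users connected to different processes are visible to both.
const store = new SharedStore();
const process1 = new ClusterProcess(store);
const process2 = new ClusterProcess(store);
process1.join("lobby", "alice", {});
process2.join("lobby", "bob", {});
const seenByProcess1 = process1.membersOf("lobby");
```

With real clustered processes the store has to live outside all of them, which is the role Redis plays in the adapter mentioned above.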

socket.io : Queries around broadcasting and max connections

I am new to Socket.IO development. I would like to know the following about it:
MAX limit for the number of concurrent OPEN Sockets supported ?
Guidelines / extra care to be taken to fine-tune the Node Server for Production.
Does socket.io ensure message delivery, or is it send-and-forget? Also, are there any node modules which leverage this feature when installed?
In case socket.io does not support message delivery, how can I ensure that the message was sent to and successfully received by the intended person?

MAX limit for the number of concurrent OPEN Sockets supported ?
This depends entirely on your environment, your application, and its configuration. Socket.IO has many potential transports, some of which don't even require a persistent connection. There is no simple answer to this question, and nor should there be. This is the wrong question to ask. In a usual scenario, Socket.IO isn't going to be your bottleneck... your application itself will be. What you should be asking about is how to scale your application as you grow... and the answer to that is dependent on the specifics of how your application works.
Guidelines / extra care to be taken to fine-tune the Node Server for Production.
There are entire books on this. Start with the Node.js documentation.
Does socket.io ensure message delivery?
Socket.IO in its default configuration runs over reliable transports and reconnects automatically. Of course things can always get lost... it's the internet after all... and a message sent while a client is disconnected may not arrive. I've found this to be one of the best parts of Socket.IO: it does its best to get a message there, and if you need confirmation that a particular message arrived, you can pass an acknowledgement callback when emitting.
Also are there any node-modules which when installed leverage this feature ?
What feature?
In case socket.io does not support message delivery, how can I ensure that the message was sent to and successfully received by the intended person?
Yes, you can deliver messages with Socket.IO... that's sort of the whole point. As far as whether data made it to the right person, you just need to send it to the right place. Remember though that someone else could always be sitting at the computer....
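One common way to confirm that a message actually arrived is an application-level acknowledgement: the sender keeps each message until the receiver confirms it, and resends anything still unconfirmed. The sketch below shows that bookkeeping only. `AckTracker` and its methods are hypothetical names, and the `transmit` function stands in for whatever actually sends the data (for example a socket's emit call); it is not Socket.IO's own API.

```javascript
class AckTracker {
  constructor(transmit) {
    this.transmit = transmit;  // function(id, payload) that sends over the wire
    this.pending = new Map();  // message id -> payload awaiting acknowledgement
    this.nextId = 1;
  }
  send(payload) {
    const id = this.nextId++;
    this.pending.set(id, payload);
    this.transmit(id, payload);
    return id;
  }
  // Call when the receiver reports that it got message `id`.
  acknowledge(id) { return this.pending.delete(id); }
  // Call from a timer: resend everything not yet acknowledged.
  resendPending() {
    for (const [id, payload] of this.pending) this.transmit(id, payload);
    return this.pending.size;
  }
}

// Usage with a lossy fake transport that drops the first transmission.
const wire = [];
let dropNext = true;
const tracker = new AckTracker((id, payload) => {
  if (dropNext) { dropNext = false; return; } // simulate a lost message
  wire.push({ id, payload });
});
const messageId = tracker.send("order confirmed");
const resent = tracker.resendPending(); // first attempt was lost, so resend
tracker.acknowledge(messageId);         // receiver confirms delivery
```

Because a resend can race with a late acknowledgement, the receiver should be prepared to see the same message id twice and ignore the duplicate.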

Use Apache Thrift for two-way communication?

Is it possible to implement two-way communication between client and server with Apache Thrift? That is, not only to make RPC calls from client to server, but also the other way round? In my project I have the requirement that the server must also push some data to the client without the client having asked for it first.

There are two ways to achieve this with Thrift.
If both ends are more or less peers and you connect them through sockets or pipes, you simply set up a server and a client on both ends and you're pretty much done. This does not work in all cases, however, especially with HTTP.
If you connect server and client through HTTP or a similar channel, there is a technique called "long polling". It basically requires the client to call the server as usual, but the call only returns when the server wants to send some data back to the client. After receiving the data, the client starts another call if it is still interested in more data.
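The long-polling flow can be illustrated independently of Thrift. In the sketch below, `LongPollChannel`, `poll` and `push` are hypothetical names: `poll` plays the role of the client's call that only returns once data is available, and `push` is the server deciding to send something. A real setup would make `poll` a blocking call over HTTP; here callbacks stand in for the pending response.

```javascript
class LongPollChannel {
  constructor() {
    this.queue = [];     // data produced while no poll was waiting
    this.waiting = null; // callback of the poll currently held open
  }
  // Client side: ask for data; the callback fires only when data exists.
  poll(callback) {
    if (this.queue.length > 0) callback(this.queue.shift());
    else this.waiting = callback;
  }
  // Server side: deliver to the held poll, or queue until the next poll.
  push(data) {
    if (this.waiting) {
      const respond = this.waiting;
      this.waiting = null;
      respond(data);
    } else {
      this.queue.push(data);
    }
  }
}

// Usage: the client re-issues the poll after every response.
const channel = new LongPollChannel();
const receivedByClient = [];
function startPolling(remaining) {
  if (remaining === 0) return; // stop once the client has lost interest
  channel.poll((data) => {
    receivedByClient.push(data);
    startPolling(remaining - 1);
  });
}
startPolling(3);
channel.push("first");  // answers the open poll
channel.push("second"); // answers the poll started right after "first"
```

Data pushed while no poll is open is queued here so that it is not lost between two client calls; whether that is appropriate depends on the application.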
As Denis pointed out, depending on your exact use case, you might want to consider using a MQ system. Note that it is still possible to use Thrift to de/serialize the messages into and from the queues. The contrib folder has some examples that show how to use Thrift with ZMQ, Rebus and some others.

You would be better off using queues then, e.g. ZeroMQ.

redis in Node.js app environment

I am building an app with several Node.js instances as a backend (an http server, a socket server and a pool of several domain servers). Now I am trying to cover several communication and configuration aspects and am wondering whether Redis is an appropriate solution.
So, I would use it for the following purposes:
Implementation of a shared run-time lookup table. It's a table of several hundred relatively simple records, accessed and manipulated by 2 node instances.
Implementation of message queues. Each domain server receives commands from the http server and should execute them sequentially. A domain server should be able to listen for a Redis event and execute each new command upon its arrival.
The socket server also has a Redis message queue and listens for its events in order to push notifications to connected clients.
Is redis "too heavy" for such a purpose?
Does it offer all needed functionality?
I can definitely implement a lookup in a file and/or memory and a queue using sockets. However, Redis might make the code cleaner and the solution more robust.

Redis is definitely not a heavy solution, on the contrary.
It's small, insanely fast (when using pipelining), and easy to deploy. I consider it a light solution, a kind of Swiss Army knife that may solve many problems.
Redis-based message queues are OK if you don't expect any guarantee on message delivery. That is to say, Redis-based queues can't assure you that the client has received the message. If that's a problem for your application, you should consider using a heavier solution, like 0mq or RabbitMQ.
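To make the delivery caveat concrete, here is a minimal sketch of a list-based command queue of the kind described in the question. `CommandQueue` is a hypothetical in-memory stand-in for a Redis list used as a FIFO; it is not a Redis client. Commands are executed strictly in order, but once a command has been popped the queue no longer holds it, so a consumer that fails mid-command loses that command unless the application handles it.

```javascript
class CommandQueue {
  constructor() { this.items = []; }
  push(command) { this.items.push(command); } // producer side (http server)
  pop() { return this.items.shift(); }        // consumer side (domain server)
  get length() { return this.items.length; }
}

// Drain the queue sequentially; a failing handler loses the popped command.
function drain(queue, handler) {
  const results = { done: [], lost: [] };
  let command;
  while ((command = queue.pop()) !== undefined) {
    try {
      handler(command);
      results.done.push(command);
    } catch (err) {
      // The command is already gone from the queue at this point.
      results.lost.push(command);
    }
  }
  return results;
}

// Usage: the consumer fails while handling the second command.
const commands = new CommandQueue();
["create", "update", "delete"].forEach((c) => commands.push(c));
const outcome = drain(commands, (command) => {
  if (command === "update") throw new Error("consumer crashed");
});
```

In this sketch the failure is caught and recorded; a real crashed process would not get that chance, which is why delivery has to be tracked elsewhere if it matters.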

Using Backbone.iobind (socket.io) with a cluster of node.js servers

I'm using Backbone.iobind to bind my client Backbone models over socket.io to the back-end server which in turn store it all to MongoDB.
I'm using socket.io so I can synchronize changes back to other clients Backbone models.
The problems start when I try to run the same thing over a cluster of node.js servers.
Setting a session store was easy using connect-mongo which stores the session to MongoDB.
But now I can't inform all the clients on every change, since the clients are distributed between the different node.js servers.
The only solution I found is to set a pub/sub queue between the different node.js servers (e.g. mubsub), which seems like a very heavy weight solution that will trigger an event on all the servers for every change.

How did you reach the conclusion that pub/sub is a "very heavy weight solution"?
Sounds like you got it right up until that part :-)
Oh, and pub/sub is not a queue.
Let's examine that claim:
The nice thing about pub/sub is that you publish and subscribe to channels/topics.
So, using the classic chat server example, let's say you have a million users connected in total, but #myroom only has 50 users in it.
When a message is sent to #myroom, it's being published once. No duplication whatsoever.
In most use-cases you won't even need to store it on disk/RAM, so we're mostly looking at network/bandwidth here. And, I mean, you're probably throwing more data (probably over the wire?) to MongoDB already, so I assume that's not your bottleneck.
If you also use socket.io's rooms feature (which is basically its own pub/sub mechanism), that means only the 50 users in #myroom will have that message emitted to them over the websocket.
And no, socket.io won't iterate over 1M clients to find out which of them are in room #myroom ;-)
So the message is published once, each subscriber (node.js instance) will get notified once, and only the relevant clients -- socket.io won't waste CPU cycles in order to find them as it keeps track of them when they join() or leave() a room -- will receive the message.
Doesn't that sound pretty efficient and light-weight?
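The counting argument above can be checked with a small simulation. Everything here is a hypothetical in-memory model (`Bus`, `Instance`, `join`), not socket.io or Redis code: the bus stands in for a pub/sub channel, each instance for one node.js process, and each instance tracks room membership as clients join so no scan over all clients is needed. Smaller numbers are used than the million users in the example.

```javascript
class Bus {
  constructor() { this.subscribers = []; this.published = 0; }
  subscribe(fn) { this.subscribers.push(fn); }
  publish(room, message) {
    this.published += 1; // one publish, no matter how many clients exist
    for (const fn of this.subscribers) fn(room, message);
  }
}

class Instance {
  constructor(bus) {
    this.rooms = new Map(); // room -> Set of client ids on this instance
    this.notified = 0;      // times this instance heard from the bus
    this.emitted = 0;       // messages sent to clients over websockets
    bus.subscribe((room, message) => {
      this.notified += 1;
      const members = this.rooms.get(room);
      if (members) this.emitted += members.size; // only room members
    });
  }
  join(room, clientId) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(clientId);
  }
}

const bus = new Bus();
const instances = [new Instance(bus), new Instance(bus), new Instance(bus), new Instance(bus)];
// 10,000 clients spread over 4 instances; only the first 50 join #myroom.
for (let id = 0; id < 10000; id++) {
  const instance = instances[id % instances.length];
  instance.join(id < 50 ? "#myroom" : "#lobby", id);
}
bus.publish("#myroom", "hi");
const totalEmitted = instances.reduce((sum, i) => sum + i.emitted, 0);
```

One publish reaches each instance once, and the number of websocket emits equals the room size rather than the total number of connected clients.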
Give Redis a shot.
It's really simple to set up, runs entirely in memory, is blazing fast, and makes replication extremely simple.
That's the way socket.io recommends passing events between nodes.
You can find more information/code here.
Additionally, if MongoDB can't handle the load at any point, you can use Redis as your session-store as well.
Hope this helps!
