Working with WebSockets and Node.js clusters - node.js

I currently have a Node server running that works with MongoDB. It handles some HTTP requests, but it largely uses WebSockets. Basically, the server connects multiple users to rooms with WebSockets.
My server currently has around 12k WebSockets open, and it's almost crippling my single-threaded server. Now I'm not sure how to convert it over to a cluster.
The server holds HashMap variables for the connected users and rooms. When a user performs an action, the server often references those HashMap variables, so I'm not sure how to use clusters here. I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users.
Does anyone have any ideas on what to do?
Thank you.

You can look at the socket.io-redis adapter for architectural ideas or you can just decide to use socket.io and the Redis adapter.
They move the equivalent of your hashmap into a separate process, the Redis in-memory database, so that all clustered processes can access it.
The socket.io-redis adapter also supports higher-level functions so that you can emit to every socket in a room with one call and the adapter finds where everyone in the room is connected, contacts that specific cluster server, and has it send the message to them.
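As a rough illustration of the idea (not the adapter's actual implementation), the sketch below keeps room membership behind a small store object instead of a per-process variable. SharedRoomStore, its method names, and the worker/socket IDs are hypothetical; the in-memory Map only stands in for a shared store such as Redis, where the calls would be asynchronous.

```javascript
// Hypothetical sketch: room membership lives in one shared place so that any
// worker can ask "which worker owns which socket in this room?".
// The Map below is an in-memory stand-in for a shared store such as Redis.
class SharedRoomStore {
  constructor() {
    this.rooms = new Map(); // roomId -> Map(socketId -> workerId)
  }

  join(roomId, socketId, workerId) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    this.rooms.get(roomId).set(socketId, workerId);
  }

  leave(roomId, socketId) {
    const members = this.rooms.get(roomId);
    if (!members) return;
    members.delete(socketId);
    if (members.size === 0) this.rooms.delete(roomId); // drop empty rooms
  }

  // Group the room's sockets by the worker that holds each connection,
  // so a message can be handed to each involved worker once.
  socketsByWorker(roomId) {
    const plan = new Map();
    const members = this.rooms.get(roomId) || new Map();
    for (const [socketId, workerId] of members) {
      if (!plan.has(workerId)) plan.set(workerId, []);
      plan.get(workerId).push(socketId);
    }
    return plan;
  }
}

const store = new SharedRoomStore();
store.join('lobby', 'sock-a', 1);
store.join('lobby', 'sock-b', 2);
store.join('lobby', 'sock-c', 1);
const plan = store.socketsByWorker('lobby');
```

Here plan maps each worker ID to the sockets it owns in the room, which is the information needed to forward a room message only to the workers that actually have members connected.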
"I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users."
Threads in Node.js are not lightweight (each has its own V8 instance), so you will not want a thread for every WebSocket connection. You could group a certain number of WebSocket connections on a worker thread, but at that point it is likely easier to use clustering, because Node.js will handle the distribution across the cluster workers for you automatically, whereas you would have to do that yourself for your own worker pool.

Related

redis in Node.js app environment

I am building an app with several Node.js instances as a backend (an HTTP server, a socket server, and a pool of several domain servers). I am now trying to cover several communication and configuration aspects and am wondering whether Redis is an appropriate solution.
So, I would use it for the following purposes:
Implementation of a shared run-time lookup table. It's a table of several hundred relatively simple records, accessed and manipulated by 2 Node instances.
Implementation of message queues. Each domain server receives commands from the HTTP server and should execute them sequentially. A domain server should be able to listen for a Redis event and execute each new command upon its arrival.
The socket server also has a Redis message queue and listens for its events in order to push notifications to connected clients.
Is redis "too heavy" for such a purpose?
Does it offer all needed functionality?
I can definitely implement a lookup in a file and/or memory and a queue using sockets. However, Redis might make the code cleaner and the solution more robust.
Redis is definitely not a heavy solution, on the contrary.
It's small, insanely fast (when using pipelining), and easy to deploy. I consider it a light solution, a kind of Swiss Army knife that may solve many problems.
Redis-based message queues are OK if you don't expect any guarantee on message delivery. That is to say, Redis-based queues can't assure you that the client has received the message. If that's a problem for your application, you should consider a heavier solution, like ZeroMQ or RabbitMQ.

Using Backbone.iobind (socket.io) with a cluster of node.js servers

I'm using Backbone.iobind to bind my client Backbone models over socket.io to the back-end server, which in turn stores it all in MongoDB.
I'm using socket.io so I can synchronize changes back to other clients' Backbone models.
The problem starts when I try to run the same thing over a cluster of node.js servers.
Setting a session store was easy using connect-mongo which stores the session to MongoDB.
But now I can't inform all the clients on every change, since the clients are distributed between the different node.js servers.
The only solution I found is to set a pub/sub queue between the different node.js servers (e.g. mubsub), which seems like a very heavy weight solution that will trigger an event on all the servers for every change.
How did you reach the conclusion that pub/sub is a "very heavy weight solution"?
Sounds like you got it right up until that part :-)
Oh, and pub/sub is not a queue.
Let's examine that claim:
The nice thing about pub/sub is that you publish and subscribe to channels/topics.
So, using the classic chat server example, let's say you have a million users connected in total, but #myroom only has 50 users in it.
When a message is sent to #myroom, it's being published once. No duplication whatsoever.
In most use-cases you won't even need to store it on disk/RAM, so we're mostly looking at network/bandwidth here. And, I mean, you're probably throwing more data (probably over the wire?) to MongoDB already, so I assume that's not your bottleneck.
If you also use socket.io's rooms feature (which is basically its own pub/sub mechanism), that means only the 50 users in the room will have that message emitted to them over the websocket.
And no, socket.io won't iterate over 1M clients to find out which of them are in room #myroom ;-)
So the message is published once, each subscriber (node.js instance) will get notified once, and only the relevant clients -- socket.io won't waste CPU cycles in order to find them as it keeps track of them when they join() or leave() a room -- will receive the message.
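The flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: an EventEmitter stands in for the Redis pub/sub channel, and createNode, join, publish, and the notifications counter are hypothetical names, not socket.io or Redis APIs. Each node subscribes to a room's channel only once a local client joins that room.

```javascript
const { EventEmitter } = require('events');

// Stand-in for Redis pub/sub (assumption: in production each node would
// hold its own subscriber connection to a Redis server instead).
const bus = new EventEmitter();

// Hypothetical node.js instance that tracks only its own local clients.
function createNode(name) {
  const localRooms = new Map(); // room -> Set of local client callbacks
  const node = {
    name,
    notifications: 0, // how many times this node was notified by the bus
    join(room, client) {
      if (!localRooms.has(room)) {
        localRooms.set(room, new Set());
        // Subscribe to the room's channel when the first local client joins.
        bus.on(room, (message) => {
          node.notifications += 1;
          for (const deliver of localRooms.get(room)) deliver(message);
        });
      }
      localRooms.get(room).add(client);
    },
    publish(room, message) {
      bus.emit(room, message); // published once, whatever the room size
    },
  };
  return node;
}

const nodeA = createNode('A');
const nodeB = createNode('B');
const nodeC = createNode('C'); // has no clients in #myroom

const received = [];
nodeA.join('#myroom', (m) => received.push(`a1:${m}`));
nodeA.join('#myroom', (m) => received.push(`a2:${m}`));
nodeB.join('#myroom', (m) => received.push(`b1:${m}`));
nodeB.join('#other', (m) => received.push(`b2:${m}`));

nodeC.publish('#myroom', 'hello');
```

After the publish, nodes A and B have each been notified once, node C not at all, and only the three clients in #myroom have received the message; the client in #other is never touched.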
Doesn't that sound pretty efficient and light-weight?
Give Redis a shot.
It's really simple to set up, runs entirely in memory, is blazing fast, replication is extremely simple, etc.
That's the way socket.io recommends passing events between nodes.
You can find more information/code here.
Additionally, if MongoDB can't handle the load at any point, you can use Redis as your session-store as well.
Hope this helps!

multiple child_process with node.js / socket.io

This is more of a design question than an implementation one, but I am wondering whether I can design something like this. I have an interactive app (similar to a Python shell). I want to host a server (let's say using either the node.js http server or socket.io, since I am not sure which one would be better) which would spawn a new child_process for every client that connects to it and maintain a different context for that particular client. I am a complete noob in terms of node.js or socket.io. The most I have managed is to have one child process on a socket.io server and connect the client to it.
So the question is, would this work? If not, is there any other way in node to get it to work, or am I better off with a local server?
Thanks
Node.js is a single-process web platform. Using clustering (which is built on child_process), you run independent instances of the same application, each in its own process.
Each worker costs memory, which is generally why traditional systems that require a thread per client do not scale well. For Node, a process per client would be extremely inefficient from a hardware-resources point of view.
Node is event-based, and you don't need to worry much about scope as long as your application logic does not exploit it.
The recommended number of workers is equal to the number of CPU cores on the machine.
There is always a master process that creates the workers. Each worker creates its own http + socket.io listeners, which technically are bound to the master's socket and routed from there.
HTTP requests are routed to different workers, while a socket is routed at the moment it connects; that worker then handles the socket until it disconnects.
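A minimal sketch of that master/worker layout with Node's built-in cluster module is shown below. The function names workerCount and startCluster and the response text are assumptions for illustration; cluster.isPrimary is the current name of the flag that was called isMaster before Node 16. Nothing is started until startCluster is called.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// One worker per CPU core, as recommended above.
function workerCount() {
  return os.cpus().length;
}

// Hypothetical entry point: call startCluster(port) from the main script.
function startCluster(port) {
  if (cluster.isPrimary) {
    // Master (primary) process: only creates the workers.
    for (let i = 0; i < workerCount(); i += 1) {
      cluster.fork();
    }
    return;
  }

  // Worker process: creates its own listener; the listening port is
  // shared with the other workers through the primary process.
  const server = http.createServer((req, res) => {
    res.end(`handled by worker ${cluster.worker.id}\n`);
  });
  // A WebSocket or socket.io server would be attached to `server` here.
  server.listen(port);
}
```

Calling startCluster(3000) in the main script would fork the workers and have each one accept connections on port 3000. If socket.io is used with its HTTP long-polling transport, its documentation also calls for sticky sessions so that all requests from one client reach the same worker.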

node.js server with socket.io handling 50000 simultaneous clients

We are developing a JavaScript control which should be constantly connected to a server to receive animation updates.
We are planning to host this on the Amazon cloud.
The scenario is like this: the server connects to an ActiveMQ queue and waits for updates; each update is broadcast to all connected clients.
Is it even possible to handle such load with node.js + socket.io?
Will a single node.js server be able to handle such load?
How to organize fast transport between different nodes if we will have to use more than one node?
"Will a single node.js server be able to handle such load? ... How to organize fast transport between different nodes if we will have to use more than one node?"
You say that you are planning to host on Amazon. So first off, nothing should be scoped for a single server. Amazon machines will simply "disappear", so you have to assume that you are going to use multiple computers.
"...handling 50k simultaneous clients"
So to start with, 50k connections for a single box is a very big number. Here's a very detailed blog post discussing "getting to 10k" with node.js+socket.io.
Here's a very telling quote:
"it seemed as though 10,000 clients simply required more serialization than my server was able to handle."
So a key component to "getting to 50k" is going to be the amount of work required just pushing data over the wire.
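One way to keep that per-broadcast work down, sketched here under assumptions, is to serialize each update once and send the same string to every client rather than re-encoding it per connection. The broadcast helper and the fake clients are hypothetical; clients are assumed to expose a WebSocket-like send(string) method, and whether a given library already does this internally depends on the library.

```javascript
// Hypothetical helper: `clients` is any iterable of objects with a
// send(string) method (a WebSocket-like interface, assumed here).
function broadcast(clients, update) {
  const payload = JSON.stringify(update); // serialize once, not per client
  let sent = 0;
  for (const client of clients) {
    client.send(payload);
    sent += 1;
  }
  return sent;
}

// Fake clients that just record what they were sent.
const inbox = [];
const fakeClients = [0, 1, 2].map((id) => ({
  send(data) {
    inbox.push({ id, data });
  },
}));

const delivered = broadcast(fakeClients, { frame: 7, x: 10, y: 20 });
```

All three fake clients receive the identical pre-serialized string, so the JSON encoding cost is paid once per update regardless of how many clients are connected.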
"How to organize fast transport between different nodes if we will have to use more than one node."
That blog post is the first of 3. When you're done with the first, read the other two. That should point you in the right direction.

Building a web app to support team collaboration using Socket.io

I'm building a web application that will allow team collaboration. That is, a user within a team will be able to edit shared data, and their edits should be pushed to other connected team members.
Are Socket.io rooms a reasonable way of achieving this?
i.e. (roughly speaking):
All connected team members will join the same room (dynamically created upon first team member connecting).
Any edits received by the server will be broadcast to the room (in addition to being persisted, etc).
On the client-side, any edits received will be used to update the shared data displayed in the browser accordingly.
Obviously it will need to somehow handle simultaneous updates to the same data.
Does this seem like a reasonable approach?
Might I need to consider something more robust, such as having a Redis database to hold the shared data during an editing session (with it being 'flushed' to the persistent DB at regular intervals)?
All you need is Socket.IO (with RedisStore) and Express.js. With Socket.IO you can set up rooms and also limit access to each room to authenticated users only.
Using Redis, you can make your app scale beyond a single process.
Useful links for you to read:
Handling Socket.IO, Express and sessions
Scaling Socket.IO
How to reuse redis connection in socket.io?
socket.io chat with private rooms
How to handle user and socket pairs with node.js + redis
Node.js, multi-threading and Socket.io

Resources