Real time communication between servers and clients - multithreading

I have a socket game server that runs everything in a single process; the problem comes when I want to scale out my app.
It is a card game, and when there is an event on a table I can easily reach all the players in the same room because I have direct access to their socket connections.
If I want another server (or many, depending on the load), it is a completely different process. I need to be able to have, for instance, one room where players from server 1 can play against players from server 2, and if server 1 fails, its connections can be taken over by server 2 so the players keep playing without interruption.
What would be the architecture for this?

Some hosting providers support both websockets and horizontal scaling. This will allow your users to establish a websocket connection with a node. However, you may need an event from that user to be broadcast to other users connected to other nodes.
You may want to consider something like RabbitMQ. By using a fanout or topic exchange you can broadcast the event to a set of listeners. The listeners will be the various nodes in your cluster that are maintaining the websocket connections.
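To make the fanout idea concrete, here is a minimal sketch that uses an in-memory stand-in instead of a real RabbitMQ client. The names `FanoutExchange` and `GameNode`, the player ids, and the table names are all hypothetical; in a real deployment the exchange would live in the broker and each node would hold actual websocket connections.

```javascript
// In-memory stand-in for a broker fanout exchange (not the RabbitMQ API).
class FanoutExchange {
  constructor() {
    this.queues = new Set(); // one bound queue (here: a callback) per node
  }
  bind(onMessage) {
    this.queues.add(onMessage);
  }
  publish(message) {
    // A fanout exchange copies every message to every bound queue.
    for (const deliver of this.queues) deliver(message);
  }
}

// Hypothetical node process: owns a subset of the player connections.
class GameNode {
  constructor(name, exchange) {
    this.name = name;
    this.exchange = exchange;
    this.sockets = new Map(); // playerId -> { room, inbox }
    exchange.bind((msg) => this.deliverLocally(msg));
  }
  connect(playerId, room) {
    const socket = { room, inbox: [] }; // inbox stands in for socket.send()
    this.sockets.set(playerId, socket);
    return socket;
  }
  // A local player triggered an event: hand it to the broker, not to sockets.
  emitToRoom(room, payload) {
    this.exchange.publish({ room, payload });
  }
  // Every node receives the event and forwards it to its own sockets only.
  deliverLocally({ room, payload }) {
    for (const socket of this.sockets.values()) {
      if (socket.room === room) socket.inbox.push(payload);
    }
  }
}

const exchange = new FanoutExchange();
const node1 = new GameNode('node1', exchange);
const node2 = new GameNode('node2', exchange);
const alice = node1.connect('alice', 'table-7');
const bob = node2.connect('bob', 'table-7');
const carol = node2.connect('carol', 'table-9');

node1.emitToRoom('table-7', 'alice played the ace of spades');
```

In this sketch the sender's own node also receives the event through the exchange, so local and remote players are handled by the same code path. A topic exchange would let each node bind only to the rooms it actually hosts.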

Related

Working with WebSockets and NodeJs clusters

I currently have a Node server running that works with MongoDB. It handles some HTTP requests, but it largely uses WebSockets. Basically, the server connects multiple users to rooms with WebSockets.
My server currently has around 12k WebSockets open and it's almost crippling my single threaded server, and now I'm not sure how to convert it over.
The server holds HashMap variables for the connected users and rooms. When a user does an action, the server often references those HashMap variables. So, I'm not sure how to use clusters in this. I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users
Does anyone have any ideas on what to do?
Thank you.
You can look at the socket.io-redis adapter for architectural ideas or you can just decide to use socket.io and the Redis adapter.
They move the equivalent of your hashmap to a separate process (the Redis in-memory database) so all clustered processes can get access to it.
The socket.io-redis adapter also supports higher-level functions so that you can emit to every socket in a room with one call and the adapter finds where everyone in the room is connected, contacts that specific cluster server, and has it send the message to them.
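As a rough illustration of the idea described above (not the adapter's actual internals), the sketch below keeps room membership in one shared registry and routes a room emit to whichever worker holds each socket. A plain `Map` stands in for the shared Redis store, and `ClusterWorker`, `roomRegistry`, and `emitToRoom` are hypothetical names.

```javascript
// Shared registry: room -> Map(workerId -> Set of socketIds).
// A plain Map stands in for a store such as Redis that all workers can reach.
const roomRegistry = new Map();
const workers = new Map(); // workerId -> worker (stand-in for an IPC channel)

class ClusterWorker {
  constructor(id) {
    this.id = id;
    this.sent = []; // records [socketId, message] instead of writing to sockets
    workers.set(id, this);
  }
  join(socketId, room) {
    if (!roomRegistry.has(room)) roomRegistry.set(room, new Map());
    const byWorker = roomRegistry.get(room);
    if (!byWorker.has(this.id)) byWorker.set(this.id, new Set());
    byWorker.get(this.id).add(socketId);
  }
  sendLocal(socketIds, message) {
    for (const socketId of socketIds) this.sent.push([socketId, message]);
  }
}

// One call emits to a whole room, whichever worker holds each socket.
function emitToRoom(room, message) {
  const byWorker = roomRegistry.get(room) || new Map();
  for (const [workerId, socketIds] of byWorker) {
    workers.get(workerId).sendLocal(socketIds, message);
  }
}

const w1 = new ClusterWorker('w1');
const w2 = new ClusterWorker('w2');
w1.join('s1', 'lobby');
w2.join('s2', 'lobby');
w2.join('s3', 'other');
emitToRoom('lobby', 'hello');
```

The point is that no worker needs the other workers' socket objects, only the shared membership data and a way to ask the owning worker to send.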
I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users
Threads in node.js are not lightweight things (each has its own V8 instance), so you will not want a nodejs thread for every WebSocket connection. You could group a certain number of WebSocket connections on a worker thread, but at that point it is likely easier to use clustering, because nodejs will handle the distribution of connections across the cluster workers for you automatically, whereas you'll have to do that yourself for your own worker thread pool.

How does Server keep track of all Client(s) connected in Real time data pushing scenario?

I kinda understand that Websocket is the protocol that is used for real-time data flowing back & forth.
My question can be very pre-mature but couldn't find much help on the web.
Say 1000 clients are connected to a server which sends out real-time stock prices. When there is an update on the server front, how will server know all the 1000 clients to which it needs to send an update?
If this is some sort of looping that happens on the server side, where all connected clients' details are cached and the update is then sent out to all of them, isn't it an overhead?
This SOF answer made some sense but didn't clear my doubt.
How does Server keep track of all Client(s) connected in Real time data pushing scenario?
It doesn't... it only keeps track of the clients it's serving specifically.
This answer is not node.js specific.
Say 1000 clients are connected to a server which sends out real-time stock prices. When there is an update on the server front, how will server know all the 1000 clients to which it needs to send an update?
To actually understand this a little better, we should consider larger numbers. i.e., let's assume 1 million clients connected to a service.
Obviously, a sane design will require redundancy, so no single service will hold all 1 million connections (and if a single server instance fails, clients can re-connect to a different server instance).
In this case, there's no single server that is aware of all clients.
It makes more sense for each server to manage its own internal subscription / client list. Each server will also act as a pub/sub client for a centralized pub/sub service (such as a Redis cluster or whatever).
Assuming 1000 server instances, each serving 1000 clients, we would find that the pub/sub service is aware only of 1,000 "clients" (server instances). Each server is unaware of the other clients; it's only aware of the 1,000 clients it's managing.
If this is some sort of looping that happens on the server side, where all connected clients' details are cached and the update is then sent out to all of them, isn't it an overhead?
The algorithm itself is implementation specific, but in general, each server will incur some overhead in order to manage the pub/sub layer.
However, since each server only manages a small subset of the total client count, the overhead is distributed across a number of systems.
Channel Oriented vs. Connection Oriented Design
I should probably note that the pub/sub design isn't connection oriented.
The server isn't (or shouldn't be) looping over all the connections asking "are you subscribed to this channel?".
Rather, pub/sub design assumes a "channel" oriented design, where it locates the channel object(s) and loops over a client list.
On one hand, this approach might (or might not) consume more memory. Since each "channel" should contain a list of clients listening to that channel, a single client object might belong to more than a single list.
On the other hand, the loop has fewer code branches and experiences less overhead than a connection oriented design. Also, this approach allows for pub/sub clients that aren't connection bound (such as internal hooks / callbacks).
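A minimal sketch of the channel oriented approach, assuming subscribers are plain callbacks so that a websocket client and an internal hook look the same to the publisher. `ChannelRegistry` and the channel names are made up for illustration.

```javascript
// Channel-oriented registry: channel name -> Set of subscriber callbacks.
class ChannelRegistry {
  constructor() {
    this.channels = new Map();
  }
  subscribe(channel, subscriber) {
    if (!this.channels.has(channel)) this.channels.set(channel, new Set());
    this.channels.get(channel).add(subscriber);
    // Return a function that removes this subscription again.
    return () => {
      const subscribers = this.channels.get(channel);
      if (!subscribers) return;
      subscribers.delete(subscriber);
      if (subscribers.size === 0) this.channels.delete(channel);
    };
  }
  publish(channel, message) {
    // Locate the channel, then loop over its subscribers only.
    // No "are you subscribed?" check per connection is needed.
    const subscribers = this.channels.get(channel);
    if (!subscribers) return 0;
    for (const notify of subscribers) notify(message);
    return subscribers.size;
  }
}

const registry = new ChannelRegistry();
const clientA = [];
const clientB = [];
const auditLog = []; // internal hook, not bound to any connection

registry.subscribe('prices:ACME', (m) => clientA.push(m));
const leave = registry.subscribe('prices:ACME', (m) => clientB.push(m));
registry.subscribe('prices:ACME', (m) => auditLog.push(m));
registry.subscribe('prices:INIT', (m) => clientB.push(m));

const delivered = registry.publish('prices:ACME', 101.5);
leave(); // clientB stops listening to prices:ACME
registry.publish('prices:ACME', 102);
```

Note that `clientB` appears in two subscriber sets here, which is the extra memory cost mentioned above.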
Say 1000 clients are connected to a server which sends out real-time stock prices. When there is an update on the server front, how will server know all the 1000 clients to which it needs to send an update?
Socket.io already keeps track by itself and it's pretty easy to emit to all connected clients.
Socket.io - Emit Cheatsheet
If you are worried about what would happen when your user-base grows, you can scale your service to multiple nodes.
If you actually end up scaling and have more than one server node, then you can use socket.io-redis, an adapter to enable broadcasting of events to multiple separate socket.io server nodes.

Multiple websockets onto multiple servers: how do they communicate?

I have a node server accepting websocket connections from the clients. Each client can broadcast a message to all of the other clients.
UPDATE: I am using https://github.com/websockets/ws as my library of choice.
At the moment, the server has an array with all of the connections. Each connection has a tabId. When one of the clients emits a message, I go through all of the connections and check: if the connection's tabId doesn't match, I send the message to the client.
Because of load issues, I am facing the problem of needing more than one server. So, there will be, say, two servers, each one with a number of clients.
How do I make sure that a message gets broadcast to all of the websocket clients, and not only the ones connected to the same server?
One possible solution I thought of is to have the connections stored in a database, where each record has the tabId and the serverId. However, even a simple broadcast gets tricky, as messages to "local" sockets are easy to broadcast (the socket is local and available) whereas messages to "remote" sockets are tricky and would imply inter-server communication.
Is there a good pattern to solve this? Surely, this is something that people face every day.
You could use a message queue like RabbitMQ.
When a client logs in to your server, create a consumer which listens to a queue which will receive messages directed to that particular client. And when the clients are sending messages, just use a publisher to publish them to the recipient's queue.
This way it doesn't matter, and you don't need to know, which nodes the clients are on or whether they jump from one node to another.
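Here is a small sketch of the queue-per-client idea using an in-memory stand-in, not a real RabbitMQ client. `Broker`, the `client.bob` queue name, and the node names are hypothetical, and the sketch assumes a client's queue outlives its consumer so that messages wait while the client moves between nodes.

```javascript
// In-memory stand-in for a broker with one named queue per client.
class Broker {
  constructor() {
    this.consumers = new Map(); // queue name -> consumer callback
    this.pending = new Map();   // queue name -> messages waiting for a consumer
  }
  consume(queue, onMessage) {
    this.consumers.set(queue, onMessage);
    // Flush anything published while no node was consuming this queue.
    const backlog = this.pending.get(queue) || [];
    this.pending.delete(queue);
    for (const message of backlog) onMessage(message);
  }
  cancel(queue) {
    this.consumers.delete(queue);
  }
  publish(queue, message) {
    const consumer = this.consumers.get(queue);
    if (consumer) {
      consumer(message);
    } else {
      if (!this.pending.has(queue)) this.pending.set(queue, []);
      this.pending.get(queue).push(message);
    }
  }
}

const broker = new Broker();
const received = { node1: [], node2: [] };

// Client "bob" logs in on node1, so node1 consumes bob's queue.
broker.consume('client.bob', (m) => received.node1.push(m));
broker.publish('client.bob', 'hi from alice'); // sender needs only the queue name

// bob reconnects to node2; messages sent in between wait in the queue.
broker.cancel('client.bob');
broker.publish('client.bob', 'are you there?');
broker.consume('client.bob', (m) => received.node2.push(m));
broker.publish('client.bob', 'welcome back');
```

The publisher never refers to a node, only to the recipient's queue, which is why it keeps working when a client jumps from one node to another.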

node.js server with socket.io handling 50000 simultaneous clients

We are developing a Javascript control which should be constantly connected to a server for receiving animation updates.
We are planning to host this stuff on an Amazon cloud.
The scenario is like this: server connects to activemq queue waiting for updates, for each update it broadcasts it to all connected clients.
Is it even possible to handle such load with node.js + socket.io?
Will a single node.js server be able to handle such load?
How to organize fast transport between different nodes if we will have to use more than one node?
Will single node.js server be able to handle such load?.. How to organize fast transport between different nodes if we will have to use more than one node
You say that you are planning to host on Amazon. So first off, nothing should be scoped for a single server. Amazon machines will simply "disappear", so you have to assume that you are going to use multiple computers.
...handling 50k simultaneous clients
So to start with, 50k connections for a single box is a very big number. Here's a very detailed blog post discussing "getting to 10k" with node.js+socket.io.
Here's a very telling quote:
it seemed as though 10,000 clients simply required more serialization than my server was able to handle.
So a key component to "getting to 50k" is going to be the amount of work required just pushing data over the wire.
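One common way to cut down on that work, offered here as a suggestion rather than something taken from the blog post, is to serialize a broadcast once and reuse the resulting string for every client. The sketch below uses fake clients (`makeClient` is hypothetical) and only counts `JSON.stringify` calls; it makes no claim about actual throughput.

```javascript
// Fake client: outbox records frames instead of writing to a socket.
function makeClient() {
  return {
    outbox: [],
    send(frame) {
      this.outbox.push(frame);
    },
  };
}

// Naive broadcast: serializes the same update once per client.
function broadcastNaive(clients, update) {
  let serializations = 0;
  for (const client of clients) {
    client.send(JSON.stringify(update));
    serializations += 1;
  }
  return serializations;
}

// Serialize once, then reuse the same string for every client.
function broadcastOnce(clients, update) {
  const frame = JSON.stringify(update);
  for (const client of clients) client.send(frame);
  return 1;
}

const clients = Array.from({ length: 1000 }, makeClient);
const update = { type: 'frame', x: 10, y: 20 };
const naiveCount = broadcastNaive(clients, update);
const onceCount = broadcastOnce(clients, update);
```

Both functions put the same bytes in every outbox; the difference is 1000 serializations versus one per update.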
How to organize fast transport between different nodes if we will have to use more than one node.
That blog post is the first of 3. When you're done with the first, read the other two. That should point you in the right direction.

Scalable push application with node.js

I'm thinking about writing a few web applications having almost the same requirements as a chat. And I would like them to be able to scale easily.
I have worked a bit with node.js and I understand how it can help design push applications but I have some difficulties when thinking about having them run on multiple servers.
Here are some design I can think of for a large scale chat app :
1 - Servers have state, they keep the connections open and clients can have new messages pushed to them. In this scenario, we are limited by the physical memory of one server so we cannot scale linearly if we have too many users per room.
2 - Servers have no state, they query a distributed database to respond to clients' requests. In this scenario, clients poll the servers. We could scale linearly but the throughput is decreased, the messages are not delivered instantly and polling has been shown to be a bad practice when scaling.
3 - Mix of 1 and 2. Servers keep the connections of their clients open and poll the distributed database. The application is more complex to write and we still use polling. Similar client requests (clients of the same room) are just grouped into a single one done by the server. The code becomes unnecessarily complicated and it does not scale in the situation where we have many rooms and a few users per room.
4 - Servers have no state and the database cluster uses events to notify every registered server about new messages. This is the solution I would like to have but I haven't heard of any database which has this feature. (Some people are talking about this feature for mongodb here: https://jira.mongodb.org/browse/SERVER-124)
So why is the 4th solution not used so much today?
How do people usually design their applications in this case?
Since you want a push application, you would probably use Socket.IO with RedisStore.
By using this combination, the data for all the connections is kept in Redis (in-memory database), so you can scale outside a process. Another use of Redis here is for pub-sub.
The idea is to trigger an event when something needs to be pushed, then send a message to the browser using Socket.io. If you want to listen to database changes, perhaps it's better to use CouchDB with its _changes feature.
Resources:
https://github.com/dshaw/talks/tree/master/2011-10-jsclub/sample-app
http://www.ranu.com.ar/2011/11/redisstore-and-rooms-with-socketio.html
How to reuse redis connection in socket.io?
Instead of triggers for case 4, you might want to hook into MongoDB replication.
Let's assume you have a replica set (you wouldn't run a single mongod, would you?).
Every change is written to the oplog on the primary and then is replicated to secondaries.
You can efficiently pull new updates from the oplog, using tailable cursors. Note, this is still pull, not push.
Then your node.js will push these events to the clients.
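The pull-then-push flow can be sketched without a database. Below, an array stands in for the oplog and `OplogTailer` is a hypothetical helper that remembers the last entry it has seen; none of this is the MongoDB driver API, and a real tailable cursor would block waiting for new entries instead of being polled by hand.

```javascript
// Append-only log standing in for the replica set oplog.
const oplog = [];
function write(op) {
  oplog.push({ ts: oplog.length + 1, ...op }); // ts: increasing position
}

// Hypothetical tailer: pulls entries newer than the last one it has seen.
class OplogTailer {
  constructor(log, onChange) {
    this.log = log;
    this.onChange = onChange;
    this.lastTs = 0;
  }
  poll() {
    const fresh = this.log.filter((entry) => entry.ts > this.lastTs);
    for (const entry of fresh) {
      this.onChange(entry); // this is where node.js pushes to its clients
      this.lastTs = entry.ts;
    }
    return fresh.length;
  }
}

const pushedToClients = []; // stands in for socket.send() to connected clients
const tailer = new OplogTailer(oplog, (entry) => pushedToClients.push(entry.msg));

write({ room: 'general', msg: 'hello' });
write({ room: 'general', msg: 'anyone here?' });
const firstPull = tailer.poll();
const secondPull = tailer.poll(); // nothing new since the last pull
write({ room: 'general', msg: 'yes' });
const thirdPull = tailer.poll();
```

Each entry is pushed to clients exactly once because the tailer only asks for entries after its last seen position.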

Resources