How to Scale a Node.js WebSocket Redis Server?

I'm writing a chat server for Acani, and I have some questions about scaling Node.js and WebSockets with a load balancer.
1) What exactly does it mean to load balance Node.js? Does that mean there will be n independent versions of my server application running, each on a separate server?
2) To allow one client to broadcast a message to all the others, I store a set of all the webSocketConnections opened on the server. But if I have n independent versions of my server application running, each on a separate server, will I then have n different sets of webSocketConnections?
3) If the answers to 1 & 2 are affirmative, then how do I store a universal set of webSocketConnections (across all servers)? One way I think I could do this is to use Redis Pub/Sub and have every webSocketConnection subscribe to a channel on Redis.
4) But then, won't the single Redis server become the bottleneck? How would I then scale Redis? What does it even mean to scale Redis? Does that mean I have m independent versions of Redis running on different servers? Is that even possible?
5) I heard Redis doesn't scale. Why would someone say that? What does that mean? If that's true, is there a better solution for pub/sub and/or storing a list of all broadcasted messages?
Note: If your answer is that Acani would never have to scale, even if each of all seven billion people (and growing) on Earth were to broadcast a message every second to everyone else on earth, then please give a valid explanation.

Well, a few answers to your questions:
1) Load balancing Node.js means exactly what you thought, except that you don't really need separate servers: you can run more than one process of your Node server on the same machine.
2) Each server/process of your Node server will have its own connections. The default store for WebSockets (for example, in Socket.IO) is MemoryStore, which means all connections are kept in that machine's memory. To use Redis as the connection store, you need to switch to RedisStore.
3) Redis Pub/Sub is a good way to achieve this task.
4) You are right about this: Redis doesn't scale at the moment, and having a lot of processes/connections connected to Redis can make it a bottleneck.
5) Redis doesn't scale, that is correct, but according to this presentation, cluster development is a top priority at Redis. Redis does have a cluster; it's just not stable yet (taken from http://redis.io/download):
Where's Redis Cluster?
Redis development is currently focused on Redis 2.6 that will bring you support for Lua scripting and many other improvements. This is our current priority, however the unstable branch already contains most of the fundamental parts of Redis Cluster. After the 2.6 release we'll focus our energies on turning the current Redis Cluster alpha in a beta product that users can start to seriously test.
It is hard to make forecasts since we'll release Redis Cluster as stable only when we feel it is rock solid and useful for our customers, but we hope to have a reasonable beta for summer 2012, and to ship the first stable release before the end of 2012.
See the presentation here: http://redis.io/presentation/Redis_Cluster.pdf
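To make the Pub/Sub idea concrete, here is a minimal sketch of the fan-out pattern. It assumes an in-process EventEmitter standing in for the Redis channel, and `createChatServer` and `fakeConnection` are hypothetical helpers, not Socket.IO or Redis APIs. Each process keeps only its own set of connections and subscribes to the channel once, rather than once per connection.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a Redis Pub/Sub channel (assumption: a real deployment would
// use a Redis client's publish/subscribe calls instead of this emitter).
const bus = new EventEmitter();

// Hypothetical helper: one of these would exist per Node.js process.
function createChatServer(name) {
  const connections = new Set(); // sockets opened on THIS process only

  // The process subscribes once and fans out to its local connections.
  bus.on('chat', ({ senderId, text }) => {
    for (const conn of connections) {
      if (conn.id !== senderId) conn.send(text);
    }
  });

  return {
    name,
    connect(conn) { connections.add(conn); },
    disconnect(conn) { connections.delete(conn); },
    // A message from a local client is published to the channel,
    // so every process (including this one) delivers it locally.
    broadcast(senderId, text) { bus.emit('chat', { senderId, text }); },
  };
}

// Fake connection for demonstration; a real one would wrap a WebSocket.
function fakeConnection(id) {
  const received = [];
  return { id, received, send(text) { received.push(text); } };
}

const serverA = createChatServer('A');
const serverB = createChatServer('B');
const alice = fakeConnection('alice');
const bob = fakeConnection('bob');
serverA.connect(alice);
serverB.connect(bob);
serverA.broadcast('alice', 'hello'); // bob, on another "server", receives it
```

With this shape there is no universal set of connections anywhere; the channel is the only shared piece, which is also why it becomes the component to watch as load grows.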

2) Using Redis might not work for storing connections: Redis stores data in string format, and if the connection object has circular references (e.g., Engine.IO) you won't be able to serialise it.
3) Creating a new Redis client for each client might not be a good approach, so avoid that trap if you can.
Consider using the ZMQ Node library to have processes communicate with each other through TCP (or IPC if they are clustered, as in master-worker).
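The serialisation point can be seen with a few lines of plain JavaScript. This is only an illustration: the `connection` object, the `server-a` name and the record layout are made up for the example. The live object stays in a local Map, and only plain data that identifies it would go to a shared store.

```javascript
// A connection-like object that refers back to itself.
const connection = { id: 'conn-1' };
connection.self = connection;

// JSON.stringify throws a TypeError on circular structures.
let serializeError = null;
try {
  JSON.stringify(connection);
} catch (err) {
  serializeError = err;
}

// Keep the live object in this process, keyed by its id.
const localConnections = new Map();
localConnections.set(connection.id, connection);

// Share only plain, serialisable data (hypothetical record layout).
const sharedRecord = JSON.stringify({ id: connection.id, server: 'server-a' });
```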

Related

multicore nodejs game server

I'm developing a card game backend with Node.js. I'm using socket.io with Redis as an adapter so that different instances can access all sockets.
I'm planning to save match states, which might be 5 to 10 KB each, in Redis. Players in a match might be connected to different instances, so when one of them performs an action, their instance will update the match state in Redis.
It seems to be OK, but I feel it has some problems. For example, after a crash or restart I need to fetch all matches and perform some actions on them, but how do I make each instance handle only a part of the matches?
Also, I will start a timer for the next player, who might be connected to a different instance. When they play their turn, it will be handled by that other instance, which can't stop the timer because the timer was started elsewhere!
I think that if I could have one instance in charge of a specific match, it would solve most of my problems, but I don't know how to achieve that.
Any advice or suggestion would be appreciated.
I am not sure I follow the exact problem here, but you should be able to create a highly available system that will allow you to avoid restarting from scratch.
Use a highly available Redis instance, for example using Redis Sentinel, or Redis Cloud instances that provide HA.
Using Node.js and Socket.io you can also deploy it on many servers (backed by Redis); take a look at https://socket.io/docs/using-multiple-nodes/#Using-Node-JS-Cluster
This should allow you to have an "always-on" architecture that can scale.
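On the question's idea of putting one instance in charge of each match, one possible approach (an assumption, not something the answer above proposes) is to derive the owner from a hash of the match ID. The instance names and the `ownerOf`/`isOwner` helpers below are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical instance names; in practice these would come from configuration.
const instances = ['instance-0', 'instance-1', 'instance-2'];

// Map a match ID to exactly one instance by hashing the ID.
function ownerOf(matchId, instanceList) {
  const digest = crypto.createHash('sha1').update(String(matchId)).digest();
  const index = digest.readUInt32BE(0) % instanceList.length;
  return instanceList[index];
}

// An instance uses this to decide whether it should run a match's
// turn timer or recover that match after a restart.
function isOwner(matchId, selfName, instanceList) {
  return ownerOf(matchId, instanceList) === selfName;
}
```

Every instance computes the same owner for a given match, so a non-owner can forward player actions to the owner (for example over Redis Pub/Sub) and only the owner starts or stops the timer. Note that with this simple modulo scheme, changing the number of instances reassigns most matches.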

Using Redis as a distributed internal nodeJS cache

Currently I have two load-balanced single-process nodeJS instances on Amazon beanstalk. They each just use a simple in-memory object as a cache. Because they are on different servers, their cache is not shared.
This is where Redis comes in. I want to utilize Redis to create a shared cache, but only to prime the internal memory of NodeJs.
You see, currently I am caching 4KB-10KB objects. If I relied solely on Redis, I would have not only the Redis latency to retrieve the object but the network latency as well. I'd rather use Redis as a persistent cache that primes my Node.js instances when they boot up and also keeps both internal caches in sync periodically (every x minutes).
A pretty basic nodeJs memory cache is https://github.com/tcs-de/nodecache
To complicate things even more, I am looking to start using the Node.js cluster ability to fork multiple processes of the application on the same server. Therefore, it is important that all worker processes on a server share one in-memory local cache.
The aforementioned nodejs lib has a wrapper that aids the use in a cluster environment : https://github.com/lvx3/cluster-cache
To recap,
I will have Server A and Server B, which will be load balanced evenly. Each server (A & B) will have, say, 4 Node.js processes that need to share just one cache (that is, the 4 Server A processes should all use a Server A cache, and the same for B).
Then I want the Server A and Server B caches to periodically sync and "persist" onto Redis. In the event of a crash or redeployment, the server cache would be primed with what is in Redis.
What are my options? Are there any well-established solutions or mixes of solutions that would be suitable, such as a plugin like nodecache (simple) that has a Redis plugin? I also use Express, so perhaps there is Express middleware that would be well suited for this.
Is it even worth the complexity to use Redis to prime a local server memory cache or should I just rely solely on Redis and take the network latency hit?
An acceptable but somewhat disappointing time for me to get back a 10KB object is 20 ms. I'd much prefer around 1ms. Redis and the nodeJs servers will be on Amazon so will be pretty close together.
I understand that if I have a redis cache of say 50MB that the same 50MB would exist on Server A and Server B. I am more than willing to spend money on hardware/ram for the benefit of speed.
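The setup described above can be sketched as a two-tier cache. This is a minimal sketch under stated assumptions: `createSharedStore` is a Map-backed stand-in for Redis with an invented async interface, and `createTwoTierCache` is a hypothetical helper, not part of nodecache or cluster-cache.

```javascript
// Stand-in for Redis (assumption): async get/set of string values.
function createSharedStore() {
  const data = new Map();
  return {
    async get(key) { return data.has(key) ? data.get(key) : null; },
    async set(key, value) { data.set(key, value); },
    async entries() { return [...data.entries()]; },
  };
}

// Hypothetical two-tier cache: local memory first, shared store second.
function createTwoTierCache(shared) {
  const local = new Map();
  return {
    // Called at boot: copy everything from the shared store into memory.
    async prime() {
      for (const [key, raw] of await shared.entries()) {
        local.set(key, JSON.parse(raw));
      }
    },
    // Fast path is a local lookup; a miss falls back to the shared store.
    async get(key) {
      if (local.has(key)) return local.get(key);
      const raw = await shared.get(key);
      if (raw === null) return undefined;
      const value = JSON.parse(raw);
      local.set(key, value);
      return value;
    },
    // Write-through: update memory and persist to the shared store.
    async set(key, value) {
      local.set(key, value);
      await shared.set(key, JSON.stringify(value));
    },
    size() { return local.size; },
  };
}
```

This sketch does not invalidate the copy held by the other server when a key is overwritten, so the periodic sync mentioned in the question (or a Pub/Sub invalidation message) would still be needed to bound staleness.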

Node.js and redis on same server?

We're using cloud hosting (Linode) to host a Node.js-based (and socket.io) chat app, with Redis as the main DB. We haven't launched yet, but we're looking at hosting Redis and Node.js on the same machine (an 8 GB instance, with Redis limited to 5 GB, for instance). All communication will be held in Redis (i.e., straight from client to Redis, no variables for dialog in Node.js). To avoid network travel times, amongst other bottlenecks, we are looking at hosting Redis and Node.js on the same server. I can't find anything in the documentation that states this is a bad idea, but our sysops guy isn't convinced. Are there any drawbacks to going down this route?
Refer to a very similar answer I posted on a similar matter on SO: Redis deployment configuration - master slave replication where the OP was having a similar issue but his concern was more related to performance.
My main issue with your solution (simply a side note on the other answer) is the fact that your Node.js application by design has to be cloud facing, i.e., exposed to the internet, whereas your Redis or other databases shouldn't be.
That doesn't necessarily mean you'll have a security issue, but in my opinion it's a best practice to expose only the hosts you really need to, i.e., the ones serving content directly to the user.
By not deploying Redis to an internet-facing host, you enforce a lot of security constraints simply through the topology of your network.
Is it OK to host those services on the same box?
Yes, do run the benchmarks from time to time to check if you need to scale horizontally or just bump up the flooded hosts.
Check: Redis deployment configuration - master slave replication
Will I have security problems by having Redis or another service facing the internet?
If you know what you are doing - no, you won't have security issues. I still wouldn't do it though.
There is very little speed increase in cutting out the "network bottleneck"; Redis is so fast that the latency is negligible. The issue with that approach is that if your box crashes, all of the data held in Redis is gone unless you are replicating to a slave. If it goes down, any other process not on that machine won't be able to access Redis until it restarts. This may be fine for a staging environment, but I'd guard against it in production.

NodeJS + SocketIO: Scaling and preventing single point of failure

So the first app that people usually build with SocketIO and Node is a chat app. This chat app basically has one Node server that broadcasts to multiple clients. In the Node code, you would have something like:
// Pseudocode: send the message to every client except the sender
for (const client of clients) {
  if (client !== messageSender) {
    client.send(message);
  }
}
This is great for a low number of users, but I see a problem with this. First of all, there is a single point of failure, which is the Node server. Second of all, the app will slow down as the number of clients grows. What is there to do when we reach this bottleneck? Is there an architecture (horizontal/vertical scaling) that can be used to alleviate this problem?
For that "one day" when your chat app needs multiple, fault-tolerant node servers, and you want to use socket.io to cross communicate between the server and the client, there is a node.js module that fits the bill.
https://github.com/hookio/hook.io
It's basically an event emitting framework to cross communicate between multiple "things" -- such as multiple node servers.
It's relatively complicated to use, compared to most modules, which is understandable since this is a complex problem to solve.
That being said, you'd probably have to have a few thousand simultaneous users and lots of other problems before you begin to have problems with this.
Another thing you can do, is try to develop your application in a way so that if a connection is lost (which happens all the time anyway), eg. server goes down, client has network issues (eg. mobile user), etc, your application should be able to handle that and recover from such issues gracefully.
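As one illustration of recovering gracefully from lost connections, a client can retry with growing delays instead of hammering a server that has just restarted. The function name and default values below are assumptions chosen for the example; Socket.IO's client ships its own reconnection logic, so this is only a sketch of the general idea.

```javascript
// Delay before reconnect attempt number `attempt` (0-based):
// doubles each time, capped at maxMs. Defaults are illustrative only.
function reconnectDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Usage sketch (tryConnect is hypothetical and not defined here):
//   setTimeout(tryConnect, reconnectDelay(failedAttempts));
```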
Since Node.js has a single event-loop thread, this single point of failure is written into its DNA. Even reloading a server after code changes requires this thread to be stopped.
There are, however, a lot of tools available to handle such failures gracefully. You could use forever, a simple CLI tool for ensuring that a given script runs continuously. Other options include distribute and up. Distribute is a load balancing middleware for Node. Up builds on top of Distribute to offer zero-downtime reloads using either a JavaScript API or a command line interface.
On further reading, I find that you just need to use the Redis Store with Socket.io to maintain connection references between two or more processes/servers. These options have already been discussed extensively elsewhere.
There's also the option of using socket.io-clusterhub if you don't intend to use the Redis store.

Scaling Node.JS across multiple cores / servers

OK, so I have an idea I want to pursue, but before I do I need to understand a few things fully.
Firstly, the way I think I'm going to go ahead with this system is to have 3 servers, which are described below:
The First Server will be my web Front End, this is the server that will be listening for connection and responding to clients, this server will have 8 cores and 16GB Ram.
The Second Server will be the Database Server, pretty self explanatory really, connect to the host and set / get data.
The Third Server will be my storage server, this will be where downloadable files are stored.
My first question is:
On my front end server, I have 8 cores, what's the best way to scale node so that the load is distributed across the cores?
My second question is:
Is there a system out there that I can drop into my application framework that will allow me to talk to the other cores and pass messages around to save I/O?
And my final question:
Is there any system I can use to help move the content from my storage server to the request on the front-end server with as little overhead as possible? Speed is a concern here, as we would have 500+ clients downloading and uploading concurrently at peak times.
I have finally convinced my employer that Node.js is extremely fast and that it's the latest in programming technology, and that we should invest in a platform for our intranet system, but he has requested detailed documentation on how this could be scaled across the hardware we currently have available.
On my front end server, I have 8 cores, what's the best way to scale node so that the load is distributed across the cores?
Take a look at the Node.js cluster module, which is a multi-core server manager.
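A minimal sketch of that approach, assuming Node's built-in `cluster` module (the answers here may have had a separate module of the same name in mind). `startClustered` is a hypothetical entry point and the port is whatever you pass in.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Newer Node.js versions expose isPrimary; older ones only have isMaster.
const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// Hypothetical entry point: call startClustered(8000) to run it.
function startClustered(port, workerCount = os.cpus().length) {
  if (isPrimary) {
    // Fork one worker per core (8 on the front-end server described above).
    for (let i = 0; i < workerCount; i++) cluster.fork();
    // Replace any worker that exits so capacity stays constant.
    cluster.on('exit', () => cluster.fork());
    return;
  }
  // Each worker listens on the same port; incoming connections
  // are distributed among the workers.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(port);
}
```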
Firstly, I wouldn't describe the setup you propose as 'scaling'; it's more like 'spreading'. You only have one app server serving the requests. If you add more app servers in the future, that is when you will have a scaling problem.
I understand that node.js is single-threaded, which implies that it can only use a single core. Not my area of expertise on how to/if you can scale it, will leave that part to someone else.
I would suggest NFS mounting a directory on the storage server to the app server. NFS has relatively low overhead. Then you can access the files as if they were local.
Concerning your first question: use cluster (we already use it in a production system, and it works like a charm).
When it comes to worker messaging, I cannot really help you out, but your best bet is cluster too. Maybe there will be some functionality that provides "inter-core" messaging across all cluster workers in the future (I don't know the roadmap of cluster, but it seems like a natural idea).
For your third requirement, I'd use a low-overhead protocol like NFS or (if you can go really crazy when it comes to infrastructure) a high-speed SAN backend.
Another piece of advice: use MongoDB as your database backend. You can start with low-end hardware and scale up your database instance with ease using MongoDB's sharding/replica set features (if that is some kind of requirement).
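On the worker-messaging question, Node's built-in `cluster` module lets the primary process and its workers exchange messages with `worker.send()` and `process.send()`. Workers do not talk to each other directly, so one possible pattern is to relay through the primary. The `relay` helper below is a hypothetical sketch; it only assumes worker objects with an `id` and a `send` method.

```javascript
// Primary-side relay: forward a message from one worker to all the others.
// Returns how many workers the message was sent to.
function relay(workers, fromId, message) {
  let delivered = 0;
  for (const worker of workers) {
    if (worker.id !== fromId) {
      worker.send(message);
      delivered++;
    }
  }
  return delivered;
}

// Wiring with the cluster module (not executed here):
//   const worker = cluster.fork();
//   worker.on('message', (msg) =>
//     relay(Object.values(cluster.workers), worker.id, msg));
// and inside a worker:
//   process.send({ type: 'chat', text: 'hi' });
//   process.on('message', (msg) => { /* handle the relayed message */ });
```

This only covers workers on one machine; messages between separate servers would still need something like Redis Pub/Sub.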

Resources