Node.js + Socket.IO scaling with redis + cluster - node.js

Currently, I'm faced with the task of scaling a Node.js app on Amazon EC2. From what I understand, the way to do this is to have each child server use all available processes via cluster, and use sticky connections so that every user connecting to the server is routed back to the worker that holds their data from previous sessions.
After doing this, the next step, as far as I know, is to deploy as many servers as needed and use nginx to load balance between all of them, again using sticky connections so that it is known which "child" server each user's data is on.
So when a user connects to the server, is this what happens?
Client connection -> Find/Choose server -> Find/Choose process -> Socket.IO handshake/connection etc.
If not, please allow me to better understand this load balancing task. I also do not understand the importance of redis in this situation.
Below is the code I'm using to run a separate Node.js process on each CPU of a single machine:
var express = require('express'),
    cluster = require('cluster'),
    net = require('net'),
    sio = require('socket.io'),
    sio_redis = require('socket.io-redis');

var port = 3502,
    num_processes = require('os').cpus().length;
if (cluster.isMaster) {
    // This stores our workers. We need to keep them to be able to reference
    // them based on source IP address. It's also useful for auto-restart,
    // for example.
    var workers = [];

    // Helper function for spawning a worker at index 'i'.
    var spawn = function(i) {
        workers[i] = cluster.fork();

        // Optional: restart the worker on exit.
        workers[i].on('exit', function(code, signal) {
            console.log('respawning worker', i);
            spawn(i);
        });
    };

    // Spawn workers.
    for (var i = 0; i < num_processes; i++) {
        spawn(i);
    }

    // Helper function for getting a worker index based on IP address.
    // This is a hot path so it should be really fast. The way it works
    // is by converting the IP address to a number by removing the dots,
    // then compressing it to the number of slots we have.
    //
    // Compared against "real" hashing (from the sticky-session code) and
    // "real" IP number conversion, this function is on par in terms of
    // worker index distribution, only much faster.
    var worker_index = function(ip, len) {
        var s = '';
        for (var i = 0, _len = ip.length; i < _len; i++) {
            if (ip[i] !== '.') {
                s += ip[i];
            }
        }
        return Number(s) % len;
    };

    // Create the outside-facing server listening on our port.
    var server = net.createServer({ pauseOnConnect: true }, function(connection) {
        // We received a connection and need to pass it to the appropriate
        // worker. Get the worker for this connection's source IP and pass
        // it the connection.
        var worker = workers[worker_index(connection.remoteAddress, num_processes)];
        worker.send('sticky-session:connection', connection);
    }).listen(port);
} else {
    // Note we don't use a port here because the master listens on it for us.
    var app = new express();

    // Here you might use middleware, attach routes, etc.

    // Don't expose our internal server to the outside.
    var server = app.listen(0, 'localhost'),
        io = sio(server);

    // Tell Socket.IO to use the redis adapter. By default, the redis
    // server is assumed to be on localhost:6379. You don't have to
    // specify them explicitly unless you want to change them.
    io.adapter(sio_redis({ host: 'localhost', port: 6379 }));

    // Here you might use Socket.IO middleware for authorization etc.

    console.log("Listening");

    // Listen to messages sent from the master. Ignore everything else.
    process.on('message', function(message, connection) {
        if (message !== 'sticky-session:connection') {
            return;
        }

        // Emulate a connection event on the server by emitting the
        // event with the connection the master sent us.
        server.emit('connection', connection);
        connection.resume();
    });
}

I believe your general understanding is correct, although I'd like to make a few comments:
Load balancing
You're correct that one way to do load balancing is having nginx load balance between the different instances, and inside each instance have cluster balance between the worker processes it creates. However, that's just one way, and not necessarily always the best one.
Between instances
For one, if you're using AWS anyway, you might want to consider using ELB. It was designed specifically for load balancing EC2 instances, and it makes the problem of configuring load balancing between instances trivial. It also provides a lot of useful features, and (with Auto Scaling) can make scaling extremely dynamic without requiring any effort on your part.
One feature ELB has, which is particularly pertinent to your question, is that it supports sticky sessions out of the box - just a matter of marking a checkbox.
However, I have to add a major caveat, which is that ELB can break socket.io in bizarre ways. If you just use long polling you should be fine (assuming sticky sessions are enabled), but getting actual websockets working is somewhere between extremely frustrating and impossible.
Between processes
While there are a lot of alternatives to using cluster, both within Node and without, I tend to agree cluster itself is usually perfectly fine.
However, one case where it does not work is when you want sticky sessions behind a load balancer, as you apparently do here.
First off, it should be made explicit that the only reason you even need sticky sessions in the first place is because socket.io relies on session data stored in-memory between requests to work (during the handshake for websockets, or basically throughout for long polling). In general, relying on data stored this way should be avoided as much as possible, for a variety of reasons, but with socket.io you don't really have a choice.
Now, this doesn't seem too bad, since cluster can support sticky sessions, using the sticky-session module mentioned in socket.io's documentation, or the snippet you seem to be using.
The thing is, since these sticky sessions are based on the client's IP, they won't work behind a load balancer, be it nginx, ELB, or anything else, since all that's visible inside the instance at that point is the load balancer's IP. The remoteAddress your code tries to hash isn't actually the client's address at all.
That is, when your Node code tries to act as a load balancer between processes, the IP it tries to use will just always be the IP of the other load balancer, that balances between instances. Therefore, all requests will end up at the same process, defeating cluster's whole purpose.
You can see the details of this issue, and a couple of potential ways to solve it (none of which is particularly pretty), in this question.
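To make that concrete, one kind of workaround (roughly along the lines of what those threads discuss) is to stop hashing connection.remoteAddress and instead resume the paused socket just long enough to read the first chunk and pull the client IP out of an X-Forwarded-For header added by the proxy in front. The sketch below is an illustration only, not the module's actual code: the header name depends on your load balancer, the parsing is naive, and the worker would have to re-inject the bytes the master already consumed.

// Sketch only: route on X-Forwarded-For instead of remoteAddress.
var server = net.createServer({ pauseOnConnect: true }, function(connection) {
    connection.resume();
    connection.once('data', function(buffer) {
        var text = buffer.toString('utf8');
        var match = /x-forwarded-for:\s*([^\s,]+)/i.exec(text);
        var ip = match ? match[1] : connection.remoteAddress;
        connection.pause();
        var worker = workers[worker_index(ip, num_processes)];
        // The worker must re-emit both the connection and the bytes we already
        // consumed, otherwise the request arrives truncated.
        worker.send({ cmd: 'sticky-session:connection', chunk: text }, connection);
    });
}).listen(port);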
The importance of Redis
As I mentioned earlier, once you have multiple instances/processes receiving requests from your users, in-memory storage of session data is no longer sufficient. Sticky sessions are one way to go, although other, arguably better solutions exist, among them central session storage, which Redis can provide. See this post for a pretty comprehensive review of the subject.
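As a minimal illustration of what central session storage looks like on the Express side, here is a sketch using express-session with connect-redis (option names differ between connect-redis versions, so treat the constructor arguments as assumptions):

var session = require('express-session');
var RedisStore = require('connect-redis')(session);

// Sessions now live in Redis, so any instance/process can read and write them,
// and no request has to land on a particular server to find its session.
app.use(session({
    store: new RedisStore({ host: 'localhost', port: 6379 }),
    secret: 'replace-with-a-real-secret',
    resave: false,
    saveUninitialized: false
}));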
Seeing as your question is about socket.io, though, I'll assume you probably meant Redis's specific importance for websockets, so:
When you have multiple socket.io servers (instances/processes), a given user will be connected to only one such server at any given time. However, any of the servers may, at any time, wish to emit a message to a given user, or even a broadcast to all users, regardless of which server they're currently under.
To that end, socket.io supports "Adapters", of which Redis is one, that allow the different socket.io servers to communicate among themselves. When one server emits a message, it goes into Redis, and then all servers see it (Pub/Sub) and can send it to their users, making sure the message will reach its target.
This, again, is explained in socket.io's documentation regarding multiple nodes, and perhaps even better in this Stack Overflow answer.
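Concretely, once io.adapter(sio_redis(...)) is attached as in the snippet at the top of this question, nothing else changes in application code; a regular emit from any process is published through Redis and delivered by whichever process actually holds the target sockets:

// This works no matter which instance/process runs it:
io.to('room 42').emit('news', { hello: 'world' });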

Related

Sticky Session on Heroku

We have a NodeJS application running with SocketIO and clustering on Heroku. To get SocketIO working we use the redis adapter as discussed here: https://socket.io/docs/using-multiple-nodes/.
Then we've implemented sticky sessions as shown in the sticky session documentation here: https://github.com/elad/node-cluster-socket.io.
Turns out that when we deploy to Heroku, the connection.remoteAddress in:
// Create the outside facing server listening on our port.
var server = net.createServer({ pauseOnConnect: true }, function(connection) {
    // We received a connection and need to pass it to the appropriate
    // worker. Get the worker for this connection's source IP and pass
    // it the connection.
    var index = worker_index(connection.remoteAddress, num_processes);
    var worker = workers[index];
    worker.send('sticky-session:connection', connection);
}).listen(port);
is actually the IP address of some heroku routing server and NOT the client IP. I've seen that the request header "x-forwarded-for" could be used to get the client IP, but when we pause the connection in this way, we don't even have the headers yet?
We searched all over for a solution, but apparently there are no good solutions.
Here are some of the better suggestions:
https://github.com/indutny/sticky-session/issues/6
https://github.com/indutny/sticky-session/pull/45
None of them seemed good performance-wise, so we ended up switching SocketIO communication to websockets only. This eliminates the need for sticky sessions altogether.
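For reference, restricting the transport is a client-side option (see the socket.io client documentation); a minimal sketch, with a placeholder URL:

// Skip long polling entirely, so the handshake is a single upgraded
// connection and sticky sessions are no longer needed.
var socket = io('https://your-app.herokuapp.com', {
    transports: ['websocket']
});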

Scalable architecture for socket.io

I am new to socket.io and Node JS and I am trying to build a scalable application with a high number of simultaneous socket connections (10,000+).
Currently, I started with a model where my server creates child processes, and every child process listens on a specific port with a socket.io instance attached. Once a client connects, it is redirected to a specific port.
The big question is : Does having several socket.io instances on several ports increases the number of possible connections ?
Here is my code, just in case :
Server
var server = http.createServer(app);
server.childList = [];

for (var i = 0; i < app.portList.length; i++) {
    server.childList[i] = require('child_process').fork('child.js');
}

server.listen(3443, () => {
    for (var i = 0; i < app.portList.length; i++) {
        server.childList[i].send({ message: 'createServer', port: app.portList[i] });
    }
});
child.js:
var app = require('./app');
var http = require('http');
var socket_io = require("socket.io");

process.on('message', (m) => {
    if (m.message === 'createServer') {
        var childServ = http.createServer(app);
        childServ.listen(m.port, () => {
            console.log("childServ listening on port " + m.port);
        });

        var io = socket_io();
        io.attach(childServ);
        io.sockets.on('connection', function (socket) {
            console.log("A client just connected to my socket_io server on port " + m.port);
        });
    }
});
Feel free to release the kraken if I did something horrible there
First off, what you need to optimize depends on how busy your socket.io connections are and whether the activity is mostly asynchronous I/O operations or whether it's CPU-intensive stuff. As you may already know, node.js scales really well already for asynchronous I/O stuff, but it needs multiple processes to scale well for CPU-intensive stuff. Further, there are some situations where the garbage collector gets too busy (lots and lots of small requests being served) and you also need to go to multiple processes for that reason.
More server instances (up to at least the number of CPUs you have in the server) will give you more CPU processing power (if that's what you need). It won't necessarily increase the number of max connections you can support on a box if most of them are idle. For that, you have to custom tune your server to support lots and lots of connections.
Usually, you would NOT want N socket.io servers each listening on a different port. That puts the burden on the clients to somehow select a port and the client has to know exactly what ports to choose from (e.g. how many server instances you have).
Usually, you don't do it this way. Usually, you have N processes all listening on the same port and you use some sort of load balancer to distribute the load among them. This makes the server infrastructure transparent to the clients, which means you can scale the servers up or down without changing the client behavior at all. In fact, you can even add more than one physical server box and increase capacity even further that way.
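A minimal sketch of that shape, using the built-in cluster module so that every worker shares a single port instead of each child owning its own:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // One worker per CPU; the master only forks and supervises.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
} else {
    // Every worker listens on the same port; incoming connections are
    // distributed among them, and clients only ever see port 3000.
    http.createServer(function (req, res) {
        res.end('handled by worker ' + process.pid);
    }).listen(3000);
}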
Here's an article from the socket.io doc on using multiple nodes with a load balancer to increase capacity: Socket.io - using multiple nodes (updated link). There's also explicit support by redis for a combination of multiple socket.io instances and redis so you can communicate with any socket.io instance regardless of process.
Does having several socket.io instances on several ports increases the number of possible connections ?
Yes, you have built a simple load-balancer which is a pretty common practice. There are several good tutorials about different ways of scaling node.js.
Horizontally scale socket.io with redis
http://goldfirestudios.com/blog/136/Horizontally-Scaling-Node.js-and-WebSockets-with-Redis
Your load balancer will speed things up to a point because you utilize multiple processes, but I read in another thread a while ago that a rule of thumb is to start with around 2-3 processes per CPU core. More than that causes more overhead than it helps, but that is highly dependent on the situation.

Load Balancing with Node and Heroku

I have a web app that accepts API requests from an iOS app. My web app is hosted on Heroku using their free dyno, which has 512 MB of RAM. Because Node is single-threaded, this will be a problem once we start getting higher levels of traffic from the iOS end to the web server. I'm also not the richest person in the world, so I'm wondering if it would be smart to create another free Heroku app and use a round-robin approach to balance the load received from the iOS app?
I just need to be pointed in the right direction. Vertical scaling is not really an option financially.
I'm the Node.js platform owner at Heroku.
You may be doing some premature optimization. Node.js, on our smallest 1X size (512MB RAM), can handle hundreds of simultaneous connections and thousands of requests per minute.
If your iOS app is consistently maxing that out, it may be time to consider monetization!
As mentioned by Daniel, it's against Heroku's rules. Having said that, there are probably other services that would allow you to do that.
One way to approach this problem is to use the cluster module with ZeroMQ (you need to have ZeroMQ installed before using the module - see the module description).
var cluster = require('cluster');
var zmq = require('zmq');

var ROUTER_SOCKET = 'tcp://127.0.0.1:5555';
var DEALER_SOCKET = 'tcp://127.0.0.1:7777';

if (cluster.isMaster) {
    // this is the main process - create Router and Dealer sockets
    var router = zmq.socket('router').bind(ROUTER_SOCKET);
    var dealer = zmq.socket('dealer').bind(DEALER_SOCKET);

    // forward messages between router and dealer
    router.on('message', function() {
        var frames = Array.prototype.slice.call(arguments);
        dealer.send(frames);
    });

    dealer.on('message', function() {
        var frames = Array.prototype.slice.call(arguments);
        router.send(frames);
    });

    // listen for worker processes to come online
    cluster.on('online', function() {
        // do something with a new worker, maybe keep an array of workers
    });

    // fork worker processes
    for (var i = 0; i < 100; i++) {
        cluster.fork();
    }
} else {
    // worker process - connect to Dealer
    let responder = zmq.socket('rep').connect(DEALER_SOCKET);

    responder.on('message', function(data) {
        // do something with incoming data
    });
}
This is just to point you in the right direction. If you think about it, you can create a script with a parameter that tells it whether it's a master or a worker process. Then on the main server run it as-is, and on additional servers run it with the worker flag, which will force it to connect to the main dealer.
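A tiny sketch of that role selection (the flag name and the internal hostname are made up for illustration; it would replace the DEALER_SOCKET constant and the isMaster check in the snippet above):

// Run "node app.js" on the main server and "node app.js --worker" on the others.
var isRemoteWorker = process.argv.indexOf('--worker') !== -1;

// Remote workers must point at the main server's dealer socket instead of
// localhost; 'main-server.internal' is a placeholder for your private DNS/IP.
var DEALER_SOCKET = isRemoteWorker
    ? 'tcp://main-server.internal:7777'
    : 'tcp://127.0.0.1:7777';

if (!isRemoteWorker && cluster.isMaster) {
    // ... master branch from the snippet above ...
} else {
    // ... worker branch from the snippet above ...
}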
Now your main app needs to send the requests to the router, which will be later forwarded to the worker processes:
var zmq = require('zmq');
var requester = zmq.socket('req');
var ROUTER_SOCKET = 'tcp://127.0.0.1:5555';

// handle replies - for example completion status from the worker processes
requester.on('message', function(data) {
    // do something with the reply
});

requester.connect(ROUTER_SOCKET);

// send requests to the router; the payload has to be a string or Buffer,
// so serialize objects before sending
requester.send(JSON.stringify({
    // some object describing the task
}));
So first off, as the other replies have pointed out, running two copies of your app to avoid Heroku's limits violates their ToS, which may not be a great idea.
There is, however, some good news. For starters (from Heroku's docs):
The dyno manager will restart your dyno and log an R15 error if the memory usage of a:
free, hobby or standard-1x dyno reaches 2.5GB, five times its quota.
As I understand it, despite the fact that your dyno has 512mb of actual RAM, it'll swap out to 5x that before it actually restarts. So you can go beyond 512mb (as long as you're willing to pay the performance penalty for swapping to disk, which can be severe).
Further to that, Heroku bills by the second and allows you to scale your dyno formation up and down as needed. This is fairly easy to do within your own app by hitting the Heroku API – I see that you've tagged this with NodeJS so you might want to check out:
Heroku's node client
the very-barebones-but-still-functional toots/node-heroku module
Both of these modules allow you to scale up and down your formation of dynos — with a simple heuristic (say, always have a spare 1X dyno running), you could add capacity while you're processing a request, and get rid of the spare capacity when api requests aren't running. Given that you're billed by the second, this can end up being very inexpensive; 1X dynos work out to something like 5¢ an hour to run. If you end up running extra dynos for even a few hours a day, it's a very, very small cost to you.
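As a rough sketch of what that could look like with heroku-client (the app name, process type, dyno size, and environment variable are placeholders; check the Platform API docs for the exact payload):

var Heroku = require('heroku-client');
var heroku = new Heroku({ token: process.env.HEROKU_API_TOKEN });

// PATCH /apps/{app}/formation/{type} changes how many dynos of that type run.
function scaleWeb(quantity) {
    return heroku.patch('/apps/my-app/formation/web', {
        body: { quantity: quantity, size: 'standard-1x' }
    });
}

// e.g. add a dyno before doing heavy work, then drop back down afterwards.
scaleWeb(2).then(function () {
    // ... process the backlog ...
    return scaleWeb(1);
});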
Finally: there are also 3rd party services such as Adept and Hirefire (two random examples from Google, I'm sure there are more) that allow you to automate this to some degree, but I don't have any experience with them.
You certainly could, I mean, programmatically - but that would bypass Heroku's TOS:
4.4 You may not develop multiple Applications to simulate or act as a single Application or otherwise access the Heroku Services in a manner intended to avoid incurring fees.
Now, I'm not sure about this:
Because node is a single threaded application this will be a problem once we start getting higher levels of traffic from the ios end to the web server.
There are some threads discussing that, with some interesting answers:
Clustering Node JS in Heavy Traffic Production Environment
How to decide when to use Node.js?
Also, they link to this video, introducing Node.js, which talks a bit about benchmarks:
Introduction of Node JS by Ryan Dahl

How to scale socket.io without redis

I'm currently searching for an alternative to scale my express app with socket.io. The problem is that I don't want to use redis as socket.io store. Are there any other possibilities to cluster socket.io except with Clusterhub?
EDIT: I tried to use fakeredis as a replacement for redis, but it seems like it doesn't work with socket.io. From ActionHero.js I know that faye-websocket works with fakeredis.
This might well depend on your socket.io usage and the type of scaling you want to achieve (cluster vs scaling to multiple machines).
So, here is what I did to scale our usage of socket.io to multiple servers.
We have 3 servers behind a load balancer. When a socket connects, it connects to any of the 3 servers; each server has an in-memory list of its sockets, and all three servers share an ordered list of the internal server addresses, e.g. [server1, server2, server3].
What I do is basically a ring (internally we call it the "ring of sockets"):
If I need to emit an event to a socket from server1, I first check whether the socket is connected to server1. If not, I send an HTTP request to the next server (server2), which checks whether the socket is there; if not, it sends the same request to server3, and so on until the lookup reaches the origin again, in which case you might throw an error.
It's almost the same if I need to broadcast a message: I start from one server and then call an HTTP endpoint on the others.
The algorithm I use to determine the next node (next_node.js) is:
var nodes = process.env.NODES.split(',');
//this is usually: http://server1/,http://server2/,http://server3/
var url = require('url');
var current = require("os").hostname();
//origin is the node that started the lookup
exports.get = function (origin) {
var next_node_i = nodes.map(function (uri) {
return url.parse(uri).hostname;
}).reduce(function (prev, curr, i, arr){
return curr === current && i < arr.length - 1 ? i + 1 : prev;
}, 0);
var next_node = nodes[next_node_i];
if (origin && url.parse(next_node).hostname === origin) {
// if the next node is equal to the first node initiating the lookup
// it means the socket we are looking for is not connect to any node.
return null;
}
return next_node;
};
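And here is roughly how a server would use it when an emit misses locally. The endpoint path, the request module, and the io.sockets.connected lookup are illustrative, not part of the original code:

var request = require('request');
var next_node = require('./next_node');
var hostname = require('os').hostname();

function emitToSocket(io, socketId, event, data, origin) {
    // If the socket is connected to this server, emit directly.
    var local = io.sockets.connected[socketId];
    if (local) {
        return local.emit(event, data);
    }

    // Otherwise ask the next node in the ring to try the same emit.
    var next = next_node.get(origin);
    if (!next) {
        // We went all the way around the ring without finding the socket.
        return console.error('socket ' + socketId + ' is not connected to any node');
    }
    request.post({
        url: next + 'ring/emit',
        json: { socketId: socketId, event: event, data: data, origin: origin || hostname }
    });
}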
Caveats:
Latency between these servers is low and network partitioning is unlikely, since they are physically in the same datacenter. And even if a partition did happen, it would not matter much for us.
We always run the ring in the same direction. An improved version would be to run it in both directions(?).
The servers share a secret to call these endpoints.
In my opinion this is a very easy way to achieve scaling in a lot of socket.io use cases. There might be a lot of other scenarios where this is not an option, but I hope it gives you some ideas.
If you're comfortable with Azure services, some of the guys on the Azure team have taken the liberty of writing a Service Bus store for socket.io.
Glenn Block Explains Socket.IO Scale-Out on Service Bus

Load Balance: Node.js - Socket.io - Redis

I have 3 servers running NodeJS, and they are connected to each other with Redis (1 master, 2 slaves).
The issue I'm having is that the system works fine on a single server, but when I scale it to 3 NodeJS servers, it starts missing messages and becomes unstable.
My load balancer does not accept sticky sessions, so every time requests from the client arrive, they can go to a different server.
I'm pointing all the NodeJS servers to the Redis Master.
It looks like socket.io is storing information on each server and it is not being distributed with redis.
I'm using socket.io V9. I suspect that I don't have any handshake code - could this be the reason?
My code to configure socket.io is:
var express = require('express');
var io = require('socket.io');
var redis = require('socket.io/node_modules/redis');
var RedisStore = require('socket.io/lib/stores/redis');

var pub = redis.createClient("a port", "an ip");
var sub = redis.createClient("a port", "an ip");
var client = redis.createClient("a port", "an ip");

var events = require('./modules/eventHandler');

exports.createServer = function createServer() {
    var app = express();
    var server = app.listen(80);
    var socketIO = io.listen(server);

    socketIO.configure(function () {
        socketIO.set('store', new RedisStore({
            redisPub: pub,
            redisSub: sub,
            redisClient: client
        }));
        socketIO.set('resource', '/chat/socket.io');
        socketIO.set('log level', 0);
        socketIO.set('transports', ['htmlfile', 'xhr-polling', 'jsonp-polling']);
    });

    // attach event handlers
    events.attachHandlers(socketIO);

    // return server instance
    return server;
};
Redis only syncs from the master to the slaves. It never syncs from the slaves to the master. So, if you're writing to all 3 of your machines, then the only messages that will wind up synced across all three servers will be the ones hitting the master. This is why it looks like you're missing messages.
More info here.
Read only slave

Since Redis 2.6 slaves support a read-only mode that is enabled by default. This behavior is controlled by the slave-read-only option in the redis.conf file, and can be enabled and disabled at runtime using CONFIG SET.

Read only slaves will reject all the write commands, so that it is not possible to write to a slave because of a mistake. This does not mean that the feature is conceived to expose a slave instance to the internet or more generally to a network where untrusted clients exist, because administrative commands like DEBUG or CONFIG are still enabled. However security of read-only instances can be improved disabling commands in redis.conf using the rename-command directive.

You may wonder why it is possible to revert the default and have slave instances that can be target of write operations. The reason is that while this writes will be discarded if the slave and the master will resynchronize, or if the slave is restarted, often there is ephemeral data that is unimportant that can be stored into slaves. For instance clients may take information about reachability of master in the slave instance to coordinate a fail over strategy.
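In other words, the practical takeaway for a 1-master/2-slave setup is that every Redis client socket.io uses (pub, sub, and the store client) has to point at the master rather than at a local slave; publishes that hit a read-only slave never reach the other socket.io servers. A sketch with placeholder host and port:

var redis = require('socket.io/node_modules/redis');

// All three clients talk to the master ('10.0.0.1' and 6379 are placeholders).
var MASTER_HOST = '10.0.0.1';
var MASTER_PORT = 6379;

var pub = redis.createClient(MASTER_PORT, MASTER_HOST);
var sub = redis.createClient(MASTER_PORT, MASTER_HOST);
var client = redis.createClient(MASTER_PORT, MASTER_HOST);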
I arrived at this post:
It can be a good idea to have a "proxy" between nodejs servers and the load balancer.
With this approach, XHR-polling can be used behind load balancers without sticky sessions.
Load balancing with node.js using http-proxy
Using node-http-proxy I can have a custom routing rule, e.g. by adding a parameter to the "connect url" of socket.io.
Anyone tried this solution before?
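For what it's worth, here is a hedged sketch of what that custom routing could look like with node-http-proxy, assuming the client appends a shard parameter to the socket.io connect URL (the parameter name, ports, and target list are made up for illustration):

var http = require('http');
var httpProxy = require('http-proxy');
var url = require('url');

var targets = ['http://127.0.0.1:3001', 'http://127.0.0.1:3002'];
var proxy = httpProxy.createProxyServer({ ws: true });

function pickTarget(req) {
    // Route on a ?shard=N parameter the client adds to its connect URL,
    // so every request/upgrade for a session lands on the same backend.
    var shard = Number(url.parse(req.url, true).query.shard) || 0;
    return targets[shard % targets.length];
}

var server = http.createServer(function (req, res) {
    proxy.web(req, res, { target: pickTarget(req) });
});

// Forward websocket upgrades to the same backend.
server.on('upgrade', function (req, socket, head) {
    proxy.ws(req, socket, head, { target: pickTarget(req) });
});

server.listen(3000);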
