Chat project - load balance with socket.io - node.js

I am involved in developing a chat application where we are using node.js, socket.io (rooms) and mongodb. We are at the performance-testing stage and we are very concerned about whether the system needs load balancing.
How should we proceed if our project needs it? I've researched NGINX and it looks good, but we are in doubt whether it solves our problem: since the system is a chat, we fear the servers won't be ~talking~ with each other correctly...
Where do we go if we need load balancing?

To ensure that we can scale to multiple nodes while keeping interconnectivity between different clients and different servers, I use redis. It's actually very simple to use and set up.
What this does is create a pub/sub system between your servers to keep track of your different socket clients.
// Attach Socket.IO to an HTTP server and wire it to the Redis adapter
// so events are shared across every node process.
var http = require('http'),
    redis = require('redis'),
    redisAdapter = require('socket.io-redis'),
    port = 6379,
    host = '127.0.0.1',
    pub = redis.createClient(port, host),
    sub = redis.createClient(port, host, {detect_buffers: true}),
    server = http.createServer(),
    io = require('socket.io')(server, {adapter: redisAdapter({pubClient: pub, subClient: sub})});

server.listen(3000);
Read more here: socket.io-redis
As far as handling the different node servers goes, there are several approaches:
AWS ELB (Elastic Load Balancer)
Nginx
Apache
HAProxy
Among others...

Check out the NPM package mong.socket.io. It has the ability to save socket.io data to MongoDB, like below:
{
    "_id" : ObjectId("54b901332e2f73f5594c6267"),
    "event" : "join",
    "message" : {
        "name" : "join",
        "nodeId" : 426506139219,
        "args" : "[\"URAiA6mO6VbCwquWKH0U\",\"/54b6821asdf66asdasd2f0f9cd2997413780273376\"]"
    }
}
Or you may use the Redis adapter, as mentioned here:
Socket.IO Using multiple nodes
Then just use the NGINX reverse proxy and all of the node processes should share Socket.IO events with each other.

Related

NodeJS Express - Two NodeJS instances on same port (vhost)

I'm trying to run 2 instances of NodeJS on the same port and server, from different server.js files (different dir, config etc). My server provider gave me the information that a vhost is running for a different domain, and that raises the question: how do I handle this in a NodeJS Express app? I've tried to use vhost from https://github.com/expressjs/vhost like this:
const app = express();
const vhost = require('vhost');
app.use(vhost('example1.org', app));

// Start up the Node server
app.listen(4100, () => {
    console.log(`Node server listening on 4100`);
});
And for the second application like this:
const app = express();
const vhost = require('vhost');
app.use(vhost('example2.org', app));

// Start up the Node server
app.listen(4100, () => {
    console.log(`Node server listening on 4100`);
});
But when I try to run the second instance I get EADDRINUSE :::4100, so vhost doesn't work here.
Do you know how to fix it?
You can only have one process listen to one port, not just in Node.js, but generally (with exceptions that don't apply here).
You can achieve what you need in one of two ways:
Combine the node apps
You could make the apps into one application, listen once, and then forward requests for each host to separate bits of code. If you still wanted code separation, those separate bits of code could be NPM modules that are written and maintained in isolation; a sketch of this approach follows below.
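A minimal sketch of this first approach, assuming the two sites live in hypothetical local modules ./site1 and ./site2 that each export an Express app: a single process listens once on port 4100 and vhost routes each hostname to its own app.
const express = require('express');
const vhost = require('vhost');

// Hypothetical modules, each exporting its own express() app
const site1 = require('./site1'); // serves example1.org
const site2 = require('./site2'); // serves example2.org

const app = express();
app.use(vhost('example1.org', site1));
app.use(vhost('example2.org', site2));

// One listener handles both domains
app.listen(4100, () => {
    console.log('Node server listening on 4100 for both domains');
});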
Use a webserver to proxy the requests
You could run the 2 node processes on some free ports, say 5000 and 5001, and use a webserver to forward requests to them automatically based on host. I'd recommend Nginx for this, as its proxying capabilities are both relatively easy to set up and powerful. It's also fairly good at not using too many system resources. Apache and others can also be used for this, but my personal preference would be Nginx.
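If you would rather keep the proxy layer in Node instead of Nginx, here is a comparable sketch using the http-proxy package; the hostnames and the backend ports 5000/5001 are just the example values from above, not a fixed requirement.
const http = require('http');
const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer({});

// Map each incoming Host header to the node process that serves it
const targets = {
    'example1.org': 'http://127.0.0.1:5000',
    'example2.org': 'http://127.0.0.1:5001'
};

http.createServer((req, res) => {
    const target = targets[req.headers.host];
    if (!target) {
        res.statusCode = 502;
        return res.end('Unknown host');
    }
    proxy.web(req, res, { target });
}).listen(4100);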
Conclusion
My recommendation would be that you install a webserver and forward requests on the exposed port to the separately running node processes. I'd actually recommend that you run node behind a proxy by default for a project, and only expose it directly in exceptional circumstances. You get a lot of configuration options, security, and scalability benefits if your app already involves a well-hardened server setup.

How to check whether a socket is alive (connected) in socket.io with multiple nodes and socket.io-redis

I am using socket.io with multiple nodes, socket.io-redis and nginx. I am following this guide: http://socket.io/docs/using-multiple-nodes/
What I am trying to do: in a function (server side), I want to query by socket id whether that socket is connected or disconnected.
I tried io.of('namespace').connected[socketid], but it only works for the current process (meaning it can only check sockets in the current process).
Can anyone help me? Thanks in advance.
How can I check whether a socket is alive (connected) given its socket id? I tried
namespace.connected[socketid], but it only works for the current process.
As you said, separate processes mean that the sockets are only registered on the process they first connected to. You need to use socket.io-redis to connect all your nodes together, and what you can do is broadcast an event each time a client connects/disconnects, so that each node has an updated, real-time list of all the clients (a sketch of this follows below).
Check out here
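A rough sketch of that idea, assuming socket.io 1.x with the socket.io-redis adapter plus a plain Redis pub/sub channel for presence; the 'presence' channel name and the message shape are my own, not part of socket.io.
var io = require('socket.io')(3000);
var redisAdapter = require('socket.io-redis');
var redis = require('redis');

io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));

var pub = redis.createClient(6379, 'localhost');
var sub = redis.createClient(6379, 'localhost');

var liveSockets = {}; // socket id -> true, for the whole cluster

// Every node listens for presence updates published by any node (itself included).
sub.subscribe('presence');
sub.on('message', function (channel, message) {
    var update = JSON.parse(message);
    if (update.state === 'connected') {
        liveSockets[update.id] = true;
    } else {
        delete liveSockets[update.id];
    }
});

io.on('connection', function (socket) {
    pub.publish('presence', JSON.stringify({ id: socket.id, state: 'connected' }));
    socket.on('disconnect', function () {
        pub.publish('presence', JSON.stringify({ id: socket.id, state: 'disconnected' }));
    });
});

// Any node can now answer "is this socket alive?" with: !!liveSockets[socketid]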
As mentioned above, you should use socket.io-redis to get it working on multiple nodes.
var io = require('socket.io')(3000);
var redis = require('socket.io-redis');
io.adapter(redis({ host: 'localhost', port: 6379 }));
I had the same problem and couldn't find a solution that suited me, so I logged the client object to see the different methods and variables I could use. There is the client.conn.readyState property for the state of the connection ("open"/"closed") and the client.onclose() function to capture the closing of the connection.
// A minimal express + socket.io setup so the snippet is runnable
const express = require('express');
const app = express();
const server = require('http').createServer(app);
const io = require('socket.io')(server);

let clients = [];

io.on('connection', (client) => {
    clients.push(client);
    console.log(client.conn.readyState); // "open"

    client.onclose = () => {
        // do something
        console.log(client.conn.readyState); // "closed"
        clients.splice(clients.indexOf(client), 1);
    };
});

server.listen(3000);
When deploying a Socket.IO application on a multi-node cluster, meaning multiple Socket.IO servers, there are two things to take care of:
Using the Redis adapter, and
Enabling the sticky-session feature: when a request comes from a Socket.IO client (browser) to your app, it gets associated with a particular session id, and these requests must keep connecting to the same process (Pod in Kubernetes) that originated that id.
You can learn more about this from this Medium story (source code available): https://saphidev.medium.com/socketio-redis...

How does io.adapter work under the hood?

I'm working on a 1-to-1 chat rooms application powered by node.js + express + socket.io.
I am following the article: Socket.IO - Rooms and Namespaces
In the article they demonstrate how to initiate the io.adapter using the module socket.io-redis:
var io = require('socket.io')(3000);
var redis = require('socket.io-redis');
io.adapter(redis({ host: 'localhost', port: 6379 }));
Two questions:
In the docs, they mention two more arguments: pubClient and subClient. Should I supply them? What's the difference?
How does the io.adapter behave? For example, if user A is connected to server A and user B to server B, and they want to "talk" with each other, what is going on under the hood?
Thanks.
You do not need to pass your own pubClient/subClient. If you pass host/port, they will be created for you. But if you want to create them yourself, for any reason (e.g. you want to tweak reconnection timeouts), you create those 2 clients and pass them to the adapter.
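For illustration, here is a sketch of both forms with socket.io-redis; the client option shown (return_buffers) is just an example of the kind of tweaking you might do, not a required setting.
var io = require('socket.io')(3000);
var redisAdapter = require('socket.io-redis');
var redis = require('redis');

// 1) Simple form: the adapter creates its own pub/sub clients from host/port.
// io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));

// 2) Explicit form: create the clients yourself so you can tune their options,
//    then hand them to the adapter.
var pub = redis.createClient(6379, 'localhost');
var sub = redis.createClient(6379, 'localhost', { return_buffers: true });
io.adapter(redisAdapter({ pubClient: pub, subClient: sub }));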
The adapter broadcasts all emits internally, so it gives you the cluster feature. For example, let's suppose that you have a chat application and you have 3 node.js servers behind a load balancer (so they share a single URL). Let's also assume that 6 different browsers connect to the load balancer URL and are routed to 3 separate node.js processes, 2 users per node.js server. If client #1 sends a message, node.js #1 will do something like io.to('chatroom').emit('msg from user #1'). Without the adapter, both server #1 users will receive the emit, but not the remaining 4 users. If you use the adapter, however, the remaining node.js #2 and node.js #3 will receive the information that the emit was done and will issue an identical emit to their clients, so all 6 users receive the original message.
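As a concrete sketch of that scenario (the room and event names are made up for illustration), every server runs the same handler; with the Redis adapter attached, the io.to(...).emit(...) reaches room members on every process, not just the local one.
io.on('connection', function (socket) {
    socket.join('chatroom');

    socket.on('chat message', function (msg) {
        // Without the adapter this only reaches clients connected to this
        // process; with socket.io-redis it reaches 'chatroom' members everywhere.
        io.to('chatroom').emit('chat message', msg);
    });
});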
I've been struggling with this same issue, but have found an answer that seems to be working for me, at least in my initial testing phases.
I have a clustered application running 8 instances using express, cluster, socket.io and socket.io-redis, and NOT sticky-sessions, because using sticky sessions seemed to cause a ton of bizarre bugs.
What I think is missing from the socket.io docs is this:
io.adapter(redis({ host: 'localhost', port: 6379 })); only supports WebSockets (or, at the very least, it doesn't support long polling), so the client needs to specify that WebSocket is the only transport available. As soon as I did that I was able to get it going. So on the client side, I added {transports: ['websocket']} to the socket constructor... so instead of this...
var socketio = io.connect(window.location.origin);
use this
var socketio = io.connect(window.location.origin, {transports: ['websocket']});
I haven't been able to find any more documentation from socket.io to support my theory but adding that got it going.
I forked this great chat example that wasn't working and got it working here: https://github.com/squivo/chat-example-cluster so there's finally a working example online :D

Load Balance: Node.js - Socket.io - Redis

I have 3 servers running NodeJS, and they are connected to each other with Redis (1 master, 2 slaves).
The issue I'm having is that running the system on a single server works fine, but when I scale it to 3 NodeJS servers, it starts missing messages and the system gets unstable.
My load balancer does not accept sticky sessions, so every time requests from a client arrive at it, they can go to a different server.
I'm pointing all the NodeJS servers to the Redis master.
It looks like socket.io is storing information on each server and it is not being distributed with redis.
I'm using socket.io V9; I suspect that I don't have any handshake code. Could this be the reason?
My code to configure socket.io is:
var express = require('express');
var io = require('socket.io');
var redis = require('socket.io/node_modules/redis');
var RedisStore = require('socket.io/lib/stores/redis');
var pub = redis.createClient("a port", "an ip");
var sub = redis.createClient("a port", "an ip");
var client = redis.createClient("a port", "an ip");
var events = require('./modules/eventHandler');

exports.createServer = function createServer() {
    var app = express();
    var server = app.listen(80);
    var socketIO = io.listen(server);

    socketIO.configure(function () {
        socketIO.set('store', new RedisStore({
            redisPub: pub,
            redisSub: sub,
            redisClient: client
        }));
        socketIO.set('resource', '/chat/socket.io');
        socketIO.set('log level', 0);
        socketIO.set('transports', [, 'htmlfile', 'xhr-polling', 'jsonp-polling']);
    });

    // attach event handlers
    events.attachHandlers(socketIO);
    // return server instance
    return server;
};
Redis only syncs from the master to the slaves. It never syncs from the slaves to the master. So, if you're writing to all 3 of your machines, then the only messages that will wind up synced across all three servers will be the ones hitting the master. This is why it looks like you're missing messages.
More info here.
Read only slave
Since Redis 2.6 slaves support a read-only mode that is enabled by default. This behavior is controlled by the slave-read-only option in the redis.conf file, and can be enabled and disabled at runtime using CONFIG SET.
Read-only slaves will reject all write commands, so that it is not possible to write to a slave by mistake. This does not mean that the feature is conceived to expose a slave instance to the internet or, more generally, to a network where untrusted clients exist, because administrative commands like DEBUG or CONFIG are still enabled. However, the security of read-only instances can be improved by disabling commands in redis.conf using the rename-command directive.
You may wonder why it is possible to revert the default and have slave instances that can be the target of write operations. The reason is that, while these writes will be discarded if the slave and the master resynchronize or if the slave is restarted, there is often ephemeral, unimportant data that can be stored on slaves. For instance, clients may store information about the reachability of the master in the slave instance to coordinate a failover strategy.
I arrived at this post:
It can be a good idea to have a "proxy" between the nodejs servers and the load balancer.
With this approach, XHR-polling can be used behind load balancers without sticky sessions.
Load balancing with node.js using http-proxy
Using node-http-proxy I can have custom routing rules, e.g. by adding a parameter to the "connect URL" of socket.io; a sketch of this idea follows below.
Has anyone tried this solution before?
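Purely as a sketch of that idea: a thin http-proxy front-end routes each request to a fixed backend based on a parameter the client appends to the socket.io connect URL, so XHR-polling keeps hitting the same process even without sticky sessions at the load balancer. The parameter name, ports and backend list are assumptions for illustration.
var http = require('http');
var url = require('url');
var httpProxy = require('http-proxy');

var proxy = httpProxy.createProxyServer({});
var backends = ['http://127.0.0.1:5000', 'http://127.0.0.1:5001', 'http://127.0.0.1:5002'];

function pickBackend(req) {
    // Client connects with e.g. io.connect(origin, { query: 'server=2' })
    var query = url.parse(req.url, true).query;
    var index = parseInt(query.server, 10) || 0;
    return backends[index % backends.length];
}

var server = http.createServer(function (req, res) {
    proxy.web(req, res, { target: pickBackend(req) });
});

// WebSocket upgrades have to be proxied explicitly as well.
server.on('upgrade', function (req, socket, head) {
    proxy.ws(req, socket, head, { target: pickBackend(req) });
});

server.listen(80);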

SocketIO on a Node.js cluster

I have a standalone Node.js app which has a SocketIO server that listens on a certain port, e.g. 8888. Now I am trying to run this app in a cluster, and because cluster randomly assigns workers to requests, SocketIO clients in XHR polling mode that have handshaken and been authorized with one worker get routed to another worker, where they are not handshaken, and the mess begins.
And because workers don't share anything, I can't find a workaround. Is there a known solution to this issue?
There is no "simple" solution. What you have to do is the following:
If a client connects to a worker, save the connection id together with the worker id and a potential additional identification id in a global store (i.e. one accessible to all workers), such as redis.
If a client gets routed to another worker, use the store to look up which worker is responsible for this client (either via the connection id or via the additional identification id), and then hand it over to that worker (either with nodejs worker/master communication or via redis pub/sub); a sketch follows below.
I have implemented such a thing with sock.js and an additional degree of complexity: I have two node.js servers with four workers each, so I had to use redis pub/sub for worker-to-worker communication, because it is not guaranteed that the workers are on the same machine.
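A minimal sketch of that bookkeeping; the key name, channel names and helper functions are my own inventions, just to illustrate the global store plus the pub/sub handover.
var redis = require('redis');
var store = redis.createClient(6379, '127.0.0.1');
var pub = redis.createClient(6379, '127.0.0.1');
var sub = redis.createClient(6379, '127.0.0.1');

var WORKER_ID = String(process.pid); // or cluster.worker.id

// On connection: remember which worker owns this connection id.
function registerConnection(connectionId) {
    store.hset('connection-owners', connectionId, WORKER_ID);
}

// When a request for a connection this worker doesn't know arrives,
// find the owner and hand the message over on a per-worker channel.
function forwardToOwner(connectionId, message) {
    store.hget('connection-owners', connectionId, function (err, ownerId) {
        if (err || !ownerId) return;
        pub.publish('worker:' + ownerId, JSON.stringify({ connectionId: connectionId, message: message }));
    });
}

// Every worker listens on its own channel for handed-over messages.
sub.subscribe('worker:' + WORKER_ID);
sub.on('message', function (channel, payload) {
    var data = JSON.parse(payload);
    // ...deliver data.message to the local socket identified by data.connectionId
});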
Actually there is a simple solution: using Redis to store socket states.
Everything is explained in the Socket.IO documentation:
The default 'session' storage in Socket.IO is in memory (MemoryStore). The MemoryStore only allows you to deploy socket.io on a single process. If you want to scale to multiple processes and/or multiple servers, you can use our RedisStore, which uses the Redis NoSQL database as a man in the middle.
So in order to change the store instance to RedisStore we add this:
var RedisStore = require('socket.io/lib/stores/redis')
  , redis = require('socket.io/node_modules/redis')
  , pub = redis.createClient()
  , sub = redis.createClient()
  , client = redis.createClient();

// Needs to be done after 'listen()'
io.set('store', new RedisStore({
    redisPub : pub
  , redisSub : sub
  , redisClient : client
}));
Of course you will need to have a redis server running.
