SocketIO on a Node.js cluster


I have a standalone Node.js app with a Socket.IO server that listens on a certain port, e.g. 8888. I am now trying to run this app in a cluster. Because cluster assigns requests to workers without regard for which worker served a client before, Socket.IO clients in XHR polling mode that have completed the handshake and authorization with one worker get routed to another worker where they are not handshaken, and the mess begins.
And because workers don't share anything, I can't find a workaround. Is there a known solution to this issue?

There is no "simple" solution. What you have to do is the following:
If a client connects to a worker, save the connection-id together with the worker-id and, if needed, an additional identification-id in a global store, i.e. one accessible to all workers (e.g. redis).
If a client gets routed to another worker, use the store to look up which worker is responsible for this client (either with the connection-id or with the additional identification-id) and then hand it over to that worker (either with the Node.js worker-master-worker communication or via redis pub/sub).
I have implemented such a thing with sock.js and an additional degree of complexity: I have two node.js servers with four workers each, so I had to use redis pub/sub for worker/worker communication, because it is not guaranteed that they are on the same machine.
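A minimal sketch of the lookup-and-hand-over idea described above, in plain Node.js. The Map is only a stand-in for the shared store (redis in the answer), and the names registerConnection, ownerOf and route are hypothetical, not part of any library:

```javascript
// Stand-in for a store every worker can reach (redis in a real deployment).
const sharedStore = new Map();

// Record which worker completed the handshake for a connection.
function registerConnection(connectionId, workerId) {
  sharedStore.set(connectionId, workerId);
}

// Look up the worker responsible for a connection (undefined if unknown).
function ownerOf(connectionId) {
  return sharedStore.get(connectionId);
}

// Decide what the worker that received a request should do with it.
function route(connectionId, receivingWorkerId) {
  const owner = ownerOf(connectionId);
  if (owner === undefined) {
    // First contact: the receiving worker becomes the owner.
    registerConnection(connectionId, receivingWorkerId);
    return { action: 'handle', worker: receivingWorkerId };
  }
  if (owner === receivingWorkerId) {
    return { action: 'handle', worker: owner };
  }
  // Wrong worker: a real system would forward via IPC or redis pub/sub.
  return { action: 'forward', worker: owner };
}

console.log(route('conn-1', 1));
console.log(route('conn-1', 2));
```

The sketch only covers the decision; the actual hand-over (passing the request to the owning worker) is the hard part and depends on the transport you choose.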

Actually there is a simple solution: using Redis to store sockets states.
Everything is explained in Socket.IO documentation:
The default 'session' storage in Socket.IO is in memory (MemoryStore).
The MemoryStore only allows you to deploy socket.io on a single
process. If you want to scale to multiple process and / or multiple
servers you can use our RedisStore which uses the Redis NoSQL database
as man in the middle.
So in order to change the store instance to RedisStore we add this:
var RedisStore = require('socket.io/lib/stores/redis')
  , redis = require('socket.io/node_modules/redis')
  , pub = redis.createClient()
  , sub = redis.createClient()
  , client = redis.createClient();
// Needs to be done after 'listen()'
io.set('store', new RedisStore({
  redisPub : pub
, redisSub : sub
, redisClient : client
}));
Of course you will need to have a redis server running.

Related

Do I need to specify a client for connect-redis?

I'm trying to use redis on a node express server to store user sessions, by using the connect-redis(on Github) library. And I find this block of settings works well:
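// Assumed imports (not shown in the question); this form matches the older
// connect-redis API in which the store is created from the session module.
var express = require('express');
var session = require('express-session');
var redisStore = require('connect-redis')(session);
var app = express();
app.use(session({
  secret: 'hahahahahahahahahaha',
  cookie: { maxAge: 36000000 },
  store: new redisStore(),
}))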
Notice that I didn't pass any params to new redisStore() and it works for now (in its documentation a client is passed). I guess it uses localhost and the default port by default.
But I'm worried: if I put my server on AWS EC2 in the future, which is a shared server, will it be a problem that I don't specify a client? For example, will redis conflict with other servers also hosted on that EC2?

You are using a redis client here. You just relinquished the configuration of the client though. That will be a problem in the future, as (presumably) you will be deploying on AWS in a cluster. The redis client needs to be common across instances.
If you don't define the port number and other details of the client, you basically are undoing the purpose of having state management. Your redis clients will be limited to your own instance.
Go through http://redis.io/topics/partitioning for more info.
The idea is to have one redis instance (or a cluster of them) running independently of your AWS instances, so that state is shared across all of your instances. That is why you need to have complete control of your redis client.
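One way to keep control of the client configuration is to read the redis address from the environment, so every instance points at the same redis server. This is only a sketch: the function name redisOptionsFromEnv and the variable names REDIS_HOST and REDIS_PORT are assumptions, not something connect-redis or the redis client requires.

```javascript
// Build explicit redis connection options instead of relying on defaults.
// REDIS_HOST and REDIS_PORT are assumed names chosen for this example.
function redisOptionsFromEnv(env) {
  const host = env.REDIS_HOST || '127.0.0.1';
  const port = Number.parseInt(env.REDIS_PORT || '6379', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error('Invalid REDIS_PORT: ' + env.REDIS_PORT);
  }
  return { host, port };
}

// Example with explicit values; in an app you would pass process.env.
console.log(redisOptionsFromEnv({ REDIS_HOST: 'redis.internal', REDIS_PORT: '6380' }));
```

The resulting object would then be handed to whichever redis client you create for the session store; the exact option names the client expects depend on its version.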

How to check socket is alive (connected) in socket.io with multiple nodes and socket.io-redis

I am using socket.io with multiple nodes, socket.io-redis and nginx. I follow this guide: http://socket.io/docs/using-multiple-nodes/
What I am trying to do: in a function on the server side, I want to query by socket id whether a socket is connected or disconnected.
I tried io.of('namespace').connected[socketid], but it only works for the current process (it can only see sockets connected to that process).
Can anyone help me? Thanks in advance.

How can I check whether a socket is alive (connected) by socket id? I tried namespace.connected[socketid], but it only works for the current process.
As you said, separate process means that the sockets are only registered on the process that they first connected to. You need to use socket.io-redis to connect all your nodes together, and what you can do is broadcast an event each time a client connects/disconnects, so that each node has an updated real-time list of all the clients.
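The broadcast idea can be sketched without any socket.io code. Here an in-process EventEmitter stands in for the pub/sub channel that would connect the nodes, and createNode, clientConnected and the other names are hypothetical:

```javascript
const { EventEmitter } = require('events');

// Stand-in for a pub/sub channel shared by all nodes (redis in production).
const bus = new EventEmitter();

// Each node keeps its own copy of the set of connected socket ids and
// updates it from the connect/disconnect events that every node publishes.
function createNode() {
  const connected = new Set();
  bus.on('connect', (socketId) => connected.add(socketId));
  bus.on('disconnect', (socketId) => connected.delete(socketId));
  return {
    clientConnected: (socketId) => bus.emit('connect', socketId),
    clientDisconnected: (socketId) => bus.emit('disconnect', socketId),
    isConnected: (socketId) => connected.has(socketId),
  };
}

const nodeA = createNode();
const nodeB = createNode();
nodeA.clientConnected('socket-1');
console.log(nodeB.isConnected('socket-1')); // true
nodeA.clientDisconnected('socket-1');
console.log(nodeB.isConnected('socket-1')); // false
```

With real network pub/sub the lists are only eventually consistent, so a node may briefly report a socket as connected after it has gone away.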

As mentioned above, you should use socket.io-redis to get this working on multiple nodes.
var io = require('socket.io')(3000);
var redis = require('socket.io-redis');
io.adapter(redis({ host: 'localhost', port: 6379 }));

I had the same problem and found no solution that suited me, so I logged the client object to see which methods and properties I could use. There is the client.conn.readyState property for the state of the connection ("open"/"closed") and the client.onclose() function to capture the closing of the connection.
const app = require('express')(); // assumed: an Express app, not shown in the original answer
const server = require('http').createServer(app);
const io = require('socket.io')(server);
let clients = [];
io.on('connection', (client) => {
  clients.push(client);
  console.log(client.conn.readyState);
  client.onclose = () => {
    // do something
    console.log(client.conn.readyState);
    clients.splice(clients.indexOf(client), 1);
  };
});

When deploying a Socket.IO application on a multi-node cluster, which means multiple Socket.IO servers, there are two things to take care of: using the Redis adapter and enabling the sticky session feature.
Sticky sessions are needed because, when a request comes from a Socket.IO client (browser) to your app, it gets associated with a particular session id, and subsequent requests carrying that id must keep reaching the same process (Pod in Kubernetes) that issued it.
You can learn more about this from this Medium story (source code available): https://saphidev.medium.com/socketio-redis...
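To illustrate what "sticky" means, here is a small sketch that maps a client address to a worker index deterministically, so the same client always reaches the same worker. The function name workerIndexFor and the simple string hash are illustrative assumptions; real load balancers use their own schemes (cookies, IP hashing and so on).

```javascript
// Map a client address to a worker index so that the same client always
// lands on the same worker. The hash is a simple illustrative string hash.
function workerIndexFor(clientAddress, workerCount) {
  let hash = 0;
  for (let i = 0; i < clientAddress.length; i++) {
    // Keep the running value in the unsigned 32-bit range.
    hash = (hash * 31 + clientAddress.charCodeAt(i)) >>> 0;
  }
  return hash % workerCount;
}

const first = workerIndexFor('203.0.113.7', 4);
const second = workerIndexFor('203.0.113.7', 4);
console.log(first === second); // true
```

This shows only the routing rule. Something in front of the workers (a proxy, or the cluster master) still has to apply it to each incoming connection.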

How io.adapter works under the hood?

I'm working on 1-1 chat rooms application powered by node.js + express + socket.io.
I am following the article: Socket.IO - Rooms and Namespaces
In the article they demonstrate how to initiate the io.adapter using the module socket.io-redis:
var io = require('socket.io')(3000);
var redis = require('socket.io-redis');
io.adapter(redis({ host: 'localhost', port: 6379 }));
Two questions:
In the docs, they mention two more arguments: pubClient and subClient. Should I supply them? What's the difference?
How does io.adapter behave? For example, if user A is connected to server A and user B is connected to server B, and they want to "talk" to each other, what is going on under the hood?
Thanks.

You do not need to pass your own pubClient/subClient. If you pass host/port, they will be created for you. But, if you want to create them yourself, for any reason (e.g. you want to tweak reconnection timeouts), you create those 2 clients and pass it to adapter.
The adapter broadcasts all emits internally, which gives you the cluster feature. E.g. let's suppose that you have a chat application and 3 node.js servers behind a load balancer (so they share a single URL). Let's also assume that 6 different browsers connect to the load balancer URL and are routed to the 3 separate node.js processes, 2 users per node.js server. If client #1 sends a message, node.js #1 will do something like io.to('chatroom').emit('msg from user #1'). Without the adapter, both server #1 users will receive the emit, but not the remaining 4 users. If you use the adapter, however, node.js #2 and node.js #3 will be informed that the emit was done and will issue an identical emit to their own clients, so all 6 users will receive the initial message.
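The 3-server, 6-client scenario above can be simulated in a few lines. This is a toy model, not the socket.io-redis implementation: an EventEmitter plays the role of the shared channel, and createServer, connect and emit are hypothetical names.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the pub/sub channel the adapter uses between servers.
const channel = new EventEmitter();

function createServer(name) {
  const localClients = []; // inboxes of the clients connected to this server
  // Deliver every emit published on the shared channel to local clients.
  channel.on('emit', (message) => {
    for (const inbox of localClients) inbox.push(message);
  });
  return {
    name,
    connect() {
      const inbox = [];
      localClients.push(inbox);
      return inbox;
    },
    // With the adapter, an emit is published so that all servers replay it.
    emit(message) {
      channel.emit('emit', message);
    },
  };
}

const servers = ['node1', 'node2', 'node3'].map((name) => createServer(name));
const inboxes = servers.flatMap((s) => [s.connect(), s.connect()]);
servers[0].emit('msg from user #1');
console.log(inboxes.filter((i) => i.includes('msg from user #1')).length); // 6
```

Without the shared channel, each server's emit would only loop over its own localClients, which is the "only 2 of 6 users receive it" case from the answer.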

I've been struggling with this same issue, but have found an answer that seems to be working for me, at least in my initial testing phases.
I have a clustered application running 8 instances using express, cluster , socket.io , socket.io-redis and NOT sticky-sessions -> because using sticky seemed to cause a ton of bizarre bugs.
what I think is missing from the socket.io docs is this:
io.adapter(redis({ host: 'localhost', port: 6379 })); only supports web sockets (well, at the very least it doesn't support long polling), and so the client needs to specify that websocket is the only transport available. As soon as I did that I was able to get it going. So on the client side, I added {transports:['websocket']} to the socket constructor. So instead of this:
var socketio = io.connect( window.location.origin );
use this
var socketio = io.connect( window.location.origin , {transports:['websocket']} );
I haven't been able to find any more documentation from socket.io to support my theory but adding that got it going.
I forked this great chat example that wasn't working and got it working here: https://github.com/squivo/chat-example-cluster so there's finally a working example online :D

Load Balance: Node.js - Socket.io - Redis

I have 3 servers running Node.js, and they are connected to each other through Redis (1 master, 2 slaves).
The issue I'm having is that running the system on a single server works fine, but when I scale it to 3 Node.js servers, it starts missing messages and the system gets unstable.
My load balancer does not support sticky sessions, so each request from a client can go to a different server.
I'm pointing all the Node.js servers to the Redis master.
It looks like socket.io is storing information on each server and it is not being distributed through Redis.
I'm using socket.io v0.9. I suspect the cause may be that I don't have any handshake code; could this be the reason?
My code to configure socket.io is:
var express = require('express');
var io = require('socket.io');
var redis = require('socket.io/node_modules/redis');
var RedisStore = require('socket.io/lib/stores/redis');
var pub = redis.createClient("a port", "an ip");
var sub = redis.createClient("a port", "an ip");
var client = redis.createClient("a port", "an ip");
var events = require('./modules/eventHandler');
exports.createServer = function createServer() {
  var app = express();
  var server = app.listen(80);
  var socketIO = io.listen(server);
  socketIO.configure(function () {
    socketIO.set('store', new RedisStore({
      redisPub: pub,
      redisSub: sub,
      redisClient: client
    }));
    socketIO.set('resource', '/chat/socket.io');
    socketIO.set('log level', 0);
    socketIO.set('transports', ['htmlfile', 'xhr-polling', 'jsonp-polling']);
  });
  // attach event handlers
  events.attachHandlers(socketIO);
  // return server instance
  return server;
};

Redis only syncs from the master to the slaves. It never syncs from the slaves to the master. So, if you're writing to all 3 of your machines, then the only messages that will wind up synced across all three servers will be the ones hitting the master. This is why it looks like you're missing messages.
More info, quoted from the Redis replication documentation:
Read only slave
Since Redis 2.6 slaves support a read-only mode that
is enabled by default. This behavior is controlled by the
slave-read-only option in the redis.conf file, and can be enabled and
disabled at runtime using CONFIG SET.
Read only slaves will reject all
the write commands, so that it is not possible to write to a slave
because of a mistake. This does not mean that the feature is conceived
to expose a slave instance to the internet or more generally to a
network where untrusted clients exist, because administrative commands
like DEBUG or CONFIG are still enabled. However security of read-only
instances can be improved disabling commands in redis.conf using the
rename-command directive.
You may wonder why it is possible to revert
the default and have slave instances that can be target of write
operations. The reason is that while this writes will be discarded if
the slave and the master will resynchronize, or if the slave is
restarted, often there is ephemeral data that is unimportant that can
be stored into slaves. For instance clients may take information about
reachability of master in the slave instance to coordinate a fail over
strategy.
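The one-way nature of replication described above can be pictured with a toy model. This is not how Redis is implemented; the Master and Replica classes and the error text are made up for the illustration.

```javascript
// Toy model of one-way replication: writes to the master are copied to
// the replicas, while writes to a read-only replica are rejected.
class Replica {
  constructor(initial) {
    this.data = new Map(initial); // initial copy of the master's data
  }
  get(key) {
    return this.data.get(key);
  }
  set() {
    throw new Error('read-only replica: write rejected');
  }
}

class Master {
  constructor() {
    this.data = new Map();
    this.replicas = [];
  }
  addReplica() {
    const replica = new Replica(this.data);
    this.replicas.push(replica);
    return replica;
  }
  set(key, value) {
    this.data.set(key, value);
    // Propagation only ever flows from master to replicas.
    for (const replica of this.replicas) replica.data.set(key, value);
  }
}

const master = new Master();
const replica = master.addReplica();
master.set('room:1', 'hello');
console.log(replica.get('room:1')); // hello
```

In this model, as in the answer, anything that must be visible on all three machines has to be written through the master.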

I came across this post: Load balancing with node.js using http-proxy
It can be a good idea to have a "proxy" between the Node.js servers and the load balancer.
With this approach, XHR polling can be used behind load balancers without sticky sessions.
Using nodejs-http-proxy I can have custom routing, e.g. by adding a parameter to the "connect URL" of socket.io.
Has anyone tried this solution before?

Should I share Redis connection between files/modules?

I'm developing a node.js app and I am in need of heavy Redis usage. The app will be clustered across 8 CPU cores.
Right now I have 100 concurrent connections to Redis because every worker per CPU has several modules running require('redis').createClient().
Scenario A:
file1.js:
var redis = require('redis').createClient();
file2.js
var redis = require('redis').createClient();
SCENARIO B:
redis.js
var redis = require('redis').createClient();
module.exports = redis;
file1.js
var redis = require('./redis');
file2.js
var redis = require('./redis');
Which approach is better: creating a new Redis connection in every new file I introduce (scenario A), or creating one Redis connection globally (scenario B) and sharing it across all the modules I have? What are the drawbacks and benefits of each solution?
Thanks in advance!

When I face a question such as this I generally think about three basic questions.
Which is more readable?
Which allows better code reuse?
Which is more efficient?
Not necessarily in this order as it depends on the scenario, but I believe in this case all three of these questions are in favor of option B.
If you ever needed to modify the options for createClient, you would need to edit them wherever it is called. In option A that is every file which uses redis; in option B it is just redis.js. Also, if a newer or different product comes out and you want to replace redis, it would be feasible to make redis.js a wrapper for a different package or even a newer redis client, substantially cutting down conversion time.
Globals are generally a bad thing, but in this example redis.js should not be storing mutable state, so there is no problem having a global/singleton in this context.
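The effect of option B can be shown in a single file. In this sketch createClient is a counting stand-in for require('redis').createClient, and getClient plays the role of redis.js; both names are illustrative only.

```javascript
// Stand-in for require('redis').createClient; counts how often it is called.
let connectionsOpened = 0;
function createClient() {
  connectionsOpened += 1;
  return { id: connectionsOpened };
}

// Equivalent of redis.js in scenario B: create once, hand out the same object.
let sharedClient = null;
function getClient() {
  if (sharedClient === null) sharedClient = createClient();
  return sharedClient;
}

const fromFile1 = getClient();
const fromFile2 = getClient();
console.log(fromFile1 === fromFile2, connectionsOpened); // true 1
```

With separate files you get the same behaviour from Node's module cache: redis.js is evaluated on the first require, and later require('./redis') calls in the same process return the same exported object. Each cluster worker is its own process, so each worker still opens its own connection.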

Both Node and Redis can handle lots of connections pretty well, so that's not a problem.
In your situation, you're creating Redis connections at the startup of your application, so the number of connections you're setting up is limited (in the sense that after your application is started, the number of connections will be constant).
Situations where you'd want to reuse the same connection is in highly dynamic situations, for instance with an HTTP-server where you need to query Redis for every request. Creating a new connection for each request would be a waste of resources (creating and destroying connections all the time) and reusing one connection for each request would be preferable.
As for which of the two scenarios I'd prefer, I'm leaning towards Scenario A myself.

You can create a file to handle the connection and the functions that use redis.
redis-con.js
const redis = require('redis');
let redisClient;
(async () => {
  redisClient = redis.createClient();
  redisClient.on("error", (error) => console.error(`Error Redis: ${error}`));
  await redisClient.connect();
})();
// The client object is assigned synchronously above, so it is exported here
// even though connect() may not have finished yet.
module.exports = redisClient;
Then you need to create functions to handle set, get and del.
Now, just import the connection wherever you need it.
