Node.js Clustering with Sticky-Session - node.js

const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  var sticky = require('sticky-session');
  var express = require('express');
  var app = express();

  app.get('/', function (req, res) {
    console.log('worker: ' + cluster.worker.id);
    res.send('Hello World!');
  });

  var server = http.createServer(app);
  sticky.listen(server, 3000);
  console.log(`Worker ${process.pid} started`);
}
I looked up the documentation for Node.js clustering and sticky-session, and another Stack Overflow answer regarding this:
var cluster = require('cluster');
var http = require('http');
var sticky = require('sticky-session');
var express = require('express');
var app = express();

app.get('/', function (req, res) {
  console.log('worker: ' + cluster.worker.id);
  res.send('Hello World!');
});

var server = http.createServer(app);
sticky.listen(server, 3000);
If the above snippet is run without forking, it works fine; but in the clustered example shown first, the workers are started and the server is never initialised.
I read that there is an alternative, sticky-cluster. Can somebody give a proper, authoritative answer on this topic that will be useful for people looking into the same problem? Another major issue that comes with this is the app.locals object, which is used to store variables for an app instance; having multiple server instances breaks this, because values will differ across instances, so this approach causes a big problem and the app breaks. When answering, please don't just copy-paste some code; please give a detailed answer describing the approach, its benefits and its shortcomings.
I am not looking for an answer limited to the sticky-session Node.js module; I welcome all other approaches in which all cores of the processor are used while session continuity is ensured.
If it involves RedisStore or a MongoDB store that is fine. What I want to know is the standard approach for a clustered Node.js application with session continuity.
https://github.com/indutny/sticky-session
https://nodejs.org/api/cluster.html
https://stackoverflow.com/a/37769107/3127499

There is a small problem in your code.
The sticky-session module already uses the Node.js cluster module internally. You don't need to call fork() yourself, because sticky-session will already do it for you. Let's find out how:
var cluster = require('cluster'); // Only required if you want the worker id
var sticky = require('sticky-session');

var server = require('http').createServer(function(req, res) {
  res.end('worker: ' + cluster.worker.id);
});
sticky.listen(server, 3000);
Calling sticky.listen() will already spawn workers for you. See the listen() implementation below:
function listen(server, port, options) {
  if (!options)
    options = {};

  if (cluster.isMaster) {
    var workerCount = options.workers || os.cpus().length;

    var master = new Master(workerCount, options.env);
    master.listen(port);
    master.once('listening', function() {
      server.emit('listening');
    });

    return false;
  }

  return true;
}
The line var master = new Master(workerCount, options.env) is responsible for spawning workers.
See the Master() implementation below:
function Master(workerCount, env) {
  net.Server.call(this, {
    pauseOnConnect: true
  }, this.balance);

  this.env = env || {};

  this.seed = (Math.random() * 0xffffffff) | 0;
  this.workers = [];

  debug('master seed=%d', this.seed);

  this.once('listening', function() {
    debug('master listening on %j', this.address());

    for (var i = 0; i < workerCount; i++)
      // spawning workers
      this.spawnWorker();
  });
}
So indeed, when you call sticky.listen(server, port) you are actually calling cluster.fork(); hence you should not explicitly call fork() again.
Now your code should look like:
var cluster = require('cluster'); // Only required if you want the worker id
var sticky = require('sticky-session');

var server = require('http').createServer(function(req, res) {
  res.end('worker: ' + cluster.worker.id);
});

// sticky.listen() will return false if Master
if (!sticky.listen(server, 3000)) {
  // Master code
  server.once('listening', function() {
    console.log('server started on 3000 port');
  });
} else {
  // Worker code
}
One important thing to remember is that each spawned worker will have its own event loop and memory, hence resources are not shared between them.
You can use Redis or other npm modules such as memored to share data among different workers.
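For example, here is a minimal sketch (my own illustration, not part of the original answer) of workers sharing a value through Redis instead of app.locals. It assumes a local Redis server on the default port and uses the callback-style redis client shown elsewhere on this page:
var cluster = require('cluster');
var redis = require('redis');

if (cluster.isMaster) {
  for (var i = 0; i < 2; i++) cluster.fork();
} else {
  var client = redis.createClient(6379, 'localhost');

  // Any worker can write the shared value...
  client.set('app:greeting', 'Hello from worker ' + cluster.worker.id);

  // ...and any other worker can read it back, unlike app.locals,
  // which is private to each worker process.
  client.get('app:greeting', function(err, value) {
    if (err) throw err;
    console.log('worker ' + cluster.worker.id + ' sees: ' + value);
  });
}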
Hope this solves both of your issues.

I think you are confusing sticky session with shared memory store.
Let me try to help:
The sticky-session module balances requests using the client's IP address, so a client will always connect to the same worker, and socket.io will work as expected, but across multiple processes!
Implementing sticky sessions means that you now have multiple nodes accepting connections. However, it DOES NOT guarantee that these nodes will SHARE the same memory, as each worker has its own event loop and internal memory state.
In other words, data being processed by one node may not be available to the other worker nodes, which explains the issue you pointed out:
"...another major issue that comes with this is the app.locals object, which is used to store variables for an app instance; having multiple server instances breaks this, because values will differ across instances, so this approach causes a big problem and the app breaks..."
Thus, to resolve this, we would need to use something like Redis so that data can be shared across multiple nodes.
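As an illustration (not from the original answer), here is a minimal sketch of Express sessions backed by Redis, so that a session created by one worker can be read by any other. It assumes express-session, a local Redis server, and the older require('connect-redis')(session) initialisation style; newer connect-redis versions export the store class differently:
var express = require('express');
var session = require('express-session');
var RedisStore = require('connect-redis')(session); // older connect-redis API
var redis = require('redis');

var app = express();

app.use(session({
  store: new RedisStore({ client: redis.createClient(6379, 'localhost') }),
  secret: 'change-me',        // assumption: any session secret of your choosing
  resave: false,
  saveUninitialized: false
}));

app.get('/', function(req, res) {
  // The counter lives in Redis, so it keeps incrementing no matter
  // which worker happens to serve the request.
  req.session.views = (req.session.views || 0) + 1;
  res.send('views: ' + req.session.views);
});

app.listen(3000);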
Hope this helps!

If I understand your question correctly, you are dealing with in-memory data storage or session storage. This is one of the known problems of session-based authentication on multiple nodes or in a cluster: suppose you make a call to node A and get a session called sessionA, but the next call goes to node B; node B does not know anything about sessionA. People try to solve this issue with sticky sessions, but that is not enough on its own. Good practice is to use an alternative approach such as JWT or OAuth2. I prefer JWT for service-to-service communication. A JWT stores nothing on the server and is stateless; it works brilliantly with REST, since REST is also stateless. Here https://www.rfc-editor.org/rfc/rfc7519 is the specification of JWT. If you need some sort of refresh token, then you do need to consider storage, which can be anything like Redis, MongoDB, or any SQL-based DB. For further clarification about JWT in Node.js, see the links below, followed by a small sketch:
https://jwt.io/
https://jwt.io/introduction/
https://www.npmjs.com/package/jsonwebtoken
https://cloud.google.com/iot/docs/how-tos/credentials/jwts#iot-core-jwt-refresh-nodejs
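A minimal sketch (my own illustration, not taken from the links) of issuing and verifying a token with the jsonwebtoken package; because verification only needs the shared secret, any worker in a cluster can validate a token issued by any other worker:
var jwt = require('jsonwebtoken');

var SECRET = 'change-me'; // assumption: in practice load this from configuration

// Issued by whichever worker handles the login request.
var token = jwt.sign({ userId: 42 }, SECRET, { expiresIn: '1h' });

// Verified by any other worker, with no shared memory required.
try {
  var payload = jwt.verify(token, SECRET);
  console.log('authenticated user', payload.userId);
} catch (err) {
  console.log('invalid or expired token');
}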

Related

Can't receive redis data from socket io

I'm building a realtime visualization using Redis as a pub/sub messenger between Python and Node. There's a Python script always running that sets a Redis hash with HMSET. That side of the app is working fine: if I enter the example command HGETALL sellers-80183917 in a redis client, I get the proper data back.
The problem is on the JS side. I'm using the socket.io and redis Node.js libraries to listen to the Redis instance and publish the results online through a d3.js viz.
I run the following code with node:
var express = require('express');
var app = express();
var redis = require('redis');

app.use(express.static(__dirname + '/public'));

var http = require('http').Server(app);
var io = require('socket.io')(http);
var sredis = require('socket.io-redis');
io.adapter(sredis({ host: 'localhost', port: 6379 }));

var redisSubscriber = redis.createClient(6379, 'localhost', {});

redisSubscriber.on('message', function(channel, message) {
  io.emit(channel, message);
});

app.get('/sellers/:seller_id', function(req, res) {
  var seller_id = req.params.seller_id;
  redisSubscriber.subscribe('sellers-'.concat(seller_id));
  res.render('seller.ejs', { seller: seller_id });
});

http.listen(3000, '127.0.0.1', function() {
  console.log('listening on *:3000');
});
And this is the relevant part of the seller.ejs file that's receiving the user requests and outputting the viz:
var socket = io('http://localhost:3000');
var stats;
var seller_key = 'sellers-'.concat(<%= seller %>);

socket.on(seller_key, function(msg) {
  stats = [];
  console.log('Im in');
  var seller = $.parseJSON(msg);
  var items = seller['items'];
  for (item in items) {
    var item_data = items[item];
    stats.push({
      'title': item_data['title'],
      'today_visits': item_data['today_visits'],
      'sold_today': item_data['sold_today'],
      'conversion_rate': item_data['conversion_rate']
    });
  }
  setupData(stats);
});
The problem is that the socket.on() handler never receives anything, and I don't see where the problem is, as everything else seems to be working fine.
I think that you might be confused as to what Pub/Sub in Redis actually is. It's not a way to listen to changes on hashes; you can have a Pub/Sub channel called sellers-1, and you can have a hash with the key sellers-1, but those are unrelated to each other.
As documented here:
Pub/Sub has no relation to the key space.
There is a thing called keyspace notifications that can be used to listen to changes in the key space (through Pub/Sub channels); however, this feature isn't enabled by default because it'll take up more resources.
Perhaps an easier method would be to publish a message after the HMSET, so any subscribers would know that the hash got changed (they would then retrieve the hash contents themselves, or the published message would contain the relevant data).
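For illustration, a minimal sketch of that pattern, shown in Node for consistency with the rest of this page (in your setup the Python script would do the equivalent HMSET and PUBLISH); the channel name and seller data are made up:
var redis = require('redis');

var writer = redis.createClient(6379, 'localhost');     // regular commands
var subscriber = redis.createClient(6379, 'localhost'); // subscriber-only connection

// Subscriber side: the message only says *which* seller changed;
// the hash itself is fetched separately on the non-subscriber connection.
subscriber.on('message', function(channel, sellerId) {
  writer.hgetall('sellers-' + sellerId, function(err, hash) {
    if (err) throw err;
    console.log('seller', sellerId, 'changed:', hash);
  });
});
subscriber.subscribe('seller-updates');

// Writer side: update the hash, then announce the change on the channel.
subscriber.on('subscribe', function() {
  writer.hmset('sellers-80183917', { title: 'demo', today_visits: '3' }, function(err) {
    if (err) throw err;
    writer.publish('seller-updates', '80183917');
  });
});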
This brings us to the next possible issue: you only have one subscriber connection, redisSubscriber.
From what I understand of the Node.js Redis driver, calling .subscribe() on such a connection would remove any previous subscriptions in favor of the new one. So if you were previously subscribed to the sellers-1 channel and then subscribe to sellers-2, you wouldn't be receiving messages from the sellers-1 channel anymore.
You can listen on multiple channels by either passing an array of channels, or by passing them as arguments:
redisSubscriber.subscribe([ 'sellers-1', 'sellers-2', ... ])
// Or:
redisSubscriber.subscribe('sellers-1', 'sellers-2', ... )
You would obviously have to track each "active" seller subscription. Either that, or create a new connection for each subscription, which also isn't ideal.
It's probably a better idea to have a single Pub/Sub channel on which all changes would get published, instead of a separate channel for each seller.
Finally: if your seller id's aren't hard to guess (for instance, if it's based on an incremental integer value), it would be trivial for someone to write a client that would make it possible to listen in on any seller channel they'd like. It might not be a problem, but it is something to be aware of.

Memcache response: "bad command line format" on Google Cloud Platform with large values using node.js

Modifying the example found in the Google App Engine documentation to store a large string results in the response Error: bad command line format.
Often, subsequent requests will then be met with Error: Server at <IP>:11211 not available, as if something in the set call briefly knocks the server offline.
Tried running in GAE's shared Memcache as well as with a dedicated 1GB instance (10,000 MCU per second per GB) with no difference.
Multiple Node.js memcached libraries, as well as large Buffer and JSON-formatted values, all return the same error. All of my research suggests this error tends to come from a key longer than 250 characters, not from a large value (the attempt below is well short of both the Memcache quota and the 1 MiB limit per value).
Here's a full app.js to demonstrate:
'use strict';

var express = require('express');
var Memcached = require('memcached');

var app = express();

// The environment variables are automatically set by App Engine when running
// on GAE. When running locally, you should have a local instance of the
// memcached daemon running.
var memcachedAddr = process.env.MEMCACHE_PORT_11211_TCP_ADDR || 'localhost';
var memcachedPort = process.env.MEMCACHE_PORT_11211_TCP_PORT || '11211';
var memcached = new Memcached(memcachedAddr + ':' + memcachedPort);

app.get('/', function(req, res, next) {
  memcached.get('foo', function(err, value) {
    if (err) { return next(err); }
    if (value) {
      console.log('Exists');
      return res.status(200).send('Value: ' + value);
    }

    // Make a big string
    var str = "";
    var loops = 1000; // works with loops = 10, fails with loops = 1000
    for (var i = 0; i < loops; i++) {
      str += "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    }

    memcached.set('foo', str, 60, function(err) {
      if (err) { return next(err); }
      console.log('Created');
      return res.redirect('/');
    });
  });
});

var server = app.listen(process.env.PORT || 8080, '0.0.0.0', function() {
  console.log('App listening at http://%s:%s', server.address().address,
              server.address().port);
  console.log('Press Ctrl+C to quit.');
});
The same large string can be manually set in Memcache using the Google Cloud Console form. Subsequent calls to retrieve the value also work. Am I drastically miscalculating MCU based on the value size?
When running locally (OSX) I experience zero issues and am able to store very large values. Error only occurs with deployed code. Any advice or direction is much appreciated.
There is currently a bug in the Memcache proxy preventing values over 4 KB from being written: a buffer in between truncates the packet, confusing the memcached server. For the time being, any data over 4 KB should be stored in some sort of database (cookies won't work either, since cookies also can't hold more than 4 KB). I will update this once the issue is fixed.
Edit: the issue should now be fixed.

Sticky socket.io sessions by cookie for node.js cluster without sticky express sessions

I am working with the express and socket.io libraries of Node.js on the same server, listening on the same port. I would like to use the cluster module to support round-robin load balancing, but I want the load-balancing behavior for express and socket.io to be different. The behavior is as follows:
Incoming connections for HTTP/S should connect to any single worker
Incoming connections for WS/S should connect to a specific worker, based on a cookie value (more broadly, based on some value available in the request)
Are there any available libraries to accomplish my desired behaviors? If not, how should I go about accomplishing these behaviors?
There's a bunch of ways you could do this. I'm gonna link you to this guide on using redis as a pub/sub, but I'll also give you a super short overview of what it could look like.
So spawn two workers at startup, or however many you want:
var aWorker = cluster.fork();
var bWorker = cluster.fork();
then you need to set them up to listen on their respective ports, so using the net module:
var server1 = require('net').createServer({ pauseOnConnect: true }, function(connection) {
  // pauseOnConnect so the worker can resume() the handed-off socket
  aWorker.send('ConnectionEvent', connection);
}).listen(80); // HTTP/WS

var server2 = require('net').createServer({ pauseOnConnect: true }, function(connection) {
  bWorker.send('ConnectionEvent', connection);
}).listen(443); // HTTPS/WSS

In your worker process:

var app_server = require('express')().listen(0, 'localhost');
var io = require('socket.io')(app_server);
io.adapter(require('socket.io-redis')({ host: '127.0.0.1', port: **REDIS PORT** }));

io.on('connection', function(socket) {
  // Rest of your io server code
  ...
});

process.on('message', function(message, connection) {
  if (connection && message === 'ConnectionEvent') {
    app_server.emit('connection', connection);
    connection.resume();
  }
});
I believe the room function of Socket.io would accomplish what you're trying to do in your second point rather than relying on creating new worker tasks. That's just my opinion though.
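For completeness, a small sketch of what the rooms suggestion could look like (my own illustration; the accountId handshake value and event names are made up):
// Inside the worker that owns the socket.io server:
io.on('connection', function(socket) {
  // "accountId" is a made-up handshake value; in practice it could come
  // from a cookie parsed during the handshake.
  var room = socket.handshake.query.accountId || 'default';
  socket.join(room);
});

// Later, emit to every socket in that room, on whichever worker it is
// connected to, as long as the redis adapter is in use:
io.to('some-account-id').emit('update', { hello: 'world' });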

Pass request to specific forked node instance

Correct me if I am wrong, but it isn't possible to start multiple HTTP servers on the same port.
Based on this, it is interesting that the Node.js cluster module can fork. Of course I know there is a master that passes each request to one of the forked workers; which worker gets it is managed by the operating system, or by cluster.schedulingPolicy = "rr" for "round robin".
The point is: every worker needs its own memory, so you need x times as much memory, where x is the number of workers.
But if I want to run different (sub)domains out of my node app, I would also like to hold different parts of an in-memory database (e.g. a simple JSON file) bound to a (sub)domain, or based on resources like subdomain.example.tld/resource1/whatever.
That doesn't seem to be possible, either resource-based or domain-based.
In my opinion it should be possible, because I can already route based on request objects (req.url) and resources (params) with various existing middleware.
So it should be possible to tell the master to pass the request to a specific forked instance.
It's possible: you need to create a net server in the master, and pass connections according to your own rules to the workers' HTTP servers:
var cluster = require('cluster');

if (cluster.isMaster) {
  var workers = [];

  // Create workers
  for (var i = 0; i < require('os').cpus().length; i++) {
    workers[i] = cluster.fork({WORKER_INDEX: i, JSON_INDEX: i});
  }

  // Create net server at master
  var server = require('net').createServer({pauseOnConnect: true}, function(c) {
    var b = Math.floor(Math.random() * workers.length);
    workers[b].send("doit", c);
  }).listen(3000);
} else {
  // Load specific data for worker (pass parameter JSON_INDEX)
  var json = "{default:default}";
  try {
    json = require("fs").readFileSync('./data_' + process.env.JSON_INDEX + '.json');
  } catch (e) {}

  // Create http server and pass specific json to client
  var server = require('http').createServer(function(req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end(json);
  }).listen(0, '127.0.0.1');

  // Get message from master and check if need pass to http server
  process.on('message', function(m, c) {
    if ("doit" === m) {
      server.emit('connection', c);
      c.resume();
    }
  });
}
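As a sketch of a routing "rule" that gives stickiness (my own illustration, not from the answer above): instead of picking a worker at random, hash the client's remote address so the same client always lands on the same worker, which is roughly what sticky-session does:
// Replace the random pick in the master's createServer callback with
// something deterministic per client, e.g. a hash of the remote IP.
function pickWorker(connection, workers) {
  var ip = connection.remoteAddress || '';
  var hash = 0;
  for (var i = 0; i < ip.length; i++) {
    hash = (hash * 31 + ip.charCodeAt(i)) | 0;
  }
  return workers[Math.abs(hash) % workers.length];
}

// Usage inside the master:
// var server = require('net').createServer({pauseOnConnect: true}, function(c) {
//   pickWorker(c, workers).send("doit", c);
// }).listen(3000);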

Socket.io 'Handshake' failing with cluster and sticky-session

I am having problems getting the sticky-sessions socket.io module to work properly with even a simple example. Following the very minimal example given in the readme (https://github.com/indutny/sticky-session), I am just trying to get this example to work:
var cluster = require('cluster');
var sticky = require('sticky-session');
var http = require('http');

if (cluster.isMaster) {
  for (var i = 0; i < 4; i++) {
    cluster.fork();
  }

  Object.keys(cluster.workers).forEach(function(id) {
    console.log("Worker running with ID : " + cluster.workers[id].process.pid);
  });
}

if (cluster.isWorker) {
  var anotherServer = http.createServer(function(req, res) {
    res.end('hello world!');
  });
  anotherServer.listen(3000);
  console.log('http server on 3000');
}

sticky(function() {
  var io = require('socket.io')();

  var server = http.createServer(function(req, res) {
    res.end('socket.io');
  });

  io.listen(server);

  io.on('connection', function onConnect(socket) {
    console.log('someone connected.');

    socket.on('sync', sync);
    socket.on('send', send);

    function sync(id) {
      socket.join(id);
      console.log('someone joined ' + id);
    }

    function send(id, msg) {
      io.sockets.in(id).emit(msg);
      console.log('someone sent ' + msg + ' to ' + id);
    }
  });

  return server;
}).listen(3001, function() {
  console.log('socket.io server on 3001');
});
and a simple client:
var socket = require('socket.io-client')('http://localhost:3001');
socket.on('connect', function() {
console.log('connected')
socket.emit('sync', 'secret')
});
The workers start up fine. The HTTP servers work fine. But when the client connects, the server console logs 'someone connected' and nothing more. The client never fires its connect event, so I think the upgrade/handshake is failing or something. If anyone can spot what I am doing wrong, that would help a lot.
Thanks!
@jordyyy: I was facing the same issue; after googling I found an answer.
Socket.IO handshaking completes over more than one request, and when you run under a sticky/cluster setup you are using multiple processes, one per core.
So the handshake requests get distributed across different processes which can't talk to each other (they are sibling child processes, with no IPC between them), and most of the time the connection fails or is lost (connect/disconnect events occur frequently).
So what is the solution? The solution is socketio-sticky-session.
socketio-sticky-session manages connections based on IP address: when a client makes a request, its IP address is mapped to a specific process/worker, so further requests are forwarded to the same process/worker and the connection stays stable.
And when you use the Redis adapter, you can actually share socket connection data between all processes/workers.
For more information:
https://github.com/elad/node-cluster-socket.io
(you need a small patch on the worker_index method if your server supports IPv6)
Just knowledge bytes. :) :)
One more thing: you don't need to fork processes yourself, that will be done by the sticky session module.
This was super old and wasn't really answered when I needed it, but my solution was to drop this bad module and any other super confusing module and just use pub/sub with the redis adapter. The only other step was to force transports to websockets, and if that bothers anyone then use something else. For my purposes my solution was simple, readable, didn't mess with the 'typical' socket.io API, and best of all it worked extremely well.
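A rough sketch of that approach (my own illustration, assuming the socket.io 2.x API with the socket.io-redis adapter, not the answerer's actual code):
// Server side: each worker runs its own socket.io server; the redis
// adapter relays emits between workers via Redis pub/sub.
var io = require('socket.io')(3001, {
  transports: ['websocket'] // force websockets, avoiding the multi-request polling handshake
});
io.adapter(require('socket.io-redis')({ host: '127.0.0.1', port: 6379 }));

io.on('connection', function(socket) {
  socket.on('sync', function(room) {
    socket.join(room);
  });
  socket.on('send', function(room, msg) {
    // Delivered to every member of the room, on every worker.
    io.to(room).emit('message', msg);
  });
});

// Client side: also restrict to websockets, e.g.
// var socket = require('socket.io-client')('http://localhost:3001', { transports: ['websocket'] });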
