Node Cluster issue using Socket.io and Redis - node.js

Ok, I have an express-powered API where I also have socket.io running to receive/send realtime events...all works just dandy. I need to cluster my app. I set everything up based on the below code. I spin up workers, they get connections and everything works, except the fact that now I can't "blast" to all socket.io connections. Here is the setup (taken from this):
var express = require('express'),
    cluster = require('cluster'),
    net = require('net'),
    sio = require('socket.io'),
    sio_redis = require('socket.io-redis');

var port = 3000,
    num_processes = require('os').cpus().length;

if (cluster.isMaster) {
    // This stores our workers. We need to keep them to be able to reference
    // them based on source IP address. It's also useful for auto-restart,
    // for example.
    var workers = [];

    // Helper function for spawning worker at index 'i'.
    var spawn = function(i) {
        workers[i] = cluster.fork();

        // Optional: Restart worker on exit
        workers[i].on('exit', function(code, signal) {
            console.log('respawning worker', i);
            spawn(i);
        });
    };

    // Spawn workers.
    for (var i = 0; i < num_processes; i++) {
        spawn(i);
    }

    // Helper function for getting a worker index based on IP address.
    // This is a hot path so it should be really fast. The way it works
    // is by converting the IP address to a number by removing the dots,
    // then compressing it to the number of slots we have.
    //
    // Compared against "real" hashing (from the sticky-session code) and
    // "real" IP number conversion, this function is on par in terms of
    // worker index distribution, only much faster.
    var worker_index = function(ip, len) {
        var _ip = ip.split(/['.'|':']/),
            arr = [];

        for (var el in _ip) {
            if (_ip[el] == '') {
                arr.push(0);
            } else {
                arr.push(parseInt(_ip[el], 16));
            }
        }

        return Number(arr.join('')) % len;
    };

    // Create the outside facing server listening on our port.
    var server = net.createServer({ pauseOnConnect: true }, function(connection) {
        // We received a connection and need to pass it to the appropriate
        // worker. Get the worker for this connection's source IP and pass
        // it the connection.
        var worker = workers[worker_index(connection.remoteAddress, num_processes)];
        worker.send('sticky-session:connection', connection);
    }).listen(port);
} else {
    // Note we don't use a port here because the master listens on it for us.
    var app = new express();

    // Here you might use middleware, attach routes, etc.

    // Don't expose our internal server to the outside.
    var server = app.listen(0, 'localhost'),
        io = sio(server);

    // Tell Socket.IO to use the redis adapter. By default, the redis
    // server is assumed to be on localhost:6379. You don't have to
    // specify them explicitly unless you want to change them.
    io.adapter(sio_redis({ host: 'localhost', port: 6379 }));

    // Here you might use Socket.IO middleware for authorization etc.

    // Listen to messages sent from the master. Ignore everything else.
    process.on('message', function(message, connection) {
        if (message !== 'sticky-session:connection') {
            return;
        }

        // Emulate a connection event on the server by emitting the
        // event with the connection the master sent us.
        server.emit('connection', connection);
        connection.resume();
    });
}
So I connect from various machines to test concurrency, workers do their thing and all is good, but when I get an IO connection, I'm logging the TOTAL "connected" count and it's always 1 per instance. I need a way to say
allClusterForks.emit(stuff)
I get the connection on the correct worker pid, but "ALL CONNECTIONS" always returns 1.
io.on('connection', function(socket) {
    console.log('Connected to worker %s', process.pid);
    console.log("Adapter ROOMS %s ", io.sockets.adapter.rooms);
    console.log("Adapter SIDS %s ", io.sockets.adapter.sids);
    console.log("SOCKETS CONNECTED %s ", Object.keys(io.sockets.connected).length);
});
I can see the subscribe/unsubscribe coming in using Redis MONITOR
1454701383.188231 [0 127.0.0.1:63150] "subscribe" "socket.io#/#gXJscUUuVQGzsYJfAAAA#"
1454701419.130100 [0 127.0.0.1:63167] "subscribe" "socket.io#/#geYSvYSd5zASi7egAAAA#"
1454701433.842727 [0 127.0.0.1:63167] "unsubscribe" "socket.io#/#geYSvYSd5zASi7egAAAA#"
1454701444.630427 [0 127.0.0.1:63150] "unsubscribe" "socket.io#/#gXJscUUuVQGzsYJfAAAA#"
These are connections from two different machines. I would expect that, with the socket.io redis adapter, these subscriptions would come in on the same redis connection, but they are different.
Am I just totally missing something? There's a surprising lack of documentation/articles out there for this that aren't either completely outdated/wrong/ambiguous.
EDIT:
Node v5.3.0
Redis v3.0.6
Socket.io v1.3.7

So if anyone comes across this, I figured out that actually "looking" at the counts of connected sockets across processes is not a thing, but broadcasting or emitting to them is. So I've basically just been "testing" for no reason. All works as expected. I WILL be rewriting the socket.io-redis adapter to allow checking counts across processes.
There was a pull request a few years ago to implement support for what I was trying to do. https://github.com/socketio/socket.io-redis/pull/15 and I might try cleaning that up and re-submitting.
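For anyone landing here for the broadcasting part: once the redis adapter is attached, emitting from any one worker already reaches sockets held by the other workers. A minimal sketch (reusing the io instance from the worker code above):
// Emitting from whichever worker handles the event: the redis adapter
// republishes it, so sockets connected to other workers receive it too.
io.emit('announcement', { text: 'hello from worker ' + process.pid });

// Rooms work the same way across workers:
io.to('some-room').emit('announcement', { text: 'hello room' });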

Related

socket.io: clientsCount and on connection and on disconnect events

I count the number of online websockets via onConnection and onDisconnect events:
const socketIo = require('socket.io');

var on_connect = 0;
var on_disconnect = 0;

var port = 6001;
var io = socketIo(port, {
    pingTimeout: 5000,
    pingInterval: 10000
});

// I have only one NameSpace and the root NS is not used
var ns1 = io.of('ns1');
ns1.on('connection', function (socket) {
    on_connect += 1;
    socket.on('disconnect', function (reason) {
        on_disconnect += 1;
    });
});

...
var online = on_connect - on_disconnect;
...
But the online value is not equal to the io.engine.clientsCount value.
And over time the difference between the online value and io.engine.clientsCount keeps growing.
Why does this happen?
What do I need to do to fix this?
The on_connect and on_disconnect variables are updated in the event callbacks, whereas the online variable is not recalculated. So you need to recalculate online every time the other variables change.
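In other words (a minimal sketch, reusing the on_connect/on_disconnect counters and the io instance from the question), compute the value on demand instead of caching it once:
// Derive the current count whenever it is needed, instead of storing
// a snapshot that goes stale.
function getOnline() {
    return on_connect - on_disconnect;
}

// e.g. compare the live value against the engine's count periodically
setInterval(function () {
    console.log('online:', getOnline(), 'clientsCount:', io.engine.clientsCount);
}, 10000);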
It might be easier to use only one variable to count connections. Increment it on connection, and decrement it on disconnect. That's how I keep track of the number of connections. Then there isn't a need to calculate it every time its value is needed.
Also, the line var online = on_connect - on_disconnect; is occurring before either variable is modified... That's what #gvmani is trying to tell you.
Here's an example of some of what I'm doing. The code below sets up listeners for connections and disconnections and maintains a count of current connections. I should note that I'm not using a namespace like the OP, but the counting portion is what matters. I'll also note that I use connCount > 0 in the send() function, which in my application is used to broadcast to all connected clients.
/* ******************************************************************** */
// initialize the server
const http = require('http');
const server = http.createServer();

// Socket.io listens to our server
const io = require('socket.io').listen(server);

// Count connections as they occur, decrement when a client disconnects.
// If the counter is zero then we won't send anything over the socket.
var connCount = 0;

// A client has connected
io.on('connection', function(socket) {
    // Increment the connection counter
    connCount += 1;
    // log the new connection for debugging purposes.
    console.log(`on connect - ${socket.id} ${connCount}`);

    // The client that initiated the connection has disconnected.
    socket.on('disconnect', function () {
        connCount -= 1;
        console.log(`on disconnect - ${socket.id} ${connCount}`);
    });
});

// Start listening...
server.listen(3000);

// Send something to all connected clients (a broadcast); the
// 'channel' will indicate the destination within the client
// and 'data' becomes the payload.
function send(channel, data) {
    console.log(`send() - channel = ${channel} payload = ${JSON.stringify(data)}`);
    if (connCount > 0) io.emit(channel, { payload: data });
    else console.log('send() - no connections');
}

Node-Red: Create server and share input

I'm trying to create a new node for Node-Red. Basically it is a udp listening socket that shall be established via a config node and which shall pass all incoming messages to dedicated nodes for processing.
This is the basics of what I have:
function udpServer(n) {
    RED.nodes.createNode(this, n);
    this.addr = n.host;
    this.port = n.port;

    var node = this;

    var socket = dgram.createSocket('udp4');

    socket.on('listening', function () {
        var address = socket.address();
        logInfo('UDP Server listening on ' + address.address + ":" + address.port);
    });

    socket.on('message', function (message, remote) {
        var bb = new ByteBuffer.fromBinary(message, 1, 0);
        var CoEdata = decodeCoE(bb);
        if (CoEdata.type == 'digital') { // handle digital output
            // pass to digital handling node
        }
        else if (CoEdata.type == 'analogue') { // handle analogue output
            // pass to analogue handling node
        }
    });

    socket.on("error", function (err) {
        logError("Socket error: " + err);
        socket.close();
    });

    socket.bind({
        address: node.addr,
        port: node.port,
        exclusive: true
    });

    node.on("close", function(done) {
        socket.close();
        done(); // signal Node-RED that cleanup is complete
    });
}
RED.nodes.registerType("myServernode", udpServer);
For the processing node:
function ProcessAnalog(n) {
    RED.nodes.createNode(this, n);
    var node = this;
    this.serverConfig = RED.nodes.getNode(this.server);
    this.channel = n.channel;
    // how do I get the server's message here?
}
RED.nodes.registerType("process-analogue-in", ProcessAnalog);
I can't figure out how to pass the messages that the socket receives to a variable number of processing nodes, i.e. multiple processing nodes shall share one server instance.
==== EDIT for more clarity =====
I want to develop a new set of nodes:

One server node:
- Uses a config node to create a UDP listening socket
- Manages the socket connection (close events, errors, etc.)
- Receives data packages with one to many channels of different data

One to many processing nodes:
- The processing nodes shall share the same connection that the server node has established
- The processing nodes shall handle the messages that the server is emitting
- Possibly the Node-RED flow would use as many processing nodes as there are channels in the server's data package
To quote the Node-Red documentation on config-nodes:
A common use of config nodes is to represent a shared connection to a
remote system. In that instance, the config node may also be
responsible for creating the connection and making it available to the
nodes that use the config node. In such cases, the config node should
also handle the close event to disconnect when the node is stopped.
As far as I understood this, I make the connection available via this.serverConfig = RED.nodes.getNode(this.server); but I cannot figure out how to pass data, which is received by this connection, to the node that is using this connection.
A node has no knowledge of what nodes it is connected to downstream.
The best you can do from the first node is to have 2 outputs and to send digital to one and analogue to the other.
You would do this by passing an array to the node.send() function.
E.g.
// this sends output to just the first output
node.send([msg, null]);
// this sends output to just the second output
node.send([null, msg]);
Nodes that receive messages need to add a listener for the input event,
e.g.
node.on('input', function(msg) {
...
});
All of this is well documented on the Node-RED page
The other option, if the udpServer node is a config node, is that you need to implement your own listeners; the best bet is to look at something like the MQTT nodes in core for examples of pooling connections.
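A rough sketch of that config-node approach, under the assumption that the config node simply keeps its own list of subscriber callbacks (registerInputNode/removeInputNode are made-up helper names, not Node-RED API; only createNode, getNode, registerType, send, and the close event are real Node-RED calls):
module.exports = function(RED) {
    var dgram = require('dgram');

    function UdpServerNode(n) {
        RED.nodes.createNode(this, n);
        var node = this;
        node.subscribers = [];   // processing nodes that want the messages

        // hypothetical helpers the processing nodes call
        node.registerInputNode = function(handler) {
            node.subscribers.push(handler);
        };
        node.removeInputNode = function(handler) {
            node.subscribers = node.subscribers.filter(function(h) { return h !== handler; });
        };

        var socket = dgram.createSocket('udp4');
        socket.on('message', function(message, remote) {
            // decode here, then hand the result to every registered node
            node.subscribers.forEach(function(handler) {
                handler(message, remote);
            });
        });
        socket.bind({ address: n.host, port: n.port, exclusive: true });

        node.on('close', function(done) {
            socket.close();
            done();
        });
    }
    RED.nodes.registerType('udp-server-config', UdpServerNode);

    function ProcessAnalog(n) {
        RED.nodes.createNode(this, n);
        var node = this;
        var serverConfig = RED.nodes.getNode(n.server);

        var onMessage = function(message, remote) {
            // filter/decode for this node's channel, then forward downstream
            node.send({ payload: message });
        };
        serverConfig.registerInputNode(onMessage);

        node.on('close', function() {
            serverConfig.removeInputNode(onMessage);
        });
    }
    RED.nodes.registerType('process-analogue-in', ProcessAnalog);
};
Each processing node registers a callback on the shared config node when it is created and removes it again on close, so any number of them can share the one UDP socket.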

Node.js net module remoteAddress is undefined

I have a Node app running on AWS Elastic Beanstalk (so Node running behind Nginx). I am running socket IO with redis as the memory store and have Node running clustered using the cluster module. Generally everything works great but every now and then I get a user trying to connect that throws an undefined error on the connection.remoteAddress. My code looks like this for the connections:
if (cluster.isMaster) {
    /*
    ------------------------------------------------------------------------------------
    This stores our workers. We need to keep them to be able to reference
    them based on source IP address. It's also useful for auto-restart.
    We also set up a message listener on every worker in order to blast AND FILTER
    socket.io updates to all nodes.
    ------------------------------------------------------------------------------------
    */
    var workers = [];

    var messageRelay = function(msg) {
        for (var i = 0; i < workers.length; i++) {
            workers[i].send(msg);
        }
    };

    var spawn = function(i) {
        workers[i] = cluster.fork();
        console.log("Hello from worker %s", workers[i].process.pid);
        workers[i].on('message', messageRelay);

        /*
        ----------------------------------------
        Restart worker if it gets destroyed
        ----------------------------------------
        */
        workers[i].on('disconnect', function() {
            console.log('Worker disconnected');
        });
        workers[i].on('exit', function(code, signal) {
            console.log('respawning worker', i);
            spawn(i);
        });
    };

    for (var i = 0; i < cpuCount; i++) {
        spawn(i);
    }

    /*
    --------------------------------------------------------------------------------------------
    Helper function for getting a worker index based on IP address (supports IPv4 AND IPv6).
    This is a hot path so it should be really fast. The way it works
    is by converting the IP address to a number by removing the dots (for IPv4) and removing
    the :: for IPv6, then compressing it to the number of slots we have.
    Compared against "real" hashing (from the sticky-session code) and
    "real" IP number conversion, this function is on par in terms of
    worker index distribution, only much faster.
    --------------------------------------------------------------------------------------------
    */
    var workerIndex = function(ip, len) {
        var _ip = ip.split(/['.'|':']/),
            arr = [];

        for (var el in _ip) {
            if (_ip[el] == '') {
                arr.push(0);
            } else {
                arr.push(parseInt(_ip[el], 16));
            }
        }

        return Number(arr.join('')) % len;
    };

    /*
    ------------------------------------------------------------------------------------
    Create the outside facing server listening on our port.
    ------------------------------------------------------------------------------------
    */
    var server = net.createServer({ pauseOnConnect: true }, function(connection) {
        /*
        ------------------------------------------------------------------------------------
        We received a connection and need to pass it to the appropriate
        worker. Get the worker for this connection's source IP and pass
        it the connection.
        ------------------------------------------------------------------------------------
        */
        if (connection.remoteAddress === undefined) {
            console.log("BLEH: %o ", connection.remoteAddress);
            return;
        } else {
            var worker = workers[workerIndex(connection.remoteAddress, cpuCount)];
            worker.send('sticky-session:connection', connection);
        }
    }).listen(port, function() {
        console.log("Spun up worker %s", process.pid);
        console.log('Server listening on *:' + port);
    });
} else {
    var sio = require('socket.io');
    var redis = require('socket.io-redis');
    var ioEvents = require(__base + 'lib/ioEvents');

    var app = new express();

    /*
    ------------------------------------------------------------------------------------
    Note we don't use a port here because the master listens on it for us.
    ------------------------------------------------------------------------------------
    */
    var server = app.listen(0, 'localhost'),
        io = sio(server);

    /*
    ----------------------------------------------------------------------------------------------
    Using Redis as the store instead of memory. This allows us to blast socket updates
    to all processes (unfiltered). For example, we can do io.sockets.emit("message")
    and it will be distributed to all node processes.
    We cannot filter these messages to specific socket connections or specific configurations
    (e.g. updateSquares(socket)); in order to do that we must blast an update to all workers
    and let each process filter the request individually.
    ----------------------------------------------------------------------------------------------
    */
    io.adapter(redis({ host: 'localhost', port: portRedis }));

    /*
    ------------------------------------------------------------------------------------
    Setup the socket listeners
    ------------------------------------------------------------------------------------
    */
    ioEvents.incoming(io);

    /*
    ------------------------------------------------------------------------------------
    Listen to master for worker process updates
    ------------------------------------------------------------------------------------
    */
    process.on('message', function(message, connection) {
        /*
        ------------------------------------------------------------------------------------
        Listen for special updates to all nodes
        ------------------------------------------------------------------------------------
        */
        if (message.squareUpdate) {
            console.log("worker %s received message %o", process.pid, message.squareUpdate);
            ioEvents.squaresForceUpdate(message.squareUpdate);
        }

        /*
        ------------------------------------------------------------------------------------
        If it's not a special message, then check to make sure it's just a sticky-session.
        Otherwise, just bail, no need to do anything else.
        ------------------------------------------------------------------------------------
        */
        if (message !== 'sticky-session:connection') {
            return;
        }

        /*
        ------------------------------------------------------------------------------------
        | Emulate a connection event on the server by emitting the
        | event with the connection the master sent us.
        ------------------------------------------------------------------------------------
        */
        server.emit('connection', connection);
        connection.resume();
    });
}
So the problem lies in the section above with the "BLEH" log. For some reason, remoteAddress is undefined...but only SOMETIMES. Most of the connections look just fine, but randomly I'll get a user trying to connect that throws that error. I'd like to understand what is going on here. I've read that I cannot do IP stuff when there is a proxy involved (something between Node and the User)...but 98% of the time, the connections to workers are fine and everything works as expected. Any help here is really appreciated.

NodeJS on multiple processors (PM2, Cluster, Recluster, Naught)

I am investigating options for running node in a multi-core environment.
I'm trying to determine the best approach, and so far I've seen these options:
- Use the built-in cluster library to spin up workers and respond to signals
- Use PM2, but PM2 -i is listed as beta
- Naught
- Recluster
Are there other alternatives? What are folks using in production?
I've been using the default cluster library, and it works very well. I've had over 10,000 concurrents (multiple clusters on multiple servers) and it has held up very well.
It is suggested to use clusters with domain for error handling.
This is lifted straight from http://nodejs.org/api/domain.html. I've made some changes to how it spawns a new worker for each core of your machine, got rid of the if/else, and added express.
var cluster = require('cluster'),
    http = require('http'),
    PORT = process.env.PORT || 1337,
    os = require('os'),
    server;

function forkClusters() {
    var cpuCount = os.cpus().length;
    // Create a worker for each CPU
    for (var i = 0; i < cpuCount; i += 1) {
        cluster.fork();
    }
}

// Master Process
if (cluster.isMaster) {
    // You can also of course get a bit fancier about logging, and
    // implement whatever custom logic you need to prevent DoS
    // attacks and other bad behavior.
    //
    // See the options in the cluster documentation.
    //
    // The important thing is that the master does very little,
    // increasing our resilience to unexpected errors.
    forkClusters();

    cluster.on('disconnect', function(worker) {
        console.error('disconnect!');
        cluster.fork();
    });
}

function handleError(d) {
    d.on('error', function(er) {
        console.error('error', er.stack);

        // Note: we're in dangerous territory!
        // By definition, something unexpected occurred,
        // which we probably didn't want.
        // Anything can happen now! Be very careful!
        try {
            // make sure we close down within 30 seconds
            var killtimer = setTimeout(function() {
                process.exit(1);
            }, 30000);
            // But don't keep the process open just for that!
            killtimer.unref();

            // stop taking new requests.
            server.close();

            // Let the master know we're dead. This will trigger a
            // 'disconnect' in the cluster master, and then it will fork
            // a new worker.
            cluster.worker.disconnect();
        } catch (er2) {
            // oh well, not much we can do at this point.
            console.error('Error sending 500!', er2.stack);
        }
    });
}

// Child Process
if (cluster.isWorker) {
    // the worker
    //
    // This is where we put our bugs!
    var domain = require('domain');
    var express = require('express');
    var app = express();
    app.set('port', PORT);

    // See the cluster documentation for more details about using
    // worker processes to serve requests. How it works, caveats, etc.
    var d = domain.create();
    handleError(d);

    // Now run the handler function in the domain.
    //
    // Put all code here. Any code included outside of domain.run will not
    // handle errors on the domain level, but will crash the app.
    d.run(function() {
        // this is where we start our server
        server = http.createServer(app).listen(app.get('port'), function () {
            console.log('Cluster %s listening on port %s', cluster.worker.id, app.get('port'));
        });
    });
}
We use Supervisor to manage our Node.js processes, to start them upon boot, and to act as a watchdog in case the processes crash.
We use Nginx as a reverse proxy to load-balance traffic between the processes, which listen on different ports.
This way each process is isolated from the others.
For example: Nginx listens on port 80 and forwards traffic to ports 8000-8003.
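On the Node side that layout needs nothing special; each instance is an ordinary single-process server that reads its port from the environment. A minimal sketch (the PORT values and app are assumptions, e.g. started as PORT=8000 node app.js through PORT=8003 node app.js):
// app.js - one independent instance; run several copies on different ports
// behind the reverse proxy.
var express = require('express');
var app = express();

var port = process.env.PORT || 8000;

app.get('/', function (req, res) {
    res.send('Handled by pid ' + process.pid + ' on port ' + port);
});

app.listen(port, function () {
    console.log('Instance %s listening on port %s', process.pid, port);
});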
I was using PM2 for quite a while, but their pricing is expensive for my needs since I have my own analytics environment and don't require support, so I decided to experiment with alternatives. For my case, just forever did the trick; it's actually very simple:
forever -m 5 app.js
Another useful example is
forever start app.js -p 8080

Socket.IO Maximum call stack size exceeded

I've written a small Socket.IO server, which works fine: I can connect to it and send/receive messages, so everything is working OK. Only the relevant part of the code is presented here:
var RedisStore = require('socket.io/lib/stores/redis');
const pub = redis.createClient('127.0.0.1', 6379);
const sub = redis.createClient('127.0.0.1', 6379);
const store = redis.createClient('127.0.0.1', 6379);

io.configure(function() {
    io.set('store', new RedisStore({
        redisPub: pub,
        redisSub: sub,
        redisClient: store
    }));
});

io.sockets.on('connection', function(socket) {
    socket.on('message', function(msg) {
        pub.publish("lobby", msg);
    });

    /*
     * Subscribe to the lobby and receive messages.
     */
    var sub = redis.createClient('127.0.0.1', 6379);
    sub.subscribe("lobby");
    sub.on('message', function(channel, msg) {
        socket.send(msg);
    });
});
I've also written the script presented below, which opens 1000 connections to the server and then sends a message over every socket each 10 milliseconds in the setInterval function, so it generates quite a lot of traffic.
#!/usr/bin/env node
var io = require('socket.io-client');

var reconn = { 'force new connection': true };
var sockets = [];
var num = 1000;

function startSocket(i) {
    sockets[i] = io.connect("http://127.0.0.1:8080", reconn);

    sockets[i].on('connect', function() {
        console.log("Socket[" + i + "] connected.");
    });

    sockets[i].on('message', function(msg) {
        console.log("Socket[" + i + "] Message received: " + msg);
    });
}

/*
 * Start number of sockets.
 */
for (var i = 0; i < num; i++) {
    startSocket(i);
}

/*
 * Send messages forever.
 */
setInterval(function() {
    for (var i = 0; i < num; i++) {
        sockets[i].send("Hello from socket " + i + ".");
    }
}, 10);
This script is a benchmark tool spawning 1000 connections to the server, but when running for several minutes, the server dies with the following error message:
node.js:0 // Copyright Joyent, Inc. and other Node contributors. ^
RangeError: Maximum call stack size exceeded
I know that there's not enough stack space available so the exception occurs and the process is terminated, but even if I enlarge the stack with the --stack-size variable, this doesn't actually solve the problem, because I can always spawn more connections, which will eventually kill the server.
My question is: how can I prevent this? This is an effective DoS scenario, where anybody can hack together this little script and force the node server to terminate, but I would like to prevent this from happening. I would like the Node server to never terminate, just process messages slowly.
Any ideas on whether this can be prevented? I'm not sure that I would like to block IPs, since I would also like mobile phones to log in to the system, and many of them share the same IP, so the node server could mistakenly think a DoS is taking place from one mobile network operator and block its IP.
Thank you
If you would like your node server to run forever, no matter what, use https://github.com/nodejitsu/forever
As for the exception: my hunch is that var sub = redis.createClient('127.0.0.1', 6379); may allocate a variable on the stack each time a connection is established.
I would first try to put var subs = [] in the global scope and
subs[socket.id] = redis.createClient('127.0.0.1', 6379);
Or something like socket.sub = redis.createClient('127.0.0.1', 6379); to piggyback the existing, hopefully heap based, socket.io data structures.
If that doesn't work, try to isolate the problem by removing the use of Redis...
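A minimal sketch of that suggestion, keeping the legacy 0.9-style API and the createClient call style from the question, and adding cleanup on disconnect so the per-connection subscriber clients don't pile up (the subs map is an assumption, not part of socket.io; pub and io come from the question's code):
var redis = require('redis');

// one subscriber client per connected socket, keyed by socket id
var subs = {};

io.sockets.on('connection', function(socket) {
    var sub = redis.createClient('127.0.0.1', 6379);
    subs[socket.id] = sub;

    sub.subscribe('lobby');
    sub.on('message', function(channel, msg) {
        socket.send(msg);
    });

    socket.on('message', function(msg) {
        pub.publish('lobby', msg);
    });

    // tear the subscriber down when the socket goes away
    socket.on('disconnect', function() {
        sub.unsubscribe('lobby');
        sub.quit();
        delete subs[socket.id];
    });
});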
