socketio and redisstore scaling efficiency - node.js

I am working on a pretty big project that involves sending data between clients. So, I am researching some of the newer technologies out there, and I thought I'd give Node.js a try. I just have a question about socket.io and Redis.
When we use the pub/sub functions in socket.io, does every client connection create a new connection to Redis? Or does socket.io use a maximum of three connections (in total, regardless of the number of clients) to do the pub/sub work?

From the source, it seems that each client connection has two associated subscriptions to Redis (this.store in the code), but that each socket.io server has only three connections to Redis (source).
this.store.subscribe('message:' + data.id, function (packet) {
self.onClientMessage(data.id, packet);
});
this.store.subscribe('disconnect:' + data.id, function (reason) {
self.onClientDisconnect(data.id, reason);
});
Redis should be able to handle a lot of connections as well as subscriptions, but benchmarking is recommended as always.
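For reference, a minimal sketch of wiring up the Redis store in socket.io 0.9.x (my illustration; the option names are from the 0.9 RedisStore and the port is arbitrary). The three createClient() calls are the three per-server connections mentioned above:
var sio = require('socket.io')
  , redis = require('redis')
  , RedisStore = sio.RedisStore
  , io = sio.listen(8000); // arbitrary port for the sketch

// one pub client, one sub client and one command client per server process:
// three Redis connections in total, no matter how many sockets connect
io.set('store', new RedisStore({
  redisPub: redis.createClient(),
  redisSub: redis.createClient(),
  redisClient: redis.createClient()
}));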

Related

Efficient Socket.io distribution with Mongoose stream

I'm trying to create an efficient streaming node.js app, where the server would connect to a stream (capped collection) in MongoDB with mongoose, and then emit the stream directly to the client browsers.
What I'm worried about is the scalability of my design. Let me know if I'm wrong, but it seems that right now, for every new web browser that is opened, a new connection to MongoDB will also be opened (it won't reuse the previous one), and therefore there will be a lot of inefficiency if I have a lot of users connected at the same time. How can I improve that?
I'm thinking of a one server - multiple client type of design in socket.io but I don't know how to achieve that.
Code below:
server side (app.js):
io.on('connection', function (socket) {
  console.log("connected!");
  var stream = Json.find().lean().tailable({ "awaitdata": true, numberOfRetries: Number.MAX_VALUE}).stream();
  stream.on('data', function(doc){
    socket.emit('rmc', doc);
  }).on('error', function (error){
    console.log(error);
  }).on('close', function () {
    console.log('closed');
  });
});
client side (index.html):
socket.on('rmc', function(json) {
doSomething(); // it just displays the data on the screen
});
Unfortunately this will not depend only on Mongo performance. Unless you have a high level of concurrency (1000+ streams) you shouldn't worry about Mongo for the moment, because with that kind of app you have bigger problems, for example: data types and compression, buffer overflows, bandwidth limits, socket.io limits, OS limits. These are the kinds of problems you will most likely face first.
Now, to answer your question: as far as I know, no, you are not opening a connection to Mongo per user. The users are connected to the app, not the database; the app is connected to the database.
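If you do want a strict one server - multiple clients design, here is a rough sketch (my illustration, assuming the same Json model and io instance as in your code): open the tailable cursor once at startup and broadcast from it, instead of opening one stream per socket:
// open a single tailable cursor for the whole process...
var stream = Json.find().lean().tailable({ "awaitdata": true, numberOfRetries: Number.MAX_VALUE }).stream();

// ...and fan each document out to every connected browser
stream.on('data', function (doc) {
  io.emit('rmc', doc);
}).on('error', function (error) {
  console.log(error);
});

io.on('connection', function (socket) {
  console.log('connected!'); // per-socket work only, no new cursor here
});
Either way, Mongoose keeps a small pooled set of connections to MongoDB for the whole app, so the number of browsers does not translate into the number of MongoDB connections.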
Lastly, these links will help you understand and tweak your queries for this kind of job (streaming):
https://github.com/Automattic/mongoose/issues/1248
https://codeandcodes.com/tag/mongoose-vs-mongodb-native/
http://drewww.github.io/socket.io-benchmarking/
Hope it helps!

Need to know something regarding socket.io and redis and nginx

My goal is to build a chat application - similar to WhatsApp.
To my understanding, socket.io is a real-time communication library written in JavaScript, and it is very simple to use.
For example
// Serverside
io.on('connection', function(socket) {
  socket.on('chat', function(msg) {
    io.emit('chat', msg);
  });
});
// ClientSide (Using jquery)
var socket = io();
$('form').submit(function(){
  socket.emit('chat', $('#m').val());
  $('#m').val('');
  return false;
});
socket.on('chat', function(msg){
  $('#messages').append($('<li>').text(msg));
});
1) Do I always need to start with io.on('connection') to use the real-time feature, or could I just start using the socket.on object instead? For example, I have a route
app.post('/postSomething', function(req, res) {
// Do i need to start an io.on or socket.on here?
});
because I want the real-time feature to listen only on a specific route.
2) Redis is an in-memory data structure store which handles pub/sub; why do we need to use the pub/sub mechanism?
I read a lot of articles but couldn't grasp the concept. Article example: http://ejosh.co/de/2015/01/node-js-socket-io-and-redis-intermediate-tutorial-server-side/
for example the code below
// Do i need redis for this, if so why? is it for caching purposes?
// Where does redis fit in this code?
var redis = require("redis");
var client = redis.createClient();
io.on('connection', function(socket) {
  socket.on('chat', function(msg) {
    io.emit('chat', msg);
  });
});
3) Just wondering why I need nginx to scale a node.js application? I found this Stack Overflow answer:
Strategy to implement a scalable chat server
It says something about load balancing; I read about that online and couldn't grasp the concept either.
So far I have only been dealing with node.js and mongoose in simple CRUD applications, but I'm willing to work really hard if you guys could share some of your knowledge and some useful resources so that I could deepen my knowledge of all of these technologies.
Cheers!
Q. Socket.on without IO.on
io.on("connection" ... )
Is called when you receive a new connection. socket.on listens to all the emits on the client side. If you want your client to act as a server for some reason then, in short, yes, io.on is required.
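For illustration, a rough sketch (the route and event names are made up, and body-parsing middleware is assumed): you register io.on('connection') once, and you can then push to connected clients from inside any Express route:
// io.on('connection') is still needed to accept sockets...
io.on('connection', function (socket) {
  // per-socket listeners (socket.on(...)) go here
});

// ...but any route handler can emit to already-connected clients
app.post('/postSomething', function (req, res) {
  io.emit('somethingPosted', req.body); // hypothetical event name
  res.end();
});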
Q. Redis pub/sub vs Socket.IO
Take a look at this SO question/answer, quoting:
Redis pub/sub is great in case all clients have direct access to redis. If you have multiple node servers, one can push a message to the others.
But if you also have clients in the browser, you need something else to push data from a server to a client, and in this case, socket.io is great.
Now, if you use socket.io with the Redis store, socket.io will use Redis pub/sub under the hood to propagate messages between servers, and servers will propagate messages to clients.
So using socket.io rooms with socket.io configured with the Redis store is probably the simplest for you.
Redis can act like a message queue if that is a requirement. Redis is a datastore supporting many data types.
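To make the "one can push a message to the others" part concrete, here is a minimal sketch (my illustration; channel and event names are made up) of one Node server pushing a message to another through plain Redis pub/sub, with the receiving server forwarding it to its own browsers via socket.io:
var redis = require('redis');

// --- server A: publish a chat message into Redis ---
var pub = redis.createClient();
pub.publish('chat-global', JSON.stringify({ user: 'alice', text: 'hi' }));

// --- server B: subscribe and forward to the browsers connected to it ---
var sub = redis.createClient();
sub.subscribe('chat-global');
sub.on('message', function (channel, message) {
  io.emit('chat', JSON.parse(message)); // io is this server's socket.io instance
});
When you configure socket.io with the Redis store, it does this propagation for you, which is why the quoted answer calls rooms plus the Redis store the simplest option.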
Q. Why Nginx with Node.js
Node.js can work standalone, but nginx is faster at serving static content.
Since nginx is a reverse proxy, servers are typically configured so that nginx handles all the static work (serving static files, doing redirects, handling SSL certificates and serving error pages) and every other request is sent to node.js.
Check this Quora post as well: Should I host a node.js project without nginx?
Quoting:
Nginx can be used to remove some load from the Node.js processes, for example, serving static files, doing redirects, handling SSL certificates and serving error pages.
You can do everything without Nginx, but it means you have to code it yourself, so why not use a fast and proven solution for this.

why is performance of redis+socket.io better than just socket.io?

I earlier had all my code in a socket.io + node.js server. I recently converted it all to redis + socket.io + node.js after noticing slow performance when too many users were sending messages across the server.
So, the reason socket.io alone was slow is that it is not multi-threaded, so it handles one request or emit at a time.
What Redis does is distribute these requests or emits across channels. Clients subscribe to different channels, and when a message is published on a channel, all the clients subscribed to it receive the message. It does this via this piece of code:
sub.on("message", function (channel, message) {
client.emit("message",message);
});
The client.on('emit', function(){}) handler takes it from here to publish messages to the different channels.
Here is a brief piece of code explaining what I am doing with Redis:
io.sockets.on('connection', function (client) {
  var pub = redis.createClient();
  var sub = redis.createClient();
  sub.on("message", function (channel, message) {
    client.emit('message', message);
  });
  client.on("message", function (msg) {
    if (msg.type == "chat") {
      pub.publish("channel." + msg.tousername, msg.message);
      pub.publish("channel." + msg.user, msg.message);
    }
    else if (msg.type == "setUsername") {
      sub.subscribe("channel." + msg.user);
    }
  });
});
As Redis stores the channel information, we can have different servers publish to the same channel.
So, what I don't understand is: if sub.on("message") is getting called every time a request or emit is sent, why is Redis supposed to give better performance? I suppose even the sub.on("message") method is not multi-threaded.
As you might know, Redis allows you to scale with multiple Node instances, so the performance benefit actually comes after the fact. Using the pub/sub method is not faster; it's technically slower, because you have to round-trip through Redis for every pub/sub signal. The "better performance" is only really true when you start to scale out horizontally.
For example, say you have one Node instance (a simple chat room) that can handle a maximum of 200 active users. You are not using Redis yet because there is no need. Now, what if you want to have 400 active users? Using your example above, you can now achieve this 400-user mark, which is a "performance increase" in the sense that you can now handle more users, but not really a speed increase, if that makes sense. Hope this helps!

Socket.IO messaging to multiple rooms

I'm using Socket.IO in my Node Express app, and using the methods described in this excellent post to relate my socket connections and sessions. In a comment the author describes a way to send messages to a particular user (session) like this:
sio.on('connection', function (socket) {
  // do all the session stuff
  socket.join(socket.handshake.sessionID);
  // socket.io will leave the room upon disconnect
});
app.get('/', function (req, res) {
  sio.sockets.in(req.sessionID).send('Man, good to see you back!');
});
Seems like a good idea. However, in my app I will often be sending messages to multiple users at once. I'm wondering about the best way to do this in Socket.IO - essentially I need to send messages to multiple rooms with the best performance possible. Any suggestions?
Two options: use socket.io rooms (which the snippets below refer to as channels) or socket.io namespaces. Both are documented on the socket.io website, but in short:
Using channels:
// all on the server
// on connect or message received
socket.join("channel-name");
socket.broadcast.to("channel-name").emit("message to all other users in channel");
// OR independently
io.sockets.in("channel-name").emit("message to all users in channel");
Using namespaces:
// on the client connect to namespace
io.connect("/chat/channel-name")
// on the server receive connections to namespace as normal
// broadcast to namespace
io.of("/chat/channel-name").emit("message to all users in namespace")
Because socket.io is smart enough to not actually open a second socket for additional namespaces, both methods should be comparable in efficiency.
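Since the question is specifically about emitting to several rooms at once, a minimal sketch (room and event names are made up) is simply to loop over the target rooms on the server:
// broadcast the same payload to several rooms in one go
var rooms = ['room-1', 'room-2', 'room-3']; // hypothetical room names
rooms.forEach(function (room) {
  io.sockets.in(room).emit('notice', { text: 'hello everyone' });
});
Note that a socket joined to more than one of the target rooms will receive the message once per room, so de-duplicate on the client if that matters.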

I'm receiving duplicate messages in my clustered node.js/socket.io/redis pub/sub application

I'm using Node.js, Socket.io with Redisstore, Cluster from the Socket.io guys, and Redis.
I have a pub/sub application that works well on just one Node.js node. But when it comes under heavy load it maxes out just one core of the server, since Node.js isn't written for multi-core machines.
As you can see below, I'm now using the Cluster module from Learnboost, the same people who make Socket.io.
But when I fire up 4 worker processes, each browser client that comes in and subscribes gets 4 copies of each message that is published to Redis. If there are three worker processes, there are three copies.
I'm guessing I need to move the redis pub/sub functionality to the cluster.js file somehow.
Cluster.js
var cluster = require('./node_modules/cluster');
cluster('./app')
.set('workers', 4)
.use(cluster.logger('logs'))
.use(cluster.stats())
.use(cluster.pidfiles('pids'))
.use(cluster.cli())
.use(cluster.repl(8888))
.listen(8000);
App.js
var redis = require('redis'),
    sys = require('sys');
var rc = redis.createClient();
var path = require('path')
  , connect = require('connect')
  , app = connect.createServer(connect.static(path.join(__dirname, '../')));
// require the new redis store
var sio = require('socket.io')
  , RedisStore = sio.RedisStore
  , io = sio.listen(app);
io.set('store', new RedisStore);

io.sockets.on('connection', function(socket) {
  sys.log('ShowControl -- Socket connected: ' + socket.id);
  socket.on('channel', function(ch) {
    socket.join(ch);
    sys.log('ShowControl -- ' + socket.id + ' joined channel: ' + ch);
  });
  socket.on('disconnect', function() {
    console.log('ShowControl -- Socket disconnected: ' + socket.id);
  });
});
rc.psubscribe('showcontrol_*');
rc.on('pmessage', function(pat, ch, msg) {
  io.sockets.in(ch).emit('show_event', msg);
  sys.log('ShowControl -- Publish sent to channel: ' + ch);
});
// cluster compatibility
if (!module.parent) {
  app.listen(process.argv[2] || 8081);
  console.log('Listening on ', app.address());
} else {
  module.exports = app;
}
client.html
<script src="http://localhost:8000/socket.io/socket.io.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.0/jquery.min.js"></script>
<script>
var socket = io.connect('localhost:8000');
socket.emit('channel', 'showcontrol_106');
socket.on('show_event', function (msg) {
console.log(msg);
$("body").append('<br/>' + msg);
});
</script>
I've been battling with cluster and socket.io. Every time I use the cluster functionality (I use the built-in Node.js cluster module, though) I get a lot of performance problems and issues with socket.io.
While trying to research this, I've been digging around the bug reports and similar on the socket.io git, and anyone using clusters or external load balancers in front of their servers seems to have problems with socket.io.
It seems to produce the problem "client not handshaken client should reconnect", which you will see if you increase the verbose logging. This appears a lot whenever socket.io runs in a cluster, so I think it comes back to this: the client gets connected to a randomized instance in the socket.io cluster every time it makes a new connection (it makes several http/socket/flash connections when authorizing, and more all the time later when polling for new data).
For now I've reverted back to only using 1 socket.io process at a time. This might be a bug, but it could also be a shortcoming of how socket.io is built.
Added: My way of solving this in the future will be to assign a unique port to each socket.io instance inside the cluster and then cache the port selection on the client side.
Turns out this isn't a problem with Node.js/Socket.io, I was just going about it the completely wrong way.
Not only was I publishing into the Redis server from outside the Node/Socket stack, I was still directly subscribed to the Redis channel. On both ends of the pub/sub situation I was bypassing the "Socket.io cluster with Redis Store on the back end" goodness.
So, I created a little app (with Node.js/Socket.io/Express) that took messages from my Rails app and 'announced' them into a Socket.io room using the socket.io-announce module. Now, thanks to Socket.io routing magic, each node worker only gets and sends messages to the browsers connected to it directly. In other words, no more duplicate messages, since both the pub and the sub happen within the Node.js/Socket.io stack.
After I get my code cleaned up I'll put an example up on a github somewhere.
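Until that example is up, here is a rough sketch of that kind of bridge app (my illustration, not the actual code; the HTTP route, room name, port and event name are made up, and it uses a plain Express endpoint plus the Redis-store-backed io instance rather than socket.io-announce itself):
var express = require('express')
  , sio = require('socket.io')
  , RedisStore = sio.RedisStore
  , app = express()
  , server = require('http').createServer(app)
  , io = sio.listen(server);

// same Redis-backed store as the worker processes, so room emits
// are propagated through Redis instead of being duplicated per worker
io.set('store', new RedisStore);

app.use(express.json()); // assumes an Express version that ships express.json()

// Rails POSTs show events here instead of publishing straight into Redis
app.post('/announce/:room', function (req, res) {
  io.sockets.in(req.params.room).emit('show_event', req.body.msg);
  res.end('ok');
});

server.listen(8090); // arbitrary port for the sketch
The key point matches the conclusion above: the publish now goes through socket.io's Redis store, so each worker only emits to its own sockets.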

Resources