Efficient Socket.io distribution with Mongoose stream - node.js

I'm trying to create an efficient streaming node.js app, where the server connects to a stream (a capped collection) in MongoDB via mongoose and then emits the stream directly to the client browsers.
What I'm worried about is the scalability of my design. Let me know if I'm wrong, but it seems that right now, for every new web browser that is opened, a new connection to MongoDB is also opened (it won't reuse the previous one), so there will be a lot of inefficiency if I have many users connected at the same time. How can I improve that?
I'm thinking of a one-server, multiple-client design in socket.io, but I don't know how to achieve it.
Code below:
server side (app.js):
io.on('connection', function (socket) {
  console.log("connected!");
  // open a tailable cursor on the capped collection and stream new documents
  var stream = Json.find().lean().tailable({ "awaitdata": true, numberOfRetries: Number.MAX_VALUE }).stream();
  stream.on('data', function (doc) {
    socket.emit('rmc', doc); // forward each new document to this client
  }).on('error', function (error) {
    console.log(error);
  }).on('close', function () {
    console.log('closed');
  });
});
client side (index.html):
socket.on('rmc', function(json) {
  doSomething(); // it just displays the data on the screen
});

Unfortunately, this will not depend only on Mongo's performance. Unless you have a high level of concurrency (1000+ streams), you shouldn't worry about Mongo for the moment, because with this kind of app you have bigger problems, for example: data types and compression, buffer overflows, bandwidth limits, socket.io limits, OS limits. These are the kinds of problems you will most likely face first.
Now, to answer your question: as far as I know, no, you are not opening a database connection per user. The users are connected to the app, not to the database; the app is connected to the database (Mongoose keeps a connection pool), although the posted code does open a new tailable cursor for each socket.
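To avoid the per-socket cursor, here is a minimal sketch of the one-server, many-clients design you describe, reusing your Json model: open the stream once at startup and broadcast each document with io.emit instead of per-socket emits.

io.on('connection', function (socket) {
  console.log('connected!'); // no per-socket cursor needed any more
});

// one tailable cursor for the whole process, opened at startup
var stream = Json.find()
  .lean()
  .tailable({ awaitdata: true, numberOfRetries: Number.MAX_VALUE })
  .stream();

stream.on('data', function (doc) {
  io.emit('rmc', doc); // broadcast to every connected client
}).on('error', function (error) {
  console.log(error);
});

With this shape, each extra browser adds a socket.io connection but no extra MongoDB work.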
Lastly, these links will help you understand and tweak your queries for this kind of job (streaming):
https://github.com/Automattic/mongoose/issues/1248
https://codeandcodes.com/tag/mongoose-vs-mongodb-native/
http://drewww.github.io/socket.io-benchmarking/
Hope it helps!

How to automate API GET data requests when using web sockets

As far as I know, WebSockets allow bi-directional communication, and WebSocket connections (for example, Socket.io) are always open, so whenever new data arrives it should be automatically pushed to the view via the socket.
But in the code below I am using setInterval to make an https.get call, and setInterval fires once every second.
Doing this does not give a real-time feel: new data is pulled once every second, at a statically defined interval.
In short, I want to automate what setInterval does in the code below; I don't want a static fetch interval, because at times the stock price could change within 100 ms and at other times only once in a few seconds.
If I set the interval to 1 second, i.e. make a call every second, the real feel of rapid market moves would be lost.
I am not sure how developers usually fetch data in IoT applications, where, for example, a car is monitored in real time and its speed is fetched and graphed on a web or mobile application.
How do I achieve something similar for a stock ticker? I simply want to plug the application into an API and, when new data arrives, instantly push it to all viewers (subscribers) in real time.
Code below:
////
// CONFIGURATION SETTINGS
////
var FETCH_INTERVAL = 1000;
var PRETTY_PRINT_JSON = true;

////
// START
////
var express = require('express');
var http = require('http');
var https = require('https');
var io = require('socket.io');
var cors = require('cors');

function getQuote(socket, ticker) {
  https.get({
    port: 443,
    method: 'GET',
    hostname: 'www.google.com',
    path: '/finance/info?client=ig&q=' + ticker,
    timeout: 1000
  }, function(response) {
    response.setEncoding('utf8');
    var data = '';
    response.on('data', function(chunk) {
      data += chunk;
    });
    response.on('end', function() {
      if (data.length > 0) {
        var dataObj;
        try {
          // strip the non-JSON prefix the feed prepends before parsing
          dataObj = JSON.parse(data.substring(3));
        } catch (e) {
          return false;
        }
        socket.emit(ticker, dataObj[0].l_cur);
      }
    });
  });
}
I am calling getQuote at the FETCH_INTERVAL set above:
function trackTicker(socket, ticker) {
  // run the first time immediately
  getQuote(socket, ticker);

  // every N seconds
  var timer = setInterval(function() {
    getQuote(socket, ticker);
  }, FETCH_INTERVAL);

  socket.on('disconnect', function() {
    clearInterval(timer);
  });
}
var app = express();
app.use(cors());

var server = http.createServer(app);
var io = io.listen(server); // legacy socket.io API: attach socket.io to the http server
io.set('origins', '*:*');

app.get('/', function(req, res) {
  res.sendfile(__dirname + '/index.html');
});

io.sockets.on('connection', function(socket) {
  socket.on('ticker', function(ticker) {
    trackTicker(socket, ticker);
  });
});

server.listen(process.env.PORT || 4000);
Edits - Update
Okay, so I would need a real-time feed (this bit is sorted).
As far as I know, real-time feeds are quite expensive, and buying 10,000+ endpoints, one for each online client, is quite expensive.
1) How do I make use of a real-time feed to serve thousands of end users? Can I use WebSockets, Redis, publish/subscribe, broadcasting, or some technology that copies the real-time feed to tonnes of users? I want an efficient solution because I want to keep the expense of the real-time data feed as low as possible.
How do I tackle that issue?
2) Yes, I understand polling needs to be done on the server side and not on the client side (to avoid polling once per client), but then what tech do I need to use? WebSockets, Redis, pub/sub, etc.?
I have an API URL and a token to access the API.
3) I don't just need to fetch the data and push it to end users; I also need to do some computation on the fetched data, pull data from Redis or a database as well, run calculations on it, and then push the result to the view.
For example:
1) Data I get from the real-time market feed: {"a":10, "b":20}
2) Data I get from the DB or Redis: {"x":2, "y":4}
3) Do the computation: z = a * x + b * y
4) Finally, push the value of z to the view.
How do I do all of this in real time and at the same time push it to multiple clients?
Can you share a roadmap with me? I have the first piece of the puzzle: getting the real-time data feed.
1) How do I make use of a real-time feed to serve thousands of end users? Can I use WebSockets, Redis, publish/subscribe, broadcasting, or some technology that copies the real-time feed to tonnes of users? I want an efficient solution because I want to keep the expense of the real-time data feed as low as possible.
How do I tackle that issue?
To "push" data to browser clients, you would want to use a webSocket or socket.io (built on top of webSockets). Then, anytime your server knows there's an update, it can immediately send that update to any currently connected client that is interested in that info. The basic idea is that the client connects to your server as soon as the web page is loaded and keeps that connection open for as long as the web page(s) are open.
2) Yes, I understand polling needs to be done on the server side and not on the client side (to avoid polling once per client), but then what tech do I need to use? WebSockets, Redis, pub/sub, etc.?
It isn't clear to me what exactly you're asking about here. You will get updated prices using whatever the most efficient technology is that is offered by your provider. If all they provide is http calls, then you have to poll regularly using http requests. If they provide a webSocket interface to get updates, then that would be preferable.
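For illustration, here is a minimal sketch of consuming a push-style feed, assuming a hypothetical provider that publishes quotes over a webSocket (the URL and message format are made up, using the ws package):

var WebSocket = require('ws');

// hypothetical provider endpoint; substitute your real feed URL and token
var feed = new WebSocket('wss://example-feed.test/quotes');

feed.on('message', function(raw) {
  var quote = JSON.parse(raw); // e.g. { ticker: 'GOOG', price: 123.45 }
  io.emit(quote.ticker, quote.price); // push to every connected browser
});

No polling loop at all: the provider pushes, your server forwards.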
There are lots of choices for how to keep track of which clients are interested in which pieces of information and how to distribute the updates. For a single server, you could easily build your own with just a Map keyed by stock symbol, where the value is an array of identifiers for the clients interested in that stock. Then, any time you get an update for a given stock, you just fetch the list of interested client IDs and send the update to them (over their webSocket/socket.io connection).
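A minimal single-server sketch of that bookkeeping (the 'watch' event name is made up for illustration):

// Map of stock symbol -> Set of interested socket.io sockets
var watchers = new Map();

io.sockets.on('connection', function(socket) {
  socket.on('watch', function(symbol) {
    if (!watchers.has(symbol)) watchers.set(symbol, new Set());
    watchers.get(symbol).add(socket);
  });
  socket.on('disconnect', function() {
    // drop this socket from every symbol it was watching
    watchers.forEach(function(sockets) { sockets.delete(socket); });
  });
});

// call this whenever the server learns of a new price
function publishPrice(symbol, price) {
  var sockets = watchers.get(symbol);
  if (sockets) sockets.forEach(function(s) { s.emit(symbol, price); });
}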
This is also a natural pub/sub type of application, so any one of the backends that support pub/sub would work just fine too. You could even use an EventEmitter where you .emit(stock, price) and each separate connection adds a listener for the stock symbols it is interested in.
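The same idea with a plain EventEmitter, sketched under the same assumptions:

var EventEmitter = require('events');
var quotes = new EventEmitter();
quotes.setMaxListeners(0); // many sockets may listen to a popular symbol

io.sockets.on('connection', function(socket) {
  socket.on('watch', function(symbol) {
    var listener = function(price) { socket.emit(symbol, price); };
    quotes.on(symbol, listener);
    socket.on('disconnect', function() {
      quotes.removeListener(symbol, listener);
    });
  });
});

// whenever an update arrives, e.g.: quotes.emit('GOOG', 123.45);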
For multiple servers at scale, you'd probably want to use some external process that manages the pub/sub process. Redis is a candidate for that.
3) I don't just need to fetch the data and push it to end users; I also need to do some computation on the fetched data, pull data from Redis or a database as well, run calculations on it, and then push the result to the view.
I don't really see what question there is here. Pick your favorite database to store the info you need to fetch so you can get it upon demand.
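As a sketch of the fetch/compute/push step from your example (the 'weights' key and 'z-updated' event names are made up; this uses the node_redis callback API):

// feed = { a: 10, b: 20 } from the market feed
function onFeedUpdate(feed) {
  redisClient.hgetall('weights', function(err, w) { // w = { x: '2', y: '4' }
    if (err) return console.error(err);
    var z = feed.a * Number(w.x) + feed.b * Number(w.y);
    io.emit('z-updated', z); // push the derived value to the view(s)
  });
}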
How do I do all of this in real time and at the same time push it to multiple clients? Can you share a roadmap with me? I have the first piece of the puzzle: getting the real-time data feed.
A real-time data feed.
A database to store the metadata used for calculations.
Some pub/sub system, either home-built or from a pre-built package.
Then, follow this sequence of events.
Client signs in, connects a webSocket or socket.io connection.
Server accepts client connection and assigns a clientID and keeps track of the connection in some sort of Map between clientID and webSocket/socket.io connection. FYI, socket.io does this automatically for you.
Client tells server which items it wants to monitor (probably a message sent over the webSocket/socket.io connection).
Server registers that interest in the pub/sub system (essentially subscribing the client to each item it wants to monitor).
Other clients do the same thing.
Each time client requests data on a specific item, the server makes sure that it is getting updates for that item (however the server gets its updates).
Server gets new info for some item that one or more clients are interested in.
New data is sent to pub/sub system and pub/sub system broadcasts that information to those clients that were interested in info on that particular item. The details of how that works depend upon what pub/sub system you choose and how it notifies subscribers of a change, but eventually a message is sent over webSocket/socket.io for the item that has changed.
When a client disconnects, their pub/sub subscriptions are "unsubscribed".
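Tying the sequence above together, here is a minimal single-server sketch using socket.io rooms (event names are made up):

io.sockets.on('connection', function(socket) {
  // step 2: socket.io tracks the connection and assigns an id for us
  socket.on('monitor', function(item) {
    socket.join(item); // steps 3-4: subscribe this client to the item
  });
  // step 8: socket.io removes the socket from its rooms on disconnect
});

// steps 6-7: when new data arrives for an item, broadcast to its room only
function onItemUpdate(item, data) {
  io.to(item).emit(item, data);
}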

The efficiency of continuously polling MongoDB in Node

I need to continuously update data on the client based on DB changes. I'm thinking about having a 5-second interval function that repeatedly gathers all the DB information and uses Socket.IO to emit the data to the client.
Currently, I'm doing this on the client itself without socket.io, just repeatedly doing a REST call to the server which then handles the data.
My question is: Are either of these methods efficient or inefficient and is there a better solution to solve what I'm trying to achieve?
Ryan, you can try using MongoDB's collection.watch(), which fires an event every time an update is made to a collection. You would need to do that within the socket connection event for it to work, though. Something along these lines:
io.sockets.on('connection', function(socket) {
  // when the socket is connected, start listening to MongoDB
  const MongoClient = require("mongodb").MongoClient;
  MongoClient.connect("mongodb://192.168.1.201")
    .then(client => {
      console.log("Connected correctly to server");
      // specify db and collections
      const db = client.db("your_db");
      const collection = db.collection("your_collection");
      const changeStream = collection.watch();
      // start listening to changes
      changeStream.on("change", function(change) {
        console.log(change);
        // this is where you can fire the socket.emit('the_change', change)
      });
    })
    .catch(err => {
      console.error(err);
    });
});
Note that using this approach will require you to set up a replica set. You can follow those instructions or use a Dockerised replica set such as this one.
I would need more details to be sure, but it doesn't sound like a good solution.
If the data you need does not change rapidly, say within seconds, each of your connections is still polling every 5 seconds, and that's wasteful.
In that case you might instead trigger an event where the data gets changed, and then push the message through the sockets that are active.
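Combining both answers, a minimal sketch that opens one connection and one change stream at startup, instead of one per socket, and pushes each change to all connected clients (db/collection names are the same placeholders as above):

const MongoClient = require("mongodb").MongoClient;

MongoClient.connect("mongodb://192.168.1.201")
  .then(client => {
    const collection = client.db("your_db").collection("your_collection");
    // one change stream for the whole process, not one per socket
    collection.watch().on("change", change => {
      io.emit('the_change', change); // push to every connected client
    });
  })
  .catch(err => console.error(err));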

Need to know something regarding socket.io and redis and nginx

My goal is to build a chat application - similar to whatsapp
To my understanding, socket.io is a real-time communication library written in JavaScript, and it is very simple to use. For example:
// Server side
io.on('connection', function(socket) {
  socket.on('chat', function(msg) {
    io.emit('chat', msg);
  });
});

// Client side (using jQuery)
var socket = io();
$('form').submit(function() {
  socket.emit('chat', $('#m').val());
  $('#m').val('');
  return false;
});
socket.on('chat', function(msg) {
  $('#messages').append($('<li>').text(msg));
});
1) Do I always need to start with io.on('connection') to use the real-time feature, or can I just use the socket.on object instead? For example, I have a route:
app.post('/postSomething', function(req, res) {
  // Do I need to start an io.on or socket.on here?
});
because I want the real-time feature to listen only on a specific route.
2) Redis is a data structure store that handles pub/sub; why do we need a pub/sub mechanism?
I read a lot of articles but couldn't grasp the concept. Example article: http://ejosh.co/de/2015/01/node-js-socket-io-and-redis-intermediate-tutorial-server-side/
For example, the code below:
// Do I need redis for this, and if so, why? Is it for caching purposes?
// Where does redis fit in this code?
var redis = require("redis");
var client = redis.createClient();
io.on('connection', function(socket) {
  socket.on('chat', function(msg) {
    io.emit('chat', msg);
  });
});
3) Just wondering why I need nginx to scale a node.js application? I found this Stack Overflow answer:
Strategy to implement a scalable chat server
It says something about load balancing; I read about that online and couldn't grasp the concept either.
So far I have only been dealing with simple node.js and mongoose CRUD applications, but I'm willing to work really hard if you could share some of your knowledge and some useful resources, so that I can deepen my knowledge of all of these technologies.
Cheers!
Q. Socket.on without IO.on
io.on("connection" ... )
This is called when you receive a new connection, while socket.on listens for all the emits from the client side. If you want your client to act as a server for some reason then, in short, yes, io.on is required.
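For the route in your question: you don't start a new io.on inside the route; you reuse the io instance created at startup and emit from the route handler. A minimal sketch (the event name is made up):

app.post('/postSomething', function(req, res) {
  // reuse the io instance created at startup; no io.on needed here
  io.emit('somethingPosted', req.body); // notify all connected clients
  res.sendStatus(200);
});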
Q. Redis pub/sub vs Socket.IO
Take a look at this SO question/answer, quoting:
Redis pub/sub is great in case all clients have direct access to redis. If you have multiple node servers, one can push a message to the others.
But if you also have clients in the browser, you need something else to push data from a server to a client, and in this case, socket.io is great.
Now, if you use socket.io with the Redis store, socket.io will use Redis pub/sub under the hood to propagate messages between servers, and servers will propagate messages to clients.
So using socket.io rooms with socket.io configured with the Redis store is probably the simplest for you.
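For completeness, a minimal sketch of wiring that up with the socket.io-redis adapter (the successor to the Redis store mentioned above; the host and port are assumptions):

var io = require('socket.io')(server);
var redisAdapter = require('socket.io-redis');

// every broadcast / room emit is now propagated between node
// processes through Redis pub/sub under the hood
io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));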
Redis can act as a message queue if that is a requirement. Redis is a datastore supporting many datatypes.
Q. Why Nginx with Node.js
Node.js can work standalone, but nginx is faster at serving static content.
Since nginx is a reverse proxy, servers are configured with nginx to handle all the static work (serving static files, doing redirects, handling SSL certificates and serving error pages), and every other request is passed to node.js.
Check this Quora post as well: Should I host a node.js project without nginx?
Quoting:
Nginx can be used to remove some load from the Node.js processes, for example, serving static files, doing redirects, handling SSL certificates and serving error pages.
You can do everything without Nginx, but it means you have to code it yourself, so why not use a fast and proven solution for this?

why is performance of redis+socket.io better than just socket.io?

I earlier had all my code in a socket.io+node.js server. I recently converted it all to redis+socket.io+node.js after noticing slow performance when too many users sent messages across the server.
The reason socket.io alone was slow, as I understand it, is that it is not multi-threaded, so it handles one request or emit at a time.
What redis does is distribute these requests or emits across channels. Clients subscribe to different channels, and when a message is published on a channel, all the clients subscribed to it receive the message. It does this via this piece of code:
sub.on("message", function (channel, message) {
client.emit("message",message);
});
The client.on('message', function(){}) handler takes it from here, publishing messages to the different channels.
Here is brief code explaining what I am doing with redis:
io.sockets.on('connection', function (client) {
  var pub = redis.createClient();
  var sub = redis.createClient();
  sub.on("message", function (channel, message) {
    client.emit('message', message);
  });
  client.on("message", function (msg) {
    if (msg.type == "chat") {
      pub.publish("channel." + msg.tousername, msg.message);
      pub.publish("channel." + msg.user, msg.message);
    }
    else if (msg.type == "setUsername") {
      sub.subscribe("channel." + msg.user);
    }
  });
});
As redis stores the channel information, we can have different servers publish to the same channel.
What I don't understand is: if sub.on("message") is called every time a message is published, why is redis supposed to give better performance? I assume even the sub.on("message") handler is not multi-threaded.
As you might know, Redis allows you to scale across multiple node instances, so the performance benefit actually comes after the fact. Using the pub/sub method is not faster; it's technically slower, because you have to round-trip through Redis for every pub/sub message. The "better performance" is only really true when you start to scale out horizontally.
For example, say you have one node instance (a simple chat room) that can handle a maximum of 200 active users, and you are not using Redis yet because there is no need. Now, what if you want 400 active users? Using your example above, you can run a second instance and reach that 400-user mark, which is a "performance increase" in the sense that you can now handle more users, but not really a speed increase. Hope this helps!
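To make that concrete, a second Node process can serve its own sockets yet still receive the same messages, because both processes subscribe to the same Redis channels (a sketch using your channel naming; the username is hypothetical):

var redis = require('redis');
var sub = redis.createClient();

sub.on('message', function(channel, message) {
  // find the local socket(s) registered for this channel and emit to them
});
sub.subscribe('channel.some-username');

A message published by either process now reaches users connected to both.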

socketio and redisstore scaling efficiency

I am working on a pretty big project that involves sending data between clients. So I am researching some new technologies, and I thought I'd give Node.js a try. I just have a question about socket.io and redis.
When we are using the pub/sub functions in socket.io, does every client connection create a new connection to redis? Or does socket.io create a maximum of three connections (in total, regardless of the number of clients) to do the pub/sub stuff?
From the source, it seems that each client connection has two associated subscriptions to Redis (this.store in the code), but that each socket.io server has only three connections to Redis (source).
this.store.subscribe('message:' + data.id, function (packet) {
  self.onClientMessage(data.id, packet);
});

this.store.subscribe('disconnect:' + data.id, function (reason) {
  self.onClientDisconnect(data.id, reason);
});
Redis should be able to handle a lot of connections as well as subscriptions, but benchmarking is recommended as always.
