Building a web app to support team collaboration using Socket.io - node.js

I'm building a web application that will allow team collaboration. That is, a user within a team will be able to edit shared data, and their edits should be pushed to other connected team members.
Are Socket.io rooms a reasonable way of achieving this?
i.e. (roughly speaking):
All connected team members will join the same room (dynamically created when the first team member connects).
Any edits received by the server will be broadcast to the room (in addition to being persisted, etc.).
On the client side, any edits received will be used to update the shared data displayed in the browser accordingly.
Obviously it will need to somehow handle simultaneous updates to the same data.
Does this seem like a reasonable approach?
Might I need to consider something more robust, such as having a Redis database hold the shared data during an editing session (with it being 'flushed' to the persistent DB at regular intervals)?

All you need is Socket.IO (with RedisStore) and Express.js. With Socket.IO you can set up rooms and also restrict access to each room to authenticated users only.
Using Redis, your app can scale beyond a single process.
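As a rough sketch of the room flow the question describes, here is a minimal in-memory stand-in for what Socket.IO rooms do. The `RoomHub` class and the fake sockets are hypothetical, for illustration only; in real Socket.IO code the equivalents are `socket.join(room)` and `socket.broadcast.to(room).emit(...)`:

```javascript
// Minimal in-memory stand-in for Socket.IO's room semantics.
// Real code: socket.join(teamId); socket.broadcast.to(teamId).emit('edit', data).
class RoomHub {
  constructor() {
    this.rooms = new Map(); // roomName -> Set of sockets
  }
  join(room, socket) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set()); // room created on first join
    this.rooms.get(room).add(socket);
  }
  leave(room, socket) {
    const members = this.rooms.get(room);
    if (members) {
      members.delete(socket);
      if (members.size === 0) this.rooms.delete(room); // drop empty rooms
    }
  }
  // Broadcast an edit to everyone in the room except the sender.
  broadcast(room, sender, message) {
    for (const socket of this.rooms.get(room) || []) {
      if (socket !== sender) socket.received.push(message);
    }
  }
}

// Fake "sockets" that just record what they receive.
const alice = { received: [] }, bob = { received: [] }, carol = { received: [] };
const hub = new RoomHub();
hub.join('team-42', alice);
hub.join('team-42', bob);
hub.join('team-7', carol);
hub.broadcast('team-42', alice, { field: 'title', value: 'Q3 plan' });
```

In a real app the `received` arrays would be messages pushed over the wire, and simultaneous edits to the same field would still need conflict handling (last-write-wins, locking, or operational transforms) on top of this.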
Useful links for you to read:
Handling Socket.IO, Express and sessions
Scaling Socket.IO
How to reuse redis connection in socket.io?
socket.io chat with private rooms
How to handle user and socket pairs with node.js + redis
Node.js, multi-threading and Socket.io

Related

Working with WebSockets and NodeJs clusters

I currently have a Node server running that works with MongoDB. It handles some HTTP requests, but it largely uses WebSockets. Basically, the server connects multiple users to rooms with WebSockets.
My server currently has around 12k WebSockets open, and it's almost crippling my single-threaded server, so now I'm not sure how to convert it over.
The server holds HashMap variables for the connected users and rooms. When a user does an action, the server often references those HashMaps, so I'm not sure how to use clusters here. I thought about creating a thread for every WebSocket message, but I'm not sure that's the right approach, and those threads would not be able to access the HashMaps for the other users.
Does anyone have any ideas on what to do?
Thank you.
You can look at the socket.io-redis adapter for architectural ideas or you can just decide to use socket.io and the Redis adapter.
They move the equivalent of your HashMap into a separate process: a Redis in-memory database that all clustered processes can access.
The socket.io-redis adapter also supports higher-level functions so that you can emit to every socket in a room with one call and the adapter finds where everyone in the room is connected, contacts that specific cluster server, and has it send the message to them.
I thought about creating a thread for every WebSocket message, but I'm not sure that's the right approach, and those threads would not be able to access the HashMaps for the other users.
Threads in node.js are not lightweight things (each has its own V8 instance), so you will not want a node.js thread for every WebSocket connection. You could group a certain number of WebSocket connections per worker, but at that point it is likely easier to use clustering, because node.js will handle the distribution across the cluster for you automatically, whereas you would have to do that yourself with your own worker pool.

Please validate my approach to Redis and Socket.io in node.js

I am new to node.js so need your help in validating the below mentioned approach.
Problem: I am developing a node.js application which broadcasts messages to people who have subscribed to a specific topic. If a user is logged into the application, either via web or mobile, I want to use socket.io to push new messages as and when they are created. As mentioned, I need to push messages to a selected list of logged-in users based on certain filters; the message is not pushed to everyone logged in, only to the users matching the filter criteria. This is an Express application.
Approach: As soon as a client connects to the server, a socket is established and allocated to a room. The key will be the login name, so if there are further requests from the same login (e.g. multiple browser windows), those sockets will also be added to the same room. The login name and the room with its sockets will be stored in Redis. When a new message is created, internal application logic will determine the users who need to be notified. Fetch only those logins from Redis, along with the room information, and push the message to them. When the sockets are closed, remove the Redis entry for that login. This also needs to be scalable, since I might use node cluster in the future.
I have read a lot about the socket.io and Redis pub/sub approach, but I am not using it above; I am just storing the logins and sockets as key-value pairs.
Can you please let me know if this is a good approach? Will there be any performance/scalability issues? Is there a better way to do this?
Thanks a lot for all your help....
Your Redis model will have to be a little more complicated than that. You'll need to maintain an index using sets, so you can compute intersections to find all users in a given room. You'll then need to use Redis's pub/sub functionality to enable real-time notifications. You'll also need to store messages in indexed sets, then publish to inform your application that a change has been made, sending the new data from the set.
If you could provide an example I can provide some redis commands to better explain how Redis works.
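To make the set idea concrete, here is a sketch of the membership model using plain JS Sets and Maps as stand-ins for Redis sets. The key names are hypothetical; the comments show the real Redis commands each helper mimics:

```javascript
// JS Sets standing in for Redis sets; comments show the real commands.
const store = new Map();
const sadd = (key, member) => {                 // SADD key member
  if (!store.has(key)) store.set(key, new Set());
  store.get(key).add(member);
};
const srem = (key, member) =>                   // SREM key member
  store.has(key) && store.get(key).delete(member);
const sinter = (a, b) =>                        // SINTER a b
  [...(store.get(a) || [])].filter(m => (store.get(b) || new Set()).has(m));

// Index users by room and by subscribed topic.
sadd('room:general', 'user:john');
sadd('room:general', 'user:jane');
sadd('room:general', 'user:mary');
sadd('topic:nodejs', 'user:john');
sadd('topic:nodejs', 'user:mary');

// "Which users in room:general are subscribed to topic:nodejs?"
const targets = sinter('room:general', 'topic:nodejs'); // ['user:john', 'user:mary']

// On disconnect, drop the user from the room's set: SREM room:general user:john
srem('room:general', 'user:john');
```

The intersection is what lets you push only to the filtered audience the question asks about, instead of broadcasting to everyone logged in.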
Update
This is in response to comments below.
Technologies I would use:
Nginx
Socket.io
mranney/node_redis
Redis
Scaling Redis
There are several solutions for scaling Redis. If you need higher concurrency you can scale using master-slave replication. If you need more memory you can set up partitioning, or you can use the Redis Cluster beta (3.0.0). Alternatively you can outsource to one of the many hosted Redis services (RedisGreen, RedisLabs, etc.), though this is best paired with a PaaS provider (AWS, Google Compute Engine, Joyent) so it can be deployed in the same cloud.
Scaling Socket.io
Socket.io can be scaled using Nginx as a load balancer, which is common practice when scaling WebSockets. You can then synchronize each node app (with socket.io) using Redis pub/sub as the messaging layer.
You can SUBSCRIBE to a connections channel to track when a user joins or leaves. Whichever app/server observes the event will PUBLISH connections update or PUBLISH connections "user:john left". If a user leaves, as in the latter example, you must also remember to remove that user from the set representing the room (e.g. generalChat), with SREM generalChat user:john, and publish only once the SREM command's callback fires. Once the PUBLISH is sent, all apps/servers subscribed to the channel receive the message from Redis in real time, notifying them to update. All apps/servers then broadcast to the corresponding room either a new user list (a Redis set) or a command telling the frontend to remove the user.
Basically all your sockets are in sync with Redis, so you can host multiple socket.io servers and use Messaging via Pub/Sub to queue actions across your entire cloud.
Examples
It's not hard to scale socket.io with Redis. Redis may be cumbersome to set up and scale, but it doesn't use much memory, because you manage your own relations and therefore only map the relations you actually need. You can also lease cloud hosting with 8 GB for about $80 a month, which would support higher concurrency than the Big Boy plan from Pusher for less than half the price, and you get persistence as well, so your stack is more uniform and has fewer dependencies.
If you were to use Pusher, you would probably still need a persistent storage medium such as MongoDB, MySQL, or Postgres, whereas with Redis you can rely on it for all your data storage (excluding file storage). Depending on your implementation, that extra hop can create more traffic.
Ex 1
You can use pusher to notify changes and refer to the backend to populate the new/changed data.
Pusher for Messaging
Boiler Plate:
Client <== Socket Opened ==> Pusher
Client === User Left ==> Pusher
All Clients <== User left === Pusher
All Clients === Request New Data ==> Backend <==> Database
All Clients <== Response === Backend
This can create a lot of problems, and you'd have to implement timeouts. This also takes a lot of Pusher connections, which is expensive.
Ex 2
You can connect to Pusher from your backend to save the frontend from handling many requests (probably better for mobile users). This saves Pusher traffic, because it's not sending to hundreds or thousands of clients, just a handful of your backend servers.
This example assumes that you have 4 socket.io servers running.
Pusher for MQ on Backend
Boiler Plate:
Backend 1/2/3/4 <== Socket Opened ==> Pusher
Backend 1 === Remove User from room ==> Database
Backend 1 === User Leaves ==> Pusher
Backend 1/2/3/4 <== User Left === Pusher
Backend 1/2/3/4 === Get Data ==> Database
Backend 1/2/3/4 <== Receive Data === Database
Backend 1/2/3/4 === New Data ==> Room(clients)
Ex 3
You can use Redis as explained above.
Again assuming 4 socket.io servers.
Redis as MQ and datastore
Boiler Plate:
Backend 1/2/3/4 <== Connected ==> Redis
Backend 1/2/3/4 === Subscribe ==> Redis
Backend 1 === User Left ==> Redis (removes user)
Backend 1 === PUBLISH queue that user left ==> Redis
Backend 1/2/3/4 <== User Left Message === Redis
Backend 1/2/3/4 === Get New Data ==> Redis
Backend 1/2/3/4 <== New Data === Redis
Backend 1/2/3/4 === New Data ==> Room(clients)
All of these examples can be improved and optimized significantly, but I won't do that for sake of readability and clarity.
Conclusion
If you know how Redis works, implementing this should be fairly straightforward. If you're learning Redis, you should start a little smaller to get the hang of how it works (it's more than key:value storage). In the end, running Redis would be more cost-effective and efficient, but would take longer to develop. Pusher would be much more expensive, add more dependencies to your stack, and wouldn't be as effective (Pusher runs in a different cloud). The only advantage of using Pusher, or any similar service, is the ease of use of the platform it provides. You're essentially paying a monthly fee for boilerplate code and stack management.
Bottom Line
It would be best to reverse proxy with Nginx regardless of which stack you choose, so you can easily scale.
Redis, Socket.io, Node.js stack would be the best for large scale projects, and professional products. It will keep your operating cost down, and increase your concurrency without dramatically increasing your cost as you scale.
A Redis, Socket.io (optional), Node.js, Pusher, database stack would be best for smaller projects that you don't expect much growth out of. Once you reach 5,000 connections you're forking out $199/mo just for Pusher, and then you have to consider the cost of the rest of your stack. If you connect your backend to Pusher instead, you'll save money and speed up development, but you'll still suffer performance hits from retrieving data from a third-party cloud.

Using Backbone.iobind (socket.io) with a cluster of node.js servers

I'm using Backbone.iobind to bind my client Backbone models over socket.io to the back-end server which in turn store it all to MongoDB.
I'm using socket.io so I can synchronize changes back to other clients Backbone models.
The problems starts when I try to run the same thing over a cluster of node.js servers.
Setting a session store was easy using connect-mongo which stores the session to MongoDB.
But now I can't inform all the clients on every change, since the clients are distributed between the different node.js servers.
The only solution I found is to set a pub/sub queue between the different node.js servers (e.g. mubsub), which seems like a very heavy weight solution that will trigger an event on all the servers for every change.
How did you reach the conclusion that pub/sub is a "very heavy weight solution"?
Sounds like you got it right up until that part :-)
Oh, and pub/sub is not a queue.
Let's examine that claim:
The nice thing about pub/sub is that you publish and subscribe to channels/topics.
So, using the classic chat server example, let's say you have a million users connected in total, but #myroom only has 50 users in it.
When a message is sent to #myroom, it's being published once. No duplication whatsoever.
In most use-cases you won't even need to store it on disk/RAM, so we're mostly looking at network/bandwidth here. And, I mean, you're probably throwing more data (probably over the wire?) to MongoDB already, so I assume that's not your bottleneck.
If you also use socket.io's rooms feature (which is basically its own pub/sub mechanism), that means only those 50 users will have that message emitted to them over the websocket.
And no, socket.io won't iterate over 1M clients to find out which of them are in room #myroom ;-)
So the message is published once, each subscriber (node.js instance) will get notified once, and only the relevant clients -- socket.io won't waste CPU cycles in order to find them as it keeps track of them when they join() or leave() a room -- will receive the message.
Doesn't that sound pretty efficient and light-weight?
Give Redis a shot.
It's really simple to set up, runs entirely in memory, is blazing-fast, replication is extremely simple, etc.
That's the way socket.io recommends passing events between nodes.
You can find more information/code here.
Additionally, if MongoDB can't handle the load at any point, you can use Redis as your session-store as well.
Hope this helps!

Switching from stored messages to real time chatting in node and express

I am new to server-side development. I'm trying to learn by doing so I'm building an application with express on the server, mongodb as my database and angularjs with twitter bootstrap on the client-side.
I don't know if this is the most practical way, but when thinking about how to implement messaging between users, I thought of a MongoDB model called Conversation, with an id, an array of the ids of every user in the conversation, and another array of strings corresponding to the messages. I would then add this model to my REST API.
But let's say all or some of the users in the conversation are online; why not benefit from socket.io? So how can I switch from this to real-time chat? Does the interaction with MongoDB occur exactly as described, with socket.io just notifying every online user that an interaction has occurred? If yes, how? Or is it something else?
Socket.io can send real-time events to connected sockets; you can use a database for storing messages that fail to be delivered and messages for offline users.
Also, you might want to use something like Redis for this, as it has channels with subscribe and publish capabilities.
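A sketch of that delivery split: send over the socket when the user is online, otherwise queue the message for later. The `deliver`/`connect` helpers are hypothetical names, and the Maps are stand-ins for whatever store you pick (a Redis LPUSH/LRANGE queue, or a MongoDB collection, would replace them in real code):

```javascript
// online: userId -> fake socket; offline messages are queued per user.
const online = new Map();
const pending = new Map(); // stand-in for a per-user queue in Redis/MongoDB

function deliver(userId, message) {
  const socket = online.get(userId);
  if (socket) {
    socket.inbox.push(message);          // real code: socket.emit('message', message)
  } else {
    if (!pending.has(userId)) pending.set(userId, []);
    pending.get(userId).push(message);   // real code: LPUSH queue:<userId> message
  }
}

// On reconnect, drain the queue, then mark the user online.
function connect(userId, socket) {
  for (const msg of pending.get(userId) || []) socket.inbox.push(msg);
  pending.delete(userId);                // real code: DEL queue:<userId> after reading
  online.set(userId, socket);
}

deliver('bob', 'hi while offline');      // bob is offline, so this gets queued
const bobSocket = { inbox: [] };
connect('bob', bobSocket);               // queue drains into bob's socket
deliver('bob', 'hi while online');       // now delivered immediately
```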

Retrieve Socket.io Client from Redis

I'm building a real time data system that allows an Apache/PHP server to send data to my Node.js server, which will then immediately send that data to the associated client via socket.io. So the Apache/PHP server makes a request that includes the data, as well as a user token that tells Node.js which user to send the data to.
Right now this is working fine - I've got an associative array that ties the user's socket.io connection to their user token. The problem is that I need to start scaling this to multiple servers. Naturally, with the default configs of socket.io I can't share connections between node workers.
The solution I had in mind was to use the RedisStore functionality, and just have each of my workers looking at the same Redis store. I've been doing research and there's a lot of documentation on how to use pub/sub functionality for broadcasting messages to large groups (rooms). That's fine, but I need to be able to send messages to a single client, so I need some way to retrieve a user's socket.io connection from the RedisStore.
The only way I can think to do this right now is to create a ton of 'rooms' named with the user's token, and only have one user in each room. Then I could just emit to that room. However, that seems very inefficient.
Is there a better way that I can retrieve user's unique socket.io connections from Redis?
Once a socket connection is made to a server running the node server, it is connected to that instance.
So it seems you need a way for your PHP server to know which node server a client is connected to.
In your Redis store you could store the id of the server as the value, keyed by the client id. Then PHP looks up which node server to use and makes the request to it.
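A sketch of that lookup, with a Map standing in for Redis string keys. The key naming and helper names are hypothetical; in Redis this would be plain SET/GET/DEL on something like server:&lt;token&gt;:

```javascript
// Map standing in for Redis: token -> id of the node server holding the socket.
const redis = new Map();

// On socket connect, the node worker records itself:  SET server:<token> <serverId>
function onConnect(userToken, serverId) {
  redis.set(`server:${userToken}`, serverId);
}

// On disconnect:  DEL server:<token>
function onDisconnect(userToken) {
  redis.delete(`server:${userToken}`);
}

// PHP side: GET server:<token>, then POST the payload to that node server,
// which emits it to the single matching socket -- no per-user rooms needed.
function routeFor(userToken) {
  return redis.get(`server:${userToken}`) || null;
}

onConnect('abc123', 'node-2');
```

This keeps one small key per connected user instead of one room per user, and the PHP server only ever talks to the one node instance that owns the socket.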
