Suppose you want to scale your web application that relies on websocket technology (e.g Socket.io library). You have multiple servers and you are using a shared database (e.g Redis) to have a communication with these servers.
In this case you store socket ids of each socket connection in that db. My question is:
Is it possible that two users that are connected to two different
servers get the same socket.id so that you couldn't differentiate them?
In this case, if you want to notify a specific user, you will do it for another user too!
How it is possible?
If it is possible how people solve this problem in real world use cases ?
Is there any trick in programming or in the design ?
EDIT
I want to emphasize on distributed environments.
Related
I currently have a Node server running that works with MongoDB. It handles some HTTP requests, but it largely used WebSockets. Basically, the server connects multiple users to rooms with WebSockets.
My server currently has around 12k WebSockets open and it's almost crippling my single threaded server, and now I'm not sure how to convert it over.
The server holds HashMap variables for the connected users and rooms. When a user does an action, the server often references those HashMap variables. So, I'm not sure how to use clusters in this. I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users
Does anyone have any ideas on what to do?
Thank you.
You can look at the socket.io-redis adapter for architectural ideas or you can just decide to use socket.io and the Redis adapter.
They move the equivalent of your hashmap to a separate process redis in-memory database so all clustered processes can get access to it.
The socket.io-redis adapter also supports higher-level functions so that you can emit to every socket in a room with one call and the adapter finds where everyone in the room is connected, contacts that specific cluster server, and has it send the message to them.
I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users
Threads in node.js are not lightweight things (each has its own V8 instance) so you will not want a nodejs thread for every WebSocket connection. You could group a certain number of WebSocket connections on a web worker, but at that point, it is likely easier to use clustering because nodejs will handle the distribution across the clusters for you automatically whereas you'll have to do that yourself for your own web worker pool.
Disclaimer: I'm new to node.js so I am sorry if this is a weird question :)
I have a node.js using express.js to serv a REST-API. The data served by the REST-API is fetched from a nosql database by the node.js app. All clients only use HTTP-GET. There is one exception though: Data is PUT and DELETEd from the master database (a relational database on another server).
The thought for this setup is of course to let the 'node.js/nosql database' server(s) be a public front end and thereby protecting the master database from heavy traffic.
Potentially a number of different client applications will use the REST-API, but mainly it will be used by a client app with a long lifetime (typically 0.5 to 2 hours). Instead of letting this app constantly polling the REST-API for possible new data I want to use websockets so that data is only sent to client when there is any new data. I will use a node.js app for this and probably socket.io so that it could fall back to api-polling if websockets are not supported by the client. New data should be sent to clients each time the master database PUTs or DELETEs objects in the nosql database.
The question is if I should use one node.js for both the API and the websockets or one for the API and one for the websockets.
Things to consider:
- Performance: The app(s) will be hosted on a cluster of servers with a load balancer and a HTTP accelerator in front. Would one app handling everything perform better than two apps with distinct tasks?
- Traffic between app: If I choose a two app solution the api app that receives PUTs and DELETEs from the master database will have to notice the websocket app every time it receives new data (or the master database will have to notice both apps). Could the doubled traffic be a performance issue?
- Code cleanlines: I believe two apps will result in cleaner and better code, but then again there will surely be some common code for both apps which will lead to having two copies it.
As to how heavy the load can be it is very difficult to say, but a possible peak can involve:
50000 clients
each listening to up to 5 different channels
new data being sent from master each 5th second
new data should be sent to approximately 25% of the clients (for some data it should be sent to all clients and other data probably below 1% of the clients)
UPDATE:
Thanks for the answers guys. More food for thoughts here. I have decided to have two node.js apps, one for the REST-API and one for web sockets. The reason is that I belive it will be easier to scale them. To begin with the whole system will be hosted on three physical servers and one node.js app for the REST-API on each server should bu sufficient, but for the websocket app there probably needs to several instances of it on each physical server.
This is a very good question.
If you are looking at a legacy system, and you already have a REST interface defined, there is not a lot of advantages to adding WebSockets. Things that may point you to WebSockets would be:
a demand for server-to-client or client-to-client real-time data
a need to integrate with server-components using a classic bi-directional protocol (e.g. you want to write an FTP or sendmail client in javascript).
If you are starting a new project, I would try to have a hard split in the project between:
the serving of static content (images, js, css) using HTTP (that was what it was designed for) and
the serving of dynamic content (real-time data) using WebSockets (load-balanced, subscription/messaging based, automatic reconnect enabled to handle network blips).
So, why should we try to have a hard separation? Let's consider the advantages of a HTTP-based REST protocol.
The use of the HTTP protocol for REST semantics is an invention that has certain advantages
Stateless Interactions: none of the client's context is to be stored on the server side between the requests.
Cacheable: Clients can cache the responses.
Layered System: undetectability of intermediaries
Easy testing: it's easy to use curl to test an HTTP-based protocol
On the other hand...
The use of a messaging protocol (e.g. AMQP, JMS/STOMP) on top of WebSockets does not preclude any of these advantages.
WebSockets can be transparently load-balanced, messages and state can be cached, efficient stateful or stateless interactions can be defined.
A basic reactive analysis style can define which events trigger which messages between the client and the server.
Key additional advantages are:
a WebSocket is intended to be a long-term persistent connection, usable for multiple different messaging purpose over a single connection
a WebSocket connection allows for full bi-directional communication, allowing data to be sent in either direction in sympathy with network characteristics.
one can use connection offloading to share subscriptions to common topics using intermediaries. This means with very few connections to a core message broker, you can serve millions of connected users efficiently at scale.
monitoring and testing can be implemented with an admin interface to send/recieve messages (provided with all message brokers).
the cost of all this is that one needs to deal with re-establishment of state when the WebSocket needs to reconnect after being dropped. Many protocol designers build in the notion of a "sync" message to provide context from the server to the client.
Either way, your model object could be the same whether you use REST or WebSockets, but that might mean you are still thinking too much in terms of request-response rather than publish/subscribe.
The first thing you must think about, is how you're going to scale the servers and manage their state. With a REST API this is largely straightforward, as they are for the most part stateless, and every load balancer knows how to proxy http requests. Hence, REST APIs can be scaled horizontally, leaving the few bits of state to the persistence layer (database) to deal with. With websockets, often times its a different matter. You need to research what load balancer you're going to use (if its a cloud deployment, often times it depends on the cloud provider). Then figure out what type of websocket support or configuration the load balancer will need. Then depending on your application, you need to figure out how to manage the state of your websocket connections across the cluster. Think about the different use cases, e.g. if a websocket event on one server alters the state of the data, will you need to propagate this change to a different user on a different connection? If the answer is yes, then you'll probably need something like Redis to manage your ws connections and communicate changes between the servers.
As for performance, at the end of the day its still just HTTP connections, so I doubt there will be a big difference in separating the server functionality. However, I think two servers would go a big way in improving code cleanliness, as long as you have another 'core' module to isolate code common to both servers.
Personally I would do them together, this is because you can share the models and most of the code between the REST and the WS.
At the end of the day what Yuri said in his answer is correct, but is not so much work to load balance WS any way, everyone does it nowadays. The approach I took is have REST for everything and then create some WS "endpoints" for subscribing for realtime data server-client.
So for what I understood, your client would just get notifications from the server, with updates, so definitely I would go with WS. You subscribe to some events and then you get new results when there are. Keep asking with HTTP calls is not the best way.
We had this need and basically built a small framework around this idea http://devalien.github.io/Axolot/
Basically you can understand our approach in the controller (this is just an example, in our real world app we have subscriptions so we can notify when we have new data or when we finish a procedure). In actions there are the rest endpoints and in sockets the websockets endpoints.
module.exports = {
model: 'user', // We are attaching the user to the model, so CRUD operations are there (good for dev purposes)
path: '/user', // Tthis is the end point
actions: {
'get /': [
function (req, res) {
var query = {};
Model.user.find(query).then(function(user) { // Find from the User Model declared above
res.send(user);
}).catch(function (err){
res.send(400, err);
});
}],
},
sockets: {
getSingle: function(userId, cb) { // This one is callable from socket.io using "user:getSingle
Model.user.findOne(userId).then(function(user) {
cb(user)
}).catch(function (err){
cb({error: err})
});
}
}
};
I am creating a TCP based game server for iOS, it involves registration and login.
Users will be stored as a collection in MongoDB.
when login is done, I generate a unique session id - How ?
I wanted to know what all data remains with node server and what can be stored in db.
data like session tokens, or collection of sockets if I am maintaining a persistent connection etc.
Node.JS does not have any sessions by default. In fact, there is a plug-in for managing sessions with MongoDB.
It's not clear that you really need sessions however. If you're opening a direct socket with socket.io, that is a defacto session.
Node.js itself does not manage sessions for you. It simply exposes an API to underlying unix facilities for Socket communication. HTTP in it self is a stateless protocol and does not have sessions either. SSH on the other hand is a stateful protocol, but I do not think either one would be good for you.
Creating a uniuqe ID is really simple, all you need to do is hash some data about the user. Their SHA(IP address + time + username). See: http://nodejs.org/api/crypto.html
One approach a lot of applications take is to create their own protocol and send messages using that. You will have to handle a lot of cases with that. And I myself have never dealt with mobile where you have serious connectivity challenges and caching requirements that are not a big problem on desktops.
To solve these problem, founder of Scribd started a company called Parse which should make it much easier for your to do things. Have a look at their website: https://parse.com/.
If you want to do some authentication however, have a look at Everyauth, it provides a lot of that for you. You can find it here: https://github.com/bnoguchi/everyauth/.
I am investigating nodejs/socket.io for real time chat, and I need some advice for implementing rooms.
Which is better, using namespace or using the room feature to completely isolate grops of chatters from each other?
what is the real technical difference between rooms and namespace?
Is there any resource usage difference?
This is what namespaces and rooms have in common (socket.io v0.9.8 - please note that v1.0 involved a complete rewrite, so things might have changed):
Both namespaces (io.of('/nsp')) and rooms (socket.join('room')) are created on the server side
Multiple namespaces and multiple rooms share the same (WebSocket) connection
The server will transmit messages over the wire only to those clients that connected to / joined a nsp / room, i.e. it's not just client-side filtering
The differences:
namespaces are connected to by the client using io.connect(urlAndNsp) (the client will be added to that namespace only if it already exists on the server)
rooms can be joined only on the server side (although creating an API on the server side to enable clients to join is straightforward)
namespaces can be authorization protected
authorization is not available with rooms, but custom authorization could be added to the aforementioned, easy-to-create API on the server, in case one is bent on using rooms
rooms are part of a namespace (defaulting to the 'global' namespace)
namespaces are always rooted in the global scope
To not confuse the concept with the name (room or namespace), I'll use compartment to refer to the concept, and the other two names for the implementations of the concept. So if you
need per-compartment authorization, namespaces might be the easiest route to take
if you want hierarchically layered compartments (2 layers max), use a namespace/room combo
if your client-side app consists of different parts that (do not themselves care about compartments but) need to be separated from each other, use namespaces.
An example for the latter would be a large client app where different modules, perhaps developed separately (e.g. third-party), each using socket.io independently, are being used in the same app and want to share a single network connection.
Not having actually benchmarked this, it seems to me if you just need simple compartments in your project to separate and group messages, either one is fine.
Not sure if that answers your question, but the research leading up to this answer at least helped me see clearer.
It's an old question but after doing some research on the topic I find that the accepted answer is not clear on an important point. According to Guillermo Rauch himself (see link):
although it is theoretically possible to create namespaces dynamically on a running app you use them mainly as predefined separate sections of you application. If, on the other hand you need to create ad hoc compartments, on the fly, to accommodate groups of users/connections, it is best to use rooms.
It depends what you wanna do.
The main difference is that rooms are harder to implement.
You must make a method for join the rooms with each page reload.
With namespaces you just need to write var example = io.connect('http://localhost/example'); in your javascript client and client are automatically added in the namespaces.
Example of utilization:
rooms: private chat.
namespaces: the chat of the page.
Rooms and namespaces segment communication and group individual sockets.
A broadcast to a room or to a namespace will not reach everyone just the members.
The difference between namespaces and rooms is the following:
Namespaces: are managed in the frontend meaning the user, or an attacker, joins through the frontend and the joining and disconnecting is managed here.
Rooms: are managed in the backend, meaning the server assigns joining and leaving rooms.
The difference is mainly who manages them
To decide what to use you must decide if the segmentation should be managed in the frontend or in the backend
There can be rooms within namespaces, which helps to organize the code but there cannot be namespaces inside of rooms. So namespace is a top level segmentation and rooms is a lower level one.
Namespaces allow you to create objects with the same name, but they would be separate as they will live in different namespaces, otherwise known as scopes.
This is the same thought process you should have with Socket.IO namespaces. If you are building a modular Node web application, you will want to namespace out the different modules. If you look back at our namespace code, you will see that we were able to listen for the same exact events in different namespaces. In Socket.IO, the connection event on the default connection and connection event on a /xxx namespace are different. For example, if you had a chat and comment system on your site and wanted both to be real time, you could namespace each. This allows you to build an entire Socket.IO application that lives only in its own context.
This would also be true if you were building something to be packaged and installed. You cannot know if someone is already using certain events in the default namespace, so you should create your own and listen there. This allows you to not step on the toes of any developer who uses your package.
Namespaces allow us to carve up connections into different contexts. We can compare this to rooms, which allow us to group connections together.We can then have the same connection join other rooms, as well.
Namespaces allow you to create different contexts for Socket.IO to work in. Rooms allow you to group client connections inside of those contexts.
I'm building a web application that will allow team collaboration. That is, a user within a team will be able to edit shared data, and their edits should be pushed to other connected team members.
Are Socket.io rooms a reasonable way of achieving this?
i.e. (roughly speaking):
All connected team members will join the same room (dynamically created upon first team member connecting).
Any edits received by the
server will be broadcast to the room (in addition to being persisted,
etc).
On the client-side, any edits received will be used to update
the shared data displayed in the browser accordingly.
Obviously it will need to somehow handle simultaneous updates to the same data.
Does this seem like a reasonable approach?
Might I need to consider something more robust, such as having a Redis database to hold the shared data during an editing session (with it being 'flushed' to the persistant DB at regular intervals)?
All you need is Socket.IO (with RedisStore) and Express.js. With Socket.IO you can setup rooms and also limit the access per room to only users who are auth.
Using Redis you can make your app scale outside a process.
Useful links for you to read:
Handling Socket.IO, Express and sessions
Scaling Socket.IO
How to reuse redis connection in socket.io?
socket.io chat with private rooms
How to handle user and socket pairs with node.js + redis
Node.js, multi-threading and Socket.io