Retrieve Socket.io Client from Redis - node.js

I'm building a real-time data system that allows an Apache/PHP server to send data to my Node.js server, which then immediately sends that data to the associated client via socket.io. So the Apache/PHP server makes a request that includes the data, as well as a user token that tells Node.js which user to send the data to.
Right now this is working fine - I've got an associative array that ties the user's socket.io connection to their user token. The problem is that I need to start scaling this to multiple servers. Naturally, with the default configs of socket.io I can't share connections between node workers.
The solution I had in mind was to use the RedisStore functionality, and just have each of my workers looking at the same Redis store. I've been doing research and there's a lot of documentation on how to use pub/sub functionality for broadcasting messages to large groups (rooms). That's fine, but I need to be able to send messages to a single client, so I need some way to retrieve a user's socket.io connection from the RedisStore.
The only way I can think to do this right now is to create a ton of 'rooms' named with the user's token, and only have one user in each room. Then I could just emit to that room. However, that seems very inefficient.
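For clarity, that room-per-user workaround would look something like this (a sketch with the old socket.io API; the 'register' event name is invented):

var io = require('socket.io').listen(8080);

io.sockets.on('connection', function (socket) {
  socket.on('register', function (userToken) {
    socket.join(userToken); // a "room" that holds exactly this one user
  });
});

// later, in the handler the PHP server hits:
// io.sockets.in(userToken).emit('data', payload);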
Is there a better way that I can retrieve a user's unique socket.io connection from Redis?

Once a socket connection is made to one of your Node.js instances, it stays connected to that instance.
So it seems you need a way for your PHP server to know which node server a client is connected to.
In your redis store you could simply store the server's id as the value, keyed by the client id. Then PHP looks up which node server to use and makes the request to it.
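A minimal sketch of that bookkeeping on the Node side, using the node_redis client; SERVER_ID and the 'register' event are invented names:

var redis = require('redis').createClient();
var io = require('socket.io').listen(8080);

// hypothetical unique id you assign to each worker, e.g. via an env var
var SERVER_ID = process.env.SERVER_ID || 'node-1';

var sockets = {}; // local token -> socket map, as in the single-server setup

io.sockets.on('connection', function (socket) {
  socket.on('register', function (userToken) {
    sockets[userToken] = socket;
    socket.userToken = userToken;
    // record which worker owns this user so PHP can route its request here
    redis.set('user-server:' + userToken, SERVER_ID);
  });

  socket.on('disconnect', function () {
    // drop the mapping so PHP doesn't route to a dead connection
    if (socket.userToken) {
      delete sockets[socket.userToken];
      redis.del('user-server:' + socket.userToken);
    }
  });
});

PHP then does a GET on user-server:<token> and sends the payload to that worker, which emits it over the local socket.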

Related

How to use socket.io properly with express app

I wonder how to use socket.io properly with my express app.
I have a REST API written in express/node.js and I want to use socket.io to add real-time features to my app. Consider that I want to do something I could do just by sending a request to my REST API. What should I do with socket.io? Should I send a request to the REST API and send the socket.io client the result of the process, or handle the whole process within a socket.io handler and then send the result to the socket.io client?
Thanks in advance.
The question is not that clear, but what I'm getting from it is that you want to know what you would use socket.io for that you can't already do with your current API?
The short answer is, well, nothing really. WebSockets are just the natural progression of APIs and the need for a more 'real-time' interface between systems.
An older method (still used, and relevant for the right use case) is long polling, where you keep checking back with the server for updated items and grab them if there are any. This works, but it can be expensive in terms of establishing a connection, performing a lookup, then closing the connection, over and over.
WebSockets keep that connection open, allowing both the client and server to communicate in real time. So, for example, let's say you make an update to your backend data and want users to get that update. Using long polling you would rely on each client to ping back to the server, check if there is an update, and if so grab it. This can cause lag between updates: some users have updated data while others do not, etc.
Now take the same scenario with websockets: you make an update to the backend data, hit submit, and this emits to your socket server. The socket server takes the call, performs the task (grabs the updated data) and emits it to the users; each connected user instantly gets that update.
Socket servers are typically used for things like real-time chat or polling, where packets are smaller, but they are also used for web games and so on. The size of your payloads will determine how best to send data back and forth, because the larger the payload the more resources / bandwidth it takes on the socket server, so it's something to consider.
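A sketch of that push-on-update flow; the route and event names are invented, and this assumes the old socket.io .listen() API:

var app = require('express')();
var server = require('http').createServer(app);
var io = require('socket.io').listen(server);

// the REST endpoint your client already uses to change data
app.post('/items/:id/update', function (req, res) {
  // ... persist the change to your database here ...
  var updated = { id: req.params.id, updatedAt: Date.now() };

  // then push the fresh data to every connected client, no polling needed
  io.sockets.emit('item:updated', updated);
  res.send(updated);
});

server.listen(3000);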

how to access socket session in all clusters

I am working on setting up socket.io in cluster mode using PM2.
I am using the socket.io-redis package and it works fine in cluster mode.
But the problem arises when I want to access all connected sockets, because in cluster mode processes don't know about socket connections in other processes.
I thought socket.io-redis keeps track of all the connected sockets and all their session info, but it doesn't.
Is there any way or solution to access all the socket connections existing across all processes in socket.io/Node.js?
Socket.io-redis does keep track, in a sense.
From their docs
"The Redis adapter extends the broadcast function of the in-memory adapter: the packet is also published to a Redis channel (see below for the format of the channel name).
Each Socket.IO server receives this packet and broadcasts it to its own list of connected sockets."
So basically, redis is used as the broker to tell each socket server to emit on a given channel, allowing a socket.io server to work in cluster mode. But as you have mentioned, it falls short when you need to keep track of things beyond just an emit.
So where does this leave us? Well, you can use custom hooks via socket.io-redis, but I personally found them really difficult to understand and use, and had limited success. I think with the new versions of socket.io and socket.io-redis there were some tweaks to make this simpler, however I have not tried them.
Instead, what we do is use redis hset and hget to store the socket along with an ID for each user; then when we want to know who is online we can query redis for the list of online users, or the users in a specific room, etc.
What you will want to do is add the redis package and connect to it, in addition to the regular pub/sub.
Then, when a user joins a room (or your server, for that matter) you do an hset. On the first join, ours looks something like this:
redis.hset([collection name], [field], [value])
So in code it looks like:
redis.hset(decoded.cID, "socket-" + socket.id, socket.nickname)
This sets a value in redis: the collection name is the key (for us it's the unique id of the channel), then we store 'socket-' plus the socket.id as the field, along with a nickname as the value. That value is the user's ID, or 'anonymous' if they are not logged in.
Then, when we want to grab who is in a room, we use the hgetall command:
redis.hgetall([collection name], function (err, results) { ... })
So inside of, say, an emit handler, we call redis.hgetall to get all items inside the specific collection we pass in, and send that back to all connected users.
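Putting those pieces together, a condensed sketch; decoded.cID comes from the description above, and the 'join' / 'members' event names are invented:

var redis = require('redis').createClient();
var io = require('socket.io').listen(3000);

io.sockets.on('connection', function (socket) {
  socket.on('join', function (decoded) {
    // field: "socket-<id>", value: the user's nickname (or "anonymous")
    redis.hset(decoded.cID, 'socket-' + socket.id, socket.nickname || 'anonymous');
    socket.join(decoded.cID);

    // reply with everyone currently tracked for this channel
    redis.hgetall(decoded.cID, function (err, members) {
      socket.emit('members', members || {});
    });

    // remove the hash entry when this socket goes away
    socket.on('disconnect', function () {
      redis.hdel(decoded.cID, 'socket-' + socket.id);
    });
  });
});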

How to authenticate with express and socket.io with TOKENS and not with COOKIES

I'm building a realtime mobile app (native) and I'm interested in starting the app from a user login screen and then moving on.
I figured I need Express + Primus with socket.io (or SockJS) + passport.socketio + redis (not 100% sure I need redis yet) to build my backend.
I even found this step by step tutorial, which is really helpful; it takes me step by step through making a secure API.
My question is a double one:
How can I tweak this example to use TOKENS instead of cookies (since I'm building a native mobile app and not a browser web app)? It's also more secure, according to this.
How to bind express with socket.io - in other words, how does socket.io get to know if the user is authenticated or not?
I welcome any comment or advice.
Thank you.
First, I would use a different websocket library instead of socket.io. The socket.io developers are currently working on engine.io and socket.io appears to not be very actively maintained. I've experienced many of the issues described in the following links and since moving to sockjs have not had any problems.
http://www.quora.com/Sock-js/What-are-the-pros-and-cons-of-socket-io-vs-sockjs?share=1
https://github.com/LearnBoost/socket.io/issues
https://github.com/ether/etherpad-lite/issues/1798
http://baudehlo.com/2013/05/07/sockjs-multiple-channels-and-why-i-dumped-socket-io/
You may have to implement your own custom events on top of sockjs, but that's pretty trivial. Since it sounds like you're already using redis then implementing rooms and pub/sub should be pretty easy too.
Here's how we do our token based socket authentication.
First the client makes an HTTP request to the server to request a token. This routes the request through express' middleware and gives you easy access to the session data. This is where you would interact with passport to access their account or session data. If the user is logged in then generate a UUID and store their session data in redis as a key/value pair where the key is the UUID and the value is their stringified session/account data. Only send the UUID back to the client.
When the client first creates a websocket connection set a flag on the socket that marks it as unauthenticated.
When a message comes in on a socket check to see if the socket is authenticated. If not then check for a token in the message. If it doesn't exist then kill the connection. If it does then query redis for the key/value pair keyed by the token. If redis returns any data then you now have the session data for that user and can attach it to the socket and mark the socket as authenticated. If there's nothing in redis keyed by the token then kill the connection.
Now when you perform any operations on a socket you should have access to the session data for that user.
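A rough sketch of that flow with sockjs and node_redis; the /socket-token route, the one-hour expiry, and the message shape are all assumptions:

var express = require('express');
var sockjs = require('sockjs');
var crypto = require('crypto');
var redis = require('redis').createClient();

var app = express();

// step 1: a logged-in client asks for a token over plain HTTP
app.get('/socket-token', function (req, res) {
  if (!req.user) return res.status(401).end(); // passport populates req.user
  var token = crypto.randomBytes(16).toString('hex');
  // stash the session data keyed by the token, with a short expiry
  redis.setex('sock:' + token, 3600, JSON.stringify({ userId: req.user.id }));
  res.send({ token: token });
});

var sock = sockjs.createServer();
sock.on('connection', function (conn) {
  var session = null; // step 2: connections start out unauthenticated

  conn.on('data', function (raw) {
    var msg;
    try { msg = JSON.parse(raw); } catch (e) { return conn.close(); }
    if (session) return handle(conn, session, msg);

    // step 3: the first message must carry the token, or we kill the connection
    if (!msg.token) return conn.close();
    redis.get('sock:' + msg.token, function (err, data) {
      if (err || !data) return conn.close();
      session = JSON.parse(data); // the socket is now authenticated
    });
  });
});

function handle(conn, session, msg) { /* application messages go here */ }

var server = require('http').createServer(app);
sock.installHandlers(server, { prefix: '/ws' });
server.listen(3000);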

how to send a message to individual clients with socket.io with multiple server processes?

I'm about to begin with socket.io and this is more of a theoretical question.
Let's say that I want to send a message to a specific user with socket.io.
Normally I would store the socket id with the relevant user id, and when sending, look up the socket id and send to it.
But what if I have multiple server processes running? I'll have to make sure the correct server, the one the client is actually connected to, does the sending. Is that possible?
For multiple server instances, you need a caching service (memcache, redis) for authentication and a central message queue service (stormMQ, rabbitMQ, AQ, a java-based MQ) to which all your node instances bind. A Node instance publishes messages for each client / channel / whatever to the queue, and the other bound Node instances receive the messages and forward them to their connected clients.
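For illustration, a minimal sketch of that pattern using Redis pub/sub as the broker (the 'messages' channel and message shape are invented); each instance forwards only to sockets it actually holds:

var redis = require('redis');
var sub = redis.createClient(); // node_redis needs a dedicated subscriber client
var pub = redis.createClient();

var localSockets = {}; // userId -> socket, for users connected to THIS instance

sub.subscribe('messages');
sub.on('message', function (channel, raw) {
  var msg = JSON.parse(raw);
  var socket = localSockets[msg.userId];
  if (socket) socket.emit('message', msg.body); // only the owning instance emits
});

// any instance can call this, whichever one the sender happens to be on
function sendToUser(userId, body) {
  pub.publish('messages', JSON.stringify({ userId: userId, body: body }));
}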
The problem is typically about how to play with a WebSocket cluster:
Several front-end servers which will be in charge of handling bidirectional connections with each client. They form the WebSocket cluster.
Several back-end servers which will be in charge of handling the business logic of your application.
Each time the back-end wants to inform the client, it will send a request to the WebSocket cluster which has the responsibility to communicate with the client.
A possible scenario:
Identify each WebSocket cluster's server with a unique id.
Identify each client with a unique id.
Each time a client connects to one of your WebSocket cluster's servers, store its unique id along with the server's unique id in a distributed key/value database.
Thus you know which client is connected to which server.
The next time your back-end application wants to notify a client there are two possibilities:
The pair (clientId, serverId) is not present in the database and you cannot inform the client.
The pair (clientId, serverId) is present in the database; then you have to ask the server identified by serverId to notify the client identified by clientId.
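A sketch of that bookkeeping, using Redis for illustration even though any distributed key/value store (see the notes below) will do; the 'identify' event, the key prefix, and SERVER_ID are invented:

var redis = require('redis').createClient();
var io = require('socket.io').listen(8080);

var SERVER_ID = process.env.SERVER_ID || 'ws-1'; // unique per cluster member

io.sockets.on('connection', function (socket) {
  // the client announces who it is; store the (clientId, serverId) pair
  socket.on('identify', function (clientId) {
    socket.clientId = clientId;
    redis.set('client:' + clientId, SERVER_ID);
  });

  // remove the pair so the back-end knows the client is unreachable
  socket.on('disconnect', function () {
    if (socket.clientId) redis.del('client:' + socket.clientId);
  });
});

// on the back-end side, resolve the pair before notifying:
// redis.get('client:' + clientId, function (err, serverId) {
//   if (!serverId) return;  // no pair: the client is offline
//   // otherwise ask the server named serverId to notify clientId
// });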
Notes:
Each WebSocket cluster's server can run a node.js instance supercharged with socket.io. It has to provide a route which takes the clientId as a parameter and uses socket.io to notify that client. Indeed, socket.io is aware of which client is using which socket on this server.
Every time one of your servers crashes, you have to clean your database and remove all pairs which contain that server's id.
Deploying a WebSocket cluster can be tedious, so there are commercial offers like Kaazing.
A good distributed key/value database is Riak. It is better suited than Redis or Memcached for the above purpose because it can be easily distributed within a data-center and over several data-centers.

RESTful backend and socket.io to sync

Today I had the idea for the following setup. Create a node.js server with express and socket.io. With express, I would create a RESTful API connected to a MongoDB. BackboneJS or similar would connect the client to that REST API.
Now every time the data in MongoDB (i.e. the data in it I am interested in) changes, socket.io would fire an event to the client carrying a cursor to the data that has changed. The client would then trigger the appropriate AJAX requests to the REST API to get the new data where it needs it.
So the socket.io connection would behave like a synchronization trigger. It would be there for the entire visit and could also manage sessions that way. All the payload would be sent over HTTP.
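A sketch of that trigger, with invented event and route names:

// server side (io is your socket.io server): after a write to mongo,
// ping clients with a pointer to the changed resource, not the data itself
io.sockets.emit('changed', { resource: '/api/items/42' });

// client side (Backbone or plain jQuery): re-fetch only what the pointer names
var socket = io.connect('http://localhost');
socket.on('changed', function (evt) {
  $.getJSON(evt.resource, function (data) { /* merge into local state */ });
});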
Pros:
REST API for use with clients other than the web.
Auth could be done entirely over socket.io, only sending a token along with REST requests.
Use the benefits of REST.
Would also play nicely with a pub/sub service like Redis's.
Cons:
Greater overhead than using pure socket.io.
What do you think? Are there any great disadvantages I did not think of?
I agree with #CharlieKey, you should send the updated data rather than re-requesting.
This is exactly what Tower is doing:
save some data: https://github.com/viatropos/tower/blob/development/src/tower/model/persistence.coffee#L77
insert into mongodb (cursor is a query/persistence abstraction): https://github.com/viatropos/tower/blob/development/src/tower/model/cursor/persistence.coffee#L29
notify sockets: https://github.com/viatropos/tower/blob/development/src/tower/model/cursor/persistence.coffee#L68
emit updated records to client: https://github.com/viatropos/tower/blob/development/src/tower/server/net/connection.coffee#L62
The disadvantage of using sockets as a trigger to re-request with AJAX is that every connected client will have to fetch the data, so if 100 people are on your site there are going to be 100 HTTP requests every time data changes, whereas you could just reuse the socket connections.
I think that pushing the updated data with the socket.io event would be better than re-requesting the latest. Even better, you could push only the modified pieces of data, decreasing the amount sent over the line. Overall though, an interesting idea.
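A sketch of pushing the changed document itself, assuming Express 4.16+, mongoose-style callbacks, and socket.io 1.x+; the model, route, and event names are invented:

var express = require('express');
var http = require('http');
var mongoose = require('mongoose');

var app = express();
app.use(express.json()); // built-in JSON body parsing in Express >= 4.16
var server = http.createServer(app);
var io = require('socket.io')(server);

mongoose.connect('mongodb://localhost/demo');
var Post = mongoose.model('Post', new mongoose.Schema({ title: String }));

// update through the REST API as usual...
app.put('/posts/:id', function (req, res) {
  Post.findByIdAndUpdate(req.params.id, req.body, { new: true }, function (err, post) {
    if (err || !post) return res.sendStatus(err ? 500 : 404);
    // ...then push only the modified document to every connected client
    io.emit('post:updated', post);
    res.json(post);
  });
});

server.listen(3000);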
I'd look into Now.js since it does pretty much exactly what you need.
It creates a namespace which is shared among the client and server. The server can call functions on the client directly and vice versa.
That is, if you insist on your current infrastructure decision to use MongoDB and Node.js; otherwise there is CouchDB, which is a full web server and document database with sophisticated replication mechanisms built in.
