Socket.io as a load balancer - node.js

I am developing a Twitter app that (on the backend) consumes Tweets, does some fairly intense processing, and then stores the data in a database for later use by the client. All of my servers are running node.js.
I am going to have a server connected to the Twitter Streaming API using nTwitter for node.js. I then want this server to pass the Tweets along to worker servers and distribute the load based on the Tweet ID (the last digit of the ID would be used).
Right now, I am using Socket.io (and Socket.io-client), which seems to run pretty well. It seems like the WebSocket protocol is ideal for this. Are there any reasons not to use Socket.io in this manner?
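The routing rule described above (last digit of the Tweet ID picks the worker) can be sketched as a small pure function. This is only an illustration: the name pickWorker and the workerCount parameter are hypothetical, and it assumes the ID is available as a string (the Twitter API provides an id_str field), since Tweet IDs can be larger than the biggest integer a JavaScript number represents exactly.

```javascript
// Hypothetical helper: choose a worker index from the last digit of a Tweet ID.
// The ID is handled as a string because Tweet IDs can exceed
// Number.MAX_SAFE_INTEGER and would lose their low digits as numbers.
function pickWorker(tweetIdStr, workerCount) {
  if (typeof tweetIdStr !== 'string' || !/^\d+$/.test(tweetIdStr)) {
    throw new TypeError('tweet ID must be a string of decimal digits');
  }
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  const lastDigit = Number(tweetIdStr[tweetIdStr.length - 1]);
  // Map the digit 0-9 onto the available workers.
  return lastDigit % workerCount;
}
```

Note that the load is only even when the worker count divides 10 (1, 2, 5 or 10 workers). With 3 workers, for example, worker 0 would receive four of the ten digits and the others three each.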

Related

Prevent DDOS on websocket server nodejs

I have an app that lets you keep your notes in a single place; it syncs in real time between all the devices you are logged in on. I am using a Node.js WebSocket server. It was working fine, but I recently found out that someone was sending a huge number of requests to my WebSocket server. He sent a large amount of data through WebSockets to my MongoDB, and the data was sent just for the purpose of taking the app down (useless crap data that just contained 'aaaaa').
What I want is to prevent clients that make more than 10 requests per minute from using the WebSockets.
As mentioned in the comments, it's better to go with a service like CloudFlare, but for your specific use case (implementing this directly on the server) you should look at ways to rate limit the requests.
Here is an example of a library for rate limiting WebSockets in Node:
https://www.npmjs.com/package/ws-rate-limit
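If you would rather see the idea than pull in a package, here is a minimal sketch of a per-client sliding-window limiter matching the "10 requests per minute" rule from the question. It is not the API of the library linked above; createRateLimiter, allow and clientId are hypothetical names, and the state lives in process memory, so it only works for a single server process.

```javascript
// Hypothetical sliding-window limiter: at most `limit` messages per
// `windowMs` milliseconds for each client ID.
function createRateLimiter(limit = 10, windowMs = 60000) {
  const history = new Map(); // clientId -> timestamps of accepted messages

  return function allow(clientId, now = Date.now()) {
    // Keep only the timestamps that are still inside the window.
    const recent = (history.get(clientId) || []).filter((t) => now - t < windowMs);
    if (recent.length >= limit) {
      history.set(clientId, recent);
      return false; // over the limit: caller should drop the message
    }
    recent.push(now);
    history.set(clientId, recent);
    return true;
  };
}
```

In the message handler you would call allow() with whatever identifies the client (user ID or IP address) before touching MongoDB, and drop the message or close the connection when it returns false.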

Route traffic to multiple node servers based on a condition

I'm coding an online multiplayer game using nodejs and HTML5 and I'm at the point where I would like to have multiple maps for people to play on, but I'm having a scaling issue. The server I'm running this on isn't able to support the game loops for more than a few maps on its own, and even though it has 4 cores I can only utilize one with a single node process.
I'd like to be able to scale this to not even necessarily be limited to a single server. I'd like to be able to start up a node process for each map in the game, then have a master process that looks up what map a player is in and passes their connection to the correct sub process for handling, updating with game information, etc.
I've found a few ways to use a proxy like nginx or the built in node clusters to load balance but from what I can tell the examples I've seen just give a connection to whatever the next available process is, and I need to hand them out specifically. Is there some way for me to route a connection to a node process based on a condition like that? I'm using Express to serve my static content and socket.io for client to server communication currently. The information for what map the player is in will be in MongoDB along with the rest of the player data, if that makes a difference.
There are many ways to address your problem; here are two suggestions based on your description.
1 - Use a router server which dispatches player queries to "area servers": in this topology all client queries arrive at your router server. The router tags each query with a unique ID and dispatches it to the right area server. The area server handles the query and sends it back to the router, which recognizes it by the unique tag and sends the response back to the client.
This solution distributes the CPU/memory load but not the bandwidth!
2 - Use an authentication server which redirects clients to the servers with the least load: in this case you have multiple identical servers and one authentication server. When a client authenticates, send the URL of an available server and an auth token to the client, and an authentication ticket to that server.
The client then connects to the server, which recognizes it by matching the auth token against the auth ticket.
This solution distributes CPU, memory and bandwidth, but might not suit all games, since you can be sent to a different server on each connection and you will not see players in the same area unless you are on the same server.
Those are only two simple suggestions. You can mix the two approaches or add other pieces (for example, communication between area servers), which will solve the issues mentioned but will add complexity.
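To make the second suggestion concrete, here is a rough sketch adapted to the question's condition (the server is chosen by the player's map rather than by load). It is a variation, not exactly the scheme described above: instead of sending a separate ticket to the area server, the token is signed with an HMAC so the area server can verify it on its own. Everything named here (SECRET, AREA_SERVERS, issueTicket, verifyTicket, the example.com URLs, the 30 second lifetime) is an assumption for illustration, and player and map IDs are assumed not to contain the '|' character.

```javascript
const crypto = require('crypto');

// Assumption: a secret shared by the auth server and every area server.
const SECRET = 'replace-with-a-shared-secret';

// Hypothetical map -> area server table. In the question this lookup
// would come from the player data stored in MongoDB.
const AREA_SERVERS = {
  forest: 'http://area1.example.com',
  desert: 'http://area2.example.com',
};

function sign(payload) {
  return crypto.createHmac('sha256', SECRET).update(payload).digest('hex');
}

// Auth server side: tell the client where to connect and hand it a signed token.
function issueTicket(playerId, mapId, now = Date.now()) {
  const url = AREA_SERVERS[mapId];
  if (!url) throw new Error('unknown map: ' + mapId);
  const expires = now + 30000; // token is valid for 30 seconds
  const payload = `${playerId}|${mapId}|${expires}`;
  return { url, token: `${payload}|${sign(payload)}` };
}

// Area server side: returns the player ID if the token is authentic,
// unexpired and issued for this map; otherwise returns null.
function verifyTicket(token, mapId, now = Date.now()) {
  const parts = token.split('|');
  if (parts.length !== 4) return null;
  const [playerId, tokenMap, expires, mac] = parts;
  const expected = Buffer.from(sign(`${playerId}|${tokenMap}|${expires}`));
  const given = Buffer.from(mac);
  if (given.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(given, expected)) return null;
  if (tokenMap !== mapId || Number(expires) < now) return null;
  return playerId;
}
```

The client would open its socket.io connection to the returned url and send the token with its first message; the area server calls verifyTicket() before letting the player into the game loop.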

how to distribute socket.io

I'm using nodejs and socket.io to deliver a chat in my business app, but I want to distribute the deployment so I can have as many chat servers as I want to balance the traffic load.
I tried the load balancing approach with nginx, but that only balances the traffic. The socket.io servers do not share messages with each other, so a chat message sent from user A on server S1 won't travel to user B on server S2.
Is there any tool or approach to do this?
Thanks in advance.
===== EDIT =====
Here is the architecture of the app.
The main app frontend, on PHP CodeIgniter; let's tag it as PHPCI.
The chat app backend, on NodeJs and SocketIO; let's tag it as CHAT.
The chat model data, on Redis; let's tag it as REDIST.
So what I have now is PHPCI -> CHAT -> REDIST. That works just fine.
What I need is to distribute the application so I can have as many PHPCI, CHAT or REDIST instances as I want, for example:
PHPCI1          CHAT1
PHPCI2    ->            ->    REDIST1
PHPCI3          CHAT2
Where the numbers represent instances not different apps.
So a user A connected to PHPCI1 can send a message to a user B connected to PHPCI3.
I think some queue in the middle of CHAT could handle this, something like RabbitMQ, so that SocketIO is only used to deliver the messages to the client.
If you're distributing the server load (and that's a requirement), I'd suggest adding a designated chat data server (usually an in-memory database or message queue) to handle chat state and message passing across edge servers.
Redis Pub/Sub is ideal for this purpose, and can scale up to crazy levels on even a low-end machine. The Redis Cookbook has a chapter on precisely this use case.
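The shape of that setup can be sketched as follows. To keep the example runnable without a Redis server, Node's built-in EventEmitter stands in for the Redis channel; in a real deployment each chat server would publish to and subscribe on a Redis channel instead. The names bus, createChatServer, connect and send are hypothetical.

```javascript
const { EventEmitter } = require('events');

// In-process stand-in for a Redis Pub/Sub channel (assumption for the sketch).
const bus = new EventEmitter();

// Hypothetical chat server: it knows only its own connected users and
// relies on the shared bus to reach users on other servers.
function createChatServer(name) {
  const localUsers = new Map(); // userId -> deliver(message) callback

  // Every server sees every published message...
  bus.on('chat', (msg) => {
    const deliver = localUsers.get(msg.to);
    // ...but only the server holding the recipient delivers it.
    if (deliver) deliver(msg);
  });

  return {
    name,
    connect(userId, deliver) {
      localUsers.set(userId, deliver);
    },
    send(from, to, text) {
      // Publish instead of looking up the recipient locally.
      bus.emit('chat', { from, to, text });
    },
  };
}
```

With this arrangement, user A on S1 calls send() and user B's callback on S2 fires, which is the case the nginx-only setup in the question could not handle. The deliver callback is where the socket.io emit to the browser would go.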
If you set up the server side of your chat app correctly, you shouldn't have to distribute socket.io to your users. node.js itself runs only on the server; the browser needs nothing beyond the resources downloaded from the webpage, so it works automatically. With a webpage, the socket.io client files are downloaded to users when they are correctly included (just like with jQuery). If you are using node.js and socket.io to make an Android app, the client files should be included in your application when you distribute it, not shipped separately.
In addition, if you wish to use two separate socket.io servers, you should be able to establish communication between them by having one connect to the other in the same way a client connects to a server, but with a special parameter that tells the receiving server that the peer is a server, so it can respond accordingly and keep a reference to that connection.

node.js server with socket.io handling 50000 simultaneous clients

We are developing a Javascript control which should be constantly connected to a server for receiving animation updates.
We are planning to host this stuff on an Amazon cloud.
The scenario is like this: server connects to activemq queue waiting for updates, for each update it broadcasts it to all connected clients.
Is it even possible to handle such load with node.js + socket.io?
Will a single node.js server be able to handle such load?
How to organize fast transport between different nodes if we will have to use more than one node?
"Will a single node.js server be able to handle such load? ... How to organize fast transport between different nodes if we will have to use more than one node?"
You say that you are planning to host on Amazon. So first off, nothing should be scoped for a single server. Amazon machines will simply "disappear", you have to assume that you are going to use multiple computers.
...handling 50k simultaneous clients
So to start with, 50k connections for a single box is a very big number. Here's a very detailed blog post discussing "getting to 10k" with node.js+socket.io.
Here's a very telling quote:
"it seemed as though 10,000 clients simply required more serialization than my server was able to handle."
So a key component to "getting to 50k" is going to be the amount of work required just pushing data over the wire.
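One way to cut that per-client work, assuming you control the send path, is to serialize each update once and write the same string to every connection instead of serializing per client. This is only a sketch of the idea, not how socket.io does it internally; broadcast is a hypothetical function and each client is assumed to expose a send(string) method.

```javascript
// Hypothetical broadcast: serialize the update once, then reuse the
// resulting string for every connected client.
function broadcast(clients, update) {
  const payload = JSON.stringify(update); // one serialization in total
  let sent = 0;
  for (const client of clients) {
    client.send(payload); // assumption: each client has send(string)
    sent += 1;
  }
  return sent;
}
```

This does not remove the cost of 50k socket writes per update, but it keeps the JSON encoding cost constant no matter how many clients are connected.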
How to organize fast transport between different nodes if we will have to use more than one node.
That blog post is the first of 3. When you're done with the first, read the other two. That should point you in the right direction.

RESTful backend and socket.io to sync

Today I had the idea of the following setup. Create a nodejs server along with express and socket.io. With express, I would create a RESTful API, which is connected to a MongoDB database. BackboneJS or similar would connect the client to that REST API.
Now every time the MongoDB data I am interested in changes, socket.io would fire an event to the client, carrying a cursor to the data which has changed. The client would then trigger the appropriate AJAX requests to the REST API to get the new data where it needs it.
So the socket.io connection would behave like a synchronization trigger. It would be there for the entire visit and could also manage sessions that way. All the payload would be sent over HTTP.
Pros:
REST API for use with other clients than web
Auth could be done entirely over socket.io. Only sending token along with REST requests.
Use the benefits of REST.
Would also play nicely with a pub/sub service like Redis.
Cons:
Greater overhead than using pure socket.io.
What do you think, are there any great disadvantages i did not think of?
I agree with @CharlieKey: you should send the updated data rather than re-requesting it.
This is exactly what Tower is doing:
save some data: https://github.com/viatropos/tower/blob/development/src/tower/model/persistence.coffee#L77
insert into mongodb (cursor is a query/persistence abstraction): https://github.com/viatropos/tower/blob/development/src/tower/model/cursor/persistence.coffee#L29
notify sockets: https://github.com/viatropos/tower/blob/development/src/tower/model/cursor/persistence.coffee#L68
emit updated records to client: https://github.com/viatropos/tower/blob/development/src/tower/server/net/connection.coffee#L62
The disadvantage of using sockets as a trigger to re-request with Ajax is that every connected client will have to fetch the data, so if 100 people are on your site there's going to be 100 HTTP requests every time data changes - where you could just reuse the socket connections.
I think that pushing the updated data with the socket.io event would be better than re-requesting the latest. Even better, you could push only the modified pieces of data, decreasing the amount of data sent over the line. Overall, though, an interesting idea.
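Pushing only the modified pieces could look roughly like this: compute which top-level fields changed between the old and new versions of a record and emit just those. changedFields is a hypothetical helper; it is a shallow comparison and does not report fields that were removed.

```javascript
// Hypothetical shallow diff: returns an object holding only the top-level
// fields of `current` whose values differ from `previous`.
function changedFields(previous, current) {
  const patch = {};
  for (const key of Object.keys(current)) {
    // Compare via JSON so nested arrays/objects are compared by content.
    if (JSON.stringify(previous[key]) !== JSON.stringify(current[key])) {
      patch[key] = current[key];
    }
  }
  return patch;
}
```

The server would then emit the record's ID together with the patch over the existing socket.io connection, and the client would merge the patch into its local model instead of issuing a new AJAX request.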
I'd look into Now.js since it does pretty much exactly what you need.
It creates a namespace which is shared among the client and server. The server can call functions on the client directly and vice versa.
That is if you insist on your current infrastructure decision to use MongoDB and Node.js, otherwise there would be CouchDB which is a full web server and document database with sophisticated replication mechanisms built-in.

Resources