Figuring out how many simultaneous connections Heroku can have with socket.io - node.js

I have a Node.js app on Heroku that uses socket.io. I got Heroku to work with socket.io using long-polling, and then recently enabled WebSocket support on Heroku and got socket.io working over WebSockets.
My question is: how can I measure the maximum number of simultaneous connections the Heroku instance can sustain before it thrashes or performance degrades?

You'll need to have two things in place:
1. Some sort of test client or script that you can ask to open an arbitrary number of WebSocket connections and keep them open for the remainder of the test.
2. Proper monitoring of your dyno's performance. For this you'll want a monitoring add-on; I like to use Librato.
After that it's just a matter of running your test scenario and tweaking until you're satisfied with your memory and load limits.
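For the first piece, a minimal ramp-up harness could look like the sketch below. `openConnection` is a hypothetical placeholder for whatever opens one socket.io/WebSocket client in your setup; the batching and pauses exist so you can watch your dyno's memory and load climb between steps.

```javascript
// Minimal ramp-up harness sketch. `openConnection` is a placeholder for
// whatever opens one socket.io/WebSocket client in your test setup.
async function rampUp(openConnection, { total = 1000, batch = 50, pauseMs = 500 } = {}) {
  const sockets = [];
  while (sockets.length < total) {
    const want = Math.min(batch, total - sockets.length);
    const opened = await Promise.all(
      Array.from({ length: want }, () => openConnection())
    );
    sockets.push(...opened);
    // Pause between batches so you can watch dyno memory/load in your
    // monitoring dashboard while connections accumulate.
    await new Promise((r) => setTimeout(r, pauseMs));
  }
  return sockets; // keep these referenced so the connections stay open
}
```

You'd raise `total` run by run until your monitoring shows memory or load crossing the limits you care about.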

Related

Is socketcluster redundant when using cluster js in a node js app?

Let's say I'm trying to scale my app to over 200k socket connections in a Node.js app. How do I go about this? I have been doing a lot of research and was thinking about:
- application layer
- socket.io layer
- load balancer
This was after I read a socket.io benchmark-testing article, but I stumbled upon SocketCluster afterwards! I have also thought about using socket.io + Redis and adding Heroku dynos, though I'm not sure how much that will improve socket.io's scalability.
My question now is: wouldn't the cluster module already run multiple instances of the Node.js server (with socket.io) on all available cores? If so, why would you need SocketCluster?
Thanks!
Socket.io requires sticky load balancing which makes your app more vulnerable to DoS attacks (with sticky load balancing, a malicious user could target specific workers and crash them one at a time). Also, sticky load balancing can lead to uneven distribution of traffic across workers when you have to handle users who are behind a corporate proxy.
Unlike SocketCluster, Socket.io does not support pub/sub, which makes it difficult to communicate between users who are connected to different workers/hosts.
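To make the sticky-balancing point concrete, here is a sketch of IP-hash stickiness, one common way sticky load balancing is implemented (the function names are mine, not from any particular balancer):

```javascript
// Sketch of IP-hash sticky routing. The same client IP always lands on
// the same worker, which is what lets a malicious client target one
// specific worker, and what skews traffic when many users sit behind
// a single corporate proxy IP.
function hashIp(ip) {
  let h = 0;
  for (const ch of ip) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // simple deterministic hash
  }
  return h;
}

function pickWorker(ip, workerCount) {
  return hashIp(ip) % workerCount;
}
```

Because `pickWorker` is deterministic, every request from one IP hits the same worker, which explains both the DoS exposure and the uneven distribution described above.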

Too many connections to MongoDb - Node.js with Sails

I'm developing an application using Node.js and Sails.
I'm going to run like: 20 instances of the same app at the same time, and all of them will use a Local MongoDB to store model data.
My problem started like this: only the first 7 or 8 launched apps were starting; the others failed because they couldn't connect to the database.
OK, I went through some searching and saw that I had to increase the number of connections, but what made me think something was wrong is this: each launched app creates about 35 connections!
So, launching 6 or 8 apps took about 250 connections!!!
That seems like too much, since one connection per app should be enough (I think). Is this 'normal', or is it a problem in the Sails Waterline core?
Any solution to this issue?
I have the same issue (load-balanced instances connecting to Mongo) without using Sails...
Another issue is that, for "zero downtime deploys", I clone the cluster and then change the DNS, so I temporarily have double the number of connections.
So in my case I'm also listening for SIGINT and SIGQUIT and closing the connections before the app terminates, so that hopefully the keep-alive connections die together with the app.
There are tons of people with similar problems around, but I failed to find a spot-on solution.
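The ~35 connections per app are most likely the driver's connection pool rather than a Waterline bug; lowering the pool size in your adapter config is the usual fix. As for the SIGINT/SIGQUIT approach above, a minimal sketch might look like this, where `db.close` stands in for whatever your driver exposes (e.g. `mongoose.connection.close()`):

```javascript
// Sketch: close DB connections before the process exits, so keep-alive
// connections die with the app. `db` is any object with an async close();
// `proc` is injectable only to make the sketch testable.
function registerGracefulShutdown(db, proc = process) {
  const shutdown = async (signal) => {
    await db.close(); // release pooled Mongo connections first
    proc.exit(0);     // then let the app terminate
  };
  for (const sig of ['SIGINT', 'SIGQUIT']) {
    proc.on(sig, () => shutdown(sig));
  }
  return shutdown; // returned so it can also be triggered manually
}
```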

How to Scale Node.js WebSocket Redis Server?

I'm writing a chat server for Acani, and I have some questions about scaling Node.js and WebSockets with a load balancer.
What exactly does it mean to load balance Node.js? Does that mean there will be n independent versions of my server application running, each on a separate server?
To allow one client to broadcast a message to all the others, I store a set of all the webSocketConnections opened on the server. But, if I have n independent versions of my server application running, each on a separate server, then will I have n different sets of webSocketConnections?
If the answers to 1 & 2 are affirmative, then how do I store a universal set of webSocketConnections (across all servers)? One way I think I could do this is use Redis Pub/Sub and just have every webSocketConnection subscribe to a channel on Redis.
But, then, won't the single Redis server become the bottleneck? How would I then scale Redis? What does it even mean to scale Redis? Does that mean I have m independent versions of Redis running on different servers? Is that even possible?
I heard Redis doesn't scale. Why would someone say that? What does that mean? If it's true, is there a better solution for pub/sub and/or storing a list of all broadcast messages?
Note: If your answer is that Acani would never have to scale, even if each of all seven billion people (and growing) on Earth were to broadcast a message every second to everyone else on earth, then please give a valid explanation.
Well, a few answers to your question:
To load balance Node.js means exactly what you thought, except that you don't necessarily need a separate server for each instance: you can run more than one process of your Node server on the same machine.
Each server/process will have its own connections. The default store for WebSockets (in Socket.IO, for example) is MemoryStore, which means all connections are kept in that machine's memory; you have to use RedisStore if you want Redis as the connection store.
Redis pub/sub is a good way to achieve this task.
You are right about this: Redis doesn't scale horizontally at the moment, and a lot of processes/connections hitting a single Redis instance can make it a bottleneck.
Redis doesn't scale, that is correct, but according to this presentation cluster development is a top priority for Redis, and Redis does have a cluster mode; it's just not stable yet (taken from http://redis.io/download):
Where's Redis Cluster?
Redis development is currently focused on Redis 2.6 that will bring you support for Lua scripting and many other improvements. This is our current priority, however the unstable branch already contains most of the fundamental parts of Redis Cluster. After the 2.6 release we'll focus our energies on turning the current Redis Cluster alpha in a beta product that users can start to seriously test.
It is hard to make forecasts since we'll release Redis Cluster as stable only when we feel it is rock solid and useful for our customers, but we hope to have a reasonable beta for summer 2012, and to ship the first stable release before the end of 2012.
See the presentation here: http://redis.io/presentation/Redis_Cluster.pdf
2) Using Redis might not work to store connections: Redis can store data in string format, and if the connection object has circular references (e.g., Engine.IO's) you won't be able to serialise them.
3) Creating a new Redis client for each client might not be a good approach, so avoid that trap if you can.
Consider using the ZMQ Node library to have processes communicate with each other over TCP (or IPC if they are clustered, as in master-worker).

Which Node.js Concurrent Web Server is best on Heroku?

I have just learned about Heroku and was pretty excited to test it out. I quickly assembled their Node.js demo and stumbled across a problem. When running the application locally, ApacheBench reports roughly 3500 requests/s, but on the cloud that drops to 10 requests/s and does not increase or decrease based on network latency. I can't believe this is the performance they're asking 5 cents/hour for, and I strongly suspect my application is not multi-threaded.
This is my code.js: http://pastebin.com/hyM47Ue7
What configuration do I need to apply to get it running faster on Heroku? Or what other web servers for Node.js could I use?
I am thankful for every answer on this topic.
Your little example is not multi-threaded (not even on your own machine), but you don't need to pay for more dynos immediately: you can make use of multiple cores on a single dyno; see this answer: Running Node.js App with cluster module is meaningless in Heroku?
To repeat that answer: a node solution to using multiple processes that should increase your throughput is to use the (built-in) cluster module.
I would guess that you can easily get more than 10 req/s from a Heroku dyno; see this benchmark, for example:
http://openhood.com/ruby/node/heroku/sinatra/mongo_mapper/unicorn/express/mongoose/cluster/2011/06/14/benchmark-ruby-versus-node-js/
What do you use to benchmark?
You're right, the web server is not multi-threaded, until you pay for more web dynos. I've found Heroku is handy for prototyping; depending on the monetary value of your time, you may or may not want to use it to set up a scalable server instead of using EC2 directly.

NodeJS + SocketIO: Scaling and preventing single point of failure

So the first app that people usually build with SocketIO and Node is a chat app. This chat app basically has one Node server that broadcasts to multiple clients. In the Node code, you would have something like:
// Pseudocode
for (const client of clients) {
  if (client !== messageSender) {
    client.send(message);
  }
}
This is great for a small number of users, but I see a problem with it. First, there is a single point of failure: the Node server. Second, the app will slow down as the number of clients grows. What is there to do when we reach this bottleneck? Is there an architecture (horizontal/vertical scaling) that can alleviate this problem?
For that "one day" when your chat app needs multiple, fault-tolerant node servers, and you want to use socket.io to cross communicate between the server and the client, there is a node.js module that fits the bill.
https://github.com/hookio/hook.io
It's basically an event emitting framework to cross communicate between multiple "things" -- such as multiple node servers.
It's relatively complicated to use, compared to most modules, which is understandable since this is a complex problem to solve.
That said, you'd probably have a few thousand simultaneous users and plenty of other problems before you run into trouble here.
Another thing you can do is design your application so that if a connection is lost (which happens all the time anyway: the server goes down, the client has network issues such as on mobile, etc.), the application can handle it and recover gracefully.
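On the client side, graceful recovery usually means retrying with exponential backoff. A minimal sketch, where `connect` is a hypothetical placeholder for whatever attempts one connection and rejects on failure:

```javascript
// Sketch of reconnect-with-backoff. `connect` should return a promise
// that resolves with a live connection or rejects on failure.
async function connectWithBackoff(connect, { retries = 5, baseMs = 100 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt === retries) throw err; // give up after the last retry
      const delay = baseMs * 2 ** attempt; // 100ms, 200ms, 400ms, ...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

The doubling delay keeps a flapping server (or a flaky mobile network) from being hammered by instant retries.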
Since Node.js has a single event-loop thread, this single point of failure is written into its DNA. Even reloading a server after code changes requires this thread to be stopped.
There are, however, a lot of tools available to handle such failures gracefully. You could use forever, a simple CLI tool for ensuring that a given script runs continuously. Other options include distribute and up: Distribute is load-balancing middleware for Node, and Up builds on top of Distribute to offer zero-downtime reloads via either a JavaScript API or a command-line interface.
Further reading: I find you just need to use the Redis store with Socket.IO to maintain connection references between two or more processes/servers. These options have already been discussed extensively here and here.
There's also the option of using socket.io-clusterhub if you don't intend to use the Redis store.
