Running multiple Node apps on Node cluster - node.js

We need to deploy multiple Node/Express apps on a set of servers (behind a load balancer). These apps are fully independent of each other in terms of functionality. I'll first explain how I'm thinking of doing this, and then I'm looking for input on best practices and any red flags in my design.
Here is the setup I am thinking about:
The frontend server behind the load balancer will run node-http-proxy and accept incoming requests on port 80. This reverse proxy will route the requests to the appropriate Node apps, which run on different ports on the same server. For example:
var http = require('http'),
    httpProxy = require('http-proxy');

var options = {
  router: {
    'myapphost.com/MyFirstApp': 'myapphost.com:3000',
    'myapphost.com/MySecondApp': 'myapphost.com:3001'
  }
};

// ...and then pass them in when you create your proxy.
var proxyServer = httpProxy.createServer(options).listen(80);
Each of the Node apps will run on a Node cluster, using something like Cluster2, to take advantage of multi-core systems.
My questions:
Is this the right design and strategy?
Some of our applications need to be stateful. What is the best way to do state management in this kind of setup? Is using external session storage such as Redis the right approach, or pinning a session to a given frontend machine and using in-memory session storage?
UPDATE:
Since I posted this question, after talking to a couple of people, one more approach has come up:
I can use Nginx as a reverse proxy and load balancer in front of my frontend machines. Each frontend machine will serve only one app, and there can be one more backup machine for that app (depending on requirements). So if I have three apps, I'll have three separate machines, each serving a different app. All requests will be received by Nginx on port 80, and the Nginx reverse proxy will route each request to the right frontend machine. Each machine will run a Node cluster to take advantage of its multi-core system. The advantages of this approach are that deployment becomes a lot easier for each app, and that we can scale each app separately.
Please share your thoughts on this approach as well.

This is the right approach. You mentioned Redis, but you could use any other session store, such as connect-mongo.
If you are using a load balancer, I guess you have multiple instances of the same servers? If so, you need to analyze session performance and expected usage, and then decide whether to shard the session storage/database or have one single machine that every balanced instance talks to.
You are already thinking the right way; why not hack some of it together and see if it fits your needs?
You also need to think about static media, possibly storing it separately (S3 + CloudFront?).
Likewise, think about how you are going to deliver updates to instances and restart them while keeping application logic consistent.
This structure also enables A/B testing (you can load balance specific instances to a "testing version" of the application, while the majority are balanced across the main instances).
It all depends on your scale, and sometimes you really don't need that much initially; just be ready to improve and scale in the future.

Related

Scale up a nodejs app on AWS

I have nodejs app running on AWS EC2.
I would like to scale it up by creating more instances of it.
I don't quite understand how to do it on the networking side.
Lets say I create another instance and it's listening to a different port.
Should I change the client side to request from two different ports? I believe this could lead to race conditions on the DB.
Am I supposed to listen on one port on the EC2 machine and direct requests to one of the instances? In that case, won't the port be busy until the instance is done with the request, instead of processing requests in parallel with the other instance?
Does anyone have some pointers, or can maybe point me to some documents on this subject?
At the basic level, you'll have multiple instances of your Node.js application, connecting to a common database backend. (Your database should be clustered as well, but that's a topic for another post.)
If you're following best practices already, one request will be one response, and it won't matter if subsequent requests land on a different application server. That is, the client could hit any of them at any time and there won't be any side effects. If not, you'll need some sort of server pinning (usually done via cookie or similar methods) to ensure clients always land on the same application server.
The simplest way to load balance is to have your clients connect to a host name that resolves, round-robin style, to several hosts. This spreads the traffic around, but not necessarily evenly: a particular command or issue on one host could mean it can only handle 5 requests, while the other servers can handle 5000. Most cloud providers offer managed load balancing; AWS certainly does.
Since you're already on AWS, I'd recommend you deploy your application via Elastic Beanstalk. It automates the spin-up/tear-down as-configured. You could certainly roll your own solution, but it's probably just easier to use what's already there. The load balancer is configured with Beanstalk.

How to direct a user to an available websocket server when she logs in to my multi-server Node.js app?

This is more like a design question but I have no idea where to start.
Suppose I have a realtime Node.js app that runs on multiple servers. When a user logs in she doesn't know which server she will be assigned to. She will just login, do something and logout and that's it. A user won't be interacting with other users on a different server, nor will her details be stored on another server.
In the backend, I assume the Node.js server will put the user's login details into some queue, and then when there is space it will assign this user to an available server (a server that has the lowest ping value or is not full). Because there is a limited number of users on one physical server, when a user tries to log in to a "full" server it will direct her to another available server.
I am using the ws module of Node.js. Is there any service available for this purpose, or do I have to build my own? How difficult would that be?
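The "assign this user to an available server" step described above can be sketched as a least-connections picker; the server names, connection counts, and capacities below are made up for illustration:

```javascript
// Toy least-connections picker. Returns the server with the fewest
// active connections that still has room, or null if all are full
// (in which case the user would stay in the queue).
function pickServer(servers) {
  var best = null;
  for (var i = 0; i < servers.length; i++) {
    var s = servers[i];
    if (s.connections >= s.capacity) continue; // this server is "full"
    if (best === null || s.connections < best.connections) best = s;
  }
  return best;
}

var servers = [
  { name: 'ws-1', connections: 90, capacity: 100 },
  { name: 'ws-2', connections: 40, capacity: 100 },
  { name: 'ws-3', connections: 100, capacity: 100 } // full
];
console.log(pickServer(servers).name); // the least-loaded server wins
```

In a real deployment each server would report its connection count to whatever process makes this decision (the queue worker, or the balancer itself).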
I am not sure how websockets fit into this question, so I'll ignore that part. I guess your actual question is about load balancing... Let me try paraphrasing it.
Q: Does Node.js have any load-balancing feature that I can leverage?
Yes, and it is called cluster in Node.js. Instead of the traditional single node process listening on a single port, this module allows you to spawn a group of node processes and have them all bound to the same port.
This means that all the user knows is the service's endpoint. He sends a request to it, and one of the available processes in the group will serve it whenever possible.
Alternatively, using Nginx, the web server, as your load balancer is also a very popular approach to this problem.
References:
Cluster API: https://nodejs.org/api/cluster.html
Nginx as load balancer: http://nginx.org/en/docs/http/load_balancing.html
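For reference, the Nginx load-balancing approach from the second link boils down to an upstream block. A hypothetical nginx.conf fragment (hostnames and ports are assumptions):

```nginx
# Round-robin across three Node backends; addresses are made up.
http {
    upstream node_app {
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
        server 10.0.0.3:3000;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://node_app;
        }
    }
}
```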
P.S.
I guess the keyword for googling solutions to your problem is "load balancer".
Of the two solutions, I would recommend going the Nginx way, as it is the more scalable approach: your Node processes can be spread across multiple hosts (horizontal scaling), whereas the former solution is more for vertical scaling, taking advantage of a multi-core machine.

Is a node.js app that both serves a REST API and handles web sockets a good idea?

Disclaimer: I'm new to node.js so I am sorry if this is a weird question :)
I have a Node.js app using Express.js to serve a REST API. The data served by the REST API is fetched from a nosql database by the Node.js app. All clients only use HTTP GET. There is one exception though: data is PUT and DELETEd from the master database (a relational database on another server).
The thought for this setup is of course to let the 'node.js/nosql database' server(s) be a public front end and thereby protecting the master database from heavy traffic.
Potentially a number of different client applications will use the REST API, but mainly it will be used by a client app with a long lifetime (typically 0.5 to 2 hours). Instead of letting this app constantly poll the REST API for possible new data, I want to use websockets so that data is only sent to the client when there is new data. I will use a Node.js app for this, probably with socket.io, so that it can fall back to API polling if websockets are not supported by the client. New data should be sent to clients each time the master database PUTs or DELETEs objects in the nosql database.
The question is if I should use one node.js for both the API and the websockets or one for the API and one for the websockets.
Things to consider:
- Performance: The app(s) will be hosted on a cluster of servers with a load balancer and a HTTP accelerator in front. Would one app handling everything perform better than two apps with distinct tasks?
- Traffic between apps: If I choose a two-app solution, the API app that receives PUTs and DELETEs from the master database will have to notify the websocket app every time it receives new data (or the master database will have to notify both apps). Could the doubled traffic be a performance issue?
- Code cleanliness: I believe two apps will result in cleaner and better code, but then again there will surely be some code common to both apps, which will lead to having two copies of it.
As to how heavy the load can be it is very difficult to say, but a possible peak can involve:
50000 clients
each listening to up to 5 different channels
new data being sent from master each 5th second
new data should be sent to approximately 25% of the clients (for some data it should be sent to all clients and other data probably below 1% of the clients)
UPDATE:
Thanks for the answers, guys. More food for thought here. I have decided to have two Node.js apps, one for the REST API and one for web sockets. The reason is that I believe it will be easier to scale them. To begin with, the whole system will be hosted on three physical servers, and one Node.js app for the REST API on each server should be sufficient, but for the websocket app there will probably need to be several instances on each physical server.
This is a very good question.
If you are looking at a legacy system and you already have a REST interface defined, there are not a lot of advantages to adding WebSockets. Things that may point you to WebSockets would be:
a demand for server-to-client or client-to-client real-time data
a need to integrate with server-components using a classic bi-directional protocol (e.g. you want to write an FTP or sendmail client in javascript).
If you are starting a new project, I would try to have a hard split in the project between:
the serving of static content (images, js, css) using HTTP (that was what it was designed for) and
the serving of dynamic content (real-time data) using WebSockets (load-balanced, subscription/messaging based, automatic reconnect enabled to handle network blips).
So, why should we try to have a hard separation? Let's consider the advantages of a HTTP-based REST protocol.
The use of the HTTP protocol for REST semantics is an invention that has certain advantages
Stateless Interactions: none of the client's context is to be stored on the server side between the requests.
Cacheable: Clients can cache the responses.
Layered system: intermediaries (proxies, caches) are undetectable to the client
Easy testing: it's easy to use curl to test an HTTP-based protocol
On the other hand...
The use of a messaging protocol (e.g. AMQP, JMS/STOMP) on top of WebSockets does not preclude any of these advantages.
WebSockets can be transparently load-balanced, messages and state can be cached, efficient stateful or stateless interactions can be defined.
A basic reactive analysis style can define which events trigger which messages between the client and the server.
Key additional advantages are:
a WebSocket is intended to be a long-term persistent connection, usable for multiple different messaging purpose over a single connection
a WebSocket connection allows for full bi-directional communication, allowing data to be sent in either direction in sympathy with network characteristics.
one can use connection offloading to share subscriptions to common topics using intermediaries. This means with very few connections to a core message broker, you can serve millions of connected users efficiently at scale.
monitoring and testing can be implemented with an admin interface to send/receive messages (provided with all message brokers).
the cost of all this is that one needs to deal with re-establishment of state when the WebSocket needs to reconnect after being dropped. Many protocol designers build in the notion of a "sync" message to provide context from the server to the client.
Either way, your model object could be the same whether you use REST or WebSockets, but that might mean you are still thinking too much in terms of request-response rather than publish/subscribe.
The first thing you must think about is how you're going to scale the servers and manage their state. With a REST API this is largely straightforward, as REST APIs are for the most part stateless, and every load balancer knows how to proxy HTTP requests. Hence, REST APIs can be scaled horizontally, leaving the few bits of state to the persistence layer (database) to deal with. With websockets, it's often a different matter. You need to research which load balancer you're going to use (in a cloud deployment, this often depends on the cloud provider), then figure out what type of websocket support or configuration the load balancer will need. Then, depending on your application, you need to figure out how to manage the state of your websocket connections across the cluster.
Think about the different use cases, e.g. if a websocket event on one server alters the state of the data, will you need to propagate this change to a different user on a different connection? If the answer is yes, then you'll probably need something like Redis to manage your ws connections and communicate changes between the servers.
As for performance, at the end of the day it's still just HTTP connections, so I doubt there will be a big difference in separating the server functionality. However, I think two servers would go a long way toward improving code cleanliness, as long as you have another 'core' module to isolate code common to both servers.
Personally I would do them together, because you can share the models and most of the code between the REST and the WS endpoints.
At the end of the day, what Yuri said in his answer is correct, but it is not that much work to load balance WS anyway; everyone does it nowadays. The approach I took is to have REST for everything and then create some WS "endpoints" for subscribing to realtime data server-to-client.
So from what I understood, your client would just get notifications from the server with updates, so I would definitely go with WS. You subscribe to some events and then you get new results when there are any. Constantly polling with HTTP calls is not the best way.
We had this need and basically built a small framework around this idea http://devalien.github.io/Axolot/
Basically, you can understand our approach in the controller below (this is just an example; in our real-world app we have subscriptions so we can notify when we have new data or when we finish a procedure). Under actions are the REST endpoints and under sockets the websocket endpoints.
module.exports = {
  model: 'user', // We are attaching the user to the model, so CRUD operations are there (good for dev purposes)
  path: '/user', // This is the endpoint
  actions: {
    'get /': [
      function (req, res) {
        var query = {};
        Model.user.find(query).then(function (user) { // Find from the User model declared above
          res.send(user);
        }).catch(function (err) {
          res.send(400, err);
        });
      }
    ]
  },
  sockets: {
    getSingle: function (userId, cb) { // This one is callable from socket.io using "user:getSingle"
      Model.user.findOne(userId).then(function (user) {
        cb(user);
      }).catch(function (err) {
        cb({ error: err });
      });
    }
  }
};

Node.JS/Meteor on multiple servers serving the same application

When coming to deploy Node.JS/Meteor for large scale application a single CPU will not be sufficient. We also would like to have it on multiple servers for redundancy.
What is the recommended setup for such a deployment? How does the load balancing work? Will this support push data across servers' clients (one client connects to server 1, a second client connects to server 2, and we would like an update in client 1 to be seen in client 2 and vice versa)?
Thanks Roni.
At the moment you just need to use a proxy between them. The paid Galaxy solution should help, but details are scarce at the moment as the product isn't out yet.
You can't simply proxy (normally using nginx, etc.) between two servers, as each server will store the user's state (i.e. their login state) during the DDP session (the raw wire protocol Meteor uses to transmit data).
There is one way you could do it at the moment: get Meteorite and install a package called meteor-cluster.
The package should help you relay data between the instances via Redis. A YouTube video also shows this and how to set it up.
An alternative solution is to use Hipache to manage the load balancing. You can use multiple workers (backends) for one frontend, like so:
$ redis-cli rpush frontend:www.yourdomain.com http://address.of.server.1
$ redis-cli rpush frontend:www.yourdomain.com http://address.of.server.2
There is more information on how to do this in the git page I linked to above, there is a single config file to edit and the rest is done for you. You may also want to have a dedicated server for MongoDB.

Node.js high-level servers' communication API

Folks, I wonder whether there is any high-level API for server-to-server communication in the Node.js framework. For example, I have several servers where my application runs, and I want to control the load on these servers. Sometimes, if a server is overloaded, I want to redirect some connection requests to another (a more lightly loaded one). Are there any functions that could help me, or do I have to implement my own functionality?
Try looking at cluster. This allows you to control multiple Node processes and scale nicely.
Alternatively, set up TCP sockets and pass messages around over TCP, or pass messages around via a database like Redis.
You should be able to pipe HTTP connections downstream: you have one HTTP server acting as a load balancer that just passes requests on to your other servers and passes the responses back.
You're looking for what's called a load balancer. There are many off-the-shelf solutions, nginx being one of the standards today (and VERY quick/easy to set up).
I don't know of a node-native solution, but it's not that hard to write one. In general, however, load balancers don't actually monitor server load; they monitor whether a server is live or not and distribute traffic relatively equally.
As for your communications question, no -- there's no standardized API to communicate to/from Node.js servers. Again, however, it's not hard to set up. Assuming you're already hosting HTTP (using Express, or the native module), just listen for specific requests, perhaps to /comm/ or whatever you deem appropriate, and pass JSON back and forth.
Not too sure for Node.js, but I've heard of people using Capistrano with Node.js.
