Node has a cluster module (http://nodejs.org/docs/v0.6.19/api/cluster.html), but I have also found other implementations, such as https://github.com/learnboost/cluster.
Which is best, in your experience?
Another question: is it necessary to use nginx in production? If so, why? And how many simultaneous connections can a single modern multicore server running Node handle: 100K, 200K?
Thanks!
The cluster module from https://github.com/learnboost/cluster is only available for Node v0.2.x and v0.4.x, while the official cluster module has been baked into Node core since v0.6.x. Note that the API will change in v0.8.x (which is around the corner).
So you should use the latest version of Node, with Cluster built in.
Nginx is faster at serving static files, but other than that I don't see a solid reason to use it. If you want a reverse proxy, something like HAProxy is better (or you can use a Node solution such as node-http-proxy or bouncy).
Unless you are running a "Hello World" example in production, you cannot accurately predict how many simultaneous connections can be handled. Normally a single Node process can handle thousands of concurrent connections.
Resources:
https://github.com/nodejitsu/node-http-proxy
https://github.com/substack/bouncy
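For illustration, here is a bare-bones sketch of what those proxies do, written against Node's core http module only (node-http-proxy and bouncy wrap the same idea with far more features). The backend address and both ports are hypothetical:

    // proxy.js - minimal reverse proxy sketch using only Node core modules.
    // Assumes your app server listens on 127.0.0.1:3000 (hypothetical).
    var http = require('http');

    http.createServer(function (clientReq, clientRes) {
      // Forward the incoming request to the backend app server.
      var proxyReq = http.request({
        host: '127.0.0.1',
        port: 3000,
        method: clientReq.method,
        path: clientReq.url,
        headers: clientReq.headers
      }, function (proxyRes) {
        // Stream the backend response back to the original client.
        clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
        proxyRes.pipe(clientRes);
      });

      clientReq.pipe(proxyReq);
    }).listen(8080);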
Related
Why are there dedicated web services just for MongoDB? Unlike LAMP, I would just install everything on my EC2 instance. Now that I'm deploying a MEAN stack, should I separate MongoDB from my Node server? I'm confused; I don't see any limitation in running node and mongod together on a single instance, and I could use a tool like MongoLab as well.
Ultimately it depends how much load you expect your application to have and whether or not you care about redundancy.
With mongo and node you can install everything on one instance. When you start scaling, the first separation is to split the application from the database. It is often easier to set things up that way from the start, especially if you know you will have the load to require it.
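One low-friction way to keep that later separation cheap is to read the connection string from the environment, so moving MongoDB to its own instance (or to a hosted service like MongoLab) becomes a configuration change rather than a code change. A minimal sketch using the 2.x-era official driver; the URLs and database name are made up:

    // db.js - the connection string comes from the environment, so the same
    // code works whether mongod runs on this box or on a dedicated instance.
    var MongoClient = require('mongodb').MongoClient;

    // e.g. MONGO_URL=mongodb://localhost:27017/myapp   (everything on one box)
    //      MONGO_URL=mongodb://10.0.0.5:27017/myapp    (separate DB instance)
    var url = process.env.MONGO_URL || 'mongodb://localhost:27017/myapp';

    MongoClient.connect(url, function (err, db) {
      if (err) throw err;
      console.log('connected to', url);
    });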
I've searched a lot to figure this out, but I haven't found a clear explanation. Is the only difference that a clustered app can be scaled out while a forked app cannot?
PM2's site explains what Cluster mode can do, but nobody talks about the advantages of Fork mode (except, maybe, that it gets the NODE_APP_INSTANCE variable).
I feel like Cluster might be a subset of Fork, since Fork seems to be the general case. So I guess Fork simply means a 'forked process' from PM2's point of view, and Cluster means a 'forked process that can be scaled out'. Why, then, should I ever use Fork mode?
The main difference between fork_mode and cluster_mode is that it tells pm2 to use either the child_process.fork API or the cluster API.
What does this mean internally?
Fork mode
Think of fork mode as basic process spawning. It allows you to change the exec_interpreter, so that you can run a PHP or Python server with pm2. (The exec_interpreter is the "command" used to start the child process.) By default, pm2 uses node, so pm2 start server.js will do something like:
require('child_process').spawn('node', ['server.js'])
This mode is very useful because it enables a lot of possibilities. For example, you could launch multiple servers on pre-established ports which will then be load-balanced by HAProxy or Nginx.
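As a sketch of that setup, an ecosystem file could declare several fork-mode processes on fixed ports, which HAProxy or Nginx would then balance across. The app names and port numbers below are hypothetical:

    // ecosystem.config.js - three independent fork_mode processes on fixed ports.
    module.exports = {
      apps: [
        { name: 'web-3001', script: 'server.js', exec_mode: 'fork', env: { PORT: 3001 } },
        { name: 'web-3002', script: 'server.js', exec_mode: 'fork', env: { PORT: 3002 } },
        { name: 'web-3003', script: 'server.js', exec_mode: 'fork', env: { PORT: 3003 } }
      ]
    };
    // server.js would call .listen(process.env.PORT), and the proxy would be
    // pointed at ports 3001-3003. Start it all with: pm2 start ecosystem.config.js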
Cluster mode
Cluster mode will only work with node as its exec_interpreter, because it relies on the Node.js cluster module (e.g. isMaster, fork, etc.). This is great for zero-configuration process management, because the process is automatically forked into multiple instances.
For example pm2 start -i 4 server.js will launch 4 instances of server.js and let the cluster module handle load balancing.
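Roughly, what pm2 drives for you in cluster mode is the same pattern you would otherwise write by hand with the core cluster module. A minimal sketch (the port number is arbitrary):

    // app.js - what cluster mode does underneath: the master forks one worker
    // per core, and every worker listen()s on the same shared port.
    var cluster = require('cluster');
    var http = require('http');
    var os = require('os');

    if (cluster.isMaster) {
      for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
      // Replace workers that die, which is part of what pm2 automates.
      cluster.on('exit', function (worker) {
        console.log('worker ' + worker.process.pid + ' died, restarting');
        cluster.fork();
      });
    } else {
      http.createServer(function (req, res) {
        res.end('handled by pid ' + process.pid + '\n');
      }).listen(3000);
    }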
Node.js is single-threaded, which means only one core of your quad-core CPU can execute the node application.
That is fork_mode; we use it for local development.
pm2 start server.js -i 0 runs one Node process on each core of your CPU and auto-load-balances the incoming stateless requests across them, all on the same port.
That is cluster_mode, which is used for the sake of performance in production.
You may also choose to use it in local dev if you want to stress test your PC :)
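The app itself stays a plain single-process server; pm2 owns the master side. A trivial server.js like the sketch below (the port is arbitrary), started with pm2 start server.js -i 0, will answer with a different pid depending on which worker received the connection:

    // server.js - nothing cluster-specific in the app code; pm2's cluster_mode
    // forks it once per core and all workers share port 3000.
    var http = require('http');

    http.createServer(function (req, res) {
      res.end('served by worker pid ' + process.pid + '\n');
    }).listen(3000);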
Documentation and sources are really misleading here.
Reading the sources, the only difference seems to be that they use either the node cluster API or the child_process API. Since cluster uses the latter, you are effectively doing the same thing. There is just a lot more custom stdio passing around happening in fork_mode. Also, cluster can only be communicated with via strings, not objects.
By default you are using fork_mode. If you pass the -i [number] option, you go into cluster_mode, which is what you generally aim for with pm2.
Also, fork_mode instances can't listen on the same port (you will get EADDRINUSE), while cluster_mode workers can. That way you can structure your app to run on a single port and be load balanced automatically. You then have to build the app without local state, though (e.g. keep sessions and data in an external store such as a database).
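A quick way to see that difference yourself (the file name and port are just for illustration, assuming server.js binds a fixed port 3000):

    // launch.js - plain child_process.fork: two copies cannot share a port.
    var fork = require('child_process').fork;

    fork('server.js'); // binds port 3000
    fork('server.js'); // this copy crashes with EADDRINUSE

    // With the cluster module (or pm2's cluster_mode) the workers inherit the
    // listening socket from the master, so every copy serves port 3000.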
I have been a fan of the ease with which I can create/compose application functionality using NodeJS. NodeJS, to me, is easy.
When looking at how to take advantage of multi-core machines (and then also considering the additional complexity of port specific apps - like a web app on 80/443), my original solutions looked at NodeJS Cluster (or something like pm2) and maybe a load balancer.
But I'm wondering what would be the downside (or the reason why it wouldn't work) of instead running multiple containers (to address the multi-core situation) and then load balancing across their respective external ports? Past that, would it be better to just use Einhorn or... how does Einhorn fit into this picture?
So, the question is - for NodeJS only (because I'm also thinking about Go) - am I correct in considering "clustering" vs "multiple docker containers with load balancing" as two possible ways to utilize multiple cores?
As a separate question, is Einhorn just an alternative third-party way to achieve the same thing as NodeJS clustering (which could also be used to load balance a Go app, for example)?
Docker is starting to take on more and more of the clustering and load-balancing aspects we used to handle independently, either directly or by idiomatic usage patterns. With NodeJS for example, you can have one nginx or haproxy container load balance between multiple NodeJS containers. I prefer using something like fig, and also setting the restart-policy so that the containers are restarted automatically. This removes the need for other clustering solutions in most cases.
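As a (hypothetical) sketch of that layout with fig: one nginx container in front, and the node service scaled out to as many containers as you have cores. The service names, ports and mounted config path are made up, and the nginx.conf being mounted would contain an upstream block pointing at the node containers:

    # fig.yml - nginx in front of N node containers (scale with: fig scale node=4)
    node:
      build: .
      command: node server.js

    nginx:
      image: nginx
      links:
        - node
      ports:
        - "80:80"
      volumes:
        - ./nginx.conf:/etc/nginx/conf.d/default.conf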
When using pm2 cluster there's a pretty severe warning saying you shouldn't use it in production or for load balancing, and that you should use nginx instead. Unfortunately that's exactly how I planned to use PM2. Is it really not intended for that purpose, or is it just not completely ready yet?
The Node.js cluster module (0.10) has a lot of issues and is not safe to use in production!
You may want to give 0.11 a try; there have been some improvements.
This has nothing to do with pm2; it is in fact directly related to the node cluster module.
A little update to a common question. As of the current version of Node.js, v0.6.5, is it safe to run it as a webserver in production? I really want to skip the step of putting nginx in front as a proxy. I am going to use Express, NowJS and gzippo. Also, nginx doesn't support WebSockets yet, and it's a little hard to set up socket.io over SSL. Are there any more benefits to nginx other than that it serves static files better?
Any advice on this matter? And if it's OK to run as a webserver, are there any other modules worth considering?
To be honest, aside from serving static files I don't really see any important benefits (though Nginx may have more server-specific extensions).
Also, you might want to use bouncy or node-http-proxy for proxying, and browserify to use your server-side modules on the frontend.
Edit: you also would not be the first to run Node without Nginx; as far as I know, Trello and other websites do the same.
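For completeness, browserify is just a build step; assuming a (hypothetical) client.js entry point, you bundle it and include the result with a regular script tag:

    npm install -g browserify
    browserify client.js -o public/bundle.js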
Other benefits of Nginx besides serving static files:
You can have it compress responses dynamically, or serve a pre-built .gz file even when the uncompressed version is requested.
You can cache generated output, reducing calls back to node.js.
You can have it route requests to a cluster of node application servers.
Lots of other neat stuff: http://wiki.nginx.org/Modules
Using nginx though isn't required, and running node with nothing in front of it is perfectly fine.
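For reference, a minimal nginx server block (hypothetical ports and paths) covering the points above: on-the-fly compression, pre-compressed .gz files, static assets, and routing to several node processes:

    # /etc/nginx/conf.d/myapp.conf - ports and paths are hypothetical
    upstream node_app {
        server 127.0.0.1:3001;
        server 127.0.0.1:3002;
        server 127.0.0.1:3003;
    }

    server {
        listen 80;

        gzip on;          # compress responses on the fly
        gzip_static on;   # serve foo.js.gz when present (needs the gzip_static module)

        # static assets served directly by nginx
        location /public/ {
            root /var/www/myapp;
        }

        # everything else goes to the node processes
        location / {
            proxy_pass http://node_app;
        }
        # proxy_cache (with a proxy_cache_path in the http block) can additionally
        # cache generated responses and cut calls back to node.
    }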