Node.js - child process handle urls related to itself - node.js

I'm trying to scale a chatter app using socket.io + cluster. Is it possible for child processes to handle incoming request belong to its process id (assigned when fork)?
For example:
http://mydomain/calculate?process=1
The above request is only handled by process 1, other processes will ignore it. In this way, I want to make sure requests of the same room are handled by same process, so I may don't have to use RedisStore as socket.io backend.
I also wonder how RedisStore works, because when using it, I found io.sockets.manager.rooms data are not accurate in all processes.
Edit:
Put it another way: can cluster master process dispatch request to different child processes based on the querystring?

The answer is no. The OS takes care of load balancing in this situation and in order to process query string you already have to be connected to a web server ( in your case child process ).
From my experience I find cluster a bit useless. It is a lot easier to spawn multiple NodeJS processes ( on multiple ports ) and put a proxy ( nginx? ) in front of them. It is easy and scalable.
As for socket.io: I don't think it works correctly with cluster ( because of sharing global variables, which causes issues ). Again: spawning separate NodeJS processes should fix the problem. Also it will be useful once you reach the point when you will have to scale to multiple machines. Any tricks with cluster won't help you at that point.
One last note: socket.io does not scale well. I suggest writing your own WebSocket server ( based on WS for example ) and implement your own scaling mechanism. For example based on all-to-all UDP pinging, which should scale well when dealing with small amount of servers ( 50? 100? ).

Related

Why does my Node express pm2 primary process in cluster mode never handle incoming requests?

I am running a node express app in pm2 cluster mode. Everything is working fine, however; I have noticed that incoming connections to my express routes only ever hit the forked worker app instances and never the primary (master) process.
In the pm2 documentation (https://pm2.keymetrics.io/docs/usage/cluster-mode/) on cluster mode they say
Under the hood, this uses the Node.js cluster module
In the "how it works" section on the Node.js website (https://nodejs.org/api/cluster.html#cluster_how_it_works) it says
The cluster module supports two methods of distributing incoming
connections. The first one (and the default one on all platforms
except Windows) is the round-robin approach, where the primary process
listens on a port, accepts new connections and distributes them across
the workers in a round-robin fashion, with some built-in smarts to
avoid overloading a worker process.
Does this mean the primary process will never actually handle any incoming requests? That can't be!! That would make the entire primary process a glorified load balancer and essentially a dead weight with a bunch of code and a full CPU never really getting used.
If the above IS accurate does that mean that the primary process is a bottleneck for all incoming express connections?
What am I understanding incorrectly or doing wrong that the primary (master) process never actually handles any requests please?
After I completely removed and re installed pm2 and then re-added all my node apps back in cluster mode via cli the first instance (app 0) started receiving messages. I didn't change any code so I'm not exactly sure what the issues was. Thank you to #JonePolvora for your time with comments that lead me to troubleshoot more.

nodejs cluster distributing connection

In nodejs api doc, it says
The cluster module supports two methods of distributing incoming
connections.
The first one (and the default one on all platforms except Windows),
is the round-robin approach, where the master process listens on a
port, accepts new connections and distributes them across the workers
in a round-robin fashion, with some built-in smarts to avoid
overloading a worker process.
The second approach is where the master process creates the listen
socket and sends it to interested workers. The workers then accept
incoming connections directly.
The second approach should, in theory, give the best performance. In
practice however, distribution tends to be very unbalanced due to
operating system scheduler vagaries. Loads have been observed where
over 70% of all connections ended up in just two processes, out of a
total of eight.
I know PM2 is using the first one, but why it doesn't use the second? Just because of unbalnced distribution? thanks.
The second may add CPU load when every child process is trying to 'grab' the socket master sent.

Zero downtime deploy with node.js and mongodb?

I'm looking after building a global app from the ground up that can be updated and scaled transparently to the user.
The architecture so far is very easy, each part of the application has it own process and talk to other trough sockets.
This way i can spawn as many instances i want for each part of the application and distribute them across the globe accordingly to my needs.
In the front of the system i'll have a load balancer, which will them route the users to their closest instance, and when new code is spawned my instances will spawn new processes with the new code and route new requests to it and gracefully shutdown.
Thank you very much for any advice.
Edit:
The question is: What is the best ( and simplest ) solution for achieving zero downtime when deploying node to multiple instances ?
About the app:
https://github.com/Raynos/boot for "socket" connections,
http for http requests,
mongo for database
Solutions i'm trying at the moment:
https://www.npmjs.org/package/thalassa ( which managed haproxy configuration files and app instances ), if you don't know it, watch this talk: https://www.youtube.com/watch?v=k6QkNt4hZWQ and be aware crowsnest is being replaced by https://github.com/PearsonEducation/thalassa-consul
Deployment with zero downtime is only possible if the data you share between old and new nodes are compatible.
So in case you change the structure, you have to build a intermediate release, that can handle the old and new data structure without utilizing the new structure until you have replaced all nodes with that intermediate version. Then roll out the new version.
Taking nodes in and out of production can be done with your loadbalancer (and a grace time until all sessions expired on the nodes) (don't know enough about your application).

How does the cluster module work in Node.js?

Can someone explain in detail how the core cluster module works in Node.js?
How the workers are able to listen to a single port?
As far as I know that the master process does the listening, but how it can know which ports to listen since workers are started after the master process? Do they somehow communicate that back to the master by using the child_process.fork communication channel? And if so how the incoming connection to the port is passed from the master to the worker?
Also I'm wondering what logic is used to determine to which worker an incoming connection is passed?
I know this is an old question, but this is now explained at nodejs.org here:
The worker processes are spawned using the child_process.fork method,
so that they can communicate with the parent via IPC and pass server
handles back and forth.
When you call server.listen(...) in a worker, it serializes the
arguments and passes the request to the master process. If the master
process already has a listening server matching the worker's
requirements, then it passes the handle to the worker. If it does not
already have a listening server matching that requirement, then it
will create one, and pass the handle to the worker.
This causes potentially surprising behavior in three edge cases:
server.listen({fd: 7}) -
Because the message is passed to the master,
file descriptor 7 in the parent will be listened on, and the handle
passed to the worker, rather than listening to the worker's idea of
what the number 7 file descriptor references.
server.listen(handle) -
Listening on handles explicitly will cause the
worker to use the supplied handle, rather than talk to the master
process. If the worker already has the handle, then it's presumed that
you know what you are doing.
server.listen(0) -
Normally, this will cause servers to listen on a
random port. However, in a cluster, each worker will receive the same
"random" port each time they do listen(0). In essence, the port is
random the first time, but predictable thereafter. If you want to
listen on a unique port, generate a port number based on the cluster
worker ID.
When multiple processes are all accept()ing on the same underlying
resource, the operating system load-balances across them very
efficiently. There is no routing logic in Node.js, or in your program,
and no shared state between the workers. Therefore, it is important to
design your program such that it does not rely too heavily on
in-memory data objects for things like sessions and login.
Because workers are all separate processes, they can be killed or
re-spawned depending on your program's needs, without affecting other
workers. As long as there are some workers still alive, the server
will continue to accept connections. Node does not automatically
manage the number of workers for you, however. It is your
responsibility to manage the worker pool for your application's needs.
NodeJS uses a round-robin decision to make load balancing between the child processes. It will give the incoming connections to an empty process, based on the RR algorithm.
The children and the parent do not actually share anything, the whole script is executed from the beginning to end, that is the main difference between the normal C fork. Traditional C forked child would continue executing from the instruction where it was left, not the beginning like NodeJS. So If you want to share anything, you need to connect to a cache like MemCache or Redis.
So the code below produces 6 6 6 (no evil means) on the console.
var cluster = require("cluster");
var a = 5;
a++;
console.log(a);
if ( cluster.isMaster){
worker = cluster.fork();
worker = cluster.fork();
}
Here is a blog post that explains this
As an update to #OpenUserX03's answer, nodejs has no longer use system load-balances but use a built in one. from this post:
To fix that Node v0.12 got a new implementation using a round-robin algorithm to distribute the load between workers in a better way. This is the default approach Node uses since then including Node v6.0.0

nodeJS multi node Web server

I need to create multi node web server that will be allow to control number of nodes in real time and change process UID and GUID.
For example at start server starts 5 workers and pushes them into workers pool.
When the server gets the new request it searches for free workers, sets UID or GUID if needed, and gives it the request to proces. In case if there is no free workers, server will create new one, set GUID or UID, also pushes it into pool and so on.
Can you suggest me how it can be implemented?
I've tried this example http://nodejs.ru/385 but it doesn't allow to control the number of workers, so I decided that there must be other solution but I can't find it.
If you have some examples or links that will help me to resolve this issue write me please.
I guess you are looking for this: http://learnboost.github.com/cluster/
I don't think cluster will do it for you.
What you want is to use one process per request.
Have in mind that this can be very innefficient, and node is designed to work around those types of worker processing, but if you really must do it, then you must do it.
On the other hand, node is very good at handling processes, so you need to keep a process pool, which is easily accomplished by using node internal child_process.spawn API.
Also, you will need a way for you to communicate to the worker process.
I suggest opening a unix-domain socket and sending the client connection file descriptor, so you can delegate that connection into the new worker.
Also, you will need to handle edge-cases for timeouts, etc.
https://github.com/pgte/fugue I use this.

Resources