Forking a new process when Node.JS receives a connection - node.js

I'm running a Node.JS application involving heavy child process I/O. Due to the way Node.JS handles file descriptors (among other reasons), I want to fork a new V8 instance for every connection to the server. (Yes, I'm aware that this is a potentially expensive operation, but that's not the point of this question.)
I am using nssocket for my server, but this question should apply to other types of Node.JS servers (express, Socket.IO, etc) as well.
Right now I have:
var server = require("nssocket").createServer(function(socket){
    // Do stuff with the new connection
}).listen(8000);
The intuitive thing to do is this:
// master.js
var child_process = require("child_process");
var server = require("nssocket").createServer(function(socket){
    // Fork a new process to handle the connection
    child_process.fork("worker.js");
}).listen(8000);
// worker.js
// Do stuff with the new connection
However, then the child process won't have access to the socket variable.
I've read about the new cluster API in Node, but it doesn't look like it's designed for the case when you want every connection to spawn a new worker.
Any ideas?

The cluster API is probably closest to what you want. In theory you can call cluster.fork() at any time within the master process. Note that once the socket connection is established, there is afaik no way to hand it over to another process.
To forward the communication to the worker, you could use message passing (i.e. worker.send) or you could open another port in the worker process and direct the client there.
I should stress that running significantly more worker processes than CPU cores is probably not a good idea. Have you considered pooling the workers or using a work queue like Beanstalkd?
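To make the pooling and message-passing suggestions concrete, here is a minimal sketch. It assumes a fixed set of already-forked workers (anything with a `send()` method, such as the objects returned by `cluster.fork()` or `child_process.fork()`). The names `createWorkerPool` and `forwardChunk`, and the message shape, are made up for illustration.

```javascript
// Hypothetical helper: hands out workers from a fixed pool in round-robin order.
function createWorkerPool(workers) {
  if (workers.length === 0) throw new Error('pool needs at least one worker');
  let next = 0;
  return {
    pick() {
      const worker = workers[next];
      next = (next + 1) % workers.length;
      return worker;
    },
  };
}

// Forward one chunk of client data (a Buffer) to a worker over the IPC channel.
// `connectionId` and the { connectionId, data } shape are illustrative assumptions.
function forwardChunk(worker, connectionId, chunk) {
  worker.send({ connectionId, data: chunk.toString('base64') });
}
```

In the master you would pick a worker once per connection and call `forwardChunk` from the socket's data handler. The worker receives the messages via `process.on('message', ...)`.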

You can use the cluster module to fork workers, then an IPC (inter-process communication) channel plus a message queue to pass objects between the master process and the workers. A good option would be ZeroMQ (ZMQ).

Related

Node clustering with websockets

I have a node cluster where the master responds to http requests.
The server also listens for websocket connections (via socket.io). A client connects to the server via said websocket. The client then chooses between various games (each node process handles one game).
The questions I have are the following:
Should I open a new connection for each node process? How do I tell the client that it should connect to the exact node process X? (Because the server might handle incoming connection requests on its own.)
Is it possible to pass a socket to a node process, so that there is no need for opening a new connection?
What are the drawbacks if I just use one connection (in the master process) and pass the user messages to the respective node processes and the process messages back to the user? (I feel that it costs a lot of CPU to copy rather big objects when sending messages between the processes)
Is it possible to pass a socket to a node process, so that there is no
need for opening a new connection?
You can send a plain TCP socket to another node process as described in the node.js doc here. The basic idea is this:
const child = require('child_process').fork('child.js');
child.send('socket', socket);
Then, in child.js, you would have this:
process.on('message', (m, socket) => {
  if (m === 'socket') {
    // you have a socket here
  }
});
The 'socket' message identifier can be any message name you choose; it is not special. When you call child.send() and the handle you pass along is recognized as a socket, node.js uses platform-specific interprocess communication to share that socket with the other process.
But I believe this only works for plain sockets that do not yet have any local state established other than the TCP state. I have not tried it with an established webSocket connection myself, but I assume it does not work for that: once a webSocket has higher-level state associated with it beyond just the TCP socket (such as encryption keys), the OS will not automatically transfer that state to the new process.
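For the master side, a slightly fuller sketch could look like the following. It uses the `pauseOnConnect` option of `net.createServer()` so that the master does not read any data before the socket is handed off. The function name `createHandoffServer` and the port in the usage comment are my own choices.

```javascript
const net = require('net');

// Build a TCP server that hands every incoming socket to `child`.
// `child` is expected to be a ChildProcess created with child_process.fork().
function createHandoffServer(child) {
  // pauseOnConnect keeps the master from consuming data before the handoff.
  const server = net.createServer({ pauseOnConnect: true });
  server.on('connection', (socket) => {
    child.send('socket', socket);
  });
  return server;
}

// Usage (not executed here):
//   const child = require('child_process').fork('child.js');
//   createHandoffServer(child).listen(8000);
// In child.js the socket arrives paused, so call socket.resume() before reading.
```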
Should I open a new connection for each node process? How do I tell the
client that it should connect to the exact node process X? (Because
the server might handle incoming connection requests on its own.)
This is probably the simplest means of getting a socket.io connection to the new process. If you make sure that your new process is listening on a unique port number and that it supports CORS, then you can just take the socket.io connection you already have between the master process and the client and send a message to the client on it that tells the client where to reconnect to (what port number). The client can then contain code to listen for that message and make a connection to that new destination.
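As a rough sketch of that redirect handshake: the event name `reconnect-to`, the `portsByGame` lookup table, and both function names are assumptions for illustration. `connect` stands in for socket.io-client's `io(url)` function, so that the logic does not depend on a particular client version.

```javascript
// Hypothetical event name; any name agreed on by client and server works.
const REDIRECT_EVENT = 'reconnect-to';

// Server side (master): tell the client which port the chosen game process uses.
// `portsByGame` is an assumed lookup table maintained by the master.
function sendRedirect(socket, gameId, portsByGame) {
  const port = portsByGame[gameId];
  if (port === undefined) throw new Error('unknown game: ' + gameId);
  socket.emit(REDIRECT_EVENT, { port });
}

// Client side: when told to move, open a new connection to that port.
// `connect` stands in for socket.io-client's io(url).
function followRedirects(socket, host, connect) {
  socket.on(REDIRECT_EVENT, ({ port }) => connect('http://' + host + ':' + port));
}
```

The game process on the target port still has to allow cross-origin connections, as noted above.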
What are the drawbacks if I just use one connection (in the master
process) and pass the user messages to the respective node processes
and the process messages back to the user? (I feel that it costs a lot
of CPU to copy rather big objects when sending messages between the
processes)
The drawbacks are as you surmise. Your master process just has to spend CPU energy being the middle man forwarding packets both ways. Whether this extra work is significant to you depends entirely upon the context and has to be determined by measurement.
Here's some more info I discovered. It appears that if an incoming socket.io connection that arrives on the master is immediately shipped off to a cluster child before the connection establishes its initial socket.io state, then this concept could work for socket.io connections too.
Here's an article on sending a connection to another server with implementation code. This appears to be done immediately at connection time so it should work for an incoming socket.io connection that is destined for a specific cluster. The idea here is that there's sticky assignment to a specific cluster process and all incoming connections of any kind that reach the master are immediately transferred over to the cluster child before they establish any state.
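One common way to get that kind of sticky assignment is to hash the client's remote address to a worker index. The sketch below is my own illustration, not the article's code: the hash function is an arbitrary choice and the `'sticky-connection'` message name is hypothetical.

```javascript
// Map a client address to a worker index so the same client always
// lands on the same worker. The hash itself is an arbitrary choice.
function workerIndexFor(remoteAddress, workerCount) {
  let hash = 0;
  for (const ch of remoteAddress) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash % workerCount;
}

// Hand a freshly accepted socket to the worker chosen for its address.
// `workers` is an array of objects with send(), e.g. from cluster.fork().
function dispatch(socket, workers) {
  const index = workerIndexFor(socket.remoteAddress || '', workers.length);
  const worker = workers[index];
  worker.send('sticky-connection', socket); // hypothetical message name
  return worker;
}
```

In the master this would be called from the `'connection'` handler of a `net` server created with `pauseOnConnect: true`, so that no data is read before the transfer.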

Does Socket.IO fork or spawn a new process when run?

I have a node application that uses Socket.IO for the messaging.
And I run it using
node --expose_gc /path/to/app.js
Now, when I check with the htop utility, I notice that instead of 1 process, I am getting multiple processes of the same command.
Can someone, in noob terms, explain to me why and what is going on here? I'm also worried that it may consume unexpected memory/cpu usage too.
socket.io does not fork or spawn any child processes.
Usually, subprocesses that run node.js are spawned via the cluster module, but socket.io does no such thing.
It just adds a handler on top of an http server.
socket.io is just a library that hooks into a web server and listens for certain incoming requests (those requests that initiate a webSocket/socket.io connection). Once a socket.io connection is initiated, it just uses normal socket programming to send/receive messages.
It does not start up any additional processes by itself.
Your multiple processes are either because you accidentally started your own app multiple times without shutting it down or there is something else in your app that is starting up multiple processes. socket.io does not do that.

nodejs cluster messaging vs websockets

I am developing an app with nodeJS and cluster, with multiple workers and a different port on each worker. I may need to make the workers communicate with each other. I know that nodeJS cluster has built-in support for messaging between the master and the workers.
I have 3 questions regarding this.
1. Can I send messages between workers without the master in the middle, for faster processing?
2. Is it a good idea to open a websocket on each worker to listen to the other workers, replacing the built-in messaging in cluster? Is it faster?
3. If the app scales to multiple servers, I think websockets would be the answer. Please suggest any alternatives.
1. No. Generally child processes are not aware of each other. They are only aware of the parent.
2. It is not faster, it is definitely slower. Depending on your needs it might be better, though, since you won't be able to scale onto multiple machines otherwise (cluster only creates subprocesses).
3. Try zeromq, for example (I'm sure there is a binding for NodeJS, google it). Or a dedicated message broker (like RabbitMQ). Unlike websockets, these were created to solve that particular problem of yours.
IPC.
Each NodeJS child process (started with fork) has a process.send(<data>) method and can listen for messages from the parent. Feel free to pass objects if you want; they will be stringified and parsed on the other side.
process.on('message', (data) => {});
When you fork a process on the parent side, the returned child object also has methods for messaging.
const child_process = require('child_process');
const child = child_process.fork('child.js');
child.on('message', (data) => { });
If you want to make workers/children communicate with each other, you can manage it by sending specific messages to the master, which then forwards them to the specified worker.
Docs: Child Process Event "Message"
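A minimal sketch of that forwarding, assuming each message names its target: the `{ to, payload }` message shape and the `attachRelay` name are made up for this example. `children` is a Map from a name you choose to the object returned by `child_process.fork()`.

```javascript
// Relay messages between children through the parent.
// Assumed message shape: { to: <child name>, payload: <anything serializable> }.
function attachRelay(children) {
  for (const [name, child] of children) {
    child.on('message', (msg) => {
      const target = children.get(msg.to);
      // Silently drop messages addressed to an unknown child.
      if (target) target.send({ from: name, payload: msg.payload });
    });
  }
}

// Usage (not executed here):
//   const { fork } = require('child_process');
//   attachRelay(new Map([['a', fork('child.js')], ['b', fork('child.js')]]));
// A child then calls process.send({ to: 'b', payload: 'hello' }).
```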

handling nodejs http requests in a separate process

I want to handle specific http requests in a child process, with these requests being identified by the URL path. There are several examples in the node documentation and elsewhere online that almost do this or that simply do not work.
The reason for this is that the main server must be reliable and that certain requests may be handled by code that is not necessarily of the same quality. For this reason the entire request should be handed over to an external process that can be resurrected if it dies.
Ideally the external process should look as much like a normal node http server as possible and the connection between parent and child processes should not be over a socket.
It seems that the fork function and messages might do what I require but I cannot see any way to pass the request and response to the child process for handling.
Have you looked at nodejs cluster module?
It is not for specific requests, but basically the master forks multiple workers that can then handle http requests (1 worker per CPU core in general). If a worker dies, the master can fork a new one by listening for the cluster 'exit' event.
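Restarting dead workers is something you wire up yourself via the cluster `'exit'` event. Here is a small sketch. The `superviseWorkers` name is my own, and the `clusterApi` parameter exists only so the logic can be tried without actually forking.

```javascript
const cluster = require('cluster');

// Fork `count` workers and replace any worker that exits.
// `clusterApi` defaults to the real cluster module.
function superviseWorkers(count, clusterApi = cluster) {
  clusterApi.on('exit', (worker, code, signal) => {
    // A worker died (crash or kill): start a replacement.
    clusterApi.fork();
  });
  for (let i = 0; i < count; i++) clusterApi.fork();
}

// Usage (not executed here):
//   if (cluster.isPrimary) {            // cluster.isMaster on older Node versions
//     superviseWorkers(require('os').cpus().length);
//   } else {
//     require('http').createServer((req, res) => res.end('ok')).listen(8000);
//   }
```

Note that this respawns unconditionally. A real setup would probably want a limit or a delay so that a worker crashing at startup does not cause a fork loop.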

multiple child_process with node.js / socket.io

This is more of a design question than an implementation one, but I am wondering whether I can design something like this. I have an interactive app (similar to the python shell). I want to host a server (let's say using either the node.js http server or socket.io, since I am not sure which one would be better) which would spawn a new child_process for every client that connects to it and maintain a different context for that particular client. I am a complete noob in terms of node.js and socket.io. The most I have managed is to have one child process on a socket.io server and connect the client to it.
So the question is, would this work? If not, is there any other way in node to get it to work, or am I better off with a local server?
Thanks
Node.js is a single-process web platform. Using clustering (built on child_process), you create independent executions of the same application, each in a separate process.
Each process costs memory, and this is generally why most traditional systems do not scale well: they require a thread per client. For node, a process per client would be extremely inefficient from a hardware-resources point of view.
Node is event based, and you don't need to worry much about scope as long as your application logic does not exploit it.
The recommended number of workers is equal to the number of CPU cores on the hardware.
There is always a master application that creates the workers. Each worker creates http + socket.io listeners, which technically are bound to the master's socket and routed from there.
http requests will be routed to different workers, while sockets are routed at connection time; that worker then handles the socket until it gets disconnected.
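To illustrate keeping a separate context per client without a process per client, here is a minimal single-process sketch. The handler names and the shape of the context object are hypothetical. With socket.io you would call these from the `'connection'` and `'disconnect'` handlers, using `socket.id` as the key.

```javascript
// One context object per connected client, all inside a single process.
const contexts = new Map();

function onConnect(clientId) {
  contexts.set(clientId, { history: [] });
}

// Record one line of input for a client and return how many lines it has sent.
function onInput(clientId, line) {
  const ctx = contexts.get(clientId);
  if (!ctx) throw new Error('unknown client: ' + clientId);
  ctx.history.push(line);
  return ctx.history.length;
}

function onDisconnect(clientId) {
  contexts.delete(clientId);
}
```

If the interactive app itself must run as a separate program, the context object could instead hold a reference to that client's child process. That brings back the per-client memory cost described above.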
