multiple child_process with node.js / socket.io

This is more of a design question than an implementation one, but I am wondering whether I can design something like this. I have an interactive app (similar to a Python shell). I want to host a server (let's say using either the node.js http server or socket.io, since I am not sure which one would be better) that would spawn a new child_process for every client that connects to it and maintain a separate context for that particular client. I am a complete noob when it comes to node.js or socket.io. The most I have managed is to have one child process on a socket.io server and connect the client to it.
So the question is: would this work? If not, is there any other way in Node to get it to work, or am I better off with a local server?
Thanks
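A minimal sketch of the per-client design described in the question, assuming a plain TCP server from Node's built-in net module instead of socket.io, and assuming the interactive app can be started as a command that reads stdin and writes stdout (for example `python3 -i` or `node -i`). The helper name createShellServer is hypothetical. Each connection spawns its own child process, so each client automatically gets a separate context, at the cost of one OS process per client.

```javascript
const net = require("net");
const { spawn } = require("child_process");

// Hypothetical helper: one interactive child process per TCP client.
// `command`/`args` stand in for your own interactive app (e.g. a REPL).
function createShellServer(command, args) {
  return net.createServer((socket) => {
    // Each connection gets its own process, hence its own interpreter state.
    const child = spawn(command, args, { stdio: ["pipe", "pipe", "pipe"] });

    socket.pipe(child.stdin); // client input -> child stdin
    // Both output streams share the socket, so don't let either one end it.
    child.stdout.pipe(socket, { end: false });
    child.stderr.pipe(socket, { end: false });

    // Tear the pair down together so no orphaned processes are left behind.
    child.on("close", () => socket.end()); // child exited and its stdio closed
    child.on("error", () => socket.destroy()); // e.g. command not found
    child.stdin.on("error", () => {}); // ignore EPIPE if the child exits first
    socket.on("close", () => child.kill());
    socket.on("error", () => child.kill());
  });
}
```

For example, `createShellServer("python3", ["-i"]).listen(8000)` and then connect with a TCP client such as `nc localhost 8000`; every new connection gets a fresh interpreter.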

Node.js is a single-process web platform. Using clustering (which is built on child_process), you create independent instances of the same application, each running in its own process.
Each process costs memory, and this is generally why many traditional systems do not scale well: they need a thread per client. In Node, a process per client would be extremely inefficient in terms of hardware resources.
Node is event-based, so you don't need a separate process to keep each client's context apart, as long as your application logic does not share state between clients.
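To illustrate keeping per-client context inside one process, here is a minimal sketch that assumes the interactive app evaluates JavaScript: each client id gets its own context from Node's built-in vm module, stored in a Map. The names sessions, evalForClient and endSession are hypothetical. The Node documentation states that vm is not a security mechanism, so this separates state between clients but must not be used to run untrusted code; for another language, the same Map-per-client pattern would hold that interpreter's session object instead.

```javascript
const vm = require("vm");

// Hypothetical per-client session store: one vm context per client id,
// all living inside a single Node process.
const sessions = new Map();

function evalForClient(clientId, code) {
  let context = sessions.get(clientId);
  if (!context) {
    context = vm.createContext({}); // fresh global object for this client
    sessions.set(clientId, context);
  }
  // Globals defined here persist between calls for the same client only.
  // The timeout stops a synchronous infinite loop from freezing everyone.
  return vm.runInContext(code, context, { timeout: 1000 });
}

function endSession(clientId) {
  sessions.delete(clientId); // drop the context so it can be garbage-collected
}
```

A connection handler would call `evalForClient(socket.id, line)` for each line the client sends and `endSession(socket.id)` on disconnect.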
The recommended number of workers is equal to the number of CPU cores on the machine.
There is always a master application that creates the workers. Each worker creates its own http + socket.io listeners, which are technically bound to the master's socket, and incoming connections are routed from there.
HTTP requests are distributed across the workers request by request, while a socket is assigned to a worker at the moment it connects, and that worker then handles the socket until it disconnects.
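A minimal sketch of the master/worker layout described above, using Node's built-in cluster module with one worker per CPU core. The function names are hypothetical, and `cluster.isPrimary` only exists from Node 16 (older versions use `cluster.isMaster`). Nothing here is specific to socket.io; a socket.io server would be attached to the HTTP server each worker creates.

```javascript
const cluster = require("cluster");
const http = require("http");
const os = require("os");

// One worker per CPU core, as the answer recommends.
// os.cpus() can be empty on some platforms, so fall back to 1.
function workerCount() {
  return os.cpus().length || 1;
}

// The HTTP server every worker runs (hypothetical app logic).
function createAppServer() {
  return http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  });
}

// Primary: fork the workers and replace any that die.
// Worker: listen on the shared port; the primary distributes connections.
function start(port) {
  const isPrimary = cluster.isPrimary ?? cluster.isMaster; // isMaster on Node < 16
  if (isPrimary) {
    for (let i = 0; i < workerCount(); i++) cluster.fork();
    cluster.on("exit", (worker) => {
      console.log(`worker ${worker.process.pid} exited, forking a replacement`);
      cluster.fork();
    });
    return;
  }
  createAppServer().listen(port);
}
```

Calling `start(8000)` from the entry script runs the primary branch in the first process and the worker branch in every forked copy, since each worker re-executes the same file.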

Related

Working with WebSockets and NodeJs clusters

I currently have a Node server running that works with MongoDB. It handles some HTTP requests, but it largely uses WebSockets. Basically, the server connects multiple users to rooms with WebSockets.
My server currently has around 12k WebSockets open, which is almost crippling my single-threaded server, and now I'm not sure how to convert it to use multiple processes.
The server holds HashMap variables for the connected users and rooms. When a user performs an action, the server often references those HashMap variables, so I'm not sure how to use clusters here. I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users.
Does anyone have any ideas on what to do?
Thank you.
You can look at the socket.io-redis adapter (now published as @socket.io/redis-adapter) for architectural ideas, or you can just decide to use socket.io with the Redis adapter.
That design moves shared state out of the individual processes and into Redis, a separate in-memory database process, so all clustered processes can access it; the equivalent of your hashmaps can live there too.
The Redis adapter also supports higher-level functions, so you can emit to every socket in a room with one call: the adapter relays the message through Redis to every server in the cluster, and each server sends it to its own sockets in that room.
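A simplified model of the broadcast relay described above, assuming an in-process EventEmitter as a stand-in for Redis pub/sub so the example runs without a Redis server. The createNode, join, leave and broadcast names are hypothetical and are not the adapter's API. Each node tracks only its own sockets; a broadcast goes to every node through the bus, and each node delivers it to its local members of the room.

```javascript
const { EventEmitter } = require("events");

// `bus` stands in for Redis pub/sub (an assumption, not the real adapter).
// Every node subscribes to it, and each node knows only its own sockets.
function createNode(name, bus) {
  const rooms = new Map(); // room -> Set of local socket ids
  const delivered = []; // what this node sent, recorded for illustration

  bus.on("broadcast", ({ room, message }) => {
    for (const socketId of rooms.get(room) ?? []) {
      // Real code would call something like socket.emit(...) here.
      delivered.push({ node: name, socketId, message });
    }
  });

  return {
    join(socketId, room) {
      if (!rooms.has(room)) rooms.set(room, new Set());
      rooms.get(room).add(socketId);
    },
    leave(socketId, room) {
      rooms.get(room)?.delete(socketId);
    },
    // Any node can broadcast; the bus fans the message out to all nodes.
    broadcast(room, message) {
      bus.emit("broadcast", { room, message });
    },
    delivered,
  };
}
```

The design choice this mirrors is that no process needs the full membership list: room membership stays local, and only the broadcast itself crosses process boundaries.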
I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users.
Threads in node.js are not lightweight (each worker thread has its own V8 isolate and event loop), so you will not want a thread for every WebSocket connection. You could group a certain number of WebSocket connections on each worker thread, but at that point it is likely easier to use clustering, because Node.js distributes connections across the cluster workers for you automatically, whereas you would have to do that yourself for your own worker-thread pool.

Multi threading nodeJS and Socket IO

Okay, so multithreading Node.js isn't much of a problem from what I've been reading: just deploy several identical apps and use nginx as a reverse proxy and load balancer for all of them.
But actually, I found that the native cluster module works pretty well too.
However, what if I have Socket.IO in the Node.js app? I tried the same strategy with Node.js + Socket.IO, but it did not work, because the socket events were distributed more or less evenly across the instances, and instances other than the one that made the connection had no idea where a request came from.
So the best method I can think of right now is to separate the Node.js server and the Socket.IO server: scale the Node.js server horizontally (multiple identical apps) but have just one Socket.IO server. Although I believe this would be enough for our purposes, I still need to plan for the future. Has anyone succeeded in horizontally scaling Socket.IO, i.e. across multiple processes?
The guidelines on the socket.io website use Redis with a package called socket.io-redis:
https://socket.io/docs/using-multiple-nodes/
It looks like it just acts as a single pool for the connections: each Node instance connects to Redis, which relays events between the instances.
Putting your socket server in a separate service (a micro-service) is probably fine; the downside is that you need to manage communication between the two services.
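The symptom in the question (events landing on processes that don't know the client) matches what the Socket.IO multiple-nodes documentation warns about: with the HTTP long-polling transport, one session consists of several HTTP requests, and they must all reach the process that created the session, which requires sticky sessions at the load balancer (nginx offers `ip_hash` for this). A minimal sketch of the idea, assuming routing by client IP: hash the address with FNV-1a and pick a backend, so the same client always lands on the same process. pickBackend and the backend list are hypothetical; the hash runs over UTF-16 code units, which is fine for ASCII IP strings.

```javascript
// 32-bit FNV-1a hash of a string.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned 32-bit
  }
  return hash >>> 0;
}

// Sticky routing: the same client address always maps to the same backend,
// so every request of a long-polling session reaches the process holding it.
function pickBackend(clientIp, backends) {
  if (backends.length === 0) throw new Error("no backends available");
  return backends[fnv1a(clientIp) % backends.length];
}
```

Two caveats: all clients behind one NAT address land on the same backend, and changing the number of backends remaps most clients. The Socket.IO documentation also describes disabling long-polling on the client (`io(url, { transports: ["websocket"] })`) to avoid the multi-request handshake; the Redis adapter is still needed for broadcasts across nodes either way.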

Does node.js create an instance of a server for each client?

Does node.js create an instance of a node.js server for each client, or is there only one instance of the node.js server for all clients, with unique instances created only for the paths of each client?
Node.js doesn't create a new server instance for each client, and neither do the other options out there.
You're probably thinking of multithreading: traditional multithreaded web servers create a new thread for each client request. However, since node.js runs JavaScript, which is single-threaded, the answer is no: every client request is handled by the same single thread.
That is why Node.js and JavaScript are often associated with the word blocking: if you write code that takes a long time to complete, it blocks all the other users from being served. You don't, however, have to worry about blocking when performing I/O, since Node.js I/O is asynchronous, meaning that client requests won't block each other during I/O operations such as network requests or disk reads.
To read more on Node.js being single threaded, see this S/O answer: Why is Node.js single threaded?
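A minimal sketch of the blocking behaviour described above, assuming a plain http server (the routes `/cpu` and `/io` and the helper names are hypothetical): the `/cpu` route runs a deliberately slow recursive Fibonacci on the single JavaScript thread, so every other request waits until it finishes, while the `/io` route reads a file asynchronously and leaves the thread free to serve other clients in the meantime.

```javascript
const http = require("http");
const fs = require("fs");

// Deliberately CPU-bound: runs entirely on the single JavaScript thread.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

function createDemoServer() {
  return http.createServer((req, res) => {
    if (req.url === "/cpu") {
      // While this runs, no other request on this process is served.
      res.end(`fib(40) = ${fib(40)}\n`);
    } else if (req.url === "/io") {
      // Async I/O: the thread serves other clients while the read is pending.
      fs.readFile(__filename, (err, data) => {
        if (err) {
          res.statusCode = 500;
          return res.end("read failed\n");
        }
        res.end(`read ${data.length} bytes\n`);
      });
    } else {
      res.end("ok\n");
    }
  });
}
```

Starting it with `createDemoServer().listen(3000)` and requesting `/cpu` in one terminal and `/` in another should show the second request stalling until the first completes, whereas `/io` requests interleave freely.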

What is the best way to communicate between two servers?

I am building a web app that has two parts. One part uses a real-time connection between the server and the client, and the other part performs some CPU-intensive tasks to provide relevant data.
I am implementing the real-time communication in Node.js and the CPU-intensive part in Python/Java. What is the best way for the Node.js server to participate in duplex communication with the other server?
For a basic solution you can use Socket.IO if you are already using it and know how it works. It will get the job done, since it allows communication between a client and a server where the client can be a different server written in a different language.
If you want a more robust solution with additional options and controls, or one that can handle higher traffic throughput (though this shouldn't be an issue if you are ultimately just sending it through the relatively slow internet), you can look at something like ØMQ (ZeroMQ). It is a messaging library that gives you more control and many communication patterns beyond request-response, such as publish-subscribe and push-pull.
When you set either one up, I would recommend using your CPU-intensive server as the stable end (the server) and your web server(s) as the clients. This assumes you are using a single server for your CPU-intensive tasks and running several Node.js server instances to take advantage of multiple cores for your web server. It simplifies your communication, since you want a single point to connect to.
If you foresee needing multiple CPU servers, you will want to set up a routing server that can route between multiple web servers and multiple CPU servers; in that case I would recommend putting in the extra work of learning ØMQ.
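The answer recommends Socket.IO or ØMQ; as a dependency-free illustration of the same topology (the CPU server as the stable end, each Node instance connecting to it as a client), here is a sketch that swaps in a plain TCP connection with newline-delimited JSON framing, one JSON document per line. The function names, host and port are hypothetical, and the Python/Java side would need to read and write the same line-based format. JSON.stringify escapes newlines inside strings, so a literal newline can only appear as a message separator.

```javascript
const net = require("net");

// Newline-delimited JSON framing: each message is one JSON document plus "\n".
// Returns a function that accepts string chunks (which may split or merge
// messages arbitrarily) and calls onMessage once per complete line.
function createLineDecoder(onMessage) {
  let buffered = "";
  return (chunk) => {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf("\n")) !== -1) {
      const line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      if (line.trim() !== "") onMessage(JSON.parse(line));
    }
  };
}

// The Node web server acts as the client; the CPU service is the stable end.
// host and port are placeholders for wherever that service listens.
function connectToCpuService(host, port, onResult) {
  const socket = net.connect(port, host);
  socket.setEncoding("utf8"); // decode UTF-8 safely across chunk boundaries
  socket.on("data", createLineDecoder(onResult));
  socket.on("error", (err) => console.error("cpu service:", err.message));
  return {
    submit(job) {
      socket.write(JSON.stringify(job) + "\n"); // queued until connected
    },
    close() {
      socket.end();
    },
  };
}
```

Because both directions use the same connection, the CPU service can push results whenever they are ready, which gives the duplex behaviour the question asks for.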
You can use the http.request method to make curl-like requests from within Node's code.
http.request is also commonly used to call APIs such as authentication services.
Put your callback in the request's success handler, and when you get the response data in Node, you can send it back to the user.
Meanwhile, in the background, the Java/Python server handles Node's request for the CPU-intensive task.
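A minimal sketch of the http.request approach, assuming the Python/Java service exposes an HTTP endpoint that accepts and returns JSON; postJson and the URL are hypothetical. Each call is a single request-response round trip rather than the duplex channel the question asks about.

```javascript
const http = require("http");

// POST a JSON job to the (hypothetical) CPU service and pass back its JSON reply.
function postJson(url, payload, callback) {
  const body = JSON.stringify(payload);
  const req = http.request(
    url,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
      },
      agent: false, // one-off connection; no keep-alive pool for this sketch
    },
    (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => {
        if (res.statusCode !== 200) return callback(new Error(`HTTP ${res.statusCode}`));
        let parsed;
        try {
          parsed = JSON.parse(data);
        } catch (err) {
          return callback(err); // service replied with something other than JSON
        }
        callback(null, parsed);
      });
    }
  );
  req.on("error", callback); // connection refused, reset, etc.
  req.end(body);
}
```

In a socket.io handler you might call `postJson("http://cpu-service:5000/jobs", job, (err, result) => socket.emit("result", err ? { error: err.message } : result))`, where the host, port and event name are placeholders.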
I maintain a node.js application that communicates among 34 tasks spread across 2 servers.
In your case, for communication between the web server and the app server, you might consider MQTT.
I use MQTT for this kind of communication. There are MQTT clients for most languages, including Node/JavaScript, Python and Java. In my case I publish JSON messages on MQTT 'topics', and any task that has subscribed to a topic receives its data when it is published. If you google "pub sub", "mqtt" and "mosquitto", you'll find lots of references and examples. Mosquitto (now an Eclipse project) is only one of a number of MQTT brokers that are available. Another very good broker, written in Java, is called HiveMQ.
This is a very simple, reliable solution that scales well. In my case literally millions of messages reliably pass through mqtt every day.
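With MQTT, the broker (Mosquitto, HiveMQ, etc.) decides which subscribers receive a message by matching the published topic against each subscription's topic filter. A minimal sketch of that matching rule as defined in the MQTT specification (`+` matches exactly one level, a trailing `#` matches the parent level and everything below it, and filters starting with a wildcard do not match topics starting with `$`), plus a tiny in-process dispatcher to show the pub/sub flow. This is only an illustration; in a real deployment you would use an MQTT client library and let the broker do the matching. createDispatcher is hypothetical.

```javascript
// Returns true if an MQTT topic filter matches a concrete topic name.
function topicMatches(filter, topic) {
  const f = filter.split("/");
  const t = topic.split("/");
  // Topics starting with "$" (e.g. $SYS/...) are not matched by a leading wildcard.
  if (topic.startsWith("$") && (f[0] === "+" || f[0] === "#")) return false;
  for (let i = 0; i < f.length; i++) {
    if (f[i] === "#") return true; // multi-level: the rest of the topic, or none
    if (i >= t.length) return false; // filter has more levels than the topic
    if (f[i] !== "+" && f[i] !== t[i]) return false; // "+" matches any one level
  }
  return f.length === t.length; // no leftover topic levels
}

// Hypothetical in-process stand-in for a broker, carrying JSON payloads.
function createDispatcher() {
  const subscriptions = [];
  return {
    subscribe(filter, handler) {
      subscriptions.push({ filter, handler });
    },
    publish(topic, message) {
      const payload = JSON.stringify(message); // what would travel over the wire
      for (const { filter, handler } of subscriptions) {
        if (topicMatches(filter, topic)) handler(topic, JSON.parse(payload));
      }
    },
  };
}
```

The publisher never needs to know who is listening, which is why adding another task is just a new subscription.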
You are probably looking for Socket.IO:
Socket.IO enables real-time bidirectional event-based communication.
It works on every platform, browser or device, focusing equally on reliability and speed.
Sockets have traditionally been the solution around which most realtime systems are architected, providing a bi-directional communication channel between a client and a server.

Forking a new process when Node.JS receives a connection

I'm running a Node.JS application involving heavy child process I/O. Due to the way Node.JS handles file descriptors (among other reasons), I want to fork a new V8 instance for every connection to the server. (Yes, I'm aware that this is a potentially expensive operation, but that's not the point of this question.)
I am using nssocket for my server, but this question should apply to other types of Node.JS servers (express, Socket.IO, etc) as well.
Right now I have:
var server = require("nssocket").createServer(function(socket){
  // Do stuff with the new connection
}).listen(8000);
The intuitive thing to do is this:
// master.js
var child_process = require("child_process");
var server = require("nssocket").createServer(function(socket){
  // Fork a new process to handle the connection
  child_process.fork("worker.js");
}).listen(8000);
// worker.js
// Do stuff with the new connection
However, then the child process won't have access to the socket variable.
I've read about the new cluster API in Node, but it doesn't look like it's designed for the case when you want every connection to spawn a new worker.
Any ideas?
The cluster API is probably closest to what you want. In theory, you can call cluster.fork() at any time within the master process. Note that an established connection can still be handed over to another process: send() on a child process (or on a cluster worker) accepts a net.Socket as its second, sendHandle, argument, so you need access to the underlying net.Socket rather than a wrapper object.
To forward the communication to the worker, you could use message passing (e.g. worker.send), or you could open another port in the worker process and direct the client there.
I should stress that running significantly more worker processes than CPU cores is probably not a good idea. Have you considered pooling the workers or using a work queue like Beanstalkd?
You can use the cluster module to fork workers, then use the IPC (inter-process communication) channel plus a message queue to pass objects between the master process and the workers. A good option would be ZMQ.
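A minimal sketch of handing an accepted connection to a freshly forked process, following the pattern in the Node.js child_process documentation: the master passes the socket as the second (sendHandle) argument of send(). It uses the built-in net module rather than nssocket, and startMaster, runWorker and the "socket" message string are hypothetical. As noted above, one process per connection is expensive, so this only suits low connection counts.

```javascript
const net = require("net");
const { fork } = require("child_process");

// Master: accept each TCP connection, fork a dedicated child, and hand the
// raw socket handle to it over the IPC channel.
function startMaster(workerPath, port) {
  // pauseOnConnect keeps the master from reading data meant for the child.
  const server = net.createServer({ pauseOnConnect: true }, (socket) => {
    const child = fork(workerPath);
    child.on("error", () => socket.destroy());
    // By default (keepOpen: false) the master closes its copy once it is sent.
    child.send("socket", socket);
  });
  server.listen(port);
  return server;
}

// Worker: receive the handle and talk to the client directly.
function runWorker() {
  process.on("message", (msg, socket) => {
    if (msg !== "socket") return;
    // The client may have disconnected before the handle arrived.
    if (!socket) return process.exit(0);
    socket.write(`handled by pid ${process.pid}\n`);
    socket.on("data", (chunk) => socket.write(chunk)); // echo, as a placeholder
    socket.on("error", () => socket.destroy());
    socket.on("close", () => process.exit(0)); // one process per connection
  });
}
```

Calling `startMaster(require("path").join(__dirname, "worker.js"), 8000)` in master.js and `runWorker()` in worker.js reproduces the layout from the question, with the socket now available in the worker.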

Resources