Long-running computations in node.js

Long-running computations in node.js - multithreading

I'm writing a game server in node.js, and some operations involve heavy computation on part of the server. I don't want to stop accepting connections while I run those computations -- how can I run them in the background when node.js doesn't support threads?

I can't vouch for either of these, personally, but if you're hell-bent on doing the work in-process, there have been a couple of independent implementations of the WebWorkers API for node, as listed on the node modules page:
http://github.com/cramforce/node-worker
http://github.com/pgriess/node-webworker
At first glance, the second looks more mature. Both would essentially let you do threaded programming, but the model is basically actor-style: everything is done with message passing, and you can't have shared data structures or anything like that.
Also, for what it's worth, the node.js team intends to implement precisely this API natively, eventually, so these tools, even if they're not perfect, may be a decent stopgap.
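For readers on a newer Node release: a built-in worker_threads module now exists and follows the same message-passing idea. The sketch below assumes a Node version that ships it; runInWorker and the fib workload are hypothetical names used only for illustration, and the worker code is passed inline so the example fits in one file.

```javascript
const { Worker } = require('worker_threads');

// Code that runs inside the worker thread. It shares no data structures
// with the main thread; input and output travel as messages.
const workerSource = `
  const { parentPort } = require('worker_threads');
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  parentPort.on('message', (n) => parentPort.postMessage(fib(n)));
`;

// Hypothetical helper: run one heavy computation off the main thread.
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', (result) => {
      resolve(result);
      worker.terminate(); // otherwise the idle worker keeps the process alive
    });
    worker.once('error', reject);
    worker.postMessage(n);
  });
}

// The main thread stays free to accept connections while this runs.
runInWorker(30).then((result) => console.log('fib(30) =', result));
```

Starting a worker per job has a cost, so a real server would more likely keep a few workers around and reuse them.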

var spawn = require('child_process').spawn;
var listorwhatev = spawn('ls', ['-lh', '/usr']); // or whatever server action you need
// then you can attach events to that child process like this
listorwhatev.on('exit', function (code) {});
// or, in this ls example, handle the output as it streams in
listorwhatev.stdout.on('data', function (info) { console.log(info.toString()); });
Make sure the process is spawned once per application, then feed work into it and watch it for events per connection.
You should also check that listorwhatev is still running before using it, since we all love those uncaught errors crashing the app in node, don't we ;)
If the spawned process (pid) exits through a kill, or something bad happens on your machine, and you didn't shut the process down gracefully in your code, your stream event handler will crash your app.
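A minimal sketch of those precautions, under some assumptions: the child here is a stand-in started with node -e that upper-cases whatever it receives, and sendToChild is a hypothetical helper, not part of any library. The point is the single spawn, the liveness check, and the 'error' listeners.

```javascript
const { spawn } = require('child_process');

// Spawn the helper once per application, not once per connection.
// The inline script is a stand-in: it upper-cases each chunk it receives.
const child = spawn(process.execPath, ['-e', `
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => process.stdout.write(chunk.toUpperCase()));
`]);

let alive = true;
child.on('exit', () => { alive = false; });
// Without 'error' listeners, a failed spawn or a write to a dead child's
// stdin surfaces as an uncaught exception and takes the app down.
child.on('error', (err) => { alive = false; console.error('child failed:', err.message); });
child.stdin.on('error', (err) => console.error('stdin error:', err.message));

// Hypothetical per-connection helper: only write if the child is still running.
function sendToChild(text) {
  if (!alive || !child.stdin.writable) return false;
  child.stdin.write(text + '\n');
  return true;
}

child.stdout.once('data', (data) => {
  console.log(data.toString().trim());
  child.stdin.end(); // lets the child exit so this demo terminates
});
sendToChild('hello');
```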

"and some operations involve heavy computation on part of the server"
How did you write code that is computation-heavy in the first place? That's very hard to do in node.js.
"how can I run them in the background when node.js doesn't support threads"
You could spawn a couple of worker (node) instances and have them communicate with the node instance that accepts connections using, for example, a Redis blocking pop. The Node.js Redis library is non-blocking.

Related

When to use synchronous - blocking code in Node.js

I was asked in an interview: are there any cases that may force you to use blocking code in a node.js server?
My answer was: I have never needed that in any project, but I think it may be useful for tasks that need a lot of CPU processing, like image processing or video generation.
So, experts, can you correct that for me? Is there any case where blocking code would be a must?

First off, you have to distinguish between the different types of programs. A server that you expect to be responsive to many different incoming requests has very different needs than a single user program you write to do some file management or fetch some content and insert it in a database.
So, if you're not a multi-user server, you may be able to use synchronous I/O everywhere it's offered (most specifically for file access). For example, I have several scripts that do file management on my hard disk. These scripts don't have any server component and are run automatically in the middle of the night to trim backups, trim log files, etc... These scripts are perfectly OK to use synchronous I/O for pretty much anything.
If, on the other hand, you are a multi-user server and you need to be responsive to incoming requests that can arrive at any time, then the only two times you can/should use blocking I/O or blocking crypto are at startup time or in some sort of shut-down scenario. For all other code in service of incoming requests, you have to use non-blocking, asynchronous I/O to avoid locking up your server during a request and making it non-responsive to new incoming requests.
If you have time consuming, CPU-intensive operations such as image processing or video generation, then you will want to offload that processing to another thread or process so that your main server thread is not blocked doing that processing. A typical way of handling that would be to create a worker pool of N processes/threads that can be sent jobs to crunch on. Then, you keep your most CPU-intensive work out of the main nodejs thread, allowing it to stay responsive to incoming requests.
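One way such a pool of N workers could look, as a rough sketch only: it assumes the built-in worker_threads module is available, uses a slow summing loop as a stand-in for image or video processing, and leaves out error handling and worker restarts. WorkerPool and its methods are hypothetical names.

```javascript
const { Worker } = require('worker_threads');

// Stand-in for image or video processing: a deliberately slow sum of 1..n.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', ({ id, n }) => {
    let sum = 0;
    for (let i = 1; i <= n; i++) sum += i;
    parentPort.postMessage({ id, result: sum });
  });
`;

class WorkerPool {
  constructor(size) {
    this.workers = [];
    this.idle = [];           // workers with nothing to do
    this.queue = [];          // jobs waiting for a free worker
    this.pending = new Map(); // job id -> resolve function
    this.nextId = 0;
    for (let i = 0; i < size; i++) {
      const worker = new Worker(workerSource, { eval: true });
      worker.on('message', ({ id, result }) => {
        this.pending.get(id)(result);
        this.pending.delete(id);
        this.idle.push(worker); // worker is free again
        this.dispatch();
      });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queue a job; the returned promise resolves with the worker's result.
  run(n) {
    return new Promise((resolve) => {
      const id = this.nextId++;
      this.pending.set(id, resolve);
      this.queue.push({ id, n });
      this.dispatch();
    });
  }

  // Hand queued jobs to idle workers until one of the two runs out.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      this.idle.pop().postMessage(this.queue.shift());
    }
  }

  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}

const pool = new WorkerPool(2);
Promise.all([pool.run(1e6), pool.run(2e6), pool.run(3e6)]).then((results) => {
  console.log(results);
  return pool.close();
});
```

Three jobs are submitted to two workers here, so the third waits in the queue until a worker becomes idle, while the main thread remains free throughout.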
"so experts, can you correct that for me, is there any case that a blocking code would be a must?"
Synchronous (blocking) I/O vastly simplifies server startup, as you can do things like read configuration files synchronously. You could write that code asynchronously, but then your module interface often ends up having to return promises that indicate when it is actually ready and done with its initialization, which complicates using the module.
For example, require() is synchronous and this really, really helps make initialization a lot simpler.
The only place I know of in a server where blocking code might be required is if you're trying to write something to disk right before your program exits when it's already in the process of exiting. You get notified of an exit event and if you try to use asynchronous file I/O, then your program will exit before the I/O finishes. In that case, you may need to use synchronous file I/O (which is not a problem in that circumstance).
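A small sketch of that exit-time case. The file location and the shape of the state object are made up for illustration; the relevant part is that the 'exit' handler uses fs.writeFileSync, because asynchronous work scheduled inside an 'exit' handler never gets to run.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical location and contents for a final state snapshot.
const stateFile = path.join(os.tmpdir(), 'server-state-example.json');
const state = { startedAt: Date.now(), requestsServed: 0 };

function saveStateSync() {
  // Blocking is acceptable here: the process is about to exit anyway.
  fs.writeFileSync(stateFile, JSON.stringify(state));
}

// Only synchronous work completes inside an 'exit' handler.
process.on('exit', saveStateSync);
```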

Should I spawn a new node process per game room?

I am making a card game on node.js and I am thinking about spawning a new process per game room. I plan on creating the processes using fork(). After quite some research, I found that this isn't the best approach, because I should have one node process per physical core. But isn't my approach better for scalability and modularity? If, let's say, a game room crashes, it wouldn't crash the rest of them. Can someone help me analyse the situation a bit better? I plan on running the game on AWS EC2 instances and expect a maximum of 1500 concurrent users, playing in rooms of 4 people and communicating with socket.io messages.

A single Node instance can handle that type of load, as one of NodeJS's strong points is real-time communication with many concurrent connections.
Regarding crashes, you need to plan for those. Some initial tips:
Catch errors and log error messages so that your Node instance does not completely fail. Often errors may stop that particular function chain from finishing correctly, but will not kill your process.
Persist your game state to another service, like a database, so things (like connections) can recover. Use Case example: "User loses connection and logs back in, they are re-connected to the room and can see the game in its current state"
You can auto-recover your Node process by running something like forever or PM2 (there are others). These will monitor and restart on process failure (though that shouldn't happen too often).

Node/Express: running specific CPU-intensive tasks in the background

I have a site that makes the standard data-bound calls, but it also has a few CPU-intensive tasks which are run a few times per day, mainly by the admin.
These tasks include grabbing data from the db, running a few time-consuming different algorithms, then reuploading the data. What would be the best method for making these calls and having them run without blocking the event loop?
I definitely want to keep the calculations on the server so web workers wouldn't work here. Would a child process be enough here? Or should I have a separate thread running in the background handling all /api/admin calls?

The basic answer to this scenario in Node.js land is to use the core cluster module - https://nodejs.org/docs/latest/api/cluster.html
It is an acceptable API to:
- easily launch worker node.js instances on the same machine (each instance will have its own event loop)
- keep a live communication channel for short messages between instances
This way, any work done in the child instance will not block your master event loop.

What's the limit of spawning child_processes?

I have to serve the result of a calculation done by an algorithm, and I've been advised to use one child process per open socket. What I am about to do is something like this:
var spawn = require('child_process').spawn;
var child = spawn('node', ['algorithem.js']);
I know how to send arguments to the algorithm process and how to receive results.
What I am concerned about is how many sockets I can have (each socket will spawn a process).
How can I resolve this with my cloud hosting provider, so that my app gets auto-scaled?
What's the recommended node js cloud hosting provider?
Finally, is using child processes like this a good approach?

Yes, this is a fair approach when you have to do some heavy processing in node. However, be aware that starting a new process introduces some overhead. The number of sockets (file descriptors) you can open is limited by your operating system. On Linux, the limits can be seen using, for example, the ulimit utility.
One alternative approach, that would remove the number of sockets/processes worry, is to run a separate algorithm/computation-server. This server could spawn N worker threads and would listen on a socket. When a computation request is received, this can for example be queued and processed by the first available thread. An advantage of this approach is that your computation server can run on any machine, freeing up resources for your node instance.

How node.js works?

I don't understand several things about nodejs. Every information source says that node.js is more scalable than standard threaded web servers due to the lack of thread locking and context switching, but I wonder: if node.js doesn't use threads, how does it handle concurrent requests in parallel? What does the event I/O model mean?
Your help is much appreciated.
Thanks

Node is completely event-driven. Basically the server consists of one thread processing one event after another.
A new request coming in is one kind of event. The server starts processing it and when there is a blocking IO operation, it does not wait until it completes and instead registers a callback function. The server then immediately starts to process another event (maybe another request). When the IO operation is finished, that is another kind of event, and the server will process it (i.e. continue working on the request) by executing the callback as soon as it has time.
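The sequence described above can be sketched in a few lines. This is an illustration only: handleRequest is a hypothetical handler, and setTimeout stands in for a slow I/O operation such as a database query.

```javascript
const log = [];

// Hypothetical request handler; setTimeout stands in for a slow I/O call.
function handleRequest(id) {
  log.push(`request ${id}: started`);
  setTimeout(() => {
    // Runs later, as a separate event, once the "I/O" has completed.
    log.push(`request ${id}: I/O finished, sending response`);
  }, 10);
  // Returns immediately instead of waiting for the I/O.
}

handleRequest(1);
handleRequest(2); // starts before request 1 has finished its I/O
log.push('main script done');

// Print the recorded order once both callbacks have had time to run.
setTimeout(() => console.log(log.join('\n')), 50);
```

Both requests are started before either callback runs, which is the single thread moving on to the next event instead of waiting.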
So the server never needs to create additional threads or switch between threads, which means it has very little overhead. If you want to make full use of multiple hardware cores, you just start multiple instances of node.js.
Update
At the lowest level (C++ code, not Javascript), there actually are multiple threads in node.js: there is a pool of IO workers whose job it is to receive the IO interrupts and put the corresponding events into the queue to be processed by the main thread. This prevents the main thread from being interrupted.

Although this question was answered a long time ago, I'm adding my thoughts on it.
Node.js is a single-threaded JavaScript runtime environment. Basically, the concern of its creator, Ryan Dahl, was that parallel processing using multiple threads is not the right way, or is too complicated.
"if Node.js doesn't use threads how does it handle concurrent requests in parallel"
Ans: It's completely wrong to say that it doesn't use threads. Node.js uses threads, but in a smart way. It uses a single thread to serve all the HTTP requests, and multiple threads in a thread pool (in libuv) to handle blocking operations.
Libuv: a library to handle asynchronous I/O.
"What does event I/O model means?"
Ans: The right term is non-blocking I/O. It almost never blocks, as the Node.js official site says. When a request reaches the node server, the server does not queue it; it takes the request and starts executing it. If the request involves a blocking operation, that operation is sent to the worker threads and a callback is registered for it. As soon as the operation finishes, the callback goes to the event queue and is processed by the event loop, which then creates the response and sends it to the respective client.
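To make the thread pool visible, here is a small sketch using the asynchronous form of crypto.pbkdf2, which Node runs on libuv's thread pool. The salt, iteration count, and the hashPassword helper are arbitrary choices for illustration.

```javascript
const crypto = require('crypto');

const events = [];

// Hypothetical helper. The key derivation is CPU-heavy, but the
// asynchronous form runs on a libuv pool thread, not the main thread.
function hashPassword(password, done) {
  crypto.pbkdf2(password, 'example-salt', 100000, 32, 'sha256', (err, key) => {
    if (err) throw err;
    events.push(`hashed ${password}`);
    done(key.toString('hex'));
  });
}

hashPassword('first', (hex) => console.log('first :', hex));
hashPassword('second', (hex) => console.log('second:', hex));

// Recorded before either hash completes: the main thread was never blocked.
events.push('event loop is free');
```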

Node JS is a JavaScript runtime environment. Both the Chrome browser and Node JS run on the V8 JavaScript engine. Node JS uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. Node JS applications use a single-threaded event loop architecture to handle concurrent clients. Strictly speaking, only the main event loop is single-threaded; much of the I/O work happens off the main thread, because the I/O APIs in Node JS are asynchronous/non-blocking by design so that they do not stall the event loop.
Consider a scenario where we ask a backend database for the details of user1 and user2 and then print them on the screen/console. The response to each request takes time, but both requests can be carried out independently and at the same time.
When 100 people connect at once, rather than using a different thread for each, Node loops over those connections and fires off any events your code should know about. If a connection is new, it tells you. If a connection has sent you data, it tells you. If a connection isn't doing anything, Node skips over it rather than spending precious CPU time on it. Everything in Node is based on responding to these events. As a result, the CPU stays focused on that one process and doesn't have a bunch of threads competing for its attention. Node JS applications also do not buffer data; they simply output it in chunks.

Though it's been answered, I would like to share my understanding in simple terms.
Nodejs uses a library called Libuv, which is written in C and uses threads. These threads are called workers, and they help take care of multiple requests from clients.
Parallel processing in nodejs is achieved with the help of 2 concepts:
Asynchronous
Non-blocking IO
