Dedicated workers library - node.js

I want to create a pool of N workers, and pass work to them over time.
I checked out workerpool and workerfarm but, from what I see, these create the worker when work is posted, i.e. time is lost initializing the worker.
I could just use the child_process module to do this myself, but would then have to set up fair distribution too. Is there an already existing module that does this type of stuff? (create N workers at start time and post work to them)

If you use the built in cluster module, you can spin up N workers and pass work between them through messages.
https://gist.github.com/jpoehls/2232358

Related

How does PM2 manage the cluster when it runs the application in cluster mode?

I found that PM2 uses the Node cluster module to run the application in a cluster.
The cluster module follows two approaches to handling the cluster, and uses the round-robin approach by default.
Documentation for how the Node cluster module works
That document mentions: The cluster module supports two methods of distributing incoming connections.
The first one (and the default one on all platforms except Windows) is the round-robin approach, where the primary process listens on a port, accepts new connections and distributes them across the workers in a round-robin fashion, with some built-in smarts to avoid overloading a worker process.
Since the default approach is round-robin, my assumption is that in PM2, if the first request goes to the first instance, the second request should go to the next instance, i.e. instance 2.
But what I found is that all requests go to one instance for some time, after which all requests go to the next instance. So my question is: what is that time duration (how long do all requests keep going to the same instance), and how is that duration managed by PM2?

how to run multiple instances without duplicate job in nodejs

I have a problem when I scale the project (NestJS) to multiple instances. In my project, I have a crawler service that runs every 10 minutes. When 2 instances are running, the crawler runs on both instances, so the data gets duplicated. Does anyone know how to handle this?
It looks like it could be handled using a queue, but I don't have a solution yet.
Jobs aren't the right construct in this case.
Instead, use a job Queue: https://docs.nestjs.com/techniques/queues
You won't even need to set up a separate worker server to handle the jobs. Instead, add Redis (or similar) to your setup, configure a queue to use it, then set up 1) a producer module to add jobs to the queue whenever they need to be run, 2) a consumer module which will pull jobs off the queue and process them. Add logic into the producer module to ensure that duplicate jobs aren't being created, if that logic is running on both your machines.
Alternatively, it may be easier to just separate job production/processing into a separate server.

How do I get a list of worker/process IDs inside a strongloop cluster?

It looks like each process in a Strongloop cluster is considered a worker, and therefore if you use a tool like node-scheduler that schedules jobs and you have multiple workers, the job is executed multiple times.
Ideally I'd be able to do something like:
var cluster = require('cluster');
if (cluster.isMaster) {
  // execute code
}
Since this doesn't seem to be possible, I wonder if there is a way to get a list of all worker or process IDs from inside the node app so that I can do this same sort of thing with one worker? This will need to be something dynamic, as cluster.worker.id does not appear to be a reliable way to do this since the worker IDs are unpredictable.
Ideas?
"strongloop cluster" isn't a thing; it's a Node cluster: https://nodejs.org/dist/latest-v6.x/docs/api/cluster.html
No such API exists, and it wouldn't help you: you'd need to implement some kind of consensus algorithm to choose one of a dynamic set of workers (workers can die and get replaced/restarted) as the "singleton".
Compose your system as microservices, if you need a singleton task runner, make it a service, and run it with a cluster size of 1.
This isn't really a cluster problem; it's an inability-to-scale problem, isn't it? Cluster does internal scaling, and you can limit the scheduling service to one worker... but when you scale across multiple VMs (multiple Heroku dynos, multiple Docker containers, etc.) this will still fall apart. Which process will be the source of the timed node-schedule jobs?

Node/Express: running specific CPU-instensive tasks in the background

I have a site that makes the standard data-bound calls, but also has a few CPU-intensive tasks which are run a few times per day, mainly by the admin.
These tasks include grabbing data from the db, running a few different time-consuming algorithms, then re-uploading the data. What would be the best method for making these calls and having them run without blocking the event loop?
I definitely want to keep the calculations on the server so web workers wouldn't work here. Would a child process be enough here? Or should I have a separate thread running in the background handling all /api/admin calls?
The basic answer to this scenario in Node.js land is to use the core cluster module - https://nodejs.org/docs/latest/api/cluster.html
With this API you can:
- easily launch worker node.js instances on the same machine (each instance will have its own event loop)
- keep a live communication channel for short messages between instances
This way, any work done in a child instance will not block your master event loop.

How to test master behaviour in a Node.JS cluster?

Suppose you are running a cluster in Node.JS and you wish to unit-test it. For instance, you'd like to make sure that if a worker dies the cluster takes some action, such as forking another worker and possibly some related job. Or that, under certain conditions, additional workers are spawned.
I suppose that in order to do this one must launch the cluster and somehow have access to its internal state; then (for instance) force workers to get stuck, and check the state after a delay. If so, how do I export the state?
You'll have to architect your master to return a reference to its cluster object. In your tests, you can kill one of its workers with cluster.workers[2].kill(). The worker object also has a reference to the child's process object, which you can use to simulate various conditions. You may have to use a setTimeout to ensure the master has the time to do its thing.
The above method, however, still creates forks, which may be undesirable in a testing scenario. Your other option is to use a mocking library (SinonJS et al.) to mock out cluster's fork method, and then spy on the number of calls it gets. You can simulate worker death by calling cluster.emit('exit') on the master's cluster object.
Note: I'm not sure if this is an issue only with me, but cluster.emit always seems to emit twice for me, for some reason.
