If my understanding is correct, processing background tasks is a good way to free the main thread from CPU-bound work.
What I don't get is what systems like bull or kue use to run the tasks off the main thread.
Do they use threads? Do they fork an entire Node process? Do they spawn child processes?
Bull is backed by Redis, which handles the queuing and storage of job data in its own process.
It is a lightweight, robust and fast job processing queue. It uses Redis for persistence, so the queue is not lost if the server goes down for any reason.
The internal implementation of the Job can be seen here.
The same holds for the Kue module, which is a priority job queue backed by Redis, built for Node.js. The background tasks are powered by Redis.
This means that these modules depend on an external Redis process, which is what enables creating background jobs.
Job-specific events are fired on the Job instances via Redis pub/sub:

- `enqueue`: the job is now queued
- `promotion`: the job is promoted from the delayed state to queued
- `progress`: the job's progress, ranging from 0-100
- `failed attempt`: the job has failed, but has remaining attempts
- `failed`: the job has failed and has no remaining attempts
- `complete`: the job has completed
- `remove`: the job has been removed
Delayed jobs are also powered by the Redis queue, which notifies the module and triggers the callbacks.
That's not how Node.js works. Node.js internally uses an event loop to handle requests (which is what makes it an event-driven framework).
This entire event loop runs in a single thread. When you execute a long-running operation (such as I/O or a network call), the request is enqueued on the loop and the process doesn't block. When the operation completes, it triggers a callback in your code.
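A minimal, runnable illustration of that ordering, with setImmediate standing in for an I/O completion:

```javascript
const order = [];

setImmediate(() => {               // stands in for an I/O completion callback
  order.push('callback');
  console.log(order.join(', '));   // prints "sync work, callback"
});

order.push('sync work');           // the synchronous code runs to completion first
```

The synchronous code always finishes its current turn of the event loop before any queued callback fires.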
Related
I'm implementing a web server using Node.js which must serve a lot of concurrent requests. As Node.js processes the requests one by one, it keeps them in an internal queue (in libuv, I guess).
I also want to run my web server using the cluster module, so there will be one request queue per worker.
Questions:
1) If any worker dies, how can I retrieve its queued requests?
2) How can I put the retrieved requests into other workers' queues?
3) Is there any API to access alive workers' request queues?
By No. 3 I mean I want to keep queued requests somewhere such as Redis (if possible), so that in case of a server crash, failure or even a hardware restart I can retrieve them.
As you mentioned in the tags that you are already using (or want to use) Redis, you can use a Redis-based queue manager to do all the work for you.
Check out https://github.com/OptimalBits/bull (or its alternatives).
bull has a concept of a queue. You add jobs to the queue and listen to the same queue from different processes/VMs. bull will send each job to only one listener, and you can control how many jobs each listener processes at the same time (the concurrency level).
In addition, if one of the jobs fails to run (in other words, the listener of the queue threw an error), bull will try to give the same job to a different listener.
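The queue/listener/retry model can be sketched in-process with a tiny stand-in class. This is illustrative only and is not bull's API: real bull persists jobs in Redis, runs handlers asynchronously, and distributes them across processes.

```javascript
// Minimal in-process sketch of the model: jobs wait in a queue, a handler
// processes them, and a job whose handler throws is re-queued until its
// attempts are exhausted.
class MiniQueue {
  constructor({ maxAttempts = 2 } = {}) {
    this.maxAttempts = maxAttempts;
    this.waiting = [];
    this.completed = [];
    this.failed = [];
  }
  add(data) {
    this.waiting.push({ data, attempts: 0 });
  }
  // Synchronous drain for determinism; real bull runs handlers
  // asynchronously against Redis, possibly in other processes.
  process(handler) {
    while (this.waiting.length > 0) {
      const job = this.waiting.shift();
      job.attempts += 1;
      try {
        handler(job.data);
        this.completed.push(job.data);
      } catch (err) {
        if (job.attempts < this.maxAttempts) this.waiting.push(job); // retried (by another listener in real bull)
        else this.failed.push(job.data);
      }
    }
  }
}

const q = new MiniQueue({ maxAttempts: 2 });
[1, 2, 3].forEach((n) => q.add(n));

let firstTry = true;
q.process((n) => {
  if (n === 2 && firstTry) { firstTry = false; throw new Error('transient failure'); }
});
console.log(q.completed); // [ 1, 3, 2 ]: job 2 succeeded on its retry
```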
I can see that Node.js is bringing in multi-threading support via its worker threads module. My current assumption (I have not yet explored it personally) is that I can offload a long-running / CPU-intensive operation to these worker threads.
I want to understand the behaviour when this long-running piece of code has some intermittent event callbacks or a chain of promises. Do these callbacks still execute on the worker threads, or do they get passed back to the main thread?
If these promises come back to the main thread, the advantage of using a worker thread may be lost.
Can someone clarify?
Update: some context for the question
I have an HTTP request that initiates some background processing and returns a 202 status. After receiving such a request, I start the background processing via
setTimeout(function() { // performs long-running file read operations.. })
and immediately return a 202 to the caller.
However, I have observed that while this background operation is running, other HTTP requests are either not being processed, or are very sluggish at best.
My hypothesis is that this continuous I/O processing of a million-plus lines fills up the event loop with callbacks/promises to the point that the main thread is unable to process other pending I/O tasks, such as accepting new requests.
I have explored the nodejs cluster option and this works well, as the long task is delegated to one of the child processes, and other instances of cluster are available to take up additional requests.
But I was thinking that worker threads might solve the same problem, without the overhead of cloning the process.
I assume each worker thread would have its own event loop.
So if you emit an event in a worker thread, only that thread would receive it and trigger the callback. The same for promises, if you create a promise within a worker, it will only be resolved by that worker.
This is supported by their statement in the documentation regarding Class: Worker: Most Node.js APIs are available inside of it (with some exceptions that are not related to event processing).
However they mention this earlier in the docs:
Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.
I think some small scale async code in worker threads would be fine, but having more callbacks/promises would hurt performance. Some benchmarks could shed some light on this.
I want to implement something akin to work stealing or task migration in multiprocessor systems. Details below.
I am simulating a scheduling system with multiple worker nodes (resources, each with multiple units of capacity) and tasks (processes) that arrive randomly and are queued by the scheduler at a specific worker node. This is working fine.
However, I want to trigger an event when a worker node has spare capacity, so that it steals the front task from the worker with the longest wait queue.
I can implement the functionality described above. The problem is that all the tasks waiting on the worker queue from which we are stealing work receive the event notification. I want to notify ONLY the task at the front of the queue (or only N tasks at the front of the queue).
The Bank reneging example is the closest example to what I want to implement. However, in it (1) ALL the customers leave the queue when they are notified that the event was triggered, and (2) when the event is triggered, the customers leave the system; in my example, I want the task to wait at another worker instead (though it wouldn't actually wait, since that worker's queue is empty).
Old question: Can this be done in SimPy?
New questions: How can I do this in SimPy?
1) How can I have many processes waiting for a resource listen for an event, but notify only the first one?
2) How can I make a process migrate to another resource?
I am using node-cron to do some heavy tasks (updating a database) every minute. Does this task run on the main process, or will Node.js create some workers to do these tasks?
var CronJob = require('cron').CronJob;
new CronJob('0 * * * * *', function() {
  // Update the database every minute here
  console.log('Update database every minute');
}, null, true, 'America/Los_Angeles');
It is supposed to create a worker for you. It is not well documented in the library docs, but:
1) You can see in the dependencies that it depends on node-worker.
2) If the cron job were blocking, then waiting for the cron job to execute (in this case, a minute) would be blocking as well, because the main thread would just sit and wait until it had to run the job. In that case there would effectively be no cron job at all, just a simple sleep() followed by the execution.
Although, if you want to be sure, try writing a Node.js main program with a "while true" loop that writes something to the console, and a cron job that every minute executes a sleep() for however long you wish. The expected symptom is that the writing to the console should never stop.
Hope this helps.
Cheers
Any blocking operation will indeed block the main thread, at least with node-cron.
I have tried it with an Express app where the cron job attempts to fetch data from the web regularly:
// app.js
...
/* Routes */
app.use("/", valueRoutes);
/* Cron Job */
cron.schedule(CRON_EXP, refreshData); // long-running async operation
export default app;
During the execution of the refreshData method, the Express app is not able to respond to requests.
This question has been addressed here: https://github.com/node-cron/node-cron/issues/114
Internally, node-cron runs the given function asynchronously, inside a setTimeout.
But if, inside your function, you do some blocking work, such as a long synchronous for loop, it will block the whole thread.
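The setTimeout only defers when the callback starts; once a synchronous loop inside it begins, the single thread is monopolised for the loop's full duration. A minimal, self-contained illustration (the busy-wait stands in for the blocking for loop mentioned above):

```javascript
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // busy-wait: no timers or I/O callbacks can fire meanwhile
}

const t0 = Date.now();
blockFor(100); // stands in for a synchronous loop inside the cron callback
const elapsed = Date.now() - t0;
console.log('thread was monopolised for ~' + elapsed + ' ms');
```

Any HTTP request arriving during that window sits in the kernel/libuv queue until the loop finishes.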
First, node-cron has the same merits and demerits as Node.js itself, being a runtime for JavaScript, a non-blocking, single-threaded language that uses an event loop.
Secondly, to understand the merit side of that fact, note the difference between an asynchronous task and a synchronous task: an asynchronous task runs outside your program, while a synchronous task runs inside it. Where Node.js shines is that it does not pause your program's single execution thread when it encounters an instruction that runs outside your program (for example, waiting for the results of a database query, as in your case). Instead, it uses the event loop to wait for the response from the external system handling that task, and then processes the result with whatever callback you have hooked up. Until recently, many popular programming languages would always block the executing thread while waiting for an asynchronous task, even though the task ran outside the program (albeit those languages often have multiple threads). That's why Node.js is highly performant when your application does heavy I/O against various external resources: in the blocking variants, the threads fill up quickly because they are not released while waiting for results they don't even process themselves. So much for the plus side of Node.js; next is the demerit of its single-threaded nature.
Thirdly, the demerit of the single-threaded nature of Node.js shows up with heavy synchronous tasks: tasks that must run inside your program and are CPU-intensive, such as looping through a very long list or rendering or processing high-fidelity graphics. Since Node.js has a single thread, any other request arriving while a heavy synchronous task is being processed has to wait until that task finishes. So much for the minus; next is the solution to this problem.
Enter worker threads. From Node.js v10.5 upwards, a Node app, which runs on a single thread that can be seen as the main thread, can delegate tasks to other child threads and collect their results, each child essentially being an isolated single-threaded JavaScript instance. If a CPU-heavy task comes in, the main thread can delegate it to a child thread and remain available to service other requests. Next is to clarify whether node-cron, as a job scheduler, uses this feature.
node-cron doesn't use the worker thread functionality of Node.js. For your own job that is not a problem, as your job is asynchronous. However, there is bree.js, a very robust Node.js job scheduler that does use worker threads, and I believe you now see that you will need something like that to run heavy synchronous jobs performantly.
Finally, do explore worker threads whenever you have heavy synchronous tasks, because while Node.js supports worker threads, it won't apply them for you automatically when needed.
For example:
I have a task named "URLDownload" whose job is to download a large file from the internet.
Now I have one Worker Process running, but about 1000 files to download.
It is easy for a Client Process to create 1000 tasks and send them to the Gearman server.
My question is: will the Worker Process do the tasks one by one, or will it accept multiple tasks at a time?
If the Worker Process can accept multiple tasks, how can I limit the task pool size in the Worker Process?
Workers process one request at a time. You have a few options:
1) You can run multiple workers (this is the most common method). Workers sit in poll() when they aren't processing, so this model works pretty well.
2) Write a fork() implementation around the worker. This way you can fire up a set number of worker processes but don't have to monitor multiple processes yourself.