Node js how to schedule jobs (non-blocking) - node.js

I want to schedule jobs to run every x seconds. How can this be done?
An example is pinging a server every 10 seconds, but all of this should be async so that none of the other functionality stops.

It depends upon what you mean by "none of the other functionality stops". Pinging a server is already non-blocking and asynchronous, as all networking in nodejs works that way.
So, code like this:
setInterval(() => {
  // put code to do the ping here
}, 5000);
Will not block the rest of your server, because the code to do a ping is already non-blocking.
However, the rest of your server could block the setInterval() from firing on time. If you were executing some long running blocking code in your server, then the setInterval() timer callback would not happen until that code was done and it might be delayed from exactly when it was scheduled to run. Since nodejs is single threaded and event-driven, it can't process the next timer event until the previous event (whatever that was) is done executing the blocking portion of its code.
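To make the delay concrete, here is a minimal sketch that measures how late a timer fires when synchronous code occupies the event loop. The helper names busyWait and measureTimerDelay are made up for this illustration, and the busy loop only stands in for whatever long-running blocking code your server might execute.

```javascript
// Spin synchronously for `ms` milliseconds (stand-in for blocking work).
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally empty: this loop blocks the event loop
  }
}

// Schedule a timer for `timerMs`, then block for `blockMs`.
// Resolves with the number of milliseconds the timer actually took to fire.
function measureTimerDelay(timerMs, blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), timerMs);
    busyWait(blockMs); // the timer callback cannot run until this returns
  });
}

measureTimerDelay(10, 200).then((elapsed) => {
  console.log(`timer set for 10 ms fired after ${elapsed} ms`);
});
```

The timer was asked to fire after 10 ms, but it can only run once the blocking loop has returned, so the reported time is at least the length of the blocking section.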
If you wanted to make absolutely sure that your timer would always run very close to when it should run, you either have to make sure the rest of your program never blocks for very long or you will need to move the timer out into another Javascript thread or process and communicate back to the main thread via messaging.
You could use a Worker thread (from the worker_threads module) within your existing nodejs process, or you can use the child_process module to run a separate child program that does your pinging for you. Nodejs has built-in messaging that works in both directions between your main program and either the Worker thread or the child process.
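A rough sketch of the worker-thread variant, assuming the timer lives in the worker and only reports back to the main thread by message. The names workerSource and startTicker are invented for this example, and the worker code is passed inline with eval: true only so that everything fits in one file; in a real project it would normally live in its own file.

```javascript
const { Worker } = require('worker_threads');

// Worker code kept inline so the sketch is a single file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let tick = 0;
  setInterval(() => {
    // A real worker would perform the ping here and report the result.
    parentPort.postMessage({ tick: ++tick, at: Date.now() });
  }, workerData.intervalMs);
`;

// Start a worker whose timer runs independently of the main thread.
function startTicker(intervalMs, onTick) {
  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { intervalMs },
  });
  worker.on('message', onTick);
  return worker; // call worker.terminate() to stop it
}

// Usage: print three ticks, then stop the worker.
const ticker = startTicker(50, (msg) => {
  console.log('tick', msg.tick);
  if (msg.tick === 3) ticker.terminate();
});
```

Note that the timer inside the worker fires on schedule even when the main thread is busy, but the main thread still only sees each message once it is free to process it.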

If the job itself does blocking (CPU-heavy) work, you would need to use multithreading to keep it from blocking the rest of the program. I would suggest using the built-in Node.js multithreading module named worker_threads.
https://nodejs.org/api/worker_threads.html

Related

Node.js API that runs a script executing continuously in the background

I need to build a Node.js API that, for each different user that calls it, starts running some piece of code (a simple script that sets up a Telegram client, listens to new messages and performs a couple of tasks here) that'd then continuously run in the background.
My ideas so far have been a) launching a new child process for each API call and b) for each call automatically deploying the script on the cloud.
I assume the first idea wouldn't be scalable, as for the second I have no experience on the matter.
I searched a dozen keywords and haven't found anything relevant so far. Is there any handy way to implement this? In which direction should I search?
I look forward to any hint.
I'm not a Node dev, but as a programmer you can do something like this:
When the user is active, it calls a function.
This function counts the seconds that have passed until they reach 24 hours (86400 seconds) and does the tasks.
When the time matches, the program stops.
Node.js is nothing more than an event loop (libuv) whose execution stack runs on V8 (JavaScript). The process keeps running as long as there is pending work, such as open listeners, timers or sockets.
Keep in mind that there is only one thread executing your code (the event loop) and everything happens as a callback.
As long as you set up your Telegram client with some listeners, node.js will wait for new messages and execute the related listener.
Just instantiate a new client on each API call and listen to it; there is no need to spawn a new process.
Anyway, you will eventually run out of memory if you don't limit the number of parallel clients or close them after some time (e.g. using setInterval()).
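One way to sketch that limit is a small registry that caps the number of live clients and closes each one after a fixed lifetime. Everything here is hypothetical: ClientRegistry, createClient and the close() method stand in for whatever your Telegram library actually provides, and a per-client setTimeout is used instead of a single setInterval sweep.

```javascript
class ClientRegistry {
  constructor({ maxClients, ttlMs, createClient }) {
    this.maxClients = maxClients;
    this.ttlMs = ttlMs;
    this.createClient = createClient; // hypothetical factory for one client
    this.clients = new Map(); // userId -> { client, timer }
  }

  // Return the user's client, creating it if there is room.
  start(userId) {
    const existing = this.clients.get(userId);
    if (existing) return existing.client;
    if (this.clients.size >= this.maxClients) {
      throw new Error('too many active clients');
    }
    const client = this.createClient(userId);
    // Close the client automatically once its lifetime is over.
    const timer = setTimeout(() => this.stop(userId), this.ttlMs);
    this.clients.set(userId, { client, timer });
    return client;
  }

  // Close and forget the user's client; returns false if there was none.
  stop(userId) {
    const entry = this.clients.get(userId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    entry.client.close();
    this.clients.delete(userId);
    return true;
  }
}

// Usage with a stand-in client; a real one would connect to Telegram here.
const registry = new ClientRegistry({
  maxClients: 100,
  ttlMs: 60 * 60 * 1000,
  createClient: (userId) => ({
    userId,
    close() {
      console.log(`closed client for ${userId}`);
    },
  }),
});
registry.start('alice');
registry.stop('alice');
```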

NodeJs App - Repeated job - Single or multiple child processes?

I am currently developing a node js app with a REST API that exposes data from a mongo db.
The application needs to update some data every 5 minutes by calling an external service (could take more than one minute to get the new data).
I decided to isolate this task into a child_process, but I am not sure what I should put in this child process:
1) Only the function to be executed. The schedule is managed by the main process.
2) An independent process that auto-refreshes the data every 5 minutes and sends a message to the main process every time the refresh is done.
I don't really know if there is a big cost to starting a new child process every 5 minutes, or if I should use only one long-running child process, or if I am overthinking the problem ^^
EDIT - Information about the update task
The update task can take more than one minute, but it consists of many smaller tasks (gathering information from many external providers) that run asynchronously, so maybe I don't even need a child process?
Thanks !
Node.js has an event-driven architecture capable of handling asynchronous calls hence it is unlike your typical C++ program where you will go with a multi-threaded/process architecture.
For your use case, you could use setInterval to repeatedly perform the operation, and split the operation into smaller async calls using promises, for example with a library such as Bluebird.
For more information see:
setInterval: https://developer.mozilla.org/en-US/docs/Web/API/WindowTimers/setInterval
setInterval()
Repeatedly calls a function or executes a code snippet, with a fixed
time delay between each call. Returns an intervalID.
Sample code:
const MILLISECONDS_IN_FIVE_MINUTES = 5 * 60 * 1000;
setInterval(function() {
  console.log("I was executed");
}, MILLISECONDS_IN_FIVE_MINUTES);
Promises:
http://bluebirdjs.com/docs/features.html
Sample code:
// updateExternalService, parseExtResp and refreshData are placeholders
// for your own promise-returning functions.
new Promise(function(resolve, reject) {
  updateExternalService(data)
    .then(function(response) {
      return parseExtResp(response);
    })
    .then(function(parsedResp) {
      return refreshData(parsedResp);
    })
    .then(function(returnCode) {
      console.log("yay updated external data source and refreshed");
      return resolve();
    })
    .catch(function(error) {
      // Handle error
      console.log("oops something went wrong -> " + error.message);
      return reject(error);
    });
});
It does not matter the total clock time that it takes to get data from an external service as long as you are using asynchronous requests. What matters is how much CPU you are using in doing so. If the majority of the time is waiting for the external service to respond or to send the data, then your node.js server is just sitting idle most of the time and you probably do not need a child process.
Because node.js is asynchronous, it can happily have many open requests that are "in flight" that it is waiting for responses to and that takes very little system resources.
Because node.js is single threaded, it is CPU usage that typically drives the need for a child process. If it takes 5 minutes to get a response from an external service, but only 50ms of actual CPU time to process that request and do something with it, then you probably don't need a child process.
If it were me, I would separate out the code for communicating with the external service into a module of its own, but I would not add the complexity of a child process until you actually have some data showing that such a change is needed.
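As a sketch of what staying in the main process could look like, the helper below runs an async task repeatedly and waits for each run to finish before scheduling the next one, so a refresh that takes more than a minute never overlaps with the following run. The name runEvery is made up for this example, and it uses a chained setTimeout rather than setInterval for exactly that reason.

```javascript
// Run `task` repeatedly, waiting `intervalMs` between the end of one run
// and the start of the next. Returns a function that stops the loop.
function runEvery(intervalMs, task) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await task();
    } catch (err) {
      // A failed run should not kill the schedule.
      console.error('scheduled task failed:', err.message);
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  timer = setTimeout(tick, intervalMs);
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

// Usage: a short interval for demonstration; the question would use
// 5 * 60 * 1000 and call the external service inside the task.
let count = 0;
const stop = runEvery(100, async () => {
  console.log('refresh', ++count);
  if (count === 3) stop();
});
```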
I don't really know if there is a big cost to start a new child
process every 5 minutes or if I should use only one long time running
child process or if I am overthinking the problem
There is definitely some cost to starting up a new child process. It's not huge, but if you're going to be doing it every 5 minutes and it doesn't take a huge amount of memory, then it's probably better to just start up the child process once, have it manage the scheduling of communicating with the external service entirely on its own, and let it communicate results back to your other node.js process as needed. This makes the 2nd node process much more self-contained, and the only point of interaction between the two processes is to communicate an update. This separation of function and responsibility is generally considered a good thing. In a multi-developer project, you could more easily have different developers working on each app.
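A minimal sketch of the "start it once" approach, assuming the child owns the schedule and only sends a message after each refresh. The names childSource and startRefresher and the message shape are invented here. The child code is passed with -e and an 'ipc' stdio entry only to keep the example in one file; with a separate script you would more likely use child_process.fork('./refresher.js').

```javascript
const { spawn } = require('child_process');

// Code run by the long-lived child: it schedules itself and reports back.
const childSource = `
  process.on('message', ({ intervalMs }) => {
    let run = 0;
    setInterval(() => {
      // Placeholder for the real refresh work against the external service.
      process.send({ type: 'refreshed', run: ++run });
    }, intervalMs);
  });
`;

// Start the child once and forward its "refreshed" messages to a callback.
function startRefresher(intervalMs, onRefreshed) {
  const child = spawn(process.execPath, ['-e', childSource], {
    stdio: ['ignore', 'inherit', 'inherit', 'ipc'], // 'ipc' enables send()
  });
  child.on('message', (msg) => {
    if (msg.type === 'refreshed') onRefreshed(msg, child);
  });
  child.send({ intervalMs });
  return child;
}

// Usage: stop the child after two refreshes.
startRefresher(100, (msg, child) => {
  console.log('refresh', msg.run, 'done');
  if (msg.run === 2) child.kill();
});
```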
It depends on how tightly coupled your app and the auto-refresh task are.
If the auto-refresh task can run standalone, without interacting with your app, then it is better to start the task as a separate process. Using child_process directly is not a good idea here, because spawning, monitoring and respawning a child process is tricky; you can use crontab or pm2 to manage it instead.
If the auto-refresh task depends on your app, you can use child_process directly and send messages to it for scheduling. But first try to break this dependency: that will simplify your app and make the two parts easy to deploy and maintain separately. Whether the child process is long-running or one-shot does not matter until you have hundreds of such tasks running on one machine.

Does cron job block the main process or nodejs will create a worker to do cron task

I am using node-cron to do some heavy tasks (updating a database) every minute. Does this task run in the main process, or will nodejs create workers to do these tasks?
var CronJob = require('cron').CronJob;
new CronJob('0 * * * * *', function() {
//Update database every minute here
console.log('Update database every minute');
}, null, true, 'America/Los_Angeles');
It is supposed to create a worker for you. It is not well documented in the library docs, but:
1) Looking at the dependencies, you can see that it depends on node-worker.
2) If the cron job were blocking, then waiting for the cron job to execute (in this case, a minute) would be blocking as well, because the main thread would simply wait until it had to run the job. In that case it would not be a cron job at all, just a sleep() followed by the task.
If you want to be sure, try writing a nodejs main program with a "while true" loop that writes something to the console, plus a cron job that runs a sleep() command every minute for as long as you wish. The expected symptom is that the writing to the console never stops.
Hope this helps..
Cheers
Any blocking operation will indeed block the main thread, at least with node-cron.
I tried this with an expressjs app where the cron job attempts to fetch data from the web regularly:
// app.js
...
/* Routes */
app.use("/", valueRoutes);
/* Cron Job */
cron.schedule(CRON_EXP, refreshData); // long-running async operation
export default app;
During the refreshData method execution, the express app is not able to respond to requests.
This question has been addressed here: https://github.com/node-cron/node-cron/issues/114
Internally, node-cron runs the given function asynchronously, inside a setTimeout.
But if your function does blocking work, such as a long for loop, it will block the whole thread.
First, node-cron has the same merits and demerits as Node.js itself, a JavaScript runtime that is single-threaded, non-blocking and built around the event loop.
Secondly, to understand the merit, note the difference between an asynchronous task and a synchronous one: an asynchronous task runs outside your program, while a synchronous task runs inside it. Node.js shines because it does not pause your program's execution resource (a single thread) when it meets an instruction that runs outside your program, such as waiting for a database to respond as in your case. Instead, it uses the event loop to wait for the response from whatever external system handles that task, and then processes the result with the callback you attached. Until recently, many popular programming languages would block the executing thread while waiting for such a task, even though the work was happening outside the program and even though they usually had several threads available. That is why Node.js performs well when your application does heavy I/O against various external resources: in the blocking model, threads get used up quickly because they are not released while waiting for results they do not compute themselves. Enough said about the plus for Node.js. Next is the demerit of the single-threaded nature of Node.js.
Thirdly, the demerit of the single-threaded nature of Node.js comes from heavy synchronous tasks. These are CPU-intensive tasks that run inside your program: imagine looping through a very long list, or rendering or processing high-fidelity graphics. Since Node.js has a single thread, any other request that arrives while a heavy synchronous task is running has to wait until that task finishes. Enough said about the minus for Node.js. Next is the solution to this problem.
Enter worker threads. From Node.js v10.5 upwards, a node app, which runs on a single thread that can be seen as the main thread, can delegate tasks to other child threads and receive reports back from them; each child thread is essentially an isolated single-threaded JavaScript instance. So when a CPU-heavy task comes up, the main thread can delegate it to a child thread and stay available to service other requests. Next is to clarify whether node-cron as a job scheduler uses this feature.
node-cron doesn't use the worker thread functionality of Node.js. In the case of your own job, that is not a problem, as your job is asynchronous. However, there is bree.js, a very robust Node.js job scheduler that does use worker threads, and I believe you now know that you will need something like that to run heavy synchronous jobs efficiently.
Finally, do well to explore worker threads whenever you have heavy synchronous tasks because while Node.js supports worker threads, it won't apply that for you automatically when need be.
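For illustration, here is a small sketch of handing a heavy synchronous computation to a worker thread by hand, so the main thread stays free while it runs. The names sumSource and sumInWorker are invented for this example, the summing loop merely stands in for real CPU-heavy work, and the worker code is inlined with eval: true only to keep the sketch in one file.

```javascript
const { Worker } = require('worker_threads');

// CPU-bound work executed inside the worker thread.
const sumSource = `
  const { parentPort, workerData } = require('worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData.n; i++) total += i;
  parentPort.postMessage(total);
`;

// Run the loop in a worker and resolve with its result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(sumSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Usage: the main thread can keep serving requests while this runs.
sumInWorker(1e7).then((total) => console.log('sum:', total));
```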

How does Node.js execute two different scripts?

I can't imagine how node.js can execute two scripts with different code simultaneously in one single thread. For example, I have two different scripts, A and B. What will happen if several clients request A and B almost simultaneously? For PHP it is understandable: for example, five threads will be created to handle A and five threads to handle B, and the script executes again for each request. But what happens in Node.js? Thank you!
It uses the so-called event loop, implemented by libuv.
A very simple explanation would be: when a new event occurs, it is put into a queue. Whenever the currently running callback finishes, the node process takes the next event from the queue and processes it.
The main difference between PHP and node is that a node.js process is essentially a stand-alone web server (single threaded), while PHP is an interpreter that runs within a web server (e.g. Apache), which is responsible for creating new threads for each request.
Node.js is very good for network applications (like web sites) because in these applications most of the work is I/O which node.js handles asynchronously.
Even if two requests arrive at the same time and Node.js only has one single thread of execution, each one of the requests (in sequence) will be handed off to the operating system for I/O (via libuv as mihai pointed out) and the fact that there is only one JavaScript thread of execution becomes irrelevant. As the I/O completes, the JavaScript thread picks up the result and returns a response.
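A toy sketch of that interleaving, with setTimeout standing in for real I/O. The handler names and delays are made up; the point is only that both "requests" start before either finishes, and that the one with the shorter wait completes first even though it started second.

```javascript
// Stand-in for an I/O call (database query, HTTP request, file read).
const fakeIo = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Handle one "request": log when it starts and when its I/O has finished.
async function handle(name, ioMs, log) {
  log.push(`${name}: start`);
  await fakeIo(ioMs); // the single JS thread is free while this waits
  log.push(`${name}: done`);
}

// Serve A (slow I/O) and B (fast I/O) at the same time on one thread.
async function serveBoth() {
  const log = [];
  await Promise.all([handle('A', 50, log), handle('B', 10, log)]);
  return log;
}

serveBoth().then((log) => console.log(log.join('\n')));
```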

nodejs: Is it possible to eval js code using runInNewContext and limit its execution time by a timeout?

I'd like to execute untrusted js code using runInNewContext in node.js, but as far as I can see there is no way to limit its execution time. Also, it is a sync operation. Is there a way to set a timeout on it, or an async version of it that would allow me to control its execution from 'outside'?
UPDATE: running in an external process is no good:
it takes too many resources
more importantly, I need the code to have access to my data/code through the sandbox environment
Run the script in an external process using dnode or child_process.fork, set a deadline timer, and kill the process if the timeout is reached, or clear the timer if the script finishes first.
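A rough sketch of the deadline-timer idea. The function name runWithDeadline is made up, and it uses child_process.spawn with node -e rather than fork or dnode, purely so the example stays in one file. It only enforces the time limit; it does not sandbox the code in any other way.

```javascript
const { spawn } = require('child_process');

// Run `source` in a separate node process and kill it after `timeoutMs`.
// Resolves with { timedOut, code, signal } once the child has exited.
function runWithDeadline(source, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', source], { stdio: 'ignore' });
    let timedOut = false;

    // Deadline timer: kill the child if it is still running.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill('SIGKILL');
    }, timeoutMs);

    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on('exit', (code, signal) => {
      clearTimeout(timer); // script finished first, so cancel the deadline
      resolve({ timedOut, code, signal });
    });
  });
}

// Usage: an endless loop is stopped after half a second.
runWithDeadline('while (true) {}', 500).then((result) => {
  console.log('timed out:', result.timedOut);
});
```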
I'd like to execute an untrusted js code using runInNewContext in
node.js but as far as I see there is no way to limit its execution
time. Also it is a sync operation. is there a way to set timeout on
it or async version of it that will allow me to control its execution
from 'outside'?
I think what you are saying is completely true. I think the only option is to file an issue with Joyent/Ryan Dahl. Hopefully he/they can come up with something smart, or maybe they will tell you it is not possible.
From vm.runInNewContext:
Note that running untrusted code is a tricky business requiring great
care. To prevent accidental global variable leakage,
vm.runInNewContext is quite useful, but safely running untrusted code
requires a separate process.
So to do this safely you need to run the code in an external process. I think the "expensive part" can be avoided by preforking.
A single control process is responsible for launching child processes
which listen for connections and serve them when they arrive. Apache
always tries to maintain several spare or idle server processes, which
stand ready to serve incoming requests. In this way, clients do not
need to wait for a new child processes to be forked before their
requests can be served.
This is now possible because I added timeout parameter support to the Node vm module. You can simply pass in a millisecond timeout value to runInNewContext() and it will throw an exception if the code does not finish executing in the specified amount of time.
Note, this does not imply any kind of security model for running untrusted code. This simply allows you to timeout code which you do trust or otherwise secure.
var vm = require("vm");
try {
  // Current Node.js versions take the timeout in an options object
  vm.runInNewContext("while(true) {}", {}, { filename: "loop", timeout: 1000 });
} catch (e) {
  // Exception thrown after 1000ms
}
console.log("finished"); // Will now be executed
Exactly what you would expect:
$ time ./node test.js
finished
real 0m1.069s
user 0m1.047s
sys 0m0.017s
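Since the question also asks for the code to reach your own data through the sandbox, here is a small sketch that combines both: the sandbox object exposes selected values, and the timeout is passed in the options object that current Node.js versions accept. The wrapper name evalWithTimeout and its return shape are invented for this example, and as noted above this is not a security boundary for untrusted code.

```javascript
const vm = require('vm');

// Evaluate `code` with `sandbox` as its global object, giving up after
// `timeoutMs`. Returns { ok, value } on success, { ok, timedOut, error }
// on failure.
function evalWithTimeout(code, sandbox, timeoutMs) {
  try {
    const value = vm.runInNewContext(code, sandbox, { timeout: timeoutMs });
    return { ok: true, value };
  } catch (err) {
    return {
      ok: false,
      timedOut: err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT',
      error: err,
    };
  }
}

// The sandbox gives the script access to chosen data only.
console.log(evalWithTimeout('price * quantity', { price: 4, quantity: 3 }, 100));
console.log(evalWithTimeout('while (true) {}', {}, 100).timedOut);
```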
