Set Interval in Node.js vs. Cron Job?

Set Interval in Node.js vs. Cron Job? - node.js

I'm learning node.js and just set up an empty Linux Virtual Machine and installed node.
I'm running a function constantly every minute
var request = require('request')
var minutes = 1, the_interval = minutes * 60 * 1000
setInterval(function() {
// Run code
})
}, the_interval);
And considering adding some other functions based on current time. - (e.g. run function if dateTime = Sunday at noon)
My question is are there any disadvantages to running a set up like this compared to a traditional cron job set up?
Keep in mind I have to run this function in node every minute anyways.

My question is are there any disadvantages to running a set up like this compared to a traditional cron job set up?
As long as //run the code isn't a CPU-bound thing like cryptography, stick with 1 node process, at least to start. Since you are requiring request I guess you might be making an HTTP request, which is IO, which means this will be fine.
It's just simpler to have 1 thing to install/launch/start/stop/upgrade/connect-a-debugger than to deal with an app server as well as a separate cron-managed process. For what it's worth, keeping it in javascript makes it portable across platforms, although that probably doesn't really matter.
There is also a handy node-cron module which I have used as well as approximately one bazillion other alternatives.

It depends on how strictly you have to adhere to that minute interval and if your node script is doing anything else in the meantime. If the only thing the script does is run something every X, I would strongly consider just having your node script do X instead, and scheduling it using the appropriate operating system scheduler.
If you build and run this in node, you have to manage the lifecycle of the app and make sure it's running, recover from crashes, etc. Just executing once a minute via CRON is much more straightforward and in my opinion conforms more to the Unix Philosophy.

Cron, unless your app is really small and simple.
Also, the in-memory setTimeout would not work if you ever end up behind a load balancer. You might have two+ instances of your node server running and thus your function/script running multiple times instead of once.

Related

How to call a function every n milliseconds in "real world" time exactly?

If I understand correctly, setInterval(() => console.log('hello world'), 1000) will place the function to some queue of tasks to run. But if there are other tasks in-front of it, it won't run exactly at 1000 millisecond or every time.
In a single complex program, is it possible to also make calls to some function every n millisecond exactly in real world time with node.js ?

If I understand correctly, setInterval(() => console.log('hello world'), 1000) will place the function to some queue of tasks to run. But if there are other tasks in-front of it, it won't run exactly at 1000 millisecond or every time.
That is correct. It won't run exactly at the desired time if node.js happens to be busy doing something else when the timer is ready to run. node.js will wait until it finishes it's other task before running the timer callback. You can think of node.js as if it has a one-track mind (can only do one thing at a time) and timers don't ever interrupt existing tasks that are running.
In a single complex program, is it possible to also make calls to some function every n millisecond exactly in real world time with node.js ?
No, it is not possible to do that in node.js. node.js runs your Javascript as single-threaded, it's event driven and not-preemptive. All of these mean that you cannot rely on code running at a precise real-world time.
What happens under the covers in node.js is that you set a timer for a specific time in the future. That timer goes is registered with the node.js event loop so that each time it gets through the event loop, it will check if there are any pending timers. But, it only gets through the event loop when other code that was running before the timer was ready to fire finishes running. Here's the sequence of events:
Run some code
Set timer for some time in the future (say time X)
Run some more code
Nothing to do for awhile
Run some more code (while this code is running, time X passes - the time for your timer to run)
Previous block of code finishes running and control returns back to the node.js event loop at time X + n (some time after the timer X was supposed to fire).
Event loop checks to see if there are any pending timers. It finds a timer and calls its callback at time X + n.
So, the only way that your timer gets called at approximately time X is if node.js has nothing else to do at exactly time X. If your program is ever doing anything else, you can't guarantee that your program will be free at exactly time X to run the timer exactly when you want it to run. node.js is NOT a real-time system in any way. single-threaded and non-pre-emptive mean that a timer may have to wait for node.js to finish some other things before it gets to run and thus there is no guarantee that the timer will run exactly on time. Instead, it will run as not before time X when the interpreter is next free to return back to the event loop (done running whatever else might have been running at the time). This could be close to time X or it could be a significant time after time X.
If you really need something to run precisely at a specific time, then you likely need a pre-emptive system (not node.js) that is much more real-time than node.js is.
You could create a "work-around" in node.js by firing up another node.js process (you could use the child_process module) and start a program in that other process that has nothing else to do except serve your timer and execute the code associated with that timer. Then, at least you timer won't be pre-empted by some other Javascript task that might be running and will get to run pretty close to the desired time. Keep in mind that even this work-around still isn't a true real-time system, but might serve some purposes.
Otherwise, you probably want to write this in a more real-time system language that has pre-emptive timers (probably even with thread priorities).

But if there are other tasks in-front of it, it won't run exactly at 1000 millisecond or every time.
Your question is actually operating system specific, assuming the computer is running some (usual) operating system (like Windows, Android, Linux, MacOSX, etc...). I recommend reading Operating Systems: Three Easy Pieces to learn more.
In practice, your computer has many other processes managed by its operating system. Some of them might be running. Your computer might be in a situation where it is loaded enough by other processes to the point of not being able to run your tasks or threads exactly every second. Read about thrashing.
You might want to use some genuine real-time operating system. But then, node.js probably won't run on it.
How to call a function every n milliseconds in “real world” time exactly?
You cannot do that reliably. Because your node.js process (it is actually single threaded, at the system threads level, see pthreads(7) and jfriend00's answer) might not get enough resources from your OS (so if other processes are loading your computer too much, node.js would be starved and won't be able to progress like you want; be also aware of possible priority inversions).
On Linux, see also shed(7), chrt(1), renice(1)

I suggest to make a cron which will run at every n seconds. If your program is complex and it may take more time then you can go with async.
npm install cron
var CronJob = require('cron').CronJob;
new CronJob('* * * * * *', function() {
console.log('You will see this message every second');
callYourFunc();
}, null, true, 'America/Los_Angeles');
For more read this link

Perhaps you could spawn a worker thread and block it while it’s waiting to do the work, in the way suggested by CertainPerformance in the comments. It may not be the most elegant way to do it but at least you can put the blocking logic aside so that it doesn’t affect the rest of the application.
Check out the example in the docs if you’re unfamiliar with the cluster module: https://nodejs.org/docs/latest-v10.x/api/cluster.html

nodejs setTimout loop stopped after many weeks of iterations

Solved : It's a node bug. Happens after ~25 days (2^32 milliseconds), See answer for details.
Is there a maximum number of iterations of this cycle?
function do_every_x_seconds() {
//Do some basic things to get x
setTimeout( do_every_x_seconds, 1000 * x );
};
As I understand, this is considered a "best practice" way of getting things to run periodically, so I very much doubt it.
I'm running an express server on Ubuntu with a number of timeout loops .
One loop that runs every second and basically prints the timestamp.
One that calls an external http request every 5 seconds
and one that runs every X 30 to 300 seconds.
It all seemes to work well enough. However after 25 days without any usage, and several million iterations later, the node instance is still up, but all three of the setTimout loops have stopped. No error messages are reported at all.
Even stranger is that the Express server is still up, and I can load http sites which prints to the same console as where the periodic timestamp was being printed.
I'm not sure if its related, but I also run nodejs with the --expose-gc flag and perform periodic garbage collection and to monitor that memory is in acceptable ranges.
It is a development server, so I have left the instance up in case there is some advice on what I can do to look further into the issue.
Could it be that somehow the event-loop dropped all it's timers?

I have a similar problem with setInterval().
I think it may be caused by the following bug in Node.js, which seem to have been fixed recently: setInterval callback function unexpected halt #22149
Update: it seems the fix has been released in Node.js 10.9.0.

I think the problem is that you are relying on setTimeout to be active over days. setTimeout is great for periodic running of functions, but I don't think you should trust it over extended time periods. Consider this question: can setInterval drift over time? and one of its linked issues: setInterval interval includes duration of callback #7346.
If you need to have things happen intermittently at particular times, a better way to attack this would be to schedule cron tasks that perform the tasks instead. They are more resilient and failures are recorded at a system level in the journal rather than from within the node process.
A good related answer/question is Node.js setTimeout for 24 hours - any caveats? which mentions using the npm package cron to do task scheduling.

Cron-style scheduler that runs only once, but is run on multiple hosts

I have a node.js app that's going to run on multiple machines (potentially on a serverless environment).
I'd like to run something like:
setInterval(() => {
Scanner.process()
}, 1000*60)
The problem is, when this same code is scaled up and run on 5 machines, it'll trigger 5 times every minute, instead of just once.
I thought i could use some sort of Redis lock to make sure the function is run only once at that schedule, no matter how many machines run it.
Any ideas on how to best approach this?
P.S. I can't really rely on a hostname and make the code only run there

I'm assuming you want to solve this without using a central resource like a database to decide who is going to run the job.
You could have all servers run the job, kicking off the interval at random times...
setTimeout(() => {
setInterval(() => {
Scanner.process()
}, 1000 * 60 * num_machines)
}, 1000 * 60 * num_machines * Math.random())
Downside is you need to know how many other machines are running.
Not sure if there's a way to do this without some kind of knowledge of the other machines.
If the result of the job is available to all machines you could have them use that info to align their schedules. For example, if the last job start time is broadly available, each machine could defer execution of it's own schedule. I.e. as long as the job has been run, I won't run it myself until it looks like it hasn't been run in a while.

Run multiple copies of Speedy or PersistentPerl to be called from Tomcat

I have a modern webapp running under Tomcat, which often needs to call some legacy perl code to get some results. Right now, we wrap these in a call to Runtime.getRuntime().exec() which is working fine.
However, as the webapp gets busier we are noticing that often the perl is timing out and we need to control this.
I am using commons-pool to ensure that only X number of copies can be run at a time, and threads will queue up nicely for a perl instance when they need one, timing out after Y seconds and returning an error (this is fine, the client will just retry).
However we still have the problem that Perl takes a long time to start up, interpret the script, execute and return. At busy times we are doing this 30-50 times per second. It's a beefy machine but it's starting to struggle.
I have read up on Speedy and PersistentPerl and am considering holding open a copy of this in memory for each object in my pool, so that we do not need to open and close the Perl each time.
Is this a good idea? Any tips for how to go about doing this?

Those approaches should reduce the overhead from the start up time of your script. If the script is something that can be run as a CGI program then you might be better offer making it work with Plack and running it with a PSGI server. Your Tomcat application could collect and send the request parameters to your script and/or "web application" running in the background.

Independent server side processing in node

Is it possible, or even practical to create a node program (or sub program/loop) that executes independently of the connected clients.
So in my specific use case, I would like to make a mulitplayer game, where each turn a player preforms actions. And at the end of that turn those actions are computed. Is it possible to perform those computations at a specific time regardless of the client/players connecting?
I assume this involves the use of threads somewhere.
Possibly an easier solution would be to compute the outcome when it is observed, but this could cause difficulties if it has an influence in with other entities. But this problem has been a curiosity of mine for a while.

Well, basically, the easiest solution would probably to run the computation onto a cluster. This is spawning a thread who's running independent task and communicating with messages with the main thread.
If you wish however to run a completely separate process (I probably wouldn't, but it is an option), this can happen too. You then just need a communication protocol between the two process. Usually this would be handled by a messaging or a task queue system. A popular queue solving this issue is RabbitMQ.

If the computations each turn is not to heavy you could solve the issue with a simple setTimeout()
function turnCalculations(){
//do loads of stuff every 30 seconds
}
setTimout(turnCalculations,30000)
//normal node server stuff here
This would do the turn calculations every 30 seconds regardless of users connected, but if the calculations take to long they might block your server.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string