nodejs - every minute, on the minute - node.js

How can I wait for a specific system time before firing?
I want to fire an event when seconds = 0, i.e. every minute:
while (1==1) {
var date = new Date();
var sec = date.getSeconds();
if(sec===0) {
DoSomething();
}
}

You shouldn't do that: the while loop is a blocking operation that never yields to the event loop, so nothing else in the process can run. Every JavaScript platform offers better tools for this, such as the setInterval and setTimeout functions.
The node docs for them are here.
A quick example of how to achieve what you want:
setInterval(function() {
var date = new Date();
if ( date.getSeconds() === 0 ) {
DoSomething();
}
}, 1000);
For more fine-grained control over scheduled processes in Node, you might check out node-cron.
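Polling once a second can miss second 0, or hit it twice, when the timer drifts. A minimal sketch of an alternative is to compute how long remains until the next minute boundary and arm a single setTimeout for that long. The helper names msUntilNextMinute and everyMinuteOnTheMinute are made up for this illustration.

```javascript
// Milliseconds from `now` (a Date) until the next moment where seconds === 0.
function msUntilNextMinute(now) {
  const msIntoMinute = now.getSeconds() * 1000 + now.getMilliseconds();
  return 60000 - msIntoMinute;
}

// Hypothetical helper: runs `task` at each minute boundary and returns
// a function that stops the schedule.
function everyMinuteOnTheMinute(task) {
  let timer = null;
  function arm() {
    timer = setTimeout(function () {
      arm();            // re-arm for the following boundary first
      task(new Date());
    }, msUntilNextMinute(new Date()));
  }
  arm();
  return function stop() { clearTimeout(timer); };
}

// Usage (not started here):
// const stop = everyMinuteOnTheMinute(function (firedAt) { DoSomething(); });
```

Timers are not exact, so the callback can still land a few milliseconds off the boundary. If that matters, check the time inside the task.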

Related

How to perform recurring long running background tasks in an node.js web server

I'm working on a node.js web server using express.js that should offer a dashboard to monitor database servers.
The architecture is quite simple:
a gatherer retrieves the information in a predefined interval and stores the data
express.js listens to user requests and shows a dashboard based on the stored data
I'm now wondering how best to implement the gatherer so that it does not block the main loop. The simplest solution seems to be a setTimeout-based approach, but what would be the "proper way" to architect this?
Your concern is your information-gathering step. It probably is not as CPU-intensive as it seems. Because it's a monitoring app, it probably gathers information by contacting other machines, something like this.
async function gather () {
const results = []
let result
result = await getOracleMetrics ('server1')
results.push(result)
result = await getMySQLMetrics ('server2')
results.push(result)
result = await getMySQLMetrics ('server3')
results.push(result)
await storeMetrics(results)
}
This is not a cpu-intensive function. (If you were doing a fast Fourier transform on an image, that would be a cpu-intensive function.)
It spends most of its time awaiting results, and then a little time storing them. Using async / await gives you the illusion it runs synchronously. But, each await yields the main loop to other things.
You might invoke it every minute, something like this. Calling gather() without await starts it asynchronously, and .catch() reports any failure.
setInterval (
function go () {
gather()
.catch(console.error)
}, 1000 * 60)
If you do actually have some cpu-intensive computation to do, you have a few choices.
offload it to a worker thread.
break it up into short chunks, with sleeps between them.
const sleep = function sleep (howLong) {
return new Promise(function (resolve) {
setTimeout(() => {resolve()}, howLong)
})
}
async function gather () {
for (let chunkNo = 0; chunkNo < 100; chunkNo++) {
doComputationChunk(chunkNo)
await sleep(1)
}
}
That sleep() function yields to the main loop by waiting for a timeout to expire.
None of this is debugged, sorry to say.
For recurring tasks I prefer to use node-schedule and schedule the jobs on app start-up.
In case you don't want to run CPU-expensive tasks in the main-thread, you can always run the code below in a worker-thread in parallel instead of the main thread - see info here
Here are two examples, one with a recurrence rule and one with interval in minutes using a cron expression:
app.js
let mySheduler = require('./mysheduler.js');
mySheduler.sheduleRecurrence();
// And/Or
mySheduler.sheduleInterval();
mysheduler.js
/* INFO: Require node-schedule for starting jobs of scheduled tasks */
var schedule = require('node-schedule');
/* INFO: Helper for constructing a cron-expression */
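function getCronExpression(minutes) {
if (minutes < 60) {
return `*/${minutes} * * * *`;
}
// A single cron expression cannot describe e.g. "every 90 minutes",
// so intervals of an hour or more must be whole hours
let hours = Math.floor(minutes / 60);
let minutesRemainder = minutes % 60;
if (minutesRemainder !== 0) {
throw new Error('Intervals of 60 minutes or more must be whole hours');
}
// Minute 0 of every Nth hour ("*/0" would be an invalid step)
return `0 */${hours} * * *`;
}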
module.exports = {
sheduleRecurrence: () => {
// Schedule a job at 01:00 AM every day (Mo-Su)
var rule = new schedule.RecurrenceRule();
rule.hour = 1;
rule.minute = 0;
rule.second = 0;
rule.dayOfWeek = new schedule.Range(0,6);
var dailyJob = schedule.scheduleJob(rule, function(){
/* INFO: Put your database-ops or other routines here */
// ...
// ..
// .
});
// INFO: Verbose output to check if job was scheduled:
console.log(`JOB:\n${dailyJob}\n HAS BEEN SCHEDULED..`);
},
sheduleInterval: () => {
let intervalInMinutes = 60;
let cronExpressions = getCronExpression(intervalInMinutes);
// INFO: Define unique job-name in case you want to cancel it
let uniqueJobName = "myIntervalJob"; // should be unique
// INFO: Schedule the job
var job = schedule.scheduleJob(uniqueJobName,cronExpressions, function() {
/* INFO: Put your database-ops or other routines here */
// ...
// ..
// .
})
// INFO: Verbose output to check if job was scheduled:
console.log(`JOB:\n${job}\n HAS BEEN SCHEDULED..`);
}
}
In case you want to cancel a job, you can use its unique job-name:
function cancelCronJob(uniqueJobName) {
/* INFO: Get job-instance for canceling scheduled task/job */
let current_job = schedule.scheduledJobs[uniqueJobName];
if (!current_job) {
/* INFO: Cron-job not found (already cancelled or unknown) */
console.log(`CRON JOB WITH UNIQUE NAME: '${uniqueJobName}' UNDEFINED OR ALREADY CANCELLED..`);
}
else {
/* INFO: Cron-job found and cancelled */
console.log(`CANCELLING CRON JOB WITH UNIQUE NAME: '${uniqueJobName}'`)
current_job.cancel();
}
};
In my example the recurrence and the interval are hardcoded; you can of course also pass the recurrence rules or the interval as arguments to the respective functions.
As per your comment:
'When looking at the implementation of node-schedule it feels like a thin layer on top of setTimeout..'
Actually, node-schedule is using long-timeout -> https://www.npmjs.com/package/long-timeout so you are right, it's basically a convenient layer on top of timeouts.

How can I handle pm2 cron jobs that run longer than the cron interval?

I have a cron job running on pm2 that sends notifications on a 5 second interval. Although it should never happen, I'm concerned that the script will take longer than 5 seconds to run. Basically, if the previous run takes 6 seconds, I don't want to start the next run until the first one finishes. Is there a way to handle this solely in pm2? Everything I've found says to use shell scripting to handle it, but it's not nearly as easy to replicate and move to new servers when needed.
As of now, I have the cron job just running in a never ending while loop (unless there's an error) that waits up to 5 seconds at the end. If it errors, it exits and reports the error, then restarts because it's running via pm2. I'm not too excited about this implementation though. Are there other options?
edit for clarification of my current logic -
function runScript() {
while (!err) {
// do stuff
wait(5 seconds - however long 'do stuff' took) // if it took 1 second to 'do stuff', then it waits 4 seconds
}
}
runScript()
This feels like a hacky way to get around the cron limits of pm2. It's possible that I'm just being paranoid... I just wanna make sure I'm not using antipatterns.
What do you mean you have the cron job running in a while loop? PM2 is starting a node process which contains a never-ending while loop that waits 5 seconds? Your implementation of a cron seems off to me, maybe you could provide more details.
Instead of a cron, I would use something like the setTimeout method. Run your script using PM2 and put a method like this in the script:
function sendMsg() {
// do the work
setTimeout(sendMsg, 5000); // call sendMsg after waiting 5 seconds
}
sendMsg();
By doing it this way, your sendMsg function can take all the time it needs to run, and the next call will start 5 seconds after that. PM2 will restart your application if it crashes.
If you're looking to do it at specific 5 second intervals, but only when the method is not running, simply add a tracking variable to the equation, something like:
let doingWork = false;
function sendMsg() {
if (!doingWork) {
doingWork = true;
// do the work
doingWork = false;
}
}
setInterval(sendMsg, 5000); // call sendMsg every 5 seconds
You could replace setInterval with a PM2 cron call on the script, but the variable idea remains the same.
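Note that the flag only helps when the work is asynchronous. A synchronous body can never overlap with itself, because the next timer callback cannot start until the current one returns. A rough sketch of the same idea for async work, with a finally block so the flag is released even when the work throws, might look like this. skipIfRunning and sendNotifications are hypothetical names.

```javascript
// Wraps an async function so that a call made while a previous run is
// still pending is skipped instead of overlapping it.
function skipIfRunning(work) {
  let running = false;
  return async function guarded() {
    if (running) return false;   // previous run still in flight: skip
    running = true;
    try {
      await work();
      return true;
    } finally {
      running = false;           // released even if work() throws
    }
  };
}

// Hypothetical stand-in for the real notification job.
async function sendNotifications() {
  await new Promise(function (resolve) { setTimeout(resolve, 10); });
}

const sendMsg = skipIfRunning(sendNotifications);

// Usage (not started here); the catch keeps a failed run from becoming
// an unhandled rejection:
// setInterval(function () { sendMsg().catch(console.error); }, 5000);
```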
To have exactly 5000 ms between the end of one action and the start of the next:
var myAsyncLongAction = function(cb){
// your long action here
return cb();
};
var fn = function(){
setTimeout(function(){
// your long action here
myAsyncLongAction(function(){
console.log(new Date().getTime());
setImmediate(fn);
});
}, 5000)
};
fn();
To have exactly 5000 ms between the start of your actions:
var myAsyncLongAction = function(cb){
// your long action here
setTimeout(function(){
return cb();
}, 1000);
};
var fn = function(basedelay, delay){
if(delay === undefined)
delay = basedelay;
setTimeout(function(){
// your long action here
var start = new Date().getTime();
myAsyncLongAction(function(){
var end = new Date().getTime();
var gap = end - start;
// Wait out the rest of the interval; if the action overran it, go again almost at once
var next = gap < basedelay ? basedelay - gap : 1;
console.log("Action took "+(gap)+" ms, send next action in : "+next+" ms");
setImmediate(fn, basedelay, next);
});
}, delay);
};
fn(5000);

Queue up javascript code in a single process

Let's say I have a bunch of tasks in an object, each with a date. I was wondering if it's possible to have the tasks in that object run within a single process, each triggering when its date is reached.
Here's an example:
var tasks = [
{
"when": "1501121620",
"what": function(){
console.log("hello world");
}
},
{
"when": "1501121625",
"what": function(){
console.log("hello world x2");
}
}
]
I'm fine with having these stored within a database and the what script being evaled from a string. I need a point in the right direction. I've never seen anything like this in the node world.
I'm thinking about using hotload and using the file system so I don't need to deal with databases.
Should I just look into setInterval or is there something out there that is more sophisticated? I know things like cron exist, the thing is I need all of these tasks to occur within an already existing running process. I need to be able to add a new task to the queue without ending the process.
To add a little context I need some way of queuing up socket.io .emit() functions.
Do not reinvent the wheel. Use the cron package from npm. It is written in pure JavaScript (using the second variant below), so all of these tasks will run within an already running process. For example, you can create a CronJob like this:
var CronJob = require('cron').CronJob;
// Passing a Date schedules a one-off run; the final `true` starts the job immediately
var job = new CronJob(new Date(1421110908157), function() { /* some stuff to do */ }, null, true);
In pure JavaScript you can only do it through the setTimeout and setInterval methods. There are two variants:
1) Set an interval callback that checks your task queue and executes the callbacks that are due:
setInterval(function() {
for (var i = 0; i < tasks.length; ++i) {
var task = tasks[i];
if (task.when*1000 < Date.now()) {
task.what();
tasks.splice(i,1);
--i;
}
};
}, 1000);
As you can see, the accuracy of the callback timing depends on the interval: a shorter interval gives more accuracy but also more CPU usage.
2) Create a wrapper around your tasks. When you want to add a new task, you call some method addTask, which calls setTimeout with your task callback. Beware that the maximum delay for setTimeout is 2147483647 ms (around 25 days). If the delay exceeds that maximum, set a timeout for the maximum time with a callback that sets a new timeout for the remaining time. For example:
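var MAX_TIME = 2147483647;
function addTask(task) {
// task.when is a Unix timestamp in seconds; convert to ms from now
var delay = task.when * 1000 - Date.now();
if (delay <= MAX_TIME) {
setTimeout(task.what, Math.max(delay, 0));
}
else {
// Too far away for one timer: wait the maximum, then re-check
setTimeout(addTask.bind(null, task), MAX_TIME);
}
}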
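Since the question also asks about adding tasks while the process is running, here is a hedged sketch of a small in-process scheduler along the lines of variant 2. It keeps the pending tasks sorted and arms one timer for the earliest of them. createScheduler and its methods are invented for this example, and `when` is assumed to be a Unix timestamp in seconds as in the question.

```javascript
const MAX_TIMEOUT = 2147483647; // largest delay setTimeout accepts, in ms

function createScheduler() {
  const tasks = [];  // pending tasks, earliest first
  let timer = null;

  // (Re)arm a single timer for the earliest pending task.
  function arm() {
    clearTimeout(timer);
    timer = null;
    if (tasks.length === 0) return;
    const untilDue = tasks[0].when * 1000 - Date.now();
    const delay = Math.min(Math.max(untilDue, 0), MAX_TIMEOUT);
    timer = setTimeout(runDue, delay);
  }

  // Run every task whose time has come, then re-arm for the rest.
  function runDue() {
    const now = Date.now();
    while (tasks.length > 0 && tasks[0].when * 1000 <= now) {
      tasks.shift().what();
    }
    arm();
  }

  return {
    add(task) {
      tasks.push(task);
      tasks.sort(function (a, b) { return a.when - b.when; });
      arm();
    },
    runDue: runDue,
    pending() { return tasks.length; },
    stop() { clearTimeout(timer); timer = null; }
  };
}
```

New tasks can be added at any time with add(), for example from a socket.io handler, without restarting the process. Tasks are lost on restart unless they are also persisted somewhere.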

How do I execute a piece of code no more than every X minutes?

Say I have a link aggregation app where users vote on links. I sort the links using hotness scores generated by an algorithm that runs whenever a link is voted on. However, running it on every vote seems excessive. How do I limit it so that it runs no more than, say, every 5 minutes?
a) use cron job
b) keep track of the timestamp when the procedure was last run, and when the current timestamp - the timestamp you have stored > 5 minutes then run the procedure and update the timestamp.
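Option b) could be sketched roughly as follows. runAtMostEvery and recalculateHotness are placeholder names, and the clock is passed in as a parameter only so the behaviour can be checked without waiting five minutes.

```javascript
// Returns a wrapper that runs `fn` at most once per `minIntervalMs`.
// `now` is injectable and defaults to the real clock.
function runAtMostEvery(minIntervalMs, fn, now = Date.now) {
  let lastRun = -Infinity;
  return function () {
    const t = now();
    if (t - lastRun < minIntervalMs) return false; // too soon: skip
    lastRun = t;
    fn();
    return true;
  };
}

// Hypothetical stand-in for the hotness algorithm.
let recalculations = 0;
function recalculateHotness() { recalculations++; }

const onVote = runAtMostEvery(5 * 60 * 1000, recalculateHotness);
onVote(); // first vote: runs the algorithm
onVote(); // second vote right after: skipped
```

One caveat with this approach: votes that arrive during the quiet period are not reflected until some later vote triggers the next run.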
var yourVoteStuff = function() {
...
setTimeout(yourVoteStuff, 5 * 60 * 1000);
};
yourVoteStuff();
Before asking why not use setInterval, read the comment below.
Why "why setInterval" and not "why cron job"? Am I that wrong?
First, build a receiver that accepts all link submissions.
Second, have the receiver push() each received link onto a queue (I strongly recommend Redis).
Then add an aggregator that loops at whatever interval you like. Within this loop, each queued link is poll()ed and passed on to your business logic.
I have used this solution in production, and it scales and performs well.
Example of use:
var MIN = 5; // don't run aggregation for short queue, saves resources
var THROTTLE = 10; // aggregation/sec
var queue = [];
var bucket = [];
var interval = 1000; // 1sec
flow.on("submission", function(link) {
queue.push(link);
});
___aggregationLoop(interval);
function ___aggregationLoop(interval) {
setTimeout(function() {
bucket = [];
if(queue.length<=MIN) {
___aggregationLoop(100); // intensive
return;
}
// Move up to THROTTLE links, oldest first, from the queue into the bucket
for(var i=0; i<THROTTLE && queue.length>0; ++i) {
bucket.push(queue.shift());
}
___aggregationLoop(interval);
}, interval);
}
Cheers!

Node.js long poll logic help!

I'm trying to implement a long polling strategy with node.js.
What I want is that when a request is made to node.js, it waits a maximum of 30 seconds for some data to become available. If there is data, it outputs it and exits; if there is no data, it waits out the 30 seconds and then exits.
Here is the basic code logic I came up with:
var http = require('http');
var poll_function = function(req,res,counter)
{
if(counter > 30)
{
res.writeHeader(200,{'Content-Type':'text/html;charset=utf8'});
res.end('Output after 5 seconds!');
}
else
{
var rand = Math.random();
if(rand > 0.85)
{
res.writeHeader(200,{'Content-Type':'text/html;charset=utf8'});
res.end('Output done because rand: ' + rand + '! in counter: ' + counter);
}
}
setTimeout
(
function()
{
poll_function.apply(this,[req,res,counter+1]);
},
1000
);
};
http.createServer
(
function(req,res)
{
poll_function(req,res,1);
}
).listen(8088);
What I figure is that when a request is made, poll_function is called, and it calls itself again after 1 second via the setTimeout inside it. So it should remain asynchronous, meaning it will not block other requests and will provide its output when it's done.
I have used Math.random() here to simulate data becoming available at various intervals.
My concerns are:
1) Will there be any problem with it? I don't want to deploy it without being sure it won't come back to bite me.
2) Is it efficient? If not, any suggestions on how I can improve it?
Thanks,
Anjan
All nodejs code is nonblocking as long as you don't get stuck in a tight CPU loop (like while(true)) or use a library that has blocking I/O. Putting a setTimeout at the end of a function doesn't make it any more parallel; it just defers some CPU work until a later event.
Here is a simple demo chat server that emits "Hello World" at random intervals of 0 to 60 seconds to any and all connected clients.
// A simple chat server using long-poll and timeout
var Http = require('http');
// Array of open callbacks listening for a result
var listeners = [];
Http.createServer(function (req, res) {
function onData(data) {
// Data arrived first, so cancel the pending timeout
clearTimeout(timeout);
res.end(data);
}
listeners.push(onData);
// Set a timeout of 30 seconds
var timeout = setTimeout(function () {
// Remove our callback from the listeners array
listeners.splice(listeners.indexOf(onData), 1);
res.end("Timeout!");
}, 30000);
}).listen(8080);
console.log("Server listening on 8080");
function emitEvent(data) {
for (var i = 0, l = listeners.length; i < l; i++) {
listeners[i](data);
}
listeners.length = 0;
}
// Simulate random events
function randomEvents() {
emitEvent("Hello World");
setTimeout(randomEvents, Math.random() * 60000);
}
setTimeout(randomEvents, Math.random() * 60000);
This will be quite fast. The only dangerous part is the splice, which can be slow if the array gets very large. It could possibly be made more efficient by, instead of closing each connection 30 seconds after it started, closing all the handlers at once every 30 seconds or 30 seconds after the last event. But again, this is unlikely to be the bottleneck, since each of those array items is backed by a real client connection that is probably more expensive.
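If the splice ever did show up in a profile, one possible way around it (a sketch, not something the server above needs as written) is to keep the waiting callbacks in a Set, where removing a single entry does not shift the remaining ones. createListenerRegistry is a hypothetical helper name.

```javascript
// Registry of waiting long-poll callbacks backed by a Set.
function createListenerRegistry() {
  const listeners = new Set();
  return {
    add(listener) { listeners.add(listener); },
    // Returns true if the listener was still registered.
    remove(listener) { return listeners.delete(listener); },
    // Delivers `data` to everyone waiting and empties the registry.
    emit(data) {
      const waiting = Array.from(listeners); // snapshot before clearing
      listeners.clear();
      for (const listener of waiting) listener(data);
      return waiting.length;
    },
    size() { return listeners.size; }
  };
}
```

In the server, the timeout handler would call remove(onData) in place of the indexOf/splice pair, and emitEvent would become a call to emit(data).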
