Creating scheduled jobs (polling) with NodeJS


In my Node.js app I need to send a request to a third-party service every 2-3 seconds. I have a database of objects, each containing the URL to request, and when a response arrives I link it to the corresponding object.
Currently it looks like this:
// Getting objects from DB and calling ask function
objectsFromDB.find(function(err, data){
    if(!err){
        for (var i = 0; i < data.length; i++) {
            var object = data[i];
            // Calling ask function
            startAsking(object);
        }
    }
});
// Start asking objects ...
function startAsking(object){
    var intervalId = setInterval(function(){
        console.log("Asking " + object.name);
        // ...
        // Processing and linking response with object
    }, config.INTERVAL);
    arrayOfIntervals.push(intervalId);
}
So now I need to stop the job for one of the objects. How can I do this?
I see that I can save the intervalId, but what happens if this intervalId does not match the object?
I have also seen several libraries, such as:
Agenda
nschedule
node-cron
But I think all of these libraries are oriented toward scheduling jobs with large intervals, and I don't know how to stop a single job with them.
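
One way to keep each interval tied to its object is to store the handle in a map keyed by something that identifies the object, rather than in a plain array. The following is a minimal sketch, not a definitive implementation: it assumes every object has a unique `id` property, and the `jobs` registry, the `stopAsking` helper, and the extra `intervalMs` and `ask` parameters are names introduced here for illustration.

```javascript
// Registry of running polling jobs, keyed by a stable object id.
// ASSUMPTION: `object.id` uniquely identifies a record from the database.
const jobs = new Map();

// Start polling for one object; `ask` is the function that does the request.
function startAsking(object, intervalMs, ask) {
  stopAsking(object.id); // never run two jobs for the same object
  const handle = setInterval(() => ask(object), intervalMs);
  jobs.set(object.id, handle);
}

// Stop the job for one object. Returns false if no job was running.
function stopAsking(id) {
  const handle = jobs.get(id);
  if (handle === undefined) return false;
  clearInterval(handle);
  jobs.delete(id);
  return true;
}
```

With this shape, `startAsking(object, config.INTERVAL, ask)` replaces the push into `arrayOfIntervals`, and `stopAsking(object.id)` stops exactly one job. Plain `setInterval` handles intervals of a few seconds without needing a scheduling library.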

Related

setTimeout or child_process.spawn?

I have a REST service in Node.js with one specific request running a bunch of DB commands and other file processing that could take 10-15 seconds to run. Since I didn't want to hold up my browser request thread, I wrote a separate .js script to do the needful, called the script using child_process.spawn() in my Node.js code and immediately returned OK back to the client. This works fine, but then so does calling the same script (as a local function) by just using a simple setTimeout.
router.post("/longRequest", function(req, res) {
    console.log("Started long request with id: " + req.body.id);
    var longRunningFunction = function() {
        // Usually runs a bunch of things that take time.
        // Simulating a 10 sec delay for sample code.
        setTimeout(function() {
            console.log("Done processing for 10 seconds")
        }, 10000);
    }
    // Below line used to be
    // child_process.spawn('longRunningFunction.js'
    setTimeout(longRunningFunction, 0);
    res.json({status: "OK"})
})
So, this works for my purpose. But what's the downside? I probably can't monitor the background work as easily as with child_process.spawn, which would give me a process ID. But does this cause problems in the long run? Will it hold up Node.js processing if the 10 seconds of processing grows much longer in the future?
The actual longRunningFunction is something that reads an Excel file, parses it and does a bulk load using tedious to a MS SQL Server.
var XLSX = require('xlsx');
var FileAPI = require('file-api'), File = FileAPI.File, FileList = FileAPI.FileList, FileReader = FileAPI.FileReader;
var Connection = require('tedious').Connection;
var Request = require('tedious').Request;
var TYPES = require('tedious').TYPES;
var importFile = function() {
    var file = new File(fileName);
    if (file) {
        var reader = new FileReader();
        reader.onload = function (evt) {
            var data = evt.target.result;
            var workbook = XLSX.read(data, {type: 'binary'});
            var ws = workbook.Sheets[workbook.SheetNames[0]];
            var headerNames = XLSX.utils.sheet_to_json( ws, { header: 1 })[0];
            var data = XLSX.utils.sheet_to_json(ws);
            var bulkLoad = connection.newBulkLoad(tableName, function (error, rowCount) {
                if (error) {
                    console.log("bulk upload error: " + error);
                } else {
                    console.log('inserted %d rows', rowCount);
                }
                connection.close();
            });
            // setup your columns - always indicate whether the column is nullable
            Object.keys(columnsAndDataTypes).forEach(function(columnName) {
                bulkLoad.addColumn(columnName, columnsAndDataTypes[columnName].dataType, { length: columnsAndDataTypes[columnName].len, nullable: true });
            })
            data.forEach(function(row) {
                var addRow = {}
                Object.keys(columnsAndDataTypes).forEach(function(columnName) {
                    addRow[columnName] = row[columnName];
                })
                bulkLoad.addRow(addRow);
            })
            // execute
            connection.execBulkLoad(bulkLoad);
        };
        reader.readAsBinaryString(file);
    } else {
        console.log("No file!!");
    }
};

So, this works for my purpose. But what's the downside?
If you actually have a long running task capable of blocking the event loop, then putting it on a setTimeout() is not stopping it from blocking the event loop at all. That's the downside. It's just moving the event loop blocking from right now until the next tick of the event loop. The event loop will be blocked the same amount of time either way.
If you just did res.json({status: "OK"}) before running your code, you'd get the exact same result.
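
The effect described above can be observed with a small experiment. This is an illustrative sketch only: `busyWait` and `measureTimerDelay` are hypothetical helpers introduced here, and `busyWait` stands in for any synchronous, CPU-bound work. A timer that should fire after about 5 ms is held back until the "deferred" blocking work has finished.

```javascript
// Stand-in for synchronous, CPU-bound work (ASSUMPTION: not real workload code).
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin and block the event loop */ }
}

// Defers blocking work with setTimeout(…, 0), then measures how late
// an unrelated 5 ms timer actually fires.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => busyWait(blockMs), 0);            // "deferred" blocking work
    setTimeout(() => resolve(Date.now() - start), 5);  // unrelated short timer
  });
}

measureTimerDelay(200).then((elapsed) => {
  // The 5 ms timer cannot run until the blocking work is done.
  console.log('5 ms timer fired after ' + elapsed + ' ms');
});
```

The deferred work still runs on the same thread, so every other callback, including incoming HTTP requests, waits for it.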
If your long running code (which you describe as file and database operations) is actually blocking the event loop and it is properly written using async I/O operations, then the only way to stop blocking the event loop is to move that CPU-consuming work out of the node.js thread.
That is typically done by clustering, moving the work to worker processes or moving the work to some other server. You have to have this work done by another process or another server in order to get it out of the way of the event loop. A setTimeout() by itself won't accomplish that.
child_process.spawn() will accomplish that. So, if you have an actual event loop blocking problem to solve and the I/O is already as async optimized as possible, then moving it to a worker process is a typical node.js solution. You can communicate with that child process in a number of ways, but one possibility would be via stdin and stdout.
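
As a rough sketch of that approach, the helper below starts a separate Node.js process and collects whatever it writes to stdout. `runNodeChild` is a hypothetical name introduced here, and the example assumes the child reports its result by printing it; error handling is kept to the bare minimum.

```javascript
const { spawn } = require('child_process');

// Runs `node <args...>` in a separate process and resolves with its stdout.
// ASSUMPTION: the child script signals success with exit code 0.
function runNodeChild(args) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, args, {
      stdio: ['ignore', 'pipe', 'inherit'], // no stdin, capture stdout, share stderr
    });
    let output = '';
    child.stdout.on('data', (chunk) => { output += chunk; });
    child.on('error', reject); // e.g. the process could not be started
    child.on('close', (code) => {
      if (code === 0) resolve(output);
      else reject(new Error('child exited with code ' + code));
    });
  });
}
```

For example, `runNodeChild(['longRunningFunction.js'])` would run the script from the question in its own process, assuming that file exists next to the server code. The parent's event loop stays free while the child works, and `child.pid` is available inside the helper if the process needs to be monitored.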

How to control serial and parallel control flow with mapped functions?

I've drawn a simple flow chart for a program that crawls some data from the internet and loads it into a database. I thought I was comfortable with promises, but I have been stuck on this problem for at least three days without making any progress.
Here is the flow chart: [image not available]
Consider a static string array like this: const courseCodes = ["ATA", "AKM", "BLG", ... ].
I have a fetch function that makes an HTTP request, parses the response, and returns an array of objects.
fetch works correctly when it invokes its callback with the expected object array. It also worked with promises, which was much tidier.
The fetch function should be invoked once for every element of the courseCodes array. These calls should run in parallel, since the separate fetch calls do not affect each other.
As a result, the callback (or the promise's resolve parameter) should receive a results array, which is an array of arrays of objects. With those results, I should invoke my loadCourse function with each object in the results array as its parameter. These tasks should run serially, because loadCourse queries the database for a similar object and adds the object only if none exists.
How can I perform this kind of task in Node.js? I could not maintain the asynchronous flow in a scenario like this. I have tried and failed with the caolan/async library and with the bluebird and q promise libraries.

Try something like this:
const courseCodes = ["ATA", "AKM", "BLG" /* , ... */];
// Stores the tasks to be performed.
var parallelTasks = [];
var serialTasks = [];
// Keeps track of courses fetched and their results.
var courseFetchCount = 0;
var results = {};
// Your fetch function.
function fetch(course_code) {
    // Your code to fetch and parse.
    // Store the result for each course in the results object.
    results[course_code] = 'whatever result comes from your fetch & parse code...';
}
// Your load function.
function loadCourse(results) {
    for (var index in results) {
        var result = results[index]; // result for a single course
        var task = (function(result) {
            return function() {
                saveToDB(result);
                nextInSerial(null); // continue with the next serial task
            };
        })(result);
        serialTasks.push(task);
    }
    // Execute the serial tasks for saving results to the database or whatever.
    nextInSerial(null);
}
// Pseudo function to save a result to the database.
function saveToDB(result) {
    // Your code to store in the db here.
}
// Checks whether fetch() is complete for all course codes in your array
// and then starts the serial tasks for saving results to the database.
function CheckIfAllCoursesFetched() {
    courseFetchCount++;
    if (courseFetchCount == courseCodes.length) {
        // Now process the courses serially.
        loadCourse(results);
    }
}
// Helper function that executes tasks in serial fashion.
function nextInSerial(err) {
    if (err) throw Error(err.message);
    var nextSerialTask = serialTasks.shift();
    if (nextSerialTask) nextSerialTask(); // stops when the queue is empty
}
// Build the parallel tasks for fetching.
for (var index in courseCodes) {
    var course_code = courseCodes[index];
    var task = (function(course_code) {
        return function() {
            fetch(course_code);
            CheckIfAllCoursesFetched();
        };
    })(course_code);
    parallelTasks.push(task);
}
// Start executing the parallel tasks.
for (var task_index in parallelTasks) {
    parallelTasks[task_index]();
}
Or you may refer to nimble npm module.
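
If the fetch function returns a promise, the same flow (parallel fetches, then serial loads) can also be sketched with `Promise.all` and `async`/`await`. This is one possible shape under stated assumptions, not the approach from the answer above: `fetchAllThenLoad` is a hypothetical helper, `fetchCourse` is assumed to resolve to an array of objects, and `loadCourse` is assumed to return a promise that settles when the database work is done.

```javascript
// ASSUMPTIONS: fetchCourse(code) -> Promise<object[]>, loadCourse(obj) -> Promise.
async function fetchAllThenLoad(courseCodes, fetchCourse, loadCourse) {
  // Parallel step: start every fetch at once and wait for all of them.
  // `results` is an array of arrays, in the same order as courseCodes.
  const results = await Promise.all(courseCodes.map((code) => fetchCourse(code)));

  // Serial step: each load must finish before the next one starts.
  for (const courses of results) {
    for (const course of courses) {
      await loadCourse(course);
    }
  }
  return results;
}
```

`Promise.all` rejects as soon as any fetch fails, so in this sketch no loading starts unless every fetch succeeded.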

Making requests with node.js in a loop performance

I'm trying to benchmark a Node.js express app with the following using the request library:
var request = require('request');
var totalRequests = 100000;
for(var i = 0; i < totalRequests; i++) {
    (function(i) {
        request('http://localhost:3000/', function(error, response, body) {
            console.info('Request ' + (i + 1));
        });
    })(i);
}
When I run it, I don't see any output from the console.info() call in the request callback for over 40 seconds, and then the output starts. Shouldn't I see the requests firing right away?

40 seconds may be the amount of time it takes to prepare 100,000 requests. Since you're looping synchronously, your callbacks can't get called until after all of the requests have been initiated.
I suggest a library like async if you intended to make some or all of the requests in series rather than in parallel.
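
The idea of limiting how many requests are in flight can also be sketched without a library. The following is an illustration under stated assumptions, not the async library's API: `runWithLimit` is a hypothetical helper, and `task` is assumed to be any function that returns a promise for request number `i`.

```javascript
// Runs task(0) … task(total - 1) with at most `limit` running at once.
// ASSUMPTION: task(i) returns a promise that settles when request i is done.
async function runWithLimit(total, limit, task) {
  let next = 0; // index of the next request to start

  // Each worker keeps pulling the next index until none are left.
  async function worker() {
    while (next < total) {
      const i = next++;
      await task(i);
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(limit, total); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
}
```

Because each worker awaits its request before starting another, callbacks begin to fire as soon as the first responses arrive instead of after all 100,000 requests have been queued.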

Store settimeout id from nodejs in mongodb

I am running a web application using Express and Node.js. I have a request to a particular endpoint in which I use setTimeout to call a particular function repeatedly after varying time intervals.
For example:
router.get("/playback", function(req, res) {
    // Define callback here ...
    ....
    var timeoutone = setTimeout(callback, 1000);
    var timeouttwo = setTimeout(callback, 2000);
    var timeoutthree = setTimeout(callback, 3000);
});
The setTimeout function returns an object with a circular reference. When I try to save this into MongoDB, I get a stack overflow error. My aim is to be able to save the objects returned by setTimeout into the database.
I have another endpoint, called cancel playback, which when called will retrieve these timeout objects and call clearTimeout with them as the argument. How do I go about saving these timeout objects to the database? Or is there a better way of clearing the timeouts than saving them to the database? Thanks in advance for any help provided.

You cannot save live JavaScript objects in the database! Maybe you can store a string or JSON or similar reference to them, but not the actual object, and you cannot reload them later.
Edit: Also, I've just noticed you're using setTimeout for repeating stuff. If you need to repeat it on regular intervals, why not use setInterval instead?
Here is a simple solution that keeps the timeout handles in memory, indexed by a counter:
var timeouts = {};
var index = 0;
// route to set the timeout somewhere
router.get('/playback', function(req, res) {
    var current = index++; // reserve this index before any async work
    timeouts['timeout-' + current] = setTimeout(ccb, 1000); // ccb is your callback
    storeIndexValueSomewhere(current)
        .then(function(){
            res.json({timeoutIndex: current});
        });
});
// another route that gets timeout indexes from that mongodb somehow
router.get('/playback/indexes', handler);
// finally a delete route
router.delete('/playback/:index', function(req, res) {
    var key = 'timeout-' + req.params.index;
    if (!timeouts[key]) {
        return res.status(404).json({message: 'No job with that index'});
    } else {
        clearTimeout(timeouts[key]); // timers are cancelled with clearTimeout()
        delete timeouts[key];
        return res.json({message: 'Removed job'});
    }
});
But this probably would not scale to many millions of jobs.
A more complex solution, and perhaps more appropriate to your needs (depends on your playback job type) could involve job brokers or message queues, clusters and workers that subscribe to something they can listen to for their own job cancel signals etc.
I hope this helps you a little to clear up your requirements.

ExpressJS and matching delayed responses with original requests

I am looking for a design pattern that can match a stream of answers with the original request and response objects.
Suppose I receive a web request for all dog pictures. I want to submit that request to a message queue so that a worker process can eventually handle it. When a worker machine grabs the dog picture request, it performs the work and submits the response to an answer queue which is being monitored by Express. As I process the incoming queue, I want to match up the dog picture response with the original request and response objects so I can return the dog list or process it further.
Two solutions occur to me, but each seems inelegant. I could keep a global reference to the original context, find it, then delete it from the global list.
Or I could create a subscription to the response queue and look for my answer among all the answers. This would work, but it is brutally inefficient: every waiting request scans every answer, so the cost grows quadratically (10x10, 100x100, 1000x1000).

var express = require('express');
var app = express();
app.get('/doglist.txt', function(req, res){
    putReqIntoQueue(req,res,"dogs");
});
var theRequests = {};
var i = 0;
var giveUpSecs = 60;
var putReqIntoQueue = function(req,res,payload) {
    var index = 'index_'+i;
    i++;
    var obj = {req:req,res:res,payload:payload,index:index};
    theRequests[index] = obj;
    // Pass the function itself to setTimeout (do not invoke it here);
    // index is handed to it as an extra argument.
    var timeoutId = setTimeout(function(theIndex) {
        if (theIndex in theRequests) {
            theRequests[theIndex].res.send('timeout error');
            delete theRequests[theIndex];
        }
    }, giveUpSecs*1000, index);
    // insertIntoQueue(index,payload,timeoutId)
};
var onNewQueueResponse = function(index,payload,answer,timeoutId) {
    clearTimeout(timeoutId);
    if (index in theRequests) {
        var obj = theRequests[index];
        obj.res.send(answer); // send the worker's answer to the original client
        delete theRequests[index];
    } else {
        // must have already timed out
    }
};
// Queue("onNewMessage",onNewQueueResponse)
app.listen(3000);
console.log('Listening on port 3000');
This answer assumes some kind of queue that accepts work (insertIntoQueue) and then returns data when it is done through "onNewMessage" event. It times out after 60 seconds.
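
The same correlation idea can be sketched with promises, keeping the timer handle in memory next to the pending request so that only a plain string id has to travel through the queue. This is a minimal sketch under stated assumptions: `pending`, `waitForAnswer`, and `deliverAnswer` are hypothetical names introduced here, and the queue itself is left out.

```javascript
// Pending requests, keyed by a correlation id that travels with the job.
const pending = new Map();
let nextId = 0;

// Registers a pending request and returns its id plus a promise for the answer.
function waitForAnswer(timeoutMs) {
  const id = 'req_' + nextId++;
  const promise = new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      pending.delete(id);
      reject(new Error('timeout'));
    }, timeoutMs);
    pending.set(id, { resolve, timer });
  });
  return { id, promise };
}

// Called when an answer arrives from the queue. Returns false if the
// request is unknown, for example because it has already timed out.
function deliverAnswer(id, answer) {
  const entry = pending.get(id);
  if (!entry) return false;
  clearTimeout(entry.timer);
  pending.delete(id);
  entry.resolve(answer);
  return true;
}
```

In an Express handler, one might call `waitForAnswer(60000)`, put the returned `id` and the payload on the work queue, and send the response when the promise settles. Looking up the id in a `Map` avoids scanning all answers for every request.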
