Minimum amount of execution time before proceeding - node.js

So I need to make my for loop wait AT LEAST 124ms before executing the next loop run, note that it can however take more than 124ms to complete the stuff inside the loop as I'm getting data from a website's API and that has to be received before moving on.
My idea was someting like this:
for(i = 0; i < 1000; i++)
{
var startTime = Date.now();
//Do some stuff
executeTimeCheck();
function executeTimeCheck()
{
setTimeout(executeTimeCheck, 1);
if(((Date.now()) - startTime) >= 124){return;}
else{executeTimeCheck(); return;}
}
}
The problem is that I keep running out of stack (RangeError: Maximum call stack size exceeded). If you have any idea how to make this work please let me know.

Related

Node.js setTimeout() behaviour

I want a piece of code to repeat 100 times with 1 sec of delay in between. This is my code:
for(var i = 0; i < 100; i++){
setTimeout(function(){
//do stuff
},1000);
}
While this seems correct to me it is not. Instead of running "do stuff" 100 times and waiting 1 sec in between what it does is wait 1 sec and then run "do stuff" 100 times with no delay.
Anybody has any idea about this?
You can accomplish it by using setInterval().
It calls function of our choice as long as clearTimeout is called to a variable timer which stores it.
See example below with comments: (and remember to open your developer console (in chrome right click -> inspect element -> console) to view console.log).
// Total count we have called doStuff()
var count = 0;
/**
* Method for calling doStuff() 100 times
*
*/
var timer = setInterval(function() {
// If count increased by one is smaller than 100, keep running and return
if(count++ < 100) {
return doStuff();
}
// mission complete, clear timeout
clearTimeout(timer);
}, 1000); // One second in milliseconds
/**
* Method for doing stuff
*
*/
function doStuff() {
console.log("doing stuff");
}
Here is also: jsfiddle example
As a bonus: Your original method won't work because you are basically assigning 100 setTimeout calls as fast as possible. So instead of them running with one second gaps. They will run as fast as the for loop is placing them to queue, starting after 1000 milliseconds of current time.
For instance, following code shows timestamps when your approach is used:
for(var i = 0; i < 100; i++){
setTimeout(function(){
// Current time in milliseconds
console.log(new Date().getTime());
},1000);
}
It will output something like (milliseconds):
1404911593267 (14 times called with this timestamp...)
1404911593268 (10 times called with this timestamp...)
1404911593269 (12 times called with this timestamp...)
1404911593270 (15 times called with this timestamp...)
1404911593271 (12 times called with this timestamp...)
You can see the behaviour also in: js fiddle
You need to use callback, node.js is asynchronous:
function call_1000times(callback) {
var i = 0,
function do_stuff() {
//do stuff
if (i < 1000) {
i = i + 1;
do_stuff();
} else {
callback(list);
}
}
do_stuff();
}
Or, more cleaner:
setInterval(function () {
//do stuff
}, 1000);
Now that you appreciate that the for loop is iterating in a matter of milliseconds, another way to do it would be to simply adjust the setTimeout delay according to the count.
for(var i = 0; i < 100; i++){
setTimeout(function(){
//do stuff
}, i * 1000);
}
For many use-cases, this could be seen as bad. But in particular circumstances where you know that you definitely want to run code x number of times after y number of seconds, it could be useful.
It's also worth noting there are some that believe using setInterval is bad practise.
I prefer the recursive function. Call the function initially with the value of counter = 0, and then within the function check to see that counter is less than 100. If so, do your stuff, then call setTimeout with another call to doStuff but with a value of counter + 1. The function will run exactly 100 times, once per second, then quit :
const doStuff = counter => {
if (counter < 100) {
// do some stuff
setTimeout(()=>doStuff(counter + 1), 1000)
}
return;
}
doStuff(0)

phantomJS scraping with breaks not working

I'm trying to scrape some URLS from a webservice, its working perfect but I need to scrape something like 10,000 pages from the same web servicve.
I do this by creating multiple phantomJS processes and they each open and evaluate a different URL (Its the same service, all I change is one parameter in the URL of the website).
Problem is I don't want to open 10,000 pages at once, since I don't want their service to crash, and I don't want my server to crash either.
I'm trying to make some logic of opening/evaluating/insertingToDB ~10 pages, and then sleeping for 1 minute or so.
Let's say this is what I have now:
var numOfRequests = 10,000; //Total requests
for (var dataIndex = 0; dataIndex < numOfRequests; dataIndex++) {
phantom.create({'port' : freeport}, function(ph) {
ph.createPage(function(page) {
page.open("http://..." + data[dataIncFirstPage], function(status) {
I want to insert somewhere in the middle something like:
if(dataIndex % 10 == 0){
sleep(60); //I can use the sleep module
}
Every where I try to place sleepJS the program crashes/freezes/loops forever...
Any idea what I should try?
I've tried placing the above code as the first line after the for loop, but this doesn't work (maybe because of the callback functions that are waiting to fire..)
If I place it inside the phantom.create() callback also doesn't work..
Realize that NodeJS runs asynchronously and in your for-loop, each method call is being executing one after the other. That phantom.create call finishes near immediately, and then the next cycle of the for-loop kicks in.
To answer your question, you want the sleep command at the end of the phantom.create block, still in side the for-loop. Like this:
var numOfRequests = 10000; // Total requests
for( var dataIndex = 0; dataIndex < numOfRequests; dataIndex++ ) {
phantom.create( { 'port' : freeport }, function( ph ) {
// ..whatever in here
} );
if(dataIndex % 10 == 0){
sleep(60); //I can use the sleep module
}
}
Also, consider using a package to help with these control flow issues. Async is a good one, and has a method, eachLimit that will concurrently run a number of processes, up to a limit. Handy! You will need to create an input object array for each iteration you wish to run, like this:
var dataInputs = [ { id: 0, data: "/abc"}, { id : 1, data : "/def"} ];
function processPhantom( dataItem, callback ){
console.log("Starting processing for " + JSON.stringify( dataItem ) );
phantom.create( { 'port' : freeport }, function( ph ) {
// ..whatever in here.
//When done, in inner-most callback, call:
//callback(null); //let the next parallel items into the queue
//or
//callback( new Error("Something went wrong") ); //break the processing
} );
}
async.eachLimit( dataInputs, 10, processPhantom, function( err ){
//Can check for err.
//It is here that everything is finished.
console.log("Finished with async.eachLimit");
});
Sleeping for a minute isn't a bad idea, but in groups of 10, that will take you 1000 minutes, which is over 16 hours! Would be more convenient for you to only call when there is space in your queue - and be sure to log what requests are in process, and have completed.

How do I execute a piece of code no more than every X minutes?

Say I have a link aggregation app where users vote on links. I sort the links using hotness scores generated by an algorithm that runs whenever a link is voted on. However running it on every vote seems excessive. How do I limit it so that it runs no more than, say, every 5 minutes.
a) use cron job
b) keep track of the timestamp when the procedure was last run, and when the current timestamp - the timestamp you have stored > 5 minutes then run the procedure and update the timestamp.
var yourVoteStuff = function() {
...
setTimeout(yourVoteStuff, 5 * 60 * 1000);
};
yourVoteStuff();
Before asking why not to use setTimeinterval, well, read the comment below.
Why "why setTimeinterval" and no "why cron job?"?, am I that wrong?
First you build a receiver that receives all your links submissions.
Secondly, the receiver push()es each link (that has been received) to
a queue (I strongly recommend redis)
Moreover you have an aggregator which loops with a time interval of your desire. Within this loop each queued link should be poll()ed and continue to your business logic.
I have use this solution to a production level and I can tell you that scales well as it also performs.
Example of use;
var MIN = 5; // don't run aggregation for short queue, saves resources
var THROTTLE = 10; // aggregation/sec
var queue = [];
var bucket = [];
var interval = 1000; // 1sec
flow.on("submission", function(link) {
queue.push(link);
});
___aggregationLoop(interval);
function ___aggregationLoop(interval) {
setTimeout(function() {
bucket = [];
if(queue.length<=MIN) {
___aggregationLoop(100); // intensive
return;
}
for(var i=0; i<THROTTLE; ++i) {
(function(index) {
bucket.push(this);
}).call(queue.pop(), i);
}
___aggregationLoop(interval);
}, interval);
}
Cheers!

How to setTimeout in node.js?

I need to be able to make retries in node.js in the event of failure inside a function. I've setup a while loop as shown below, but I am getting slightly confused about how I should wrap the function call to not make sure that it won't block my whole server.
What should I do?
while(retryCount < 10 && !success){
// Alternative one
while(new Date().getTime() < now + 1000) {
myFunction();
}
// Or:
setTimeout( myFunction(), 1000);
}
You can store number of tryes in function object. It's will be fine for cronjob. If you need same behaviour in request context you must store attempts counter in request scope (not in function object).
var fnc = function() {
console.log('try');
if (true) { // Error condition
// Error here
if (!fnc.tryes) fnc.tryes = 0;
fnc.tryes++;
console.log(fnc.tryes);
if (fnc.tryes <= 10) {
setTimeout(fnc, 1000);
} else {
fnc.tryes = 0;
}
// Something wrong
} else {
// We hame result
}
};
fnc();
I'd say use the setTimeout method, that way the client won't be stuck inside the while loop that checks the time.
That outer while loop is going to block, you'd have to refactor using only setTimeout. However, the fact that you want this sort of thing indicates to me that your code structure is really terrible and needs more reworking. What is it that you are retrying? How are you detecting an error condition? Does doing it 10 times really make the chances of success higher?
I have a gist containing a generic function that will do this sort of thing for you, but I'm reluctant to share if this is an XY problem.

Can I allow for "breaks" in a for loop with node.js?

I have a massive for loop and I want to allow I/O to continue while I'm processing. Maybe every 10,000 or so iterations. Any way for me to allow for additional I/O this way?
A massive for loop is just you blocking the entire server.
You have two options, either put the for loop in a new thread, or make it asynchronous.
var data = [];
var next = function(i) {
// do thing with data [i];
process.nextTick(next.bind(this, i + 1));
};
process.nextTick(next.bind(this, 0));
I don't recommend the latter. Your just implementing naive time splicing which the OS level process scheduler can do better then you.
var exec = require("child_process").exec
var s = exec("node " + filename, function (err, stdout, stderr) {
stdout.on("data", function() {
// handle data
});
});
Alternatively use something like hook.io to manage processes for you.
Actually you probably want to aggressively redesign your codebase if you have a blocking for loop.
Maybe something like this to break your loop into chunks...
Instead of:
for (var i=0; i<len; i++) {
doSomething(i);
}
Something like:
var i = 0, limit;
while (i < len) {
limit = (i+10000);
if (limit > len)
limit = len;
process.nextTick(function(){
for (; i<limit; i++) {
doSomething(i);
}
});
}
}
The nextTick() call gives a chance for other events to get in there, but it still does most looping synchronously which (I'm guessing) will be a lot faster than creating a new event for every iteration. And obviously, you can experiment with the number (10,000) till you get the results you want.
In theory, you could also use setTimeout() rather than nextTick(), in case it turns out that giving other processes a somewhat bigger "time-slice" helps. That gives you one more variable (the timeout milliseconds) that you can use for tuning.

Resources