How is setTimeout implemented in node.js

I was wondering if anybody knows how setTimeout is implemented in node.js. I believe I have read somewhere that this is not part of V8. I quickly tried to find the implementation, but could not find it in the source (it's big). I did, for example, find this timers.js file, which in turn links to timer_wrap.cc. But these files do not completely answer all of my questions.
Does V8 have setTimeout implementation? I guess also from the source the answer is no.
How is setTimeout implemented? JavaScript, native, or a combination of both? From timers.js I assume something along the lines of both:
var Timer = process.binding('timer_wrap').Timer;
When adding multiple timers (setTimeout), how does node.js know which to execute first? Does it add all the timers to a sorted collection? If it is sorted, then finding the next timeout to fire is O(1) and insertion is O(log n)? But then again, in timers.js I see them use a linked list?
But then again, adding a lot of timers is not a problem at all?
When executing this script:
var x = new Array(1000),
    len = x.length;

/**
 * Returns a random integer between min and max
 * Using Math.round() will give you a non-uniform distribution!
 */
function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

var y = 0;
for (var i = 0; i < len; i++) {
  var randomTimeout = getRandomInt(1000, 10000);
  console.log(i + ', ' + randomTimeout + ', ' + ++y);
  setTimeout(function () {
    console.log(arguments);
  }, randomTimeout, randomTimeout, y);
}
you get a little bit of CPU usage but not that much?
I am wondering: if I implemented all these callbacks one by one in a sorted list, would I get better performance?

You've done most of the work already. V8 doesn't provide an implementation for setTimeout because it's not part of ECMAScript. The function you use is implemented in timers.js, which creates an instance of a Timeout object that wraps a C++ class (timer_wrap.cc).
There is a comment in the source describing how they are managing the timers.
// Because often many sockets will have the same idle timeout we will not
// use one timeout watcher per item. It is too much overhead. Instead
// we'll use a single watcher for all sockets with the same timeout value
// and a linked list. This technique is described in the libev manual:
// http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#Be_smart_about_timeouts
Which indicates it's using a doubly linked list, which is technique #4 in the linked article.
If there is not one request, but many thousands (millions...), all
employing some kind of timeout with the same timeout value, then one
can do even better:
When starting the timeout, calculate the timeout value and put the
timeout at the end of the list.
Then use an ev_timer to fire when the timeout at the beginning of the
list is expected to fire (for example, using the technique #3).
When there is some activity, remove the timer from the list,
recalculate the timeout, append it to the end of the list again, and
make sure to update the ev_timer if it was taken from the beginning of
the list.
This way, one can manage an unlimited number of timeouts in O(1) time
for starting, stopping and updating the timers, at the expense of a
major complication, and having to use a constant timeout. The constant
timeout ensures that the list stays sorted.
Node.js is designed around async operations and setTimeout is an important part of that. I wouldn't try to get tricky, just use what they provide. Trust that it's fast enough until you've proven that in your specific case it's a bottleneck. Don't get stuck on premature optimization.
UPDATE
What happens is you've got essentially a dictionary of timeouts at the top level, so all 100ms timeouts are grouped together. Whenever a new timeout is added, or the oldest timeout triggers, it is appended to the list. This means that the oldest timeout, the one which will trigger the soonest, is at the beginning of the list. There is a single timer for this list, and it's set based on the time until the first item in the list is set to expire.
If you call setTimeout 1000 times each with the same timeout value, they will be appended to the list in the order you called setTimeout and no sorting is necessary. It's a very efficient setup.
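For illustration only, here is a rough sketch of that bookkeeping in plain JavaScript. This is not Node's actual implementation: the names are made up, the list handling is simplified, and a plain setTimeout stands in for the single native timer per list.
// One list per timeout duration; each list stays in insertion order, which is
// also expiry order because every entry in it shares the same duration.
var lists = {}; // duration (ms) -> { items: [...], timer: ... }

function enroll(duration, callback) {
  var list = lists[duration] || (lists[duration] = { items: [], timer: null });
  list.items.push({ due: Date.now() + duration, callback: callback });
  if (!list.timer) arm(duration, list); // one timer per duration, not per item
}

function arm(duration, list) {
  var wait = Math.max(list.items[0].due - Date.now(), 0);
  list.timer = setTimeout(function () {
    list.timer = null;
    // Fire everything at the head of the list that is already due.
    while (list.items.length && list.items[0].due <= Date.now()) {
      list.items.shift().callback();
    }
    if (list.items.length) arm(duration, list); // re-arm for the new head
    else delete lists[duration];
  }, wait);
}
Calling enroll(100, fn) a thousand times just appends a thousand entries to the 100ms list and re-uses that list's single timer, which is why scheduling lots of timeouts with the same delay stays cheap.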

No problem with many timers!
When the uv loop calls poll, it passes a timeout argument derived from the closest timer among all timers.
[closest timer of all timers]
https://github.com/joyent/node/blob/master/deps/uv/src/unix/timer.c #120
RB_MIN(uv__timers, &loop->timer_handles)
[pass timeout argument to poll api]
https://github.com/joyent/node/blob/master/deps/uv/src/unix/core.c #276
timeout = 0;
if ((mode & UV_RUN_NOWAIT) == 0)
  timeout = uv_backend_timeout(loop);

uv__io_poll(loop, timeout);
Note: on Windows the logic is almost the same.
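Translated loosely into JavaScript, the calculation looks roughly like this (just the shape of it, not the actual libuv code; the timers argument is an assumption standing in for the red-black tree):
// `timers` is assumed to hold absolute due times in milliseconds.
function backendTimeout(timers) {
  if (timers.length === 0) return -1;          // nothing pending: poll may block
  var nearest = Math.min.apply(null, timers);  // what RB_MIN() finds in the tree
  var diff = nearest - Date.now();
  return diff > 0 ? diff : 0;                  // already due: poll returns at once
}
The poll phase then waits at most that long for I/O before the loop wakes up and runs any expired timer callbacks.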

Related

Node.Js Threadpool in Windows

So my understanding is that any blocking file system operation (such as fs.readFileSync) is eventually delegated to one of the threads in the thread pool to keep the event loop free. Now, I am running my application on Windows and the command I am using is set UV_THREADPOOL_SIZE=4 & node index.js
My sample code below,
const fs = require('fs');

const start = new Date().getTime();

readFile();
readFile();
readFile();
readFile();
readFile();

function readFile() {
  fs.readFileSync('./content/1.txt');
  const end = new Date().getTime();
  console.log('Time took: ', (end - start) / 1000);
}
Now, no matter whether I set the thread pool size to one or four, the execution time remains almost the same. FYI, there are two CPU cores in my PC. My expectation was that if I set the thread pool size to four (or let the default settings work), then out of my five calls to read the file, the first four would each take roughly x seconds (I understand they won't take exactly the same time, but they should be very close), and the last one would take x+n seconds.
But that's not happening. Irrespective of the thread pool size, the calls take the same time to complete and get completed one by one.
So, looks like my understanding about how node.js thread pool works is not right. Any help would be appreciated. Thanks.
The first issue is that you're using fs.readFileSync(). That means your file operations are only going to be requested one at a time. The 2nd one won't start until the first one is done. This has nothing to do with the thread pool. This is because you're using the blocking, synchronous version of readFile(). The JS interpreter will be blocked until the first fs.readFileSync() is done and the second one only gets to start after the first one is done and so on. So, because of that, in this case it won't really matter how many threads there are to serve the file system.
If you want to engage more than one thread in file operations, you need to use asynchronous file operations like fs.readFile() so you can have more than one file operation in flight at the same time and thus have more of an opportunity to use more than one thread.
Also, file operations on the same disk are not as scalable with multiple threads/CPUs as some other types of operations because the read/write heads can only be in one place at a time so even if you do change the code to successfully engage multiple threads or CPUs you can't get full parallel file access on the same drive due to the serialization of the read/write head position.
Here's an example of a test using the asynchronous fs.readFile():
const fs = require('fs');

const start = new Date().getTime();
let cntr = 0;

readFile(0);
readFile(1);
readFile(2);
readFile(3);
readFile(4);

function readFile(i) {
  fs.readFile('./content/1.txt', function(err, data) {
    if (err) {
      console.log(err);
      return;
    }
    const end = new Date().getTime();
    console.log(`Time took: ${i} ${(end - start) / 1000}`);
    if (++cntr === 5) {
      console.log(`All Done. Total time: ${(end - start) / 1000}`);
    }
  });
}
This test would likely be more meaningful if you read a different file (one that wasn't already in the OS file cache) for each call to readFile(). As it is, requests 2-5 are likely just fetching data from the OS file cache, not actually accessing the disk.
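For example, a variant along those lines might look like the sketch below (the extra file paths are assumptions; you would need real, distinct files on disk that are not already cached):
const fs = require('fs');

const start = Date.now();
// Hypothetical paths: five distinct files so the reads can't all be served
// from the same cached data.
const files = ['./content/1.txt', './content/2.txt', './content/3.txt',
               './content/4.txt', './content/5.txt'];

let done = 0;
files.forEach((file, i) => {
  fs.readFile(file, (err, data) => {
    if (err) return console.error(err);
    console.log(`Time took: ${i} ${(Date.now() - start) / 1000}`);
    if (++done === files.length) {
      console.log(`All Done. Total time: ${(Date.now() - start) / 1000}`);
    }
  });
});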

Concurrency between Meteor.setTimeout and Meteor.methods

In my Meteor application to implement a turnbased multiplayer game server, the clients receive the game state via publish/subscribe, and can call a Meteor method sendTurn to send turn data to the server (they cannot update the game state collection directly).
var endRound = function(gameRound) {
  // check if gameRound has already ended /
  // if round results have already been determined
  // --> yes: do nothing
  // --> no:
  //   determine round results
  //   update collection
  //   create next gameRound
};
Meteor.methods({
  sendTurn: function(turnParams) {
    // find gameRound data
    // validate turnParams against gameRound
    // store turn (update "gameRound" collection object)
    // have all clients sent in turns for this round?
    //   yes --> call "endRound"
    //   no --> wait for other clients to send turns
  }
});
To implement a time limit, I want to wait for a certain time period (to give clients time to call sendTurn), and then determine the round result - but only if the round result has not already been determined in sendTurn.
How should I implement this time limit on the server?
My naive approach to implement this would be to call Meteor.setTimeout(endRound, <roundTimeLimit>).
Questions:
What about concurrency? I assume I should update collections synchronously (without callbacks) in sendTurn and endRound (?), but would this be enough to eliminate race conditions? (Reading the 4th comment on the accepted answer to this SO question about synchronous database operations also yielding, I doubt that)
In that regard, what does "per request" mean in the Meteor docs in my context (the function endRound called by a client method call and/or in server setTimeout)?
In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node.
In a multi-server / clustered environment, (how) would this work?
Great question, and it's trickier than it looks. First off I'd like to point out that I've implemented a solution to this exact problem in the following repos:
https://github.com/ldworkin/meteor-prisoners-dilemma
https://github.com/HarvardEconCS/turkserver-meteor
To summarize, the problem basically has the following properties:
Each client sends in some action on each round (you call this sendTurn)
When all clients have sent in their actions, run endRound
Each round has a timer that, if it expires, automatically runs endRound anyway
endRound must execute exactly once per round regardless of what clients do
Now, consider the properties of Meteor that we have to deal with:
Each client can have exactly one outstanding method call to the server at a time (unless this.unblock() is called inside a method). Subsequent methods from that client wait for the first to finish.
All timeout and database operations on the server can yield to other fibers
This means that whenever a method call goes through a yielding operation, values in Node or the database can change. This can lead to the following potential race conditions (these are just the ones I've fixed, but there may be others):
In a 2-player game, for example, two clients call sendTurn at exactly the same time. Both go through a yielding operation to store the turn data. Both methods then check whether 2 players have sent in their turns, both find the affirmative, and endRound gets run twice.
A player calls sendTurn right as the round times out. In that case, endRound is called by both the timeout and the player's method, resulting in it running twice again.
Incorrect fixes to the above problems can result in starvation where endRound never gets called.
You can approach this problem in several ways, either synchronizing in Node or in the database.
Since only one Fiber can actually change values in Node at a time, if you don't call a yielding operation you are guaranteed to avoid possible race conditions. So you can cache things like the turn states in memory instead of in the database (a minimal sketch of this follows below). However, this requires that the caching is done correctly and doesn't carry over to clustered environments.
Move the endRound code outside of the method call itself, using something else to trigger it. This is the approach I've taken which ensures that only the timer or the final player triggers the end of the round, not both (see here for an implementation using observeChanges).
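As a minimal sketch of the in-memory idea from the first point: the names endedRounds and tryEndRound are made up, the guard deliberately avoids any yielding call between the check and the assignment, and it does not survive a server restart or work in a clustered deployment.
// Hypothetical in-memory guard; lives in Node memory, not the database.
var endedRounds = {}; // roundId -> true once endRound has been triggered

function tryEndRound(roundId) {
  // No yielding operation between the check and the assignment, so only
  // one fiber can pass this guard for a given round.
  if (endedRounds[roundId]) return;
  endedRounds[roundId] = true;
  endRound(roundId); // may yield safely now; it won't be re-entered
}

// Both the round timer and the "last player sent a turn" path call
// tryEndRound, and only one of them actually runs endRound.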
In a clustered environment you will have to synchronize using only the database, probably with conditional update operations and atomic operators. Something like the following:
var currentVal;
while (true) {
  currentVal = Foo.findOne(id).val; // yields
  if (Foo.update({_id: id, val: currentVal}, {$inc: {val: 1}}) > 0) {
    // Operation went as expected
    // (your code here, e.g. endRound)
    break;
  }
  else {
    // Race condition detected, try again
  }
}
The above approach is primitive and probably results in bad database performance under high loads; it also doesn't handle timers, but I'm sure with some thinking you can figure out how to extend it to work better.
You may also want to see this timers code for some other ideas. I'm going to extend it to the full setting that you described once I have some time.

How to wait for time interval?

I am busy with a node.js project communicating with an API which involves heavy use of a node library specific to that API. I have read (I think) all the existing questions about the kind of issues involved with pausing and their various solutions but still not sure how to apply a correct solution to my problem.
Simply put, I have a function I call multiple times from the API library and need to ensure they have all completed before continuing. Up to now I have managed to use the excellent caolan/async library to handle my sync/async needs but hit a block with this specific function from the API library.
The function is hideously complicated as it involves https and SOAP calling/parsing so I am trying to avoid re-writing it to behave with caolan/async, in fact I am not even sure at this stage why it is not well behaved.
It is an async function that I need to call multiple times and then wait until all the calls have completed. I have tried numerous ways of using callbacks and even promises (the q library) but just cannot get it to work as expected, as I have successfully done with the other async API functions.
Out of desperation I am hoping for a kludgy solution where I can just wait for say 5 seconds at a point in my program while all existing async functions complete but no further progress is made until 5 seconds have passed. So I want a non-blocking pause of 5 seconds if that is even possible.
I could probably do this using fibres but really hoping for another solution before I go down that route.
One simple solution to your problem would be to increment a counter every time you call your function. Then, at the end of the callback, have it emit an event. Listen for that event and, each time it's triggered, increment a separate counter. When the two counters are equal you can move on.
This would look something like this
// Assumes this code runs in a context where `this` is an EventEmitter
// (e.g. inside a class that extends EventEmitter).
var function_call_counter = 0;
var function_complete_counter = 0;
var self = this;

for (var i = 0; i < time_to_call; i++) {
  function_call_counter++;
  api_call(function() {
    self.emit('api_called');
  });
}

this.on('api_called', function() {
  function_complete_counter++;
});

var id = setInterval(function() {
  if (function_call_counter == function_complete_counter) {
    move_on();
    clearInterval(id); // This stops the checking
  }
}, 5000); // every 5 sec check to see if you can move on
Promises should also work; they just might take some finessing. You mentioned q, but you may want to check out Promises/A+.
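If you do go the promise route, a minimal sketch of the same counting idea could look like this (it reuses the hypothetical api_call, time_to_call and move_on names from the snippet above):
// Wrap each api_call in a promise that resolves when its callback fires.
function apiCallAsPromise() {
  return new Promise(function (resolve) {
    api_call(function () {
      resolve();
    });
  });
}

var pending = [];
for (var i = 0; i < time_to_call; i++) {
  pending.push(apiCallAsPromise());
}

Promise.all(pending).then(function () {
  move_on(); // runs once every api_call has completed, no polling needed
});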

In NodeJS: is it possible for two callbacks to be executed exactly at the same time?

Let's say I have this code:
var fs = require('fs');

function fn(n) {
  return function() {
    for (var k = 0; k <= 1000; ++k) {
      fs.writeSync(process.stdout.fd, n + "\n");
    }
  };
}
setTimeout(fn(1), 100);
setTimeout(fn(2), 100);
Is it possible that 1 and 2 will be printed to stdout interchangeably (e.g. 12121212121...)?
I've tested this and they did NOT appear interchangeably, i.e. 1111111...222222222..., but a few tests are far from proof and I'm worried that something like 111111211111...2222222... could happen.
In other words: when I register some callbacks and event handlers in Node can two callbacks be executed exactly at the same time?
(I know this could be possible with launching two processes, but then we would have two stdout and the above code would be splitted into separate files, etc.)
Another question: Forgetting the Node and speaking generally: in any language on single process is it possible for two functions to be executed at exactly the same time (i.e. in the same manner as above)?
No, every callback will be executed in its own "execution frame". In other languages, parallel execution, and the potential conflicts and locking it brings, is possible when operations run on different threads.
As long as the callback code is purely synchronous, no two functions can execute in parallel.
Start using something asynchronous inside, like fetching a network result or inserting into a database, and suddenly you can run into concurrency issues.
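To see that in practice, here is a small sketch (not from the original post) that replaces the synchronous loop with one that yields back to the event loop between iterations, so the two outputs can interleave even though nothing ever runs at the same instant:
function fnAsync(n) {
  return function loop(k) {
    k = k || 0;
    if (k > 10) return;
    console.log(n);
    setImmediate(function () { loop(k + 1); }); // yield to the event loop
  };
}

setTimeout(fnAsync(1), 100);
setTimeout(fnAsync(2), 100);
// Typical output now alternates 1, 2, 1, 2, ... because every iteration is a
// separate callback, yet each callback still runs to completion on its own.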

Why does node.js handle setTimeout(func, 1.0) incorrectly?

While working on a timing sensitive project, I used the code below to test the granularity of timing events available, first on my desktop machine in Firefox, then as node.js code on my Linux server. The Firefox run produced predictable results, averaging 200 fps on a 1ms timeout and indicating I had timing events with 5ms granularity.
Now I know that if I used a timeout value of 0, the Chrome V8 engine Node.js is built on would not actually delegate the timeout to an event but process it immediately. As expected, the numbers averaged 60,000 fps, clearly processing constantly at CPU capacity (and verified with top). But with a 1ms timeout the numbers were still around 3.5-4 thousand cycle()'s per second, meaning Node.js cannot possibly be respecting the 1ms timeout which would create a theoretical maximum of 1 thousand cycle()'s per second.
Playing with a range of numbers, I get:
2ms: ~100 fps (true timeout, indicating 10ms granularity of timing events on Linux)
1.5: same
1.0001: same
1.0: 3,500 - 4,500 fps
0.99: 2,800 - 3,600 fps
0.5: 1,100 - 2,800 fps
0.0001: 1,800 - 3,300 fps
0.0: ~60,000 fps
The behavior of setTimeout(func, 0) seems excusable, because the ECMAScript specification presumably makes no promise of setTimeout delegating the call to an actual OS-level interrupt. But the result for anything 0 < x <= 1.0 is clearly ridiculous. I gave an explicit amount of time to delay, and the theoretical minimum time for n calls with delay x should be (n-1)*x. What the heck is V8/Node.js doing?
var timer, counter = 0, time = new Date().getTime();

function cycle() {
  counter++;
  var curT = new Date().getTime();
  if (curT - time > 1000) {
    console.log(counter + " fps");
    time += 1000;
    counter = 0;
  }
  timer = setTimeout(cycle, 1);
}

function stop() {
  clearTimeout(timer);
}

setTimeout(stop, 10000);
cycle();
From the node.js api docs for setTimeout(cb, ms) (emphasis mine):
It is important to note that your callback will probably not be called in exactly delay milliseconds - Node.js makes no guarantees about the exact timing of when the callback will fire, nor of the ordering things will fire in. The callback will be called as close as possible to the time specified.
I suppose that "as close as possible" means something different to the implementation team than to you.
[Edit] Incidentally, it appears that the setTimeout() function isn't mandated by any specification (although apparently part of the HTML5 draft). Moreover, there appears to be a 4-10ms de-facto minimum level of granularity, so this appears to be "just how it is".
The great thing about open source software is that you can contribute a patch to include a higher resolution per your needs!
For completeness I would like to point out to the nodeJS implementation:
https://github.com/nodejs/node-v0.x-archive/blob/master/lib/timers.js#L214
Which is:
// Timeout values > TIMEOUT_MAX are set to 1.
var TIMEOUT_MAX = 2147483647; // 2^31-1

...

exports.setTimeout = function(callback, after) {
  var timer;

  after *= 1; // coalesce to number or NaN

  if (!(after >= 1 && after <= TIMEOUT_MAX)) {
    after = 1; // schedule on next tick, follows browser behaviour
  }

  timer = new Timeout(after);
  ...
};
Remember this statement:
IDLE TIMEOUTS
Because often many sockets will have the same idle timeout we will not use one timeout watcher per item. It is too much overhead.
Instead we'll use a single watcher for all sockets with the same timeout value and a linked list.
This technique is described in the libev manual:
http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#Be_smart_about_timeouts
And with these sub-millisecond delays we pass the same coalesced timeout value (1) here every time, so they all share a single watcher and linked list.
The implementation for Timer is here:
https://github.com/nodejs/node-v0.x-archive/blob/master/src/timer_wrap.cc
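To re-trace just the coalescing branch quoted above, here is a small illustrative reconstruction (not the actual module, just the same condition extracted into a standalone function):
var TIMEOUT_MAX = 2147483647; // 2^31 - 1

function coalesce(after) {
  after *= 1; // coalesce to number or NaN
  if (!(after >= 1 && after <= TIMEOUT_MAX)) {
    after = 1;
  }
  return after;
}

console.log(coalesce(0.5));  // 1
console.log(coalesce(0.99)); // 1
console.log(coalesce(1.0));  // 1
console.log(coalesce(1.5));  // 1.5
console.log(coalesce(2));    // 2
By that code path, every delay at or below 1ms ends up as the same 1ms Timeout, which matches the measurements in the question: 1.5ms and 2ms behave like real timeouts, while everything from just above 0 up to 1.0 is treated identically.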
