Why is this Node.js setInterval running slow?

The following code uses setInterval to increment tick once every 16ms and logs the number of ticks per second. I'm expecting this to report ~60 ticks per second (since 1000ms / 16ms ≈ 60), but instead I'm seeing about 35-40. This is on a powerful Windows desktop under minimal load (Node v14.5.4). Running the same code in the browser produces the expected results.
let lastTick = 0;
let tick = 0;

setInterval(() => {
  tick++;
}, 16);

setInterval(() => {
  console.log("ticks per second: " + (tick - lastTick));
  lastTick = tick;
}, 1000);
What am I missing?
Update
So running with node --inspect and then profiling the JavaScript boosts the values up to 60 per second, but only while the profiler is running. It must be some sort of strange CPU throttling of low-CPU processes, but I can't find any cause.

I cannot prove this, but anecdotally Chrome seems to perform reasonably reliably for intervals in the tens-of-ms range because it is doing more work (rendering pages, reading user input, etc.) and so doesn't fall foul of CPU scheduling and miss the 'correct' time window for the setInterval tick. Node, which isn't doing much beyond running the code provided, regularly gets de-scheduled and therefore misses its interval.
This theory is supported by the fact that increasing the work that Node is doing (i.e. running the profiler or adding more unrelated work between ticks) improves the frequency of the setInterval ticks.
As Node has no reliable timing function in the ms range (aside from running a tight loop with setImmediate), I've offloaded the timing work to another system.
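For illustration, here is a minimal sketch of the setImmediate tight-loop idea mentioned above (the names and the 16ms constant are just taken from the question): derive the tick count from elapsed wall-clock time while keeping the event loop busy, at the cost of pinning one CPU core.

let tick = 0;
let lastTick = 0;
const start = Date.now();

function pump() {
  // Derive ticks from elapsed time so a late wake-up can't lose them.
  tick = Math.floor((Date.now() - start) / 16);
  setImmediate(pump); // keeps the process busy; burns one CPU core
}
pump();

setInterval(() => {
  console.log("ticks per second: " + (tick - lastTick));
  lastTick = tick;
}, 1000);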

Related

Random slowdowns in node.js execution

I have an optimization algorithm written in Node.js that uses CPU time (measured with performance.now()) as a heuristic.
However, I noticed that occasionally some trivial lines of code would cost much more than usual.
So I wrote a test program:
const _ = require("lodash");
const { performance } = require("perf_hooks");

const timings = [];
while (true) {
  const start = performance.now();
  // can add any trivial line of code here, or just nothing
  const end = performance.now();
  const dur = end - start;
  if (dur > 1) {
    throw [
      "dur > 1",
      {
        start,
        end,
        dur,
        timings,
        avg: _.mean(timings),
        max: _.max(timings),
        min: _.min(timings),
        last: timings.slice(-10),
      },
    ];
  }
  timings.push(dur);
}
The measurements showed an average of 0.00003 ms and a peak of more than 1 ms (with the second-highest below 1 ms but in the same order of magnitude).
The possible reasons I can think of are:
the average timing isn't the actual time for executing the code (some compiler optimization)
performance.now isn't accurate somehow
CPU scheduling related - the process wasn't running normally but the time still counted in performance.now
occasionally Node is doing something extra behind the scenes (GC etc.)
something happening at the hardware/OS level - caching / page faults etc.
Is any of these a likely reason, or is it something else?
Whichever the cause is, is there a way to make a more accurate measurement for the algorithm to use?
The outliers are currently causing the algorithm to misbehave, and without knowing how to resolve this issue the best option is to use the moving-average cost as a heuristic, which has its own downsides.
Thanks in advance!
------- Edit
I appreciate that performance.now() will never be perfectly accurate, but I was a bit surprised that it could span 3-4 orders of magnitude (as opposed to 2 orders of magnitude, or ideally 1).
Would anyone have any idea/pointers as to how performance.now() works and thus what's likely the major contributor to the error range?
It'd be nice to know if the cause is due to something node/v8 doesn't have control over (hardware/os level) vs something it does have control over (a node bug/options/gc related), so I can decide whether there's a way to reduce the error range before considering other tradeoffs with using an alternative heuristic.
------- Edit 2
Thanks to @jfriend00 I now realize performance.now() doesn't measure the actual CPU time the node process executed, but just the time since the process started.
The question now is
if there's an existing way to get actual CPU time
is this a feature request for node/v8
unless the node process doesn't have enough information from the OS to provide this
You're unlikely to be able to accurately measure the time for one trivial line of code. In fact, the overhead in executing performance.now() is probably many times higher than the time to execute one trivial line of code. You have to be careful that what you're measuring takes substantially longer to execute than the uncertainty or overhead of the measurement itself. Measuring very small execution times is not going to be an accurate endeavor.
1, 3 and 5 in your list are also all possibilities. You aren't guaranteed that your code gets a dedicated CPU core that is never interrupted to service some other thread in the system. On my Windows system, even when my nodejs app is the only "app" running, there are hundreds of other threads devoted to various OS services that may or may not request some time to run while my nodejs app is running and eventually get some time slice of the CPU core my nodejs app was using.
And, as best I know, performance.now() is just getting a high resolution timer from the OS that's relative to some epoch time. It has no idea when your thread is and isn't running on a CPU core and wouldn't have any way to adjust for that. It just gets a high resolution timestamp which you can compare to some other high resolution timestamp. The time elapsed is not CPU time for your thread. It's just clock time elapsed.
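To get a feel for how large the measurement overhead itself is, a rough sketch like the following (using Node's perf_hooks) times back-to-back calls to performance.now() with nothing in between; the exact numbers will vary by machine.

const { performance } = require("perf_hooks");

const N = 1e6;
const t0 = performance.now();
for (let i = 0; i < N; i++) {
  performance.now(); // measure the cost of the measurement itself
}
const t1 = performance.now();
console.log(`~${(((t1 - t0) / N) * 1e6).toFixed(1)} ns per performance.now() call`);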
Is any of these a likely reason, or is it something else?
Yes, they all sound likely.
is there a way to make a more accurate measurement for the algorithm to use?
No, sub-millisecond time measurements are generally not reliable, and almost never a good idea. (Doesn't matter whether a timing API promises micro/nanosecond precision or whatever; chances are that (1) it doesn't hold up in practice, and (2) trying to rely on it creates more problems than it solves. You've just found an example of that.)
Even measuring milliseconds is fraught with peril. I once investigated a case of surprising performance, where it turned out that on that particular combination of hardware and OS, after 16ms of full load the CPU ~tripled its clock rate, which of course had nothing to do with the code that appeared to behave weirdly.
EDIT to reply to edited question:
The question now is
if there's an existing way to get actual CPU time
No.
is this a feature request for node/v8
No, because...
unless the node process doesn't have enough information from the OS to provide this
...yes.

Executing a function periodically on accurate and precise intervals

I want to implement an accurate and precise countdown timer for my application. I started with the most simple implementation, which was not accurate at all.
loop {
    // Code which can take up to 10 ms to finish
    ...

    let interval = std::time::Duration::from_millis(1000);
    std::thread::sleep(interval);
}
As the code before the sleep call can take some time to finish, I cannot run the next iteration at the intended interval. Even worse, if the countdown timer is run for 2 minutes, the 10 milliseconds from each iteration add up to 1.2 seconds. So, this version is not very accurate.
I can account for this delay by measuring how much time this code takes to execute.
loop {
    let start = std::time::Instant::now();

    // Code which can take up to 10 ms to finish
    ...

    let interval = std::time::Duration::from_millis(1000);
    std::thread::sleep(interval - start.elapsed());
}
Even though this seems to be precise down to the millisecond, I wanted to know if there is a way to implement this that is even more accurate and precise, and/or how it is usually done in software.
For precise timing, you basically have to busy wait: while time.elapsed() < interval {}. This is also called "spinning" (you might have heard of a "spin lock"). Of course, this is far more CPU intensive than using the OS-provided sleep functionality (which often transitions the CPU into some low-power mode).
To improve upon that slightly, instead of doing absolutely nothing in the loop body, you could:
Call thread::yield_now().
Call std::hint::spin_loop().
Unfortunately, I can't really tell you what timing guarantees these two functions give you. But from the documentation it seems like spin_loop will result in more precise timing.
Also, you very likely want to combine the "spin waiting" with std::thread::sleep so that you sleep for the majority of the time with the latter method. That saves a lot of power/CPU resources. And hey, there is even a crate for exactly that: spin_sleep. You should probably just use that.
Finally, just in case you are not aware: for several use cases of these "timings", there are other functions you can use. For example, if you want to render a frame every 60th of a second, you want to use some API that synchronizes your loop with the refresh rate/v-blanking of the monitor directly, instead of manually sleeping.
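For readers arriving from the Node.js questions above, the same sleep-then-spin idea can be sketched in JavaScript as well (this is illustrative only, not the spin_sleep crate, and preciseDelay/spinMs are made-up names): sleep with a timer for most of the interval, then busy-wait the last couple of milliseconds.

const { performance } = require("perf_hooks");

function preciseDelay(ms, spinMs = 2) {
  return new Promise((resolve) => {
    const target = performance.now() + ms;
    setTimeout(() => {
      while (performance.now() < target) { /* spin for the final ~spinMs */ }
      resolve();
    }, Math.max(0, ms - spinMs));
  });
}

// Usage: fire roughly every 100 ms with much less jitter than a bare setTimeout.
(async () => {
  for (let i = 0; i < 5; i++) {
    const before = performance.now();
    await preciseDelay(100);
    console.log((performance.now() - before).toFixed(3), "ms");
  }
})();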

Handling large number of outbound HTTP requests

I am building a feed reader application where I expect to have a large number of sources. I would request new data from each source at a given time interval (e.g., hourly) and then cache the response on my server. I am assuming that requesting data from all sources at the same time is not optimal, as I will probably experience network congestion (I am curious to know if there would be any other bottlenecks too).
What would be an efficient way to perform such a large number of requests?
Thanks
Since there's no urgency to any given request and you just want to make sure you hit each of them periodically, you can just space all the requests out in time.
For example, if you have N sources and you want to hit each one once an hour, you can just create a list of all the sources, and keep track of an index for which source is next. Then, calculate how far apart you can make each request and still get through them all in an hour.
So, if you had N requests to process once an hour:
let listOfSources = [...];
let nextSourceIndex = 0;
const cycleTime = 1000 * 60 * 60; // an hour in ms
const delta = Math.round(cycleTime / listOfSources.length);

// create interval timer that cycles through the sources
setInterval(() => {
  let index = nextSourceIndex++;
  if (index >= listOfSources.length) {
    // wrap back to start
    index = 0;
    nextSourceIndex = 1;
  }
  processNextSource(listOfSources[index]);
}, delta);
function processNextSource(item) {
// process this source
}
Note, if you have a lot of sources and it takes a little while to process each one, you may still have more than one source "in flight" at the same time, but that should be OK.
If the processing was really CPU or network heavy, you would have to keep an eye on whether you're getting bogged down and can't get through all the sources in an hour. If that was the case, depending upon the bottleneck issue, you may need either more bandwidth, faster storage or more CPUs applied to the project (perhaps using worker threads or child processes).
If the number of sources is dynamic or the time to process each is dynamic and you're anywhere near your processing limits, you could make this system adaptable so that if it was getting overly busy, it would just automatically space things out more than once an hour or vice versa, if things were not so busy it could visit them more frequently. This would require keeping track of some stats and calculating a new cycleTime variable and adjusting the timer each time through the cycle.
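One possible shape for that adaptive variant is sketched below, replacing the fixed setInterval above with a re-scheduled setTimeout; the 0.8/0.2 thresholds and the bounds are made up purely for illustration.

let cycleTime = 1000 * 60 * 60; // target: one full pass per hour
let nextSourceIndex = 0;

function scheduleNext() {
  const delta = Math.round(cycleTime / listOfSources.length);
  setTimeout(async () => {
    const started = Date.now();
    await processNextSource(listOfSources[nextSourceIndex]);
    nextSourceIndex = (nextSourceIndex + 1) % listOfSources.length;

    // If processing ate most of its slot, stretch the cycle; if it was
    // mostly idle, shrink it back toward the one-hour target (bounded).
    const busyFraction = (Date.now() - started) / delta;
    if (busyFraction > 0.8) {
      cycleTime = Math.min(cycleTime * 1.25, 4 * 60 * 60 * 1000);
    } else if (busyFraction < 0.2) {
      cycleTime = Math.max(cycleTime * 0.9, 60 * 60 * 1000);
    }
    scheduleNext();
  }, delta);
}
scheduleNext();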
There are different types of approaches, too. A common procedure when you have a large number of asynchronous operations to get through is to process them in a way that N of them are in flight at any given time (where N is a relatively small number such as 3 to 10). This generally avoids overloading any local resources (such as memory usage, sockets in flight, bandwidth, etc...) while still allowing you to do some parallelism in the network aspect of things. This would be the type of approach you might use if you want to get through all of them as fast as possible without overwhelming local resources, whereas the previous discussion is more about spacing them out in time.
Here's an implementation of a function called mapConcurrent() that iterates an array asynchronously with no more than N requests in flight at the same time. And, here's a function called rateMap() that is even more advanced in what type of concurrency controls it supports.
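The linked implementations aren't reproduced here, but the core idea behind such a helper looks roughly like this sketch (not the actual mapConcurrent() code): start up to maxInFlight promises, and start another each time one settles.

function mapConcurrentSketch(items, maxInFlight, fn) {
  return new Promise((resolve, reject) => {
    const results = new Array(items.length);
    let nextIndex = 0;
    let inFlight = 0;
    let failed = false;

    if (items.length === 0) return resolve(results);

    function runMore() {
      while (!failed && inFlight < maxInFlight && nextIndex < items.length) {
        const i = nextIndex++;
        inFlight++;
        Promise.resolve(fn(items[i], i)).then((val) => {
          results[i] = val;
          inFlight--;
          if (nextIndex >= items.length && inFlight === 0) {
            resolve(results); // everything dispatched and finished
          } else {
            runMore();
          }
        }, (err) => {
          failed = true;
          reject(err);
        });
      }
    }
    runMore();
  });
}

// Example (fetchSource is hypothetical): no more than 5 requests in flight.
// mapConcurrentSketch(listOfSources, 5, fetchSource).then(console.log, console.error);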

Node.js setTimeout loop stopped after many weeks of iterations

Solved: it's a Node bug that happens after ~25 days (2^31 milliseconds). See the answer for details.
Is there a maximum number of iterations of this cycle?
function do_every_x_seconds() {
  // Do some basic things to get x
  setTimeout(do_every_x_seconds, 1000 * x);
}
As I understand, this is considered a "best practice" way of getting things to run periodically, so I very much doubt it.
I'm running an Express server on Ubuntu with a number of timeout loops:
One loop that runs every second and basically prints the timestamp.
One that makes an external HTTP request every 5 seconds.
And one that runs every X seconds, where X is between 30 and 300.
It all seems to work well enough. However, after 25 days without any usage, and several million iterations later, the Node instance is still up, but all three of the setTimeout loops have stopped. No error messages are reported at all.
Even stranger is that the Express server is still up, and I can load HTTP pages, which print to the same console as where the periodic timestamp was being printed.
I'm not sure if it's related, but I also run Node.js with the --expose-gc flag and perform periodic garbage collection to monitor that memory stays in acceptable ranges.
It is a development server, so I have left the instance up in case there is some advice on what I can do to look further into the issue.
Could it be that somehow the event loop dropped all its timers?
I have a similar problem with setInterval().
I think it may be caused by the following bug in Node.js, which seems to have been fixed recently: setInterval callback function unexpected halt #22149
Update: it seems the fix has been released in Node.js 10.9.0.
I think the problem is that you are relying on setTimeout to be active over days. setTimeout is great for periodic running of functions, but I don't think you should trust it over extended time periods. Consider this question: can setInterval drift over time? and one of its linked issues: setInterval interval includes duration of callback #7346.
If you need to have things happen intermittently at particular times, a better way to attack this would be to schedule cron tasks that perform the tasks instead. They are more resilient and failures are recorded at a system level in the journal rather than from within the node process.
A good related answer/question is Node.js setTimeout for 24 hours - any caveats? which mentions using the npm package cron to do task scheduling.
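For illustration, here is a minimal sketch using the npm cron package mentioned above (assuming its CronJob API; the schedule string is just an example) instead of keeping a multi-week timer alive.

const { CronJob } = require("cron");

// Run the periodic work at the top of every minute.
const job = new CronJob("0 * * * * *", () => {
  console.log("tick", new Date().toISOString());
  // do the periodic work here
});
job.start();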

Performance issue while using Parallel.ForEach() with MaxDegreeOfParallelism set to ProcessorCount

I wanted to process records from a database concurrently and in minimum time. So I thought of using a Parallel.ForEach() loop to process the records with MaxDegreeOfParallelism set to ProcessorCount.
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount;

Parallel.ForEach(listUsers, po, (user) =>
{
    // Parallel processing
    ProcessEachUser(user);
});
But to my surprise, the CPU utilization was not even close to 20%. When I dug into the issue and read the MSDN article on this (http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx), I tried setting MaxDegreeOfParallelism to -1. As the article says, this value removes the limit on the number of concurrently running operations, and the performance of my program improved considerably.
But that still didn't meet my requirement for the maximum time taken to process all the records in the database. So I analyzed further and found that the thread pool has two settings, MinThreads and MaxThreads. By default the values of MinThreads and MaxThreads are 10 and 1000 respectively. On start only 10 threads are created, and this number keeps increasing towards the maximum of 1000 with every new user, unless a previous thread has finished its execution.
So I set the initial value of MinThreads to 900 in place of 10 using
System.Threading.ThreadPool.SetMinThreads(900, 900);
so that a minimum of 900 threads is created right from the start, thinking that this would improve the performance significantly. It did create 900 threads, but it also greatly increased the number of failures while processing each user, so I did not achieve much with this logic. I then changed the value of MinThreads to 100 and found that the performance was much better.
But I wanted to improve further, as my time requirement was still not met: it was still exceeding the time limit to process all the records. As you may be thinking, I was already using all the best possible approaches to get maximum performance out of parallel processing, and I was thinking the same.
But to meet the time limit I thought of taking a shot in the dark. I created two different executable files (slaves) in place of only one and assigned each of them half of the users from the DB. Both executables were doing the same thing and were executing concurrently. I created another master program to start these two slaves at the same time.
To my surprise, it reduced the time taken to process all the records to nearly half.
Now my question is simply that I do not understand the logic behind this master/slave approach giving better performance compared to a single EXE, with the logic being the same in both the slaves and the previous EXE. So I would highly appreciate it if someone could explain this in detail.
But to my surprise, the CPU utilization was not even close to 20%.
…
It uses HTTP requests to some Web APIs hosted on other networks.
This means that CPU utilization is entirely the wrong thing to look at. When using the network, it's your network connection that's going to be the limiting factor, or possibly some network-related limit, certainly not CPU.
Now I created two different executable files … To my surprise, it reduced the time taken to process all the records nearly to the half.
This points to an artificial, per-process limit, most likely ServicePointManager.DefaultConnectionLimit. Try setting it to a larger value than the default at the start of your program and see if it helps.

Resources