Is reproducible benchmarking possible?

Is reproducible benchmarking possible? - node.js

I need to test some node frameworks, or at least their routing part. That means from the request arrives at the node process for processing until a route has been decided and a function/class with the business logic is called, e.g. just before calling it. I have looked hard and long for a suitable approach, but concluded that it must be done directly in the code and not using an external benchmark tool. I fear measuring the wrong attributes. I tried artillery and ab but they measure a lot more attributes then I want to measure, like RTT, bad OS scheduling, random tasks executing in the OS and so on. My initial benchmarks for my custom routing code using process.hrtime() shows approx. 0.220 ms (220 microseconds) execution time but the external measure shows 0.700 (700 microseconds) which is not an acceptable difference, since it's 3.18x additional time. Sometimes execution time jumps to 1.x seconds due to GC or system tasks. Now I wonder how a reproducible approach would look like? Maybe like this:
Use Docker with Scientific Linux to get a somewhat controlled environment.
A minimal docker container install, node enabled container only, no extras.
Store time results in global scope until test is done and then save to disk.
Terminate all applications with high/moderate diskIO and/or CPU on host OS.
Measure time as explained before and crossing my fingers.
Any other recommendations to take into consideration?

Related

Random slowdowns in node.js execution

I have an optimization algorithm written in node.js that uses cpu time (measured with performance.now()) as a heuristic.
However, I noticed that occasionally some trivial lines of code would cost much more than usual.
So I wrote a test program:
const timings = [];
while (true) {
const start = performance.now();
// can add any trivial line of code here, or just nothing
const end = performance.now();
const dur = end - start;
if (dur > 1) {
throw [
"dur > 1",
{
start,
end,
dur,
timings,
avg: _.mean(timings),
max: _.max(timings),
min: _.min(timings),
last: timings.slice(-10),
},
];
}
timings.push(dur);
}
The measurements showed an average of 0.00003ms and a peak >1ms (with the second highest <1ms but same order of magnitude).
The possible reasons I can think of are:
the average timing isn't the actual time for executing the code (some compiler optimization)
performance.now isn't accurate somehow
cpu scheduling related - process wasn't running normally but still counted in performance.now
occasionally node is doing something extra behind the scenes (GC etc)
something happening on the hardware/os level - caching / page faults etc
Is any of these a likely reason, or is it something else?
Whichever the cause is, is there a way to make a more accurate measurement for the algorithm to use?
The outliers are current causing the algorithm to misbehave & without knowing how to resolve this issue the best option is to use the moving average cost as a heuristic but has its downsides.
Thanks in advance!
------- Edit
I appreciate how performance.now() will never be accurate, but was a bit surprised that it could span 3-4 orders of magnitude (as opposed to 2 orders of magnitude or ideally 1.)
Would anyone have any idea/pointers as to how performance.now() works and thus what's likely the major contributor to the error range?
It'd be nice to know if the cause is due to something node/v8 doesn't have control over (hardware/os level) vs something it does have control over (a node bug/options/gc related), so I can decide whether there's a way to reduce the error range before considering other tradeoffs with using an alternative heuristic.
------- Edit 2
Thanks to #jfriend00 I now realize performance.now() doesn't measure the actual CPU time the node process executed, but just the time since when the process started.
The question now is
if there's an existing way to get actual CPU time
is this a feature request for node/v8
unless the node process doesn't have enough information from the OS to provide this

You're unlikely to be able to accurately measure the time for one trivial line of code. In fact, the overhead in executing performance.now() is probably many times higher than the time to execute one trivial line of code. You have to be careful that what you're measuring takes substantially longer to execute than the uncertainty or overhead of the measurement itself. Measuring very small executions times is not going to be an accurate endeavor.
1,3 and 5 in your list are also all possibilities. You aren't guaranteed that your code gets a dedicated CPU core that is never interrupted to service some other thread in the system. In my Windows system, even when my nodejs is the only "app" running, there are hundreds of other threads devoted to various OS services that may or may not request some time to run while my nodejs app is running and eventually get some time slice of the CPU core my nodejs app was using.
And, as best I know, performance.now() is just getting a high resolution timer from the OS that's relative to some epoch time. It has no idea when your thread is and isn't running on a CPU core and wouldn't have any way to adjust for that. It just gets a high resolution timestamp which you can compare to some other high resolution timestamp. The time elapsed is not CPU time for your thread. It's just clock time elapsed.

Is any of these a likely reason, or is it something else?
Yes, they all sound likely.
is there a way to make a more accurate measurement for the algorithm to use?
No, sub-millisecond time measurements are generally not reliable, and almost never a good idea. (Doesn't matter whether a timing API promises micro/nanosecond precision or whatever; chances are that (1) it doesn't hold up in practice, and (2) trying to rely on it creates more problems than it solves. You've just found an example of that.)
Even measuring milliseconds is fraught with peril. I once investigated a case of surprising performance, where it turned out that on that particular combination of hardware and OS, after 16ms of full load the CPU ~tripled its clock rate, which of course had nothing to do with the code that appeared to behave weirdly.
EDIT to reply to edited question:
The question now is
if there's an existing way to get actual CPU time
No.
is this a feature request for node/v8
No, because...
unless the node process doesn't have enough information from the OS to provide this
...yes.

Sleep() Methods and OS - Scheduler (Camunda/Groovy)

I got a question for you guys and its not as specific as usual, which could make it a little annoying to answer.
The tool i'm working with is Camunda in combination with Groovy scripts and the goal is to reduce the maximum cpu load (or peak load). I'm doing this by "stretching" the work load over a certain time frame since the platform seems to be unhappy with huge work load inputs in a short amount of time. The resulting problem is that Camunda wont react smoothly when someone tries to operate it at the UI - Level.
So i wrote a small script which basically just lets each individual process determine his own "time to sleep" before running, if a certain threshold is exceeded. This is based on how many processes are trying to run at the same time as the individual process.
It looks like:
Process wants to start -> Process asks how many other processes are running ->
waitingTime = numberOfProcesses * timeToSleep * iterationOfMeasures
CPU-Usage Curve 1,3 without the Script. Curve 2,4 With the script
Testing it i saw that i could stretch the work load and smoothe out the UI - Levels. But now i need to describe why this is working exactly.
The Questions are:
What does a sleep method do exactly ?
What does the sleep method do on CPU - Level?
How does an OS-Scheduler react to a Sleep Method?
Namely: Does the scheduler reschedule or just simply "wait" for the time given?
How can i recreate and test the question given above?
The main goal is not for you to answer this, but could you give me a hint for finding the right Literature to answer these questions? Maybe you remember a book which helped you understand this kind of things or a Professor recommended something to you. (Mine wont answer, and i cant blame him)
I'm grateful for hints and or recommendations !

i'm sure you could use timer event
https://docs.camunda.org/manual/7.15/reference/bpmn20/events/timer-events/
it allows to postpone next task trigger for some time defined by expression.
about sleep in java/groovy: https://www.javamex.com/tutorials/threads/sleep.shtml
using sleep is blocking current thread in groovy/java/camunda.
so instead of doing something effective it's just blocked.

Why does the response time curve of NodeJS API become sinus like under load?

I am currently performing an API Load Test on my NodeJS API using JMeter and am completely new to the field. The API is deployed on an IBM Virtual Server with 4 vCPUs and 8GB of RAM.
One of my load tests includes stress testing the API in a 2500 thread (users) configuration with a ramp-up period of 2700ms (45 min) on infinite loop. The goal is not to reach 2500 threads but rather to see at what point my API would throw its first error.
I am only testing one endpoint on my API, which performs a bubble sort to simulate a CPU intensive task. Using Matplotlib I plotted the results of the experiment. I plotted the response time in ms over the active threads.
I am unsure why the response time curve becomes sinus like once crossing roughly 1100 Threads. I expected the response time curve keep rising in the same manner it does in the beginning (0 - 1100 threads). Is there an explanation for the sinus like behaviour of the curve towards the end?
Thank you!
Graph:
Red - Errors
Blue - Response time

There could be 2 possible reasons for this:
Your application cannot handle such a big load and performs frequent garbage collection in order to free up resources or tasks are queuing up as application cannot process them as they come. You can try using i.e. JMeter PerfMon Plugin to ensure that the system under test doesn't lack CPU or RAM
JMeter by default comes up with relatively low JVM Heap size and a very little GC tuning (like it's described in Concurrent, High Throughput Performance Testing with JMeter article where the guy has very similar symptoms) so it might be the case JMeter cannot send requests fast enough, make sure to follow JMeter Best Practices and consider going for distributed testing if needed.

Measuring a feature's share of a web service's execution time

I have a piece of code that includes a specific feature that I can turn on and off. I want to know the execution time of the feature.
I need to measure this externally, i.e. by simply measuring execution time with a load test tool. Assume that I cannot track the feature's execution time internally.
Now, I execute two runs (on/off) and simply assume that the difference between the resulting execution time is my feature's execution time.
I know that it is not entirely correct to do this as I'm looking at two separate runs that may be influenced by networking, programmatic overhead, or the gravitational pull of the moon. Still, I hope I can assume that the result will still be viable if I have a sufficiently large number of requests.
Now for the real question. I do the above using the average response time. Which is not perfect, but more or less ok.
My question is, what if I now use a percentile (say, 95th) instead?
Would my imperfect subtract-A-from-B approach become significantly more imperfect when using percentiles?

I would stick to the percentiles as the "average" approach can mask the problem, for example if you have very low response times during the initial phase of the test when the load is low and very high response times during the main phase of test when the load is immense the arithmetic mean approach will give you okayish values while with the percentiles you will get the information that the response time for 95% of requests was X or higher.
More information: Understanding Your Reports: Part 3 - Key Statistics Performance Testers Need to Understand

Fast CPU masks test errors

I have a Node.js application which I'm testing with Mocha on a fast(ish) dev machine. I've noticed that sometimes the fast CPU will mask some errors. If the tests are run on a machine with a slower CPU these errors starting showing up.
Question: Is there an easy way to temporarily slow down or simulate a slow down in CPU processing to surface these errors? Or way to run these tests at full speed and still discover this type of error?

One possible reason for these discrepancies is that certain functions might take longer to run depending on the machine they're running on, for instance if it involves heavy computation, or reading from a DB. This might modify the order in which callbacks are called.
To work around this problem, you can get more control over the order in which parallel sequences of operations are run by using (for instance) Sinon.js in your tests: it has great spy/stub features, and also ships fake timers.
By mocking (stubbing) the async functions which take time to run, you can remove the speed factor (machine-dependent). Also, fake timers allow getting control over functions wrapped in setTimeout or setInterval

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string