Is there an optimized version of V8 for server-side JavaScript (Node, primarily)? I ask because I assume normal V8 is optimized for Chrome, and thus for client-side JavaScript.
It used to be the case that V8's memory management was not optimized for very big heaps. However, with the new GC starting in V8 version 3.7, that should be history. Run with the --max-old-space-size=8192 flag and you can have an 8 GB heap instead of the normal 1.4 GB limit.
If short pauses are very important to you, you can also use the --max-new-space-size=2048 flag. This will reduce peak performance, but shorten the pauses from somewhere around 100 ms to more like 20 ms. On the other hand, if you only care about peak performance and do not care about long pause times, you can use the --noincremental-marking flag. With this flag you can expect pause times of around 1 second per gigabyte, so it would mainly be useful for small heaps or batch processing tasks.
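To check that a heap flag such as --max-old-space-size actually took effect, the limit can be read back from inside the process. This is a minimal sketch using Node's built-in `v8` module; the helper names `toMegabytes` and `describeHeapLimit` and the file name in the comment are made up for illustration, and the reported limit may not match the flag value exactly because it can include other heap spaces.

```javascript
// Hypothetical usage: node --max-old-space-size=8192 check-heap.js
const v8 = require('v8');

// Convert a byte count to whole megabytes for readable output.
function toMegabytes(bytes) {
  return Math.round(bytes / (1024 * 1024));
}

// Report the heap limit and current usage of this process.
function describeHeapLimit() {
  const stats = v8.getHeapStatistics();
  return {
    heapSizeLimitMb: toMegabytes(stats.heap_size_limit),
    usedHeapMb: toMegabytes(stats.used_heap_size),
  };
}

console.log(describeHeapLimit());
```

Running the script with and without the flag and comparing `heapSizeLimitMb` shows whether the setting was picked up.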
I have a real-time application in Node.js that listens to multiple websockets and reacts to their events by placing HTTPS requests; it runs continuously. I noticed that the response time, at many points during execution, was much higher than the expected network latency, which led me to investigate the problem. It turns out that the garbage collector was running many times in a row, adding significant latency (5 s, up to 30 s) to the run time.
The problem seems to be related to the frequent scheduling of Scavenge rounds to free up resources, due to allocation failure. Although each round takes less than 100 ms to execute, thousands of rounds in a row add up to many seconds. My only guess is that at some point the allocated heap is close to its limit and little memory is actually freed in each GC round, which keeps triggering GC in a long loop. The application does not seem to be leaking memory, because memory allocation never actually fails. It just seems to be hitting an edge that triggers GC madness.
Could someone with knowledge or experience share some tips on how to use the GC options that V8 provides to overcome such a situation? I have tried --max_old_space_size and --nouse-idle-notification (as suggested by the few articles that tackle this problem), but I do not fully understand the internals of the Node GC implementation or the effects of the available options. I would like to have more control over when a Scavenge round runs, or at least to increase the interval between successive rounds so that each one becomes more efficient.
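Before tuning flags, it may help to measure how long the collector really pauses the process. The following is a sketch, not a fix: it uses Node's built-in `perf_hooks` module to observe `gc` performance entries, and the helper names `summarizePauses` and `startGcMonitor` are invented for this example.

```javascript
const { PerformanceObserver } = require('perf_hooks');

// Summarize a batch of entries that each carry a `duration` in milliseconds.
function summarizePauses(entries) {
  let totalMs = 0;
  let maxMs = 0;
  for (const entry of entries) {
    totalMs += entry.duration;
    if (entry.duration > maxMs) maxMs = entry.duration;
  }
  return { count: entries.length, totalMs, maxMs };
}

// Observe GC events and pass a summary of each delivered batch to `report`.
function startGcMonitor(report) {
  const observer = new PerformanceObserver((list) => {
    report(summarizePauses(list.getEntries()));
  });
  observer.observe({ entryTypes: ['gc'] });
  return observer; // call observer.disconnect() to stop observing
}
```

Calling `startGcMonitor(console.log)` at startup would print batches of GC activity, which makes it easier to see whether many short collections or a few long ones account for the latency.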
I'm reading about V8 GC here. As this new GC uses worker threads to perform concurrent marking, I wonder if the overall performance is better when there is more than one CPU. Will GC run faster? Has anyone compared both scenarios?
My app is not clustered.
Yes, you will only get a speed benefit from concurrent operations (in V8 or elsewhere) if you have more than one CPU core.
The actual performance impact depends on the specifics of your app, so you'll have to measure it yourself if you want results that actually apply to your case. As a rough guess, I would expect "a couple percent" of overall throughput difference: most of JavaScript is single-threaded, and in most apps garbage collection accounts for about 2-10% of CPU load.
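To see where a figure like "a couple percent" can come from, here is a back-of-the-envelope sketch. It assumes that GC takes a fraction of main-thread CPU time and that some share of that work moves to another core; the function name `estimateSpeedup` and the 50% offload share are assumptions chosen for illustration, not measurements.

```javascript
// Estimated throughput ratio (new / old) when part of the GC work leaves the main thread.
// gcFraction: share of main-thread time spent in GC (the answer suggests 0.02 to 0.10).
// offloadedShare: assumed share of that GC work done concurrently on another core.
function estimateSpeedup(gcFraction, offloadedShare) {
  const remainingMainThreadTime = 1 - gcFraction * offloadedShare;
  return 1 / remainingMainThreadTime;
}

for (const gcFraction of [0.02, 0.10]) {
  const gainPercent = (estimateSpeedup(gcFraction, 0.5) - 1) * 100;
  console.log(`GC share ${gcFraction}: about ${gainPercent.toFixed(1)}% more throughput`);
}
```

Under these assumptions the gain stays in the low single digits, which is why measuring your own app is the only way to get a number you can rely on.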
I'm looking into making a real time game with OpenGL and D, and I'm worried about the garbage collector. From what I'm hearing, this is a possibility:
10 frames run
Garbage collector kicks in automatically and runs for 10ms
10 frames run
Garbage collector kicks in automatically and runs for 10ms
10 frames run
and so on
This could be bad because it causes stuttering. However, if I force the garbage collector to run consistently, for example with GC.collect, will it make my game smoother? Like so:
1 frame runs
Garbage collector runs for 1-2ms
1 frame runs
Garbage collector runs for 1-2ms
1 frame runs
and so on
Would this approach actually work and make my framerate more consistent? I'd like to use D but if I can't make my framerate consistent then I'll have to use C++11 instead.
I realize that it might not be as efficient, but the important thing is that it will be smoother, at a more consistent framerate. I'd rather have a smooth 30 fps than a stuttering 35 fps, if you know what I mean.
Yes, but it will likely not make a dramatic difference.
The bulk of time spent in a GC cycle is the "mark" stage, where the GC visits every allocated memory block (which is known to contain pointers) transitively, from the root areas (static data, TLS, stack and registers).
There are several approaches to optimize an application's memory so that D's GC makes a smaller impact on performance:
Use bulk allocation (allocate objects in bulk as arrays)
Use custom allocators (std.allocator is on its way, but you could use your own or third party solutions)
Use manual memory management, like in C++ (you can use RefCounted as you would use shared_ptr)
Avoid memory allocation entirely during gameplay, and preallocate everything beforehand instead
Disable the GC, and run collections manually when it is more convenient
Generally, I would not recommend being concerned about the GC before writing any code. D provides the tools to avoid the bulk of GC allocations. If you keep the managed heap small, GC cycles will likely not take long enough to interfere with your application's responsiveness.
If you were to run the GC every frame, you still would not get a smooth run, because you could have different amounts of garbage every frame.
You're left then with two options, both of which involve turning off the GC:
Use (and re-use) pre-allocated memory (structs, classes, arrays, whatever) so that you do not allocate during a frame, and do not need to.
Just run and eat up memory.
For both of these, you would call GC.disable() before you start your frames and GC.enable() after you're finished with all your frames (at the end of the battle or whatever).
The first option is the one which most high performance games use anyway, regardless of whether they're written in a language with a GC. They simply do not allocate or de-allocate during the main frame run. (Which is why you get the "loading" and "unloading" before and after battles and the like, and there are usually hard limits on the number of units.)
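The pre-allocation idea is independent of the language. As a rough sketch, here is a fixed-size object pool written in JavaScript (chosen only to match the Node.js threads on this page; in D the pool would hold structs or class instances instead). The `ParticlePool` class and its fields are hypothetical names for illustration.

```javascript
// A fixed-size pool: every object is created up front and recycled afterwards,
// so the frame loop itself does not create new garbage.
class ParticlePool {
  constructor(capacity) {
    this.free = [];
    for (let i = 0; i < capacity; i++) {
      this.free.push({ x: 0, y: 0, alive: false });
    }
  }

  // Hand out a recycled particle, or null once the hard limit is reached.
  acquire(x, y) {
    const particle = this.free.pop();
    if (particle === undefined) return null;
    particle.x = x;
    particle.y = y;
    particle.alive = true;
    return particle;
  }

  // Return a particle to the pool so that it can be reused later.
  release(particle) {
    particle.alive = false;
    this.free.push(particle);
  }
}
```

The `null` return on exhaustion corresponds to the hard limits on unit counts mentioned above: the pool never grows during a frame.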
According to Google, V8 achieves efficient garbage collection by employing a "stop-the-world, generational, accurate, garbage collector". Part of the claim is that V8 stops program execution when performing a garbage collection cycle.
An obvious question is: how can you have an efficient GC when you pause program execution?
I was trying to find out more about this topic, as I would be interested to know how the GC impacts response time when you have possibly tens of thousands of requests per second hitting your Node.js server.
Any expert help, personal experience or links would be greatly appreciated.
Thank you
"Efficient" can mean several things. Here it probably refers to high throughput. When looking at response time, you're more interested in latency, which could indeed be worse than with alternative GC strategies.
The main alternatives to stop-the-world GCs are:
incremental GCs, which need not finish a collection cycle before temporarily handing control back to the mutator [1], and
concurrent GCs, which (virtually) operate at the same time as the mutator, interrupting it only very briefly (e.g. to scan the stack).
Both need to perform extra work to be correct in the face of concurrent modification of the heap (e.g. if a new object is created and attached to an already-scanned object, this new reference must be noticed). This impacts total throughput, i.e., it takes longer to actually clean the entire heap. The upside is that they do not (usually) interrupt the program for very long, if at all, so latency is low(er).
Although the V8 documentation still mentions a stop-the-world collector, it seems that the V8 GC has been incremental since 2011. So while it does stop program execution once in a while, it does not [2] stop the program for however long it takes to scan the entire heap. Instead it can scan for, say, a couple of milliseconds, and let the program resume.
[1] "Mutator" is GC terminology for the program whose heap is garbage collected.
[2] At least in principle; this is probably configurable.
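The "extra work" needed for correctness can be made concrete with a toy model. The sketch below is not V8's implementation: it is a minimal tri-color incremental marker over plain objects, with a write barrier that notices a new reference stored into an already-scanned object. All names (`makeObject`, `IncrementalMarker`, `writeRef`) are invented for this example.

```javascript
const WHITE = 0; // not yet reached
const GREY = 1;  // reached, children not yet scanned
const BLACK = 2; // reached and fully scanned

function makeObject(name) {
  return { name, color: WHITE, refs: [] };
}

class IncrementalMarker {
  constructor(roots) {
    this.worklist = [];
    for (const root of roots) this.shade(root);
  }

  // Turn a white object grey and queue it for scanning.
  shade(obj) {
    if (obj.color === WHITE) {
      obj.color = GREY;
      this.worklist.push(obj);
    }
  }

  // Scan at most `budget` objects, then hand control back to the mutator.
  // Returns true once there is nothing left to scan.
  step(budget) {
    while (budget > 0 && this.worklist.length > 0) {
      const obj = this.worklist.pop();
      for (const child of obj.refs) this.shade(child);
      obj.color = BLACK;
      budget--;
    }
    return this.worklist.length === 0;
  }

  // Write barrier: the mutator stores references through this method, so a
  // reference added to an already-scanned (black) object is not missed.
  writeRef(parent, child) {
    parent.refs.push(child);
    if (parent.color === BLACK) this.shade(child);
  }
}
```

If the mutator bypassed `writeRef` and pushed directly onto the `refs` of a black object, the new child would stay white and be treated as garbage at the end of the cycle. Paying for the barrier on every reference store is the throughput cost described above.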
Here's what I've read so far, and correct me if I'm wrong:
Node.js is based on V8 JavaScript engine.
V8 JavaScript engine implements stop-the-world garbage collection
Which causes Node.js to sometimes completely shut down for a few seconds to a few minutes to handle garbage collection.
If this is running in production, that's a pause of a few seconds for 10,000 users.
Is this really acceptable in a production environment?
Whether it is acceptable depends on your application and your heap size. A big GC takes around 1.3 ms per MB. YMMV. About half that for a compacting GC. Around 1 GC in 10 is big. Around 1 big GC in 3 is compacting. Use the V8 flag --trace-gc to log GCs. We have done some work on reducing pauses. No promises, no timetables. See branches/experimental/gc in the V8 repo.
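To turn the rule of thumb above into numbers for your own heap, a quick calculation helps. This sketch uses only the 1.3 ms per MB figure quoted in the answer; the function name `estimateBigGcPauseMs` is made up, and real pauses will vary (YMMV, as noted).

```javascript
// Rough pause estimate for a big (full) GC, using the quoted 1.3 ms per MB.
const BIG_GC_MS_PER_MB = 1.3;

function estimateBigGcPauseMs(heapMb) {
  return heapMb * BIG_GC_MS_PER_MB;
}

for (const heapMb of [100, 500, 1400]) {
  const pauseMs = estimateBigGcPauseMs(heapMb);
  console.log(`${heapMb} MB heap: roughly ${Math.round(pauseMs)} ms per big GC`);
}
```

By this estimate a 100 MB heap pauses for roughly a tenth of a second, while a heap near the old 1.4 GB limit pauses for close to two seconds, so heap size decides whether the pauses are acceptable.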