timing for node.js Garbage Collection - node.js

Recently, I have installed https://github.com/lloyd/node-memwatch for development, to investigate how GC interacts with my program.
I have binded the event "stat", the arthor states that the event is triggered when GC is performed.
I've found that when the script with high load. The "stat" events are not triggered. I am not sure whether it implies GC is not performed, but it is a sign that GC may not have triggered.
In my production server, the loading is even a lot higher throughout the day. I am quite sure that GC has no chance to perform. The memory usage has no chance to decrease. It is just like memory leak.
Is my observation correct? Is GC not able to perform when there are high load continuously?
If so, should I use the exposed GC interface to force GC?
Is GC blocking? Should I perform GC more frequently so that GC will not block for a long time for each GC?
I know manual GC is not a good idea (There is someone opposing the idea of manual GC in node.js, but I cannot find the link for reference), but I am seeing that the memory usage is increasing continuously. It really needs to be solved.

There are 3 types of GC events in V8
kGCTypeMarkSweepCompact
kGCTypeScavenge
kGCTypeAll
V8 runs the scavenge event quite often, but only on newly created objects. During a heavy load the other types of GC may occur infrequently.
You can try running the NodeFly agent which uses nodefly-gcinfo module to track ongoing memory usage.
You can also call nodefly-gcinfo directly which has a callback that runs each time a GC event occurs:
require('nodefly-gcinfo').onGC(function(usage, type, flags){
console.log("GC Event Occurred");
console.log("Heap After GC:",usage, type, flags);
});

Related

How to tweak the NodeJS Garbage Collector to schedule fewer Scavenger rounds?

I have a real-time application in NodeJS that listen to multiple websockets and reacts to its events by placing HTTPS requests; it runs continuously. I noticed that the response time, at many many times during execution, was much higher than merely the expected network latency, which led me to investigate the problem. It turns out that the Garbage Collector was running multiple times in a row adding significant latency (5s, up to 30s) to the run time.
The problem seems to be related to the frequent scheduling of a Scavenger round to free up resources, due to allocation failure. Although each round takes less than 100ms to execute, executing thousands of times in a row does add up to many seconds. My only guess is that at some point in my application the allocated heap is close to its limit and little memory is actually freed in each GC round which keeps triggering GC in a long loop. The application seems not to be leaking memory because memory allocation never indeed fails. It just seems to be hitting an edge that triggers GC madness.
Could someone with knowledge/experience shed some tips on how to use the GC options that V8 provides in order to overcome such situation? I have tried --max_old_space_size and --nouse-idle-notification (as suggested by the few articles that tackle this problem) but I do not fully understand the internals of the node GC implementation and the effects of the available options. I would like to have more control over when Scavenger round should run or at least increase the interval between successive rounds so that it becomes more efficient.

Garbage collector in Node.js

According to google, V8 uses an efficient garbage collection by employing a "stop-the-world, generational, accurate, garbage collector". Part of the claim is that the V8 stops program execution when performing a garbage collection cycle.
An obvious question is how can you have an efficient GC when you pause program execution?
I was trying to find more about this topic as I would be interested to know how does the GC impacts the response time when you have possibly tens of thounsands requests per second firing your node.js server.
Any expert help, personal experience or links would be greatly appreciated
Thank you
"Efficient" can mean several things. Here it probably refers to high throughput. When looking at response time, you're more interested in latency, which could indeed be worse than with alternative GC strategies.
The main alternatives to stop-the-world GCs are
incremental GCs, which need not finish a collection cycle before handing back control to the mutator1 temporarily, and
concurrent GCs which (virtually) operate at the same time as the mutator, interrupting it only very briefly (e.g. to scan the stack).
Both need to perform extra work to be correct in the face of concurrent modification of the heap (e.g. if a new object is created and attached to an already-scanned object, this new reference must be noticed). This impacts total throughput, i.e., it takes longer to actually clean the entire heap. The upside is that they do not (usually) interrupt the program for very long, if at all, so latency is low(er).
Although the V8 documentation still mentions a stop-the-world collector, it seems that an the V8 GC is incremental since 2011. So while it does stop program execution once in a while, it does not 2 stop the program for however long it takes to scan the entire heap. Instead it can scan for, say, a couple milliseconds, and let the program resume.
1 "Mutator" is GC terminology for the program whose heap is garbage collected.
2 At least in principle, this is probably configurable.

What's the Gambit-C's GC mechanism?

What's the Gambit-C's GC mechanism? I'm curious about this for making interactive app. I want to know whether it can avoid burst GC operation or not.
According to these threads:
https://mercure.iro.umontreal.ca/pipermail/gambit-list/2005-December/000521.html
https://mercure.iro.umontreal.ca/pipermail/gambit-list/2008-September/002645.html
Gambit has traditional stop-the-world GC at least until September 2008. People in thread recommended using pre-allocated object pooling to avoid GC operation itself. I couldn't find out about current implementation.
*It's hard to agree with the conversation. Because I can't pool object not written by myself and finally full-GC will happen at sometime by accumulated small/non-pooled temporary objects. But the method mentioned by #Gregory may help to avoid this problem. However, I wish incremental GC added to Gambit :)
According to http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Debugging#Garbage_collection_threshold gambit has some controls:
Garbage collection threshold
Pay attention to the runtime options h (maximum heapsize in kilobytes) and l (livepercent). See the reference manual for more information. Setting livepercent to five means that garbage collection will take place at the time that there are nineteen times more memory allocated for objects that should be garbage collected, than there is memory allocated for objects that should not. The reason the livepercent option is there, is to give a way to control how sparing/generous the garbage collector should be about memory consumption, vs. how heavy/light it should be in CPU load.
You can always force garbage collection by (##gc).
If you force garbage collection after some small number of operations, or schedule it near continuously, or set the livepercent to like 90 then presumably the gc will run frequently and not do very much on each run. This is likely to be more expensive overall, but avoid bursts of expense. You can then fairly easily budget for that expense to make the service fast despite.

Difference between background and concurrent garbage collection?

I read that with .NET Framework 4 the current garbage collection implementation is replaced:
The .NET Framework 4 provides
background garbage collection. This
feature replaces concurrent garbage
collection in previous versions and
provides better performance.
At this page there is an explanation how it works but I am not sure I understood it.
In practical world application what is the benefit of this new GC implementation? Is it a feature that could be use to push for a transition from 3.5 or previous to 4.0?
Here, Microsoft uses the names "concurrent" and "background" to describe two versions of the GC it uses in .NET. In the .NET world, the "background collector" is an enhancement over the "concurrent collector" in that it has less restrictions on what application threads can do while the collector is running.
A basic GC uses a "stop-the-world" strategy: applicative threads allocate memory blocks from a common heap. When the GC must run (e.g. too many blocks have been allocated, some cleanup is needed), all applicative (managed) threads stop. The last stopping thread runs the GC, and unblocks all the other threads when it has finished. A stop-the-world GC is simple to implement but induces pauses which can be perceptible at the user level.
Microsoft's "concurrent GC" is generational: it uses the stop-the-world strategy for only a limited part of the heap (what they call "generations 0 and 1"). Since that part remains small, pauses remain short (e.g. below 50ms), so that the user will not notice them. The rest of the heap is collected with a dedicated GC thread, which can run concurrently with the applicative threads (hence the name).
The concurrent GC has some limitations. Namely, there are moments when the GC thread must assume a somewhat exclusive control of the heap. During such times, applicative threads may allocate blocks only from small thread-specific areas. Threads which have bigger needs will soon stumble upon the main heap, which, at that time, is locked by the GC thread. The allocating thread must then block until the GC thread has finished its lock-the-heap phase. This again induces pauses. Less pauses than with a stop-the-world GC, and these pauses do not affect all threads. Yet pauses nonetheless.
The "background GC" is an enhanced GC in which the GC thread needs not lock the heap. This removes the extra pauses described in the previous paragraph; only remain the limited pauses when the young generations are collected (what Microsoft calls "a foreground collection").
Note: there are "hidden costs" with the concurrent GC and the background GC. For these GC to operate properly, memory accesses from applicative threads must be done in some very specific ways, which have a slight impact on performance. Also, the GC thread may have an adverse effect on cache memory, thus indirectly degrading performance. For a purely computational task with no need for user interaction, a stop-the-world collector may, on average, yield somewhat better performance (e.g. a twenty-hours-long computation will complete in nineteen hours). But this is an edge case, and in most situations the concurrent and background GC are better.
Here is the real world explanation without slur and overinflated feeling of self-importance:
In concurrent GC you were allowed to allocate while in a GC, but you are not allowed to start another GC while in a GC. This in turn means that the maximum you are allowed to allocate while in a GC is whatever space you have left on one segment (currently 16 MB in workstation mode) minus anything that is already allocated there).
The difference in Background mode is that you are allowed to start a new GC (gen 0+1) while in a full background GC, and this allows you to even create a new segment to allocate in if necessary. In short, the blocking that could occur before when you allocated all you could in one segment won’t happen anymore.
From Tess da Man! http://blogs.msdn.com/b/tess/archive/2009/05/29/background-garbage-collection-in-clr-4-0.aspx
The primary benefit will be fewer application freezes due to garbage collection, which in itself could be considered a significant improvement. For most apps this difference will not be noticeable unless you have a HUGE number of long-lived objects in memory.
This change also makes .NET slightly more viable for building timing-sensitive apps (where response times are important). The extreme example are car airbags - you don't want your software to be busy doing garbage collection when they need to be inflated. The changes in 4.0 reduce the number and length of freezes due to GCing but does not remove them entirely.

Which counter can I use in performance monitor to see how much memory is waiting for the GC?

I am trying to profile a specific page of my ASP.NET site to optimize memory usage, but the nature of .NET as a Garbage Collected language is making it tough to get a true picture of what how memory is used and released in the program.
Is there a perfmon counter or other method for profiling that will allow me to see not only how much memory is allocated, but also how much has been released by the program and is just waiting for garbage collection?
Actually nothing in the machine really knows what is waiting for garbage collection: garbage collection is precisely the process of figuring that out and releasing the memory corresponding to dead objects. At best, the GC will have that information only on some very specific instants in its cycle. The detection and release parts are often interleaved (this depends on the GC technology) so it is possible that the GC never has a full count of what could be freed.
For most GC, obtaining such an information is computationally expensive. If you are ready to spend a bit of CPU time on it (it will not be transparent to the application) then you can use GC.Collect() to force the GC to run, immediately followed by a call to GC.GetTotalMemory() to know how much memory has survived the GC. Note that forcing the GC could induce a noticeable pause, and may also decrease overall performance.
This is the "homemade" method; for a more serious analysis, try a dedicated profiler.
The best way that I have been able to profile memory is to use ANTS Profiler from RedGate. You can view a snapshot, what stage of the lifecycle it is in and more. Including actual object values.

Resources