Why should external code call
v8::Isolate::AdjustAmountOfExternalAllocatedMemory (formerly v8::V8::AdjustAmountOfExternalAllocatedMemory, and wrapped by NAN as NanAdjustExternalMemory)?
I see bits of documentation on the web saying that these functions exist and that they somehow help the garbage collector. But how? Why? What repercussions should be expected if some external code doesn't call them? In a Node.js module that makes use of asynchronous execution, is it worth the effort to communicate changes in memory allocation from the worker threads back to the V8 thread, where this function can be safely called? Why should anyone care how much memory the external code is using? And if there is a good reason, should I try to provide fine-grained updates for every malloc and free, or should I only call the function every once in a while, when the situation changes significantly?
You should only update it for memory that is kept alive by JavaScript objects, i.e. when you have used SetInternalField in a JavaScript object so that it points to C memory it owns.
This doesn't appear to be the case for you as you say:
"is it worth the effort to communicate changes in memory allocation from the worker threads back to the v8 thread where this function can be safely called"
Whatever memory your workers allocate cannot be kept alive by some 'separate V8 thread', because an isolate can only be executed by a single thread at a time. So JavaScript objects cannot be keeping alive any memory your other threads are allocating, and that memory is irrelevant here.
In general you want to call this function because it will force V8 to do a global GC more often, which it normally avoids doing at all costs. E.g. if you have 1000 dead JavaScript buffers each reserving 20 MB, you have ~20 GB of garbage, while V8 thinks you only have some 20 KB of garbage and thus won't try to GC anything. If you had told V8 that there are 20 GB of external memory (AdjustAmountOfExternalAllocatedMemory(20LL * 1024 * 1024 * 1024)), it would trigger a global GC, which would collect the JavaScript buffer objects, which would call their finalizers, where you would free() the 20 MB buffers, which would release the 20 GB.
Related
I am doing some tests using the zlib module provided by Node.js.
When I called the deflate function 5000 times in a loop, a memory leak occurred, and the brk function was called more than 10000 times on Linux (measured with strace -cfe mmap,munmap,mprotect,brk -p {process id}).
However, when I made the 5000 calls via setInterval every 1 second, there was no memory leak, and brk was called far less often on Linux.
In Node.js the function is called 5000 times either way, so why is the number of brk calls on Linux different?
My guess is that Linux is reusing memory space. Is this correct? If not, what is the exact cause?
Most likely, you are seeing a side effect of the garbage collector. Analyzing this requires more information than you've posted here, but if I were to hazard an educated guess, Node.js (or V8 via libuv) probably does an opportunistic GC at the bottom of the event loop, to collect garbage in at least the shortest-lived pool.
When you are running this in a loop, you never fall off the event loop, which means that garbage collection probably has fewer opportunities to run. It probably runs based on a timer or allocation counter, and almost certainly from inside the memory allocator when there is memory pressure.
Run your loop for a few hundred thousand iterations; I bet the total process allocation stabilizes at some point.
This may have been asked a few different ways, but this is a relatively new field to me so forgive me if it is redundant and point me on my way.
Essentially I have created a data collection engine that takes high-speed data (up to thousands of points a second) and stores it in a database.
The database is dynamic, so the statements being fed to it are dynamically created in code as well, which in turn requires a great deal of string manipulation. All of the strings, however, are declared within the scope of asynchronous event handler methods, so they should fall out of scope as soon as each method completes.
As this application runs, its memory usage (according to Task Manager / Process Explorer) slowly but steadily increases, so it would seem that something is not getting properly disposed and/or collected.
If I attach CDB -p (yes, I am loading sos.dll from the CLR) and do a !dumpheap, I see that the majority of the memory is being used by System.String; likewise, if I do !dumpheap -type System.String and then !do the addresses, I see the exact strings (the SQL statements).
However, if I do a !gcroot on any of the addresses, I get "Found 0 unique roots (run '!GCRoot -all' to see all roots)." And if I then try it as suggested, I get "Invalid argument -all". O.o
So after some googling, and some arguments to the effect that unrooted objects will eventually be collected by the GC and that this is therefore not an issue, I looked further, and it appears 84% of my problem is sitting on the LOH (which, depending on which source you read, may or may not get processed for GC unless there is a memory constraint on the machine or I explicitly tell it to collect, which is considered bad according to everything I can find).
So what I need to know is: is this essentially true, that this is not a memory leak, just the system leaving objects there until it HAS to reclaim them? And if so, how do I tell whether or not I have a legitimate memory leak?
This is my first time working with a debugger external to the application, as I have never had to address this sort of issue before, so I am very new to that part; this is a learning experience.
The application is written in C# with VS2012 Pro; it is multi-threaded, and a console application wraps the API for testing, but it will eventually be a Windows service.
What you read is true: managed applications use a memory model where objects pile up until you reach a certain memory threshold (calculated based on the amount of physical memory on your system and your application's real growth rate), after which all(*) "dead" objects are squeezed out and the remaining useful memory is compacted into one contiguous block for allocation speed.
So yes, don't worry about your memory steadily increasing until you're several tens of MB up and no collection has taken place.
(*) It is actually more complicated, because of multiple memory pools (based on object size and lifetime), so that the system isn't constantly probing very long-lived objects, and because of finalizers. When an object has a finalizer, it is not freed right away: it is kept alive and moved to a special queue, the finalizer queue, where it waits for its finalizer to run on a dedicated finalizer thread (keep in mind this happens apart from your application threads), and only in a later collection does its memory finally get freed.
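As an illustrative sketch of those two mechanisms (generational promotion and the finalizer queue), using only the standard System.GC API; the class names are made up and the exact generation numbers printed can vary by runtime:

    using System;

    // A type with a finalizer: when an instance becomes unreachable it is not
    // reclaimed immediately, but parked on the finalizer queue first.
    class WithFinalizer
    {
        ~WithFinalizer() { Console.WriteLine("finalizer ran"); }
    }

    class Program
    {
        static void Main()
        {
            object survivor = new object();
            Console.WriteLine(GC.GetGeneration(survivor)); // 0: freshly allocated

            GC.Collect();                                  // survivor is still rooted,
            Console.WriteLine(GC.GetGeneration(survivor)); // so it is promoted (prints 1)

            WithFinalizer doomed = new WithFinalizer();
            doomed = null;                                 // drop the only reference
            GC.Collect();                                  // now queued for finalization
            GC.WaitForPendingFinalizers();                 // "finalizer ran" prints here
            GC.Collect();                                  // only now is its memory reclaimed

            GC.KeepAlive(survivor);
        }
    }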
According to Google, V8 achieves efficient garbage collection by employing a "stop-the-world, generational, accurate garbage collector". Part of the claim is that V8 stops program execution when performing a garbage collection cycle.
An obvious question is how can you have an efficient GC when you pause program execution?
I was trying to find more about this topic, as I would be interested to know how the GC impacts response time when you have possibly tens of thousands of requests per second hitting your Node.js server.
Any expert help, personal experience or links would be greatly appreciated
Thank you
"Efficient" can mean several things. Here it probably refers to high throughput. When looking at response time, you're more interested in latency, which could indeed be worse than with alternative GC strategies.
The main alternatives to stop-the-world GCs are
incremental GCs, which need not finish a collection cycle before temporarily handing back control to the mutator [1], and
concurrent GCs which (virtually) operate at the same time as the mutator, interrupting it only very briefly (e.g. to scan the stack).
Both need to perform extra work to be correct in the face of concurrent modification of the heap (e.g. if a new object is created and attached to an already-scanned object, this new reference must be noticed). This impacts total throughput, i.e., it takes longer to actually clean the entire heap. The upside is that they do not (usually) interrupt the program for very long, if at all, so latency is low(er).
Although the V8 documentation still mentions a stop-the-world collector, it seems that the V8 GC has been incremental since 2011. So while it does stop program execution once in a while, it does not [2] stop the program for however long it takes to scan the entire heap. Instead it can scan for, say, a couple of milliseconds, and then let the program resume.
1 "Mutator" is GC terminology for the program whose heap is garbage collected.
2 At least in principle, this is probably configurable.
I read that with .NET Framework 4 the existing garbage collection implementation has been replaced:
The .NET Framework 4 provides background garbage collection. This feature replaces concurrent garbage collection in previous versions and provides better performance.
At this page there is an explanation of how it works, but I am not sure I understood it.
In practical, real-world applications, what is the benefit of this new GC implementation? Is it a feature that could be used to push for a transition from 3.5 (or earlier) to 4.0?
Here, Microsoft uses the names "concurrent" and "background" to describe two versions of the GC it uses in .NET. In the .NET world, the "background collector" is an enhancement over the "concurrent collector" in that it places fewer restrictions on what application threads can do while the collector is running.
A basic GC uses a "stop-the-world" strategy: applicative threads allocate memory blocks from a common heap. When the GC must run (e.g. too many blocks have been allocated, some cleanup is needed), all applicative (managed) threads stop. The last stopping thread runs the GC, and unblocks all the other threads when it has finished. A stop-the-world GC is simple to implement but induces pauses which can be perceptible at the user level.
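To get a feel for what such a pause looks like in a .NET process, here is a minimal sketch (not part of the original answer) that times a forced, blocking full collection; the object count and sizes are arbitrary, and the measured time will vary by machine:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class PauseDemo
    {
        static void Main()
        {
            // Keep a few million small objects alive so a full collection has real work to do.
            var graph = new List<byte[]>();
            for (int i = 0; i < 2000000; i++)
                graph.Add(new byte[32]);

            var sw = Stopwatch.StartNew();
            // A forced generation-2 collection induced this way is blocking:
            // managed threads are suspended while it runs.
            GC.Collect(2, GCCollectionMode.Forced);
            sw.Stop();

            Console.WriteLine("Full blocking collection took " + sw.ElapsedMilliseconds + " ms");
            GC.KeepAlive(graph);
        }
    }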
Microsoft's "concurrent GC" is generational: it uses the stop-the-world strategy for only a limited part of the heap (what they call "generations 0 and 1"). Since that part remains small, pauses remain short (e.g. below 50ms), so that the user will not notice them. The rest of the heap is collected with a dedicated GC thread, which can run concurrently with the applicative threads (hence the name).
The concurrent GC has some limitations. Namely, there are moments when the GC thread must assume somewhat exclusive control of the heap. During such times, applicative threads may allocate blocks only from small thread-specific areas. Threads with bigger needs will soon stumble upon the main heap which, at that time, is locked by the GC thread. The allocating thread must then block until the GC thread has finished its lock-the-heap phase. This again induces pauses. Fewer pauses than with a stop-the-world GC, and pauses which do not affect all threads, but pauses nonetheless.
The "background GC" is an enhanced GC in which the GC thread need not lock the heap. This removes the extra pauses described in the previous paragraph; only the limited pauses when the young generations are collected (what Microsoft calls "a foreground collection") remain.
Note: there are "hidden costs" with the concurrent GC and the background GC. For these GCs to operate properly, memory accesses from applicative threads must be done in some very specific ways, which have a slight impact on performance. Also, the GC thread may have an adverse effect on cache memory, thus indirectly degrading performance. For a purely computational task with no need for user interaction, a stop-the-world collector may, on average, yield somewhat better performance (e.g. a twenty-hour computation might complete in nineteen hours). But this is an edge case, and in most situations the concurrent and background GC are better.
Here is the real-world explanation, without slurs and an overinflated sense of self-importance:
With the concurrent GC you are allowed to allocate while a GC is in progress, but you are not allowed to start another GC while one is running. This in turn means that the maximum you are allowed to allocate during a GC is whatever space you have left in one segment (currently 16 MB in workstation mode), minus anything that is already allocated there.
The difference in background mode is that you are allowed to start a new GC (gen 0+1) while a full background GC is in progress, and this even allows you to create a new segment to allocate in if necessary. In short, the blocking that could previously occur when you had allocated all you could in one segment won’t happen anymore.
From Tess da Man! http://blogs.msdn.com/b/tess/archive/2009/05/29/background-garbage-collection-in-clr-4-0.aspx
The primary benefit will be fewer application freezes due to garbage collection, which in itself could be considered a significant improvement. For most apps this difference will not be noticeable unless you have a HUGE number of long-lived objects in memory.
This change also makes .NET slightly more viable for building timing-sensitive apps (where response times are important). The extreme example is a car airbag: you don't want your software to be busy doing garbage collection when it needs to be inflated. The changes in 4.0 reduce the number and length of freezes due to GC but do not remove them entirely.
I am trying to profile a specific page of my ASP.NET site to optimize memory usage, but the nature of .NET as a garbage-collected language is making it tough to get a true picture of how memory is used and released in the program.
Is there a perfmon counter or other method for profiling that will allow me to see not only how much memory is allocated, but also how much has been released by the program and is just waiting for garbage collection?
Actually, nothing in the machine really knows what is waiting for garbage collection: garbage collection is precisely the process of figuring that out and releasing the memory corresponding to dead objects. At best, the GC has that information only at some very specific instants in its cycle. The detection and release parts are often interleaved (this depends on the GC technology), so it is possible that the GC never has a full count of what could be freed.
For most GCs, obtaining such information is computationally expensive. If you are ready to spend a bit of CPU time on it (it will not be transparent to the application), then you can use GC.Collect() to force the GC to run, immediately followed by a call to GC.GetTotalMemory() to know how much memory has survived the GC. Note that forcing the GC can induce a noticeable pause and may also decrease overall performance.
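A minimal sketch of that homemade approach (the helper name is made up, GC.WaitForPendingFinalizers is added so finalizers get a chance to release what they hold, and the figure is only an estimate since other threads may allocate in between):

    using System;

    class HomemadeMeasurement
    {
        static long ReclaimableEstimate()
        {
            long before = GC.GetTotalMemory(false);  // bytes currently allocated, no forced GC

            GC.Collect();                            // force a full, blocking collection
            GC.WaitForPendingFinalizers();           // let finalizers release their resources
            GC.Collect();                            // collect whatever the finalizers freed

            long after = GC.GetTotalMemory(false);
            return before - after;                   // roughly what was waiting for collection
        }

        static void Main()
        {
            Console.WriteLine("Roughly " + (ReclaimableEstimate() / 1024) + " KB were reclaimable.");
        }
    }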
This is the "homemade" method; for a more serious analysis, try a dedicated profiler.
The best way that I have been able to profile memory is to use ANTS Profiler from RedGate. You can view a snapshot, see what stage of the lifecycle each object is in, and more, including actual object values.