I'm currently working on optimizing a library for speed. I've already reduced execution time drastically using V8 CPU and memory profiling through WebStorm. This was achieved mainly by changing the core method from recursive to iterative.
Now the self time distribution breaks down as shown in the profiler output.
I'm assuming the first entry, "node", is timing internal function calls, which is great. The other entries also make sense. I'm new to Node.js profiling, but 31.6% for GC seems high, so I've decided to investigate.
I've now created a heap dump through WebStorm, but unfortunately that doesn't give me much information.
These seem to be system internal memory references mainly. Stepping through the core iteration code logic again, there also don't seem to be a lot of places where memory is explicitly allocated (using this as a reference).
Question
Can the GC overhead be reduced?
Is this amount of allocation just expected here?
Is it possible to get better memory profiling information?
Setup Instructions
In case someone wants to try debugging this, I'm including setup instructions.
Download or clone object-scan and run
yarn install --frozen-lockfile
yarn run test-simple --verbose
Now create a file test.js in the project root containing this content, and run node --trace_gc test.js or run it through WebStorm for advanced profiling.
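The original test.js isn't reproduced here; as a rough stand-in only (the needles and haystack below are made up for illustration, not the actual benchmark), a minimal script exercising object-scan could look something like:

// test.js -- hypothetical stand-in for the real benchmark file
const objectScan = require('object-scan');

const haystack = { a: { b: { c: 1 }, d: [{ c: 2 }, { c: 3 }] } };
const find = objectScan(['a.**.c'], { joined: true });

// repeat the scan so GC activity becomes visible under --trace_gc
for (let i = 0; i < 1e5; i += 1) {
  find(haystack);
}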
In JavaScript, and in V8 (Node.js) in particular, the amount of time spent on garbage collection depends on the amount of data stored in the heap, but that's only one of many factors.
In the V8 engine there are two main "types" of GC: minor (scavenge) and major (mark-sweep/mark-compact). You can see which GC types happen during your tests in the console with --trace-gc enabled. Depending on the case, one type can "eat" more time than the other and vice versa, so before optimizing you should determine which GC takes more time.
There are not a lot of options for optimizing major GC, because it is highly affected by the amount of data that stays in memory for a "long" period (in this case "long" means the object survives a scavenge GC). Such data is stored in the so-called "old space" of the heap. Major GC works with this space: it has to scan all of that memory and mark objects that no longer have any references for later clearance.
In your case the amount of test data you're loading goes to old space, so it affects major GC during the whole test. Major GC will not clear much here, because you're still using your test object, but it still consumes time scanning the entire old space. You may consider preventing V8 from doing that by launching node with GC-specific flags like --nouse-idle-notification --expose-gc --gc_interval=100500 (where 100500 is a number of allocations; it can be set to a value high enough to prevent GC from running before the whole test has finished), which allow you to trigger garbage collection manually. Test your code using this approach and see how major GC affects it, and try tests with different amounts of data provided to the function. If the impact is quite high, you may try to refactor your code to minimize long-lived variables, closures, etc.
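As a sketch of that manual-GC approach (assuming the flags above are passed to node; runWorkload is a placeholder for the code being measured):

// run with: node --nouse-idle-notification --expose-gc --gc_interval=100500 test.js
const runWorkload = () => { /* e.g. the object-scan test from the setup instructions */ };

global.gc();                                      // start from a freshly collected heap
const start = process.hrtime.bigint();
runWorkload();
console.log(`workload: ${(process.hrtime.bigint() - start) / 1000000n} ms`);
global.gc();                                      // collect manually once the run is done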
If you discover that major GC doesn't have much impact on performance, then scavenge GC is taking most of the time. Unlike major GC, it operates on the so-called "new space" of the heap, where all new objects are stored. If those objects survive a scavenge, they are moved to old space. New space is much smaller than old space (you can control its size by setting --max_semi_space_size; note: new space size = 2 * semi space size), and the more new objects and variables you allocate, the more scavenge GC runs will happen. If this GC hurts performance too much, you may consider refactoring your code to make fewer new allocations. But if you reuse variables, that may also slow down performance: those objects will go to old space and may become the problem described in the "major GC" section.
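For example, to see how a bigger new space changes the picture (the size here is arbitrary; in recent Node versions the flag is spelled --max-semi-space-size and takes MB):

node --max-semi-space-size=64 --trace-gc test.js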
Also, V8's GC doesn't always run in the same thread as your program. It does some work in the background too, but I don't know what WebStorm shows in your case. If it just counts total time spent in GC, maybe it doesn't actually have that much impact.
You may find more details on V8's GC in this blog post.
TL;DR:
Can the GC overhead be reduced?
Yes, but first you should discover what needs to be optimized by following the steps above.
Is this amount of allocation just expected here?
That can only be discovered by comparing different approaches. There is no absolute number separating a "good" amount from a "bad" one, because it depends on a lot of factors, including the amount of input data.
Is it possible to get better memory profiling information?
You may find some good tools here, but in general you can use Chrome DevTools, which provides a bit more detail than WebStorm does.
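For example, using Node's built-in inspector with the test.js from the setup instructions:

node --inspect-brk test.js

Then open chrome://inspect in Chrome, attach to the process, and use the Memory tab to take heap snapshots or record an allocation timeline between runs.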
Related
I am experiencing a 'slow' memory increase in my Node process, which runs for long periods of time (~1GB in over 2 months); however, the heap stays constant (which implies that my code/stack is growing). I also tried to manually call the garbage collector, but memory usage remains the same.
How can I investigate this further? I want to confirm my theory and figure out why my code segment / stack is growing.
I am using Node v8 LTS (I know it is EOL from this year; I just need to know if there's a way to figure this out).
(V8 developer here.)
Code generated by V8 is on the heap, so if the heap isn't growing, that means that code isn't growing either.
The stack size is limited by the operating system, usually to 1-8 MB. Since operating systems simply kill processes that run into the stack limit, V8 imposes an even lower limit (a little less than a megabyte, I think it's 984KB currently) onto itself, and will throw a RangeError if that's ever exceeded. So a growing stack can't be your problem either.
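You can see that self-imposed limit in action with a quick sketch like this (the exact depth you get depends on frame size and V8 version):

// count recursion depth until V8 throws
let depth = 0;
const recurse = () => { depth += 1; recurse(); };
try {
  recurse();
} catch (e) {
  console.log(e.message, 'at depth', depth);   // "Maximum call stack size exceeded"
}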
Since you say that the heap memory reported by Node/V8 remains constant, that also means that most "how to debug memory leaks in Node" tutorials don't apply to your situation; and that probably also means that the leak is not in your (JavaScript) code.
That leaves C++ "heap memory" (which is very different from V8's managed "heap"!) as the most likely culprit. Node itself as well as native extensions can freely allocate memory that they manage themselves. Maybe something doesn't get cleaned up properly there. That could simply be an upstream bug; or it could be that something in your code is accidentally holding on to some embedder memory.
What I would try first is to update Node and any native extensions you have installed. Maybe the leak has already been found and fixed.
If that doesn't help, then you could try to investigate where the memory is going. For instance, you could compile everything from source with LSan enabled, and see if that reports anything. It would probably be helpful to construct a stress-test, e.g. a fake client that floods (a test instance of) your server with real-looking requests, to try to trigger inspectable instances of the leak in seconds or minutes rather than months. Crafting such a fake client might also help narrow down where things go wrong (e.g.: maybe you'll notice that one type of request does not trigger the leak but another type of request does).
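A fake client along those lines could be as simple as the following sketch (host, port, path and iteration count are hypothetical placeholders, not anything from the original setup):

// flood-test.js -- hammer a local test instance with simple requests
const http = require('http');

const fireRequest = () => new Promise((resolve, reject) => {
  http.get({ host: 'localhost', port: 3000, path: '/some-endpoint' }, (res) => {
    res.resume();                // drain the body so keep-alive sockets get reused
    res.on('end', resolve);
  }).on('error', reject);
});

(async () => {
  for (let i = 0; i < 1e6; i += 1) {
    await fireRequest();         // meanwhile, watch the server process's RSS (e.g. with ps or top)
  }
})();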
Is there an option for Node.js to increase the initially allocated memory?
https://futurestud.io/tutorials/node-js-increase-the-memory-limit-for-your-process
The --max-old-space-size option seems to increase the maximum memory, but what about the initial memory?
Kind of like -Xmx and -Xms for the JVM.
V8 developer here. The short answer is: no.
The reason no such option exists is that adding fresh pages to the heap is so fast that there is no significant benefit to doing it up front.
V8 does have a flag --initial-old-space-memory, but it doesn't increase the initial allocation. Instead, what it means is "don't bother doing (old-space) GC while the heap size is below this limit". If you set that to, e.g., 1000 (MB), and then allocate 800MB of unreachable objects, and then just wait, then V8 will sit around forever with 800MB of garbage on the heap and won't lift a finger to get rid of any of that.
I'm not sure in what scenario this behavior would be useful (it's not like it will turn off GC entirely; GC will just run less frequently, but fewer GCs on a bigger heap don't necessarily add up to less total time than more GCs on a smaller heap), so I would strongly recommend measuring the effect on your particular workload carefully before using this flag -- if it were a good idea to have this on by default, then it would be on by default!
If I had to guess: this flag might be beneficial if you know that (1) your application will have a large amount of "eternal" (=lives as long as the app is running) data on the heap, and (2) you can estimate the amount of that data with reasonable accuracy. E.g.: if you know that at any given time, your old-space will consist of 500MB of always-reachable-anyway data plus any potentially-freeable-garbage, you could use this flag to tell V8 "if old-space size is below 600MB (=500MB plus a little), then don't bother trying to find garbage, it won't be worth the effort".
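If you want to experiment with it anyway, check whether your Node/V8 build actually knows the flag before relying on it (app.js and the value are placeholders):

node --v8-options | grep initial-old-space
node --initial-old-space-memory=1000 app.js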
I found the related problem here,
but I still haven't gotten a real answer for it. :(
So, how do I reduce memory to a minimum with global.gc() in Node.js?
Should I spam the global.gc() function to reduce it?
Instead of forcing the garbage collector to run, you should first identify if you actually have a memory leak by using various tools (e.g. node's built-in inspector, heapdump module, etc.) available for node that allow you to detect such leaks.
It's entirely possible that it appears there is a leak when there isn't, due to how V8's garbage collector works (it is generally lazy because GC is not exactly a cheap operation CPU usage-wise).
Also, you can limit the amount of memory used by V8 via the --max-old-space-size=xxxx command-line argument (where xxxx is the amount of memory in megabytes). This can also be helpful in more quickly determining whether you have a legitimate memory leak.
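A simple way to watch this over time (the interval and heap limit are arbitrary; app.js is a placeholder): start the process with a tight limit and log memory usage periodically. A real leak keeps heapUsed climbing until the process dies with an out-of-memory error, while a lazily-collected heap plateaus.

// started with: node --max-old-space-size=256 app.js
setInterval(() => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const mb = (n) => `${(n / 1048576).toFixed(1)}MB`;
  console.log(`rss=${mb(rss)} heapTotal=${mb(heapTotal)} heapUsed=${mb(heapUsed)} external=${mb(external)}`);
}, 10000).unref();               // unref so this timer alone doesn't keep the process alive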
Context: 64 bit Oracle Java SE 1.8.0_20-b26
For over 11 hours, my running Java 8 app has been accumulating objects in the tenured generation (close to 25%). So I manually clicked the Perform GC button in jconsole, and you can see the precipitous drop in heap memory on the right of the chart. I don't have any special VM options turned on except for -XX:NewRatio=2.
Why does the GC not clean up the tenured generation?
This is a fully expected and desirable behavior. The JVM has been successfully avoiding a Major GC by performing timely Minor GC's all along. A Minor GC, by definition, does not touch the Tenured Generation, and the key idea behind generational garbage collectors is that precisely this pattern will emerge.
You should be very satisfied with how your application is humming along.
The throughput collector's primary goal is, as its name says, throughput (via GCTimeRatio). Its secondary goal is pause times (MaxGCPauseMillis). Only as a tertiary goal does it consider keeping the memory footprint low.
If you want to achieve a low heap size you will have to relax the other two goals.
You may also want to lower MaxHeapFreeRatio to allow the JVM to yield back memory to the OS.
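For example (flag values and app.jar are purely illustrative, not a recommendation): GCTimeRatio=9 targets roughly 10% of total time in GC, and lower free-ratio bounds let the heap shrink back toward its live size sooner.

java -XX:+UseParallelGC -XX:GCTimeRatio=9 -XX:MaxGCPauseMillis=200 -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -jar app.jar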
Why does the GC not clean up the tenured generation?
Because it doesn't need to.
It looks like your application is accumulating tenured garbage at a relatively slow rate, and there was still plenty of space for tenured objects. The "throughput" collector generally only runs when a space fills up. That is the most efficient in terms of CPU usage ... which is what the throughput collector optimizes for.
In short, the GC is working as intended.
If you are concerned by the amount of memory that is being used (because the tenured space is not being collected), you could try running the application with a smaller heap. However, the graph indicates that the application's initial behavior may be significantly different to its steady-state behavior. In other words, your application may require a large heap to start with. If that is the case, then reducing the heap size could stop the application working, or at least make the startup phase a lot slower.
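For example (the size and app.jar are placeholders; pick a maximum well below the current one and verify the startup phase still completes in acceptable time):

java -Xmx4g -jar app.jar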
I'm struggling with getting the right settings for my JVM.
Here's the use case:
Tomcat is serving requests (300 req/s), but they are very fast (key-value lookups), so I don't have any performance problems. Everything works fine until I have to refresh the data it's serving every 3 hours. You can imagine I have a big HashMap and I'm just doing lookups. During the data reload I create a temporary HashMap and then swap it in. I need to load quite a lot of data (~800MB in memory every time).
The problem I have is that during those loads, Tomcat stops responding from time to time.
Initially the problem was promotion failures and full GCs, but I got around those problems by tweaking the settings.
As you might notice, I've already decreased the value at which the CMS collector kicks in. I don't get any promotion failures or anything like that any more. The young generation is reasonably small to make the minor collections fast. I've increased the SurvivorRatio because all the request objects die young, and whatever doesn't should be automatically promoted to the old generation (the data being loaded).
But I'm still seeing 503 errors in Tomcat during the data load. In gc.log my minor collections started to become slow during this process; they now take seconds compared to milliseconds. I've tried slowing down the load process to give the GC a breather, but that doesn't seem to work...
The problem is especially bad the moment I reach the capacity of the old generation. CMS kicks in, frees up memory, and then later allocations are pretty slow. I don't see any errors in gc.log any more.
What can I do differently? I know fragmentation might be a problem, but I'm not getting promotion failures. The machine is an 8-core server. Does decreasing the number of GC threads make any sense? Would setting a lower thread priority for the data-loading thread make sense?
Is there a way to kick off the CMS collector periodically in the background? The data that's being swapped out could actually be garbage collected immediately.
I'm open to any suggestions!
Here are my JVM settings.
-Xms14g
-Xmx14g
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+AlwaysPreTouch
-XX:MaxNewSize=256m
-XX:NewSize=256m
-XX:MaxPermSize=128m
-XX:PermSize=128m
-XX:SurvivorRatio=24
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=88
-XX:+UseCompressedStrings
-XX:+DisableExplicitGC
JDK 1.6.33
Tomcat 6
gc.log snippet:
line 7 the data load starts
line 20 it stops
http://safebin.net/9124
Looking at the attached log and seeing those huge increases in minor GC times leads me to believe that your machine is under extremely heavy load from processes other than the JVM.
My reasoning is that when a minor GC takes place, all application threads are stopped. Hence, nothing your application does should be able to affect the minor GC times, given that your new gen is constant in size.
However, if there is a lot of load from other processes on the machine during this time, the GC threads will compete for execution time and you could see this behavior.
Could you check the CPU usage from other processes when your data load is running?
Edit: Looking a bit more at the logs, I've come up with another possible explanation.
It seems that the target survivor space is full (ParNew goes down to exactly 10048K on each "slow" GC). That would mean that objects are promoted to the old gen directly, which could possibly slow things down. I would try increasing the size of the new gen and lowering the survivor ratio. You could even try running without setting the new gen size or the survivor ratio at all and see how the JVM manages to optimize this (although beware that the JVM usually does a poor job of optimizing for bursts like this).
Your load lasts about 90s and is interrupted by a GC every 1s or so, yet you have a 14G heap with a steady-state occupancy (assuming the surrounding log lines are steady state) of only about 5G, which means you have a lot of memory going to waste. I think the previous answer looks to be correct (based on the data presented) when it says your survivor spaces are too small. If the app really does nothing but lookups the rest of the time, then a perfectly reasonable strategy would be something like:
tenuring threshold = 0 (or 1)
eden size > 2x the working set so maybe 1.5-2G (i.e. allow the current live data and the working copy to reside entirely in eden)
tenured = whatever is left
The point here is to try to completely avoid a young collection during the load phase. However, a tenuring threshold of 0 would mean the previous version of the data would likely be in tenured, and you'd eventually see a possibly lengthy collection to clean it up. Another option might be to go the other way round: make tenured big enough to fit 2-3 versions of the data and give eden the rest, with a view to minimising the frequency of young collections and helping tenured be collected as quickly as possible.
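A rough sketch of the first option in flag form (sizes are illustrative, assume the existing 14G heap and the ~800MB working set, and would need tuning against the real workload):

-Xms14g -Xmx14g
-XX:NewSize=2g
-XX:MaxNewSize=2g
-XX:MaxTenuringThreshold=0

That gives roughly 2G of new gen (comfortably more than 2x the working set), anything that does survive a young collection promotes straight to tenured, and the remaining ~12G is left as tenured.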
What works best really depends on what else the app is doing the rest of the time.
The CMS trigger seems quite high for a large heap, by the way; if you only start collecting at 88%, does it have time to finish the job before a full GC is forced? I suppose it might be quite safe if you're actually doing very little allocation most of the time.