I have a lot of not-very-well-written JavaScript scripts that I run in a Node.js environment. They contain memory leaks, infinite loops, and whatever else a "regular" (non-programmer) user can produce.
What I found when randomly analyzing the execution of these scripts is that some of them have a huge rss (resident set size), say around 1.0 GB, while heapTotal is "just" around 450 MB.
Despite reading blog posts about the memory layout in Node.js, I am not able to explain or simulate such a "leak". I tried to create a heap dump, but obviously I won't find what is stored in the "stack area" because I did not dump that zone.
Does anyone know what has to happen in the source code to leak all that memory while the heap stays much smaller, i.e. what would "evil source code" that eats memory outside the heap look like?
EDIT:
I found out that it's pretty simple: const c = Buffer.alloc(1024*1024*1024, 1) consumes 1 GB outside the heap. New questions arise: how can one "clean out" this space and free the memory up? How can I detect leaky buffers? Is a restart the only way?
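A minimal sketch of what I mean (the file name is arbitrary, and --expose-gc is only there for the experiment):

// buffer-rss.js - run with: node --expose-gc buffer-rss.js
const mb = (bytes) => Math.round(bytes / 1024 / 1024) + ' MB';

let c = Buffer.alloc(1024 * 1024 * 1024, 1); // backing store lives outside the JS heap
const { rss, heapTotal, external } = process.memoryUsage();
console.log({ rss: mb(rss), heapTotal: mb(heapTotal), external: mb(external) });
// rss and external jump by ~1 GB while heapTotal stays small

c = null;                      // drop the only reference so the Buffer becomes garbage
if (global.gc) global.gc();    // force a collection (only works with --expose-gc)
console.log(mb(process.memoryUsage().rss));

As far as I can tell, the memory only comes back once every reference to the Buffer is dropped and the object has been garbage-collected, so a restart is not strictly the only way; watching process.memoryUsage().external over time is one way to spot leaked Buffers.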
I am currently running node.js using pm2.
And recently, I was able to check "custom metrics" using the pm2 monit command.
Here, information such as Heap size, used heap size, and active requests are shown.
I don't know how the heap size is determined. Actually, I checked pm2 running on different servers.
Each was set to 95 MiB / 55 MiB, and accordingly, the used heap size was different.
Also, is heap usage better the closer it is to 100%?
While searching Stack Overflow for related information, I saw the following article:
What does Heap Usage mean in PM2
Also, what does "active requests" mean? It is continuously zero.
Thank you!
[Edit]
env : ubuntu18.04 [ ec2 - t3.micro ]
node version : v10.15
[Additional]
server memory : 1GB [ 40~50% used ]
cpu : vCPU (2) [ 1~2% used ]
The heap is the RAM used by the program you're asking PM2 to manage and monitor. Heap space, in JavaScript and similar language runtimes, is allocated when your program creates objects and released upon garbage collection. Your runtime asks your OS for more heap space whenever it needs it: when active allocations exceed the free space. So your heap size will probably grow as your program starts up. That's normal.
Most programs allocate and release lots of objects as they do their work, so you should not try to optimize the % usage of your heap. When your program is running at a steady state (that is, after it has started up) you'll find the % utilization creeping up until garbage collection happens, and then dropping back. For example, a nodejs/express web server allocates req and res objects for each incoming request, then uses them, then drops them so the garbage collector can reclaim their RAM.
If your allocated heap size keeps growing, over minutes or hours, you probably have a memory leak. That is a programming bug: a problem you should do your best to solve. You should look up how that works for your application language. Other than that, don't worry too much about heap usage.
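If you want a quick way to watch for that trend without extra tooling, a periodic sampler like this is enough (the interval is arbitrary):

// Log heap usage every 30 seconds; a heapUsed baseline that keeps rising across
// GC cycles (not just the saw-tooth between collections) usually points at a leak.
setInterval(() => {
  const toMb = (b) => (b / 1024 / 1024).toFixed(1);
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  console.log('rss=' + toMb(rss) + 'MB heapTotal=' + toMb(heapTotal) + 'MB heapUsed=' + toMb(heapUsed) + 'MB');
}, 30 * 1000);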
Active requests count work being done via various asynchronous objects like file writers and TCP connections. Unless your program is very busy it stays near zero.
Keep an eye on loop delay if your program does computations. If it creeps up, some computation function is hogging the JavaScript event loop.
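A rough sketch of what loop delay measures (PM2 computes its own metric internally; this just illustrates the idea):

// Schedule a timer and measure how late it fires; a consistently large delay
// means some synchronous computation is blocking the event loop.
const INTERVAL_MS = 100;
let last = Date.now();
setInterval(() => {
  const now = Date.now();
  const delay = now - last - INTERVAL_MS;   // extra time the loop was blocked
  if (delay > 50) console.warn('event loop delayed by ~' + delay + ' ms');
  last = now;
}, INTERVAL_MS);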
I am experiencing a slow memory increase in my Node process, which runs for long periods of time (~1 GB over 2 months); however, the heap stays constant, which implies that my code/stack segment is growing. I also tried to manually call the garbage collector, but memory usage remains the same.
How can I investigate this further? I want to confirm my theory and figure out why my code segment / stack is growing.
I am using Node 8 LTS (I know it is EOL as of this year; I just need to know if there's a way to figure this out).
(V8 developer here.)
Code generated by V8 is on the heap, so if the heap isn't growing, that means that code isn't growing either.
The stack size is limited by the operating system, usually to 1-8 MB. Since operating systems simply kill processes that run into the stack limit, V8 imposes an even lower limit (a little less than a megabyte, I think it's 984KB currently) onto itself, and will throw a RangeError if that's ever exceeded. So a growing stack can't be your problem either.
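A trivial way to see that limit in action, using nothing but plain unbounded recursion:

// V8 throws long before the OS stack limit is reached; the stack cannot
// silently grow by hundreds of megabytes.
function recurse(n) {
  return recurse(n + 1);
}
try {
  recurse(0);
} catch (e) {
  console.log(e instanceof RangeError, e.message); // true 'Maximum call stack size exceeded'
}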
Since you say that the heap memory reported by Node/V8 remains constant, that also means that most "how to debug memory leaks in Node" tutorials don't apply to your situation; and that probably also means that the leak is not in your (JavaScript) code.
That leaves C++ "heap memory" (which is very different from V8's managed "heap"!) as the most likely culprit. Node itself as well as native extensions can freely allocate memory that they manage themselves. Maybe something doesn't get cleaned up properly there. That could simply be an upstream bug; or it could be that something in your code is accidentally holding on to some embedder memory.
What I would try first is to update Node and any native extensions you have installed. Maybe the leak has already been found and fixed.
If that doesn't help, then you could try to investigate where the memory is going. For instance, you could compile everything from source with LSan enabled, and see if that reports anything. It would probably be helpful to construct a stress-test, e.g. a fake client that floods (a test instance of) your server with real-looking requests, to try to trigger inspectable instances of the leak in seconds or minutes rather than months. Crafting such a fake client might also help narrow down where things go wrong (e.g.: maybe you'll notice that one type of request does not trigger the leak but another type of request does).
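A very rough sketch of such a fake client, using only Node's built-in http module (the host, port, path and concurrency are placeholders you would adapt to your real request shapes):

// flood.js - hammer a test instance with concurrent requests so a slow leak
// shows up in minutes instead of months.
const http = require('http');

const TARGET = { host: 'localhost', port: 3000, path: '/some-endpoint', method: 'GET' };
const CONCURRENCY = 50;

function fire() {
  const req = http.request(TARGET, (res) => {
    res.resume();           // drain the response body
    res.on('end', fire);    // immediately send the next request
  });
  req.on('error', () => setTimeout(fire, 100)); // back off briefly on errors
  req.end();
}

for (let i = 0; i < CONCURRENCY; i++) fire();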
Currently working on optimizing a library for speed. I've already reduced execution time drastically, using V8 CPU and Memory Profiling through Webstorm. This was achieved mainly by changing the core method from recursive to iterative.
Now the self time distribution breaks down as shown in the CPU profiler screenshot.
I'm assuming the first entry, "node", is timing internal function calls, which is great. The other entries also make sense. I'm new to Node.js profiling, but 31.6% for GC seems high, so I've decided to investigate.
I've now created a heap dump through Webstorm, but unfortunately that doesn't give me much information.
These seem to be system internal memory references mainly. Stepping through the core iteration code logic again, there also don't seem to be a lot of places where memory is explicitly allocated (using this as a reference).
Question
Can the GC overhead be reduced?
Is this amount of allocation just expected here?
Is it possible to get better memory profiling information?
Setup Instructions
In case someone wants to try debugging this, I'm including setup instructions.
Download or clone object-scan and run
yarn install --frozen-lockfile
yarn run test-simple --verbose
Now create a file test.js in the project root containing this content and run node --trace_gc test.js or run it through Webstorm for advanced profiling.
In JavaScript, and in V8 (Node) in particular, the amount of time spent on garbage collection depends on the amount of data stored on the heap, but that's only one of many factors.
In the V8 engine there are two main types of GC: minor (scavenge) and major (mark-sweep/mark-compact). You can see which GC types happen during your tests in the console with --trace-gc enabled. In different cases one type may "eat" more time than the other and vice versa, so before optimizing you should determine which GC takes more time.
There are not a lot of options for optimizing major GC, because it is mostly affected by the amount of data that stays in memory for a "long" period (in this context, "long" means the object survives a scavenge GC). Such data is stored in the so-called "old space" of the heap. Major GC works with this space: it has to scan all that memory and mark objects that no longer have any references for later clearance.
In your case the amount of test data you're loading goes to old space. As a result it affects major GC during the whole test. In this case major GC will not clear much, because you're still using your test object, but it still takes time to scan the entire old space. So you may consider preventing V8 from doing that by launching Node with GC-specific flags such as --nouse-idle-notification --expose-gc --gc_interval=100500 (where 100500 is the number of allocations; it can be set to a high value so GC is not triggered before the whole test has run), which lets you trigger garbage collection manually. Test your code with this approach and see how major GC affects it, and try tests with different amounts of data passed to the function. If the impact is quite high, you may try to refactor your code to minimize long-lived variables, closures, etc.
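As a sketch of that manual-GC approach (the flags are the ones mentioned above, and runBenchmark is a placeholder for the actual object-scan test):

// Run as: node --expose-gc --nouse-idle-notification --gc_interval=100500 bench.js
function runBenchmark() {
  // placeholder for the code from test.js
}

global.gc();                 // start from a clean heap (--expose-gc makes global.gc available)
const start = Date.now();
runBenchmark();
console.log('test took', Date.now() - start, 'ms');
global.gc();                 // collect only after the measurement is done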
If you discover that major GC doesn't have much impact on performance, then scavenge GC is taking most of the time. Unlike major GC it operates on the so-called "new space" in the heap. That's the space where all new objects are stored. If those objects survive a scavenge, they are moved to old space. New space is much smaller than old space (you can control it by setting --max_semi_space_size; note: new space size = 2 * semi space size), and the more new objects and variables you allocate, the more scavenge GC runs will happen. If this GC hurts performance too much, you may consider refactoring your code to make fewer new allocations. But if you reuse variables, that may also slow down performance, and those objects will go to old space and may become the problem described in the "major GC" section.
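To see how allocation pressure drives scavenges, a tiny experiment like the following, run with --trace-gc, makes the effect visible (the sizes are arbitrary):

// Run as: node --trace-gc --max_semi_space_size=16 alloc.js
// Every iteration creates short-lived objects; --trace-gc prints one line per
// scavenge, and changing the semi space size changes how often they happen.
const survivors = [];
for (let i = 0; i < 1e6; i++) {
  const tmp = { i, payload: 'x'.repeat(64) }; // short-lived, dies in new space
  if (i % 10000 === 0) survivors.push(tmp);   // a few objects survive into old space
}
console.log('kept', survivors.length, 'objects');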
Also, V8's GC doesn't always run in the same thread as your program. It does some work in the background too, but I don't know what Webstorm shows in your case. If it counts just the total time spent in GC, maybe it doesn't have that much impact.
You may find more details on v8 GC in this blog post.
TL;DR:
Can the GC overhead be reduced?
Yes, but first you should discover what needs to be optimized by following the steps above.
Is this amount of allocation just expected here?
That can only be discovered by comparing different approaches. There is no absolute number that separates a "good" amount from a "bad" one, because it depends on a lot of factors, including the amount of input data.
Is it possible to get better memory profiling information?
You may find some good tools here, but in general you can use the Chrome dev tools, which provide a bit more detail than Webstorm does.
I have an application written in Node using many features such as the cluster module.
I need to know the memory usage of my app at a specific time. What I am thinking of is looping through the active workers and summing the output of all of them, but I don't know if the resulting value will be correct. Can anyone here help me?
In fact, I also can't figure out the true meanings of the three values "rss", "heapTotal", and "heapUsed". I googled it, and what I found is that the important ones to monitor are "heapTotal" and "heapUsed"; is this correct?
RSS is the resident set size, the portion of the process's memory held in RAM (as opposed to the swap space or the part held in the filesystem).
The heap is the portion of memory from which newly allocated objects come (think of malloc in C, or new in JavaScript).
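For concrete numbers, process.memoryUsage() reports all of these fields in bytes:

const usage = process.memoryUsage();
console.log({
  rss: usage.rss,             // everything held in physical RAM for this process
  heapTotal: usage.heapTotal, // memory V8 has reserved for the JS heap
  heapUsed: usage.heapUsed,   // part of heapTotal currently occupied by live objects
  external: usage.external,   // Buffers and other memory managed outside the V8 heap
});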
Good Tutorial
More about the heap on Wikipedia.
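As for summing usage across cluster workers, here is a rough sketch (the 'report-memory' / 'memory-usage' message names are made up) in which each worker sends its own process.memoryUsage() to the primary, which adds up the rss values:

// cluster-memory.js
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  const perWorker = {};
  for (const id in cluster.workers) {
    const worker = cluster.workers[id];
    worker.on('message', (msg) => {
      if (msg.type === 'memory-usage') {
        perWorker[id] = msg.usage.rss;
        const total = Object.values(perWorker).reduce((a, b) => a + b, 0);
        console.log('combined worker rss:', total, 'bytes');
      }
    });
    worker.send({ type: 'report-memory' }); // ask the worker to report
  }
} else {
  process.on('message', (msg) => {
    if (msg.type === 'report-memory') {
      process.send({ type: 'memory-usage', usage: process.memoryUsage() });
    }
  });
}

Note that simply summing rss across processes over-counts pages shared between the workers, so treat the total as an upper bound rather than an exact figure.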
Happy Helping!
I am trying to wrap my head around the internals of mongodb, and I keep reading about this
http://www.theroadtosiliconvalley.com/technology/mongodb-mongo-nosql-db/
Why does this happen?
So the way memory-mapped files work is that addresses in memory are mapped byte for byte to a file on disk. This makes it really fast, but also really large. Imagine a file on disk for your data taking up that much memory.
Why it's awesome
In practice, this rocks because writing and reading from memory directly, instead of issuing a system call (think context switch), is fast. Also, in practice, the fact that this huge memory-mapped chunk doesn't fit in your physical RAM is fine. Why? You only need the working set of data to fit in RAM, because unused pages are not loaded and are just kept on disk. If they are needed, a page fault happens and the page gets loaded. (I believe the portion that has been loaded is referred to as resident memory.)
Why it kind of sucks
Files mapped in memory need to be page-aligned, so if you don't use up the memory space exactly on a page boundary, you waste space (a small tradeoff).
Summary (tl;dr)
It may look like it's taking up a lot of resources because it's mapping the entirety of your data to memory addresses, but it doesn't really matter, as that data isn't actually all being held in RAM. Mongo will pull in data as it needs it and use memory effectively to maintain a performant working set.