I would like to compute the peak memory usage of my OCaml program when it is running in compiled form as native code. I considered using the stats API in the Gc module, but it seems to return a snapshot at the time it is called. Is there some information in the Gc module or some other module that I can use to get peak heap usage just before my program terminates?
You can get the current size of the live data in the major heap using Gc.stat: see the live_words field, and multiply it by the word size in bytes (8 on a 64-bit system) to get the size in bytes. It doesn't really matter much, but you can also add the size of the minor heap, which is available via Gc.get (): see the minor_heap_size field (again in words).
You can create an alarm with Gc.create_alarm that checks the size of the heap after each major collection, and record the maximum size ever seen.
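For example, here is a minimal sketch of that approach (the peak_live_words reference and the printed message are illustrative, not part of the Gc API):

    (* Track the largest number of live words seen at the end of any major GC. *)
    let peak_live_words = ref 0

    let () =
      (* The alarm callback runs after each major collection finishes. *)
      let _alarm =
        Gc.create_alarm (fun () ->
            let live = (Gc.stat ()).Gc.live_words in
            if live > !peak_live_words then peak_live_words := live)
      in
      (* Report the peak, converted to bytes, when the program exits. *)
      at_exit (fun () ->
          Printf.printf "Peak major-heap live data: %d bytes\n"
            (!peak_live_words * (Sys.word_size / 8)))

Note that this only samples at the end of major collections, so allocation spikes between collections will not be captured.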
Related
I am trying to check whether the firmware for an embedded intercom, running on an embedded Linux board, has a memory leak.
Using the top command, I was surprised to see that the resident memory keeps growing, especially at the beginning when I start the process. After a while the memory consumption becomes more stable.
So I made an experiment by adding some code that allocates and frees a 2K memory block (a sketch of the experiment is shown below). What I found is that when I allocate the memory the first time, the RES value shown by top increases, but when I free it, RES does not decrease.
In any case, if I call malloc a second (or third, ...) time to get a new 2K block, RES does not increase any further.
Even if I wait for some time, the RES value never decreases.
It looks like top reports the maximum amount of memory ever used, and not the memory in use at a given moment.
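The experiment was roughly like this (a simplified sketch; the block size and the sleep calls are just there to give me time to read the RES column in top):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        /* Allocate a 2K block and touch it so its pages are actually mapped. */
        char *block = malloc(2048);
        if (block == NULL)
            return 1;
        memset(block, 0xAA, 2048);
        puts("allocated 2K - check RES in top");
        sleep(30);

        /* Free the block and give top time to show whether RES drops. */
        free(block);
        puts("freed 2K - check RES in top again");
        sleep(30);
        return 0;
    }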
Is there an option for Node.js to increase the initially allocated memory?
https://futurestud.io/tutorials/node-js-increase-the-memory-limit-for-your-process
The --max-old-space-size flag seems to increase the maximum memory, but what about the initial memory?
Kind of like -Xmx and -Xms for the JVM.
V8 developer here. The short answer is: no.
The reason no such option exists is that adding fresh pages to the heap is so fast that there is no significant benefit to doing it up front.
V8 does have a flag --initial-old-space-memory, but it doesn't increase the initial allocation. Instead, what it means is "don't bother doing (old-space) GC while the heap size is below this limit". If you set that to, e.g., 1000 (MB), and then allocate 800MB of unreachable objects, and then just wait, then V8 will sit around forever with 800MB of garbage on the heap and won't lift a finger to get rid of any of that.
I'm not sure in what scenario this behavior would be useful (it's not like it will turn off GC entirely; GC will just run less frequently, but fewer GCs on a bigger heap don't necessarily add up to less total time than more GCs on a smaller heap), so I would strongly recommend measuring the effect on your particular workload carefully before using this flag -- if it were a good idea to have this on by default, then it would be on by default!
If I had to guess: this flag might be beneficial if you know that (1) your application will have a large amount of "eternal" (=lives as long as the app is running) data on the heap, and (2) you can estimate the amount of that data with reasonable accuracy. E.g.: if you know that at any given time, your old-space will consist of 500MB of always-reachable-anyway data plus any potentially-freeable-garbage, you could use this flag to tell V8 "if old-space size is below 600MB (=500MB plus a little), then don't bother trying to find garbage, it won't be worth the effort".
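For concreteness, this is roughly how those flags are passed on the command line (app.js and the numbers, which are in MB, are just placeholders; and remember that the second flag only delays GC below the given limit, it does not reserve memory up front):

    node --max-old-space-size=4096 app.js
    node --max-old-space-size=4096 --initial-old-space-size=1000 app.js

Whether the second flag is worth setting at all is exactly the judgment call described above.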
As per this page (http://www.timestored.com/kdb-guides/memory-management), for versions 2.6 and 2.7:
2.6: Unreferenced memory blocks over 32MB/64MB are returned immediately
2.7: Unreferenced memory blocks are returned when memory is full or .Q.gc[] is called
But in both versions, there is a significant difference between the used and heap space shown by .Q.w[], and this difference only grows as I run the function again. In 2.6, a difference could occur due to fragmentation (allocating many small objects), but I am not confident that it accounts for a difference this large. In 2.7, even after running .Q.gc[], it still shows a significant difference. I would like to understand, fundamentally, the reason for this difference in the two versions, as highlighted below.
This is the behavior I am seeing in 2.6 and 2.7:
2.6:
used| 11442889952
heap| 28588376064
2.7 (after running .Q.gc[]):
used| 11398025856
heap| 16508780544
Automatic garbage collection doesn't clear small objects (<32MB); in that case a manual GC call is required. If your process has a lot of unreferenced small objects, they will add to the heap size but not to the used size.
Second, since KDB allocates memory in powers of 2, that also creates a difference between used and heap memory. For example, if a vector requires 64000 bytes, it will be assigned a memory block of 2^16 = 65536 bytes. Boundary cases make this difference large: for example, if a vector requires 33000 bytes (just over 2^15), it will still be allocated a 65536-byte (2^16) block.
The following site has a good explanation of the GC behavior:
http://www.aquaq.co.uk/q/garbage-collection-kdb/
I'm benchmarking the memory consumption of a Haskell program compiled with GHC. In order to do so, I run the program with the following command-line arguments: +RTS -t -RTS. Here's an example output:
<<ghc: 86319295256 bytes, 160722 GCs, 53963869/75978648 avg/max bytes residency (386 samples), 191M in use, 0.00 INIT (0.00 elapsed), 152.69 MUT (152.62 elapsed), 58.85 GC (58.82 elapsed) :ghc>>.
According to the ghc manual, the output shows:
The total number of bytes allocated by the program over the whole run.
The total number of garbage collections performed.
The average and maximum "residency", which is the amount of live data in bytes. The runtime can only determine the amount of live data during a major GC, which is why the number of samples corresponds to the number of major GCs (and is usually relatively small).
The peak memory the RTS has allocated from the OS.
The amount of CPU time and elapsed wall clock time while initialising the runtime system (INIT), running the program itself (MUT, the mutator), and garbage collecting (GC).
Applied to my example, this means that my program allocates a total of 82321 MiB (bytes divided by 1024^2) over its whole run, performs 160722 garbage collections, has a 51 MiB / 72 MiB average/maximum memory residency, uses at most 191M of memory in RAM, and so on...
Now I want to know how »The average and maximum "residency", which is the amount of live data in bytes« relates to »The peak memory the RTS has allocated from the OS«. And also: what uses the remaining space of roughly 120M?
I was pointed here for more information, but that does not clearly state what I want to know. Another source (5.4.4, second item) hints that the 120M of memory is used for garbage collection, but that is too vague; I need a quotable source.
So please, is there anyone who could answer my questions, with good sources as proof?
Kind regards!
The "resident" size is how much live Haskell data you have. The amount of memory actually allocated from the OS may be higher.
The RTS allocates memory in "blocks". If your program needs 7.3 blocks of RAM, the RTS has to allocate 8 blocks, 0.7 of which is empty space.
The default garbage collection algorithm is a 2-space collector. That is, when space A fills up, it allocates space B (which is totally empty) and copies all the live data out of space A and into space B, then deallocates space A. That means that, for a while, you're using 2x as much RAM as is actually necessary. (I believe there's a switch somewhere to use a 1-space algorithm which is slower but uses less RAM.)
There is also some overhead for managing threads (especially if you have lots of them), and there might be a few other things. To put rough numbers on your example: with a maximum residency of about 72 MiB, the two-space copying scheme alone can briefly need on the order of 2 × 72 ≈ 144 MiB during a major collection, and block rounding plus the other overheads mentioned here plausibly account for much of the remaining gap up to the 191M peak.
I don't know how much you already know about GC technology, but you can try reading these:
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel-gc/par-gc-ismm08.pdf
http://www.mm-net.org.uk/workshop190404/GHC%27s_Garbage_Collector.ppt
I've got a node.js application where the RSS memory usage seems to keep growing despite the heapUsed/heapTotal staying relatively constant.
Here's a graph of the three memory measurements taken over a week (from process.memoryUsage()):
You may note that there's a somewhat cyclical pattern - this corresponds with the application's activity throughout each day.
There actually does seem to be a slight growth in the heap, although it's nowhere near that of the RSS growth. So I've been taking heap dumps every now and then (using node-heapdump), and using Chrome's heap compare feature to find leaks.
One such comparison might look like the following (sorted by size delta in descending order):
What actually shows up does depend on when the snapshot was taken (e.g. sometimes more Buffer objects are allocated, etc.) - here I've tried to take a sample which demonstrates the issue best.
The first thing to note is that the sizes on the left side (203MB vs 345MB) are much higher than the heap sizes shown in the graph. Secondly, the size deltas clearly don't match up with the 142MB difference. In fact, sorting by size delta in ascending order shows that many objects have been deallocated, which means that the heap should be smaller!
Does anyone have any ideas on:
Why is this the case (RSS constantly growing with a stable heap size)?
How can I stop this from happening, short of restarting the server every now and then?
Other details:
Node version: 0.10.28
OS: Ubuntu 12.04, 64-bit
Update: list of modules being used:
async v0.2.6
log4js v0.6.2
mysql v2.0.0-alpha7
nodemailer v0.4.4
node-time v0.9.2 (for timezone info, not to be confused with nodetime)
sockjs v0.3.8
underscore v1.4.4
usage v0.3.9 (for CPU stats, not used for memory usage)
webkit-devtools-agent v0.2.3 (loaded but not activated)
heapdump v0.2.0 is loaded when a dump is made.
Thanks for reading.
The difference you see between the RSS usage and the heap usage is Buffers.
"A Buffer is similar to an array of integers but corresponds to a raw memory allocation outside the V8 heap"
https://nodejs.org/api/buffer.html#buffer_buffer
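A quick way to see this effect for yourself (a sketch using the current Buffer.alloc API; on Node 0.10 you would use new Buffer(size) instead):

    // Allocate 100 MiB of Buffer data and compare process.memoryUsage()
    // before and after.  The Buffer contents live outside the V8 heap, so
    // rss grows by roughly 100 MiB while heapUsed barely moves.
    const before = process.memoryUsage();

    const buffers = [];
    for (let i = 0; i < 100; i++) {
      buffers.push(Buffer.alloc(1024 * 1024)); // 1 MiB each, zero-filled
    }

    const after = process.memoryUsage();
    console.log('rss delta:      ', after.rss - before.rss);
    console.log('heapUsed delta: ', after.heapUsed - before.heapUsed);

If Buffers like these accumulate (or are held on to, for example by pending I/O), RSS keeps growing even though the heap numbers stay flat.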