I have a program which I am running with something like
stack exec -- foo +RTS -A256M -H256M -M1G -S -RTS
The -S flag causes lines like the following to be printed:
208797032 75072552 114619448 0.356 0.028 12.876 16.122 0 0 (Gen: 1)
This apparently means that the RTS thinks the program's live byte usage is around 100MB. However, htop reports that the program is pegging over 1.4GB(!) of resident memory.
First of all, how come the resident memory is going past the max heap size that I set?
Second, what is causing this discrepancy and how can I keep the resident memory usage down?
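For comparison with what the RTS reports, the kernel's view of resident memory can be read straight from /proc (Linux-specific; this is the same number htop shows in the RES column, modulo units). A sketch, run here against the shell's own status file — substitute /proc/&lt;pid&gt;/status for the process in question:

```shell
# Read the resident set size from the kernel's own accounting (in kB)
rss_kb=$(awk '/^VmRSS/ {print $2}' /proc/self/status)
echo "VmRSS: ${rss_kb} kB"
```

Comparing this number against the RTS's live-bytes figure over time shows how much of the gap is RTS overhead versus genuine live data.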
Related
I am trying to test for a space leak.
I want to limit memory so that OOM happens sooner.
I see in htop my process starts with RES=260m.
I set +RTS -M265m, but the process takes more than that (274m in htop) and nothing happens. Where is the OOM?
How can I reduce available free memory to a certain level for a specific duration?
I have 16GB of memory in total, of which more than 90% is free. I want to reduce free memory to 5% and hold it there for about 120 seconds.
To check memory usage:
vmstat -s
To reduce free memory:
tail /dev/zero
(tail buffers everything it reads, and /dev/zero never ends, so this consumes memory until the OOM killer steps in.)
You can use stress-ng.
This one-line command will use 90% of your available memory.
stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1
Change the 0.9 to any other fraction (it must remain a decimal); it signifies the percentage of available memory to use.
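Before handing the value to stress-ng, the awk fragment can be sanity-checked on its own (Linux-specific; field 2 of the MemAvailable line in /proc/meminfo is in kB):

```shell
# Compute 90% of currently-available memory, in kB
target_kb=$(awk '/^MemAvailable/ {printf "%d", $2 * 0.9}' /proc/meminfo)
echo "stress-ng would be asked to pin ${target_kb} kB"
```

For the 120-second requirement in the question, stress-ng's --timeout option (e.g. --timeout 120s) bounds the run.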
I used a debug build of Chrome on Linux for my experiment.
I ran the Chrome browser (debug build) against a web page that loads lots of images.
To validate, I used "cat /proc/render-pid/statm" to check the memory.
To understand the allocation pattern, I ran with "valgrind --tool=massif --vgdb=yes --vgdb-error=0".
I see the RSS (2nd value in statm) increase from an initial 133129 pages (before loading images) to 176562 pages (after loading images), and the Valgrind snapshot shows the memory allocated (more than 140MB). If you calculate from statm, it is around 165M (excluding shared memory).
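For reference, the "around 165M" figure falls out of the statm readings quoted further down (a sketch, assuming the usual 4 KiB page size and using the peak RSS of 175355 pages from the statm dump):

```shell
# RSS delta between the peak (175355) and initial (133129) statm
# readings, in 4 KiB pages, converted to KiB
delta_pages=$(( 175355 - 133129 ))
echo "RSS growth: $(( delta_pages * 4 )) KiB"   # 168904 KiB, i.e. ~165 MiB
```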
Next I freed every image (cleared the internal caches).
Now the Valgrind snapshot shows only 4MB of memory.
The statm RSS value dropped to 161916 pages.
(I agree that Valgrind itself might take memory, but I see similar behavior when running without it.)
What I couldn't work out is:
statm shows that the total allocated on the heap is around 165M, whereas Valgrind shows 140M.
After freeing, Valgrind shows around 4M, whereas statm shows an increase of 27 over the starting value.
So how do I find where the memory is allocated?
NOTE:
Valgrind doesn't show memory allocated via mmap and sbrk (with the parameters I passed). But as I understand it (from a simple test program), allocations via mmap/sbrk increase "data", the 6th value in the statm output.
Hence I have ruled out that possibility.
Also, I have not used tcmalloc.
Since I am running Chrome under Valgrind, I see a humongous amount of memory allocated, so generally I look only at differences.
I calculated the private memory as RSS minus shared memory (2nd minus 3rd value in the statm output).
The values from statm output:
Initial Value : 370434 133129 9450 918 0 289639 0
Peak Value: 412588 175355 10145 918 0 331197 0
End Value (After freeing): 407473 161916 9573 918 0 326678 0
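The private-memory arithmetic used above (RSS minus shared, fields 2 and 3 of statm, both counted in pages) can be scripted; a sketch, run here against the shell's own statm rather than the Chrome renderer's:

```shell
# /proc/<pid>/statm fields, all in pages: size resident shared text lib data dt
read -r size resident shared text lib data dt < /proc/self/statm
page_kb=$(( $(getconf PAGESIZE) / 1024 ))
echo "RSS:     $(( resident * page_kb )) kB"
echo "private: $(( (resident - shared) * page_kb )) kB"
```

Pointing it at /proc/&lt;render-pid&gt;/statm reproduces the initial/peak/end calculations above without hand arithmetic.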
I'm benchmarking the memory consumption of a Haskell program compiled with GHC. To do so, I run the program with the following command-line arguments: +RTS -t -RTS. Here's an example output:
<<ghc: 86319295256 bytes, 160722 GCs, 53963869/75978648 avg/max bytes residency (386 samples), 191M in use, 0.00 INIT (0.00 elapsed), 152.69 MUT (152.62 elapsed), 58.85 GC (58.82 elapsed) :ghc>>.
According to the ghc manual, the output shows:
The total number of bytes allocated by the program over the whole run.
The total number of garbage collections performed.
The average and maximum "residency", which is the amount of live data in bytes. The runtime can only determine the amount of live data during a major GC, which is why the number of samples corresponds to the number of major GCs (and is usually relatively small).
The peak memory the RTS has allocated from the OS.
The amount of CPU time and elapsed wall clock time while initialising the runtime system (INIT), running the program itself (MUT, the mutator), and garbage collecting (GC).
Applied to my example, it means that my program allocates roughly 82320 MiB in total (bytes divided by 1024^2) over the whole run, performs 160722 garbage collections, has a 51MiB/72MiB average/maximum memory residency, holds at most 191M of memory from the OS, and so on ...
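The MiB conversions are easy to double-check with shell arithmetic (integer division, so each figure is truncated rather than rounded):

```shell
# Convert the +RTS -t byte counts to MiB (1 MiB = 1024 * 1024 bytes)
echo "total allocated: $(( 86319295256 / 1024 / 1024 )) MiB"   # 82320
echo "avg residency:   $(( 53963869 / 1024 / 1024 )) MiB"      # 51
echo "max residency:   $(( 75978648 / 1024 / 1024 )) MiB"      # 72
```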
Now I want to know: how does »the average and maximum "residency", which is the amount of live data in bytes« relate to »the peak memory the RTS has allocated from the OS«? And also: what uses the remaining space of roughly 120M?
I was pointed here for more information, but that does not clearly state what I want to know. Another source (5.4.4, second item) hints that the 120M is used for garbage collection. But that is too vague – I need a quotable source.
So please, is there anyone who could answer my questions with good sources as proof?
Kind regards!
The "resident" size is how much live Haskell data you have. The amount of memory actually allocated from the OS may be higher.
The RTS allocates memory in "blocks". If your program needs 7.3 blocks of RAM, the RTS has to allocate 8 blocks, 0.7 of which is empty space.
The default garbage collection algorithm is a 2-space copying collector. That is, when space A fills up, it allocates space B (which is totally empty), copies all the live data out of space A into space B, then deallocates space A. That means that, for a while, you're using 2x as much RAM as is actually necessary. (The +RTS -c flag switches the oldest generation to a compacting algorithm, which is slower but uses less RAM.)
There is also some overhead for managing threads (especially if you have lots), and there might be a few other things.
I don't know how much you already know about GC technology, but you can try reading these:
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel-gc/par-gc-ismm08.pdf
http://www.mm-net.org.uk/workshop190404/GHC%27s_Garbage_Collector.ppt
I read that malloc allocates memory on the heap, and that the heap resides in the virtual memory of the OS (Linux). So I have a few doubts:
If a process that allocated memory with malloc is terminated, either by kill or by exiting on its own, without deallocating that memory, will the OS deallocate it after termination, since it is in virtual memory?
How can I know the heap size in Linux?
How can I change the heap size in Linux?
Will that memory be deallocated by the OS after the termination?
Yes it will, but I wouldn't really call that deallocation (as in, no one will be calling free() after all your allocations); what happens is that the virtual address space assigned to your process (including the stack, the heap, the code, .bss, and any other segment) is simply torn down by the OS, so any physical memory that was mapped into your process's virtual memory becomes usable by anyone else (without the need to swap in/out).
For more information about that, read this excellent article.
How can I know the heap size in Linux?
ulimit -m
How can I change the heap size?
ulimit -S -m X (where X is the limit in kilobytes). Note that -m controls the resident-set-size limit (RLIMIT_RSS), which modern Linux kernels do not enforce; to cap the heap specifically, use ulimit -S -d (data segment size), or ulimit -S -v for total virtual memory.
For a more thorough explanation, visit this SO question.
The memory allocated to a process is freed when it terminates, gracefully or otherwise. To check/set the limit, use ulimit:
ulimit -m # shows the per-process limit, in KiB
ulimit -S -m 1000 # set the limit to 1000 * 1024 bytes
ulimit -S -m unlimited # remove the limit
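One way to see a limit actually bite is to cap virtual memory with -v (which Linux does enforce) inside a subshell; a sketch, assuming GNU dd, which allocates its whole block size as a single buffer:

```shell
( ulimit -v 102400                       # cap virtual memory at ~100 MB
  # GNU dd mallocs one bs-sized buffer, so this needs ~200 MB at once
  dd if=/dev/zero of=/dev/null bs=200M count=1 2>/dev/null \
    && echo "allocation succeeded" \
    || echo "allocation failed" )
```

The parentheses matter: the lowered limit dies with the subshell, so the interactive shell is unaffected.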