massif reported heap usage much less than VmRSS, what could be wrong? - linux

massif output:
time=3220706
mem_heap_B=393242041
mem_heap_extra_B=73912175
mem_stacks_B=93616
heap_tree=peak
The process shows 1.2GB in VmRSS, so where does the huge difference come from? (I saw RSS grow continuously.)

Per the Massif manual (http://cs.swan.ac.uk/~csoliver/ok-sat-library/internet_html/doc/doc/Valgrind/3.8.1/html/ms-manual.html):
Heap allocation functions such as malloc are built on top of these system calls. For example, when needed, an allocator will typically call mmap to allocate a large chunk of memory, and then hand over pieces of that memory chunk to the client program in response to calls to malloc et al. Massif directly measures only these higher-level malloc et al calls, not the lower-level system calls.
There is no way to determine RSS size from massif output. With the --pages-as-heap=yes option you may be able to estimate VIRT size, but that includes everything that was mapped into memory, not necessarily what is resident in RAM.
You may want to experiment with the --alloc-fn option, which may bring you closer to estimating real memory usage by manually specifying all "custom" memory allocation functions.

Valgrind can use significant memory for its own internal housekeeping. So it is normal for massif to report memory significantly less than the process size, as the process size includes the 'client/guest' memory plus Valgrind's own memory.
You can use the valgrind option --stats=yes to have more information about the memory used by the client versus the memory used by valgrind.

Related

Linux: Tracking allocated memory over time

I want to generate a graph of the allocated memory for a particular PID over time for which I am currently using a custom script that uses an strace log. From the strace log, I am aggregating the memory allocation changes from mmap, munmap, and, brk system calls.
I was wondering, however, if there is a better and more mature solution to do this (measure/graph the lifetime of memory allocations for a process)?
I believe what you are looking for is Valgrind's Massif tool combined with massif-visualizer (a separate KDE application, not part of Valgrind itself), which graphs a process's memory allocations over time and is still actively maintained.

GNU malloc_info(): get really allocated memory?

I'm trying to investigate the memory use of a large multi-threaded server. According to mallinfo(), I get arena=350M and fordblks=290M, which suggests most of the space is actually wasted inside malloc(). The malloc_info() function gives a nice XML data structure that is supposed to be self-explanatory. Still, can someone explain to me:
Is heap 0 special? Is that the main arena in which all others reside?
Are the <size from=.../> chunks allocated, free/available or both?
What is the <system> element? Memory allocated using mmap()/sbrk()?
What is the <aspace> element? Available memory?
What about <aspace type="mprotect" .../>?
Just for a start, I'd like to be able to plot total memory allocated by the application, i.e., everything allocated and not yet freed, according to what malloc() thinks.
A large amount of virtual memory usage is not necessarily a problem. The default malloc implementation will allocate large amounts of storage per thread in order to avoid contention issues. This happens particularly on 64-bit implementations which are pretty common nowadays. You should not worry unless you experience problems with the size of resident memory or you get paging problems.
Kevin Grigorenko has written a number of blog posts which deal with memory usage in relation to WebSphere, but they are applicable to any large multi-threaded process.

How to determine the real memory usage of a single process?

How can I calculate the real memory usage of a single process? I am not talking about virtual memory, because it just keeps growing. For instance, there are proc files like smaps, where you can get the mappings of a process. But this is virtual memory, and the values in that file just keep growing for a running process. I would like to measure the real memory usage of a process: if you plot it, the plot should reflect both allocating and freeing memory, i.e. it should move up and down instead of being a linear function that keeps growing for as long as the process runs.
So, how could I calculate the real memory usage? I would appreciate any helpful answer.
It's actually kind of a complicated question. The two most common metrics for a program's memory usage at the OS level are virtual size and resident set size. (These show in the output of ps -u as the VSZ and RSS columns.) Roughly speaking, these tell the total memory the program has assigned to it, versus how much it is currently actively using.
Further complicating the question is that when you use malloc (or the C++ new operator) to allocate memory, the memory comes from a pool inside your process, which is built by occasionally requesting a larger block from the operating system. When you free memory, it goes back into this pool, but it is typically not returned to the OS. So as your program allocates and frees memory, you typically will not see its memory footprint go up and down. (However, if it frees a lot of memory and then doesn't allocate it again, you may eventually see its RSS go down.)

How much memory did Linux give to malloc()?

This is a Linux system question, not a coding question. When I use "top" to check the memory usage of my program, it reports a value 3-4 times as large as the actual heap allocation as given by Valgrind's Massif, a memory profiler. It's a large program, and the difference is hundreds of megabytes. The Valgrind manual gives only a partial explanation:
(Massif) does not directly measure memory allocated with lower-level system calls such as mmap, mremap, and brk. Heap allocation functions such as malloc are built on top of these system calls. For example, when needed, an allocator will typically call mmap to allocate a large chunk of memory, and then hand over pieces of that memory chunk to the client program in response to calls to malloc et al. Massif directly measures only these higher-level malloc et al calls, not the lower-level system calls.
Fine, but how much memory am I really taking away from the system? I need to be able to run as many instances of this program as possible on one machine, so I need to know how much of that memory is still available. Page alignment etc. cannot explain a difference of hundreds of megabytes in reported memory usage.
Also, what determines the block size of the underlying mmap() call? I'm seeing blocks of 64MB at a time being taken according to top, which seems bizarrely large.
Most malloc implementations are optimised for applications with huge memory requirements, because apps with low requirements run just fine anyway, and virtual memory is cheap.
For example, you will find malloc implementations that use a block of memory for up to 1024 mallocs of up to 16 bytes, another block for up to 1024 mallocs of up to 32 bytes, and so on. With a few mallocs this is inefficient but still cheap. With gazillions of mallocs, it makes malloc very efficient.
So saying "4 times as much" can be completely pointless. Tell us how many megabytes more than you thought.

How is memory allocated on heap without a system call?

I was wondering: if the space required on the heap is small enough that there is no need for a brk/sbrk system call (to shift the break pointer (brk) of the data segment), how does a library function such as malloc allocate space on the heap?
I am not asking about the data structures and algorithms for heap management. I am just asking how malloc gets the address of the first location of the heap if it doesn't invoke a system call. I ask because I have heard that it is not always necessary to invoke a system call (brk/sbrk), as these are only required to expand the space. Please correct me if I am wrong.
The basic idea is that when your program starts, the heap is very small, but not necessarily zero. If you only allocate (malloc) a small amount of memory, the library is able to handle it within the small amount of space it has when it is loaded. However, when malloc runs out of that space, it needs to make a system call to get more memory.
That system call is often sbrk(), which moves the top of the heap's memory region up by a certain amount. Usually, the malloc library routine grows the heap by more than the current allocation needs, in the hope that future allocations can be served without a system call.
Other implementations of malloc use mmap() instead -- this allows the program to create a sparse virtual memory mapping. However, mmap() based malloc implementations do the same thing as the sbrk()-based ones: each system call reserves more memory than what is necessarily needed for the current call.
One way to see this is to trace a program that uses malloc: for N calls to malloc, you will see only M system calls, where M is much smaller than N.
The short answer is that it uses sbrk() to allocate a big hunk, which at that point belongs to your app process. It can then parcel out subsections of that hunk as individual malloc calls without asking the system for anything, until it exhausts that space and needs to resort to sbrk() again.
You said you didn't want the details on the data structures, but suffice it to say that the implementation of malloc (i.e. your own process, not the OS kernel) is keeping track of which space in the region it got from the system is spoken for and which is still available to dole out as individual mallocs. It's like buying a big tract of land, then subdividing it into lots for individual houses.
Use sbrk() or mmap() — http://linux.die.net/man/2/sbrk, http://linux.die.net/man/2/mmap
