How much memory did Linux give to malloc()? - linux

This is a Linux system question, not a coding question. When I use "top" to check the memory usage of my program, it reports a value 3-4 times as large as the actual heap allocation as given by Valgrind's Massif, a memory profiler. It's a large program, and the difference is hundreds of megabytes. The Valgrind manual gives only a partial explanation:
(Massif) does not directly measure memory allocated with
lower-level system calls such as mmap, mremap, and brk.
Heap allocation functions such as malloc are built on top of these
system calls. For example, when needed, an allocator will typically
call mmap to allocate a large chunk of memory, and then hand over
pieces of that memory chunk to the client program in response to calls
to malloc et al. Massif directly measures only these higher-level
malloc et al calls, not the lower-level system calls.
Fine, but how much memory am I really taking away from the system? I need to be able to run as many instances of this program as possible on one machine, so I need to know how much of that memory is still available. Page alignment etc. cannot explain a difference of hundreds of megabytes in reported memory usage.
Also, what determines the block size of the underlying mmap() call? I'm seeing blocks of 64MB at a time being taken according to top, which seems bizarrely large.

Any malloc implementation will be optimised for applications with huge memory requirements, because apps with low requirements run just fine anyway, and virtual memory is cheap.
For example, you will find malloc implementations that use a block of memory for up to 1024 mallocs of up to 16 bytes, another block for up to 1024 mallocs of up to 32 bytes, and so on. With a few mallocs this is inefficient but still cheap. With gazillions of mallocs, it makes malloc very efficient.
So saying "4 times as much" can be completely pointless. Tell us how many megabytes more than you thought.

Related

Why does the Linux kernel require small short-term memory chunks in odd sizes?

I'm reading Operating System: Internals and Design Principles by William Stallings, 7th edition. In section 8.4 Linux Memory Management, when talking about kernel memory management, it goes like:
The foundation of kernel memory allocation for Linux is the page allocation
mechanism used for user virtual memory management. As in the virtual memory
scheme, a buddy algorithm is used so that memory for the kernel can be allocated
and deallocated in units of one or more pages. Because the minimum amount of
memory that can be allocated in this fashion is one page, the page allocator alone
would be inefficient because the kernel requires small short-term memory chunks
in odd sizes.
I could understand the discuss on paging, but why does the author says that the kernel requires small short-term memory chunks
in odd sizes., especially, why in odd sizes?
Because most programs require small allocations, for relatively short periods, in a variety of sizes? That's why malloc and friends exist: To subdivide the larger allocations from the OS into smaller pieces with sub-page-size granularity. Want a linked list (commonly needed in OS kernels)? You need to be able to allocate small nodes that contain the value and a pointer to the next node (and possibly a reverse pointer too).
I suspect by "odd sizes" they just mean "arbitrary sizes"; I don't expect the kernel to be unusually heavy on 1, 3, 5, 7, etc. byte allocations, but the allocation sizes are, in many cases, not likely to be consistent enough that a fixed block allocator is broadly applicable. Writing a special block allocator for each possible linked list node size (let alone every other possible size needed for dynamically allocated memory) isn't worth it unless that linked list is absolutely performance critical after all.

massif reported heap usage much less than VmRss, what could be wrong?

massif output:
time=3220706
mem_heap_B=393242041
mem_heap_extra_B=73912175
mem_stacks_B=93616
heap_tree=peak
process shows 1.2GB in VmRss, so the huge difference comes from where? (I saw Rss grows up continuously).
Per http://cs.swan.ac.uk/~csoliver/ok-sat-library/internet_html/doc/doc/Valgrind/3.8.1/html/ms-manual.html
Heap allocation functions such as malloc are built on top of these system calls. For example, when needed, an allocator will typically call mmap to allocate a large chunk of memory, and then hand over pieces of that memory chunk to the client program in response to calls to malloc et al. Massif directly measures only these higher-level malloc et al calls, not the lower-level system calls.
There is no way to guarantee RSS size based on massif output. With --pages-as-heap=yes option you maybe able to estimate VIRT size, but that is including everything that was mapped into memory, not necessary residing in RAM.
You may want to play with alloc-fn option, which may bring you closer to estimating real memory usage by manually specifying all "custom" memory allocation functions.
Valgrind can use significant memory for its own internal house keeping. So, it is normal to have massif reporting memory significantly less than the process size, as the process size includes the 'client/guest' memory + valgrind's own memory.
You can use the valgrind option --stats=yes to have more information about the memory used by the client versus the memory used by valgrind.

can we use too many malloc and free in c program

is it ok to call too many malloc & free in a program?
i have a program that does malloc and free for each record. Although it sounds bad, does it have performance issue if i use too many malloc and free ?
Most modern malloc(3) implementations work like a memory pool. Since most modern OSes treat memory with pages (usually 4KB size), a malloc will probably request at least 4KB from the OS.
Suppose you keep calling malloc with 32. In your first malloc, at least one new page is requested from the OS (via sbrk(2) on unix). The successive mallocs have nothing to do with the OS, they just return you the next free chunk of memory in the memory pool as long as memory is available. So, calling malloc many times is not a big deal, usually. The point here is that system calls (the communication between the user process and OS) are usually expensive and malloc tries its best to avoid as much as possible.
free is similar too. When you free memory, usually OS isn't notified about that. When a page is totally freed, the page may be returned to the OS. Some implementations do not return the page to the OS unless the process already holds many unused pages.
To sum it up, malloc and free are like generic memory managers working with arbitrary size. The problem you might face is that malloc is designed to work with arbitrary size allocations, which might be slower than a memory manager that's designed to work with fixed size allocations. If you're usually allocating the same types of memory, you might be better off with implementing your own memory pool. Another case would be that malloc calls involve locking/unlocking in most modern implementations to support multithreading. If you're working with a single thread, that might also be an overhead: another reason to implement your own memory pool.
You might also want to work with different malloc implementations, benchmark them and decide to go with either one. Starting with a clean implementation and stripping off unnecessary parts might also be a good idea here.
yes/no. Large volumes of malloc/free can cause the heap to be fragmented to the point where malloc can fail. It is less of an issue now that memory is pretty cheap.
There is some overhead in calling malloc, but not a lot. malloc basically has to go to through the heap and find a block of memory that is unused and large enough to hold the number of bytes you asked for, then it designates that block as used and tells the operating system to mmap it for you and returns a pointer to that block.
It's a few steps, but really not a lot of work for your computer. The difference between using malloc to get memory for you, and putting a variable on the stack is a handful of instructions, and a system call, and unless you're programming on an embedded system, you honestly shouldn't worry about it. You'll only take a real performance hit if you allocate so much memory that you actually run out of RAM (in which case your Virtual Memory Manager will have to move some things into the swap space to make more room - as it turns out, malloc never fails)!
Freeing memory is even easier than allocating it, and in the end it's better to free what you allocate (future malloc calls will be faster, more memory will be available).
In short, use malloc to your hearts content! Decades of advances in technology have worked hard to earn you that right, there's no sense squandering it!
By definition 'too many' is 'too many'.
But more seriously, on most systems heap allocation is reasonably fast - because its done a lot. Allocating space for a record each time its processed doesn't sound bad.
The real answer is : write your program and measure its speed, is it acceptable? If not then profile it and find where the bottlenecks are - my 10c says it wont be heap processing

libc memory management

How does libc communicate with the OS (e.g., a Linux kernel) to manage memory? Specifically, how does it allocate memory, and how does it release memory? Also, in what cases can it fail to allocate and deallocate, respectively?
That is very general question, but I want to speak to the failure to allocate. It's important to realize that memory is actually allocated by kernel upon first access. What you are doing when calling malloc/calloc/realloc is reserving some addresses inside the virtual address space of a process (via syscalls brk, mmap, etc. libc does that).
When I get malloc or similar to fail (or when libc get brk or mmap to fail), it's usually because I exhausted the virtual address space of a process. This happens when there is no continuous block of free address, an no room to expand an existing one. You can either exhaust all space available or hit a limit RLIMIT_AS. It's pretty common especially on 32bit systems when using multiple threads, because people sometimes forget that each thread needs it's own stack. Stacks usually consume several megabytes, which means you can create only few hundreds threads before you have no more free address space. Maybe an even more common reason for exhausted address space are memory leaks. Libc of course tries to reuse space on the heap (space obtained by a brk syscall) and tries to munmmap unneeded mappings. However, it can't reuse something that is not "deallocated".
The shortage of physical memory is not detectable from within a process (or libc which is part of the process) by failure to allocate. Yeah, you can hit "overcommitting limit", but that doesn't mean the physical memory is all taken. When free physical memory is low, kernel invokes special task called OOM killer (Out Of Memory Killer) which terminates some processes in order to free memory.
Regarding failure to deallocate, my guess is it doesn't happen unless you do something silly. I can imagine setting program break (end of heap) below it's original position (by a brk syscall). That is, of course, recipe for a disaster. Hopefully libc won't do that and it doesn't make much sense either. But it can be seen as failed deallocation. munmap can also fail if you supply some silly argument, but I can't think of regular reason for it to fail. That doesn't mean it doesn't exists. We would have to dig deep within source code of glibc/kernel to find out.
1) how does it allocate memory
libc provides malloc() to C programs.
Normally, malloc allocates memory from the heap, and adjusts the
size of the heap as required, using sbrk(2). When allocating blocks of
memory larger than MMAP_THRESHOLD bytes, the glibc malloc()
implementation allocates the memory as a private anonymous mapping
using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable
using mallopt(3). Allocations performed using mmap(2) are unaffected
by the RLIMIT_DATA resource limit (see getrlimit(2)).
And this is about sbrk.
sbrk - change data segment size
2) in what cases can it fail to allocate
Also from malloc
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
And from proc
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
Mostly it uses the sbrk system call to adjust the size of the data segment, thereby reserving more memory for it to parcel out. Memory allocated in that way is generally not released back to the operating system because it is only possible to do it when the blocks available to be released are at the end of the data segment.
Larger blocks are sometime done by using mmap to allocate memory, and that memory can be released again with an munmap call.
How does libc communicate with the OS (e.g., a Linux kernel) to manage memory?
Through system calls - this is a low-level API that the kernel provides.
Specifically, how does it allocate memory, and how does it release memory?
Unix-like systems provide the "sbrk" syscall.
Also, in what cases can it fail to allocate and deallocate, respectively?
Allocation can fail, for example, when there's no enough available memory. Deallocation shall not fail.

How is memory allocated on heap without a system call?

I was wondering that if the space required on heap is not large enough
such that there is no need for a brk/sbrk system all (to shift the break pointer (brk) of data segment), how does a library function (such as malloc) allocates space on heap.
I am not asking about the data-structures and algorithms for heap management. I am just asking how does malloc get the address of the first location of the heap if it doesn't invoke a system call. I am asking this because I have heard that it is not always necessary to invoke a system call (brk/sbrk) as these are only required to expand the space.Please correct me if I am wrong.
The basic idea is that when your program starts, the heap is very small, but not necessarily zero. If you only allocate (malloc) a small amount of memory, the library is able to handle it within the small amount of space it has when it is loaded. However, when malloc runs out of that space, it needs to make a system call to get more memory.
That system call is often sbrk(), which moves the top of the heap's memory region up by a certain amount. Usually, the malloc library routine increases the heap by larger than what is needed for the current allocation, with the hope that future allocations can be performed w/o making a system call.
Other implementations of malloc use mmap() instead -- this allows the program to create a sparse virtual memory mapping. However, mmap() based malloc implementations do the same thing as the sbrk()-based ones: each system call reserves more memory than what is necessarily needed for the current call.
One way to look at this is to trace a program that uses malloc: you'll see that for N calls to malloc, you will see M system calls (where M is much smaller than N).
The short answer is that it uses sbrk() to allocate a big hunk, which at that point belongs to your app process. It can then further parcel out subsections of that as individual malloc calls without needing to ask the system for anything, until it exhausts that space and needs to resort sbrk() again.
You said you didn't want the details on the data structures, but suffice it to say that the implementation of malloc (i.e. your own process, not the OS kernel) is keeping track of which space in the region it got from the system is spoken for and which is still available to dole out as individual mallocs. It's like buying a big tract of land, then subdividing it into lots for individual houses.
Use sbrk() or mmap() — http://linux.die.net/man/2/sbrk, http://linux.die.net/man/2/mmap

Resources