When malloc_trim is (was) called automatically from free in glibc's ptmalloc or dlmalloc? - malloc

There is incorrectly documented function malloc_trim in glibc malloc (ptmalloc2), added in 1995 by Doug Lea and Wolfram Gloger (dlmalloc 2.5.4).
In glibc this function can return some memory freed by application back to operating system using negative sbrk for heap trimming, madvise(...MADV_DONTNEED) for unused pages in the middle of the heaps (this feature is in malloc_trim since 2007 - glibc 2.9, but not in systrim), and probably not by trimming additional thread arenas (which are not sbrk-allocated by mmaped as separate heaps).
Such function can be very useful for some long-running C++ daemons, doing a lot of threads with very high number of mixed size allocations, both tiny, small, middle and large in required size. These daemons may have almost constant amount of live (allocated by malloc and not yet freed) memory, but at the same time grow in RSS (physical memory consumption) over time. But not every daemon can call this function, and not every author of such daemons know that he should call this function periodically.
Man page of malloc_trim by Kerrisk http://man7.org/linux/man-pages/man3/malloc_trim.3.html (2012) says in Notes about automatic calls to malloc_trim from other parts of glibc malloc:
This function is automatically called by free(3) in certain circumstances.
(Also in http://www.linuxjournal.com/node/6390/print Advanced Memory Allocation, 2003 - Automatic trimming is done inside the free() function by calling memory_trim())
This note is probably from source code of malloc/malloc.c which mentions in several places that malloc_trim is called automatically, probably from the free():
736 M_TRIM_THRESHOLD is the maximum amount of unused top-most memory
737 to keep before releasing **via malloc_trim in free().**
809 * When malloc_trim is called automatically from free(),
810 it is used as the `pad' argument.
2722 systrim ... is also called by the public malloc_trim routine.
But according to grep for malloc_trim in glibc's malloc: http://code.metager.de/source/search?q=malloc_trim&path=%2Fgnu%2Fglibc%2Fmalloc%2F&project=gnu there are: declaration, definition and two calls of it from tst-trim1.c (test, not part of malloc). Same result for whole glibc grep (additionally it was listed in abilists). Actual implementation of malloc_trim is mtrim() but it is called only from __malloc_trim() of malloc.c.
So the question is: When malloc_trim or its internal implementation (mtrim/mTRIm) is called by glibc, is it called from free or malloc or malloc_consolidate, or any other function?
If this call is lost in current version of glibc, was it there in any earlier version of glibc, in any version of ptmalloc2, or in original dlmalloc code (http://g.oswego.edu/dl/html/malloc.html)? When and why it was removed? (What is the difference between systrim/sys_trim and malloc_trim?)

Related

Does Linux guarantee freeing malloc'd unfreed memory on program exit?

I used to believe it does for certain but... I can't find it explicitly stated.
man 3 exit and man 2 _exit verbosely specify the effects of process termination, but don't mention memory leaks.
Posix comes closer: it mentions this:
Memory mappings that were created in the process shall be unmapped before the process is destroyed.
[TYM] [Option Start] Any blocks of typed memory that were mapped in the calling process shall be unmapped, as if munmap() was implicitly called to unmap them. [Option End]
Intermixing this with man 3 malloc:
Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2).
So we could conclude that if malloc called mmap then process termination might make a corresponding call to munmap, BUT... (a) this "optional feature" tags in this POSIX specification are kind of worrying, (b) This is mmap but what about sbrk? (c) Linux isn't 100% POSIX conformant so I'm uncertain if intermixing Linux docu with Posix specs is mandated
The reason I'm asking is this... When a library call fails, am I allowed to just exit?
if(somecall() == -1) {
error(EXIT_FAILURE, errno, "Big fat nasty error.\n");
}
Or do I have to go up the stack making sure everything all the way up to main() is free()'d and only call exit or error in main()?
The former is so much simpler. But to feel easy going with the former, I'd like to find it in the docs explicitly mentioned that this is not an error and that doing this is safe. As I said, the fact that the docs care to explicitly mention a number of guarantees of what will surely be cleaned up BUT fail to mention this specific guarantee unsettles me. (Isn't it the most common and most obvious case? Wouldn't this be mentioned in the first place?)
This "freeing" is done at kernel level. So you are unlikely to find anything direct in POSIX API or C specifications as virtual memory is well "below" them. So you would hardly find anything relevant - let alone guarantees.
On Linux, the kernel reclaims the memory on process exit (both sbrk and mmap), which is guaranteed. See source code of mm.
When a library call fails, am I allowed to just exit?
Yes. This is fine to do.
However, note that there may be other considerations you need to think of, such uncleaned temporary files, open database/network connections and so on. E.g., if your program leaves a database connection open and exits, the server side may not know when to close the connection.
You can read more about Virtual Memory Manager (it's based on older kernel but the idea is still applicable).
I am certain that the POSIX committee intended that all memory allocated by malloc should be deallocated as one of the "consequences of process termination" listed in the specification of _exit that you linked to. I can also tell you that in practice every implementation of Unix I've ever used has done this.
However, I think you have found a genuine lacuna in the specifications. POSIX doesn't say anything about memory allocated by sbrk, because it doesn't specify sbrk at all. Its specification for malloc is taken more-or-less verbatim from the C standard, and the C standard intentionally doesn't say that all memory allocated by malloc should be deallocated upon "normal termination" because there exist embedded environments that don't do that. And, as you point out, "memory mappings that were created in the process" could be read to apply only to allocations made directly by mmap, shmat, and similar. It might be worth filing an interpretation request with the Austin Group.

Where does my allocated memory actually start from when i use brk system call

I am trying to allocate some memory using sys_brk in NASM/x86 assembly. sys_break returns the new address of break, which is the end of the data segment right? So where does my newly allocated memory reside? I assumed that it is in between the old break value and the new break value. So if I allocate 64bytes of memory with sys_brk i can use the next 64 bytes starting from the old break value that i stored before calling sys_brk. Am I right?
My Assembly code that will allocate memory will look somewhat like this.https://gist.github.com/nikAizuddin/f4132721126257ec4345
And another side question is;
I am supposed to write a function in Assembly that returns the pointer to the dynamically allocated memory and that function will be called from a C program. How can i free this block of memory from C side of my program? Would just calling free() be enough?
The brk(2) man page (section: C library/kernel ABI differences) describes how the glibc wrapper is implemented on top of Linux's system call, which returns the new brk on success, or the old brk on failure.
As I understand it, memory beyond the current break is unmapped. Addresses below the current break are part of the data segment (in the sense of data+bss+heap). The docs aren't clear on whether the break has to be page-aligned. (i.e. can you sbrk(64), or only sbrk(4096)?) If ASLR is enabled, the initial break will be some random distance past the end of the BSS.
See: What does the brk() system call do? An answer on that question has an example of using sbrk to replace malloc for code-golf. So yes, the old break is the address to return. And apparently you can sbrk any increment you want, not just pages.
You're the one writing the memory allocator. sbrk just lets you get more from the OS, like mmap(MAP_ANONYMOUS) but less flexible. It doesn't help you keep track of free blocks so you can use them for future allocations instead of always getting more from the OS.
The way to give back memory you got with sbrk is by calling sbrk with a negative argument. Obviously this requires a last-in-first-out usage pattern, which is why glibc's malloc only uses sbrk for small allocations (that can be put on the free-list when freed, to be handed out for future mallocs). Big allocations are best returned to the OS right away, instead of being kept mapped, so glibc's malloc uses mmap for those.
Never call free(3) on memory you didn't get from malloc(3) (or an associated function, like strdup(3), that says in the docs you can and should free(3) the memeory.) IDK what would happen if you called munmap on a page of memory below the program break. Probably it would just work, but then you'd have a hole in your data segment that could cause problems if the break ever decreased to there.
In assembly, the Linux brk system call takes an address where you want to set the break. As the man page notes, it either returns that for success, or returns the old break on failure, never a -errno code like -ENOMEM.
See Assembly x86 brk() call use for an x86-64 example.
The POSIX API where you can use positive or negative integer offsets is something you can implement by always calling twice, or like glibc keeping track of the current break in a global variable. To init that variable, use brk once with a requested address of 0, which will fail, as shown in the strace output below.
This is similar to what you'd do with the POSIX API, calling sbrk with increment = 0.
This is what glibc's malloc(3) does internally:
$ strace -e brk ls 2>&1 | m
brk(0) = 0x650000
brk(0) = 0x650000
brk(0x671000) = 0x671000
The brk man page mentions end(3). Apparently there are globals which are located at the end of the text, data, and bss segments. However, &end is only "somewhere near" the program break, which is why malloc still has to make a system call to get the initial break. IDK why there's a redundant brk(0). These are raw system calls, not library function calls, so an sbrk(0) probably doesn't explain it.

libc memory management

How does libc communicate with the OS (e.g., a Linux kernel) to manage memory? Specifically, how does it allocate memory, and how does it release memory? Also, in what cases can it fail to allocate and deallocate, respectively?
That is very general question, but I want to speak to the failure to allocate. It's important to realize that memory is actually allocated by kernel upon first access. What you are doing when calling malloc/calloc/realloc is reserving some addresses inside the virtual address space of a process (via syscalls brk, mmap, etc. libc does that).
When I get malloc or similar to fail (or when libc get brk or mmap to fail), it's usually because I exhausted the virtual address space of a process. This happens when there is no continuous block of free address, an no room to expand an existing one. You can either exhaust all space available or hit a limit RLIMIT_AS. It's pretty common especially on 32bit systems when using multiple threads, because people sometimes forget that each thread needs it's own stack. Stacks usually consume several megabytes, which means you can create only few hundreds threads before you have no more free address space. Maybe an even more common reason for exhausted address space are memory leaks. Libc of course tries to reuse space on the heap (space obtained by a brk syscall) and tries to munmmap unneeded mappings. However, it can't reuse something that is not "deallocated".
The shortage of physical memory is not detectable from within a process (or libc which is part of the process) by failure to allocate. Yeah, you can hit "overcommitting limit", but that doesn't mean the physical memory is all taken. When free physical memory is low, kernel invokes special task called OOM killer (Out Of Memory Killer) which terminates some processes in order to free memory.
Regarding failure to deallocate, my guess is it doesn't happen unless you do something silly. I can imagine setting program break (end of heap) below it's original position (by a brk syscall). That is, of course, recipe for a disaster. Hopefully libc won't do that and it doesn't make much sense either. But it can be seen as failed deallocation. munmap can also fail if you supply some silly argument, but I can't think of regular reason for it to fail. That doesn't mean it doesn't exists. We would have to dig deep within source code of glibc/kernel to find out.
1) how does it allocate memory
libc provides malloc() to C programs.
Normally, malloc allocates memory from the heap, and adjusts the
size of the heap as required, using sbrk(2). When allocating blocks of
memory larger than MMAP_THRESHOLD bytes, the glibc malloc()
implementation allocates the memory as a private anonymous mapping
using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable
using mallopt(3). Allocations performed using mmap(2) are unaffected
by the RLIMIT_DATA resource limit (see getrlimit(2)).
And this is about sbrk.
sbrk - change data segment size
2) in what cases can it fail to allocate
Also from malloc
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
And from proc
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
Mostly it uses the sbrk system call to adjust the size of the data segment, thereby reserving more memory for it to parcel out. Memory allocated in that way is generally not released back to the operating system because it is only possible to do it when the blocks available to be released are at the end of the data segment.
Larger blocks are sometime done by using mmap to allocate memory, and that memory can be released again with an munmap call.
How does libc communicate with the OS (e.g., a Linux kernel) to manage memory?
Through system calls - this is a low-level API that the kernel provides.
Specifically, how does it allocate memory, and how does it release memory?
Unix-like systems provide the "sbrk" syscall.
Also, in what cases can it fail to allocate and deallocate, respectively?
Allocation can fail, for example, when there's no enough available memory. Deallocation shall not fail.

How is memory allocated on heap without a system call?

I was wondering that if the space required on heap is not large enough
such that there is no need for a brk/sbrk system all (to shift the break pointer (brk) of data segment), how does a library function (such as malloc) allocates space on heap.
I am not asking about the data-structures and algorithms for heap management. I am just asking how does malloc get the address of the first location of the heap if it doesn't invoke a system call. I am asking this because I have heard that it is not always necessary to invoke a system call (brk/sbrk) as these are only required to expand the space.Please correct me if I am wrong.
The basic idea is that when your program starts, the heap is very small, but not necessarily zero. If you only allocate (malloc) a small amount of memory, the library is able to handle it within the small amount of space it has when it is loaded. However, when malloc runs out of that space, it needs to make a system call to get more memory.
That system call is often sbrk(), which moves the top of the heap's memory region up by a certain amount. Usually, the malloc library routine increases the heap by larger than what is needed for the current allocation, with the hope that future allocations can be performed w/o making a system call.
Other implementations of malloc use mmap() instead -- this allows the program to create a sparse virtual memory mapping. However, mmap() based malloc implementations do the same thing as the sbrk()-based ones: each system call reserves more memory than what is necessarily needed for the current call.
One way to look at this is to trace a program that uses malloc: you'll see that for N calls to malloc, you will see M system calls (where M is much smaller than N).
The short answer is that it uses sbrk() to allocate a big hunk, which at that point belongs to your app process. It can then further parcel out subsections of that as individual malloc calls without needing to ask the system for anything, until it exhausts that space and needs to resort sbrk() again.
You said you didn't want the details on the data structures, but suffice it to say that the implementation of malloc (i.e. your own process, not the OS kernel) is keeping track of which space in the region it got from the system is spoken for and which is still available to dole out as individual mallocs. It's like buying a big tract of land, then subdividing it into lots for individual houses.
Use sbrk() or mmap() — http://linux.die.net/man/2/sbrk, http://linux.die.net/man/2/mmap

Will malloc implementations return free-ed memory back to the system?

I have a long-living application with frequent memory allocation-deallocation. Will any malloc implementation return freed memory back to the system?
What is, in this respect, the behavior of:
ptmalloc 1, 2 (glibc default) or 3
dlmalloc
tcmalloc (google threaded malloc)
solaris 10-11 default malloc and mtmalloc
FreeBSD 8 default malloc (jemalloc)
Hoard malloc?
Update
If I have an application whose memory consumption can be very different in daytime and nighttime (e.g.), can I force any of malloc's to return freed memory to the system?
Without such return freed memory will be swapped out and in many times, but such memory contains only garbage.
The following analysis applies only to glibc (based on the ptmalloc2 algorithm).
There are certain options that seem helpful to return the freed memory back to the system:
mallopt() (defined in malloc.h) does provide an option to set the trim threshold value using one of the parameter option M_TRIM_THRESHOLD, this indicates the minimum amount of free memory (in bytes) allowed at the top of the data segment. If the amount falls below this threshold, glibc invokes brk() to give back memory to the kernel.
The default value of M_TRIM_THRESHOLD in Linux is set to 128K, setting a smaller value might save space.
The same behavior could be achieved by setting trim threshold value in the environment variable MALLOC_TRIM_THRESHOLD_, with no source changes absolutely.
However, preliminary test programs run using M_TRIM_THRESHOLD has shown that even though the memory allocated by malloc does return to the system, the remaining portion of the actual chunk of memory (the arena) initially requested via brk() tends to be retained.
It is possible to trim the memory arena and give any unused memory back to the system by calling malloc_trim(pad) (defined in malloc.h). This function resizes the data segment, leaving at least pad bytes at the end of it and failing if less than one page worth of bytes can be freed. Segment size is always a multiple of one page, which is 4,096 bytes on i386.
The implementation of this modified behavior of free() using malloc_trim could be done using the malloc hook functionality. This would not require any source code changes to the core glibc library.
Using madvise() system call inside the free implementation of glibc.
Most implementations don't bother identifying those (relatively rare) cases where entire "blocks" (of whatever size suits the OS) have been freed and could be returned, but there are of course exceptions. For example, and I quote from the wikipedia page, in OpenBSD:
On a call to free, memory is released
and unmapped from the process address
space using munmap. This system is
designed to improve security by taking
advantage of the address space layout
randomization and gap page features
implemented as part of OpenBSD's mmap
system call, and to detect
use-after-free bugs—as a large memory
allocation is completely unmapped
after it is freed, further use causes
a segmentation fault and termination
of the program.
Most systems are not as security-focused as OpenBSD, though.
Knowing this, when I'm coding a long-running system that has a known-to-be-transitory requirement for a large amount of memory, I always try to fork the process: the parent then just waits for results from the child [[typically on a pipe]], the child does the computation (including memory allocation), returns the results [[on said pipe]], then terminates. This way, my long-running process won't be uselessly hogging memory during the long times between occasional "spikes" in its demand for memory. Other alternative strategies include switching to a custom memory allocator for such special requirements (C++ makes it reasonably easy, though languages with virtual machines underneath such as Java and Python typically don't).
I had a similar problem in my app, after some investigation I noticed that for some reason glibc does not return memory to the system when allocated objects are small (in my case less than 120 bytes).
Look at this code:
#include <list>
#include <malloc.h>
template<size_t s> class x{char x[s];};
int main(int argc,char** argv){
typedef x<100> X;
std::list<X> lx;
for(size_t i = 0; i < 500000;++i){
lx.push_back(X());
}
lx.clear();
malloc_stats();
return 0;
}
Program output:
Arena 0:
system bytes = 64069632
in use bytes = 0
Total (incl. mmap):
system bytes = 64069632
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
about 64 MB are not return to system. When I changed typedef to:
typedef x<110> X; program output looks like this:
Arena 0:
system bytes = 135168
in use bytes = 0
Total (incl. mmap):
system bytes = 135168
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
almost all memory was freed. I also noticed that using malloc_trim(0) in either case released memory to system.
Here is output after adding malloc_trim to the code above:
Arena 0:
system bytes = 4096
in use bytes = 0
Total (incl. mmap):
system bytes = 4096
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
I am dealing with the same problem as the OP. So far, it seems possible with tcmalloc. I found two solutions:
compile your program with tcmalloc linked, then launch it as :
env TCMALLOC_RELEASE=100 ./my_pthread_soft
the documentation mentions that
Reasonable rates are in the range [0,10].
but 10 doesn't seem enough for me (i.e I see no change).
find somewhere in your code where it would be interesting to release all the freed memory, and then add this code:
#include "google/malloc_extension_c.h" // C include
#include "google/malloc_extension.h" // C++ include
/* ... */
MallocExtension_ReleaseFreeMemory();
The second solution has been very effective in my case; the first would be great but it isn't very successful, it is complicated to find the right number for example.
Of the ones you list, only Hoard will return memory to the system... but if it can actually do that will depend a lot on your program's allocation behaviour.
The short answer: To force malloc subsystem to return memory to OS, use malloc_trim(). Otherwise, behavior of returning memory is implementation dependent.
For all 'normal' mallocs, including the ones you've mentioned, memory is released to be reused by your process, but not back to the whole system. Releasing back to the whole system happens only when you process is finally terminated.
FreeBSD 12's malloc(3) uses jemalloc 5.1, which returns freed memory ("dirty pages") to the OS using madvise(...MADV_FREE).
Freed memory is only returned after a time delay controlled by opt.dirty_decay_ms and opt.muzzy_decay_ms; see the manual page and this issue on implementing decay-based unused dirty page purging for more details.
Earlier versions of FreeBSD shipped with older versions of jemalloc, which also returns freed memory, but uses a different algorithm to decide what to purge and when.

Resources