Related code:
write(-1, "test", sizeof("test"));
void * p = malloc(1024);
void * p2 = malloc(510);
write(-1, "hi", sizeof("hi"));
Related strace output:
write(4294967295, "test\0", 5) = -1 EBADF (Bad file descriptor)
brk(0) = 0x601000
brk(0x622000) = 0x622000
write(4294967295, "hi\0", 3) = -1 EBADF (Bad file descriptor)
I'm surprised: doesn't such a low-level operation involve a syscall?
Not every call to malloc invokes a syscall. On my Linux desktop, malloc requests space from the kernel in 128 KB blocks and then hands it out piece by piece, so I see a syscall only every 100-200 malloc calls. On FreeBSD, malloc allocates in 2 MB blocks. On your machine the numbers will likely differ.
If you want to see a syscall on every malloc, allocate large amounts of memory, e.g. malloc((size_t)10*1024*1024*1024). The cast keeps the multiplication from overflowing int.
What do you think brk is? malloc absolutely is invoking a syscall in this example, the syscall just isn't "malloc".
malloc() calls the system brk() function (in Linux/Unix) - but it only calls it if the local heap is exhausted. I.e. most malloc implementations manage a memory heap obtained via brk(), and if it's too small or too fragmented they ask for more via brk().
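To see this from inside a program rather than through strace, one option is to watch the program break while allocating. The following is a minimal sketch, assuming glibc on Linux and a brk-based heap; `count_break_moves` is a hypothetical helper name, not part of any library.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in <unistd.h> on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: perform `count` allocations of `size` bytes and
 * return how many of them moved the program break. */
size_t count_break_moves(size_t count, size_t size)
{
    assert(count > 0 && size > 0);
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return 0;

    size_t moves = 0;
    void *last_break = sbrk(0);          /* sbrk(0) only reads the break */
    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(size);
        void *now = sbrk(0);
        if (now != last_break) {         /* malloc had to ask the kernel */
            moves++;
            last_break = now;
        }
    }

    for (size_t i = 0; i < count; i++)
        free(blocks[i]);
    free(blocks);
    return moves;
}
```

Calling `count_break_moves(1000, 64)` should report far fewer moves than allocations, because most requests are served from heap space obtained earlier. The exact count depends on the allocator and its settings.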
Related
I know that for malloc, sbrk is the system call invoked. Similarly, what is the system call invoked when I write to malloc'ed memory (heap memory)?
#include <stdlib.h>
int main(void)
{
    /* 5 bytes of heap memory allocated */
    char *ptr = malloc(5);
    ptr[0] = 10; /* What is the system call invoked for writing into this heap memory? */
    free(ptr);
    return 0;
}
There is no system call involved in this case. Ask your compiler to generate assembly and you will see that there are only some MOV instructions there. Alternatively, use a debugger to look at the assembly.
Accessing memory does not require a system call. On the contrary, accessing memory is what most of your code does most of the time! On a modern OS, you have a flat view of a contiguous range of virtual memory, and you typically only need a system call to mark a particular region (a "page") of that memory as valid; other times, contiguously growing memory ranges such as the call stack don't even require any action on your program's part. It's solely the job of your operating system's memory manager to intercept accesses to memory that isn't mapped to physical memory (via a page fault), do some kernel magic to bring the desired memory into physical space and return control to your program.
The only reason malloc occasionally needs to perform a system call is because it asks the operating system for a random piece of virtual memory somewhere in the middle. If your program were to only function with global and local variables (but no dynamic allocation), you wouldn't need any system calls for memory management.
"operating system doesn't see every write that occurs: a write to memory corresponds simply to a STORE assembly instruction, not a system call. It is the hardware that takes care of the STORE and the necessary address translation. The only time the OS will see a memory write is when the address translation in the page tables fails, causing a trap to the OS. "
Please read the below link for details
http://pages.cs.wisc.edu/~dusseau/Classes/CS537-F04/Questions/sol12.html
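The point that the OS only gets involved on a page fault can be observed with getrusage(), which reports the number of minor page faults a process has taken. This is a minimal sketch, assuming Linux and a caller-supplied page size (4096 bytes is an assumption that holds on many systems); `faults_while_touching` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper: allocate `bytes`, write one byte per page, and return
 * the number of minor page faults the writes caused (-1 on failure). */
long faults_while_touching(size_t bytes, size_t page_size)
{
    assert(bytes > 0 && page_size > 0);
    struct rusage before, after;

    /* volatile keeps the compiler from discarding the stores below */
    volatile char *buf = malloc(bytes);
    if (buf == NULL)
        return -1;

    if (getrusage(RUSAGE_SELF, &before) != 0) {
        free((void *)buf);
        return -1;
    }
    for (size_t off = 0; off < bytes; off += page_size)
        buf[off] = 1;                    /* plain store, no system call */
    if (getrusage(RUSAGE_SELF, &after) != 0) {
        free((void *)buf);
        return -1;
    }

    free((void *)buf);
    return after.ru_minflt - before.ru_minflt;
}
```

The stores in the loop are ordinary instructions. Any faults counted come from the kernel mapping pages on first access, and the number depends on the kernel configuration (for example, whether huge pages are in use).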
According to the manpages for memfd_create, when I call memfd_create it provides me with a file descriptor that I can read from and write to that corresponds to some space in main memory. My question is where exactly is this memory being allocated? memfd_create is a syscall so it isn't using malloc to allocate memory in the heap and, using GDB, it doesn't seem like a new page in memory is being created when memfd_create is called.
I am trying to allocate some memory using sys_brk in NASM/x86 assembly. sys_brk returns the new address of the break, which is the end of the data segment, right? So where does my newly allocated memory reside? I assumed that it is between the old break value and the new break value. So if I allocate 64 bytes of memory with sys_brk, I can use the next 64 bytes starting from the old break value that I stored before calling sys_brk. Am I right?
My assembly code that allocates memory looks somewhat like this: https://gist.github.com/nikAizuddin/f4132721126257ec4345
Another side question:
I am supposed to write a function in assembly that returns a pointer to the dynamically allocated memory, and that function will be called from a C program. How can I free this block of memory from the C side of my program? Would just calling free() be enough?
The brk(2) man page (section: C library/kernel ABI differences) describes how the glibc wrapper is implemented on top of Linux's system call, which returns the new brk on success, or the old brk on failure.
As I understand it, memory beyond the current break is unmapped. Addresses below the current break are part of the data segment (in the sense of data+bss+heap). The docs aren't clear on whether the break has to be page-aligned. (i.e. can you sbrk(64), or only sbrk(4096)?) If ASLR is enabled, the initial break will be some random distance past the end of the BSS.
See: What does the brk() system call do? An answer on that question has an example of using sbrk to replace malloc for code-golf. So yes, the old break is the address to return. And apparently you can sbrk any increment you want, not just pages.
You're the one writing the memory allocator. sbrk just lets you get more from the OS, like mmap(MAP_ANONYMOUS) but less flexible. It doesn't help you keep track of free blocks so you can use them for future allocations instead of always getting more from the OS.
The way to give back memory you got with sbrk is by calling sbrk with a negative argument. Obviously this requires a last-in-first-out usage pattern, which is why glibc's malloc only uses sbrk for small allocations (that can be put on the free-list when freed, to be handed out for future mallocs). Big allocations are best returned to the OS right away, instead of being kept mapped, so glibc's malloc uses mmap for those.
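The "old break is the address to return" idea and the last-in-first-out release can be sketched in C as follows. This is a minimal illustration, assuming glibc's sbrk() wrapper; `bump_alloc` and `bump_release` are hypothetical names, and the sketch does no free-list bookkeeping at all.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in <unistd.h> on glibc */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical bump allocator: grow the break by `size` bytes and return the
 * old break, which is the start of the new region (NULL on failure). */
void *bump_alloc(size_t size)
{
    assert(size > 0 && size <= INTPTR_MAX);
    void *old_break = sbrk((intptr_t)size);
    return old_break == (void *)-1 ? NULL : old_break;
}

/* Give the most recently obtained `size` bytes back (last in, first out).
 * Returns 0 on success, -1 on failure. */
int bump_release(size_t size)
{
    assert(size > 0 && size <= INTPTR_MAX);
    return sbrk(-(intptr_t)size) == (void *)-1 ? -1 : 0;
}
```

Releasing is only safe if nothing else, including malloc, has moved the break since the matching `bump_alloc` call. Otherwise the negative increment would cut into someone else's memory.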
Never call free(3) on memory you didn't get from malloc(3) (or an associated function, like strdup(3), whose documentation says you can and should free(3) the memory). IDK what would happen if you called munmap on a page of memory below the program break. Probably it would just work, but then you'd have a hole in your data segment that could cause problems if the break ever decreased to there.
In assembly, the Linux brk system call takes an address where you want to set the break. As the man page notes, it either returns that for success, or returns the old break on failure, never a -errno code like -ENOMEM.
See Assembly x86 brk() call use for an x86-64 example.
The POSIX API where you can use positive or negative integer offsets is something you can implement by always calling twice, or like glibc keeping track of the current break in a global variable. To init that variable, use brk once with a requested address of 0, which will fail, as shown in the strace output below.
This is similar to what you'd do with the POSIX API, calling sbrk with increment = 0.
This is what glibc's malloc(3) does internally:
$ strace -e brk ls 2>&1 | m
brk(0) = 0x650000
brk(0) = 0x650000
brk(0x671000) = 0x671000
The brk man page mentions end(3). Apparently there are globals which are located at the end of the text, data, and bss segments. However, &end is only "somewhere near" the program break, which is why malloc still has to make a system call to get the initial break. IDK why there's a redundant brk(0). These are raw system calls, not library function calls, so an sbrk(0) probably doesn't explain it.
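The raw calling convention described above (pass an address, get the resulting break back) can also be tried from C through syscall(2). This is a minimal sketch, assuming Linux with glibc; `raw_brk` is a hypothetical wrapper name. It only queries the break, because changing it behind glibc's back would leave the library's cached break value stale.

```c
#define _GNU_SOURCE        /* exposes syscall() and sbrk() on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper over the raw Linux brk system call. It returns the
 * resulting break: `addr` on success, the unchanged break on failure. */
void *raw_brk(void *addr)
{
    /* syscall() returns long; this sketch assumes a pointer fits in it */
    assert(sizeof(long) == sizeof(void *));
    return (void *)syscall(SYS_brk, addr);
}
```

Calling `raw_brk(NULL)` requests a break of 0, which the kernel rejects, so the return value is simply the current break. That is the same trick the brk(0) lines in the strace output rely on.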
On Linux, malloc behaves opportunistically, only backing virtual memory with real memory when it is first accessed. Would it be possible to modify calloc so that it also behaves this way (allocating and zeroing pages when they are first accessed)?
It is not a feature of malloc() that makes it "opportunistic". It's a feature of the kernel with which malloc() has nothing to do whatsoever.
malloc() asks the kernel for a slab of memory every time it needs more memory to fulfill a request, and it's the kernel that says "Yeah, sure, you have it" every time without actually supplying memory. It is also the kernel that handles the subsequent page faults by supplying zeroed memory pages. Note that any memory the kernel supplies will already be zeroed out for security reasons, so it is equally well suited for malloc() and for calloc().
That is, unless the calloc() implementation spoils this by unconditionally zeroing out the pages itself (generating the page faults that prompt the kernel to actually supply memory), it will have the same "opportunistic" behavior as malloc().
Update:
On my system, the following program successfully allocates 1 TiB (!) on a system with only 2 GiB of memory:
#include <stdlib.h>
#include <stdio.h>
int main(void) {
    size_t allocationCount = 1024, successfulAllocations = 0;
    char* allocations[allocationCount];
    /* request 1024 zeroed blocks of 1 GiB each without ever touching them */
    for(size_t i = allocationCount; i--; ) {
        if((allocations[i] = calloc(1, 1024*1024*1024))) successfulAllocations++;
    }
    if(successfulAllocations == allocationCount) {
        printf("all %zu allocations were successful\n", successfulAllocations);
    } else {
        printf("there were %zu failed allocations\n", allocationCount - successfulAllocations);
    }
}
I think it's safe to say that at least the calloc() implementation on my box behaves "opportunistically".
From the related /proc/sys/vm/overcommit_memory section in proc:
The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. A process which allocates 1GB of memory (using malloc(3) or similar), but only touches 300MB of that memory will only show up as using 300MB of memory even if it has the address space allocated for the entire 1GB. This 1GB is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 /proc/sys/vm/overcommit_memory), allocations which would exceed the CommitLimit (detailed above) will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated.
Though not explicitly said, I think similar here means calloc and realloc. So calloc already behaves opportunistically as malloc.
I have a long-living application with frequent memory allocation-deallocation. Will any malloc implementation return freed memory back to the system?
What is, in this respect, the behavior of:
ptmalloc 1, 2 (glibc default) or 3
dlmalloc
tcmalloc (google threaded malloc)
solaris 10-11 default malloc and mtmalloc
FreeBSD 8 default malloc (jemalloc)
Hoard malloc?
Update
If I have an application whose memory consumption can be very different in daytime and nighttime (e.g.), can I force any of malloc's to return freed memory to the system?
Without such a return, the freed memory will be swapped out and back in many times, even though it contains only garbage.
The following analysis applies only to glibc (based on the ptmalloc2 algorithm).
There are certain options that seem helpful to return the freed memory back to the system:
mallopt() (declared in malloc.h) lets you set the trim threshold through the M_TRIM_THRESHOLD parameter. It specifies how much contiguous free memory (in bytes) must accumulate at the top of the data segment before free() trims it. Once the free space at the top exceeds this threshold, glibc shrinks the heap via brk() to give memory back to the kernel.
The default value of M_TRIM_THRESHOLD in Linux is set to 128K, setting a smaller value might save space.
The same behavior can be achieved by setting the trim threshold in the environment variable MALLOC_TRIM_THRESHOLD_, with no source changes at all.
However, preliminary test programs run with M_TRIM_THRESHOLD have shown that even though memory allocated by malloc does return to the system, the remaining portion of the actual chunk of memory (the arena) initially requested via brk() tends to be retained.
It is possible to trim the memory arena and give any unused memory back to the system by calling malloc_trim(pad) (defined in malloc.h). This function resizes the data segment, leaving at least pad bytes at the end of it and failing if less than one page worth of bytes can be freed. Segment size is always a multiple of one page, which is 4,096 bytes on i386.
The implementation of this modified behavior of free() using malloc_trim could be done using the malloc hook functionality. This would not require any source code changes to the core glibc library.
Another option would be to use the madvise() system call inside glibc's implementation of free().
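The M_TRIM_THRESHOLD option from the list above can be tried with a small experiment. This is a minimal sketch, assuming glibc on Linux, a single-threaded program and a brk-based main arena; `bytes_trimmed_by_free` is a hypothetical helper name, and the result depends on the glibc version and block size.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in <unistd.h> on glibc */
#include <assert.h>
#include <malloc.h>       /* glibc-specific: mallopt(), M_TRIM_THRESHOLD */
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical experiment: set the trim threshold, allocate and free `count`
 * blocks of `size` bytes, and return how far free() lowered the program
 * break (in bytes), or -1 if setup failed. */
long bytes_trimmed_by_free(int threshold, size_t count, size_t size)
{
    assert(threshold >= 0 && count > 0 && size > 0);
    if (mallopt(M_TRIM_THRESHOLD, threshold) != 1)   /* 1 means success */
        return -1;

    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1;
    for (size_t i = 0; i < count; i++)
        blocks[i] = malloc(size);

    char *top_before = sbrk(0);
    for (size_t i = count; i-- > 0; )    /* free the newest block first */
        free(blocks[i]);
    char *top_after = sbrk(0);

    free(blocks);
    return (long)(top_before - top_after);
}
```

For example, `bytes_trimmed_by_free(64 * 1024, 4096, 1024)` reports how many bytes free() handed back with a 64 KiB threshold. Running the same program with MALLOC_TRIM_THRESHOLD_ set in the environment is the no-source-change variant mentioned above.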
Most implementations don't bother identifying those (relatively rare) cases where entire "blocks" (of whatever size suits the OS) have been freed and could be returned, but there are of course exceptions. For example, and I quote from the wikipedia page, in OpenBSD:
On a call to free, memory is released
and unmapped from the process address
space using munmap. This system is
designed to improve security by taking
advantage of the address space layout
randomization and gap page features
implemented as part of OpenBSD's mmap
system call, and to detect
use-after-free bugs; as a large memory
allocation is completely unmapped
after it is freed, further use causes
a segmentation fault and termination
of the program.
Most systems are not as security-focused as OpenBSD, though.
Knowing this, when I'm coding a long-running system that has a known-to-be-transitory requirement for a large amount of memory, I always try to fork the process: the parent then just waits for results from the child [[typically on a pipe]], the child does the computation (including memory allocation), returns the results [[on said pipe]], then terminates. This way, my long-running process won't be uselessly hogging memory during the long times between occasional "spikes" in its demand for memory. Other alternative strategies include switching to a custom memory allocator for such special requirements (C++ makes it reasonably easy, though languages with virtual machines underneath such as Java and Python typically don't).
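The fork-and-pipe approach described above might look like the following in C. This is a minimal sketch under POSIX assumptions; `run_in_child` and `sample_spike` are hypothetical names, and the job here just stands in for any computation with a temporary spike in memory use.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical memory-hungry job: fills a large temporary buffer and
 * returns a small result (-1 if the buffer cannot be allocated). */
long long sample_spike(void)
{
    const size_t n = (size_t)1 << 20;
    long long *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;
    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = (long long)i;
        sum += buf[i];
    }
    free(buf);
    return sum;
}

/* Run `job` in a child process and copy its result into *out.
 * Returns 0 on success, -1 on failure. */
int run_in_child(long long (*job)(void), long long *out)
{
    assert(job != NULL && out != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: do the work, then exit */
        close(fds[0]);
        long long result = job();
        ssize_t n = write(fds[1], &result, sizeof result);
        _exit(n == (ssize_t)sizeof result ? 0 : 1);
    }

    close(fds[1]);                        /* parent: wait for the result */
    ssize_t got = read(fds[0], out, sizeof *out);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (got != (ssize_t)sizeof *out)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

Whatever the child allocated disappears with the child, regardless of how its malloc implementation treats freed memory. The parent only ever holds the small result.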
I had a similar problem in my app. After some investigation I noticed that, for some reason, glibc does not return memory to the system when the allocated objects are small (in my case less than 120 bytes).
Look at this code:
#include <list>
#include <malloc.h>
template<size_t s> class x{char x[s];};
int main(int argc,char** argv){
typedef x<100> X;
std::list<X> lx;
for(size_t i = 0; i < 500000;++i){
lx.push_back(X());
}
lx.clear();
malloc_stats();
return 0;
}
Program output:
Arena 0:
system bytes = 64069632
in use bytes = 0
Total (incl. mmap):
system bytes = 64069632
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
About 64 MB is not returned to the system. When I changed the typedef to:
typedef x<110> X; the program output looks like this:
Arena 0:
system bytes = 135168
in use bytes = 0
Total (incl. mmap):
system bytes = 135168
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
almost all memory was freed. I also noticed that using malloc_trim(0) in either case released memory to system.
Here is output after adding malloc_trim to the code above:
Arena 0:
system bytes = 4096
in use bytes = 0
Total (incl. mmap):
system bytes = 4096
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
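The effect of malloc_trim(0) on small freed blocks can also be measured directly by reading the program break before and after the call. This is a minimal sketch, assuming glibc on Linux and a single-threaded program; `bytes_released_by_trim` is a hypothetical helper name, and the number returned varies with the glibc version.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in <unistd.h> on glibc */
#include <assert.h>
#include <malloc.h>       /* glibc-specific: malloc_trim() */
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical experiment: allocate and free `count` blocks of `size` bytes,
 * then return how many bytes malloc_trim(0) removed from the top of the
 * heap (-1 if setup failed). */
long bytes_released_by_trim(size_t count, size_t size)
{
    assert(count > 0 && size > 0);
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1;

    for (size_t i = 0; i < count; i++)
        blocks[i] = malloc(size);
    for (size_t i = 0; i < count; i++)
        free(blocks[i]);                 /* free(NULL) is harmless */
    free(blocks);

    char *top_before = sbrk(0);
    malloc_trim(0);                      /* keep no padding at the top */
    char *top_after = sbrk(0);
    return (long)(top_before - top_after);
}
```

A call such as `bytes_released_by_trim(100000, 100)` mirrors the small-object case above in plain C. A positive result means the freed blocks were still held by the process until malloc_trim ran.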
I am dealing with the same problem as the OP. So far, it seems possible with tcmalloc. I found two solutions:
Compile your program with tcmalloc linked, then launch it as:
env TCMALLOC_RELEASE=100 ./my_pthread_soft
the documentation mentions that
Reasonable rates are in the range [0,10].
but 10 doesn't seem to be enough for me (i.e. I see no change).
Find a place in your code where it would make sense to release all the freed memory, and add this code there:
#include "google/malloc_extension_c.h" // C include
#include "google/malloc_extension.h" // C++ include
/* ... */
MallocExtension_ReleaseFreeMemory();
The second solution has been very effective in my case; the first would be great but it isn't very successful, it is complicated to find the right number for example.
Of the ones you list, only Hoard will return memory to the system... but if it can actually do that will depend a lot on your program's allocation behaviour.
The short answer: To force malloc subsystem to return memory to OS, use malloc_trim(). Otherwise, behavior of returning memory is implementation dependent.
For all 'normal' mallocs, including the ones you've mentioned, memory is released to be reused by your process, but not back to the whole system. Releasing back to the whole system happens only when your process finally terminates.
FreeBSD 12's malloc(3) uses jemalloc 5.1, which returns freed memory ("dirty pages") to the OS using madvise(...MADV_FREE).
Freed memory is only returned after a time delay controlled by opt.dirty_decay_ms and opt.muzzy_decay_ms; see the manual page and this issue on implementing decay-based unused dirty page purging for more details.
Earlier versions of FreeBSD shipped with older versions of jemalloc, which also returns freed memory, but uses a different algorithm to decide what to purge and when.
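What handing pages back with madvise() means for a program can be seen on an anonymous mapping. This is a minimal sketch, assuming Linux; it deliberately uses MADV_DONTNEED instead of the MADV_FREE mentioned above, because MADV_DONTNEED drops the pages immediately and makes the effect deterministic. `first_byte_after_purge` is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical demo: dirty an anonymous mapping, tell the kernel the pages
 * are no longer needed, and return the first byte read afterwards
 * (-1 on failure). */
int first_byte_after_purge(size_t length)
{
    assert(length > 0);
    unsigned char *region = mmap(NULL, length, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    memset(region, 0xAB, length);         /* pages are now resident */
    if (madvise(region, length, MADV_DONTNEED) != 0) {
        munmap(region, length);
        return -1;
    }

    int value = region[0];                /* faults in a fresh zero page */
    munmap(region, length);
    return value;
}
```

The mapping stays valid after the call, but its old contents are gone and the physical pages are available to the rest of the system. With MADV_FREE the kernel may reclaim the pages later instead, so the old contents can still be visible until it does.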