How can I benchmark malloc implementations?

I am comparing different malloc implementations and would like to measure their run time and memory usage.
In particular, I am interested in the run time and the maximum resident memory. It is important that the maximum resident memory reflects only the memory actually allocated (excluding the code segment etc.).
I cannot use tools like valgrind, since it replaces the malloc implementation. Also, I run the tests on programs that I have not written, and I would prefer not to change their source code.

You can use rdtscbench for the runtime measurement. See:
https://github.com/petersenna/rdtscbench
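
For the memory side, one possible approach that needs no source changes is to launch the unmodified program from a small harness, inject the allocator with LD_PRELOAD, and read the child's resource usage when it exits. The sketch below is only an illustration under stated assumptions: run_and_measure is a hypothetical helper name, the platform is assumed to be Linux (where ru_maxrss is reported in kilobytes), and the reported peak covers the whole process, code segment and shared libraries included, so it is an upper bound on what the allocator itself used.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: run argv as a child process, optionally with a
 * malloc library injected through LD_PRELOAD, and report the wall-clock
 * time and the peak resident set size of that child.
 * Returns the child's exit status, or -1 on error. */
int run_and_measure(char *const argv[], const char *preload_lib,
                    double *wall_seconds, long *max_rss_kb)
{
    struct timespec start, end;
    struct rusage usage;
    int status;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: select the allocator under test, then become the program. */
        if (preload_lib != NULL)
            setenv("LD_PRELOAD", preload_lib, 1);
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }

    /* wait4 fills in the resource usage of this specific child. */
    if (wait4(pid, &status, 0, &usage) < 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &end);

    *wall_seconds = (double)(end.tv_sec - start.tv_sec)
                  + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    *max_rss_kb = usage.ru_maxrss;  /* kilobytes on Linux */

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it once per allocator with the same argv (passing NULL as preload_lib for the system malloc, and the path of each candidate shared library otherwise) gives comparable numbers. Running each configuration several times helps to smooth out noise.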

Related

What memory leaks can occur outside the view of GHC's heap profiler

I have a program that exhibits the behavior of a memory leak. It gradually takes up all of the system's memory until it fills all swap space and then the operating system kills it. This happens once every several days.
I have profiled the heap extensively in a number of ways (-hy, -hm, -hc), tried limiting the heap size (-M128M), and tweaked the number of generations (-G1), but no matter what I do the heap size appears roughly constant and always low (measured in kB, not MB or GB). Yet when I observe the program in htop, its resident memory steadily climbs.
What this indicates to me is that the memory leak is coming from somewhere besides the GHC heap. My program makes use of dependencies, specifically Haskell's yaml library, which wraps the C library libyaml. It is possible that the leak is in the foreign pointers it holds to objects allocated by libyaml.
My question is threefold:
What places besides the GHC heap can memory leak from in a Haskell program?
What tools can I use to track these down?
What changes to my source code need to be made to avoid these types of leaks, as they seem to differ from the more commonly experienced space leaks in Haskell?

This certainly sounds like foreign pointers aren't being finalized properly. There are several possible reasons for this:
The underlying C library doesn't free memory properly.
The Haskell library doesn't set up finalization properly.
The ForeignPtr objects aren't being freed.
I think there's actually a decent chance that it's option 3. If the RTS consistently finds enough memory in the first GC generation, then it just won't bother running a major collection. Fortunately, this is the easiest to diagnose. Just have your program run System.Mem.performGC every so often. If that fixes it, you've found the bug and can tweak just how often you want to do that.
Another possible issue is that you could have foreign pointers lying around in long-lived thunks or other closures. Make sure you don't.
One particularly strong possibility when working with a wrapped C library is that the wrapper functions will return ByteStrings whose underlying arrays were allocated by C code. So any ByteStrings you get back from yaml could potentially be off-heap.

How much memory did Linux give to malloc()?

This is a Linux system question, not a coding question. When I use "top" to check the memory usage of my program, it reports a value 3-4 times as large as the actual heap allocation as given by Valgrind's Massif, a memory profiler. It's a large program, and the difference is hundreds of megabytes. The Valgrind manual gives only a partial explanation:
(Massif) does not directly measure memory allocated with
lower-level system calls such as mmap, mremap, and brk.
Heap allocation functions such as malloc are built on top of these
system calls. For example, when needed, an allocator will typically
call mmap to allocate a large chunk of memory, and then hand over
pieces of that memory chunk to the client program in response to calls
to malloc et al. Massif directly measures only these higher-level
malloc et al calls, not the lower-level system calls.
Fine, but how much memory am I really taking away from the system? I need to be able to run as many instances of this program as possible on one machine, so I need to know how much of that memory is still available. Page alignment etc. cannot explain a difference of hundreds of megabytes in reported memory usage.
Also, what determines the block size of the underlying mmap() call? I'm seeing blocks of 64MB at a time being taken according to top, which seems bizarrely large.

Any malloc implementation will be optimised for applications with huge memory requirements, because apps with low requirements run just fine anyway, and virtual memory is cheap.
For example, you will find malloc implementations that use a block of memory for up to 1024 mallocs of up to 16 bytes, another block for up to 1024 mallocs of up to 32 bytes, and so on. With a few mallocs this is inefficient but still cheap. With gazillions of mallocs, it makes malloc very efficient.
So saying "4 times as much" can be completely pointless. Tell us how many megabytes more than you thought.

Following memory allocation in gdb

Why is memory consumption jumping unpredictably as I step through a program in the gdb debugger? I'm trying to use gdb to find out why a program is using far more memory than it should, and it's not cooperating.
I step through the source code while monitoring process memory usage, but I can't find what line(s) allocate the memory for two reasons:
Reported memory usage only jumps up in increments of (usually, but not always exactly) 64 MB. I suspect I'm seeing the effects of some memory manager I don't know about which reserves 64 MB at a time and masks multiple smaller allocations.
The jump doesn't happen at a consistent location in code. Not only does it occur on different lines during different gdb runs; it also sometimes happens in illogical places like the closing bracket of a (c++) function. Is it possible that gdb itself is affecting memory allocations?
Any ideas/suggestions for more effective tools to help me drill down to the code lines that are really responsible for these memory allocations?
Here's some relevant system info: I'm running x86_64-redhat-linux-gnu version 7.2-64.el6-5.2 on a virtual CentOS Linux machine under Windows. The program is built on a remote server via a complicated build script, so tracking down exactly what options were used at any point is itself a bit of a chore. I'm monitoring memory usage both with the top utility ("virt" or virtual memory column) and by reading the real-time monitoring file /proc/<pid>/status, and they agree. Since this program uses a large suite of third-party libraries, there may be one or more overridden malloc() functions involved somewhere that I don't know about--hunting them down is part of this task.

gdb, left to its own devices, will not affect the memory use of your program, though a run under gdb may differ from a standalone run for other reasons.
However, this also depends on the way you use gdb. If you are just setting simple breakpoints, stepping, and printing things, then you are ok. But sometimes, to evaluate an expression, gdb will allocate memory in the inferior. For example, if you have a breakpoint condition like strcmp(arg, "string") == 0, then gdb will allocate memory for that string constant. There are other cases like this as well.

This answer is in several parts because there were several things going on:
Valgrind with the Massif module (a memory profiler) was much more helpful than gdb for this problem. Sometimes a quick look with the debugger works, sometimes it doesn't. http://valgrind.org/docs/manual/ms-manual.html
The VIRT column of top is a poor measure for profiling memory usage because it reports virtual memory allocations, which in this case were about 3x the actual heap memory usage. Virtual memory is mapped and made available by the Unix kernel when a process asks for a memory block, but it's not necessarily used. The underlying system call is mmap(). I still don't know how to check the block size. top can only tell you what the Unix kernel knows about your memory consumption, which isn't enough to be helpful. Don't use it (or the memory files under /proc/) to do detailed memory profiling.
Memory allocation when stepping out of a function was caused by autolocks--that's a thread lock class whose destructor releases the lock when it goes out of scope. Then a different thread goes into action and allocates some memory, leaving the operator (me) mystified. Non-repeatability is probably because some threads were waiting for external resources like Internet connections.
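
The gap between mapped (virtual) and actually used (resident) memory described above can be observed from inside a process. The following is a minimal sketch, assuming Linux and the usual layout of /proc/self/status; read_status_kb is a hypothetical helper name, and the field names VmSize (virtual size), VmRSS (resident size) and VmHWM (peak resident size) are the ones the kernel normally prints there.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value in kB of a field such as "VmRSS"
 * or "VmSize" from /proc/self/status, or -1 if it cannot be read. */
long read_status_kb(const char *field)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    size_t len = strlen(field);
    long value = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "VmRSS:      1234 kB". */
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

Logging read_status_kb("VmSize") next to read_status_kb("VmRSS") at a few points in a program shows how far the two figures drift apart. As the answer notes, this is a coarse view and does not replace a heap profiler such as Massif.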

can we use too many malloc and free in c program

Is it OK to call malloc and free very many times in a program?
I have a program that does a malloc and a free for each record. Although it sounds bad, does it actually cause a performance problem if I use that many malloc and free calls?

Most modern malloc(3) implementations work like a memory pool. Since most modern OSes treat memory with pages (usually 4KB size), a malloc will probably request at least 4KB from the OS.
Suppose you keep calling malloc with 32. In your first malloc, at least one new page is requested from the OS (via sbrk(2) on unix). The successive mallocs have nothing to do with the OS, they just return you the next free chunk of memory in the memory pool as long as memory is available. So, calling malloc many times is not a big deal, usually. The point here is that system calls (the communication between the user process and OS) are usually expensive and malloc tries its best to avoid as much as possible.
free is similar. When you free memory, the OS usually isn't notified. When a page is totally freed, the page may be returned to the OS. Some implementations do not return the page to the OS unless the process already holds many unused pages.
To sum it up, malloc and free are like generic memory managers working with arbitrary size. The problem you might face is that malloc is designed to work with arbitrary size allocations, which might be slower than a memory manager that's designed to work with fixed size allocations. If you're usually allocating the same types of memory, you might be better off with implementing your own memory pool. Another case would be that malloc calls involve locking/unlocking in most modern implementations to support multithreading. If you're working with a single thread, that might also be an overhead: another reason to implement your own memory pool.
You might also want to try different malloc implementations, benchmark them, and pick the one that suits you. Starting with a clean implementation and stripping off unnecessary parts might also be a good idea here.
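
To make the "own memory pool" idea above concrete, here is a minimal sketch of a fixed-size pool. It is an illustration only, not a production allocator: the names pool, pool_init, pool_alloc, pool_free and pool_destroy are hypothetical, the pool is not thread-safe, it never grows, and blocks are aligned only to the size of a pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool: one malloc up front, then O(1) alloc and
 * free through a free list threaded through the unused blocks. */
typedef struct pool {
    void  *storage;     /* the single backing allocation */
    void  *free_list;   /* head of the list of free blocks */
    size_t block_size;  /* rounded-up size of each block */
} pool;

int pool_init(pool *p, size_t block_size, size_t block_count)
{
    /* Each free block stores a pointer to the next one, so a block must be
       at least pointer-sized; round up to keep every block pointer-aligned. */
    size_t align = sizeof(void *);
    if (block_size < align)
        block_size = align;
    block_size = (block_size + align - 1) / align * align;

    if (block_count == 0 || block_size > (size_t)-1 / block_count)
        return -1;                        /* empty pool or size overflow */

    p->storage = malloc(block_size * block_count);
    if (p->storage == NULL)
        return -1;
    p->block_size = block_size;
    p->free_list = NULL;

    /* Push blocks in reverse so the first pool_alloc returns block 0. */
    for (size_t i = block_count; i-- > 0; ) {
        void *block = (char *)p->storage + i * block_size;
        *(void **)block = p->free_list;
        p->free_list = block;
    }
    return 0;
}

void *pool_alloc(pool *p)
{
    void *block = p->free_list;
    if (block != NULL)
        p->free_list = *(void **)block;   /* pop the head of the list */
    return block;                         /* NULL when the pool is exhausted */
}

void pool_free(pool *p, void *block)
{
    *(void **)block = p->free_list;       /* push back onto the list */
    p->free_list = block;
}

void pool_destroy(pool *p)
{
    free(p->storage);
    p->storage = NULL;
    p->free_list = NULL;
}
```

Whether this beats the system malloc for a given workload is something to measure rather than assume, since modern allocators already keep per-size free lists of their own.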

Yes and no. Large volumes of malloc/free can fragment the heap to the point where malloc can fail. It is less of an issue now that memory is pretty cheap.

There is some overhead in calling malloc, but not a lot. malloc basically has to go through the heap and find a block of memory that is unused and large enough to hold the number of bytes you asked for, then it marks that block as used and returns a pointer to it. Only when the heap has no suitable block does it ask the operating system for more memory (via brk or mmap).
It's a few steps, but really not a lot of work for your computer. The difference between using malloc to get memory and putting a variable on the stack is a handful of instructions and, occasionally, a system call, and unless you're programming on an embedded system, you honestly shouldn't worry about it. You'll only take a real performance hit if you allocate so much memory that you actually run out of RAM, in which case your virtual memory manager will have to move some things into swap space to make more room. (On Linux with its default overcommit settings, malloc itself rarely fails.)
Freeing memory is even easier than allocating it, and in the end it's better to free what you allocate (future malloc calls will be faster, more memory will be available).
In short, use malloc to your heart's content! Decades of advances in technology have worked hard to earn you that right, there's no sense squandering it!

By definition 'too many' is 'too many'.
But more seriously, on most systems heap allocation is reasonably fast, because it's done a lot. Allocating space for a record each time it's processed doesn't sound bad.
The real answer is: write your program and measure its speed. Is it acceptable? If not, then profile it and find where the bottlenecks are. My 10c says it won't be heap processing.
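
As a rough way to follow the "measure it" advice above, the sketch below times a loop of malloc/free pairs. The function name time_malloc_free is hypothetical, and the loop is deliberately simplistic: it reuses one size and frees immediately, which is close to the allocator's best case, so treat the result as a lower bound on cost rather than a prediction for a real program.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

/* Hypothetical micro-benchmark: time n malloc/free pairs of a given size.
 * Returns the elapsed seconds, or -1.0 if an allocation fails. */
double time_malloc_free(size_t n, size_t size)
{
    struct timespec start, end;

    if (size == 0)
        size = 1;   /* make sure there is at least one byte to touch */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < n; i++) {
        void *p = malloc(size);
        if (p == NULL)
            return -1.0;
        /* Write through a volatile pointer so the compiler cannot
           optimise the malloc/free pair away. */
        *(volatile char *)p = (char)i;
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Dividing the result by n gives an approximate cost per pair, which can then be compared with the time the program spends processing one record.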

Memory Debugging

Currently I analyze a C++ application and its memory consumption. Checking the memory consumption of the process before and after a certain function call is possible. However, it seems that, for technical reasons or for better efficiency the OS (Linux) assigns not only the required number of bytes but always a few more which can be consumed later by the application. This makes it hard to analyze the memory behavior of the application.
Is there a workaround? Can one switch Linux to a mode where it assigns just the required number of bytes/pages?

If you use malloc/new, the allocator will always allocate a few more bytes than you requested, as it needs some room for its housekeeping, and it may also need to pad the block for alignment. The number of extra bytes allocated is implementation dependent.
You can consider using tools such as gperftools (from Google) to monitor the memory used.
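
The rounding mentioned above can be observed directly on glibc. This is a minimal sketch, assuming a C library that provides the non-standard malloc_usable_size function in malloc.h; usable_size_for is a hypothetical helper name. It reports only the usable size of the block, not the hidden bookkeeping header the allocator keeps next to it.

```c
#include <malloc.h>   /* malloc_usable_size: a glibc extension, not ISO C */
#include <stdlib.h>

/* Hypothetical helper: how many bytes the allocator actually made usable
 * for a request of `request` bytes. Returns 0 if malloc fails. */
size_t usable_size_for(size_t request)
{
    void *p = malloc(request);
    if (p == NULL)
        return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable;
}
```

Printing usable_size_for(n) for a range of n shows the step pattern of the allocator's size classes, which is one reason process-level memory figures grow in coarser units than the requests themselves.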

I wanted to check a process for memory leaks some years ago.
What I did was the following: I wrote a very small debugger (it is easier than it sounds) that simply set breakpoints to malloc(), free(), mmap(), ... and similar functions (I did that under Windows but under Linux it is simpler - I did it in Linux for another purpose!).
Whenever a breakpoint was reached I logged the function arguments and continued program execution...
By processing the logfile (semi-automated) I could find memory leaks.
Disadvantage: It is not possible to debug the program using another debugger in parallel.
