valgrind - Find memory leak in a shared library - memory-leaks

I need to know ways to find out the memory leaks in a shared library which will be loaded to a release binary. I mean shared library I built with -g option but the binary that loads the shared library is not built with -g option.
I get the leak report as follows.
==739== at 0x4A05809: malloc (vg_replace_malloc.c:149)
==739== by 0x84781B1: ???
==739== by 0x87507F5: ???
==739== by 0x874CF47: ???
==739== by 0x874E657: ???
==739== by 0x874F7C2: ???
==739== by 0x8779C0C: ???
Please let me know how to get the stack trace of the leak from the shared library?

Assuming that the leak really is coming from your shared library then I don't think the problem is the lack of debugging in the main executable.
More likely your problem is that the executable is unloading the shared library by calling dlclose before it finishes. That means that when valgrind comes to check for leaks all the symbol information for the library is gone as the library is no longer loaded.
If you can rebuild the executable then the easiest solution may be to temporarily stop it calling dlclose so that the library stays loaded until the end.
If you can't do that, then try using LD_PRELOAD to keep the library loaded, like this:
LD_PRELOAD="/path/to/library.so" valgrind my-executable
which will hopefully trick the dynamic linker into keeping the library loaded even after it has been closed.

As the previous answer suggests, this is because you have closed your libraries before the program terminates and therefore symbol information is not available to valgrind.
Using LD_PRELOAD didn't work for me; I now have two builds; one that explicitly does not call dlclose(); on this build, valgrind correctly reports line number information as you would expect with dynamic linking.

Related

What is determing which `malloc` will be called for an injected code?

I'm using frida to hook various functions of the Firefox web browser running atop of Windows. One of the symbols I hooked was mozglue::malloc() which calls for the jemalloc allocator.
In the process address-space there are three malloc() symbols:
In msvcrt.lib (static linking)
In ucrtbase.dll for dynamic linking
The already mentioned in mozglue.dll
I was expecting that all memory allocations that made by the Firefox processes will be allocated by the mozglue::malloc() and of course this truely happens.
I didn't expect that memory allocations that made by the frida JS agent which was injected to the target process will also be allocated using jemalloc, and honestly I still can't figure out why.
frida couldn't possibly know that there is a mozglue::malloc() symbol when its first attaching to a process, from the frida point of view there is a simple call for malloc(), so how and why this call is redirected from the default CRT symbols to the Mozila dll? This probably have something to do with the PE design, but I can't put the finger on it...
Thank for any help / insight / answer
mozglue::malloc shouldn't even be the only function called inside Firefox, because some system functions called by Firefox will use system malloc.
I see only one explanation: Firefox replaces original malloc calls in memory with its own version of malloc. Having a check at the source code supports this idea: https://searchfox.org/mozilla-central/source/memory/build/replace_malloc.h

Leak/Address sanitizer in a shared library without LD_PRELOAD

I'm looking to use Clang's leak/address sanitizer on my shared library, which is loaded from JVM or dotnet (Linux) at runtime, so I can't recompile the binary.
Using LD_PRELOAD makes for a very noisy output, a lot (presumably false positive?) leaks get reported from the JVM itself. The sanitizer outright crashes when LD_PRELOADing for dotnet.
Is there any way to statically link the sanitizer into the shared library (or dynamically without LD_PRELOAD)?
First thing first, you can not statically link sanitizer runtime libs into your library. It has to be preloaded to intercept std allocator (malloc, etc.) and would malfunction otherwise (there's a special check at Asan startup that ensures that libasan was preloaded).
Noisy output in JVM may well be legitimate errors.
Using LD_PRELOAD makes for a very noisy output, a lot (presumably false positive?) leaks get reported from the JVM itself.
The sanitizer outright crashes when LD_PRELOADing
for dotnet.
Is it a real crash or diagnosed memory error? Crash can be reported in Asan tracker.
Memory error should be reported to dotnet project but you can still continue execution after it using continue-after-error mode (grep for "continue-after-error" in Asan FAQ).

best approach to debug "corrupted double-linked list" crash

I am in the process of debugging a "corrupted double-linked list" crash. I have seen the source and understand the chunk struct and the fd/bk pointers, etc, so I think I know why this crash has occurred. I am now trying to fix it and I have a couple of questions.
Question #1: where (with respect to the pointer returned from malloc) is the malloc_chunks struct maintained? Are they before the memory block or after it?
Question #2: the malloc_chunks for allocated memory are different from the malloc_chunks for unallocated memory. It appears (??) that the allocated buffer case does not have the fd/bk pointers. Is this correct?
Question #3: what is the recommended approach to debug this type of error? I am assuming that I should put a break point for the malloc_chunks so I can break on when the struct is overwritten. But I am not sure how to access those malloc structs so I can set a break point in gdb.
Any suggestions on how to proceed would be very appreciated.
Thanks,
-Andres
what is the recommended approach to debug this type of error?
The usual way is not to peek into GLIBC internals, but to use a tool like Valgrind or AddressSanitizer, either of which is likely to point you straight at the problem.
Update:
Valgrind crashes ...
You should try building the latest Valgrind version from source, and if that still crashes, report the crash to Valgrind developers.
Chances are the Valgrind problem is already fixed, and building new Valgrind and testing your program with it will still be faster than trying to debug GLIBC internals (heap corruption bugs are notoriously difficult to find by program inspection or debugging).
AddressSanitizer, I thought it was a clang only tool -- I do not think it is available for linux.
Two points:
Clang works just fine on Linux, I use it almost every day,
Recent GCC versions have an equivalent -fsanitize=address option.
There are ways to debug heap overruns without valgrind.
One way is to use a malloc debug library such as Electric Fence. It will make your rogram crash exactly at the moment of accessing an illegal address in the heap.
The other way is to use built-in debug capabilities of GNU malloc. See man mcheck. If you call mcheck_pedantic before the first call to malloc, then every memory block is checked at every allocation. This is very slow but does allow you to isolate the fault.

Dynamically loaded library remains loaded despite dlclose

Today I'm looking for some enlightenment of the deep magic inside the dynamic loader. I'm debugging/troubleshooting a plugin system for a C++ application running on Linux. It loads plugins via dlopen (RTLD_NOW | RTLS_LOCAL) and releases them using dlclose. Nothing extraordinary - one would think.
However, I noticed that some plugins remain loaded even after dlclose is successfully called*. I concluded this after looking at the running process' memory map using pmap. Some libs get immediately removed from process memory and others keep lingering around apparently indefinitely.
Continuing on, the dlopen man page states:
The function dlclose() decrements the reference count on the dynamic
library handle handle. If the reference count drops to zero and no
other loaded libraries use symbols in it, then the dynamic library is
unloaded.
That means the problem boils down to these two possibilities; Either the reference counts aren't zero, or other loaded libs are using symbols from some - but not all - of the plugins.
I'm pretty sure (although not 100%) that the reference counts are zero. The application's plugin manager handles all plugins exactly the same. It also makes sure that a plugin doesn't get loaded multiple times. IMO loading & unloading should therefore behave the same for all plugins.
That leaves the second possibility: other loaded libs are using symbols from the plugins. Another typical case of 'That should never happen'. Although it's certainly possible. We're using gcc and default visibility and as far as I've seen nothing is stripped, so tons of symbols are being exported. Actually that worries me much more as these plugins should be independent.
Here then are my open questions at this point:
Are my conclusions so far correct?
Do you know a way to verifydlopen's reference count?
If there are internal symbols of my plugins (accidentally) used by other libs, is there a way to track down who is using which symbols?
My machine is:
Linux 3.13.0-43-generic #72-Ubuntu SMP Mon Dec 8 19:35:44 UTC 2014 i686 i686 i686 GNU/Linux
* I should mention that all of the loading and unloading happens in the main thread, so there should be no multi threading issue here.
other loaded libs are using symbols from the plugins
If other libs are not linked to that shared library at link time, referring to symbols of a shared library does not prevent that shared library from being unloaded.
To debug the run-time linker set environment variable LD_DEBUG to all, e.g. LD_DEBUG=all ./my_app. See man ld.so for details.

Loading and Unloading shared libraries in Mac OSX

I am sorry if this question has been repeated before in this forum. I am having a problem where, Loading and Unloading of dylibs arent working as expected in Mac(esp the unloading part.).
The question is if I have an executable and if I load a shared library say A.dylib and then use the loaded shared library to load an library say B.dylib. When I try unloading the library B.dylib at a later stage, the there is no error code returned(the return int value is 0 - as I am using a regular dlopen and dlclose functions to load and unload libraries, 0 means unloaded successfully), but when I check to make sure using the activity monitor or lsof the b.dylib is still in the memory.
Now the we are porting this code for windows, linux & mac. Windows and Linux works as expected, but only mac is giving me problems.
I was reading in the mac developer library and found out that: " There are a couple of cases in which a dynamic library will never be unloaded:
1) the main executable links against it, 2) An API that does not supoort unloading (e.g. NSAddImage())
was used to load it or some other dynamic library that depends on it, 3) the dynamic library is in
dyld's shared cache."
In my case I dont fall either of the first 2 cases. I am suspecting on case3.
Here is my question:
1. What can I do to make sure I have case 3?
2. If yes, how to fix it?
3. If not, how to fix it?
4. Why is mac so different?
Any help in this regard is appreciated!
Thanks,
Jan
When you load a shared library into an executable, all of the symbols exported by that library are candidates to resolve symbols required by the executable, causing the library to remain loaded if the DYLD linker binds to an unintended symbol. You can list the symbols in a shared library by using nm, and you can set environment variables to enable debugging output for the dynamic linker (see this man page on dyld). You need to set the DYLD_PRINT_BINDINGS environment variable.
Most likely, you need to limit the exported symbols to a specific subset that is used by the executable, so that only those symbols you intend to use are bound. This can be done by placing the required symbols in a file and passing it to the linker via the -exported_symbols_list option. Without doing so, you can end up binding a symbol in the dyloaded library, and it will not be unloaded since they are required to resolve a symbol in the executable and won't unload when dlclose() is called.

Resources