Debugging memory leaks in Windows Explorer extensions - memory-leaks

Greetings all,
I'm the developer of a rather large C# Windows Explorer extension. As you can imagine, there is a lot of P/Invoke involved, and unfortunately, I've confirmed that it's leaking unmanaged memory somewhere. However, I'm coming up empty as to how to find the leak. I tried following this helpful guide, which says to use WinDBG. But, when I try to use !heap, it won't let me because I don't have the .PDB files for explorer.exe (and the public symbol files aren't sufficient, apparently).
Help?

I've used many time UMDH with very good results. The guide you mentioned describing WinDbg uses the same method as UMDH, based on ability of debug heap to record stack traces for all allocations. The only difference is that UMDH does it automated -- you simply run umdh from command line and it creates snapshot of all current allocations. Normally you to repeate the snapshots two or more times, then you calculate 'delta' between two snapshots (also using umdh.exe). The information on the 'delta' file gives you all new allocations that happen between your snapshots, sorted by the allocation size.
UMDH also needs symbols. You will need at least symbols for ntdll.dll (heap implementation lives there). Public symbols available on public symbols from http://msdl.microsoft.com/download/symbols will work fine.
Make sure you are using correct bitness of the umdh.exe. Explorer.exe is 64 bit on 64 bit OS, so if your OS is 64 bit you need to use 64 bit umdh.exe -- i.e. download appropriate bitness of Windows debugging tools.

Related

Tool for analyse portable executable loaded into memory

There is a lot of tools designed to help analyzing portable executable files. For example PE Explorer. We can load .exe file into it and check things like number of sections, section alignment or virtual addresses of particular sections.
Is there any similar tool which allows me to do the same but for a portable executable already loaded into memory? Without access to it's .exe file?
EDIT:
Maybe I will try to clarify what I'm trying to achieve. Lets say (like #0x90 suggested) that I have two applications or maybe even three applications.
app1.exe - executed by user, creates new process basing on app3.exe and modifies its memory by putting app2.exe into it.
app2.exe - Injected into app3.exe memory by app1.exe.
app3.exe - Used to create the new process.
I have sources of all of three applications. I'm simply trying to learn about windows internals by practical exercises with injecting/Processes Hollowing. I have some bug in app1.exe which leads to:
The application was unable to start correctly (0xc0000018).
And I'm trying to find a way to debug this situation. My idea was to compare PE on disk with PE in memory and check if it looks correct. I was surprised that I can't find tool designed for such a propose.
For clarity, the application you wish to examine shall be called App1 while the application which loads the PE files contents into memory shall be called App2.
To my knowledge, all major disassemblers work on files.
This is due to the fact that App1 will be mapped into the address space of App2.
Your easiest solution is to dump the executable to disk from memory.
If you control the source of App2, this step is trivial.
If you don't, you will need to attack a debugger, identify the exact memory range where the PE file resides and use the debugger's functionality to dump that memory range to disk.

Check memory usage in haskell

I'm creating a program which implements some kind of cache. I need to use as much memory as possible and to do that I need to do two things:
Check how much memory is still available in system (RAM only, not SWAP)
Check how much memory my app is already using.
I need a platform independent solution (Linux, Windows, etc.).
Using these two pieces of information I will reduce the size of cache or enlarge it.
How can I get this information in Haskell? Are there any packages that can provide that information?
I can't immediately see how to do this portably.
However, GHC does have "weak pointers". (See System.Mem.Weak.) If you create items and hang on to them via weak pointers (only), then the garbage collector will automatically start deleting items if you run low on physical memory.
(Unfortunately, this doesn't give you the ability to decide which items to delete first — e.g., the ones that are cheapest to recreate or the ones that have been least-used or something.)

Minimal core dump (stack trace + current frame only)

Can I configure what goes into a core dump on Linux? I want to obtain something like the Windows mini-dumps (minimal information about the stack frame when the app crashed). I know you can set a max size for the core files using ulimit, but this does not allow me to control what goes inside the core (i.e. there is no guarantee that if I set the limit to 64kb it will dump the last 16 pages of the stack, for example).
Also, I would like to set it in a programmatic way (from code), if possible.
I have looked at the /proc/PID/coredump_filter file mentioned by man core, but it seems too coarse grained for my purposes.
To provide a little context: I need tiny core files, for multiple reasons: I need to collect them over the network, for numerous (thousands) of clients; furthermore, these are embedded devices with little SD cards, and GPRS modems for the network connection. So anything above ~200k is out of question.
EDIT: I am working on an embedded device which runs linux 2.6.24. The processor is PowerPC. Unfortunately, powerpc-linux is not supported in breakpad at the moment, so google breakpad is not an option
I have "solved" this issue in two ways:
I installed a signal handler for SIGSEGV, and used backtrace/backtrace_symbols to print out the stack trace. I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names (while keeping the executable compact enough).
I stripped the debug info and put it in a separate file, which I will store somewhere safe, using strip; from there, I will use add22line with the info saved from the backtrace (addresses) to understand where the problem happened. This way I have to store only a few bytes.
Alternatively, I found I could use the /proc/self/coredump_filter to dump no memory (setting its content to "0"): only thread and proc info, registers, stacktrace etc. are saved in the core. See more in this answer
I still lose information that could be precious (global and local variable(s) content, params..). I could easily figure out which page(s) to dump, but unfortunately there is no way to specify a "dump-these-pages" for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).
For now, I'm quite happy with there 2 solutions (it is better than nothing..) My next moves will be:
see how difficult would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux so.. how hard can it be? :)
try to use google-coredumper to dump only a few pages around the current ESP (that should give me locals and parameters) and around "&some_global" (that should give me globals).

Under Linux, how do I track down a memory leak in pre-built software?

I have a new Ubuntu Linux Server 64bit 10.04 LTS.
A default install of Mysql with replication turned on appears to be leaking memory.
However, we've tried going back to an earlier version and memory is still leaking but I can't tell where.
What tools/techniques can I use to pinpoint where memory is leaking so that I can rectify the problem?
Valgrind, http://valgrind.org/, can be very useful in these situations. It runs on unmodified executables but it does help tremendously if you can install the debugging symbols. Be sure to use the --show-reachable=yes flag as the leaked memory may still be reachable in some way but just not the way you want it. Also --trace-children in case of a fork. You'll likely have to track down in the start-up script where the executable is called and then add something like the following:
valgrind --show-reachable=yes --trace-children=yes --log-file=/path/to/log SQL-cmdline sqlargs
The man page has lots of other potentially useful options.
Have you tried the MySQL mailing list? Something like this would certainly be of interest to them if you can reproduce it in a straightforward manner.
You can use Valgrind as ninjalj suggests, but I doubt you'll get that close to anything useful. Even if you see a real leak (and they will be hard enough to validate), tracking down the root cause through the C call stacks will likely be very annoying (for example if the leak is triggered by a particular SQL pattern or stored procedure, you'll be looking at the call stack from the resultant optimized query, and not the original calls, which are likely in a different language).
Normally you might have no recourse, and have to resort to tracking it down through callstacks and iterative testing, but you have the source code to MySQL (including the source for the exact default package install), so you can use more advanced tools like MemoryScape (or at least build with symbols in order to provide Valgrind more food for thought).
Try using valgrind.
A very good and powerful tool, which is installed/available for most distributions is Valgrind.
It has a plethora of different options and is pretty much (as far as I've seen) the default profiler under linux systems.

Why is the startup of an App on linux slower when using shared libs?

On the embedded device I'm working on, the startup time is an important issue. The whole application consists of several executables that use a set of libraries. Because space in FLASH memory is limited we'd like to use shared libraries.
The application workes as usual when compiled and linked with shared libraries and the amount of FLASH memory is reduced as expected.
The difference to the version that is linked to static libs is that the startup time of the application is about 20s longer and I have no idea why.
The application runs on an ARM9 CPU at 180 MHz with Linux 2.6.17 OS,
16 MB FLASH (JFFS File System) and 32 MB RAM.
Bacause shared libraries have to be linked to at runtime, usually by dlopen() or something similar. There's no such step for static libraries.
Edit: some more detail. dlopen has to perform the following tasks.
Find the shared library
Load it into memory
Recursively load all dependencies (and their dependencies....)
Resolve all symbols
This requires quite a lot of IO operations to accomplish.
In a statically linked program all of the above is done at compile time, not runtime. Therefore it's much faster to load a statically linked program.
In your case, the difference is exaggerated by the relatively slow hardware your code has to run on.
This is a fine example of the classic tradeoff of speed and space.
You can statically link all your executables so that they are faster but then they will take more space
OR
You can have shared libraries that take less space but also more time to load.
So decide what you want to sacrifice.
There are many factors for this difference (OS, compiler e.t.c) but a good list of reasons can be found here. Basically shared libraries were created for space reasons and much of the "magic" involved to make them work takes a performance hit.
(As a historical note the original Netscape navigator on Linux/Unix was a statically linked big fat executable).
This may help others with similar problems:
The reason why startup took so long in my case was, that the default setting of the GCC is to export all symbols inside of a library.
A big improvement is to set a compiler setting "-fvisibility=hidden".
All symbols that the lib has to export have to be augmented with the statement
__attribute__ ((visibility("default")))
see gcc wiki
and the very fine article how to write shared libraries
Ok, I have learned now that the usage of shared libraries has it's disadvatages concerning speed. I found this article about dynamic linking and loading enlighting. The loading process seems to be much lengthier than I have expected.
Interesting.. typically loading time for a shared library is unnoticeable from a fat app that is statically linked. So I can only surmise that the system is either very slow to load a library from flash memory, or the library that is loaded is being checked in some way (eg .NET apps run a checksum for all loaded dlls, reducing startup time considerably in some cases). It could be that the shared libraries are being loaded as-needed, and unloaded afterwards which could indicate a configuration problem.
So, sorry I can't help say why, but I think its an issue with your ARM device/OS. Have you tried instrumenting the startup code, or statically linking with 1 of the most commonly-used libraries to see if that makes a large difference. Also put the shared libs in the same directory as the app to reduce the time it takes to search the FS for the lib.
One option which seems obvious to me, is to statically link the several programs all into a single binary. That way you continue to share as much code as possible (probably more than before), but you will also avoid the overhead of the dynamic linker AND save the space of having the dynamic linker on the system at all.
It's pretty easy to combine several executables into the same one, you normally just examine argv and decide which routine to call based on that.

Resources