I am trying to profile my application to check for possible memory leaks using Valgrind's memcheck tool. My application depends on many third-party libraries, and these report 'Invalid write of size' memory errors. How do I suppress this error? I have tried suppression kinds such as Cond, Value4, and Addr4 in the suppression file, but nothing has suppressed the warning. I have also passed the option --undef-value-errors=no.
Use
--gen-suppressions=all
and edit the generated suppressions (e.g. to make them more general
and to give them names).
I have noticed that Valgrind does not detect resources that are created with the C API of HDF5 and not closed before the end of the program, even though I launched it with the option --leak-check=full. Is that normal?
I often rely on Valgrind before shipping code, so I was surprised and frustrated today to find, while reviewing the code, a leak that Valgrind had not detected.
Valgrind's memcheck tool tracks memory allocated and released by the 'standard'
allocators, such as malloc/free/new/delete/...
If the C API of HDF5 does not (internally) use these standard allocators,
then there is no way for Valgrind to guess by itself what to monitor.
If HDF5 implements its own heap management (e.g. based on mmap, cutting
the mapped blocks into smaller allocated blocks),
then Valgrind provides 'client requests' that give some Valgrind support
to such non-standard allocators. But that implies some work in the HDF5
sources.
See e.g. http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools
for more information about how to describe such non standard allocators.
Some libraries/tools that implement their own non-standard allocators
sometimes have a way (e.g. an environment variable) to bypass
these allocators and use malloc/free/... instead.
Again, it is up to HDF5 to provide this.
If HDF5 really does use the standard allocators and Valgrind still cannot track
what it does, then file a bug in the Valgrind bugzilla.
Is there an easy way to check whether the current process has coredumps enabled, so a library can instead return an error before loading encryption keys?
You can use the ulimit shell command or the corresponding system call to check the core dump size limit. If it is zero, no core files are written. Please note that the manual page suggests using getrlimit() and setrlimit() instead of ulimit(); the relevant resource is RLIMIT_CORE.
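As a minimal sketch of the getrlimit()/setrlimit() approach, the two helpers below check the soft RLIMIT_CORE limit and, if desired, set it to zero. The function names core_dumps_enabled and disable_core_dumps are hypothetical, chosen only for this illustration. Note that this inspects the resource limit only; other system settings may also influence whether a dump is produced.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft RLIMIT_CORE limit is non-zero,
 * 0 if it is zero, and -1 if the limit could not be read. */
int core_dumps_enabled(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return lim.rlim_cur != 0;
}

/* Hypothetical helper: sets both the soft and the hard core size limit to
 * zero, so an unprivileged process cannot raise the limit again.
 * Returns 0 on success, -1 on failure (as setrlimit() does). */
int disable_core_dumps(void)
{
    struct rlimit lim = { .rlim_cur = 0, .rlim_max = 0 };

    return setrlimit(RLIMIT_CORE, &lim);
}
```

A library could call core_dumps_enabled() before loading keys and return an error when the result is 1, or call disable_core_dumps() itself if changing the process-wide limit is acceptable.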
I am in the process of debugging a "corrupted double-linked list" crash. I have seen the source and understand the chunk struct and the fd/bk pointers, etc, so I think I know why this crash has occurred. I am now trying to fix it and I have a couple of questions.
Question #1: where (with respect to the pointer returned from malloc) is the malloc_chunk struct kept? Is it before the memory block or after it?
Question #2: the malloc_chunk for allocated memory is different from the malloc_chunk for unallocated memory. It appears (??) that the allocated buffer case does not have the fd/bk pointers. Is this correct?
Question #3: what is the recommended approach to debug this type of error? I am assuming that I should put a breakpoint on the malloc_chunk so I can break when the struct is overwritten. But I am not sure how to access those malloc structs so I can set a breakpoint in gdb.
Any suggestions on how to proceed would be much appreciated.
Thanks,
-Andres
what is the recommended approach to debug this type of error?
The usual way is not to peek into GLIBC internals, but to use a tool like Valgrind or AddressSanitizer, either of which is likely to point you straight at the problem.
Update:
Valgrind crashes ...
You should try building the latest Valgrind version from source, and if that still crashes, report the crash to Valgrind developers.
Chances are the Valgrind problem is already fixed, and building new Valgrind and testing your program with it will still be faster than trying to debug GLIBC internals (heap corruption bugs are notoriously difficult to find by program inspection or debugging).
AddressSanitizer, I thought it was a clang-only tool -- I do not think it is available for Linux.
Two points:
Clang works just fine on Linux; I use it almost every day.
Recent GCC versions have an equivalent -fsanitize=address option.
There are ways to debug heap overruns without valgrind.
One way is to use a malloc debug library such as Electric Fence. It will make your program crash exactly at the moment it accesses an illegal address in the heap.
The other way is to use built-in debug capabilities of GNU malloc. See man mcheck. If you call mcheck_pedantic before the first call to malloc, then every memory block is checked at every allocation. This is very slow but does allow you to isolate the fault.
I am trying to load a kernel module (out-of-tree) and dmesg shows a panic. The kernel is still up though. I guess the module panic'd.
Where to find the core file? I want to use gdb and see what the problem is.
Where to find the core file?
Core files are strictly a user-space concept.
I want to use gdb and see what the problem is.
You may be looking for KGDB and/or Kdump/Kexec.
Normally, when a core dump is generated, the shell reports "core dumped". This is an easy high-level way to confirm that a dump was attempted, but the message alone does not guarantee that a core file is available. The location where the core dump is written is specified to the kernel through core_pattern, which is set via sysctl, so check the value of core_pattern on your system. Also note that on Ubuntu the core file size limit appears to be zero by default, which prevents a core dump from being generated. So check the limit with 'ulimit -c' and, if it is zero, change it with 'ulimit -c unlimited'. The manpage http://man7.org/linux/man-pages/man5/core.5.html lists the various reasons why a core dump may not be generated.
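To illustrate how a core_pattern value can be interpreted, here is a small sketch in C. It reads /proc/sys/kernel/core_pattern and classifies the value according to the rules in core(5): a leading '|' pipes the dump to a program, and anything else is a file name template. The function and enum names are hypothetical, and the sample patterns in the usage note are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

enum core_target { CORE_TO_PIPE, CORE_TO_ABSOLUTE_PATH, CORE_TO_RELATIVE_PATH };

/* Classify a core_pattern value: '|' pipes the dump to a program,
 * a leading '/' is an absolute file name template, and anything else
 * is resolved relative to the crashing process's working directory. */
enum core_target classify_core_pattern(const char *pattern)
{
    if (pattern[0] == '|')
        return CORE_TO_PIPE;
    if (pattern[0] == '/')
        return CORE_TO_ABSOLUTE_PATH;
    return CORE_TO_RELATIVE_PATH;
}

/* Read the current pattern into buf; returns 0 on success, -1 on error. */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}
```

For example, classify_core_pattern("core") yields CORE_TO_RELATIVE_PATH, so the file would be looked for in the working directory of the crashed process, while a value such as "|/usr/bin/handler %p" means the dump goes to that program instead of a file.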
However, from your explanation, it appears that you are facing a kernel oops, since the kernel is still up (in an unstable state) even though a particular module panicked or was killed. In such cases, the kernel prints an oops message. See https://www.kernel.org/doc/Documentation/oops-tracing.txt for information about kernel oops messages.
Excerpt from the link: Normally the Oops text is read from the kernel buffers by klogd and
handed to syslogd which writes it to a syslog file, typically
/var/log/messages (depends on /etc/syslog.conf). Sometimes klogd
dies, in which case you can run dmesg > file to read the data from the
kernel buffers and save it. Or you can cat /proc/kmsg > file, however
you have to break in to stop the transfer, kmsg is a "never ending
file".
The oops messages are generated with printk. printk tags each message with a severity by means of loglevels (priorities), which allows messages to be classified by severity. (The priorities are defined as macros such as KERN_EMERG, KERN_ALERT, KERN_CRIT, etc. in linux/kernel.h or linux/kern_levels.h.) So you may need to check the system's default logging levels with cat /proc/sys/kernel/printk and change them as required. Also check that the logging daemons are running and, in case you want to debug the kernel, ensure that it is compiled with CONFIG_DEBUG_INFO.
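The file /proc/sys/kernel/printk holds four integers: the current console loglevel, the default loglevel for messages without an explicit level, the minimum allowed console loglevel, and the boot-time default console loglevel. The sketch below parses such a line; the struct and function names are hypothetical, and the line "4 4 1 7" in the usage note is only a sample input, not output taken from any particular system.

```c
#include <stdio.h>

/* Hypothetical container for the four values in /proc/sys/kernel/printk. */
struct printk_levels {
    int console;          /* messages more severe than this reach the console */
    int default_message;  /* level of printk() calls without an explicit level */
    int minimum_console;  /* lowest value the console level may be set to */
    int default_console;  /* console level used at boot */
};

/* Parse the four whitespace-separated integers.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_printk_levels(const char *line, struct printk_levels *out)
{
    int n = sscanf(line, "%d %d %d %d",
                   &out->console, &out->default_message,
                   &out->minimum_console, &out->default_console);

    return n == 4 ? 0 : -1;
}
```

Given the sample line "4 4 1 7", the console field is 4, which means only messages with a level numerically below 4 (KERN_EMERG through KERN_ERR) would be shown on the console.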
A method for using GDB to find the location where the kernel panicked or oopsed on Ubuntu is described at https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks; it is one of the methods you can use to debug a kernel oops.
There won't be a core file.
You should follow the stack trace in the kernel messages. Type dmesg to see it.
For an embedded ARM system running in the field, there is a need to retrieve relevant debug information when a user-space application crashes. This information will be stored in non-volatile memory so that it can be retrieved at a later time. All such information must be stored at runtime, and third-party applications cannot be used due to memory consumption concerns.
So far I have thought of following:
Signal ID and the corresponding PC / memory addresses when the kernel raises a signal;
Process ID;
What other information do you think is relevant in order to identify the cause of the problem and to debug it quickly afterwards?
Thank you!
Usually, to understand an issue, you'll need every register (r0 to r15), the CPSR, and the top of the stack (to determine what happened before the crash). Please also note that when your program is interrupted because of an invalid operation (a jump to an invalid address, ...), the processor enters an exception mode, whereas you need to dump the registers and stack in the context of your process.
To investigate using this data, you must also keep the ELF files from your build (with debug information, if possible) so that you can interpret the contents of the registers and stack.
In the end, the more information you keep, the easier debugging is, but it may be expensive to keep every memory section used by your program at the time of the failure (as a matter of fact, I've never done this).
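As a rough sketch of how such data could be captured from within the process itself, the code below installs an SA_SIGINFO handler that writes a small fixed-size record (signal number, signal code, PID, fault address) to a file descriptor opened in advance. The struct layout, the function names, and the idea of passing in a pre-opened descriptor are assumptions made for illustration. The architecture-specific register dump (r0 to r15 and the CPSR) would come from the ucontext argument and is deliberately left out to keep the sketch portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size record; the layout is an assumption. */
struct crash_record {
    uintptr_t fault_addr;  /* si_addr: faulting address for SIGSEGV/SIGBUS */
    int32_t   signo;       /* signal number, e.g. SIGSEGV */
    int32_t   code;        /* si_code: why the signal was raised */
    int32_t   pid;         /* ID of the crashing process */
    int32_t   reserved;    /* keeps the record free of padding bytes */
};

static int crash_fd = -1;  /* descriptor opened in advance on the storage */

/* Fill the record from the data passed to an SA_SIGINFO handler. */
static void fill_crash_record(struct crash_record *rec, int signo,
                              const siginfo_t *info)
{
    rec->fault_addr = (uintptr_t)info->si_addr;
    rec->signo = signo;
    rec->code = info->si_code;
    rec->pid = (int32_t)getpid();
    rec->reserved = 0;
}

/* Only async-signal-safe calls (getpid, write, raise) are used here. */
static void crash_handler(int signo, siginfo_t *info, void *ucontext)
{
    struct crash_record rec;
    ssize_t written;

    (void)ucontext;  /* registers would be read from here (arch-specific) */
    fill_crash_record(&rec, signo, info);
    if (crash_fd >= 0) {
        written = write(crash_fd, &rec, sizeof rec);
        (void)written;  /* nothing useful can be done on failure */
    }
    /* SA_RESETHAND restored the default action, so this terminates us. */
    raise(signo);
}

/* Install the handler for common fatal signals; returns 0 on success. */
static int install_crash_handler(int fd)
{
    static const int signals[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    crash_fd = fd;
    for (i = 0; i < sizeof signals / sizeof signals[0]; i++)
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

The application would call install_crash_handler() once at startup with a descriptor on the non-volatile storage. Running the handler on an alternate stack (sigaltstack) would be a sensible addition, since a stack overflow would otherwise leave it with no stack to run on.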
In postmortem analysis, you will face some limits:
Dynamically linked libraries: if the crash occurs in dynamically loaded and linked code, you will also need the library binary used on your target.
Memory corruption: memory corruption usually results in random data being called as code. On ARM with Linux, this will probably lead to a segfault, since you can't jump into another process's memory area and your data will probably be marked "never execute". Nevertheless, by the time the crash happens, you may already have corrupted the data that would have allowed you to identify the source of the corruption. Postmortem analysis isn't always able to identify the cause of the failure.