In a Linux environment, when I get "glibc detected *** free(): invalid pointer" errors, how do I identify which line of code is causing them?
Is there a way to force an abort? I recall there being an environment variable to control this.
How do I set a breakpoint in gdb for the glibc error?
I believe if you setenv MALLOC_CHECK_ to 2, glibc will call abort() when it detects the "free(): invalid pointer" error. Note the trailing underscore in the name of the environment variable.
If MALLOC_CHECK_ is 1, glibc will print "free(): invalid pointer" (and similar messages for other errors). If MALLOC_CHECK_ is 0, glibc will silently ignore such errors and simply return. If MALLOC_CHECK_ is 3, glibc will print the message and then call abort(). In other words, it's a bitmask.
You can also call mallopt(M_CHECK_ACTION, arg) with an argument of 0-3, and get the same result as with MALLOC_CHECK_.
Since you're seeing the "free(): invalid pointer" message I think you must already be setting MALLOC_CHECK_ or calling mallopt(). By default glibc does not print those messages.
As for how to debug it, installing a handler for SIGABRT is probably the best way to proceed. You can set a breakpoint in your handler or deliberately trigger a core dump.
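If you would rather enable the abort from code than from the environment, here is a minimal sketch of the mallopt() route mentioned above (assuming a glibc where M_CHECK_ACTION is honoured; the value 3 means print the message and then call abort(), as described earlier):

#include <malloc.h>   /* mallopt, M_CHECK_ACTION (glibc) */
#include <stdlib.h>

int main(void)
{
    /* 3 = print a diagnostic and then call abort() on a detected heap error */
    mallopt(M_CHECK_ACTION, 3);

    char *p = malloc(16);
    free(p);
    free(p);   /* double free: glibc should report it and abort */
    return 0;
}

Running that under gdb stops at the abort, so a backtrace shows the offending free() call.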
I recommend you get valgrind:
valgrind --tool=memcheck --leak-check=full ./a.out
In general, it looks like you might have to recompile glibc, ugh.
You don't say what environment you're running on, but if you can recompile your code for OS X, then its version of libc has a free() that listens to this environment variable:
MallocErrorAbort
        If set, causes abort(3) to be called if an error was encountered in malloc(3) or free(3), such as calling free(3) on a pointer previously freed.
The man page for free() on OS X has more information.
If you're on Linux, then try Valgrind, it can find some impossible-to-hunt bugs.
How to set a breakpoint in gdb?
(gdb) b filename:linenumber
// e.g. b main.cpp:100
Is there a way to force an abort? I recall there being an ENV var to control this?
I was under the impression that it aborted by default. Make sure you have the debug version installed.
Or use libdmalloc5: "Drop-in replacement for the system's `malloc', `realloc', `calloc', `free' and other memory management routines while providing powerful debugging facilities configurable at runtime. These facilities include such things as memory-leak tracking, fence-post write detection, file/line number reporting, and general logging of statistics."
Add this to your link command
-L/usr/lib/debug/lib -ldmallocth
gdb should automatically return control when glibc triggers an abort.
Or you can set up a signal handler for SIGABRT that dumps the stack trace to a file descriptor, using backtrace() and backtrace_symbols_fd() from <execinfo.h>. Below, mp_logfile is a FILE*:
void *array[512 / sizeof(void *)];   // an arbitrary depth (64 entries on a 64-bit system); increase if you want
int size = backtrace(array, 512 / sizeof(void *));
backtrace_symbols_fd(array, size, fileno(mp_logfile));
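For completeness, a minimal self-contained sketch of installing such a handler (the handler name, log-file path and buffer depth are illustrative, not from the original; compile with -g, and add -rdynamic if you want symbol names in the output):

#include <execinfo.h>   /* backtrace, backtrace_symbols_fd */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static FILE *mp_logfile;   /* opened during startup */

static void abrt_handler(int sig)
{
    void *array[64];
    int size = backtrace(array, 64);
    backtrace_symbols_fd(array, size, fileno(mp_logfile));
    _exit(1);   /* or re-raise SIGABRT with the default disposition to still get a core */
}

int main(void)
{
    mp_logfile = fopen("abort_backtrace.log", "w");   /* illustrative path */
    signal(SIGABRT, abrt_handler);

    abort();   /* demonstration: triggers the handler above */
}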
Related
I am hacking a plugin that requires the host-application to allow an executable stack.
This can be achieved by running
execstack -s /path/to/my/host
However, if the host application lacks the executable stack flag (e.g. the above command has not been called), running my plugin simply crashes the host:
Program received signal SIGSEGV, Segmentation fault.
I would like to avoid the crash, e.g. by automatically disabling parts of my code if the executable stack flag is not set.
The check should happen at runtime, during plugin initialization.
However, I haven't found any documentation on how to detect the availability of an executable stack at runtime (without crashing).
The only thing I have found so far is execstack -q /path/to/my/host, but that seems hacky to run from within a plugin loaded by /path/to/my/host.
It turns out there is a better solution to my problem than querying the protection scheme at runtime: explicitly flag a memory region as executable, using
int mprotect(void *addr, size_t len, int prot);
This basically adds an exception to the executable-stack protection for a well-defined memory region.
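A minimal sketch of that approach (the helper name alloc_exec_region is illustrative; mprotect() requires page-aligned addresses, which mmap() guarantees):

#include <string.h>
#include <sys/mman.h>

/* Illustrative helper: copy generated code into a private anonymous mapping
   and then flip it to read+execute, instead of requiring an executable stack. */
void *alloc_exec_region(const unsigned char *code, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, code, len);                                /* write while still writable */
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {  /* then make it executable */
        munmap(p, len);
        return NULL;
    }
    return p;
}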
I'm implementing the backend for a JavaScript JIT compiler that produces x86 code. Sometimes, as the result of bugs, I get segmentation faults. It can be quite difficult to trace back what caused them. Hence, I've been wondering if there would be some "easy" way to trap segmentation faults and other such crashes, and get the address of the instruction that caused the fault. This way, I could map the address back to compiled x86 assembly, or even back to source code.
This needs to work on Linux, but ideally on any POSIX compliant system. In the worst case, if I can't catch the seg fault and get the IP in my running JIT, I'd like to be able to trap it outside (kernel log?), and perhaps just have the compiler dump a big file with mappings of addresses to instructions, which I could match with a Python script or something.
Any ideas/suggestions are appreciated. Feel free to share your own debugging tips if you've ever worked on a compiler project of your own.
If you use sigaction, you can define a signal handler that takes 3 arguments:
void (*sa_sigaction)(int signum, siginfo_t *info, void *ucontext)
The third argument passed to the signal handler is a pointer to an OS- and architecture-specific data structure. On Linux, it's a ucontext_t, which is defined in the <sys/ucontext.h> header file. Within that, uc_mcontext is an mcontext_t (machine context), which for x86 contains all the registers at the time of the signal in gregs. So you can access
ucontext->uc_mcontext.gregs[REG_EIP] (32 bit mode)
ucontext->uc_mcontext.gregs[REG_RIP] (64 bit mode)
to get the instruction pointer of the faulting instruction.
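Putting that together, a minimal sketch for x86-64 Linux (use REG_EIP instead of REG_RIP in 32-bit mode; the handler here just prints and exits, but a JIT could instead look the address up in its own code maps):

#define _GNU_SOURCE            /* exposes REG_RIP / REG_EIP in <sys/ucontext.h> on glibc */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
#include <unistd.h>

static void segv_handler(int signum, siginfo_t *info, void *ucontext)
{
    ucontext_t *uc = (ucontext_t *)ucontext;
    void *fault_addr = info->si_addr;                 /* the address that was accessed */
    greg_t ip = uc->uc_mcontext.gregs[REG_RIP];       /* faulting instruction (x86-64) */
    fprintf(stderr, "SIGSEGV: access to %p at instruction %p\n",
            fault_addr, (void *)ip);
    _exit(1);   /* returning would re-execute the faulting instruction */
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    *(volatile int *)0 = 42;   /* deliberately fault to demonstrate the handler */
}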
I am writing an editor in 64-bit assembly on Linux. It runs correctly when I debug it in GDB, but it does not run correctly when I run it normally, i.e. it has runtime errors when I launch it with ./programName.
You're probably accessing uninitialized data or have some kind of memory corruption problem. This would explain the program behaving differently when run in the debugger - you're seeing the results of undefined behavior.
Run your program through valgrind's memcheck tool and see what it outputs. Valgrind is a powerful tool that will identify many runtime errors on Linux, including a full stack trace to the error.
If GDB disabling ASLR is what makes it work, then set disable-randomization off in GDB may let you reproduce the crash inside GDB so you can debug it (see "Force gdb to load shared library at randomized address").
Otherwise enable core dumps from your program, and use GDB on the core dump.
gdb ./prog core.1234.
On x86, you can insert a ud2 instruction in your asm source to intentionally cause a crash at whatever point you want in your code, if you want to get a coredump to examine registers/memory at some point before it crashes on its own. All architectures have an undefined instruction you can use, but I only know the mnemonic for x86's off the top of my head.
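If any part of your program is in C rather than pure assembly, GCC's __builtin_trap() gives the same effect as a hand-written ud2 (a sketch; on x86 it compiles to ud2 and raises SIGILL, which produces a core dump when core dumps are enabled):

#include <stdio.h>

int main(void)
{
    puts("about to trap");
    __builtin_trap();   /* GCC/Clang built-in; compiles to ud2 on x86 and raises SIGILL */
    /* not reached */
}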
I have a library of dubious origins which file identifies as a 32-bit executable. However, when I try to dlopen it on a 32-bit CentOS 4.4 machine, dlopen terminates with SIGFPE. Surely if there were something wrong with the format of the binary, dlopen should report an error rather than crash?
So the question is: What kinds of problems can cause dlopen to emit SIGFPE?
Some possible reasons are:
Division by zero (rule this out with gdb)
Architecture mismatch (did you compile the DSO yourself on the same architecture? or is it prebuilt?)
ABI compatibility problems (loading a DSO built for one Linux distro on a different one).
There is an interesting discussion regarding hash generation in the ELF format on GNU systems, where an ABI mismatch can cause SIGFPE when you mix and match DSOs not built for that distro/system.
Run GDB against your executable with:
$ gdb ./my_executable
(gdb) run
When the program crashes, get a backtrace with
(gdb) bt
If the stack ends in do_lookup_x(), you likely have the same problem and should ensure your DSO is correct for the system you are trying to load it on. However, since you say it has dubious origins, the problem is probably an ABI issue similar to the one described.
Get a non-dubious library / executable! ;)
Good Luck
Why does the Linux kernel generate a segfault on stack overflow? This can make debugging very awkward when alloca in C, or Fortran's creation of temporary arrays, overflows the stack. Surely it must be possible for the runtime to produce a more helpful error.
You can actually catch the condition for a stack overflow using signal handlers.
To do this, you must do two things:
Set up a signal handler for SIGSEGV (the segfault) using sigaction, and set the SA_ONSTACK flag. This instructs the kernel to use an alternative stack when delivering the signal.
Call sigaltstack() to set up the alternate stack that the handler for SIGSEGV will use.
Then when you overflow the stack, the kernel will switch to your alternate stack before delivering the signal. Once in your signal handler, you can examine the address that caused the fault and determine if it was a stack overflow, or a regular fault.
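A minimal sketch of that setup (the stack size, padding size and function names are illustrative):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static char altstack[64 * 1024];   /* illustrative size for the alternate stack */

static void segv_handler(int signum, siginfo_t *info, void *ucontext)
{
    /* We are running on the alternate stack, so this works even when the main
       stack is exhausted. info->si_addr is the faulting address; if it sits
       just past the end of the thread's stack, it was a stack overflow. */
    fprintf(stderr, "SIGSEGV at %p\n", info->si_addr);
    _exit(1);
}

static void recurse(void)
{
    char pad[16 * 1024];
    pad[0] = 1;      /* touch the frame so it actually consumes stack */
    recurse();       /* no base case: eventually overflows the stack */
}

int main(void)
{
    stack_t ss = { .ss_sp = altstack, .ss_size = sizeof altstack, .ss_flags = 0 };
    sigaltstack(&ss, NULL);

    struct sigaction sa = {0};
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;   /* deliver SIGSEGV on the alternate stack */
    sigaction(SIGSEGV, &sa, NULL);

    recurse();   /* demonstration only; build without optimization so the recursion is kept */
}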
The "kernel" (it's actually not the kernel running your code, it's the CPU) doesn't know how your code is referencing the memory it's not supposed to be touching. It only knows that you tried to do it.
The code:
char *x = alloca(100);
char y = x[150];
can't really be evaluated by the CPU as an attempt to access beyond the bounds of x.
You may hit the exact same address with:
char y = *((char*)(0xdeadbeef));
BTW, I would discourage the use of alloca, since the stack tends to be much more limited than the heap (use malloc instead).
A stack overflow is a segmentation fault: you've broken the bounds of the memory you were initially allocated. The stack is of finite size, and you have exceeded it. You can read more about it on Wikipedia.
Additionally, one thing I've done for projects in the past is write my own signal handler for the segfault (see the signal(2) man page). I usually caught the signal and wrote "Fatal error has occurred" to the console. I did some further stuff with checkpoint flags and debugging.
In order to debug segfaults you can run a program in GDB. For example, the following C program will segfault:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    printf("Starting\n");
    void *foo = malloc(1000);
    memcpy(foo, 0, 100); // this line will segfault
    exit(0);
}
If I compile it like so:
gcc -g -o segfault segfault.c
and then run it like so:
$ gdb ./segfault
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run
Starting program: /tmp/segfault
Starting
Program received signal SIGSEGV, Segmentation fault.
0x4ea43cbc in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x4ea43cbc in memcpy () from /lib/libc.so.6
#1 0x080484cb in main () at segfault.c:8
(gdb)
I find out from GDB that there was a segmentation fault on line 8. Of course there are more complex ways of handling stack overflows and other memory errors, but this will suffice.
Simply use Valgrind. It will point out all your memory allocation mistakes with excruciating preciseness.
A stack overflow does not necessarily yield a crash. It may silently trash data of your program but continue to execute.
I wouldn't use SIGSEGV handler kludges but instead fix the original problem.
If you want automated help, you can use gcc's -fstack-protector option, which will spot some overflows at runtime and abort the program.
valgrind is good for dynamic memory allocation bugs, but not for stack errors.
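To illustrate what the stack protector does catch, a small sketch (note this detects overflows within a stack frame, i.e. smashed canaries, rather than exhaustion of the whole stack). Build it with gcc -fstack-protector-all and it aborts at runtime with a "stack smashing detected" message:

#include <string.h>

static void overflow(const char *s)
{
    char buf[8];
    strcpy(buf, s);   /* writes past buf and clobbers the stack canary */
}

int main(void)
{
    overflow("considerably longer than eight bytes");
    return 0;
}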
Some of the comments are helpful, but the problem is not one of memory allocation errors; there is no mistake in the code. It's quite a nuisance in Fortran, where the runtime allocates temporary values on the stack. Thus a statement such as
write(fp)x,y,z
can trigger a segfault with no warning. Technical support for the Intel Fortran compiler says there is no way for the runtime library to print a more helpful message. However, if Miguel is right, then this should be possible, as he suggests. So thanks a lot. The remaining question is how I first find the address of the segfault and then figure out whether it came from a stack overflow or some other problem.
For others who find this problem: there is a compiler flag which puts temporary variables above a certain size on the heap.