General protection error on a Linux-based OS

I am getting a general protection error when I run a binary, but there is no core dump. How do I debug the problem?
Is this a problem with the "ld" I am using?
kernel: testbin[24879] general protection ip:7fd7271585e0 sp:7fff1ef55070 error:0 in ld-2.14.so[7fd727142000+20000]

Before debugging, recompile your program with debugging symbols (the -g option); otherwise you won't have enough detail (file name, function, line number) to debug it quickly and effectively.
There are several debugging tools, but for now I suggest Valgrind.
Run your program through Valgrind:
valgrind /path/to/your/program
and then repeat the steps that cause the general protection fault.
If the software isn't yours, you'll have to contact the author for support.
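The kernel log line in the question already gives a starting point: the ip value is the faulting instruction address, and the bracketed pair after ld-2.14.so is the base address and size of that mapping. A minimal sketch of turning those two numbers into an offset inside the library (the helper name fault_offset is hypothetical):

```shell
# Offset of the faulting instruction inside the mapped object:
# ip minus the mapping base, both copied from the kernel log line
# (hexadecimal, without the 0x prefix).
fault_offset() {
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# ip and base of ld-2.14.so from the log line in the question
fault_offset 7fd7271585e0 7fd727142000   # prints 0x165e0
```

The resulting offset can then be looked up in the library with a tool such as addr2line or objdump; the path to ld-2.14.so on your system is something you would have to fill in yourself.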

Related

Debug a futex lock

I have a process waiting on a futex:
# strace -p 5538
Process 5538 attached - interrupt to quit
futex(0x7f86c9ed6a0c, FUTEX_WAIT, 20, NULL
How can I best debug such a situation? Can I identify who holds the futex? Are there any tools similar to ipcs and ipcrm but for futexes?
Try attaching with gdb -p PID, then run where or bt to see a backtrace.
It won't be spectacularly useful with binaries and libraries that have had their debugging symbols stripped, but you may be able to deduce a fair bit from the context. It might be able to indicate to you which part of a complex process is hanging, and then you could examine the right part of the sources to search for the lock.
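For a multi-threaded process it is usually the backtrace of every thread that matters, since the lock holder is probably a different thread from the one that is waiting. A small sketch that collects them non-interactively (the function name dump_thread_backtraces is hypothetical; the gdb options -p, -batch and -ex are standard):

```shell
# Print a backtrace of every thread in a running process, then detach.
# $1 is the PID of the target process.
dump_thread_backtraces() {
    gdb -p "$1" -batch -ex 'thread apply all bt'
}

# Example, using the PID from the strace output in the question:
#   dump_thread_backtraces 5538
```

A process can only have one tracer at a time, so detach strace before attaching gdb to the same PID.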
I have the same problem with a piece of C++ code, running Ubuntu 12.10 64-bit. It looks like a similar problem from 2007, where libc was buggy (and maybe still is?).
I start a pthread which runs traceroute through a system() call. printf output before and after the system() call indicates that the program hangs in the call WITHOUT executing traceroute.
I am not sure whether my Linux is broken once again because of the Ubuntu update, or whether it's a libc-related bug. Since a lot of applications seem to have "similar" problems, I assume it's stuck somewhere in userspace.
My C++ code runs perfectly on 32-bit systems and even on 64-bit OS X, so I assume that the combination of Ubuntu 12.10 and 64-bit libc is broken.

gdb - No hardware breakpoint support in the target

I'm trying to set a hardware breakpoint using the gdb hbreak command
hbreak *address
but I'm getting the following error: "No hardware breakpoint support in the target".
Is there any way to fix this problem?
Try the start command first. GDB often says that when the program has not been started yet even if there is hardware support (this is a very misleading message in this context).
Your hardware may not support hardware breakpoints, or you may have run out of available hardware breakpoint registers. You can still use software breakpoints as a workaround.
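The ordering suggested in the first answer (start the program, then set the breakpoint) can be scripted. This is only a sketch: the function name hbreak_after_start and the example address are hypothetical, and it assumes the binary has a main symbol for start to stop at (newer gdb versions also offer starti for binaries without one).

```shell
# Start the program under gdb (stopping at main), request a hardware
# breakpoint at the given address, then continue execution.
# $1 = program, $2 = address, remaining arguments are passed to the program.
hbreak_after_start() {
    prog="$1"; addr="$2"; shift 2
    gdb -q -ex 'start' -ex "hbreak *$addr" -ex 'continue' --args "$prog" "$@"
}

# Example with a made-up address:
#   hbreak_after_start ./prog 0x401000
```

If gdb still refuses the hbreak after the program has started, replacing hbreak with break in the same command line gives the software-breakpoint fallback mentioned above.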

How to test the kernel for kernel panics?

I am testing the Linux kernel on an embedded device and would like to find scenarios in which the kernel panics.
Can you suggest some test steps (manual or automated in code) to provoke kernel panics?
There's a variety of tools that you can use to try to crash your machine:
crashme tries to execute random code; this is good for testing process lifecycle code.
fsx is a tool to try to exercise the filesystem code extensively; it's good for testing drivers, block io and filesystem code.
The Linux Test Project aims to create a large repository of kernel test cases; it may not be designed specifically to crash systems, but it may go a long way towards helping you and your team keep everything working as planned. (Note that the LTP isn't prescriptive: the kernel community doesn't treat its tests as anything important. The LTP team does try very hard to be descriptive about what the kernel does and doesn't do.)
If your device is network-connected, you can run nmap against it, using a variety of scanning options: -sV --version-all will try to find versions of all services running (this can be stressful), -O --osscan-guess will try to determine the operating system by sending unusual network packets to the machine and guessing the OS from the responses.
The nessus scanning tool also does version identification of running services; it may or may not offer any improvements over nmap, though.
You can also hand your device to users; they figure out the craziest things to do with software, they'll spot bugs you'd never even think to look for. :)
You can try the following key combination:
SysRq + c
or
echo c >/proc/sysrq-trigger
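Whether the keyboard combination works depends on the value in /proc/sys/kernel/sysrq. As far as I understand the kernel's sysrq documentation, 0 disables the key, 1 enables every function, and any other value is a bitmask; writing to /proc/sysrq-trigger as root is not restricted by that value. The sketch below checks a given value; the helper name is hypothetical, and the bit value 8 for the crash function is my reading of the documentation, so treat it as an assumption.

```shell
# Succeeds if the keyboard combination SysRq+c would be permitted for the
# given /proc/sys/kernel/sysrq value (0 = off, 1 = all, otherwise a bitmask).
# Assumption: the crash function is governed by bit value 8.
sysrq_key_allows_crash() {
    [ "$1" -eq 1 ] || [ "$(( $1 & 8 ))" -ne 0 ]
}

# Example on a live system (commented out, it only reads the setting):
#   sysrq_key_allows_crash "$(cat /proc/sys/kernel/sysrq)" && echo "SysRq+c enabled"
```

Both methods crash the machine immediately, so run them only on a device you are prepared to reboot.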
Crashme has been known to find unknown kernel panic situations, but it must be run in a potent way that creates a variety of signal exceptions handled within the process and a variety of process exit conditions.
The main purpose of the messages generated by Crashme is to determine if sufficiently interesting things are happening to indicate possible potency. For example, if the mprotect call is needed to allow memory allocated with malloc to be executed as instructions, and if you don't have the mprotect enabled in the source code crashme.c for your platform, then Crashme is impotent.
It seems that operating systems on x64 architectures tend to have execution turned off for data segments. Recently I updated crashme.c at http://crashme.codeplex.com/ to use mprotect when __APPLE__ is defined and tested it on a MacBook Pro running Mac OS X Lion. This is the first serious update to Crashme since 1994. Expect to see updated CentOS and FreeBSD support soon.

Program runs with gdb but doesn't run with ./ProgramName

I am writing an editor in 64-bit assembly on Linux. It runs correctly when I debug it in GDB, but it fails with runtime errors when I run it normally with ./programName.
You're probably accessing uninitialized data or have some kind of memory corruption problem. This would explain the program behaving differently when run in the debugger - you're seeing the results of undefined behavior.
Run your program through valgrind's memcheck tool and see what it outputs. Valgrind is a powerful tool that will identify many runtime errors on Linux, including a full stack trace to the error.
If GDB disabling ASLR is what makes it work, running set disable-randomization off in GDB may let you reproduce the crash inside GDB so you can debug it. See the related question "Force gdb to load shared library at randomized address".
Otherwise, enable core dumps for your program and use GDB on the core dump:
gdb ./prog core.1234
On x86, you can insert a ud2 instruction in your asm source to intentionally cause a crash at whatever point you want in your code, if you want to get a coredump to examine registers/memory at some point before it crashes on its own. All architectures have an undefined instruction you can use, but I only know the mnemonic for x86's off the top of my head.
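Enabling core dumps usually means raising the core size limit and finding out where the kernel writes them, which is controlled by /proc/sys/kernel/core_pattern. A rough sketch, with a hypothetical helper name, that interprets a core_pattern string:

```shell
# Describe where the kernel would send a core dump for a given
# core_pattern string (the contents of /proc/sys/kernel/core_pattern).
core_destination() {
    case "$1" in
        '|'*) echo "piped to handler: ${1#?}" ;;
        /*)   echo "file at absolute path: $1" ;;
        *)    echo "file in working directory of the process: $1" ;;
    esac
}

# Typical use on a live system (commented out so the sketch has no side effects):
#   ulimit -c unlimited        # lift the core size limit for this shell
#   core_destination "$(cat /proc/sys/kernel/core_pattern)"
#   ./prog                     # reproduce the crash
#   gdb ./prog core.1234       # actual core file name depends on core_pattern
```

If the pattern starts with a pipe, the dump goes to a crash-handling program rather than to a file next to your binary, which is a common reason for not finding a core file.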

Under Linux, how do I track down a memory leak in pre-built software?

I have a new Ubuntu Linux Server 64bit 10.04 LTS.
A default install of MySQL with replication turned on appears to be leaking memory.
However, we've tried going back to an earlier version and memory is still leaking but I can't tell where.
What tools/techniques can I use to pinpoint where memory is leaking so that I can rectify the problem?
Valgrind, http://valgrind.org/, can be very useful in these situations. It runs on unmodified executables, but it helps tremendously if you can install the debugging symbols. Be sure to use the --show-reachable=yes flag, as the leaked memory may still be reachable in some way, just not the way you want it. Also add --trace-children=yes so that Valgrind follows child processes started via exec, for example when the server is launched from a start-up script. You'll likely have to track down where the start-up script calls the executable and then add something like the following:
valgrind --show-reachable=yes --trace-children=yes --log-file=/path/to/log SQL-cmdline sqlargs
The man page has lots of other potentially useful options.
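Once the logs exist, the LEAK SUMMARY lines are a quick first thing to look at. The sketch below sums the "definitely lost" byte counts across one or more log files; the function name is hypothetical and it assumes the usual Memcheck summary wording ("definitely lost: N bytes in M blocks").

```shell
# Sum the "definitely lost" byte counts from the LEAK SUMMARY sections
# of the Valgrind log files given as arguments.
definitely_lost_bytes() {
    grep -h 'definitely lost:' "$@" |
        sed -e 's/.*definitely lost: *\([0-9,]*\) bytes.*/\1/' -e 's/,//g' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Example (the path is a placeholder):
#   definitely_lost_bytes /path/to/log.*
```

With --trace-children=yes it can help to write one log per process by using --log-file=/path/to/log.%p, since Valgrind expands %p to the process ID.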
Have you tried the MySQL mailing list? Something like this would certainly be of interest to them if you can reproduce it in a straightforward manner.
You can use Valgrind as ninjalj suggests, but I doubt you'll get that close to anything useful. Even if you see a real leak (and they will be hard enough to validate), tracking down the root cause through the C call stacks will likely be very annoying (for example if the leak is triggered by a particular SQL pattern or stored procedure, you'll be looking at the call stack from the resultant optimized query, and not the original calls, which are likely in a different language).
Normally you might have no recourse, and have to resort to tracking it down through callstacks and iterative testing, but you have the source code to MySQL (including the source for the exact default package install), so you can use more advanced tools like MemoryScape (or at least build with symbols in order to provide Valgrind more food for thought).
Try using valgrind.
A very good and powerful tool that is installed or available on most distributions is Valgrind.
It has a plethora of options and is pretty much (as far as I've seen) the default profiler on Linux systems.
