GDB shows the wrong thread in postmortem analysis - linux

I am experiencing strange behavior in GDB. When running a post-mortem analysis of a core dumped from a heavily multithreaded C++ application, the debugger commands
bt
where
info threads
never tell me the thread in which the program actually crashed. It keeps showing me thread number 1. As I am used to seeing this work on other systems, I am curious whether this is a bug in GDB or whether the behavior was changed somehow. Can anyone point me to a solution? It is a pain to search through 75 threads just to find out something the debugger already knows.
By the way, I am on Debian Squeeze (6.0.1), the GDB version is 7.0.1-debian, and the system is x86 and completely 32-bit. On my older Debian (5.x) installation, debugging a core dumped by the exact same source gives me a backtrace of the correct thread, as does GDB on an Ubuntu 10.04 installation.
Thanks!

GDB does not know which thread caused the crash, and simply shows the first thread that it sees in the core.
The Linux kernel usually dumps the faulting thread first, and that is why on most systems you end up in exactly the correct thread once you load the core into GDB.
I've never seen a kernel where this was broken, but I've never used Debian 6 either.
My guess would be that this was broken, and then got fixed, and Debian 6 shipped with a broken kernel.
You could try upgrading the kernel on your Debian 6 machine to match e.g. your Ubuntu 10.04, and see if the problem disappears.
Alternatively, the Google user-space coredumper does this correctly. You can link it in and call it from a SIGSEGV handler.
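A minimal sketch of that approach, assuming the google-coredumper library and its WriteCoreDump() entry point (check the header that ships with whatever version you actually link against):

#include <signal.h>
#include <google/coredumper.h>   /* assumed header name */

static void segv_handler(int sig)
{
    /* Write a core file from inside the crashing thread's context. */
    WriteCoreDump("core.crash");
    /* Restore the default action and re-raise so the process still dies. */
    signal(sig, SIG_DFL);
    raise(sig);
}

int main(void)
{
    signal(SIGSEGV, segv_handler);
    /* ... application code ... */
    return 0;
}

The resulting file can then be loaded the usual way, e.g. gdb ./myapp core.crash (myapp is a placeholder for your binary).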

Related

QNX: Detect memory leak in program

For: QNX Software Development Platform 6.5.0
I have run into a problem on a QNX 6.5.0 system where my program silently exits; this has been found to be due to a race condition similar to the one in this post:
Thread stops randomly in the middle of a while loop
I have done some research and found that QNX has some built-in tools to monitor memory and detect any leaks present in the program; however, the instructions I have come across are for the QNX 6.5.0 IDE GUI, and I am running QNX on a server from the command line.
example: http://www.qnx.com/developers/docs/6.5.0/index.jsp?topic=%2Fcom.qnx.doc.ide.userguide%2Ftopic%2Fmemory_DetecMemLeaks_.html
I'm kind of stuck, as there isn't really a simple way to do this: the software is designed for logging purposes, takes thousands of entries per second, and silently exits after a few hours. So I can't sit here waiting two hours each round.
Has anyone had experience with debugging mem leaks in QNX?
Edit: I am also using boost::lockfree::spsc_queue which may be causing the crash.
I was able to solve this by using Valgrind. I compiled my program and Valgrind for Linux and was able to debug my issue that way.
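For reference, the kind of invocation that reports leaks and invalid accesses (the binary name is a placeholder):
valgrind --leak-check=full --track-origins=yes ./logger_app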

How do they debug the Linux kernel core?

Nowadays debugging has become so advanced that even core kernel source code can be debugged using a virtual environment.
But after reading a couple of blogs related to kernel core development, it was not clear whether the developers debug using a virtual environment.
They mention that they rely on printing messages rather than using a debugging tool, at least for core components.
So I would ask the Linux kernel experts: what is good practice when debugging the kernel?
I've tried multiple approaches when trying to debug the kernel.
Sometimes the easiest way is to just add a few printk statements based on my own conditional values, monitor the serial log, and see what's going on. It's especially useful when the function in question is invoked quite often but you are interested only in a subset of those invocations.
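A hedged sketch of that pattern as a stand-alone module (the names and the guard condition are made up; in a real driver the printk would sit inside the hot function you care about):

#include <linux/module.h>
#include <linux/kernel.h>

/* Imagine this is called on every interrupt or packet; log only the
 * interesting subset so the serial console is not flooded. */
static void handle_event(int status)
{
        if (status < 0)                 /* placeholder condition */
                printk(KERN_DEBUG "dbgmod: bad status %d in %s\n",
                       status, __func__);
}

static int __init dbg_init(void)
{
        handle_event(-22);              /* exercise the logging path once */
        return 0;
}

static void __exit dbg_exit(void)
{
}

module_init(dbg_init);
module_exit(dbg_exit);
MODULE_LICENSE("GPL");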
QEMU GDB debugging. I have a buildroot filesystem set up, which means the kernel is lean and boots up very fast. I start QEMU with the -s -S flags and attach gdb with target remote :1234. Additionally, there aren't very many userspace processes in this setup, so it's easier to debug the kernel.
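A sketch of that setup; the image paths and architecture are placeholders for whatever buildroot produced:

# start the VM paused, with a gdb stub listening on port 1234 (-s -S)
qemu-system-x86_64 -kernel output/images/bzImage \
    -initrd output/images/rootfs.cpio \
    -append "console=ttyS0" -nographic -s -S
# in another terminal, attach gdb to the stub
gdb vmlinux
(gdb) target remote :1234
(gdb) hbreak start_kernel
(gdb) continue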
VMware stub. Assuming you are running an Ubuntu VM, it is possible that you can attach gdb to a VMware stub and debug the kernel. Personally, I have never had to pursue this route, but I look forward to trying it out someday.
If you have a kernel for a device that gets stuck in a boot loop and does not print any debug information to serial, it might still be helpful to try to boot it under QEMU. Sure, the boot will probably fail as the kernel tries to load drivers, but you should be able to attach gdb, get a stack trace, and see what the root cause is (perhaps a recursive call).

High CPU usage using Lauterbach T32 with Linux/Qt

I'm experiencing very high CPU usage (~100%) using the Qt version of T32 on Linux, even when the program is waiting for user interaction. The executable is t32marm-qt.
This does not happen when I use the standard Tcl-based t32marm executable.
An strace shows that the executable continuously cycles on the
clock_gettime(CLOCK_REALTIME,...)
syscall.
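(Observed e.g. with strace -c -p $(pidof t32marm-qt), which attaches to the running process and summarizes syscall counts.)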
The Linux distribution is Mint 14 32-bit (a derivative of Ubuntu 12.10).
Has anybody experienced this behavior ?
If so, is it a bug or just a wrong configuration ?
Yes, it has just been confirmed to me that it is a software bug, fixed in more recent versions of the tool. If you encounter such a problem, update your version.

Linux stuck in CPU soft lockup?

My system is a CentOS 6.3 (running Kernel version 2.6.32-279.el6.x86_64).
I have a loadable kernel module which is a driver that manages a PCIe card.
If I manually insert the driver using insmod while the OS is up and running, the driver loads successfully and is operational.
However, if I try to install the driver using rpm and then reboot the system, during startup the OS gets stuck spitting out the following "soft lockup" message for ALL the CPU cores, except for one core that is in "soft lockup" in one of the threads created by my driver.
BUG: soft lockup - CPU#X stuck for 67s! [migration/8:36]
.......(same above message for all cores except one)
BUG: soft lockup - CPU#10 stuck for 67s! [mydriver_thread/8:36]
(one core is locked up in one of the threads in my driver).
I searched the net quite a bit for info on this kernel message/bug, and there are quite a few posts about it, but none on what causes it or how to debug it. Any help with the following questions would really be appreciated:
I am not able to log into the system; I think it's because all the cores are in a "soft lockup" state, so I cannot trigger a kernel dump from a shell prompt. I enabled SysRq and tried to trigger a kernel dump with the SysRq key combo, but no luck. The system does not seem to respond to the keyboard (not even to the CapsLock button). Any suggestions on how I can trigger a kernel dump in this situation?
I can imagine the possibility of my driver thread causing a "soft lockup". But how can the "migration" thread (a kernel thread) be in a "soft lockup" just because of my driver?
From browsing the net, the "migration" thread is used to move tasks from one CPU to another. Can someone please help me understand what this thread does exactly, and how it can be affected by other threads, if at all?
I had a very similar problem on my desktop. It would soft lockup very frequently - about once a day or so.
It turns out it was because I was running on an Intel Haswell. It seems that the Haswell/Broadwell series of Intel processors have a bug which can cause system instability. This bug was fixed in a microcode update.
Check whether CentOS offers an intel-microcode package, and install it. Make sure you configure GRUB to load it as an initial ramdisk before it loads the regular initramfs.
Personally, I upgraded my microcode by booting into Windows and running a BIOS update. You can check whether the microcode was actually updated by comparing the output of grep 'microcode' /proc/cpuinfo before and after the update.
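For example, run this before and after the update and compare (the exact output format varies by kernel version):
grep 'microcode' /proc/cpuinfo | sort -u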

GDB not breaking on SIGSEGV

I'm trying to debug an application for an ARM processor from my x86 box. I followed some instructions from someone who came before me on getting a development environment set up. I've got a version of gdbserver that has been cross-compiled for the ARM processor and appears to allow me to connect to it from the ARM-aware gdb on my box.
I'm expecting that when the process I've got gdb attached to crashes (from a SIGSEGV or similar) it will break so that I can check out the call stack.
Is that a poor assumption? I'm new to the ARM world and to cross-compiling things; is there a good resource for getting started on this stuff that I'm missing?
It depends on the target system (the one which uses an ARM processor). Some embedded systems detect invalid memory accesses (e.g. dereferencing NULL) but react with unconditional, uncatchable system termination (I have done development on such a system). What kind of OS is the target system running?
So I assume that the gdb client is able to connect to gdbserver and that you are able to put a breakpoint on the running process, right?
If all of the above works, then you should put the breakpoint before the instruction that crashes. If you don't know where it is crashing, then once the application has crashed a core file will be generated; take that core from the board. Then compile the source code again with the -g debug option (if the binaries are stripped) and do an offline analysis of the core, something like this:
gdb binary-name core_file
Then, once you get the gdb prompt, give the commands below:
(gdb) thread apply all bt
The above command will give you the complete backtrace of all the threads; remember that the binaries should not be stripped and that the proper paths to all the source code and shared libraries should be available.
You can switch between threads using the following command at the gdb prompt:
(gdb) thread thread_number
If the core file is not being generated on the board, try the following command on the board before executing the application:
ulimit -c unlimited
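For reference, a typical end-to-end remote session looks roughly like this; the binary name, cross-gdb name, IP address and port are placeholders for your particular setup:

# on the target board
gdbserver :2345 ./myapp
# on the host
arm-linux-gnueabi-gdb ./myapp
(gdb) target remote 192.168.1.50:2345
(gdb) handle SIGSEGV stop print
(gdb) continue
# when the fault hits, gdb stops and the stack can be inspected
(gdb) bt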
