For: QNX Software Development Platform 6.5.0
I have run into a problem on a QNX 6.5.0 system where my program silently exits; it turns out to be due to a race condition similar to the one described in this post:
Thread stops randomly in the middle of a while loop
I have done some research and found that QNX has some built-in tools to monitor memory and detect any leaks present in a program, but the instructions I have come across are for the QNX 6.5.0 IDE GUI, and I am running QNX on a server from the command line.
example: http://www.qnx.com/developers/docs/6.5.0/index.jsp?topic=%2Fcom.qnx.doc.ide.userguide%2Ftopic%2Fmemory_DetecMemLeaks_.html
I'm kind of stuck with this, as there isn't really a simple way to reproduce it: the software is designed for logging and takes thousands of entries per second, and it silently exits after a few hours, so I can't sit here waiting 2 hours for each round.
Has anyone had experience with debugging memory leaks in QNX?
Edit: I am also using boost::lockfree::spsc_queue, which may be causing the crash.
I was able to solve this by using Valgrind. I compiled my program and Valgrind for Linux and was able to debug my issue that way.
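For anyone in the same situation, the kind of invocation involved looks roughly like this (./mylogger stands in for your own binary, and the exact flags depend on the Valgrind version you build):

valgrind --tool=memcheck --leak-check=full --log-file=leaks.txt ./mylogger
valgrind --tool=helgrind --log-file=races.txt ./mylogger

Memcheck covers the leak side; since the underlying problem turned out to be a race condition, Helgrind may be worth a run as well.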
The question is in the subject. Let me explain "why".
I am running my application on Red Hat Enterprise Linux Server 7.7. When I was checking performance using htop, I found that a few threads take too much CPU.
I added some debug logging and found that the threads with high CPU usage are not created in my code, so I assume these CPU-greedy threads are created in the third-party shared libs I am using.
So here is the question:
Say I have a thread id (17405). Is there any way to find which shared lib started this thread?
I apologize if the question is too trivial - I started working with the Linux OS not long ago.
Thank you
Actually, I found a solution which looks satisfactory to me.
I start gdb and attach to my process; then I can list all threads in the process with "info threads", select the thread I am interested in, and - voila - I can see its stack trace by issuing the bt command.
It works, I think.
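For reference, the whole session is only a handful of commands, something like this (the pid and the gdb thread number are placeholders; match the LWP column of "info threads" against the thread id you saw in htop, e.g. 17405):

gdb -p <pid-of-your-process>
(gdb) info threads
(gdb) thread 12
(gdb) bt

The top frames of the backtrace name the shared library the thread is spinning in.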
I'm experiencing very high CPU usage (~100%) with the Qt version of T32 on Linux, even when the program is waiting for user interaction. The executable is t32marm-qt.
This does not happen when I use the standard Tcl-based t32marm executable.
An strace shows that the executable continuously cycles on the
clock_gettime(CLOCK_REALTIME,...)
syscall.
The Linux distribution is Mint 14 32-bit (a derivative of Ubuntu 12.10).
Has anybody experienced this behavior ?
If so, is it a bug or just a misconfiguration?
Yes, it has just been confirmed to me that it is a software bug, fixed in more recent versions of the tool. If you encounter this problem, update your version.
Using NDK r8c, Eclipse 4.2, Windows 7 64
I've used remote debuggers before (on other platforms, via gigabit ethernet) for large C++ codebases that felt no different than local debugging. The Java debugger that comes with the SDK runs fast too. Therefore I'm quite baffled why gdb is so slow to connect and step over lines of code.
In my current application, which is around 20 static libraries and 1500 source files, it takes about 15 seconds to connect, and about 2 seconds to step. I'm more concerned about stepping.
Has anyone ever profiled gdb to see what the problem is? If so, any suggestions?
I have. My cohorts and I at NVIDIA have contributed several commits to AOSP to address this problem, although our focus has been on shared libraries (symbol load performance, and pending symbol resolution.) We have sped up solib load processing by a factor of 6x. (Although, after doing our own work we discovered that 3x of that 6x had already been solved upstream by GNU, in 7.5... so we abandoned our reinvention, and submitted the relevant 7.5 patches up to Google's NDK repository, which was based on the older 7.3 GDB.) I believe all of our speedups are present in r8d... but I haven't checked.
I cannot think of any reason why static libraries would slow things down, but I must admit I haven't given any thought to them. Do you have a specific reason for believing so, or was that just a comment to give perspective on the size and scope of your debugging needs?
We have begun to work on the stepping problem, but don't have anything to share yet. Basically, the bottleneck is ADB (especially on Windows). Additionally, there is a lot of chatty communication between GDB and gdbserver when stepping, especially if you are using an IDE with a locals window, register window, expression window, stack window, etc., all updating with each step. That's a lot of chatter that could likely be optimized for the IDE use case.
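If you want to see that chatter for yourself, GDB can log the remote protocol traffic; something like the following shows how many packets a single source-level step turns into:

(gdb) set debug remote 1
(gdb) next

The resulting output lists each packet sent to and received from gdbserver over ADB.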
Just some of the fixes that we are considering for speeding up stepping will be IDE-specific:
Using python scripting to pre-process watch expressions in GDB, rather than in the IDE.
Implementing "super-packets" communicating between GDB and gdbserver... packets that encapsulate IDE-specific communications in a way that minimizes chatter between GDB and gdbserver.
We intend to share all of this with the Android community.
I am experiencing strange behavior in GDB. When running a post-mortem analysis of a core dumped from a heavily multithreaded C++ application, the debugger commands
bt
where
info threads
never tell me the thread in which the program actually crashed; it keeps showing me thread number 1. As I am used to seeing this work on other systems, I am curious whether it is a bug in GDB or whether the behavior was changed somehow. Can anyone point me to a solution? It is a PITA to search through 75 threads just to find out something the debugger already knows.
By the way, I am on Debian Squeeze (6.0.1), the GDB version is 7.0.1-debian, and the system is x86 and completely 32-bit. On my older Debian (5.x) installation, debugging a core dumped by the exact same source gives me a backtrace of the correct thread, as does GDB on an Ubuntu 10.04 installation.
Thanks!
GDB does not know which thread caused the crash, and simply shows the first thread that it sees in the core.
The Linux kernel usually dumps the faulting thread first, and that is why on most systems you end up in exactly the correct thread once you load core into GDB.
I've never seen a kernel where this was broken, but I've never used Debian 6 either.
My guess would be that this was broken, and then got fixed, and Debian 6 shipped with a broken kernel.
You could try upgrading the kernel on your Debian 6 machine to match e.g. your Ubuntu 10.04, and see if the problem disappears.
Alternatively, the Google user-space coredumper does this correctly. You can link it in and call it from a SIGSEGV handler.
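In the meantime, a low-tech workaround on the affected machine is to let GDB dump every thread's stack and pick out the one that took the signal (./myapp and the thread number 42 are placeholders here):

gdb ./myapp core
(gdb) thread apply all bt
(gdb) thread 42
(gdb) bt full

That is still nicer than switching through 75 threads one at a time.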
A Linux machine freezes a few hours after booting and running software (including custom drivers). I'm looking for a method to debug such a problem. Recently there has been significant progress in Linux kernel debugging techniques, hasn't there?
I kindly ask you to share some experience on the topic.
If you can reproduce the problem inside a VM, there is indeed a fairly new (AFAIK) technique which might be useful: debugging the virtual machine from the host machine it runs on.
See for example this:
Debugging Linux Kernel in VMWare with Windows host
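The rough shape of the setup, as far as I remember it (verify the exact option names against that article or VMware's documentation; use guest64 and port 8864 for a 64-bit guest), is to add the gdb stub option to the VM's .vmx file and then attach from the host with the guest's vmlinux:

debugStub.listen.guest32 = "TRUE"

gdb ./vmlinux
(gdb) target remote localhost:8832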
VMware Workstation 7 also enables a powerful technique that lets you record system execution deterministically and then replay it as desired, even backwards. So as soon as the system crashes you can go backwards and see what was happening then (and even try changing something and see if it still crashes). IIRC I read somewhere you can't do this and debug the kernel using VMware/gdb at the same time.
Obviously, you need a VMM for this. I don't know which VMMs other than VMware's family support this, and I don't know whether any free VMware versions support it. Likely not; one can't really expect a commercial company to give everything away for free. The trial version is 30 days.
If your custom drivers are for hardware inside the machine, then I suppose this probably won't work.
SystemTap seems to be to Linux what DTrace is to Solaris; however, I find it rather hostile to use. Still, you may want to give it a try. NB: compile the kernel with debug info and spend some time with the kernel instrumentation hooks.
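To give a flavour, a minimal stap one-liner looks like this (do_exit is just an example probe point; it needs stap installed plus the matching kernel debuginfo):

stap -e 'probe kernel.function("do_exit") { printf("%s (%d) exiting\n", execname(), pid()) }'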
This is why so many are still using printk() after empirically narrowing a bug down to a specific module.
I'm not recommending it, just pointing out that it exists. I may not be smart enough to appreciate some underlying beauty; I just write drivers for odd devices.
There are many and varied techniques, depending on the sort of problem you want to debug. In your case the first question is: "Is the system really frozen?" You can enable the magic SysRq key, examine the system state at the time of the freeze, and go from there.
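Assuming the kernel was built with SysRq support, that looks like the following; the trigger file only helps while the machine is still partly responsive, otherwise use the Alt+SysRq key combinations on the console (or over a serial console):

echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
echo w > /proc/sysrq-trigger

't' dumps the state of every task to the kernel log, and 'w' lists the tasks stuck in uninterruptible sleep, which is usually where a freeze hides.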
Probably the most directly powerful method is to enable the kernel debugger and connect to it via a serial cable.
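With kgdb that boils down to roughly this (ttyS0 and 115200 baud are assumptions about your serial wiring; depending on your gdb version the baud command may instead be "set serial baud"). On the target, boot with

kgdboc=ttyS0,115200 kgdbwait

on the kernel command line; then, on the development machine:

gdb ./vmlinux
(gdb) set remotebaud 115200
(gdb) target remote /dev/ttyS0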
One option is to use Kprobes. A quick Google search will show you all the information you need, and it isn't particularly hard to use. Kprobes was created by IBM, I believe, as a solution for kernel debugging. It is essentially an elaborate form of printk(), but it allows you to handle the "breakpoints" you insert using handlers. It may be what you are looking for. All you need to do is write and insmod a module into the kernel which handles any of the "breakpoints" you specified in it when they are hit.
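The build-and-load side is just the usual out-of-tree module workflow (myprobe.ko is a hypothetical module name; the probe registration and handlers live in its C source):

make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
insmod myprobe.ko
dmesg | tail

The handlers typically printk() whatever state you care about, which is where dmesg comes in.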
Hope that can be a useful option...
The way I debugged this kind of bug was to run my OS inside VirtualBox and compile the kernel with kgdb built in. Then I set up a serial console on the VirtualBox VM so that I could gdb into the kernel inside the VirtualBox OS via the serial console. Any time the OS hangs, much like with the magic SysRq key, I can hit Ctrl-C in gdb to stop and inspect the kernel at that point in time.
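In case it helps someone reproduce the setup, the VirtualBox side looked roughly like this ("mykernelvm" and the pipe path are specific to my machine, and the guest kernel needs CONFIG_KGDB and CONFIG_KGDB_SERIAL_CONSOLE plus kgdboc=ttyS0,115200 on its command line):

VBoxManage modifyvm "mykernelvm" --uart1 0x3F8 4 --uartmode1 server /tmp/vbox-ttyS0

gdb ./vmlinux
(gdb) target remote | socat - UNIX-CONNECT:/tmp/vbox-ttyS0

socat is only there to bridge gdb to the local socket VirtualBox creates for the virtual serial port.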
Normally kernel stack tracing alone is too difficult to pinpoint the culprit process, so the best approach, I think, is still the generic "top" command plus looking at the application logs to see what caused the hang - this will require a reboot to see the logs, of course.