LeakSanitizer not working under gdb in Ubuntu 18.04? - linux

I've upgraded my Linux development VM from Ubuntu 16.04 to 18.04 recently, and noticed one thing that has changed. This is on x86-64. With 16.04, I've always had this workflow where I'd build the project I'm working on with gcc (5.4, the stock version in 16.04) and -fsanitize=address and -O0 -g, and then run the executable through gdb (7.11.1, also the version that came with Ubuntu). This worked fine, and at the end, LeakSanitizer would produce a leak report if it detected memory leaks.
In 18.04, this doesn't seem to work anymore; LeakSanitizer complains about running under ptrace:
==5820==LeakSanitizer has encountered a fatal error.
==5820==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==5820==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
Then the program crashes:
Thread 1 "spyglass" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
I'm not sure what is causing the new behavior. On 18.04 I'm building with the default gcc shipped (7.3.0), using -fsanitize=address -O0 -g and debugging with the default gdb (8.1.0). Can the old behavior be somehow re-enabled? Or do I need to change my workflow and detach from the program before killing it to get a leak report?

LeakSanitizer internally uses ptrace, probably to suspend all threads so that it can scan for leaks without false positives (see issue 9). Only one tracer can attach to a process at a time, so if you run your application under gdb or strace, LeakSanitizer won't be able to attach via ptrace.
If you are not interested in leak debugging, disable it:
export ASAN_OPTIONS=detect_leaks=0
If you do want to enable leak debugging, you must detach the debugger before LeakSanitizer starts scanning. To be able to attach a debugger shortly afterwards, sleep a bit (for example, 10 seconds):
export ASAN_OPTIONS=sleep_before_dying=10
./program
Then in another shell, attach to the application again:
gdb -q -p $(pidof program)
For a description of the above (and other) options, see https://github.com/google/sanitizers/wiki/AddressSanitizerFlags.
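As a concrete sketch of the first option (assuming your executable is named program), the variable can also be set for a single gdb session instead of being exported:
ASAN_OPTIONS=detect_leaks=0 gdb -q ./program
gdb passes its environment down to the program it runs, so the sanitizer sees the option at startup.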

Related

ctrl+c not killing a process

I have a process that responds perfectly well to CTRL+C on my local machine, and the process itself appears to run fine on EC2 as well.
But on an EC2 instance, CTRL+C freezes it and it becomes a defunct (zombie) process.
kill -9 <PID> doesn't remove it, and I have to reboot the EC2 instance to clean it up properly.
When it runs, it also loads an in-house shared library that I have no influence over, and I have no access to its source code to see what it's doing. This library also uses CUDA and appears to start multiple threads.
I tried installing a signal handler on the main thread and it does get installed, but calling _exit doesn't shut the whole process down; it seems to still be waiting.
What might be happening here that prevents CTRL+C from exiting the process cleanly? Can I override or examine what the other threads could be doing?
Ah, I found the problem. I'll leave the question as it stands in case it helps someone else.
It turns out that on my PC I have a GTX 680, and the drivers get installed when installing CUDA. On EC2 the card is a GRID K520, and the driver installed by CUDA doesn't work. I downloaded and installed the latest stable card-specific driver, and it then worked.
The discovery was made after running nvidia-smi: it wouldn't print any details about the card but would just show Killed. Running nvidia-smi again would lock up the console.
Unfortunately, I hadn't tested that CUDA apps were working; I had relied on the driver printing a message in the log saying it was loaded, and assumed it was working.
Updating the driver consisted of downloading the latest driver from nvidia (use the .run version). Then:
sudo modprobe -r nvidia_uvm
sudo modprobe -r nvidia
Finally install it with a command like:
sudo ./NVIDIA-Linux-x86_64-3xx.xx.xx.run
I then rebooted the instance and verified the driver with nvidia-smi.
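If it helps, the post-reboot check is simply (a healthy driver prints a table with the GPU model and driver version instead of Killed):
nvidia-smi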
This link was insightful - CUDA 7.5 unstable on EC2

How can I change how eclipse invokes gdb in linux?

In short, I need to understand how to configure eclipse to run "optirun gdb" instead of "gdb". An explanation of what exactly I'm trying to accomplish follows.
I need to run my debug app in eclipse such that it uses the NVIDIA Optimus card instead of the integrated card. My app requires OpenGL support that is only available this way.
I've got a laptop with an NVIDIA Optimus video card. I'm running Linux (Ubuntu). I've successfully set up Bumblebee so that I can take advantage of the Optimus technology. This requires that, to use the nvidia card, I run a given program "foo" with the program "optirun": optirun foo.
I need to configure eclipse to launch my program in debug mode under optirun. If I run optirun gdb app from the command line, everything works as expected.
Edit: Changing the "GDB Debugger" field inside the debug configuration to optirun gdb does not work. Launching eclipse with optirun eclipse does, however, but this is a detriment to battery life.
Go to "Debug Configurations", open "Debugger" tab. Change "GDB debugger" from gdb to optirun gdb.
Works in Eclipse Juno, Ubuntu 12.04.
Since I'm sure eclipse uses the shell to execute the program, a workaround is to alias gdb to optirun gdb in ~/.bashrc.
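If you try that workaround, the alias is a one-liner (a sketch; note that ~/.bashrc is only read by interactive shells, so this may not affect how Eclipse spawns gdb):
alias gdb='optirun gdb'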
I looked into this issue today and found another solution. As long as you have Bumblebee installed (http://www.bumblebee-project.org/) and you know you can attach optirun to an executable (try with glxgears, for example), you can attach it to cuda-gdb.
What I did is create a script:
#!/bin/bash
# Forward whatever arguments were given on the command line to cuda-gdb, run under optirun.
optirun /usr/local/cuda/bin/cuda-gdb "$@"
Save it to /usr/local/cuda/bin, or somewhere else (it doesn't matter where), with the appropriate permissions for execution (755).
What it does is very simple: it runs optirun cuda-gdb args, where args is whatever the command line passes along.
For example, I named it opti_cuda-gdb and placed it in that directory (which is conveniently added to the path if CUDA is properly configured).
In a terminal, just run opti_cuda-gdb with or without arguments.
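A minimal sketch of installing and using the wrapper, assuming the name and location above (my_cuda_app is a placeholder for your own binary):
sudo mv opti_cuda-gdb /usr/local/cuda/bin/
sudo chmod 755 /usr/local/cuda/bin/opti_cuda-gdb
opti_cuda-gdb ./my_cuda_app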
If you use an IDE to develop, like say NetBeans, point the debugger executable to that script.
I've successfully compiled and debugged code using cuSPARSE and cuBLAS with NetBeans, running on a Samsung SF410 with NVIDIA Optimus and Ubuntu 11.04 and 11.10.
I'm open to provide further details if you think I omitted something.

MacVim caught deadly signal

When I start MacVim from the terminal I get a nasty error message saying it has caught a deadly signal SEGV. I really don't know what's going on. Likewise, when I start the application by double-clicking it in my Dock, the app opens but I can't do anything.
Is there any way to fix this?
I have had the same problem, and traced it to the Command-T plugin containing native extensions that were compiled against a different version of Ruby (1.8) than the one in my environment (1.9).
I recommend disabling all of your plugins/addons, and re-enabling them one by one.
You might get more of a hint about what's going wrong by running MacVim's vim process inside gdb (Xcode required):
paul@paulbookpro ~ ⸩ gdb /Applications/MacVim.app/Contents/MacOS/Vim [11:20:55]
GNU gdb 6.3.50-20050815 (Apple version gdb-1705) (Fri Jul 1 10:50:06 UTC 2011)
...
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ................ done
(gdb) run
Starting program: /usr/local/Cellar/macvim/7.3-61/MacVim.app/Contents/MacOS/Vim
Hopefully gdb will report some useful information about the segfault, and you can use commands like backtrace to get more data.
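A typical session might look like this (hypothetical; the actual frames depend on your plugins):
(gdb) run
...wait for the SEGV...
(gdb) backtrace
(gdb) info threads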
Good luck.
Signal SEGV means "segmentation violation" and generally indicates a bug in the application. You can try reinstalling it, or contact the software vendor.

Is CUDA installed correctly on my Ubuntu 10.04? Some samples don't run.

I am trying to install CUDA on a server running Ubuntu 10.04.
I followed the NVIDIA instructions and installed the "CUDA toolkit for Ubuntu Linux 10.04", "GPU Computing SDK code samples", and "Developer Drivers for Linux (260.19.26) (64 bit)"; my system is 64-bit. The installation seems successful. Everything was downloaded from http://developer.nvidia.com/object/cuda_3_2_downloads.html#Linux
According to the messages from the installation packages, I added /usr/local/cuda/bin to PATH and /usr/local/cuda/lib64:/usr/local/cuda/lib to LD_LIBRARY_PATH.
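For reference, those additions in a shell profile look like this (a sketch, assuming the default install prefix):
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH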
Then I tried to run the sample programs. The strange thing is, some of them run and some of them don't, even though they can all be made with no problem.
For example,
convolutionSeparable will just stop there without any message; I can kill it with Ctrl+C.
matrixMul outputs a line,
Device 0: "Quadro 5000" with Compute 2.0 capability
and stops there; again it can be killed with Ctrl+C.
clock works, outputs
PASSED
time = 12574
Press ENTER to exit...
simpleMultiCopy outputs PASSED
MonteCarlo outputs PASSED
simpleZeroCopy outputs PASSED
bandwidthTest stops there with a blinking cursor forever.
What is wrong with this?! How can I check whether my CUDA installation is successful? What is wrong with the programs that don't run? They don't even print an error message.
I would start by upgrading the driver to 260.19.36, which can be found here. Then I would suggest running nvidia-smi -a to see if the driver is happy. Then I second the suggestion to run deviceQuery to see if the CUDA Toolkit 3.2 is working.
If deviceQuery output appears nominal, then I would start adding printf's to see where things go awry in matrixMul.
What does deviceQuery say? Also check the output of dmesg right after you run that program to see if you can figure out what's up.
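For example (a sketch; deviceQuery is one of the SDK samples, and NVRM-prefixed lines in dmesg come from the NVIDIA driver):
./deviceQuery
dmesg | tail -n 20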
Another tip, if you still are having issues, is try running:
strace ./deviceQuery 2> out.txt
Then check out.txt to see if you can find any clues about why this error is occurring.
I had a similar problem but solved it by updating the kernel and drivers.
Install a newer kernel on 10.04 (via apt-get, as sketched below):
linux-image-generic-pae-lts-backport-natty
linux-headers-generic-pae-lts-backport-natty
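A sketch of that install step:
sudo apt-get install linux-image-generic-pae-lts-backport-natty linux-headers-generic-pae-lts-backport-natty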
Download the latest NVIDIA driver from http://www.nvidia.com/Download/index.aspx?lang=en-us
Then install the latest CUDA (at the moment, 4.0) from http://developer.nvidia.com/cuda-toolkit-40:
CUDA Toolkit for Ubuntu Linux 10.10 32-bit
CUDA Tools SDK 32-bit
GPU Computing SDK code samples
Then I passed all SDK example tests.
ThinkPad W520 with a Quadro 1000, on Ubuntu 10.04.

running binary on different machine causes segfault

I'm not well versed in how linking happens in C++.
I have a binary that I compiled on one machine, and I'd like to copy it and run it on another machine.
I would expect this to work, because the libraries are the same on both machines (I think!) and the version of Linux is the same (same kernel, etc.). However, when I copy it over, it appears to segfault in one of the libraries I am dynamically linking against when I run it.
It runs like butter on the machine I compiled it on. But on the machine I scp'd it over to, the binary instantly segfaults on a std::string::compare, in a call stack containing functions from one of the libraries I am dynamically linking.
I tried reinstalling the libraries on both machines and running ldconfig, but got the same results.
Any ideas on how to debug these kinds of weird segfaults caused by dynamic linking issues?
Well, it might help narrow down the problem if you run the program in a debugger. When compiling, add the -g -ggdb arguments to the g++ command; then run the program with gdb ./executable (you may need to install gdb first). At the gdb prompt, type run and your program will run until it segfaults. Then you can try to figure out what went wrong.
There are plenty of tutorials for using gdb (the GNU debugger) online.
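A minimal sketch of that workflow, assuming a single source file main.cpp (names are placeholders):
g++ -g -ggdb -o executable main.cpp
gdb ./executable
# at the (gdb) prompt, type: run
# after the segfault, type: backtrace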
Sounds like a binary compatibility issue. This SO link might shed some light:
Creating a generic binary in linux for all x86 machines
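Before rebuilding, it may also be worth comparing which shared libraries the binary actually resolves on each machine, for example with ldd (a sketch; the file names are placeholders):
ldd ./executable > libs_local.txt
# run the same on the other machine, copy the result over, then:
diff libs_local.txt libs_remote.txt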