How do I get gdb working with D programs under Linux?

I have a patched gdb 6.8, but I can't get any debugging to work. Given this test file:
import std.stdio;

void main()
{
    float f = 3.0;
    int i = 1;
    writeln(f, " ", i);
    f += cast(float)(i / 10.0);
    writeln(f, " ", i);
    i++;
    f += cast(float)(i / 10.0);
    writeln(f, " ", i);
    i += 2;
    f += cast(float)(i / 5.0);
    writeln(f, " ", i);
}
And attempting to debug on the command line:
bash-4.0 [d]$ dmd -g test.d # '-gc' shows the same behaviour.
bash-4.0 [d]$ ~/src/gdb-6.8/gdb/gdb test
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb) list
1 ../sysdeps/i386/elf/start.S: No such file or directory.
in ../sysdeps/i386/elf/start.S
And debugging a project with Eclipse
Using -gc:
Dwarf Error: Cannot find DIE at 0x134e4 referenced from DIE at 0x12bd4 [in module /home/bernard/projects/drl/drl.i386]
(gdb) Dwarf Error: Cannot find DIE at 0x1810 referenced from DIE at 0x1b8 [in module /home/bernard/projects/drl/drl.i386]
Using -g:
(gdb) Die: DW_TAG_<unknown> (abbrev = 7, offset = 567)
    has children: FALSE
    attributes:
        DW_AT_byte_size (DW_FORM_data1) constant: 4
        DW_AT_type (DW_FORM_ref4) constant ref: 561 (adjusted)
        DW_AT_containing_type (DW_FORM_ref4) constant ref: 539 (adjusted)
Dwarf Error: Cannot find type of die [in module /home/bernard/projects/drl/drl.i386]
I've seen quite a few posts like this on the Digital Mars newsgroup, but all are seemingly ignored. Can anyone shed some light on the situation?
I know of ZeroBUGS, but I really want to get gdb working.
Update:
Thanks to luca_, on IRC (freenode, #D), I got the simple case (one file) working:
(gdb) list Dmain
1 void main()
2 {
3 float f = 3.0;
4 int i = 1;
5 f += cast(float)(i / 10.0);
6 i++;
7 f += cast(float)(i / 10.0);
8 i += 2;
9 f += cast(float)(i / 5.0);
10 }
(gdb) break 3
Unfortunately, my project made up of multiple files dies with a DWARF error.
EDIT:
As of 2.036 (I think), the GDB debugging information produced by DMD is correct, and should work as expected.

You may have stumbled on a GDB bug, which was recently fixed here.
To get the fix, you would have to build GDB from CVS Head. Instructions on how to get it are here.
If that doesn't fix the problem, it may be due to another bug in GDB, or it may be that dmd emits incorrect DWARF debug info. I suggest opening a bug in GDB's Bugzilla and attaching your small executable (and all the runtime libraries it requires).

The answer, it seems, is to use GDC, if you can stand going back to D 2.015 (this is for D2; I have no idea how old the D1 stuff is). GDB works great with it.
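For reference, a minimal GDC session might look like the following (a sketch; the exact gdc invocation and the line number in the break command are illustrative, not taken from the original posts):

gdc -g test.d -o test
gdb ./test
(gdb) break test.d:7
(gdb) run
(gdb) print f
(gdb) next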

Related

Invoke kernel failure through cuda-gdb?

Is there a way to invoke kernel failure using cuda-gdb? I've tried stepping through the kernel code, setting invalid index positions and odd values for variables, but I'm unable to trigger a "kernel Execution Failed" after continuing from an erroneous setting.
Does anyone know of a proper way to do this through cuda-gdb? I've read through the cuda-gdb documentation twice but might have missed some clues on how to achieve this if it is at all possible. If anyone knows of any tools/techniques that would be most appreciated, thanks.
I'm on CentOS 7 and my device's compute capability is 2.1. See below for the output of the uname -a command.
Linux john 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Is there a way to invoke kernel failure using cuda-gdb?
Yes, it's possible. Here is a fully worked example:
$ cat t678.cu
#include <stdio.h>

__global__ void kernel(int *data){
    int idx = 0; // line 4
    idx += data[0];
    int tval = data[idx];
    data[1] = tval;
}

int main(){
    int *d_data;
    cudaMalloc(&d_data, 32*sizeof(int));
    cudaMemset(d_data, 0, 32*sizeof(int));
    kernel<<<1,1>>>(d_data);
    cudaDeviceSynchronize();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) printf("kernel fail %s\n", cudaGetErrorString(err));
}
$ nvcc -g -G -o t678 t678.cu
$ cuda-gdb ./t678
NVIDIA (R) CUDA Debugger
7.5 release
Portions Copyright (C) 2007-2015 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/user2/misc/t678...done.
(cuda-gdb) break t678.cu:4
Breakpoint 1 at 0x4026d5: file t678.cu, line 4.
(cuda-gdb) run
Starting program: /home/user2/misc/./t678
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff700a700 (LWP 8693)]
[Switching focus to CUDA kernel 0, grid 2, block (0,0,0), thread (0,0,0), device 0, sm 14, warp 2, lane 0]
Breakpoint 1, kernel<<<(1,1,1),(1,1,1)>>> (data=0x13047a0000) at t678.cu:4
4 int idx = 0; // line 4
(cuda-gdb) step
5 idx += data[0];
(cuda-gdb) print idx
$1 = 0
(cuda-gdb) set idx=1000000
(cuda-gdb) step
6 int tval = data[idx];
(cuda-gdb) print idx
$2 = 1000000
(cuda-gdb) step
CUDA Exception: Device Illegal Address
The exception was triggered in device 0.
Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
kernel<<<(1,1,1),(1,1,1)>>> (data=0x13047a0000) at t678.cu:7
7 data[1] = tval;
(cuda-gdb)
In the cuda-gdb output above, you can see that setting the idx variable to a large value results in an index-out-of-bounds (illegal address) error when the following line executes in the debugger:
int tval = data[idx];
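If you would rather provoke the failure from code instead of poking variables in the debugger, a minimal sketch is below (my own illustration, not part of the original answer; whether an out-of-bounds access actually faults can depend on the address, so a deliberately huge index is used):

#include <stdio.h>

// Reads far past the end of a 32-element allocation; compiled with -g -G,
// this should surface as an illegal-address error after the kernel launch.
__global__ void bad_kernel(int *data){
    int tval = data[1000000];   // deliberately out of bounds
    data[1] = tval;
}

int main(){
    int *d_data;
    cudaMalloc(&d_data, 32*sizeof(int));
    bad_kernel<<<1,1>>>(d_data);
    cudaDeviceSynchronize();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) printf("kernel fail: %s\n", cudaGetErrorString(err));
    return 0;
}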

Fixing 'no symbolic type information' from dtrace in Linux?

Just documenting this: (self-answer to follow)
I'm aware that Sun's dtrace is not packaged for Ubuntu due to licensing issues, so I downloaded it and built it from source on Ubuntu - but I'm having an issue pretty much like the one in Simple dtraces not working · Issue #17 · dtrace4linux/linux · GitHub; namely, loading the driver seems fine:
dtrace-20130712$ sudo make load
tools/load.pl
23:20:31 Syncing...
23:20:31 Loading: build-2.6.38-16-generic/driver/dtracedrv.ko
23:20:34 Preparing symbols...
23:20:34 Probes available: 364377
23:20:44 Time: 13s
... however, if I try to run a simple script, it fails:
$ sudo ./build/dtrace -n 'BEGIN { printf("Hello, world"); exit(0); }'
dtrace: invalid probe specifier BEGIN { printf("Hello, world"); exit(0); }: "/path/to/src/dtrace-20130712/etc/sched.d", line 60: no symbolic type information is available for kernel`dtrace_cpu_id: Invalid argument
As per the issue link above:
(ctf requires a private and working libdwarf lib - most older releases have broken versions).
... I then built libdwarf from source, and then dtrace based on it (not trivial, requires manually finding the right placement of symlinks); and I still get the same failure.
Is it possible to fix this?
Well, after a trip to gdb, I figured out that the problem occurs in dtrace's function dt_module_getctf (called via dtrace_symbol_type and, I think, dt_module_lookup_by_name). In it, I noticed that most calls propagate the attribute/variable dm_name = "linux"; but when the failure occurs, I'd get dm_name = "kernel"!
Note that original line 60 from sched.d is:
cpu_id = `dtrace_cpu_id; /* C->cpu_id; */
Then I found thr3ads.net - dtrace discuss - accessing symbols without type info [Nov 2006], where this error message is mentioned:
dtrace: invalid probe specifier fbt::calcloadavg:entry {
printf("CMS_USER: %d, CMS_SYSTEM: %d, cpu_waitrq: %d\n",
`cpu0.cpu_acct[0], `cpu0.cpu_acct[1], `cpu0.cpu_waitrq);}: in action
list: no symbolic type information is available for unix`cpu0: No type
information available for symbol
So:
on that system, the request `cpu0.cpu_acct[0] got resolved to unix`cpu0;
and on my system, the request `dtrace_cpu_id got resolved to kernel`dtrace_cpu_id.
And since "The backtick operator is used to read the
value of kernel variables, which will be specific to the running kernel." (howto measure CPU load - DTrace General Discussion - ArchiveOrange), I thought maybe explicitly "casting" this "backtick variable" to linux would help.
And indeed it does - only a small section of sched.d needs to be changed to this:
translator cpuinfo_t < dtrace_cpu_t *C > {
    cpu_id = linux`dtrace_cpu_id; /* C->cpu_id; */
    cpu_pset = -1;
    cpu_chip = linux`dtrace_cpu_id; /* C->cpu_id; */
    cpu_lgrp = 0; /* XXX */
    /* cpu_info = *((_processor_info_t *)`dtrace_zero); /* ` */ /* XXX */
};
inline cpuinfo_t *curcpu = xlate <cpuinfo_t *> (&linux`dtrace_curcpu);
...and suddenly, it starts working:
dtrace-20130712$ sudo ./build/dtrace -n 'BEGIN { printf("Hello, world"); exit(0); }'
dtrace: description 'BEGIN ' matched 1 probe
CPU ID FUNCTION:NAME
1 1 :BEGIN Hello, world
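As a quick sanity check of the module scoping, something like this one-liner should now resolve too (untested sketch; it assumes the dtrace_cpu_id kernel variable is readable from a BEGIN probe under the linux module, exactly as in the patched sched.d above):

dtrace-20130712$ sudo ./build/dtrace -n 'BEGIN { printf("cpu id: %d", linux`dtrace_cpu_id); exit(0); }'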
PS:
Protip 1: NEVER do dtrace -n '::: { printf("Hello"); }' - this means "do a printf on each and every kernel event", and it will completely freeze the kernel; not even Ctrl-Alt-Del will work!
Protip 2: If you want to use DTRACE_DEBUG as in Debugging DTrace, use sudo -E:
dtrace-20130712$ DTRACE_DEBUG=1 sudo -E ./build/dtrace -n 'BEGIN { printf("Hello, world"); exit(0); }'
libdtrace DEBUG: reading kernel .ctf: /path/to/src/dtrace-20130712/build-2.6.38-16-generic/linux-2.6.38-16-generic.ctf
libdtrace DEBUG: opened 32-bit /proc/kallsyms (syms=75761)
...

SIGSEGV when using pthreads in Stop-and-Wait Protocol implementation

I'm a college student, and as part of a Networks assignment I need to implement the Stop-and-Wait protocol. The problem statement requires using 2 threads. I am a novice to threading, but after going through the man pages for the pthreads API, I wrote the basic code. However, I get a segmentation fault after the thread is created successfully (on execution of the first line of the function passed to pthread_create() as an argument).
typedef struct packet_generator_args
{
    int max_pkts;
    int pkt_len;
    int pkt_gen_rate;
} pktgen_args;

/* generates and buffers packets at a mean rate given by the
   pkt_gen_rate field of its argument; runs in a separate thread */
void *generate_packets(void *arg)
{
    pktgen_args *opts = (pktgen_args *)arg; // error occurs here
    buffer = (char **)calloc((size_t)opts->max_pkts, sizeof(char *));
    if (buffer == NULL)
        handle_error("Calloc Error");
    //front = back = buffer;
    ........
    return 0;
}
The main thread reads packets from this buffer and runs the stop-and-wait algorithm.
pktgen_args thread_args;
thread_args.pkt_len = DEF_PKT_LEN;
thread_args.pkt_gen_rate = DEF_PKT_GEN_RATE;
thread_args.max_pkts = DEF_MAX_PKTS;

/* initialize sockets and other data structures */
.....

pthread_t packet_generator;
pktgen_args *thread_args1 = (pktgen_args *)malloc(sizeof(pktgen_args));
memcpy((void *)thread_args1, (void *)&thread_args, sizeof(pktgen_args));
retval = pthread_create(&packet_generator, NULL, &generate_packets, (void *)thread_args1);
if (retval != 0)
    handle_error_th(retval, "Thread Creation Error");
.....

/* send a fixed number of packets to the receiver, waiting for an ack for each.
   If the ack is not received before the timeout occurs, resend the pkt */
.....
I have tried debugging with gdb, but I am unable to understand why a segmentation fault occurs at the first line of my generate_packets() function. Hopefully one of you can help. If anyone needs additional context, the entire code can be obtained at http://pastebin.com/Z3QtEJpQ. I am in a real jam here, having spent hours on this. Any help will be appreciated.
You initialize your buffer as NULL:
char **buffer = NULL;
and then in main(), without further ado, you try to index into it:
while (!buffer[pkts_ackd]); /* wait as long as the next pkt has not
Basically, my semi-educated guess is that your thread hasn't generated any packets yet, and you crash trying to access an element through a NULL pointer.
[162][04:34:17] vlazarenko@alluminium (~/tests) > cc -ggdb -o pthr pthr.c 2> /dev/null
[163][04:34:29] vlazarenko@alluminium (~/tests) > gdb pthr
GNU gdb 6.3.50-20050815 (Apple version gdb-1824) (Thu Nov 15 10:42:43 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done
(gdb) run
Starting program: /Users/vlazarenko/tests/pthr
Reading symbols for shared libraries +............................. done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x000000010000150d in main (argc=1, argv=0x7fff5fbffb10) at pthr.c:205
205 while (!buffer[pkts_ackd]); /* wait as long as the next pkt has not
(gdb)
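To make the design safe, a minimal sketch follows (my own illustration, not the poster's code; MAX_PKTS stands in for the poster's DEF_MAX_PKTS): allocate the buffer in main() before starting the generator thread, and replace the busy-wait on a NULL pointer with a mutex and condition variable:

#include <pthread.h>
#include <stdlib.h>

#define MAX_PKTS 64   /* stand-in for the poster's DEF_MAX_PKTS */

static char **buffer;                 /* allocated in main() before the thread starts */
static int pkts_generated = 0;        /* shared counter, protected by lock */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pkt_ready = PTHREAD_COND_INITIALIZER;

static void *generate_packets(void *arg)
{
    (void)arg;
    for (int i = 0; i < MAX_PKTS; i++) {
        char *pkt = malloc(16);        /* build the real packet here */
        pthread_mutex_lock(&lock);
        buffer[pkts_generated++] = pkt;
        pthread_cond_signal(&pkt_ready);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    buffer = calloc(MAX_PKTS, sizeof(char *));   /* before pthread_create, so it is never NULL */
    pthread_t gen;
    pthread_create(&gen, NULL, generate_packets, NULL);

    for (int next = 0; next < MAX_PKTS; next++) {
        pthread_mutex_lock(&lock);
        while (pkts_generated <= next)           /* block instead of spinning on NULL */
            pthread_cond_wait(&pkt_ready, &lock);
        pthread_mutex_unlock(&lock);
        /* send buffer[next] and run the stop-and-wait exchange here */
    }
    pthread_join(gen, NULL);
    return 0;
}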

Why is this wrong!? Generating strings

I've been trying to generate strings in this way:
a
b
.
.
z
aa
ab
.
.
zz
.
.
.
.
zzzz
And I want to know why a Segmentation fault (core dumped) error appears when it reaches 'yz'. I know my code doesn't cover all the possible strings, like 'zb' or 'zc', but that's not the point; I want to know why this error happens. I am not a master of coding, as you can see, so please try to explain it clearly. Thanks :)
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

void move_positions (char s[]);

int main (int argc, char *argv[])
{
    char s[28];
    s[0] = ' ';
    s[1] = '\0';
    int a = 0;
    for (int r = 'a'; r <= 'z'; r++)
    {
        for (int t = 'a'; t <= 'z'; t++)
        {
            for (int u = 'a'; u <= 'z'; u++)
            {
                for (int y = 'a'; y <= 'z'; y++)
                {
                    s[a] = (char)y;
                    printf ("%s\n", s);
                    if (s[0] == 'z')
                    {
                        move_positions(s);
                        a++;
                    }
                }
                s[a-1] = (char)u;
            }
            s[a-2] = (char)t;
        }
        s[a-3] = (char)r;
    }
    return 0;
}

void move_positions (char s[])
{
    char z[28];
    z[0] = ' ';
    z[1] = '\0';
    strcpy(s, strcat(z, s));
}
First, let's compile with debugging turned on:
gcc -g prog.c -o prog
Now let's run it under a debugger:
> gdb prog
GNU gdb 6.3.50-20050815 (Apple version gdb-1822) (Sun Aug 5 03:00:42 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done
(gdb) run
Starting program: /Users/andrew/Documents/programming/sx/13422880/prog
Reading symbols for shared libraries +............................. done
a
b
c
d
e
...
yu
yv
yw
yx
yy
yz
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00007fffc0bff6c5
0x0000000100000c83 in main (argc=1, argv=0x7fff5fbff728) at prog.c:22
22 s[a] = (char)y;
Ok, it crashed on line 22, trying to do s[a] = (char)y. What's a?
(gdb) p a
$1 = 1627389953
So you're setting roughly the 1.6 billionth entry of the array s. What is s?
(gdb) ptype s
type = char [28]
Writing an entry 1.6 billion slots into a 28-element array? That's not going to work. Looks like you need to reset a to zero at the start of some of your loops.
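For comparison, here is a sketch of a generator that does cover every string from "a" through "zzzz" (my own version, carrying odometer-style instead of juggling four index offsets):

#include <stdio.h>

int main(void)
{
    char s[5];                           /* up to 4 letters plus '\0' */
    for (int len = 1; len <= 4; len++) {
        for (int i = 0; i < len; i++)    /* start at "aa...a" for this length */
            s[i] = 'a';
        s[len] = '\0';
        for (;;) {
            printf("%s\n", s);
            int pos = len - 1;           /* carry from the last position */
            while (pos >= 0 && s[pos] == 'z')
                s[pos--] = 'a';
            if (pos < 0)                 /* wrapped past "zz...z": next length */
                break;
            s[pos]++;
        }
    }
    return 0;
}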

What is the difference between VC++ 2010 Express and Borland C++ 3.1 when compiling a simple C++ code file?

I no longer know what to think or do. The following code compiles fine in both IDEs, but in VC++'s case it causes weird heap corruption messages like:
"Windows has triggered a breakpoint in Lab4.exe.
This may be due to a corruption of the heap, which indicates a bug in Lab4.exe or any of the DLLs it has loaded.
This may also be due to the user pressing F12 while Lab4.exe has focus.
The output window may have more diagnostic information."
It happens when executing the Task1_DeleteMaxElement function; I've left comments there.
Nothing like that happens when compiled in Borland C++ 3.1, and everything works as expected.
So... what's wrong with my code or VC++?
#include <conio.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <memory.h>

void PrintArray(int *arr, int arr_length);
int Task1_DeleteMaxElement(int *arr, int arr_length);

int main()
{
    int *arr = NULL;
    int arr_length = 0;

    printf("Input the array size: ");
    scanf("%i", &arr_length);

    arr = (int*)realloc(NULL, arr_length * sizeof(int));

    srand(time(NULL));
    for (int i = 0; i < arr_length; i++)
        arr[i] = rand() % 100 - 50;

    PrintArray(arr, arr_length);
    arr_length = Task1_DeleteMaxElement(arr, arr_length);
    PrintArray(arr, arr_length);

    getch();
    return 0;
}

void PrintArray(int *arr, int arr_length)
{
    printf("Printing array elements\n");
    for (int i = 0; i < arr_length; i++)
        printf("%i\t", arr[i]);
    printf("\n");
}

int Task1_DeleteMaxElement(int *arr, int arr_length)
{
    printf("Looking for max element for deletion...");
    int current_max = arr[0];
    for (int i = 0; i < arr_length; i++)
        if (arr[i] > current_max)
            current_max = arr[i];

    int *temp_arr = NULL;
    int temp_arr_length = 0;
    for (int j = 0; j < arr_length; j++)
        if (arr[j] < current_max)
        {
            temp_arr = (int*)realloc(temp_arr, temp_arr_length + 1 * sizeof(int)); //if initial array size more then 4, breakpoint activates here
            temp_arr[temp_arr_length] = arr[j];
            temp_arr_length++;
        }

    arr = (int*)realloc(arr, temp_arr_length * sizeof(int));
    memcpy(arr, temp_arr, temp_arr_length);
    realloc(temp_arr, 0); //if initial array size is less or 4, breakpoint activates at this line execution

    return temp_arr_length;
}
My guess is VC++2010 is rightly detecting memory corruption, which is ignored by Borland C++ 3.1.
How does it work?
For example, when allocating memory for you, VC++2010's realloc could well "mark" the memory around it with some special value. If you write over those values, realloc detects the corruption, and then crashes.
The fact that it works with Borland C++ 3.1 is pure luck. It is a very, very old compiler (20 years!) and thus more tolerant/ignorant of this kind of memory corruption (until some random, apparently unrelated crash occurs in your app).
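As an aside, you can ask the VC++ debug CRT to check those guard values on demand. A minimal sketch (assumes a VC++ debug build; _CrtCheckMemory is a no-op in release builds):

#include <stdlib.h>
#include <crtdbg.h>   /* MSVC debug CRT */

int main(void)
{
    int *p = (int*)malloc(2 * sizeof(int));
    p[2] = 42;             /* write one int past the end of the block */
    _CrtCheckMemory();     /* debug CRT reports the damaged guard bytes */
    free(p);               /* the corruption is typically flagged here too */
    return 0;
}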
What's the problem with your code?
The source of your error:
temp_arr = (int*)realloc(temp_arr, temp_arr_length + 1 * sizeof(int))
For the following temp_arr_length values, in 32-bit, the allocation will be:
0: 4 bytes = 1 int when you expect 1 (OK)
1: 5 bytes = 1.25 ints when you expect 2 (Error!)
2: 6 bytes = 1.5 ints when you expect 3 (Error!)
You got your priorities wrong. As you can see:
temp_arr_length + 1 * sizeof(int)
should be instead
(temp_arr_length + 1) * sizeof(int)
You allocated too little memory, and thus wrote well beyond what was allocated for you.
Edit (2012-05-18)
Hans Passant commented on allocator diagnostics. I took the liberty of copying his comments here until he writes his own answer (I've already seen comments disappear on SO):
It is Windows that reminds you that you have heap corruption bugs, not VS. BC3 uses its own heap allocator so Windows can't see your code mis-behaving. Not noticing these bugs before is pretty remarkable but not entirely impossible.
[...] The feature is not available on XP and earlier. And sure, one of the reasons everybody bitched about Vista. Blaming the OS for what actually were bugs in the program. Win7 is perceived as a 'better' OS in no small part because Vista forced programmers to fix their bugs. And no, the Microsoft CRT has implemented malloc/new with HeapAlloc for a long time. Borland had a history of writing their own, beating Microsoft for a while until Windows caught up.
[...] the CRT uses a debug allocator like you describe, but it generates different diagnostics. Roughly, the debug allocator catches small mistakes, Windows catches gross ones.
I found the following links explaining what is done to memory by Windows/CRT allocators before and after allocation/deallocation:
http://www.codeguru.com/cpp/w-p/win32/tutorials/article.php/c9535/Inside-CRT-Debug-Heap-Management.htm
https://stackoverflow.com/a/127404/14089
http://www.nobugs.org/developer/win32/debug_crt_heap.html#table
The last link contains a table I printed and always have near me at work (it was this table I was searching for when I found the first two links).
If it is crashing in realloc, then you are overstepping the bookkeeping memory of malloc and free.
The incorrect code is as below:
temp_arr = (int*)realloc(temp_arr, temp_arr_length + 1 * sizeof(int));
should be
temp_arr = (int*)realloc(temp_arr, (temp_arr_length + 1) * sizeof(int));
Due to the precedence of * over +, in the next run of the loop, when you are expecting realloc to be passed 8 bytes, it is passed only 5. So in your second iteration you will be writing 3 bytes into someone else's memory, which leads to memory corruption and an eventual crash.
Also
memcpy(arr, temp_arr, temp_arr_length);
should be
memcpy(arr, temp_arr, temp_arr_length * sizeof(int) );
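Putting both corrections together, the function might look like this (a sketch, not the original author's code; NULL checks on realloc are added, and the in-place shrink of arr is dropped since the caller's copy of the pointer would never see it anyway):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int Task1_DeleteMaxElement(int *arr, int arr_length)
{
    printf("Looking for max element for deletion...");
    int current_max = arr[0];
    for (int i = 0; i < arr_length; i++)
        if (arr[i] > current_max)
            current_max = arr[i];

    int *temp_arr = NULL;
    int temp_arr_length = 0;
    for (int j = 0; j < arr_length; j++)
        if (arr[j] < current_max)
        {
            /* (n + 1) * sizeof(int), parenthesized, not n + 4 bytes */
            int *p = (int*)realloc(temp_arr, (temp_arr_length + 1) * sizeof(int));
            if (p == NULL) { free(temp_arr); return arr_length; } /* keep arr intact on failure */
            temp_arr = p;
            temp_arr[temp_arr_length++] = arr[j];
        }

    /* copy whole ints back; temp_arr_length <= arr_length, so arr is big enough */
    if (temp_arr_length > 0)
        memcpy(arr, temp_arr, temp_arr_length * sizeof(int));
    free(temp_arr);   /* clearer than realloc(temp_arr, 0) */
    return temp_arr_length;
}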
