I recently read that software breakpoints for Linux on ARM are implemented using the UND instruction in ARM mode and the BKPT instruction in Thumb mode. Why are two separate instructions used to raise software interrupts?
Thumb compatible code:
0000e150 <pthread_mutexattr_setpshared>:
e150: b573 push {r0, r1, r4, r5, r6, lr}
e152: 4605 mov r5, r0
e154: 460c mov r4, r1
e156: 4616 mov r6, r2
e158: f7fd fa70 bl b63c <pthread_mutexattr-0xba>
e15c: 4629 mov r1, r5
Pure ARM:
0000d564 <pthread_mutex_init>:
d564: e2503000 subs r3, r0, #0
d568: 03a00016 moveq r0, #22
d56c: 012fff1e bxeq lr
ARM breakpoint: 0xe7f001f0
Thumb breakpoint: 0xde01
If I always use the ARM breakpoint and overwrite the first instruction of a function, pthread_mutex_init works fine, but in pthread_mutexattr_setpshared the second instruction is overwritten too, because the ARM breakpoint is 4 bytes long while the first Thumb instruction is only 2 bytes.
If I always use the Thumb breakpoint and overwrite the first instruction of pthread_mutex_init, the resulting instruction is invalid.
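The size mismatch described above can be modelled in a few lines. This is only a sketch: the two encodings are the ones quoted in the post, the instruction halfwords come from the Thumb listing, and a little-endian target is an assumption. The helper names breakpoint_bytes and patch are made up for illustration and are not part of any debugger API.

```python
import struct

# Encodings quoted in the post above.
ARM_BREAKPOINT = 0xE7F001F0   # 4 bytes, used when the target code is ARM
THUMB_BREAKPOINT = 0xDE01     # 2 bytes, used when the target code is Thumb


def breakpoint_bytes(thumb):
    """Return the breakpoint encoding for the given mode (little-endian assumed)."""
    if thumb:
        return struct.pack("<H", THUMB_BREAKPOINT)
    return struct.pack("<I", ARM_BREAKPOINT)


def patch(code, offset, thumb):
    """Overwrite code at offset with a breakpoint and return the saved bytes."""
    bp = breakpoint_bytes(thumb)
    saved = bytes(code[offset:offset + len(bp)])
    code[offset:offset + len(bp)] = bp
    return saved


# First two Thumb instructions of pthread_mutexattr_setpshared from the listing:
# e150: b573 (push ...), e152: 4605 (mov r5, r0).
thumb_code = bytearray(struct.pack("<HH", 0xB573, 0x4605))

good = bytearray(thumb_code)
patch(good, 0, thumb=True)    # replaces only the 2-byte push

bad = bytearray(thumb_code)
patch(bad, 0, thumb=False)    # the 4-byte ARM breakpoint also clobbers the mov
```

A debugger therefore has to know which instruction set the target address uses before it picks the breakpoint encoding.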
Related
In debugging a core file, I can find the pthread_specific data but I have not found a way to access __thread data such as errno.
The access is via %fs:0x0 register and I can disassemble __errno_location to find its relative address from %fs:0x0 but I don't see a way to resolve it on a per thread basis as it seems like the base %fs value is not available to GDB.
Example:
Dump of assembler code for function __errno_location:
0x00007fefa83911f0 <+0>: push %rbp
0x00007fefa83911f1 <+1>: mov 0x206d90(%rip),%rax # 0x7fefa8597f88
0x00007fefa83911f8 <+8>: add %fs:0x0,%rax
0x00007fefa8391201 <+17>: mov %rsp,%rbp
0x00007fefa8391204 <+20>: pop %rbp
0x00007fefa8391205 <+21>: retq
End of assembler dump.
Is there a way to find out what the value of %fs:0x0 would be for the current thread?
The info registers command shows only this for the segment registers:
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
libc version: 2.17
libc.so.6 debug not currently installed
GDB version: GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
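To make the address arithmetic in the __errno_location listing concrete, here is a small sketch. The first computation uses only numbers from the disassembly: the RIP-relative mov reads the TLS offset from a slot whose address is the next instruction's address plus the displacement. The second part shows how the final address would be formed once the per-thread base is known. The values example_fs_base and example_offset are hypothetical and do not come from the core file.

```python
MASK64 = (1 << 64) - 1


def rip_relative_target(next_insn_addr, displacement):
    """Address referenced by a RIP-relative operand (RIP points at the next instruction)."""
    return (next_insn_addr + displacement) & MASK64


def tls_variable_address(fs_base, tls_offset):
    """Thread-local address = per-thread base + offset, with 64-bit wraparound."""
    return (fs_base + tls_offset) & MASK64


# From the listing: mov 0x206d90(%rip),%rax at <+1>, next instruction at <+8>.
got_slot = rip_relative_target(0x00007FEFA83911F8, 0x206D90)

# Hypothetical values for illustration only; neither appears in the post.
example_fs_base = 0x00007FEFA8DA5700
example_offset = -0x80 & MASK64   # negative offset stored as an unsigned 64-bit value
errno_addr = tls_variable_address(example_fs_base, example_offset)
```

The computed got_slot matches the 0x7fefa8597f88 annotation in the disassembly, so the only missing input is the thread's base value, which is exactly what the zeroed fs entry in info registers fails to provide.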
If I load the crashing program and the core dump into gdb, it shows me a stack trace and crash point as below.
Core was generated by `./cut --output-d=: -b1,1234567890- /dev/fd/63'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 is_printable_field (i=1234567890) at src/cut.c:266
266 return (printable_field[n] >> (i % CHAR_BIT)) & 1;
(gdb) bt
#0 is_printable_field (i=1234567890) at src/cut.c:266
#1 set_fields (fieldstr=0x7ffccb0561c4 "") at src/cut.c:533
#2 main (argc=4, argv=0x7ffccb055cf8) at src/cut.c:865
Is there any means to know the exact assembly instruction that caused the segfault?
One possibility is to set:
(gdb) layout asm
When GDB stops, the current assembly instruction is marked.
Example:
│0x7ffff7aa441d <strtok+45> je 0x7ffff7aa44d6 <strtok+230> │
│0x7ffff7aa4423 <strtok+51> mov %rsi,%rax │
>│0x7ffff7aa4426 <strtok+54> mov (%rax),%cl │
│0x7ffff7aa4428 <strtok+56> test %cl,%cl │
│0x7ffff7aa442a <strtok+58> je 0x7ffff7aa4454 <strtok+100>
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7aa4426 in strtok () from /lib64/libc.so.6
(gdb)
You could use GDB's disassemble command, or x/i $rip to print the instruction at the program counter (on x86-64).
However, in your case, assuming the code is C (not C++ with an overloaded operator[]), the only possible culprits are the printable_field pointer and the n index.
Consider also using valgrind, and/or compiling with a recent GCC using -g -Wall plus a -fsanitize=... option, notably -fsanitize=address or -fsanitize=undefined.
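To see why the n index is a likely suspect, it helps to work out which byte the crashing expression touches. The sketch below assumes n is computed as i / CHAR_BIT; that line is not shown in the question, so treat it as an assumption. CHAR_BIT is taken to be 8.

```python
CHAR_BIT = 8  # assumed: 8 bits per char


def bit_position(i):
    """Split a field number into (byte index, bit index) in a packed bit array."""
    return i // CHAR_BIT, i % CHAR_BIT


# i=1234567890 is the argument shown in frame #0 of the backtrace.
byte_index, bit_index = bit_position(1234567890)
```

Under that assumption the read lands roughly 154 million bytes past the start of printable_field, which is far beyond the array unless it was allocated to cover the largest field number.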
I'm trying to retrofit a current (GCC >= 4.6) toolchain onto a legacy embedded ARM/Linux system based on glibc 2.3.6. I have successfully built the toolchain, but now my test programs are segfaulting in libstdc++, for example:
int main()
{
int* foo = new int[100];
delete [] foo;
return 0;
}
... segfaults in static initialization of libstdc++:
#0 0x40082778 in (anonymous namespace)::__future_category_instance ()
at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:64
#1 0x40082bb0 in __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1)
at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:103
#2 _GLOBAL__sub_I_future.cc(void) () at /path/to/src/gcc-4.6.4/libstdc++-v3/src/future.cc:109
#3 0x400e92b8 in __do_global_ctors_aux () from /path/to/symbols/libstdc++.so.6
#4 0x400627a0 in _init () from /path/to/symbols/libstdc++.so.6
#5 0x4000b5e4 in ?? () from /path/to/sysroot/lib/ld-linux.so.2
#6 0x4000b5e4 in ?? () from /path/to/sysroot/lib/ld-linux.so.2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I have several more examples, but the crash sites all look similar to this:
Dump of assembler code for function (anonymous namespace)::__future_category_instance():
0x40082764 <+0>: ldr r3, [pc, #264] ; 0x40082874 <(anonymous namespace)::__future_category_instance()+272>
0x40082768 <+4>: push {r11, lr}
0x4008276c <+8>: add r11, sp, #4
0x40082770 <+12>: sub sp, sp, #64 ; 0x40
0x40082774 <+16>: mov r1, #0
=> 0x40082778 <+20>: ldr r3, [r1, r3]
I interpret this as the code trying to read from base address 0 (r1 = 0, r3 in this case was 3736), which might hint at a relocation problem?
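The two loads in the listing can be checked by hand. The sketch below reproduces the address arithmetic using only the numbers quoted above; the helper names are made up for illustration. In ARM state a read of pc yields the address of the current instruction plus 8, which is why the first ldr resolves to the literal pool entry shown in the disassembler comment.

```python
def arm_pc_relative(insn_addr, offset):
    """Target of ldr rX, [pc, #offset] in ARM state (pc reads as insn_addr + 8)."""
    return (insn_addr + 8 + offset) & 0xFFFFFFFF


def register_offset_address(base, index):
    """Effective address of ldr rX, [base, index] with 32-bit wraparound."""
    return (base + index) & 0xFFFFFFFF


# ldr r3, [pc, #264] at 0x40082764 loads the literal that ends up in r3.
literal_addr = arm_pc_relative(0x40082764, 264)

# ldr r3, [r1, r3] with r1 = 0 and r3 = 3736 is the faulting access.
fault_addr = register_offset_address(0, 3736)
```

The faulting access is therefore a read from address 0xe98, i.e. a small offset applied to a base register holding zero rather than a valid pointer.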
This particular crash occurs when I build with either -static, -static-libgcc -static-libstdc++ or force loading of the libgcc_s.so.1 and libstdc++.so.6 from my toolchain via LD_LIBRARY_PATH.
I'm pretty much stuck here and would appreciate any clues as to what might be wrong with my toolchain, and whether this should work at all.
So I have now tracked this down to a change in GCC 4.6.0 that seems to have broken the code generation for the obsolete ABI I'm forced to use here (APCS).
With that change reversed, my test code now runs successfully.
My guess is that it's either a broken build, or it's trying to load a library from your old system.
You can check the second option by running with strace to see what library files it opens:
strace your-program
This will work fine for a statically linked binary, but is more tricky if you want to set LD_LIBRARY_PATH because that will most likely break the strace binary. In that case try it like this:
strace /path/to/ld-linux.so --library-path /path/to/libraries your-program
You'll need to figure out what ld-linux.so is called on your system.
A linux stat64 call is supposed to end up calling xstat64 with a static version of stat64 generated that passes a version along with the call.
We are seeing a condition where C-linked (gcc) code that calls stat64, when linked against an older version of a C++-linked shared library (libdb2.so.1, which uses stat64 but isn't supposed to provide it), does not end up with the "proper" static version of this stat64 call. The C++-linked app has what we expect:
00000000004007c8 <__xstat64@plt>:
4007c8: jmpq *1051250(%rip) # 501240 <_GLOBAL_OFFSET_TABLE_+0x20>
4007ce: pushq $0x1
4007d3: jmpq 4007a8 <_init+0x18>
0000000000400ac0 <stat64>:
400ac0: push %rbp
400ac1: mov %rsp,%rbp
400ac4: sub $0x10,%rsp
400ac8: mov %rdi,0xfffffffffffffff8(%rbp)
400acc: mov %rsi,0xfffffffffffffff0(%rbp)
400ad0: mov 0xfffffffffffffff0(%rbp),%rdx
400ad4: mov 0xfffffffffffffff8(%rbp),%rsi
400ad8: mov $0x1,%edi
400add: callq 4007c8 <__xstat64@plt>
400ae2: leaveq
400ae3: retq
whereas the gcc-linked code (which also links to our libdb2 shared lib) ends up with a global reference to stat64 instead of the "static" version that it is supposed to have:
0000000000400618 <stat64@plt>:
400618: jmpq *1050146(%rip) # 500c40 <_GLOBAL_OFFSET_TABLE_+0x20>
40061e: pushq $0x1
400623: jmpq 4005f8 <_init+0x18>
The same code, again linked with gcc but not against our libdb2 library, ends up with the expected "static" stat64 function:
0000000000400550 <__xstat64@plt>:
400550: jmpq *1050170(%rip) # 500b90 <_GLOBAL_OFFSET_TABLE_+0x20>
400556: pushq $0x1
40055b: jmpq 400530 <_init+0x18>
00000000004007b0 <stat64>:
4007b0: mov %rsi,%rdx
4007b3: mov %rdi,%rsi
4007b6: mov $0x1,%edi
4007bb: jmpq 400550 <__xstat64@plt>
EDIT: more info obtained from a linker map (-Wl,--print-map)
When the gcc-linked exe doesn't link to our (libdb2) shared lib, we see that it gets its stat64 from libc_nonshared.a:
/usr/lib64/libc_nonshared.a(stat64.oS)
/home/hotellnx94/peeterj/tmp/cc2f7ETx.o (stat64)
...
.plt 0x0000000000400530 0x70
*(.plt)
.plt 0x0000000000400530 0x70 /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib64/crt1.o
0x0000000000400540 __libc_start_main@@GLIBC_2.2.5
0x0000000000400550 __xstat64@@GLIBC_2.2.5
0x0000000000400560 printf@@GLIBC_2.2.5
0x0000000000400570 memset@@GLIBC_2.2.5
0x0000000000400580 strerror@@GLIBC_2.2.5
0x0000000000400590 __errno_location@@GLIBC_2.2.5
.text 0x00000000004007b0 0x10 /usr/lib64/libc_nonshared.a(stat64.oS)
0x00000000004007b0 stat64
whereas, once we link to our shared lib (libdb2), the symbols are picked up from crt1.o instead of libc_nonshared.a:
.plt 0x00000000004005f8 0x70
*(.plt)
.plt 0x00000000004005f8 0x70 /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib64/crt1.o
0x0000000000400608 __libc_start_main@@GLIBC_2.2.5
0x0000000000400618 stat64
0x0000000000400628 printf@@GLIBC_2.2.5
0x0000000000400638 memset@@GLIBC_2.2.5
0x0000000000400648 strerror@@GLIBC_2.2.5
0x0000000000400658 __errno_location@@GLIBC_2.2.5
What could we be doing (or have been doing, since we don't see this with new versions of our library) that would cause libc_nonshared.a to no longer be used once the consumer links to our library?
It turned out that this was due to an Intel compiler bug that has since been fixed. Once we started using the compiler version with the fix, we were exposed to a binary compatibility issue, because the new version of the Intel compiler (which produces the shared lib in question) correctly no longer exports this stat64 symbol.
I am trying to debug a user process from a Linux crash dump.
The usual steps to open the crash dump are:
Go to the directory where the dump is located.
Run the command crash kernel_link dump.201104181135,
where kernel_link is a soft link I created to the vmlinux image.
You are now at the crash prompt.
Run the command foreach <PID of the process> bt.
E.g.:
crash> foreach 6920 bt
PID: 6920 TASK: ffff88013caaa800 CPU: 1 COMMAND: "climmon"
#0 [ffff88012d2cd9c8] schedule at ffffffff8130b76a
#1 [ffff88012d2cdab0] schedule_timeout at ffffffff8130bbe7
#2 [ffff88012d2cdb50] schedule_timeout_uninterruptible at ffffffff8130bc2a
#3 [ffff88012d2cdb60] __alloc_pages_nodemask at ffffffff810b9e45
#4 [ffff88012d2cdc60] alloc_pages_current at ffffffff810e1c8c
#5 [ffff88012d2cdc90] __page_cache_alloc at ffffffff810b395a
#6 [ffff88012d2cdcb0] __do_page_cache_readahead at ffffffff810bb592
#7 [ffff88012d2cdd30] ra_submit at ffffffff810bb6ba
#8 [ffff88012d2cdd40] filemap_fault at ffffffff810b3e4e
#9 [ffff88012d2cdda0] __do_fault at ffffffff810caa5f
#10 [ffff88012d2cde50] handle_mm_fault at ffffffff810cce69
#11 [ffff88012d2cdf00] do_page_fault at ffffffff8130f560
#12 [ffff88012d2cdf50] page_fault at ffffffff8130d3f5
RIP: 00007fd02b7e9071 RSP: 0000000040e86ea0 RFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fd02b7e9071
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040e86ec0
RBP: 0000000040e87140 R8: 0000000000000800 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff16ec43d0
R13: 00007fd02bcadf00 R14: 0000000040e87950 R15: 0000000000001000
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
The backtrace above shows the kernel functions used for scheduling and page-fault handling, but not the functions that were executing in the user process (here, climmon).
So I cannot debug this process, because I cannot see the functions it was executing.
Can anyone help me with this case?
You cannot debug a userspace process from a kernel crash dump. If your kernel crashed, it was almost certainly the fault of the kernel and not of some userspace process. The kernel should always behave properly no matter what userspace process runs on it. If you want to debug a userspace process, I recommend looking at ltrace, strace and gdb.
Gergely from toptal.com
I don't know if this is what you want, but you can try a crash extension called "gcore". It can dump a user process core from a kernel crash file.
Also, make sure you include user pages when you capture the dump.
Load the dump with gdb:
gdb vmlinux
Load these gdb macros: http://www.kernel.org/doc/Documentation/kdump/gdbmacros.txt
(gdb) source gdbmacros.txt
Use 'btt' to "dump all thread stack traces on a kernel compiled with CONFIG_FRAME_POINTER":
(gdb) btt
Use 'bttnobp' to "dump all thread stack traces on a kernel compiled with !CONFIG_FRAME_POINTER":
(gdb) bttnobp