(cocos2d-x) How to debug android native crash with anonymous and unknown backtrace? - android-ndk

I use cocos2d-x and ndk-build to build my app for arm64. When I run it on a 64-bit device, the app crashes randomly with signal 11 (SIGSEGV), and the backtrace shows only <anonymous> and <unknown> frames.
I use cocos2d-x 3.17.1, NDK 16, Android Studio 3.4.1, Gradle tools 3.2.0 and Gradle wrapper 4.6.
I tried ndk-stack, but it did not give me any more useful information.
This is the log from Logcat:
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'Xiaomi/chiron/chiron:8.0.0/OPR1.170623.027/V10.3.1.0.ODECNXM:user/release-keys'
Revision: '0'
ABI: 'arm64'
pid: 17667, tid: 17711, name: GLThread 135726 >>> com.test.myapp <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x72adf0f460
x0 00000072bc089378 x1 0000000000000000 x2 fffd8072bc08bc18 x3 fffd8072bb3f4950
x4 00000000ee1763de x5 fffd8072bc096c88 x6 ff687373604f6d64 x7 7f7f7f7f7f7f7f7f
x8 00000072b09a9f08 x9 00000072b09a9f00 x10 fffffffffffffffb x11 00000072a9bcc6c8
x12 00000072aceade40 x13 0000000000000000 x14 0000000000697474 x15 00000072a9ac6c10
x16 0000000000000001 x17 fffa0072a9ac58c8 x18 0000000000000012 x19 00000072b09a9e98
x20 fffd8072a9bc31e0 x21 00000072a9bcfa00 x22 00000072bc0893d8 x23 00000072ac48ab40
x24 00000072b162cca8 x25 00000072ade90978 x26 00000072a94d0c20 x27 00000072a99953e0
x28 00000072a9995070 x29 00000072b0da8080 x30 fffd8072a9ac48a0
sp 00000072b0da8060 pc 00000072adf0f460 pstate 0000000080000000
backtrace:
#00 pc 0000000000078460 <anonymous:00000072ade97000>
#01 pc fffd8072a9ac489c <unknown>
This is the output from ndk-stack:
********** Crash dump: **********
Build fingerprint: 'Xiaomi/chiron/chiron:8.0.0/OPR1.170623.027/V10.3.1.0.ODECNXM:user/release-keys'
pid: 17667, tid: 17711, name: GLThread 135726 >>> com.test.myapp <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x72adf0f460
Stack frame #00 pc 0000000000078460 <anonymous:00000072ade97000>
Stack frame #01 pc fffd8072a9ac489c <unknown>
I expected the backtrace and ndk-stack to show me where the problem is, but they show only <anonymous> and <unknown> frames.
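One thing that can be checked from the log alone is how the frame values relate to the registers. The sketch below is only arithmetic on numbers copied from the tombstone above; frame_offset and check_against_log are hypothetical names made up for illustration, not part of any NDK tool. It shows that the frame #00 value is the pc register minus the base of the anonymous mapping, and that the frame #01 value is x30 minus 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of an absolute address inside a mapping. */
uint64_t frame_offset(uint64_t pc, uint64_t map_base)
{
    return pc - map_base;
}

/* All constants below are copied from the tombstone in the question. */
void check_against_log(void)
{
    const uint64_t pc_reg   = 0x00000072adf0f460ULL; /* "pc", same as fault addr */
    const uint64_t map_base = 0x00000072ade97000ULL; /* <anonymous:00000072ade97000> */
    const uint64_t x30      = 0xfffd8072a9ac48a0ULL; /* link register */

    /* Frame #00 is reported as an offset into the anonymous mapping. */
    assert(frame_offset(pc_reg, map_base) == 0x78460ULL);

    /* Frame #01 is the link register minus one 4-byte instruction. */
    assert(x30 - 4 == 0xfffd8072a9ac489cULL);
}
```

Because the crashing pc lies in an anonymous mapping rather than in a file-backed .so, there is no library for ndk-stack to match against its -sym directory, which would be consistent with the unsymbolized output above.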

Related

Why is the compiler adding an extra 'sxtw' instruction (resulting further in a kernel panic)?

Issue/Symptom:
When a function returns, the compiler adds an sxtw instruction, as seen in the disassembly, so the returned address keeps only 32 bits instead of 64 bits, resulting in a kernel panic:
Unable to handle kernel paging request at virtual address xxxx
Build Environment:
Platform : ARMV7LE
gcc, linux-4.4.60
Architecture : arm64
gdb : aarch64-5.3-glibc-2.22/usr/bin/aarch64-linux-gdb
Details:
Here's the simplified project structure. It's been taken care of correctly in the corresponding makefile. Also note that file1.c and file2.c are part of same module.
../src/file1.c /* It has func1() defined as well as called */
../src/file2.c
../inc/files.h /* There's no func1() declared in the header */
Cause of the issue:
A call to func1() was added in file2.c without a declaration of func1 in files.h or file2.c. (The declaration of func1 was accidentally left out of files.h.)
The code compiled with no errors, but with a warning, as expected: implicit declaration of function func1.
At run time, though, right after returning from func1 inside file2.c, the system crashed when it tried to dereference the address returned by func1.
Further analysis showed that, right after the call returned, the compiler had added an sxtw instruction, as seen in the disassembly, so the returned address kept only 32 bits instead of 64 bits, resulting in a kernel panic.
Unable to handle kernel paging request at virtual address xxxx
Note that x19 is 64 bits wide while w0 is 32 bits wide.
Note that the least significant word of x0 matches that of x19.
System crashed while de-referencing x19.
sxtw x19, w0 /* This was added by compiler as extra instruction */
ldp x1, x0, [x19,#304] /* System crashed here */
Registers:
[ 91.388130] pc : [<ffffff80016c9074>] lr : [<ffffff80016c906c>] pstate: 80000145
[ 91.462090] sp : ffffff80094333b0
[ 91.552708] x29: ffffff80094333d0 x28: ffffffc06995408a
[ 91.652701] x27: ffffffc06c400a00 x26: 0000000000000000
[ 91.716243] x25: 0000000000000000 x24: ffffffc069958000
[ 91.779784] x23: ffffffc076e00000 x22: ffffffc06c400a00
[ 91.843326] x21: 0000000000000031 x20: ffffffc073060000
[ 91.906867] x19: 0000000066bfc780 x18: ffffff8009436888
[ 91.970409] x17: 0000000000000000 x16: ffffff8008193074
[ 92.033952] x15: 00000000000a8c06 x14: 2c30323030387830
[ 92.097492] x13: 3d7367616c66202c x12: 3038653030303030
[ 92.161034] x11: 3038666666666666 x10: 78303d646e65202c
[ 92.224576] x9 : 3063303030303030 x8 : 3030303030303030
[ 92.288117] x7 : 0000000000000880 x6 : 0000000000000000
[ 92.351659] x5 : ffffffc07fd10ad8 x4 : 0000000000000001
[ 92.415202] x3 : 0000000000000007 x2 : cb88537fdc8ba63c
[ 92.478743] x1 : 0000000000000000 x0 : ffffffc066bfc780
After adding the declaration of func1 to files.h, the extra instruction was gone and the crash no longer occurred.
Can someone please explain why the compiler added sxtw in this case?
You should have received at least two warnings, one about the missing function declaration and another one about the implicit conversion from int to a pointer type.
The reason is that implicitly declared functions have a return type of int. Converting this int value to a 64-bit pointer throws away the upper 32 bits of the result. This is the expected GNU C behavior, based on what C compilers for early 64-bit targets did. The sxtw instruction is required to implement this behavior. (Current C standards no longer have implicit function declarations, but GCC still has to support them for backwards compatibility with existing autoconf tests.)
Note that your platform is obviously AArch64 (with 64-bit registers), not 32-bit ARMv7.
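To make the effect concrete, here is a minimal sketch that models the sxtw step in plain C and checks it against the x0 and x19 values from the register dump in the question. The function names sxtw and check_against_register_dump are made up for this illustration; the code only models the instruction's arithmetic and is not the asker's actual source.

```c
#include <assert.h>
#include <stdint.h>

/* Models `sxtw x19, w0`: take the low 32 bits of a 64-bit register
   (the "int" the compiler believes func1 returned) and sign-extend
   them back to 64 bits. */
uint64_t sxtw(uint64_t x0)
{
    uint32_t w0 = (uint32_t)x0;              /* low word, as seen through w0 */
    int64_t extended = (w0 & 0x80000000u)
        ? (int64_t)w0 - 4294967296LL         /* bit 31 set: negative int */
        : (int64_t)w0;                       /* bit 31 clear: non-negative */
    return (uint64_t)extended;
}

/* Constants copied from the register dump in the question. */
void check_against_register_dump(void)
{
    const uint64_t x0  = 0xffffffc066bfc780ULL; /* full pointer from func1 */
    const uint64_t x19 = 0x0000000066bfc780ULL; /* value that was dereferenced */

    assert(sxtw(x0) == x19);
}
```

The kernel pointer's upper half (ffffffc0) is lost, and because bit 31 of the low word is clear, the sign extension fills the upper half with zeros, which gives exactly the x19 value that the ldp instruction then faulted on.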

get assert() failure message in android ndk sigaction crash handler

I am using sigaction() to install a crash handler in my app and print private version information.
In case of a failed assert() the abort message is super useful and I'd like my sigaction handler to print it. How can I get that string? Below is some analysis I've done thus far:
I noticed that engrave_tombstone() in the NDK's debuggerd/tombstone.c takes an argument uintptr_t abort_msg_address, which contains the abort message, typically a failed assert. But I have no clue where that is fetched from. Since Android's default crash logs seem to be produced by a different process, and the name debuggerd implies it is a daemon, I am not sure this is the right way to go.
I furthermore see in bionic libc/stdlib/assert.c there is just a call to __libc_android_log_print(ANDROID_LOG_FATAL, ...). Also not very helpful. But in bionic linker_main.cpp there is abort_msg_t* g_abort_message = nullptr; and other exciting stuff in android_set_abort_message.cpp. Again, not sure this is the right way, feels very hackish.
This is what Android's crash handler prints by default. Note how the first message is from the crashed pid/tid, but the others come from some other process (presumably debuggerd?).
10-09 16:49:28.551 12084 12127 F libc : Fatal signal 6 (SIGABRT), code -6 (SI_TKILL) in tid 12127 (applyRouting), pid 12084 (com.android.nfc)
10-09 16:49:28.691 12203 12203 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
10-09 16:49:28.692 12203 12203 F DEBUG : Build fingerprint: 'Android/aosp_marlin/marlin:Q/OC-MR1/summit07191458:userdebug/test-keys'
10-09 16:49:28.692 12203 12203 F DEBUG : Revision: '0'
10-09 16:49:28.692 12203 12203 F DEBUG : ABI: 'arm64'
10-09 16:49:28.692 12203 12203 F DEBUG : pid: 12084, tid: 12127, name: applyRouting >>> com.android.nfc <<<
10-09 16:49:28.692 12203 12203 F DEBUG : signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
10-09 16:49:28.692 12203 12203 F DEBUG : Abort message: 'jni_internal.cc:622] JNI FatalError called: applyRouting'
10-09 16:49:28.692 12203 12203 F DEBUG : x0 0000000000000000 x1 0000000000002f5f x2 0000000000000006 x3 0000000000000008

How to debug mapbox crash

I have a recurring crash with the Mapbox library com.mapbox.mapboxsdk:mapbox-android-sdk:6.1.1.
In logcat I got this stack:
pid: 4960, tid: 4960, name: [...] >>> [...] <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8
r0 00000000 r1 6b9faf17 r2 0000010f r3 00000001
r4 00000000 r5 b4d96a80 r6 00000003 r7 bef0db50
r8 bef0dae0 r9 b4db6500 sl bef0dbb0 fp b4db6500
ip b4cfc94c sp bef0da88 lr b499dc29 pc 9fdc9ebc cpsr 60060030
backtrace:
#00 pc 00095ebc /data/app/[...]/lib/arm/libmapbox-gl.so
#01 pc 00097545 /data/app/[...]/lib/arm/libmapbox-gl.so
#02 pc 00097587 /data/app/[...]/lib/arm/libmapbox-gl.so
#03 pc 000eaa29 /system/lib/libart.so (art_quick_generic_jni_trampoline+40)
#04 pc 000e6331 /system/lib/libart.so (art_quick_invoke_stub_internal+64)
#05 pc 004028a5 /system/lib/libart.so (art_quick_invoke_stub+188)
#06 pc 007fccdc [stack]
When I tried to get line numbers for libmapbox-gl.so with ndk-stack, I only got this:
Build fingerprint: 'google/hammerhead/hammerhead:6.0.1/M4B30Z/3437181:user/release-keys'
pid: 4960, tid: 4960, name: [...] >>> [...] <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8
Stack frame #00 pc 00095ebc /data/app/[...]/lib/arm/libmapbox-gl.so: Routine ??
??:0
Stack frame #01 pc 00097545 /data/app/[...]/lib/arm/libmapbox-gl.so: Routine ??
??:0
Stack frame #02 pc 00097587 /data/app/[...]/lib/arm/libmapbox-gl.so: Routine ??
??:0
Stack frame #03 pc 000eaa29 /system/lib/libart.so (art_quick_generic_jni_trampoline+40)
Stack frame #04 pc 000e6331 /system/lib/libart.so (art_quick_invoke_stub_internal+64)
Stack frame #05 pc 004028a5 /system/lib/libart.so (art_quick_invoke_stub+188)
Stack frame #06 pc 007fccdc [stack]
Here is my ndk-stack command: $ANDROID_LIB_PATH/sdk/ndk-bundle/ndk-stack -sym $PROJECT_PATH/app/build/intermediates/transforms/mergeJniLibs/servertest/release/0/lib/x86_64/ -dump logcat.txt
How can I get the line number where the crash happened?

How come Linux kernel interferes the execution of RISC-V custom0 instruction on Zedboard?

dummy_rocc is a naive built-in RoCC accelerator example in the RISC-V tools, where several custom0 instructions are defined. After setting up dummy_rocc (either on the Spike ISA simulator or on Rocket-FPGA, which are set up differently), we use dummy_rocc_test, a user-program test case, to verify the correctness of the dummy_rocc accelerator. There are two ways to run dummy_rocc_test: on pk (the proxy kernel) or on Linux.
I previously set up dummy_rocc on the Spike ISA simulator, and dummy_rocc_test worked well on both pk and Linux.
Now I have replaced Spike with Rocket-FPGA on Zedboard. The execution on pk succeeds:
root@zynq:~# ./fesvr-zynq pk /nfs/copy_to_rootfs/work/dummy_rocc_test
begin
after asm code
load x into accumulator 2 (funct=0)
read it back into z (funct=1) to verify it
accumulate 456 into it (funct=3)
verify it
do it all again, but initialize acc2 via memory this time (funct=2)
do it all again, but initialize acc2 via memory this time (funct=2)
do it all again, but initialize acc2 via memory this time (funct=2)
success!
But the execution on Linux fails:
./fesvr-zynq +disk=/nfs/root.bin bbl /nfs/fpga-zynq/zedboard/fpga-images-zedboard/riscv/vmlinux
..................................Booting RISC-V Linux.........................................
/ # ./work/dummy_rocc_test
begin
after asm code
[ 0.400000] dummy_rocc_test[23]: unhandled signal 4 code 0x30001 at 0x0000000000800500 in ]
[ 0.400000] CPU: 0 PID: 23 Comm: dummy_rocc_test Not tainted 3.14.33-g043bb5d #1
[ 0.400000] task: ffffffff8fa3f500 ti: ffffffff8fb76000 task.ti: ffffffff8fb76000
[ 0.400000] sepc: 0000000000800500 ra : 00000000008004fc sp : 0000003fff943c70
[ 0.400000] gp : 0000000000882198 tp : 0000000000884700 t0 : 0000000000000000
[ 0.400000] t1 : 000000000080adc8 t2 : 8101010101010100 s0 : 0000003fff943ca0
[ 0.400000] s1 : 0000000000800d5c a0 : 000000000000000f a1 : 0000002000002000
[ 0.400000] a2 : 000000000000000f a3 : 000000000085cee8 a4 : 0000000000000001
[ 0.400000] a5 : 000000000000007b a6 : 0000000000000008 a7 : 0000000000000040
[ 0.400000] s2 : 0000000000000000 s3 : 00000000008a2668 s4 : 00000000008d8d98
[ 0.400000] s5 : 00000000008d7770 s6 : 0000000000000008 s7 : 00000000008d6000
[ 0.400000] s8 : 00000000008d8d60 s9 : 0000000000000000 s10: 00000000008a32b8
[ 0.400000] s11: ffffffffffffffff t3 : 000000000000000b t4 : 000000006ffffdff
[ 0.400000] t5 : 000000000000000a t6 : 000000006ffffeff
[ 0.400000] sstatus: 8000000000003008 sbadaddr: 0000000000800500 scause: 0000000000000002
Illegal instruction
A screenshot shows that the "signal 4" is caused by a custom0 instruction.
[Screenshot: readelf output of dummy_rocc_test]
So my question is: how come the Linux kernel interferes with the execution of the RISC-V custom0 instruction on Zedboard?
The source code of dummy_rocc_test is provided as reference:
// The following is a RISC-V program to test the functionality of the
// dummy RoCC accelerator.
// Compile with riscv64-unknown-elf-gcc dummy_rocc_test.c
// Run with spike --extension=dummy_rocc pk a.out
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
int main() {
    printf("begin\n");
    uint64_t x = 123, y = 456, z = 0;
    // load x into accumulator 2 (funct=0)
    // asm code
    asm volatile ("addi a1, a1, 2");
    /// printf again
    printf("after asm code\n");
    asm volatile ("custom0 x0, %0, 2, 0" : : "r"(x));
    printf("load x into accumulator 2 (funct=0)\n");
    // read it back into z (funct=1) to verify it
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("read it back into z (funct=1) to verify it\n");
    assert(z == x);
    // accumulate 456 into it (funct=3)
    asm volatile ("custom0 x0, %0, 2, 3" : : "r"(y));
    printf("accumulate 456 into it (funct=3)\n");
    // verify it
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("verify it\n");
    assert(z == x+y);
    // do it all again, but initialize acc2 via memory this time (funct=2)
    asm volatile ("custom0 x0, %0, 2, 2" : : "r"(&x));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    asm volatile ("custom0 x0, %0, 2, 3" : : "r"(y));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    asm volatile ("custom0 %0, x0, 2, 1" : "=r"(z));
    printf("do it all again, but initialize acc2 via memory this time (funct=2)\n");
    assert(z == x+y);
    printf("success!\n");
}
"Illegal instruction" means your processor threw an illegal instruction exception.
Since custom0 is not something Linux knows how to execute in software (it's customizable, after all!), the kernel cannot emulate it; it reports the trap and delivers signal 4 (SIGILL) to the process, which is the error that you saw.
The question I have for you is "Did you implement the custom0 instruction in the processor? Is it enabled? Did the program execute your custom0 instruction properly and return the correct answer when you used the proxy-kernel?"
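For reference, the dump in the question shows scause 0000000000000002, which is the illegal-instruction cause in the RISC-V privileged specification. If you can read the 32-bit instruction word at sepc, you can check whether it really is a custom0 instruction by looking at its low 7 bits. The sketch below assumes the standard custom-0 major opcode (0001011) and the RoCC layout with funct7 in bits 31:25; the helper names is_custom0 and rocc_funct7 are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define RISCV_OPCODE_MASK    0x7fu  /* bits 6:0 hold the major opcode */
#define RISCV_OPCODE_CUSTOM0 0x0bu  /* 0001011, the custom-0 opcode space */

/* Returns nonzero if the instruction word uses the custom-0 opcode. */
int is_custom0(uint32_t insn)
{
    return (insn & RISCV_OPCODE_MASK) == RISCV_OPCODE_CUSTOM0;
}

/* Extracts the funct7 field (bits 31:25), which the test program
   passes as the last operand of each custom0 instruction. */
uint32_t rocc_funct7(uint32_t insn)
{
    return insn >> 25;
}
```

For example, a word with funct7 = 3 and opcode custom-0 (all other fields zero) is 0x0600000b, while the addi a1, a1, 2 from the test program encodes as 0x00258593 and is not in the custom-0 space.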

help understanding MonoTouch crash log

My MonoTouch app (release build) is crashing randomly and I'm getting this in the crash log. Unfortunately, I don't see anything useful related to my app. It looks like it's down deep in the bowels of MonoTouch and iOS.
I'm running this on an iPhone 3G with OS 3.1.2.
Can anyone help me understand what this crash log means?
Incident Identifier: 222781AB-0F7C-4E1D-9E10-6EE946D6C320
CrashReporter Key: 0ee985a48f32f63b7e50536870f06a1ab4122600
Process: MyApp_iOS [593]
Path: /var/mobile/Applications/095A615B-2F9B-4A84-B0E3-EF1246915594/MyApp_iOS.app/MyApp_iOS
Identifier: MyApp_iOS
Version: ??? (???)
Code Type: ARM (Native)
Parent Process: launchd [1]
Date/Time: 2011-03-24 13:04:18.479 -0700
OS Version: iPhone OS 3.1.2 (7D11)
Report Version: 104
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000, 0x00000000
Crashed Thread: 0
Thread 0 Crashed:
0 dyld 0x2fe125b2 ImageLoaderMachOCompressed::findExportedSymbol(char const*, ImageLoader const**) const + 58
1 dyld 0x2fe0dcd6 ImageLoaderMachO::findExportedSymbol(char const*, bool, ImageLoader const**) const + 30
2 dyld 0x2fe0ee6e ImageLoaderMachOClassic::resolveUndefined(ImageLoader::LinkContext const&, macho_nlist const*, bool, bool, ImageLoader const**) + 434
3 dyld 0x2fe10250 ImageLoaderMachOClassic::doBindLazySymbol(unsigned long*, ImageLoader::LinkContext const&) + 212
4 dyld 0x2fe037ae dyld::bindLazySymbol(mach_header const*, unsigned long*) + 94
5 dyld 0x2fe0e29c stub_binding_helper_interface + 12
6 MyApp_iOS 0x0071a754 mono_handle_native_sigsegv (mini-exceptions.c:1762)
7 MyApp_iOS 0x0073d900 sigabrt_signal_handler (mini-posix.c:155)
8 libSystem.B.dylib 0x0008e81c _sigtramp + 28
9 libSystem.B.dylib 0x00033904 semaphore_wait_signal + 4
10 libSystem.B.dylib 0x00003ca8 pthread_mutex_lock + 440
11 MyApp_iOS 0x0088e76c GC_lock (pthread_support.c:1679)
12 MyApp_iOS 0x00884970 GC_malloc_atomic (malloc.c:259)
13 MyApp_iOS 0x007f26e4 mono_object_new_ptrfree_box (object.c:3996)
[... there are 10 active threads but I've only included the one that crashed]
Thread 0 crashed with ARM Thread State:
r0: 0x00000000 r1: 0x0097dc97 r2: 0x344d7c3c r3: 0x344dd2bd
r4: 0x344dd2bd r5: 0x00005681 r6: 0x0097dc97 r7: 0x2fffe6d8
r8: 0x344e7f34 r9: 0x00000001 r10: 0x0000007f r11: 0x0097dc97
ip: 0x344d8e4c sp: 0x2fffe658 lr: 0x2fe0dcdd pc: 0x2fe125b2
cpsr: 0x20000030
Another diagnosis option I've found is to:
Hook up AppDomain.CurrentDomain.UnhandledException
Put a try-catch around your entire "static void Main()" method
In both cases, write the exception out with Console.WriteLine().
Then run your app, open Xcode and open the console window for your device while it's plugged in. Then cause the crash. You should be able to see a decent C# stack trace of the exception.
This has helped me fix many issues that only happen when running in release on the device.

Resources