Get assert() failure message in Android NDK sigaction crash handler

I am using sigaction() to install a crash handler in my app and print private version information.
In case of a failed assert() the abort message is super useful and I'd like my sigaction handler to print it. How can I get that string? Below is some analysis I've done thus far:
I noticed that engrave_tombstone() in ndk debuggerd/tombstone.c takes an argument uintptr_t abort_msg_address, which holds the address of the abort message, typically a failed assert. But I have no clue where that address is fetched from. Since Android's default crash logs seem to be produced by a different process, and the name debuggerd implies it's a daemon, I am not sure this is the right way to go.
I furthermore see in bionic libc/stdlib/assert.c there is just a call to __libc_android_log_print(ANDROID_LOG_FATAL, ...). Also not very helpful. But in bionic linker_main.cpp there is abort_msg_t* g_abort_message = nullptr; and other exciting stuff in android_set_abort_message.cpp. Again, not sure this is the right way, feels very hackish.
This is what the crash handler of Android prints by default. Note how the first message is in the crashed pid/tid, but the others come from some other process (presumably debuggerd?).
10-09 16:49:28.551 12084 12127 F libc : Fatal signal 6 (SIGABRT), code -6 (SI_TKILL) in tid 12127 (applyRouting), pid 12084 (com.android.nfc)
10-09 16:49:28.691 12203 12203 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
10-09 16:49:28.692 12203 12203 F DEBUG : Build fingerprint: 'Android/aosp_marlin/marlin:Q/OC-MR1/summit07191458:userdebug/test-keys'
10-09 16:49:28.692 12203 12203 F DEBUG : Revision: '0'
10-09 16:49:28.692 12203 12203 F DEBUG : ABI: 'arm64'
10-09 16:49:28.692 12203 12203 F DEBUG : pid: 12084, tid: 12127, name: applyRouting >>> com.android.nfc <<<
10-09 16:49:28.692 12203 12203 F DEBUG : signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
10-09 16:49:28.692 12203 12203 F DEBUG : Abort message: 'jni_internal.cc:622] JNI FatalError called: applyRouting'
10-09 16:49:28.692 12203 12203 F DEBUG : x0 0000000000000000 x1 0000000000002f5f x2 0000000000000006 x3 0000000000000008

Related

(cocos2d-x) How to debug android native crash with anonymous and unknown backtrace?

I use cocos2d-x and ndk-build to build my app for arm64. But when I run it on a 64-bit device, the app crashes randomly with signal 11 (SIGSEGV), and the backtrace shows only anonymous and unknown frames.
I use cocos2d-x 3.17.1, ndk 16, Android Studio 3.4.1, gradle tools 3.2.0 and gradle wrapper 4.6.
I tried ndk-stack but it didn't show me more useful information.
This is the log in the Logcat.
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'Xiaomi/chiron/chiron:8.0.0/OPR1.170623.027/V10.3.1.0.ODECNXM:user/release-keys'
Revision: '0'
ABI: 'arm64'
pid: 17667, tid: 17711, name: GLThread 135726 >>> com.test.myapp <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x72adf0f460
x0 00000072bc089378 x1 0000000000000000 x2 fffd8072bc08bc18 x3 fffd8072bb3f4950
x4 00000000ee1763de x5 fffd8072bc096c88 x6 ff687373604f6d64 x7 7f7f7f7f7f7f7f7f
x8 00000072b09a9f08 x9 00000072b09a9f00 x10 fffffffffffffffb x11 00000072a9bcc6c8
x12 00000072aceade40 x13 0000000000000000 x14 0000000000697474 x15 00000072a9ac6c10
x16 0000000000000001 x17 fffa0072a9ac58c8 x18 0000000000000012 x19 00000072b09a9e98
x20 fffd8072a9bc31e0 x21 00000072a9bcfa00 x22 00000072bc0893d8 x23 00000072ac48ab40
x24 00000072b162cca8 x25 00000072ade90978 x26 00000072a94d0c20 x27 00000072a99953e0
x28 00000072a9995070 x29 00000072b0da8080 x30 fffd8072a9ac48a0
sp 00000072b0da8060 pc 00000072adf0f460 pstate 0000000080000000
backtrace:
#00 pc 0000000000078460 <anonymous:00000072ade97000>
#01 pc fffd8072a9ac489c <unknown>
This is the log with ndk-stack
********** Crash dump: **********
Build fingerprint: 'Xiaomi/chiron/chiron:8.0.0/OPR1.170623.027/V10.3.1.0.ODECNXM:user/release-keys'
pid: 17667, tid: 17711, name: GLThread 135726 >>> com.test.myapp <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x72adf0f460
Stack frame #00 pc 0000000000078460 <anonymous:00000072ade97000>
Stack frame #01 pc fffd8072a9ac489c <unknown>
I expect the backtrace and ndk-stack can show me where the problem is, but it shows only anonymous and unknown.

How to fix MPI_ERR_RMA_SHARED?

I wrote an MPI program in which I use shared memory through MPI_Win_Allocate_shared command, then I run the program on a Virtual Machine with 4 cpus on Azure.
Everything works well with 1 or 2 processes, but it doesn't work with 3 or 4.
I know that MPI_Win_Allocate_shared works only if the processes are on the same node, so I thought the problem was related to that. I tried to solve it with a hostfile containing "AzureVM slots=4 max_slots=8", but I still get the error.
I'll report the error below:
mpiexec -np 3 --hostfile my_host --oversubscribe tables
[AzureVM][[37487,1],1][btl_openib_component.c:652:init_one_port] ibv_query_gid failed (mlx4_0:1, 0)
[AzureVM][[37487,1],0][btl_openib_component.c:652:init_one_port] ibv_query_gid failed (mlx4_0:1, 0)
[AzureVM][[37487,1],2][btl_openib_component.c:652:init_one_port] ibv_query_gid failed (mlx4_0:1, 0)
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: AzureVM
Local device: mlx4_0
--------------------------------------------------------------------------
[AzureVM:01918] 2 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[AzureVM:01918] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[AzureVM:1930] *** An error occurred in MPI_Win_allocate_shared
[AzureVM:1930] *** reported by process [2456748033,2]
[AzureVM:1930] *** on communicator MPI_COMM_WORLD
[AzureVM:1930] *** MPI_ERR_RMA_SHARED: Memory cannot be shared
[AzureVM:1930] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[AzureVM:1930] *** and potentially your MPI job)
[AzureVM:01918] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
Makefile:54: recipe for target 'table' failed
make: *** [table] Error 71
Could someone please explain to me how to solve the problem? Thank you in advance!
Hi, have you solved the problem?
Consider adding these two lines (following the guide)
MPI_Comm nodecomm;
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &nodecomm);
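Then allocate the memory on that communicator with
// define alloc_length (sth like: MPI_Aint alloc_length = 10 * sizeof(int);)
// mem receives the base address of the local segment (sth like: int *mem;)
MPI_Win win;
MPI_Win_allocate_shared(alloc_length, 1, MPI_INFO_NULL, nodecomm, &mem, &win);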
I had the same problem (a similar error log at least) and solved it exactly in the way I described above
To better understand, see this. I tested the code at the end of the answer chosen as the best one, and unfortunately, it didn't work for me. I modified it as follows:
#include <stdio.h>
#include <mpi.h>
#define ARRAY_LEN 32
int main() {
MPI_Init(NULL, NULL);
int * baseptr;
MPI_Comm nodecomm;
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
MPI_INFO_NULL, &nodecomm);
int nodesize, noderank;
MPI_Comm_size(nodecomm, &nodesize);
MPI_Comm_rank(nodecomm, &noderank);
MPI_Win win;
int size = (noderank == 0)? ARRAY_LEN * sizeof(int) : 0;
MPI_Win_allocate_shared(size, 1, MPI_INFO_NULL,
nodecomm, &baseptr, &win);
if (noderank != 0) {
MPI_Aint size;
int disp_unit;
MPI_Win_shared_query(win, 0, &size, &disp_unit, &baseptr);
}
for (int i = noderank; i < ARRAY_LEN; i += nodesize)
baseptr[i] = noderank;
MPI_Barrier(nodecomm);
if (noderank == 0) {
for (int i = 0; i < nodesize; i++)
printf("%4d", baseptr[i]);
printf("\n");
}
MPI_Win_free(&win);
MPI_Finalize();
}
Now, if you save the code above as test.cpp, then
mpic++ test.cpp && mpirun -n 8 ./a.out
will output 0 1 2 3 4 5 6 7
Some right tips I took from here
Good luck!

Giving S_IWUGO permission to module parameter results in compilation error (while S_IRUGO or S_IXUGO doesn't) - why?

I wrote a simple kernel module to learn the module_param feature of kernel modules. However, if I give the S_IWUGO, S_IRWXUGO or S_IALLUGO permissions for the perm field, I get the following compilation error:
[root@localhost param]# make -C $KDIR M=$PWD modules
make: Entering directory `/usr/src/kernels/3.11.10-301.fc20.i686+PAE'
CC [M] /root/ldd/misc/param/param/hello.o
/root/ldd/misc/param/param/hello.c:6:168: error: negative width in bit-field ‘<anonymous>’
module_param(a, int, S_IWUGO);
^
make[1]: *** [/root/ldd/misc/param/param/hello.o] Error 1
make: *** [_module_/root/ldd/misc/param/param] Error 2
make: Leaving directory `/usr/src/kernels/3.11.10-301.fc20.i686+PAE'
Compilation is successful for S_IRUGO or S_IXUGO (permissions not containing write permission). I suppose I must be doing something wrong because, from what I know, write permission is legal. What am I doing wrong here?
The program:
#include<linux/module.h>
#include<linux/stat.h>
int a = 2;
module_param(a, int, S_IXUGO);
int f1(void){
return 0;
}
void f2(void){
}
module_init(f1);
module_exit(f2);
MODULE_AUTHOR("lavya");
MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("experiment with parameters");
Linux does not accept the S_IWOTH permission.
If you follow the macro chain behind module_param, you arrive at __module_param_call, which includes:
BUILD_BUG_ON_ZERO((perm) < 0 || (perm) > 0777 || ((perm) & 2))
S_IWOTH == 2 so the test fails.
The negative width in bit-field error is merely an artefact of the implementation of BUILD_BUG_ON_ZERO.
Linux probably refuses to make module parameters world-writable for security reasons. You should be able to use narrower permissions such as S_IWUSR | S_IWGRP.
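To make the check above concrete, here is a small user-space sketch of the same idea. STATIC_REJECT_IF, PERM_REJECTED and perm_is_accepted are hypothetical names; the macro only imitates BUILD_BUG_ON_ZERO (the kernel's own definition differs in detail), and S_IRUGO / S_IWUGO are rebuilt from the user-space constants because they normally come from the kernel headers.

```c
#include <sys/stat.h>

/* The *UGO shorthands are kernel-side names; rebuild them for user space. */
#ifndef S_IRUGO
#define S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)
#endif
#ifndef S_IWUGO
#define S_IWUGO (S_IWUSR | S_IWGRP | S_IWOTH)
#endif

/* The condition quoted above: out of range, or world-writable (bit 2). */
#define PERM_REJECTED(p) ((p) < 0 || (p) > 0777 || ((p) & 2))

/* Imitation of the BUILD_BUG_ON_ZERO trick: if e is true, the unnamed
 * bit-field gets width -1 and the compiler reports a negative width. */
#define STATIC_REJECT_IF(e) ((void)sizeof(struct { int pad; int : -!!(e); }))

/* Runtime mirror of the same condition; returns 1 if perm is accepted. */
static int perm_is_accepted(int perm)
{
    return !PERM_REJECTED(perm);
}

static void compile_time_examples(void)
{
    STATIC_REJECT_IF(PERM_REJECTED(S_IRUGO));            /* compiles */
    STATIC_REJECT_IF(PERM_REJECTED(S_IWUSR | S_IWGRP));  /* compiles */
    /* Uncommenting the next line should fail to compile, because
     * S_IWUGO contains S_IWOTH (== 2):
     * STATIC_REJECT_IF(PERM_REJECTED(S_IWUGO));
     */
}
```

With this, perm_is_accepted(S_IWUGO) is 0 while perm_is_accepted(S_IWUSR | S_IWGRP) is 1, which matches the behaviour described in the question.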

GlibC Double free or corruption (fclose)

I got an error on my C program on runtime. I found some stuff about "double free or corruption" error but nothing relevant.
Here is my code :
void compute_crc32(const char* filename, unsigned long * destination)
{
FILE* tmp_chunk = fopen(filename, "rb");
printf("\n\t\t\tCalculating CRC...");
fflush(stdout);
Crc32_ComputeFile(tmp_chunk, destination);
printf("\t[0x%08lX]", *destination);
fflush(stdout);
fclose(tmp_chunk);
printf("\t[ OK ]");
fflush(stdout);
}
It seems the
fclose(tmp_chunk);
raises this glibc error :
*** glibc detected *** ./crc32: double free or corruption (out): 0x09ed86f0 ***
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x75ee2)[0xb763cee2]
/lib/i386-linux-gnu/libc.so.6(fclose+0x154)[0xb762c424]
./crc32[0x80498be]
./crc32[0x8049816]
./crc32[0x804919c]
./crc32[0x8049cc2]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75e04d3]
./crc32[0x8048961]
In the console output, the last CRC is displayed but not the last "[ OK ]"...
I never have this type of error and I searched for hours on Google but nothing really interesting in my case... please help :)
Now I have another error :
*** glibc detected *** ./xsplit: free(): invalid next size (normal): 0x095a66f0 ***
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x75ee2)[0xb7647ee2]
/lib/i386-linux-gnu/libc.so.6(fclose+0x154)[0xb7637424]
./xsplit[0x80497f7]
./xsplit[0x804919c]
./xsplit[0x8049cd6]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75eb4d3]
./xsplit[0x8048961]
What the hell is this ? I'm lost... :(
*** glibc detected *** ./crc32: double free or corruption
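Glibc is telling you that you've corrupted the heap. The crash shows up in fclose() only because that is where the damaged heap block happens to be freed; the corruption itself probably happened earlier, somewhere else in the program.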
The tools to find such corruption on Linux are Valgrind and AddressSanitizer.
Chances are, either one of them will immediately tell you what your problem is.
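The question does not show the code that actually damages the heap, so the following is purely an illustration of one common cause and its fix, not a diagnosis. copy_string is a hypothetical helper; the typical bug is allocating strlen(s) bytes and then copying the terminating '\0' one byte past the block, which glibc may only notice later, for example inside an unrelated fclose().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper showing the corrected pattern.
 *
 * Buggy variant (do not use):
 *     char *copy = malloc(strlen(s));   // no room for the '\0'
 *     strcpy(copy, s);                  // writes 1 byte past the block
 *
 * To locate this kind of overflow, build with
 *     gcc -g -fsanitize=address prog.c -o prog
 * or run the unmodified binary under
 *     valgrind ./prog
 * (prog.c / prog are placeholder names).
 */
static char *copy_string(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);    /* +1 for the terminating '\0' */
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len + 1);        /* copies the terminator as well */
    return copy;                     /* caller must free() the result */
}
```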

Kernel Module: hrtimer_start "Unknown Symbol in Module"

I'm building a kernel module that uses the hrtimer interface. I have the module compiling successfully, and it's got MODULE_LICENSE("GPL") set:
make -C /lib/modules/3.0.0-23-server/build SUBDIRS=/home/projects/net-modeler modules
make[1]: Entering directory `/usr/src/linux-headers-3.0.0-23-server'
CC [M] /home/projects/net-modeler/nm_injector.o
CC [M] /home/projects/net-modeler/nm_scheduler.o
LD [M] /home/projects/net-modeler/net-modeler.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/projects/net-modeler/net-modeler.mod.o
LD [M] /home/projects/net-modeler/net-modeler.ko
make[1]: Leaving directory `/usr/src/linux-headers-3.0.0-23-server'
... but when I try to insmod it, dmesg outputs
[111853.094925] Unknown symbol hrtimer_init (err 0)
[111853.094931] Unknown symbol hrtimer_start (err 0)
[111853.094942] Unknown symbol hrtimer_cancel (err 0)
Those functions are externed inside of <linux/hrtimer.h>, and exported in kernel/hrtimer.c as follows:
/**
* hrtimer_init - initialize a timer to the given clock
* @timer: the timer to be initialized
* @clock_id: the clock to be used
* @mode: timer mode abs/rel
*/
void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
enum hrtimer_mode mode)
{
debug_init(timer, clock_id, mode);
__hrtimer_init(timer, clock_id, mode);
}
EXPORT_SYMBOL_GPL(hrtimer_init);
cat /proc/kallsyms | grep <func> for the three functions results in:
0000000000000000 T hrtimer_init
0000000000000000 T hrtimer_cancel
0000000000000000 T hrtimer_start
Can anyone help me figure out what's going on? It seems to me that all of the functions are exported and they should be able to be found, but for some reason they're not. Am I doing something stupid?
For anyone else trying to solve this problem, MODULE_LICENSE("GPL") must be in all of the module files, not just the main one.
Without that, the file that actually contained the function calls was restricted from accessing them by EXPORT_SYMBOL_GPL.

Resources