Google Breakpad catches the signal (a crash in native code called via JNI), but the app still dies afterwards. What can be done to prevent this?
log:
03-08 15:04:13.398: ERROR/NATIVE_LIB(2828): init breakpad
03-08 15:04:13.398: ERROR/NATIVE_LIB(2828): testing crash
03-08 15:04:13.468: ERROR/NATIVE_LIB(2828): Dump path: ./5f6097b2-5feb-0723-3271a7ff-2a4fcadd.dmp
03-08 15:04:13.468: WARN/crash_handler(2828): Caught a crash, signum=11
...
03-08 15:04:14.589: INFO/ActivityManager(544): Process name.antonsmirnov.android.app (pid 2828) has died.
code:
#include "native_lib.h"
#include <stdarg.h>
#include <stdio.h>
#include "client/linux/handler/exception_handler.h"
#include "client/linux/handler/minidump_descriptor.h"
#include <android/log.h>
void debug(const char *format, ...) {
    va_list argptr;
    va_start(argptr, format);
    __android_log_vprint(ANDROID_LOG_ERROR, "NATIVE_LIB", format, argptr);
    va_end(argptr);
}
bool DumpCallback(const google_breakpad::MinidumpDescriptor& descriptor,
                  void* context,
                  bool succeeded) {
    debug("Dump path: %s\n", descriptor.path());
    return succeeded;
}
JNIEXPORT jint JNICALL Java_name_antonsmirnov_android_app_libnative_func(JNIEnv *env, jobject obj)
{
    debug("init breakpad");
    google_breakpad::MinidumpDescriptor descriptor(".");
    google_breakpad::ExceptionHandler eh(descriptor, NULL, DumpCallback, NULL, true, -1);
    {
        debug("testing crash\n");
        char *ptr = 0;
        *ptr = '!'; // ERROR HERE!
        debug("unreachable\n");
    }
    debug("finished\n");
    return 0;
}
In some cases, there is no way to prevent your app from crashing, even if you use Breakpad or CoffeeCatch.
However, you will be notified before the process is terminated. You can use that time to warn the user about what is happening (a fatal error) and what will happen next (the app will force close).
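A minimal sketch of that idea in plain POSIX C, using sigaction() directly rather than Breakpad or CoffeeCatch (whose callbacks play the same role). The handler does only async-signal-safe work, here a write() to stderr, then restores the default action and re-raises the signal so the process still terminates. The names crash_notify_handler and install_crash_notifier are hypothetical. In a real Android app the handler would more likely leave a marker (for example, a file) that the app checks on its next launch, because calling JNI or UI code from a signal handler is not safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical notification handler. Only async-signal-safe calls are used:
   write(), signal() and raise(). */
static void crash_notify_handler(int signum)
{
    static const char msg[] = "fatal native error: the app will now close\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    /* Restore the default action and re-raise the signal. signum stays
       blocked while this handler runs, so the re-raised signal is delivered
       (and terminates the process) as soon as the handler returns. */
    signal(signum, SIG_DFL);
    raise(signum);
}

/* Installs the handler for SIGSEGV; returns 0 on success, -1 on error. */
static int install_crash_notifier(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_notify_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Re-raising with the default action, instead of simply returning, keeps the usual crash semantics: the system still sees the process die from the original signal.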
Related
I need to write a kernel module that creates a file and periodically writes a line of text to it. I implemented it, but while the module is running, at some point the system crashes and does not come back up.
#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/kernel.h>
#include <linux/timer.h>
MODULE_LICENSE("GPL");
#define BUF_LEN 255
#define TEXT "Hello from kernel mod\n"
int g_timer_interval = 10000;
static struct file *i_fp;
struct timer_list g_timer;
loff_t offset = 0;
char buff[BUF_LEN + 1] = TEXT;
void timer_rest(struct timer_list *timer)
{
    mod_timer(&g_timer, jiffies + msecs_to_jiffies(g_timer_interval));
    i_fp = filp_open("/home/hajol/Test.txt", O_RDWR | O_CREAT, 0644);
    kernel_write(i_fp, buff, strlen(buff), &offset);
    filp_close(i_fp, NULL);
}
static int __init kernel_init(void)
{
    timer_setup(&g_timer, timer_rest, 0);
    mod_timer(&g_timer, jiffies + msecs_to_jiffies(g_timer_interval));
    return 0;
}
static void __exit kernel_exit(void)
{
    pr_info("Ending");
    del_timer(&g_timer);
}
module_init(kernel_init);
module_exit(kernel_exit);
When the system crashes, you should get a very detailed error message from the kernel, letting you know where and why this happened (the "oops" message):
1. Read that error message.
2. Read it again.
3. Understand what it means (this often requires starting over from step 1 a couple of times :-) ).
One thing that jumps out at me is that you're not doing any error checking on the return value of filp_open. So you could very well be feeding an error pointer into kernel_write: on failure, filp_open returns ERR_PTR(-errno), not NULL, so the result has to be checked with IS_ERR().
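Because filp_open() signals failure with an error pointer rather than NULL, a plain if (!i_fp) check would not catch it. The sketch below is a userspace re-creation of the kernel's <linux/err.h> convention so that it can be compiled and run outside the kernel. ERR_PTR, PTR_ERR and IS_ERR mimic the kernel helpers of the same name, while fake_filp_open() and write_once() are hypothetical stand-ins, not kernel APIs. In the real module, the same IS_ERR() check would go right after filp_open() in timer_rest().

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace imitation of <linux/err.h>: the top MAX_ERRNO addresses
   encode negative errno values instead of real pointers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for filp_open(): fails with -ENOENT on request. */
static void *fake_filp_open(int should_fail)
{
    static int dummy_file;
    return should_fail ? ERR_PTR(-ENOENT) : (void *)&dummy_file;
}

/* The pattern the answer asks for: check before using the pointer. */
static int write_once(int should_fail)
{
    void *fp = fake_filp_open(should_fail);
    if (IS_ERR(fp)) {
        fprintf(stderr, "open failed: %ld\n", PTR_ERR(fp));
        return (int)PTR_ERR(fp);   /* propagate the negative errno */
    }
    /* kernel_write(fp, ...) and filp_close(fp, NULL) would go here. */
    return 0;
}
```

Note that IS_ERR(NULL) is false, which is exactly why a NULL check alone is not enough for functions that follow this convention.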
I'm trying to use QueueUserAPC to run a function asynchronously on a specific thread. My code works fine when compiled for x64, but when I compile and run it for x86, I get an access violation.
At the end of my post is a minimal, complete example that shows what I am trying to do (cleanup of thread and events omitted for brevity).
On my machine, when I compile and run for "x64", I get the expected output:
waiting...
async function!
waiting...
async function!
waiting...
ConsoleApplication3.exe (process 17100) exited with code 0.
When I compile and run for "x86", I get:
waiting...
async function!
And then the access violation, here:
if (WaitForSingleObjectEx(param, INFINITE, TRUE) == WAIT_OBJECT_0)
Exception thrown at 0x776227FB (ntdll.dll) in ConsoleApplication3.exe: 0xC0000005: Access violation reading location 0x36623194.
What am I doing wrong?
Full example:
#include "pch.h"
#include <iostream>
#include <windows.h>
#include <stdio.h>
#include <conio.h>
DWORD ThreadFunction(LPVOID param)
{
    while (true)
    {
        printf("waiting...\n");
        if (WaitForSingleObjectEx(param, INFINITE, TRUE) == WAIT_OBJECT_0)
            break;
    }
    ExitThread(0);
    return 0;
}
void AsyncFunction(UINT_PTR param)
{
    printf("async function!\n");
}
int main()
{
    HANDLE hThread, hStopEvent;
    DWORD threadID;
    hStopEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    hThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE) ThreadFunction, hStopEvent, 0, &threadID);
    Sleep(1000);
    QueueUserAPC((PAPCFUNC) AsyncFunction, hThread, NULL);
    Sleep(1000);
    QueueUserAPC((PAPCFUNC) AsyncFunction, hThread, NULL);
    Sleep(1000);
    SetEvent(hStopEvent);
    WaitForSingleObject(hThread, INFINITE);
}
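There are two mistakes in my AsyncFunction definition:
1 - The parameter type should be ULONG_PTR, not UINT_PTR. This was actually a copy-paste error from my real implementation.
2 - The calling convention is missing from the function. PAPCFUNC expects a __stdcall function (CALLBACK), but on x86 a function without an explicit convention defaults to __cdecl, so the callee does not pop its argument and the stack is left unbalanced after the APC returns. On x64, __stdcall and __cdecl are treated identically, which is why the same code works there.
So void AsyncFunction(UINT_PTR param) should be void CALLBACK AsyncFunction(ULONG_PTR param).
With that signature, there is no need for the cast to PAPCFUNC here (the cast is what hid the mismatch from the compiler):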
QueueUserAPC((PAPCFUNC) AsyncFunction, hThread, NULL);
Look at this Unix C program:
#include <stdio.h>
#include <signal.h>
void handler(int signum)
{
    printf("Handler signum=%d\n", signum);
}
int main(int argc, char *argv[])
{
    printf("Start\n");
    signal(SIGFPE, handler);
    int i = 10 / 0;
    printf("Next\n");
    return 0;
}
As you can see, I am connecting SIGFPE to a handler.
Then I trigger a division-by-zero error.
The handler is fired, which is great.
But the handler is called in a loop!
Why?
Thanks
If you simply return from your handler, execution resumes at the instruction that raised the signal, which performs the division by zero again, which calls the handler again, and so on. You need to arrange for execution to continue at some other point in the code. The traditional approach is to use setjmp/longjmp, something like this:
#include <stdio.h>
#include <signal.h>
#include <setjmp.h>
jmp_buf buf;
void handler(int signum)
{
    longjmp(buf, signum);
}
int main(int argc, char *argv[])
{
    int rc = setjmp(buf);
    if (rc == 0) {
        printf("Start\n");
        signal(SIGFPE, handler);
        int i = 10 / 0;
    }
    printf("Handler signum=%d\n", rc);
    printf("Next\n");
    return 0;
}
Note: this approach is very old school, and probably someone can suggest a better way to handle it. Also, you are probably better off calling sigaction rather than signal, as the semantics of signal are not consistent across different versions of Unix.
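Following that note, here is a minimal sketch of the same recovery idea built on sigaction() and sigsetjmp()/siglongjmp(). Passing a nonzero savemask to sigsetjmp() makes siglongjmp() restore the signal mask, so SIGFPE is not left blocked after the first recovery (whether plain longjmp() restores it is unspecified by POSIX). The helper names run_protected, faulty_work and clean_work are hypothetical. raise(SIGFPE) stands in for 10/0, because integer division by zero is undefined behaviour and does not trap on every CPU (on AArch64, for example, it simply yields 0).

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;
static volatile sig_atomic_t caught_signal;

static void fpe_handler(int signum)
{
    caught_signal = signum;
    /* Leave the handler instead of returning to the faulting instruction. */
    siglongjmp(recovery_point, 1);
}

/* Hypothetical helper: runs fn() with a SIGFPE handler installed.
   Returns 0 if fn() finished normally, SIGFPE if it raised SIGFPE,
   or -1 if the handler could not be installed. */
static int run_protected(void (*fn)(void))
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fpe_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;

    caught_signal = 0;
    /* savemask = 1: siglongjmp() restores the mask saved here. */
    if (sigsetjmp(recovery_point, 1) == 0)
        fn();

    sigaction(SIGFPE, &old, NULL);   /* put the previous disposition back */
    return caught_signal;
}

/* Example workloads (hypothetical). */
static void faulty_work(void) { raise(SIGFPE); }
static void clean_work(void) { }
```

Calling run_protected(faulty_work) twice in a row returns SIGFPE both times; with plain setjmp()/longjmp() on Linux, the second SIGFPE would stay blocked instead.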
I'm using Google Breakpad to catch invalid operations that raise a SIGSEGV signal. I expect the process to continue, but it is terminated by the Dalvik VM on Android. How can I keep the process from being terminated on Android?
Have you tried this?
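#define _GNU_SOURCE   /* must precede every #include; exposes REG_RAX */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>
int *p = NULL;
int n = 100;
/* x86-64 only: point RAX at n so the faulting load succeeds when retried */
void sighandler(int signo, siginfo_t *si, void *ctx)
{
    ucontext_t *context = (ucontext_t *)ctx;
    (void)si;
    printf("Handler executed for signal %d\n", signo);
    context->uc_mcontext.gregs[REG_RAX] = (greg_t)&n;
}
_your_amazing_method (JNIEnv *env, jobject obj, _your_args)
{
    struct sigaction sa;
    do_some_dirty_stuff();
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sighandler;
    sa.sa_flags = SA_SIGINFO;   /* pass siginfo_t and the ucontext to the handler */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    /* assumes the compiler loads p into %rax: ... movl (%rax), %esi ... */
    printf("%d\n", *p);
    return 0;
}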
I'm debugging a select loop that normally works fine but dies with a segmentation fault under heavy load. I've found that the program sometimes calls FD_ISSET() for a (valid) descriptor that was not added to the select set, as in the following snippet:
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
void die(const char* msg)
{
    fprintf(stderr, "fatal %s", msg);
    exit(1);
}
int main(void)
{
    FILE* file = fopen("/tmp/test", "r");
    if (file == NULL)
        die("fopen");
    int file_fd = fileno(file);
    fd_set read_fds;
    int max_fd = 0;
    FD_ZERO(&read_fds);
    // Only stdin is added to read_fds.
    FD_SET(0, &read_fds);
    if (select(max_fd + 1, &read_fds, NULL, NULL, NULL) < 0)
        die("select");
    if (FD_ISSET(0, &read_fds))
        printf("Can read from 0");
    // !!! Here FD_ISSET is called with a valid descriptor that was
    // not added to read_fds.
    if (FD_ISSET(file_fd, &read_fds))
        printf("Can read from file_fd");
    return 0;
}
It is obvious that the check marked with !!! should never return true, but could it be the cause of the segfault? When I run this snippet under valgrind, no errors are reported, but when I run my load test under valgrind I occasionally see errors like:
==25513== Syscall param select(writefds) points to uninitialised byte(s)
==25513== at 0x435DD2D: ___newselect_nocancel (syscall-template.S:82)
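FD_ISSET() only tests whether a file descriptor is part of the set read_fds, so calling it for a valid descriptor that was never added simply returns false and should not cause the segmentation fault.
Try checking the return value of select() and the errno value it sets before calling FD_ISSET. The segfault is more likely to be coming from the select call.
Also check that the file_fd value is less than FD_SETSIZE (not FD_MAX, which does not exist): FD_SET() and FD_ISSET() with a descriptor greater than or equal to FD_SETSIZE are undefined behaviour and access memory past the end of the fd_set, which becomes likely under heavy load, when many descriptors are open.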
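A minimal sketch of that range check, assuming the program wants to refuse out-of-range descriptors rather than corrupt memory. The wrappers safe_fd_set and safe_fd_isset are hypothetical helpers, not part of any library. Descriptors at or above FD_SETSIZE cannot be used with select() at all, so a program that may open that many files would need poll() or epoll instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/select.h>

/* FD_SET/FD_ISSET with fd outside [0, FD_SETSIZE) is undefined behaviour:
   it indexes past the end of the fd_set bit array. These hypothetical
   wrappers reject such descriptors instead of touching the set. */
static bool safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return false;   /* caller must fall back to poll()/epoll for this fd */
    FD_SET(fd, set);
    return true;
}

static bool safe_fd_isset(int fd, const fd_set *set)
{
    /* A valid, in-range descriptor that was never added just yields false. */
    return fd >= 0 && fd < FD_SETSIZE && FD_ISSET(fd, set);
}
```

Used in place of the bare macros in the loop, a descriptor that has grown past the limit shows up as a failed safe_fd_set() call instead of a silent out-of-bounds write.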