Linux C and C++: what else should I be logging when handling signals like SIGSEGV?

Linux C and C++: what else should I be logging when handling signals like SIGSEGV? - linux

Working on some linux (Ubuntu) systems, running some in-house C and C++ apps (gcc).
There is a long list of signals which are handled, such as SIGSEGV and SIGINT. On signal, the callstack is obtained using backtrace(3) and backgrace_symbols(3). For C++ the function names are even demangled with abi::__cxa_demangle().
My question is: when these signals come up, what other C/C++ API is there which would give us more useful information to log for debugging after-the-fact? Or is the backtrace the only 'sexy' thing to do?

You may want to enable core dumps... ulimit -c unlimited or similar. Then you can load the core file into GDB and see what happened to the program.

Related

How to find reason for SEGFAULT in large multi-thread C program?

I am working with a multi-thread program for Linux embedded systems that crashes very randomly through SEGFAULT signal.
I need to find the issue WITHOUT using gdb since the crash occurs only under production environment, never during testing.
I know the symbol table of the program and I'm using sigaction() and backtrace() in main thread but I don't get enough information. The backtraced lines are from sigaction function itself. I allow 50 frames to be captured and I use -g flag in gcc for compilation:
Caught segfault at address 0xe76acc
Obtained 3 stack frames.
./mbca(_Z11print_tracev+0x20) [0x37530]
./mbca(_Z18segfault_sigactioniP7siginfoPv+0x34) [0x375f4]
/lib/libc.so.6(__default_rt_sa_restorer_v2+0) [0x40db5a60]
As the program is running 15 threads, I would like to get a clue about from which one is coming the signal so I can limit the possibilities. FYI Main thread creates a fork and that fork creates the remaining 14 threads.
How can I achive this? What can I do with the information I already have?
Thank you all for your help
PD: I also tried Core-dump file but it is not generated because this option was not included in kernel compilation and I cannot modify it.

How to collect a minimum debug data from a truncated core of Linux C program? with or without GDB?

How to collect a minimum debug data from a truncated core of Linux C program?
I tried some hours to reproduce the segmentation-fault signal with the same program but I did not succed. I only succeded to get it once. I don't know how to extract a minimum information with gdb. I thougth I was experienced with gdb but now I think I have to learn a lot again...
Perhaps someone knows another debugger than GDB to try something with. By minimum information I mean the calling function which produces a segfault signal.
My core file is 32MB and GDB indicates the core allocated memory requiered is 700MB. How to know if the deeper stack function concerned by the segfault is identified inside the core file or not?
From the name of the file I already know the concerned thread name but it is not enougth to debug the program.
I collected the /proc/$PID/maps file of the main program but I don't know if it is usefull to retrieve the segfault function.
Moreover, how to know if the segfault signal was produced from inside the thread or if it came from outside the thread?

Generating core dumps

From times to times my Go program crashes.
I tried a few things in order to get core dumps generated for this program:
defining ulimit on the system, I tried both ulimit -c unlimited and ulimit -c 10000 just in case. After launching my panicking program, I get no core dump.
I also added recover() support in my program and added code to log to syslog in case of panic but I get nothing in syslog.
I am running out of ideas right now.
I must have overlooked something but I do not find what, any help would be appreciated.
Thanks ! :)

Note that a core dump is generated by the OS when a condition from a certain set is met. These conditions are pretty low-level — like trying to access unmapped memory or trying to execute an opcode the CPU does not know etc. Under a POSIX operating system such as Linux when a process does one of these things, an appropriate signal is sent to it, and some of them, if not handled by the process, have a default action of generating a core dump, which is done by the OS if not prohibited by setting a certain limit.
Now observe that this machinery treats a process on the lowest possible level (machine code), but the binaries a Go compiler produces are more higher-level that those a C compiler (or assembler) produces, and this means certain errors in a process produced by a Go compiler are handled by the Go runtime rather than the OS. For instance, a typical NULL pointer dereference in a process produced by a C compiler usually results in sending the process the SIGSEGV signal which is then typically results in an attempt to dump the process' core and terminate it. In contrast, when this happens in a process compiled by a Go compiler, the Go runtime kicks in and panics, producing a nice stack trace for debugging purposes.
With these facts in mind, I would try to do this:
Wrap your program in a shell script which first relaxes the limit for core dumps (but see below) and then runs your program with its standard error stream redirected to a file (or piped to the logger binary etc).
The limits a user can tweak have a hierarchy: there are soft and hard limits — see this and this for an explanation. So try checking your system does not have 0 for the core dump size set as a hard limit as this would explain why your attempt to raise this limit has no effect.
At least on my Debian systems, when a program dies due to SIGSEGV, this fact is logged by the kernel and is visible in the syslog log files, so try grepping them for hints.

First, please make sure all errors are handled.
For core dump, you can refer generate a core dump in linux
You can use supervisor to reboot the program when it crashes.

Address of instruction causing SIGSEGV in external program

I want to get address of instruction that causes external program to SIGSEGV. I tried using ptrace for this, but I'm getting EIP from kernel space (probably default signal handler?). How GDB is able to get the correct EIP?
Is there a way to make GDB provide this information using some API?
edit:
I don't have sources of the program, only binary executable. I need automation, so I can't simply use "run", "info registers" in GDB. I want to implement "info registers" in my own mini-debugger :)

You can attach to a process using ptrace. I found an article at Linux Gazette.
It looks like you will want PTRACE_GETREGS for the registers. You will want to look at some example code like strace to see how it manages signal handling and such. It looks to me from reading the documentation that the traced child will stop at every signal and the tracing parent must wait() for the signal from the child then command it to continue using PTRACE_CONT.

Compile your program with -g, run gdb <your_app>, type run and the error will occur. After that use info registers and look in the rip register.
You can use objectdump -D <your_app> to get some more information about the code at that position.

You can enable core dumps with ulimit -c unlimited before running your external program.
Then you can examine the core dump file after a crash using gdb /path/to/program corefile
Because it is binary and not compiled with debugging options you will have to view details at the register and machine code level.

Try making a core dump, then analyse it with gdb. If you meant you wanted to make gdb run all your commands at one touch of a key by 'automate', gdb ca do that too. Type your commands into a file and look into the help user-defined section of manuals, gdb can handle canned commands.

What is a good way to dump a Linux core file from inside a process?

We have a server (written in C and C++) that currently catches a SEGV and dumps some internal info to a file. I would like to generate a core file and write it to disk at the time we catch the SEGV, so our support reps and customers don't have to fuss with ulimit and then wait for the crash to happen again in order to get a core file. We have used the abort function in the past, but it is subject to the ulimit rules and doesn't help.
We have some legacy code that reads /proc/pid/map and manually generates a core file, but it is out of date, and doesn't seem very portable (for example, I'm guessing it would not work in our 64 bit builds). What is the best way to generate and dump a core file in a Linux process?

Google has a library for generating coredumps from inside a running process called google-coredumper. This should ignore ulimit and other mechanisms.
The documentation for the call that generates the core file is here. According to the documentation, it seems that it is feasible to generate a core file in a signal handler, though it is not guaranteed to always work.

I saw pmbrett's post and thought "hey, thats cool" but couldn't find that utility anywhere on my system ( Gentoo ).
So I did a bit of prodding, and discovered GDB has this option in it.
gdb --pid=4049 --batch -ex gcore
Seemed to work fine for me.
Its not however very useful because it traps the very lowest function that was in use at the time, but it still does a good job outside that ( With no memory limitations, Dumped 350M snapshot of a firefox process with it )

Try using the Linux command gcore
usage: gcore [-o filename] pid
You'll need to use system (or exec) and getpid() to build up the right command line to call it from within your process

Some possible solutions^W ways of dealing with this situation:
Fix the ulimit!!!
Accept that you don't get a core file and run inside gdb, scripted to do a "thread all apply bt" on SIGSEGV
Accept that you don't get a core file and acquired a stack trace from within the application. The Stack Backtracing Inside Your Program article is pretty old but it should be possible these days too.

You can also change the ulimit() from within your program with setrlimit(2). Like the ulimit shell command, this can lower limits, or raise them as hard as the hard limit allows. At startup setrlimit() to allow core dumping, and you're fine.

I assume you have a signal handler that catches SEGV, for example, and does something like print a message and call _exit(). (Otherwise, you'd have a core file in the first place!) You could do something like the following.
void my_handler(int sig)
{
...
if (wantCore_ && !fork()) {
setrlimit(...); // ulimit -Sc unlimited
sigset(sig, SIG_DFL); // reset default handler
raise(sig); // doesn't return, generates a core file
}
_exit(1);
}

use backtrace and backtrace_symbols glibc calls to get the trace, just keep in mind that backtrace_symbols uses malloc internally and in case of heap corruption it might fail.

system ("kill -6 ")
I'd give it a try if you are still looking for something

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string