Generating core dumps - Linux

From time to time my Go program crashes.
I tried a few things to get core dumps generated for this program:
setting the ulimit on the system; I tried both ulimit -c unlimited and ulimit -c 10000 just in case. After launching my panicking program, I get no core dump.
I also added recover() support in my program and added code to log to syslog in case of panic, but I get nothing in syslog.
I am running out of ideas right now.
I must have overlooked something, but I cannot find what; any help would be appreciated.
Thanks ! :)

Note that a core dump is generated by the OS when a condition from a certain set is met. These conditions are pretty low-level — like trying to access unmapped memory or trying to execute an opcode the CPU does not know etc. Under a POSIX operating system such as Linux when a process does one of these things, an appropriate signal is sent to it, and some of them, if not handled by the process, have a default action of generating a core dump, which is done by the OS if not prohibited by setting a certain limit.
Now observe that this machinery treats a process on the lowest possible level (machine code), but the binaries a Go compiler produces are higher-level than those a C compiler (or assembler) produces, and this means certain errors in a process produced by a Go compiler are handled by the Go runtime rather than the OS. For instance, a typical NULL pointer dereference in a process produced by a C compiler usually results in the process being sent the SIGSEGV signal, which then typically results in an attempt to dump the process' core and terminate it. In contrast, when this happens in a process compiled by a Go compiler, the Go runtime kicks in and panics, producing a nice stack trace for debugging purposes.
With these facts in mind, I would try to do this:
Wrap your program in a shell script which first relaxes the limit for core dumps (but see below) and then runs your program with its standard error stream redirected to a file (or piped to the logger binary etc).
The limits a user can tweak have a hierarchy: there are soft and hard limits (see this and this for an explanation). So check that your system does not have a hard limit of 0 for the core dump size, as that would explain why your attempt to raise the limit has no effect (a small programmatic check is sketched below).
At least on my Debian systems, when a program dies due to SIGSEGV, this fact is logged by the kernel and is visible in the syslog log files, so try grepping them for hints.
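If it helps, the soft/hard limit check mentioned above can also be done programmatically; here is a minimal, standalone C sketch (a separate checker program, not part of your Go program) that prints both values for RLIMIT_CORE:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    /* RLIM_INFINITY shows up as -1 here */
    printf("soft core dump limit: %lld\n", (long long)rl.rlim_cur);
    printf("hard core dump limit: %lld\n", (long long)rl.rlim_max);
    return 0;
}

If the hard limit prints as 0, no amount of ulimit -c unlimited in an unprivileged shell will enable core dumps.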

First, please make sure all errors are handled.
For core dumps, you can refer to generate a core dump in linux.
You can use supervisor to restart the program when it crashes.

Related

Core files generated by linux kernel modules

I am trying to load an out-of-tree kernel module and dmesg shows a panic. The kernel is still up, though. I guess the module panicked.
Where can I find the core file? I want to use gdb and see what the problem is.
Where can I find the core file?
Core files are strictly a user-space concept.
I want to use gdb and see what the problem is.
You may be looking for KGDB and/or Kdump/Kexec.
Normally, whenever a core dump is generated, the shell will report "core dumped". This is an easy, high-level way to confirm that a core dump was generated; however, this message alone does not guarantee that a core file is actually available. The location where the core dump is written is specified through the core_pattern setting, passed to the kernel via sysctl, so you need to check what core_pattern contains on your system. Also note that on Ubuntu the core file size limit appears to be zero by default, which prevents core dumps from being generated, so check the core file size ulimit and change it with 'ulimit -c unlimited' if it is zero. The manpage http://man7.org/linux/man-pages/man5/core.5.html explains the various reasons for which a core dump may not be generated.
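If you prefer checking the pattern from code rather than with a shell one-liner, a rough sketch (it just reads and prints /proc/sys/kernel/core_pattern):

#include <stdio.h>

int main(void)
{
    char pattern[4096];
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    if (fgets(pattern, sizeof pattern, f) != NULL)
        printf("core_pattern: %s", pattern);  /* e.g. "core", or a pipe to a crash handler such as apport */
    fclose(f);
    return 0;
}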
However, from your explanation it appears that you are facing a kernel oops, as the kernel is still up (though in an unstable state) even though a particular module panicked or was killed. In such cases the kernel prints an oops message. Refer to https://www.kernel.org/doc/Documentation/oops-tracing.txt, which has information regarding kernel oops messages.
Excerpt from the link: Normally the Oops text is read from the kernel buffers by klogd and
handed to syslogd which writes it to a syslog file, typically
/var/log/messages (depends on /etc/syslog.conf). Sometimes klogd
dies, in which case you can run dmesg > file to read the data from the
kernel buffers and save it. Or you can cat /proc/kmsg > file, however
you have to break in to stop the transfer, kmsg is a "never ending
file".
printk is used to generate the oops messages. printk tags each message with a severity by means of different log levels/priorities, which allows messages to be classified according to their severity. (The different priorities are defined in linux/kernel.h or linux/kern_levels.h, in the form of macros like KERN_EMERG, KERN_ALERT, KERN_CRIT, etc.) So you may need to check the default logging levels on the system using cat /proc/sys/kernel/printk and change them as required. Also check whether the logging daemons are up, and in case you want to debug the kernel, ensure that the kernel is compiled with CONFIG_DEBUG_INFO.
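As a rough illustration of the severity tagging (a generic out-of-tree module skeleton; the module name and messages are made up):

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

static int __init demo_init(void)
{
    printk(KERN_INFO "demo: module loaded\n");        /* informational */
    printk(KERN_ERR  "demo: something went wrong\n"); /* error severity */
    return 0;
}

static void __exit demo_exit(void)
{
    printk(KERN_INFO "demo: module unloaded\n");
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

Whether a given message actually reaches the console depends on the current /proc/sys/kernel/printk thresholds.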
The method of using GDB to find the location where the kernel panicked or oopsed on Ubuntu is described at https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks, which is one approach you can use for debugging a kernel oops.
There won't be a core file.
You should follow the stack trace in the kernel messages; type dmesg to see it.

How to dump the heap of a running C++ process to a file under Linux?

I've got a program that is running on a headless/embedded Linux box, and under certain circumstances that program seems to be using up quite a bit more memory (as reported by top, etc) than I would expect it to use.
Since the fault condition is difficult to reproduce outside of the actual working environment, and since the embedded box doesn't have niceties like valgrind or gdb installed, what I'd like to do is simply write out the process's heap-memory to a file, which I could then transfer to my development machine and look through at my leisure, to see if I can tell from the contents of the file what kind of data it is that is taking up the bulk of the heap. If I'm lucky there might be a smoking gun like a repeating string or magic number that comes up a lot and points me to the place in my code that is either leaking or perhaps just growing a data structure without bounds.
Is there a good way to do this? The only way I can think of would be to force the process to crash and then collect a core dump, but since the fault condition is rare it would be preferable if I could collect the information without crashing the process as a side effect.
You can read the entire memory space of the process via /proc/pid/mem; you can read /proc/pid/maps to see what is where in the memory space (so you can find the bounds of the heap and read just that). You can attempt to read the data while the process is running (in which case it might be changing while you are reading it), or you can stop the process with a SIGSTOP signal and later resume it with SIGCONT.
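A rough sketch of that approach, assuming you are allowed to read the target's /proc/<pid>/mem (on kernels with ptrace restrictions this may require root or a prior ptrace attach), and assuming you stop the process with SIGSTOP first if you want a consistent snapshot; the program and file names here are made up:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <outfile>\n", argv[0]);
        return 1;
    }

    /* Find the bounds of the [heap] mapping in /proc/<pid>/maps. */
    char path[64];
    snprintf(path, sizeof path, "/proc/%s/maps", argv[1]);
    FILE *maps = fopen(path, "r");
    if (maps == NULL) { perror("maps"); return 1; }

    unsigned long start = 0, end = 0;
    char line[512];
    while (fgets(line, sizeof line, maps)) {
        if (strstr(line, "[heap]")) {
            sscanf(line, "%lx-%lx", &start, &end);
            break;
        }
    }
    fclose(maps);
    if (start == 0) { fprintf(stderr, "no [heap] mapping found\n"); return 1; }

    /* Copy that address range out of /proc/<pid>/mem into a file. */
    snprintf(path, sizeof path, "/proc/%s/mem", argv[1]);
    int mem = open(path, O_RDONLY);
    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (mem < 0 || out < 0) { perror("open"); return 1; }

    char buf[4096];
    off_t off = (off_t)start;
    while (off < (off_t)end) {
        ssize_t n = pread(mem, buf, sizeof buf, off);
        if (n <= 0)
            break;                 /* stop on error or EOF */
        if (write(out, buf, (size_t)n) != n)
            break;
        off += n;
    }
    close(mem);
    close(out);
    return 0;
}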

How do I ensure my Linux program doesn't produce core dumps?

I've got a program that keeps security-sensitive information (such as private keys) in memory, since it uses them over the lifetime of the program. Production versions of this program set RLIMIT_CORE to 0 to try to ensure that a core dump that might contain this sensitive information is never produced.
However, while this isn't mentioned in the core(5) manpage, the apport documentation on the Ubuntu wiki claims,
Note that even if ulimit is set to disable core files (by specifying a
core file size of zero using ulimit -c 0), apport will still capture
the crash.
Is there a way within my process (i.e., without relying on configuration of the system external to it) that I can ensure that a core dump of my process is never generated?
Note: I'm aware that there are plenty of methods (such as those mentioned in the comments below) where a user with root or process owner privileges could still access the sensitive data. What I'm aiming at here is preventing unintentional exposure of the sensitive data through it being saved to disk, being sent to the Ubuntu bug tracking system, or things like that. (Thanks to Basile Starynkevitch for making this explicit.)
According to the POSIX spec, core dumps only happen in response to signals whose action is the default action and whose default action is to "terminate the process abnormally with additional actions".
So, if you scroll down to the list in the description of signal.h, everything with an "A" in the "Default Action" column is a signal you need to worry about. Use sigaction to catch all of them and just call exit (or _exit) in the signal handler.
I believe these are the only ways POSIX lets you generate a core dump. Conceivably, Linux might have other "back doors" for this purpose; unfortunately, I am not enough of a kernel expert to be sure...
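A minimal sketch of the sigaction approach (the signal list is the usual set whose default action can produce a core dump; cross-check it against signal.h and core(5) on your target system, and call this in addition to setting RLIMIT_CORE to 0):

#include <signal.h>
#include <string.h>
#include <unistd.h>

static void no_core_handler(int sig)
{
    (void)sig;
    _exit(1);   /* terminate without the default core-dumping action */
}

static void catch_core_signals(void)
{
    static const int core_sigs[] = {
        SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGBUS,
        SIGFPE, SIGSEGV, SIGXCPU, SIGXFSZ, SIGSYS
    };
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = no_core_handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof core_sigs / sizeof core_sigs[0]; ++i)
        sigaction(core_sigs[i], &sa, NULL);
}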

How to set core dump naming scheme without su/sudo?

I am developing an MPI program on a Linux machine where I do not have sudo/su access. As my program currently segfaults, I would like to examine the core dumps via gdb. Unfortunately, as the program is multi-threaded, all the threads write to one core dump. So I would like to be able to append the PID to each separate core dump for every process.
I know there is a way to do it via /proc/sys/kernel/core_pattern, however I do not have access to write to this.
Thanks for any help.
It can be a pain to debug MPI apps on systems that are configured this way when you do not have root access. One option for working around this is to use Valgrind to get stack traces for your segfault(s). This will only be useful provided that your application will fail in a reasonable period of time when slowed down via Valgrind, and that it still segfaults at all in this case.
I usually run MPI apps under Valgrind like this:
% mpiexec -n 5 valgrind -q /path/to/my_app
That will send all of the Valgrind output to standard error. But if you want the output separated into different files, you can get a bit fancier:
% mpiexec -n 5 valgrind -q --log-file='vg_out.%q{PMI_RANK}' /path/to/my_app
That's the setup for MPICH2. I think that for Open MPI you'll need to replace PMI_RANK with OMPI_MCA_ns_nds_vpid, but if that doesn't work for you then you'll need to check with the Open MPI developers on their discussion list. In either case, this will yield N files, where N is the size of MPI_COMM_WORLD, named vg_out.0, vg_out.1, ..., vg_out.$(($N-1)), each corresponding to a rank in MPI_COMM_WORLD.

What is a good way to dump a Linux core file from inside a process?

We have a server (written in C and C++) that currently catches a SEGV and dumps some internal info to a file. I would like to generate a core file and write it to disk at the time we catch the SEGV, so our support reps and customers don't have to fuss with ulimit and then wait for the crash to happen again in order to get a core file. We have used the abort function in the past, but it is subject to the ulimit rules and doesn't help.
We have some legacy code that reads /proc/pid/map and manually generates a core file, but it is out of date and doesn't seem very portable (for example, I'm guessing it would not work in our 64-bit builds). What is the best way to generate and dump a core file in a Linux process?
Google has a library, called google-coredumper, for generating core dumps from inside a running process. This should ignore ulimit and other mechanisms.
The documentation for the call that generates the core file is here. According to the documentation, it seems that it is feasible to generate a core file in a signal handler, though it is not guaranteed to always work.
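A hedged sketch of what that looks like from a SIGSEGV handler; the header path and WriteCoreDump() name are taken from the project's documentation, so treat the details as approximate rather than tested:

#include <signal.h>
#include <unistd.h>
#include <google/coredumper.h>

static void segv_handler(int sig)
{
    (void)sig;
    WriteCoreDump("core.myserver");  /* writes a core file without terminating the process */
    _exit(1);
}

int main(void)
{
    signal(SIGSEGV, segv_handler);
    /* ... server main loop ... */
    return 0;
}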
I saw pmbrett's post and thought "hey, that's cool" but couldn't find that utility anywhere on my system (Gentoo).
So I did a bit of prodding, and discovered GDB has this option in it.
gdb --pid=4049 --batch -ex gcore
Seemed to work fine for me.
It's not, however, very useful because it traps the very lowest function that was in use at the time, but it still does a good job apart from that (with no memory limitations, I dumped a 350M snapshot of a Firefox process with it).
Try using the Linux command gcore
usage: gcore [-o filename] pid
You'll need to use system (or exec) and getpid() to build up the right command line to call it from within your process.
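For example, something along these lines (assuming gcore is installed and on PATH, and that ptrace permissions allow the spawned gcore to attach back to this process; the /tmp prefix is just for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void dump_core_with_gcore(void)
{
    /* gcore -o <prefix> <pid> writes <prefix>.<pid> without killing the process */
    char cmd[64];
    snprintf(cmd, sizeof cmd, "gcore -o /tmp/myserver %d", (int)getpid());
    if (system(cmd) != 0)
        fprintf(stderr, "gcore failed\n");
}

Note that system() is not async-signal-safe, so calling this from inside a SIGSEGV handler is technically undefined behaviour, even though it often works in practice.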
Some possible solutions^W ways of dealing with this situation:
Fix the ulimit!!!
Accept that you don't get a core file and run inside gdb, scripted to do a "thread apply all bt" on SIGSEGV
Accept that you don't get a core file and acquire a stack trace from within the application. The Stack Backtracing Inside Your Program article is pretty old, but the approach should still work these days.
You can also change the limit from within your program with setrlimit(2). Like the ulimit shell command, this can lower limits, or raise them as far as the hard limit allows. Call setrlimit() at startup to allow core dumping, and you're fine.
I assume you have a signal handler that catches SEGV, for example, and does something like print a message and call _exit(). (Otherwise, you'd have a core file in the first place!) You could do something like the following.
#include <signal.h>
#include <sys/resource.h>
#include <unistd.h>

void my_handler(int sig)
{
    /* ... print message, dump internal info, etc. ... */
    if (wantCore_ && !fork()) {
        struct rlimit rl;
        getrlimit(RLIMIT_CORE, &rl);
        rl.rlim_cur = rl.rlim_max;   /* ulimit -Sc: raise the soft limit as far as the hard limit allows */
        setrlimit(RLIMIT_CORE, &rl);
        signal(sig, SIG_DFL);        /* reset default handler */
        raise(sig);                  /* doesn't return, generates a core file */
    }
    _exit(1);
}
Use the backtrace and backtrace_symbols glibc calls to get the trace; just keep in mind that backtrace_symbols uses malloc internally, so in case of heap corruption it might fail.
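A small sketch; it uses backtrace_symbols_fd() instead of backtrace_symbols() precisely because it writes straight to a file descriptor and avoids malloc:

#include <execinfo.h>
#include <unistd.h>

static void print_stack_trace(void)
{
    void *frames[64];
    int n = backtrace(frames, 64);                    /* capture up to 64 return addresses */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);   /* symbolize without allocating */
}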
system ("kill -6 ")
I'd give it a try if you are still looking for something
