Relevant debug data for a Linux target

For an embedded ARM system running in the field, there is a need to retrieve relevant debug information when a user-space application crashes. This information will be stored in non-volatile memory so it can be retrieved at a later time. All of it must be captured at runtime, and third-party applications cannot be used due to memory consumption concerns.
So far I have thought of the following:
Signal ID and the corresponding PC / memory addresses when the kernel delivers a fatal signal;
Process ID;
What other information do you think is relevant in order to identify the cause of the problem and enable fast debugging afterwards?
Thank you!

Usually, to be able to understand an issue, you'll need every register (from r0 to r15), the CPSR, and the top of the stack (to be able to determine what happened before the crash). Please also note that when your program is interrupted for an invalid operation (a jump to an invalid address, ...), the processor goes into an exception mode, whereas you need to dump the registers and the stack in the context of your process.
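As a minimal sketch of how that dump can be captured from a signal handler (assuming glibc on ARM Linux; the output path and the 256-byte stack snapshot size are arbitrary choices, and only async-signal-safe calls are used inside the handler):

/* Minimal sketch (glibc, ARM Linux): dump signal ID, fault address,
 * r0-r15 + CPSR, and the top of the stack from a SIGSEGV handler,
 * in the context of the crashing process. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

static void crash_handler(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = (ucontext_t *)ctx;
    /* Append to a file on non-volatile storage. */
    int fd = open("/var/log/crash.bin", O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd >= 0) {
        write(fd, &sig, sizeof sig);                         /* signal ID */
        write(fd, &info->si_addr, sizeof info->si_addr);     /* faulting address */
        write(fd, &uc->uc_mcontext, sizeof uc->uc_mcontext); /* r0-r15, CPSR */
        /* Top of the user stack; this write may fail if SP is corrupted. */
        write(fd, (const void *)uc->uc_mcontext.arm_sp, 256);
        close(fd);
    }
    _exit(128 + sig); /* don't return, or the faulting instruction re-runs */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    return *(volatile int *)0; /* deliberate crash to exercise the handler */
}

Installing the handler on an alternate stack with sigaltstack() (and adding SA_ONSTACK to sa_flags) makes it survive stack-overflow crashes as well.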
To be able to investigate using this data, you must also keep the ELF files from your build (with debug information, if possible), so that you can interpret the contents of your registers and stack, for example with addr2line or objdump -d on the saved PC value.
In the end, the more information you keep, the easier the debugging is, but it may be expensive to keep every memory section used by your program at the time of the failure (as a matter of fact, I've never done this).
In postmortem analysis, you will face some limits:
Dynamically linked libraries: if your crash occurs in dynamically loaded and linked code, you will also need the library binaries you are using on your target.
Memory corruption: memory corruption usually results in random data being executed as code. On ARM with Linux, this will probably lead to a segfault, since you can't jump into another process's memory area and your data will probably be marked as never-execute. Nevertheless, by the time the crash happens, you may already have corrupted the data that could have allowed you to identify the source of the corruption. Postmortem analysis isn't always able to identify the failure cause.

Related

Why is eBPF said to be safer than LKM?

When people talk about the advantages of eBPF, they always mention that it is safer than an LKM.
I have read some documentation: eBPF ensures safety by verifying the code before it is loaded.
These are the checks the verifier performs:
loops
out of range jumps
unreachable instructions
invalid instructions
uninitialized register access
uninitialized stack access
misaligned stack access
out of range stack access
invalid calling convention
I can understand most of these checks, but are these really all the ways an LKM can cause a kernel panic? Is passing these checks enough to ensure safety?
I have 120000 servers in production, and this question is the only thing preventing me from migrating from a traditional HIDS to an eBPF HIDS. If it can cause a kernel panic at that scale, even once, our business will be over.
Yes, as far as I know, the BPF verifier is meant to prevent any sort of kernel crash. That however doesn't mean you can't break things unintentionally in production. You could for example freeze your system by attaching BPF programs to all kernel functions or lose all connectivity by dropping all received packets. In those cases, the verifier has no way to know that you didn't mean to perform those actions; it won't stop you.
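To make that concrete, here is a sketch of the drop-all-packets case (the program and section names are illustrative): an XDP program that passes verification trivially, because it provably cannot crash the kernel, yet takes the machine off the network if you attach it to your interfaces.

// Verifier-safe but operationally catastrophic: drops every packet.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int drop_everything(struct xdp_md *ctx)
{
    return XDP_DROP; /* safe for the kernel, fatal for connectivity */
}

char LICENSE[] SEC("license") = "GPL";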
That being said, any sort of verification is better than no verification as in traditional kernel modules. With kernel modules, not only can you shoot yourself in the foot as I've described above, but you could also crash the whole system because of a subtle bug somewhere in the code.
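For contrast, a kernel module needs only one subtle bug to take the whole machine down; a minimal sketch (obviously, do not load this anywhere you care about):

#include <linux/init.h>
#include <linux/module.h>

static int __init oops_init(void)
{
    int *p = NULL;
    *p = 42; /* NULL dereference in kernel context: instant oops */
    return 0;
}

module_init(oops_init);
MODULE_LICENSE("GPL");

Nothing checks this module before it runs; the first (and only) thing it does is crash the kernel.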
Regardless of what you're using, you should obviously test it extensively before deploying to production.

Linux kernel module to monitor a particular process

I would like to write a kernel module in Linux that can monitor all the memory accesses made by a particular process (that I specify by name in the kernel module). I would also like to keep track of all the signals generated by the process and log all memory accesses that result in page faults, as well as memory accesses that cause a TRAP or a SEGV. How could I go about doing this? Could you point me towards any resources that could get me started?
Well, if you have never written a kernel module before, this might be a great start:
https://web.archive.org/web/20180901094541/http://www.freesoftwaremagazine.com/articles/drivers_linux?page=0%2C2
From there you basically want to grab process information and output it, perhaps by creating some kind of /proc entry.
But you should know this isn't really something you need kernel mode for; you could probably do it easily right from user space.
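If you do go the module route, here is a minimal sketch of the /proc idea (assuming a kernel recent enough for the proc_ops API, 5.6+; the entry name procmon is arbitrary, and a real monitor would filter on the target's name and hook page faults and signals, e.g. via kprobes, rather than just listing tasks):

/* Minimal sketch: expose a /proc entry that lists the pid and name
 * of every process. Build as an out-of-tree module. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/sched/signal.h>

static int monitor_show(struct seq_file *m, void *v)
{
    struct task_struct *task;

    rcu_read_lock(); /* the task list must be walked under RCU */
    for_each_process(task)
        seq_printf(m, "%d\t%s\n", task->pid, task->comm);
    rcu_read_unlock();
    return 0;
}

static int monitor_open(struct inode *inode, struct file *file)
{
    return single_open(file, monitor_show, NULL);
}

static const struct proc_ops monitor_ops = {
    .proc_open    = monitor_open,
    .proc_read    = seq_read,
    .proc_lseek   = seq_lseek,
    .proc_release = single_release,
};

static int __init monitor_init(void)
{
    return proc_create("procmon", 0, NULL, &monitor_ops) ? 0 : -ENOMEM;
}

static void __exit monitor_exit(void)
{
    remove_proc_entry("procmon", NULL);
}

module_init(monitor_init);
module_exit(monitor_exit);
MODULE_LICENSE("GPL");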

How to dump the heap of running C++ process to a file under Linux?

I've got a program that is running on a headless/embedded Linux box, and under certain circumstances that program seems to be using up quite a bit more memory (as reported by top, etc) than I would expect it to use.
Since the fault condition is difficult to reproduce outside of the actual working environment, and since the embedded box doesn't have niceties like valgrind or gdb installed, what I'd like to do is simply write out the process's heap-memory to a file, which I could then transfer to my development machine and look through at my leisure, to see if I can tell from the contents of the file what kind of data it is that is taking up the bulk of the heap. If I'm lucky there might be a smoking gun like a repeating string or magic-number that comes up a lot, that points me to the place in my code that is either leaking or perhaps just growing a data structure without bounds.
Is there a good way to do this? The only way I can think of would be to force the process to crash and then collect a core dump, but since the fault condition is rare it would be preferable if I could collect the information without crashing the process as a side effect.
You can read the entire memory space of the process via /proc/pid/mem. You can read /proc/pid/maps to see what is where in the memory space (so you can find the bounds of the heap and read just that region). You can attempt to read the data while the process is running (in which case it might be changing while you are reading it), or you can stop the process with a SIGSTOP signal and later resume it with SIGCONT.
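A minimal sketch of such a heap dumper (assuming the caller is allowed to read the target's /proc/pid/mem, i.e. root or the same user; on kernels with Yama ptrace restrictions you may additionally need a PTRACE_ATTACH first):

/* Copy a process's [heap] region to a file via /proc/<pid>/maps
 * and /proc/<pid>/mem, stopping the target while reading. */
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <outfile>\n", argv[0]);
        return 1;
    }
    pid_t pid = (pid_t)atoi(argv[1]);
    char path[64], line[256];
    unsigned long start = 0, end = 0;

    /* Find the [heap] mapping in /proc/<pid>/maps. */
    snprintf(path, sizeof path, "/proc/%d/maps", (int)pid);
    FILE *maps = fopen(path, "r");
    if (!maps) { perror("maps"); return 1; }
    while (fgets(line, sizeof line, maps))
        if (strstr(line, "[heap]") && sscanf(line, "%lx-%lx", &start, &end) == 2)
            break;
    fclose(maps);
    if (!start) { fprintf(stderr, "no [heap] mapping found\n"); return 1; }

    kill(pid, SIGSTOP); /* freeze the target for a consistent snapshot */

    snprintf(path, sizeof path, "/proc/%d/mem", (int)pid);
    int mem = open(path, O_RDONLY);
    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (mem >= 0 && out >= 0) {
        char buf[4096];
        for (unsigned long off = start; off < end; off += sizeof buf) {
            ssize_t n = pread(mem, buf, sizeof buf, (off_t)off);
            if (n <= 0) break;
            write(out, buf, (size_t)n);
        }
    }
    if (mem >= 0) close(mem);
    if (out >= 0) close(out);

    kill(pid, SIGCONT); /* let the target run again */
    return 0;
}

Once the file is on the development machine, a hex dump or strings(1) will often reveal the repeating payload that dominates the heap.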

3.10 kernel crash BUG() in mark_bootmem()

I get a kernel crash at BUG() here - http://lxr.free-electrons.com/source/mm/bootmem.c?v=3.10#L385 with the following message
kernel BUG at /kernel/mm/bootmem.c:385!
What could be a possible reason for this?
Following is the function call trace
[<c0e165f8>] (mark_bootmem+0xd0/0xe0) from [<c0e05d64>] (bootmem_init+0x16c/0x264)
[<c0e05d64>] (bootmem_init+0x16c/0x264) from [<c0e07980>] (paging_init+0x734/0x7d4)
[<c0e07980>] (paging_init+0x734/0x7d4) from [<c0e03f20>] (setup_arch+0x3e8/0x69c)
[<c0e03f20>] (setup_arch+0x3e8/0x69c) from [<c0e007d8>] (start_kernel+0x78/0x370)
[<c0e007d8>] (start_kernel+0x78/0x370) from [<10008074>] (0x10008074)
Thanks
The mm/bootmem.c file implements the boot memory allocator. The function mark_bootmem marks the memory pages between the start and end addresses (start is rounded down and end is rounded up to page boundaries) as reserved for this allocator (or as not reserved, when used for freeing).
It iterates over bdata_list, trying to find a region containing the first page of the requested address range. If it can't find one, the BUG() you mentioned is triggered. The same BUG() is triggered if it does find the region but the region is not large enough (end lies outside of it). So this BUG() means that the kernel wasn't able to find the requested memory region to mark.
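A small user-space model of that logic (an illustration of the check described above, not the actual kernel source) shows exactly when the BUG() fires:

/* Model: mark_bootmem() must find every page of [start, end) inside
 * some registered bootmem region; any uncovered remainder hits BUG(). */
#include <assert.h>
#include <stdio.h>

struct region { unsigned long min_pfn, max_pfn; };

/* On a typical UMA system there is a single region: all of lowmem. */
static struct region bdata_list[] = { { 0x100, 0x8000 } };

static void mark_bootmem(unsigned long start, unsigned long end)
{
    unsigned long pos = start;
    size_t i;

    for (i = 0; i < sizeof bdata_list / sizeof *bdata_list; i++) {
        struct region *r = &bdata_list[i];
        unsigned long max;

        if (pos < r->min_pfn || pos >= r->max_pfn)
            continue; /* first page not inside this region */
        max = end < r->max_pfn ? end : r->max_pfn;
        printf("marking pfns %#lx-%#lx\n", pos, max);
        if (max == end)
            return;   /* whole range covered: OK */
        pos = max;    /* keep looking for the rest of the range */
    }
    assert(!"BUG(): range not covered by any bootmem region");
}

int main(void)
{
    mark_bootmem(0x200, 0x400);  /* fits in the region: fine */
    mark_bootmem(0x200, 0x9000); /* end beyond the region: BUG() */
    return 0;
}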
Now, if I understand the kernel code correctly, on normal UMA systems there will be only one entry in bdata_list, and it should describe the range of lowmem pages available in the system. Since you didn't provide much information about your system, it's hard to guess the exact reason for the problem, but in general it seems that your memory setup is broken. This is very architecture-specific, so it's hard to tell what exactly is going on.

Minimal core dump (stack trace + current frame only)

Can I configure what goes into a core dump on Linux? I want to obtain something like the Windows mini-dumps (minimal information about the stack frame when the app crashed). I know you can set a max size for the core files using ulimit, but this does not allow me to control what goes inside the core (i.e. there is no guarantee that if I set the limit to 64kb it will dump the last 16 pages of the stack, for example).
Also, I would like to set it in a programmatic way (from code), if possible.
I have looked at the /proc/PID/coredump_filter file mentioned by man core, but it seems too coarse grained for my purposes.
To provide a little context: I need tiny core files for multiple reasons. I need to collect them over the network from numerous (thousands of) clients; furthermore, these are embedded devices with small SD cards and GPRS modems for the network connection. So anything above ~200 kB is out of the question.
EDIT: I am working on an embedded device which runs Linux 2.6.24. The processor is PowerPC. Unfortunately, powerpc-linux is not supported in Breakpad at the moment, so Google Breakpad is not an option.
I have "solved" this issue in two ways:
I installed a signal handler for SIGSEGV and used backtrace/backtrace_symbols to print out the stack trace (a sketch follows after these two points). I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names, while keeping the executable compact enough.
I stripped the debug info and put it in a separate file, which I will store somewhere safe, using strip; from there, I will use addr2line with the addresses saved from the backtrace to understand where the problem happened. This way I have to store only a few bytes.
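The handler from the first point could look roughly like this (a sketch, assuming glibc's execinfo.h; backtrace_symbols_fd is used rather than backtrace_symbols because it avoids calling malloc inside the handler):

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

static void segv_handler(int sig)
{
    void *frames[32];
    int n = backtrace(frames, 32);

    /* With -rdynamic, symbol names survive stripping; the raw
     * addresses can later be fed to addr2line against the saved
     * debug file. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    _exit(128 + sig);
}

int main(void)
{
    signal(SIGSEGV, segv_handler);
    return *(volatile int *)0; /* deliberate crash for demonstration */
}

On the development machine, addr2line -e app.debug <address> then maps each raw address back to a file and line.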
Alternatively, I found I could use /proc/self/coredump_filter to dump no memory (setting its contents to "0"): only thread and process info, registers, stack trace, etc. are saved in the core. See more in this answer.
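Setting the filter from code is just a write to that proc file; a minimal sketch (the mask bits are documented in man 5 core):

#include <stdio.h>

/* Write a coredump_filter mask for the calling process;
 * 0 means "dump no memory mappings at all". */
static int set_coredump_filter(unsigned long mask)
{
    FILE *f = fopen("/proc/self/coredump_filter", "w");
    if (!f)
        return -1;
    fprintf(f, "%lx", mask);
    return fclose(f);
}

int main(void)
{
    set_coredump_filter(0);
    /* ... run the application; any core dumped from now on is tiny ... */
    return 0;
}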
I still lose information that could be precious (the contents of globals and locals, parameters, ...). I could easily figure out which page(s) to dump, but unfortunately there is no way to specify a "dump these pages" list for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).
For now, I'm quite happy with these two solutions (it is better than nothing...). My next moves will be to:
see how difficult it would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux ports, so... how hard can it be? :)
try to use google-coredumper to dump only a few pages around the current stack pointer (that should give me locals and parameters) and around "&some_global" (that should give me globals).
