Core-dump file format - linux

I have written a custom core-dump handling application for a project. I have changed '/proc/sys/kernel/core_pattern' to call my dump handler, and it is invoked successfully.
Now the issue is saving the core dump into a file that gdb can recognize. Currently my dump handler reads the dump from STDIN and saves it into a file 'core.dump'. When I try to load this core dump into gdb, it gives me this error:
(gdb) ... is not a core dump: File format not recognized
When I run the 'file' command on a standard core dump, it gives me the following:
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from './dump_gen'
For the custom-generated dump, 'file' gives the following:
core.dump: data
Can anyone help me write the core dump correctly so that it can be used in gdb?
PS: I don't want to use the standard core dump file.

I suspect your handler somehow doesn't write all of the data to the core file.
Create a simple script, make it executable, and set the core pattern to pipe into the script (the pattern must start with '|' followed by the script's path).
#!/bin/sh
cat > /tmp/core.$$
Now generate a core file (for example, run sleep 1243 and press Ctrl+\) and it should work.
I just tested it myself on my system and it works without a problem.
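The script works because cat keeps reading until end of input. A C handler has to do the same thing. It should keep calling read() until it returns 0, and it should loop on write(), because a pipe can deliver the dump in many short chunks. Stopping early leaves a truncated file. The sketch below is minimal and assumes the dump arrives on standard input, as it does for the script. copy_all() is a hypothetical helper name, not a library function.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd (hypothetical helper).
 * Returns the number of bytes copied, or -1 on error.
 * A core dump arrives through a pipe, so read() may return less than
 * the buffer size; only a return value of 0 means end of input. */
long long copy_all(int in_fd, int out_fd)
{
    char buf[65536];
    long long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of the dump */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may also be partial, so loop until this chunk is out */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

In the handler's main() you would open the output file and call copy_all(STDIN_FILENO, fd). Afterwards, file should report an ELF core file rather than "data".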

The first thing that comes to mind to check is the e_type field in the ELF header, which indicates what kind of file it is. Its main values are relocatable (unlinked) object, executable, shared object and core file. That's most likely what's causing the gdb error.
Also, try examining the file with objdump: it can pull apart the entire ELF file so you can see which part of it looks wrong.
You can find the ELF spec at https://refspecs.linuxbase.org/elf/elf.pdf
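The check described above can be sketched in C using the constants from glibc's `<elf.h>`. A custom dump that file reports as "data" fails even the ELF magic-number test, before e_type is ever consulted. This is a minimal sketch that assumes you have already read the first bytes of the file into memory. elf_file_type() is a hypothetical helper.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the ELF e_type (ET_REL, ET_EXEC, ET_DYN, ET_CORE, ...) of a file
 * whose first LEN bytes are in BUF, or -1 if BUF is not ELF.
 * e_type directly follows e_ident (offset 16) in both the 32-bit and the
 * 64-bit header, stored in the byte order given by e_ident[EI_DATA]. */
int elf_file_type(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT + 2 || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                      /* no ELF magic: `file` says "data" */

    const unsigned char *t = buf + EI_NIDENT;
    switch (buf[EI_DATA]) {
    case ELFDATA2LSB: return t[0] | (t[1] << 8);
    case ELFDATA2MSB: return (t[0] << 8) | t[1];
    default:          return -1;        /* unknown byte order */
    }
}
```

For a file gdb can load as a core, this should return ET_CORE. readelf -h reports the same field as "Type: CORE (Core file)".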

Related

GDB - Loading Debug information from external ".sym" files

I am attempting to carry out postmortem analysis of a crashed binary, "TestApp", on a linux system.
I have a copy of the binaries and shared objects that are copied onto the device, in the path:
/usr/public/target
This folder contains all the binaries in question in the directory structure used on the system under test, i.e.:
/usr/public/target/sbin/TestApp
/usr/public/target/lib/TestAppLib.so
/usr/public/target/usr/lib/TestAppAPILib.so
The automated build process strips the debug information from the binaries and stores it in external symbol files, all under:
/usr/public/target_external_symbols
So the symbol information for the above binaries would exist in files named:
/usr/public/target_external_symbols/sbin/TestApp.sym
/usr/public/target_external_symbols/lib/TestAppLib.so.sym
/usr/public/target_external_symbols/usr/lib/TestAppAPILib.so.sym
How do I get GDB to be aware of the existence of these external symbols and to load them?
I typically invoke GDB via:
gdb TestApp TestApp.core
I've referred to other articles on creating a test .gdbinit file and passing it to GDB via the -command argument, but that doesn't appear to work. Every time I try to get a backtrace from my core file, GDB indicates that it cannot open the debug symbols. Any help in resolving this is appreciated.
(gdb) info shared
From To Syms Read Shared Object Library
0x78000000 0x780061e8 Yes (*) /usr/public/target/lib/TestAppLib.so
0x78010000 0x7806e60c Yes (*) /usr/public/target/usr/lib/TestAppAPILib.so
0x78070000 0x78091d2c Yes (*) /usr/public/target/lib/libm.so.2
(*): Shared library is missing debugging information.
Thank you.
The commands set debug-file-directory, symbol-file and add-symbol-file load debugging symbols from within a gdb session. The last one may require the address at which the shared library was loaded into memory.
Perhaps your build process added GNU debuglinks to your binaries. If so, each executable contains a .gnu_debuglink section naming its separate debug file, and gdb uses that name to decide where to look for debug symbols. More can be found here.
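Besides the debug file's name, a .gnu_debuglink section stores a CRC-32 of that file's contents. gdb does not use a candidate debug file whose checksum does not match. The GDB manual describes this checksum as the same CRC-32 used by zlib/gzip. The bitwise sketch below could be used to check whether a .sym file still matches its binary. debuglink_crc32() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32 with the reflected polynomial 0xEDB88320 (as in zlib/gzip),
 * the checksum stored at the end of a .gnu_debuglink section.
 * Pass crc = 0 for the first chunk and the previous return value for
 * later chunks of the same file. */
uint32_t debuglink_crc32(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Compute it over the whole .sym file, then compare the result with the 4-byte value at the end of the section. readelf -x .gnu_debuglink TestApp dumps that section.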

Find which program caused a core dump file

I've been doing a lot of program/package installation recently, so I can't tell for sure which of the newly installed (or old) programs caused a core file to appear in my home folder. It's a server, so I'd better find any possible sources of instability on the machine.
You can simply use the file program to identify them, e.g.:
# file /var/core/core
/var/core/core: ELF 64-bit MSB core file SPARCV9 Version 1, from 'crs_stat.bin'
Often, running the file program on the core file will show the errant executable, as explained by @Benj in the accepted answer.
However, sometimes you may get a complaint about "too many program header sections":
core.some-lib.nnnn.nnnn: ELF 64-bit LSB core file x86-64, version 1 (SYSV), too many program header sections (1850)
In this case, you can try some alternatives:
Tail the last several strings of the core file (the app name was about 25 strings from the end for me): strings core.some-lib.nnnn.nnnn | tail -50
Use gdb itself, gdb -c core.some-lib.nnnn.nnnn, which will often print something like: Core was generated by '/usr/local/bin/some-executable'
You can navigate to the directory containing the core.pid file and run gdb -c core.pid
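The program name shown by these tools is stored in notes inside the core file itself. On Linux, a core file's PT_NOTE segment contains an NT_PRPSINFO note. Its pr_fname field holds the name of the crashed program (at most 16 bytes), and its pr_psargs field holds the start of the command line. The sketch below is minimal and makes two assumptions: the core was produced on the same architecture as the tool, and you have already read the PT_NOTE segment into memory. find_prpsinfo_fname() is a hypothetical helper.

```c
#include <elf.h>
#include <string.h>
#include <sys/procfs.h>   /* prpsinfo_t, glibc on Linux */

/* ELF note names and descriptors are padded to 4-byte boundaries. */
static size_t note_pad(size_t n) { return (n + 3) & ~(size_t)3; }

/* Scan the raw PT_NOTE contents of a core file for the NT_PRPSINFO note
 * and copy its pr_fname (program name) into OUT.
 * Returns 0 on success, -1 if no such note is found. */
int find_prpsinfo_fname(const unsigned char *notes, size_t len,
                        char *out, size_t outlen)
{
    size_t off = 0;

    if (outlen == 0)
        return -1;
    while (off + sizeof(Elf64_Nhdr) <= len) {
        Elf64_Nhdr nh;
        memcpy(&nh, notes + off, sizeof nh);   /* buffer may be unaligned */

        size_t name_off = off + sizeof nh;
        size_t desc_off = name_off + note_pad(nh.n_namesz);
        size_t next = desc_off + note_pad(nh.n_descsz);
        if (next > len)
            return -1;                         /* truncated note */

        if (nh.n_type == NT_PRPSINFO && nh.n_namesz == 5 &&
            memcmp(notes + name_off, "CORE", 5) == 0 &&
            nh.n_descsz >= sizeof(prpsinfo_t)) {
            prpsinfo_t info;
            memcpy(&info, notes + desc_off, sizeof info);

            /* pr_fname is a fixed 16-byte field, not always NUL-terminated */
            const char *nul = memchr(info.pr_fname, '\0', sizeof info.pr_fname);
            size_t n = nul ? (size_t)(nul - info.pr_fname) : sizeof info.pr_fname;
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, info.pr_fname, n);
            out[n] = '\0';
            return 0;
        }
        off = next;
    }
    return -1;
}
```

To get the notes buffer, first read the core's ELF header and program headers. Then find the entry with p_type == PT_NOTE and read p_filesz bytes starting at p_offset.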

How can I do core dump analysis for a production application in UNIX/Linux?

I have come across an option to do core dump analysis using GDB; it says that I need to build the executable with special command-line parameters to include symbol information.
But that increases the executable size, and I am guessing that it will slow down the application.
Can someone please advise whether there is another method to do core dump analysis without affecting the performance of an application?
Debug symbols will not slow down the application, because they live in sections that are not loaded into memory at run time. You can also keep them in a separate file, as follows.
gcc -ggdb -o target obj1.o obj2.o ...
strip target --only-keep-debug -o target.dbg
strip target
Then in gdb, use symbol-file target.dbg

linux 'cannot execute binary file' on every executable I compile, chmod 777 doesn't help

I am running Red Hat Linux 7.3 (old, I know), and for the past few months I've been learning assembly programming, writing small programs and assembling them with nasm. For months, things went fine, and now, for some unknown reason, I cannot execute any program that I compile.
nasm file.s (this used to work just fine; then I'd execute ./file)
Now, when I run ./file, I first get "permission denied", which never used to happen before. Then, once I chmod +777 file, I get "cannot execute binary file".
I have NO IDEA why this is happening, but it is extremely frustrating since NOTHING I compile will run anymore.
Logging in as root doesn't change anything.
All suggestions are welcome. THANK YOU!!
nasm does not produce an executable. By default it writes a flat binary (-f bin). With an object format such as -f elf, it produces an object file (like gcc -c would), which you still need to run through the linker.
N.B.: “0777 is almost always wrong.”
Run the file command on your binaries and make sure they're identified correctly as executables.
Also try the ldd command. It will very likely fail for the exact same reason, but it's worth a shot.
This can happen if the file system you are working on is mounted with the noexec option. You can check for that with mount | grep noexec and see whether your current working directory is affected.
"Cannot execute binary file" is the message bash prints when execve(2) fails with the error code ENOEXEC (strerror(3) renders that code as "Exec format error"). That has a very specific meaning (quoting the manpage for execve(2)):
[ENOEXEC] The new process file has the appropriate access permission, but has an unrecognized format (e.g., an invalid magic number in its header).
So your nasm invocation is not producing an executable, but something else. As John Kugelman suggests, the file command will tell you what it is (user502515 is very likely right that it's an unlinked object file, but I have never used nasm myself, so I don't know).
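You can see these error codes directly by calling execve(2) yourself. Without execute permission you get EACCES. On a noexec mount you also get EACCES, even though the execute bits are set. With permission but an unrecognized format you get ENOEXEC. This is a minimal sketch: exec_errno() and explain_exec_failure() are hypothetical helpers. The hints they produce cover only the causes discussed in this thread and are not an exhaustive list.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to run PATH and return the errno execve() failed with.
 * CAUTION: if PATH is a valid executable, execve() succeeds and this
 * function never returns, because the calling process is replaced. */
int exec_errno(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };
    execve(path, argv, envp);
    return errno;
}

/* Write a one-line diagnosis of ERR for PATH into OUT. */
void explain_exec_failure(const char *path, int err, char *out, size_t outlen)
{
    const char *hint = "";
    struct stat st;

    if (err == ENOEXEC)
        hint = "unrecognized format: not linked, or built for another architecture";
    else if (err == EACCES && stat(path, &st) == 0 &&
             (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)))
        hint = "execute bits are set: check for a noexec mount or directory permissions";
    else if (err == EACCES)
        hint = "no execute permission";

    if (*hint)
        snprintf(out, outlen, "%s: %s (%s)", path, strerror(err), hint);
    else
        snprintf(out, outlen, "%s: %s", path, strerror(err));
}
```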
BTW, you'll do yourself a favor if you learn GAS/"AT&T" assembly syntax now, rather than when you need to rewrite your assembly code for an architecture that doesn't do Intel bizarro-world syntax. And I do hope you're using assembly only for inner-loop subroutines that actually need to be hand-optimized.
This just happened to me. After running
file <executable name>
it output <file name> ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
The problem was that I was trying to run a 64-bit app on a 32-bit machine!
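A 64-bit versus 32-bit mismatch is visible in the ELF identification bytes, where e_ident[EI_CLASS] is either ELFCLASS64 or ELFCLASS32. The sketch below rests on a rough assumption: it treats the pointer width of the checking program as a stand-in for what the machine can run. It ignores the architecture (e_machine), and a 64-bit kernel can usually run 32-bit binaries too. elf_class_matches_host() is a hypothetical helper.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the ELF header in BUF has the same word size as the program
 * running this check, 0 if it differs, -1 if BUF is not ELF at all.
 * Only EI_CLASS is compared; e_machine is not checked. */
int elf_class_matches_host(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    int host_class = sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32;
    return buf[EI_CLASS] == host_class;
}
```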
You could also look through /var/log for whatever changed on the system around the time this started happening.

Analyzing a Xen core dump

After a Xen guest domain hung, I took a dump using xm dump-core. Following the sparse documentation I found, I tried using the crash utility to analyze the dump.
Unfortunately, the kernel image (Debian lenny) is stripped, so I am forced to make use of the map file.
However, running
crash /boot/System.map-2.6.26-2-xen-amd64 vmlinux-2.6.26-2-xen-amd64 /mnt/my-core-file
(with vmlinux-2.6.26-2-xen-amd64 being the gunzip'ed vmlinuz image) fails:
crash: vmlinux-2.6.26-2-xen-amd64: no debugging data available
Then I read that current Xen versions produce ELF-compatible dumps for guest domains. Indeed, this seems to be the case:
~$ sudo file my-core-dump
my-core-dump: ELF 64-bit LSB core file x86-64, version 1
However, gdb vmlinux-2.6.26-2-xen-amd64 my-core-dump fails, too:
...is not a core dump: File format not recognized
Any hints?
Have you tried attaching to the domU console?
xm create domU.conf -c
On the subject of the core-dump file, I found this:
http://lists.xensource.com/archives/html/xen-devel/2006-12/msg00456.html
I just want to check that you aren't under the impression that 'xm dump-core' emits an Elf core file. It doesn't -- the format is custom and as far as I know is only interpreted by a set of gdbserver patches that we ship in our repository. Does the crash utility really support this special format?
Edit: This might help to debug the core-dump: http://os-drive.com/files/docbook/xen-faq.html#setup_gdb

Resources