Keeping type definitions and some symbols in an ELF file - struct

Starting from an ELF file that contains all the information needed to fully debug my application, I would like to produce an ELF file that contains only some of the symbols.
I managed to do this with the GNU binutils strip tool:
strip -F elf32-big -p -s -K myFunc1 -K myFunc2 -K myVar1 -K myVar2 myApp.elf
My concern is that myVar1 and myVar2 are structured variables, and the debugger cannot dig into them because 'strip' removed the .debug_info section from the ELF file (as I understand it, .debug_info is where the structure definitions are stored).
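To check which sections survive a given strip invocation, readelf -S myApp.elf is the ready-made tool. The same check can be sketched in C with the structures from <elf.h>. The function name elf_has_section is hypothetical, and the sketch assumes an ELF64 file in the host's byte order, so the asker's elf32-big file would need Elf32_* types plus byte swapping.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the ELF file at `path` has a section named `name`,
 * 0 if it does not, and -1 on error. Assumes ELF64 in host byte
 * order (e.g. x86-64 Linux). */
int elf_has_section(const char *path, const char *name)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *shdrs = NULL, *strtab;
    char *names = NULL;
    int result = -1;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Validate the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read the whole section header table. */
    shdrs = calloc(eh.e_shnum, sizeof *shdrs);
    if (shdrs == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(shdrs, sizeof *shdrs, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Load the section-name string table (.shstrtab). */
    strtab = &shdrs[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (names == NULL || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;
    names[strtab->sh_size] = '\0';

    /* Compare each section's name with the one we are looking for. */
    result = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        if (shdrs[i].sh_name < strtab->sh_size &&
            strcmp(names + shdrs[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }

out:
    free(names);
    free(shdrs);
    fclose(f);
    return result;
}
```

For example, elf_has_section("myApp.elf", ".debug_info") would return 0 after the strip command above, if the file were ELF64.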
Ideally, I would keep in the ELF file only what's necessary for the debugger to parse my variables. I played with the options of 'strip', and with other binutils (readelf, objcopy, objdump...) after I read this thread, but nothing gave a satisfactory result.
How would you do that?

I don't know of a tool that already does what you want.
If I were asked to do this, I would first push back. Rather than trying to strip parts of the debuginfo, I would ask why we couldn't use the existing split-debuginfo approach. Coupled with build IDs, this has the nice property that one can ship stripped executables but still get full debugging when needed, just by pointing gdb to the debuginfo ELF files.
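The split-debuginfo workflow can be driven with binutils: objcopy --only-keep-debug copies the debug sections into a separate file, strip --strip-debug removes them from the shipped binary, and objcopy --add-gnu-debuglink records the link. With build IDs, gdb looks for the debug file in a .build-id subdirectory, under a directory named after the first two hex digits of the ID. A small C sketch of that path construction follows; the function name is hypothetical, and /usr/lib/debug is only the usual default global debug directory, so treat it as an assumption.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Builds <debug_dir>/.build-id/NN/REST.debug, where NN is the first two
 * hex digits of the build ID and REST is the remainder. Returns 0 on
 * success, -1 if the ID is malformed or `out` is too small. */
int build_id_debug_path(const char *debug_dir, const char *build_id_hex,
                        char *out, size_t out_len)
{
    size_t n = strlen(build_id_hex);

    /* Need whole bytes (an even digit count) and at least two bytes. */
    if (n < 4 || n % 2 != 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (!isxdigit((unsigned char)build_id_hex[i]))
            return -1;

    int written = snprintf(out, out_len, "%s/.build-id/%.2s/%s.debug",
                           debug_dir, build_id_hex, build_id_hex + 2);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

The build ID of a binary can be read with readelf -n (the NT_GNU_BUILD_ID note); pass it in the lowercase hex form readelf prints.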
That said, if I did have to write it, I would first define exactly what you want to keep. Then I would write a program that reads the DWARF (say, using the elfutils libraries) and writes out new DWARF containing just the desired information.
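The "define exactly what you want to keep" step largely comes down to reachability: every type a kept variable refers to, directly or through struct members, pointers and typedefs, must survive. Below is a minimal C sketch over a hypothetical, simplified type graph. Array indices stand in for DWARF DIE offsets, and struct type_node and types_to_keep are illustrative names, not an elfutils API.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 8

/* Simplified stand-in for a DWARF type DIE: a struct refers to its
 * member types, a pointer or typedef to its target type. Real DWARF
 * expresses these links with DW_AT_type attributes (DIE offsets). */
struct type_node {
    const char *name;
    size_t nrefs;
    size_t refs[MAX_REFS];
};

/* Marks type `t` and everything reachable from it. Already-marked types
 * are skipped, so self-referential structs (linked lists) terminate. */
static void mark_type(const struct type_node *types, bool *keep, size_t t)
{
    if (keep[t])
        return;
    keep[t] = true;
    for (size_t i = 0; i < types[t].nrefs; i++)
        mark_type(types, keep, types[t].refs[i]);
}

/* Given the type index of each variable to keep (e.g. myVar1, myVar2),
 * marks the type definitions a debugger needs to display them and
 * returns how many types are kept. */
size_t types_to_keep(const struct type_node *types, size_t ntypes,
                     const size_t *var_types, size_t nvars, bool *keep)
{
    size_t count = 0;
    for (size_t i = 0; i < ntypes; i++)
        keep[i] = false;
    for (size_t v = 0; v < nvars; v++)
        mark_type(types, keep, var_types[v]);
    for (size_t i = 0; i < ntypes; i++)
        count += keep[i];
    return count;
}
```

A real tool would build this graph by walking the variable and type DIEs (for example with libdw) and then write out only the marked DIEs, which is where most of the remaining difficulty lies.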
This is not extremely hard (see the "dwz" tool for an example of a DWARF manipulator...) but also not all that easy, either.

Related

How addr2line can locate the source file and the line of code?

addr2line translates addresses into file names and line numbers. I am still a beginner in debugging and have some questions about addr2line.
If I am debugging a certain .so (binary) file, how can the tool locate its source code file (where does it get it from)? What if the source doesn't exist?
What is the relation between an address in a binary and a line number in its source that lets addr2line do this kind of mapping?
In general, addr2line works best on ELF executables or shared libraries with debug information. That debug information is emitted by the compiler when you pass -g (or -g2, etc.) to GCC. It notably provides a mapping between source code locations (source file name, line number, column number) and machine code addresses, as well as function and variable names, call stack frame organization, and so on. Today the debug information is in DWARF format (which is also processed by the gdb debugger, the libbacktrace library, etc.). Notice that the debug information contains source file paths, not the source files themselves.
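The address-to-line mapping lives in the .debug_line section. Once decoded, it is essentially a table of rows sorted by address, where each row covers the addresses up to the next row and a final end-of-sequence row closes the range. A lookup is then a binary search for the last row whose address is not greater than the query. The sketch below assumes an already-decoded, hypothetical table; decoding .debug_line itself is more involved, since it is encoded as a compact program for a line-number state machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified row of a decoded DWARF line table: code from `addr` up to
 * the next row's address came from `file`:`line`. The last row marks
 * the end of the sequence (like DW_LNE_end_sequence). */
struct line_row {
    uint64_t addr;
    const char *file;
    unsigned line;
};

/* Returns the row whose address range contains `addr`, or NULL if the
 * address is outside the table. `rows` must be sorted by address and
 * end with an end-of-sequence row. */
const struct line_row *lookup_line(const struct line_row *rows, size_t n,
                                   uint64_t addr)
{
    if (n < 2 || addr < rows[0].addr || addr >= rows[n - 1].addr)
        return NULL;

    /* Invariant: rows[lo].addr <= addr < rows[hi].addr */
    size_t lo = 0, hi = n - 1;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (rows[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }
    return &rows[lo];
}
```

With rows at 0x1000 (line 3), 0x1008 (line 4), 0x1010 (line 5) and an end row at 0x1020, address 0x100c maps to line 4.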
In practice, you can (and often should) pass the -g (or -g2) debugging option to GCC even with optimization flags like -O2. In that case, the debug information is slightly less precise but still practically usable. In some cases, stack frames may disappear (inlined function calls, tail call optimizations, ...).
You could use the strip(1) utility to remove debug information (and other symbol tables, etc) from some ELF executable.

Can nasm generate debug symbol to binary file?

I have a binary file made with nasm -f bin that I want to debug, or something close to it. As far as I know, nasm doesn't generate proper debugging symbols for a flat binary file, right? Which approach could I use to, e.g., see each value passed in a register or memory location at a time? I have an "array" in an assembly program whose values I want to see. Is there any tool that would help with this task?
If you are on Linux, you should use nasm -f elf -F dwarf to get debug information, and make sure it isn't stripped during linking.
Also, to see register or memory contents you don't need debug info.

ptrace: get imagebase of tracee?

I am on Ubuntu 13.10 and have this little stripped and packed ELF file. I need to dump various pieces of information from its process in an automated way, so I hacked together a tiny tracer, similar to strace, that follows its progress. Three questions arose:
1) After attaching to my process, how can I get its image base?
2) Where does the process break first? Apparently it is not the entry point (EP) of the program.
3) Is there any way I can be notified when a .so/.lib file is loaded? GDB can do this somehow, I think.
The first question really is the most important one. Any help is appreciated.
1) /proc/<PID>/maps contains a list of everything the process has mapped and from where, including pages mapped from the executable. By reading the executable's ELF headers, you should be able to figure out where .text is.
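A sketch of reading /proc/<PID>/maps in C: each line has the form start-end perms offset dev inode pathname, and the first mapping of the executable with file offset 0 is usually where its ELF header was loaded. The function names are hypothetical, and for a packed binary the unpacker may create mappings that don't follow this pattern.

```c
#include <stdio.h>
#include <string.h>

/* Parses one line of /proc/<PID>/maps:
 *   start-end perms offset dev inode [pathname]
 * Returns 1 on success; *path points into `line` ("" if anonymous). */
int parse_maps_line(const char *line, unsigned long *start,
                    unsigned long *end, unsigned long *offset,
                    const char **path)
{
    char perms[5];
    int consumed = 0;
    if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %n",
               start, end, perms, offset, &consumed) != 4 || consumed == 0)
        return 0;
    *path = line + consumed;
    return 1;
}

/* Stores in *base the start of the first mapping of `obj_path` with file
 * offset 0 and returns 0, or returns -1 if there is none. `maps_path` is
 * e.g. "/proc/1234/maps" for a tracee, or "/proc/self/maps". */
int find_image_base(const char *maps_path, const char *obj_path,
                    unsigned long *base)
{
    FILE *f = fopen(maps_path, "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end, offset;
        const char *path;
        line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
        if (parse_maps_line(line, &start, &end, &offset, &path) &&
            offset == 0 && strcmp(path, obj_path) == 0) {
            *base = start;
            found = 1;
        }
    }
    fclose(f);
    return found ? 0 : -1;
}
```

The executable's path for a tracee can be obtained with readlink on /proc/<PID>/exe.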
2) Execution of a dynamically linked binary typically starts in an interpreter. The INTERP program header of an ELF executable (dump it with readelf -e) names it, and execution starts at the interpreter's entry point. Typically it's the runtime linker, ld-<some-variant>.so. The kernel maps both the executable and the interpreter; the runtime linker then maps the required shared libraries.
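Another source of answers to questions 1 and 2 is the auxiliary vector, which the kernel passes to every new process and which Linux exposes as /proc/<PID>/auxv. AT_ENTRY holds the program's own entry point, AT_BASE the load address of the interpreter (0 for a statically linked program) and AT_PHDR the address of the program headers. A minimal C sketch, assuming the tracer and tracee have the same word size; read_auxv_entry is a hypothetical name.

```c
#include <elf.h>
#include <stdio.h>

/* Returns the value of auxiliary-vector entry `type` (e.g. AT_ENTRY,
 * AT_BASE, AT_PHDR) read from `auxv_path`, such as /proc/1234/auxv or
 * /proc/self/auxv, or 0 if the entry is absent or the file unreadable.
 * Entries are (type, value) pairs of native-width words, ending with
 * an AT_NULL entry. */
unsigned long read_auxv_entry(const char *auxv_path, unsigned long type)
{
    FILE *f = fopen(auxv_path, "rb");
    if (f == NULL)
        return 0;

    unsigned long pair[2];
    unsigned long value = 0;
    while (fread(pair, sizeof pair, 1, f) == 1 && pair[0] != AT_NULL) {
        if (pair[0] == type) {
            value = pair[1];
            break;
        }
    }
    fclose(f);
    return value;
}
```

For a position-independent executable, AT_PHDR minus the program-header file offset (e_phoff) gives the load base, which is one way to answer question 1 without parsing /proc/<PID>/maps.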
3) GDB has fairly detailed knowledge of how the runtime linker is implemented, so it's able to intercept dynamic object loading by setting breakpoints in the right places. You can do the same. dlopen() seems like a good candidate for an interception point, but as I noted in #2, shared objects may have been loaded before the executable gets control.
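For question 3, the interception point debuggers conventionally rely on is the runtime linker's rendezvous structure, struct r_debug from <link.h>. Its r_map field lists the loaded objects, and r_brk is the address of a function the runtime linker calls whenever it starts and finishes mapping or unmapping a library, so a breakpoint there also catches libraries loaded before main. An executable's DT_DEBUG dynamic entry points at this structure. The in-process C sketch below shows the data a tracer would read out of the tracee's memory with ptrace; it assumes glibc on Linux and a dynamically linked executable, and the function names are hypothetical.

```c
#include <elf.h>
#include <link.h>
#include <stdio.h>

/* Locates r_debug through this executable's DT_DEBUG entry, which the
 * runtime linker fills in at start-up. A tracer would instead read the
 * tracee's .dynamic section (e.g. with PTRACE_PEEKDATA). */
static struct r_debug *find_r_debug(void)
{
    for (ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++)
        if (d->d_tag == DT_DEBUG)
            return (struct r_debug *)d->d_un.d_ptr;
    return NULL;
}

/* Prints r_brk and every loaded object (load bias and name); returns
 * the number of objects, or -1 if r_debug could not be found. */
int list_loaded_objects(void)
{
    struct r_debug *r = find_r_debug();
    if (r == NULL)
        return -1;

    printf("r_brk = %#lx\n", (unsigned long)r->r_brk);
    int n = 0;
    for (struct link_map *m = r->r_map; m != NULL; m = m->l_next) {
        printf("%#lx %s\n", (unsigned long)m->l_addr,
               m->l_name[0] ? m->l_name : "(main program)");
        n++;
    }
    return n;
}
```

In a tracer, the r_state field (RT_ADD, RT_DELETE, RT_CONSISTENT) tells you, each time the r_brk breakpoint is hit, whether the list is being changed or is stable again.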

Intel binary to ELF

Really quick question here. I'm working in Ubuntu, and I have a simple "Hello World!" program in assembly which I have assembled into x86 machine code. Now I want to turn that machine code into an ELF executable which my computer can run. I am aware that I could just assemble directly to ELF; the purpose of my inquiry is to discover how to make ELF binaries out of assembled machine code.
Thanks!
Final ELF executable files are typically built out of other ELF files, reorganized by the linker. The easiest way, of course, would be to specify ELF as the output format of your assembler.
1) If you really want to do this, you could start with an "empty" ELF file (which you get from compiling or assembling nothing, etc.). Then you could use objcopy --add-section name=file, which lets you add an arbitrary file as a section in an existing ELF file.
This will create a minimal ELF file:
$ echo "" | gcc -c -o empty.out -xc -
2) Alternatively, you could include your raw binary into another assembly file using something like nasm's incbin, which would then need to be assembled as an ELF.
3) A third option (the best so far) would be to provide your raw binary to the linker and use a custom linker script to tell it which section to put it in (determined from the input file name). The -b flag before an input file tells ld what format that file is in (e.g. -b binary), which should let you use your flat binary file.
One of the first obstacles you're going to face is getting the entry point to point to your code. Off the top of my head I'm not sure how to edit that.
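When ld builds the executable, its -e (--entry) option sets the entry point. It can also help to see how little an ELF wrapper actually needs. The C sketch below places raw machine code behind a hand-built ELF64 header and a single PT_LOAD program header, with e_entry pointing at the first code byte. It assumes x86-64 Linux, a little-endian host and an arbitrary load address of 0x400000; wrap_in_elf is a hypothetical name, and no section headers are emitted, so tools like objdump will have little to show.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define LOAD_ADDR 0x400000UL   /* assumed virtual load address */

/* Writes an ELF header, one PT_LOAD program header and then `code` into
 * `out`. The whole file is mapped read+execute at LOAD_ADDR, and the
 * entry point is the first code byte. Returns the total size, or 0 if
 * `out_cap` is too small. */
size_t wrap_in_elf(const unsigned char *code, size_t code_len,
                   unsigned char *out, size_t out_cap)
{
    const size_t hdrs = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr);
    const size_t total = hdrs + code_len;
    if (out_cap < total)
        return 0;

    Elf64_Ehdr eh;
    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);
    eh.e_ident[EI_CLASS] = ELFCLASS64;
    eh.e_ident[EI_DATA] = ELFDATA2LSB;      /* assumes a little-endian host */
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_ident[EI_OSABI] = ELFOSABI_SYSV;
    eh.e_type = ET_EXEC;
    eh.e_machine = EM_X86_64;
    eh.e_version = EV_CURRENT;
    eh.e_entry = LOAD_ADDR + hdrs;          /* code follows the headers */
    eh.e_phoff = sizeof(Elf64_Ehdr);
    eh.e_shoff = 0;                         /* no section header table */
    eh.e_ehsize = sizeof(Elf64_Ehdr);
    eh.e_phentsize = sizeof(Elf64_Phdr);
    eh.e_phnum = 1;

    Elf64_Phdr ph;
    memset(&ph, 0, sizeof ph);
    ph.p_type = PT_LOAD;
    ph.p_flags = PF_R | PF_X;
    ph.p_offset = 0;                        /* map the file from its start */
    ph.p_vaddr = LOAD_ADDR;
    ph.p_paddr = LOAD_ADDR;
    ph.p_filesz = total;
    ph.p_memsz = total;
    ph.p_align = 0x1000;

    memcpy(out, &eh, sizeof eh);
    memcpy(out + sizeof eh, &ph, sizeof ph);
    memcpy(out + hdrs, code, code_len);
    return total;
}
```

For instance, the 12 bytes b8 3c 00 00 00 bf 2a 00 00 00 0f 05 (mov eax, 60; mov edi, 42; syscall) call exit(42); writing the returned buffer to a file and marking it executable should give a program that exits with status 42.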
There is a Python library, pyelftools, that may help you in your quest.
If it's really assembled, then it's already in the ELF format (compilers targeting Linux generally store the object code in ELF object files as well).
However, if you want a fully-functioning executable, you have to feed the object file to a linker.

are ".o" files "loadable"?

I have been reading John R. Levine's Linkers and Loaders, and I read that an object file will have one or more of the following properties:
file should be linkable
file should be loadable
file should be executable
Now, considering this example:
#include <stdio.h>

int main() {
    printf("testing\n");
    return 0;
}
Which I would compile and link with:
$ gcc -c t.c
$ gcc -o t t.o
I tried inspecting t.o using objdump, and its type shows up as REL. Which of these properties does t.o satisfy? I believe that it's linkable and non-executable. I would have believed that it's not loadable (unless you create a .so file from the .o file); however, the type REL means that it's supposed to be relocated, and relocation would occur only in the context of loading, so I'm confused here.
To summarize my questions:
Are ".o" files loadable?
Are there good resources on the sections present in ".o" and ".so" files, the differences between them, etc.?
An object file (i.e., a file with the .o extension) is not loadable. This is because it lacks critical information about how to resolve all the symbols within it: in this case, the printf symbol in particular would need additional information. (C compilers do not bind library identities into the object files they create, which is occasionally even useful.)
When you link the object file into a shared library (.so), you are adding that binding. Typically, you're also grouping a number of object files together and resolving references between them (plus a few more esoteric things). That then makes the result possible to load, since the loader can then just do resolution of references and loading of dependencies that it doesn't already know about.
Going from there to an executable is typically just a matter of adding the OS-defined program bootstrap. This is a small piece of code that the OS calls to start the program running; it typically works by loading the rest of the program and its dependencies and then calling main() with information about the arguments. (It's also responsible for exiting cleanly if main returns.)
Just to set the context, this link states something similar (emphasis for readability only):
A file may be linkable, used as input by a link editor or linking
loader. It may be executable, capable of being loaded into
memory and run as a program, loadable, capable of being loaded
into memory as a library along with a program, or any combination of
the three.
A .o file is a linker object file, which according to this definition is not executable and definitely linkable. Loadable is a tougher call, but since .o files are not loadable without some decidedly non-portable trickery, I'd say the spirit is that it's not loadable.
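The REL type the asker saw is the e_type field of the ELF header: ET_REL for relocatable objects such as t.o, ET_EXEC for classic executables, and ET_DYN for shared objects (and for position-independent executables, which is why gcc -o t t.o often reports DYN rather than EXEC on distributions that default to PIE). readelf -h prints it directly. A minimal C sketch of the same check (elf_kind is a hypothetical name) reads just e_ident and e_type, which sit at the same offsets in 32-bit and 64-bit headers.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Reports what kind of ELF file `path` is, based on e_type. */
const char *elf_kind(const char *path)
{
    unsigned char hdr[EI_NIDENT + 2];   /* e_ident, then the 2-byte e_type */
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return "unreadable";
    size_t got = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (got != sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return "not ELF";

    /* e_type directly follows e_ident in both ELF32 and ELF64 headers;
     * its byte order is given by e_ident[EI_DATA]. */
    unsigned type = hdr[EI_DATA] == ELFDATA2MSB
                  ? (unsigned)(hdr[EI_NIDENT] << 8 | hdr[EI_NIDENT + 1])
                  : (unsigned)(hdr[EI_NIDENT] | hdr[EI_NIDENT + 1] << 8);
    switch (type) {
    case ET_REL:  return "REL (relocatable object, e.g. t.o)";
    case ET_EXEC: return "EXEC (executable)";
    case ET_DYN:  return "DYN (shared object or PIE executable)";
    case ET_CORE: return "CORE (core dump)";
    default:      return "other";
    }
}
```

Running it on t.o and then on t shows the transition from REL to EXEC or DYN that the linking step performs.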

Resources