Intel binary to ELF - linux

Really quick question here. I'm working in Ubuntu, and I have a simple "Hello World!" program in assembly which I have assembled into raw x86 machine code. Now I want to turn that machine code into an ELF executable that my computer can run. I am aware that I could just assemble directly to ELF; the purpose of my inquiry is to discover how to make ELF binaries out of already-assembled machine code.
Thanks!

Final ELF executable files are typically built out of other ELF files, reorganized by the linker. The easiest way, of course, would be to specify ELF as the output format of your assembler.
1) If you really want to do this, you could start with an "empty" ELF file (which you get from compiling or assembling nothing, etc.). Then you could use objcopy --add-section <name>=<file>, which adds an arbitrary file as a section in an existing ELF file.
This will create a minimal ELF object file:
$ echo "" | gcc -c -o empty.out -xc -
2) Alternatively, you could include your raw binary in another assembly file using something like nasm's incbin directive; that file would then need to be assembled to ELF.
3) A third option (the best so far) would be to give your raw binary to the linker and use a custom linker script to tell it which section to put it in (determined from the input file name). The -b flag before an input file tells ld what format that file is in (e.g. -b binary), which should let you use your flat binary file.
One of the first obstacles you're going to face is getting the entry point to point to your code. Off the top of my head I'm not sure how to edit that.
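For reference, the entry point is just the e_entry field of the ELF header, at byte offset 0x18 in both the 32-bit and 64-bit layouts. If you control the link, ld's -e option (or an ENTRY() command in the linker script) is the cleaner way to set it. The following is only a minimal Python sketch of patching the field in an already-built image; the function names set_entry_point and patch_entry_in_file are hypothetical, and the new address must actually fall inside a loadable, executable segment or the program will crash.

```python
import struct

E_ENTRY_OFFSET = 0x18  # e_entry sits at the same offset in ELF32 and ELF64 headers


def set_entry_point(elf: bytearray, new_entry: int) -> int:
    """Overwrite e_entry in an in-memory ELF image and return the old value."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    try:
        width = {1: "I", 2: "Q"}[elf[4]]  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        order = {1: "<", 2: ">"}[elf[5]]  # EI_DATA: 1 = little-, 2 = big-endian
    except KeyError:
        raise ValueError("unsupported EI_CLASS or EI_DATA") from None
    fmt = order + width
    (old_entry,) = struct.unpack_from(fmt, elf, E_ENTRY_OFFSET)
    struct.pack_into(fmt, elf, E_ENTRY_OFFSET, new_entry)
    return old_entry


def patch_entry_in_file(path: str, new_entry: int) -> int:
    """Rewrite e_entry of the ELF file at `path` in place; return the old value."""
    with open(path, "rb") as f:
        image = bytearray(f.read())
    old_entry = set_entry_point(image, new_entry)
    with open(path, "wb") as f:
        f.write(image)
    return old_entry
```

Only the header field changes; nothing checks that the new address points at your code, so you still have to know where the linker placed it (e.g. from nm).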
There is a Python library, pyelftools, that may help you in your quest.

If it's really assembled, then it's already in the ELF format (compilers targeting Linux generally store the object code in ELF object files as well).
However, if you want a fully-functioning executable, you have to feed the object file to a linker.

Related

How does file(1) utility discern from an ELF shared object and an ELF executable?

The problem is that ELF shared libraries and normal executables don't seem to differ at all if you look at their ELF headers. On my Linux machine (Debian 11.4), even the e_type field of the ELF header is set to "shared object file", as the ht utility reports, even when the file in question is an ELF executable. That field looked like the only easy and reliable way to tell ELF executables apart from ELF shared libraries, but for some reason GCC fills in e_type with this same value for both. Nevertheless, the file(1) utility can accurately tell me whether the input file I give it is an executable or a shared library.
The first answer to the question "distinguish shared objects from position independent executables" suggests that file(1)'s code should look for a PT_INTERP program header.
This sounds reasonable: shared libraries are only loaded after the executable has been loaded, so they don't need to request an interpreter again; the executable will already have done that.
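The PT_INTERP heuristic from that answer can be sketched with Python's struct module: read e_phoff, e_phentsize and e_phnum from the ELF header, then check the p_type of each program header (PT_INTERP is 3). The function name has_pt_interp is hypothetical, and this is a heuristic rather than necessarily what file(1) does: static-pie executables have no PT_INTERP, and some shared libraries carry one so they can also be run directly.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_pt_interp(elf: bytes) -> bool:
    """Return True if any program header in the ELF image is PT_INTERP."""
    if len(elf) < 52 or elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    order = {1: "<", 2: ">"}.get(elf[5])  # e_ident[EI_DATA]
    if order is None:
        raise ValueError("unsupported EI_DATA")
    if elf[4] == 1:    # ELFCLASS32: 4-byte e_phoff at 0x1C, e_phentsize at 0x2A
        phoff_fmt, phoff_at, phentsize_at = "I", 0x1C, 0x2A
    elif elf[4] == 2:  # ELFCLASS64: 8-byte e_phoff at 0x20, e_phentsize at 0x36
        phoff_fmt, phoff_at, phentsize_at = "Q", 0x20, 0x36
    else:
        raise ValueError("unsupported EI_CLASS")
    (phoff,) = struct.unpack_from(order + phoff_fmt, elf, phoff_at)
    # e_phnum immediately follows e_phentsize in both header layouts.
    phentsize, phnum = struct.unpack_from(order + "HH", elf, phentsize_at)
    for i in range(phnum):  # ignores the rare PN_XNUM extended-numbering case
        # p_type is the first 4-byte word of every program header entry.
        (p_type,) = struct.unpack_from(order + "I", elf, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Usage would be along the lines of has_pt_interp(open(path, "rb").read()) for whatever file you want to inspect.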
I also found this in the file(1) source code while looking at how the magic.mgc file is compiled:
0 name elf-le
>16 leshort 0 no file type,
!:mime application/octet-stream
>16 leshort 1 relocatable,
!:mime application/x-object
>16 leshort 2 executable,
!:mime application/x-executable
>16 leshort 3 ${x?pie executable:shared object},
I cannot understand what the last line means, but it seems that if I understood it, I would have the answer to my question.
>16 leshort 3 ${x?pie executable:shared object},
This means: "look at the 2-byte little-endian word at offset 16 of the file (the e_type field). If it has the value 3, check whether the file is executable or not (its permission bits). If it is, the type is 'pie executable'; otherwise it is 'shared object'."
You can look at the magic(5) man page for info about this syntax.
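As a cross-check of that reading, here is a rough Python equivalent of the elf-le rule quoted above. It assumes, as the answer describes, that the ${x?...} test means "the file has an execute permission bit set"; the function name describe_like_magic is hypothetical, and recent file(1) versions may use additional information, so treat it as an illustration of the magic rule rather than a reimplementation of file(1).

```python
import os
import stat
import struct

ET_DYN = 3


def describe_like_magic(path: str) -> str:
    """Mimic the quoted elf-le magic rule for a single file."""
    with open(path, "rb") as f:
        head = f.read(18)  # e_ident (16 bytes) + e_type (2 bytes)
    if len(head) < 18 or head[:4] != b"\x7fELF":
        raise ValueError(f"{path}: not an ELF file")
    if head[5] != 1:  # the elf-le rule only applies to little-endian files
        raise ValueError(f"{path}: not a little-endian ELF file")
    # ">16 leshort": little-endian 16-bit value at offset 16, i.e. e_type.
    (e_type,) = struct.unpack_from("<H", head, 16)
    if e_type == ET_DYN:
        # "${x?pie executable:shared object}": assumed to test the execute bits.
        mode = os.stat(path).st_mode
        executable = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
        return "pie executable" if executable else "shared object"
    names = {0: "no file type", 1: "relocatable", 2: "executable"}
    return names.get(e_type, f"e_type {e_type}")
```

Under this reading, running chmod -x on a PIE executable would make it report "shared object", even though the file's bytes are unchanged.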

Locate a function from ELF executable file

In a rather obscure use case, instead of loading the whole ELF executable file into memory, I'd like to load only the part of the ELF file that contains a particular function. The difficulty I am facing is that I don't know how to locate the code of this particular function in the ELF file. If I had this piece of information, I would use it to load the disk sector(s) containing that part of the ELF file into memory and jump to it. But, not being very familiar with the ELF file format or how ld works, I don't know how to get this information. All I know is the function name (just a C function, no overloads). Is it possible at all to find the position of a function from the headers of an ELF file?
I would greatly appreciate help locating a particular function in an ELF executable file. Ideally I would know both its starting and ending positions, but the starting position alone is also fine. A pointer to reading material for self-study toward this goal (if it is technically feasible at all) would also be OK. The platform I am working on is Linux 20.04 with the GNU development toolchain (ld version 2.34) on an x86 CPU (the ELF format is elf32-i386).
I realize this is a bit late, but for posterity:
Note that in the following text, most of the numbers are going to be in hexadecimal, since that's how they are output by binutils.
Code (usually?) resides in the .text section, so find out where that is in the ELF file: readelf -S <ELF file>
This should output the section headers; the relevant output should look like this:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  ...
  [13] .text             PROGBITS         0000000000411000  00011000
       0000000000465bac  0000000000000000  AX       0     0     4096
Note that in the above output the address is 411000 and the offset is 11000, so for this section the address is offset + 400000.
Figure out the address of your symbol: nm -gC <ELF file> | grep <func name>
For function main, this got me (among other things): 0000000000411455 T main, which means that the function main is located at the address 411455.
Since readelf told us that, within .text, offset = address - 400000, the code for main should start at byte 11455 (hex) from the start of the file.
Also, a note to the OP: AFAIK, on modern systems most memory is marked as non-executable, so if you just load code into memory and jump to it, it will likely crash. There are certainly ways to allocate executable memory (for example, mmap with PROT_EXEC on Linux), but they are a bit more involved than calling malloc.
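The same arithmetic, generalized to any section: an address inside a section maps to the file offset address - sh_addr + sh_offset. The sketch below hardcodes the .text numbers from the readelf output above; the Section tuple and the function name vaddr_to_file_offset are hypothetical helpers. For the ending position, nm -S (--print-size) also prints each symbol's size, so the function ends at its start plus that size. Sections of type NOBITS (such as .bss) occupy no bytes in the file, so this mapping does not apply to them.

```python
from typing import NamedTuple, Sequence


class Section(NamedTuple):
    name: str
    addr: int    # "Address" column of readelf -S (sh_addr)
    offset: int  # "Offset" column (sh_offset)
    size: int    # "Size" column (sh_size)


def vaddr_to_file_offset(vaddr: int, sections: Sequence[Section]) -> int:
    """Map a virtual address to a byte offset in the ELF file."""
    for s in sections:
        # In an executable, only sections mapped at runtime have a nonzero address.
        if s.addr and s.addr <= vaddr < s.addr + s.size:
            return vaddr - s.addr + s.offset
    raise ValueError(f"{vaddr:#x} is not inside any listed section")


# Numbers copied from the readelf/nm output above (all hexadecimal).
text = Section(".text", addr=0x411000, offset=0x11000, size=0x465BAC)
main_addr = 0x411455
print(hex(vaddr_to_file_offset(main_addr, [text])))  # prints 0x11455
```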

ELF format: is ELF a subset of .o/.so or is ELF basically the entire .o/.so?

I'm currently doing some study on ELF format. I would like to confirm something I think is right.
ELF is a format; it stands for Executable and Linkable Format. In Linux, everything is in ELF format.
When I use gcc to compile code with -c and -fPIC, it turns the code into a .o file in ELF format.
Is it correct if I say .o/.so and linux executables are ELF files? or is ELF something inside a .o/.so file? In other words, is ELF a subset of .o/.so or is ELF basically the entire .o/.so?
I would like to confirm this, because I'd like to make sure I understand this. Sorry for asking a stupid question.
Is it correct if I say .o/.so and linux executables are ELF files? or is ELF something inside a .o/.so file? In other words, is ELF a subset of .o/.so or is ELF basically the entire .o/.so?
Yes. Object files (.o), shared libraries (.so), and executables (which on Linux usually have no file extension, unlike Windows .exe files) are three of the four types of ELF files. (The fourth type is core files: a dump of the state of a crashed process, sometimes used for post-mortem debugging.)
All four types use the same general format, but will have some differences specific to their type. For instance, an executable will typically have an entry point, whereas object files and shared libraries won't.
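The type is recorded in the e_type field of the ELF header (a 2-byte value at offset 16, in the byte order given by e_ident[EI_DATA]). Here is a minimal Python sketch that reads it, assuming the standard ET_* values; the function name elf_kind is hypothetical. Note that position-independent executables (the default on many current distributions) are also ET_DYN, so e_type alone does not separate them from .so files.

```python
import struct

# Standard e_type values from the ELF specification.
ET_NAMES = {
    0: "ET_NONE (no file type)",
    1: "ET_REL (relocatable object file, e.g. .o)",
    2: "ET_EXEC (executable)",
    3: "ET_DYN (shared object, e.g. .so, or a PIE executable)",
    4: "ET_CORE (core file)",
}


def elf_kind(header: bytes) -> str:
    """Classify an ELF file from the first 18 bytes of its header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    order = {1: "<", 2: ">"}.get(header[5])  # e_ident[EI_DATA]
    if order is None:
        raise ValueError("unknown byte order in e_ident[EI_DATA]")
    (e_type,) = struct.unpack_from(order + "H", header, 16)
    return ET_NAMES.get(e_type, f"unknown e_type {e_type:#x}")
```

For a real file you would pass the first 18 bytes, e.g. elf_kind(open(path, "rb").read(18)).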

Keeping type definitions and some symbols in an elf file

Starting from an elf file that contains all information needed to fully debug my application, I would like to make an elf that contains only some symbols.
I managed to do this with the GNU binutils strip tool:
strip -F elf32-big -p -s -K myFunc1 -K myFunc2 -K myVar1 -K myVar2 myApp.elf
My concern is that myVar1 and myVar2 are structured variables, and the debugger cannot dig into them because strip removed the .debug_info section from the ELF file (as I understand it, .debug_info is where the structure definitions are stored).
Ideally, I would keep in the ELF file only what's necessary for the debugger to parse my variables. I played with the options of strip, and with other binutils (readelf, objcopy, objdump...) after reading this thread, but nothing gave a satisfactory result.
How would you do that?
I don't know of a tool that already does what you want.
If I were asked to do this, I would first push back. Rather than trying to strip parts of the debuginfo, I would ask why we couldn't use the existing split-debuginfo approach. Coupled with build IDs, this has the nice property that one can ship stripped executables but still get full debugging when needed, just by pointing gdb at the debuginfo ELF files.
That said, if I did have to write it, I would first define exactly what you want to keep. Then I would write a program that reads the DWARF (say, using the elfutils libraries) and writes out new DWARF containing just the desired information.
This is not extremely hard (see the "dwz" tool for an example of a DWARF manipulator...) but also not all that easy, either.

Can nasm generate debug symbol to binary file?

I have a binary file made with nasm -f on which I want to do some debugging, or something close to it. As far as I know, nasm doesn't generate proper debugging symbols for a flat binary file, right? Which approach could I use to, e.g., see each value passed through a register or memory location, one at a time? I have an "array" in an assembly program and I want to see each of its values. Is there any tool that would help with this task?
If you are on Linux, you should use nasm -f elf -F dwarf to get debug information, and make sure you are not stripping it during linking.
Also, to see register or memory contents you don't need debug info.
