Disassemble ELF to asm and assemble to ELF again - linux

Let's say I am running a box on x86 or x86_64 and have an executable/library in the Executable and Linkable Format (ELF). Would it be possible to
generate an assembler listing from the ELF and then
assemble it back to an ELF again?
If yes,
are there any tools that do that (the first step)?
is it possible to produce an ELF that is bitwise identical to the original one? If not, how close could one get to the perfect case?

Related

Locate a function from ELF executable file

In a rather obscure use case, instead of loading the whole ELF executable file into memory, I'd like to load only the part of the ELF file that contains a particular function. The difficulty I am facing is that I don't know how to locate the code of this particular function in the ELF file. If I had this piece of information, I would use it to load the disk sector(s) containing this part of the ELF file into memory and jump to it. But, not being very familiar with the ELF file format and how ld works, I don't know how to get this piece of information. All I know is the function name (a plain C function, no overloading). Is it possible at all to find the position of a function from the headers of an ELF file?
I would greatly appreciate some help locating a particular function in an ELF executable file. It would be perfect to know both its starting and ending position, but the starting position alone is also fine. A reference to some reading material towards this goal (if technically feasible at all) for self-study is also OK. The platform I am working on is Ubuntu Linux 20.04 with the GNU development toolchain (the version of ld is 2.34) on an x86 CPU (the ELF format is elf32-i386).
I realize this is a bit late, but for posterity:
Note that in the following text, most of the numbers are going to be in hexadecimal, since that's how they are output by binutils.
Code (usually?) resides in the .text section, so find out where that is in the ELF file: readelf -S <ELF file>
This prints the section headers; the relevant part of the output should look like this:
  [Nr] Name              Type              Address           Offset
       Size              EntSize           Flags  Link  Info  Align
  ...
  [13] .text             PROGBITS          0000000000411000  00011000
       0000000000465bac  0000000000000000  AX     0     0     4096
Note that the address is 411000 and the offset is 11000 in the above output, so the address is offset+400000.
Figure out the address of your symbol: nm -gC <ELF file> | grep <func name>
For function main, this got me (among other things): 0000000000411455 T main, which means that the function main is located at the address 411455.
Since readelf told us that offset=address-400000, the code for main should start at byte 11455 from the start of the file.
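The arithmetic above can be written down as a small Python sketch. It assumes you copy the section address and offset from readelf -S and the symbol address from nm by hand; the function name file_offset is made up for this illustration.

```python
def file_offset(sym_addr: int, sec_addr: int, sec_offset: int) -> int:
    """Translate a symbol's virtual address into an offset within the ELF file.

    sec_addr and sec_offset are the Address and Offset columns that
    readelf -S prints for the section containing the symbol.
    """
    if sym_addr < sec_addr:
        raise ValueError("symbol address lies before the section start")
    return sym_addr - sec_addr + sec_offset


# Values taken from the readelf/nm output quoted above (all hexadecimal).
TEXT_ADDR = 0x411000
TEXT_OFFSET = 0x11000
MAIN_ADDR = 0x411455

print(hex(file_offset(MAIN_ADDR, TEXT_ADDR, TEXT_OFFSET)))  # 0x11455
```

For the ending position, nm -S also prints the symbol size when the symbol table records one; adding that size to the start offset should give the end of the function.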
Also note to OP: AFAIK on modern systems most memory is mapped as non-executable, so if you just load code into memory and jump to it, it will likely crash. There are ways to get executable memory (for example mmap with PROT_EXEC, or mprotect on an existing mapping), but they are a bit more complicated than calling malloc.

What is the significance of ".note.ABI-tag" section in ELF?

I see a .note.ABI-tag section when I run objdump -h <binary> on an ELF file.
As per the ELF man page:
.note.ABI-tag
This section is used to declare the expected run-time ABI
of the ELF image. It may include the operating system name
and its run-time versions. This section is of type
SHT_NOTE. The only attribute used is SHF_ALLOC.
Is this section necessary?
What could be the side effects removing this section?
How to remove this section (a gcc flag) from ELF?
Removing it could break the executable on some systems. The section is supposed to record which kernel the binary is compatible with, so that a mismatch between the binary's ABI and the running kernel's ABI can be detected. See more information here: https://refspecs.linuxfoundation.org/LSB_1.2.0/gLSB/noteabitag.html
However, if your binary is not compiled for a specific kernel (not necessarily Linux, as a lot of different targets use ELF output), it doesn't matter, and the section can simply be cut if your goal is to reduce the executable's size. You should also be aware that it is already ignored if you are doing an objcopy from ELF to BIN.
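To make the contents of the section concrete, here is a minimal Python sketch that decodes an ABI-tag note. It assumes the layout described in the linked LSB page (name "GNU", type 1, a 16-byte descriptor holding an OS word followed by three version words). The sample bytes are hand-built for illustration and do not come from a real binary, and parse_abi_tag is a made-up name.

```python
import struct

NT_GNU_ABI_TAG = 1
OS_NAMES = {0: "Linux", 1: "GNU/Hurd", 2: "Solaris", 3: "FreeBSD"}


def parse_abi_tag(note: bytes, byteorder: str = "<"):
    """Decode the raw bytes of a .note.ABI-tag section.

    Returns (os_name, (major, minor, subminor)).
    """
    # Note header: name size, descriptor size, note type (three 32-bit words).
    namesz, descsz, ntype = struct.unpack_from(byteorder + "III", note, 0)
    name_start = 12
    name = note[name_start:name_start + namesz]
    if name != b"GNU\0" or ntype != NT_GNU_ABI_TAG or descsz != 16:
        raise ValueError("not a GNU ABI-tag note")
    # The name field is padded to a multiple of 4 bytes.
    desc_start = name_start + (namesz + 3) // 4 * 4
    os_id, major, minor, subminor = struct.unpack_from(
        byteorder + "IIII", note, desc_start
    )
    return OS_NAMES.get(os_id, f"unknown ({os_id})"), (major, minor, subminor)


# Hand-built sample: OS word 0 (Linux), version 3.2.0 (illustrative values).
sample = struct.pack("<III4sIIII", 4, 16, NT_GNU_ABI_TAG, b"GNU\0", 0, 3, 2, 0)
print(parse_abi_tag(sample))  # ('Linux', (3, 2, 0))
```

On a real binary, readelf -n prints the same information without any custom code.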

ELF format: is ELF a subset of .o/.so or is ELF basically the entire .o/.so?

I'm currently doing some study on ELF format. I would like to confirm something I think is right.
ELF is a format; it stands for Executable and Linkable Format. On Linux, compiled binaries are in ELF format.
When using gcc to compile code with -c and -fPIC, it turns the code into a .o file in ELF format.
Is it correct if I say .o/.so and linux executables are ELF files? or is ELF something inside a .o/.so file? In other words, is ELF a subset of .o/.so or is ELF basically the entire .o/.so?
I would like to confirm this, because I'd like to make sure I understand this. Sorry for asking a stupid question.
Is it correct if I say .o/.so and linux executables are ELF files? or is ELF something inside a .o/.so file? In other words, is ELF a subset of .o/.so or is ELF basically the entire .o/.so?
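Yes: .o files, .so files, and Linux executables are ELF files in their entirety; ELF is not something contained inside them. Object files (.o), shared libraries (.so), and executables (usually with no file extension on Linux) are three of the four types of ELF files. (The fourth type is core files -- a dump of the state of a crashed process, sometimes used for post-mortem debugging.)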
All four types use the same general format, but will have some differences specific to their type. For instance, an executable will typically have an entry point, whereas object files and shared libraries won't.
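As an illustration of "same general format, different type", the following Python sketch reads the e_type field of the ELF header, which is where the file type is recorded. The function name elf_type and the hand-built sample headers are assumptions made for this example; only the first 18 bytes of a file are needed.

```python
import struct

ELF_TYPES = {
    1: "ET_REL (relocatable object file)",
    2: "ET_EXEC (executable)",
    3: "ET_DYN (shared object)",
    4: "ET_CORE (core file)",
}


def elf_type(header: bytes) -> str:
    """Return the ELF file type, given at least the first 18 bytes of a file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5) tells us how multi-byte fields are encoded.
    byteorder = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byteorder + "H", header, 16)
    return ELF_TYPES.get(e_type, f"other ({e_type})")


def sample_header(e_type: int) -> bytes:
    """Build a tiny little-endian, 64-bit header prefix for demonstration."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return ident + struct.pack("<H", e_type)


print(elf_type(sample_header(1)))  # ET_REL (relocatable object file)
print(elf_type(sample_header(3)))  # ET_DYN (shared object)
```

To try it on a real file, pass it open(path, "rb").read(18). Note that position-independent executables are also reported as ET_DYN, so this field alone does not always separate executables from shared libraries.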

Is compiling ELF files with MSB flag possible in Linux

Is it possible to compile binary files with MSB endianness in GCC? If so, would they work correctly when executed?
Transferring comments into something resembling an answer.
In part, it depends on the CPU architecture. On SPARC or Power, you'd probably find MSB is the default. On Intel, which is definitely LSB by default, you probably can't.
On Intel. If I somehow altered every ELF header entry to MSB in a little-endian binary, would that execute properly?
No; you'd have to make lots of other changes to the code to give it the slightest chance of working. Basically, every number would have to be reworked from little-endian to big-endian.
Would that include instruction addresses, immediate parameters etc.? I would assume that regardless of the ELF flag, these should remain little-endian.
With just a modicum of further thought, I think there's a very high chance that the kernel for Intel will simply decline to execute the MSB ELF executable. The kernel was compiled to expect LSB and knows it cannot deal with the alternative. To fix that, you'd have to rebuild the kernel and the dynamic loader, ld.so.1. And that's probably just the start of your problems.
On the whole, I would regard this as an exercise in futility. In the absence of information to the contrary, I don't think you need to worry about handling MSB ELF headers for Intel binaries; they won't exist in practice.
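To see why flipping only the header flag cannot work, here is a small Python sketch of what happens when a value stored in little-endian order is read back as big-endian. The address used is purely illustrative, and misread_as_big_endian is a made-up helper name.

```python
import struct


def misread_as_big_endian(value: int) -> int:
    """Store a 32-bit value little-endian, then read the same bytes big-endian."""
    little = struct.pack("<I", value)
    return struct.unpack(">I", little)[0]


address = 0x00401000  # illustrative address, not taken from a real binary
print(struct.pack("<I", address).hex())      # 00104000
print(struct.pack(">I", address).hex())      # 00401000
print(hex(misread_as_big_endian(address)))   # 0x104000
```

Every multi-byte field in the headers, and every address or constant stored in the data sections, would be misinterpreted in the same way unless it was rewritten too.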
I do not think it is explicitly stated in the System V ABI, but AFAIK the ELF file is expected to be in native endianness (and e_ident[EI_DATA] describes the endianness used):
Byte e_ident[EI_DATA] specifies the data encoding of the
processor-specific data in the object file. The following encodings
are currently defined.
You might expect the processor-specific data to be in the processor endianness. For example, the content of the .got is processor-specific data and you definitely want it to be in native endianness.
On Intel processors, you have to use ELFDATA2LSB.
From the System V ABI ~ Intel386 Supplement 4th edition:
For file identification in e_ident, the Intel386 architecture
requires the following values.
e_ident[EI_CLASS] = ELFCLASS32
e_ident[EI_DATA] = ELFDATA2LSB
From the System V ABI ~ AMD64 supplement Draft 0.99.6:
For file identification in e_ident, the AMD64 architecture
requires the following values.
e_ident[EI_CLASS] = ELFCLASS64
e_ident[EI_DATA] = ELFDATA2LSB
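The two e_ident bytes quoted above can be inspected with a few lines of Python. This is a sketch using only the standard ELF constants (EI_CLASS at index 4, EI_DATA at index 5); describe_ident and the sample byte strings are invented for this example.

```python
EI_CLASS, EI_DATA = 4, 5
CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}
ENCODINGS = {1: "ELFDATA2LSB", 2: "ELFDATA2MSB"}


def describe_ident(e_ident: bytes):
    """Return (class, data encoding) names from the first bytes of an ELF file."""
    if len(e_ident) < 6 or e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = CLASSES.get(e_ident[EI_CLASS], "invalid class")
    encoding = ENCODINGS.get(e_ident[EI_DATA], "invalid encoding")
    return elf_class, encoding


# Hand-built identification bytes matching the two supplements quoted above.
i386_ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
amd64_ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)

print(describe_ident(i386_ident))   # ('ELFCLASS32', 'ELFDATA2LSB')
print(describe_ident(amd64_ident))  # ('ELFCLASS64', 'ELFDATA2LSB')
```

On a real file, pass the first 16 bytes, for example open(path, "rb").read(16).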

Intel binary to ELF

Really quick question here. I'm working in Ubuntu, and I have a simple "Hello World!" program in assembly which I have assembled into x86 machine code. Now I want to turn that machine code into an ELF executable which my computer can run. I am aware that I could just assemble directly to ELF; the purpose of my inquiry is to discover how to make ELF binaries out of assembled machine code.
Thanks!
Final ELF executable files are typically built out of other ELF files, reorganized by the linker. The easiest way, of course, would be to specify ELF as the output format of your assembler.
1) If you really want to do this, you could start with an "empty" ELF file (that you get from compiling or assembling nothing, etc.). Then you could use objcopy --add-section, which allows you to add an arbitrary file as a section in an existing ELF file.
This will create a minimal ELF file:
$ echo "" | gcc -c -o empty.out -xc -
2) Alternatively, you could include your raw binary into another assembly file using something like nasm's incbin, which would then need to be assembled as an ELF.
3) A third option (the best so far) would be to provide your raw binary to the linker, and use a custom linker script to tell it what section to put it in (determined from the input file name). The -b flag before an input file tells ld what format that file is in (for example, -b binary for a raw file). This should let you use your flat binary file.
One of the first obstacles you're going to face is getting the entry point to point to your code. Off the top of my head I'm not sure how to edit that.
There is a Python library, pyelftools, that may help you in your quest.
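On the entry-point obstacle mentioned above: the entry address lives in the e_entry field of the ELF header, at byte offset 24 in both 32-bit and 64-bit files. The following is a rough Python sketch that reads and patches that field using only the standard library rather than pyelftools. The helper names and the sample address are invented for illustration, and in a normal build the linker would set this field for you (ld's -e option or an ENTRY command in the linker script).

```python
import struct

E_ENTRY_OFFSET = 24  # e_ident (16) + e_type (2) + e_machine (2) + e_version (4)


def _entry_format(header) -> str:
    """Pick the struct format for e_entry from the class and data bytes."""
    if len(header) < 32 or bytes(header[:4]) != b"\x7fELF":
        raise ValueError("not an ELF header")
    order = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    width = "I" if header[4] == 1 else "Q"   # EI_CLASS: 1 = 32-bit, else 64-bit
    return order + width


def get_entry(header) -> int:
    """Read e_entry from an ELF header held in a bytes-like object."""
    return struct.unpack_from(_entry_format(header), header, E_ENTRY_OFFSET)[0]


def set_entry(header: bytearray, address: int) -> None:
    """Overwrite e_entry in place."""
    struct.pack_into(_entry_format(header), header, E_ENTRY_OFFSET, address)


# Hand-built, otherwise empty 32-bit little-endian header (52 bytes).
header = bytearray(52)
header[:7] = b"\x7fELF\x01\x01\x01"

set_entry(header, 0x08048054)  # illustrative address only
print(hex(get_entry(header)))  # 0x8048054
```

Patching the field only helps if the address really points at your code once the file is loaded, which is something the program headers decide.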
If it's really assembled, then it's already in the ELF format (compilers targeting Linux generally store the object code in ELF object files as well).
However, if you want a fully-functioning executable, you have to feed the object file to a linker.
