How does the file(1) utility distinguish an ELF shared object from an ELF executable?

The problem is that ELF shared libraries and normal executables don't differ at all if you look at their ELF headers. On my Linux machine (Debian 11.4), the e_type field of the ELF header is set to "shared object file", as the ht utility reports, even when the file in question is an ELF executable. e_type looked like the only easy and reliable way to distinguish ELF executables from ELF shared libraries, but for some reason GCC fills in that field with the same value for both. Nevertheless, the file(1) utility can accurately tell me whether the input file I give it is an executable or a shared library.
The first answer to this question suggests that file(1)'s code should look for a PT_INTERP program header:
distinguish shared objects from position independent executables
This sounds reasonable: shared libraries are loaded only after the executable itself has been loaded, so they don't need to request the interpreter again, because the executable has already done so.
I also found this in the file(1) source code when I was looking at how the magic.mgc file is compiled:
0 name elf-le
>16 leshort 0 no file type,
!:mime application/octet-stream
>16 leshort 1 relocatable,
!:mime application/x-object
>16 leshort 2 executable,
!:mime application/x-executable
>16 leshort 3 ${x?pie executable:shared object},
and I cannot understand what the last line means, but it seems I could answer my question if I understood it.

>16 leshort 3 ${x?pie executable:shared object},
This means: look at the 2-byte little-endian word at offset 16 of the file (the e_type field). If it has the value 3, check whether the file is executable, that is, whether an execute permission bit is set in its mode. If it is, the type is "pie executable"; otherwise it is "shared object".
You can look at the magic(5) man page for info about this syntax.
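The rule described above can be approximated in a few lines of Python. This is only a sketch of the logic, not file(1)'s actual code; the function name classify_et_dyn is made up for illustration, and the byte order is taken from the EI_DATA byte so that big-endian files are handled too.

```python
import os
import struct

ET_DYN = 3  # e_type value shared by PIE executables and shared objects


def classify_et_dyn(path):
    """Approximate the magic rule: e_type at offset 16, then the mode bits."""
    with open(path, "rb") as f:
        header = f.read(18)
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little endian, 2 = big endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    if e_type != ET_DYN:
        return "e_type %d (not ET_DYN)" % e_type
    # The ${x?...} test: is any execute permission bit set on the file?
    if os.stat(path).st_mode & 0o111:
        return "pie executable"
    return "shared object"
```

Calling classify_et_dyn on the same file before and after chmod -x should flip the answer, which is exactly why this test is a heuristic rather than a property of the ELF contents.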

Related

e_machine field in elf header

In the ELF header, there's an e_machine field. Does it specify only the processor architecture the file can run on, or the processor architecture that was used to build the ELF file?
I have done some research and found that it specifies the architecture required for the file.
The job of ELF is to describe the executable, not where it came from. (That information would basically be useless; why would you care?)
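To see which architecture a file requires, e_machine can be read straight from the header: it is the 2-byte field at offset 18, right after e_type. The sketch below is illustrative only; target_machine is a hypothetical helper, and the table lists just a handful of the e_machine constants defined by the ELF specification.

```python
import struct

# A few well-known e_machine values from the ELF specification.
EM_NAMES = {
    3: "EM_386",
    40: "EM_ARM",
    62: "EM_X86_64",
    183: "EM_AARCH64",
    243: "EM_RISCV",
}


def target_machine(header):
    """Name the target architecture given the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little endian, 2 = big endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return EM_NAMES.get(e_machine, "unknown (%d)" % e_machine)
```

For example, target_machine(open("/bin/ls", "rb").read(20)) reports the architecture /bin/ls was built for, regardless of the machine that ran the compiler.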

Locate a function from ELF executable file

In a rather obscure use case, instead of loading the whole ELF executable file into memory, I'd like to load only the part of the ELF file that contains a particular function. The difficulty I am facing is that I don't know how to locate the code of this particular function in the ELF file. If I had this piece of information, I would use it to load the disk sector(s) containing this part of the ELF file into memory and jump to it. But, not being very familiar with the ELF file format and how ld works, I don't know how to get this piece of information. All I know is the function name (just a C function, no overloading). Or is it possible at all to find the position of a function from the headers of an ELF file?
I would greatly appreciate it if you could give me some help locating a particular function in an ELF executable file. It would be perfect if I could know both its starting and ending position, but the starting position alone is also fine. A reference to some reading material towards this goal (if technically feasible at all) for self-study is also OK. The platform I am working on is Linux 20.04 with the GNU development toolchain (the version of ld is 2.34) on an x86 CPU (the ELF format is elf32-i386).
I realize this is a bit late, but for posterity:
Note that in the following text, most of the numbers are going to be in hexadecimal, since that's how they are output by binutils.
Code (usually?) resides in the .text section, so find out where that is in the ELF file: readelf -S <ELF file>
This should output section headers, the relevant output should look like this:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  ...
  [13] .text             PROGBITS         0000000000411000  00011000
       0000000000465bac  0000000000000000  AX       0     0     4096
Note that the address is 411000 and the offset is 11000 in the above output, so the address is offset+400000.
Figure out the address of your symbol: nm -gC <ELF file> | grep <func name>
For function main, this got me (among other things): 0000000000411455 T main, which means that the function main is located at the address 411455.
Since readelf told us that offset=address-400000, the code for main should start at byte 11455 from the start of the file.
Also note to OP: AFAIK on modern computers most memory is marked as not executable, so if you just load code into memory and jump to it, it will likely crash. There are certainly ways to allocate executable memory, but they are probably a bit more complicated than calling malloc.
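The arithmetic in the steps above generalizes to any symbol inside a section: subtract the section's address and add the section's file offset. Here is a minimal sketch using the numbers from the readelf and nm output quoted above; symbol_file_offset is a name invented for this example.

```python
def symbol_file_offset(symbol_addr, section_addr, section_offset):
    """File offset of a symbol that lives inside the given section."""
    return symbol_addr - section_addr + section_offset


# Values from the readelf and nm output above (all hexadecimal).
offset = symbol_file_offset(0x411455, 0x411000, 0x11000)
print(hex(offset))  # 0x11455
```

For the end position, nm -S additionally prints each symbol's size when it is known, so the function would end at the starting offset plus that size.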

ELF format: is ELF a subset of .o/.so or is ELF basically the entire .o/.so?

I'm currently doing some study on ELF format. I would like to confirm something I think is right.
ELF is a format; it stands for Executable and Linkable Format. On Linux, everything is in ELF format.
When gcc compiles code with -c and -fPIC, it translates the code into a .o file in ELF format.
Is it correct if I say .o/.so and linux executables are ELF files? or is ELF something inside a .o/.so file? In other words, is ELF a subset of .o/.so or is ELF basically the entire .o/.so?
I would like to confirm this, because I'd like to make sure I understand this. Sorry for asking a stupid question.
Is it correct if I say .o/.so and linux executables are ELF files? or is ELF something inside a .o/.so file? In other words, is ELF a subset of .o/.so or is ELF basically the entire .o/.so?
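Yes, they are ELF files; ELF is the format of the entire file, not something inside it. Object files (.o), shared libraries (.so), and executables (which on Linux usually have no file extension) are three of the four types of ELF files. (The fourth type is core files: a dump of the state of a crashed process, sometimes used for post-mortem debugging.)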
All four types use the same general format, but will have some differences specific to their type. For instance, an executable will typically have an entry point, whereas object files and shared libraries won't.

How does the Linux kernel determine ld.so's load address?

I know that the dynamic linker uses mmap() to load libraries. I guess it is the kernel that loads both the executable and its interpreter into the same address space, but how does it determine where? I noticed that ld.so's load address with ASLR disabled is 0x555555554000 (on x86_64). Where does this address come from? I tried following do_execve()'s code path, but it branches too much for me not to be confused as hell.
Read more about ELF, in particular elf(5), and about the execve(2) syscall.
An ELF file may contain an interpreter. elf(5) mentions:
PT_INTERP The array element specifies the location and
size of a null-terminated pathname to invoke
as an interpreter. This segment type is
meaningful only for executable files (though
it may occur for shared objects). However it
may not occur more than once in a file. If
it is present, it must precede any loadable
segment entry.
That interpreter is almost always ld-linux(8) (e.g. with GNU glibc), more precisely (on my Debian/Sid) /lib64/ld-linux-x86-64.so.2. If you compile musl-libc and then build some software with it, you'll get a different interpreter, /lib/ld-musl-x86_64.so.1. That ELF interpreter is the dynamic linker.
The execve(2) syscall uses that interpreter:
If the executable is a dynamically linked ELF executable, the
interpreter named in the PT_INTERP segment is used to load the needed
shared libraries. This interpreter is typically /lib/ld-linux.so.2
for binaries linked with glibc.
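To check which interpreter a given binary asks for, the PT_INTERP entry can be pulled out of the program header table by hand. The sketch below handles only little-endian ELF64 files, which is an assumption made to keep it short; read_interp is a hypothetical helper, and the field offsets follow the Elf64_Ehdr and Elf64_Phdr layouts described in elf(5).

```python
import struct

PT_INTERP = 3  # program header type for the interpreter path


def read_interp(path):
    """Return the PT_INTERP path of a little-endian ELF64 file, or None."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only little-endian ELF64 is handled in this sketch")
    # e_phoff is at offset 32; e_phentsize and e_phnum are at 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr starts with p_type, p_flags, p_offset.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        if p_type == PT_INTERP:
            # p_filesz sits 32 bytes into the entry.
            (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

A dynamically linked program should yield the dynamic linker's path, while a statically linked one, or a typical shared library, should yield None.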
See also Levine's book on Linkers and loaders, and Drepper's paper: How To Write Shared Libraries.
Notice that execve also handles the shebang (i.e. a first line starting with #!); see the Interpreter scripts section of execve(2). BTW, for ELF binaries, execve does the equivalent of mmap(2) on some segments.
Read also about vdso(7), proc(5) & ASLR. Type cat /proc/self/maps in your shell.
(As far as I can tell, the 0x555555554000 address does not come from the ELF program headers, since a position-independent image uses addresses relative to zero. It comes from the kernel: on x86_64, ELF_ET_DYN_BASE is defined as two thirds of the default user address space, which rounds down to 0x555555554000 at page granularity.)

Intel binary to ELF

Really quick question here. I'm working in Ubuntu, and I have a simple "Hello World!" program in assembly which I have assembled into x86 machine code. Now I want to turn that machine code into an ELF executable which my computer can run. I am aware that I could just assemble directly to ELF; the purpose of my inquiry is to discover how to make ELF binaries out of assembled machine code.
Thanks!
Final ELF executable files are typically built out of other ELF files, reorganized by the linker. The easiest way, of course, would be to specify ELF as the output format of your assembler.
1) If you really want to do this, you could start with an "empty" ELF file (that you get from compiling or assembling nothing, etc.). Then you could use objcopy --add-section, which allows you to add an arbitrary file as a section in an existing ELF file.
This will create a minimal ELF file:
$ echo "" | gcc -c -o empty.out -xc -
2) Alternatively, you could include your raw binary into another assembly file using something like nasm's incbin, which would then need to be assembled as an ELF.
3) A third option (the best so far) would be to provide your raw binary to the linker, and use a custom linker script to tell it what section to put it in (determined from the input file name). The -b flag before an input file will tell ld what type of file it is. This should let you use your flat binary file.
One of the first obstacles you're going to face is getting the entry point to point to your code. Off the top of my head I'm not sure how to edit that.
There is a Python library, pyelftools that may help you in your quest.
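For the entry-point obstacle mentioned above, one way to see what is involved is to write the headers by hand. The following sketch uses only Python's struct module and wraps raw code in a minimal static ELF64 executable with a single PT_LOAD segment. It assumes the flat binary is x86-64 code (not 32-bit), that execution should start at its first byte, and that the code does not depend on being loaded anywhere other than BASE + 0x78; wrap_flat_binary and the BASE value are choices made for this illustration.

```python
import struct

BASE = 0x400000          # assumed load address, a conventional choice
EHDR_SIZE, PHDR_SIZE = 64, 56


def wrap_flat_binary(code):
    """Wrap raw x86-64 machine code in a minimal static ELF64 executable."""
    code_offset = EHDR_SIZE + PHDR_SIZE
    total = code_offset + len(code)
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + b"\0" * 8
    ehdr = e_ident + struct.pack(
        "<HHIQQQIHHHHHH",
        2,                        # e_type: ET_EXEC
        62,                       # e_machine: EM_X86_64
        1,                        # e_version
        BASE + code_offset,       # e_entry: first byte of the code
        EHDR_SIZE,                # e_phoff: program headers follow the header
        0, 0,                     # e_shoff, e_flags
        EHDR_SIZE, PHDR_SIZE, 1,  # e_ehsize, e_phentsize, e_phnum
        0, 0, 0)                  # no section headers
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,                        # p_type: PT_LOAD
        5,                        # p_flags: PF_R | PF_X
        0, BASE, BASE,            # p_offset, p_vaddr, p_paddr
        total, total,             # p_filesz, p_memsz
        0x1000)                   # p_align
    return ehdr + phdr + code
```

Writing the returned bytes to a file and marking it executable with chmod +x should give something the kernel is willing to load, provided the code itself ends with an exit system call rather than falling off the end. The headers and the code share one read-and-execute segment here, which keeps the file tiny but is not how a linker would lay things out.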
If it's really assembled, then it's already in the ELF format (compilers targeting Linux generally store the object code in ELF object files as well).
However, if you want a fully-functioning executable, you have to feed the object file to a linker.
