Locate a function from ELF executable file - linux

In a rather obscure use case, instead of loading the whole ELF executable file into memory, I'd like to load only the part of the ELF file that contains a particular function. The difficulty I am facing is that I don't know how to locate the code of this particular function in the ELF file. If I had this piece of information, I would use it to load the disk sector(s) containing this part of the ELF file into memory, and jump to it. But, not being very familiar with the ELF file format and how ld works, I don't know how to get this piece of information. All I know is the function name (just a C function, no overloading). Or is it even possible to find the position of a function from the headers of an ELF file at all?
I would greatly appreciate some help locating a particular function in an ELF executable file. It would be perfect if I could find both its starting and ending position, but the starting position alone is also fine. A reference to some reading material towards this goal (if technically feasible at all) for self-study is also OK. The platform I am working on is Linux 20.04 with the GNU development toolchain (the version of ld is 2.34) on an x86 CPU (the ELF format is elf32-i386).

I realize this is a bit late, but for posterity:
Note that in the following text, most of the numbers are going to be in hexadecimal, since that's how they are output by binutils.
Code (usually?) resides in the .text section, so find out where that is in the ELF file: readelf -S <ELF file>
This prints the section headers; the relevant part of the output should look like this:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  ...
  [13] .text             PROGBITS         0000000000411000  00011000
       0000000000465bac  0000000000000000  AX       0     0     4096
Note that the address is 411000 and the offset is 11000 in the above output, so the address is offset+400000.
Figure out the address of your symbol: nm -gC <ELF file> | grep <func name>
For function main, this got me (among other things): 0000000000411455 T main, which means that the function main is located at the address 411455.
Since readelf told us that offset=address-400000, the code for main should start at byte 11455 from the start of the file.
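The arithmetic above can be written down as a small sketch. The function name vaddr_to_file_offset and its parameters are illustrative and not part of binutils; the numbers are the ones from the readelf and nm output above.

```python
def vaddr_to_file_offset(sym_addr, sec_addr, sec_offset, sec_size):
    """Translate a symbol's virtual address into an offset in the ELF file.

    sec_addr, sec_offset and sec_size are the Address, Offset and Size
    columns that `readelf -S` prints for the section holding the symbol.
    """
    if not sec_addr <= sym_addr < sec_addr + sec_size:
        raise ValueError("symbol address is outside the given section")
    # The distance from the start of the section is the same in memory and on disk.
    return sec_offset + (sym_addr - sec_addr)


# Values taken from the example output above (.text section and main).
offset = vaddr_to_file_offset(0x411455, 0x411000, 0x11000, 0x465BAC)
print(hex(offset))  # 0x11455
```

If the end of the function is needed as well, nm -S additionally prints each symbol's size, so the end offset would be the start offset plus that size.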
Also note to OP: AFAIK on modern systems most memory is mapped as non-executable, so if you just load code into an ordinary buffer and jump to it, it will likely crash. There are ways to get executable memory (for example mmap with PROT_EXEC, or mprotect on an existing mapping), but they are a bit more involved than calling malloc.

Related

How does file(1) utility discern from an ELF shared object and an ELF executable?

The problem is that ELF shared libraries and normal executables don't differ at all if you look at their ELF headers. On my Linux machine (Debian 11.4) even the e_type field of the ELF header is set to "shared object file", as the ht utility reports, even when the file in question is an ELF executable. That field looked like the only easy and reliable way to distinguish ELF executables from ELF shared libraries, but for some reason GCC always fills it with this value. Nevertheless, the file(1) utility can accurately tell me whether the input file I give it is an executable or a shared library.
The first answer to this question suggests that file(1)'s code should look for the PT_INTERP program header:
distinguish shared objects from position independent executables
This sounds reasonable: shared libraries are loaded only after the executable itself has been loaded, so they don't need to request the interpreter again, because the executable has already done so.
Also I found this in file(1) source code when I was looking how magic.mgc file is compiled:
0 name elf-le
>16 leshort 0 no file type,
!:mime application/octet-stream
>16 leshort 1 relocatable,
!:mime application/x-object
>16 leshort 2 executable,
!:mime application/x-executable
>16 leshort 3 ${x?pie executable:shared object},
and I cannot understand what the last line means but it seems I can find an answer to my question if I understand this.
>16 leshort 3 ${x?pie executable:shared object},
This means: look at the 2-byte little-endian word at offset 16 of the file (the e_type field of the ELF header). If it has the value 3 (ET_DYN), check whether the file has its execute permission bit set. If it does, the type is 'pie executable'; otherwise it is 'shared object'.
You can look at the magic(5) man page for info about this syntax.
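As a rough illustration of what those rules amount to, here is a minimal Python sketch. It is not how file(1) is implemented; the function names describe_e_type and describe_file are made up for this example, and only the four e_type values from the quoted rules are handled.

```python
import os
import stat
import struct

ET_NAMES = {0: "no file type", 1: "relocatable", 2: "executable"}


def describe_e_type(header, has_exec_bit):
    """Mimic the quoted magic rules for a little-endian ELF header.

    header is at least the first 18 bytes of the file; has_exec_bit says
    whether any execute permission bit is set on the file.
    """
    # e_type is a 2-byte little-endian field at offset 16.
    (e_type,) = struct.unpack_from("<H", header, 16)
    if e_type == 3:  # ET_DYN: used by both PIE executables and shared objects
        return "pie executable" if has_exec_bit else "shared object"
    return ET_NAMES.get(e_type, "unknown")


def describe_file(path):
    """Apply describe_e_type to a file on disk."""
    mode = os.stat(path).st_mode
    has_exec_bit = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    with open(path, "rb") as f:
        return describe_e_type(f.read(18), has_exec_bit)
```

Under this rule the same ET_DYN file is reported differently depending only on its permission bits, which is why a chmod can change the description.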

Why does the .bss segment have no executable attribute?

I have an ELF 32-bit executable file named orw from the pwnable.tw: https://pwnable.tw/challenge/. In my Ubuntu18.04, the .bss segment can be executed:
But on my Ubuntu 20 machine and in IDA Pro, the .bss segment has no executable attribute. Why?
Why does the .bss segment have no executable attribute?
In a normal executable .bss should not have execute permissions, so it's the Ubuntu 18.04 result that is strange, not the other way around.
The following are all relevant here:
output from readelf -Wl orw
kernel versions
output from cat /proc/cpuinfo
emulator details (if you are using some kind of emulator).
I suspect that you are using an emulator, and that it's set up to emulate a pre-NX-bit processor (where the W bit implied the X bit as well).
Alternatively, the executable lacks PT_GNU_STACK segment, in which case this answer is likely the correct one -- kernel defaults have changed for such binaries.
.bss is a segment for uninitialized global variables, so it's not normally executable (it doesn't need to be). If you want it to be executable (because you are generating machine code that you want to be able to test), you will probably need to select a special segment, or create two overlapping segments (one executable and the other read/write) so that you can write the code and also execute it. This may already be specified in the standard script you use to link executables (under a different name, of course); if it has not been done for you, you can supply a linker script that creates those segments. Read the linker documentation (in full, sorry) to learn how the linker deals with this (and other) idiosyncrasies of your processor architecture.
I don't know what architecture you are using, but Intel processors, for example, have an execute permission bit on segments, just as they have read and write bits. This means that a memory access to an execute-only segment must be an opcode fetch, not a data read that loads a data register. If you want to read the text segment as data, you also need to add read access to it in order to see the code you are executing.

How does GDB determine the base addresses of shared libraries [internals of the info sharedlibrary command]

I am trying to understand the internal workings behind GDB commands. After some initial homework on ELF, shared libraries and address space randomization, I tried to understand how GDB relates the executable to the core file.
solib.c contains the implementation of shared library processing. I am especially interested in the info sharedlibrary command.
The comment in solib.c goes like this:
/* Relocate the section binding addresses as recorded in the shared
object's file by the base address to which the object was actually
mapped. */
ops->relocate_section_addresses (so, p);
I could not understand much from this comment. Can somebody explain in plain English how relocation happens? That is, every time an executable loads a shared object, the object is loaded at some location, say X, and every symbol inside the shared library is located at a fixed offset from it, say at X+Y with some size Z. My question is: how does gdb do the same range of address relocation, so that it matches the load segments in the core file? How does it take that hint from the executable?
how does gdb do the same range of address relocation, so that it matches the load segments in the core file
In other words, how does GDB find the relocation X?
The answer depends on the operating system.
On Linux, GDB finds the _DYNAMIC[] array of struct Elf{32,64}_Dyn entries in the core file, which contains an element with d_tag == DT_DEBUG.
The d_un.d_ptr in that element points to a struct r_debug (see /usr/include/link.h), whose r_map member points to a linked list of struct link_map entries. These describe all loaded shared libraries, with each library's relocation in l_addr.
The relevant file in GDB is solib-svr4.c.
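To make the role of l_addr concrete, here is a toy Python model of that list. It does not read a real core file; the LinkMap class is a simplified stand-in for struct link_map, and the addresses and paths are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class LinkMap:
    """Simplified stand-in for struct link_map from <link.h>."""
    l_addr: int                         # load bias: runtime address minus address in the file
    l_name: str                         # path of the shared object
    l_next: Optional["LinkMap"] = None  # next entry in the list


def walk(head: Optional[LinkMap]) -> Iterator[LinkMap]:
    """Follow l_next pointers, as a debugger would starting from r_debug.r_map."""
    node = head
    while node is not None:
        yield node
        node = node.l_next


def runtime_address(lib: LinkMap, st_value: int) -> int:
    """Relocate a symbol value read from the library file by the load bias."""
    return lib.l_addr + st_value


# Made-up values for illustration only.
libm = LinkMap(0x7F0000200000, "/lib/libm.so.6")
libc = LinkMap(0x7F0000000000, "/lib/libc.so.6", l_next=libm)

for lib in walk(libc):
    print(lib.l_name, hex(runtime_address(lib, 0x1234)))
```

In the question's terms, l_addr plays the role of X: adding it to an address recorded in the library file gives the address at which that item lives in the process or core.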
EDIT:
I see that, there are no .dynamic sections in the corefile.
There shouldn't be. There is a .dynamic section in the executable and a matching LOAD segment in the core (the segment will "cover" the .dynamic section, and hold the contents that were there at runtime).

How does the ELF file format define the stack?

I'm studying the ELF file format, so I compiled a small program, dumped the section headers and their contents from the resulting executable.
The ELF header contains the entry point address, which points to the start of the .text section.
I also found the .data section that contains the static data and .rodata that contains the read-only data. I expected there to be a section for the stack too, but I can't find one.
I also expected that at some point ESP is set to the top of some section, but I can't find anything like that in the disassembly.
So how does ESP get its initial value?
The following figure describes the memory map of a typical C ELF executable on x86.
The process loads the .text and .data sections at the base address.
The main stack is located just below and grows downwards.
Each thread has its own stack, and each function call has its own stack frame.
These are located below the main stack.
Each stack is separated from the next by a guard page to detect stack overflow.
Hence one does NOT need a dedicated stack section in the ELF file: the kernel allocates the stack when it loads the program and sets ESP before transferring control to the entry point.
However within the man pages for ELF, one does find a couple of things in an ELF file that control the stack attributes. Mainly the executable permissions to the stack in memory.
PT_GNU_STACK
GNU extension which is used by the Linux kernel to control the state of the stack via the flags set in the p_flags member.
.note.GNU-stack
This section is used in Linux object files for declaring stack attributes. This section is of type SHT_PROGBITS. The only attribute used is SHF_EXECINSTR. This indicates to the GNU linker that the object file requires an executable stack.
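To check what a given binary requests, one can scan its program headers for PT_GNU_STACK and look at the execute flag. The following is a minimal sketch, assuming a little-endian ELF file; the function names are invented for this example, and the header offsets follow the standard Elf32/Elf64 layouts.

```python
import struct

PT_GNU_STACK = 0x6474E551
PF_X = 0x1  # segment is executable


def gnu_stack_flags(elf):
    """Return p_flags of the PT_GNU_STACK program header, or None if absent.

    elf holds the bytes of a little-endian ELF file (32- or 64-bit).
    """
    if elf[:4] != b"\x7fELF" or elf[5] != 1:
        raise ValueError("not a little-endian ELF file")
    if elf[4] == 1:    # ELFCLASS32
        (e_phoff,) = struct.unpack_from("<I", elf, 28)
        e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 42)
        flags_at = 24  # p_flags is the 7th 4-byte field of Elf32_Phdr
    elif elf[4] == 2:  # ELFCLASS64
        (e_phoff,) = struct.unpack_from("<Q", elf, 32)
        e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
        flags_at = 4   # p_flags directly follows p_type in Elf64_Phdr
    else:
        raise ValueError("unknown ELF class")
    for i in range(e_phnum):
        entry = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", elf, entry)
        if p_type == PT_GNU_STACK:
            (p_flags,) = struct.unpack_from("<I", elf, entry + flags_at)
            return p_flags
    return None


def requests_executable_stack(elf):
    """True/False if PT_GNU_STACK is present, None if the header is missing."""
    flags = gnu_stack_flags(elf)
    return None if flags is None else bool(flags & PF_X)
```

A result of None means the header is missing altogether, in which case what happens is up to the kernel's defaults rather than the file.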

Intel binary to ELF

Really quick question here. I'm working in Ubuntu, and I have a simple "Hello World!" program in assembly which I have assembled into x86 machine code. Now I want to turn that machine code into an ELF executable which my computer can run. I am aware that I could just assemble directly to ELF; the purpose of my inquiry is to discover how to make ELF binaries out of assembled machine code.
Thanks!
Final ELF executable files are typically built out of other ELF files, reorganized by the linker. The easiest way, of course, would be to specify ELF as the output format of your assembler.
1) If you really want to do this, you could start with an "empty" ELF file (that you get from compiling or assembling nothing, etc.). Then you could use objcopy --add-section, which allows you to add an arbitrary file as a section in an existing ELF file.
This will create a minimal ELF file:
$ echo "" | gcc -c -o empty.out -xc -
2) Alternatively, you could include your raw binary into another assembly file using something like nasm's incbin, which would then need to be assembled as an ELF.
3) A third option (the best so far) would be to provide your raw binary to the linker, and use a custom linker script to tell it what section to put it in (determined from the input file name). The -b flag before an input file tells ld what format that file is in (-b binary for a flat binary). This should let you use your flat binary file.
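One of the first obstacles you're going to face is getting the entry point to point to your code. Off the top of my head I'm not sure how to edit that after the fact, although ld does let you choose it at link time with the -e option or the ENTRY command in a linker script.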
There is a Python library, pyelftools that may help you in your quest.
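A different route from the three options above, not described in the answers, is to write the ELF header by hand around the raw bytes. The Python sketch below shows the idea for 32-bit x86 under several assumptions: the function name wrap_flat_binary is made up, the load address 0x08048000 is just the conventional one, and the machine code is expected to be self-contained (it must make its own exit system call, and if it uses absolute addresses it must have been assembled for the entry address computed here).

```python
import struct

BASE = 0x08048000          # conventional load address for i386 executables
EHDR_SIZE, PHDR_SIZE = 52, 32


def wrap_flat_binary(code):
    """Prepend a minimal ELF32/i386 header to raw machine code."""
    total = EHDR_SIZE + PHDR_SIZE + len(code)
    entry = BASE + EHDR_SIZE + PHDR_SIZE   # address of the first byte of the code
    ehdr = struct.pack(
        "<16sHHIIIIIHHHHHH",
        b"\x7fELF\x01\x01\x01",  # magic, ELFCLASS32, little-endian, version 1
        2,          # e_type: ET_EXEC
        3,          # e_machine: EM_386
        1,          # e_version
        entry,      # e_entry
        EHDR_SIZE,  # e_phoff: program header follows the ELF header
        0, 0,       # e_shoff, e_flags: no section headers, no flags
        EHDR_SIZE, PHDR_SIZE, 1,  # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<8I",
        1,             # p_type: PT_LOAD
        0,             # p_offset: map the file from its start
        BASE, BASE,    # p_vaddr, p_paddr
        total, total,  # p_filesz, p_memsz
        5,             # p_flags: PF_R | PF_X
        0x1000,        # p_align
    )
    return ehdr + phdr + code
```

The result would be written to a file and marked executable. Because the single PT_LOAD segment maps the whole file, the entry point is simply the load address plus the 84 bytes of headers, which sidesteps the entry point problem mentioned above.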
If it's really assembled, then it's already in the ELF format (compilers targeting Linux generally store the object code in ELF object files as well).
However, if you want a fully-functioning executable, you have to feed the object file to a linker.

Resources