Why the "SIZEOF_HEADERS" in ld script is 0? - gnu

Below is some snippet from my ld script:
Output format:
OUTPUT_FORMAT("elf32-i386", "elf32-i386",
"elf32-i386")
OUTPUT_ARCH(i386)
Memory layout:
MEMORY
{
CFLASH (xri) : ORIGIN = 0x20000000, LENGTH = 0x1000
(...omitted...)
}
REGION_ALIAS("CODE", CFLASH)
Sections layout:
SECTIONS
{
PROVIDE (__executable_start = SEGMENT_START("text-segment", ORIGIN(CODE)));
. = SEGMENT_START("text-segment", ORIGIN(CODE)) + SIZEOF_HEADERS; <=== PLACE 1
(...omitted...)
.startup_bsp :
{
KEEP (*(.startup_bsp))
. = ALIGN (., 0x4);
} >CODE =0xF4F4F4F4
(...omitted...)
As I understand, the location counter "." stands for the VMA. At PLACE 1, the VMA should have been set to 0x20000000 + SIZEOF_HEADERS.
But as I dumped the section headers from the generated ELF file, I see this:
So the VMA and LMA are still 0x20000000, where is the SIZEOF_HEADERS??? I think it should be the ELF file header size but it seems to be 0. Why?
According to here:
SIZEOF_HEADERS
Return the size in bytes of the output file's headers. This is information which appears at the start of the output file. You can use
this number when setting the start address of the first section, if
you choose, to facilitate paging.
When producing an ELF output file, if the linker script uses the SIZEOF_HEADERS builtin function, the linker must compute the number of
program headers before it has determined all the section addresses and
sizes. If the linker later discovers that it needs additional program
headers, it will report an error `not enough room for program
headers'. To avoid this error, you must avoid using the SIZEOF_HEADERS
function, or you must rework your linker script to avoid forcing the
linker to use additional program headers, or you must define the
program headers yourself using the PHDRS command (see PHDRS).

So far, I guess it is the >CODE that overrides the VMA calculated from the location counter .. The .startup_bsp section is appended to the CODE memory region.
And since it is the first section for CODE region, it occupies the starting address of that region.

Related

Cannot interpret ELF 64-bit obj file section header tables

I have a relocatable 64-bit .o file at hand and would like to read each section header. From the wiki I see that the 2-byte value stored in 0x28 should be the start of section header table. The value is 0x0328.
I then go to address 0x0328 and judging from other information each section header is of 0x40 size and I have a total of 14 headers (this information is confirmed with readelf). However then I got lost, and what I interpreted is completely different from what readelf told me:
Section Header 0: I see all 64 bytes are 00, which matches what readelf told me;
Section Header 1: I see the following values from 0x368:
However readelf told me that:
What did I miss?

How does the ELF64 loader know to update the initial addresses in .got.plt?

Consider the following program hello.c:
#include <stdio.h>
int main(int argc, char** argv)
{
printf("hello");
return 0;
}
The file is compiled with gcc -o hello -Og -g hello.c and then loaded with gdb hello.
Inspecting the GOT for the call to printf with p 'printf#got.plt' gives
$1 = (<text from jump slot in .got.plt, no debug info>) 0x1036 <printf#plt+6>
which is the offset of the second instruction in the corresponding PLT entry relative to the start of the section.
After starting and linking the program with starti, p 'printf#got.plt' now gives
$2 = (<text from jump slot in .got.plt, no debug info>) 0x555555555036 <printf#plt+6>
which is the absolute address of the second instruction in the corresponding PLT entry.
I understand what is going on and why. My question is how does the dynamic linker/loader know to update the section offset (0x1036) to the absolute address (0x555555555036)?
A p &'printf#got.plt' before linking gives
$1 = (<text from jump slot in .got.plt, no debug info> *) 0x4018 <printf#got.plt>
and readelf -r simple shows a relocation entry for this address
Relocation section '.rela.plt' at offset 0x550 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000004018 000200000007 R_X86_64_JUMP_SLO 0000000000000000 printf#GLIBC_2.2.5 + 0
But my reading of the System V Application Binary Interface AMD64 Architecture Processor Supplement, p.76, is that these relocation entries are only used when LD_BIND_NOW is non-null. Are there other relocation entries that I missed? What is the mechanism for rebasing offsets relative to the GOT's ultimate address?
According to Drepper's How To Write Shared Libraries, the dynamic linker relocates two kinds of dependencies:
Relative relocations: dependencies to locations within the same object. The linker simply adds the load address of the object to the offset to the target destination.
Symbol relocations: more complicated and expensive process based on a sophisticated symbol resolution algorithm.
For the PLT's GOT, Drepper states (§1.5.5) At startup time the dynamic linker fills the GOT slot with the address pointing to the second instruction of the appropriate PLT entry. A reading of the glibc source code suggests that the linker indeed loops through the R_X86_64_JUMP_SLOT relocations (elf/do-rel.h:elf_dynamic_do_Rel) and increments the offsets they contain (sysdeps/x86_64/dl-machine.h:elf_machine_lazy_rel):
if (__glibc_likely (r_type == R_X86_64_JUMP_SLOT))
{
/* Prelink has been deprecated. */
if (__glibc_likely (map->l_mach.plt == 0))
*reloc_addr += l_addr;
else
...
when lazy PLT binding is used (the default case).

Extend section in Mach-O file

I am trying to extract libraries from the Dyld_shared_cache, and need to fix in external references.
For example, the pointers in the __DATA.__objc_selrefs section usually point to data outside the mach-o file, to fix that I would have to copy the corresponding c-string from the dyld and append it to the __TEXT.__objc_methname section.
Though from my understanding of the Mach-O file format, this extension of the __TEXT.__objc_methname would shift all the sections after it and would force me to fix all the offsets and pointers that reference them. Is there a way to add data to a section without breaking a lot of things?
Thanks!
Thanks to #Kamil.S for the idea about adding a new load command and section.
One way to achieve adding more data to a section is to create a duplicate segment and section and insert it before the __LINKEDIT segment.
Slide the __LINKEDIT segment so we have space to add the new section.
define the slide amount, this must be page-aligned, so I choose 0x4000.
add the slide amount to the relevant load commands, this includes but is not limited to:
__LINKEDIT segment (duh)
dyld_info_command
symtab_command
dysymtab_command
linkedit_data_commands
physically move the __LINKEDIT in the file.
duplicate the section and change the following1
size, should be the length of your new data.
addr, should be in the free space.
offset, should be in the free space.
duplicate the segment and change the following1
fileoff, should be the start of the free space.
vmaddr, should be the start of the free space.
filesize, anything as long as it is bigger than your data.
vmsize, must be identical to filesize.
nsects, change to reflect how many sections your adding.
cmdsize, change to reflect the size of the segment command and its section commands.
insert the duplicated segment and sections before the __LINKEDIT segment
update the mach_header
ncmds
sizeofcmds
physically write the extra data in the file.
you can optionally change the segname and sectname fields, though it isn't necessary. thanks Kamil.S!
UPDATE
After clarifing with OP that extension of __TEXT.__objc_methname would happen during Mach-O post processing of an existing executable I had a fresh look on the problem.
Another take would be to create a new load command LC_SEGMENT_64 with a new __TEXT_EXEC.__objc_methname segment / section entry (normally __TEXT_EXEC is used for some kernel stuff but essentially it's the same thing as __TEXT). Here's a quick POC to ilustrate the concept:
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[]) {
#autoreleasepool {
printf("%lx",[NSObject new]);
}
return 0;
}
Compile like this:
gcc main.m -c -o main.o
ld main.o -rename_section __TEXT __objc_methname __TEXT_EXEC __objc_methname -lobjc -lc
Interestingly only ld up to High Sierra 10.14.6 generates __TEXT.__objc_methname, no trace of it on Catalina, it's done differently.
UPDATE2.
Playing around with it, I noticed execution rights for __TEXT segment (and __TEXT_EXEC for that matter) are not required for __objc_methname to work.
Even better specific segment & section names are not required:
I could pull off:
__DATA.__objc_methname
__DATA_CONST.__objc_methname
__ARBITRARY.__arbitrary
or in my case last __DATA section
__DATA.__objc_classrefs where the original the data got concatenated by the selector name.
It's all fine as long as a proper null terminated C-string with the selector name is there. If I intentionally break the "new\0" in hex editor or MachOView I'll get
"+[NSObject ne]: unrecognized selector sent to instance ..."
upon launching my POC executable so the value is used for sure.
So to sum __TEXT.__objc_methname section itself is likely some debugger hint made by the linker. The app runtime seems to only need selector names as char* anywhere in memory.

How to MODIFY an ELF file with Python

I am trying to modify an ELF file's .text segment using python.
I successfully acquired the .text field so then I can simply change the bit that I want. The thing is that pyelftools does not provide any way to generate an ELF file from the ELF object.
So what I tried is the following:
I've created a simple helloworld program in c, compiled it and got the a.out file. Then I used the pyelftools to disassemble it.
To change/edit any section of the ELF file I simply used pyelftools's ELFFile class methods to acquire the field's (i) offset and (ii) size. So then I know exactly where to look inside the binary file.
So after getting the values-margins of the field (A,B) I simply treated the file like a normal binary. The only thing I did is to do a file.seek(A) to move the file pointer to the specific section that I wish to modify.
def edit_elf_section(elf_object,original_file,section):
elf_section = elf_object.get_section_by_name(section)
# !! IMPORTANT !!
section_start = elf_section['sh_offset'] # NOT sh_addr. sh_addr is the logical address of the section
section_end = section_start + elf_section['sh_size']
original_file.seek(section_start)
# Write whatever you want to the file #
assert(original_file.tell() <= section_end) # You've written outside the section
To validate the results you can use the diff binary to see that the files are/aren't identical

significance of address 0x8048080

why when i debug asm source in gdb is 0x8048080 the address chosen for the starting entry point into code? this is just a relative offset, not an actual offset of into memory of an instruction, correct?
There is no special significance to address 0x8048080, but there is one for address 0x08048000.
The latter address is the default address, on which ld starts the first PT_LOAD segment on Linux/x86. On Linux/x86_64, the default is 0x400000, and you can change the default by using a "custom" linker script. You can also change where .text section starts with -Wl,-Ttext,0xNNNNNNNN flag.
After ld starts at 0x08048000, it adds space for program headers, and proceeds to link the rest of the executable according to its built-in linker script, which you can see if you pass in -Wl,--verbose to your link line.
For your program, the size of program headers appears to always be 0x80, so your .text section always starts at 0x8048080, but that is by no means universal.
When I link a trivial int main() { return 0; } program, I get &_start == &.text at 0x8048300, 0x8048178 or 0x8048360, depending on which compiler I use.
0×8048080 is the default entry point in virtual memory used by the Linux ld linker. You can change it to whatever you want.
for more details check out: http://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints/

Resources