Understanding ELF64 text/data segment layout/padding - linux

I'm trying to brush up on UNIX viruses and one text I'm reading mentions that parasitic code can be inserted in the padding between the text and the data segment, supposedly up to 2MB in size on x86-64 systems. But when I compile a simple hello world program with gcc -no-pie...
#include <stdio.h>
int main()
{
printf("hello world\n");
}
...and inspect its segment headers with readelf -W -l I get:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0002d8 0x0002d8 R 0x8
INTERP 0x000318 0x0000000000400318 0x0000000000400318 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x000588 0x000588 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x0001c5 0x0001c5 R E 0x1000
LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x000138 0x000138 R 0x1000
LOAD 0x002e00 0x0000000000403e00 0x0000000000403e00 0x000230 0x000238 RW 0x1000
DYNAMIC 0x002e10 0x0000000000403e10 0x0000000000403e10 0x0001d0 0x0001d0 RW 0x8
...
I assume the segment starting at virtual address 0x401000 is the text segment and the one starting at 0x403e00 is the data segment. But what are the other two read-only LOAD segments? And how precisely does padding work here? There's no padding to 2MB boundaries to be seen, and even assuming padding to 4KB boundaries, why does the data segment not start at address 0x403000?

But what are the other two read-only LOAD segments?
See this answer.
There's no padding to 2MB boundaries
The BFD linker used to align segments on a 2MiB boundary because that was the maximum page size (MAXPAGESIZE) it assumed for x86_64.
It no longer does this (not sure when the change was made).
The text you are reading is probably out of date.

Related

why virtual address of LOAD program header and runtime virtual address shown by gdb is different?

I've been trying to understand the ELF file format. According to the ELF documentation, the VirtAddr of a LOAD header should be the virtual address of the loaded segment, but gdb's memory map shows the segments loaded at different virtual addresses.
$ readelf -l
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000560 0x0000000000000560 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x00000000000001e5 0x00000000000001e5 R E 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x0000000000000118 0x0000000000000118 R 0x1000
LOAD 0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
0x0000000000000248 0x0000000000000250 RW 0x1000
gdb memmap
Entry point: 0x555555555040
0x00005555555542a8 - 0x00005555555542c4 is .interp
0x00005555555542c4 - 0x00005555555542e4 is .note.ABI-tag
0x00005555555542e4 - 0x0000555555554308 is .note.gnu.build-id
0x0000555555554308 - 0x0000555555554324 is .gnu.hash
0x0000555555554328 - 0x00005555555543d0 is .dynsym
0x00005555555543d0 - 0x0000555555554454 is .dynstr
0x0000555555554454 - 0x0000555555554462 is .gnu.version
0x0000555555554468 - 0x0000555555554488 is .gnu.version_r
0x0000555555554488 - 0x0000555555554548 is .rela.dyn
0x0000555555554548 - 0x0000555555554560 is .rela.plt
0x0000555555555000 - 0x000055555555501b is .init
0x0000555555555020 - 0x0000555555555040 is .plt
0x0000555555555040 - 0x00005555555551d5 is .text
0x00005555555551d8 - 0x00005555555551e5 is .fini
0x0000555555556000 - 0x000055555555600a is .rodata
0x000055555555600c - 0x0000555555556040 is .eh_frame_hdr
0x0000555555556040 - 0x0000555555556118 is .eh_frame
0x0000555555557de8 - 0x0000555555557df0 is .init_array
0x0000555555557df0 - 0x0000555555557df8 is .fini_array
0x0000555555557df8 - 0x0000555555557fd8 is .dynamic
0x0000555555557fd8 - 0x0000555555558000 is .got
0x0000555555558000 - 0x0000555555558020 is .got.plt
0x0000555555558020 - 0x0000555555558030 is .data
0x0000555555558030 - 0x0000555555558038 is .bss
VirtAddr of LOAD header should be the virtual address of the loaded segment.
This is only true for ELF images of type ET_EXEC.
But you have an ELF image of type ET_DYN (probably a position independent executable), and these are relocated at runtime to a different virtual address.

file offset vs virtual address in a shared library

For a shared library file, how to convert between the file offset and virtual address of the definition of a symbol?
In ELF document, for a symbol in a symbol table,
In executable and shared object files, st_value holds a virtual address. To make these files' symbols more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation) for which the section number is irrelevant.
But how can I get the corresponding offset in the file? Or, given an offset, how can I calculate the virtual address (file interpretation to memory interpretation)?
Imagine a scenario like this. During the execution of a process, suppose it is using a function implemented in a shared library, say libx.so, and that the library file is mapped into a region represented by vma.
//addr holds the value of PC
offset = (vma->vm_pgoff << PAGE_SHIFT) + addr - vma->vm_start;
As I understand it, offset now holds the offset of the instruction in the library file. Given this offset, I'd like to know the function name. One way is to calculate the virtual address corresponding to offset and compare it with the st_values in the symbol table. If the st_values are sorted in ascending order, then st_value_1 <= virtual_address < st_value_2 means st_name_1 is what I'm looking for. So the problem lies in the conversion.
For reference, data structure of a symbol table entry is:
typedef struct{
Elf32_Word st_name;
Elf32_Addr st_value;
Elf32_Word st_size;
unsigned char st_info;
unsigned char st_other;
Elf32_Half st_shndx;
}Elf32_Sym;
The program header table's PT_LOAD entries define how the loader/linker is expected to map parts of the ELF file into the virtual address space. You should use them if you want to convert between file offsets and (relative) virtual memory addresses:
~$ readelf -l /lib/i386-linux-gnu/libc-2.24.so
Elf file type is DYN (Shared object file)
Entry point 0x18400
There are 10 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00000034 0x00000034 0x00140 0x00140 R E 0x4
INTERP 0x166374 0x00166374 0x00166374 0x00013 0x00013 R 0x4
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x00000000 0x00000000 0x1b01c8 0x1b01c8 R E 0x1000
LOAD 0x1b0260 0x001b1260 0x001b1260 0x02c74 0x0579c RW 0x1000
DYNAMIC 0x1b1db0 0x001b2db0 0x001b2db0 0x000f0 0x000f0 RW 0x4
NOTE 0x000174 0x00000174 0x00000174 0x00044 0x00044 R 0x4
TLS 0x1b0260 0x001b1260 0x001b1260 0x00008 0x00048 R 0x4
GNU_EH_FRAME 0x166388 0x00166388 0x00166388 0x061ec 0x061ec R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
GNU_RELRO 0x1b0260 0x001b1260 0x001b1260 0x01da0 0x01da0 R 0x1
For example, considering this symbol
Num: Value Size Type Bind Vis Ndx Name
188: 0005df80 35 FUNC GLOBAL DEFAULT 13 fopen@@GLIBC_2.1
Its (relative) virtual address is 0x0005df80. It belongs to the first PT_LOAD entry, which spans relative virtual addresses 0x00000000 to 0x00000000 + 0x1b01c8. Its offset within the segment is Value - VirtAddr = 0x0005df80 - 0x00000000 = 0x0005df80. Its offset within the file is thus Offset + (Value - VirtAddr) = 0x000000 + 0x0005df80 = 0x0005df80.

How can I get two page aligned(0x1000) and separated program headers?

I am trying to implement a custom loader and want to place the data and code
in two separate program headers (segments), each aligned to 0x1000.
I modified part of the default linker script and got weird results.
**Default linker script.**
. = ALIGN (CONSTANT (MAXPAGESIZE)) - ((CONSTANT (MAXPAGESIZE) - .) &
(CONSTANT (MAXPAGESIZE) - 1));
. = DATA_SEGMENT_ALIGN (CONSTANT(MAXPAGESIZE),CONSTANT (COMMONPAGESIZE));
**Modified linker script**
. = ALIGN (0x1000);
. = DATA_SEGMENT_ALIGN(0x1000, 0x1000);
When I link the binary with the default linker script, it is 0x200000-aligned
and has two program headers.
LOAD 0x0000000000000000 0x0000000050000000 0x0000000050000000
0x0000000000001058 0x0000000000001058 R E 200000
LOAD 0x0000000000001fe8 0x0000000050201fe8 0x0000000050201fe8
0x0000000000000028 0x00000000000000c0 RW 200000
But I get the result below with the modified linker script.
LOAD 0x0000000000000000 0x0000000050000000 0x0000000050000000
0x0000000000002010 0x00000000000020a8 RWE 200000
It seems that the data and code sections are merged into one program header.
However, I want my program to have two page-aligned (0x1000) program headers:
LOAD1 0x0000000050000000 ~ 0x0000000050002340 R E
LOAD2 0x0000000050003000 ~ 0x0000000050006790 RW
Please let me know some directions.

How to modify the GNU linker to have separate 'RWE ' PT_LOAD segment

I have a program that, when linked into a binary using the default options, gives me this.
>readelf -lW /tmp/sample
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x13e33f 0x13e33f R E 0x200000
LOAD 0x13e510 0x000000000073e510 0x000000000073e510 0x005160 0x007cc8 RW 0x200000
I want to have a separate LOAD segment with RWE permissions after the LOAD segment with RW (i.e. data segment) shown above. One approach to do this is to modify the custom GNU linker script to pick my new sections and put them in a separate segment after the bss segment. This will cause it to appear as third LOAD segment.
Adding this after bss end in linker script
.my_section = .;
.my_section : { *(.my_section)}
This is how it appears
>readelf -lW /tmp/sample
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x13e33f 0x13e33f R E 0x200000
LOAD 0x13e510 0x000000000073e510 0x000000000073e510 0x005160 0x007cc8 RW 0x200000
LOAD 0x13f000 0x000000000073f000 0x000000000073f000 0x0d6b00 0x0d6b00 RW 0x200000
How do I get executable permission on this segment as well? What changes do I need to make to the linker script?

ELF Program Headers: MemSiz vs. FileSiz

readelf -l /bin/bash gives me this:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001a 0x000000000000001a R 1
[Requesting program interpreter: /lib/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000aeef4 0x00000000000aeef4 R E 200000
LOAD 0x00000000000afde0 0x00000000006afde0 0x00000000006afde0
0x0000000000003cec 0x000000000000d3c8 RW 200000
DYNAMIC 0x00000000000afdf8 0x00000000006afdf8 0x00000000006afdf8
0x0000000000000200 0x0000000000000200 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x000000000009dbc0 0x000000000049dbc0 0x000000000049dbc0
0x0000000000002bb4 0x0000000000002bb4 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 8
GNU_RELRO 0x00000000000afde0 0x00000000006afde0 0x00000000006afde0
0x0000000000000220 0x0000000000000220 R 1
Why is MemSiz not equal to FileSiz for some LOAD segments? What should be done with the memory region included by MemSiz but not FileSiz?
The loadable segment in question appears to be the program's data segment.
The data segment in a program contains space for both initialized and
uninitialized program variables. Values for initialized variables are
stored in the program's executable. Uninitialized program variables do not
need to be stored anywhere; instead, space is reserved for them in a
special section named ".bss" that takes up no space in the file.
The file size of an executable's data segment can thus be less than
its in-memory size.
To illustrate:
/*
* Space for the initialized variable 'x' would be reserved in the
* executable's ".data" section, along with its initial value.
*/
int x = 42;
/*
* Space for the uninitialized variable 'y' would be reserved in
* the ".bss" section; no file space would be allocated in the
* executable.
*/
int y;
On unix-like systems, the portion of the data segment mapped to the
".bss" section would be zero-filled at program load time.
