Introduction
I'm compiling a simple assembly program (Intel syntax, x86, Linux) that prints "Hello world!". Here it is:
SECTION .rodata
    msg:     db 'Hello world!', 0xA
    msg_len: equ $ - msg
SECTION .text
    global _start
_start:
    mov eax, 4        ; `write` system call
    mov ebx, 1        ; to stdout
    mov ecx, msg
    mov edx, msg_len
    int 0x80
    mov eax, 1        ; `exit` system call
    xor ebx, ebx      ; exit code 0
    int 0x80
I compile it with the following commands:
nasm -f elf32 -o hello_world.o hello_world.s
ld -m elf_i386 -o hello_world hello_world.o
The code works perfectly fine, but what I'm concerned about are the file sizes:
-rwxrwxr-x 1 nikolay nikolay 8704 Apr 27 15:20 hello_world
-rw-rw-r-- 1 nikolay nikolay 243 Apr 26 22:16 hello_world.s
-rw-rw-r-- 1 nikolay nikolay 640 Apr 27 15:20 hello_world.o
Problem
The object file is slightly bigger than the source code is, but it seems reasonable because there should be some metadata or something in the ELF files, which the source code doesn't contain, right? But the executable file is more than 10 times bigger than even the object file!
Moreover, there are some zero bytes along the object file, but I wouldn't say there are too many of them. However, there are lots of zeros in the executable (see screenshots of both in the Additional info section).
Investigation
I have tried reading some articles about ELF, including the Wikipedia article and the manual pages. I didn't read all of them very carefully, so I might have missed something, but what I found helpful was the dumpelf utility (from the pax-utils package, installable via apt). I used it to dump my ELF files and found something that is likely the cause of these runs of zeros:
In all three program headers of the executable, the p_align field is set:
.p_align = 4096 , /* (min mem alignment in bytes) */
This should mean that each of the sections is supposed to be padded with zero bytes so that its length is a multiple of 4096. And because each of the following sections has relatively small size, there are lots of zero bytes to be added, and that's where those zeros come from.
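To put numbers on this, here is a rough Python sketch that takes the p_offset and p_filesz values of the three program headers from the dumpelf output of the executable and measures the gaps between them. The names `segments` and `padding_between` are made up for illustration, and the sketch only does arithmetic on the dumped values; it does not check that the gap bytes are actually zero.

```python
# (p_offset, p_filesz) of the three PT_LOAD headers, copied from the dumpelf output.
segments = [(0, 148), (4096, 31), (8192, 13)]
FILE_SIZE = 8704  # size of hello_world as reported by ls


def padding_between(segments):
    """Return the number of unused bytes between consecutive segments."""
    gaps = []
    for (offset, size), (next_offset, _) in zip(segments, segments[1:]):
        gaps.append(next_offset - (offset + size))
    return gaps


gaps = padding_between(segments)
total = sum(gaps)
print(gaps)                                         # [3948, 4065]
print(total, "of", FILE_SIZE, "bytes are padding")  # 8013 of 8704 bytes are padding
```

If the hypothesis is right, roughly 92% of the executable is just filler between the segments.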
Question(s)
So, I'm wondering:
Am I right? Are these zero bytes added to make the sections long enough?
I have also noticed that the first three sections ('', '.text', '.rodata') begin at 0, 4096 and 8192 respectively, but the following ones ('.symtab', '.strtab', '.shstrtab') no longer seem to be aligned: they begin at 8208, 8368 and 8422... Why? What is happening here?
What do we need this alignment for? In the program headers, there are p_vaddr and p_paddr fields which are set to the addresses where the first three sections begin, so what is the reason to align sections if we already know the exact addresses of sections from headers? Does it have something to do with memory pages (which are 4 KiB in size on my machine)?
When do I want/need to, and how do I change the alignment value? Looks like there should be a linker argument to change this value. I have found the --nmagic argument in ld's manual, which disables the alignment completely (and, hooray!, the executable file now has the same size as the object file), but I guess the alignment exists on purpose, so maybe I just need to decrease the value so that it better fits my case?
I'd really appreciate answers to any of these questions or any other details on anything here if you know something I have missed. Please, also, tell me if I was wrong anywhere. Thank you in advance!
Additional info
A dump of my object file (with xxd hello_world.o | grep -E '0000|$' --color=always | less -R):
Part of a dump of my executable file (with a command similar to the one above):
A new section begins at the address 0x1000
Output of dumpelf hello_world.o:
#include <elf.h>
/*
* ELF dump of 'hello_world.o'
* 640 (0x280) bytes
*/
Elf32_Dyn dumpedelf_dyn_0[];
struct {
Elf32_Ehdr ehdr;
Elf32_Phdr phdrs[0];
Elf32_Shdr shdrs[7];
Elf32_Dyn *dyns;
} dumpedelf_0 = {
.ehdr = {
.e_ident = { /* (EI_NIDENT bytes) */
/* [0] EI_MAG: */ 0x7F,'E','L','F',
/* [4] EI_CLASS: */ 1 , /* (ELFCLASS32) */
/* [5] EI_DATA: */ 1 , /* (ELFDATA2LSB) */
/* [6] EI_VERSION: */ 1 , /* (EV_CURRENT) */
/* [7] EI_OSABI: */ 0 , /* (ELFOSABI_NONE) */
/* [8] EI_ABIVERSION: */ 0 ,
/* [9-15] EI_PAD: */ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
},
.e_type = 1 , /* (ET_REL) */
.e_machine = 3 , /* (EM_386) */
.e_version = 1 , /* (EV_CURRENT) */
.e_entry = 0x0 , /* (start address at runtime) */
.e_phoff = 0 , /* (bytes into file) */
.e_shoff = 64 , /* (bytes into file) */
.e_flags = 0x0 ,
.e_ehsize = 52 , /* (bytes) */
.e_phentsize = 0 , /* (bytes) */
.e_phnum = 0 , /* (program headers) */
.e_shentsize = 40 , /* (bytes) */
.e_shnum = 7 , /* (section headers) */
.e_shstrndx = 3
},
.phdrs = {
/* no program headers ! */ },
.shdrs = {
/* Section Header #0 '' 0x40 */
{
.sh_name = 0 ,
.sh_type = 0 , /* [SHT_NULL] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 0 , /* (bytes) */
.sh_size = 0 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 0 ,
.sh_entsize = 0
},
/* Section Header #1 '.rodata' 0x68 */
{
.sh_name = 1 ,
.sh_type = 1 , /* [SHT_PROGBITS] */
.sh_flags = 2 ,
.sh_addr = 0x0 ,
.sh_offset = 352 , /* (bytes) */
.sh_size = 13 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 4 ,
.sh_entsize = 0
},
/* Section Header #2 '.text' 0x90 */
{
.sh_name = 9 ,
.sh_type = 1 , /* [SHT_PROGBITS] */
.sh_flags = 6 ,
.sh_addr = 0x0 ,
.sh_offset = 368 , /* (bytes) */
.sh_size = 31 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 16 ,
.sh_entsize = 0
},
/* Section Header #3 '.shstrtab' 0xB8 */
{
.sh_name = 15 ,
.sh_type = 3 , /* [SHT_STRTAB] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 400 , /* (bytes) */
.sh_size = 51 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 1 ,
.sh_entsize = 0
},
/* Section Header #4 '.symtab' 0xE0 */
{
.sh_name = 25 ,
.sh_type = 2 , /* [SHT_SYMTAB] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 464 , /* (bytes) */
.sh_size = 112 , /* (bytes) */
.sh_link = 5 ,
.sh_info = 6 ,
.sh_addralign = 4 ,
.sh_entsize = 16
},
/* Section Header #5 '.strtab' 0x108 */
{
.sh_name = 33 ,
.sh_type = 3 , /* [SHT_STRTAB] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 576 , /* (bytes) */
.sh_size = 37 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 1 ,
.sh_entsize = 0
},
/* Section Header #6 '.rel.text' 0x130 */
{
.sh_name = 41 ,
.sh_type = 9 , /* [SHT_REL] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 624 , /* (bytes) */
.sh_size = 8 , /* (bytes) */
.sh_link = 4 ,
.sh_info = 2 ,
.sh_addralign = 4 ,
.sh_entsize = 8
},
},
.dyns = dumpedelf_dyn_0,
};
Elf32_Dyn dumpedelf_dyn_0[] = {
/* no dynamic tags ! */ };
Output of dumpelf hello_world:
#include <elf.h>
/*
* ELF dump of 'hello_world'
* 8704 (0x2200) bytes
*/
Elf32_Dyn dumpedelf_dyn_0[];
struct {
Elf32_Ehdr ehdr;
Elf32_Phdr phdrs[3];
Elf32_Shdr shdrs[6];
Elf32_Dyn *dyns;
} dumpedelf_0 = {
.ehdr = {
.e_ident = { /* (EI_NIDENT bytes) */
/* [0] EI_MAG: */ 0x7F,'E','L','F',
/* [4] EI_CLASS: */ 1 , /* (ELFCLASS32) */
/* [5] EI_DATA: */ 1 , /* (ELFDATA2LSB) */
/* [6] EI_VERSION: */ 1 , /* (EV_CURRENT) */
/* [7] EI_OSABI: */ 0 , /* (ELFOSABI_NONE) */
/* [8] EI_ABIVERSION: */ 0 ,
/* [9-15] EI_PAD: */ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
},
.e_type = 2 , /* (ET_EXEC) */
.e_machine = 3 , /* (EM_386) */
.e_version = 1 , /* (EV_CURRENT) */
.e_entry = 0x8049000 , /* (start address at runtime) */
.e_phoff = 52 , /* (bytes into file) */
.e_shoff = 8464 , /* (bytes into file) */
.e_flags = 0x0 ,
.e_ehsize = 52 , /* (bytes) */
.e_phentsize = 32 , /* (bytes) */
.e_phnum = 3 , /* (program headers) */
.e_shentsize = 40 , /* (bytes) */
.e_shnum = 6 , /* (section headers) */
.e_shstrndx = 5
},
.phdrs = {
/* Program Header #0 0x34 */
{
.p_type = 1 , /* [PT_LOAD] */
.p_offset = 0 , /* (bytes into file) */
.p_vaddr = 0x8048000 , /* (virtual addr at runtime) */
.p_paddr = 0x8048000 , /* (physical addr at runtime) */
.p_filesz = 148 , /* (bytes in file) */
.p_memsz = 148 , /* (bytes in mem at runtime) */
.p_flags = 0x4 , /* PF_R */
.p_align = 4096 , /* (min mem alignment in bytes) */
},
/* Program Header #1 0x54 */
{
.p_type = 1 , /* [PT_LOAD] */
.p_offset = 4096 , /* (bytes into file) */
.p_vaddr = 0x8049000 , /* (virtual addr at runtime) */
.p_paddr = 0x8049000 , /* (physical addr at runtime) */
.p_filesz = 31 , /* (bytes in file) */
.p_memsz = 31 , /* (bytes in mem at runtime) */
.p_flags = 0x5 , /* PF_R | PF_X */
.p_align = 4096 , /* (min mem alignment in bytes) */
},
/* Program Header #2 0x74 */
{
.p_type = 1 , /* [PT_LOAD] */
.p_offset = 8192 , /* (bytes into file) */
.p_vaddr = 0x804A000 , /* (virtual addr at runtime) */
.p_paddr = 0x804A000 , /* (physical addr at runtime) */
.p_filesz = 13 , /* (bytes in file) */
.p_memsz = 13 , /* (bytes in mem at runtime) */
.p_flags = 0x4 , /* PF_R */
.p_align = 4096 , /* (min mem alignment in bytes) */
},
},
.shdrs = {
/* Section Header #0 '' 0x2110 */
{
.sh_name = 0 ,
.sh_type = 0 , /* [SHT_NULL] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 0 , /* (bytes) */
.sh_size = 0 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 0 ,
.sh_entsize = 0
},
/* Section Header #1 '.text' 0x2138 */
{
.sh_name = 27 ,
.sh_type = 1 , /* [SHT_PROGBITS] */
.sh_flags = 6 ,
.sh_addr = 0x8049000 ,
.sh_offset = 4096 , /* (bytes) */
.sh_size = 31 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 16 ,
.sh_entsize = 0
},
/* Section Header #2 '.rodata' 0x2160 */
{
.sh_name = 33 ,
.sh_type = 1 , /* [SHT_PROGBITS] */
.sh_flags = 2 ,
.sh_addr = 0x804A000 ,
.sh_offset = 8192 , /* (bytes) */
.sh_size = 13 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 4 ,
.sh_entsize = 0
},
/* Section Header #3 '.symtab' 0x2188 */
{
.sh_name = 1 ,
.sh_type = 2 , /* [SHT_SYMTAB] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 8208 , /* (bytes) */
.sh_size = 160 , /* (bytes) */
.sh_link = 4 ,
.sh_info = 6 ,
.sh_addralign = 4 ,
.sh_entsize = 16
},
/* Section Header #4 '.strtab' 0x21B0 */
{
.sh_name = 9 ,
.sh_type = 3 , /* [SHT_STRTAB] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 8368 , /* (bytes) */
.sh_size = 54 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 1 ,
.sh_entsize = 0
},
/* Section Header #5 '.shstrtab' 0x21D8 */
{
.sh_name = 17 ,
.sh_type = 3 , /* [SHT_STRTAB] */
.sh_flags = 0 ,
.sh_addr = 0x0 ,
.sh_offset = 8422 , /* (bytes) */
.sh_size = 41 , /* (bytes) */
.sh_link = 0 ,
.sh_info = 0 ,
.sh_addralign = 1 ,
.sh_entsize = 0
},
},
.dyns = dumpedelf_dyn_0,
};
Elf32_Dyn dumpedelf_dyn_0[] = {
/* no dynamic tags ! */ };
The alignment is 4096 bytes, which is the page size on this architecture. This is not a coincidence, as the man page says about nmagic: "Turn off page alignment of sections".
From the size of the normal (non-nmagic) binary you can guess that the linker laid out three pages, presumably with different access rights (code = not writable, data = not executable, rodata = read-only). These rights can only be set per page. The layout on disk matches the layout in RAM when the program is running.
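If you want to check the per-segment access rights without dumpelf, the program headers are easy to read by hand. The following is a minimal Python sketch, assuming a little-endian 32-bit ELF file like the one in the question; the function name `load_segments` is my own, and the field offsets follow the Elf32_Ehdr and Elf32_Phdr layouts from elf.h.

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def load_segments(data):
    """Yield (p_offset, p_vaddr, p_filesz, flags, p_align) for every PT_LOAD
    program header of a little-endian ELF32 image held in `data` (bytes)."""
    if data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("not a little-endian ELF32 file")
    # Elf32_Ehdr: e_phoff is at byte 28, e_phentsize and e_phnum at bytes 42 and 44.
    (e_phoff,) = struct.unpack_from("<I", data, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)
    for i in range(e_phnum):
        # Elf32_Phdr is eight consecutive 32-bit words.
        (p_type, p_offset, p_vaddr, _p_paddr, p_filesz, _p_memsz,
         p_flags, p_align) = struct.unpack_from("<8I", data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        flags = "".join(ch if p_flags & bit else "-"
                        for ch, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
        yield p_offset, p_vaddr, p_filesz, flags, p_align
```

Feeding it the bytes of hello_world should, going by the dump in the question, give three segments flagged r--, r-x and r--, each with an alignment of 4096.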
This is important for demand paging. When the program starts, the entire executable file is basically mmap'ed and pages are loaded from disk as needed through page faults. Pages can also be shared between other running instances of the same program (this is more important for dynamic libraries) and can be evicted from RAM when needed due to memory pressure.
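For such a direct mapping to work, the ELF format requires each loadable segment's file offset and virtual address to be congruent modulo the page size. Here is a small Python sketch of that check, using the p_offset and p_vaddr values from the dump in the question; the helper name `mappable` is hypothetical.

```python
PAGE = 4096

# (p_offset, p_vaddr) of the PT_LOAD headers from the dump of hello_world.
segments = [(0, 0x8048000), (4096, 0x8049000), (8192, 0x804A000)]


def mappable(p_offset, p_vaddr, p_align=PAGE):
    """True if file offset and virtual address agree modulo the alignment,
    i.e. the segment can be mapped page-for-page straight from the file."""
    return p_offset % p_align == p_vaddr % p_align


for off, vaddr in segments:
    status = "ok" if mappable(off, vaddr) else "not page-congruent"
    print(hex(vaddr), "<- file offset", off, status)
```

All three segments pass. A segment packed directly behind the headers, say at file offset 148 but still meant to live at 0x8049000, would fail the check, which is the situation page alignment is there to avoid.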
The nmagic executable is still loaded into three pages when run, but since those no longer match what is on disk, it is not demand paged. I wouldn't recommend using the option on anything larger.
Note: if you make a longer-running executable (add reading of input perhaps), you can examine the memory layout details of the running process by looking at /proc/[pid]/maps and /proc/[pid]/smaps.
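As a rough illustration of what to look for there, this Python sketch splits the lines of a maps file into their fields (address range, permissions, file offset, device, inode, path). The function names are my own, and `read_maps` only works on Linux, where /proc is available.

```python
def parse_maps_line(line):
    """Split one line of /proc/<pid>/maps into its fields."""
    fields = line.split(None, 5)
    addr_range, perms, offset, _dev, _inode = fields[:5]
    # Anonymous mappings have no path column.
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(x, 16) for x in addr_range.split("-"))
    return {"start": start, "end": end, "perms": perms,
            "offset": int(offset, 16), "path": path}


def read_maps(pid="self"):
    """Return the parsed mappings of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f]
```

For a page-aligned executable you would expect the start address and the file offset of each file-backed mapping to be multiples of the page size, and the perms column to match the flags of the corresponding program header.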